World Community Grid - View Thread - Results on AMD-FX Inconclusive?

World Community Grid Forums

Category: Completed Research

Forum: GO Fight Against Malaria

Thread: Results on AMD-FX Inconclusive?


Thread Status: Active
Total posts in this thread: 62

This topic has been viewed 8511 times and has 61 replies

Chris Holvenstot
Cruncher
USA
Joined: Aug 26, 2011
Post Count: 19
Status: Offline

Results on AMD-FX Inconclusive?

A few days ago I brought a new system into the fold. It is based on the new AMD FX/Bulldozer 8120 CPU. I currently have it loaded with Linux running the 2.6.32 kernel.

It has been in production with WCG for the past day and a half and is selecting tasks from all of the available projects. However, to date, all five GFAM tasks completed on this system have been placed in the "inconclusive" hopper.

Tasks from other BOINC and WCG projects are completed and validated without error. My other systems, both Intel and AMD Phenom II based, do not seem to be having validation issues with these tasks.

Has anyone else had problems or heard of problems with GFAM and the FX/Bulldozer CPUs? (other than performance - I think we all understand that this series of CPUs is not a big step-up from the Phenom family in this regard)

One big difference between the FX/Bulldozer family and my Intel and Phenom systems is that the FX/Bulldozer has a 256 bit floating point unit while the older systems have a 128 bit unit. Is it possible that we are looking at some sort of cumulative error caused by the difference in floating point precision?

At this point I have aborted GFAM tasks on this system, while allowing them to run on my other systems.

Any information would be appreciated.

[Nov 24, 2011 8:57:56 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


GFAM: Results on AMD-FX Inconclusive?

Hi,

It took some digging, but at spot 654 is your CPU with 30 others, so maybe one of the others will check in with an answer:

http://boincstats.com/stats/host_cpu_stats.ph...eamid=&st=650&or=

Ubuntu 10.04 LTS, I'm gathering. The OS should have no influence on the computational outcome.

Oh, are you overclocking? It did cause a problem for someone on DSFL, which uses the same science engine, but as your other tasks validate fine, that is less likely.

Whether the FP precision causes a minor difference is something only the techs can see in the full data output. It really depends on how the tolerances are set.
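To illustrate why the tolerance setting matters, here is a minimal sketch of a tolerance-based comparison between two copies of a result. It is not the WCG validator; the function name `results_agree`, the tolerance values, and the sample energies are all made up for illustration.

```python
import math

def results_agree(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Return True when two result vectors match element-wise within tolerance."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )

# Hypothetical values from two hosts that crunched the same work unit.
host_a = [-7.412345, -6.998001, -5.120340]
host_b = [-7.412346, -6.998001, -5.120339]

# The same pair of results passes a loose check and fails a strict one.
print(results_agree(host_a, host_b))                # True
print(results_agree(host_a, host_b, rel_tol=1e-9))  # False
```

With a strict tolerance, a difference in the last printed digit is enough to send a result to "inconclusive", even though both hosts computed essentially the same answer.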

Result logs are always of interest, so you might [if not done already] want to compare to your wingman (click on WU name on Result Status page). If I were you, I'd wait for the 3rd copy and see which goes invalid.

ttyl

--//--

edit: Link to a related thread with the 8120 as co-topic: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,32102

----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 24, 2011 10:27:38 AM]

[Nov 24, 2011 10:08:21 AM]

Chris Holvenstot
Cruncher
USA
Joined: Aug 26, 2011
Post Count: 19
Status: Offline

Re: GFAM: Results on AMD-FX Inconclusive?

@SekeRob - thanks for the reply. I have been playing with this since the last time I posted and am still scratching my head. I am having a little trouble navigating the WCG website infrastructure so any advice you have on where to find things would be appreciated.

Is there an "easy" way to search out other users who may be running one of these Bulldozer/FX chips? It sure would be nice to get some sort of positive affirmation that other users of this CPU family are able to run GFAM without issue.

Since I last posted I have replaced memory and under clocked the system - not that the FX needs any help in under performing.

I also jumped the kernel up to a 3.2 version which has some FX optimizations.

None of this was of any help. Each and every GFAM task I run on this system goes inconclusive and then into the invalid bin.

I am able to run rosetta, poem, and other WCG tasks without issue - at the original speed or at the under clocked speed.

I still suspect a cumulative "rounding error" using the 256 bit floating point vs the more industry standard 128 bit unit. I know, I can be pig headed.

Where can I look to see evidence on what the validators believe was wrong with the output produced by GFAM tasks run on this system? (the WCG website is very complete, but the two things I have yet to find are how to isolate the RAC for a single system, which is useful when optimizing systems, and where to look at debug or diagnostic information)

I can read a stack trace and a panic dump. I am also proficient in setting breakpoints so if there is a way to get the same task to run on multiple systems, I can halt the task and compare results.

Once again, any information you can provide would be appreciated.

BTW - I've had GFAM "disabled" in the profile for this system so there are no current tasks to peer at. However, if you can provide me with a "crowbar" to open the lid and examine the internal workings I will happily reenable GFAM to get a few samples.

Thanks

----------------------------------------
[Edit 1 times, last edit by Chris Holvenstot at Dec 29, 2011 7:39:48 PM]

[Dec 29, 2011 7:37:28 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: GFAM: Results on AMD-FX Inconclusive?

Hello Chris Holvenstot,
As I recall, the Bulldozer 256-bit FPU should run as 2 ordinary FPUs. None of our code is using the special instructions to unify it. Whatever the problem is will presumably be repeated by the second person to run a Bulldozer on GOFAM. Then the staff can start investigating. Or else it will not be repeated, in which case we can all begin pulling out our hair.

Either way, I think this problem will have to wait until 2012.

Something to look forward to for the new year!

Keep us informed about anything unusual on the Bulldozer. I think that there is a fair amount of interest.

Lawrence

[Dec 30, 2011 3:14:40 AM]

Chris Holvenstot
Cruncher
USA
Joined: Aug 26, 2011
Post Count: 19
Status: Offline

Re: GFAM: Results on AMD-FX Inconclusive?

Lawrence - thank you for the response and the information. You are correct that Bulldozer's 256 bit FPU can run as two discrete 128 bit units. And it appears that updated compiler support is required to use it in the 256 bit mode. Further, AMD's FX supports Fused Multiply / Accumulate (FMA) operations while running in the 128 bit mode.

An FMA op feeds the result of a floating point multiply directly into the accumulators without rounding the intermediate product, which increases both the speed and the precision of the operation. The FMA op seems to be much like what you would see in a GPU.
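A small sketch of the difference, assuming IEEE 754 doubles. Python has no guaranteed hardware FMA here, so the fused result is emulated with exact rational arithmetic and a single final rounding; the helper names and the operand values are chosen purely for illustration.

```python
from fractions import Fraction

def mul_then_add(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is rounded.
    return a * b + c

def fused_mul_add(a, b, c):
    # One rounding: compute a*b + c exactly, then round once to a double.
    # This emulates what an FMA instruction returns; it is not a hardware call.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Operands picked so that the exact product, 1 - 2**-60, rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = mul_then_add(a, b, c)
fused = fused_mul_add(a, b, c)

print(unfused)  # 0.0
print(fused)    # -8.673617379884035e-19, i.e. -2**-60
```

Both answers are "correct" for the instruction sequence that produced them, which is why a binary that uses FMA can disagree in the low-order bits with one that does a separate multiply and add.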

I have yet to determine when the FMA operation is used instead of the "traditional" method of doing a "multiply" followed by an "add".

I have been focused on the FPU because that is one of the big differences between the FX and previous processors.

I guess the next step for me will be to toss together a few "test" routines I can run on the FX, one of my Phenoms, and my Intel-based Mac Pro.
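One possible shape for such a test routine, as a rough sketch: run a fixed sequence of floating point operations and hash the raw bit patterns, so that the output from each machine can be compared as a single string. The function name `fp_fingerprint` and the choice of workload are my own. Because this goes through the Python interpreter, it only exercises whatever instructions that interpreter was compiled to use; a faithful test would be a C program built with the same compiler flags as the science application.

```python
import hashlib
import math
import struct

def fp_fingerprint(steps=100_000):
    """Hash the bit patterns produced by a deterministic floating point workload."""
    digest = hashlib.sha256()
    x = 0.3
    acc = 0.0
    for i in range(1, steps + 1):
        # Logistic map in its chaotic regime: tiny rounding differences grow quickly.
        x = 3.9 * x * (1.0 - x)
        # Running sum mixing a square root and a division.
        acc += math.sqrt(x) / i
        if i % 1000 == 0:
            # Feed the exact 64-bit patterns, not a decimal rendering, into the hash.
            digest.update(struct.pack("<dd", x, acc))
    return digest.hexdigest()

print(fp_fingerprint())
```

If the printed hash matches on all three machines, the basic double precision operations agree bit for bit on this workload; a mismatch says that something differs, but not what.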

It sure is nice being retired and having the time to do this...

[Dec 30, 2011 4:46:06 AM]

KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline


Re: GFAM: Results on AMD-FX Inconclusive?

I have an 8150 arriving on the third or so. I'll be running it with Ubuntu 11.10 and I'll report in what I see.

From my previous experience DSFL and GFAM are VERY unforgiving when it comes to validation. Anyone running an Athlon XP will report the same behavior. It's not that the AMD chips are bad, it's just that the results aren't similar enough to match Intel.

It will be interesting to see how it does with GFAM but I'm expecting the exact results that you're seeing. The techs are aware of the XP issue and have not commented on it yet.

----------------------------------------

Distributed computing volunteer since September 27, 2000

[Dec 30, 2011 6:00:35 AM]

kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline

Re: GFAM: Results on AMD-FX Inconclusive?

I recall that during the DSFL and GFAM early betas, WUs for both Phenoms and Athlons were going inconclusive to invalid. Then the techs adjusted something in the validation, and ever since the next round of betas, my Phenom II (also running Ubuntu 10.04.3) has had no issue with DSFL and GFAM.

I'll be very interested to see how this unfolds with the Bulldozer chips.

[Dec 30, 2011 2:02:43 PM]

davyboy
Cruncher
Joined: Apr 28, 2007
Post Count: 9
Status: Offline

Re: GFAM: Results on AMD-FX Inconclusive?

Having similar issues with GFAM and DSFL on AMD 6274 CPUs.

[Dec 30, 2011 3:48:37 PM]

Chris Holvenstot
Cruncher
USA
Joined: Aug 26, 2011
Post Count: 19
Status: Offline

Re: GFAM: Results on AMD-FX Inconclusive?

@davyboy: Thank you - I was starting to question both my sanity and technical savvy. Now only my sanity is in question.

For those not familiar with AMD's Opteron 6274, it is a fairly new offering from AMD for the server world. It implements the same Bulldozer microarchitecture used in AMD's FX chips.

I guess it is time to turn this one over to the admins/devs - however, if there is anything I can do to help resolve this issue, just ask. I "opened up" my bulldozer based system to GFAM for a short while this morning and by this evening there should be several "fresh" examples of this issue in case anyone wants to examine them in the light of day.

Once again DavyBoy - thanks.

[Jan 1, 2012 6:41:25 PM]

Chris Holvenstot
Cruncher
USA
Joined: Aug 26, 2011
Post Count: 19
Status: Offline

Re: GFAM: Results on AMD-FX Inconclusive?

Bump

Is anyone at the project looking at this, and is there anything I can do to support those efforts?

[Jan 5, 2012 3:40:12 AM]
