World Community Grid Forums
Category: Completed Research; Forum: GO Fight Against Malaria; Thread: Results on AMD-FX Inconclusive?
Thread Status: Active Total posts in this thread: 62
Chris Holvenstot
Cruncher USA Joined: Aug 26, 2011 Post Count: 19
A few days ago I brought a new system into the fold. It is based on the new AMD FX/Bulldozer 8120 CPU. I currently have it loaded with Linux running the 2.6.32 kernel.
It has been in production with WCG for the past day and a half and is selecting tasks from all of the available projects. However, to date, all five GFAM tasks completed on this system have been placed in the "inconclusive" hopper. Tasks from other BOINC and WCG projects complete and validate without error, and my other systems, both Intel and AMD Phenom II-based, do not seem to be having validation issues with these tasks.

Has anyone else had problems, or heard of problems, with GFAM and the FX/Bulldozer CPUs? (Other than performance; I think we all understand that this series of CPUs is not a big step up from the Phenom family in that regard.)

One big difference between the FX/Bulldozer family and my Intel and Phenom systems is that the FX/Bulldozer has a 256-bit floating point unit while the older systems have a 128-bit unit. Is it possible that we are looking at some sort of cumulative error caused by the difference in floating point precision?

At this point I have aborted GFAM tasks on this system, while allowing them to run on my other systems. Any information would be appreciated.
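A minimal Python sketch of how rounding can accumulate: the same ten numbers, added in two different ways, give results that differ in the last bit. This only illustrates rounding-dependent summation in IEEE 754 double precision; it is not the GFAM code and says nothing about what the Bulldozer FPU actually does. `running_sum` is a hypothetical helper name.

```python
import math

def running_sum(values):
    """Left-to-right sum that rounds to double precision after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 10

naive = running_sum(values)  # ten separately rounded additions
exact = math.fsum(values)    # tracks partial sums exactly, rounds once at the end

print(repr(naive))     # 0.9999999999999999
print(repr(exact))     # 1.0
print(naive == exact)  # False
```

A last-bit difference like this is harmless on its own, but in a long iterative calculation such differences can feed into later steps and grow, which is one reason a validator comparing two hosts' output needs some tolerance.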
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Hi,
Took some digging, but at spot 654 is your CPU and 30 others, so maybe one of the others will check in with an answer: http://boincstats.com/stats/host_cpu_stats.ph...eamid=&st=650&or= You're on Linux 10.04 LTS, I gather.

OSes should have no influence on the computational outcome. Oh, are you OCing (overclocking)? That did cause a problem for someone on DSFL, which uses the same science engine, but since it isn't causing problems for you elsewhere, it's less likely.

Whether the FP precision would cause a minor difference is something only the techs can see in the full data output; it really depends on how the tolerances are set. Result logs are always of interest, so you might [if not done already] want to compare yours to your wingman's (click on the WU name on the Result Status page). If I were you, I'd wait for the 3rd copy and see which one goes invalid.

ttyl --//--

edit: Link to a related thread with the 8120 as co-topic: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,32102 [Edit 1 times, last edit by Former Member at Nov 24, 2011 10:27:38 AM]
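To make "it depends on how the tolerances are set" concrete, here is a small sketch. The `results_agree` function and the sample values are hypothetical (this is not the WCG validator, whose rules are not described in this thread): it compares two result lists element by element with `math.isclose`, so the same pair of results passes under a loose relative tolerance and fails under a tight one.

```python
import math

def results_agree(ours, wingman, rel_tol):
    """Hypothetical validator check: every pair of values must match within rel_tol."""
    if len(ours) != len(wingman):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)
               for a, b in zip(ours, wingman))

# Two made-up result vectors that differ only in the trailing digits.
host_a = [-12.345678901234, 3.141592653589]
host_b = [-12.345678901299, 3.141592653589]

print(results_agree(host_a, host_b, rel_tol=1e-9))   # True
print(results_agree(host_a, host_b, rel_tol=1e-14))  # False
```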
Chris Holvenstot
Cruncher USA Joined: Aug 26, 2011 Post Count: 19
@SekeRob - thanks for the reply. I have been playing with this since the last time I posted and am still scratching my head. I am having a little trouble navigating the WCG website infrastructure, so any advice you have on where to find things would be appreciated.

Is there an "easy" way to search out other users who may be running one of these Bulldozer/FX chips? It sure would be nice to get some sort of positive affirmation that other users of this CPU family are able to run GFAM without issue.

Since I last posted I have replaced the memory and underclocked the system (not that the FX needs any help underperforming). I also jumped the kernel up to a 3.2 version, which has some FX optimizations. None of this helped. Each and every GFAM task I run on this system goes inconclusive and then into the invalid bin. I am able to run Rosetta, POEM, and other WCG tasks without issue, at either the original speed or the underclocked speed.

I still suspect a cumulative "rounding error" from using the 256-bit floating point unit vs. the more industry-standard 128-bit unit. I know, I can be pig-headed.

Where can I look to see evidence of what the validators believe was wrong with the output produced by GFAM tasks run on this system? (The WCG website is very complete, but the two things I have yet to find are how to isolate the RAC for a single system, which is useful when optimizing systems, and where to go to look at debug or diagnostic information.) I can read a stack trace and a panic dump. I am also proficient in setting breakpoints, so if there is a way to get the same task to run on multiple systems, I can halt the task and compare results.

Once again, any information you can provide would be appreciated. BTW, I've had GFAM "disabled" in the profile for this system, so there are no current tasks to peer at. However, if you can provide me with a "crowbar" to open the lid and examine the internal workings, I will happily re-enable GFAM to get a few samples. Thanks

[Edit 1 times, last edit by Chris Holvenstot at Dec 29, 2011 7:39:48 PM]
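For comparing the same task's output from two machines, a sketch like the one below finds the first value that differs bit-for-bit. It assumes, hypothetically, that the output is plain text of whitespace-separated numbers (the real GFAM result layout is not described here), and `first_divergence` is an illustrative name. Comparing `float.hex()` strings is exact, so even a one-bit difference, or 0.0 vs -0.0, shows up.

```python
def first_divergence(output_a, output_b):
    """Find the first value where two runs' outputs differ bit-for-bit.

    Assumes (hypothetically) whitespace-separated numbers in plain text.
    Returns (index, hex_a, hex_b), or None when both outputs are identical.
    """
    values_a = [float(tok) for tok in output_a.split()]
    values_b = [float(tok) for tok in output_b.split()]
    for i, (a, b) in enumerate(zip(values_a, values_b)):
        if a.hex() != b.hex():
            return i, a.hex(), b.hex()
    if len(values_a) != len(values_b):
        # One run produced extra values; report the first unmatched position.
        i = min(len(values_a), len(values_b))
        hex_a = values_a[i].hex() if i < len(values_a) else None
        hex_b = values_b[i].hex() if i < len(values_b) else None
        return i, hex_a, hex_b
    return None


# Made-up outputs from two hosts; the second value differs in its last bit.
run_fx = "1.5 0.30000000000000004 2.0"
run_phenom = "1.5 0.3 2.0"
print(first_divergence(run_fx, run_phenom))  # reports index 1
```

Knowing the index of the first divergent value is what makes breakpoint-style comparison practical: everything before it matched exactly, so the cause lies in the computation of that value.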
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Hello Chris Holvenstot,
As I recall, the Bulldozer 256-bit FPU should run as 2 ordinary FPUs. None of our code is using the special instructions to unify it. Whatever the problem is will presumably be repeated by the second person to run a Bulldozer on GOFAM. Then the staff can start investigating. Or else it will not be repeated, in which case we can all begin pulling out our hair. Either way, I think this problem will have to wait until 2012. Something to look forward to for the new year!

Keep us informed about anything unusual on the Bulldozer. I think that there is a fair amount of interest.

Lawrence
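A toy model of the point that a wider FPU is not the same as higher precision: a 256-bit vector register simply holds four 64-bit doubles, and each lane is added with ordinary double-precision rounding. This sketch only models that layout with `struct` packing in Python; it does not execute real AVX instructions, and `pack_256` and `lanewise_add` are hypothetical helpers.

```python
import struct
import sys

LANES = 4  # a 256-bit vector register holds four 64-bit doubles

def pack_256(values):
    """Pack four doubles into 32 bytes (256 bits), a stand-in for one vector register."""
    if len(values) != LANES:
        raise ValueError("expected exactly four doubles")
    return struct.pack("<4d", *values)

def lanewise_add(reg_a, reg_b):
    """Add two packed 'registers' lane by lane; each lane is an ordinary double add."""
    a = struct.unpack("<4d", reg_a)
    b = struct.unpack("<4d", reg_b)
    return pack_256([x + y for x, y in zip(a, b)])

xs = [0.1, 1e16, -2.5, 1.0 / 3.0]
ys = [0.2, 1.0, 2.5, 1.0 / 3.0]

vector_result = list(struct.unpack("<4d", lanewise_add(pack_256(xs), pack_256(ys))))
scalar_result = [x + y for x, y in zip(xs, ys)]

print(len(pack_256(xs)) * 8)           # 256 bits in the "register"
print(sys.float_info.mant_dig)         # 53 significand bits per lane
print(vector_result == scalar_result)  # True: same rounding, lane by lane
```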
Chris Holvenstot
Cruncher USA Joined: Aug 26, 2011 Post Count: 19
Lawrence - thank you for the response and the information. You are correct that Bulldozer's 256-bit FPU can run as two discrete 128-bit units, and it appears that updated compiler support is required to use it in 256-bit mode. Further, AMD's FX supports Fused Multiply-Accumulate (FMA) operations while running in 128-bit mode.

An FMA op feeds the result of a floating point multiply directly into the accumulators without rounding it first, which increases both the speed and the precision of the operation. The FMA op seems to be much like what you would see in a GPU. I have yet to determine when the FMA operation is used instead of the "traditional" method of doing a multiply followed by an add.

I have been focused on the FPU because that is one of the big differences between the FX and previous processors. I guess the next step for me will be to toss together a few "test" routines I can run on the FX, one of my Phenoms, and my Intel-based Mac Pro. It sure is nice being retired and having the time to do this...
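The single-rounding behavior of FMA can be sketched in Python without FMA hardware by forming the product and sum exactly with `fractions.Fraction` and rounding only once at the end. `fma_emulated` and `mul_then_add` are hypothetical helper names (Python 3.13 and later also provide `math.fma`). The inputs are chosen so that rounding the product first hides a tiny term that the fused version keeps; printing results with `float.hex()` makes any bit-level difference between machines easy to spot.

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """a*b + c with a single rounding, as a fused multiply-add would give.

    Fraction arithmetic is exact, so the only rounding happens in float().
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def mul_then_add(a, b, c):
    """The 'traditional' sequence: round the product, then round the sum."""
    product = a * b  # rounded to double here
    return product + c

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Exactly, a*b = 1 - 2**-60, which rounds to 1.0 as a double.
print(mul_then_add(a, b, c))        # 0.0
print(fma_emulated(a, b, c).hex())  # -0x1.0000000000000p-60
```

Either answer is "correct" for its sequence of operations, which is why two CPUs (or two compilers) that choose differently between fused and unfused arithmetic can produce results that are not bit-identical.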
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585
I have an 8150 arriving on the third or so. I'll be running it with Ubuntu 11.10 and I'll report in what I see.
From my previous experience, DSFL and GFAM are VERY unforgiving when it comes to validation. Anyone running an Athlon XP will report the same behavior. It's not that the AMD chips are bad; it's just that the results aren't similar enough to match Intel's. It will be interesting to see how it does with GFAM, but I'm expecting exactly the results that you're seeing. The techs are aware of the XP issue and have not commented on it yet.

Distributed computing volunteer since September 27, 2000
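One general way to put a number on "similar enough" is the distance between two results in units in the last place (ULPs): 0 means bit-identical, 1 means adjacent doubles. This is a common technique, not a description of how the GFAM or DSFL validators actually compare results; `ordered_bits` and `ulp_distance` are hypothetical names.

```python
import math
import struct

def ordered_bits(x):
    """Map a double to an integer so that neighboring doubles map to neighboring integers."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set; mirror them below zero so the
    # mapping is monotonic (both 0.0 and -0.0 map to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a, b):
    """How many representable doubles apart a and b are (0 means identical values)."""
    return abs(ordered_bits(a) - ordered_bits(b))

print(ulp_distance(0.1 + 0.2, 0.3))                 # 1
print(ulp_distance(1.0, math.nextafter(1.0, 2.0)))  # 1
print(ulp_distance(2.5, 2.5))                       # 0
```

A validator built this way could accept results within a few ULPs of each other; how large a threshold is safe depends on how much the science code amplifies such differences.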
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027
I recall that during the DSFL and GFAM early betas, WUs for both Phenoms and Athlons were going inconclusive to invalid. Then the techs adjusted something in the validation, and ever since the next round of betas, my Phenom II (also running Ubuntu 10.04.3) has had no issue with DSFL and GFAM.
I'll be very interested to see how this unfolds with the Bulldozer chips.
davyboy
Cruncher Joined: Apr 28, 2007 Post Count: 9
Having similar issues with GFAM and DSFL on AMD 6274 CPUs.
Chris Holvenstot
Cruncher USA Joined: Aug 26, 2011 Post Count: 19
@davyboy: Thank you - I was starting to question both my sanity and technical savvy. Now only my sanity is in question.
For those not familiar with AMD's Opteron 6274, it is a fairly new offering from AMD for the server world. It implements the same Bulldozer microarchitecture used in AMD's FX chips. I guess it is time to turn this one over to the admins/devs; however, if there is anything I can do to help resolve this issue, just ask. I "opened up" my Bulldozer-based system to GFAM for a short while this morning, and by this evening there should be several "fresh" examples of this issue in case anyone wants to examine them in the light of day. Once again, DavyBoy, thanks.
Chris Holvenstot
Cruncher USA Joined: Aug 26, 2011 Post Count: 19
Bump
Is anyone at the project looking at this, and is there anything I can do to support those efforts?