World Community Grid Forums
Category: Active Research | Forum: Smash Childhood Cancer | Thread: Invalid results on Linux / AMD
Thread Status: Active Total posts in this thread: 10
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671
After a couple of weeks or months without significant trouble, one of my hosts has produced over 25 invalid results, mostly from batch SCC1_0000124_Lin-CSD-A (only a few invalid WUs from batches 118, 120, 123 and 125).
As usual with Vina-based projects, the affected host has an AMD Athlon II x4 CPU. I assume it is probably an affinity problem with my wingmen.
Cheers, Yves

Eric_Kaiser
Veteran Cruncher Germany (Hessen) Joined: May 7, 2013 Post Count: 1047
I have had two AMD Kabini systems running SCC on Linux since the start of this project. No invalids on my side.

KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671
Hi Eric,
It can go well for weeks (sometimes months), and then suddenly there are a lot of invalid WUs at result validation without any change on the host. It typically occurs with Vina-based projects.
Cheers, Yves

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
If a system runs BOINC, it's not a mission-critical system that can't be rebooted, and I say that with a period. At the very least I'd cycle the client from the command line when this starts happening (on a Debian/Ubuntu-style install, something like sudo service boinc-client restart).

Eric_Kaiser
Veteran Cruncher Germany (Hessen) Joined: May 7, 2013 Post Count: 1047
Yves, I have never seen this behaviour on my AMDs.
I have had them running 24/7 for three years now on FAH, OET, ZIKA and SCC. I check all my computers for invalids a couple of times a week. Except for the SCC for Android app, I have a very, very low rate of invalids (close to zero).

AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 366
I have an AMD Athlon II x4 640 running VINA work perfectly fine. No invalids here.
----------------------------------------
Ubuntu Server 14.04 LTS 64-bit Linux 4.4.0-53-generic 4GB RAM
[Edit 2 times, last edit by AgrFan at Mar 7, 2017 12:58:12 PM]

flynryan
Senior Cruncher United States Joined: Aug 15, 2006 Post Count: 235
No invalids or errors on my 56 AMD/Linux cores.

KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671
I run two AMD Athlon II x4 (640) systems with Linux Mint 17.2 (still kernel 3.16). However, the two CPUs were not purchased at the same time, so I suspect a CPU mask difference.
This specific host is the only one that experiences such waves of invalid results with Vina from time to time, and it has done so ever since Vina was first introduced for the science projects. Results become invalid during validation; there is no error during computation. After a couple of hours or days it is quiet again. Even during the periods with invalid results, there are many valid results as well. That is why I suspect an "affinity" problem between wingmen. Since the reason for the "invalidity" is not reported, I can only make assumptions. If the tech team could e-mail me the reason for the failed validation, it might help to understand the cause(s) of this recurring problem.
Cheers, Yves

Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7581
Perhaps you have already thought of this, but could heat be a transient problem with the one system? The other thought that comes to mind is maybe a motherboard problem with a weak capacitor. Has that system ever been subject to a transient voltage spike? Other than these, I would have no clue.
Cheers
Sgt. Joe
*Minnesota Crunchers*

KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671
Hi Sgt. Joe,
I did consider this point as well. Both Athlon II x4 systems are in my office (with clean power), side by side. Since the host also generates valid results and "recovers" for weeks or months without any action on my part, I do not think that this could be the reason. Likewise, I do not think that a RAM defect could cause this particular behaviour. My assumption is that there is a CPU mask difference. However, as mentioned, without knowing the reason for the failed validation, I cannot solve the problem.
Cheers, Yves
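
One way to look into the CPU mask hypothesis raised in this thread would be to compare what Linux reports in /proc/cpuinfo on the two Athlon II hosts. The following is only a minimal sketch, not something any poster here actually ran: the choice of fields (stepping, microcode, flags and so on) and the function names parse_cpuinfo and diff_cpuinfo are assumptions made for illustration. A difference in these fields would not by itself explain the invalid results.

```python
import sys

# Fields assumed to be relevant for spotting a revision difference
# between two otherwise identical CPUs (hypothetical selection).
FIELDS = ("vendor_id", "cpu family", "model", "model name",
          "stepping", "microcode", "flags")


def parse_cpuinfo(text):
    """Return the selected fields of the first processor block
    found in the contents of /proc/cpuinfo."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key in FIELDS:
            info[key] = value.strip()
    return info


def diff_cpuinfo(a, b):
    """Return {field: (value_a, value_b)} for every field that differs.
    For 'flags' the tuple holds (flags only in a, flags only in b)."""
    diffs = {}
    for key in FIELDS:
        va, vb = a.get(key), b.get(key)
        if key == "flags":
            sa, sb = set((va or "").split()), set((vb or "").split())
            if sa != sb:
                diffs[key] = (sorted(sa - sb), sorted(sb - sa))
        elif va != vb:
            diffs[key] = (va, vb)
    return diffs


if __name__ == "__main__":
    # Expects two files, each a saved copy of /proc/cpuinfo from one host.
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        result = diff_cpuinfo(parse_cpuinfo(fa.read()),
                              parse_cpuinfo(fb.read()))
    for field, (left, right) in result.items():
        print(f"{field}: {left!r} vs {right!r}")
```

To use it, one would save the output of cat /proc/cpuinfo on each host to a text file and pass both files to the script (the script and file names are up to the user). An empty output means the selected fields are identical on both machines.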