World Community Grid Forums
Thread Status: Active
Total posts in this thread: 15
Sabrina Tarson
Advanced Cruncher (United States). Joined: Jun 27, 2012. Post Count: 149
Hello all. This summer I upgraded my main system to a Ryzen Threadripper 2950X and have been happily crunching away on all projects on World Community Grid.
However, I started noticing a trend in the number of Invalid results the Threadripper machine returns. It's not a precise ratio, but roughly 1 in every 100 results turns out to be Invalid. I have not overclocked the system and have not touched any of the Precision Boost settings. This is the only computer in my grid with this issue, and there seems to be no pattern to when a result goes Invalid. No other project on World Community Grid has this issue, nor does any other BOINC project I've run outside World Community Grid; it's only Mapping Cancer Markers. I can provide more information if needed.

Computer Settings:
OS: Windows 10 Professional 1903 (Build 18362.239)
CPU: AMD Ryzen Threadripper 2950X @ Stock Settings

One Invalid Result Log:
Sgt.Joe
Ace Cruncher (USA). Joined: Jul 4, 2006. Post Count: 7846
How does a valid result differ from an invalid result? If you post both an invalid and a valid one, we could compare.
Cheers
Sgt. Joe
*Minnesota Crunchers*
Sabrina Tarson
My understanding is that an Invalid result occurs when the results of two computations don't match. In this case, I don't understand why the same machine would compute some workunits correctly and not others.
Invalid Result:
Valid Result:
Sgt.Joe
The only lines that really look different are:
SearchAlgorithmNumberToCreate = 576
SearchAlgorithmNumberToCreate = 76

The first is from the invalid result and the second from the valid one. Not having looked at any other MCM work units in this much depth, I wonder if the value in the first one is too big. Did the wingman complete the invalid unit to a valid conclusion? If the wingman is completing all or most of your invalids to a valid state, then the problem is with your machine. Just what it might be, I am clueless.

Cheers
Sgt. Joe
*Minnesota Crunchers*
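For anyone who wants to repeat the side-by-side comparison above on their own result logs, here is a minimal sketch using Python's standard-library `difflib`. The two short lists are hypothetical stand-ins: only the `SearchAlgorithmNumberToCreate` lines come from this thread, and the placeholder line represents everything the two logs have in common. The function name `differing_lines` is made up for illustration.

```python
import difflib

# Hypothetical log excerpts. Only the SearchAlgorithmNumberToCreate lines are
# from the thread; the placeholder stands in for the lines both logs share.
# With real logs you could load each one via Path(...).read_text().splitlines().
invalid_log = [
    "<lines shared by both logs>",
    "SearchAlgorithmNumberToCreate = 576",
]
valid_log = [
    "<lines shared by both logs>",
    "SearchAlgorithmNumberToCreate = 76",
]


def differing_lines(a, b):
    """Return (lines_from_a, lines_from_b) pairs for every non-matching region."""
    matcher = difflib.SequenceMatcher(a=a, b=b)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            pairs.append((a[i1:i2], b[j1:j2]))
    return pairs


for left, right in differing_lines(invalid_log, valid_log):
    print("invalid:", left, "| valid:", right)
```

Working on whole lines rather than characters keeps the output short, which suits logs where only a handful of parameter lines are expected to differ.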
Sabrina Tarson
For almost all of them, when the workunit is sent to a third person, it comes back valid (I believe maybe 1 or 2 of them stayed invalid, but it's still a small sample size).
What's also weird is that sometimes there are a couple of days with no invalid results, followed by a few days where 6 or 7 of the 450 results done that day were invalid.
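The pattern described in this thread (two results are compared, a mismatch sends the workunit to a third host, and the odd one out is marked Invalid) can be modelled with a small quorum check. This is a toy sketch, not the actual MCM validator: it assumes a quorum of two and exact-match comparison of a result fingerprint, whereas real BOINC validators are project-specific and may compare results with tolerances. The function `validate`, the host names and the digest strings are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def validate(results: dict[str, str], min_quorum: int = 2) -> tuple[str | None, list[str]]:
    """Toy quorum check over {host_name: result_fingerprint}.

    Returns (canonical, invalid_hosts). canonical is None when no single
    fingerprint has min_quorum matching copies yet, i.e. the workunit needs
    another replica before anything can be declared valid or invalid.
    """
    if not results:
        return None, []
    # Most common fingerprint and how many hosts returned it.
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes < min_quorum:
        return None, []  # disagreement: send the workunit to another host
    # Every host whose result disagrees with the agreed value is marked Invalid.
    invalid = [host for host, r in results.items() if r != value]
    return value, invalid


# Round 1: the Threadripper and its wingman disagree, so there is no quorum yet.
round1 = {"threadripper": "digest-A", "wingman": "digest-B"}
print(validate(round1))  # (None, [])

# Round 2: a third host agrees with the wingman; the odd one out is Invalid.
round2 = {**round1, "third_host": "digest-B"}
print(validate(round2))  # ('digest-B', ['threadripper'])
```

Under this model, a machine that occasionally produces a wrong answer shows up exactly as described above: most of its results validate, and the few that don't are overruled once a third host agrees with the wingman.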
Sgt.Joe
Transient heat, memory, or voltage variations, perhaps?

Cheers
Sgt. Joe
*Minnesota Crunchers*
Sabrina Tarson
Until I can get it sorted out, or until it turns out something else is going on, I've decided not to run Mapping Cancer Markers on that machine, as I don't like the idea of spending hours on a workunit only for it to come out invalid.
It's very weird that it only affects some workunits and not others, and only Mapping Cancer Markers. Regardless, there are other machines in my grid that can crunch the project and have done so without issue in the past.
Sgt.Joe
Just in the last couple of days, I have been seeing re-sends of MCM units in a greater proportion than before. I normally see 1 or 2 a day, but now I'm seeing perhaps 10 to 12. The majority of them are "no reply." Just an observation.
Cheers
Sgt. Joe
*Minnesota Crunchers*
ThreadRipper
Veteran Cruncher (Sweden). Joined: Apr 26, 2007. Post Count: 1324
I have seen the problem you describe on my ThreadRipper 2990WX as well (before I lowered the RAM frequency).
What is your RAM running at? Is the XMP profile enabled in the BIOS? In my case, XMP at 3466MHz did not work. Most probably my motherboard did not like the RAM kit I have (it's not on the QVL): after I upgraded from a ThreadRipper 1950X to the 2990WX, the RAM was not stable above 2933MHz. However, I was able to lower the latencies from 16-18-18-36-1T to 14-13-13-28-1T, and I no longer get any invalid WUs. So, since your CPU is at stock, I would check the RAM first. Just dial in a speed one step lower than the one you are running now and see whether the frequency of Invalid results drops or disappears altogether.

Join The International Team: https://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=CK9RP1BKX1
AMD TR2990WX @ PBO, 64GB Quad 3200MHz 14-17-17-17-1T, RX6900XT @ Stock
AMD 3800X @ PBO
AMD 2700X @ 4GHz
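Dropping one speed step costs less than it sounds when the timings can be tightened at the same time, because what matters for latency is cycles multiplied by clock period. DDR memory transfers twice per clock, so the "MHz" figures quoted above are effectively transfer rates (MT/s), and CAS latency in nanoseconds is CL × 2000 / rate. The sketch below applies that formula to the settings mentioned in this post; note the assumption that the 16-18-18-36 timings were also run at 2933, since the post does not say which frequency they were paired with. The function name `cas_latency_ns` is hypothetical.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """First-word CAS latency in nanoseconds for DDR memory.

    DDR transfers twice per clock, so the I/O clock is data_rate / 2 MHz and
    one clock period is 2000 / data_rate nanoseconds.
    """
    return cl_cycles * 2000.0 / data_rate_mts


# Settings from the post. Pairing CL16 with 2933 is an assumption; the thread
# only says 16-18-18-36 was tightened to 14-13-13-28.
configs = [
    ("XMP 3466, CL16 (unstable)", 16, 3466),
    ("2933, CL16 (assumed)", 16, 2933),
    ("2933, CL14 (stable)", 14, 2933),
]

for label, cl, rate in configs:
    print(f"{label}: {cas_latency_ns(cl, rate):.2f} ns")
```

By this estimate CL14 at 2933 lands at roughly 9.5 ns, close to the roughly 9.2 ns of the unstable CL16 at 3466, which is consistent with the advice to trade a little frequency for stability. CAS latency is only one of several timings, so this is a rough comparison rather than a full performance model.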
Sabrina Tarson
This is a late bump, but I wanted to post that I figured out what the problem was.
Despite the RAM kit being marketed as 2666MHz, the system became more unstable as time went on after my last post. Since then, I had gone off experimenting with other projects outside World Community Grid. During my adventures, I crunched a couple of numbers for GIMPS, and using Prime95 found that the system was unstable and would error out on results. Sure enough, when I brought the RAM down to 2400MHz, the system passed Prime95. So I must have gotten an incorrectly binned RAM kit. Anyway, since coming back to World Community Grid a couple of days ago, I have yet to return an Invalid MCM workunit. So, mystery solved. Thanks to those above who reached out.