World Community Grid Forums
Category: Completed Research | Forum: Help Cure Muscular Dystrophy - Phase 2 Forum | Thread: A different kind of monster WU
Thread Status: Active | Total posts in this thread: 9

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
We've all seen the monster WUs which take many hours over a single position, but I've come across a different kind of monster. It is checkpointing regularly, but is just huge. Admittedly, this is on a slow machine by current standards, but still it's looking massive.
CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_1 checkpoint CPU time: 138049.800000 current CPU time: 140425.600000 fraction done: 0.400359
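The status line above carries enough information to project the task's total run time. A minimal sketch, assuming fraction done grows linearly with CPU time (BOINC's own progress figure need not); the function name `project_runtime` is illustrative, not part of BOINC.

```python
def project_runtime(current_cpu_s: float, fraction_done: float) -> tuple[float, float]:
    """Return (projected total hours, remaining hours).

    Assumes progress is linear in CPU time, which is only a rough guide.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_s = current_cpu_s / fraction_done
    return total_s / 3600.0, (total_s - current_cpu_s) / 3600.0


# Values copied from the BOINC status line in the post above.
total_h, remaining_h = project_runtime(140425.6, 0.400359)
print(f"projected total: {total_h:.1f} h, remaining: {remaining_h:.1f} h")
```

With these figures the projection is roughly 97 CPU hours in total, about 58 of them still to go.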

Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
Is it 6.13 running a new or an old work unit? The new ones were to stop once they complete the position that passes the 4-hour mark. For now, my best guesstimate is that this job has X positions, including a monster, and is now working them all off.
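One reading of the rule described above, sketched with made-up position durations: a new-style task runs its positions in order and stops after completing the position during which cumulative run time passes 4 hours. The function name and inputs are hypothetical; this is not the HCMD2 code.

```python
CUTOFF_HOURS = 4.0  # the "4-hour mark" mentioned above


def positions_completed(durations_h: list[float], cutoff_h: float = CUTOFF_HOURS) -> int:
    """Number of positions a task finishes before stopping.

    The position that pushes cumulative run time past the cut-off is still
    completed; later positions are not run in this task.
    """
    elapsed = 0.0
    for count, duration in enumerate(durations_h, start=1):
        elapsed += duration
        if elapsed >= cutoff_h:
            return count
    return len(durations_h)  # every position finished before the cut-off


# Illustrative durations in hours; the third position is a "monster".
print(positions_completed([1.0, 2.5, 3.0, 0.5]))  # 3
print(positions_completed([0.5, 0.5, 0.5, 0.5]))  # 4
```

Under this rule a single monster position can still run long, but a task can no longer keep going through hundreds of positions after the cut-off.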
Meantime, has your rDCF gone haywire too? Fraction done reads as 40.04% (edit: an ?)
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Jun 1, 2009 3:42:42 PM]

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Kremmen, you were told that some tasks were expected to be large.
Waste your time reporting them if you want, but I feel these reports are just more clutter.

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Quote: "Kremmen, you were told that some tasks were expected to be large."
As far as I could tell, some tasks were expected to be large because they would have some unexpectedly and unpredictably long positions to calculate out of the hundred or so in the WU. I had not seen anything indicating that some WUs would simply contain vastly more positions than most. This WU is a monster because it is trying to calculate far more positions than any other WU I've seen: something like 600. (It was a little over 1/3 of the way through when it had 200 entries in wcg_checkpoint_00.ckp.)
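The size estimate above is a simple proportion: entries checkpointed so far divided by the fraction done. A small sketch, assuming progress is proportional to positions completed; only the entry count from wcg_checkpoint_00.ckp is used (its file format is not assumed), and the function name is made up for illustration.

```python
def estimate_total_positions(positions_done: int, fraction_done: float) -> int:
    """Scale the positions checkpointed so far up to the whole work unit."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return round(positions_done / fraction_done)


print(estimate_total_positions(200, 1 / 3))  # 600
print(estimate_total_positions(200, 0.34))   # 588, i.e. "a little over 1/3"
```

A fraction slightly above 1/3 gives slightly fewer than 600 positions, consistent with "something like 600".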

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Quote: "Is it 6.13 running a new or an old work unit? The new ones were to stop once they complete the position that passes the 4-hour mark. For now, my best guesstimate is that this job has X positions, including a monster, and is now working them all off."
Clearly, it's an old one. As I said, it does not have X positions including one monster; it appears simply to have a huge number of positions (~600). My wingman has finished, so you can see how big this is from his result:
CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_0 -- 613 Pending Validation 30/05/09 02:05:33 1/06/09 19:06:40 57.75 1,101.8 / 0.

Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
Partial quote from knreed, with emphasis: "Our algorithm was based on the assumption that an average position would take a few seconds to compute"
Let's see, "checkpointing regularly":
03/06/2009 06:54:24 World Community Grid [checkpoint_debug] result CMD2_0003-ENOAA.clustersOccur-ITCHA.clustersOccur_7_26124_26731_0
That result has 607 positions and completed in 1.89 hours, about 0.186 minutes (11 seconds) per position. Your 600-position task at 40% after 39 hours must then be checkpointing about every 9.75 minutes, or 54 times slower than mine, which is slower than "a few seconds". Time to consider detaching that device from HCMD2 if this bothers you too much, certainly given your reported near-100% fail rate.
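The per-position figures above come from dividing run time by positions completed. A quick check, assuming 40% of 600 positions (240) were done after 39 CPU hours; the helper name is illustrative. With unrounded inputs the ratio comes out near 52x; the 54x in the post is close to what you get using about 0.18 minutes per position for the fast machine.

```python
def minutes_per_position(cpu_hours: float, positions: float) -> float:
    """Average CPU minutes spent per position."""
    return cpu_hours * 60.0 / positions


fast = minutes_per_position(1.89, 607)         # 607 positions in 1.89 h
slow = minutes_per_position(39.0, 0.40 * 600)  # ~240 positions in 39 h

print(f"fast: {fast:.3f} min ({fast * 60:.0f} s) per position")
print(f"slow: {slow:.2f} min per position")
print(f"slow/fast: {slow / fast:.0f}x")
```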
[Edit 1 times, last edit by Sekerob at Jun 3, 2009 5:55:39 AM]

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Quote: "Your 600-position task at 40% after 39 hours must then be checkpointing about every 9.75 minutes, or 54 times slower than mine, which is slower than "a few seconds"."
Indeed. That's why I reported it. Clearly, some WUs are totally missing the estimate of a few seconds per position, and not only because they have a few monster positions that take hours to compute. My wingman's 57.75 hours indicates checkpointing about every 6 minutes, and his credit claim of 1,101.8 suggests that it's a well-above-average machine. (The HCMD2 average of 100.42 points per hour of run time equals 14.34 BOINC points per hour, since WCG points are 7 times BOINC credit; that is about 828 BOINC points for 57.75 hours.) It's not "bothering" me that this WU exists. I was just pointing out that there are WUs out there which are huge and don't fit the patterns previously described.
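The credit reasoning above can be checked in a few lines. This sketch uses WCG's 7:1 ratio of WCG points to BOINC credit, the 100.42 points/hour HCMD2 average and the ~600-position size quoted in the thread; the variable names are illustrative. About 14.35 credits per hour over the wingman's 57.75 hours gives roughly 828 credits (over a full 60 hours it would be about 861), against a claim of 1,101.8.

```python
WCG_POINTS_PER_BOINC_CREDIT = 7.0  # WCG reports points as 7x BOINC credit

avg_points_per_hour = 100.42   # HCMD2 average quoted in the post
wingman_cpu_hours = 57.75
wingman_claim = 1101.8
positions = 600                # approximate size of this work unit

credit_per_hour = avg_points_per_hour / WCG_POINTS_PER_BOINC_CREDIT
expected_claim = credit_per_hour * wingman_cpu_hours
minutes_per_position = wingman_cpu_hours * 60.0 / positions

print(f"average BOINC credit per hour: {credit_per_hour:.2f}")
print(f"expected claim for {wingman_cpu_hours} h: {expected_claim:.0f}")
print(f"actual claim / expected: {wingman_claim / expected_claim:.2f}")
print(f"minutes per position: {minutes_per_position:.1f}")
```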

Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
Being 54 times slower than an average present-day Windows machine is something for support to reflect on, given that 100x was not considered acceptable. I'll watch what shakes out.

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
Okay, maybe my wingman's credit claim was too high, but it was still a very long WU. Amazingly, it actually validated too!!
CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_0 -- 613 Valid 5/30/09 02:05:33 6/1/09 19:06:40 57.75 1,101.8 / 1,101.8
CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_1 -- 613 Valid 5/30/09 02:05:15 6/5/09 03:42:25 89.65 387.5 / 1,101.8