Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
A different kind of monster WU

We've all seen the monster WUs which take many hours over a single position, but I've come across a different kind of monster. It is checkpointing regularly, but is just huge. Admittedly, this is on a slow machine by current standards, but still it's looking massive.

CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_1

checkpoint CPU time: 138049.800000
current CPU time: 140425.600000
fraction done: 0.400359
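
For scale, here is a rough sketch of what those numbers imply, assuming the CPU times are in seconds and that the remaining work costs about the same per unit of progress as the work already done (a simple linear extrapolation, not anything the client itself reports):

```python
# Values copied from the status lines above (assumed to be seconds of CPU time).
current_cpu_s = 140425.6
fraction_done = 0.400359

hours_so_far = current_cpu_s / 3600.0

# Linear extrapolation: assumes the rest of the task progresses at the
# same rate as the part already completed (an assumption, not a given).
projected_total_h = hours_so_far / fraction_done
remaining_h = projected_total_h - hours_so_far

print(f"CPU time so far:   {hours_so_far:.1f} h")
print(f"projected total:   {projected_total_h:.1f} h")
print(f"projected to go:   {remaining_h:.1f} h")
```

On these figures that is roughly 39 hours done and a projected total in the high 90s of hours.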
[Jun 1, 2009 3:34:28 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: A different kind of monster WU

Is it 6.13 running a new or an old work unit? The new ones were meant to complete the position that passes the 4-hour mark. For now, my best guesstimate is that this job has X positions, including a monster, and is now working them all off.

Meantime, has your rDCF gone haywire too? Fraction done reads as 40.04%

edit: an ?
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Jun 1, 2009 3:42:42 PM]
[Jun 1, 2009 3:40:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A different kind of monster WU

Kremmen, you were told that some tasks were expected to be large.

Waste your time reporting them if you want, but I feel these reports are just more clutter.
[Jun 1, 2009 3:45:34 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A different kind of monster WU

"Kremmen, you were told that some tasks were expected to be large."

As far as I could tell, some tasks were expected to be large because they would have some unexpected and unpredictably long positions to calculate out of the hundred or so in the WU. I'd not seen anything that indicated that some WUs would simply have vastly more positions in them than most.

This WU is a monster because it's trying to calculate a massive number of positions, more than any other WU I've seen. Something like 600. (It was a little over 1/3 of the way through when it had 200 entries in wcg_checkpoint_00.ckp.)
[Jun 2, 2009 2:14:52 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A different kind of monster WU

"Is it 6.13 running a new or an old work unit? The new ones were meant to complete the position that passes the 4-hour mark. For now, my best guesstimate is that this job has X positions, including a monster, and is now working them all off."

Clearly, it's an old one. As I said, it does not have X positions including a monster one. It appears to simply have a huge number of positions (~600). My wingman has finished, so you can see how big this is from his results:

CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_0 -- 613 Pending Validation 30/05/09 02:05:33 1/06/09 19:06:40 57.75 1,101.8 / 0.
[Jun 3, 2009 4:52:27 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: A different kind of monster WU

Partial quote from knreed, with emphasis: "Our algorithm was based on the assumption that an average position would take a few seconds to compute"

Let's see, "checkpointing regularly"

03/06/2009 06:54:24 World Community Grid [checkpoint_debug] result CMD2_0003-ENOAA.clustersOccur-ITCHA.clustersOccur_7_26124_26731_0

It has 607 positions and completed in 1.89 hours, about 0.186 minutes (11 seconds) per position. Your 600-position task at 40% after 39 hours must then be checkpointing about every 9.75 minutes, or 54 times slower than mine, which is much slower than "a few seconds". Time to consider detaching that device from HCMD2 if this bothers you too much, certainly at your reported near-100% fail rate.
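
The per-position arithmetic can be sketched as below. It assumes one checkpoint per position and takes the roughly 600 positions, 40% done and 39 hours at face value; the function name is just illustrative:

```python
def minutes_per_position(hours: float, positions: int) -> float:
    """Average wall of CPU time per position, in minutes."""
    return hours * 60.0 / positions

# Fast task: 607 positions finished in 1.89 hours.
fast_min = minutes_per_position(1.89, 607)

# Slow task: about 40% of an assumed ~600 positions done after 39 hours.
positions_done = round(0.40 * 600)          # 240 positions
slow_min = minutes_per_position(39.0, positions_done)

ratio = slow_min / fast_min

print(f"fast task: {fast_min:.3f} min ({fast_min * 60:.1f} s) per position")
print(f"slow task: {slow_min:.2f} min per position")
print(f"slowdown:  about {ratio:.0f}x")
```

With unrounded inputs the slowdown comes out nearer 52x than 54x, which is the same ballpark and doesn't change the conclusion.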
[Edit 1 times, last edit by Sekerob at Jun 3, 2009 5:55:39 AM]
[Jun 3, 2009 5:42:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A different kind of monster WU

"Your 600-position task at 40% after 39 hours must then be checkpointing about every 9.75 minutes, or 54 times slower than mine, which is much slower than 'a few seconds'."

Indeed. That's why I reported it. Clearly, some WUs are totally missing the estimate of a few seconds per position and not only because they have a few monster positions that take hours to compute.

My wingman's 57.75 hours indicates checkpointing about every 6 minutes, and the credit claim of 1,101.8 suggests that it's a well above average machine. (HCMD2 average points per hour of runtime = 100.42 WCG points = 14.34 BOINC points/hour, or about 828 BOINC points over his 57.75 hours.)
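
Those figures can be reproduced roughly as follows. This is only a sketch: it assumes the usual conversion of 7 WCG points to 1 BOINC credit, about 600 positions in the WU, and that the project-wide average rate is a fair yardstick for a single machine:

```python
WCG_POINTS_PER_BOINC_CREDIT = 7.0   # assumed conversion factor

avg_wcg_points_per_hour = 100.42    # HCMD2 average quoted above
wingman_hours = 57.75
wingman_claim = 1101.8              # claimed credit from the result listing
assumed_positions = 600             # estimate, not an exact count

boinc_per_hour = avg_wcg_points_per_hour / WCG_POINTS_PER_BOINC_CREDIT
expected_claim = boinc_per_hour * wingman_hours
claim_ratio = wingman_claim / expected_claim
minutes_per_checkpoint = wingman_hours * 60.0 / assumed_positions

print(f"average rate:      {boinc_per_hour:.2f} BOINC points/hour")
print(f"expected at avg:   {expected_claim:.0f} BOINC points")
print(f"claim vs expected: {claim_ratio:.2f}x")
print(f"checkpoint every:  {minutes_per_checkpoint:.1f} min")
```

On these assumptions the wingman's claim is about a third higher than an average machine would have claimed for the same runtime.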

It's not "bothering" me that this WU exists. I was just pointing out that there are WUs out there which are huge and don't fit into the patterns previously described.
[Jun 3, 2009 6:39:19 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: A different kind of monster WU

54 times slower than an average present-day Windows machine is a point of reflection for support, given that 100x was not considered acceptable. I'll watch what shakes out.
[Jun 3, 2009 7:15:41 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A different kind of monster WU

Okay, maybe my wingman's credit claim was too high, but still it was a very long WU. Amazingly, it actually validated too!!

CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_0 -- 613 Valid 5/30/09 02:05:33 6/1/09 19:06:40 57.75 1,101.8 / 1,101.8
CMD2_0002-RADIA.clustersOccur-TCTPA.clustersOccur_62_1 -- 613 Valid 5/30/09 02:05:15 6/5/09 03:42:25 89.65 387.5 / 1,101.8
[Jun 5, 2009 5:05:45 AM]