Thread Status: Active
Total posts in this thread: 98
This topic has been viewed 8498 times and has 97 replies
TXR13
Cruncher
Canada
Joined: Dec 5, 2005
Post Count: 36
Status: Offline
Monster WU on the loose...

So, now that CMD2 6.13 is released, I'm watching my P3 server chew into some HCMD2 work. But it looks like I've got a real monster, named CMD2_ 0002-RADIA.clustersOccur-RADIA.clustersOccur_ 1984_ 0.

Currently, after 20:35:12 of crunching time, the unit is 2.516% complete. Correct me if I'm wrong, but I thought the techs had adjusted the units so that they would exit after computing the current position once a certain time had elapsed. What was that time, exactly? I remember something like 8-10 hours, though I'm probably wrong...

The last checkpoint was taken four hours ago, when it would have been only 16.5 hours into crunching. If the time limit had already passed, shouldn't it have ended then? Or am I missing something here?
----------------------------------------

[May 29, 2009 2:25:04 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Monster WU on the loose...

I have one of these bad boyz and mine has been crunching for 1.5 hours at 0%.

Edit: 2.25 hours, still 0%
----------------------------------------
[Edit 1 times, last edit by Former Member at May 29, 2009 3:12:46 AM]
[May 29, 2009 2:32:41 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Monster WU on the loose...

TXR13,

When did the 2.516% [checkpoint] occur exactly? If your <checkpoint_debug> flag is on in the cc_config.xml, I'd like to see the series for this task, and given it's a P3, it can't be too long a list.

Brinktastee,

Yours is more obvious. It immediately started on a tough position, so it will stay at 0% until the position is complete. The algorithm will then determine whether to continue or end the run for this task. If it decides to finish the task, the percentage will jump to 100%. See the comment by armstrdj about a coming change to introduce percent progress during long positions, not just at the end.

As always with Quorum projects, the place to ascertain how the wingman/wingmen fared is the WU Detail, one level deeper into the Result Status page.

edit: correct name armstrdj
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 2 times, last edit by Sekerob at May 29, 2009 2:40:15 PM]
[May 29, 2009 7:27:09 AM]
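As a rough illustration of the behaviour described above, here is a minimal Python sketch. It assumes the time limit is only checked once a position finishes, which is how the posts describe it. The function name `run_task`, the per-position costs and the limit values are hypothetical and are not taken from the HCMD2 code.

```python
def run_task(position_costs_hours, limit_hours):
    """Simulate a task that checks its time limit only between positions.

    position_costs_hours: hypothetical CPU hours each position needs.
    limit_hours: hypothetical time limit after which the task should stop.
    Returns (positions_completed, elapsed_hours).
    """
    elapsed = 0.0
    completed = 0
    for cost in position_costs_hours:
        # A position always runs to completion; nothing is checked inside it.
        elapsed += cost
        completed += 1
        # Only at the position boundary can the task decide to stop.
        if elapsed >= limit_hours:
            break
    return completed, elapsed


# A short first position followed by a very long one: the limit is
# overshot by a wide margin before the task gets a chance to stop.
long_case = run_task([0.5, 20.0, 1.0], limit_hours=8.0)

# Three ordinary positions that never reach the limit all get computed.
short_case = run_task([1.0, 2.0, 3.0], limit_hours=8.0)
```

Under these assumptions, a single tough position can push the run far beyond the limit, because the limit cannot interrupt a position that is already in progress.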
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Monster WU on the loose...

My monster WU (Pentium 4, see here) has checkpointed at least twice in its (so far) 33-hour run:

374.855400 secs
78189.820000 secs
current CPU time is 118850.300000 secs and reports 2.5157% complete
[May 29, 2009 8:13:38 AM]
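The checkpoint figures in the post above are easier to compare when converted to hours. The short Python sketch below only does that arithmetic; the variable names are made up for illustration, and the three numbers are the ones quoted in the post.

```python
# CPU times in seconds, as reported in the post.
checkpoints = [374.855400, 78189.820000]
current_cpu = 118850.300000

SECONDS_PER_HOUR = 3600.0


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR


gap_between_checkpoints = to_hours(checkpoints[1] - checkpoints[0])
since_last_checkpoint = to_hours(current_cpu - checkpoints[1])
total_runtime = to_hours(current_cpu)

print(f"{gap_between_checkpoints:.1f} h between the two checkpoints")  # 21.6
print(f"{since_last_checkpoint:.1f} h since the last checkpoint")      # 11.3
print(f"{total_runtime:.1f} h of CPU time in total")                   # 33.0
```

So the second position took roughly 21.6 hours between checkpoints, and about 11.3 more hours have gone by without a new checkpoint.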
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Monster WU on the loose...

hmmm, logic defying, I'd have expected this to finish right on the second position. Oh well, a new version was scheduled to do more refined % progress display anyhow.
[Edit 1 times, last edit by Sekerob at May 29, 2009 8:20:04 AM]
[May 29, 2009 8:19:39 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Monster WU on the loose...

"hmmm, logic defying, I'd have expected this to finish right on the second position."

That's why I mentioned it. Also, 1300 minutes is a huge number of standard deviations away from the mean/s.d. figures given by knreed, even given that this is a P4 and not a Core duo/quad/etc.
[May 29, 2009 10:45:27 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Monster WU on the loose...

That is 21.61 hours for the second position, where the desired maximum length is 8 hours and the overall maximum is not expected to exceed 10-12 hours. Maybe, after all, a P3 (or a P4 that has hyperthreading on) should be excluded from running this project, or there should be an additional 'yes I read and understand the consequence' option before getting any more of these jobs.
[Edit 1 times, last edit by Sekerob at May 29, 2009 12:03:58 PM]
[May 29, 2009 12:03:12 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Monster WU on the loose...

Not ideal of course, but we have to appreciate that a single-position job taking nearly 22 hours without checkpointing in between gets a few volunteers upset when it resumes at 0.0% and zero CPU time after a power outage.
[May 29, 2009 12:07:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Monster WU on the loose...

"21.61 hours for the second position where desired max length is 8 hours and the max overall not expected to exceed 10-12 hours. Maybe after all, P3 (or a P4 that has hyperthreading on) should be excluded from running this project"

Hmmm. This PC is mostly getting much higher credit than requested. The WUs' RSS is tiny and I suspect most of the code is running in cache, which is nice for (relatively) high clock speed CPUs. Over the last 40 HCMD2 WUs, the fastest machine this one has ever been paired with completed in 40% of the time (0.81 vs 2.03 hours). The average is about 70%. If the average PC on WCG (an average of fast machines, as it's mostly version 611 results) would take 15 hours, maybe all machines need the disclaimer!
[May 29, 2009 1:03:09 PM]
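One way to read the numbers in the post above is sketched below in Python. It assumes the 15-hour figure comes from scaling the 21.61-hour position by the roughly 70% average ratio; the post does not spell that out, so treat the second calculation as an interpretation. The variable names are made up for illustration.

```python
# Fastest paired machine versus this PC on the same work unit (hours).
fastest_partner_hours = 0.81
this_pc_hours = 2.03
fastest_ratio = fastest_partner_hours / this_pc_hours  # about 0.40

# Assumed reading: an average partner needs about 70% of this PC's time,
# applied to the 21.61-hour position mentioned earlier in the thread.
average_ratio = 0.70
monster_position_hours = 21.61
estimated_average_pc_hours = average_ratio * monster_position_hours

print(f"fastest partner ratio: {fastest_ratio:.2f}")
print(f"estimated time on an average PC: {estimated_average_pc_hours:.1f} h")
```

Even with that scaling, the estimate stays well above the 8-hour target mentioned earlier in the thread.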
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Monster WU on the loose...

Not really, since P3s in general are outliers in the total pool of volunteers, and one has to draw the line somewhere, some day.

Fortunately, in the last week there were repeated comments about automatically generating light work for light devices, with no need to opt in. That is still in its infancy.

But, when checking the quorums of the last forty on the pairings, you do seem to have stretched the analytical effort quite a bit. Enjoy.
[May 29, 2009 1:12:45 PM]