Thread Status: Active
Total posts in this thread: 40
This topic has been viewed 6286 times and has 39 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
I aborted one of these work units today

I aborted task ach1_8_62_10 (I think this is correct) today. The
task had accumulated over 15 hours of CPU time, and the 'To
completion' time for the task was increasing all day long instead
of decreasing.

-Brenda Helminen, Michigan Tech
[Nov 15, 2007 1:12:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: I aborted one of these work units today

I don't know what went wrong, but I think Abort is the right thing to do in such cases. It immediately tells the server to try another computer with a fresh copy.
[Nov 15, 2007 11:50:06 AM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Status: Offline
Re: I aborted one of these work units today

I just aborted 3 work units today that were doing that. I left one running just to see what the result is; it has now been running 23.5 hours, is 91% complete, and shows 2 hours remaining.

These were aborted:

ach1_16_26 (11.54 hrs CPU time)

ach1_18_44 (16.99 hrs CPU time)

ach1_18_52 (10.34 hrs CPU time)

This one is still running:

ach1_18_43


Hope some of this info helps...


BTW, the computer these are/were running on has a QX6700 CPU, 3GB RAM, and 200GB hard drive space dedicated to WCG.

Memory Settings:

Use no more than: 60 % of memory while computer in use
Use no more than: 80 % of memory while computer idle
Use no more than: 80 % of virtual memory
[Jan 25, 2008 2:21:59 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Status: Offline
Re: I aborted one of these work units today

This isn't looking good at all....... Now up to 25.5 hrs run time, 94% progress, and 1.5 hrs till completion.
[Jan 25, 2008 4:19:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: I aborted one of these work units today

There is a possibility that these work units are just really, really huge.

If they are larger than predicted, then the behaviour with the estimated time increasing and the progress increasing very slowly is exactly what we would expect to see.
[Jan 25, 2008 4:31:39 PM]
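
The point about a larger-than-predicted work unit can be illustrated with a toy model. This is only a sketch under stated assumptions, not the BOINC client's actual code: it assumes the "To completion" figure is a blend of the up-front prediction and an extrapolation from progress so far, weighted by the fraction done. The function names `remaining_hours` and `simulate`, and the 10-hour and 28-hour figures, are made up for illustration.

```python
def remaining_hours(elapsed_h, fraction_done, predicted_total_h):
    """Toy remaining-time estimate (an assumption, not BOINC's real formula).

    Blends the up-front prediction with an extrapolation from observed
    progress, trusting the extrapolation more as the fraction done grows.
    """
    if fraction_done <= 0.0:
        return predicted_total_h
    from_prediction = predicted_total_h * (1.0 - fraction_done)
    from_progress = elapsed_h * (1.0 - fraction_done) / fraction_done
    return fraction_done * from_progress + (1.0 - fraction_done) * from_prediction


# Hypothetical work unit: predicted to take 10 h, really takes 28 h.
PREDICTED_H = 10.0
ACTUAL_H = 28.0


def simulate(hours):
    """Return (elapsed, remaining estimate) pairs, assuming steady progress."""
    return [(t, remaining_hours(t, t / ACTUAL_H, PREDICTED_H)) for t in hours]


for elapsed, remaining in simulate(range(0, 29, 4)):
    print(f"{elapsed:4.0f} h elapsed -> {remaining:5.2f} h 'to completion'")
```

In this model the figure starts at the predicted 10 hours, climbs over the first few hours as the slow progress becomes visible, and only then starts to fall. The bigger the gap between predicted and actual size, the longer the climb lasts.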
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: I aborted one of these work units today

See the other thread regarding this project and the errors incurred. The longest one here took 20:48 hours on a P4 2.53 GHz, and yes, the remaining time kept increasing until it stabilised at 21 hours. While I was using the system the predicted times got longer; when I stopped using it, they slowly crept down. It's an intense job, hungry for lots of CPU L2 cache, and the less there is the more it slows down. Do 14 of these and you've got the sunshine badge to show off. To compare, on the Q6600 it was done in 7 hours.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Jan 25, 2008 5:18:34 PM]
[Jan 25, 2008 5:17:54 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Status: Offline
Re: I aborted one of these work units today

There is a possibility that these work units are just really, really huge.

If they are larger than predicted, then the behaviour with the estimated time increasing and the progress increasing very slowly is exactly what we would expect to see.



If the problem is that they are that huge and not something else, they never should have been sent out in the first place. Just think of the time it would take to run one sent to a computer that is slower than mine.


Now at 26.5 hrs run time, 95.193% completed, 1 hr 17+ minutes to go.....
[Jan 25, 2008 5:21:33 PM]
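
The three readings posted for ach1_18_43 can be checked against a plain linear extrapolation. This is a rough sketch that assumes progress is proportional to run time; how the client actually computes the "to completion" figure is not stated in this thread, and the helper name `linear_remaining` is made up.

```python
def linear_remaining(run_hours, percent_done):
    """Hours left if progress so far continues at the same average rate."""
    fraction = percent_done / 100.0
    return run_hours * (1.0 - fraction) / fraction


# (run time in hours, percent complete, remaining hours as reported in the posts)
readings = [
    (23.5, 91.0, 2.0),
    (25.5, 94.0, 1.5),
    (26.5, 95.193, 1.0 + 17.0 / 60.0),  # posted as "1 hr 17+ minutes"
]

for run_h, pct, reported_h in readings:
    estimate = linear_remaining(run_h, pct)
    print(f"{run_h} h at {pct}%: linear {estimate:.2f} h, reported {reported_h:.2f} h")
```

The extrapolated values land within roughly 20 minutes of the reported ones. That would be consistent with a task that is progressing steadily but is much larger than predicted, though it does not rule out other causes.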
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: I aborted one of these work units today

Did WCG not warn that the reference machine took 14 hours to do them and they were tough to chew? Just consider the accomplishment.
[Jan 25, 2008 5:24:42 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Status: Offline
Re: I aborted one of these work units today

See the other thread regarding this project and the errors incurred. The longest one here took 20:48 hours on a P4 2.53 GHz, and yes, the remaining time kept increasing until it stabilised at 21 hours. While I was using the system the predicted times got longer; when I stopped using it, they slowly crept down. It's an intense job, hungry for lots of CPU L2 cache, and the less there is the more it slows down. Do 14 of these and you've got the sunshine badge to show off. To compare, on the Q6600 it was done in 7 hours.



A P4 @ 2.53GHz doesn't even come close to comparing to a QX6700 @ 3.5GHz.

Appears to me that there's a problem with some of these work units. Maybe there isn't??? I don't know for sure. That's why I posted the information that I posted. I'm not complaining - just posting my experiences with this project with the hope that it will help.

I've two of these work units that were completed on a machine with a Q6700, 3GB RAM, and ~100GB of hard drive space. Those two are:

ach1_17_63 (6.2 hours) and ach1_17_45 (6.19 hours)
[Jan 25, 2008 5:32:07 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: I aborted one of these work units today

WCG aim for 10 hours per work unit, but it's not always possible. Even in a batch of normal work units, there are sometimes monster units.

I don't know enough about ACH to say whether this work unit is unusual for the batch or not. Sometimes, the only way we can find out is by trying the work units, and watching for problems.
[Jan 25, 2008 5:33:51 PM]