Total posts in this thread: 149
This topic has been viewed 12269 times and has 148 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

I suspended every task my machine was running except for the mother-of-all WUs. Now my machine is chewing through that one WU posthaste.

It's at 45:45 CPU time and 72.9% complete. It's going to be close, but my machine should kick the rest out before it hits the kind of timeout limits being reported.
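For anyone wanting to run the same check on their own task, here is a minimal sketch of the arithmetic. It assumes progress is linear in CPU time, which real work units may not honor, and the function names are hypothetical. The CPU time limit has to come from your own client, since the limits quoted in this thread differ from task to task.

```python
def parse_hhmm(text: str) -> float:
    """Convert an 'HH:MM' CPU-time reading into hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60.0


def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate total CPU hours from the progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def fits_within_limit(elapsed_hours: float, fraction_done: float,
                      limit_seconds: float) -> bool:
    """True if the extrapolated total stays under the CPU time limit."""
    return projected_total_hours(elapsed_hours, fraction_done) <= limit_seconds / 3600.0


if __name__ == "__main__":
    elapsed = parse_hhmm("45:45")
    total = projected_total_hours(elapsed, 0.729)
    print(f"projected total: {total:.1f} h, remaining: {total - elapsed:.1f} h")
```

Pass the limit from your own "exceeded CPU time limit" message, in seconds, to `fits_within_limit` to see whether the extrapolated total clears it.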
[Aug 2, 2008 9:24:30 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

I now have my first failure. :'(

8/2/2008 3:54:46 PM|World Community Grid|Aborting task faah5012_1qbt_1hpx_01_0: exceeded CPU time limit 177736.732098
8/2/2008 3:54:51 PM|World Community Grid|Computation for task faah5012_1qbt_1hpx_01_0 finished

This is a lot of work I did that I did not get credit for. I have a dozen machines, and I currently have 8 tasks in progress for this project that will exceed 40 hours, with at least another ten waiting.

If I get another failure with no credit, I will delete all work units and stop working on this project. This is not a good way to run a project. I want to help, but I like getting credit for my work. :(
[Aug 2, 2008 9:31:39 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

If the machines running these faah50xx tasks are the same as, or equivalent to, the one that timed out, that would be the prudent thing to do.

Sorry for the inconvenience. We'll report the timeouts to the techs to highlight the issue for their consideration.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Aug 2, 2008 9:39:07 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

mclaver, obviously we are all deeply apologetic about the wasted work, but unfortunately there was no way to recall the work once started. You know incidents such as this are rare at WCG, and the techs will assign you retrospective credit if they can.

Can you tell me whether the work unit is claiming credit for the time spent? Check the results status page once it is reported.
[Aug 2, 2008 9:53:14 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

Thanks for your response... and "sympathy" :) . I have now discovered a second failure due to the CPU limit.

8/2/2008 11:04:40 AM|World Community Grid|Aborting task faah5009_1hvk_1bv7_00_0: exceeded CPU time limit 201447.246158
8/2/2008 11:04:45 AM|World Community Grid|Computation for task faah5009_1hvk_1bv7_00_0 finished

This one shows zero credit. I also have another one from 7/29 that failed with zero credit on the results page, but I do not have a log entry for it. The previous failure I reported has not shown up on the results page yet because it has not uploaded yet. That makes at least three significant efforts with no credit.

I still have four on another machine that have been running for over 47 hours and have another 20 hours to go. This is a lot of work to lose. Do these have a chance to complete without timing out? I am willing to give them a chance, but am I wasting my time? I do not want to donate another 240 hours of CPU time with no credit.

If you check, I have been a pretty good contributor to WCG over the last couple of years, and this is the first time I have been unhappy. If there is a way to get some credit for this wasted work, that would be great. :)
[Aug 2, 2008 10:18:01 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

mclaver, it is the weekend now. Hopefully the techs will work out a way to give credit for the wasted work on Monday. They have done so in the past, but there is no guarantee they can find a fair way to do it this time.

Consider aborting any work you are sure won't finish. You may lose credit, but better to avoid wasting further time on them.

These particular work units are important to the scientists, so they will be reissued soon with corrected time estimates. I'm afraid this means the long work units will continue to be issued for a while longer.
[Aug 2, 2008 10:26:27 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

I understand it is the weekend, so there is no hurry to resolve this right now. Since I have a number of machines and this work is important, I will wait and see what happens and not cancel anything yet.

Is there a way to save my log, figure out when these work units started and when they exceeded the CPU limit, and so determine the value of the work I performed, even though they did not complete? This would seem to be a fair way to calculate credit.

- Mitch
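One way to pull that information out of a saved log is sketched below. It is only a sketch: it assumes the pipe-separated line format of the two log excerpts quoted earlier in this thread, it assumes the limit printed in the abort message approximates the CPU seconds spent on the task, and the names `ABORT_RE` and `parse_aborts` are made up for illustration. It does not compute credit, since how the techs would value the time is not known here.

```python
import re
from datetime import datetime

# Matches lines such as:
# 8/2/2008 3:54:46 PM|World Community Grid|Aborting task <name>: exceeded CPU time limit <seconds>
ABORT_RE = re.compile(
    r"^(?P<when>[^|]+)\|(?P<project>[^|]+)\|Aborting task (?P<task>\S+): "
    r"exceeded CPU time limit (?P<limit>[0-9.]+)$"
)


def parse_aborts(lines):
    """Return one record per 'exceeded CPU time limit' abort line."""
    records = []
    for line in lines:
        match = ABORT_RE.match(line.strip())
        if match is None:
            continue  # skip every other kind of log message
        limit_seconds = float(match.group("limit"))
        records.append({
            "task": match.group("task"),
            "aborted_at": datetime.strptime(match.group("when"), "%m/%d/%Y %I:%M:%S %p"),
            "cpu_hours": limit_seconds / 3600.0,
        })
    return records


if __name__ == "__main__":
    log = [
        "8/2/2008 3:54:46 PM|World Community Grid|Aborting task faah5012_1qbt_1hpx_01_0: exceeded CPU time limit 177736.732098",
        "8/2/2008 3:54:51 PM|World Community Grid|Computation for task faah5012_1qbt_1hpx_01_0 finished",
        "8/2/2008 11:04:40 AM|World Community Grid|Aborting task faah5009_1hvk_1bv7_00_0: exceeded CPU time limit 201447.246158",
    ]
    for record in parse_aborts(log):
        print(f"{record['task']}: {record['cpu_hours']:.1f} CPU hours, aborted {record['aborted_at']}")
```

Start times would need a matching pattern for whatever message the client writes when a task begins; none is quoted in this thread, so that part is left out.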
[Aug 2, 2008 10:40:45 PM]
petehardy
Senior Cruncher
USA
Joined: May 4, 2007
Post Count: 318
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

mclaver,

I'm sure that everyone got blindsided by these giant jobs, and lessons will be learnt from them. It's even possible that some of the results may be usable. To me it's like an involuntary beta test; the best lessons are learnt from our mistakes.

Pete
----------------------------------------

"Patience is a virtue", I can't wait to learn it!
[Aug 3, 2008 1:06:51 AM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Second really long work unit received

I got another one of the monsters - faah5015_ range.

So far 21% completed after 12 hours, estimated 57 hours to do the entire task. My previous one was completed in 33 hours so this one is even bigger.

This is the short-deadline monster - I think it'll be done just before the deadline. I did note that one of the original tasks got an error after 0.70 hours, claiming 3.1 credits. Since I am already way past the error point, I will let it run and see what happens. If it completes, I calculated it will claim an even more whopping 660 credits. Wow.

Fingers (and toes) crossed.

JB
[Aug 3, 2008 12:06:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Second really long work unit received

I can only presume that when these WUs were first created, someone ran a sample set of them to determine how long they would take. If they had noticed run times like these, I would have expected that, before the WUs were released to the community for general use, all machines in the grid would have been placed into specific classes. The classes would have had to be set up by system capacity and availability, with these WUs then dispatched only to systems in classes capable of returning results without failing.

From where I sit, this is more of a technical issue in the scheduler / dispatcher.
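To make the idea concrete, here is a toy sketch of that kind of class-based dispatch. It is not how the WCG or BOINC scheduler actually works; every name and number in it (`Host`, `WorkUnit`, `speed`, `availability`, the sample values) is a made-up assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    speed: float         # relative to a reference machine (1.0 = reference)
    availability: float  # fraction of wall-clock time spent crunching, 0..1


@dataclass(frozen=True)
class WorkUnit:
    name: str
    reference_hours: float  # estimated CPU hours on the reference machine
    deadline_hours: float   # wall-clock hours allowed before the deadline


def can_finish(host: Host, wu: WorkUnit) -> bool:
    """True if the host is expected to return the result before the deadline."""
    if host.speed <= 0 or host.availability <= 0:
        return False
    cpu_hours = wu.reference_hours / host.speed
    wall_hours = cpu_hours / host.availability
    return wall_hours <= wu.deadline_hours


def eligible_hosts(hosts, wu):
    """Keep only the hosts whose class can handle this work unit."""
    return [host for host in hosts if can_finish(host, wu)]


if __name__ == "__main__":
    hosts = [Host("fast", 1.5, 0.9), Host("slow", 0.5, 0.5)]
    monster = WorkUnit("hypothetical_long_wu", reference_hours=57.0, deadline_hours=96.0)
    print([host.name for host in eligible_hosts(hosts, monster)])
```

The hard part in practice is the estimate itself: a filter like this is only as good as `reference_hours`, and the run-time estimate is exactly what went wrong with these work units.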

I'm sure it would annoy the fool out of me if this happened to me too.

But heck... what would I know. ;)
----------------------------------------
[Edit 2 times, last edit by Former Member at Aug 3, 2008 12:26:42 PM]
[Aug 3, 2008 12:22:08 PM]