World Community Grid Forums
Thread Status: Active Total posts in this thread: 149
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I suspended all tasks that my machine was simultaneously running except for the mother-of-all WUs. Now my machine is chewing through that one WU posthaste.
It's at 45:45 CPU time and 72.9% complete. It's going to be close, but my machine should kick the rest out before it hits the timeout limits being reported. |
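The figures in the post above can be sanity-checked with a simple linear extrapolation. This is only a rough sketch: it assumes progress is proportional to CPU time, which may not hold for every work unit, and the function name is made up for illustration.

```python
def estimate_remaining_hours(cpu_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the remaining CPU hours from progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = cpu_hours / fraction_done
    return total_hours - cpu_hours


# 45:45 of CPU time (45.75 hours) at 72.9% complete, as quoted in the post
cpu_hours = 45 + 45 / 60
remaining = estimate_remaining_hours(cpu_hours, 0.729)
print(f"estimated total: {cpu_hours + remaining:.1f} h, remaining: {remaining:.1f} h")
```

Under that assumption the task would need roughly 17 more CPU hours, for a total of a little under 63 hours.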
||
|
mclaver
Veteran Cruncher Joined: Dec 19, 2005 Post Count: 566 Status: Offline |
I now have my first failure
8/2/2008 3:54:46 PM|World Community Grid|Aborting task faah5012_1qbt_1hpx_01_0: exceeded CPU time limit 177736.732098
8/2/2008 3:54:51 PM|World Community Grid|Computation for task faah5012_1qbt_1hpx_01_0 finished
This is a lot of work that I did and did not get credit for. I have a dozen machines, and I currently have 8 tasks in progress for this project that will exceed 40 hours, with at least another ten waiting. If I get another failure with no credit, I will delete all work units and stop working on this project. This is not a good way to run a project. I want to help, but I like getting credit for my work. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
If the same or equivalent machines as the one that timed out are running these faah50xx tasks, that would be the prudent thing to do.
Sorry for the inconvenience. We'll report the timeouts to the techs to highlight the issue for their consideration.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
mclaver, obviously we are all deeply apologetic about the wasted work, but unfortunately there was no way to recall the work once started. You know incidents such as this are rare at WCG, and the techs will assign you retrospective credit if they can.
Can you tell me whether the work unit is claiming credit for the time spent? Check the results status page once it is reported. |
||
|
mclaver
Veteran Cruncher Joined: Dec 19, 2005 Post Count: 566 Status: Offline |
Thanks for your response...and "sympathy"
8/2/2008 11:04:40 AM|World Community Grid|Aborting task faah5009_1hvk_1bv7_00_0: exceeded CPU time limit 201447.246158
8/2/2008 11:04:45 AM|World Community Grid|Computation for task faah5009_1hvk_1bv7_00_0 finished
This one shows zero credit. I also have another one from 7/29 that failed with zero credit on the results page, but I do not have a log entry for it. The previous failure I reported has not shown up on the results page yet because it has not uploaded yet. That makes at least three significant efforts with no credit. I still have four on another machine that have been running over 47 hours and have another 20 hours to go. This is a lot of work to lose. Do these have a chance to complete without timing out? I am willing to give them a chance, but am I wasting my time? I do not want to donate another 240 hours of CPU time with no credit. If you check, I have been a pretty good contributor to WCG over the last couple of years, and this is the first time I have been unhappy. If there is a way to get some credit for this wasted work, that would be great. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
mclaver, it is the weekend now. Hopefully the techs will work out a way to give credit for the wasted work on Monday. They have done so in the past, but working out a fair way to do it isn't guaranteed.
Consider aborting any work you are sure won't finish. You may lose credit, but better to avoid wasting further time on them. These particular work units are important to the scientists, so they will be reissued soon with corrected time estimates. I'm afraid this means the long work units will continue to be issued for a while longer. |
||
|
mclaver
Veteran Cruncher Joined: Dec 19, 2005 Post Count: 566 Status: Offline |
I understand it is the weekend, so there is no hurry to resolve this right now. Since I have a number of machines, and this work is important, I will wait and see what happens and not cancel anything yet.
Is there a way to save my log, figure out when these work units started and when they exceeded the CPU limit, and so determine the value of the work I performed, even though they did not complete? This would seem to be a fair way to calculate credit.
- Mitch |
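As a rough illustration of the log idea above, the abort lines quoted in this thread can be pulled out of a saved message log with a small script. This is only a sketch: the pattern matches the two line formats posted here, the function name is made up, and it assumes the logged limit is in seconds. It finds when a task was aborted, not when it started, and says nothing about how WCG would actually grant credit.

```python
import re
from datetime import datetime

# Pattern for the "Aborting task" lines as quoted in this thread.
ABORT_RE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\|"
    r"(?P<project>[^|]*)\|"
    r"Aborting task (?P<task>\S+): exceeded CPU time limit (?P<limit>[\d.]+)$"
)


def find_aborted_tasks(log_lines):
    """Return one dict per aborted task found in the given log lines."""
    aborted = []
    for line in log_lines:
        match = ABORT_RE.match(line.strip())
        if match is None:
            continue  # not an abort line
        limit_seconds = float(match["limit"])  # assumption: value is in seconds
        aborted.append({
            "task": match["task"],
            "aborted_at": datetime.strptime(match["stamp"], "%m/%d/%Y %I:%M:%S %p"),
            "limit_hours": limit_seconds / 3600,
        })
    return aborted


sample = [
    "8/2/2008 3:54:46 PM|World Community Grid|Aborting task "
    "faah5012_1qbt_1hpx_01_0: exceeded CPU time limit 177736.732098",
    "8/2/2008 3:54:51 PM|World Community Grid|Computation for task "
    "faah5012_1qbt_1hpx_01_0 finished",
]
for entry in find_aborted_tasks(sample):
    print(f"{entry['task']}: aborted {entry['aborted_at']}, "
          f"limit {entry['limit_hours']:.1f} h")
```

If the value really is in seconds, the limit in the first quoted log works out to a little over 49 hours of CPU time.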
||
|
petehardy
Senior Cruncher USA Joined: May 4, 2007 Post Count: 318 Status: Offline |
mclaver,
I'm sure that everyone got blindsided by these giant jobs, and lessons will be learnt from them. It's even possible that some of the results may be usable. To me it's like an involuntary beta test; the best lessons are learnt from our mistakes.
Pete
"Patience is a virtue", I can't wait to learn it! |
||
|
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline |
I got another one of the monsters - faah5015_ range.
So far 21% completed after 12 hours, estimated 57 hours to do the entire task. My previous one was completed in 33 hours, so this one is even bigger. This is the short-deadline monster - I think it'll be done just before the deadline.
I did note that one of the original tasks got an error after 0.70 hours, claiming 3.1 credits. Since I am already way past the error point, I will let it run and see what happens. If it completes, I calculate it will claim an even more whopping 660 credits. Wow. Fingers (and toes) crossed.
JB |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I can only presume that when these WUs were first created, someone ran a sample set of them to determine how long they would take. If they had noticed run times like these, I would have expected that, before the WUs were released to the community for general use, every machine in the grid would have been placed into a specific class. The classes would have had to be set up by system capacity and availability. Then these WUs would be dispatched only to systems in classes capable of returning results without failing.
From where I sit, this is more of a technical issue in the scheduler / dispatcher. I'm sure it would annoy the fool out of me if this happened to me too. But heck... what would I know.
[Edit 2 times, last edit by Former Member at Aug 3, 2008 12:26:42 PM] |
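The class-based dispatch idea in the post above could look something like the sketch below. Everything here is hypothetical: the `Host` and `WorkUnit` fields, the safety factor, and the example numbers are invented for illustration and do not describe how the WCG or BOINC scheduler actually works.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    relative_speed: float  # hypothetical: 1.0 = reference machine
    hours_per_day: float   # hypothetical: average crunching hours per day


@dataclass
class WorkUnit:
    name: str
    reference_cpu_hours: float  # hypothetical: CPU hours on the reference machine
    deadline_days: float


def can_finish(host: Host, wu: WorkUnit, safety_factor: float = 1.5) -> bool:
    """True if the host is expected to finish the WU before its deadline, with margin."""
    cpu_hours_needed = wu.reference_cpu_hours / host.relative_speed
    hours_available = host.hours_per_day * wu.deadline_days
    return cpu_hours_needed * safety_factor <= hours_available


def eligible_hosts(hosts, wu):
    """Keep only the hosts that a long work unit could reasonably be sent to."""
    return [h for h in hosts if can_finish(h, wu)]


hosts = [
    Host("always-on desktop", relative_speed=1.0, hours_per_day=24.0),
    Host("part-time laptop", relative_speed=0.5, hours_per_day=8.0),
]
monster = WorkUnit("example_long_wu", reference_cpu_hours=60.0, deadline_days=7.0)
print([h.name for h in eligible_hosts(hosts, monster)])
```

With these made-up numbers only the always-on desktop qualifies, since the part-time laptop would need far more crunching hours than it has before the deadline.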
||