| World Community Grid Forums
Thread Status: Active Total posts in this thread: 9
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hello!

I am a little bit disappointed this morning to find the following failures:

WU: lh054_00060
CPU time: 44.23
Claimed/granted Boinc credit: 566.1 / 0.0
<core_client_version>5.10.13</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
...

------------

WU: lg597_00103
CPU time: 9.37
Claimed/granted Boinc credit: 63.1 / 0.0
<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
Failed to get VersionInfo size: 2
</stderr_txt>
...

What went wrong?

Regards,
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Okay, step by step: please visit the often-mentioned Result Status page and copy the lines for the work units you posted above into a post. A line would typically look like:

dddt0101a0038_ ZINC04146649-0001_ 06_ 0-- Lapsed-01 Pending Validation 09/05/2007 07:49:47 09/11/2007 07:45:56 4.83 51.0 / 0.0

The 566.10 hours looks like a job that ran over its time-out. Did you never see in the Tasks tab of BOINCmgr that no progress was being made? On the second WU I'll reserve my response until the requested lines have been copied and pasted.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi Sekerob,

here is the requested info:

lh054_ 00060_ 13-- kermc03 Error 09/08/2007 21:22:24 09/11/2007 04:51:32 44.23 566.1 / 0.0

Because the computer runs unattended and is used only for crunching, I do not look at it very often. The error message I put in my initial message already mentioned that the CPU time limit was exceeded.

The second info is:

lg597_ 00103_ 10-- kerdiwi01 Error 09/07/2007 08:53:48 09/08/2007 01:12:05 9.37 63.1 / 0.0

I reported both failures because I was surprised by them. Normally I do not have many failures.

Regards
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I fear you have a machine with an issue of hanging on the occasional HPF2 job. If you can isolate the machine by linking it to a specific device profile (very easily created; the standard ones are called school, work and home) and deselect HPF2 in that profile, you would not have to worry about that client. DDDT and FA@H are stable, but HPF2 has the strange looping. Closing the project and restarting usually makes it run properly.

566 hours is a pity... that's nearly 4 weeks. You might want to consider setting up BOINCview for remote monitoring. I'll ask the techs, as it was previously understood that the CPU time-out was 2 weeks, i.e. 296 hours, and wallclock 3 weeks.
[Edit 1 times, last edit by Sekerob at Sep 11, 2007 2:14:22 PM]
||
|
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline Project Badges:
|
Sekerob wrote: "566 hours is a pity..."

Sekerob, it is "only" 44.23 hours. 566.1 are the claimed credits. That does not change the problem, but the damage is less dramatic.

Cheers. Jean.
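
The mix-up between CPU hours and claimed credit is easy to make when reading a Result Status line by eye. As a minimal sketch, assuming the line always ends with two timestamps, the CPU time in hours, and "claimed / granted" credit (as in the lines quoted in this thread), the tail can be split into named fields in Python. The function name `parse_result_line` and the field names are hypothetical and not part of any WCG or BOINC tool; in particular, the meaning given to the two timestamps is an assumption.

```python
import re

# Matches only the tail of a Result Status line:
# <timestamp> <timestamp> <cpu hours> <claimed> / <granted>
TAIL = re.compile(
    r"(?P<sent>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<returned_or_due>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<cpu_hours>\d+(?:\.\d+)?)\s+"
    r"(?P<claimed>\d+(?:\.\d+)?)\s*/\s*(?P<granted>\d+(?:\.\d+)?)\s*$"
)


def parse_result_line(line):
    """Split a pasted Result Status line into named fields (hypothetical helper)."""
    m = TAIL.search(line)
    if m is None:
        raise ValueError("line does not end with the expected columns")
    return {
        # Everything before the first timestamp: result name, device, status.
        "prefix": line[:m.start()].strip(),
        # Assumption: first timestamp is the sent time, second is return/due time.
        "sent": m["sent"],
        "returned_or_due": m["returned_or_due"],
        "cpu_hours": float(m["cpu_hours"]),
        "claimed_credit": float(m["claimed"]),
        "granted_credit": float(m["granted"]),
    }


fields = parse_result_line(
    "lh054_ 00060_ 13-- kermc03 Error 09/08/2007 21:22:24 "
    "09/11/2007 04:51:32 44.23 566.1 / 0.0"
)
print(fields["cpu_hours"], fields["claimed_credit"])  # prints: 44.23 566.1
```

Anchoring the pattern on the end of the line avoids having to guess where the result name, device name and status begin and end, since those contain spaces in the pasted lines.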
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi everybody!

Me again! During the last two weeks I again observed some failing HPF2 WUs:

lh694_ 00118_ 2-- (2.74 hours, 47.8 claimed points)
lh618_ 00092_ 7-- (37.04 hours, 244.4 claimed points)
lh582_ 00045_ 4-- (23.21 hours, 411.5 claimed points)

I am wondering why the same WU can run successfully on some devices and fail on others! In my particular case, the devices currently crunching are "state of the art" in terms of CPU and RAM. Is it possible that some WUs behave unpredictably depending on the CPU that computes them?

Cheers,
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello KerSamson,
There is a known bug in HPF2 which causes some work units to get caught in an infinite loop. We have never been able to locate the bug because the same work unit will run fine if it is run again on the same computer. This is probably a problem with an uninitialized memory location or an out-of-bounds memory access. This problem can be solved by restarting the HPF2 work unit from the last checkpoint. I think it has happened twice on my computer. I know it has happened once.

Sekerob has posted in Start Here: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16378

Lawrence
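
A work unit caught in such a loop keeps accumulating CPU time while its progress stays flat, so it can be spotted before the CPU time limit is reached. The following is only a sketch in Python of that check, under the assumption that you already have periodic samples of CPU hours and fraction done for a task (for example, noted by hand from the Tasks tab). The function `is_stalled`, its parameters and the sample values are hypothetical; the sketch does not read anything from BOINC itself.

```python
def is_stalled(samples, window=3, min_progress=0.0001):
    """Return True if CPU time advanced but progress did not.

    samples: list of (cpu_hours, fraction_done) tuples, oldest first.
    window: how many of the most recent samples to compare (assumed value).
    min_progress: smallest progress gain that counts as "still moving".
    """
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    cpu_spent = recent[-1][0] - recent[0][0]
    progress = recent[-1][1] - recent[0][1]
    return cpu_spent > 0 and progress < min_progress


# Hypothetical samples: a healthy task and one that is looping at 40%.
healthy = [(1.0, 0.10), (2.0, 0.20), (3.0, 0.30)]
looping = [(10.0, 0.40), (11.0, 0.40), (12.0, 0.40)]

print(is_stalled(healthy))  # prints: False
print(is_stalled(looping))  # prints: True
```

A task flagged this way is a candidate for the restart from the last checkpoint described above. The window and threshold would need tuning, since some work units legitimately report progress in coarse steps.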
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hello lawrencehardin,

Thank you very much for your feedback. Indeed, such failures are the worst to analyse and solve! Everybody who has developed and debugged software (especially real-time software, or software written directly in low-level languages) knows this nightmare all too well. Considering how many WUs my systems complete weekly, the number of errors is in the end limited. By the way, two systems have been working on DDDT again for two weeks without any problem (unlike one month ago).

Have a nice weekend,
||
|
|
|