World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082
I have an old machine (an Intel P4, I think) that has been crunching HCC WUs without any problems. Between yesterday and today, a couple of WUs have errored out:

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
19:41:10 (1720): No heartbeat from core client for 30 sec - exiting
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00415352 write attempt to address 0x0000000C
Engaging BOINC Windows Runtime Debugger...

Is this because of my machine, or are these WUs funky? If more of the error dump is needed, I can provide it (I just didn't want to create a massive post if it wasn't needed).

Thanks,
CJSL
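The exit code and the hex value in parentheses in this log are the same 32-bit number read two ways: -1073741819 is the signed (two's complement) reading of 0xC0000005, the Windows access-violation status that the stderr output also names. A small Python sketch of the conversion (the helper names are illustrative only, not part of BOINC):

```python
def to_unsigned_hex(code: int, bits: int = 32) -> str:
    """Reinterpret a signed exit code as an unsigned hex status value."""
    # Masking to `bits` bits yields the two's-complement bit pattern.
    return f"0x{code & ((1 << bits) - 1):08X}"


def to_signed(status: int, bits: int = 32) -> int:
    """Reinterpret an unsigned status value as a signed integer."""
    sign_bit = 1 << (bits - 1)
    # If the top bit is set, the signed value is negative.
    return status - (1 << bits) if status & sign_bit else status


print(to_unsigned_hex(-1073741819))  # 0xC0000005
print(to_signed(0xC0000005))         # -1073741819
```

Reading the decimal code this way makes it easy to match against the hex status codes that Windows error references usually list.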
depriens
Senior Cruncher The Netherlands Joined: Jul 29, 2005 Post Count: 350
I've been running HCC exclusively on my machines for quite some time now, and the feeder seems to have stopped. No new workunits have been sent for the last few hours, and I'm starting to get other projects' workunits.

Maybe the reason the feeder stopped has something to do with the errors you're encountering...
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082
depriens... thanks for the response/observation. I just got another HCC WU with an error (that makes a total of 3). I haven't seen anything in the forums (Known Issues or HCC) indicating that there is a problem with HCC. Until things get sorted out, I'll switch to another project on my old PC.

Thanks,
CJSL

[Edit 1 times, last edit by cjslman at Apr 11, 2012 6:23:23 PM]
marvey11
Advanced Cruncher Germany Joined: Apr 2, 2011 Post Count: 89
There's a P4 among my machines running almost exclusively HCC1 tasks, with only the occasional CEP2 job. I've had no errors so far on any machine (only some of those jobs ran a lot longer than estimated), so that's probably not the reason. But I can confirm that tasks are arriving here only in trickles (if at all), even though my hosts usually request 12 hours of work:

12-Apr-2012 00:34:26 [World Community Grid] [sched_op] CPU work request: 43236.08 seconds; 0.00 devices

Something's definitely going on...

EDIT: BTW, times are UTC+2.

[Edit 1 times, last edit by marvey11 at Apr 12, 2012 12:04:39 AM]
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082
Well... I haven't seen any scientists posting about known problems with HCC, and I haven't seen people running around in the streets in a panic, so I'll assume that it's either my machine (which I doubt, given the timing of the failures) or that a few misbehaving WUs escaped the HCC lab to wreak havoc and mayhem on my machine. I've switched back to HCC and have some WUs ready to run (they'll probably start tomorrow morning or around midday). Let's see how they behave...

CJSL
LAZA74
Advanced Cruncher Germany Joined: Sep 28, 2008 Post Count: 56
I sometimes get this problem; maybe it was the cause before:

Wed 09 May 2012 08:21:12 CEST | World Community Grid | Finished download of hcc1_image02_6.40.tga
Wed 09 May 2012 08:21:12 CEST | World Community Grid | [error] File hcc1_image02_6.40.tga has wrong size: expected 5500, got 32812
Wed 09 May 2012 08:21:12 CEST | World Community Grid | [error] Checksum or signature error for hcc1_image02_6.40.tga

Is anything known about checksum errors?

NAS - self-built
Xiaomi Mi 10T
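The two [error] lines in this log come from comparing the downloaded file against what was expected: first its size, then its checksum. As a minimal sketch, assuming the expected byte count and an MD5 digest are known in advance (`verify_download` is a hypothetical helper, and this is not the BOINC client's actual code), such a check could look like this:

```python
import hashlib
import tempfile
from pathlib import Path


def verify_download(path: Path, expected_size: int, expected_md5: str) -> list[str]:
    """Return the problems found; an empty list means the file looks intact.

    Hypothetical helper for illustration: it mirrors the size-then-checksum
    order of the log messages, but is not how the BOINC client is written.
    """
    problems = []
    data = path.read_bytes()
    # Cheap check first: a wrong length already proves the file is bad.
    if len(data) != expected_size:
        problems.append(f"wrong size: expected {expected_size}, got {len(data)}")
    # The digest catches files with the right length but wrong contents.
    if hashlib.md5(data).hexdigest() != expected_md5.lower():
        problems.append("checksum error")
    return problems


with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "hcc1_image02_6.40.tga"
    p.write_bytes(b"\x00" * 32812)                      # the file that actually arrived
    good_md5 = hashlib.md5(b"\x00" * 5500).hexdigest()  # digest of the file we expected
    print(verify_download(p, 5500, good_md5))
    # ['wrong size: expected 5500, got 32812', 'checksum error']
```

The size check is cheap and rejects grossly wrong downloads immediately; the checksum is still needed because a damaged file can have the correct length.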
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
A checksum error could indicate that the cloud-distributed part needs a refresh. By the way, only the fixed files, such as the .tga shown in your messages, come from the cloud.

Some people have copied these fixed files from other hosts running the same science and put them on the problem machine, which works fine, but not everyone can do that (or feels safe doing it).

--//--

Edit: cjslman's problem is covered in the Start Here FAQs. The 1073... and heartbeat errors are device problems.

[Edit 2 times, last edit by Former Member at May 9, 2012 7:06:07 AM]
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082
Thanks for the suggestions... After analyzing the frequency of the errors, which was increasing (and I have gotten a few errors from other projects on the same machine in the past), I have decided to pull the computer from the crunching effort. It's an old, slow desktop computer that is only needed as a print server. Hopefully it can be replaced with a new multicore one in the future.

CJSL

[Edit 1 times, last edit by cjslman at May 9, 2012 12:27:50 PM]
dskagcommunity
Senior Cruncher Austria Joined: May 10, 2011 Post Count: 219
I would try memtest; perhaps only the memory is defective. If you have a spare part (or have two or more modules in the computer, don't need all of them, and can remove one), it's not a big deal.

[Edit 1 times, last edit by dskagcommunity at May 9, 2012 3:20:31 PM]
LAZA74
Advanced Cruncher Germany Joined: Sep 28, 2008 Post Count: 56
"I would try memtest; perhaps only the memory is defective. If you have a spare part (or have two or more modules in the computer, don't need all of them, and can remove one), it's not a big deal."

If it were defective RAM, wouldn't I get errors on all WUs, not only with HCC? I'm crunching all WCG projects, plus Spinhenge, Leiden Classical, QMC, and eOn2, and have had no problems there. So my guess was that some of the WUs cause this problem (or maybe a sector or two on the HDD are bad and were coincidentally used by HCC?). I've also now gotten another error message for the last 4 WUs:

Result Name: X0960062360913200512150926_ 1--
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>hcc1_image02_6.40.tga</file_name>
<error_code>-200</error_code>
</file_xfer_error>
</message>
]]>

Anyway, I had to reinstall this machine (upgrade to Xubuntu Precise) and use a different partition layout (because of other problems, see https://secure.worldcommunitygrid.org/forums/...ead,32661_offset,0#377163), and I will ask for help if the problems continue... Thanks to all for your help!