Total posts in this thread: 11
This topic has been viewed 2953 times and has 10 replies.
cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
A couple of HCC WUs errored out

I have an old machine (Intel P4, I think) that has been crunching HCC WUs without any problems. Between yesterday and today, I have had a couple of WUs error out:

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
19:41:10 (1720): No heartbeat from core client for 30 sec - exiting

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00415352 write attempt to address 0x0000000C

Engaging BOINC Windows Runtime Debugger...


Is this because of my machine or are these WUs funky? If any more of the error dump is needed, I can provide it (just didn't want to create a massive post if not needed).
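
For anyone comparing the two numbers in the dump: the negative exit code and the hex value are the same 32-bit quantity, shown once as a signed integer and once as unsigned hex. A minimal sketch of the conversion (the function name is only illustrative, not part of BOINC):

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned hexadecimal."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns the
    # two's-complement negative value into its unsigned counterpart.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741819))  # 0xc0000005
```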

Thanks,
CJSL
----------------------------------------
I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team


[Apr 11, 2012 4:25:13 PM]
depriens
Senior Cruncher
The Netherlands
Joined: Jul 29, 2005
Post Count: 350
Re: A couple of HCC WUs errored out

I've been running HCC exclusively on my machines for quite some time now, and the feeder seems to have stopped. No new workunits have been sent for the last few hours, and I have started getting other projects' workunits.
Maybe the reason the feeder has stopped has something to do with the errors you are encountering...
----------------------------------------

[Apr 11, 2012 5:57:57 PM]
cjslman
Re: A couple of HCC WUs errored out

depriens... thanks for the response/observation. I just got another HCC WU with an error (that makes a total of 3). I haven't seen anything in the forums (known issues or HCC) indicating that there is a problem with HCC. Until things get sorted out, I'll switch to another project on my old PC.

Thanks,
CJSL
----------------------------------------
[Edit 1 times, last edit by cjslman at Apr 11, 2012 6:23:23 PM]
[Apr 11, 2012 6:21:09 PM]
marvey11
Advanced Cruncher
Germany
Joined: Apr 2, 2011
Post Count: 89
Re: A couple of HCC WUs errored out

There's a P4 among my machines running almost exclusively HCC1 tasks, with only the occasional CEP2 job. I've had no errors so far on any machine (although some of those jobs ran a lot longer than estimated), so the P4 itself is probably not the reason. But I can confirm that tasks arrive here only in trickles (if at all), although my hosts usually request 12 hours of work.

12-Apr-2012 00:34:26 [World Community Grid] [sched_op] CPU work request: 43236.08 seconds; 0.00 devices
12-Apr-2012 00:34:30 [World Community Grid] Scheduler request completed: got 0 new tasks
...
12-Apr-2012 00:41:49 [World Community Grid] [sched_op] CPU work request: 43716.93 seconds; 0.00 devices
12-Apr-2012 00:41:53 [World Community Grid] Scheduler request completed: got 0 new tasks
...
12-Apr-2012 01:07:20 [World Community Grid] [sched_op] CPU work request: 45365.53 seconds; 0.00 devices
12-Apr-2012 01:07:24 [World Community Grid] Scheduler request completed: got 1 new tasks
12-Apr-2012 01:07:24 [World Community Grid] [sched_op] estimated total CPU task duration: 9929 seconds
...
12-Apr-2012 01:40:06 [World Community Grid] [sched_op] CPU work request: 37451.11 seconds; 0.00 devices
12-Apr-2012 01:40:09 [World Community Grid] Scheduler request completed: got 2 new tasks
12-Apr-2012 01:40:09 [World Community Grid] [sched_op] estimated total CPU task duration: 19858 seconds


Something's definitely going on...

EDIT: BTW, times are UTC+2 ...
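
To put numbers on the "trickle", the log excerpt above can be tallied with a short script. This is only a sketch: the regular expressions are written against the lines quoted in this post and are not an official description of the BOINC log format.

```python
import re

LOG = """\
12-Apr-2012 00:34:26 [World Community Grid] [sched_op] CPU work request: 43236.08 seconds; 0.00 devices
12-Apr-2012 00:34:30 [World Community Grid] Scheduler request completed: got 0 new tasks
12-Apr-2012 00:41:49 [World Community Grid] [sched_op] CPU work request: 43716.93 seconds; 0.00 devices
12-Apr-2012 00:41:53 [World Community Grid] Scheduler request completed: got 0 new tasks
12-Apr-2012 01:07:20 [World Community Grid] [sched_op] CPU work request: 45365.53 seconds; 0.00 devices
12-Apr-2012 01:07:24 [World Community Grid] Scheduler request completed: got 1 new tasks
12-Apr-2012 01:07:24 [World Community Grid] [sched_op] estimated total CPU task duration: 9929 seconds
12-Apr-2012 01:40:06 [World Community Grid] [sched_op] CPU work request: 37451.11 seconds; 0.00 devices
12-Apr-2012 01:40:09 [World Community Grid] Scheduler request completed: got 2 new tasks
12-Apr-2012 01:40:09 [World Community Grid] [sched_op] estimated total CPU task duration: 19858 seconds
"""

# Patterns inferred from the quoted lines only (an assumption).
REQUEST = re.compile(r"CPU work request: ([\d.]+) seconds")
GRANTED = re.compile(r"got (\d+) new tasks")
ESTIMATE = re.compile(r"estimated total CPU task duration: (\d+) seconds")


def summarize(log: str) -> dict:
    """Count scheduler requests, tasks received, and estimated seconds received."""
    return {
        "requests": len(REQUEST.findall(log)),
        "tasks_received": sum(int(n) for n in GRANTED.findall(log)),
        "seconds_received": sum(int(s) for s in ESTIMATE.findall(log)),
    }


print(summarize(LOG))
```

Each request asks for roughly 12 hours of work (about 43,000 seconds), while the four requests together returned only three tasks.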
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by marvey11 at Apr 12, 2012 12:04:39 AM]
[Apr 12, 2012 12:00:32 AM]
cjslman
Re: A couple of HCC WUs errored out

Well... I haven't seen any scientists posting any known problems with HCC, and I haven't seen people running around in the street in a panic, so... I'll assume that it's either my machine (which I doubt, given the timing of the failures) or that a few misbehaving WUs escaped the HCC lab to wreak havoc and mayhem on my machine.

I've switched back to HCC and have some WUs ready to run (they will probably start tomorrow morning or midday). Let's see how they behave...

CJSL
[Apr 12, 2012 11:20:08 PM]
LAZA74
Advanced Cruncher
Germany
Joined: Sep 28, 2008
Post Count: 56
Re: A couple of HCC WUs errored out

I sometimes get this problem; maybe it was the cause of the earlier errors:

Mi 09 Mai 2012 08:21:12 CEST | World Community Grid | Finished download of hcc1_image02_6.40.tga
Mi 09 Mai 2012 08:21:12 CEST | World Community Grid | [error] File hcc1_image02_6.40.tga has wrong size: expected 5500, got 32812
Mi 09 Mai 2012 08:21:12 CEST | World Community Grid | [error] Checksum or signature error for hcc1_image02_6.40.tga

Is anything known about checksum errors?
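
The two error lines above report a size mismatch and a checksum failure for the same file. A downloaded file can be checked by hand in the same spirit with a few lines of Python. This is only an illustrative sketch: `check_download` is a hypothetical helper, the use of MD5 is an assumption, and the client's actual verification (including its signature check) is not shown in this thread.

```python
import hashlib
import tempfile
from pathlib import Path


def check_download(path, expected_size, expected_md5=None):
    """Return a list of problems found with a downloaded file (empty if none)."""
    data = Path(path).read_bytes()
    problems = []
    if len(data) != expected_size:
        problems.append(f"wrong size: expected {expected_size}, got {len(data)}")
    if expected_md5 is not None:
        if hashlib.md5(data).hexdigest() != expected_md5.lower():
            problems.append("checksum mismatch")
    return problems


# Demo with a stand-in file; the sizes are taken from the log lines above,
# the file contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.tga"
    payload = b"\x00" * 32812
    sample.write_bytes(payload)
    good_md5 = hashlib.md5(payload).hexdigest()
    ok_result = check_download(sample, 32812, good_md5)
    bad_result = check_download(sample, 5500, good_md5)

print(ok_result)   # []
print(bad_result)  # ['wrong size: expected 5500, got 32812']
```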
----------------------------------------
NAS - Eigenbau
Xiaomi Mi 10T
[May 9, 2012 6:31:05 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: A couple of HCC WUs errored out

A checksum error could be an indication that the cloud-distributed part needs a refresh. Only the fixed files, such as the .tga shown in your messages, come from the cloud, by the way.

Some have copied these fixed files from other hosts running the same science and put them on the problem machine, which works fine, but not everyone can do that (or feels safe doing it).

--//--

edit: cjslman's problem is covered in the Start Here FAQs. The 1073... and heartbeat errors are device problems.
----------------------------------------
[Edit 2 times, last edit by Former Member at May 9, 2012 7:06:07 AM]
[May 9, 2012 7:03:39 AM]
cjslman
Re: A couple of HCC WUs errored out

Thanks for the suggestions... After analyzing the frequency of the errors, which was increasing (and having gotten a few errors in the past from other projects on the same machine), I have decided to pull the computer from the crunching effort. It's an old, slow desktop computer that is only needed as a print server. Hopefully it can be replaced with a new multicore one in the future.

CJSL
----------------------------------------
[Edit 1 times, last edit by cjslman at May 9, 2012 12:27:50 PM]
[May 9, 2012 12:27:01 PM]
dskagcommunity
Senior Cruncher
Austria
Joined: May 10, 2011
Post Count: 219
Re: A couple of HCC WUs errored out

I would try memtest; perhaps only the memory has become defective. If you have a spare module (or two or more modules in the computer, not all of which are needed, so you can remove one), it's not a big deal.
----------------------------------------
http://www.research.dskag.at
Crunching for my Dog who had "good" Braincancer.


----------------------------------------
[Edit 1 times, last edit by dskagcommunity at May 9, 2012 3:20:31 PM]
[May 9, 2012 3:19:43 PM]
LAZA74
Re: A couple of HCC WUs errored out

Quoting dskagcommunity: "Would try memtest, perhaps only the memory got defect. When you have a spare part (or two or more in the computer and not all are needed and you can remove one) its not a big thing."


If it were defective RAM, I would get the errors on all WUs and not only with HCC!?!
I'm crunching on all projects from WCG, plus Spinhenge, Leiden Classics, QMC, and eOn2, and have had no problems there.

So my suggestion was that some of the WUs cause this problem (or maybe a sector or two on the HDD are bad and coincidentally were used by HCC?).

I have now also gotten another error message for the last 4 WUs:

Result Name: X0960062360913200512150926_ 1--
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>hcc1_image02_6.40.tga</file_name>
<error_code>-200</error_code>
</file_xfer_error>

</message>
]]>
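
When several results fail like this, the file name and error code can be pulled out of the `<file_xfer_error>` block with a small script instead of reading each dump by eye. A minimal sketch, assuming the block is copied out on its own as in the fragment below (the surrounding message text is not well-formed XML, so it is left out); `parse_xfer_error` is just an illustrative name:

```python
import xml.etree.ElementTree as ET

# The <file_xfer_error> block from the result above, copied on its own.
FRAGMENT = """<file_xfer_error>
<file_name>hcc1_image02_6.40.tga</file_name>
<error_code>-200</error_code>
</file_xfer_error>"""


def parse_xfer_error(fragment: str):
    """Return (file_name, error_code) from a <file_xfer_error> fragment."""
    node = ET.fromstring(fragment)
    return node.findtext("file_name"), int(node.findtext("error_code"))


name, code = parse_xfer_error(FRAGMENT)
print(name, code)  # hcc1_image02_6.40.tga -200
```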

In any case, I had to reinstall this machine (an upgrade to Xubuntu Precise) and set up a different partition layout (because of other problems, see:
https://secure.worldcommunitygrid.org/forums/...ead,32661_offset,0#377163), and I will cry for help if the problems continue...

Thanks to all for your help!
[May 14, 2012 2:21:31 PM]