Thread Status: Active
Total posts in this thread: 11
This topic has been viewed 1691 times and has 10 replies
davidhobbs
Senior Cruncher
England
Joined: Dec 30, 2004
Post Count: 152
Status: Offline
Regular errors on one device

I have taken the plunge and switched all my devices from UD to BOINC so I am just beginning to climb up a new learning curve! They all seem quite happy apart from one particular device, a 1GHz 256MB XP-Pro machine. It has now returned six results, three of which are valid and three in error. The error messages include the lines "Failed to get version info size 1812" and "Unhandled exception detected".

How should I go about understanding what has happened? This device performed quite happily for many months running the UD agent.

Thanks,
David.
[Sep 16, 2007 10:33:23 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Regular errors on one device

David --

Please clip the messages from BOINC Manager and post them here.

Thanks,
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 16, 2007 10:40:13 AM]
[Sep 16, 2007 10:39:47 AM]
davidhobbs
Senior Cruncher
England
Joined: Dec 30, 2004
Post Count: 152
Status: Offline
Re: Regular errors on one device

Hello Dave Bell,

This is what BOINC manager currently shows:

15/09/2007 22:03:16||Starting BOINC client version 5.8.15 for windows_intelx86
15/09/2007 22:03:16||log flags: task, file_xfer, sched_ops
15/09/2007 22:03:16||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
15/09/2007 22:03:16||Data directory: C:\Program Files\BOINC
15/09/2007 22:03:16||Processor: 1 AuthenticAMD AMD Duron(tm) Processor [x86 Family 6 Model 7 Stepping 1] [fpu tsc sse 3dnow mmx]
15/09/2007 22:03:16||Memory: 255.48 MB physical, 617.50 MB virtual
15/09/2007 22:03:16||Disk: 18.64 GB total, 15.09 GB free
15/09/2007 22:03:16|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 283134; location: (none); project prefs: default
15/09/2007 22:03:16||General prefs: from World Community Grid (last modified 2007-08-25 16:55:58)
15/09/2007 22:03:16||Host location: none
15/09/2007 22:03:16||General prefs: using your defaults
15/09/2007 22:03:18|World Community Grid|Restarting task lh249_00139_8 using hpf2 version 518
16/09/2007 06:03:15|World Community Grid|Sending scheduler request: To fetch work
16/09/2007 06:03:15|World Community Grid|Requesting 81 seconds of new work
16/09/2007 06:03:20|World Community Grid|Scheduler RPC succeeded [server version 509]
16/09/2007 06:03:20|World Community Grid|Deferring communication for 5 min 3 sec
16/09/2007 06:03:20|World Community Grid|Reason: requested by project
16/09/2007 06:03:23|World Community Grid|[file_xfer] Started download of file lh280-289_lh288.fasta.gz
16/09/2007 06:03:23|World Community Grid|[file_xfer] Started download of file lh280-289_lh288.psipred.gz
16/09/2007 06:03:24|World Community Grid|[file_xfer] Finished download of file lh280-289_lh288.fasta.gz
16/09/2007 06:03:24|World Community Grid|[file_xfer] Throughput 218 bytes/sec
16/09/2007 06:03:24|World Community Grid|[file_xfer] Finished download of file lh280-289_lh288.psipred.gz
16/09/2007 06:03:24|World Community Grid|[file_xfer] Throughput 1844 bytes/sec
16/09/2007 06:03:24|World Community Grid|[file_xfer] Started download of file lh280-289_lh288.psipred_ss2.gz
16/09/2007 06:03:24|World Community Grid|[file_xfer] Started download of file lh280-289_aalh28803_05.075_v1_3.gz
16/09/2007 06:03:25|World Community Grid|[file_xfer] Finished download of file lh280-289_lh288.psipred_ss2.gz
16/09/2007 06:03:25|World Community Grid|[file_xfer] Throughput 7772 bytes/sec
16/09/2007 06:03:25|World Community Grid|[file_xfer] Started download of file lh280-289_aalh28809_05.075_v1_3.gz
16/09/2007 06:03:31|World Community Grid|[file_xfer] Finished download of file lh280-289_aalh28803_05.075_v1_3.gz
16/09/2007 06:03:31|World Community Grid|[file_xfer] Throughput 75508 bytes/sec
16/09/2007 06:03:36|World Community Grid|[file_xfer] Finished download of file lh280-289_aalh28809_05.075_v1_3.gz
16/09/2007 06:03:36|World Community Grid|[file_xfer] Throughput 96664 bytes/sec
16/09/2007 10:48:42|World Community Grid|Task lh249_00139_8 exited with zero status but no 'finished' file
16/09/2007 10:48:42|World Community Grid|If this happens repeatedly you may need to reset the project.
16/09/2007 10:48:49|World Community Grid|Restarting task lh249_00139_8 using hpf2 version 518


... but the results in question were returned before the dates shown here.

The error messages I referred to were the ones shown in My Grid, Results Status. I'm not clear if these would have been repeated in BOINC manager on the actual device?

This device runs overnight and would have gone into hibernation at 06:30 today. I restarted it at about 10:48 to get the data you asked for.

Sorry for the confusion, but I find this BOINC stuff utterly confusing at the moment. When I was young I used to find change stimulating and refreshing... now it's just something else to be endured!

David.
[Sep 16, 2007 11:09:14 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Regular errors on one device

David --

I believe the "exited with zero status but no 'finished' file" message is one you will often see when a machine comes out of hibernation. I believe the "UNHANDLED EXCEPTION" message should be followed by a statement describing the exception that was not handled. I am not sure, but the "Failed to get version info size" message sounds like it may relate to a communications problem. Perhaps another of the CAs or another member can shed more light on these.
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 16, 2007 12:09:54 PM]
[Sep 16, 2007 11:59:35 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Regular errors on one device

The FAQ section maintained by the CAs on WCG, called 'Start Here', has a VFAQ item on '1812' (which always reminds me of a Napoleonic event): http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=15646

Shortly I'll be posting an FAQ listing the various existing error logs, which are all found in the BOINC program directory, the key one being stdoutdae.txt. On slower crunchers it holds months of entries. stderrdae.txt can also hold the interesting stuff to post, so we can have a communal look without having to second-guess.

From looking at the log above, I hope your virtual memory is set to auto-expand. If it is too tight, that could, in combination with other processes, lead to issues. Otherwise, with 3 valid and 3 error results, I would recommend a memtest86 run and verifying that you have all the latest video drivers.

The [zero status] error is, as Dave indicates, a classic hibernation notification. I think you wrote the manual. Timing is important: I switched off automatic system time syncing, which got rid of most of these messages. You'll mostly see them at restart... just ignore them unless you see many appearing in a short time frame.
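
For anyone who wants to check the "many in a short time frame" condition rather than eyeball it, here is a minimal Python sketch. It assumes the messages have been pasted from BOINC Manager in the `DD/MM/YYYY HH:MM:SS|project|message` layout shown earlier in this thread; the function names and the one-hour window are made up for illustration, and the log files on disk may use a different layout.

```python
from datetime import datetime, timedelta

# Message text as it appears in the log pasted earlier in this thread.
MARKER = "exited with zero status but no 'finished' file"

def parse_line(line):
    """Split 'DD/MM/YYYY HH:MM:SS|project|message' into (datetime, project, message).

    Returns None for lines that do not match this (assumed) layout.
    """
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    try:
        stamp = datetime.strptime(parts[0].strip(), "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None
    return stamp, parts[1], parts[2]

def max_in_window(lines, window=timedelta(hours=1)):
    """Largest number of MARKER messages falling inside any span of `window`."""
    times = sorted(p[0] for p in map(parse_line, lines) if p and MARKER in p[2])
    best = start = 0
    for end, t in enumerate(times):
        # Slide the left edge forward until the span fits in the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best

# Three lines copied from the log posted above.
sample = [
    "16/09/2007 10:48:42|World Community Grid|Task lh249_00139_8 exited with zero status but no 'finished' file",
    "16/09/2007 10:48:42|World Community Grid|If this happens repeatedly you may need to reset the project.",
    "16/09/2007 10:48:49|World Community Grid|Restarting task lh249_00139_8 using hpf2 version 518",
]
print(max_in_window(sample))
```

A single occurrence per restart, as in the sample, matches the harmless hibernation pattern; what counts as "many" is a judgment call, so the window is left as a parameter.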

The unhandled exception I've seen in every DDD-T log. I'm not sure, but maybe it's looking for a checkpoint when, at start, there is no checkpoint yet.
Added: The link of course :O
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 2 times, last edit by Sekerob at Sep 16, 2007 1:28:38 PM]
[Sep 16, 2007 1:18:27 PM]
davidhobbs
Senior Cruncher
England
Joined: Dec 30, 2004
Post Count: 152
Status: Offline
Re: Regular errors on one device

Thanks Sek,

I can confirm my virtual memory is set to system-determined size, and I believe I have all the latest drivers installed. I will perform a memory check as you suggest, but this device has been running the UD agent happily at 100% throttle.

David.
[Sep 16, 2007 5:26:58 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Regular errors on one device

Throttle, yes: are you running the 'Minimum Impact' or the 'Maximum Output' profile? The log suggests you've not attached to any device profile, which suggests the default, which in turn suggests you're running at 60%. That has been a source of problems for some.
[Sep 16, 2007 5:34:35 PM]
davidhobbs
Senior Cruncher
England
Joined: Dec 30, 2004
Post Count: 152
Status: Offline
Re: Regular errors on one device

I'm running a custom profile, where I have set all devices to run all the time at 100% CPU and 100% memory usage.

David.
[Sep 18, 2007 10:53:17 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Regular errors on one device

David,

I have an older device that was also getting a lot of invalids. I noticed that FAAH was the only project on which the device was not getting invalids. Since I changed that device's profile to crunch only FAAH, I have not had a single invalid. I think the HPF2 and DDDT projects do not like older systems. Look at your valids and invalids and see if a pattern similar to mine emerges. It may be as simple as just changing your profile.
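
If there are more than a handful of results, the pattern check can be done with a quick tally. This is only a sketch: it assumes you copy (project, outcome) pairs by hand from the Results Status page, the rows below are placeholders and not real results, and the function names are made up for illustration.

```python
from collections import Counter

def tally(results):
    """Count outcomes per project from (project, outcome) pairs."""
    counts = {}
    for project, outcome in results:
        counts.setdefault(project, Counter())[outcome] += 1
    return counts

def error_rate(counter):
    """Fraction of results for one project that were not 'valid'."""
    total = sum(counter.values())
    return (total - counter["valid"]) / total if total else 0.0

# Placeholder rows, NOT real results: replace with your own entries.
results = [
    ("HPF2", "valid"), ("HPF2", "error"),
    ("FAAH", "valid"), ("FAAH", "valid"),
]

for project, counter in sorted(tally(results).items()):
    print(project, dict(counter), f"{error_rate(counter):.0%}")
```

With only six results on the device so far, any difference between projects is weak evidence at best, so treat the percentages as a hint of where to look rather than a conclusion.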

Dan
[Sep 18, 2007 1:17:25 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Regular errors on one device

Older systems may have out-of-date drivers and libraries. It's a bit contradictory, as FA@H and DDDT use the identical underlying science engine (AutoDock 4.0 / 2007 Update). Their memory footprints are also close to each other.

Any pertinent correlation would be most helpful. E.g. did the work-unit flunking occur for a WU where the graphics were viewed, even briefly?
[Sep 18, 2007 1:59:21 PM]