rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Multiple WU Error Outs

I have one small family laptop that I'm running HPF2 WUs on. Generally these WUs run fine on it. I do occasionally see one WU error out, and from reading this forum I know this can happen for this project.

The issue I've had twice recently is that this one laptop will all of a sudden error out a series of six WUs in a row (the most recent run of errors covered many WU series). Then it will find a WU that runs fine to completion, and things seem to settle down.

Is it common for a series of errors to occasionally pop up like this? Or is this an indication that something else is blowing things out of the water in quick succession and needs further investigation?

Just asking to see if I need to do anything or hang back and go with the flow and just keep crunching.....
[Feb 27, 2013 2:50:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple WU Error Outs

What sciences, if any, is this running in combination with? [As I've posted several times, it happened for me in the more distant past when AutoDock tasks were already running.] Without those, things would be just fine. Also, though the client should not affect the processing, what exact client version and OS is this on? The HPF2 failures seem to be exclusive to Windows... I've never been able to replicate this on Linux, where I've had a 100% success rate, and I can't remember any reports from OSX, also a *nix-based OS.

edit: Added the Linux bit.
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 27, 2013 2:58:33 PM]
[Feb 27, 2013 2:56:40 PM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: Multiple WU Error Outs

Right now this laptop is running on Win 7, i3 laptop. It isn't running with any other projects at the moment - I'm running both of my laptops only on HPF2 WUs as I'm trying to go for an Emerald badge for my little 2 laptop team rbotterb sometime in 2Q2013.

By the way, I checked today, and the one laptop that errored out 6 times in a row for a second time earlier this week just ran another HPF2 WU to completion, and as of this morning it is marked valid. Go figure.

My family uses (abuses?) this small laptop every day to do homework, play games, and look at e-mail - they even argue about who gets this laptop next every day (my family headache right now). Is it possible that other applications don't play nice with these HPF2 WUs running in the background? Maybe a heavy load from some video game or something else that stresses a small laptop would cause it to occasionally error out a series of HPF2 WUs in quick succession?

Again, just wondering. As long as this laptop runs most of its HPF2 WUs to completion, erroring out on some isn't the end of the world, and the HPF2 scientists have obviously made allowances for this in their setup by having 19 crunchers crunching by committee for each WU.
[Feb 28, 2013 1:33:45 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple WU Error Outs

Though it's "crunching by committee", each task is actually unique in its offset starting point for the fold simulation, so ideally all 19 go valid. Each reported error triggers sending a new copy, but there's a limit to that once the valid results have reached a critical number. I vaguely remember that number being 17 valids, with the validation process starting once 15 results have been returned, to see how well they jibe with each other.
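
The resend rule described above could be sketched roughly as follows. The thresholds (19 copies, validation from 15 returned results, no replacements once 17 are valid) are only the vaguely remembered figures from this post, not confirmed project settings, and the function names are hypothetical.

```python
# Hypothetical sketch of the resend rule; the thresholds are the recalled
# figures from this post, not confirmed World Community Grid settings.
TOTAL_COPIES = 19        # tasks issued per work unit
VALIDATION_START = 15    # returned results needed before validation begins
ENOUGH_VALID = 17        # once this many are valid, errors are not replaced


def validation_can_start(returned: int) -> bool:
    """True once enough results are back to compare them with each other."""
    return returned >= VALIDATION_START


def should_resend(valid: int) -> bool:
    """True if an errored task should still be replaced with a new copy."""
    return valid < ENOUGH_VALID
```

Under these assumed numbers, an error reported while only 10 results are valid would get a replacement copy, while one reported after 17 valids would not.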

I wonder if it's an outright "too busy" thing too. Defrag the disk and, if possible, create a separate BOINC partition, which also helps reduce fragmentation [I do this on all my devices]. One more thing I've done is defrag the disk after disabling the swapfile, then set a fixed-size swapfile, or put the swapfile in its own partition [something Linux installs typically set up by default]. That way the swap area stays contiguous as swap use expands and shrinks, which speeds up reading and writing.

But in 99.9% of cases the HPF2 failures occur within a few seconds of starting, so it's only costing a little bandwidth in the end. If you can live with that, just leave it be.
[Feb 28, 2013 1:56:34 PM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: Multiple WU Error Outs

SekeRob - thanks for the insight. The last two WUs on the small laptop ran to completion OK, so I guess for now I'll just hang loose and not worry about it.

Like you note, at least when it blows out a series like this, each one ends very fast, so the total loss is a minute or two at most - no big deal.
[Mar 2, 2013 6:23:40 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1403
Status: Offline
Re: Multiple WU Error Outs

16 errors in a row on my 8-threaded i7 with Windows 7. All the same error:

Result Log
Result Name: qo245_ 00011_ 9--

<core_client_version>7.0.42</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\nblist.cc line:711

</stderr_txt>
]]>


4 cores were running HPF2 and the other cores FAAH.

I requested 4 new ones. One errored out with the same error; the other 3 are still running.
[Mar 11, 2013 4:47:00 PM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: Multiple WU Error Outs

I had a number of HPF2 WUs error out yesterday on my 4-core laptop. I still got 7 WUs done by EOD, so the fact that 6-8 WUs errored out is just life. At the time, I was using my laptop to connect remotely to another laptop at work. That work laptop hit a wall on memory and hung up. While I was getting that environment straightened out, I suspect the HPF2 WUs got tripped up at the same time, since about six errored out in quick succession.

To handle these occasional HPF2 WU 'error hiccups', since my 4-core laptop is crunching offline most of the time, I've increased my backlog to 4 days. That way, if a series of HPF2 WUs blows out, I still have plenty of WUs in the backlog to keep crunching offline without running dry. It seems to be working well for me, and I'm probably 30-45 days away from earning a 1-year badge for my little team - not too bad for a team of only two small laptops with 5 cores between them. ;)
[Mar 12, 2013 1:39:19 PM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline
Re: Multiple WU Error Outs

The following WU "Computation error" got my attention when the WU was highlighted in magenta (BoincTasks 1.45):
World Community Grid 6.40 hpf2 qo210_00083_16 3/10/2013 9:02:34 PM 00:00:35 (00:00:34) - 3/20/2013 9:02:35 PM 100.000 Computation error Coltrane 98.73
Result Log
Result Name: qo210_ 00083_ 16--
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\nblist.cc line:711
</stderr_txt>
]]>
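
For anyone checking saved result logs for this signature, a minimal sketch follows. It assumes the log text has been copied into a string; the function name `find_exit_site` and the pattern are illustrative and not part of BOINC or the HPF2 application.

```python
import re

# Matches lines such as "ERROR:: Exit at: .\nblist.cc line:711".
# The pattern is an assumption based on the two logs quoted in this thread.
EXIT_RE = re.compile(r"ERROR:: Exit at: (?P<file>\S+) line:(?P<line>\d+)")


def find_exit_site(result_log: str):
    """Return (source_file, line_number) from a result log, or None."""
    match = EXIT_RE.search(result_log)
    if match is None:
        return None
    return match.group("file"), int(match.group("line"))


# Raw string so the backslash in the path is kept literally.
sample = r"""<stderr_txt>
ERROR:: Exit at: .\nblist.cc line:711
</stderr_txt>"""
print(find_exit_site(sample))
```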

There is another half page or so of WUs, but their details are unavailable at this time due to the stats update. I'm posting now and can verify the other WUs' result logs later (if needed?).

qo210_ 00084_ 16-- Coltrane Error 3/11/13 04:02:35 3/13/13 22:49:22 0.01 / 0.01 0.2 / 0.0
qo210_ 00077_ 6-- Coltrane Error 3/11/13 04:02:35 3/13/13 21:22:55 0.01 / 0.01 0.2 / 0.0
qo210_ 00083_ 16-- Coltrane Error 3/11/13 04:02:35 3/14/13 00:29:22 0.01 / 0.01 0.2 / 0.0
qo210_ 00074_ 2-- Coltrane Error 3/11/13 04:02:35 3/13/13 21:22:55 0.01 / 0.01 0.3 / 0.0
qn905_ 00019_ 3-- Coltrane Error 3/6/13 02:56:57 3/9/13 11:19:57 0.01 / 0.01 0.3 / 0.0
qn905_ 00020_ 3-- Coltrane Error 3/6/13 02:56:57 3/9/13 11:19:57 0.02 / 0.02 0.4 / 0.0
qn905_ 00021_ 11-- Coltrane Error 3/6/13 02:56:57 3/9/13 11:19:57 0.02 / 0.02 0.4 / 0.0
qn905_ 00018_ 14-- Coltrane Error 3/6/13 02:56:39 3/9/13 11:19:57 0.01 / 0.01 0.3 / 0.0
qn905_ 00017_ 1-- Coltrane Error 3/6/13 02:56:22 3/9/13 11:19:57 0.01 / 0.02 0.3 / 0.0
qn905_ 00016_ 0-- Coltrane Error 3/6/13 02:56:05 3/9/13 11:19:57 0.02 / 0.02 0.4 / 0.0
qn905_ 00015_ 12-- Coltrane Error 3/6/13 02:55:48 3/9/13 11:19:57 0.02 / 0.02 0.4 / 0.0
qn905_ 00014_ 11-- Coltrane Error 3/6/13 02:55:31 3/9/13 11:19:57 0.01 / 0.01 0.3 / 0.0
qn905_ 00013_ 11-- Coltrane Error 3/6/13 02:55:30 3/9/13 11:19:57 0.01 / 0.01 0.3 / 0.0
qn905_ 00012_ 17-- Coltrane Error 3/6/13 02:55:13 3/9/13 11:19:57 0.02 / 0.02 0.4 / 0.0
qn905_ 00011_ 11-- Coltrane Error 3/6/13 02:55:13 3/9/13 11:19:57 0.02 / 0.02 0.4 / 0.0
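
A listing like the one above can be tallied by batch with a short script. This is only a sketch: it assumes each row is whitespace-separated with the result name split into three pieces, followed by a one-word device name and then the status, as in the rows pasted here. The function name `errors_per_batch` is hypothetical.

```python
from collections import Counter


def errors_per_batch(listing: str) -> Counter:
    """Count rows marked Error, keyed by batch prefix (e.g. 'qo210')."""
    counts = Counter()
    for row in listing.splitlines():
        fields = row.split()
        # Assumed layout: three name pieces, device name, status, ...
        if len(fields) >= 5 and fields[4] == "Error":
            batch = fields[0].rstrip("_")
            counts[batch] += 1
    return counts


sample = """\
qo210_ 00084_ 16-- Coltrane Error 3/11/13 04:02:35 3/13/13 22:49:22 0.01 / 0.01 0.2 / 0.0
qo210_ 00077_ 6-- Coltrane Error 3/11/13 04:02:35 3/13/13 21:22:55 0.01 / 0.01 0.2 / 0.0
qn905_ 00019_ 3-- Coltrane Error 3/6/13 02:56:57 3/9/13 11:19:57 0.01 / 0.01 0.3 / 0.0
"""
print(errors_per_batch(sample))
```

Run over the fifteen rows pasted above, the tally would be 4 for qo210 and 11 for qn905.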
[Mar 14, 2013 1:48:50 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple WU Error Outs with ERROR:: Exit at: .\nblist.cc line:711

Don't know what you guys want to hear but when a log of HPF2 includes

ERROR:: Exit at: .\nblist.cc line:711

it cannot be fixed. Too small a percentage of tasks fail this way, and only randomly on some devices; it's not reproducible in the labs, so the programmers can't allocate targeted time to debug it. As noted, in the past I had these occur more often when an AutoDock-based science was already running... FAAH for sure; HFCC I cannot remember. I just don't mix FAAH & HPF2 in one profile.

Sorry.
[Mar 14, 2013 7:46:36 AM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline
Re: Multiple WU Error Outs with ERROR:: Exit at: .\nblist.cc line:711

Not a problem for me. Just seemed odd that the 26 WU errors occurred only on one machine out of 5.

PS. What is the search argument that I should have used? I tried before I posted without success.

Thanks
[Mar 15, 2013 5:41:18 PM]