Thread Status: Active
Total posts in this thread: 24
This topic has been viewed 2586 times and has 23 replies.
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline
Re: Computation error on HPF2 job

:'(
Result Name | Device | Status | Sent Time | Return Time | CPU Time (h) | Claimed / Granted Credit
X0000051081451200506301656_0-- | 674419 | Pending Validation | 09/04/2008 06:26:32 | 09/04/2008 13:57:04 | 7.22 | 104.5 / 0.0
lx345_00002_15-- | 674419 | Error | 09/04/2008 13:54:06 | 09/04/2008 13:57:04 | 0.01 | 0.2 / 0.0
lx307_00085_15-- | 674419 | Error | 09/04/2008 13:51:49 | 09/04/2008 13:54:06 | 0.01 | 0.2 / 0.0
lx344_00011_13-- | 674419 | Error | 09/04/2008 13:46:06 | 09/04/2008 13:49:27 | 0.02 | 0.2 / 0.0
lx343_00002_1-- | 674419 | Error | 09/04/2008 13:10:18 | 09/04/2008 13:49:27 | 0.45 | 6.5 / 0.0
lx327_00019_17-- | 674419 | Error | 09/04/2008 06:14:08 | 09/04/2008 13:49:27 | 7.51 | 108.6 / 0.0
lx292_00058_5-- | 674419 | Error | 09/04/2008 07:09:17 | 09/04/2008 13:49:27 | 6.37 | 92.2 / 0.0
X0000051081229200506301700_1-- | 674419 | Pending Validation | 09/04/2008 06:13:00 | 09/04/2008 13:46:05 | 7.07 | 102.3 / 0.0

One error after 100% computing time, one after about 90% computing time and four errors after a very short runtime.
All HPF2 jobs failed with: Output file xxxxxx absent. The two HCC jobs were successful.
I will detach from HPF2 for a while and let the other WCG projects run without changing anything.

From the messages (filtered out the downloads):

08:03:13 Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 2 completed tasks
08:03:18 Scheduler request succeeded: got 0 new tasks
08:07:12 Resetting project
08:12:58 Sending scheduler request: To fetch work. Requesting 1107 seconds of work, reporting 0 completed tasks
08:13:03 Scheduler request succeeded: got 1 new tasks
08:13:15 Starting X0000051081229200506301700_1
08:13:15 Starting task X0000051081229200506301700_1 using hcc1 version 606
08:14:06 Sending scheduler request: To fetch work. Requesting 312 seconds of work, reporting 0 completed tasks
08:14:11 Scheduler request succeeded: got 1 new tasks
08:15:21 Starting lx327_00019_17
08:15:22 Starting task lx327_00019_17 using hpf2 version 518
08:26:30 Sending scheduler request: To fetch work. Requesting 70 seconds of work, reporting 0 completed tasks
08:26:35 Scheduler request succeeded: got 1 new tasks
08:39:08 Starting X0000051081451200506301656_0
08:39:08 Starting task X0000051081451200506301656_0 using hcc1 version 606
09:09:15 Sending scheduler request: To fetch work. Requesting 3 seconds of work, reporting 0 completed tasks
09:09:20 Scheduler request succeeded: got 1 new tasks
09:22:21 Starting lx292_00058_5
09:22:21 Starting task lx292_00058_5 using hpf2 version 518
15:10:17 Sending scheduler request: To fetch work. Requesting 19 seconds of work, reporting 0 completed tasks
15:10:22 Scheduler request succeeded: got 1 new tasks
15:20:02 Computation for task X0000051081229200506301700_1 finished
15:20:02 Starting lx343_00002_1
15:20:03 Starting task lx343_00002_1 using hpf2 version 518
15:20:05 Started upload of X0000051081229200506301700_1_0
15:20:11 Finished upload of X0000051081229200506301700_1_0
15:46:03 Sending scheduler request: To fetch work. Requesting 56 seconds of work, reporting 1 completed tasks
15:46:08 Scheduler request succeeded: got 1 new tasks
15:47:21 Computation for task lx327_00019_17 finished
15:47:21 Output file lx327_00019_17_0 for task lx327_00019_17 absent
15:47:21 Computation for task lx343_00002_1 finished
15:47:21 Output file lx343_00002_1_0 for task lx343_00002_1 absent
15:47:21 Starting lx344_00011_13
15:47:21 Starting task lx344_00011_13 using hpf2 version 518 ------------------------------ All this in 1 second
15:47:22 Computation for task lx292_00058_5 finished
15:47:22 Output file lx292_00058_5_0 for task lx292_00058_5 absent
15:48:22 Computation for task lx344_00011_13 finished
15:48:22 Output file lx344_00011_13_0 for task lx344_00011_13 absent

15:49:26 Sending scheduler request: To fetch work. Requesting 2839 seconds of work, reporting 4 completed tasks
15:49:31 Scheduler request succeeded: got 1 new tasks
15:49:47 Starting R00105_e53070496bca57c42a71512adfb0c7ce_03_001_1
15:49:47 Starting task R00105_e53070496bca57c42a71512adfb0c7ce_03_001_1 using rice version 617
15:50:36 Sending scheduler request: To fetch work. Requesting 2031 seconds of work, reporting 0 completed tasks
15:50:41 Scheduler request succeeded: got 1 new tasks
15:50:47 Starting X0000051090077200506160847_1
15:50:47 Starting task X0000051090077200506160847_1 using hcc1 version 606
15:51:47 Sending scheduler request: To fetch work. Requesting 1261 seconds of work, reporting 0 completed tasks
15:51:52 Scheduler request succeeded: got 1 new tasks
15:52:06 Starting lx307_00085_15
15:52:06 Starting task lx307_00085_15 using hpf2 version 518
15:52:53 Sending scheduler request: To fetch work. Requesting 511 seconds of work, reporting 0 completed tasks
15:52:58 Scheduler request succeeded: got 1 new tasks
15:52:59 Computation for task lx307_00085_15 finished
15:52:59 Output file lx307_00085_15_0 for task lx307_00085_15 absent
15:53:06 Starting R00105_b3be9bdc15c18bc35c730af66ff31598_03_001_16
15:53:06 Starting task R00105_b3be9bdc15c18bc35c730af66ff31598_03_001_16 using rice version 617
15:54:04 Sending scheduler request: To fetch work. Requesting 817 seconds of work, reporting 1 completed tasks
15:54:09 Scheduler request succeeded: got 1 new tasks
15:55:08 Computation for task X0000051081451200506301656_0 finished
15:55:08 Starting lx345_00002_15
15:55:08 Starting task lx345_00002_15 using hpf2 version 518
15:55:10 Started upload of X0000051081451200506301656_0_0
15:55:17 Finished upload of X0000051081451200506301656_0_0
15:56:01 Computation for task lx345_00002_15 finished
15:56:01 Output file lx345_00002_15_0 for task lx345_00002_15 absent
15:57:02 Sending scheduler request: To fetch work. Requesting 865 seconds of work, reporting 2 completed tasks
15:57:07 Scheduler request succeeded: got 1 new tasks
15:57:27 Starting faah4333_000617_MC_xMut_md02680_01_0
15:57:27 Starting task faah4333_000617_MC_xMut_md02680_01_0 using faah version 606
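
For anyone who wants to pull the failures out of a message log like the one above, here is a minimal Python sketch. It assumes the line format is exactly as shown in this excerpt (`HH:MM:SS Starting task ... using ... version ...` and `HH:MM:SS Output file ... for task ... absent`) and that all timestamps fall on the same day. The function name `failed_tasks` is made up for illustration. The elapsed time is wall-clock time between start and failure, so it can differ from the CPU time on the results page.

```python
import re

# A few lines copied from the message log above.
LOG = """\
08:15:22 Starting task lx327_00019_17 using hpf2 version 518
09:22:21 Starting task lx292_00058_5 using hpf2 version 518
15:20:03 Starting task lx343_00002_1 using hpf2 version 518
15:47:21 Output file lx327_00019_17_0 for task lx327_00019_17 absent
15:47:21 Output file lx343_00002_1_0 for task lx343_00002_1 absent
15:47:22 Output file lx292_00058_5_0 for task lx292_00058_5 absent
"""

START = re.compile(r"^(\d\d):(\d\d):(\d\d) Starting task (\S+) using (\S+) version (\d+)$")
ABSENT = re.compile(r"^(\d\d):(\d\d):(\d\d) Output file \S+ for task (\S+) absent$")


def seconds(h, m, s):
    """Convert an HH, MM, SS triple of strings to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def failed_tasks(log_text):
    """Return {task: (app, elapsed_seconds)} for tasks whose output file was absent."""
    started = {}
    failed = {}
    for line in log_text.splitlines():
        m = START.match(line)
        if m:
            h, mi, s, task, app, _version = m.groups()
            started[task] = (app, seconds(h, mi, s))
            continue
        m = ABSENT.match(line)
        if m:
            h, mi, s, task = m.groups()
            if task in started:  # ignore failures whose start is not in the excerpt
                app, t0 = started[task]
                failed[task] = (app, seconds(h, mi, s) - t0)
    return failed


result = failed_tasks(LOG)
for task, (app, elapsed) in result.items():
    print(f"{task} ({app}) failed after {elapsed / 3600:.2f} h wall-clock")
```

On these sample lines the three HPF2 tasks come out at roughly 7.5 h, 0.5 h and 6.4 h of wall-clock time, which is in line with the CPU times listed for them on the results page.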
----------------------------------------
[Edit 3 times, last edit by Crystal Pellet at Sep 4, 2008 5:19:04 PM]
[Sep 4, 2008 4:20:28 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline
Re: Computation error on HPF2 job

Quote: "I will detach from HPF2 for a while and let the other WCG projects run without changing anything."

That "while" turned into a fortnight, which made for a long testing period.
The only change was updating BOINC from version 6.2.16 to 6.2.18 on the 5th of September. In that period I crunched 18 FAAH, 4 DDDT, 18 HCC and 17 NRftW tasks.

All with a valid result.

And hundreds of valid ABC results (sorry Sek, I'm still sponsoring this tiny :) project until next spring, when it should finish).
So I'm still wondering what went wrong with the HPF2 jobs, but I will try one again.
[Sep 18, 2008 11:25:43 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Computation error on HPF2 job

I reported this in the beta section as a follow-up on the 6.03 test version. Previously, 5.18 was doing great on my Quad/Vista HP, including under the BOINC 6 test builds and 6.18, but HPF2 6.03 is not doing well on that platform now:

Result Name | Device | Status | Sent Time | Return Time | CPU Time (h) | Claimed / Granted Credit
ly122_00010_11-- | 628290 | Error | 09/17/2008 20:16:46 | 09/18/2008 07:29:19 | 0.02 | 0.3 / 0.0
ly112_00082_0-- | 628290 | Error | 09/17/2008 17:15:56 | 09/18/2008 07:27:41 | 0.02 | 0.4 / 0.0
ly110_00029_12-- | 628290 | Error | 09/17/2008 16:03:44 | 09/18/2008 02:57:52 | 0.02 | 0.3 / 0.0
ly100_00064_16-- | 628290 | Error | 09/17/2008 12:33:37 | 09/18/2008 02:56:21 | 0.02 | 0.3 / 0.0
ly084_00037_11-- | 628290 | Error | 09/17/2008 08:05:24 | 09/17/2008 08:11:40 | 0.02 | 0.3 / 0.0

All the same error:
Result Log

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
Funzione non corretta. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\dock_structure.cc line:401
</stderr_txt>
]]>

All fail within minutes (at the first intermediate progress file write), which reminds me of a RICE problem that did not recur once a fix had been applied.
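
To compare many result logs quickly, the exit code and the source location can be pulled out with a few regular expressions. This is only a sketch: it assumes every result log has the same layout as the one quoted in this post, and the function name `summarize` is hypothetical. For reference, "Funzione non corretta." is the Italian Windows text for error 0x1, "Incorrect function."

```python
import re

# Result log as quoted in this post (raw string so the backslash is kept as-is).
RESULT_LOG = r"""<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
Funzione non corretta. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\dock_structure.cc line:401
</stderr_txt>
]]>"""


def summarize(log):
    """Extract client version, exit code and 'Exit at' location from a result log."""
    version = re.search(r"<core_client_version>([^<]+)</core_client_version>", log)
    exit_code = re.search(r"exit code (-?\d+)", log)
    where = re.search(r"Exit at: (\S+) line:(\d+)", log)
    return {
        "client": version.group(1) if version else None,
        "exit_code": int(exit_code.group(1)) if exit_code else None,
        "source": where.group(1) if where else None,
        "line": int(where.group(2)) if where else None,
    }


summary = summarize(RESULT_LOG)
print(summary)
```

Fields that are missing from a log (for example a successful result with only `called boinc_finish` in it) simply come back as `None`.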

Here is a filter of all HPF2 results on the first page (the latest work), which makes it quite obvious for that device:

Result Name | Device | Status | Sent Time | Return Time | CPU Time (h) | Claimed / Granted Credit
ly122_00010_11-- | 628290 | Error | 09/17/2008 20:16:46 | 09/18/2008 07:29:19 | 0.02 | 0.3 / 0.0
ly112_00082_0-- | 628290 | Error | 09/17/2008 17:15:56 | 09/18/2008 07:27:41 | 0.02 | 0.4 / 0.0
ly097_00012_16-- | 628290 | Pending Validation | 09/17/2008 11:34:39 | 09/18/2008 07:24:51 | 4.45 | 68.7 / 0.0
ly110_00029_12-- | 628290 | Error | 09/17/2008 16:03:44 | 09/18/2008 02:57:52 | 0.02 | 0.3 / 0.0
ly100_00064_16-- | 628290 | Error | 09/17/2008 12:33:37 | 09/18/2008 02:56:21 | 0.02 | 0.3 / 0.0
ly084_00001_9-- | 628290 | Pending Validation | 09/17/2008 08:05:24 | 09/17/2008 14:32:07 | 5.14 | 78.0 / 0.0
ly084_00037_11-- | 628290 | Error | 09/17/2008 08:05:24 | 09/17/2008 08:11:40 | 0.02 | 0.3 / 0.0
lx240_00047_11-- | 628290 | Valid | 09/02/2008 12:25:50 | 09/03/2008 16:37:19 | 6.67 | 101.3 / 100.2
lx187_00019_8-- | 628290 | Valid | 09/02/2008 09:25:23 | 09/03/2008 14:42:28 | 5.59 | 84.9 / 79.4
lx244_00056_7-- | 628290 | Valid | 09/02/2008 13:42:36 | 09/03/2008 14:09:32 | 4.66 | 70.9 / 67.8
lx224_00005_5-- | 628290 | Valid | 09/02/2008 05:58:51 | 09/03/2008 06:25:32 | 5.74 | 87.5 / 85.7
lx164_00038_8-- | 628290 | Valid | 09/02/2008 00:54:24 | 09/03/2008 06:21:36 | 5.93 | 90.5 / 85.0
lx221_00009_10-- | 628290 | Valid | 09/02/2008 04:34:30 | 09/03/2008 05:39:22 | 5.41 | 82.5 / 81.8
lx197_00031_8-- | 628290 | Valid | 09/01/2008 19:18:10 | 09/02/2008 18:48:47 | 4.02 | 61.1 / 64.0
lx147_00046_9-- | 628290 | Valid | 09/01/2008 17:30:08 | 09/02/2008 17:20:25 | 5.49 | 83.4 / 85.5

I've deselected Proteome Folding for now since positively nothing changed on the client but the September OS patches. Avast logs are clean and 2 out of 7 somehow did get through on this RICE/HPF2 profile cruncher.
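
The "2 out of 7" figure can be double-checked by tallying the statuses in the list above per batch prefix. A rough Python sketch follows; grouping on the first two letters of the result name (`lx` versus `ly`) is an assumption made for illustration, since the thread does not say what the prefix stands for, and `tally` is a made-up helper name.

```python
from collections import Counter

# (result name, status) pairs copied from the filtered list above.
ROWS = [
    ("ly122_00010_11", "Error"),
    ("ly112_00082_0", "Error"),
    ("ly097_00012_16", "Pending Validation"),
    ("ly110_00029_12", "Error"),
    ("ly100_00064_16", "Error"),
    ("ly084_00001_9", "Pending Validation"),
    ("ly084_00037_11", "Error"),
    ("lx240_00047_11", "Valid"),
    ("lx187_00019_8", "Valid"),
    ("lx244_00056_7", "Valid"),
    ("lx224_00005_5", "Valid"),
    ("lx164_00038_8", "Valid"),
    ("lx221_00009_10", "Valid"),
    ("lx197_00031_8", "Valid"),
    ("lx147_00046_9", "Valid"),
]


def tally(rows):
    """Count statuses per two-letter batch prefix (assumed grouping key)."""
    counts = {}
    for name, status in rows:
        batch = name[:2]
        counts.setdefault(batch, Counter())[status] += 1
    return counts


counts = tally(ROWS)
for batch, statuses in counts.items():
    print(batch, dict(statuses))
```

For the `ly` results this gives 5 errors and 2 pending validation, i.e. 2 of 7 got through, while all 8 `lx` results are valid.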
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Sep 18, 2008 11:39:32 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline
Re: Computation error on HPF2 job

Sekerob, just a reminder: the trouble I had with HPF2 was with version 5.18, also on a quad, and it seemed to occur when 2 or more HPF2 tasks were running on the quad at the same time. I had already been thinking about problems with memory allocation or something like that.
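
One way to test the "2 or more HPF2 at the same time" idea against a message log is to count, for each HPF2 task, how many other HPF2 tasks overlapped with it. The sketch below uses the start and finish times from the Sep 4 message log earlier in this thread. It treats a task as running for the whole span between "Starting task" and "Computation ... finished", which is an assumption (the log does not show whether a task was preempted in between), and `concurrent_peers` is a hypothetical helper name.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


# HPF2 tasks from the Sep 4 message log: (start, finish) in local time.
HPF2_TASKS = {
    "lx327_00019_17": ("08:15:22", "15:47:21"),
    "lx292_00058_5": ("09:22:21", "15:47:22"),
    "lx343_00002_1": ("15:20:03", "15:47:21"),
    "lx344_00011_13": ("15:47:21", "15:48:22"),
    "lx307_00085_15": ("15:52:06", "15:52:59"),
    "lx345_00002_15": ("15:55:08", "15:56:01"),
}


def concurrent_peers(tasks):
    """For each task, count the other tasks whose time span strictly overlaps it."""
    spans = {name: (to_seconds(a), to_seconds(b)) for name, (a, b) in tasks.items()}
    peers = {}
    for name, (start, end) in spans.items():
        peers[name] = sum(
            1
            for other, (o_start, o_end) in spans.items()
            if other != name and o_start < end and start < o_end
        )
    return peers


peers = concurrent_peers(HPF2_TASKS)
for name, n in peers.items():
    print(f"{name}: {n} other HPF2 task(s) overlapping")
```

With these numbers the first four failures overlap with at least one other HPF2 task, but the last two (lx307 and lx345) failed with no other HPF2 task in their time span, so this log on its own does not settle the question.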

My 6.03 job has been running for almost an hour now. I will see how it ends and let the world (you, the forum) know. In the meantime I will study the beta section on this item.
[Sep 18, 2008 12:27:20 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error on HPF2 job

Quote: "I've deselected Proteome Folding for now since positively nothing changed on the client but the September OS patches."
I am not qualified to debug this problem, but it sounds like you have identified the problem yourself. Maybe the September OS patches are the problem.
[Sep 18, 2008 12:43:58 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Computation error on HPF2 job

Crystal Pellet,

The failure timing was identical for RICE: tasks that got past the first few minutes finished fine. It's an interesting thought that running them one at a time works fine. The first 2 were started at the same time and 1 failed. The second Pending Validation (top of the list) also ran while a second HPF2 started, with 2 failing in a row right around that time.

The top 2 errors also occurred while a 3rd HPF2 was running (it still is), giving some credence to the speculation that there could somehow be cross-job interference. 3 GB RAM, with 2.7 of that allowed to be used by BOINC, should more than suffice.

Astrolab, I very much doubt a relationship with the MS patches, for a number of reasons. I mentioned this angle just for completeness.

cheers
----------------------------------------
[Edit 1 times, last edit by Sekerob at Sep 18, 2008 1:04:46 PM]
[Sep 18, 2008 12:56:25 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline
Re: Computation error on HPF2 job

The single HPF2 job finished successfully on my quad while running concurrently with 2 HCC jobs and 1 FAAH job.
The pending validation result is:

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>
called boinc_finish

</stderr_txt>
]]>

I will try to get 2 HPF2 jobs running concurrently and see what happens.

Cheers
[Sep 18, 2008 7:58:54 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline
Re: Computation error on HPF2 job

Quote: "I will try to get 2 HPF2 jobs running concurrently and see what happens."

:) news or :( news!!

I cannot reproduce the errors, even when I run 2 HPF2 tasks concurrently.
See the stdoutdae output and the result logs of those two jobs:

18-Sep-2008 22:19:06 [World Community Grid] Starting ly191_00058_16
18-Sep-2008 22:19:06 [World Community Grid] Starting task ly191_00058_16 using hpf2 version 603
19-Sep-2008 04:12:31 [World Community Grid] Computation for task ly191_00058_16 finished
19-Sep-2008 04:12:33 [World Community Grid] Started upload of ly191_00058_16_0
19-Sep-2008 04:12:40 [World Community Grid] Finished upload of ly191_00058_16_0

ly191_00058_16-- | 674419 | Pending Validation | 09/18/2008 19:36:22 | 09/19/2008 06:15:11 | 5.86 | 77.6 / 0.0

Result Log

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>
called boinc_finish

</stderr_txt>
]]>

and

18-Sep-2008 22:19:39 [World Community Grid] Starting ly194_00007_0
18-Sep-2008 22:19:39 [World Community Grid] Starting task ly194_00007_0 using hpf2 version 603
18-Sep-2008 23:36:03 [World Community Grid] Resuming task ly194_00007_0 using hpf2 version 603
19-Sep-2008 02:11:35 [World Community Grid] Resuming task ly194_00007_0 using hpf2 version 603
19-Sep-2008 03:46:51 [World Community Grid] Resuming task ly194_00007_0 using hpf2 version 603
19-Sep-2008 04:22:47 [World Community Grid] Computation for task ly194_00007_0 finished
19-Sep-2008 04:22:50 [World Community Grid] Started upload of ly194_00007_0_0
19-Sep-2008 04:22:59 [World Community Grid] Finished upload of ly194_00007_0_0

ly194_00007_0-- | 674419 | Pending Validation | 09/18/2008 20:19:27 | 09/19/2008 06:15:11 | 5.76 | 76.3 / 0.0

Result Log

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>
called boinc_finish

</stderr_txt>
]]>

As you can see, both jobs started at almost the same time; the second one just had to wait three times before running again because of short-term debt.

Both are still pending validation, but I expect two valid results. No errors on my machine. I'm not amused by non-reproducible errors/bugs.
[Sep 19, 2008 6:53:37 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Computation error on HPF2 job

Well, there were eventually 3 running simultaneously and finishing, but then the next one failed while 3 RICE tasks were running, which throws the concurrent-HPF2 idea out the window.

9/19/2008 11:49:46 AM|World Community Grid|Starting ly202_00065_4
9/19/2008 11:49:46 AM|World Community Grid|Starting task ly202_00065_4 using hpf2 version 603
9/19/2008 11:51:26 AM|World Community Grid|Computation for task ly202_00065_4 finished
9/19/2008 11:51:26 AM|World Community Grid|Output file ly202_00065_4_0 for task ly202_00065_4 absent

Same error in log.

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
Funzione non corretta. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\dock_structure.cc line:401

</stderr_txt>
]]>

Deselected until there is a technical response.
[Sep 19, 2008 11:26:05 AM]