Thread Status: Active
Total posts in this thread: 7
This topic has been viewed 2384 times and has 6 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Timed out WUs that were OK for peers?

One of my computers spent 56 hours on 7 work units that were issued to it on 11/14. If I understand this "work unit status" page (below) correctly, 2 peers processed the same work unit in under 4 hours (mine being the "Error" one below). This computer has two quad core Xeon 5410 processors with 4 GB memory running 32 bit RHEL 5.2 using BOINC 6.2.15. It would appear that the problem was on my end, but I have no idea what would have caused it.

X0000092121074200710041010_ 2-- 603 Valid 11/18/09 21:21:57 11/19/09 03:06:02 3.82 72.8 / 43.4
X0000092121074200710041010_ 1-- 603 Valid 11/14/09 19:38:18 11/18/09 09:05:19 3.93 43.4 / 43.4
X0000092121074200710041010_ 0-- 603 Error 11/14/09 19:35:54 11/18/09 21:04:10 56.11 765.6 / 0.0
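The gap between the three results can be put into numbers with a quick calculation. The sketch below is only illustrative: it hard-codes the three rows above and assumes that the ninth whitespace-separated field is CPU time in hours, which is how the poster reads the page. The names `ROWS` and `cpu_hours` are made up for this example.

```python
# Rows copied from the work unit status listing above.
ROWS = """\
X0000092121074200710041010_ 2-- 603 Valid 11/18/09 21:21:57 11/19/09 03:06:02 3.82 72.8 / 43.4
X0000092121074200710041010_ 1-- 603 Valid 11/14/09 19:38:18 11/18/09 09:05:19 3.93 43.4 / 43.4
X0000092121074200710041010_ 0-- 603 Error 11/14/09 19:35:54 11/18/09 21:04:10 56.11 765.6 / 0.0
"""


def cpu_hours(rows):
    """Group CPU hours by result status (column layout is an assumption)."""
    by_status = {}
    for line in rows.splitlines():
        fields = line.split()
        status = fields[3]        # "Valid" or "Error"
        hours = float(fields[8])  # assumed: CPU time in hours
        by_status.setdefault(status, []).append(hours)
    return by_status


by_status = cpu_hours(ROWS)
peer_mean = sum(by_status["Valid"]) / len(by_status["Valid"])
ratio = by_status["Error"][0] / peer_mean
print(f"peers averaged {peer_mean:.2f} h; the failed result used {ratio:.1f}x that")
```

Under that reading of the columns, the failed result burned roughly 14 times the CPU time its peers needed, which points to the task stalling or restarting rather than the host simply being slow.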

[Nov 19, 2009 3:01:44 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Timed out WUs that were OK for peers?

Hi,

First, look at the Result Status page and click on the error log. It is probably an exceeded CPU time error. Please copy and paste it into your next post.

BOINC 6.2.15? What OS?

Was the system recently booted? If so, please go to the beginning of the client message log and also post the top 30 lines of startup information. This will give us all the base info needed to understand your system better.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Nov 19, 2009 3:18:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Timed out WUs that were OK for peers?

Sorry that I didn't explicitly state it earlier, but yes, they exceeded the CPU time limit:
<core_client_version>6.2.15</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000
Skipping: /computation_deadline
In ExtractGlcmFeatures: End of 0 iteration of outer loop.
In ExtractGlcmFeatures: End of 1 iteration of outer loop.
In ExtractGlcmFeatures: End of 2 iteration of outer loop.
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000
Skipping: /computation_deadline
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000
Skipping: /computation_deadline
In ExtractGlcmFeatures: End of 3 iteration of outer loop.
In ExtractGlcmFeatures: End of 4 iteration of outer loop.
In ExtractGlcmFeatures: End of 5 iteration of outer loop.
In ExtractGlcmFeatures: End of 6 iteration of outer loop.
In ExtractGlcmFeatures: End of 7 iteration of outer loop.
In ExtractGlcmFeatures: End of 8 iteration of outer loop.
In ExtractGlcmFeatures: End of 9 iteration of outer loop.
In ExtractGlcmFeatures: End of 10 iteration of outer loop.
In ExtractGlcmFeatures: End of 11 iteration of outer loop.
In ExtractGlcmFeatures: End of 12 iteration of outer loop.
In ExtractGlcmFeatures: End of 13 iteration of outer loop.
In ExtractGlcmFeatures: End of 14 iteration of outer loop.
In ExtractGlcmFeatures: End of 15 iteration of outer loop.

</stderr_txt>
]]>


The OS is Red Hat Enterprise Linux (RHEL) 5.2, with real-time patched kernel 2.6.26.8-rt16.

02-Nov-2009 12:53:05 [---] Benchmark results:
02-Nov-2009 12:53:05 [---] Number of CPUs: 8
02-Nov-2009 12:53:05 [---] 2020 floating point MIPS (Whetstone) per CPU
02-Nov-2009 12:53:05 [---] 2783 integer MIPS (Dhrystone) per CPU
02-Nov-2009 12:53:06 [---] Resuming computation
02-Nov-2009 13:06:44 [World Community Grid] Computation for task mw978_00015_14 finished
02-Nov-2009 13:06:44 [World Community Grid] Starting X0000096101138200802141820_0
02-Nov-2009 13:06:44 [World Community Grid] Starting task X0000096101138200802141820_0 using hcc1 version 603
02-Nov-2009 13:06:47 [World Community Grid] Started upload of mw978_00015_14_0
02-Nov-2009 13:06:51 [World Community Grid] Finished upload of mw978_00015_14_0
02-Nov-2009 13:24:50 [---] Exit requested by user
02-Nov-2009 13:44:51 [---] Starting BOINC client version 6.2.15 for i686-pc-linux-gnu
02-Nov-2009 13:44:51 [---] log flags: task, file_xfer, sched_ops
02-Nov-2009 13:44:51 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3 c-ares/1.5.1
02-Nov-2009 13:44:51 [---] Data directory: /home/censored/BOINC
02-Nov-2009 13:44:52 [---] Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU L5410 @ 2.33GHz [Family 6 Model 23 Stepping 6]
02-Nov-2009 13:44:52 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
02-Nov-2009 13:44:52 [---] OS: Linux: 2.6.26.8-rt16
02-Nov-2009 13:44:52 [---] Memory: 3.93 GB physical, 1.94 GB virtual
02-Nov-2009 13:44:52 [---] Disk: 223.61 GB total, 204.31 GB free
02-Nov-2009 13:44:52 [---] Local time is UTC -6 hours
02-Nov-2009 13:44:52 [---] No coprocessors
02-Nov-2009 13:44:52 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: 1089680; location: (none); project prefs: default
02-Nov-2009 13:44:52 [---] General prefs: from World Community Grid (last modified 27-Apr-2007 12:37:48)
02-Nov-2009 13:44:52 [---] Host location: none
02-Nov-2009 13:44:52 [---] General prefs: using your defaults
02-Nov-2009 13:44:52 [---] Preferences limit memory usage when active to 4019.70MB
02-Nov-2009 13:44:52 [---] Preferences limit memory usage when idle to 4019.70MB
02-Nov-2009 13:44:52 [---] Preferences limit disk usage to 3.73GB

[Nov 19, 2009 4:32:16 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Timed out WUs that were OK for peers?

Well, I don't like the mid-job

Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000

Is there anything in the stdoutdae.txt log file indicating that the job resumed?
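One way to check for mid-job restarts is to count how often the init-data message shows up after the first loop iteration has already been logged. The sketch below is a rough illustration, not a WCG or BOINC tool: it assumes each `parse_init_data_file: computation_deadline` line marks the application re-reading its init data, and that the skipped value is a Unix timestamp. The names `STDERR_EXCERPT` and `summarize` are made up, and the excerpt is copied from the start of the stderr posted earlier in the thread.

```python
import re
from datetime import datetime, timezone

# Opening lines of the stderr output posted earlier in the thread.
STDERR_EXCERPT = """\
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000
Skipping: /computation_deadline
In ExtractGlcmFeatures: End of 0 iteration of outer loop.
In ExtractGlcmFeatures: End of 1 iteration of outer loop.
In ExtractGlcmFeatures: End of 2 iteration of outer loop.
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000
Skipping: /computation_deadline
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259058234.000000
Skipping: /computation_deadline
In ExtractGlcmFeatures: End of 3 iteration of outer loop.
"""

INIT_MARKER = "parse_init_data_file: computation_deadline"
ITERATION = re.compile(r"End of (\d+) iteration of outer loop")


def summarize(stderr_text):
    """Return (init_reads, last_iteration, mid_job_reads) for a stderr dump."""
    init_reads = 0
    mid_job_reads = 0      # init reads seen after work had already started
    last_iteration = None
    for line in stderr_text.splitlines():
        if INIT_MARKER in line:
            init_reads += 1
            if last_iteration is not None:
                mid_job_reads += 1
        else:
            match = ITERATION.search(line)
            if match:
                last_iteration = int(match.group(1))
    return init_reads, last_iteration, mid_job_reads


init_reads, last_iteration, mid_job_reads = summarize(STDERR_EXCERPT)
# Assumption: the skipped value is seconds since the Unix epoch, shown here in UTC.
deadline = datetime.fromtimestamp(1259058234, tz=timezone.utc)
print(init_reads, last_iteration, mid_job_reads, deadline.isoformat())
```

On this excerpt the count of mid-job reads is 2, both falling between iterations 2 and 3. If the assumption about the marker holds, that would be consistent with the task being stopped and resumed twice at that point.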

I've had several jobs hang on a newly installed 64-bit Windows 7 machine after BOINC Manager hangs, among them a RICE job that showed similar start-all-over log entries near the end:

Result Name: R00433_ 6fdfb9e90643b6d8e2f46844db78be6f_ 03_ 008_ 7--
<core_client_version>6.10.18</core_client_version>

...

wcg_seed 41234051
Unrecognized XML in parse_init_data_file: hostid
Skipping: 1112084
Skipping: /hostid
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 10672.325539
Skipping: /starting_elapsed_time
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259348301.000000
Skipping: /computation_deadline
wcg_seed 1054116059
running time: 10160.423306
wcg_seed 493042590
Unrecognized XML in parse_init_data_file: hostid
Skipping: 1112084
Skipping: /hostid
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 10672.325539
Skipping: /starting_elapsed_time
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1259348301.000000
Skipping: /computation_deadline
wcg_seed 362329819
running time: 10162.170518

...


To my surprise, all of the seeds were counted and sumptuous credit was granted (yes, RICE does more on 64 bits). In the past, only the seeds after these intermissions were considered, so maybe the code was improved? Maybe knreed will comment.

Having learned from the past, I thought I'd caught all the usual security settings and exceptions, but I had not counted on the 6.10.18 client manager itself being able to hang and cause enough interruption to shake the jobs.
[Nov 19, 2009 4:52:21 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Timed out WUs that were OK for peers?

It looks like you experienced an issue, but it isn't clear what happened. However, looking at your results since then, the issue appears to have been corrected and you are no longer getting errors. Let us know if the error returns.
[Nov 20, 2009 4:42:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Timed out WUs that were OK for peers?

One of my computers spent 56 hours on 7 work units that were issued to it on 11/14. If I understand this "work unit status" page (below) correctly, 2 peers processed the same work unit in under 4 hours (mine being the "Error" one below). This computer has two quad core Xeon 5410 processors with 4 GB memory running 32 bit RHEL 5.2 using BOINC 6.2.15. It would appear that the problem was on my end, but I have no idea what would have caused it.

X0000092121074200710041010_ 2-- 603 Valid 11/18/09 21:21:57 11/19/09 03:06:02 3.82 72.8 / 43.4
X0000092121074200710041010_ 1-- 603 Valid 11/14/09 19:38:18 11/18/09 09:05:19 3.93 43.4 / 43.4
X0000092121074200710041010_ 0-- 603 Error 11/14/09 19:35:54 11/18/09 21:04:10 56.11 765.6 / 0.0


Why does BOINC have 2 people crunch the same WU successfully? Are all WUs crunched twice?
[Nov 20, 2009 7:34:50 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Timed out WUs that were OK for peers?

Hi,

please see this help article discussing the 3 types of distributions currently found at WCG.

https://secure.worldcommunitygrid.org/help/vi...o?searchString=redundancy
[Nov 20, 2009 7:50:12 PM]