World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: New Beta Test starting Oct 31, 2013 [Issues Thread] |
Thread Status: Active Total posts in this thread: 211
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
TYT, CP.
Let's see if I can summarize all the issues, OTOH:
1) Output file too large (Error -131)
2) Maximum Disk Use Exceeded (disk_bound overstepped)
3) Memory model exceeded (memory_bound overstepped)
4) Loss of -large- portions of CPU time at time of reporting, which looks to happen at the end.
5) Progress % erratic (e.g. it can jump from 0.5% to 50% only at the end of the 1st pass when there are only 2 passes)
6) Related to 5), checkpoints at times multiple hours apart... not good for part-time crunchers.
7) Jobs seem stuck in memory at times [when seemingly no more progress is made]... won't unload, even when "Leave application in memory when suspended" is off. Full client restart required to get them to unload.
8) Some tasks freeze on CPU time while running [is it just the display, or does Task Manager also show no CPU time use?], while elapsed time keeps accumulating and progress % goes backward. Users of BOINC Manager won't see this easily; to users of BOINCTasks it's obvious, since both Elapsed and CPU time are shown.
Wish list: Printing of OS and CPU details in Result Log.
Did I miss any? (Copy the list and insert 9) and so on. The DIY department.)
P.S. Whilst the 10,000 originals left the feeder in 1.5 hours and were sent quite early in the day, 'only' 3,556 had validated at midnight [Don't know, but I doubt the 'error' results that were credited were included... my own count: 9 shown at midnight as valid on My Grid, but 18 listed with credit, including 9 with error. 2 in PV]. |
||
|
Thargor
Veteran Cruncher UK Joined: Feb 3, 2012 Post Count: 1291 Status: Offline Project Badges: |
Task #1, on Debian 6.0.7 (Squeeze):
Fri 01 Nov 2013 08:33:10 GMT World Community Grid Computation for task BETA_BETA_9999984_0877_1 finished
Fri 01 Nov 2013 08:33:10 GMT World Community Grid Output file BETA_BETA_9999984_0877_1_0 for task BETA_BETA_9999984_0877_1 exceeds size limit.
Fri 01 Nov 2013 08:33:10 GMT World Community Grid File size: 107813020.000000 bytes. Limit: 10485760.000000 bytes
Task #2, on a 64-bit Windows 7 PC, appears to be getting a variant of the 0.5% issue: it gets anywhere from 5 to 15 minutes into the task, then appears to restart completely from scratch (it had been doing this for 4-5 hours overnight). I haven't tried restarting the client yet; I noticed it on the way out to work this morning. I'll pop the WU ID in here when I get home later, if it still hasn't finished. |
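For anyone who wants to check how far over the limit their own task went, the numbers in the Task #1 messages can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the message wording quoted in this post, and the name `parse_size_limit` is made up for illustration, not part of BOINC.

```python
import re

# Matches the "File size ... Limit ..." wording quoted in this thread.
# Other client versions may word the message differently (assumption).
SIZE_RE = re.compile(r"File size: ([\d.]+) bytes\. Limit: ([\d.]+) bytes")

def parse_size_limit(line):
    """Return (size_bytes, limit_bytes) from a message line, or None if absent."""
    match = SIZE_RE.search(line)
    if match is None:
        return None
    return int(float(match.group(1))), int(float(match.group(2)))

line = ("Fri 01 Nov 2013 08:33:10 GMT World Community Grid "
        "File size: 107813020.000000 bytes. Limit: 10485760.000000 bytes")
size, limit = parse_size_limit(line)
print(f"{size / 2**20:.1f} MiB written, limit {limit / 2**20:.1f} MiB, "
      f"ratio {size / limit:.1f}x")
```

For the task above this works out to an output file roughly ten times the allowed size.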
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, that one that was running overnight finally finished, and just before I could take a look at it. The server logs show:
BETA_ BETA_ 9999987_ 0168_ 0-- <M/c-ID> Pending Validation 31/10/13 06:55:37 01/11/13 10:10:15 12.33 / 24.69 235.2 / 0.0
It clearly finished without recording any CPU time for the second half of the run, but at least it finished. I have no idea whether or not it checkpointed again, but I suspect not. One checkpoint near the beginning and then 24 hours running without one is clearly not a good idea ...
So far, all the others I've had were resends that all failed (again) with oversize output files.
Here's wishing the techs a successful time tracking down these and the other issues that people have seen. |
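The "lost half" is easy to put a number on. A minimal Python sketch, assuming the `12.33 / 24.69` pair in the result line is CPU hours over elapsed hours (the column meaning is an assumption here, and `cpu_efficiency` is a made-up helper name):

```python
def cpu_efficiency(cpu_hours, elapsed_hours):
    """Fraction of wall-clock time that was recorded as CPU time."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_hours / elapsed_hours

# Values from the result line above, read as CPU hours / elapsed hours (assumed).
eff = cpu_efficiency(12.33, 24.69)
print(f"CPU efficiency: {eff:.1%}")
```

A figure close to one half fits the observation that no CPU time was recorded for the second half of the run.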
||
|
pramo
Veteran Cruncher USA Joined: Dec 14, 2005 Post Count: 703 Status: Offline Project Badges: |
I aborted this one. It had restarted every three minutes since it began about 21 hours ago. Wingman hasn't reported either. stderr.txt has this, repeating...
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.19_windows_intelx86 -SettingsFile BETA_9999984_0541.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 1000000
Running
Result log:
Result Name: BETA_ BETA_ 9999984_ 0541_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
]]>
And now someone else has it :-/
Edit to add: it claimed this!
BETA_ BETA_ 9999984_ 0541_ 0-- 719 User Aborted 10/31/13 06:16:14 11/1/13 11:28:40 0.00 65.9 / 0.0
[Edit 2 times, last edit by pramo at Nov 1, 2013 11:57:03 AM] |
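A restart loop like this can be spotted without watching the task, by counting how often the start-up lines repeat in stderr.txt. A minimal Python sketch, assuming each start writes one `Commandline =` line as in the excerpt above; `count_starts` is a hypothetical helper, not a BOINC tool:

```python
def count_starts(stderr_text, marker="Commandline ="):
    """Count how many times the application logged its start-up marker."""
    return sum(
        1 for line in stderr_text.splitlines()
        if line.strip().startswith(marker)
    )

# One start-up block, copied from the stderr.txt excerpt in this post.
one_start = (
    "Commandline = projects/www.worldcommunitygrid.org/"
    "wcgrid_beta17_7.19_windows_intelx86 "
    "-SettingsFile BETA_9999984_0541.txt -DatabaseFile dataset-GDS2771-v1.txt\n"
    "Initializing\n"
    "wcg_learn_limit = 1000000\n"
    "Running\n"
)
sample = one_start * 3  # simulate a task that started three times

restarts = count_starts(sample) - 1  # the first start is not a restart
print(f"{restarts} restart(s) detected")
```

At one restart every three minutes, 21 hours would amount to roughly 420 repeats of that block, so anything beyond a handful is a strong hint the task is looping.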
||
|
Thargor
Veteran Cruncher UK Joined: Feb 3, 2012 Post Count: 1291 Status: Offline Project Badges: |
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.19_windows_intelx86 -SettingsFile BETA_9999984_0541.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 1000000
Running
going to abort this one-stderr.txt has the above, repeating... has restarted every three minutes since it began about 21 hours ago. wingman hasn't reported either. and now somone els has it:-/

That sounds very much like the issue I'm getting on my Windows 7 64-bit machine at home, but I didn't get chance to inspect the detailed logs. |
||
|
pramo
Veteran Cruncher USA Joined: Dec 14, 2005 Post Count: 703 Status: Offline Project Badges: |
That sounds very much like the issue I'm getting on my Windows 7 64-bit machine at home, but I didn't get chance to inspect the detailed logs.

Wasn't much detail to see on that one :) Running on XP. Now I wish I had thought to restart/reboot before aborting, to see what would happen. Rats :( |
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2977 Status: Offline Project Badges: |
SekeRob, from your excellent summary of issues, I think one may have been missed off (as to whether it's a concern or not, I don't have enough data - it may just be my machine/set-up, or it may be more widespread). Anyhow, as you requested, I'll copy your list of 8 and add a 9th onto the end:
1) Output file too large (Error -131)
2) Maximum Disk Use Exceeded (disk_bound overstepped)
3) Memory model exceeded (memory_bound overstepped)
4) Loss of -large- portions of CPU time at time of reporting, which looks to happen at the end.
5) Progress % erratic (e.g. it can jump from 0.5% to 50% only at the end of the 1st pass when there are only 2 passes)
6) Related to 5), checkpoints at times multiple hours apart... not good for part-time crunchers.
7) Jobs seem stuck in memory at times [when seemingly no more progress is made]... won't unload, even when "Leave application in memory when suspended" is off. Full client restart required to get them to unload.
8) Some tasks freeze on CPU time while running [is it just the display, or does Task Manager also show no CPU time use?], while elapsed time keeps accumulating and progress % goes backward. Users of BOINC Manager won't see this easily; to users of BOINCTasks it's obvious, since both Elapsed and CPU time are shown.
9) Running 4 concurrently (i.e., using all available cores) appears to be very inefficient. |
||
|
coolstream
Senior Cruncher SCOTLAND Joined: Nov 8, 2005 Post Count: 475 Status: Offline Project Badges: |
One BETA grabbed which completed to 100% and then errored out with Result Log:
Result Name: BETA_ BETA_ 9999985_ 0880_ 4--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
puting pass 4128
[04:48:24]: Computing pass 4129
[04:48:28]: Computing pass 4130
...
[06:33:51]: Computing pass 6079
[06:33:54]: Computing pass 6080
Run complete, CPU time: 17194.570621
06:34:24 (29232): called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>BETA_BETA_9999985_0880_4_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>

Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit
Processor: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz running 8 threads
8GB RAM
74GB free disk space

Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY. |
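When comparing several result logs, it helps to pull out just the file name and error code from each `<file_xfer_error>` block. A minimal Python sketch, assuming the tag layout shown in the log above; it uses a regular expression rather than an XML parser because the pasted log is only a fragment. `file_xfer_errors` and `KNOWN_CODES` are names invented for this example, and the one code description is taken from the issue list earlier in this thread.

```python
import re

# Description taken from the issue list in this thread; other codes are not covered.
KNOWN_CODES = {-131: "output file too large"}

XFER_RE = re.compile(
    r"<file_xfer_error>\s*<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>",
    re.DOTALL,
)

def file_xfer_errors(result_log):
    """Return (file_name, error_code) pairs for each <file_xfer_error> block."""
    return [(name, int(code)) for name, code in XFER_RE.findall(result_log)]

log = """
<message>
<file_xfer_error>
<file_name>BETA_BETA_9999985_0880_4_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>
"""

for name, code in file_xfer_errors(log):
    print(name, code, KNOWN_CODES.get(code, "unknown"))
```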
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
11/1/2013 6:17:23 AM | World Community Grid | Output file BETA_BETA_9999988_0704_4_0 for task BETA_BETA_9999988_0704_4 exceeds size limit.
11/1/2013 6:17:23 AM | World Community Grid | File size: 16705286.000000 bytes. Limit: 10485760.000000 bytes
It happened again. File size too big. Well, I'm two for two. Batting 1000. |
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
9) Running 4 concurrently (i.e., using all available cores), appears to be very inefficient.

One of my machines had 8 tasks. All finished at over 98% efficiency running concurrently. Couldn't tell you the specs off the top of my head, but it's a rack server based on Intel (Xeon) running Ubuntu.
Of course, my 12 other machines got one task between them. Go figure.
Distributed computing volunteer since September 27, 2000 |
||
|
|