Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 3071 times and has 8 replies
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Stuck workunits - continued after computer restart

Of the 60 or so WUs my 8-thread laptop downloaded and processed in the last 24 hours, 6 got stuck. The stuck WUs continued after a computer restart. Computing preferences on the laptop are set to use at most 50% CPU time, to reduce heat and fan noise. This is not a VINA project, so I didn't think I needed to run a third-party CPU temperature throttler.

Here's info on one of them:


State: Running
CPU time at last checkpoint: 00:02:18
CPU time: 00:02:18
Elapsed time: 12:40:04
Estimated time remaining: 36:26:28
Fraction done: 2.083 %
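
The numbers above show the symptom: about two minutes of CPU time against more than twelve hours of elapsed time. A minimal sketch of how one might flag such a task follows. The function names, the 50% limit default, and the 10% tolerance are illustrative assumptions, not anything BOINC itself provides.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stuck(cpu_time: str, elapsed_time: str,
                cpu_limit: float = 0.5, tolerance: float = 0.1) -> bool:
    """Return True when CPU time is far below what the throttle would allow.

    cpu_limit is the "use at most X% CPU time" preference (0.5 in this post).
    tolerance is a hypothetical safety factor: a task is flagged only if it
    used less than 10% of the CPU time the throttle would have permitted.
    """
    elapsed = to_seconds(elapsed_time)
    if elapsed == 0:
        return False
    return to_seconds(cpu_time) / elapsed < cpu_limit * tolerance


if __name__ == "__main__":
    # Values copied from the task status in this post.
    print(looks_stuck("00:02:18", "12:40:04"))
```

With a 50% throttle a healthy task would show CPU time of roughly half the elapsed time, so a ratio well under 1% is a strong hint that the task has stalled rather than merely been throttled.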

Event log:

7/17/2012 8:20:46 AM | World Community Grid | Starting task cfsw_8609_08609169_0 using cfsw version 612 in slot 0


stderr.txt log last updated Jul 17:

[08:20:46] INFO:Beginning simulation: 1990:240:1267586660
[08:24:45] INFO: Finished tick number 4


Restarted computer and task continued.


stderr.txt log last updated Jul 18:

[08:20:46] INFO:Beginning simulation: 1990:240:1267586660
[08:24:45] INFO: Finished tick number 4
[09:49:44] DEBUG: Restarting from checkpoint.
[09:49:44]PctComplete = 0.0208333
[09:49:44]ticks:currentTick:modules:currentModule:restart:seed240:5:6:0:0:24306
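
The last stderr line appears to fuse a colon-separated header with its values. A rough sketch of pulling the two apart is below, assuming the six names map positionally onto the six numbers. That mapping is a guess from the log text, not documented cfsw behavior.

```python
# Field names exactly as they appear in the stderr line above.
FIELDS = ("ticks", "currentTick", "modules", "currentModule", "restart", "seed")


def parse_checkpoint(line: str) -> dict:
    """Split a fused 'names...values' checkpoint line into a dict of ints."""
    # Drop the leading "[HH:MM:SS]" timestamp if there is one.
    payload = line.split("]", 1)[-1].strip()
    header = ":".join(FIELDS)
    if not payload.startswith(header):
        raise ValueError("unexpected checkpoint line: " + line)
    values = payload[len(header):].split(":")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d values, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, (int(value) for value in values)))


if __name__ == "__main__":
    state = parse_checkpoint(
        "[09:49:44]ticks:currentTick:modules:currentModule:restart:seed240:5:6:0:0:24306"
    )
    print(state)
    # Under the assumed mapping, currentTick / ticks should match PctComplete.
    print(state["currentTick"] / state["ticks"])
```

Under that reading, 5 of 240 ticks agrees with the logged PctComplete of 0.0208333 and with the 2.083 % fraction done shown by the client.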



Event log after computer restarted:

7/18/2012 9:48:34 AM | | No config file found - using defaults
7/18/2012 9:48:35 AM | | Starting BOINC client version 7.0.28 for windows_x86_64
7/18/2012 9:48:35 AM | | log flags: file_xfer, sched_ops, task
7/18/2012 9:48:35 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
7/18/2012 9:48:35 AM | | Running as a daemon
7/18/2012 9:48:35 AM | | Data directory: C:\ProgramData\BOINC
7/18/2012 9:48:35 AM | | Running under account boinc_master
7/18/2012 9:48:35 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz [Family 6 Model 42 Stepping 7]
7/18/2012 9:48:35 AM | | Processor: 256.00 KB cache
7/18/2012 9:48:35 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe
7/18/2012 9:48:35 AM | | OS: Microsoft Windows 7: Enterprise x64 Edition, Service Pack 1, (06.01.7601.00)
7/18/2012 9:48:35 AM | | Memory: 3.96 GB physical, 7.91 GB virtual
7/18/2012 9:48:35 AM | | Disk: 232.75 GB total, 189.98 GB free
7/18/2012 9:48:35 AM | | Local time is UTC -4 hours
7/18/2012 9:48:35 AM | | No usable GPUs found
7/18/2012 9:48:35 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2119053; resource share 100
7/18/2012 9:48:35 AM | World Community Grid | General prefs: from World Community Grid (last modified 23-Jun-2012 10:30:44)
7/18/2012 9:48:35 AM | World Community Grid | Host location: none
7/18/2012 9:48:35 AM | World Community Grid | General prefs: using your defaults
7/18/2012 9:48:35 AM | | Reading preferences override file
7/18/2012 9:48:35 AM | | Preferences:
7/18/2012 9:48:35 AM | | max memory usage when active: 3039.67MB
7/18/2012 9:48:35 AM | | max memory usage when idle: 3647.61MB
7/18/2012 9:48:35 AM | | max disk usage: 10.00GB
7/18/2012 9:48:35 AM | | don't use GPU while active
7/18/2012 9:48:35 AM | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
7/18/2012 9:48:35 AM | | Not using a proxy
7/18/2012 9:49:43 AM | World Community Grid | Restarting task cfsw_8609_08609169_0 using cfsw version 612 in slot 0
7/18/2012 9:49:43 AM | World Community Grid | Restarting task cfsw_8609_08609183_0 using cfsw version 612 in slot 4
7/18/2012 9:49:43 AM | World Community Grid | Restarting task cfsw_8609_08609038_0 using cfsw version 612 in slot 2
7/18/2012 9:49:43 AM | World Community Grid | Restarting task cfsw_8638_08638954_0 using cfsw version 612 in slot 5
7/18/2012 9:49:43 AM | World Community Grid | Restarting task cfsw_8638_08638606_0 using cfsw version 612 in slot 7
7/18/2012 9:49:43 AM | World Community Grid | Restarting task cfsw_8638_08638070_0 using cfsw version 612 in slot 6
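
For anyone who wants to list the restarted tasks without reading the whole log, here is a small sketch that extracts task name, version, and slot from event log lines. The regular expression is written only against the line format pasted in this post and may need adjusting for other BOINC versions or locales.

```python
import re

# Matches "Starting task ..." and "Restarting task ..." event log lines.
TASK_EVENT = re.compile(
    r"^(?P<timestamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) \| "
    r"(?P<project>[^|]*?) \| "
    r"(?P<action>Starting|Restarting) task (?P<task>\S+) "
    r"using (?P<app>\S+) version (?P<version>\d+) in slot (?P<slot>\d+)$"
)


def parse_task_event(line: str):
    """Return a dict for a task start/restart line, or None for any other line."""
    match = TASK_EVENT.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["version"] = int(event["version"])
    event["slot"] = int(event["slot"])
    return event


if __name__ == "__main__":
    sample = [
        "7/18/2012 9:48:35 AM | | Not using a proxy",
        "7/18/2012 9:49:43 AM | World Community Grid | Restarting task "
        "cfsw_8609_08609169_0 using cfsw version 612 in slot 0",
    ]
    for line in sample:
        print(parse_task_event(line))
```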

Edit: 60 instead of 120 WUs
----------------------------------------
[Edit 1 times, last edit by BSD at Jul 18, 2012 2:19:23 PM]
[Jul 18, 2012 2:15:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Stuck workunits - continued after computer restart

hmmm, thought the stuck CFSW was fixed :O

Been running this science exclusively on an 8 core lappie, at 100%, but at times using ThreadMasterGUI to throttle [to 10%, which translates to 80% per core]. WCG/IBM is aware of the funky BOINC throttle and is working to abolish this stop / start monster of WU doom. If you think ThreadMasterGUI is too involved, run TThrottle, but my finding with my laptop has been that I needed to throttle all the way down to an effective 25% before the desired temps were reached. My laptop actually came with HP CoolSense and self-throttles according to presets [coolest, silent mode, performance mode]. In performance mode it still gets 2.6-2.7 GHz and is no groin fryer [it sits on a NotePal cooler with 3 fans]; in coolest mode the laptop alternates between 1.8 and 2.0 GHz. This thing senses whether it is on a table or on a lap, so it switches automatically.... It's now sitting under the big kitchen ceiling fan [in high rev gear mode], which is just enough to run optimally... we're sliding into week 8 of pretty much non-stop blue sky. Today at 2600 meters [Monte Amaro], the air temp was still 21C... not good [atmospheric vertical column rule is 6C per 1000 meters].

edit: Device is getting mostly 6.12 [64 bits], and the occasional 6.11 [32 bits], but only when acting as wing/repair man... not a single stuck incident, but I'm not using the BOINC throttle... that one I won't use [except for testing], period.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 18, 2012 5:05:55 PM]
[Jul 18, 2012 5:00:17 PM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Re: Stuck workunits - continued after computer restart

... WCG/IBM is aware of the funky BOINC throttle and are working on and with to abolish this stop / start monster of WU doom. ...

Only had one stuck this morning, resumed after computer restart:

7/19/2012 7:57:26 AM | World Community Grid | Restarting task cfsw_8725_08725450_0 using cfsw version 612 in slot 0

I won't post any more if I still see them, since this is apparently still a problem, at least with this device it seems. Going back to running TThrottle again...
[Jul 19, 2012 12:08:50 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Status: Offline
Re: Stuck workunits - continued after computer restart

I'm seeing this problem on both of my PCs.

BTW, I don't restart my computer. I suspend and then resume the stuck WUs.
----------------------------------------
[Aug 24, 2012 1:57:57 AM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Stuck workunits - continued after computer restart

Been experiencing this on both an i7 and a Bulldozer (both desktops). At first I attributed it to excessive summer heat, but it's continuing even when it's cooler.

As the project is winding down I didn't even mention it, but I will add to the list that this problem hasn't been resolved.
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Aug 24, 2012 3:16:54 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Stuck workunits - continued after computer restart

BTW, I don't restart my computer. I suspend and then resume the stuck WUs.


Yes, with the Leave applications in memory while suspended option UNchecked.
Suspending doesn't seem to make a difference in the 'stalled' status otherwise.
Maybe BSD has that option checked, and that's why they're resorting to rebooting. (?)
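
The suspend-then-resume workaround can also be scripted with the boinccmd tool that ships with the BOINC client. This is only a sketch: it assumes boinccmd is on the PATH and allowed to talk to the local client, and the helper names and the pause length are made up for illustration. Whether it actually frees a stuck task depends on the leave-in-memory setting discussed above.

```python
import subprocess
import time

# Project URL as it appears in the event log earlier in this thread.
WCG_URL = "http://www.worldcommunitygrid.org/"


def task_command(task_name, operation, project_url=WCG_URL):
    """Build the boinccmd argument list for a single task operation."""
    if operation not in ("suspend", "resume"):
        raise ValueError("unsupported operation: " + operation)
    return ["boinccmd", "--task", project_url, task_name, operation]


def suspend_and_resume(task_name, pause_seconds=10, runner=subprocess.run):
    """Suspend a task, wait briefly, then resume it.

    runner is injectable so the sequence can be dry-run without a BOINC client.
    """
    runner(task_command(task_name, "suspend"), check=True)
    time.sleep(pause_seconds)
    runner(task_command(task_name, "resume"), check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for operation in ("suspend", "resume"):
        print(" ".join(task_command("cfsw_8725_08725450_0", operation)))
```

Calling suspend_and_resume("cfsw_8725_08725450_0") would run both commands for real, so try the dry run first.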
[Aug 24, 2012 7:56:35 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Stuck workunits - continued after computer restart

There's an FAQ in the Start Here [read only] forum discussing stuck WUs and how to get most of these tasks rolling again without end-result problems. As was noted, in the case of CFSW this has dogged the science from the start and was mostly fixed, but the BOINC built-in throttle remains the suspect. Some science apps simply don't like the continuous stop [cool] / start [run] cycle. See the prior discussion in this thread and elsewhere on what the alternatives are and what final solution is being worked on... [next year, if lucky]. Since this science ends in a few weeks, the programmers are letting it be [given the very low incidence rate anyway].
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 24, 2012 10:25:25 AM]
[Aug 24, 2012 10:24:16 AM]
wchoff
Cruncher
Joined: Nov 17, 2004
Post Count: 35
Status: Offline
Re: Stuck workunits - continued after computer restart

+1 more for this problem on an i7. It affects maybe 2% of CFSW work units.
[Aug 24, 2012 4:50:56 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Stuck workunits - continued after computer restart

I'm seeing more of these than usual since they started coming again after the 'computation error' was fixed. Yes, it's a low percentage, but before last weekend I was seeing maybe 1 workunit every couple of days, whereas today I've had to restart the boinc-client service on 4 machines... one of them had 3 work units over 8 hours (out of 4 cores). Suspending them (with suspend to memory UNchecked) did not make the Elapsed time reset back to the last checkpoint time; restarting the service did.
[Aug 30, 2012 12:23:48 AM]