World Community Grid Forums
Category: Completed Research Forum: Drug Search for Leishmaniasis Forum Thread: Stuck workunits |
Thread Status: Active Total posts in this thread: 49
|
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've got a stuck work unit. Do you still need help troubleshooting these?
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Questar,
Armstrdj listed his questions at the start of this thread. Identify the stuck unit and look at it with Task Manager / Process Monitor. Which OS, which BOINC version, and what throttle setting? Armstrdj emphasizes the throttle in two more short queries. Once you have answered those, reboot or take your preferred corrective action. Once Armstrdj has it figured out, we will get a post about it.
Lawrence |
||
|
Gil II
Senior Cruncher Canada Joined: Dec 6, 2006 Post Count: 368 Status: Offline Project Badges: |
I aborted 2 stuck WUs this morning; I seem to get quite a few that get stuck. I have another one. I am running Windows 7 on an i7. The Task Manager shows 0 CPU usage. DSFL_00000045_0000045_658_0 has an elapsed time of 33:57:24, is at 33.214%, and shows 47:57:12 to completion.
I am aborting it as well. |
||
|
Gil II
Senior Cruncher Canada Joined: Dec 6, 2006 Post Count: 368 Status: Offline Project Badges: |
One more thing: I have the profile set to use 100%. |
||
|
robertmiles
Senior Cruncher US Joined: Apr 16, 2008 Post Count: 443 Status: Offline Project Badges: |
"By the way, I cannot see the properties of any WU on this machine, and I do not know why. Every time I click on Properties (Win 7), the little window seems to appear, but I am not able to see it (it is only shown as a miniature on the taskbar). Any suggestions? Thanks"
The usual procedure for this is to use the mouse to drag the boundaries of the little window outwards, if it appears at all. If it appears only on the taskbar, clicking on it is usually adequate. |
||
|
robertmiles
Senior Cruncher US Joined: Apr 16, 2008 Post Count: 443 Status: Offline Project Badges: |
"I have started having a problem with BOINC Manager losing the display normally shown. The PC is a laptop running 24/7 and uses Windows XP. The system is set up to use 100% of processors 80% of the time; I just changed it to 70% of the time. Two things mentioned in this thread that I know nothing about are LAIM and TThrottle. What is LAIM and where do I get TThrottle? I haven't said it yet, but the only project I am running now is DSFL. In addition, if I reboot, everything seems to run fine. I am also going to shut down BOINC Manager in between examining the status of WUs which are running. Will this cause a problem transferring the results or the reporting of tasks completed?"
See here for the TThrottle program: http://efmer.eu/boinc/ |
||
|
robertmiles
Senior Cruncher US Joined: Apr 16, 2008 Post Count: 443 Status: Offline Project Badges: |
"6.12.33 (both 32 and 64 bit)"
"I am running BOINC 6.10.56... What am I missing by not moving to 6.12.33?"
I'm now running 6.12.34. As far as I can tell, you're missing the Notices tab where you can get some messages from the projects, and very little else you're likely to care about.
[Edit 1 times, last edit by robertmiles at Oct 31, 2011 11:19:38 PM] |
||
|
BSD
Senior Cruncher Joined: Apr 27, 2011 Post Count: 224 Status: Offline |
Found one hung/stuck this morning.
No errors in computer system or application logs. Have client set to use 50% CPU time. Device has been a good cruncher except for this problem.
After finding the hung/stuck WU I performed the following steps:
1. Suspended client
2. Rebooted computer
3. Resumed client; task showed status "Running", CPU last checkpoint 07:27:56, CPU time 07:28:02, Elapsed time 07:29:40. Appears to be running normally now.

WU: DSFL_00000060_0000016_0808_0
State: Running
Estimated app speed: 1.95 GFLOPs/sec
Estimated task size: 67550 GFLOPs
Max RAM usage: 250 MB
CPU checkpoint: 07:27:56
CPU time: 07:46:29
Elapsed time: 52:14:10
Fraction done: 66.389%
Virtual memory size: 37.18 MB
Working set size: 34.05 MB

Task Manager DSFL hung processes:
wcg_dsfl_6.19_windows_intelx86*32 CPU 00%
wcg_dsfl_vina_6.19_windows_intelx86*32 CPU 00%

Computer info:
Client: BOINC client version 6.12.33 for windows_x86_64
CPU: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ [Family 15 Model 107 Stepping 1]
OS: Microsoft Windows 7 x64 Edition, Service Pack 1
Memory: 2 GB DDR2, 1.87 GB physical, 3.75 GB virtual
Disk: 74.40 GB total, 53.91 GB free

Logs:
stdout.txt
[00:12:43] [INFO] Checkpoint complete.
[00:51:08] [INFO] Checkpoint complete.
[01:29:53] [INFO] Checkpoint complete.
[02:08:35] [INFO] Checkpoint complete.
[02:47:46] [INFO] Checkpoint complete.
[03:27:05] [INFO] Checkpoint complete.
[04:06:05] [INFO] Checkpoint complete.
[04:45:18] [INFO] Checkpoint complete.
[05:27:08] [INFO] Checkpoint complete.
[06:07:42] [INFO] Checkpoint complete.
[06:49:28] [INFO] Checkpoint complete.
[07:30:27] [INFO] Checkpoint complete.
[08:11:04] [INFO] Checkpoint complete.
[08:51:40] [INFO] Checkpoint complete.
[09:32:11] [INFO] Checkpoint complete.
[10:14:06] [INFO] Checkpoint complete.
[10:51:00] [INFO] Checkpoint complete.
[11:27:53] [INFO] Checkpoint complete.
[12:04:49] [INFO] Checkpoint complete.
[12:41:40] [INFO] Checkpoint complete.
[13:19:16] [INFO] Checkpoint complete.
[13:56:46] [INFO] Checkpoint complete.
[14:33:35] [INFO] Checkpoint complete.

stderr.txt
INFO: No state to restore. Start from the beginning.
[23:34:11] Number of tasks = 36
[23:34:11] Starting job 0,CPU time is 0.000000.
[23:34:11] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[00:12:43] Finished Job #0 cpu time used 1155.577407
[00:12:43] Starting job 1,CPU time is 1155.577407.
[00:12:43] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[00:51:08] Finished Job #1 cpu time used 1147.605756
[00:51:08] Starting job 2,CPU time is 2303.183164.
[00:51:08] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[01:29:53] Finished Job #2 cpu time used 1161.177843
[01:29:53] Starting job 3,CPU time is 3464.361007.
[01:29:53] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[02:08:35] Finished Job #3 cpu time used 1159.305831
[02:08:35] Starting job 4,CPU time is 4623.666839.
[02:08:35] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[02:47:46] Finished Job #4 cpu time used 1174.079126
[02:47:46] Starting job 5,CPU time is 5797.745965.
[02:47:46] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[03:27:05] Finished Job #5 cpu time used 1167.729885
[03:27:05] Starting job 6,CPU time is 6965.475850.
[03:27:05] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[04:06:05] Finished Job #6 cpu time used 1168.322689
[04:06:05] Starting job 7,CPU time is 8133.798539.
[04:06:05] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[04:45:18] Finished Job #7 cpu time used 1175.311534
[04:45:18] Starting job 8,CPU time is 9309.110073.
[04:45:18] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[05:27:08] Finished Job #8 cpu time used 1225.263054
[05:27:08] Starting job 9,CPU time is 10534.373128.
[05:27:08] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[06:07:42] Finished Job #9 cpu time used 1215.341391
[06:07:42] Starting job 10,CPU time is 11749.714518.
[06:07:42] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[06:49:28] Finished Job #10 cpu time used 1251.096820
[06:49:28] Starting job 11,CPU time is 13000.811338.
[06:49:28] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[07:30:27] Finished Job #11 cpu time used 1227.181867
[07:30:27] Starting job 12,CPU time is 14227.993205.
[07:30:27] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[08:11:04] Finished Job #12 cpu time used 1216.558198
[08:11:04] Starting job 13,CPU time is 15444.551403.
[08:11:04] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[08:51:40] Finished Job #13 cpu time used 1216.636199
[08:51:40] Starting job 14,CPU time is 16661.187602.
[08:51:40] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[09:32:11] Finished Job #14 cpu time used 1220.583024
[09:32:11] Starting job 15,CPU time is 17881.770626.
[09:32:12] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[10:14:06] Finished Job #15 cpu time used 1238.725941
[10:14:06] Starting job 16,CPU time is 19120.496567.
[10:14:06] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[10:51:00] Finished Job #16 cpu time used 1106.031490
[10:51:00] Starting job 17,CPU time is 20226.528056.
[10:51:00] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[11:27:53] Finished Job #17 cpu time used 1104.175078
[11:27:53] Starting job 18,CPU time is 21330.703134.
[11:27:53] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[12:04:49] Finished Job #18 cpu time used 1107.185897
[12:04:49] Starting job 19,CPU time is 22437.889032.
[12:04:49] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[12:41:40] Finished Job #19 cpu time used 1103.925476
[12:41:40] Starting job 20,CPU time is 23541.814508.
[12:41:40] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[13:19:16] Finished Job #20 cpu time used 1125.765616
[13:19:16] Starting job 21,CPU time is 24667.580125.
[13:19:16] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[13:56:46] Finished Job #21 cpu time used 1106.187491
[13:56:46] Starting job 22,CPU time is 25773.767615.
[13:56:46] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[14:33:35] Finished Job #22 cpu time used 1103.114271
[14:33:35] Starting job 23,CPU time is 26876.881887.
[14:33:35] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0 |
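Editor's note: the figures in the post above (CPU time 07:46:29 against elapsed time 52:14:10, with the client set to 50% CPU time) suggest one rough way to spot a hung task: compare the share of elapsed time actually spent on the CPU with the throttle setting. The Python sketch below is only an illustration under stated assumptions. The function names and the 0.5 tolerance factor are hypothetical, not anything BOINC or World Community Grid provides, and a low share can have other causes (for example, a suspended client).

```python
def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_share(cpu_time, elapsed_time):
    """Fraction of the elapsed time that was spent on the CPU."""
    return hms_to_seconds(cpu_time) / hms_to_seconds(elapsed_time)


def looks_stuck(cpu_time, elapsed_time, throttle, tolerance=0.5):
    """Hypothetical heuristic: flag a task whose CPU share is well below
    the throttle setting. The 0.5 tolerance is an arbitrary assumption."""
    return cpu_share(cpu_time, elapsed_time) < throttle * tolerance


# Values reported for DSFL_00000060_0000016_0808_0 while it was hung.
share = cpu_share("07:46:29", "52:14:10")
print(f"CPU share: {share:.1%}")
print("Looks stuck:", looks_stuck("07:46:29", "52:14:10", throttle=0.50))
```

With the values from the post, the CPU share comes out at roughly 15% of elapsed time, far below the 50% throttle, so the heuristic flags the task. The post-reboot figures (07:28:02 CPU against 07:29:40 elapsed) would not be flagged.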
||
|
evanfitz
Cruncher Joined: Sep 14, 2011 Post Count: 3 Status: Offline Project Badges: |
I found four stuck work units out of five running this morning.
I closed the BOINC client, went to Process Explorer, and found that the 4 processes for those work units did not close, and the boinctray process is still listed as well. All non-stuck work units closed. Here is the state of those processes at that point.
OS is Win 7 Professional Service Pack 1, 32-bit
Processors are 2 Xeon X5450s
Using default settings; I think that is 60% CPU |
||
|
|