Total posts in this thread: 49
Posts: 49   Pages: 5
This topic has been viewed 53302 times and has 48 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Stuck workunits

I've got a stuck work unit. Do you still need help troubleshooting these?
[Oct 18, 2011 9:03:08 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Stuck workunits

Hello Questar,
Armstrdj listed his questions at the start of this thread. Identify the stuck unit and look at it with Task Manager / Process Monitor. What OS, what version of BOINC, what throttle setting? Armstrdj emphasizes the throttle in 2 more short queries. Once you have answered those, reboot or take your preferred corrective action.

Once Armstrdj has it figured out, we will get a post about it.

Lawrence
[Oct 18, 2011 9:34:47 PM]
Gil II
Senior Cruncher
Canada
Joined: Dec 6, 2006
Post Count: 368
Status: Offline
Re: Stuck workunits

I aborted 2 stuck WUs this morning; I seem to get quite a few that get stuck. I have another one. I am running Windows 7 on an i7. Task Manager shows 0 CPU usage. DSFL_00000045_0000045_658_0 has an elapsed time of 33:57:24, is at 33.214% and shows 47:57:12 to completion.

I am aborting it as well.
[Oct 30, 2011 3:21:50 PM]
Gil II
Senior Cruncher
Canada
Joined: Dec 6, 2006
Post Count: 368
Status: Offline
Re: Stuck workunits

One more thing: I have the profile set to use 100%.
[Oct 30, 2011 3:30:09 PM]
robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 443
Status: Offline
Re: Stuck workunits

By the way, I cannot see the properties of any WU on this machine, and I do not know why. Every time I click on Properties (Win 7), the little window seems to appear but I am not able to see it (it is just shown as a miniature on the taskbar). Any suggestions?

Thanks


The usual procedure for this is to use the mouse to drag the boundaries of the little window outwards, if it appears at all.

If it appears only on the taskbar, clicking on it is usually adequate.
[Oct 31, 2011 10:58:00 PM]
robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 443
Status: Offline
Re: Stuck workunits

I have started having a problem with BOINC Manager losing the display it normally shows. The PC is a laptop running 24/7 under Windows XP. The system is set up to use 100% of the processors 80% of the time; I just changed it to 70% of the time. Two things mentioned in this thread that I know nothing about are LAIM and TThrottle. What is LAIM and where do I get TThrottle?

I haven't said it yet, but the only project I am running now is DSFL. In addition, if I reboot, everything seems to run fine. I am also going to shut down BOINC Manager in between examining the status of the WUs which are running. Will this cause a problem with transferring the results or reporting completed tasks?


See here for the TThrottle program:

http://efmer.eu/boinc/
[Oct 31, 2011 11:07:13 PM]
robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 443
Status: Offline
Re: Stuck workunits

6.12.33 (both 32 and 64 bit)

I am running BOINC 6.10.56... What am I missing by not moving to 6.12.33?


I'm now running 6.12.34. As far as I can tell, you're missing the Notices tab where you can get some messages from the projects, and very little else you're likely to care about.
----------------------------------------
[Edit 1 times, last edit by robertmiles at Oct 31, 2011 11:19:38 PM]
[Oct 31, 2011 11:14:06 PM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Re: Stuck workunits

Found one hung/stuck this morning.

No errors in computer system or application logs.

Have client set to use 50% CPU time. Device has been a good cruncher except for this problem.


After finding hung/stuck WU I performed the following steps:

1. Suspended client
2. Computer reboot
3. Resume client, task showed status "Running", CPU last checkpoint 07:27:56, CPU time 07:28:02, Elapsed time 07:29:40. Appears to be running normally now.



WU: DSFL_00000060_0000016_0808_0
State: Running
Estimated app speed: 1.95 GFLOPs/sec
Estimated task size: 67550 GFLOPs
Max RAM usage: 250 MB
CPU checkpoint: 07:27:56
CPU time: 07:46:29
Elapsed time: 52:14:10
Fraction done: 66.389%
Virtual memory size: 37.18 MB
Working set size: 34.05 MB
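
One quick way to read the properties above is to compare CPU time with elapsed time. The sketch below is only an illustration: the helper names `to_seconds` and `cpu_ratio` are made up here and are not part of BOINC. It converts the H:MM:SS values listed above and reports what share of wall-clock time the task spent on the CPU. It assumes the throttle is the only thing holding the task back, so with the client set to 50% CPU time a healthy task would be expected to sit somewhere near 0.5.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_ratio(cpu_time: str, elapsed_time: str) -> float:
    """Share of wall-clock time that was accounted for as CPU time."""
    return to_seconds(cpu_time) / to_seconds(elapsed_time)


# Values copied from the task properties above.
ratio = cpu_ratio("07:46:29", "52:14:10")
print(f"CPU time / elapsed time = {ratio:.3f}")
```

For this work unit the ratio comes out at roughly 0.15, well under the 0.5 that the 50% setting would suggest, which fits a task that sat idle for a long stretch.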


Task Manager DSFL hung processes:

wcg_dsfl_6.19_windows_intelx86*32 CPU 00%
wcg_dsfl_vina_6.19_windows_intelx86*32 CPU 00%


Computer info:

Client: BOINC client version 6.12.33 for windows_x86_64
CPU: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ [Family 15 Model 107 Stepping 1]
OS: Microsoft Windows 7 x64 Edition, Service Pack 1
Memory: 2 GB DDR2, 1.87 GB physical, 3.75 GB virtual
Disk: 74.40 GB total, 53.91 GB free



Logs:


stdout.txt

[00:12:43] [INFO] Checkpoint complete.
[00:51:08] [INFO] Checkpoint complete.
[01:29:53] [INFO] Checkpoint complete.
[02:08:35] [INFO] Checkpoint complete.
[02:47:46] [INFO] Checkpoint complete.
[03:27:05] [INFO] Checkpoint complete.
[04:06:05] [INFO] Checkpoint complete.
[04:45:18] [INFO] Checkpoint complete.
[05:27:08] [INFO] Checkpoint complete.
[06:07:42] [INFO] Checkpoint complete.
[06:49:28] [INFO] Checkpoint complete.
[07:30:27] [INFO] Checkpoint complete.
[08:11:04] [INFO] Checkpoint complete.
[08:51:40] [INFO] Checkpoint complete.
[09:32:11] [INFO] Checkpoint complete.
[10:14:06] [INFO] Checkpoint complete.
[10:51:00] [INFO] Checkpoint complete.
[11:27:53] [INFO] Checkpoint complete.
[12:04:49] [INFO] Checkpoint complete.
[12:41:40] [INFO] Checkpoint complete.
[13:19:16] [INFO] Checkpoint complete.
[13:56:46] [INFO] Checkpoint complete.
[14:33:35] [INFO] Checkpoint complete.
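
The checkpoint lines above can also be turned into gaps between checkpoints, which gives a feel for how long is "too long" without a new line. This is a rough sketch, not a WCG or BOINC tool: the function name `checkpoint_gaps` is hypothetical, and it assumes the bracketed stamps are time of day, so a stamp earlier than the previous one is treated as having crossed midnight once. The sample is the last four lines of the stdout.txt listing above.

```python
import re

DAY = 24 * 3600
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] \[INFO\] Checkpoint complete\.")

SAMPLE = """\
[12:41:40] [INFO] Checkpoint complete.
[13:19:16] [INFO] Checkpoint complete.
[13:56:46] [INFO] Checkpoint complete.
[14:33:35] [INFO] Checkpoint complete.
"""


def checkpoint_gaps(log_text: str) -> list[int]:
    """Return the seconds between consecutive checkpoint lines."""
    stamps = []
    for line in log_text.splitlines():
        match = STAMP.match(line.strip())
        if match:
            hours, minutes, seconds = (int(group) for group in match.groups())
            stamps.append(hours * 3600 + minutes * 60 + seconds)
    # The modulo handles a gap that runs over midnight (assumed at most once).
    return [(later - earlier) % DAY for earlier, later in zip(stamps, stamps[1:])]


gaps = checkpoint_gaps(SAMPLE)
for gap in gaps:
    print(f"{gap // 60} min {gap % 60} s")
```

In this excerpt the gaps are all around 37 minutes, so a task that has gone several times that long without a new checkpoint line is a reasonable candidate for a closer look.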

stderr.txt

INFO: No state to restore. Start from the beginning.
[23:34:11] Number of tasks = 36
[23:34:11] Starting job 0,CPU time is 0.000000.
[23:34:11] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[00:12:43] Finished Job #0 cpu time used 1155.577407
[00:12:43] Starting job 1,CPU time is 1155.577407.
[00:12:43] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[00:51:08] Finished Job #1 cpu time used 1147.605756
[00:51:08] Starting job 2,CPU time is 2303.183164.
[00:51:08] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[01:29:53] Finished Job #2 cpu time used 1161.177843
[01:29:53] Starting job 3,CPU time is 3464.361007.
[01:29:53] ./ZINC09276746.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[02:08:35] Finished Job #3 cpu time used 1159.305831
[02:08:35] Starting job 4,CPU time is 4623.666839.
[02:08:35] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[02:47:46] Finished Job #4 cpu time used 1174.079126
[02:47:46] Starting job 5,CPU time is 5797.745965.
[02:47:46] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[03:27:05] Finished Job #5 cpu time used 1167.729885
[03:27:05] Starting job 6,CPU time is 6965.475850.
[03:27:05] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[04:06:05] Finished Job #6 cpu time used 1168.322689
[04:06:05] Starting job 7,CPU time is 8133.798539.
[04:06:05] ./ZINC09276747.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[04:45:18] Finished Job #7 cpu time used 1175.311534
[04:45:18] Starting job 8,CPU time is 9309.110073.
[04:45:18] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[05:27:08] Finished Job #8 cpu time used 1225.263054
[05:27:08] Starting job 9,CPU time is 10534.373128.
[05:27:08] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[06:07:42] Finished Job #9 cpu time used 1215.341391
[06:07:42] Starting job 10,CPU time is 11749.714518.
[06:07:42] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[06:49:28] Finished Job #10 cpu time used 1251.096820
[06:49:28] Starting job 11,CPU time is 13000.811338.
[06:49:28] ./ZINC09276751.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[07:30:27] Finished Job #11 cpu time used 1227.181867
[07:30:27] Starting job 12,CPU time is 14227.993205.
[07:30:27] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[08:11:04] Finished Job #12 cpu time used 1216.558198
[08:11:04] Starting job 13,CPU time is 15444.551403.
[08:11:04] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[08:51:40] Finished Job #13 cpu time used 1216.636199
[08:51:40] Starting job 14,CPU time is 16661.187602.
[08:51:40] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[09:32:11] Finished Job #14 cpu time used 1220.583024
[09:32:11] Starting job 15,CPU time is 17881.770626.
[09:32:12] ./ZINC09276752.pdbqt size = 36 6 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[10:14:06] Finished Job #15 cpu time used 1238.725941
[10:14:06] Starting job 16,CPU time is 19120.496567.
[10:14:06] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[10:51:00] Finished Job #16 cpu time used 1106.031490
[10:51:00] Starting job 17,CPU time is 20226.528056.
[10:51:00] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[11:27:53] Finished Job #17 cpu time used 1104.175078
[11:27:53] Starting job 18,CPU time is 21330.703134.
[11:27:53] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[12:04:49] Finished Job #18 cpu time used 1107.185897
[12:04:49] Starting job 19,CPU time is 22437.889032.
[12:04:49] ./ZINC09276754.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[12:41:40] Finished Job #19 cpu time used 1103.925476
[12:41:40] Starting job 20,CPU time is 23541.814508.
[12:41:40] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[13:19:16] Finished Job #20 cpu time used 1125.765616
[13:19:16] Starting job 21,CPU time is 24667.580125.
[13:19:16] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[13:56:46] Finished Job #21 cpu time used 1106.187491
[13:56:46] Starting job 22,CPU time is 25773.767615.
[13:56:46] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
[14:33:35] Finished Job #22 cpu time used 1103.114271
[14:33:35] Starting job 23,CPU time is 26876.881887.
[14:33:35] ./ZINC09276756.pdbqt size = 35 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000060.pdbqt size = 1310 0
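
The stderr.txt lines above pair a wall-clock stamp with the CPU seconds used by each job, so the effective CPU share per job can be estimated from them. The following is a minimal sketch under stated assumptions: `cpu_share_per_job` is a hypothetical helper, the bracketed stamps are taken to be time of day, and each job is assumed to finish within 24 hours of starting. The excerpt is copied from the first two jobs in the log above.

```python
import re

DAY = 24 * 3600
START = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\] Starting job (\d+),")
FINISH = re.compile(
    r"^\[(\d\d):(\d\d):(\d\d)\] Finished Job #(\d+) cpu time used ([\d.]+)"
)

STDERR_EXCERPT = """\
[23:34:11] Starting job 0,CPU time is 0.000000.
[00:12:43] Finished Job #0 cpu time used 1155.577407
[00:12:43] Starting job 1,CPU time is 1155.577407.
[00:51:08] Finished Job #1 cpu time used 1147.605756
"""


def _clock(hours: str, minutes: str, seconds: str) -> int:
    """Seconds since midnight for one bracketed stamp."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def cpu_share_per_job(log_text: str) -> dict[int, float]:
    """Map job number to CPU seconds divided by wall-clock seconds."""
    started: dict[int, int] = {}
    shares: dict[int, float] = {}
    for line in log_text.splitlines():
        line = line.strip()
        if m := START.match(line):
            started[int(m.group(4))] = _clock(*m.group(1, 2, 3))
        elif m := FINISH.match(line):
            job = int(m.group(4))
            if job not in started:
                continue  # finish line without a matching start line
            wall = (_clock(*m.group(1, 2, 3)) - started[job]) % DAY
            if wall:
                shares[job] = float(m.group(5)) / wall
    return shares


shares = cpu_share_per_job(STDERR_EXCERPT)
for job, share in sorted(shares.items()):
    print(f"job {job}: {share:.2f} of wall-clock time on the CPU")
```

For these two jobs the share is close to 0.50, which is consistent with the 50% CPU time setting mentioned earlier in this post. The last "Starting job 23" line in the log has no matching "Finished" line, so a helper like this would simply report nothing for it.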
[Nov 15, 2011 2:03:37 PM]
evanfitz
Cruncher
Joined: Sep 14, 2011
Post Count: 3
Status: Offline
Re: Stuck workunits

I found four stuck work units out of five running this morning.



I closed the BOINC client, went to Process Explorer and found that the 4 processes for those work units did not close, and the boinctray process was still listed as well. All non-stuck work units closed. Here is the state of those processes at that point.



OS is Win 7 Professional Service Pack 1 32 bit
Processors are 2 Xeon x5450s
Using default settings; I think that is 60% CPU
[Nov 21, 2011 2:19:38 PM]