World Community Grid Forums
Thread Status: Active Total posts in this thread: 6
Dataman
Ace Cruncher Joined: Nov 16, 2004 Post Count: 4865 Status: Offline
I have two WUs that have stopped making any CPU progress. They look very similar to the stuck WUs I sometimes get in the Lesh. project. Both are still running but making no progress.

DFAM_1df7_TBdhfrDry_0000246_0655_1: running almost 17 hours, with CPU stuck at 06:00:20
DFAM_1df7_TBdhfrDry_0000231_0014_1: running almost 4 hours, with CPU stuck at 00:29:50

I tried suspending/resuming the tasks, stopping/starting BOINC, and rebooting the machine, all with no effect. I will look at them again in about an hour; if there is no progress I will blow them away.

Win7, i7 970 processor, BOINC 6.12.34, GPUs are idle.
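As a rough illustration of what "stuck" means in the report above, the sketch below compares two CPU-time readings taken some minutes apart, in the HH:MM:SS form quoted in the post. The function names, the threshold parameter, and the idea of sampling twice are assumptions made for illustration; none of this is part of BOINC or anything WCG provides.

```python
def parse_hms(text):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stuck(cpu_before, cpu_after, min_progress_s=1):
    """Return True if CPU time advanced by less than min_progress_s
    seconds between the two readings (hypothetical helper)."""
    return parse_hms(cpu_after) - parse_hms(cpu_before) < min_progress_s


if __name__ == "__main__":
    # Same CPU time on both readings, as with the first WU in the post.
    print(looks_stuck("06:00:20", "06:00:20"))
    # A made-up second reading that shows progress, for contrast.
    print(looks_stuck("00:29:50", "00:31:10"))
```

The readings still have to be taken by hand (or by some other tool) while the task is marked as running; a suspended task will also show no CPU progress.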
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3007 Status: Offline
Dataman, before you 'blow them away', it might be worthwhile taking a copy of all the data within the particular slots and letting the techs have a copy. You never know, it may help them in some way...
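One way to take such a copy before aborting a task is sketched below. It is only an illustration: the function name and the timestamped folder naming are made up, and the slot path in the usage comment is an assumption about a typical Windows install, not something stated in this thread. Files held open by a running task may fail to copy, so suspending the task first is probably wise.

```python
import shutil
import time
from pathlib import Path


def snapshot_slot(slot_dir, backup_root):
    """Copy one BOINC slot directory into a timestamped folder
    under backup_root and return the new folder's path."""
    slot_dir = Path(slot_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{slot_dir.name}-{stamp}"
    # copytree creates the target (and any missing parents) itself.
    shutil.copytree(slot_dir, target)
    return target


# Example call (hypothetical paths; adjust to your own BOINC data directory):
# snapshot_slot(r"C:\ProgramData\BOINC\slots\3", r"D:\stuck_wu_backups")
```

The resulting folder can then be zipped and passed on to the techs if they ask for it.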
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline
Dataman,
Are you familiar with the Process Explorer application, and do you have it? If so, can you launch it the next time you get a stuck vina workunit and look at something? In Process Explorer, find the process listing for vina; it should have a name like wcg_gfam_vina_6.08_windows_intel86. Right-click on it and select Properties from the pop-up menu. In the Properties window that opens there is a Threads tab. Click on it and there should be 1 or 2 threads listed. Highlight each and look at the state value below. What state do the threads show?

Also, do you run CEP2 on a machine that has stuck workunits for DSFL or GFAM, and if so, does CEP2 ever have stuck workunits?

Thanks,
armstrdj
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Who's going to draft a best-practices "Stuck WU" actions FAQ (Windows-oriented, as that's the platform that suffers these events)?

Include items such as:

A) Info needed for debug:
- Capturing the result info in the stuck state (as armstrdj asks with PE).
- References to useful help topics on the internet, such as these for PE: "Ten Best Practices / Troubleshooting Tips with Process Explorer Tool" and "Process Explorer, Part 2 (On Thread Analysis)".
- Link to the Process Explorer tool, of course.
- Slot information and message log copy.
- Debug flags to set.

B) Actions / workarounds / tips:
- Stop the client; LAIM (Leave Applications In Memory) on/off to force memory unload on suspend of the troubled task.
- BOINC throttle to 100%, using TThrottle instead if the throttle is motivated by temperature control (its *only* design purpose).
- Use of "While the processor usage is greater than XX suspend computing" (works a dream on my Linux box, btw, to prevent heartbeat issues on the heavy-duty CEP2).

Open a new forum thread and polish it live in iterations, or stick it in the WCG wiki so it can be collaboratively tweaked. Then we'll include it in the Start Here FAQs and you'll be added to the FAQ Hall of Fame (full credit with honors).

Who's game? Don't be shy; share your skills to further the cause!

--//--
||
|
pcwr
Ace Cruncher England Joined: Sep 17, 2005 Post Count: 10903 Status: Offline
One of mine got stuck at 95%; after a laptop reboot, it completed.
Patrick
||
|
Dataman
Ace Cruncher Joined: Nov 16, 2004 Post Count: 4865 Status: Offline
Hi armstrdj,

Normally I would post everything, but we are currently in an RV ~2,000 miles from where my servers are. I have been managing them remotely using some software and the help of my house sitters. I only have a connection when we stop and get WiFi, which lets me connect to my CA network.

I can say I am running only GFAM and have had only one recurrence. I had a similar problem with the DSFL Beta. I don't run CEP2 and have not run DSFL for a week or more.

I hope someone else can post the information you need; I cannot get into the files you need.

Beautiful morning in San Antonio ... cheers!