World Community Grid Forums
Thread Status: Active Total posts in this thread: 10
BobbyB
Veteran Cruncher Canada Joined: Apr 25, 2020 Post Count: 609 Status: Offline |
I have a WU (ARP) in a "waiting to run" state. This should mean it is waiting for a free CPU.
Yesterday I suspended all the other WUs that were waiting to start, to see whether the ARP WU would start once a running task finished and freed a CPU. It did not. So how can I get this "stuck" WU to resume? |
||
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 771 Status: Offline |
More information would help diagnose this: memory size, CPU, profile settings, other projects, etc.
This could be an over-commit of memory, or the result of requesting too many days of tasks. If some WUs are processing, set "no new tasks" and let some clear. Check the event log for messages and post any unusual ones here. If nothing obvious shows up, restart and post the first 50 or so messages.
Paul.
|
||
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2386 Status: Offline |
I assume you mean the ARP WU has been "waiting to run" for a long time. ARP WUs need 0.8 to 1 GB RAM per WU. I always see ARP take priority over OPN & HST. In BoincTasks the Status column will say "waiting for memory" if you're trying to run too many WUs for your RAM. BOINCmgr probably posts a message in a log somewhere.
|
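As a rough illustration of the RAM point above, here is a minimal sketch of how many ARP WUs would fit in a memory budget. The 0.8 to 1 GB per WU figure comes from the post; the host size, the fraction of RAM BOINC may use, and the function name are hypothetical values picked for the example, not anything read from a real client.

```python
import math

def max_concurrent_tasks(total_ram_gb, usable_fraction, ram_per_task_gb):
    """Return how many whole tasks fit in the RAM budget.

    All three arguments are illustrative inputs; BOINC itself makes this
    decision internally and may account for memory differently.
    """
    if ram_per_task_gb <= 0:
        raise ValueError("ram_per_task_gb must be positive")
    budget_gb = total_ram_gb * usable_fraction
    return math.floor(budget_gb / ram_per_task_gb)

# Hypothetical host: 4 GB of RAM, half of it available to BOINC.
low = max_concurrent_tasks(4.0, 0.5, 1.0)   # assuming 1 GB per ARP WU
high = max_concurrent_tasks(4.0, 0.5, 0.8)  # assuming 0.8 GB per ARP WU
print(low, high)
```

If the number of WUs you are trying to run exceeds this kind of estimate, a "waiting for memory" status would be the expected symptom.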
||
|
BobbyB
Veteran Cruncher Canada Joined: Apr 25, 2020 Post Count: 609 Status: Offline |
BOINC client version 7.16.6 for x86_64-pc-linux-gnu
Processor: 4 GenuineIntel Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz [Family 6 Model 37 Stepping 2]
Linux Ubuntu: Ubuntu 20.04.1 LTS [5.4.0-42-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]
"memory size, CPU, profile settings, other projects etc."
4GB memory, 40% used; default WCG profiles; only the WCG project.
"If some WUs are processing, set no new tasks and let some clear"
Had already tried that. When I checked, the queue was empty except for the stuck WU.
The log starts on the 17th, so there is not much in it, and nothing for ARP in what is there except me suspending and resuming in the hope of it restarting. I didn't want to have to reboot, but if I must, I will. It's what I do for my Windows machines.
"I always see ARP take priority over OPN & HST."
I see this too, so I am not imagining it.
[Edit 2 times, last edit by BobbyB at Sep 20, 2020 4:45:21 PM] |
||
|
BobbyB
Veteran Cruncher Canada Joined: Apr 25, 2020 Post Count: 609 Status: Offline |
The reboot did it so I'll never learn what was wrong.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If LAIM (leave applications in memory) is on, switch it off temporarily. That would unload the job and resume it from the last known checkpoint.
|
||
|
BobbyB
Veteran Cruncher Canada Joined: Apr 25, 2020 Post Count: 609 Status: Offline |
LAIM is off in both the preferences and the override, presuming 0 = off.
|
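For anyone wanting to double-check the LAIM value rather than eyeball it, here is a minimal sketch that reads the flag from a preferences XML document. It assumes the setting is stored in an element named `leave_apps_in_memory` and that the override file is `global_prefs_override.xml` in the BOINC data directory; both should be verified against your own install. The sample XML below is made up for the example.

```python
import xml.etree.ElementTree as ET

def laim_enabled(prefs_xml_text, tag="leave_apps_in_memory"):
    """Return True/False for the LAIM flag, or None if the tag is absent.

    The tag name is an assumption about the preferences file layout.
    """
    root = ET.fromstring(prefs_xml_text)
    node = root.find(f".//{tag}")
    text = (node.text or "").strip() if node is not None else ""
    if not text:
        return None
    # Treat any non-zero number as "on", and 0 as "off".
    return float(text) != 0

# Hypothetical override contents, for illustration only.
sample = """<global_preferences>
   <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>"""

print(laim_enabled(sample))
```

To check a real file, pass the text of the file instead of `sample`, for example `laim_enabled(open(path).read())` with `path` pointing at your override file.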
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Makes no sense, which suggests that client_state.xml somehow has a wrong bit. Since the client state is held permanently in memory, it's hard to look at, other than that every so many seconds a copy is written to storage, and the last copy gets copied to client_state_prev.xml
[Edit 1 times, last edit by Former Member at Sep 21, 2020 3:13:12 PM] |
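Since the on-disk copy is the only easy window into the client state, a minimal sketch for inspecting it might look like the following. It assumes the file contains `active_task` elements with `result_name` and `active_task_state` children; those element names are an assumption about the file layout and should be checked against a real client_state.xml. The task names in the sample are invented.

```python
import xml.etree.ElementTree as ET

def active_task_states(state_xml_text):
    """Map result name -> raw active_task_state text for each active task.

    Element names are assumed, not confirmed by this thread.
    """
    root = ET.fromstring(state_xml_text)
    states = {}
    for task in root.iter("active_task"):
        name = task.findtext("result_name")
        if name is not None:
            states[name] = task.findtext("active_task_state")
    return states

# Hypothetical excerpt, for illustration only.
sample_state = """<client_state>
  <active_task_set>
    <active_task>
      <result_name>ARP1_example_0</result_name>
      <active_task_state>0</active_task_state>
    </active_task>
    <active_task>
      <result_name>OPN1_example_1</result_name>
      <active_task_state>1</active_task_state>
    </active_task>
  </active_task_set>
</client_state>"""

print(active_task_states(sample_state))
```

Running the same function over client_state.xml and client_state_prev.xml and comparing the two dictionaries would show whether the stuck task's recorded state changes between copies.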
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Most probably a resource shortage of some kind. Any other manipulation by the client would have survived the reboot. If it were a wrong bit in memory, it would have been written to disk at shutdown and then reloaded on reboot.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The event log would state "waiting for memory"... oh, do we have a leak? ;
[Edit 1 times, last edit by Former Member at Sep 21, 2020 5:36:53 PM] |
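A minimal sketch of scanning saved event log text for that message, assuming the log has been exported to plain text lines (for instance by copying it out of the manager, or from `boinccmd --get_messages` if that is available on your install). The sample lines are invented and do not reproduce real BOINC log formatting.

```python
def find_log_messages(lines, phrase="waiting for memory"):
    """Return the lines that contain the phrase, ignoring case."""
    needle = phrase.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]

# Hypothetical log lines, for illustration only.
sample_log = [
    "20-Sep-2020 10:00:01 [World Community Grid] Starting task example_task_0\n",
    "20-Sep-2020 10:05:12 [World Community Grid] Task example_task_1 is Waiting for memory\n",
    "20-Sep-2020 10:06:30 [---] Suspending computation - user request\n",
]

hits = find_log_messages(sample_log)
print(len(hits))
```

An empty result would be consistent with BobbyB's report that the log showed nothing for ARP, though it would not by itself rule out a memory problem before the log's start date.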
||
|