| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 2
|
|
| Author |
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
I have a curious case with boinccmd --get_tasks. About 10 hours ago, 4 ARP1 tasks and some others were running when a bunch of FAH2 tasks arrived. The ARP1 tasks were at 87.5%, 59.2%, 53.8% and 10.2% and paused as soon as the FAH2 tasks started. So far, so good.

Now, after 10 hours, there are still enough FAH2 tasks in my queue to keep the CPUs busy, and 24 FAH2 tasks have been returned in the meantime. What I'm seeing now is that all four ARP1 tasks are 'Waiting to run' in Boinc Manager, but when I look with boinccmd --get_tasks, two of them are SUSPENDED and the other two are UNINITIALIZED. The latter is curious. I have never seen this behaviour before: tasks that have been running for hours going from SUSPENDED to UNINITIALIZED.

***

The two UNINITIALIZED ARP1 tasks are sitting at the top of my queue, numbered 1 and 2 in boinccmd --get_tasks. To find out whether this UNINITIALIZED state is a problem, I first suspended all uninitialized FAH2 tasks, then one running FAH2 task, to see which 'Waiting to run' ARP1 task would start. Neither of the UNINITIALIZED ARP1 tasks resumed; one SUSPENDED ARP1 task did continue.

This is the state of the 4 ARP1 tasks in boinccmd --get_tasks:

======== Tasks ========

This is what an executing FAH2 task and an uninitialized FAH2 task look like in boinccmd --get_tasks:

78) -----------

My fear is that the UNINITIALIZED ARP1 tasks will be stuck, or worse, after the FAH2 tasks have left.

***

In client_state.xml, the two UNINITIALIZED ARP1 tasks already have a final time:

#1: <name>ARP1_0022898_013_1</name>
#2: <name>ARP1_0025860_012_0</name>

… while the other two (SUSPENDED) ARP1 units don't have a final time (they're still zero):

#3: <name>ARP1_0030492_013_0</name>
#4: <name>ARP1_0028420_013_1</name>

Isn't this curious? So what I'm planning to do, maybe tomorrow, maybe tonight, is remove the final time elements from the <result> entry of the UNINITIALIZED tasks in client_state.xml to see if that remedies it.

[Edit 2 times, last edit by adriverhoef at Jun 2, 2020 10:40:54 AM] |
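For anyone who wants to check their own queue for the same oddity, here is a minimal sketch that scans `boinccmd --get_tasks` style output for tasks that report UNINITIALIZED even though they have already made progress. It assumes the output consists of numbered task headers followed by indented `key: value` lines, and that the keys are called `name`, `active_task_state` and `fraction done`. The sample text and the task names in it are made up for illustration; they are not the output from my machine.

```python
import re

# Hypothetical sample in the assumed `boinccmd --get_tasks` layout.
SAMPLE = """\
======== Tasks ========
1) -----------
   name: EXAMPLE_TASK_A
   scheduler state: preempted
   active_task_state: UNINITIALIZED
   fraction done: 0.875000
2) -----------
   name: EXAMPLE_TASK_B
   scheduler state: preempted
   active_task_state: SUSPENDED
   fraction done: 0.538000
3) -----------
   name: EXAMPLE_TASK_C
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   fraction done: 0.000000
"""


def parse_tasks(text):
    """Split the output into one dict of key/value strings per task."""
    tasks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\d+\) -+", line):
            # A header such as "1) -----------" starts a new task.
            current = {}
            tasks.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return tasks


def started_but_uninitialized(tasks):
    """Names of tasks with progress > 0 that still report UNINITIALIZED."""
    return [
        t.get("name", "?")
        for t in tasks
        if t.get("active_task_state") == "UNINITIALIZED"
        and float(t.get("fraction done", "0")) > 0
    ]


if __name__ == "__main__":
    for name in started_but_uninitialized(parse_tasks(SAMPLE)):
        print(name)
```

To run it against a live client, the sample string could be replaced by the captured output of `boinccmd --get_tasks`. A task that has not started yet (fraction done 0) is expected to be UNINITIALIZED, which is why only tasks with progress are listed.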
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Well, I was a bit curious (of course) about what would happen if I paused all non-running tasks except for one ARP1 task (the one that was sitting there, UNINITIALIZED and 'Waiting to run' at 59.2%, #2 in the queue), and then paused just one running FAH2 task, so that this ARP1 task could resume. Would it run at all? Would it resume at 59.2%?
$ wcgresults -NCLOSED1P | head -5

***

Well, it did resume, but it jumped back to its last checkpoint: 50%. In any case, that's better than returning to 0% or erroring out completely. I still have the other UNINITIALIZED ARP1 task at 87.5%, four seconds after its last checkpoint.

$ wcgresults -NSO_CL1PPED | head -5

Not afraid.

(Download wcgresults for Linux here.)

[Edit 1 times, last edit by adriverhoef at Jun 2, 2020 1:28:35 PM] |
||
|
|
|