World Community Grid Forums
Category: Completed Research | Forum: Help Defeat Cancer | Thread: HDC Work Unit Paused and Next Started Without Interference
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043
Per the log below, an HDC WU started, paused by itself after 24 minutes of CPU time, and the next HDC WU was started in its place:
2006-11-16 21:05:32 [---] Starting B10277_0006_CTMA1Aa-4-0-2_1
2006-11-16 21:05:33 [World Community Grid] Starting task B10277_0006_CTMA1Aa-4-0-2_1 using hdc version 514
2006-11-16 21:05:35 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_0
2006-11-16 21:05:35 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_1
2006-11-16 21:05:38 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_1
2006-11-16 21:05:38 [World Community Grid] Throughput 132 bytes/sec
2006-11-16 21:05:38 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_2
2006-11-16 21:05:39 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_0
2006-11-16 21:05:39 [World Community Grid] Throughput 79 bytes/sec
2006-11-16 21:05:39 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_3
2006-11-16 21:05:41 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_2
2006-11-16 21:05:41 [World Community Grid] Throughput 91 bytes/sec
2006-11-16 21:05:41 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_4
2006-11-16 21:05:43 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_3
2006-11-16 21:05:43 [World Community Grid] Throughput 87 bytes/sec
2006-11-16 21:05:43 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_5
2006-11-16 21:05:49 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_4
2006-11-16 21:05:49 [World Community Grid] Throughput 6076 bytes/sec
2006-11-16 21:05:49 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_6
2006-11-16 21:05:51 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_5
2006-11-16 21:05:51 [World Community Grid] Throughput 5947 bytes/sec
2006-11-16 21:05:51 [World Community Grid] Started upload of file B10277_0011_CTMA1Aa-4-0-7_1_7
2006-11-16 21:05:55 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_6
2006-11-16 21:05:55 [World Community Grid] Throughput 2366 bytes/sec
2006-11-16 21:05:57 [World Community Grid] Finished upload of file B10277_0011_CTMA1Aa-4-0-7_1_7
2006-11-16 21:05:57 [World Community Grid] Throughput 1548 bytes/sec
2006-11-16 21:10:20 [World Community Grid] Sending scheduler request: Requested by user
2006-11-16 21:10:20 [World Community Grid] Reporting 1 tasks
2006-11-16 21:10:20 [---] [http_debug] HTTP_OP::init_post(): https://secure.worldcommunitygrid.org/boinc/wcg_cgi/fcgi
2006-11-16 21:10:25 [World Community Grid] Scheduler RPC succeeded [server version 507]
2006-11-16 21:42:52 [---] Starting B10277_0003_CTMA1Aa-4-0-13_0
2006-11-16 21:42:52 [World Community Grid] Starting task B10277_0003_CTMA1Aa-4-0-13_0 using hdc version 514

There are no entries in the error log. Suspending the subsequent unit did not resume the paused WU. The paused WU was unloaded from memory, even though keep-in-memory when preempted is on. Exiting BOINC and restarting did not resume the paused WU either; BOINC resumed the next one instead.

According to the result status page, the other two members of the quorum have already completed the paused WU:

Workunit Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
B10277_0006_CTMA1Aa-4-0-2 | Pending Validation | 11/14/2006 15:28:48 | 11/14/2006 19:06:35 | 2.85 | 61 / 0
B10277_0006_CTMA1Aa-4-0-2 | In Progress | 11/14/2006 15:25:51 | 11/21/2006 15:25:51 | 0.00 | 0 / 0
B10277_0006_CTMA1Aa-4-0-2 | Pending Validation | 11/14/2006 15:25:44 | 11/15/2006 00:34:48 | 5.90 | 57 / 0

The full message log in the .txt files has no entries on the problem WU. Is there a sequence of steps to make it restart, or should it be cancelled?
Device 34409
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! [Edit 3 times, last edit by Sekerob at Nov 16, 2006 10:34:56 PM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Are you using the alpha client?
What's the deadline on the work unit that preempted it?
Sekerob
Both have identical deadlines, to the second, on Nov. 21. When I suspended the subsequent WU in progress (at 58 minutes), it remained in memory and BOINC started downloading new work rather than resuming the paused one. What concerns me is that the paused WU was removed from memory and the next one in the queue was started.
Yes, alpha 5.7.2, but I've done 20 or so WUs with it without a single problem. I read that the new version has done away with EDF (earliest deadline first) mode and has better checkpoint-based switching, but suspending all the other projects did not make them resume... they are overworked according to BOINCview :o The paused WU is stuck at exactly 1.500%, if that helps to tell where it is in the startup sequence. Unless knreed et al. can check whether the 2 WUs already sent in for the quorum have an exception in their logs, I guess I'll cancel it and see what happens. No rush, as it is seemingly dead. Nighty night... I'll see tomorrow morning whether dreams moved it on.
Sekerob
Well, dreams moved it on... After about 5 wall-clock hours, BOINC suspended the very long HDC #2 (after 4:22 hours, at 48%), which has the same deadline date/time stamp, and then went back to the first one. This suggests it does not recognize that both belong to the same project.
There is a third WU with the same date/time stamp, which obscures the logic further. There are absolutely no entries in the message screen, other than one saying it restarted HDC #1, which after 4:21 hours is only at 50%. Interestingly, if the new versions are able to wait for exact checkpoints before switching projects, 'keep in memory' could be turned off, reducing RAM needs considerably. I'll see what I can find out at the BOINC dev forum. The second one was not unloaded from memory, so now I have 2 memory-hungry HDCs occupying RAM on 1 thread.
Edit: Found a thread reporting scheduling issues with long WUs and added this observation to it: http://boinc.berkeley.edu/dev/forum_thread.php?id=1312
[Edit 1 times, last edit by Sekerob at Nov 17, 2006 8:33:14 AM]
Former Member
Quote: "The second one was not unloaded from memory, so now I have 2 memory-hungry HDCs occupying RAM on 1 thread."
That is what virtual memory is for. Applications left in memory but not being accessed get paged out to the hard drive, leaving RAM available for other processes.
Sekerob
Regrettably not, Lawrence. With keep-in-memory on, the portion that was in physical RAM remains in physical RAM (that's what Task Manager has been telling me all along); the paused HDC #2 is still taking 125 MB. In the past, with k.i.m. off, BOINC would just unload the RAM part and lose crunching time back to the last checkpoint. Anyway, the new method is fine... Now that switching in the alpha theoretically works without time loss, I'm switching off the k.i.m. setting and will observe.
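To make the keep-in-memory trade-off concrete, here is a minimal sketch of how much CPU time a preempted task would have to redo under each setting. It assumes a hypothetical model in which the task writes a checkpoint every fixed number of CPU minutes, counted from its start; the function name, parameters, and the 10-minute interval in the usage line are illustrative assumptions, not HDC or BOINC figures.

```python
def lost_cpu_minutes(cpu_minutes: float, checkpoint_interval: float,
                     keep_in_memory: bool, wait_for_checkpoint: bool) -> float:
    """Return CPU minutes that must be recomputed after a project switch.

    Hypothetical model: the task writes a checkpoint every
    `checkpoint_interval` CPU minutes, counted from the start of the task.
    """
    if keep_in_memory:
        # The process stays in RAM (costing memory) and continues exactly
        # where it was paused, so nothing is recomputed.
        return 0.0
    if wait_for_checkpoint:
        # The client delays the switch until a checkpoint has just been
        # written, so unloading the process loses nothing.
        return 0.0
    # Otherwise the process is unloaded and later restarts from its last
    # checkpoint, so all work done since that checkpoint is repeated.
    return cpu_minutes % checkpoint_interval


# HDC #2 was paused after 4:22 hours = 262 CPU minutes (hypothetical 10-minute checkpoints).
print(lost_cpu_minutes(262, 10, keep_in_memory=False, wait_for_checkpoint=False))
```

Under these assumptions, keep-in-memory buys zero lost time at the price of RAM (125 MB for the paused HDC #2 above), while checkpoint-aware switching would achieve the same zero loss without holding the process in memory.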
Ciao, and happy birthday, to WCG of course :D