Total posts in this thread: 54
This topic has been viewed 231045 times and has 53 replies
I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Status: Offline
Re: Tasks are not checkpointing properly

can you tell anything from this??

Result Log

Result Name: E200397_ 510_ A.25.C18H10N6S.3.4.set1d06_ 1--



<core_client_version>6.10.56</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[03:16:21] Number of jobs = 16
[03:16:21] Starting job 0,CPU time has been restored to 0.000000.
[03:16:21] Starting new Job
[03:16:21] Qink name = fldman
[03:16:21] Qink name = gesman
[03:16:21] Qink name = scfman
[03:18:25] Qink name = anlman
[03:18:27] End of Job
[03:18:29] Finished Job #0
[03:18:29] Starting job 1,CPU time has been restored to 111.098943.
[03:18:29] Starting new Job
[03:18:29] Qink name = fldman
[03:18:30] Qink name = gesman
[03:18:30] Qink name = scfman
[03:24:50] Qink name = anlman
[03:25:28] End of Job
[03:25:31] Finished Job #1
[03:25:31] Starting job 2,CPU time has been restored to 366.906930.
[03:25:31] Starting new Job
[03:25:31] Qink name = fldman
[03:25:31] Qink name = gesman
[03:25:31] Qink name = scfman
[03:29:34] Qink name = anlman
[03:29:34] Qink name = drvman
[03:30:45] Qink name = optman
[03:30:45] Qink name = fldman
[03:30:45] Qink name = gesman
[03:30:45] Qink name = scfman
[03:38:56] Qink name = anlman
[03:38:56] Qink name = drvman
[03:40:18] Qink name = optman
[03:40:19] Qink name = fldman
[03:40:19] Qink name = gesman
[03:40:19] Qink name = scfman
[03:47:39] Qink name = anlman
[03:47:39] Qink name = drvman
[03:48:51] Qink name = optman
[03:48:51] Qink name = fldman
[03:48:51] Qink name = gesman
[03:48:51] Qink name = scfman
[03:55:52] Qink name = anlman
[03:55:52] Qink name = drvman
[03:57:01] Qink name = optman
[03:57:02] Qink name = fldman
[03:57:02] Qink name = gesman
[03:57:02] Qink name = scfman
[04:03:38] Qink name = anlman
[04:03:38] Qink name = drvman
[04:04:50] Qink name = optman
[04:04:50] Qink name = fldman
[04:04:50] Qink name = gesman
[04:04:50] Qink name = scfman
[04:12:13] Qink name = anlman
[04:12:13] Qink name = drvman
[04:13:24] Qink name = optman
[04:13:24] Qink name = fldman
[04:13:24] Qink name = gesman
[04:13:25] Qink name = scfman
[04:20:06] Qink name = anlman
[04:20:06] Qink name = drvman
[04:21:15] Qink name = optman
[04:21:15] Qink name = fldman
[04:21:15] Qink name = gesman
[04:21:15] Qink name = scfman
[04:28:07] Qink name = anlman
[04:28:07] Qink name = drvman
[09:04:29] Number of jobs = 16
[09:04:29] Starting job 2,CPU time has been restored to 366.906930.
[09:04:32] Starting new Job
[09:04:32] Qink name = fldman
[09:04:36] Qink name = gesman
[09:04:36] Qink name = scfman
Quit requested: Exiting
[10:10:06] Number of jobs = 16
[10:10:06] Starting job 2,CPU time has been restored to 366.906930.
[10:10:09] Starting new Job
[10:10:09] Qink name = fldman
[10:10:15] Qink name = gesman
[10:10:15] Qink name = scfman
[10:14:18] Qink name = anlman
[10:14:18] Qink name = drvman
[10:15:31] Qink name = optman
[10:15:31] Qink name = fldman
[10:15:31] Qink name = gesman
[10:15:32] Qink name = scfman
[10:23:20] Qink name = anlman
[10:23:20] Qink name = drvman
[10:24:31] Qink name = optman
[10:24:31] Qink name = fldman
[10:24:31] Qink name = gesman
[10:24:32] Qink name = scfman
[10:31:51] Qink name = anlman
[10:31:52] Qink name = drvman
[10:33:01] Qink name = optman
[10:33:01] Qink name = fldman
[10:33:01] Qink name = gesman
[10:33:02] Qink name = scfman
[10:39:57] Qink name = anlman
[10:39:57] Qink name = drvman
[10:41:06] Qink name = optman
[10:41:06] Qink name = fldman
[10:41:06] Qink name = gesman
[10:41:06] Qink name = scfman
[10:47:39] Qink name = anlman
[10:47:39] Qink name = drvman
[10:48:51] Qink name = optman
[10:48:51] Qink name = fldman
[10:48:51] Qink name = gesman
[10:48:52] Qink name = scfman
[10:55:55] Qink name = anlman
[10:55:55] Qink name = drvman
[10:57:05] Qink name = optman
[10:57:05] Qink name = fldman
[10:57:05] Qink name = gesman
[10:57:05] Qink name = scfman
[11:03:29] Qink name = anlman
[11:03:29] Qink name = drvman
[11:04:37] Qink name = optman
[11:04:37] Qink name = fldman
[11:04:37] Qink name = gesman
[11:04:38] Qink name = scfman
[11:11:22] Qink name = anlman
[11:11:22] Qink name = drvman
[11:12:31] Qink name = optman
[11:12:32] Qink name = fldman
[11:12:32] Qink name = gesman
[11:12:32] Qink name = scfman
[11:17:52] Qink name = anlman
[11:17:52] Qink name = drvman
[11:19:00] Qink name = optman
[11:19:00] Qink name = fldman
[11:19:00] Qink name = gesman
[11:19:00] Qink name = scfman
[11:24:14] Qink name = anlman
[11:24:14] Qink name = drvman
[11:25:23] Qink name = optman
[11:25:23] Qink name = fldman
[11:25:23] Qink name = gesman
[11:25:24] Qink name = scfman
[11:30:18] Qink name = anlman
[11:30:18] Qink name = drvman
[11:31:27] Qink name = optman
[11:31:27] Qink name = fldman
[11:31:27] Qink name = gesman
[11:31:27] Qink name = scfman
[11:36:05] Qink name = anlman
[11:36:05] Qink name = drvman
[11:37:13] Qink name = optman
[11:37:13] Qink name = anlman
[11:37:43] End of Job
[11:37:45] Finished Job #2
[11:37:45] Starting job 3,CPU time has been restored to 620.750794.
[11:37:46] Starting new Job
[11:37:46] Qink name = fldman
[11:37:46] Qink name = gesman
[11:37:46] Qink name = scfman
[11:44:51] Qink name = anlman
[11:45:26] End of Job
[11:45:29] Finished Job #3
[11:45:29] Starting job 4,CPU time has been restored to 874.990683.
[11:45:29] Starting new Job
[11:45:29] Qink name = fldman
[11:45:30] Qink name = gesman
[11:45:30] Qink name = scfman
[11:50:12] Qink name = anlman
[11:50:44] End of Job
[11:50:46] Finished Job #4
[11:50:46] Starting job 5,CPU time has been restored to 1129.058561.
[11:50:46] Starting new Job
[11:50:46] Qink name = fldman
[11:50:47] Qink name = gesman
[11:50:47] Qink name = scfman
[13:30:01] Qink name = anlman
[13:30:27] End of Job
[13:30:30] Finished Job #5
[13:30:30] Starting job 6,CPU time has been restored to 1383.810482.
[13:30:30] Starting new Job
[13:30:31] Qink name = fldman
[13:30:31] Qink name = gesman
[13:30:31] Qink name = scfman
[13:35:06] Qink name = anlman
[13:35:33] End of Job
[13:35:36] Finished Job #6
[13:35:36] Starting job 7,CPU time has been restored to 1638.702411.
[13:35:36] Starting new Job
[13:35:36] Qink name = fldman
[13:35:37] Qink name = gesman
[13:35:37] Qink name = scfman
[13:42:15] Qink name = anlman
[13:42:42] End of Job
[13:42:45] Finished Job #7
[13:42:45] Starting job 8,CPU time has been restored to 1895.606466.
[13:42:45] Starting new Job
[13:42:45] Qink name = fldman
[13:42:46] Qink name = gesman
[13:42:46] Qink name = scfman
[13:47:22] Qink name = anlman
[13:47:50] End of Job
[13:47:52] Finished Job #8
[13:47:52] Starting job 9,CPU time has been restored to 2151.794476.
[13:47:52] Starting new Job
[13:47:52] Qink name = fldman
[13:47:53] Qink name = gesman
[13:47:53] Qink name = scfman
[13:52:50] Qink name = anlman
[13:53:32] End of Job
[13:53:35] Finished Job #9
[13:53:35] Starting job 10,CPU time has been restored to 2404.770286.
[13:53:35] Starting new Job
[13:53:35] Qink name = fldman
[13:53:36] Qink name = gesman
[13:53:36] Qink name = scfman
[14:05:47] Qink name = anlman
[14:06:27] End of Job
[14:06:30] Finished Job #10
[14:06:30] Starting job 11,CPU time has been restored to 2660.934295.
[14:06:30] Starting new Job
[14:06:30] Qink name = fldman
[14:06:30] Qink name = gesman
[14:06:30] Qink name = scfman
[14:12:47] Qink name = anlman
[14:13:28] End of Job
[14:13:30] Finished Job #11
[14:13:30] Starting job 12,CPU time has been restored to 2914.254126.
[14:13:30] Starting new Job
[14:13:31] Qink name = fldman
[14:13:33] Qink name = gesman
[14:13:34] Qink name = scfman
[14:48:59] Qink name = anlman
[14:57:32] End of Job
[14:57:35] Finished Job #12
[14:57:35] Starting job 13,CPU time has been restored to 3173.410322.
[14:57:36] Starting new Job
[14:57:36] Qink name = fldman
[14:57:39] Qink name = gesman
[14:57:39] Qink name = scfman
[16:30:51] Qink name = anlman
[16:39:25] End of Job
[16:39:29] Finished Job #13
[16:39:29] Starting job 14,CPU time has been restored to 3427.134178.
[16:39:29] Starting new Job
[16:39:29] Qink name = fldman
[16:39:32] Qink name = gesman
[16:39:32] Qink name = scfman
[18:09:53] Qink name = anlman
[18:17:15] End of Job
[18:17:19] Finished Job #14
[18:17:19] Starting job 15,CPU time has been restored to 3680.898037.
[18:17:19] Starting new Job
[18:17:19] Qink name = fldman
[18:17:22] Qink name = gesman
[18:17:22] Qink name = scfman
[19:58:08] Qink name = anlman
[20:05:43] End of Job
[20:05:46] Finished Job #15
called boinc_finish
Exiting 0

</stderr_txt>
]]>
----------------------------------------

[Oct 5, 2010 6:26:57 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Tasks are not checkpointing properly

Not without the CPU time / percent progress annotation. E.g. the next 4 lines, if the task ran uninterrupted, suggest these 2 steps took about 1:40 hours and 1:33 hours respectively on the wall clock. How much CPU time got added for those 2 jobs, 5 and 13?

[11:50:47] Qink name = scfman
[13:30:01] Qink name = anlman

[14:57:39] Qink name = scfman
[16:30:51] Qink name = anlman

What is disturbing is that it started at 3:16 and ended at 20:05, which is basically 17 hours. What (CPU) time was recorded on the Result Status page? Riddle.
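For anyone who wants to redo that arithmetic, here is a minimal Python sketch. The timestamps and the "CPU time has been restored to" values are copied from the result log posted above. The helper name `elapsed` is made up for illustration, and treating the "restored to" figure as cumulative CPU seconds at each checkpoint is an assumption, not something the log states.

```python
from datetime import datetime, timedelta

def elapsed(start: str, end: str) -> timedelta:
    """Wall-clock time between two HH:MM:SS stamps (at most one midnight crossing)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        # The log carries no dates, so a negative difference is read as "next day".
        delta += timedelta(days=1)
    return delta

# scfman -> anlman stamps for jobs 5 and 13, as quoted above
job5 = elapsed("11:50:47", "13:30:01")
job13 = elapsed("14:57:39", "16:30:51")

# First and last stamps of the whole result log
whole_task = elapsed("03:16:21", "20:05:46")

# Difference between the "restored to" values at the start of jobs 6 and 5,
# and of jobs 14 and 13 (assumed to be cumulative CPU seconds)
cpu_job5 = 1383.810482 - 1129.058561
cpu_job13 = 3427.134178 - 3173.410322

print("job 5 wall clock: ", job5)
print("job 13 wall clock:", job13)
print("whole task:       ", whole_task)
print("job 5 CPU delta (s): ", round(cpu_job5, 1))
print("job 13 CPU delta (s):", round(cpu_job13, 1))
```

The log has no dates, so the whole-task figure is only right if the run did not cross midnight between the first and last stamp.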
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Oct 5, 2010 6:43:46 PM]
I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Status: Offline
Re: Tasks are not checkpointing properly

Not without the CPU time / percent progress annotation.


And I guess for this I need the checkpoint debugging? I am working on figuring this out. I can't seem to find my config file. I will try to do this later. Gotta go to the gym now!
----------------------------------------

[Oct 5, 2010 6:52:35 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Tasks are not checkpointing properly

Switching on checkpoint debugging allows real-time monitoring so you can correlate. Without it you still could, by noting the wall-clock and CPU time, but you'd need an old Manager or BOINCTasks. The new Manager only shows CPU time in the task properties, and I don't remember seeing that update while the pop-up is on screen.
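As a rough sketch of what switching it on can look like: to my understanding the BOINC client reads log flags from a `cc_config.xml` file in its data directory, and `checkpoint_debug` is the flag that logs a message each time an application checkpoints. The file name, element names and flag are from my recollection of the client configuration docs, not from this thread, so check them against your client version. The Python below only builds and prints the XML; it does not touch any existing file.

```python
import xml.etree.ElementTree as ET

def build_cc_config() -> str:
    """Build a minimal cc_config.xml body with checkpoint logging enabled.

    Assumption: the client accepts <cc_config><log_flags><checkpoint_debug>.
    """
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    ET.SubElement(flags, "checkpoint_debug").text = "1"
    return ET.tostring(root, encoding="unicode")

print(build_cc_config())
```

If you already have a `cc_config.xml`, merge the flag into its existing `log_flags` section rather than replacing the file. The data directory location differs per operating system, and the client has to re-read its config files or be restarted before the flag takes effect.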

Still, why did this task run 17 hours? Losing checkpoint time should not lose the actual computed data/progress... hmmm, maybe the full result was computed whereas otherwise it would have cut off at 12 hours... presuming this client is set to Run Always, i.e. no throttling of any kind.

How does the log compare to the wingman's?

Edit: as for gym, today for the second day in a row I was mowing grass for 2 hours in 28 °C, just before the thunderstorm that was predicted, and it came right on the button... that was my gym for today... must have lost a litre in sweat.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Oct 5, 2010 7:09:12 PM]
[Oct 5, 2010 7:02:07 PM]
yose-ue
Cruncher
Joined: Dec 27, 2008
Post Count: 21
Status: Offline
Re: Tasks are not checkpointing properly

The following is the result log of a job I aborted after over 20 hours; it claimed to be less than 20% complete at the time (2.39 hours CPU time when aborted). I know the percent complete will go back to a small % and the difference between clock time and CPU time will jump.
Result Name: E200391_ 465_ A.25.C15H8N8S2.11.2.set1d06_ 1--
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[01:05:44] Number of jobs = 16
[01:05:44] Starting job 0,CPU time has been restored to 0.000000.
[01:05:44] Starting new Job
[01:05:44] Qink name = fldman
[01:05:45] Qink name = gesman
[01:05:45] Qink name = scfman
[01:10:32] Qink name = anlman
[01:10:35] End of Job
[01:10:38] Finished Job #0
[01:10:38] Starting job 1,CPU time has been restored to 244.687292.
[01:10:38] Starting new Job
[01:10:38] Qink name = fldman
[01:10:39] Qink name = gesman
[01:10:40] Qink name = scfman
[01:26:47] Qink name = anlman
[01:27:50] End of Job
[01:27:53] Finished Job #1
[01:27:53] Starting job 2,CPU time has been restored to 502.611411.
[01:27:53] Starting new Job
[01:27:53] Qink name = fldman
[01:27:54] Qink name = gesman
[01:27:55] Qink name = scfman
[01:40:54] Qink name = anlman
[01:40:54] Qink name = drvman
[01:45:00] Qink name = optman
[01:45:00] Qink name = fldman
[01:45:00] Qink name = gesman
[01:45:01] Qink name = scfman
[02:08:05] Qink name = anlman
[02:08:05] Qink name = drvman
[02:12:00] Qink name = optman
[02:12:00] Qink name = fldman
[02:12:00] Qink name = gesman
[02:12:01] Qink name = scfman
[02:34:42] Qink name = anlman
[02:34:42] Qink name = drvman
[02:38:37] Qink name = optman
[02:38:37] Qink name = fldman
[02:38:37] Qink name = gesman
[02:38:38] Qink name = scfman
[03:00:22] Qink name = anlman
[03:00:23] Qink name = drvman
[03:04:16] Qink name = optman
[03:04:16] Qink name = fldman
[03:04:16] Qink name = gesman
[03:04:18] Qink name = scfman
[03:25:01] Qink name = anlman
[03:25:01] Qink name = drvman
[03:28:55] Qink name = optman
[03:28:56] Qink name = fldman
[03:28:56] Qink name = gesman
[03:28:57] Qink name = scfman
[03:49:38] Qink name = anlman
[03:49:38] Qink name = drvman
[03:53:32] Qink name = optman
[03:53:33] Qink name = fldman
[03:53:33] Qink name = gesman
[03:53:34] Qink name = scfman
[04:11:44] Qink name = anlman
[04:11:44] Qink name = drvman
[04:15:40] Qink name = optman
[04:15:40] Qink name = fldman
[04:15:40] Qink name = gesman
[04:15:41] Qink name = scfman
[04:32:32] Qink name = anlman
[04:32:32] Qink name = drvman
[04:36:26] Qink name = optman
[04:36:26] Qink name = fldman
[04:36:26] Qink name = gesman
[04:36:28] Qink name = scfman
[04:51:14] Qink name = anlman
[04:51:14] Qink name = drvman
[04:55:08] Qink name = optman
[04:55:08] Qink name = fldman
[04:55:08] Qink name = gesman
[04:55:10] Qink name = scfman
[05:09:51] Qink name = anlman
[05:09:51] Qink name = drvman
[05:13:46] Qink name = optman
[05:13:46] Qink name = fldman
[05:13:46] Qink name = gesman
[05:13:47] Qink name = scfman
[05:28:26] Qink name = anlman
[05:28:26] Qink name = drvman
[05:32:21] Qink name = optman
[05:32:21] Qink name = fldman
[05:32:21] Qink name = gesman
[05:32:23] Qink name = scfman
[05:45:41] Qink name = anlman
[05:45:41] Qink name = drvman
[05:49:36] Qink name = optman
[05:49:36] Qink name = fldman
[05:49:36] Qink name = gesman
[05:49:37] Qink name = scfman
[06:01:53] Qink name = anlman
[06:01:53] Qink name = drvman
[06:05:47] Qink name = optman
[06:05:47] Qink name = anlman
[06:06:48] End of Job
[06:06:51] Finished Job #2
[06:06:51] Starting job 3,CPU time has been restored to 759.991496.
[06:06:51] Starting new Job
[06:06:51] Qink name = fldman
[06:06:52] Qink name = gesman
[06:06:53] Qink name = scfman
[06:24:39] Qink name = anlman
[06:25:40] End of Job
[06:25:43] Finished Job #3
[06:25:43] Starting job 4,CPU time has been restored to 1011.043185.
[06:25:43] Starting new Job
[06:25:43] Qink name = fldman
[06:25:44] Qink name = gesman
[06:25:44] Qink name = scfman
[06:37:54] Qink name = anlman
[06:38:54] End of Job
[06:38:57] Finished Job #4
[06:38:57] Starting job 5,CPU time has been restored to 1268.415269.
[06:38:57] Starting new Job
[06:38:57] Qink name = fldman
[06:38:58] Qink name = gesman
[06:38:58] Qink name = scfman
[06:52:15] Qink name = anlman
[06:53:15] End of Job
[06:53:18] Finished Job #5
[06:53:18] Starting job 6,CPU time has been restored to 1525.483334.
[06:53:19] Starting new Job
[06:53:19] Qink name = fldman
[06:53:20] Qink name = gesman
[06:53:20] Qink name = scfman
[07:05:24] Qink name = anlman
[07:06:24] End of Job
[07:06:27] Finished Job #6
[07:06:27] Starting job 7,CPU time has been restored to 1782.495396.
[07:06:28] Starting new Job
[07:06:28] Qink name = fldman
[07:06:29] Qink name = gesman
[07:06:29] Qink name = scfman
Quit requested: Exiting
[07:20:32] Number of jobs = 16
[07:20:32] Starting job 7,CPU time has been restored to 1782.495396.
[07:20:37] Starting new Job
[07:20:38] Qink name = fldman
[07:20:39] Qink name = gesman
[07:20:39] Qink name = scfman
[07:41:01] Qink name = anlman
[07:42:16] End of Job
[07:42:19] Finished Job #7
[07:42:19] Starting job 8,CPU time has been restored to 2039.295445.
[07:42:19] Starting new Job
[07:42:20] Qink name = fldman
[07:42:23] Qink name = gesman
[07:42:23] Qink name = scfman
[07:57:23] Qink name = anlman
[07:58:43] End of Job
[07:58:45] Finished Job #8
[07:58:45] Starting job 9,CPU time has been restored to 2298.379636.
[07:58:46] Starting new Job
[07:58:46] Qink name = fldman
[07:58:49] Qink name = gesman
[07:58:49] Qink name = scfman
[08:14:37] Qink name = anlman
[08:16:18] End of Job
[08:16:21] Finished Job #9
[08:16:21] Starting job 10,CPU time has been restored to 2554.195623.
[08:16:21] Starting new Job
[08:16:21] Qink name = fldman
[08:16:22] Qink name = gesman
[08:16:22] Qink name = scfman
[08:50:38] Qink name = anlman
[08:52:33] End of Job
[08:52:35] Finished Job #10
[08:52:35] Starting job 11,CPU time has been restored to 2812.147744.
[08:52:35] Starting new Job
[08:52:35] Qink name = fldman
[08:52:36] Qink name = gesman
[08:52:36] Qink name = scfman
[09:09:23] Qink name = anlman
[09:11:11] End of Job
[09:11:15] Finished Job #11
[09:11:15] Starting job 12,CPU time has been restored to 3068.527766.
[09:11:15] Starting new Job
[09:11:15] Qink name = fldman
[09:11:21] Qink name = gesman
[09:11:22] Qink name = scfman
[11:47:09] Qink name = anlman
[11:55:05] End of Job
[11:55:08] Finished Job #12
[11:55:08] Starting job 13,CPU time has been restored to 3325.123802.
[11:55:09] Starting new Job
[11:55:09] Qink name = fldman
[11:55:15] Qink name = gesman
[11:55:15] Qink name = scfman
[16:01:21] Qink name = anlman
[16:09:08] End of Job
[16:09:12] Finished Job #13
[16:09:12] Starting job 14,CPU time has been restored to 3580.479760.
[16:09:13] Starting new Job
[16:09:13] Qink name = fldman
[16:09:19] Qink name = gesman
[16:09:19] Qink name = scfman
Quit requested: Exiting
[05:34:52] Number of jobs = 16
[05:34:52] Starting job 14,CPU time has been restored to 3580.479760.
[05:34:58] Starting new Job
[05:34:58] Qink name = fldman
[05:35:12] Qink name = gesman
[05:35:12] Qink name = scfman
[10:06:13] Qink name = anlman
[10:13:40] End of Job
[10:13:46] Finished Job #14
[10:13:46] Starting job 15,CPU time has been restored to 3835.595703.
[10:13:46] Starting new Job
[10:13:46] Qink name = fldman
[10:13:52] Qink name = gesman
[10:13:53] Qink name = scfman
Abort requested: Exiting

</stderr_txt>
]]>
[Oct 5, 2010 7:40:13 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: Tasks are not checkpointing properly

I snipped out part of your log. Comments follow below it.

can you tell anything from this??

Result Log
.
.
.
[03:25:31] Finished Job #1
[03:25:31] Starting job 2,CPU time has been restored to 366.906930.
[03:25:31] Starting new Job
[03:25:31] Qink name = fldman
[03:25:31] Qink name = gesman
[03:25:31] Qink name = scfman
[03:29:34] Qink name = anlman
[03:29:34] Qink name = drvman
[03:30:45] Qink name = optman
[03:30:45] Qink name = fldman
[03:30:45] Qink name = gesman
[03:30:45] Qink name = scfman
[03:38:56] Qink name = anlman
[03:38:56] Qink name = drvman
[03:40:18] Qink name = optman
[03:40:19] Qink name = fldman
[03:40:19] Qink name = gesman
[03:40:19] Qink name = scfman
[03:47:39] Qink name = anlman
[03:47:39] Qink name = drvman
[03:48:51] Qink name = optman
[03:48:51] Qink name = fldman
[03:48:51] Qink name = gesman
[03:48:51] Qink name = scfman
[03:55:52] Qink name = anlman
[03:55:52] Qink name = drvman
[03:57:01] Qink name = optman
[03:57:02] Qink name = fldman
[03:57:02] Qink name = gesman
[03:57:02] Qink name = scfman
[04:03:38] Qink name = anlman
[04:03:38] Qink name = drvman
[04:04:50] Qink name = optman
[04:04:50] Qink name = fldman
[04:04:50] Qink name = gesman
[04:04:50] Qink name = scfman
[04:12:13] Qink name = anlman
[04:12:13] Qink name = drvman
[04:13:24] Qink name = optman
[04:13:24] Qink name = fldman
[04:13:24] Qink name = gesman
[04:13:25] Qink name = scfman
[04:20:06] Qink name = anlman
[04:20:06] Qink name = drvman
[04:21:15] Qink name = optman
[04:21:15] Qink name = fldman
[04:21:15] Qink name = gesman
[04:21:15] Qink name = scfman
[04:28:07] Qink name = anlman
[04:28:07] Qink name = drvman *****************
[09:04:29] Number of jobs = 16
[09:04:29] Starting job 2,CPU time has been restored to 366.906930.
[09:04:32] Starting new Job
[09:04:32] Qink name = fldman
[09:04:36] Qink name = gesman
[09:04:36] Qink name = scfman
Quit requested: Exiting
[10:10:06] Number of jobs = 16
[10:10:06] Starting job 2,CPU time has been restored to 366.906930.
[10:10:09] Starting new Job
[10:10:09] Qink name = fldman
[10:10:15] Qink name = gesman
[10:10:15] Qink name = scfman
[10:14:18] Qink name = anlman
[10:14:18] Qink name = drvman
[10:15:31] Qink name = optman
[10:15:31] Qink name = fldman
[10:15:31] Qink name = gesman
[10:15:32] Qink name = scfman
[10:23:20] Qink name = anlman
[10:23:20] Qink name = drvman
[10:24:31] Qink name = optman
[10:24:31] Qink name = fldman
[10:24:31] Qink name = gesman
[10:24:32] Qink name = scfman
[10:31:51] Qink name = anlman
[10:31:52] Qink name = drvman
[10:33:01] Qink name = optman
[10:33:01] Qink name = fldman
[10:33:01] Qink name = gesman
[10:33:02] Qink name = scfman
[10:39:57] Qink name = anlman
[10:39:57] Qink name = drvman
[10:41:06] Qink name = optman
[10:41:06] Qink name = fldman
[10:41:06] Qink name = gesman
[10:41:06] Qink name = scfman
[10:47:39] Qink name = anlman
[10:47:39] Qink name = drvman
[10:48:51] Qink name = optman
[10:48:51] Qink name = fldman
[10:48:51] Qink name = gesman
[10:48:52] Qink name = scfman
[10:55:55] Qink name = anlman
[10:55:55] Qink name = drvman
[10:57:05] Qink name = optman
[10:57:05] Qink name = fldman
[10:57:05] Qink name = gesman
[10:57:05] Qink name = scfman
[11:03:29] Qink name = anlman
[11:03:29] Qink name = drvman
[11:04:37] Qink name = optman
[11:04:37] Qink name = fldman
[11:04:37] Qink name = gesman
[11:04:38] Qink name = scfman
[11:11:22] Qink name = anlman
[11:11:22] Qink name = drvman
[11:12:31] Qink name = optman
[11:12:32] Qink name = fldman
[11:12:32] Qink name = gesman
[11:12:32] Qink name = scfman
[11:17:52] Qink name = anlman
[11:17:52] Qink name = drvman
[11:19:00] Qink name = optman
[11:19:00] Qink name = fldman
[11:19:00] Qink name = gesman
[11:19:00] Qink name = scfman
[11:24:14] Qink name = anlman
[11:24:14] Qink name = drvman
[11:25:23] Qink name = optman
[11:25:23] Qink name = fldman
[11:25:23] Qink name = gesman
[11:25:24] Qink name = scfman
[11:30:18] Qink name = anlman
[11:30:18] Qink name = drvman
[11:31:27] Qink name = optman
[11:31:27] Qink name = fldman
[11:31:27] Qink name = gesman
[11:31:27] Qink name = scfman
[11:36:05] Qink name = anlman
[11:36:05] Qink name = drvman
[11:37:13] Qink name = optman
[11:37:13] Qink name = anlman
[11:37:43] End of Job
[11:37:45] Finished Job #2
[11:37:45] Starting job 3,CPU time has been restored to 620.750794.
[11:37:46] Starting new Job
[11:37:46] Qink name = fldman
[11:37:46] Qink name = gesman
[11:37:46] Qink name = scfman
[11:44:51] Qink name = anlman
[11:45:26] End of Job
.
.
.


I don't know *what* happened, but it looks as if it happened at 04:28:07.

As you know, CEP2 WUs consist of 16 jobs, and checkpointing happens only as each one is completed. Job #2 is always way longer than any of the others, so if something happens (BOINC is shut down or whatever) while Job #2 is well under way, a lot of CPU time is lost, since the WU goes back to the start of Job #2 on its restart.

Looks like your WU was almost to the end of Job #2 at 04:28:07. Then there's an unexplained gap of 4-1/2 hours. Then the job restarts, going back to the checkpoint at the beginning of Job 2. Exit is requested only a little ways into Job #2; WU restarts an hour later, again going back to the beginning of Job #2, and finally runs straight through to the end from there.

So the mystery is what happened at 04:28:07 that caused processing to stop for 4-1/2 hours. The rest of the checkpointing is just the way CEP2 does things.
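The same reading of the log can be sketched in Python. This is a rough illustration rather than any official tool: the function names (`find_restarts`, `elapsed`) are made up, the sample lines are a trimmed excerpt of the log quoted above, and the "discarded" figure is wall-clock time since the job was last started, not CPU time.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")
START = re.compile(r"Starting job (\d+),")

def elapsed(start: str, end: str) -> timedelta:
    """Wall-clock time between two HH:MM:SS stamps (at most one midnight crossing)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta + timedelta(days=1) if delta < timedelta(0) else delta

def find_restarts(log: str):
    """Return (job, last stamp before stop, resume stamp, idle gap, discarded wall clock)."""
    restarts = []
    job_started_at = {}      # job number -> stamp of its most recent start
    prev_stamp = None        # stamp of the previous timestamped line
    before_header = None     # last stamp seen before a "Number of jobs" line
    for raw in log.splitlines():
        m = STAMP.match(raw.strip())
        if not m:
            continue         # e.g. "Quit requested: Exiting" carries no stamp
        stamp, text = m.groups()
        if text.startswith("Number of jobs"):
            before_header = prev_stamp
        s = START.match(text)
        if s:
            job = int(s.group(1))
            if job in job_started_at:
                last = before_header if before_header is not None else prev_stamp
                restarts.append((job, last, stamp,
                                 elapsed(last, stamp),
                                 elapsed(job_started_at[job], last)))
            job_started_at[job] = stamp
            before_header = None
        prev_stamp = stamp
    return restarts

SAMPLE = """\
[03:25:31] Starting job 2,CPU time has been restored to 366.906930.
[04:28:07] Qink name = anlman
[04:28:07] Qink name = drvman
[09:04:29] Number of jobs = 16
[09:04:29] Starting job 2,CPU time has been restored to 366.906930.
[09:04:36] Qink name = scfman
Quit requested: Exiting
[10:10:06] Number of jobs = 16
[10:10:06] Starting job 2,CPU time has been restored to 366.906930.
[11:37:45] Finished Job #2
"""

for job, last, resumed, idle, discarded in find_restarts(SAMPLE):
    print(f"job {job}: stopped after {last}, resumed {resumed}, "
          f"idle {idle}, wall clock discarded {discarded}")
```

On the excerpt this reports two restarts of Job 2: one after the gap that begins at 04:28:07, and one after the requested quit.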
----------------------------------------

[Oct 5, 2010 8:12:49 PM]
I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Status: Offline
Re: Tasks are not checkpointing properly

Okay. Is this time UT or local?
----------------------------------------

[Oct 5, 2010 8:40:41 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Tasks are not checkpointing properly

Your time, which makes it easier to relate to other event log files.

This bit in a just finished CEP2 Result log file looks kinky, not showing in the wingman log, but both ran all 16 jobs:

Parent was killed, exiting
[22:09:24] Qink name = anlman
[22:17:39] End of Job
[22:17:43] Finished Job #15
called boinc_finish
Exiting 0

</stderr_txt>
]]>

Edit: From BOINCTasks, a 37 minute gap... quite normal here

6.19 cep2 E200409_083_A.26.C17H7N5S4.8.3.set1d06_0 06:44:30 (06:07:15) 05-10-2010 22:18 05-10-2010 22:23 Reported: OK
----------------------------------------
[Edit 1 times, last edit by Sekerob at Oct 5, 2010 8:54:11 PM]
[Oct 5, 2010 8:50:56 PM]
I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Status: Offline
Re: Tasks are not checkpointing properly

How are you using BOINCTasks? I googled and it looks like it's only for Windows.
----------------------------------------

[Oct 5, 2010 8:55:53 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Tasks are not checkpointing properly

Remote monitoring from a Windows machine ;>)
----------------------------------------
[Oct 5, 2010 8:57:51 PM]