World Community Grid Forums
Thread Status: Active Total posts in this thread: 10
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
I noticed that FAH2_ avx17377-ls_ 000055_ 0011_ 001_ 0-- shows as aborted on my end, but the server reports it as valid. The CPU time is the same on both (21.12 hrs), but the server reports 0:00 elapsed time. Reduced credit was given. There is NO results log.
----------------------------------------
Cheers
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I hope this doesn't count as hijacking your thread, Nix, but that's a useful thread title you've chosen.

So here's another strange result, after I successfully processed an _2 unit and wondered what the earlier Invalid was.

Project Name: FightAIDS@Home - Phase 2
Created: 09/30/2015 15:59:28
Name: FAH2_avx101118_000086_0009_001
Minimum Quorum: 1
Replication: 2

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
FAH2_ avx101118_ 000086_ 0009_ 001_ 2-- | 714 | Valid | 04/10/15 17:07:20 | 06/10/15 06:26:22 | 10.44 | 376.3 / 370.4 << mine
FAH2_ avx101118_ 000086_ 0009_ 001_ 1-- | 714 | Valid | 04/10/15 17:07:19 | 06/10/15 01:37:19 | 25.03 | 370.4 / 370.4
FAH2_ avx101118_ 000086_ 0009_ 001_ 0-- | 714 | Invalid | 30/09/15 18:00:00 | 04/10/15 12:53:52 | 13.91 | 340.7 / 340.7

Result Name: FAH2_ avx101118_ 000086_ 0009_ 001_ 0--
<core_client_version>7.6.6</core_client_version>
<![CDATA[
<stderr_txt>
[19:12:31] INFO:Turning trickle messaging on.
[19:12:31] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist. Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.07000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[19:21:16] INFO: Checkpoint skipped. Progress 1000/100000 CPU time 518.094921
[19:29:50] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 1025.784576
[19:38:16] INFO: Checkpoint skipped. Progress 3000/100000 CPU time 1530.385410
[19:46:38] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 2027.966200
[19:55:02] INFO: Checkpoint skipped. Progress 5000/100000 CPU time 2529.010212
[20:03:25] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 3026.700202
[20:11:47] INFO: Checkpoint skipped. Progress 7000/100000 CPU time 3526.215404
[20:20:12] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 4027.555818
[20:28:38] INFO: Checkpoint skipped. Progress 9000/100000 CPU time 4530.908644
[20:37:05] INFO: Sending trickle message to server.
[20:37:05] INFO: Starting intermediate upload, index = 1
[20:37:05] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 5033.169464
[20:45:28] INFO: Checkpoint skipped. Progress 11000/100000 CPU time 5534.478677
[20:53:49] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 6034.025079
(snipped middle part - alternating checkpoints and skipped checkpoints)
[07:33:05] INFO: Checkpointed. Progress 88000 of 100000 steps complete CPU time 44139.063341
[07:41:25] INFO: Checkpoint skipped. Progress 89000/100000 CPU time 44637.470936
[07:49:40] INFO: Sending trickle message to server.
[07:49:40] INFO: Starting intermediate upload, index = 9
[07:49:40] INFO: Checkpointed. Progress 90000 of 100000 steps complete CPU time 45129.638491
[07:58:02] INFO: Checkpoint skipped. Progress 91000/100000 CPU time 45624.536063
[08:06:20] INFO: Checkpointed. Progress 92000 of 100000 steps complete CPU time 46119.371235
[08:14:41] INFO: Checkpoint skipped. Progress 93000/100000 CPU time 46617.513628
[08:23:03] INFO: Checkpointed. Progress 94000 of 100000 steps complete CPU time 47116.966430
[08:31:20] INFO: Checkpoint skipped. Progress 95000/100000 CPU time 47608.385180
[08:39:35] INFO: Checkpointed. Progress 96000 of 100000 steps complete CPU time 48102.518348
[08:47:56] INFO: Checkpoint skipped. Progress 97000/100000 CPU time 48601.565547
[08:56:14] INFO: Checkpointed. Progress 98000 of 100000 steps complete CPU time 49096.619120
[09:04:33] INFO: Checkpoint skipped. Progress 99000/100000 CPU time 49593.685106
[09:12:54] INFO: Checkpointed. Progress 100000 of 100000 steps complete CPU time 50089.565485
%IMPACT-I: Species 1 written to SQL file md-out1.dms
%IMPACT-I: Species 2 written to SQL file md-out2.dms
09:12:55 (1168): called boinc_finish(0)

Note the skipped checkpoints, something I'd not seen previously in any results. (Caused by a high "Checkpoint to disk at most every" setting?) Are they the cause of the subsequent Invalid? If so, is that a bug?
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
"Checkpoint skipped" just means that the WtD (write to disk) interval is set longer than the time it takes to get from one checkpoint opportunity to the next. The science app asks the client once, at start, what the allowed interval is, and keeps this "at most" interval until the end. If you change WtD, the app keeps the old value until the end, unless it is restarted [unloaded from memory].
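The alternating pattern in the posted stderr log can be reproduced with a small sketch. This is not the actual BOINC or science-app code: the function name `checkpoint_decisions`, the 600-second interval, and the choice to measure the interval in CPU time are all assumptions made for illustration. The CPU times are the first four from the posted log.

```python
def checkpoint_decisions(opportunity_times, min_interval):
    """Decide, for each checkpoint opportunity, whether to write or skip.

    opportunity_times: times (seconds) at which the app could checkpoint.
    min_interval: the "at most every" interval, read once at start.
    """
    decisions = []
    last_checkpoint = 0.0  # assumption: the clock starts at task start
    for t in opportunity_times:
        if t - last_checkpoint >= min_interval:
            decisions.append("checkpointed")
            last_checkpoint = t
        else:
            decisions.append("skipped")
    return decisions


# CPU times (seconds) at steps 1000, 2000, 3000 and 4000 from the posted log.
cpu_times = [518.094921, 1025.784576, 1530.385410, 2027.966200]

# Hypothetical interval: longer than one ~500 s block, shorter than two.
for t, decision in zip(cpu_times, checkpoint_decisions(cpu_times, 600.0)):
    print(f"{t:10.1f} s  {decision}")
```

Under these assumptions, any interval longer than one block of 1000 steps but shorter than two gives the same skipped/checkpointed alternation seen in the log, while an interval shorter than one block would checkpoint at every opportunity.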
|
||
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
It looks like the client was told to quit. There were at least two prior trickle-up messages sent, so the client was at least 30% done when told to stop. Here is my client log:

10/5/2015 10:41:32 AM Started upload of FAH2_avx17377-ls_000055_0011_001_0_6
10/5/2015 10:41:32 AM Started upload of FAH2_avx17377-ls_000055_0011_001_0_16
10/5/2015 10:41:35 AM Sending scheduler request: To send trickle-up message.
10/5/2015 10:41:35 AM Not requesting tasks: don't need
10/5/2015 10:41:40 AM Scheduler request completed
10/5/2015 10:41:45 AM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_6
10/5/2015 10:41:45 AM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_16
10/5/2015 11:00:05 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 11:17:50 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 11:39:46 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 11:54:25 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 12:15:49 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 12:35:53 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 12:51:57 PM Sending scheduler request: To send trickle-up message.
10/5/2015 12:51:57 PM Not requesting tasks: don't need
10/5/2015 12:52:00 PM Scheduler request completed
10/5/2015 12:54:18 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:12:28 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:32:21 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:50:45 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:50:46 PM Started upload of FAH2_avx17377-ls_000055_0011_001_0_7
10/5/2015 1:50:46 PM Started upload of FAH2_avx17377-ls_000055_0011_001_0_17
10/5/2015 1:50:46 PM Sending scheduler request: To send trickle-up message.
10/5/2015 1:50:46 PM Not requesting tasks: don't need
10/5/2015 1:50:50 PM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_7
10/5/2015 1:50:50 PM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_17
10/5/2015 1:50:50 PM Scheduler request completed
10/5/2015 1:50:50 PM Result FAH2_avx17377-ls_000055_0011_001_0 is no longer usable
10/5/2015 1:50:52 PM Computation for task FAH2_avx17377-ls_000055_0011_001_0 finished
10/5/2015 1:52:52 PM Sending scheduler request: To report completed tasks.
10/5/2015 1:52:52 PM Reporting 1 completed tasks

Server log:
FAH2_ avx17377-ls_ 000055_ 0011_ 001_ 0-- Venus-4 Valid 10/2/15 02:23:27 10/5/15 20:25:34 21.12 / 0.00 272.0 / 233.1

What is also strange is that the time stamp in the server log has no correlation to that in the client log. So why is there no results log showing the output up to the stop message?
----------------------------------------
Cheers
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Just as the sent times and deadlines shown in your client are local times, while the Result Status pages use UTC, the event and result logs will contain your local clock times.

10/5/2015 1:52:52 PM Reporting 1 completed tasks
Server log
FAH2_ avx17377-ls_ 000055_ 0011_ 001_ 0-- Venus-4 Valid 10/2/15 02:23:27 10/5/15 20:25:34 21.12 / 0.00 272.0 / 233.1

Assuming you're somewhere in NA, your 1:52:52 PM (13:52:52) would be plausible with a 5-8 hour offset to UTC, but why are the minutes so far apart, 33 or 27 depending on which direction you go around the globe? I remember that in Australia I incurred an 8.5 hour offset between home and Adelaide, because of the half-hour zones they have. Maybe the local system time is not synced with internet time? Guess, guess, guess.

[Edit 1 times, last edit by SekeRob* at Oct 6, 2015 5:10:52 PM]
||
|
|
JEklund2
Advanced Cruncher Finland Joined: Aug 10, 2006 Post Count: 119 Status: Offline Project Badges:
|
.. I also guess that it might be related to the "time difference" .. I just lost eight running tasks, and the only reason I can think of is also related to time issues .. I have a dual-boot system (normally running Windows 10, but also running Linux Mint) .. After starting Mint and then coming back to Win10, I noticed that the clock had run "backwards" (it was three hours behind the actual wall clock time in Finland) .. After some googling I found that in Linux it should be set to "UTC=no"; when it was UTC=yes, Linux (maybe?) modified the clock, and once back in Win10 I had to move it "forward" again ..
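The three-hour jump described here fits the usual dual-boot clock mismatch, which can be sketched as follows. The sketch assumes that Linux with UTC=yes stores UTC in the hardware clock while Windows, by default, reads the same value as local time, and that Finland was on summer time (UTC+3) in early October 2015. The function name and the sample wall-clock time are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: Finland in early October 2015 is on summer time, UTC+3.
LOCAL_OFFSET = timedelta(hours=3)


def clock_shown_by_reader(true_local, writer_uses_utc, reader_uses_utc):
    """Return the time the second OS displays after the first OS set the clock."""
    # The first OS stores either UTC or local time in the hardware clock.
    rtc = true_local - LOCAL_OFFSET if writer_uses_utc else true_local
    # The second OS interprets the stored value using its own convention.
    return rtc + LOCAL_OFFSET if reader_uses_utc else rtc


true_local = datetime(2015, 10, 6, 12, 0, 0)  # hypothetical wall-clock time

# Linux (UTC=yes) writes the clock, then Windows reads it as local time.
shown = clock_shown_by_reader(true_local, writer_uses_utc=True, reader_uses_utc=False)
print("Windows shows:", shown, "| behind by:", true_local - shown)
```

When both systems use the same convention (both local time, or both UTC), the function returns the true local time, which is why setting UTC=no on the Linux side removes the jump.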
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
I think if you set the time in BIOS upon startup, either OS should be picking up the time from there.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
||
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
SekeRob wrote:
Just as the sent times and deadlines shown in your client are local times, while the Result Status pages use UTC, the event and result logs will contain your local clock times.
10/5/2015 1:52:52 PM Reporting 1 completed tasks
Server log
FAH2_ avx17377-ls_ 000055_ 0011_ 001_ 0-- Venus-4 Valid 10/2/15 02:23:27 10/5/15 20:25:34 21.12 / 0.00 272.0 / 233.1
Assuming you're somewhere in NA, your 1:52:52 PM (13:52:52) would be plausible with a 5-8 hour offset to UTC, but why are the minutes so far apart, 33 or 27 depending on which direction you go around the globe? I remember that in Australia I incurred an 8.5 hour offset between home and Adelaide, because of the half-hour zones they have. Maybe the local system time is not synced with internet time? Guess, guess, guess.

I am at GMT-7 right now, in the Pacific time zone, so the server completion time of 20:25 equals 1:25 PM local time (Win XP has been synced to an internet clock for the past 6 years). The client checked in at 12:51:57 PM and 1:50:46 PM to send trickle-up messages. Perhaps 1:25 PM is when the server declared the WU done due to time expiration and began generating the next WU. My client didn't know that until the next time it checked in with another trickle-up message. Still to be explained is where the results log and final elapsed time went. It is hard to diagnose any further without the results log. These short deadlines are very irritating.
----------------------------------------
Cheers
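The time-zone arithmetic above can be checked with a short sketch. It assumes the server's return time (10/5/15 20:25:34) is UTC and written in month/day/year order, and uses the GMT-7 offset stated in the post; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # GMT-7, as stated in the post

# Server return time, assumed to be UTC and in month/day/year order.
server_return_utc = datetime(2015, 10, 5, 20, 25, 34, tzinfo=timezone.utc)

# Client "Reporting 1 completed tasks" time, taken from the local event log.
client_report_local = datetime(2015, 10, 5, 13, 52, 52, tzinfo=PDT)

# Convert the server time to local time and compare with the client report.
server_return_local = server_return_utc.astimezone(PDT)
gap = client_report_local - server_return_local

print("Server return in local time:", server_return_local.strftime("%H:%M:%S"))
print("Client reported this much later:", gap)
```

Under these assumptions the server time lands at 13:25:34 local, and the client's report follows about 27 minutes later, which is the 27-minute figure raised earlier in the thread.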
||
|
|
littlepeaks
Veteran Cruncher USA Joined: Apr 28, 2007 Post Count: 748 Status: Offline Project Badges:
|
OK -- something strange is going on here. My PC crashed this morning with a BSOD. I rebooted successfully. For one of the WUs I was running when it crashed, I got:

FAH2_ avx101119_ 000093_ 0035_ 003_ 1-- Cruncher In Progress 10/6/15 20:31:10 10/10/15 20:31:10 6.30 / 0.00 194.2 / 0.0

So under "Results", it is still showing the WU as "In Progress" and has not sent out a repair WU, but has awarded me credit for work already done. BOINC Manager is still showing the status as in progress, with 7.29 hours elapsed time. Is this some type of feature for FAAH2 to save work already done, or what?
||
|
|
deltavee
Ace Cruncher Texas Hill Country Joined: Nov 17, 2004 Post Count: 4894 Status: Offline Project Badges:
|
littlepeaks wrote:
OK -- something strange is going on here. My PC crashed this morning with a BSOD. I rebooted successfully. For one of the WUs I was running when it crashed, I got:
FAH2_ avx101119_ 000093_ 0035_ 003_ 1-- Cruncher In Progress 10/6/15 20:31:10 10/10/15 20:31:10 6.30 / 0.00 194.2 / 0.0
So under "Results", it is still showing the WU as "In Progress" and has not sent out a repair WU, but has awarded me credit for work already done. BOINC Manager is still showing the status as in progress, with 7.29 hours elapsed time. Is this some type of feature for FAAH2 to save work already done, or what?

littlepeaks,
This has nothing to do with your BSOD. But you are correct that it is a feature of FAH2 to save work already done. It is part of the trickle-up messaging, which reports results at every 10% of workunit completion. You can also see this happening in your Event Log.
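The 10% trickle schedule can be sketched as a simple milestone check. This is an illustration, not the science app's code: the function name `trickle_index` is hypothetical, and the rule that no trickle is sent at 100% is an assumption drawn from the stderr log posted earlier in the thread, which shows "index = 1" at step 10000, "index = 9" at step 90000, and no trickle line at step 100000.

```python
def trickle_index(steps_done, total_steps, percent=10):
    """Return the trickle index if steps_done is a milestone, else None.

    A milestone is every `percent` of total_steps, excluding completion,
    where the final result is reported instead (an assumption from the log).
    """
    milestone = total_steps * percent // 100
    if milestone == 0 or steps_done <= 0 or steps_done >= total_steps:
        return None
    if steps_done % milestone == 0:
        return steps_done // milestone
    return None


# Steps taken from the stderr log posted earlier in the thread.
for steps in (9000, 10000, 90000, 100000):
    print(steps, "->", trickle_index(steps, 100000))
```

Each milestone is a point where partial work reaches the server, which is consistent with credit showing up for a task that is still listed as "In Progress".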
||
|
|
|