Thread Status: Active
Total posts in this thread: 10
This topic has been viewed 2782 times and has 9 replies
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Status: Offline
Strange Results

I noticed that FAH2_avx17377-ls_000055_0011_001_0 shows as aborted on my end, but the server reports it as valid.

The CPU time is the same on both (21.12 hrs), but the server reports 0:00 elapsed time. Reduced credit was given.

There is NO results log.

Cheers
----------------------------------------

[Oct 6, 2015 6:40:44 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange Results

I hope this doesn't count as hijacking your thread, Nix, but that's a useful thread title you've chosen.
So here's another strange result, which I found after I successfully processed a _2 unit and wondered what the earlier Invalid was.

Project Name: FightAIDS@Home - Phase 2
Created: 09/30/2015 15:59:28
Name: FAH2_avx101118_000086_0009_001
Minimum Quorum: 1
Replication: 2
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
FAH2_avx101118_000086_0009_001_2-- | 714 | Valid | 04/10/15 17:07:20 | 06/10/15 06:26:22 | 10.44 | 376.3 / 370.4 << mine
FAH2_avx101118_000086_0009_001_1-- | 714 | Valid | 04/10/15 17:07:19 | 06/10/15 01:37:19 | 25.03 | 370.4 / 370.4
FAH2_avx101118_000086_0009_001_0-- | 714 | Invalid | 30/09/15 18:00:00 | 04/10/15 12:53:52 | 13.91 | 340.7 / 340.7

Result Name: FAH2_ avx101118_ 000086_ 0009_ 001_ 0--
<core_client_version>7.6.6</core_client_version>
<![CDATA[
<stderr_txt>
[19:12:31] INFO:Turning trickle messaging on.
[19:12:31] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist.
Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic
Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.07000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[19:21:16] INFO: Checkpoint skipped. Progress 1000/100000 CPU time 518.094921
[19:29:50] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 1025.784576
[19:38:16] INFO: Checkpoint skipped. Progress 3000/100000 CPU time 1530.385410
[19:46:38] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 2027.966200
[19:55:02] INFO: Checkpoint skipped. Progress 5000/100000 CPU time 2529.010212
[20:03:25] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 3026.700202
[20:11:47] INFO: Checkpoint skipped. Progress 7000/100000 CPU time 3526.215404
[20:20:12] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 4027.555818
[20:28:38] INFO: Checkpoint skipped. Progress 9000/100000 CPU time 4530.908644
[20:37:05] INFO: Sending trickle message to server.
[20:37:05] INFO: Starting intermediate upload, index = 1
[20:37:05] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 5033.169464
[20:45:28] INFO: Checkpoint skipped. Progress 11000/100000 CPU time 5534.478677
[20:53:49] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 6034.025079

(snipped middle part - alternating checkpoints and skipped checkpoints)

[07:33:05] INFO: Checkpointed. Progress 88000 of 100000 steps complete CPU time 44139.063341
[07:41:25] INFO: Checkpoint skipped. Progress 89000/100000 CPU time 44637.470936
[07:49:40] INFO: Sending trickle message to server.
[07:49:40] INFO: Starting intermediate upload, index = 9
[07:49:40] INFO: Checkpointed. Progress 90000 of 100000 steps complete CPU time 45129.638491
[07:58:02] INFO: Checkpoint skipped. Progress 91000/100000 CPU time 45624.536063
[08:06:20] INFO: Checkpointed. Progress 92000 of 100000 steps complete CPU time 46119.371235
[08:14:41] INFO: Checkpoint skipped. Progress 93000/100000 CPU time 46617.513628
[08:23:03] INFO: Checkpointed. Progress 94000 of 100000 steps complete CPU time 47116.966430
[08:31:20] INFO: Checkpoint skipped. Progress 95000/100000 CPU time 47608.385180
[08:39:35] INFO: Checkpointed. Progress 96000 of 100000 steps complete CPU time 48102.518348
[08:47:56] INFO: Checkpoint skipped. Progress 97000/100000 CPU time 48601.565547
[08:56:14] INFO: Checkpointed. Progress 98000 of 100000 steps complete CPU time 49096.619120
[09:04:33] INFO: Checkpoint skipped. Progress 99000/100000 CPU time 49593.685106
[09:12:54] INFO: Checkpointed. Progress 100000 of 100000 steps complete CPU time 50089.565485
%IMPACT-I: Species 1 written to SQL file md-out1.dms
%IMPACT-I: Species 2 written to SQL file md-out2.dms
09:12:55 (1168): called boinc_finish(0)

Note the skipped checkpoints, something I'd not seen previously in any results. (Are they caused by a high "Checkpoint to disk at most every" setting?) Are they the cause of the subsequent Invalid? If so, is that a bug?
[Oct 6, 2015 7:05:29 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Strange Results

A skipped checkpoint just means that the WtD (write-to-disk) interval, i.e. the "Checkpoint to disk at most every" setting, is longer than the time it takes to get from one checkpoint opportunity to the next. The science app asks the client once, at start, what the allowed interval is, and keeps this minimum "at most" interval until the end. If you change WtD, the app keeps the old value until the end unless it is restarted [unloaded from memory].
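
A rough way to picture this, as a simplified model and not the actual BOINC or science-app code: the app reaches a checkpoint opportunity every 1000 steps and writes to disk only if at least the WtD interval has passed since the last write. The function name and the 600-second interval are assumptions made up for illustration; the CPU times are the first six from the stderr log above.

```python
def checkpoint_decisions(opportunity_times, min_interval):
    """Label each checkpoint opportunity as 'checkpointed' or 'skipped'.

    opportunity_times: CPU seconds at which the app could checkpoint.
    min_interval: assumed WtD setting in seconds (hypothetical value).
    """
    last_write = 0.0  # assumption: the timer starts when the task starts
    decisions = []
    for t in opportunity_times:
        if t - last_write >= min_interval:
            decisions.append("checkpointed")
            last_write = t
        else:
            decisions.append("skipped")
    return decisions


# CPU times of the first six progress lines in the stderr log above
cpu_times = [518.09, 1025.78, 1530.39, 2027.97, 2529.01, 3026.70]
for t, decision in zip(cpu_times, checkpoint_decisions(cpu_times, 600)):
    print(f"{t:8.2f}s  {decision}")
```

In this model, any interval between roughly 520 and 990 seconds reproduces the alternating skipped/checkpointed pattern seen in the log.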
[Oct 6, 2015 10:49:49 AM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Status: Offline
Re: Strange Results

It looks like the client was told to quit. There were at least two prior trickle-up messages sent, so the client was at least 30% done when told to stop. Here is my client log:

10/5/2015 10:41:32 AM Started upload of FAH2_avx17377-ls_000055_0011_001_0_6
10/5/2015 10:41:32 AM Started upload of FAH2_avx17377-ls_000055_0011_001_0_16
10/5/2015 10:41:35 AM Sending scheduler request: To send trickle-up message.
10/5/2015 10:41:35 AM Not requesting tasks: don't need
10/5/2015 10:41:40 AM Scheduler request completed
10/5/2015 10:41:45 AM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_6
10/5/2015 10:41:45 AM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_16
10/5/2015 11:00:05 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 11:17:50 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 11:39:46 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 11:54:25 AM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 12:15:49 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 12:35:53 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 12:51:57 PM Sending scheduler request: To send trickle-up message.
10/5/2015 12:51:57 PM Not requesting tasks: don't need
10/5/2015 12:52:00 PM Scheduler request completed
10/5/2015 12:54:18 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:12:28 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:32:21 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:50:45 PM [checkpoint] result FAH2_avx17377-ls_000055_0011_001_0 checkpointed
10/5/2015 1:50:46 PM Started upload of FAH2_avx17377-ls_000055_0011_001_0_7
10/5/2015 1:50:46 PM Started upload of FAH2_avx17377-ls_000055_0011_001_0_17
10/5/2015 1:50:46 PM Sending scheduler request: To send trickle-up message.
10/5/2015 1:50:46 PM Not requesting tasks: don't need
10/5/2015 1:50:50 PM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_7
10/5/2015 1:50:50 PM Finished upload of FAH2_avx17377-ls_000055_0011_001_0_17
10/5/2015 1:50:50 PM Scheduler request completed
10/5/2015 1:50:50 PM Result FAH2_avx17377-ls_000055_0011_001_0 is no longer usable
10/5/2015 1:50:52 PM Computation for task FAH2_avx17377-ls_000055_0011_001_0 finished
10/5/2015 1:52:52 PM Sending scheduler request: To report completed tasks.
10/5/2015 1:52:52 PM Reporting 1 completed tasks


Server log
FAH2_ avx17377-ls_ 000055_ 0011_ 001_ 0-- Venus-4 Valid 10/2/15 02:23:27 10/5/15 20:25:34 21.12 / 0.00 272.0 / 233.1


What is also strange is that the time stamp in the server log has no correlation to that in the client log. So why no results log showing the output up to the stop message?

Cheers
----------------------------------------

[Oct 6, 2015 2:59:05 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Strange Results

Just as the sent times and deadlines shown in your client are local times, whereas the Result Status pages show UTC, the event and result logs contain your local clock times.

10/5/2015 1:52:52 PM Reporting 1 completed tasks

Server log
FAH2_ avx17377-ls_ 000055_ 0011_ 001_ 0-- Venus-4 Valid 10/2/15 02:23:27 10/5/15 20:25:34 21.12 / 0.00 272.0 / 233.1

Assuming you're somewhere in North America, your 1:52:52 PM (13:52:52) would be plausible with a +5 to +8 hour offset to UTC, but why are the minutes so far apart, 33 or 27 depending on which direction round the globe? I remember that in Australia I incurred an 8.5 hour offset between home and Adelaide; they had half-hour zones. Maybe the local system time is not synced with internet time? Guess, guess, guess.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Oct 6, 2015 5:10:52 PM]
[Oct 6, 2015 4:15:48 PM]
JEklund2
Advanced Cruncher
Finland
Joined: Aug 10, 2006
Post Count: 119
Status: Offline
Re: Strange Results

I also guess that it might be related to the "time difference". I just lost eight running tasks, and the only reason I can guess is also related to time issues. I have a dual-boot system (normally running Windows 10, but also Linux Mint). After starting Mint and coming back to Windows 10, I noticed that the time had run "backwards" (it was three hours behind the actual wall clock in Finland). After some googling I found that Linux should be set to "UTC=no"; with UTC=yes, Linux (maybe?) modified the clock, and back in Windows 10 I had to move it "forward" once.
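
A small sketch of how such a three-hour jump could arise, assuming the usual dual-boot mismatch where one OS stores UTC in the hardware clock and the other reads the same digits as local time. The function name and the example wall-clock time are hypothetical; the UTC+3 offset assumes Finnish summer time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Finnish summer time (UTC+3), which applies in early October
LOCAL_TZ = timezone(timedelta(hours=3))


def clock_shown_by_localtime_os(true_local):
    """Time shown by an OS that reads the hardware clock as local time,
    after another OS has stored UTC in that hardware clock."""
    # The UTC-storing OS writes the UTC digits into the hardware clock.
    rtc_digits = true_local.astimezone(timezone.utc).replace(tzinfo=None)
    # The other OS reads the same digits but treats them as local time.
    return rtc_digits.replace(tzinfo=true_local.tzinfo)


# Hypothetical wall-clock time, chosen only for illustration
true_local = datetime(2015, 10, 6, 18, 0, 0, tzinfo=LOCAL_TZ)
shown = clock_shown_by_localtime_os(true_local)
print(shown.strftime("%H:%M"), "shown; behind by", true_local - shown)
# prints: 15:00 shown; behind by 3:00:00
```

If running tasks see the clock jump like this, their elapsed-time bookkeeping could plausibly be thrown off, though the thread does not confirm that this is what killed the eight tasks.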
----------------------------------------

[Oct 6, 2015 7:57:25 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Strange Results

I think if you set the time in BIOS upon startup, either OS should be picking up the time from there.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 6, 2015 10:37:09 PM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Status: Offline
Re: Strange Results

I am at GMT-7 right now, in the Pacific time zone, so the server's completion time of 20:25 equals 1:25 PM local time (Win XP has been synced to an internet clock for the past 6 years). The client checked in at 12:51:57 PM and 1:50:46 PM to send trickle-up messages. Perhaps 1:25 PM is when the server declared the WU done due to time expiration and began generating the next WU. My client didn't know that until the next time it checked in with another trickle-up message.
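
The arithmetic can be checked with a few lines of Python. This is only a sketch: the two timestamps are copied from the server row and the client log quoted in this thread, and the fixed GMT-7 offset is the one stated above.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # GMT-7, as stated in the post

# Return time from the server's Result Status row (UTC)
server_return_utc = datetime(2015, 10, 5, 20, 25, 34, tzinfo=timezone.utc)
# "Reporting 1 completed tasks" line from the client log (local time)
client_report_local = datetime(2015, 10, 5, 13, 52, 52, tzinfo=PDT)

server_return_local = server_return_utc.astimezone(PDT)
gap = client_report_local - server_return_local

print(server_return_local.strftime("%H:%M:%S"))  # 13:25:34
print(gap)                                       # 0:27:18
```

So the server's return time falls about 27 minutes before the client reported the task, between the two trickle-up check-ins at 12:51:57 PM and 1:50:46 PM.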

Still to be explained is where the results log and final elapsed time went. It is hard to diagnose any further without the results log.

These short deadlines are very irritating.

Cheers
----------------------------------------

[Oct 7, 2015 6:31:37 AM]
littlepeaks
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 748
Status: Offline
Re: Strange Results

OK -- something strange is going on here. My PC crashed this morning with a BSOD. I rebooted successfully. For one of the WUs I was running when it crashed, I got:
FAH2_ avx101119_ 000093_ 0035_ 003_ 1-- Cruncher In Progress 10/6/15 20:31:10 10/10/15 20:31:10 6.30 / 0.00 194.2 / 0.0

So under "Results", it is still showing the WU as "In Progress" and has not sent out a repair WU, but it has awarded me credit for work already done.

BOINC Manager is also still showing the status as in progress, with 7.29 hours elapsed time. Is this some type of feature for FAAH2 to save work already done, or what?
[Oct 7, 2015 3:38:08 PM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4894
Status: Offline
Re: Strange Results

littlepeaks,
This has nothing to do with your BSOD. But you are correct that it is a feature of FAHB to save work already done. It is part of the trickle-up messaging, which reports results at every 10% of workunit completion. You can also see this happening in your Event Log.
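
As a rough illustration of the 10% schedule, and not the actual science-app code: the sketch below lists the steps at which a trickle-up would go out for a 100000-step workunit. The function name and parameters are hypothetical, and leaving out a message at 100% is an assumption based on the stderr excerpt earlier in the thread, where index 1 appears at step 10000 and index 9 at step 90000.

```python
def trickle_points(total_steps, log_every=1000, percent=10):
    """Return (step, upload_index) pairs at which a trickle-up would be sent.

    Simplified model: one message per `percent` of the workunit and none
    at 100%, matching the stderr excerpt earlier in the thread.
    """
    steps_per_trickle = total_steps * percent // 100
    points = []
    # Progress is logged every `log_every` steps; the final step is excluded.
    for step in range(log_every, total_steps, log_every):
        if step % steps_per_trickle == 0:
            points.append((step, step // steps_per_trickle))
    return points


print(trickle_points(100000))
# [(10000, 1), (20000, 2), ..., (90000, 9)]
```

Each of those points is a chance for the server to record partial work, which is why a task can show credit while still listed as "In Progress".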
[Oct 7, 2015 4:10:13 PM]