Total posts in this thread: 179
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1330
Re: New Beta Test for PC v7.10 - August 25, 2015 [ Issues Thread ]

It's quite possible that the Elapsed time is dubious [did it go back by the same amount on restart as the CPU time?]. There are some fixes to the time-keeping in the very latest clients.
I used a BOINC patch version 7.7.0 and got higher CPU-times than elapsed.
I had over 100% efficiency on all tasks. The result page shows the same times for elapsed and cpu.
BoincTasks showed for the last four tasks:

Elapsed- / CPU-time
19:48:45 (20:07:39) Result page stored 20.13 / 20.13
19:50:19 (20:21:43) Result page stored 20.36 / 20.36
21:38:11 (22:09:27) Result page stored 22.16 / 22.16
19:38:00 (20:10:05) Result page stored 20.17 / 20.17

I'll install the recommended version 7.6.9 to see how the times look with that version.
Maybe there's a fix in it that wasn't in the standalone patch.
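
A quick way to check the figures above, as a minimal sketch. It assumes the first value in each BoincTasks pair is the elapsed time and the value in parentheses is the CPU time, as the "Elapsed- / CPU-time" header suggests. The helper names are made up for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def efficiency(elapsed: str, cpu: str) -> float:
    """CPU time divided by elapsed time; values above 1.0 mean 'over 100%'."""
    return to_seconds(cpu) / to_seconds(elapsed)


def hours(hms: str) -> float:
    """Express an 'HH:MM:SS' duration in decimal hours."""
    return to_seconds(hms) / 3600


# (elapsed, CPU) pairs copied from the post above; the ordering is an assumption.
tasks = [
    ("19:48:45", "20:07:39"),
    ("19:50:19", "20:21:43"),
    ("21:38:11", "22:09:27"),
    ("19:38:00", "20:10:05"),
]

for elapsed, cpu in tasks:
    print(
        f"elapsed {elapsed}  cpu {cpu}  "
        f"efficiency {efficiency(elapsed, cpu):.2%}  "
        f"cpu hours {hours(cpu):.2f}"
    )
```

Under that assumption, the CPU column converted to decimal hours lands on the figures the result page stored for both fields (for example 20:07:39 is about 20.13 hours), which suggests the result page recorded the CPU time for both values.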
[Sep 2, 2015 4:46:27 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

I've computed 7 Beta WUs so far.
I've noticed the following on WCG-only hosts with a CPU efficiency over 99% and no restart:

  • i7 4770K, Windows 7 Pro SP1 x64
    • BETA_avx101118-040_r8_1_wcgfahb00200000, 10.45 hours, 307.6 granted credit (no wingman)
    • BETA_avx101118-034_r16_1_wcgfahb100000, 10.61 hours, 398.5 granted credit (wingman 14.66 hours)

  • Phenom II x6, Ubuntu 14.04 x64, 1090T
    • BETA_avx101118-044_r17_1_wcgfahb00500000, 26.21 hours, 443.3 granted credit (no wingman)
    • BETA_avx101118-053_r3_1_wcgfahb00300000, 29.02 hours, 443.3 granted credit (no wingman)
    • BETA_avx101118-049_r19_1_wcgfahb00500000, 26.29 hours, 443.3 granted credit (no wingman)
    • BETA_avx101118-059_r7_1_wcgfahb00300000, 25.64 hours, 443.3 granted credit (no wingman)

  • Phenom II x6, Ubuntu 14.04 x64, 1055T
    • BETA_avx101118-028_r5_1_wcgfahb00100000, 29.41 hours, 443.3 granted credit (no wingman)

I have several remarks regarding the crazy credit/hour ratio as well as the duration.
  • The Ubuntu application should be strongly optimized.
  • The credit calculation for long WUs must be modified (whatever the duration is, 443.3 seems to be the only possible granted credit)
  • Phenom II CPUs are not efficient for this project, even though I do not currently notice a problem with either Phenom II host on other work (about 50 granted credits/hour for OET1).
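
To put numbers on the credit/hour remark, here is a minimal sketch that divides the granted credit by the run time for the results listed above. The host labels and the function name are only illustrative; the hours and credits are copied from the list.

```python
def credit_per_hour(run_hours: float, credit: float) -> float:
    """Granted credit divided by run time in hours."""
    return credit / run_hours


# (host, run time in hours, granted credit) taken from the list above.
results = [
    ("i7 4770K", 10.45, 307.6),
    ("i7 4770K", 10.61, 398.5),
    ("Phenom II 1090T", 26.21, 443.3),
    ("Phenom II 1090T", 29.02, 443.3),
    ("Phenom II 1090T", 26.29, 443.3),
    ("Phenom II 1090T", 25.64, 443.3),
    ("Phenom II 1055T", 29.41, 443.3),
]

for host, run_hours, credit in results:
    rate = credit_per_hour(run_hours, credit)
    print(f"{host:16s} {run_hours:6.2f} h  {credit:6.1f} credit  {rate:5.1f} credit/h")
```

With a fixed 443.3 credits, the rate on the Phenom II hosts falls as the run time grows, which is the effect described in the second remark.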

Cheers,
Yves


I will be reviewing the points given on this; it is on my list of things to investigate and improve.

Thanks,
-Uplinger
[Sep 3, 2015 2:24:05 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

Thanks for the clarification.
I haven't caught many beta WUs in the past and wasn't aware that this is a known problem in this beta test.

As soon as I'm home I can look for a log and post its content. But the trickle messages shouldn't be the problem, as I could see the elapsed time for the WU growing. As I understand it, this can only mean that the trickle messages were received and validated by the server.


Rarusu,

What Sek has posted is correct. I'm currently working through two bugs, one in the validator and one in the transitioner, both of which are backend systems. What you are seeing is that your machine was not given a "hard stop" message before the deadline. Had it received one, you would have been granted credit for the work done so far, and the next-generation work unit would have been created from the point you had reached. I suspect that if your machine worked on it 24/7, you got a pretty good chunk completed. I am hopeful I'll have that part fixed first. Then I will move on to the transitioner bug, which is less critical for lost work.

Thanks,
-Uplinger


I have recently put into place the fix for the hard stop/soft stop script that runs on the backend. Members may start seeing more of these messages as they get closer to deadlines.

Thanks,
-Uplinger
[Sep 3, 2015 2:25:55 AM]
Rarusu
Advanced Cruncher
Germany
Joined: Feb 7, 2006
Post Count: 64



I have recently put into place the fix for the hard stop/soft stop script that runs on the backend. Members may start seeing more of these messages as they get closer to deadlines.

Thanks,
-Uplinger


Thanks for the update, uplinger.

I will keep an eye on this as soon as I receive a new beta WU.

Cheers
Rarusu
[Sep 3, 2015 5:49:59 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1680

@Uplinger
Thank you in advance for your investigation.
Cheers,
Yves
[Sep 3, 2015 8:05:39 AM]
pvh513
Senior Cruncher
Joined: Feb 26, 2011
Post Count: 260

I recently received a beta WU and decided to test it by suspending it with LAIM (leave applications in memory) disabled. Before the suspend, checkpoints were done every ~35 minutes:
[18:08:19] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 2091.835000
[18:43:57] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 4143.176000
[19:19:17] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 6164.136000
[19:54:12] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 8138.962000
[20:29:12] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 10115.860000
[21:04:21] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 12121.109000
[21:39:04] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 14100.897000

After the resume that increased to every ~67 minutes:
[22:48:01] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 18051.571000
[23:55:19] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 21944.705000
[01:02:41] INFO: Sending trickle message to server.
[01:02:41] INFO: Starting intermediate upload, index = 1
[01:02:41] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 25768.012000
[02:10:55] INFO: Checkpointed. Progress 11000 of 100000 steps complete CPU time 29670.910000
[03:16:55] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 33558.779000
[04:22:10] INFO: Checkpointed. Progress 13000 of 100000 steps complete CPU time 37392.173000
... etc ...

So it appears that the suspend/resume cycle pretty much doubled the CPU time per checkpoint step! This client runs under openSUSE 13.2 on an Opteron 6168. WU name: BETA_avx101118-096_r11_1_wcgfahb00300000_0. As a result it will almost certainly not make the deadline, but I will let it continue running.
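
The slowdown can be quantified directly from the log, as a rough sketch. The regular expression below is an assumption based only on the "Checkpointed" lines quoted in this post, and the function name is made up for illustration. It computes the CPU seconds spent per 1000-step checkpoint interval before and after the resume.

```python
import re
from statistics import mean

# Pattern inferred from the log lines quoted above (an assumption, not a spec).
CHECKPOINT = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] INFO: Checkpointed\. "
    r"Progress (\d+) of \d+ steps complete CPU time ([\d.]+)"
)


def cpu_per_checkpoint(log: str) -> list[float]:
    """CPU seconds consumed between consecutive checkpoint lines in the log."""
    cpu_times = [float(m.group(3)) for m in CHECKPOINT.finditer(log)]
    return [later - earlier for earlier, later in zip(cpu_times, cpu_times[1:])]


BEFORE = """\
[18:08:19] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 2091.835000
[18:43:57] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 4143.176000
[19:19:17] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 6164.136000
[19:54:12] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 8138.962000
[20:29:12] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 10115.860000
[21:04:21] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 12121.109000
[21:39:04] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 14100.897000
"""

AFTER = """\
[22:48:01] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 18051.571000
[23:55:19] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 21944.705000
[01:02:41] INFO: Sending trickle message to server.
[01:02:41] INFO: Starting intermediate upload, index = 1
[01:02:41] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 25768.012000
[02:10:55] INFO: Checkpointed. Progress 11000 of 100000 steps complete CPU time 29670.910000
[03:16:55] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 33558.779000
[04:22:10] INFO: Checkpointed. Progress 13000 of 100000 steps complete CPU time 37392.173000
"""

before = mean(cpu_per_checkpoint(BEFORE))
after = mean(cpu_per_checkpoint(AFTER))
print(f"before resume: {before:.0f} CPU s per 1000 steps")
print(f"after resume:  {after:.0f} CPU s per 1000 steps")
print(f"ratio:         {after / before:.2f}")
```

Lines that are not checkpoints (the trickle and upload messages) simply do not match the pattern and are ignored.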
[Sep 3, 2015 10:55:24 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Same behaviour under Windows 7 SP1 running on an i7 2600K.
Checkpoints were done every ~700 seconds before the restart and every ~1400 seconds after it:
[19:43:59] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 6999.936471
[19:55:36] INFO: Checkpointed. Progress 11000 of 100000 steps complete CPU time 7685.607666
[20:07:49] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 8397.019027
[20:19:18] INFO: Checkpointed. Progress 13000 of 100000 steps complete CPU time 9084.031831
[20:30:45] INFO: Checkpointed. Progress 14000 of 100000 steps complete CPU time 9761.076171
[20:42:06] INFO: Checkpointed. Progress 15000 of 100000 steps complete CPU time 10435.000491
[20:53:42] INFO: Checkpointed. Progress 16000 of 100000 steps complete CPU time 11128.596537
[21:05:42] INFO: Checkpointed. Progress 17000 of 100000 steps complete CPU time 11844.329125
[23:03:58] INFO:Turning trickle messaging on.
[23:03:58] INFO:Turning intermediate uploads on.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic
Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.00480
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[23:29:10] INFO: Checkpointed. Progress 18000 of 100000 steps complete CPU time 13311.600202
[23:52:50] INFO: Checkpointed. Progress 19000 of 100000 steps complete CPU time 14719.197225
[00:16:10] INFO: Sending trickle message to server.
[00:16:10] INFO: Starting intermediate upload, index = 2
[00:16:10] INFO: Checkpointed. Progress 20000 of 100000 steps complete CPU time 16099.150871
[00:39:33] INFO: Checkpointed. Progress 21000 of 100000 steps complete CPU time 17482.411738
[01:03:05] INFO: Checkpointed. Progress 22000 of 100000 steps complete CPU time 18877.949883
[01:26:27] INFO: Checkpointed. Progress 23000 of 100000 steps complete CPU time 20272.146420
[01:50:06] INFO: Checkpointed. Progress 24000 of 100000 steps complete CPU time 21679.571842
[02:13:59] INFO: Checkpointed. Progress 25000 of 100000 steps complete CPU time 23102.300962


WU name: BETA_avx101118-060_r4_1_wcgfahb00300000_0
[Sep 3, 2015 7:46:27 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

The researchers have identified the problem with the CPU time increasing. They have supplied us with a fix that we will be testing on alpha soon.

Thanks,
-Uplinger
[Sep 3, 2015 9:03:58 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1311

Great to hear, Uplinger. Thanks for the news.
[Sep 3, 2015 9:37:32 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1311

For the super micro-managers, though I don't think this overrides the "don't need, cache full" state: hitting Update while WCG is selected will 'request' work from WCG even though it's really not its turn, if you have more than one active project attached to the client:
<fetch_on_update></fetch_on_update>
When updating a project, request work even if not highest priority project. +New in 7.0.54
There were some bugged point releases that would actually fetch 1 unit at a time, again and again and again, but that's for the silly who want to over-commit their client(s).

Anyway, if this works, please keep it a [public] secret. ;)

Since this is a public secret, can somebody remind me roughly where the fetch_on_update line goes, please?
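
As far as I know, this flag belongs inside the <options> section of cc_config.xml in the BOINC data directory; treat that layout as an assumption and check it against the client configuration documentation for your version. Below is a minimal sketch that sets the flag in a config given as a string. The function name is made up for illustration, and it deliberately works on text rather than touching any file on disk.

```python
import xml.etree.ElementTree as ET


def set_fetch_on_update(cc_config_xml: str, enabled: bool = True) -> str:
    """Return the config text with <fetch_on_update> set inside <options>.

    Assumes the usual layout <cc_config><options>...</options></cc_config>.
    """
    root = ET.fromstring(cc_config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create <options> if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    # Create or update the flag itself.
    flag = options.find("fetch_on_update")
    if flag is None:
        flag = ET.SubElement(options, "fetch_on_update")
    flag.text = "1" if enabled else "0"

    return ET.tostring(root, encoding="unicode")


example = "<cc_config><options></options></cc_config>"
print(set_fetch_on_update(example))
```

After editing the real file, the client has to re-read its config files (or be restarted) before the change takes effect.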
[Sep 3, 2015 9:46:19 PM]