World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: New Beta Test for PC - September 18, 2015 [ Issues Thread ] |
Thread Status: Active Total posts in this thread: 172
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7545 Status: Offline Project Badges: |
Got an example of any result log that has a validation statement in the result log [it being the output from the client science app]?

Don't know what you're looking for but here is what I'm looking at.

BETA_ FAHB_ avx17556-ls_ 000086_ 0018_ 001_ 2-- Miner3 Valid 9/24/15 22:58:09 9/26/15 02:35:08 8.20 / 0.00 272.0 / 272.0

Result Log
Result Name: BETA_ FAHB_ avx17556-ls_ 000086_ 0018_ 001_ 2--

This particular task went out 5 times. 3 were returned as invalid. Mine was returned as valid and 1 more is listed as in progress. It seems like all 3 of the invalid tasks never got past step 10k out of 100k. They all had the INFO: received message from server to exit after next major checkpoint flag in their result log. That's why I asked why my result log had nada. Judging by my run time I didn't finish this task either. I'm guessing that it exited at around 60k or 70k of 100k completed.

I also had one of these, but really did not give it a second thought as long as it was valid. It has aged out of my results, but I think the time was around 12 hours whereas the other betas have all gone about 20 to 30 hours.

Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges: |
Got an example of any result log that has a validation statement in the result log [it being the output from the client science app]?

Don't know what you're looking for but here is what I'm looking at.

BETA_ FAHB_ avx17556-ls_ 000086_ 0018_ 001_ 2-- Miner3 Valid 9/24/15 22:58:09 9/26/15 02:35:08 8.20 / 0.00 272.0 / 272.0

Result Log
Result Name: BETA_ FAHB_ avx17556-ls_ 000086_ 0018_ 001_ 2--

This particular task went out 5 times. 3 were returned as invalid. Mine was returned as valid and 1 more is listed as in progress. It seems like all 3 of the invalid tasks never got past step 10k out of 100k. They all had the INFO: received message from server to exit after next major checkpoint flag in their result log. That's why I asked why my result log had nada. Judging by my run time I didn't finish this task either. I'm guessing that it exited at around 60k or 70k of 100k completed.

I also had one of these, but really did not give it a second thought as long as it was valid. It has aged out of my results, but I think the time was around 12 hours whereas the other betas have all gone about 20 to 30 hours.

Cheers

FWIW I did a little extra digging and found something interesting. It seems that every resent task that I received that had been previously returned as invalid and I returned as valid has nothing written to my result log. Judging by the amount of run time I logged there is no way that I completed any of those tasks either, but with an empty result log I have no way to see how much of that resent task I actually finished.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If you are into wasting time, then intervene and set your client to go offline

I tried that on one and it didn't work out well. Even though the results appeared to be uploaded fine and within the time limit, the return time listed and credit are totally wrong. (Actual run time was about 95 hours, but CPU/Elapsed time are listed as 48.58/0.00.)

27-Sep-2015 10:17:53 [World Community Grid] Finished upload of BETA_FAHB_avx17556_000044_0019_002_0_0
27-Sep-2015 10:18:00 [World Community Grid] Finished upload of BETA_FAHB_avx17556_000044_0019_002_0_20
27-Sep-2015 10:18:01 [World Community Grid] Finished upload of BETA_FAHB_avx17556_000044_0019_002_0_10

[Edit 1 times, last edit by Former Member at Sep 27, 2015 3:11:15 AM]
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline Project Badges: |
Because of the short deadlines for the resends, most resends are forced to stop early.
When the task starts immediately after it is received, the soft stop will be sent after ~9.6 hours of run time.

Result Name: BETA_ FAHB_ avx38783-ls_ 000014_ 0015_ 002_ 1--
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
[18:32:34] INFO:Turning trickle messaging on.
[18:32:34] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist. Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.85000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[18:46:27] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 831.875333
[19:00:07] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 1653.033396
[19:13:45] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 2469.823432
[19:27:32] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 3297.549138
[19:41:09] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 4114.463975
[19:54:52] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 4936.714045
[20:08:35] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 5760.087323
[20:22:18] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 6582.352994
[20:36:05] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 7409.704298
[20:49:44] INFO: Sending trickle message to server.
[20:49:44] INFO: Starting intermediate upload, index = 1
[20:49:44] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 8227.430340
[21:03:18] INFO: Checkpointed. Progress 11000 of 100000 steps complete CPU time 9042.067562
[21:16:52] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 9855.690777
[21:30:41] INFO: Checkpointed. Progress 13000 of 100000 steps complete CPU time 10680.546065
[21:44:18] INFO: Checkpointed. Progress 14000 of 100000 steps complete CPU time 11498.318907
[21:57:57] INFO: Checkpointed. Progress 15000 of 100000 steps complete CPU time 12316.216550
[22:11:37] INFO: Checkpointed. Progress 16000 of 100000 steps complete CPU time 13136.220206
[22:25:17] INFO: Checkpointed. Progress 17000 of 100000 steps complete CPU time 13955.163056
[22:38:56] INFO: Checkpointed. Progress 18000 of 100000 steps complete CPU time 14773.559902
[22:52:35] INFO: Checkpointed. Progress 19000 of 100000 steps complete CPU time 15593.064355
[23:06:15] INFO: Sending trickle message to server.
[23:06:15] INFO: Starting intermediate upload, index = 2
[23:06:15] INFO: Checkpointed. Progress 20000 of 100000 steps complete CPU time 16412.116405
[23:20:06] INFO: Checkpointed. Progress 21000 of 100000 steps complete CPU time 17231.823660
[23:33:41] INFO: Checkpointed. Progress 22000 of 100000 steps complete CPU time 18044.822871
[23:47:12] INFO: Checkpointed. Progress 23000 of 100000 steps complete CPU time 18855.762869
[00:00:47] INFO: Checkpointed. Progress 24000 of 100000 steps complete CPU time 19669.916488
[00:14:25] INFO: Checkpointed. Progress 25000 of 100000 steps complete CPU time 20487.829731
[00:28:06] INFO: Checkpointed. Progress 26000 of 100000 steps complete CPU time 21309.050196
[00:41:41] INFO: Checkpointed. Progress 27000 of 100000 steps complete CPU time 22123.687418
[00:55:23] INFO: Checkpointed. Progress 28000 of 100000 steps complete CPU time 22946.077889
[01:09:04] INFO: Checkpointed. Progress 29000 of 100000 steps complete CPU time 23766.377948
[01:22:44] INFO: Sending trickle message to server.
[01:22:44] INFO: Starting intermediate upload, index = 3
[01:22:44] INFO: Checkpointed. Progress 30000 of 100000 steps complete CPU time 24586.412804
[01:36:20] INFO: Checkpointed. Progress 31000 of 100000 steps complete CPU time 25403.249640
[01:50:03] INFO: Checkpointed. Progress 32000 of 100000 steps complete CPU time 26224.392104
[02:03:44] INFO: Checkpointed. Progress 33000 of 100000 steps complete CPU time 27046.657775
[02:17:18] INFO: Checkpointed. Progress 34000 of 100000 steps complete CPU time 27860.202990
[02:30:57] INFO: Checkpointed. Progress 35000 of 100000 steps complete CPU time 28678.116233
[02:44:30] INFO: Checkpointed. Progress 36000 of 100000 steps complete CPU time 29491.739448
[02:58:13] INFO: Checkpointed. Progress 37000 of 100000 steps complete CPU time 30315.050326
[03:11:54] INFO: Checkpointed. Progress 38000 of 100000 steps complete CPU time 31135.038382
[03:25:35] INFO: Checkpointed. Progress 39000 of 100000 steps complete CPU time 31956.180846
[03:39:12] INFO: Sending trickle message to server.
[03:39:12] INFO: Starting intermediate upload, index = 4
[03:39:12] INFO: Checkpointed. Progress 40000 of 100000 steps complete CPU time 32773.064482
[03:52:50] INFO: Checkpointed. Progress 41000 of 100000 steps complete CPU time 33590.962125
[04:06:43] INFO: Checkpointed. Progress 42000 of 100000 steps complete CPU time 34422.369455
[04:20:20] INFO: received message from server to exit after next major checkpoint.
[04:20:20] INFO: Checkpointed. Progress 43000 of 100000 steps complete CPU time 35240.204697
[04:33:56] INFO: Checkpointed. Progress 44000 of 100000 steps complete CPU time 36055.996327
[04:47:38] INFO: Checkpointed. Progress 45000 of 100000 steps complete CPU time 36878.090397
[05:01:15] INFO: Checkpointed. Progress 46000 of 100000 steps complete CPU time 37693.585624
[05:14:54] INFO: Checkpointed. Progress 47000 of 100000 steps complete CPU time 38512.450473
[05:28:38] INFO: Checkpointed. Progress 48000 of 100000 steps complete CPU time 39336.946958
[05:42:27] INFO: Checkpointed. Progress 49000 of 100000 steps complete CPU time 40166.045473
[05:56:35] INFO: Exit:<current_step>50000</current_step> <total_steps>100000</total_steps>
05:56:35 (5992): called boinc_finish(0)
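For anyone who wants to estimate how far a soft-stopped task got, or how long the full 100000 steps would have taken, the checkpoint lines in a result log like the one above can be parsed. The following is only a rough Python sketch: the function names are made up for illustration, the regular expression assumes the "Progress N of M steps complete CPU time S" wording seen in this log, and the three sample lines are copied from it.

```python
import re

# Matches the checkpoint wording seen in the log above, e.g.
# "Progress 1000 of 100000 steps complete CPU time 831.875333".
# Assumption: other app versions may word this line differently.
CHECKPOINT = re.compile(
    r"Progress (\d+) of (\d+) steps complete CPU time ([\d.]+)"
)


def checkpoints(log_text):
    """Return (step, total_steps, cpu_seconds) tuples found in a result log."""
    return [(int(s), int(t), float(c)) for s, t, c in CHECKPOINT.findall(log_text)]


def cpu_seconds_per_step(points):
    """Average CPU seconds per step between the first and last checkpoint."""
    first_step, _, first_cpu = points[0]
    last_step, _, last_cpu = points[-1]
    return (last_cpu - first_cpu) / (last_step - first_step)


# Three checkpoint lines copied from the log above.
sample = """
[18:46:27] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 831.875333
[20:49:44] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 8227.430340
[05:42:27] INFO: Checkpointed. Progress 49000 of 100000 steps complete CPU time 40166.045473
"""

points = checkpoints(sample)
rate = cpu_seconds_per_step(points)
total_steps = points[-1][1]
projected_hours = rate * total_steps / 3600  # naive linear extrapolation

print(f"{rate:.3f} CPU seconds per step")
print(f"about {projected_hours:.1f} CPU hours for all {total_steps} steps")
```

The extrapolation is linear, so it only holds if the step rate stays roughly constant, which the evenly spaced checkpoints in this log suggest but do not guarantee.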
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Because of the short deadlines for the resends, most resends are forced to stop early. When the task starts immediately after it is received, the soft stop will be sent after ~9.6 hours of run time.

Yes, I saw this too on a resend from a "no reply". It looks a little silly but (a) I'd assume that production tasks will have longer deadlines and (b) it's a good way to test re-building work units "in the middle".

It would be interesting to understand where the ~9.6hrs comes from though. In my case it was 9.45hrs CPU, 10.57 elapsed, so it's pretty consistent.

I wondered if the servers waited until they'd got enough data from a client to be able to estimate when the WU might finish and so to understand if it might be quicker to send it on elsewhere or not, but it seems a bit complicated. From what's been said already it appears it's simpler to just chop it after a "reasonable" period and let everyone move on.

I also wondered if any account is/will be taken of whether a machine is "reliable" or not. Or even if the definition of "reliable" could/should be amended for this project. For example, is the machine normally left on over a weekend?
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
If you are into wasting time, then intervene and set your client to go offline

I tried that on one and it didn't work out well. Even though the results appeared to be uploaded fine and within the time limit, the return time listed and credit are totally wrong. (Actual run time was about 95 hours, but CPU/Elapsed time are listed as 48.58/0.00.)

27-Sep-2015 10:17:53 [World Community Grid] Finished upload of BETA_FAHB_avx17556_000044_0019_002_0_0
27-Sep-2015 10:18:00 [World Community Grid] Finished upload of BETA_FAHB_avx17556_000044_0019_002_0_20
27-Sep-2015 10:18:01 [World Community Grid] Finished upload of BETA_FAHB_avx17556_000044_0019_002_0_10

Highly predictable this would happen... see the first, now emphasized part, of my comment :P

My take is that 'reliable' is a concept out the window, absolute zero redundant computing, as much as 'I must and shall finish the work unit in whole'. How WCG/scientists sow the bits together is truly not my concern. Whatever means they find to arrive at the final step in the fastest manner.

And a so-called 'GUI aborted' task won't get anyone anywhere in completed tasks rankings [The concern was expressed]. I have aborted a running Beta task and it most definitely got an error code. [@Techs, set the counter to 5 events, and reduce the distribution to 1 per host [minimum work assignment], should such smartypants mosey in [what a waste of time if you'd have to spend effort on this. These are definitively onto the wrong fork of volunteering ]

[Edit 1 times, last edit by SekeRob* at Sep 27, 2015 12:03:01 PM]
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7545 Status: Offline Project Badges: |
Off topic:
How WCG/scientists sow the bits together is truly not my concern

This is such a good sentence on so many levels, a triple threat punner.

And a so-called 'GUI aborted' task won't get anyone anywhere in completed tasks rankings [The concern was expressed]. I have aborted a running Beta task and it most definitely got an error code. [@Techs, set the counter to 5 events, and reduce the distribution to 1 per host [minimum work assignment], should such smartypants mosey in [what a waste of time if you'd have to spend effort on this. These are definitively onto the wrong fork of volunteering shame on you]

Right on

Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline Project Badges: |
Because of the short deadlines for the resends, most resends are forced to stop early. When the task starts immediately after it is received, the soft stop will be sent after ~9.6 hours of run time.

Yes, I saw this too on a resend from a "no reply". It looks a little silly but (a) I'd assume that production tasks will have longer deadlines and (b) it's a good way to test re-building work units "in the middle". It would be interesting to understand where the ~9.6hrs comes from though. In my case it was 9.45hrs CPU, 10.57 elapsed, so it's pretty consistent.

That's rather easy to understand. A resend has a deadline of 33.6 hours, so after 9.6 hours of run time there are 24 hours left. 24 hours left and not yet 70% returned means: soft stop for that task.
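The arithmetic can be written out as a small Python sketch. It is only an illustration of the rule as described in this post, not the actual server logic: the constant and function names are invented here, and it assumes "70% returned" means the fraction of steps already reported back. As a sanity check, the log posted earlier in this thread shows the exit message arriving at 04:20:20 after an 18:32:34 start (about 9.8 hours) with roughly 42000 of 100000 steps done.

```python
# Figures quoted in the post above. How the server really decides is not
# documented in this thread, so treat the rule below as an assumption.
RESEND_DEADLINE_HOURS = 33.6   # deadline given to a resend
HOURS_LEFT_AT_CHECK = 24.0     # remaining time at which progress is judged
REQUIRED_FRACTION = 0.70       # progress needed to escape the soft stop


def hours_until_check(deadline_hours=RESEND_DEADLINE_HOURS,
                      hours_left_at_check=HOURS_LEFT_AT_CHECK):
    """Time after sending at which the progress check would happen."""
    return deadline_hours - hours_left_at_check


def would_soft_stop(hours_since_sent, fraction_returned):
    """True if the hypothetical rule would ask the task to exit early."""
    hours_left = RESEND_DEADLINE_HOURS - hours_since_sent
    return (hours_left <= HOURS_LEFT_AT_CHECK
            and fraction_returned < REQUIRED_FRACTION)


print(round(hours_until_check(), 1))   # 9.6
print(would_soft_stop(9.8, 0.42))      # True: under 24 h left, only 42% done
print(would_soft_stop(9.8, 0.75))      # False: already past 70%
print(would_soft_stop(5.0, 0.20))      # False: more than 24 h still left
```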
||
|
vepaul
Senior Cruncher Belgium Joined: Nov 17, 2004 Post Count: 261 Status: Offline Project Badges: |
Here are mine
BETA_ FAHB_ avx38783-ls_ 000077_ 0006_ 003_ 0-- Bureau2-HP Valid 26/09/15 14:40:47 27/09/15 10:38:51 13,76 / 17,08 378,8 / 378,8
BETA_ FAHB_ avx38783-ls_ 000066_ 0001_ 004_ 0-- paul-HP2 In Progress 26/09/15 14:38:10 30/09/15 14:38:10 7,47 / 0,00 233,1 / 0,0
BETA_ FAHB_ avx17556-ls_ 000006_ 0006_ 004_ 0-- paul-HP2 In Progress 26/09/15 14:30:12 30/09/15 14:30:12 8,46 / 0,00 272,0 / 0,0
BETA_ FAHB_ avx17556_ 000050_ 0008_ 005_ 0-- paul-HP2 In Progress 26/09/15 14:28:03 30/09/15 14:28:03 8,39 / 0,00 272,0 / 0,0
BETA_ FAHB_ avx17556_ 000047_ 0004_ 002_ 1-- Bureau2-HP Valid 26/09/15 13:12:29 27/09/15 00:09:27 6,88 / 8,78 182,1 / 182,1

How can WUs that are still running already have their time stated?

Paul

[Edit 2 times, last edit by vep at Sep 27, 2015 3:39:05 PM]
||
|
KWSN-A Shrubbery
Senior Cruncher Joined: Jan 8, 2006 Post Count: 476 Status: Offline Project Badges: |
Whenever it sends a trickle, it updates the status.
||
|
|