World Community Grid Forums
Category: Beta Testing | Forum: Beta Test Support Forum | Thread: New Beta Test for PC - September 18, 2015 [ Issues Thread ]
Thread Status: Active Total posts in this thread: 172
Jason1478963
Senior Cruncher | United States | Joined: Sep 18, 2005 | Post Count: 295
Quote:
In this beta I'm seeing sets of "Temporarily failed download" messages before the downloads start flowing. Not seeing that for the uploads, nor for downloads of other work.

I've been seeing the same thing, but with uploads. I've also noticed that the internet seems terribly slow today. Are some servers down, perhaps?

Ramjet-FX-8350:
64845 World Community Grid 9/23/2015 6:06:37 PM Sending scheduler request: To send trickle-up message.
64844 World Community Grid 9/23/2015 6:04:48 PM Finished upload of BETA_FAHB_avx38783-ls_000049_0014_002_0_1
64843 World Community Grid 9/23/2015 6:04:36 PM Started upload of BETA_FAHB_avx38783-ls_000049_0014_002_0_1
64842 9/23/2015 6:01:01 PM Internet access OK - project servers may be temporarily down.
64841 9/23/2015 6:00:56 PM Project communication failed: attempting access to reference site
64840 World Community Grid 9/23/2015 6:00:52 PM Backing off 00:03:43 on upload of BETA_FAHB_avx38783-ls_000049_0014_002_0_1
64839 World Community Grid 9/23/2015 6:00:52 PM Temporarily failed upload of BETA_FAHB_avx38783-ls_000049_0014_002_0_1: transient HTTP error
64838 World Community Grid 9/23/2015 5:59:04 PM Scheduler request completed
64837 World Community Grid 9/23/2015 5:59:00 PM Finished upload of BETA_FAHB_avx38783-ls_000049_0014_002_0_11

And again, on DoDecaCore2:
734 World Community Grid 9/23/2015 4:53:25 PM Requesting new tasks for CPU
733 World Community Grid 9/23/2015 4:53:25 PM Sending scheduler request: To send trickle-up message.
732 9/23/2015 4:51:40 PM Internet access OK - project servers may be temporarily down.
731 World Community Grid 9/23/2015 4:51:33 PM Scheduler request failed: Timeout was reached
730 9/23/2015 4:51:33 PM Project communication failed: attempting access to reference site
729 World Community Grid 9/23/2015 4:46:56 PM Finished upload of BETA_FAHB_avx38783-ls_000069_0012_002_0_3
728 World Community Grid 9/23/2015 4:46:33 PM Finished upload of BETA_FAHB_avx38783-ls_000069_0012_002_0_13
727 World Community Grid 9/23/2015 4:46:23 PM Requesting new tasks for CPU
726 World Community Grid 9/23/2015 4:46:23 PM Sending scheduler request: To send trickle-up message.
725 World Community Grid 9/23/2015 4:46:23 PM Started upload of BETA_FAHB_avx38783-ls_000069_0012_002_0_13
724 World Community Grid 9/23/2015 4:46:23 PM Started upload of BETA_FAHB_avx38783-ls_000069_0012_002_0_3
723 World Community Grid 9/23/2015 3:23:23 PM Finished upload of BETA_FAHB_avx38783-ls_000085_0000_002_0_4
722 World Community Grid 9/23/2015 3:23:19 PM Finished upload of BETA_FAHB_avx38783-ls_000085_0000_002_0_14
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Quote:
Kremmen, can you show me the log of the file you're having issues with? The reason I ask is that there may be an issue with the CDN, in which case you are downloading directly from our site. Thanks, -Uplinger

Sure, here's one:
24-Sep-2015 00:29:26 [World Community Grid] Scheduler request succeeded: got 1 new tasks
24-Sep-2015 00:29:28 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-agbnp2.param
24-Sep-2015 00:29:28 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-paramstd.dat
24-Sep-2015 00:29:28 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-restangl.dat
24-Sep-2015 00:29:30 [---] Project communication failed: attempting access to reference site
24-Sep-2015 00:29:30 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls-agbnp2.param: HTTP error
24-Sep-2015 00:29:30 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls-paramstd.dat: HTTP error
24-Sep-2015 00:29:30 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls-restangl.dat: HTTP error
24-Sep-2015 00:29:30 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-restdist.dat
24-Sep-2015 00:29:30 [World Community Grid] Started download of beta21.FAHB_avx17556-ls_000049-in1.dms
24-Sep-2015 00:29:30 [World Community Grid] Started download of beta21.FAHB_avx17556-ls_000049-in2.dms
24-Sep-2015 00:29:32 [---] Internet access OK - project servers may be temporarily down.
24-Sep-2015 00:29:32 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls-restdist.dat: HTTP error
24-Sep-2015 00:29:32 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls_000049-in1.dms: HTTP error
24-Sep-2015 00:29:32 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls_000049-in2.dms: HTTP error
24-Sep-2015 00:29:32 [World Community Grid] Started download of f903a014a48d949f079af9b79d60d993.rst
24-Sep-2015 00:29:32 [World Community Grid] Started download of de4f7fe302bb0549eb0031504736ce3a.inp
24-Sep-2015 00:29:33 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-agbnp2.param
24-Sep-2015 00:29:37 [World Community Grid] Finished download of beta21.FAHB_avx17556-ls-agbnp2.param
24-Sep-2015 00:29:37 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-paramstd.dat
24-Sep-2015 00:29:37 [World Community Grid] Finished download of de4f7fe302bb0549eb0031504736ce3a.inp
24-Sep-2015 00:29:38 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-restangl.dat
24-Sep-2015 00:29:38 [World Community Grid] Finished download of f903a014a48d949f079af9b79d60d993.rst
24-Sep-2015 00:29:39 [World Community Grid] Finished download of beta21.FAHB_avx17556-ls-paramstd.dat
24-Sep-2015 00:29:39 [World Community Grid] Finished download of beta21.FAHB_avx17556-ls-restangl.dat
[ ... everything continues fine ... ]
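To check whether every file that hit a "Temporarily failed download" in a log like the one above eventually finished on a retry, one could tally the event-log lines per file. This is a minimal Python sketch, assuming the line shape shown in this log (date, time, bracketed project name, message); `tally_downloads` is a hypothetical helper, not part of BOINC.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the log above, e.g.:
# "24-Sep-2015 00:29:30 [World Community Grid] Temporarily failed download of NAME: HTTP error"
LINE_RE = re.compile(
    r"^\S+ \S+ \[[^\]]*\] "
    r"(?P<event>Started|Finished|Temporarily failed) download of "
    r"(?P<name>[^\s:]+)"
)

def tally_downloads(lines):
    """Count Started / Temporarily failed / Finished download events per file name."""
    counts = defaultdict(lambda: {"Started": 0, "Temporarily failed": 0, "Finished": 0})
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines such as "[---] Project communication failed" are skipped
            counts[m.group("name")][m.group("event")] += 1
    return dict(counts)

log = """\
24-Sep-2015 00:29:28 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-agbnp2.param
24-Sep-2015 00:29:30 [---] Project communication failed: attempting access to reference site
24-Sep-2015 00:29:30 [World Community Grid] Temporarily failed download of beta21.FAHB_avx17556-ls-agbnp2.param: HTTP error
24-Sep-2015 00:29:32 [World Community Grid] Started download of f903a014a48d949f079af9b79d60d993.rst
24-Sep-2015 00:29:33 [World Community Grid] Started download of beta21.FAHB_avx17556-ls-agbnp2.param
24-Sep-2015 00:29:37 [World Community Grid] Finished download of beta21.FAHB_avx17556-ls-agbnp2.param
24-Sep-2015 00:29:38 [World Community Grid] Finished download of f903a014a48d949f079af9b79d60d993.rst
"""

for name, c in tally_downloads(log.splitlines()).items():
    recovered = c["Temporarily failed"] > 0 and c["Finished"] > 0
    print(f"{name}: {c} recovered_after_retry={recovered}")
```

Run against the full log, a file with a nonzero failure count and at least one "Finished" event is one that recovered on retry, which is the pattern this post describes.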
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998
I'm starting to see this error on all 6 machines:
85993 World Community Grid 9/24/2015 2:17:00 PM [error] handle_trickle_down failed: unexpected null pointer

I'm also seeing some ls tasks completing in about 2:30. The log shows only 19,000 of 100,000 steps completed. Are these tasks that were only partially completed by another cruncher and sent out to me to finish, or is something else going on?

Result Log
Result Name: BETA_FAHB_avx17556-ls_000027_0002_001_0--
<core_client_version>7.4.36</core_client_version>
<![CDATA[
<stderr_txt>
[13:14:53] INFO:Turning trickle messaging on.
[13:14:53] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist. Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 1.00000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[13:22:08] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 434.915188
[13:29:15] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 861.702724
[13:36:19] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 1284.465434
[13:43:18] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 1703.468520
[13:50:21] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 2126.246830
[13:57:21] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 2545.983120
[14:04:23] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 2967.965825
[14:11:27] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 3392.070144
[14:18:28] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 3813.974848
[14:25:31] INFO: Sending trickle message to server.
[14:25:31] INFO: Starting intermediate upload, index = 1
[14:25:31] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 4235.941953
[14:32:27] INFO: Checkpointed. Progress 11000 of 100000 steps complete CPU time 4651.669018
[14:39:27] INFO: Checkpointed. Progress 12000 of 100000 steps complete CPU time 5071.311708
[14:46:30] INFO: Checkpointed. Progress 13000 of 100000 steps complete CPU time 5493.668816
[14:53:34] INFO: Checkpointed. Progress 14000 of 100000 steps complete CPU time 5917.663934
[15:00:37] INFO: Checkpointed. Progress 15000 of 100000 steps complete CPU time 6340.567044
[15:07:38] INFO: Checkpointed. Progress 16000 of 100000 steps complete CPU time 6761.504543
[15:14:40] INFO: Checkpointed. Progress 17000 of 100000 steps complete CPU time 7182.301640
[15:21:43] INFO: Checkpointed. Progress 18000 of 100000 steps complete CPU time 7604.877149
[15:28:45] INFO: received message from server to exit after next major checkpoint.
[15:28:45] INFO: Checkpointed. Progress 19000 of 100000 steps complete CPU time 8027.374657
[15:35:48] INFO: Exit:<current_step>20000</current_step> <total_steps>100000</total_steps>
15:35:48 (3416): called boinc_finish(0)
</stderr_txt>
]]>
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
[Edit 2 times, last edit by nanoprobe at Sep 24, 2015 7:49:29 PM]
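The "Checkpointed. Progress N of M steps complete CPU time T" lines in a stderr log like the one above can be turned into a completion fraction and a CPU rate per 1000 steps. This is a minimal Python sketch, assuming exactly the line format shown in this result log; `parse_checkpoints` and `summarize` are hypothetical helpers, not part of BOINC or the FAHB application.

```python
import re

# Matches lines such as:
# "[13:22:08] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 434.915188"
CKPT_RE = re.compile(
    r"Progress (?P<done>\d+) of (?P<total>\d+) steps complete CPU time (?P<cpu>[\d.]+)"
)

def parse_checkpoints(text):
    """Return a list of (steps_done, total_steps, cpu_seconds) tuples, in log order."""
    return [
        (int(m["done"]), int(m["total"]), float(m["cpu"]))
        for m in CKPT_RE.finditer(text)
    ]

def summarize(text):
    """Fraction complete at the last checkpoint, and mean CPU seconds per 1000 steps."""
    done, total, cpu = parse_checkpoints(text)[-1]
    return done / total, cpu / (done / 1000)

stderr = """\
[13:22:08] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 434.915188
[15:21:43] INFO: Checkpointed. Progress 18000 of 100000 steps complete CPU time 7604.877149
[15:28:45] INFO: received message from server to exit after next major checkpoint.
[15:28:45] INFO: Checkpointed. Progress 19000 of 100000 steps complete CPU time 8027.374657
"""

fraction, cpu_per_k = summarize(stderr)
print(f"{fraction:.0%} complete, {cpu_per_k:.1f} CPU s per 1000 steps")
```

On these lines the last checkpoint is 19% of the task at about 422 CPU seconds per 1000 steps; at that rate the full 100,000 steps would need roughly 11.7 CPU hours on this host.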
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952
nanoprobe,
1. You received a 'soft' stop from the server, delivered in a trickle message from the server to your client. In the logs you see:
[15:28:45] INFO: received message from server to exit after next major checkpoint.
This tells your machine to finish the current checkpoint and return the result. You haven't reached the deadline yet, but the server determined that your computer had not completed enough of the task by a certain time. Right now, soft stops are sent when a task is within 24 hours of its deadline. This is a setting we will be tweaking as we get more data back during production. Basically it works like this: if you aren't 70% done, we want you to stop so we can send the rest to someone else. The reason is that we get the majority of results (over 50%) back within the first 24 hours. Stopping at the next checkpoint means no CPU time on your machine is wasted, and we schedule the rest of the task to another host, starting from the results you send back.

2. As for the "[error] handle_trickle_down failed: unexpected null pointer", I will need to make sure I'm not sending invalid XML to the client. I don't think I am, since the task is stopping properly. My guess is that you got a message for a result you had already completed: if I schedule a soft-stop message for you and you report that result in the same scheduler request, your client would not handle the message properly, hence the null pointer, because the result no longer exists on your client. That is just speculation; could you paste the sched_reply_www.worldcommunitygrid.org.xml file here for us to examine? Or you can email it to support@worldcommunitygrid.org, as that reaches me as well.

Thanks,
-Uplinger
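The soft-stop rule described above can be written as a small predicate. This is a hedged Python sketch of the logic as stated in the post (within 24 hours of the deadline and less than 70% done), not World Community Grid's server code; `should_soft_stop` and its parameters are hypothetical names, the deadline used below is illustrative rather than taken from the thread, and the post notes both thresholds are expected to change.

```python
from datetime import datetime, timedelta

# Thresholds as stated in the post; the post says these will be tuned in production.
SOFT_STOP_WINDOW = timedelta(hours=24)
MIN_FRACTION_DONE = 0.70

def should_soft_stop(now, deadline, steps_done, total_steps):
    """Hypothetical check: ask the client to exit at its next major checkpoint
    when the task is within the soft-stop window of its deadline and is not
    yet far enough along."""
    near_deadline = deadline - now <= SOFT_STOP_WINDOW
    fraction_done = steps_done / total_steps
    return near_deadline and fraction_done < MIN_FRACTION_DONE

deadline = datetime(2015, 9, 25, 12, 0)  # illustrative deadline only
print(should_soft_stop(datetime(2015, 9, 24, 15, 28), deadline, 18000, 100000))  # True
print(should_soft_stop(datetime(2015, 9, 24, 15, 28), deadline, 80000, 100000))  # False
print(should_soft_stop(datetime(2015, 9, 23, 9, 0), deadline, 18000, 100000))    # False
```

Because the client only acts on the message at its next major checkpoint, the work done up to that point is still returned, which is why the post describes no CPU time as wasted.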
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998
Keith, I emailed the XML file you requested. I hope you can adjust the soft stops soon. The longest any of these betas has run on my machines is 16 hours, with the majority finishing in 11-13 hours. Thanks for the explanation.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
I am still sitting patiently, hoping to get at least 1 beta task on either of my 2 computers.
NorthernRaider
Cruncher | Canada | Joined: Dec 10, 2008 | Post Count: 12
Hi, I finally got some betas. The first one returned this result, and it was a soft stop too!

This beta was received on 9/21/15:
Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_FAHB_avx17556-ls_000023_0002_001_0-- | Jabbah | Invalid | 9/21/15 19:15:55 | 9/24/15 22:39:36 | 2.69 / 2.72 | 70.3 / 0.0

Workunit Status
Project Name: Beta Test
Created: 09/21/2015 19:15:19
Name: BETA_FAHB_avx17556-ls_000023_0002_001
Minimum Quorum: 1
Replication: 2

Result Log
Result Name: BETA_FAHB_avx17556-ls_000023_0002_001_0--
<core_client_version>7.4.23</core_client_version>
<![CDATA[
<stderr_txt>
[10:03:53] INFO:Turning trickle messaging on.
[10:03:53] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist. Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 1.00000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[10:31:26] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 976.208000
[10:58:38] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 1940.408000
[11:25:51] INFO: received message from server to exit after next major checkpoint.
[11:25:51] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 2909.396000
[11:53:06] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 3875.304000
[12:20:29] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 4851.572000
[12:47:38] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 5816.272000
[13:14:55] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 6785.144000
[13:42:15] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 7753.736000
[14:09:05] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 8705.128000
[14:36:03] INFO: Exit:<current_step>10000</current_step> <total_steps>100000</total_steps>
14:36:03 (22315): called boinc_finish(0)
</stderr_txt>
]]>

How come this was not run on the 21st, when it was sent? Betas should have the highest priority, and a lot of work on OETI has been done since then. I have two more of these that I kicked off now. Is there a setting we can change in the client so that BETA is priority one? Thanks in advance for the explanation. TJ
NorthernRaider
Cruncher | Canada | Joined: Dec 10, 2008 | Post Count: 12
It does not make a whole lot of sense. Right after the unit was sent back, two new ones were created for others to complete:
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_FAHB_avx17556-ls_000023_0002_001_2-- | - | In Progress | 9/24/15 22:43:19 | 9/26/15 08:19:18 | 0.00 | 0.0 / 0.0
BETA_FAHB_avx17556-ls_000023_0002_001_1-- | - | In Progress | 9/24/15 22:40:50 | 9/26/15 08:16:49 | 0.00 | 0.0 / 0.0
BETA_FAHB_avx17556-ls_000023_0002_001_0-- | 714 | Invalid | 9/21/15 19:15:55 | 9/24/15 22:39:36 | 2.69 | 70.3 / 0.0
yoro42
Ace Cruncher | United States | Joined: Feb 19, 2011 | Post Count: 8976
15 Total (All were manually activated)
2 In Progress, 12 Valid
Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_FAHB_avx17556-ls_000037_0012_002_0-- | ZS | In Progress | 09/24/2015 18:08:40 | 09/28/2015 18:08:40 | 0.00 / 0.00 | 0.0 / 0.0
BETA_FAHB_avx17556-ls_000004_0012_004_0-- | ZS | In Progress | 09/24/2015 18:08:40 | 09/28/2015 18:08:40 | 0.00 / 0.00 | 0.0 / 0.0
BETA_FAHB_avx17556-ls_000039_0015_002_0-- | TM | Valid | 09/24/2015 02:08:41 | 09/24/2015 18:55:24 | 14.21 / 14.41 | 396.8 / 396.8
BETA_FAHB_avx17556-ls_000041_0010_003_0-- | TM | Valid | 09/24/2015 02:08:41 | 09/24/2015 18:55:24 | 14.14 / 14.34 | 394.8 / 394.8
BETA_FAHB_avx17556-ls_000017_0006_003_0-- | TM | Valid | 09/24/2015 02:08:41 | 09/24/2015 18:55:24 | 14.13 / 14.32 | 394.4 / 394.4
BETA_FAHB_avx38783-ls_000036_0015_001_2-- | TM | Valid | 09/23/2015 16:50:27 | 09/24/2015 09:35:11 | 14.43 / 14.62 | 402.7 / 402.7
BETA_FAHB_avx38783-ls_000044_0005_003_1-- | TM | Valid | 09/23/2015 15:32:46 | 09/24/2015 09:35:11 | 14.31 / 14.47 | 398.6 / 398.6
BETA_FAHB_avx38783-ls_000040_0005_002_1-- | TM | Valid | 09/23/2015 15:32:46 | 09/24/2015 09:35:11 | 14.39 / 14.51 | 399.6 / 399.6
BETA_FAHB_avx38783-ls_000048_0007_003_1-- | TM | Valid | 09/23/2015 15:32:46 | 09/24/2015 10:50:54 | 14.53 / 14.70 | 405.0 / 405.0
BETA_FAHB_avx38783-ls_000042_0013_002_1-- | TM | Valid | 09/23/2015 15:32:46 | 09/24/2015 09:35:11 | 14.45 / 14.62 | 402.8 / 402.8
BETA_FAHB_avx38783_000054_0005_002_0-- | CP | Valid | 09/23/2015 01:27:13 | 09/24/2015 06:52:06 | 19.04 / 24.81 | 523.2 / 523.2
BETA_FAHB_avx17556_000087_0018_002_0-- | NS | Valid | 09/22/2015 19:10:08 | 09/24/2015 06:45:06 | 25.42 / 25.69 | 435.2 / 435.2
BETA_FAHB_avx17556_000092_0017_002_0-- | LG | Valid | 09/22/2015 03:42:23 | 09/24/2015 07:18:05 | 24.73 / 24.78 | 422.0 / 422.0
BETA_FAHB_avx17556_000096_0002_002_0-- | LG | Valid | 09/22/2015 03:42:23 | 09/24/2015 09:52:02 | 27.29 / 27.35 | 422.0 / 422.0
Speedy51
Veteran Cruncher | New Zealand | Joined: Nov 4, 2005 | Post Count: 1258
Could somebody please kindly remind me which log flag is used to display the following information?
[13:29:15] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 861.702724
[13:36:19] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 1284.465434
[13:43:18] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 1703.468520
[13:50:21] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 2126.246830
[13:57:21] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 2545.983120
[14:04:23] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 2967.965825
[14:11:27] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 3392.070144
[14:18:28] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 3813.974848