World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: New Beta Test for PC - July 29, 2015 [ Issues Thread ] |
Thread Status: Active Total posts in this thread: 146
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
Just an update, I am seeing successful trickle messages being sent to the server as well as intermediate uploads happening. From our testing and what I'm seeing now, uploads are about 100KB each for a total of about 1MB for the entire work unit.
Thanks, -Uplinger |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
What happens with tasks [in production] that have sent, e.g., 9 trickles/intermediate uploads and then crash? Is the repair going to redo the whole task, or will the repair pick up where the other left off? [I think not, but I'll ask the question anyway.]
----------------------------------------[Edit 1 times, last edit by Former Member at Jul 29, 2015 10:51:53 PM] |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
Sek, the goal here is that results will be truly zero redundancy. From that standpoint, we will be doing a few things on the backend, both to speed up the end results for the researchers and to make the results run more efficiently.
So, the simulation you are doing is 100k steps. The total number of steps to complete a given job from the researchers is 2 million, so the output of one result will go into the subsequent result. The intermediate uploads act as points from which a work unit can be restarted. Thus we will be validating the work unit as it goes along.
Also, instead of waiting for an entire batch to be completed before we send the results back or start the next stage of a job, the next work unit will be generated automatically from the results that are there. As you can see, this has a level of complexity that we have not attempted in the past, and we will be attempting to have these tests done within beta soon.
In the scenario you provided, the next work unit will be sent out and start from, say, step 90,000 and attempt to do the next 100k steps, still with the same 10% checkpoint.
I know, this probably raises more questions than it answers, but more complexity requires more testing :)
Thanks, -Uplinger |
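The numbers in the post above can be worked through in a few lines. This is only a sketch of the arithmetic as described (2 million total steps, 100k steps per work unit, an upload at every 10%); the function names are made up for illustration and this is not the actual backend logic.

```python
TOTAL_STEPS = 2_000_000     # full job from the researchers, per the post
STEPS_PER_WU = 100_000      # steps attempted by one work unit, per the post
UPLOADS_PER_WU = 10         # "10% checkpoint" read as 10 uploads per work unit


def work_units_needed(total_steps=TOTAL_STEPS, steps_per_wu=STEPS_PER_WU):
    """Minimum number of chained work units if none of them fails."""
    return -(-total_steps // steps_per_wu)  # ceiling division


def next_work_unit(start_step, uploads_completed, steps_per_wu=STEPS_PER_WU):
    """Hypothetical helper: step range of the follow-on work unit.

    The last successful intermediate upload marks the last good step,
    and the next work unit tries another full block from there.
    """
    interval = steps_per_wu // UPLOADS_PER_WU
    last_good_step = start_step + uploads_completed * interval
    return last_good_step, last_good_step + steps_per_wu


print(work_units_needed())    # failure-free chain length
print(next_work_unit(0, 9))   # crash after 9 intermediate uploads
```

With a crash after 9 uploads, the follow-on work unit runs from step 90,000 to step 190,000, which matches the example in the post. Every partial work unit shifts the later block boundaries, so the real chain can be longer than the failure-free count.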
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
OK, so nothing is lost on a partway job, since the last good step is used to kick off the next 100K block. That is, like CEP2, a partial computation will still turn out valid for its purpose, with the difference that the CEP2 scientists don't need the 'skipped' parts, whereas with BEDAM you need all the links in the chain.
That's a pretty long concatenation you're contemplating. I hope this works out with the non-homogeneous fleet of computers all working together on one problem.
----------------------------------------[Edit 2 times, last edit by Former Member at Jul 29, 2015 11:19:20 PM] |
||
|
OldChap
Veteran Cruncher UK Joined: Jun 5, 2009 Post Count: 978 Status: Offline Project Badges: |
I think I am seeing odd numbers for cpu/clock time
---------------------------------------- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There's a fix for that in the 7.7 development client. I see this off and on with various sciences, but you only notice it when efficiency is very high, as with this one, where you get the 'perfect' 100%. ;D
Et alia: for those interested, you can set a log flag in cc_config.xml to see the trickle details in the event log:
<trickle_debug>1</trickle_debug>
to go with the standard message:
12466 World Community Grid 7/30/2015 1:00:14 AM Sending scheduler request: To send trickle-up message.
I have not set it here; I just know from the CPDN days that one was written about every 6 hours, so it's not really a log inundation.
----------------------------------------[Edit 1 times, last edit by Former Member at Jul 29, 2015 11:41:52 PM] |
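For anyone unsure where the flag goes: in cc_config.xml the log flags sit inside a `<log_flags>` element under the `<cc_config>` root. Below is a minimal sketch that switches the flag on in an existing config text while leaving other settings alone. The helper name `enable_log_flag` is made up for illustration, and reading and writing the actual file in the BOINC data directory is left out.

```python
import xml.etree.ElementTree as ET


def enable_log_flag(config_xml: str, flag: str = "trickle_debug") -> str:
    """Return the cc_config.xml text with <flag>1</flag> set under <log_flags>."""
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create the <log_flags> section if the config does not have one yet.
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")

    # Create the flag element if missing, then switch it on.
    element = log_flags.find(flag)
    if element is None:
        element = ET.SubElement(log_flags, flag)
    element.text = "1"

    return ET.tostring(root, encoding="unicode")


print(enable_log_flag("<cc_config><log_flags/></cc_config>"))
```

The client generally has to re-read its config files (or be restarted) before a changed log flag takes effect.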
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7545 Status: Offline Project Badges: |
Snagged 5 of them, all on one machine. This will be interesting, as that is an 8-core server which is running only off a 16 GB flash drive.
Addendum: 20% after 8 hours on a 1.86 GHz Xeon 5320, Linux Mint 17. Definitely big jobs. Running smoothly so far.
Rats !!! All 5 errored out. The relevant part was:
process exited with code 38 (0x26, -218)...
....
forrtl: No space left on device
forrtl: severe (38): error during write, unit 2, file /var/lib/boinc-client/slots/4/md.out
Image PC Routine Line Source
Each of the WUs errored, but I think this was the cause that made the others error out. I'll post all of the stderr_txt files if needed.
Further update July 31, 2015: After about 9 months of crunching MCM1 on the flash drive with no problems, I think whatever BEDAM is doing may have finally fried the flash drive. It may have been coincidence that it happened at the same time as the beta test, but at any rate, the flash drive has been toasted. I have tried everything I can think of to resurrect it, with no luck. It remains impossible to write to it any longer. Another US $6.00 down the drain (sarcasm). I will install another drive and maybe get in on the next beta.
Cheers
Sgt. Joe
----------------------------------------*Minnesota Crunchers* [Edit 3 times, last edit by Sgt.Joe at Jul 31, 2015 4:41:29 PM] |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
Two tasks running fine on the Windows 8.1 machine. Close to 35% in 7.5 hours.
Checkpointing and uploading quietly as described by uplinger. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Longest running on desktop: 11:25 hours at 51%, and on laptop: 11:41 at 38.2%. With that, BOINCTasks logged 38 checkpoints, just as prescribed. Unfortunately, the standard trickle message does not say which task the progress report is for, so I tried the debug function.
7/30/2015 5:39:01 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_BETA_avx101118-005_r5_1fn_1_1438227471.xml
Just what I wanted to know: which task is sending a progress report. But actually it's not needed, as just before or after there are the uploads either starting or finishing.
7/30/2015 7:26:18 AM Started upload of BETA_avx101118-005_r5_1eg_1_3
7/30/2015 7:26:18 AM Started upload of BETA_avx101118-005_r5_1eg_1_13
7/30/2015 7:26:18 AM [sched_op] Starting scheduler request
7/30/2015 7:26:18 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_BETA_avx101118-005_r5_1eg_1_1438233977.xml
7/30/2015 7:26:18 AM Sending scheduler request: To send trickle-up message. |
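If you want to match trickles and uploads to tasks without reading the log by eye, something like the following might do. It is a sketch that assumes the file name layouts seen in the log lines above (`trickle_up_<task>_<number>.xml` and `<task>_<index>` for uploads); that layout is inferred from these few lines only, and `parse_line` is a made-up helper name.

```python
import re

# Patterns inferred from the event log lines quoted above (an assumption).
TRICKLE_RE = re.compile(r"trickle_up_(?P<task>.+)_(?P<stamp>\d+)\.xml")
UPLOAD_RE = re.compile(r"Started upload of (?P<task>.+)_(?P<index>\d+)$")


def parse_line(line):
    """Return (kind, task_name, number) for trickle/upload lines, else None."""
    line = line.strip()
    match = TRICKLE_RE.search(line)
    if match:
        return "trickle", match.group("task"), int(match.group("stamp"))
    match = UPLOAD_RE.search(line)
    if match:
        return "upload", match.group("task"), int(match.group("index"))
    return None


log = [
    "7/30/2015 7:26:18 AM Started upload of BETA_avx101118-005_r5_1eg_1_13",
    "7/30/2015 7:26:18 AM [sched_op] Starting scheduler request",
    "7/30/2015 7:26:18 AM [trickle] read trickle file "
    "projects/www.worldcommunitygrid.org/"
    "trickle_up_BETA_avx101118-005_r5_1eg_1_1438233977.xml",
]
for entry in log:
    print(parse_line(entry))
```

The trailing number in the trickle file name looks like a Unix timestamp, but that is a guess from the log rather than something stated in the thread.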
||
|
Eric_Kaiser
Veteran Cruncher Germany (Hessen) Joined: May 7, 2013 Post Count: 1047 Status: Offline Project Badges: |
I've got 7 beta WUs: 6 on different i7-980 @ 3.3 GHz machines and 1 on my AMD 5350 @ 2 GHz.
Runtime/Progress: i7: 11:40 hrs and 44%; AMD: 11:40 hrs and 27%
----------------------------------------[Edit 1 times, last edit by Eric_Kaiser at Jul 30, 2015 7:19:25 AM] |
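For a rough idea of total runtime from figures like these, a straight-line extrapolation is the simplest guess. The sketch below assumes progress is linear in elapsed time, which nothing in the thread confirms, and `estimate_total_hours` is a made-up helper name.

```python
def estimate_total_hours(elapsed_hhmm: str, percent_done: float) -> float:
    """Extrapolate total runtime, assuming progress is linear in time."""
    hours, minutes = elapsed_hhmm.split(":")
    elapsed = int(hours) + int(minutes) / 60
    return elapsed * 100 / percent_done


# Figures reported in the post above.
for machine, elapsed, percent in [("i7", "11:40", 44), ("AMD", "11:40", 27)]:
    total = estimate_total_hours(elapsed, percent)
    print(f"{machine}: about {total:.1f} hours in total")
```

On those numbers the i7 tasks would finish in roughly 26.5 hours and the AMD task in roughly 43.2 hours, if the rate holds.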
||
|
|