Total posts in this thread: 146
This topic has been viewed 128270 times and has 145 replies
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: New Beta Test for PC - July 29, 2015 [ Issues Thread ]

Just an update: I am seeing successful trickle messages being sent to the server, as well as intermediate uploads happening. From our testing and what I'm seeing now, uploads are about 100 KB each, for a total of about 1 MB for the entire work unit.

Thanks,
-Uplinger
[Jul 29, 2015 10:45:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

What happens with tasks [in production] that have sent, e.g., 9 trickles/intermediate uploads and then crash? Is the repair task going to redo the whole work unit, or will it pick up where the other left off? [I think not, but I'll ask the question anyway.]
[Jul 29, 2015 10:51:17 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

Sek, the goal here is for results to be truly zero redundancy. From that standpoint, we will be doing a few things on the back end, both to speed up the end results for the researchers and to make the results run more efficiently.

So, the simulation you are running is 100k steps. The total number of steps needed to complete a given job for the researchers is 2 million, so the output of one result feeds into the subsequent result. The intermediate uploads can act as points from which to restart a work unit, so we will be validating the work unit as it goes along.

Also, instead of waiting for an entire batch to be completed before we send the results back or start the next stage of a job, the next work unit will be generated automatically by the results that are there.

As you can see this has a level of complexity that we have not attempted in the past, and we will be attempting to have these tests done within beta soon.

In the scenario you provided, the next work unit will be sent out, start from, say, step 90,000, and attempt to do the next 100k steps, still with the same 10% checkpoint.
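
As a rough illustration of the restart arithmetic described here, the following is a minimal Python sketch. It assumes a work unit checkpoints and uploads once per 10% of its 100k steps, so each intermediate upload covers 10,000 steps, and that the last work unit of a job stops at the 2 million step total. The constant and function names are hypothetical and are not taken from the actual World Community Grid back end.

```python
# Figures quoted in the post; all names are illustrative only.
TOTAL_STEPS = 2_000_000      # steps the researchers need for one job
STEPS_PER_WU = 100_000       # steps attempted by a single work unit
CHECKPOINTS_PER_WU = 10      # one checkpoint/upload per 10% of progress

STEPS_PER_CHECKPOINT = STEPS_PER_WU // CHECKPOINTS_PER_WU


def follow_up_range(wu_start, uploads_completed):
    """Return (start, end) steps for the work unit that follows one
    which began at wu_start and got uploads_completed uploads validated."""
    if not 0 <= uploads_completed <= CHECKPOINTS_PER_WU:
        raise ValueError("uploads_completed out of range")
    start = wu_start + uploads_completed * STEPS_PER_CHECKPOINT
    # Assumption: the final work unit of a job is capped at TOTAL_STEPS.
    end = min(start + STEPS_PER_WU, TOTAL_STEPS)
    return start, end


# A task that crashed after 9 intermediate uploads:
print(follow_up_range(0, 9))    # (90000, 190000)
# A task that completed all 10 uploads:
print(follow_up_range(0, 10))   # (100000, 200000)
```

Under these assumptions, a job with no failures would need 2,000,000 / 100,000 = 20 work units run one after another.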

I know, this probably raises more questions than it answers, but more complexity requires more testing :)

Thanks,
-Uplinger
[Jul 29, 2015 11:03:18 PM]   Link   Report threatening or abusive post: please login first  Go to top 
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

OK, so nothing is lost on a partway job, since the last good step is used to kick off the next 100K block. That is, as with CEP2, a partial computation will still be valid for its purpose, with the difference that the CEP2 scientists don't need the 'skipped' parts, whereas with BEDAM you need every link in the chain.

That's a pretty long chain you're contemplating. I hope this works out with a non-homogeneous fleet of computers all working together on one problem.
[Jul 29, 2015 11:10:44 PM]
OldChap
Veteran Cruncher
UK
Joined: Jun 5, 2009
Post Count: 978

I think I am seeing odd numbers for CPU/clock time.


[Jul 29, 2015 11:24:27 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

There's a fix for that in the 7.7 development client. I see this off and on with various sciences, but you only notice it when efficiency is very high, as with this one, where you get the 'perfect' 100%. ;D

Et alia: for those interested, you can set a log flag in cc_config.xml to see the detailed trickle messages in the event log:

<trickle_debug>1</trickle_debug>

to go with

12466 World Community Grid 7/30/2015 1:00:14 AM Sending scheduler request: To send trickle-up message.

I have not set it here; I just know from the CPDN days that one was written about every 6 hours, so it does not really flood the log.
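
For anyone unsure where the flag goes: log flags in cc_config.xml sit inside a `<log_flags>` element under the root `<cc_config>` element. The Python sketch below generates a minimal file body with only the trickle flag set. The helper name `build_cc_config` is made up for illustration, and an existing cc_config.xml may already hold other flags and options that you would want to keep rather than overwrite.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a minimal cc_config.xml body from a dict of log flags.

    Hypothetical helper: it only writes the <log_flags> section and
    knows nothing about other settings a real file might contain.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name, enabled in flags.items():
        ET.SubElement(log_flags, name).text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config({"trickle_debug": True}))
# <cc_config><log_flags><trickle_debug>1</trickle_debug></log_flags></cc_config>
```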
[Jul 29, 2015 11:34:44 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7545

Snagged 5 of them, all on one machine. This will be interesting, as that is an 8-core server running off nothing but a 16 GB flash drive.
Addendum: 20% after 8 hours on a 1.86 GHz Xeon 5320 running Linux Mint 17. Definitely big jobs. Running smoothly so far.

Rats!!! All 5 errored out. The relevant part was:

process exited with code 38 (0x26, -218)...
....
forrtl: No space left on device
forrtl: severe (38): error during write, unit 2, file /var/lib/boinc-client/slots/4/md.out
Image PC Routine Line Source

Each of the WUs errored, but I think this one was the cause that made the others error.
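
Since the log points at a full disk ("No space left on device"), a quick free-space check before taking on several large tasks may help. The sketch below is only an illustration: the function name, the path, and the 1 GiB threshold are placeholders, not figures from the project, which has not stated how much disk a BEDAM task needs.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding path has at least
    required_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= required_bytes


if __name__ == "__main__":
    # Hypothetical values: adjust the path and threshold to your setup.
    data_dir = "."            # e.g. point this at the BOINC data directory
    needed = 1 * 1024**3      # 1 GiB, an arbitrary example threshold
    print(has_free_space(data_dir, needed))
```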

I'll post all of the stderr_txt files if needed.

Further update, July 31, 2015: After about 9 months of crunching MCM1 on the flash drive with no problems, I think whatever BEDAM is doing may have finally fried it. It may have been coincidence that it happened at the same time as the beta test, but at any rate the flash drive is toast. I have tried everything I can think of to resurrect it, with no luck; it can no longer be written to. Another US $6.00 down the drain (sarcasm). I will install another drive and maybe get in on the next beta.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jul 30, 2015 2:17:37 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715

Two tasks running fine on the Windows 8.1 machine. Close to 35% in 7.5 hours.
Checkpointing and uploading quietly as described by uplinger.
[Jul 30, 2015 3:47:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Longest running: 11:25 hours at 51% on the desktop and 11:41 hours at 38.2% on the laptop. With that, BOINCTasks logged 38 checkpoints, just as prescribed. Unfortunately, the standard trickle message does not say which task the progress belongs to, so I tried the debug flag.

7/30/2015 5:39:01 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_BETA_avx101118-005_r5_1fn_1_1438227471.xml

Just what I wanted to know: which task is reporting progress.

But it's not actually needed, since just before or after there are the uploads either starting or finishing.

7/30/2015 7:26:18 AM Started upload of BETA_avx101118-005_r5_1eg_1_3
7/30/2015 7:26:18 AM Started upload of BETA_avx101118-005_r5_1eg_1_13
7/30/2015 7:26:18 AM [sched_op] Starting scheduler request
7/30/2015 7:26:18 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_BETA_avx101118-005_r5_1eg_1_1438233977.xml
7/30/2015 7:26:18 AM Sending scheduler request: To send trickle-up message.
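
To pull the task name out of these debug lines automatically, something like the Python sketch below could work. It assumes the trickle file name has the form `trickle_up_<task name>_<number>.xml`, with the trailing number looking like a Unix timestamp; that pattern is inferred from the two log lines quoted in this post, not from BOINC documentation, and the function name is made up for illustration.

```python
import re

# Assumed pattern: trickle_up_<task name>_<digits>.xml at the end of the line.
TRICKLE_RE = re.compile(r"trickle_up_(?P<task>.+)_(?P<stamp>\d+)\.xml$")


def parse_trickle_line(line):
    """Return (task_name, stamp) from a [trickle] log line, or None
    if the line does not end with a trickle file name."""
    match = TRICKLE_RE.search(line.strip())
    if match is None:
        return None
    return match.group("task"), int(match.group("stamp"))


line = ("7/30/2015 5:39:01 AM [trickle] read trickle file "
        "projects/www.worldcommunitygrid.org/"
        "trickle_up_BETA_avx101118-005_r5_1fn_1_1438227471.xml")
print(parse_trickle_line(line))
# ('BETA_avx101118-005_r5_1fn_1', 1438227471)
```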
[Jul 30, 2015 6:59:32 AM]
Eric_Kaiser
Veteran Cruncher
Germany (Hessen)
Joined: May 7, 2013
Post Count: 1047

I've got 7 beta WUs: 6 on different i7-980s @ 3.3 GHz and 1 on my AMD 5350 @ 2 GHz.
Runtime/progress: i7: 11:40 hrs and 44%; AMD: 11:40 hrs and 27%.
[Jul 30, 2015 7:18:47 AM]