Thread Status: Active
Total posts in this thread: 52
This topic has been viewed 9795 times and has 51 replies
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Status: Offline
Project Badges:
Re: Project is temporarily shut down for maintenance

Found all my machines in "Waiting to report" mode at 7:30 EST with nothing left to crunch. Had to manually update to get things going again. Wonder what caused that?
I am not complaining, but this proves that there is a case for allowing individual rigs to hold a bigger cache of GPU WUs.

100% agree

It was only back up minutes before you found your PCs.
----------------------------------------
[Nov 18, 2012 12:49:28 PM]
ryan222h
Senior Cruncher
Joined: Sep 4, 2006
Post Count: 425
Status: Offline
Re: Project is temporarily shut down for maintenance

It will probably take several more hours for everyone's machines to automatically update and start receiving work again, for those who can't manually update. Another unfortunate loss of compute power due to low cache availability.

Looking at it from the other side though, imagine everyone sending a day or two worth of results to the validators at once. They would quickly be overwhelmed. So I suppose the short cache is a necessary evil.
----------------------------------------

----------------------------------------
[Edit 2 times, last edit by ryan222h at Nov 18, 2012 12:54:23 PM]
[Nov 18, 2012 12:53:06 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Project is temporarily shut down for maintenance

ryan, mine received new work immediately?
[Nov 18, 2012 12:55:09 PM]
ryan222h
Senior Cruncher
Joined: Sep 4, 2006
Post Count: 425
Status: Offline
Re: Project is temporarily shut down for maintenance

Quite a few of my machines were in "project backoff" mode, one of them for about two hours. If I had not manually updated, they would have been dry until then, no?
----------------------------------------

[Nov 18, 2012 12:57:31 PM]
dskagcommunity
Senior Cruncher
Austria
Joined: May 10, 2011
Post Count: 219
Status: Offline
Project Badges:
Re: Project is temporarily shut down for maintenance

Agree with ryan: my fastest unattended main GPU cruncher for WCG has reported nothing since 7:30, so I have to wait some additional hours too until the next scheduler request goes out :/
----------------------------------------
http://www.research.dskag.at
Crunching for my Dog who had "good" Braincancer.


[Nov 18, 2012 12:59:25 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Project Badges:
Re: Project is temporarily shut down for maintenance

There was definitely an unexpected issue last night. It looks like the database was completely inaccessible for about 5 hours. We are looking into it.
[Nov 18, 2012 1:07:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Project is temporarily shut down for maintenance

The "Do network communication" [DNC] command has 1 (one) function: it resets the client's scheduler deferral counters. If there is a need to backfill work, the client then initiates a connection after a little whirring, which can take a few minutes.

On my 2 devices, the last pre-outage result was reported at 07:30 AM and the first post-outage result at 12:11.

SN2S_ AAW88538_ 0000053_ 0428_ 0-- 1854592 Valid 11/15/12 20:25:45 11/18/12 12:11:19 6.90 / 7.03 123.8 / 123.8
X0900076860298200610231336_ 1-- 2215202 Valid 11/16/12 03:20:51 11/18/12 07:30:24 2.53 / 2.55 54.2 / 56.0

I had set up a 30-minute page refresh in the web browser on my Result Status page, which alerted me to the return. I then did the DNC [which does not log], and up went the results from my nearly dry [clean install] Linux box (it only got an initial 20). It's stocked again, but now for a day.

edit: @knreed, the result files uploaded fine during that time; just the scheduler took a Saturday night AWOL in the department of doling out new work or taking receipt of the step 2 "Ready to Report" bits.

@whoever concluded that the low cache allowance was a bandwidth protection factor for the server: Ready to Report takes little to nothing in BW. If the upload of result files [which are the BW eaters] fails, the client gets a backoff instruction, so when the network comes back, not all thirsty clients connect simultaneously [except for the few hundred to few thousand that are hammering the Update button]. It's a different matter if the result file uploads fail for a longer period. Then knreed starts sweating, if it takes longer than 12 hours [I think that was the number he mentioned one time].
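To illustrate why a backoff with a random component keeps clients from reconnecting all at once, here is a minimal sketch in Python. It is not BOINC's actual backoff algorithm; the function names, the base delay, the cap, and the jitter range are all hypothetical values picked for illustration.

```python
import random
from collections import Counter


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized retry delay in seconds.

    Hypothetical policy (an assumption, not BOINC's real one): the ceiling
    doubles with every consecutive failure up to `cap`, and the actual delay
    is drawn uniformly between half the ceiling and the ceiling.
    """
    ceiling = min(cap, base * 2 ** failures)
    return rng.uniform(0.5 * ceiling, ceiling)


def busiest_minute(num_clients, failures, seed=0):
    """Count how many simulated clients reconnect in the most crowded minute."""
    rng = random.Random(seed)
    per_minute = Counter(
        int(backoff_delay(failures, rng=rng) // 60) for _ in range(num_clients)
    )
    return max(per_minute.values())


if __name__ == "__main__":
    clients = 10_000
    peak = busiest_minute(clients, failures=6)
    print(f"{clients} clients, busiest minute sees {peak} reconnects")
```

With these made-up parameters the reconnects are spread over roughly half an hour instead of landing in the same minute. Clients whose owners hit Update by hand bypass the delay, which is the exception mentioned above.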
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 18, 2012 1:22:13 PM]
[Nov 18, 2012 1:14:28 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Project Badges:
Re: Project is temporarily shut down for maintenance

Ready to Report takes little to nothing in BW. If upload of result files fails [which are the BW eaters]

Well, apart from CEP2, where uploads go to a separate server, and with the two projects DDDT2 and C4CW not being active, all uploads are, according to https://secure.worldcommunitygrid.org/help/vi...?shortName=minimumreq#413, less than 0.2 MB, and except for HPF2 all downloads are also less than 0.2 MB, not counting the initial application downloads. So based on this info alone, the bandwidth-eater for WCG's servers is the downloads and not the uploads.

But because GPUs are so fast, even though each HCC transfer is fairly small, the total may still be significant. Let's see, it looks like you have:
1: Downloads: variable but as an average over 188 files it's 114 KB/wu.
2: Uploads, 125 KB/wu.

So far so good, uploads look larger... Except:
1a: Downloads are zipped files, meaning no extra compression.
2b: Uploads are uncompressed files. Using NTFS's own compression gives 1/2 the size, so assuming BOINC's upload compression gives a similar result, that is only roughly 60 KB/wu.

As for connections to scheduling-server "takes little to nothing in BW":
3a: Reporting 1 result and asking for work: 71 KB uploaded.
3b: Getting 1 new task in return: 58 KB downloaded.
This is 129 KB total, but this isn't the full story:
4a: Asking for more work since cache not full: 66 KB uploaded.
4b: Getting told you've hit the "in progress"-limit: 37 KB downloaded.

This gives a total of 232 KB transferred/wu for scheduling-server and 239 KB transferred/wu for upload/download-servers, meaning 49% of bandwidth is used on scheduling-server...

This is for a computer running both CPU & GPU for WCG. If you're only running GPU, the upload in 3a decreases to 54 KB and the one in 4a to 48 KB. The download sizes are the same.

If you're only running GPU, the chance of multiple tasks finishing so close together that they're reported in one request increases, and the total scheduling-server usage decreases. Still, you can expect that 20 - 50% of all bandwidth used is due to the scheduling server for anyone running only GPU.
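For anyone who wants to check the arithmetic, the per-work-unit figures above can be tallied with a few lines of Python. This is just a sketch: the dictionary keys and the function name are my own labels, the sizes are the KB values measured above, and the GPU-only case assumes one task per scheduler request and the uncompressed 125 KB upload.

```python
# File transfer per work unit in KB (items 1 and 2 above, uncompressed upload).
FILE_TRANSFER_KB = {"download": 114, "upload": 125}

# Scheduler traffic per work unit in KB (items 3a, 3b, 4a, 4b above).
SCHEDULER_KB = {
    "cpu_and_gpu": {"3a": 71, "3b": 58, "4a": 66, "4b": 37},
    "gpu_only": {"3a": 54, "3b": 58, "4a": 48, "4b": 37},
}


def scheduler_share(scheduler_kb, file_kb):
    """Return (scheduler total, file total, scheduler fraction of all traffic)."""
    sched_total = sum(scheduler_kb.values())
    file_total = sum(file_kb.values())
    return sched_total, file_total, sched_total / (sched_total + file_total)


for setup, sizes in SCHEDULER_KB.items():
    sched, files, share = scheduler_share(sizes, FILE_TRANSFER_KB)
    print(f"{setup}: scheduler {sched} KB, files {files} KB, share {share:.0%}")
```

This prints 232 KB against 239 KB (49%) for the CPU & GPU case and 197 KB against 239 KB (45%) for the GPU-only case. When several results are reported in one request the share drops, but by how much depends on request sizes not measured here.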
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Nov 18, 2012 3:05:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Project is temporarily shut down for maintenance

Thank you for that Sunday afternoon expansion, and for proving that the result file upload is grosso modo not a concern either [as long as not many thousands are hitting Update simultaneously, the random factor will nicely spread the load], which was the point being made.
[Nov 18, 2012 3:46:28 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Project Badges:
Re: Project is temporarily shut down for maintenance

I have increased the GPU cache to 400 jobs. This should give more buffer for outages.
[Nov 18, 2012 7:51:59 PM]