World Community Grid Forums
Thread Status: Active Total posts in this thread: 57
Author |
|
dskagcommunity
Senior Cruncher Austria Joined: May 10, 2011 Post Count: 219 Status: Offline |
Even when both machines are running GPU only? But OK, I will see. I know nothing is lost, so I'll try to be patient :)
---------------------------------------- |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
I've posted an update here: https://secure.worldcommunitygrid.org/forums/...34538_lastpage,yes#408361
In answer to Sek's question, we currently have 6 validators running for hcc1. |
||
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline |
Woo-hoo! {I'll just pretend I understood all of that}
---------------------------------------- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I don't have anywhere near the volume of some of you guys, so I was able to do a quick review of my queue, and found that all Pending Validation are now waiting on a wingman. Plus, I just checked one that was submitted about 10 minutes before, and it was validated. So even if it's not the nearly-instant validation of times past, it seems to have gotten much better.
Thanks knreed, et al! |
||
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline |
Pending pages 49 to 83 since this morning!!!
102 now.
105...that's the number of pages, not tasks.
Seeing tasks returned on the 7th, 9th, and 10th.
78 PV pages now. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Seems to be pretty much caught up now.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Seems" is actually so caught up that the nightly spill was just over one year when normally 2,
1:034:00:40:01 2,627,972 4,734
and day stats were about 35 runtime years higher than normal.
01/15/2013 436:061:07:33:44 1,047,508,586 1,928,511
Things were racing in overdrive.
Back to normal, of course. Today we could see a statistical dip, but still, there may be a little carry-over from prior days... it will be visible with the noon stats. At least the last knreed post on database performance degradation strongly hints that we're out of the validation abyss (with more general improvements under the hood that we may not see but just "feel", making the administration of the system lighter).
Crunch On. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Oh, let's not forget, it's Wednesday today. Same as Sunday, a backup is run which means the validators are paused for as long as it takes. Start time is on or after 07:30 UTC (my PV queue started increasing from results with timestamp of 08:33 UTC and on).
"... currently running backups of the database starting at 7:30 UTC on Sunday and Wednesday mornings. In order to allow them to finish relatively quickly while we wait for the additional RAM, we are stopping the backend processes (i.e. validation) while the backup runs."
A little fluidity... no absolutes. |
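To make the schedule in the quote concrete, here is a rough Python sketch that flags results reported on a backup day at or after the stated start time. The function name and constants are made up for illustration, and the end of the backup window is not known from the posts, so this only marks results that could have been held up, nothing more.

```python
from datetime import datetime, time, timezone

# Assumption taken from the quoted post: backups start at 07:30 UTC
# on Sunday and Wednesday mornings. The end time is not known.
BACKUP_WEEKDAYS = {2, 6}      # datetime.weekday(): Monday=0, Wednesday=2, Sunday=6
BACKUP_START = time(7, 30)    # 07:30 UTC


def reported_after_backup_start(reported_at: datetime) -> bool:
    """True if a timezone-aware timestamp falls on a backup day at or after 07:30 UTC."""
    if reported_at.tzinfo is None:
        raise ValueError("reported_at must be timezone-aware")
    utc = reported_at.astimezone(timezone.utc)
    return utc.weekday() in BACKUP_WEEKDAYS and utc.time() >= BACKUP_START


if __name__ == "__main__":
    # A result reported at 08:33 UTC on Wednesday, Jan 16, 2013.
    sample = datetime(2013, 1, 16, 8, 33, tzinfo=timezone.utc)
    print(reported_after_backup_start(sample))
```

A result flagged this way is only a candidate for a longer wait in Pending Validation; how long depends on how long the backup takes.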
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline |
Could be as mundanely simple as one CPU validator running [the smallest portion of HCC total], and not the GPU ones.
The normal BOINC behaviour with 2 instances of the same service is that one handles the odd-numbered items while the other handles the even-numbered ones. For validation, the number checked is the wu-id. With 6 HCC validators it won't be even/odd numbers; instead, each will handle 1/6 of the wu-ids. For example, one validator will handle only wuid 614000000, 614000006, 614000012, 614000018 and so on.
If one of the validators gets severely backlogged or possibly even crashes, any wu-id corresponding to this validator can take many hours before it is finally validated. Other reported tasks whose wuid corresponds to one of the other validators, on the other hand, can be validated within 1 second of reporting.
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
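The split described above can be sketched in a few lines of Python. This is only an illustration, assuming the assignment is a plain remainder of the wu-id divided by the number of validators; the function names are hypothetical and this is not the actual BOINC validator code.

```python
from collections import Counter
from typing import Iterable


def validator_index(wu_id: int, num_validators: int) -> int:
    """Assumed rule: the validator that owns a work unit is wu_id modulo the validator count."""
    return wu_id % num_validators


def pending_per_validator(wu_ids: Iterable[int], num_validators: int) -> Counter:
    """Count how many pending work units fall to each validator slot."""
    return Counter(validator_index(w, num_validators) for w in wu_ids)


if __name__ == "__main__":
    # The wu-ids from the post all land on the same slot when there are 6 validators.
    ids = [614000000, 614000006, 614000012, 614000018]
    print({w: validator_index(w, 6) for w in ids})
    # With 2 validators the same rule reduces to the odd/even split.
    print(validator_index(614000001, 2), validator_index(614000002, 2))
```

If the pending wu-ids in a queue cluster on one remainder, that would be consistent with a single backlogged or stopped validator, as suggested in the post.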
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
That's an interesting, probably most efficient, way to have multiple validators running... no risk of stepping on each other's toes, and no need to know what the co-validators in the team are doing or have done. Except they'll have to exchange a "hello, I'm here" or "I'm going offline" to know what numbers to check... if they don't [or if there's a bug], a particular interval set could get forgotten for a while. Is this an avenue for the techs to explore as to why the occasional "try validation" rule has to kick in? Not that it is worth exploring, as the last time I had a "try validation" could be over a year ago. As it is, they clear anyway sooner or later, when the question "does a new copy have to be sent" comes up after a No Reply.
Anyway, 14:20 UTC now and the backup must still be running. No new valids since 08:21 UTC. That's 6 hours and counting. Maybe a disk-to-disk backup and then a second step of moving that to tape [or whatever medium this is going to]. My ~94Gb backup from HD to external drive took about 30 minutes, but then that is not a perpetually accessed live DB. We'll survive.
edit: So as I posted, things came back... the oldest from this morning are getting the thumbs up. A sort of expectation reference is now set: the backup takes somewhat over 6 hours at least.
Crunch On.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 16, 2013 2:32:10 PM] |
||
|