World Community Grid Forums
Thread Status: Active. Total posts in this thread: 12
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951
I hope that this is not a sign for some impending crash any time soon...
Ralf

adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2155
According to my figures, the Pending Validation count has been decreasing over the past few days: 262 two days ago, 180 yesterday, 179 today. Those numbers are quite low and quite normal, as you can see below:
[Bar chart of daily Pending Validation counts; only the last row survives: 179 (today)]
Now compare this to 23 days ago (773) and 24 days ago (1753).
Those are my figures. What are yours?
Adri

TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951
> According to my figures, the Pending Validation count has been decreasing over the past few days: 262 two days ago, 180 yesterday, 179 today.

I haven't taken note of the exact numbers, as I am extremely busy at work and with emergency service training (CERT) right now, but it has increased from the high 200s last week to well over 600 this morning. I just checked again before heading to bed and it is down to 570, but that's still about double what the "normal" numbers in recent weeks used to be... The oldest WU in PVa jail right now is from 3/20...
Ralf

alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 953
Ralf, Adri...

That looks like a fairly typical pattern to me. I tend to return 650 to 700 tasks a day, and at any time there are probably 450 to 550 tasks pending validation (out of those returned over the previous week or so). I tend to return work quite quickly, and there are so many systems out there that seem to struggle to return tasks before the deadline that I expect a long wait for many of them to validate! Hence, I regard PV tasks that are waiting for other tasks in progress as in detention rather than jail :-) though I do monitor the situation in case there's genuine evidence of a validation/verification backlog (multiple results per WU pending validation or verification)...

At the moment the situation is probably not being helped by whatever they are doing that results in periods of up to 6 hours, several times a week, where only retries are being shipped out[*1]. Several users have posted that they've increased their queue sizes so they don't run dry (and you can probably guess what happens later...)

Cheers - Al.

*1 That interval is long enough to run my fastest system out of work on its normal set-up, and if I allow WCG to send it more than the 64-task limit it gets so many extra MCM1 tasks that it can't get ARP1 work (if any is available...)

P.S. Oh for some sort of server status, even if it's only in an occasional text file we can download (akin to the daily ARP1 generation progress files) -- then we could see what is really going on rather than having to try to crowdsource the status! Ah, well...

[Edit 2 times, last edit by alanb1951 at Mar 27, 2025 7:14:38 AM]
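Al's distinction between "detention" and "jail" can be written down as a simple rule. Below is a minimal Python sketch, assuming you already have the status strings for every result of one WU. The status labels are placeholders, not necessarily the exact names WCG uses, and the "stalled" bucket (a result pending while no wingman task is in progress, e.g. a retry not yet sent out) is an added heuristic, not something from the post.

```python
from collections import Counter

# Placeholder status labels (assumption): adjust to whatever the real
# results export or API actually reports.
PENDING = {"Pending Validation", "Pending Verification"}
IN_PROGRESS = {"In Progress"}


def classify_wu(statuses):
    """Classify one workunit (WU) from the statuses of all its results.

    jail      -- two or more results are pending validation/verification,
                 i.e. returned work is waiting on the server, not a wingman
    detention -- one result is pending while a wingman's task is still in
                 progress, so the wait is expected
    stalled   -- one result is pending but nothing is in progress (added
                 heuristic, e.g. a retry that has not been sent out yet)
    ok        -- nothing is pending
    """
    counts = Counter(statuses)  # missing statuses count as 0
    pending = sum(counts[s] for s in PENDING)
    in_progress = sum(counts[s] for s in IN_PROGRESS)
    if pending >= 2:
        return "jail"
    if pending == 1:
        return "detention" if in_progress else "stalled"
    return "ok"


if __name__ == "__main__":
    print(classify_wu(["Pending Validation", "In Progress"]))
    print(classify_wu(["Pending Validation", "Pending Validation"]))
```

A WU whose other task is still crunching lands in detention; two pending results for the same WU is the case actually worth worrying about.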
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951
> Ralf, Adri... That looks like a fairly typical pattern to me. I tend to return 650 to 700 tasks a day, and at any time there are probably 450 to 550 tasks pending validation (out of those returned over the previous week or so).

I just checked while having a quick wake-up coffee, and PVa jail is down to 525 right now. For me, this is NOT a typical pattern; as I mentioned, it "typically" hovers around 250-270. That's why I noticed a deviation from the typical pattern the other morning, when it was all of a sudden at around 650, more than double what I would consider typical. Daily returned WU numbers have increased a bit in recent days, from about 650 to around 800, as some hosts that used to primarily crunch SiDock or Rosetta (and for some weird reason struggle with MCM1) are falling back on WCG. But a 15%-20% increase in WUs returned should not all of a sudden result in an up to 150% increase in PVa numbers, hence my concern.
Ralf

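Ralf's reasoning can be made a little more concrete with a rough steady-state check. If the PVa count is treated as a queue, Little's law (items waiting = arrival rate times mean wait) says the pending count should scale with results returned per day as long as the average wait stays the same. The sketch below plugs in the figures from the post: about 650 and then 800 returns per day, and roughly 260 and then 650 pending, taking 260 as the middle of the 250-270 "typical" range. It is only an approximation, since the real queue is rarely in steady state.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


def implied_wait_days(pending, returns_per_day):
    """Little's law L = lambda * W, rearranged to W = L / lambda.

    pending         -- results currently pending validation (L)
    returns_per_day -- results returned per day (lambda)
    Returns the mean number of days a returned result spends pending (W),
    assuming the queue is roughly in steady state.
    """
    return pending / returns_per_day


# Figures taken from the post; 260 is the midpoint of the "typical" 250-270.
returns_before, returns_now = 650, 800
pending_typical, pending_now = 260, 650

print(f"returns/day change: {pct_change(returns_before, returns_now):+.0f}%")
print(f"pending change:     {pct_change(pending_typical, pending_now):+.0f}%")
print(f"implied mean wait:  {implied_wait_days(pending_typical, returns_before):.2f}"
      f" -> {implied_wait_days(pending_now, returns_now):.2f} days")
```

On those numbers the return volume rose by roughly 23% while the pending count rose by 150%, so the implied average wait per result roughly doubled (from 0.40 to about 0.81 days). Return volume alone does not explain the jump. The mean is taken over all returned results, including those that validate immediately, which is why it is far shorter than the worst-case waits.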
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 953
> ...but a 15%-20% increase in WUs returned should not all of a sudden result in an up to 150% increase in PVa numbers, hence my concern.

That's fair enough -- if I saw something as high as that I'd start sifting through the data I'd collected about my wingmen's results to see if I could spot a pattern...

I use the APIs to poll all outstanding results at least once a day, which ensures I see what happens to all tasks associated with each WU I've seen; I can then do a sort of time-line for each WU, including an analysis of things like late returners and other reasons for retries being issued and validation being held up. My normal reports are based on when I returned my results, but I can also examine when WUs eventually validated or when the WUs disappeared from the database!

Recently there have been days where I have received so many retries that my PVal numbers dropped quite considerably -- result returned, instant validation, sometimes for a WU that had been waiting 7 or 8 days if I wasn't the first retry! And on days with few [or no] retries the numbers are likely to climb 6 days later :-(

I reckon it will continue to be a "mystery" unless/until they either put a more aggressive cap on tasks available per host or the launch of MAM1 persuades some users to shift [part of] their attention away from MCM1. A dramatic reduction in missed deadlines would undoubtedly improve overall workflow (and might even increase throughput by making it easier for [fast] systems with smaller buffers to get work regularly!)

Keep being concerned -- it's all data for the hypothetical crowdsourced status page!

Cheers - Al.

P.S. I'd love to have read-only access to a replica copy of their BOINC database to see what proportion of WUs sent to the different platforms end up needing retries, especially for missed deadlines :-) -- that information might be quite instructive...
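The once-a-day polling Al describes can be sketched roughly as follows. This is a minimal Python illustration, not his actual tooling: it assumes each poll yields one record per result with a WU name, result name, status, deadline and (if returned) a return time. Those field names are hypothetical and are not the WCG API's. The sketch only groups snapshots into a per-WU time-line and flags late returners.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class ResultSnapshot:
    """One result as seen in one poll (hypothetical record shape)."""
    polled_at: datetime
    wu_name: str
    result_name: str
    status: str
    deadline: datetime
    returned_at: Optional[datetime] = None


def build_timelines(snapshots: List[ResultSnapshot]) -> Dict[str, List[ResultSnapshot]]:
    """Group snapshots by workunit, each list ordered by poll time."""
    timelines: Dict[str, List[ResultSnapshot]] = defaultdict(list)
    for snap in sorted(snapshots, key=lambda s: s.polled_at):
        timelines[snap.wu_name].append(snap)
    return dict(timelines)


def late_results(timeline: List[ResultSnapshot]) -> Set[str]:
    """Results in one WU's time-line that missed their deadline.

    A result is late if it was returned after its deadline, or if a poll
    taken after the deadline still showed it unreturned.
    """
    late: Set[str] = set()
    for snap in timeline:
        if snap.returned_at is not None:
            if snap.returned_at > snap.deadline:
                late.add(snap.result_name)
        elif snap.polled_at > snap.deadline:
            late.add(snap.result_name)
    return late


if __name__ == "__main__":
    d = datetime
    snaps = [
        ResultSnapshot(d(2025, 3, 20), "wu_1", "wu_1_0", "Pending Validation",
                       d(2025, 3, 26), returned_at=d(2025, 3, 19)),
        ResultSnapshot(d(2025, 3, 27), "wu_1", "wu_1_1", "In Progress",
                       d(2025, 3, 26)),
    ]
    for wu, timeline in build_timelines(snaps).items():
        print(wu, sorted(late_results(timeline)))
```

Keeping every snapshot, rather than only the latest state, is what makes it possible to look back later at when a WU validated or vanished from the results list.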
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951
>> ...but a 15%-20% increase in WUs returned should not all of a sudden result in an up to 150% increase in PVa numbers, hence my concern.
> That's fair enough -- if I saw something as high as that I'd start sifting through the data I'd collected about my wingmen's results to see if I could spot a pattern...
> I use the APIs to poll all outstanding results at least once a day, which ensures I see what happens to all tasks associated with each WU I've seen; I can then do a sort of time-line for each WU, including an analysis of things like late returners and other reasons for retries being issued and validation being held up. My normal reports are based on when I returned my results, but I can also examine when WUs eventually validated or when the WUs disappeared from the database!

I used to pull the CSV from the Results page manually once a day, but with the frequent outages I got out of the rhythm. Besides, as I already mentioned, I don't have the time right now to play with it further. I only get to check the overall numbers every two or three days to see if something is very obviously "outside of a pattern". Things like this, especially an increasing number of WUs in PVa jail, have always indicated that something is (slowly) brewing in the background...
Ralf

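For a lighter-weight check like Ralf's, the daily CSV pull can be reduced to a status count plus a crude "outside of a pattern" test. Below is a minimal Python sketch, assuming the export has a column literally named "Status" containing values such as "Pending Validation" (check the real header before relying on it). The threshold of twice the median of recent counts is an arbitrary illustrative choice, not anything WCG defines.

```python
import csv
import io
import statistics
from collections import Counter


def status_counts(csv_text, status_column="Status"):
    """Count results per status in a results-page CSV export.

    The column name is an assumption; pass the real header name if it differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[status_column] for row in reader)


def outside_pattern(today, history, factor=2.0):
    """True if today's count exceeds `factor` times the median of past counts."""
    return today > factor * statistics.median(history)


if __name__ == "__main__":
    sample = (
        "Result Name,Status\n"
        "wu_1_0,Pending Validation\n"
        "wu_2_1,Valid\n"
        "wu_3_0,Pending Validation\n"
    )
    pending = status_counts(sample)["Pending Validation"]
    print(pending)
    # Ralf's figures: a typical 250-270 versus about 650 one morning.
    print(outside_pattern(650, [250, 260, 270]))
```

With a typical range of 250-270 the threshold works out to 520, so the 650 spike trips the check, and so do the later 570 and 525 readings. A short history of daily counts, like the one Adri posted, is all it needs.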
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7662
> Things like this, especially an increasing number of WUs in PVa jail, have always indicated that something is (slowly) brewing in the background...

I agree this is a quick and dirty indicator that something is awry.
Cheers
Sgt. Joe
*Minnesota Crunchers*

TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951
>> Things like this, especially an increasing number of WUs in PVa jail, have always indicated that something is (slowly) brewing in the background...
> I agree this is a quick and dirty indicator that something is awry.

Currently able to upload, but not reporting finished WUs, and the Results page just shows those nauseating progress bars...
Ralf

alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 953
The BOINC database server is either down or has dropped off the network -- "Server can't open database" messages are appearing in BOINC logs, and API calls fail so the progress bars get a chance to irritate end users! (A scheduler request at 05:50 UTC today succeeded, but a request at 05:55 UTC got the failure message.)
The scheduler is trying to persuade clients not to call in for an hour (which is fair enough under the circumstances, I think!) If past form is an indicator, the next stage will be "feeder not running", then we have to wait for someone to apply appropriate suasion... At least this time it is obvious why we are likely to run out of work :-)

[Late edit: it was still reporting the database error at about 07:55 UTC but was reporting "feeder not running" by about 08:30...]

Cheers - Al.

P.S. I very much doubt that whatever has happened here (flaky hardware again? imperfect patches in their efforts to try to stop these sorts of incidents?) has anything to do with the apparent validation backlog, either as cause or effect. I haven't seen a genuine PVal jail task (two or more pending for one WU) or stuck retry in quite a while, and those are much better problem indicators! If someone can give me chapter and verse on why I'm wrong, I'd be delighted to learn something new [no sarcasm intended] :-)

[Edited to add a time bracket for the initial service failure; edited again to add the time-check for feeder not running.]
[Edit 2 times, last edit by alanb1951 at Mar 30, 2025 9:05:25 AM]