World Community Grid Forums
Thread Status: Active Total posts in this thread: 81
|
Author |
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1271 Status: Offline |
Personally, I would say no, the assimilators are not running. I still have over 5000 results waiting to be deleted, and this number hasn't changed for a long time.
|
||
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2138 Status: Offline |
Purging is still not happening. So the "solution" Tiger Lily talks about here was obviously not the right solution to the problem.
|
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1944 Status: Offline |
Grumpy Swede wrote: Purging is still not happening. So the "solution" Tiger Lily talks about here was obviously not the right solution to the problem.
+1
Unfortunately, for almost the last two years, talk has been cheap...
Ralf |
||
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 787 Status: Offline |
I haven't seen any movement.
|
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2145 Status: Recently Active |
The tech team is still tinkering with the assimilators. MCM1-assimilation has come to a halt at the moment.
Five days ago (last Monday), I didn't see any FileDeleteState = 2 (so no assimilation taking place) for nearly 24 hours (Feb 5 19:00-Feb 6 15:00). Then assimilation resumed, until it ran out of FileDeleteState = 2 again at Feb 8 06:00. (Times are UTC.)
Also, all MCM1 workunit IDs that I saw with FileDeleteState = 2 adhered to the formula Workunit-ID modulo 4 = 1. Of course, I expect that volunteers with a larger number of valid tasks than I have[*1] may observe slightly differing results.
Apparently, getting the assimilators into fully functioning condition is turning out to be tricky.
The one-liner that I ran was this one:
ls -tr wcgresults.2024-02-*|while read f;do ls -l $f;sed 's/ *<Result>//' $f|perl -w00ne 'print "$a\n" if /<FileDeleteState>2</ && /<AppName>mcm1</ && (($a) = /(..)<\/WorkunitId>/)'|sort|uniq -c;done
Adri
[*1] Currently: $ wcgstats -wsV -aMCM1 -m0 -P1 |
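For anyone who wants to cross-check the modulo observation without the shell pipeline, here is a minimal Python sketch of the same count. It assumes (the post does not confirm this) that each saved results file contains `<Result>...</Result>` records with `<AppName>`, `<FileDeleteState>`, and `<WorkunitId>` child elements; the helper names `field` and `residues_mod4` are hypothetical. Note that the one-liner only prints the last two digits of each Workunit-ID, which is enough because 100 is divisible by 4, so those two digits determine the remainder modulo 4.

```python
import re
from collections import Counter

# Assumed record layout: <Result> ... </Result> blocks (not confirmed by the post).
RESULT_RE = re.compile(r"<Result>(.*?)</Result>", re.DOTALL)


def field(block, name):
    """Return the text of <name>...</name> inside one record, or None if absent."""
    m = re.search(rf"<{name}>\s*([^<]*?)\s*</{name}>", block)
    return m.group(1) if m else None


def residues_mod4(xml_text, app="mcm1", state="2"):
    """Count Workunit-ID modulo 4 for records of `app` with FileDeleteState == state."""
    counts = Counter()
    for block in RESULT_RE.findall(xml_text):
        if field(block, "AppName") != app:
            continue
        if field(block, "FileDeleteState") != state:
            continue
        wu = field(block, "WorkunitId")
        if wu is not None and wu.isdigit():
            counts[int(wu) % 4] += 1
    return counts
```

If every assimilated MCM1 result really satisfies Workunit-ID modulo 4 = 1, the returned counter should have a single key, 1.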
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 929 Status: Recently Active |
I've been seeing the same as Adri (but on a mere 15000 or so results...)
I was wondering if WCG had done a tweak to the standard BOINC assimilator wrapper (which, unlike [say] the validator wrapper, doesn't seem to have a way of constraining the [range of] workunits selected) as it kept working upwards. I was also wondering what was causing certain WUs to provoke assimilator problems, and whether it was known to be limited to older WUs...
If, for instance, the assimilator wrapper had been modified to also accept a lowest acceptable WU number (or range of WU numbers) there might have been the option of starting the other assimilators with a high enough WU number to get some assimilations done[*1]. However, if it were that simple that would surely have been done already :-) -- so it looks like we wait, and hope that the resumption of ARP1 (and, perhaps, some more SCC1?) might take the pressure off MCM1 a bit.
[Apologies for armchair SysAdmin mode...]
Cheers - Al.
P.S. I suspect that anything that facilitates a significant reduction in the number of MCM1 WUs to be queried might reduce the possibility of a recurrence of the issues that bit WCG/IBM a fair while ago...
[*1] Another hack would be to make a huge increase in the number of WUs to be considered by a single pass of the assimilator. Probably a bad move from a performance standpoint :-( |
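To make the idea of a "lowest acceptable WU number" concrete, here is a purely illustrative Python sketch. It is not BOINC or WCG code: it assumes that several assimilator instances split the work by Workunit-ID modulo 4 (which would fit Adri's observation that only IDs with remainder 1 were being assimilated), and the function name `select_for_assimilation` and the parameters `shard`, `min_id`, and `batch_size` are all hypothetical.

```python
def select_for_assimilation(ready_ids, shard, num_shards=4, min_id=0, batch_size=100):
    """Pick the next batch of workunit IDs for one (hypothetical) assimilator instance.

    ready_ids  -- IDs of workunits waiting to be assimilated
    shard      -- remainder this instance handles (ID % num_shards == shard)
    min_id     -- lowest acceptable workunit ID; older workunits are skipped
    batch_size -- maximum number of workunits considered in a single pass
    """
    mine = [wu for wu in sorted(ready_ids)
            if wu % num_shards == shard and wu >= min_id]
    return mine[:batch_size]


ready = range(1, 21)
# Instance 1 with no lower bound starts from the oldest workunits.
print(select_for_assimilation(ready, shard=1))             # [1, 5, 9, 13, 17]
# Raising min_id lets the same instance skip older workunits.
print(select_for_assimilation(ready, shard=1, min_id=10))  # [13, 17]
```

In this toy model, raising `min_id` lets an instance get past older workunits that might be causing trouble, while raising `batch_size` corresponds to the footnoted hack of considering more workunits per pass.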
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7633 Status: Offline |
I'm not a sysadmin, but I know that some alternative actions may be possible: stream separation on an old/new basis, or the use of holding corrals based on selective criteria, may be helpful. Even restricting the flow may help to stave off overload. We may not get all we want, but we may get some. Without knowing the setup and any of the throughput parameters, any of this is pure speculation. Sooner or later the techs in the back room will determine the nature of the problem and formulate some fix.
----------------------------------------
Good luck and Godspeed. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 929 Status: Recently Active |
Sgt. Joe,
Sgt.Joe wrote: Without knowing the setup and any of the throughput parameters any of this is pure speculation. Sooner or later the techs in the back room will determine the nature of the problem and formulate some fix.
Agreed! My "problem" (if it is such) is that before I retired I had many occasions when I was trying to fix problem systems that seemed to resist every common-sense attempt at solution, so (knowing the spaghetti nature of some of the BOINC server code) I have real sympathy for the WCG folks if they have strange errors to deal with, and am frustrated that there's nothing I can do to help!
Ah, well, it will (as you say) get sorted eventually (or the system will break completely...)
Cheers - Al.
P.S. I often had to stop and remind myself that if a fix was urgent, perfect was the enemy of good... |
||
|
Bryn Mawr
Senior Cruncher Joined: Dec 26, 2018 Post Count: 342 Status: Recently Active |
alanb1951 wrote: P.S. I often had to stop and remind myself that if a fix was urgent, perfect was the enemy of good...
Agreed. When the system was down, a fix that relieved the symptoms without negative consequences went in.
That being said, the system is not down and appears to be able to cope with the backlog, so is a fix urgent? Sometimes it can take longer to fix a bad fix than to fix the original problem. |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 929 Status: Recently Active |
alanb1951 wrote: P.S. I often had to stop and remind myself that if a fix was urgent, perfect was the enemy of good...
Bryn Mawr wrote: Agree, when the system was down then a fix that relieved the symptoms without negative consequences went in. That being said, the system is not down and appears to be able to cope with the backlog so is a fix urgent? Sometimes it can take longer to fix a bad fix than to fix the original problem.
Agreed - that's why I always like[d] problem avoidance using existing mechanisms whilst working on a longer-term solution :-)
It will be interesting to see if there will be an "after action report" once this does get resolved. I suspect it would make for interesting reading :-)
Cheers - Al. |
||
|