World Community Grid Forums
Thread Status: Active Total posts in this thread: 3520
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1114 Status: Offline
Wow, new ARPs. That is interesting. I haven't gotten any in a while, but I'll keep hoping.
|
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
According to ....generations.txt there were 22 extreme units validated in the last 24 hours, averaging 20 hours to complete.
Mike |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
Sunday Report - Xmas special
There has been a lack of data for the last 5 weeks due to a pause and then a data publication stoppage. About 34,000 units were validated in those 5 weeks, an average of under 1,000 per day.

Assuming that a full generation 182 will be the last, there are 1,698,665 units still outstanding, so my forecast end date would now be 17 October 2027. However, we are still coming out of testing, so we should finish well before then.

The definitions of normal, accelerated & extreme have moved on to generations 142, 132 & 127, respectively. There are 38 Extremes and 23 Accelerated units, although the numbers in their generations are 1,465 & 4,381 due to lack of movement/change of definition.

Some extremes have been released but I haven't seen them myself.

Mike
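For anyone who wants to replay the arithmetic behind that forecast, here is a minimal sketch. It assumes a constant validation rate, uses the rounded "about 34,000" figure, and assumes a report date of 25 December 2022; the function name `forecast_end` is made up for illustration. With those rounded inputs the result lands roughly a week earlier than 17 October 2027, which is what you would expect if the exact validated count was slightly under 34,000.

```python
from datetime import date, timedelta
from math import ceil


def forecast_end(outstanding, validated, period_days, start):
    """Project a completion date, assuming the validation rate stays constant."""
    per_day = validated / period_days          # average units validated per day
    days_left = ceil(outstanding / per_day)    # whole days needed at that rate
    return per_day, days_left, start + timedelta(days=days_left)


per_day, days_left, end = forecast_end(
    outstanding=1_698_665,      # units still outstanding (from the report)
    validated=34_000,           # "about 34,000", a rounded figure
    period_days=35,             # 5 weeks
    start=date(2022, 12, 25),   # assumed report date, not stated in the post
)
print(f"{per_day:.0f} units/day, {days_left} days left, ends {end}")
```

The forecast is very sensitive to the rate: a few hundred units more or fewer in the 5-week count shifts the end date by days.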
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2218 Status: Recently Active
"There are 38 Extremes and 23 Accelerated units, although the numbers in their generations are 1,465 & 4,381 due to lack of movement/change of definition."

Mike, could you explain what the file state.txt is trying to make clear?

+--------------+----------------+-------------+

You say: "There are 38 Extremes and there are 1465 in their generations". When I count the number of Extremes in generations.txt, they (1+1+1+2+1+2+2+4+2+2+7+2+2+4+3+4+1+5+4+2+9+4+1+4+5+2+3+7+20+96+519+743) add(ed) up to 1465 (yesterday). What is that number of 38 then?
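The addition above can be checked with a short sketch. The per-generation counts are copied from the post; the variable names are illustrative only, and the figure of 38 is taken to be the Extreme count shown in state.txt.

```python
# Per-generation counts of Extreme units, as listed in the post.
extreme_counts = [
    1, 1, 1, 2, 1, 2, 2, 4, 2, 2,
    7, 2, 2, 4, 3, 4, 1, 5, 4, 2,
    9, 4, 1, 4, 5, 2, 3, 7, 20, 96,
    519, 743,
]

total = sum(extreme_counts)          # units sitting in Extreme generations
reported = 38                        # Extreme count quoted from the report
last_two = sum(extreme_counts[-2:])  # units in the two newest Extreme generations

print(f"{len(extreme_counts)} generations, {total} units in total")
print(f"{total - reported} units not accounted for by the reported {reported}")
print(f"{last_two / total:.0%} of the total sits in the last two generations")
```

Most of the 1465 sits in the last two generations listed, which fits the idea that the count jumped when the definition of Extreme moved on.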
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
Adri,

My guess is that the lower number is those that are "registered" (my quotes and terminology) due to lack of movement of the rest. For example, when the classification of a generation moves on, I presume that the units in that newly classified generation do not get registered as the new class until they have been distributed.

This is just a guess, as so little information about what is going on at Krembil is disseminated by them. IBM used to clarify some of my queries when they were running things, and those 3 reports were established as a consequence of my questions.

"The blind leading the blind" seems appropriate.

Mike
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1065 Status: Offline
Mike, Adri;

Firstly, regarding the counts given in state.txt -- up until 31st October 2022 (inclusive) the counts seemed to include every grid cell, whereas from 1st November onwards the counts only sum to 29186...

On 31st October 2022 state.txt was as follows

+--------------+----------------+-------------+

whilst on 1st November it was as follows

+--------------+----------------+-------------+

with 6423 cells no longer included.

It doesn't directly answer the question as to what is being counted, but (assuming that some aspects of the generation management take place on the non-BOINC database) it suggests that there may be some issue in the connection between the two databases, as often mentioned in web-site related contexts. It would be interesting to see the queries used to generate the three reports [for a user-based forensic effort], but I don't think we're likely to get lucky in that matter :-)

And on that note, Mike remarked:

"This is just a guess as so little information as to what is going on at Krembil is disseminated by them."

I rather suspect that as far as Igor's team are concerned this is part of the "IBM set it up and we're leaving it alone unless it is an obvious source of system failure" section, which will [slowly] get resolved as they sort out the more critical aspects of the inter-database communication problems. I wonder how much of the stuff that worked fine in an all-IBM environment was taken for granted when documenting systems; do we know how much worthwhile documentation was provided? Mike's "blind leading the blind" assessment may be nearer the truth than one might hope would be the case...

----

On the subject of Adri's one-off work unit -- do we officially know that the work generation for each category is effectively separated out? There's plenty of evidence in the current generation shifts and average completion times to suggest that there's a constant [but small] trickle of Extreme units and that the same may be true for Accelerated units...

I actually looked at the work units surrounding Adri's oddity -- there was one other ARP1 unit there, and everything else appeared to be MCM1. Given the reasonably swift turn-around being reported for Extremes at present, it must happen once or twice a day -- perhaps it's regulated by the number of units that actually get assimilated and purged on a given day?

The above may also apply to Accelerated cells, but it would appear that Normal cells may not get another batch issued until all stragglers from the last batch have returned. If anyone can tell us what the real situation is regarding triggers for new work, I'd love to know!

If there is actually a trickle of work going out for those categories, the odds against any of the regular ARP1 posters getting one would be pretty remote, I suspect -- sorry, unixchick :-( -- so we'd be none the wiser about the new work...

Cheers - Al.
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2218 Status: Recently Active
Al, that is an exc-Al-lent observation, showing these two tables of state.txt from the recent past, not older than two months. On one day, the total number of workunits (or grid cells) in state.txt suddenly dropped from 35609 to 29186 ...

"with 6423 cells no longer included."

Al suggested:

"it would appear that Normal cells may not get another batch issued until all stragglers from the last batch have returned."

Apart from that I've 'caught' one new Normal task - and nothing else, ARP1-wise - this morning:

workunit 236797124 App: Africa Rainfall Project

(A quick check of the 500 workunits before and after that workunit ID, a span of only 3 minutes, showed no other ARP1 tasks.)

The workunit shown above was generated this morning, so this is probably only a tiny trickle, a casual passer-by. The other machine of mine is still processing 12 ARP1 tasks on all 12 of its threads.

[Edit 2 times, last edit by adriverhoef at Dec 27, 2022 12:50:50 PM]
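To put the overnight drop in the state.txt totals in proportion, here is a small sketch using only the two totals quoted above; the variable names are illustrative.

```python
total_31_oct = 35_609   # sum of the state.txt counts on 31 October 2022
total_1_nov = 29_186    # sum of the state.txt counts on 1 November 2022

missing = total_31_oct - total_1_nov   # cells no longer included
share = missing / total_31_oct         # fraction of all cells that vanished

print(f"{missing} cells missing, {share:.1%} of the earlier total")
```

So close to a fifth of the grid cells disappeared from the report from one day to the next, which is hard to explain by ordinary workunit turnover.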
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12564 Status: Offline
I noticed when units started to flow a couple of weeks ago that they were releasing in generation order starting with 134, then 135 next day, then 136, etc.
I guess it was their way of staggering the flow. I doubt they were being returned that quickly. Mike |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2218 Status: Recently Active
That's an interesting viewpoint, Mike. I want to share mine also, if you don't mind.

Looking from 1 November onwards, I see the first tasks appear on 4 November. First a few Extremes (116, 110, 119, 124, 108, 123, 117, 123), then a few 130 and 131, followed by 132 on one machine and, twelve hours later, 133 to 139. Then resends, the last one on 18 November.

After a long wait and tapping Krembil on the shoulder ('can we get more?'), on 8 December two resends, the next day 132 and 133, the day after 135; in the days that follow I'm seeing 136 to 140, then resends. On 19 December I'm starting again with 134 and 135. Then resends. And, on 24 December, this sudden 117, and on 27 December, one new 135.

There were two questions that I wanted to ask Cyclops. One was what the query is that generates state.txt, to shed some light on the matter; the other was what is happening to the many generations that seem to be 'stuck'. I have saved generations.txt from 30 November 2022 and selected the columns 'generation' and 'num_units_currently_on_generation':

014 | 1

Their values (num_units…) haven't changed in four weeks.
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1114 Status: Offline
I think when ARP ran into trouble and things weren't working for them, they had a tech at IBM to help them. They reformulated the WUs to be more granular and restarted them from a certain generation, or started them all over. I'm guessing they don't have that level of expertise or attention anymore. They might have to abandon the difficult areas.
We know that ARP has had a drive/tape space issue, so they might not have had time to look at the difficult WUs to manage them. I hope they can restart/rewind the ones that need that once the holidays are done, and they have the time to do that. |
||
|