Thread Status: Active
Total posts in this thread: 89
This topic has been viewed 4028 times and has 88 replies.
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Status: Offline
Re: Project Status (First Post Updated)

Adri,
"When I looked, Unixchick, which is just now, I was seeing three deadlines of 5 days and a half. Just saying. :-D"
I think that the task would have had a short deadline when it was sent to Unixchick (it is, after all, in the Extreme generation range!) and someone at WCG has [once again] gone through the data tables extending the return deadlines to try to reduce the likelihood of redundant retries being issued! The client never learns about the "extended" deadline...

And as confirmation of the above, I have just noticed that according to one of my clients one of the retries I'm trying to upload has a three day deadline (as expected) but the server says it has a 7 day deadline! -- By the way, the retry had itself been stuck at "Waiting to be sent" for nearly 4 days...

As long as a task gets downloaded o.k. and started long enough before the original deadline it shouldn't get auto-aborted so all should be well. However, other users with larger caches might still lose tasks to "Not started by deadline" :-(

Cheers - Al.

P.S. One of the scripts I use to analyse wingman data now warns me about changed deadline dates, and there have recently been lots of 3-day increases for MCM1 tasks and 4-day increases for ARP1 tasks.
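
A minimal sketch of the kind of check mentioned in the P.S., not the actual script: it compares two snapshots of report deadlines for the same tasks and warns about any that moved. The task names, the snapshot layout and the function name `deadline_changes` are all made up for illustration.

```python
from datetime import datetime

def deadline_changes(old, new):
    """Return {task: whole days the deadline moved} for tasks whose deadline changed.

    `old` and `new` map task names to deadline datetimes (hypothetical layout).
    """
    changes = {}
    for task, old_deadline in old.items():
        new_deadline = new.get(task)
        if new_deadline is not None and new_deadline != old_deadline:
            changes[task] = (new_deadline - old_deadline).days
    return changes

# Made-up example data: one MCM1 task extended by 3 days, one ARP1 task by 4 days.
before = {
    "MCM1_example_0": datetime(2024, 11, 18, 12, 0),
    "ARP1_example_0": datetime(2024, 11, 19, 12, 0),
    "MCM1_example_1": datetime(2024, 11, 20, 12, 0),
}
after = {
    "MCM1_example_0": datetime(2024, 11, 21, 12, 0),
    "ARP1_example_0": datetime(2024, 11, 23, 12, 0),
    "MCM1_example_1": datetime(2024, 11, 20, 12, 0),
}

for task, days in sorted(deadline_changes(before, after).items()):
    print(f"WARNING: {task} deadline moved by {days:+d} day(s)")
```

Tasks that appear in only one snapshot are ignored here; a real script would probably want to report those separately.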

[Edited to add my three/seven day deadline example and the P.S. ...]
----------------------------------------
[Edit 3 times, last edit by alanb1951 at Nov 18, 2024 1:31:58 AM]
[Nov 18, 2024 12:57:34 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 834
Status: Recently Active
Re: Project Status (First Post Updated)

It looks like they extended the deadline for the ARP WUs, which is probably a good idea with the download/upload issues.

I've added a link in the first post to Mike's ARP Sunday update. Thank you, Mike.
[Nov 18, 2024 6:15:47 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 834
Status: Recently Active
Re: Project Status (First Post Updated)

Looks like they are ignoring the 3 WUs that are the most extreme for the moment. It would be nice to know if they just need more time to work on forming the WUs with a smaller step or if they plan on leaving them out.

I would like to see the extremes put at the front of the sending queue rather than at the end, if possible, so that they are sent out more often and can catch up.
[Nov 18, 2024 3:27:14 PM]
tfmagnetism
Cruncher
Joined: Jul 22, 2011
Post Count: 25
Status: Offline
Re: Project Status (First Post Updated)

Perhaps that should be: ARP1 status: "Upload Log Jam"? If we never get to the "Ready to report" stage (as has been the case for the last few days, since Friday or Saturday night or thereabouts), we can never download any more workunits! I'd call that an "Upload Log Jam" for sure, folks (see my ARP1 upload problems thread). However, if you have nothing stuck "uploading", maybe there is no problem?! Who knows; I've had no idea for the last few days.
[Nov 18, 2024 4:23:18 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2069
Status: Offline
Re: Project Status (First Post Updated)

Unixchick:
I've added a link in the first post to Mike's ARP sunday update.

In December 2021, knreed started three daily reports on the progress of ARP1. Since the 19th of May 2023 I've been collecting the contents of these three files (completed.txt, generations.txt and state.txt) daily, for future reference. With the great help of Al (alanb1951) I've been able to extend my collection of the three files back to mid-July 2022.

With the restart of ARP1 I've introduced a script dedicated to the history of ARP1, based on its historical statistics. The script summarizes the daily statistics of these three files and uploads each resulting file to a webpage, so that you can have a look at 'the past' of ARP1.

This is what you can find:
  • History of completed generations (updated daily) ['completed']
  • History of the number of workunits within each generation (updated daily) ['generations']
  • History of the number of extreme, accelerated and normal generations (updated daily) ['state']

    Here is what they represent.
    In each file, the first column always represents the date of the measurement (obviously).
    - In 'completed', there are three more columns (Extreme, Accelerated, Normal), each divided into three parts: "units", "avg.h", "gen" (where units = number_units, avg.h = avg_hrs_to_complete, gen = max_generation, so that they directly correspond to the original file completed.txt);
    - In 'generations', there are columns for each generation that has one or more workunits, each divided into two parts: "gen" and "num" (where gen = generation, num = num_units_currently_on_generation, so that these two columns directly correspond to 2 of the columns in the original file generations.txt);
    - In 'state', there are three columns (Extreme, Accelerated, Normal), each divided into two parts: "units" and "gen" (where units = number_units, gen = max_generation, so that they directly correspond to the original file state.txt).
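
To make the column layout above concrete, here is a rough sketch of one row of each summary file as Python dataclasses. It only models the columns as described; the class names and the sample numbers are invented, and it makes no assumption about how the files are actually delimited on disk.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CompletedPart:       # one category within a 'completed' row
    units: int             # number_units
    avg_h: float           # avg_hrs_to_complete
    gen: int               # max_generation

@dataclass
class StatePart:           # one category within a 'state' row
    units: int             # number_units
    gen: int               # max_generation

@dataclass
class CompletedRow:
    day: date              # first column: date of the measurement
    extreme: CompletedPart
    accelerated: CompletedPart
    normal: CompletedPart

@dataclass
class GenerationsRow:
    day: date
    counts: dict           # gen -> num_units_currently_on_generation

@dataclass
class StateRow:
    day: date
    extreme: StatePart
    accelerated: StatePart
    normal: StatePart

    def total_units(self) -> int:
        """Sum of the three category counts for this day."""
        return self.extreme.units + self.accelerated.units + self.normal.units

# Invented sample values, only to show the shape of a 'state' row.
sample = StateRow(
    day=date(2024, 11, 18),
    extreme=StatePart(units=10, gen=100),
    accelerated=StatePart(units=20, gen=120),
    normal=StatePart(units=30, gen=150),
)
print(sample.total_units())
```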

    You can also reach these three files (completed.txt, generations.txt and state.txt) by going to http://adriverhoef.unaux.com. You will currently also find there "My pending tasks" (updated once a day) and "My current tasks" (updated hourly). If the browser doesn't show the latest results, try refreshing the page. Also, sometimes, e.g. when there are download issues, it can be a bit harder to update the three files. In that case, try coming back an hour later. :-D

    Adri
    [Nov 18, 2024 4:36:11 PM]
    Unixchick
    Veteran Cruncher
    Joined: Apr 16, 2020
    Post Count: 834
    Status: Recently Active
    Re: Project Status (First Post Updated)

    Wow Adri,

    That is a lot of data. I love your files.
    When I look at the state file I find it curious that the number of extreme WUs goes up when the bracket is stable. I get that it will increase when the extreme group moves up, and there might be a bit of measurement wobble around that movement, but I can't understand it getting larger when it is stable. Shouldn't it go down as WUs move into the accelerated category??

    I'll put a link to your explainer post here in my first post if that is ok.
    [Nov 18, 2024 8:28:47 PM]
    adriverhoef
    Master Cruncher
    The Netherlands
    Joined: Apr 3, 2009
    Post Count: 2069
    Status: Offline
    Re: Project Status (First Post Updated)

    Unixchick:
    "Wow Adri,

    That is a lot of data. I love your files."
    Many thanks!

    Unixchick:
    "When I look at the state file I find it curious that the number of extreme WUs goes up when the bracket is stable. I get that it will increase when the extreme group moves up, and there might be a bit of measurement wobble around that movement, but I can't understand it getting larger when it is stable. Shouldn't it go down as WUs move into the accelerated category??"
    Indeed, it's quite odd, to say the least; note that I'm just reproducing the daily history of the statistics and this is how the figures are moving. I'm afraid I don't have a good explanation for what's happening in the 'state' file. The numbers of Extremes and Accelerateds have been increasing ever since the semi-official restart on 2024-11-01, and the number of Normals has been going down steadily (in fact, since 2023-05-21).

    Unixchick:
    "I'll put a link to your explainer post here in my first post if that is ok."
    Yes, please go ahead, Unixchick.

    Adri
    [Nov 18, 2024 11:55:23 PM]
    alanb1951
    Veteran Cruncher
    Joined: Jan 20, 2006
    Post Count: 858
    Status: Offline
    Re: Project Status (First Post Updated)

    Unixchick,
    "When I look at the state file I find it curious that the number of extreme WUs goes up when the bracket is stable. I get that it will increase when the extreme group moves up, and there might be a bit of measurement wobble around that movement, but I can't understand it getting larger when it is stable. Shouldn't it go down as WUs move into the accelerated category??"
    Adri and I had just started a conversation about that off-board, but I'm still piecing together evidence. I suspect that it is a side-effect of when cells get re-categorized: that only happens when a unit is processed, not in some sweep of all units. So what we are seeing is the final re-categorization of lower-generation cells that hadn't been processed since being categorized as Normal, but that would have slipped backwards as others advanced!
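
A toy model of that suspicion, purely for illustration: it assumes a hypothetical rule in which a cell's category depends only on its generation, and that the stored category is refreshed only when the cell is processed. The thresholds, the rule itself and the sample cells are invented; how the categories are really assigned is not known here.

```python
def category(generation, extreme_max, accelerated_max):
    """Hypothetical rule: the lowest generations are the most urgent."""
    if generation <= extreme_max:
        return "Extreme"
    if generation <= accelerated_max:
        return "Accelerated"
    return "Normal"

# Each cell holds its current generation and the category stored when it was
# last processed. Two cells carry a stale "Normal" label from long ago.
cells = [
    {"gen": 100, "stored": "Extreme"},
    {"gen": 101, "stored": "Normal"},       # stale label
    {"gen": 102, "stored": "Normal"},       # stale label
    {"gen": 130, "stored": "Accelerated"},
    {"gen": 160, "stored": "Normal"},
]
EXTREME_MAX, ACCELERATED_MAX = 110, 140     # the bracket stays fixed throughout

def stored_extremes(all_cells):
    """Count cells whose stored (possibly stale) label is Extreme."""
    return sum(1 for c in all_cells if c["stored"] == "Extreme")

def process(cell):
    """Processing advances a cell one generation and refreshes its stored label."""
    cell["gen"] += 1
    cell["stored"] = category(cell["gen"], EXTREME_MAX, ACCELERATED_MAX)

history = [stored_extremes(cells)]
for index in (1, 2):                        # the two stale cells finally get processed
    process(cells[index])
    history.append(stored_extremes(cells))

print(history)  # prints [1, 2, 3]
```

In this toy the reported Extreme count climbs from 1 to 3 although the bracket never moves, simply because stale labels are corrected one processed cell at a time.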

    I can find examples of this within my recent (and pre-hiatus) wingman data -- initial tasks getting a 6-day deadline but being for a generation categorized other than Normal at the time of downloading. Without read access to the BOINC databases (for result and workunit tables) on a daily basis, there is no efficient[*1] way to track such cases to the next generation(s) to confirm what happens next.

    Cheers - Al.

    P.S. I hope Adri doesn't mind my having mentioned my suspicions as to the reason before having collated a large batch of evidence :-) -- I've been spending too much time just trying to get MCM1 work through [50% drop typical because of up/download issues] (let alone ARP1) to concentrate on anything else :-(

    *1 -- I won't recommend using the APIs for the collection of WU information en masse in the hope that one might find relevant ARP1 work units amongst the data -- it has to be done one WU at a time, and it can take quite a while to collect my own WU data (a few hundred on any given day!) so I reckon it would take many hours to get [tens of] thousands and I reckon that's antisocial!... :-)

    [Edited to acknowledge Adri's new post which landed whilst I was typing this one!]
    ----------------------------------------
    [Edit 1 times, last edit by alanb1951 at Nov 19, 2024 12:20:33 AM]
    [Nov 19, 2024 12:12:47 AM]
    Mike.Gibson
    Ace Cruncher
    England
    Joined: Aug 23, 2007
    Post Count: 12120
    Status: Offline
    Re: Project Status (First Post Updated)

    My interpretation of this is that you are not comparing like with like.

    My Sunday Report uses generations.txt and compares the current situation with the same report 7 days earlier. That uses actual validations.

    The numbers in the various categories in state.txt do not add up to 35609, probably due to lack of movement of some units.

    Other reports use numbers returned as completed but not necessarily validated.

    Mike
    [Nov 19, 2024 2:30:17 AM]
    alanb1951
    Veteran Cruncher
    Joined: Jan 20, 2006
    Post Count: 858
    Status: Offline
    Re: Project Status (First Post Updated)

    My interpretation of this is that you are not comparing like with like.

    My Sunday Report uses generations.txt and compares the current situation with the same report 7 days earlier. That uses actual validations.

    The numbers in the various categories in state.txt do not add up to 35609, probably due to lack of movement of some units.

    Other reports use numbers returned as completed but not necessarily validated.

    Mike
    Mike,

    I wasn't sure to whom you were responding here - i.e. who the "you" in question might be! With that in mind, treat the following as me trying to satisfy my curiosity, rather than a critique!...

    I've highlighted part of your post in blue, as it intrigues me...

    My understanding of "completed" (in both completed.txt and generations.txt) was that it refers to numbers of WUs that have been marked with either a canonical result or marked as invalid. That would be a lot easier to organize at the database level (and more "accurate") than sifting results and seeing if there were enough candidates for validation!

    My understanding of the "currently on generation" numbers in generations.txt is that they show tasks that were last expected to run at that generation number -- if a task fails when actually sent out, it will stay at the same generation number, so progress statistics should probably be based on that figure rather than any of the "completed" values (and the numbers I get by assuming that on a daily basis sum up to about the same as your weekly numbers!)
    However, at "crisis" times there are frequently discrepancies between the "completed" values in generations.txt and the differences between daily "current generation" counts, allowing for cells moving into a generation as well as moving out, hence some speculation that this might [sometimes?; always?] reflect WUs that completed as failures -- looking back through past generations files there were rarely discrepancies when things were running smoothly...
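
The comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used by anyone in this thread: it assumes the total number of units is the same in both snapshots and that units only ever move forward, so the number of units that finished generation g equals the number that crossed from "generation g or lower" to "higher than g". The function names and all sample numbers are invented.

```python
def completions_from_counts(prev, curr):
    """Infer how many units finished each generation between two snapshots.

    `prev` and `curr` map generation -> units currently on that generation.
    """
    if sum(prev.values()) != sum(curr.values()):
        raise ValueError("snapshots must cover the same number of units")
    inferred = {}
    running = 0
    for gen in sorted(set(prev) | set(curr)):
        running += prev.get(gen, 0) - curr.get(gen, 0)
        inferred[gen] = running     # units that moved from <= gen to > gen
    return inferred

def discrepancies(reported, inferred):
    """Generations where the reported 'completed' value differs from the inferred one."""
    return {g: reported.get(g, 0) - n
            for g, n in inferred.items()
            if reported.get(g, 0) != n}

# Invented snapshots for two consecutive days.
yesterday = {100: 5, 101: 10, 102: 3}
today = {100: 3, 101: 9, 102: 6}
reported_completed = {100: 2, 101: 4, 102: 0}   # invented 'completed' column

inferred = completions_from_counts(yesterday, today)
print(inferred)                                      # prints {100: 2, 101: 3, 102: 0}
print(discrepancies(reported_completed, inferred))   # prints {101: 1}
```

A positive discrepancy like the one at generation 101 would then be a candidate for a WU that was counted as completed without advancing, for example one that completed as a failure.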

    I can't begin to explain the state.txt totals, but I'm reasonably confident about the explanation of why the category counts appeared to be going backwards quite quickly at first :-)

    If you know otherwise, please clarify; I really do want to know how it works but I don't think the WCG tech team are going to let me see the scripts or the relevant part of their statistics database schema (if that's where they do the relevant stuff) -- they've got far more pressing things to deal with than satisfying my curiosity!

    Cheers - Al

    P.S. I'm not willing to claim any sure knowledge of how things work here; one of my many hats before I retired involved a lot of database work so I know how little we know about this! So I just look at the data and draw [tentative] conclusions based on how it changes over time :-)
    ----------------------------------------
    [Edit 1 times, last edit by alanb1951 at Nov 19, 2024 11:31:00 AM]
    [Nov 19, 2024 11:29:31 AM]