This topic has been viewed 4109 times and has 56 replies.
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Re: Project Status (First Post Updated)

I wish the WCG folks had enough time to do a monthly "report card" :-), so...

It has been an interesting month, despite how quiet the forums have been in general. Everything (apart from the apparent lack of progress on ARP1) was going so well until early afternoon (UTC) on the 29th, when there was a [roughly] 5-hour outage...

After the huge numbers of delayed retries during July, August seems to have passed without any such problems -- I have only seen three retries [of mine or of wingmen] that were requested in August and held up at all, and those were about 6 hours each. (There's more on Sgt. Joe's long-stalled task below...)

The forums also seemed a bit more responsive -- this might just have been because there seemed to be a lot less traffic, but it might also have been because of some improvement in the time needed to authorize access to each page. (I can still get the incorrect display of the "Join" button if I flip back through a list of loaded pages quickly enough, so a check is in there somewhere!) And if the authentication stuff has become quicker, that would improve API access as well (see later...)

There were (as always) some minor issues[*1], but nothing that appeared to be show-stopping :-) That said, db_purge appeared to do one of those 24-hour vanishing acts from around 12:30 UTC on the 7th and the file deleter(s) seemed to be unavailable for 24 hours from around 12:30 UTC on the 21st. Those outages don't normally affect any users other than those (like myself) who might be using the APIs to monitor the flow of our work through the system...

Talking of the APIs, something that seems to have improved a lot this month is the response time of the APIs when collecting workunit and wingman result information; my script that picks up and analyses the latest day's returned tasks would typically spend 10 to 15 minutes waiting for replies to the individual queries for a 500-workunit day, but now it seems to take 4 to 8 minutes.

However, not all went quite as well...

There was a [total] outage from about 14:50 UTC to around 20:00 UTC on the 29th. As scheduler requests were met with "feeder not running" messages rather than various HTTP or network errors, something was evidently still running, but no BOINC or web-site features were accessible. Uploads and non-BOINC functionality came back first; scheduler requests were being answered a few minutes later...

I have seen no official comment about that down time, so I have no idea whether it was a crash or something planned [at short notice].

And finally...

TigerLily's work period has finished -- I wonder how long it will be before someone else with time to keep a regular eye on the forums can be found...

Also, it has been noted elsewhere that the stalled original task for Sgt. Joe's long-delayed WU (due since late April!) has cleared and validated. Interestingly, the [initial] task that had been stuck since late April got sent out to someone very soon after this restart -- coincidence, technician interaction or just the classic "it'll work after a reboot"???

The hope for the next month has to be that the relatively stable service can be maintained and that the techs at the data centre and WCG can solve the problems related to needing access to multiple distinct filestores so they can get ARP1 work flowing again -- sadly, it looks as if WCG's determination to help Delft out regarding work creation and data movement has ended up biting them...

Cheers - Al.

*1 I think we are going to be stuck with the continuing "my new devices haven't shown up" issues as long as a messaging system of some form is used to pass details between the two distinct database systems, especially if said messaging system has strange ideas about caching and acknowledgment :-( [so I tend to ignore those :-)]
[Sep 1, 2024 5:52:22 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 834
Re: Project Status (First Post Updated)

Thank you Al. Great update. Not great news, but very nice update.
[Sep 1, 2024 5:15:45 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7545
Re: Project Status (First Post Updated)

Thanks Al. I appreciate the update.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 1, 2024 6:40:16 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 224
Re: Project Status (First Post Updated)

Hi Al, thank you very much for the really interesting update. I hadn’t noticed the outage, but that would seem to explain why some of my machines have been struggling to get work.
Cheers,
Mark
[Sep 2, 2024 7:23:35 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 834
Re: Project Status (First Post Updated)

Nothing new to report, just bumping this thread up.

Hopefully soon we will get an intro post from the new communication person. I hope TigerLily left the new person some notes to get them started.
[Sep 9, 2024 3:27:04 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7545
Re: Project Status (First Post Updated)

A little off topic, but I was hoping to see a little information on the outage yesterday.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 12, 2024 12:09:07 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Re: Project Status (First Post Updated)

Sgt.Joe wrote: "A little off topic, but I was hoping to see a little information on the outage yesterday."

Agreed...

As it is, all that is known is the duration (about 01:50 to 17:20 UTC on 2024-09-11), that it didn't seem to affect anything on the BOINC side, and that when web services came back, the APIs seemed to recover after the forums returned...

As it's extremely uncommon for [relatively] short outages to be explained by WCG (probably because there's a choice between developing/fixing things and telling us what happened -- small team, large workload?), I'm not optimistic (and, having sometimes been in that sort of position myself when I was still working, I have some sympathy...)

Cheers - Al.

P.S. I noticed that user savas responded to a message in the BOINC Agent forum's "Couldn't create record in host database" thread, to say "We are investigating..." -- that user also responded to a few posts during the transition from IBM to Krembil. So there is still someone from WCG visiting some threads sometimes :-)
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Sep 12, 2024 10:03:56 AM]
[Sep 12, 2024 9:56:05 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Re: Project Status (First Post Updated)

Further to yesterday's post, savas posted some technical details about profile problems both in the thread I referenced yesterday and in the News thread "2023-12-11 Update (Device profile issue)".

I'll leave it up to individual readers to place their own interpretations on it -- I'm keeping mine to myself as I don't have sufficient [other] information :-)

A few more "This is what broke, and this is what we've done about it" posts would be welcome, even if (like the one mentioned above) they have to acknowledge that it may not be a permanent fix...

Cheers - Al.
[Sep 13, 2024 1:11:19 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 292
Re: Project Status (First Post Updated)

Not quite sure where to put this, but here goes:

I see there are still volunteers who are running what appears to be a six (6) day queue.

I discovered these after several of my WUs were marked "SERVER ABORTED". I sorted my WUs on the "ERROR" flag, and the selections below were culled from the results.

PLEASE PEOPLE, to improve the throughput of everyone's work units, try not to run a queue longer than 3 days. That still keeps your machines busy while minimizing the "Server Aborted" WUs that frustrate us volunteers.

FYI: my maximum queue was recently increased to 0.3 days; it had been 0.01 days.

Thank you!

Result name; OS type; OS version; Status; Sent time; Time due / Return time; CPU time / Elapsed time; Claimed credit / Granted credit

MCM1_0224817_5494_0 Microsoft Windows 10 Professional x64 Edition, (10.00.19045.00) Valid 2024-09-20 02:46:34 UTC 2024-09-26 03:06:59 UTC 1.44 / 1.45 76.2 / 75.8

MCM1_0224786_9090_0 Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00) Valid 2024-09-19 21:56:02 UTC 2024-09-25 22:06:29 UTC 2.91 / 2.92 75.7 / 75.6

MCM1_0224764_3120_0 Microsoft Windows 11 Professional x64 Edition, (10.00.27695.00) Valid 2024-09-19 13:18:19 UTC 2024-09-25 13:36:25 UTC 1.85 / 1.92 75.7 / 79.7

MCM1_0224770_3493_1 Microsoft Windows 10 Professional x64 Edition, (10.00.18362.00) Valid 2024-09-19 10:23:50 UTC 2024-09-25 11:13:46 UTC 1.35 / 1.35 77.6 / 76.3

MCM1_0224755_5637_1 Microsoft Windows 10 Professional x64 Edition, (10.00.22631.00) Valid 2024-09-19 04:02:40 UTC 2024-09-25 05:33:32 UTC 1.72 / 1.72 76.7 / 75.4

MCM1_0224757_5832_1 Microsoft Windows 10 Professional x64 Edition, (10.00.22631.00) Valid 2024-09-19 04:02:40 UTC 2024-09-25 05:25:23 UTC 1.74 / 1.74 77.1 / 75.8

MCM1_0224745_3003_1 Microsoft Windows 8.1 Core x64 Edition, (06.03.9600.00) Valid 2024-09-18 21:16:58 UTC 2024-09-24 22:01:31 UTC 1.57 / 1.57 72.9 / 74.6

MCM1_0224716_2059_1 Microsoft Windows 10 Professional x64 Edition, (10.00.19045.00) Valid 2024-09-18 18:47:21 UTC 2024-09-24 19:44:40 UTC 1.07 / 1.07 76.5 / 74.3

MCM1_0224702_7009_0 Microsoft Windows 11 Professional x64 Edition, (10.00.22621.00) Valid 2024-09-18 14:13:53 UTC 2024-09-24 14:20:55 UTC 0.94 / 1.02 6.5 / 40.3

MCM1_0224709_4555_1 Microsoft Windows 10 Core x64 Edition, (10.00.19045.00) Valid 2024-09-18 14:08:04 UTC 2024-09-24 14:14:44 UTC 1.66 / 1.66 77.9 / 79.2

MCM1_0224709_4542_1 Microsoft Windows 10 Core x64 Edition, (10.00.19045.00) Valid 2024-09-18 14:08:04 UTC 2024-09-24 14:14:44 UTC 1.66 / 1.66 77.7 / 78.6

MCM1_0224708_4587_1 Microsoft Windows 10 Core x64 Edition, (10.00.19045.00) Valid 2024-09-18 14:08:04 UTC 2024-09-24 14:14:44 UTC 1.66 / 1.66 77.9 / 78.5

MCM1_0224688_7943_1 Microsoft Windows 11 Professional x64 Edition, (10.00.22631.00) Valid 2024-09-18 08:53:53 UTC 2024-09-24 09:06:48 UTC 1.4 / 1.4 77.3 / 75.8
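For anyone who wants to check the six-day figure, here is a minimal Python sketch that works out the gap between each task's sent time and its return time for the results listed above. It assumes the second timestamp in each row is the return time (all the rows are Valid) and that both times are UTC; `RESULTS` and `turnaround_days` are illustrative names, not part of any WCG tool.

```python
from datetime import datetime

# (result name, sent time, return time) copied from the list above.
# Assumption: the second timestamp in each row is the return time, in UTC.
RESULTS = [
    ("MCM1_0224817_5494_0", "2024-09-20 02:46:34", "2024-09-26 03:06:59"),
    ("MCM1_0224786_9090_0", "2024-09-19 21:56:02", "2024-09-25 22:06:29"),
    ("MCM1_0224764_3120_0", "2024-09-19 13:18:19", "2024-09-25 13:36:25"),
    ("MCM1_0224770_3493_1", "2024-09-19 10:23:50", "2024-09-25 11:13:46"),
    ("MCM1_0224755_5637_1", "2024-09-19 04:02:40", "2024-09-25 05:33:32"),
    ("MCM1_0224757_5832_1", "2024-09-19 04:02:40", "2024-09-25 05:25:23"),
    ("MCM1_0224745_3003_1", "2024-09-18 21:16:58", "2024-09-24 22:01:31"),
    ("MCM1_0224716_2059_1", "2024-09-18 18:47:21", "2024-09-24 19:44:40"),
    ("MCM1_0224702_7009_0", "2024-09-18 14:13:53", "2024-09-24 14:20:55"),
    ("MCM1_0224709_4555_1", "2024-09-18 14:08:04", "2024-09-24 14:14:44"),
    ("MCM1_0224709_4542_1", "2024-09-18 14:08:04", "2024-09-24 14:14:44"),
    ("MCM1_0224708_4587_1", "2024-09-18 14:08:04", "2024-09-24 14:14:44"),
    ("MCM1_0224688_7943_1", "2024-09-18 08:53:53", "2024-09-24 09:06:48"),
]

FMT = "%Y-%m-%d %H:%M:%S"
SECONDS_PER_DAY = 86400


def turnaround_days(sent: str, returned: str) -> float:
    """Days elapsed between a task being sent out and being returned."""
    delta = datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, sent, returned in RESULTS:
        print(f"{name}: {turnaround_days(sent, returned):.2f} days")
```

Every row comes out at just over six days (roughly 6.00 to 6.06), which fits the tasks sitting in a multi-day queue before being crunched rather than taking long to run (the CPU times are all under 3).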
[Sep 26, 2024 4:58:40 AM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 232
Re: Project Status (First Post Updated)

Is it time yet to replace 'SOON!' with 'Soon?' ?
[Sep 26, 2024 10:43:19 AM]