Thread Status: Active
Total posts in this thread: 203
Posts: 203   Pages: 21   [ Previous Page | 12 13 14 15 16 17 18 19 20 21 | Next Page ]
This topic has been viewed 925504 times and has 202 replies
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Hardware Recovery Updates (Updated March 31, 2023)

The stats page has not updated with current timestamps in the last 32 days. It should run on a daily cron.

Last update user XML 2023-03-01 01:21:01 UTC (32 days 01:08:48 old)
Last update host XML 2023-02-28 13:11:22 UTC (32 days 13:18:27 old)
Last update team XML 2023-03-01 01:21:01 UTC (32 days 01:08:48 old)
You do know that the website has been down virtually that entire time?
[Apr 4, 2023 5:57:01 AM]
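The "days old" figures quoted in the post above are just the difference between the export timestamp and the time the page was checked. A minimal sketch of that arithmetic follows; the function name `age` is hypothetical, and the check time of 2023-04-02 02:29:49 UTC is not stated in the thread but is inferred by adding the quoted ages to the quoted timestamps.

```python
from datetime import datetime, timezone

def age(last_update: str, now: datetime) -> str:
    """Format the time elapsed since a 'YYYY-MM-DD HH:MM:SS' UTC timestamp."""
    ts = datetime.strptime(last_update, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    delta = now - ts
    hours, rem = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{delta.days} days {hours:02d}:{minutes:02d}:{seconds:02d}"

# Check time inferred from the quoted figures (an assumption, not from the site).
checked_at = datetime(2023, 4, 2, 2, 29, 49, tzinfo=timezone.utc)

print(age("2023-03-01 01:21:01", checked_at))  # user and team XML
print(age("2023-02-28 13:11:22", checked_at))  # host XML
```

Both results match the ages quoted in the post, so all three figures were read at the same moment.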
Greg_BE
Advanced Cruncher
Joined: May 9, 2016
Post Count: 124
Status: Offline
Re: Hardware Recovery Updates (Updated March 31, 2023)

Anyone know when the validator will kick in? I've got about 11 tasks waiting for validation, and I randomly checked a few and saw that my wingman is also waiting.
Minimum Quorum: 2
Replication: 2
is what one task says, so it should have hit the validator by now.
[Apr 6, 2023 11:12:33 PM]
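For readers unfamiliar with the quorum setting quoted above: in BOINC-style projects a workunit normally becomes eligible for the validator once the number of successfully returned results reaches the minimum quorum. The sketch below only illustrates that condition. The function and parameter names are hypothetical, and it says nothing about how long the WCG validator backlog actually was.

```python
def ready_for_validation(successful_results: int, min_quorum: int) -> bool:
    """Return True once enough successful results are in to compare them."""
    return successful_results >= min_quorum

# With "Minimum Quorum: 2", a task and its wingman both returning is enough.
print(ready_for_validation(successful_results=2, min_quorum=2))
# A single returned result would still be waiting for the wingman.
print(ready_for_validation(successful_results=1, min_quorum=2))
```

Being eligible does not mean being validated immediately; the result still has to be picked up by the validator process.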
Sagittarius Lupus
Cruncher
United States
Joined: Jan 5, 2008
Post Count: 3
Status: Offline
Re: Hardware Recovery Updates (Updated March 31, 2023)

Dear WCG Staff:

You know you all are rock stars for hacking out a RAID system recovery lasting an entire month, right?

Just saying, as an engineer who knows what that takes...

Kudos.
----------------------------------------
PSA: this wolf eats guns, muscle cars, rebel battle flags, their owners, bigots, fascists, viral vectors, Republicans, and anything associated with the number 45.
[Apr 9, 2023 10:46:58 PM]
Dena
Cruncher
USA
Joined: Sep 9, 2006
Post Count: 13
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

I hope you know what you are doing, but unfortunately many who encounter this problem don't. SSD drives have a weakness in that each cell has a limited number of writes. This isn't a problem on a drive that is mostly read, and under those conditions they will outlast a rotating drive. The problem is that database programs need to maintain data pointers, and these are often rewritten. While the data may be relatively static, the pointers are not, and this can result in a failure at about a year of usage. I was on another web site that switched to an SSD drive, and after a few early failures I informed the owner of the site about this. The drives were swapped out for a rotating RAID and the problem was solved. I suggest you look into the following link, where they test SSD drives to the point of failure.
https://techreport.com/review/27436/the-ssd-e...nt-two-freaking-petabytes
----------------------------------------
[Edit 1 times, last edit by Dena at Apr 16, 2023 2:38:39 PM]
[Apr 16, 2023 2:35:22 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

You still haven't been able to assimilate and delete lots of results from before the big HDD/RAID crash, from the results list.

I still have quite a lot of results on the results list that were crunched and validated February 27 to March 1 and have still not been removed. I'm sure that others can see the same on their results lists.

Even after the restart of BOINC following the big outage, there seems to be a long delay for assimilation and deletion of validated results from the results list. It's no longer the "normal" 24 hours, but much, much longer.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 17, 2023 4:23:12 PM]
[Apr 17, 2023 4:16:50 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

Also now that some OPNG tasks are back, I see very slow validation of singletons. Of course, same problem applies to OPN1 singletons.
It used to take only seconds, or minutes, before singletons were visited by the validator. Now it can take hours, or many days.
----------------------------------------
[Edit 4 times, last edit by Grumpy Swede at Apr 17, 2023 6:34:47 PM]
[Apr 17, 2023 6:31:11 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

Thanks team!

The results list is now cleaned up of old tasks that were finished and validated before the latest long outage. Assimilate and delete now seem to work better. Not yet fully back to the 24 hour deletions of validated tasks, but we're getting closer. (I think it used to be 24 hours, but I could be wrong)

One step forward to full WCG recovery.
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Apr 18, 2023 6:58:22 PM]
[Apr 18, 2023 6:48:56 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

Now to a SCC question:

According to the March 27, 2023 update in the first post of this thread, you had the following release plan for WUs: "We plan to start with MCM and OPN/OPNG; followed by ARP and then the new SCC work units."

So, since you have already released MCM1, OPN1/OPNG, and ARP1, the question is: when will you release SCC? If I remember correctly, you posted earlier that the SCC team sent new SCC work to WCG some time ago.
----------------------------------------
[Edit 3 times, last edit by Grumpy Swede at Apr 18, 2023 7:15:39 PM]
[Apr 18, 2023 7:11:03 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

Thanks team!

The results list is now cleaned up of old tasks that were finished and validated before the latest long outage. Assimilate and delete now seem to work better. Not yet fully back to the 24 hour deletions of validated tasks, but we're getting closer. (I think it used to be 24 hours, but I could be wrong)

Sorry, I didn't dare mention it (yet) when I saw you posting about those 24 hours yesterday, Grumpy Swede, 'cause I couldn't find any proof of that. After a long search I finally found post 546141 by knreed from 2017 stating:
After the result is validated, there is the 24 hour delete delay on the file (this is for all files not just ones waiting for a wingman). Following that there is a 24 hour delay before the workunit is deleted. Given delays this is a little more than 2 days of valid results that hang around.

However, knreed added:
I wouldn't count on that remaining that way since based on database load and file system size we periodically adjust those values. However that is why you are seeing what you are seeing.


Adri
[Apr 18, 2023 9:00:23 PM]
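The "little more than 2 days" in the quoted 2017 post is the sum of the two 24 hour delays. A rough sketch of that arithmetic, assuming both delays still apply (the quoted post itself warns that they are adjusted periodically); the constant names, the function name, and the example validation time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the quoted 2017 post; the current settings may differ.
FILE_DELETE_DELAY = timedelta(hours=24)
WORKUNIT_DELETE_DELAY = timedelta(hours=24)

def earliest_workunit_deletion(validated_at: datetime) -> datetime:
    """Earliest time a validated workunit could disappear from the results list."""
    file_deleted_at = validated_at + FILE_DELETE_DELAY
    return file_deleted_at + WORKUNIT_DELETE_DELAY

# Hypothetical validation time, for illustration only.
validated = datetime(2023, 4, 17, 12, 0, 0, tzinfo=timezone.utc)
print(earliest_workunit_deletion(validated))  # 2023-04-19 12:00:00+00:00
```

Any extra time beyond the 48 hours comes from how often the deletion processes actually run, which this sketch does not model.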
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: March 20, 2023 Hardware Recovery Update

@adriverhoef

Thanks for finding that post by knreed. We'll see if Jurisica Lab will keep the same delete delays, or if they have changed or will change them.
[Apr 18, 2023 10:16:57 PM]