Thread Status: Active
Total posts in this thread: 143
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

So the data that you access when you upload and download files sits on a clustered file system. The maintenance window yesterday was scheduled to install the latest kernel on the servers. We completed all the servers associated with our databases, load balancing and website with no issue. We updated the first server associated with this file system with no issue.

However, after rebooting the second server, it marked its disks as 'unrecovered'. The clustered file system has a mechanism for recovering and restoring normal operations, but a second issue is causing that process to run at a much slower pace. We are contacting third-level support for the clustered file system software to find out whether there is a faster way to run the recovery utility.

We do not expect any loss of data, but the utility is extremely careful, which makes it very slow to run.
----------------------------------------
[Edit 1 times, last edit by knreed at Jul 19, 2017 11:26:08 AM]
[Jul 19, 2017 11:25:35 AM]
duanebong
Advanced Cruncher
Singapore
Joined: Apr 25, 2009
Post Count: 134

"and there is no setting to control it in BOINC's preferences".
There is, I think if you tick the "Show advanced..." box, you get to set the space or percent.

You can set the max used storage space but it doesn't download extra wu's. If that's what this setting is for it doesn't work.


Same for me. In the advanced settings we can control the minimum free disk space BOINC will leave on the phone, and also the maximum amount of space BOINC is allowed to consume. But there is no way to control how many days of WUs Android phones keep in their buffer.

Generally the phone crunches on 1 WU and then downloads 1 or 2 extra WU in reserve. It adjusts for the number of cores you've allowed BOINC to use. So if you have an 8 core phone and allow 2 cores to be used for crunching, the phone will have a total of 4-6 WUs downloaded (working on 2 WUs + another 2-4 in reserve).
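
For anyone who wants to work that out for their own phone, here is a rough sketch of the arithmetic. It assumes my observation of 1 to 2 reserve WUs per allowed core holds in general; the function name and parameters are made up for illustration and are not part of BOINC.

```python
def estimate_wu_range(cores_allowed, reserve_min=1, reserve_max=2):
    """Return (low, high) total WUs on the phone.

    Assumption (from observation, not from BOINC documentation):
    one running WU per allowed core, plus 1 to 2 reserve WUs per core.
    """
    if cores_allowed < 1:
        raise ValueError("cores_allowed must be at least 1")
    running = cores_allowed                      # one active WU per allowed core
    low = running + cores_allowed * reserve_min  # smallest reserve observed
    high = running + cores_allowed * reserve_max # largest reserve observed
    return low, high


low, high = estimate_wu_range(2)
print(f"{low}-{high} WUs")  # 2 allowed cores -> 4-6 WUs
```

With 2 allowed cores this gives the 4-6 WUs mentioned above, however many cores the phone actually has.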

It would be useful to be able to control it more. Not just to cover server outages, but sometimes you could be on the road with no Wi-Fi access. It would be nice to pre-load the buffer before leaving the house.
----------------------------------------

[Jul 19, 2017 11:33:32 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

So the data that you access when you upload and download files sits on a clustered file system. The maintenance window yesterday was scheduled to install the latest kernel on the servers. We completed all the servers associated with our databases, load balancing and website with no issue. We updated the first server associated with this file system with no issue.

However, after rebooting the second server, it marked its disks as 'unrecovered'. The clustered file system has a mechanism for recovering and restoring normal operations, but a second issue is causing that process to run at a much slower pace. We are contacting third-level support for the clustered file system software to find out whether there is a faster way to run the recovery utility.

We do not expect any loss of data, but the utility is extremely careful, which makes it very slow to run.

I used Ksplice for a long time until Oracle made it proprietary, but Ubuntu now has rebootless kernel updating too.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Jul 19, 2017 11:43:15 AM]
[Jul 19, 2017 11:42:25 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

"and there is no setting to control it in BOINC's preferences".

There is, I think if you tick the "Show advanced..." box, you get to set the space or percent.

You can set the max used storage space but it doesn't download extra wu's. If that's what this setting is for it doesn't work.

My unit had one active wu (only using one core) and it has been idle since it completed. Max storage space is set to 90% and there's plenty of free space but it isn't being used for more wu's.

I wonder if this is a bug?

The space in which BOINC is installed by default is limited. Think they're working towards a new release removing some overdone controls. /OT
[Jul 19, 2017 11:52:13 AM]
mmonnin
Advanced Cruncher
Joined: Jul 20, 2016
Post Count: 148

Also, why do maintenance in the MIDDLE of the week? Unless it was absolute emergency, maintenance should wait until WEEKENDs.


Deferring planned system maintenance work to the weekend makes perfect sense for an enterprise that runs at maximum only five days a week with reduced usage during the weekend. Doing so minimizes the impact by affecting only the small number of weekend workers.

With an operation that runs 24/7 with users around the globe, it makes no sense to defer planned work to any specific day. When the work comes up on the schedule and the manpower to do it is available, it makes sense to do it during the staff's regular work day because there is no time of reduced use.


This project may run 24/7, but that doesn't mean admins work 24/7. I'm guessing the people actually doing the upgrade do not have 24/7 coverage of a full staff.

Tuesday is the typical upgrade day of the week. As an example, Microsoft releases patches on Tuesdays. Monday is a day to clean up anything from the weekend when most do not work. Patch on Tuesday so you have 3-4 days to fix/roll back/verify patch before the next weekend.
----------------------------------------

[Jul 19, 2017 11:58:32 AM]
Dark Angel
Veteran Cruncher
Australia
Joined: Nov 11, 2005
Post Count: 721

aaaand it's still broke
----------------------------------------

Currently being moderated under false pretences
[Jul 19, 2017 12:09:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Ouch, all my Valid workunits uploaded at or after 14:21:56 UTC 18 July have turned Invalid. More tidying up to do, techs!
[Jul 19, 2017 12:35:06 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2987

Ouch, all my Valid workunits uploaded at or after 14:21:56 UTC 18 July have turned Invalid. More tidying up to do, techs!
Yes, same here - so much for the statement that there'll be no loss of data!!!

Bang goes my machines' reliability status 😫
----------------------------------------

[Jul 19, 2017 12:38:45 PM]
Sandvika
Advanced Cruncher
United Kingdom
Joined: Apr 27, 2007
Post Count: 112

I don't like to be profane in a forum, but I have seen the funny side and hopefully others will have a chuckle too: this upgrade is quite literally a clusterf***! Good luck fixing it.

As an IT tech with 30+ years behind me I've been wary of the dash to cloud technology. It seems there's a lot to be said for spreading risk of unavailability across multiple vendors, let alone multiple data centres, and if something is absolutely mission critical, as in this case, then having it in a hybrid environment such that overall control and therefore availability is ensured surely makes sense. I appreciate that cluster replication is limited to wire speeds unless in a single virtualised host, but there's no noticeable loss of performance when my ZFS is resilvering after replacing a failed drive, and I'd expect the same of a well implemented clustered filesystem. In this scenario I'd not expect failure of a single server to degrade performance, let alone to be fatal. I'd expect it to be more like an air crash investigation: very seldom is a whole fleet grounded after a single catastrophe, and the investigative emphasis is on preventing a recurrence and developing contingencies, not a dash to get airborne again.

Meanwhile I have discovered how to recover my Rosetta@home profile from the BOINC client config files so I no longer have 40 idle cores!
----------------------------------------

[Jul 19, 2017 12:43:53 PM]
kkwok
Cruncher
Joined: Nov 23, 2004
Post Count: 5

Care to share how to add another project under BOINC client?
[Jul 19, 2017 12:57:54 PM]