World Community Grid Forums
Thread Status: Active Total posts in this thread: 32
|
Author |
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline |
I hope that it is just a weird coincidence that in the 7 hours since the server outage was reported there has not been a single post (on any subject) by any of the CAs (nor techs). |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I can sense some contention among crunchers, but I can't seem to put a finger on what exactly it is. I see fredski as airing what he wants to see happen: a 'demand' or a 'request' or whatever one wants to call it, that WCG staff chime in and at least say "we are aware and working on it". That seems reasonable to me. A timely response from the WCG staff would surely have cleared the air, or at least allowed energies to be channelled toward the areas of concern. The alternative, silence on WCG's part, serves only to fuel speculation, which may invite counter-speculation and cloud whatever issue there may be -- making intelligent discussion difficult at best, and confrontational at worst when contending positions are given time, because of the continuing silence, to harden.
There is a price to pay to maintain a vibrant, active, enthusiastic, and intelligent cruncher community: silence is not part of it. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
AFAIK the Techs and CA went down to the nearest bar about 10 hours ago and tied one on......they ain't in a fit state to say nuttin!
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I reported this problem 8 hours ago. I don't know what is going on, but I am interpreting the complete silence I am getting as a sign that people are too busy on the servers to chat.
In general, I know that last month's BOINC upgrade slowed the system down because some processes / scripts could only run while others were stopped. knreed was talking about compensating for this by adding another physical server to our system on 5 April. And fredski misstated the situation by claiming the validator was off for more than 2 days when the posts show that the validator was overwhelmed and falling behind but still chugging away.
There is a lot of work going on in the servers that I don't really know well enough to talk about. Anything I say about the background work may be misleading due to poor understanding on my part. I much prefer to wait until knreed makes a post about a situation.
Lawrence
Added: knreed just posted at https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,32929
We are going to take a short outage to add more storage. We had planned this change for sometime in the next 2-3 weeks. However, the filesystem is reaching near capacity and we need to do this now. In order to avoid a serious condition, we may stop accepting file uploads in the near future until we can get the additional storage added. We will provide more information about the timing of these events as we have that information.
Added: knreed posted at https://secure.worldcommunitygrid.org/forums/...ad,32917_offset,20#372008
First of all - I apologize for our low level of communication. The problem was more serious than I expected and I've been doing some serious digging in to understand the issues.
There are two issues we are experiencing.
1) The BOINC 'working directory' where we store the files uploaded from you as well as store the files ready for you to download went up to 97% full. We are adding more storage to address this issue. I'm still looking at why the storage usage jumped up so quickly that it caught us unaware.
2) The SAN storage for the MySQL database that backs up the grid is hitting a very high load. The short term change is that while we are adding additional storage, we are getting some storage from a different SAN array and allocating a new filesystem from that allocation. We will put the binary logs on that filesystem which will alleviate some of the storage contention. Longer term we are going to look at the use of the MySQL 5.5 compression and look at the block sizes for the SAN/RAID/FileSystem/MySQL to make sure that they are properly in alignment.
Right now we are running the file_deleters, validators and assimilators only (not the transitioner). This is allowing the system to catch up on the backlog for these queues. Once they are caught up, I will turn the transitioner back on and allow things to catch up. We should catch up faster once we get the new filesystem for the MySQL binary logs.
Even longer term we are working to attempt to forecast the load we can handle on the current setup (we are always doing this) and planning for when we will need to expand.
Added: knreed posted at https://secure.worldcommunitygrid.org/forums/...ad,32917_offset,20#372018
We will still proceed with the filesystem update tomorrow. We have had to manually intervene on the filesystem every few days due to what appears to be a slow memory leak in the software. We are anxious to get that resolved as it is disruptive to keep having to stop all access to the filesystem and unmount and remount it.
The addition of storage for MySQL should be happening in the next 30-40 minutes. Once that is done we will stop the db server, make some changes, start it back up and after the initial start up period we should be able to catch up fairly quickly.
[Edit 3 times, last edit by Former Member at Apr 4, 2012 8:50:22 PM] |
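Editorial aside on the capacity point in the quoted posts: the kind of check that flags a working directory before it reaches 97% full, and the simple forecasting mentioned at the end, could be sketched roughly as below. This is only an illustration under stated assumptions. The 97% figure comes from the post; the function names, the linear growth model, and the idea of pausing uploads at that threshold are hypothetical and are not a description of WCG's actual tooling.

```python
import shutil

# Threshold taken from the quoted post: the working directory reached 97% full.
# Everything else (names, linear growth model) is a hypothetical illustration.
ALERT_THRESHOLD = 0.97


def usage_fraction(total_bytes, used_bytes):
    """Return the used share of a filesystem as a value between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_pause_uploads(total_bytes, used_bytes, threshold=ALERT_THRESHOLD):
    """True when usage is at or above the threshold (assumed policy)."""
    return usage_fraction(total_bytes, used_bytes) >= threshold


def days_until_full(total_bytes, used_bytes, growth_bytes_per_day):
    """Naive linear forecast; returns None when usage is not growing."""
    if growth_bytes_per_day <= 0:
        return None
    return max(total_bytes - used_bytes, 0) / growth_bytes_per_day


def check_path(path):
    """Apply the threshold check to a real mount point or directory."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free
    return should_pause_uploads(usage.total, usage.used)
```

For example, with 1000 GB total, 900 GB used and growth of 20 GB per day, `days_until_full(1000, 900, 20)` gives 5.0, which is the sort of lead time that would separate a planned expansion from an emergency outage. A sudden jump in usage, as described in the post, is exactly the case a linear forecast would miss.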
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline |
Whoa
You go off to work and when you get back ........
Thanks to the behind the scenes crew that keep us permanently awash with Work Units and great science projects to contribute to.
It all looks good this end.
Cheers
Dave
p.s. All the best with tomorrow's upgrade. We'll be here waiting patiently, ready to crunch when you are.
[Edit 1 times, last edit by David Autumns at Apr 4, 2012 5:02:58 PM] |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
fredski in the CCW thread posted:
at the very least other servers respond to user issues, there has been question of:
1. is hcmd2 wu sizing for server load testing, or just trash to keep us busy ( 5 days old)
2. validator not responding for last 2 days (now into 3rd)
3. steady increase in pv waiting for server and now not able to report or check status (3-4 hours)
a simple "we are aware and working on it" would be nice
starting to get the feeling that the only one that cares is the crunchers
OK the servers seem to be back up and running but it is important that the issues raised by fredski are addressed by either the CAs or WCG staff.
1) The HCMD2 workunit sizing is simply because we are working through the children, grandchildren, etc. of the batches we released. These have always proved to be smaller (sometimes dramatically) than the initial workunits. This is valid research necessary to complete the project. We are working to confirm whether there are any more batches that we need to run before we can declare the project finished.
2) Validator/Transitioner issues. The database load has caused a backlog in time to validate (i.e. the pending validation queue). Additionally, the database load has caused the connection issues that were reported. |
||
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline |
@knreed Thank you for the above and the other explanations recently posted
here: http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,32572_offset,20
and here: http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,32917_offset,20
We all appreciate how much effort you and your colleagues have been putting in to WCG, especially in recent weeks. |
||
|
Dark Angel
Veteran Cruncher Australia Joined: Nov 11, 2005 Post Count: 721 Status: Offline |
fredski in the CCW thread posted:
at the very least other servers respond to user issues, there has been question of:
1. is hcmd2 wu sizing for server load testing, or just trash to keep us busy ( 5 days old)
2. validator not responding for last 2 days (now into 3rd)
3. steady increase in pv waiting for server and now not able to report or check status (3-4 hours)
a simple "we are aware and working on it" would be nice
starting to get the feeling that the only one that cares is the crunchers
OK the servers seem to be back up and running but it is important that the issues raised by fredski are addressed by either the CAs or WCG staff.
Regarding the question about giving people trash to crunch to keep us occupied, that has only ever happened on one project that I know of (United Devices/Grid.org) and it resulted in a full blown revolt and mass exodus amongst the crunchers. There's no way on Earth WCG would be stupid enough to do that. Since I've been with this project (after leaving that other one as it happens) WCG has consistently been orders of magnitude more transparent and honest with people than "that other place". The folk who complain about lack of information here seriously have no idea how good they have it.
Considering the time, effort, enthusiasm and not to mention the enormous amount of resources devoted to this project I can certainly understand the concern devoted to this problem--we don't need to be lectured about some failed program
You weren't. You got an example of a project that DID do dodgy things, suffered for it and a statement of how THIS project is better. The techs, scientists and CAs here go to considerable lengths to keep us informed of what's going on. They have been consistently helpful and provided, to use my own words, "orders of magnitude" more information and feedback than is received in other major (and still running) projects.
(Since you really wanted a "lecture", here you go, "Sunshine")
What the WCG staff and CA volunteers are NOT required to do is pander to whiny little children who get their knickers in a knot because they weren't personally asked if it's ok by them for the servers to receive breakdown work, or a new project started, or work unit times to vary or people to get a day off. WE VOLUNTEERED to help them. WE OFFERED. The project doesn't owe us anything and people need to start being grateful for the great communication we DO receive rather than chucking a hissy fit if there's a ten second delay in some minor glitch being explained or question answered that they could have sorted themselves faster by using the forum Search function.
Want to know why I'm so fed up with petulant children on the forums? I've crunched on a number of projects since 1999 and this one, while not "perfect", is definitely one of if not THE best for consistent communication, transparency and personal effort being put in by the project staff etc. I'm fed up with people whining, complaining and attacking WCG and making unreasonable demands (there was a guy quite recently who threatened to leave if he didn't get his GPU app RIGHT F'ING NOW! ) while the WCG staff just have to put up with all the unfair, childish rubbish and "play nice". Aside from having to comply with forum rules like everybody else, I DON'T have to put up with tantrums and neither do the rest of the forum members.
</rant>
There, you got your "lecture". Happy now?
Currently being moderated under false pretences |
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline |
Sometimes you just have to remember that people have to get some sleep.
When you are awake in your part of the World others are getting in their essential ZZzz's in other time zones.
Please don't lose sight of the fact that if IBM didn't provide us with the resources and great Techs like Kevin, there would be no World Community Grid and no Science completed, no matter how many PCs, Processors and Graphics Cards you have.
I've been crunching since the early SETI and UD days and I can assure you that this is the best place to crunch anywhere - and it is about to get even better.
I wouldn't be closing in on 37 years worth of crunching at the World Community Grid without the huge support of the IBM crew, just a small contribution to the 585,015 years of work that the IBM team has enabled.
It was fixed, I'm sure, almost as soon as Kevin got to work this morning. What more can you ask?
(It's been a while since I've been on the defence of the great WCG, but I didn't realise that just a "heads up" post would create such a bun fight)
This is an altruistic project and we are volunteers. IBM gain great kudos from providing the infrastructure to run such a project and I'm sure that their latest kit and code gets a great real world intense beta trial on the project. But let's remember - no one is getting rich on account of the WCG.
If it was stopped tomorrow we would have no right to complain. But let's just hope it's onwards and upwards.
Keeping those CPUs glowing and the fans a blowing
Dave |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dave, I chuckled at your last line. Very poetic. |
||
|
|