Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 3260 times and has 7 replies
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Server Issue - Delays in validation [Resolved]

One of our backend servers went offline about 10 hours ago. We rebooted the server, and it is currently performing disk checks. Because of the size of the filesystems involved, the checks will take another 6-8 hours from this point.

Because this server went offline, transitioning and validation were halted. We are in the process of bringing both back online. As they catch up, members will see the backlog of returned but not yet validated results shrink.

The impacted server is the one we use to divide the raw input data from the researchers into workunits, and to compile your results back into batches to return to the researchers. As such, we expect the impact on end users to end as soon as the validators catch up.

We will keep you informed in this thread.
----------------------------------------
[Edit 1 times, last edit by knreed at May 20, 2011 1:19:45 AM]
[May 19, 2011 2:03:19 PM]
knreed

Here is the current state of the backlog:

Workunits waiting for 'transition': 357,090

Each of these is a workunit whose 'state' needs to be evaluated. The evaluation could lead to another result being sent out (for example, if a result for the workunit was returned as an error), or the workunit could move to 'need_validate' if the requirements for validation have been met.
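
For anyone curious what evaluating a workunit's state might look like, here is a rough sketch in Python. It is not the actual server code. The names (Outcome, Action, Workunit, quorum, transition) are made up for illustration, and it assumes the "requirements for validation" boil down to a fixed number of successfully returned results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    PENDING = "pending"   # result sent out, not yet returned
    SUCCESS = "success"   # result returned without error
    ERROR = "error"       # result returned as an error


class Action(Enum):
    WAIT = "wait"
    SEND_ANOTHER_RESULT = "send_another_result"
    NEED_VALIDATE = "need_validate"


@dataclass
class Workunit:
    outcomes: List[Outcome]
    quorum: int  # hypothetical: successful results required before validation


def transition(wu: Workunit) -> Action:
    """Decide what should happen next to a workunit (illustrative only)."""
    successes = sum(o is Outcome.SUCCESS for o in wu.outcomes)
    pending = sum(o is Outcome.PENDING for o in wu.outcomes)
    if successes >= wu.quorum:
        # Enough good results are in: hand the workunit to the validators.
        return Action.NEED_VALIDATE
    if successes + pending < wu.quorum:
        # Even if every outstanding result comes back good, the quorum
        # cannot be reached, so another copy has to be sent out.
        return Action.SEND_ANOTHER_RESULT
    # Otherwise the outstanding results may still satisfy the quorum.
    return Action.WAIT
```

With a quorum of 2, a workunit holding one good result and one error would get another result sent out, while one holding two good results would move on to validation.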
[May 19, 2011 2:09:26 PM]
knreed

The transition backlog is down to 237,450.

There is not really a validation backlog: as workunits are transitioned, the validators are keeping up with those found to require validation. We are just waiting for the remaining 237,450 to be transitioned.
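
As a back-of-the-envelope check, the two backlog figures posted so far can be turned into a rough drain estimate. The sketch below is only an illustration: it takes the two counts and post timestamps from this thread and assumes the transition rate stays constant, which may well not hold. The function name estimate_drain is made up for the example.

```python
from datetime import datetime

# Timestamp format used by the forum, e.g. "May 19, 2011 2:09:26 PM"
FMT = "%b %d, %Y %I:%M:%S %p"


def estimate_drain(t1: str, n1: int, t2: str, n2: int):
    """Return (workunits per second, seconds until the backlog is empty).

    Assumes the backlog keeps shrinking at the average rate observed
    between the two samples, which is a simplification.
    """
    elapsed = (datetime.strptime(t2, FMT) - datetime.strptime(t1, FMT)).total_seconds()
    rate = (n1 - n2) / elapsed
    if rate <= 0:
        raise ValueError("backlog is not shrinking between the two samples")
    return rate, n2 / rate


rate, seconds_left = estimate_drain(
    "May 19, 2011 2:09:26 PM", 357090,
    "May 19, 2011 3:02:32 PM", 237450,
)
print(f"{rate:.1f} workunits/s, about {seconds_left / 60:.0f} minutes remaining")
```

On those assumptions the transitioner is clearing somewhere between 37 and 38 workunits per second, which would put the remaining backlog a bit under two hours away from being cleared.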
[May 19, 2011 3:02:32 PM]
knreed

We are temporarily out of workunits for Human Proteome Folding - Phase 2. As soon as the backend server is back online, we will be able to load additional workunits.
[May 19, 2011 3:54:40 PM]
knreed

The server is back up, but one file system still needs to be disk checked (fsck) in 'repair' mode (the first check only looked for errors). That will run for a while longer.
[May 19, 2011 5:23:47 PM]
knreed

We have run out of work for Computing for Clean Water, Help Conquer Cancer, Human Proteome Folding Phase 2 and Help Cure Muscular Dystrophy.

Fortunately, the disk check finished in the last 10 minutes and we are about to start loading new work!
[May 20, 2011 12:35:55 AM]
knreed

Human Proteome Folding Phase 2 has work again.
[May 20, 2011 12:44:18 AM]
knreed

Work is available again for all of our projects.
[May 20, 2011 1:19:17 AM]