Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 1162 times and has 8 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
validator outage?

Hey, I think the validator might be on the blink again. I have several workunits where ALL COPIES have been returned and are marked "pending validation". Some have been sitting for a while. Hopefully it's just the Monday blahs and it'll get fixed soon. [biggrin]
[Sep 10, 2007 12:07:33 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: validator outage?

What project(s)?
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Sep 10, 2007 12:13:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: validator outage?

It was actually the transitioner which was down. It's running again now, and looks like the work is being transitioned properly.
[Sep 10, 2007 1:57:30 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: validator outage?

We are now fully caught up. Something odd has been going on with the transitioner and the db_purge processes. They keep deadlocking on the same workunit, which is odd because they shouldn't both be trying to access the same workunit at the same time.

They have actually stopped running several times over the past week, but we have some good monitoring in place now, so we find out about it pretty quickly (email notifications are sent to various places). However, last night we all just happened to be sleeping when it happened.

We hope to get this resolved in the next few days so that it won't have to generate those notifications.
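
For readers wondering what this kind of check looks like, here is a minimal sketch of "notice when a daemon stops making progress and raise an alert". It is only an illustration, not WCG's actual monitoring: the daemon names come from this thread, but the `find_stalled` helper, the 15-minute threshold, and the sample timestamps are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real monitoring interval is not stated in the thread.
STALE_AFTER = timedelta(minutes=15)


def find_stalled(last_progress, now, stale_after=STALE_AFTER):
    """Return the names of daemons whose last recorded progress is too old."""
    return sorted(
        name for name, seen in last_progress.items()
        if now - seen > stale_after
    )


if __name__ == "__main__":
    # Made-up sample data: two daemons stalled overnight, one still healthy.
    now = datetime(2007, 9, 10, 6, 0, 0)
    last_progress = {
        "transitioner": now - timedelta(hours=2),
        "db_purge": now - timedelta(hours=2),
        "validator": now - timedelta(minutes=1),
    }
    for name in find_stalled(last_progress, now):
        # A real setup would send an email here instead of printing.
        print(f"ALERT: {name} has made no progress for over {STALE_AFTER}")
```

A check like this only detects the stall. Someone still has to be awake to restart the processes, which matches what happened overnight.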
[Sep 10, 2007 4:42:56 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: validator outage?

Maybe the FA@H scheduler needs a pipe-wrench job or is being worked over, but the message log has shown scheduler requests failing for the past half hour, while result files are still uploading and new work from other projects continues to download.

12-10-2007 22:33:13|World Community Grid|Scheduler request failed: HTTP internal server error

12-10-2007 23:04:06|World Community Grid|Requesting 58 seconds of new work, and reporting 1 completed tasks
12-10-2007 23:04:11|World Community Grid|Scheduler request failed: HTTP internal server error
12-10-2007 23:04:11|World Community Grid|Deferring communication for 2 hr 36 min 59 sec
12-10-2007 23:04:11|World Community Grid|Reason: scheduler request failed
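
For anyone who wants to scan their own message log for the same pattern, here is a rough sketch that counts the failed scheduler requests and works out when the client will try again. It assumes the `DD-MM-YYYY HH:MM:SS|project|message` layout and the "Deferring communication for ..." wording seen in the lines above; other client versions may format these differently, and the helper names (`parse_line`, `deferral`, `summarize`) are made up for the example.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the message log quoted in this post.
SAMPLE = """\
12-10-2007 22:33:13|World Community Grid|Scheduler request failed: HTTP internal server error
12-10-2007 23:04:06|World Community Grid|Requesting 58 seconds of new work, and reporting 1 completed tasks
12-10-2007 23:04:11|World Community Grid|Scheduler request failed: HTTP internal server error
12-10-2007 23:04:11|World Community Grid|Deferring communication for 2 hr 36 min 59 sec
12-10-2007 23:04:11|World Community Grid|Reason: scheduler request failed
"""


def parse_line(line):
    """Split 'DD-MM-YYYY HH:MM:SS|project|message' into its three parts."""
    stamp, project, message = line.strip().split("|", 2)
    return datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S"), project, message


def deferral(message):
    """Return the back-off named in a 'Deferring communication' message, else None."""
    m = re.match(
        r"Deferring communication for (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec",
        message,
    )
    if not m:
        return None
    hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize(lines):
    """Count failed scheduler requests and compute the next retry time, if any."""
    failures, retry_at = 0, None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between log entries
        when, _project, message = parse_line(line)
        if message.startswith("Scheduler request failed"):
            failures += 1
        wait = deferral(message)
        if wait is not None:
            retry_at = when + wait
    return failures, retry_at


if __name__ == "__main__":
    failures, retry_at = summarize(SAMPLE.splitlines())
    print(f"{failures} failed scheduler requests; next retry at {retry_at}")
```

On the sample above this reports two failures and a retry at 01:41:10 on 13 October, which is 23:04:11 plus the 2 hr 36 min 59 sec back-off.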
[Oct 12, 2007 9:06:05 PM]
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline
Re: validator outage?

I am getting the same message on some machines.

[Oct 12, 2007 9:19:35 PM]
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline
Re: validator outage?

It appears it was restarted around 14:41 my time (UTC-8).

[Oct 12, 2007 10:29:22 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: validator outage?

Would I be able to throw the pipe wrench far enough to hit the Validators or whatever other Server Daemon from here? [smug]

I've had 37 results of various project plumage listed as Pending, with the minimum quorum complete, for a wee while now. [crying]

Added: And just when I posted that, fear must have struck Daemon X, as things started flowing again. [smile]
----------------------------------------
[Edit 1 times, last edit by Sekerob at Nov 8, 2007 11:25:50 AM]
[Nov 8, 2007 11:23:44 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: validator outage?

There is a process that runs every six hours. It takes care of some back-end accounting for completed workunits and also runs the process that loads new work into the BOINC grid. Part of this process stops the back-end BOINC processes for 10 to 30 minutes. Normally it is much closer to 10 minutes, but the researchers for the Dengue Fever project want to collect some detailed data about how long it takes to run one target through the grid. As a result, we are collecting much more data than normal, which is slowing down some of the back-end processes. We will only be collecting that data for about one more week, and then we will revert to the normal amount of data. This is also why your BOINC results page shows a lot more results than normal.
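
To put those figures in perspective, here is a quick back-of-the-envelope calculation of how much of the time the back-end processes are paused. It uses only the numbers from this post (a run every six hours, a pause of 10 to 30 minutes); the `paused_fraction` helper is just a name made up for the example.

```python
def paused_fraction(pause_minutes, interval_hours=6):
    """Fraction of each interval during which the back-end processes are stopped."""
    return pause_minutes / (interval_hours * 60)


for pause in (10, 30):
    runs_per_day = 24 // 6  # four runs per day at one run every six hours
    print(
        f"{pause} min pause: {paused_fraction(pause):.1%} of the time, "
        f"{pause * runs_per_day} min per day"
    )
# 10 min pause: 2.8% of the time, 40 min per day
# 30 min pause: 8.3% of the time, 120 min per day
```

So results landing in "pending validation" for up to half an hour, four times a day, would be consistent with this process rather than a sign of an outage.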
[Nov 8, 2007 3:28:42 PM]