Total posts in this thread: 32
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Multiple Errors with FAAH units over many machines...

Over the last 24 hours we seem to be getting a lot of results from many machines coming back as errors across the whole quorum. Is there a problem with some of the latest FAAH WUs at the moment?

Here are just 3 examples - there are many many more:





This of course now means that some of these machines are being restricted in the amount of work they are receiving due to the server 'fail-safe' kicking in.

Please can the Techs take a look at this, as we are losing a huge amount of credit for completed work, and also reset our machines' WU allowance so we can get the work we need.

Cheers.
[May 14, 2007 8:45:24 AM]
rebirther
Cruncher
Germany
Joined: Nov 19, 2005
Post Count: 29
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

I have found more with Genome Comparison. The last WUs I got show many errors in the list. Validator problem?
[May 14, 2007 8:51:13 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

If it helps - I'm getting the same FAAH errors here too. It seems to have been occurring since overnight 05.14.07, when the 3rd result in the quorum arrives and validation is attempted.

I've lost quite a few now myself as a result :(
[May 14, 2007 9:06:05 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

If it helps - I'm getting the same FAAH errors here too. It seems to have been occurring since overnight 05.14.07, when the 3rd result in the quorum arrives and validation is attempted.

I've lost quite a few now myself as a result :(

Ady: I just checked the team's (XS_Team_Admin) account and we have over 23 PAGES of them going back to May 9.
I have some myself on the Clovertown machine.
All it does is FAAH units, and it never errors out.
Movieman
[May 14, 2007 9:25:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

Jeez, that's a lot of work, Dave :(

I hope we all get credited for these once it's sorted out :-/
[May 14, 2007 9:30:45 AM]
olympic
Senior Cruncher
Joined: Jun 12, 2005
Post Count: 156
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

Same here, but only from the last 24 hours or so. The WUs seem to finish normally, so hopefully this is just a problem with validation and all will be well once they run it again. I'm seeing errors across all projects, including FAAH, GC and HPF2, so I'm confident the problem is with validation.
[May 14, 2007 9:49:57 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

I vaguely remember a bug about 3-4 weeks ago where the work units were not actually in 'error' and a validation rerun sorted it out. That assumes the Homogeneous Redundancy distribution logic did not break.

Regrettably, it's night at the office, so we have to wait it out until the technicians get in.

Added: Just checked, and all work for an HPF2 job turned to error when the quorum of 15 was reached. That project does not use the HR rule.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 4 times, last edit by Sekerob at May 14, 2007 10:00:51 AM]
[May 14, 2007 9:55:19 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

If you have loads of work in your task buffer, I suggest suspending network activity and crunching on. That way you will not build up a bad client record. I'm doing so, as I have 2 days' worth.
[May 14, 2007 10:03:43 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

Yeah, it just hit one of my HPF2 units - I have suspended the network...

The group credit for this WU had already been given, so the error occurred when it tried to validate my result against all the others already validated.
----------------------------------------
[Edit 1 times, last edit by Former Member at May 14, 2007 10:31:50 AM]
[May 14, 2007 10:29:40 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Multiple Errors with FAAH units over many machines...

We are investigating now - I cannot log on to one of our servers.

In the meantime, I have disabled the schedulers so that no more work is returned until we have resolved the problem.
[May 14, 2007 10:32:25 AM]