Thread Status: Active
Total posts in this thread: 352
This topic has been viewed 2176035 times and has 351 replies
RCC_Survivor
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 1337
Re: Server Errors.

IMHO
it would make sense that there was an increase in load after CFSW was accelerated 4-6 times.
The time remaining on the project dropped significantly without the crunchers being forewarned and the WUs are going fast.
http://i137.photobucket.com/albums/q210/Sekerob/WCGYearsPi1Project.png
Everyone without a badge in CFSW is probably running it exclusively, creating a significant increase in the load.
Lowering the priority of CFSW may help the problem subside.
----------------------------------------
Be kinder than necessary, for everyone you meet is fighting some battle.

Please join the team The survivors hugs
Bilateral Renal, Melanoma, and Squamous Cell cancers
[Aug 15, 2012 8:57:41 PM]
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Re: Server Errors.

I'm not sure whether this question belongs in this thread, but I recall Sekerob saying something about upload problems being counted against a machine, so I believe it does.

Starting yesterday evening, one of my machines is always being assigned a wingman for CFSW (for over 20 consecutive WUs). This is after correctly processing hundreds of WUs on that science, and there are no Errors or Invalids showing in Results Status. Is that likely to be an effect of the server problems? My other machines are still happily doing single validation, except when they get repair jobs.

Did you have any "Server Aborts"? What has been happening to me is that the "Server Aborts" are being treated as errors, in that once a device gets one, the following WUs are assigned wingmen, and the single-quorum in-progress WUs are sent to Pending Verification and assigned a wingman. After a while it goes back to single quorum until another server abort, and then it is back in the quorum 2 loop.
The odd thing is that the device continues to pull repair WUs even though it has been assigned a quorum 2 requirement.
----------------------------------------
Bill P

----------------------------------------
[Edit 1 times, last edit by wplachy at Aug 15, 2012 9:34:57 PM]
[Aug 15, 2012 9:25:03 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Re: Server Errors.

Thanks for the feedback, Bill. I vaguely recall noticing a server abort recently (I don't remember whether it was on this machine), but there are none hanging around in Results Status at this point. The machine has since returned to single quorum.
----------------------------------------

[Aug 15, 2012 11:52:01 PM]
PecosRiverM
Veteran Cruncher
The Great State of Texas
Joined: Apr 27, 2007
Post Count: 1054
Re: Server Errors.

Starting yesterday evening, one of my machines is always being assigned a wingman for CFSW (for over 20 consecutive WUs). This is after correctly processing hundreds of WUs on that science, and there are no Errors or Invalids showing in Results Status. Is that likely to be an effect of the server problems? My other machines are still happily doing single validation, except when they get repair jobs.


Is it possible you're getting the second WU as the validator for the first WU?

----------------------------------------

[Aug 16, 2012 2:04:09 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Server Errors.

Maybe it would be useful if you (we) post OS's you (we) are shrubbing (aka crunching) on :-)

Cheers and NI!

Windows XP (32 bit):

8/15/2012 4:10:36 PM World Community Grid Reporting 1 completed tasks, requesting new tasks
8/15/2012 4:10:41 PM World Community Grid Scheduler request completed: got 0 new tasks
8/15/2012 4:10:41 PM World Community Grid Message from server: No tasks sent
8/15/2012 4:10:41 PM World Community Grid Message from server: No tasks are available for Computing for Sustainable Water
8/15/2012 4:10:41 PM World Community Grid Message from server: No tasks are available for Computing for Clean Water
8/15/2012 4:10:41 PM World Community Grid Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2
8/15/2012 4:10:41 PM World Community Grid Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2 (Type A)
8/15/2012 4:10:41 PM World Community Grid Message from server: No tasks are available for the applications you have selected.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 16, 2012 3:42:42 AM]
[Aug 16, 2012 3:41:22 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Server Errors.

One can almost sync a clock to this... around 13:30 CET [DST], the transients kick in [when something in the North Americas slowly awakens]:

92626 World Community Grid 16-8-2012 13:27:59 Starting task cfsw_12336_12336802_0 using cfsw version 612 in slot 5
92627 World Community Grid 16-8-2012 13:28:02 Started upload of cfsw_12335_12335753_0_0
92628 World Community Grid 16-8-2012 13:28:04 [error] Error reported by file upload server: Maintenance underway: file uploads are temporarily disabled.
92629 World Community Grid 16-8-2012 13:28:04 Temporarily failed upload of cfsw_12335_12335753_0_0: transient upload error
92630 World Community Grid 16-8-2012 13:28:04 Backing off 3 min 15 sec on upload of cfsw_12335_12335753_0_0
92631 World Community Grid 16-8-2012 13:29:34 [checkpoint] result cfsw_12335_12335747_0 checkpointed
92632 World Community Grid 16-8-2012 13:29:42 [checkpoint] result cfsw_12335_12335742_0 checkpointed
92633 World Community Grid 16-8-2012 13:29:44 Computation for task cfsw_12335_12335742_0 finished
92634 World Community Grid 16-8-2012 13:29:44 Starting task cfsw_12336_12336445_0 using cfsw version 612 in slot 3
92635 World Community Grid 16-8-2012 13:29:46 Started upload of cfsw_12335_12335742_0_0
92636 World Community Grid 16-8-2012 13:30:10 Temporarily failed upload of cfsw_12335_12335742_0_0: connect() failed
92637 World Community Grid 16-8-2012 13:30:10 Backing off 3 min 9 sec on upload of cfsw_12335_12335742_0_0
92638 16-8-2012 13:30:15 Project communication failed: attempting access to reference site
92639 16-8-2012 13:30:18 Internet access OK - project servers may be temporarily down.
92640 World Community Grid 16-8-2012 13:30:19 [checkpoint] result cfsw_12335_12335681_0 checkpointed
92641 World Community Grid 16-8-2012 13:30:31 [checkpoint] result cfsw_12335_12335340_0 checkpointed
92642 World Community Grid 16-8-2012 13:31:19 Started upload of cfsw_12335_12335753_0_0
92643 World Community Grid 16-8-2012 13:31:25 [error] Error reported by file upload server: Maintenance underway: file uploads are temporarily disabled.
92644 World Community Grid 16-8-2012 13:31:25 Temporarily failed upload of cfsw_12335_12335753_0_0: transient upload error
92645 World Community Grid 16-8-2012 13:31:25 Backing off 4 min 36 sec on upload of cfsw_12335_12335753_0_0
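
For anyone who wants to tally these transients from their own message log, here is a minimal Python sketch. It assumes the lines look like the excerpt above. The function name `summarize_upload_failures` and both regular expressions are hypothetical, written only to match that excerpt, and the optional `hr` field is a guess at how longer back-offs would be printed. None of this is part of the BOINC client.

```python
import re
from collections import Counter

# Hypothetical patterns, written to match the log excerpt above only.
FAILED = re.compile(r"Temporarily failed upload of (\S+): (.+)$")
# The "hr" part is an assumption; the excerpt only shows "min" and "sec".
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)

def summarize_upload_failures(lines):
    """Count failure reasons and collect back-off delays (seconds) per file."""
    reasons = Counter()
    backoffs = {}
    for line in lines:
        line = line.strip()
        failed = FAILED.search(line)
        if failed:
            reasons[failed.group(2)] += 1
            continue
        backoff = BACKOFF.search(line)
        if backoff:
            hours, minutes, seconds, name = backoff.groups()
            delay = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds or 0)
            backoffs.setdefault(name, []).append(delay)
    return reasons, backoffs

# A few lines copied from the excerpt above.
sample = [
    "92629 World Community Grid 16-8-2012 13:28:04 Temporarily failed upload of cfsw_12335_12335753_0_0: transient upload error",
    "92630 World Community Grid 16-8-2012 13:28:04 Backing off 3 min 15 sec on upload of cfsw_12335_12335753_0_0",
    "92636 World Community Grid 16-8-2012 13:30:10 Temporarily failed upload of cfsw_12335_12335742_0_0: connect() failed",
    "92637 World Community Grid 16-8-2012 13:30:10 Backing off 3 min 9 sec on upload of cfsw_12335_12335742_0_0",
    "92645 World Community Grid 16-8-2012 13:31:25 Backing off 4 min 36 sec on upload of cfsw_12335_12335753_0_0",
]

reasons, backoffs = summarize_upload_failures(sample)
for reason, count in reasons.items():
    print(f"{count} x {reason}")
for name, delays in backoffs.items():
    print(name, delays)
```

Feeding it a whole log file (for example `summarize_upload_failures(open(path))`, with `path` pointing at your own saved log) should show at a glance whether the failures are "maintenance" responses or plain connection failures, and how the back-off delay changes per file.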

P.S. Surprising as it may be to many, most of South America is east of Florida. Depending on DST, we have only 3 hours' difference between Rome and Buenos Aires. Workdays in the NAs start 5 to 6 hours after Western Europe.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 16, 2012 11:43:15 AM]
[Aug 16, 2012 11:42:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Server Errors.

References:
kateiacy [Aug 15, 2012 6:09:58 PM] post
wplachy [Aug 15, 2012 9:25:03 PM] post

... Is that likely to be an effect of the server problems? My other machines are still happily doing single validation, except when they get repair jobs.
•snip from kateiacy

Did you have any "Server Aborts"? What has been happening to me is that the "Server Aborts" are being treated as errors...
•snip from wplachy

One thing is that reliable/trusted machines have a high chance of getting a lot of repair jobs; those machines are, after all, reliable/trusted machines.

Another thing is that if the server issue ever causes a foul-up in the validation process, I can imagine that it would occur only in cases where deadlines were missed due to delays injected by sync difficulties surrounding the server issue. In any case, it doesn't seem fair that the machines involved take a hit in their reliability/trust rating as a result of that foul-up. Unless the No-Reply_bug is fixed*, it lies in wait for any finished WU that is incorrectly deemed (due to the injected delay) to have a status of No-Reply. Validation processing at that point can only be a mess.

Notes:
*See also Ingleside [Jul 29, 2012 1:44:21 PM] post
[Aug 16, 2012 12:15:58 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Re: Server Errors.

Thanks for the additional explanation and the link to Ingleside's message. I think I'm seeing a combination of all those factors. I am getting some repair jobs, because all my machines are "reliable."

Mostly these are full-10-day-deadline tasks, though. Many times it's that I'm getting the second WU to validate somebody else's. I imagine lots of people are putting machines onto CFSW that haven't been running it before, in order to try to get badges before the science ends. I just got another slew of these overnight.
----------------------------------------

[Aug 16, 2012 1:00:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Server Errors.

Notably, if you have a 64-bit machine, and the ZR task assigned is a _1, of CFSW, and of version 6.11, it's a 32-bit version. Then you'd see 10 days. As long as it's 10 days exactly, it's *not* a repair task [in case there were any doubt].
[Aug 16, 2012 1:06:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Server Errors.

Had to work the update button quite a bit this AM to get all my overnight updates in...
[Aug 17, 2012 11:39:24 AM]