Thread Status: Active
Total posts in this thread: 17
This topic has been viewed 14262 times and has 16 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

The release of BOINC-client v7.0.32 or later won't have any effect in this case, since as the linked code-snippet shows, the "bug" is in the web-code. Meaning, until WCG upgrades their web-code, you'll continue getting results marked as "Error" on WCG's web-pages.
-- snip from Ingleside [Jul 24, 2012 12:13:31 PM] post

I'm not sure whether it's a server issue. It could be that the server code is OK, but the client is falsely returning code 202 where it should be 203. I've seen it happen with other projects too like SIMAP, SETI and Yoyo.
-- snip from Crystal Pellet [Jul 25, 2012 9:27:45 AM] post

It seems the bug has shown up on my Windows machine:

Project Name: Computing for Sustainable Water
Created: 07/16/2012 13:35:23
Name: cfsw_8686_08686823
Minimum Quorum: 2
Replication: 2
-------------------------------------------------------------------------------
ResultName|AppVersionNumber|Status|SentTime|TimeDue[Return Time]|CPUtime(hours)|Claimed/Granted BOINC-Credit
-------------------------------------------------------------------------------
cfsw_8686_08686823_2-- 611 Error 7/28/12 05:19:30 7/28/12 06:14:01 0.00 0.0 / 0.0
cfsw_8686_08686823_1-- 611 Valid 7/18/12 05:15:32 7/20/12 16:34:06 1.06 19.3 / 18.8
cfsw_8686_08686823_0-- 611 Valid 7/18/12 05:15:07 7/28/12 05:48:19 0.70 18.3 / 18.8

Based on its sentTime and timeDue values, copy 0 should have qualified as a No-Reply. Instead, it got a Valid status. Copy 2 (my WU) got an Error status where it should have gotten Server-aborted. The above data is displayed in my account under 'Aborted', but under neither 'Server-aborted' nor 'User Aborted'. BOINC v7.0.28 was used for my WU (copy 2).

The above confirms the existence of the bug. The analyses from Ingleside and Crystal Pellet (as reflected in the snips above) seem consistent with the conditions that may have produced what the bug shows in this instance.
;
[Jul 29, 2012 10:55:14 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1313
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

@andzgrid:
Task_0 surely had the status of 'No Reply' between 7/28/12 05:15:07 and 7/28/12 05:48:19 (33 minutes).

Task_0 was returned a bit late, and because task_2 hadn't been returned yet, task_0 was validated OK and its status turned from 'No Reply' to 'Valid'.
Your task was cancelled by the server at 06:14:01. When your machine contacted the server, task_2 hadn't been started yet, so it got the signal to abort the no-longer-needed task, because the quorum of 2 had already been reached.
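
A minimal sketch of the timeline arithmetic above, using the timestamps from the quoted result table. It assumes a 240-hour (10-day) deadline, which is what the sent time and the start of the 'No Reply' window imply; the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"
DEADLINE = timedelta(hours=240)  # assumption: 10-day deadline for this WU


def parse(ts):
    """Parse a timestamp in the format used by the result table."""
    return datetime.strptime(ts, FMT)


sent_0 = parse("7/18/12 05:15:07")       # copy 0 sent
returned_0 = parse("7/28/12 05:48:19")   # copy 0 returned (late)
sent_2 = parse("7/28/12 05:19:30")       # repair copy 2 sent
cancelled_2 = parse("7/28/12 06:14:01")  # copy 2 cancelled on next contact

due_0 = sent_0 + DEADLINE       # moment copy 0 became 'No Reply'
late_by = returned_0 - due_0    # how long copy 0 stayed 'No Reply'

print("copy 0 due:   ", due_0)
print("copy 0 late by", late_by)
print("repair copy sent", sent_2 - due_0, "after the deadline")
```

The order of events is then: deadline passes, repair copy goes out, the late copy 0 arrives and completes the quorum, and copy 2 is cancelled at its next server contact.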
----------------------------------------

[Jul 29, 2012 11:27:38 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

CP, you are correct.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 29, 2012 12:17:52 PM]
[Jul 29, 2012 12:16:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

Hello Crystal Pellet
Reference: Crystal Pellet [Jul 29, 2012 11:27:38 AM] post

Exactly, except for the bug part -- a status of 'Error' rather than 'Server-Aborted' for copy 2 of the WU.

The bug has exposed a number of things that need some polishing at the basic level -- the definition level:
1] What exactly should be the cut-off time before a WU is deemed a 'No-Reply'? 240hrs + <define timeAllowance>. I recommend being strict about this: 240hrs ± 0 seconds.
2] Once a knight always a knight: A 'No-Reply' should remain a 'No-Reply' regardless of whatever result is returned later: Error, Invalid, Valid, Incorrect, <future statusNames> etc.
3] A 'Server-Aborted' should be applied only if the WU contemplated to be server-aborted has not yet started on a client. Once that condition is met and a 'Server-Aborted' status is issued to the WU, that 'Server-Aborted' status should remain 'Server-Aborted' regardless of other copies' current or later status.
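
The three proposed rules can be sketched roughly as follows. This models only the proposal in this post, not how the WCG or BOINC server actually behaves; the `Status` enum and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "In Progress"
    NO_REPLY = "No Reply"
    SERVER_ABORTED = "Server Aborted"
    VALID = "Valid"
    ERROR = "Error"


DEADLINE = timedelta(hours=240)  # rule 1: strict, no extra allowance
STICKY = {Status.NO_REPLY, Status.SERVER_ABORTED}  # rules 2 and 3


def is_overdue(sent, now):
    """Rule 1: overdue as soon as more than 240 hours have passed."""
    return now - sent > DEADLINE


def may_server_abort(started):
    """Rule 3: only a copy that has not started may be server-aborted."""
    return not started


def next_status(current, proposed):
    """Rules 2 and 3: a sticky status is never overwritten later."""
    return current if current in STICKY else proposed


sent = datetime(2012, 7, 18, 5, 15, 7)
returned = datetime(2012, 7, 28, 5, 48, 19)
status = Status.NO_REPLY if is_overdue(sent, returned) else Status.IN_PROGRESS
status = next_status(status, Status.VALID)
print(status.value)
```

Under these rules, copy 0 from the table would have stayed 'No Reply' even though its late result validated.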

To the extent that the web/server code does not catch mis-declarations of WU status, that web/server code may be seen as having 'complicity' in the error.
;
[Jul 29, 2012 12:35:34 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

I'm not sure whether it's a server issue.
It could be that the server code is OK, but the client is falsely returning code 202 where it should be 203.
I've seen it happen with other projects too like SIMAP, SETI and Yoyo.

For client v7.0.27, some error-codes were changed to exit-codes instead (checkin 25625), and some errors, such as those for memory, disk and so on, were split into individual exit-codes.

But while the client started giving new exit-codes, they forgot to make the corresponding changes to the web-code. This missing web-code change was the already linked checkin 25858, made on 10 July 2012.
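
To illustrate the kind of mismatch described here, a rough sketch of a display layer that looks up a label per exit-code and falls back to a generic label for codes it does not know. The tables and function are hypothetical and are not BOINC's actual web-code; the meanings given to 202 and 203 are an assumption, and the 'Error' fallback mirrors what WCG showed.

```python
# Hypothetical label table of web-code written before the new exit-codes.
OLD_WEB_LABELS = {}

# Hypothetical label table after the web-code was updated.
# ASSUMPTION: 202 = aborted by project, 203 = aborted via GUI.
NEW_WEB_LABELS = {
    202: "Cancelled by server",
    203: "Aborted by user",
}


def display_status(exit_code, labels):
    """Return the label for exit_code, or a generic one if unknown."""
    return labels.get(exit_code, "Error")


for name, table in (("old", OLD_WEB_LABELS), ("new", NEW_WEB_LABELS)):
    print(name, "web-code shows:", display_status(202, table))
```

With the old table, a newer client reporting 202 ends up under the generic label even though the client and the server did the right thing.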

While SETI has upgraded its web-code since then, most other projects have not.

For example, when you look at your tasks, WU or individual result links, SETI shows they've upgraded and use <!-- $Id: result.inc 25858 2012-07-10 20:21:12Z davea $ -->, while SIMAP uses the old <!-- $Id: result.inc 24964 2012-01-01 23:54:58Z romw $ --> instead.
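
A small sketch of how one might check this from a saved page source: pull the revision number out of the result.inc $Id comment and compare it with checkin 25858. The function names are made up, and treating "revision >= 25858" as "has the fix" is an assumption that only holds if the project follows trunk.

```python
import re

FIX_REVISION = 25858  # web-code checkin cited in this thread

# Matches e.g. "$Id: result.inc 25858 2012-07-10 20:21:12Z davea $"
ID_PATTERN = re.compile(r"\$Id:\s+result\.inc\s+(\d+)\s")


def result_inc_revision(page_source):
    """Return the result.inc revision found in the page, or None."""
    match = ID_PATTERN.search(page_source)
    return int(match.group(1)) if match else None


def has_fix(page_source):
    """Assumption: any revision at or after the fix includes it."""
    revision = result_inc_revision(page_source)
    return revision is not None and revision >= FIX_REVISION


seti = "<!-- $Id: result.inc 25858 2012-07-10 20:21:12Z davea $ -->"
simap = "<!-- $Id: result.inc 24964 2012-01-01 23:54:58Z romw $ -->"

print("SETI has fix: ", has_fix(seti))
print("SIMAP has fix:", has_fix(simap))
```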
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Jul 29, 2012 1:44:21 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1313
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

SIMAP has updated their server software with the latest version for result.inc from trunk:

<!-- $Id: result.inc 25873 2012-07-13 22:19:26Z boincadm $ -->

Since then, results coming from BOINC clients > 7.0.25 that were cancelled by the server are no longer shown as, and perhaps even treated like, 'Aborted by user' (in WCG -> 'Error'), but as 'Cancelled by Server', as expected.

It's up to WCG-admins to do the same.
----------------------------------------
[Edit 2 times, last edit by Crystal Pellet at Aug 16, 2012 1:46:28 PM]
[Aug 16, 2012 1:41:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

It's up to WCG-admins to do the same.
I wonder what the status is these days about this matter?

P.S.
1] Also, both back then and lately, when I abort a WU, the WU does not show up in the list for the User-Aborted status category.
2] I still prefer the word 'Cancelled' rather than 'Aborted'. Attention: Please be informed that flight#123 was 'aborted'... How many WUs did you abort? Oh, I had a lot of abortions ! laughing
;
; andzgridPost#809
;
[Jan 18, 2013 6:40:55 PM]