World Community Grid Forums
Category: Completed Research Forum: Computing for Sustainable Water Forum Thread: [Explained, but not resolved until new BOINC SERVER version's applied][Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status |
Thread Status: Active Total posts in this thread: 17
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"The release of BOINC-client v7.0.32 or later won't have any effect in this case, since as the linked code-snippet shows, the "bug" is in the web-code. Meaning, until WCG upgrades their web-code, you'll continue getting results marked as "Error" on WCG's web-pages."
-- snip from Ingleside [Jul 24, 2012 12:13:31 PM] post

"I'm not sure whether it's a server issue. It could be that the server code is OK, but the client is falsely returning code 202 where it should be 203. I've seen it happen with other projects too like SIMAP, SETI and Yoyo."
-- snip from Crystal Pellet [Jul 25, 2012 9:27:45 AM] post

It seems the bug has shown up on my Windows machine:

Project Name: Computing for Sustainable Water
Created: 07/16/2012 13:35:23
Name: cfsw_8686_08686823
Minimum Quorum: 2
Replication: 2

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
cfsw_8686_08686823_2-- | 611 | Error | 7/28/12 05:19:30 | 7/28/12 06:14:01 | 0.00 | 0.0 / 0.0
cfsw_8686_08686823_1-- | 611 | Valid | 7/18/12 05:15:32 | 7/20/12 16:34:06 | 1.06 | 19.3 / 18.8
cfsw_8686_08686823_0-- | 611 | Valid | 7/18/12 05:15:07 | 7/28/12 05:48:19 | 0.70 | 18.3 / 18.8

The Sent Time and Time Due / Return Time values of copy 0 should make copy 0 qualify as a No-Reply. Instead, it got a Valid status. Copy 2 (my WU) got an Error status where it should have gotten Server-aborted. The above data is displayed in my account under 'Aborted', but not under 'Server-aborted' or 'User Aborted'. BOINC v7.0.28 was used for my WU (copy 2).

The above confirms the existence of the bug. The analysis from Ingleside and Crystal Pellet (as reflected in the snips above) appears consistent with the conditions that may have produced the behavior shown in this instance.
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline
@andzgrid:
Task_0 surely had the status 'No Reply' between 7/28/12 05:15:07 and 7/28/12 05:48:19 (33 minutes). Task_0 was returned a bit late, and because task_2 had not yet been returned, task_0 was validated OK and its status changed from 'No Reply' to 'Valid'. Your task was cancelled by the server at 06:14:01. When your machine contacted the server, task_2 had not yet been started on your machine, so it received the signal to abort the no-longer-needed task, because the quorum of 2 had been achieved.
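The timeline above can be checked with a little date arithmetic. This is only a sketch: the timestamps are the ones from the posted WU listing, and the 240-hour deadline is an assumption taken from the figure mentioned in this thread, not from any WCG configuration.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"
DEADLINE = timedelta(hours=240)  # assumption: 240 h deadline, as discussed in this thread

# Timestamps copied from the WU listing for cfsw_8686_08686823
sent_0 = datetime.strptime("7/18/12 05:15:07", FMT)       # copy 0 sent
returned_0 = datetime.strptime("7/28/12 05:48:19", FMT)   # copy 0 returned
sent_2 = datetime.strptime("7/28/12 05:19:30", FMT)       # repair copy 2 sent
cancelled_2 = datetime.strptime("7/28/12 06:14:01", FMT)  # copy 2 cancelled

due_0 = sent_0 + DEADLINE                       # moment copy 0 became overdue
late_by = returned_0 - due_0                    # how long copy 0 sat in 'No Reply'
repair_sent_after_deadline = sent_2 - due_0     # delay before the repair copy went out
cancel_after_return = cancelled_2 - returned_0  # gap until copy 2 was cancelled

print("copy 0 due:", due_0)
print("copy 0 late by:", late_by)
print("repair copy sent after deadline:", repair_sent_after_deadline)
print("copy 2 cancelled after copy 0 returned:", cancel_after_return)
```

Under that assumption, copy 0 went overdue at 7/28/12 05:15:07, which matches the start of the 'No Reply' window given above.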
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
CP, you are correct.
[Edit 1 times, last edit by Former Member at Jul 29, 2012 12:17:52 PM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Crystal Pellet
Reference: Crystal Pellet [Jul 29, 2012 11:27:38 AM] post

Exactly, except for the bug part -- a status of 'Error' rather than 'Server-Aborted' for copy 2 of the WU.

The bug has exposed a number of things that need some polishing at the basic level -- the definition level:

1] What exactly should be the cut-off time before a WU is deemed 'No-Reply'? 240hrs + <define timeAllowance>. I recommend being strict about this: 240hrs ± 0 seconds.

2] Once a knight, always a knight: a 'No-Reply' should remain a 'No-Reply' regardless of whatever result is returned later: Error, Invalid, Valid, Incorrect, <future statusNames>, etc.

3] A 'Server-Aborted' should be applied only if the WU to be server-aborted has not yet started on a client. Once that condition is met and a 'Server-Aborted' status is issued to the WU, that status should remain 'Server-Aborted' regardless of the current or later status of other copies.

To the extent that the web/server code does not catch mis-declaration of the status of WUs, that web/server code may be seen as 'complicit' in the error.
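A minimal sketch of how proposals 1] to 3] could be written down. Everything here is hypothetical: the status strings, the function names, and the 240-hour constant only encode the rules proposed in this post, and do not describe how the WCG or BOINC server code actually behaves.

```python
from datetime import datetime, timedelta

# Hypothetical names throughout; this encodes the proposals in this post,
# not the actual WCG/BOINC server logic.
DEADLINE = timedelta(hours=240)
STICKY = {"No Reply", "Server Aborted"}  # statuses that should never change


def is_no_reply(sent, now, allowance=timedelta(0)):
    """Proposal 1: 'No Reply' once deadline plus allowance has passed."""
    return now > sent + DEADLINE + allowance


def may_server_abort(started):
    """Proposal 3: only a copy that has not started may be server-aborted."""
    return not started


def next_status(current, reported):
    """Proposals 2 and 3: a sticky status survives any later report."""
    return current if current in STICKY else reported


# Copy 0 of the WU discussed in this thread
sent = datetime(2012, 7, 18, 5, 15, 7)
returned = datetime(2012, 7, 28, 5, 48, 19)

status = "No Reply" if is_no_reply(sent, returned) else "In Progress"
status = next_status(status, "Valid")  # the late result arrives
print(status)
```

With a strict allowance of 0 seconds, copy 0 would have stayed 'No Reply' under these rules instead of turning 'Valid'.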
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline
"I'm not sure whether it's a server issue. It could be that the server code is OK, but the client is falsely returning code 202 where it should be 203. I've seen it happen with other projects too like SIMAP, SETI and Yoyo."

For client v7.0.27, some error codes were changed to exit codes instead (checkin 25625), and some errors, such as those for memory, disk and so on, were split into individual exit codes. But while the client started giving new exit codes, they forgot to make the corresponding changes to the web-code. This missing web-code change was the already linked checkin 25858, done 10 July 2012.

While SETI has upgraded its web-code since then, most other projects have not. For example, when you look at your tasks, WU, or individual result links, SETI shows it has upgraded and uses
<!-- $Id: result.inc 25858 2012-07-10 20:21:12Z davea $ -->
while SIMAP still uses the old
<!-- $Id: result.inc 24964 2012-01-01 23:54:58Z romw $ -->
instead.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
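A small sketch of how one might check those `$Id:` comments. The function names are hypothetical, and treating "revision >= 25858" as "has the fix" is an assumption that only holds if the project's result.inc comes from the same trunk lineage as the checkin mentioned above.

```python
import re

FIX_REVISION = 25858  # checkin named in this thread as the web-code fix

# Matches e.g. "$Id: result.inc 25858 2012-07-10 20:21:12Z davea $"
ID_PATTERN = re.compile(r"\$Id:\s+(\S+)\s+(\d+)\s")


def web_code_revision(html_comment):
    """Return the revision number from a $Id: comment, or None if absent."""
    match = ID_PATTERN.search(html_comment)
    return int(match.group(2)) if match else None


def has_fix(html_comment):
    """Assumption: any revision at or after FIX_REVISION includes the fix."""
    revision = web_code_revision(html_comment)
    return revision is not None and revision >= FIX_REVISION


seti = "<!-- $Id: result.inc 25858 2012-07-10 20:21:12Z davea $ -->"
simap_old = "<!-- $Id: result.inc 24964 2012-01-01 23:54:58Z romw $ -->"

print("SETI:", web_code_revision(seti), has_fix(seti))
print("SIMAP (old):", web_code_revision(simap_old), has_fix(simap_old))
```

The comment strings are the two quoted in this post; the same check could be run against the page source of any project's result pages.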
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline
SIMAP has updated their server software with the latest version of result.inc from trunk:
<!-- $Id: result.inc 25873 2012-07-13 22:19:26Z boincadm $ -->
Since then, results coming from BOINC clients > 7.0.25 that were cancelled by the server are no longer shown (and perhaps even treated) as 'Aborted by user' (in WCG: 'Error'), but, as expected, as 'Cancelled by Server'.
It's up to the WCG admins to do the same.
[Edit 2 times, last edit by Crystal Pellet at Aug 16, 2012 1:46:28 PM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"It's up to WCG-admins to do the same."

I wonder what the status of this matter is these days?

P.S.
1] Also, both then and lately, when I abort a WU, the WU does not show up in the list for the User-Aborted status category.
2] I still prefer the word 'Cancelled' to 'Aborted'. Attention: Please be informed that flight #123 was 'aborted'... How many WUs did you abort? Oh, I had a lot of abortions!

andzgrid Post#809