World Community Grid Forums
Category: Completed Research
Forum: Computing for Sustainable Water Forum
Thread: [Explained, but not resolved until new BOINC SERVER version's applied][Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status
Thread Status: Active
Total posts in this thread: 17
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Not sure if reported previously, but here it is:
----------------------------------------
Situation:
1. One of the two regular WU copies became "No Reply".
2. A repair copy was sent to my device (Win7 64-bit SP1).
3. The "No Reply" regular copy was then returned to the server and validated as "Valid" together with the other regular copy, which had been returned on time.
4. The repair copy was marked as "Error" and reported as not crunched at all (0.00 hours).

Edit: After checking the event log, it turns out that the WU really was never crunched, so the 0.00 hours of crunching time is correct. However, its status should be "Server Aborted" instead of "Error".

Workunit Status
Project Name: Computing for Sustainable Water
Created: 07/10/2012 13:49:46
Name: cfsw_8020_08020188
Minimum Quorum: 2
Replication: 2

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
cfsw_8020_08020188_2-- | 611 | Error | 07/22/2012 11:48:28 | 07/22/2012 14:20:49 | 0.00 | 0.0 / 0.0 | <-- mine
cfsw_8020_08020188_1-- | 611 | Valid | 07/12/2012 11:48:35 | 07/12/2012 15:45:13 | 1.15 | 22.3 / 21.1 |
cfsw_8020_08020188_0-- | 611 | Valid | 07/12/2012 11:48:16 | 07/22/2012 14:06:42 | 1.15 | 19.8 / 21.1 | <-- originally the "No Reply" copy

Result Log
Result Name: cfsw_8020_08020188_2--
<core_client_version>7.0.31</core_client_version>

Will add the Event Log entries when I have access to that device later today.

Edit: Nothing found in the event log regarding the WU (see the edit above).

This is the second time I've encountered this issue (once with BOINC 7.0.28 and once with 7.0.31, both 64-bit).

Edit: Note: I have never encountered this problem with regular, non-repair copies.
Edit 2: Added more details after investigating the event log.
Edit 3: Changed the tag in the title to reflect that we have to wait for the server-side code to be changed before this problem is solved.

[Edit 4 times, last edit by Former Member at Aug 1, 2012 5:14:14 AM]
||
|
mikey
Veteran Cruncher Joined: May 10, 2009 Post Count: 821 Status: Offline Project Badges: |
Quote:
    Not sure if reported previously, but here it is:
    Situation:
    1. One of the two regular WU copies became "No Reply".
    2. A repair copy was sent to my device (Win7 64-bit SP1).
    3. The "No Reply" regular copy was then returned to the server and validated as "Valid" together with the other regular copy, which had been returned on time.
    4. The repair copy was marked as "Error" and reported as not crunched at all (0.00 hours).
    Edit: After checking the event log, it turns out that the WU really was never crunched, so the 0.00 hours of crunching time is correct. However, its status should be "Server Aborted" instead of "Error".

This is NORMAL BOINC behavior. The reason is that the 'no reply' copy of the workunit does NOT get marked as invalid even though a replacement is sent out. This happens only once in a great while, as most 'no reply' workunits do NOT get returned before you return yours. If you had returned yours first, YOU would have gotten credit and the original 'no reply' PC would have gotten none.

One way to help ensure it doesn't happen again is to reduce the size of your cache. You will then return units faster and, hopefully, ahead of the 'no reply' unit. Oh, and the reason you got a copy is that the 'no reply' copy was not returned before its deadline expired.

We have been asking Dr. Anderson about this for many years, but there are higher priorities, and probably always will be. Dr. David Anderson, of Berkeley, wrote and still maintains the BOINC program.
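The sequence described in this thread can be pictured with a toy model. This is only a sketch under stated assumptions: the `Workunit` class, its method names, and the rule that leftover copies become "Server Aborted" once the quorum is met are all hypothetical illustrations of the behavior the posters describe, not BOINC's or WCG's actual scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy workunit; all names here are hypothetical, not BOINC's."""
    min_quorum: int = 2
    status: dict = field(default_factory=dict)  # copy name -> status label

    def send(self, copy):
        self.status[copy] = "In Progress"

    def deadline_missed(self, copy):
        # The overdue copy is flagged, not invalidated, and a repair copy goes out.
        self.status[copy] = "No Reply"
        repair = f"_{len(self.status)}"
        self.send(repair)
        return repair

    def report(self, copy):
        # A late "No Reply" copy is still accepted when it finally arrives.
        if self.status[copy] in ("In Progress", "No Reply"):
            self.status[copy] = "Pending Validation"
        self._validate()

    def _validate(self):
        pending = [c for c, s in self.status.items() if s == "Pending Validation"]
        if len(pending) < self.min_quorum:
            return
        for c in pending:
            self.status[c] = "Valid"
        # Quorum reached: copies that are still out are no longer needed.
        for c, s in self.status.items():
            if s == "In Progress":
                self.status[c] = "Server Aborted"


# Replay the thread's scenario with two regular copies and one repair copy.
wu = Workunit()
wu.send("_0")
wu.send("_1")
wu.report("_1")                    # returned on time
repair = wu.deadline_missed("_0")  # "_0" goes "No Reply"; repair copy "_2" is sent
wu.report("_0")                    # the late copy arrives before the repair copy
print(wu.status)
```

In this model the repair copy ends as "Server Aborted", which is the status the original poster expected to see instead of "Error".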
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi mikey159b,
I think that Moonian is concerned with the behavior of his own work unit copy in this case, rather than the late unit that got validated.
Lawrence
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline Project Badges: |
Quote:
    Edit: After checking the event log, it turns out that the WU really was never crunched, so the 0.00 hours of crunching time is correct. However, its status should be "Server Aborted" instead of "Error".
    Result Name: cfsw_8020_08020188_2--
    <core_client_version>7.0.31</core_client_version>

Hello Moonian,

This bug was introduced in BOINC client version 7.0.27 because of new exit codes. The bug is still there up to version 7.0.31, but meanwhile David Anderson has made a fix in BOINC's source code. It should be fixed in version 7.0.32, but that version is not available yet, so we have to live with 'error' instead of 'server aborted' or 'cancelled by server'.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote:
    Quote:
        Edit: After checking the event log, it turns out that the WU really was never crunched, so the 0.00 hours of crunching time is correct. However, its status should be "Server Aborted" instead of "Error".
        Result Name: cfsw_8020_08020188_2--
        <core_client_version>7.0.31</core_client_version>
    Hello Moonian,
    This bug was introduced in BOINC client version 7.0.27 because of new exit codes. The bug is still there up to version 7.0.31, but meanwhile David Anderson has made a fix in BOINC's source code. It should be fixed in version 7.0.32, but that version is not available yet, so we have to live with 'error' instead of 'server aborted' or 'cancelled by server'.

Ah, that would explain why this "Error" status occurs. (I was using the older v7.0.25 until I got this newer machine recently.)

Thanks for the information.

[Edit 1 times, last edit by Former Member at Jul 23, 2012 6:09:07 PM]
||
|
Bugg
Senior Cruncher USA Joined: Nov 19, 2006 Post Count: 271 Status: Offline Project Badges: |
I would be willing to bet that if you used the 6.10.58 (recommended by WCG, after all) or at the very least 6.12.34, things like this possibly wouldn't even happen. Just a guess, as I only use 6.12.34 as that's what I found before I started back with WCG and so have stuck with it. :)
----------------------------------------
i5-12600K (3.7GHz), 32GB DDR5, Win11 64bit Home
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
Quote:
    This bug was introduced in BOINC client version 7.0.27 because of new exit codes. The bug is still there up to version 7.0.31, but meanwhile David Anderson has made a fix in BOINC's source code. It should be fixed in version 7.0.32, but that version is not available yet, so we have to live with 'error' instead of 'server aborted' or 'cancelled by server'.

The release of BOINC client v7.0.32 or later won't have any effect in this case because, as the linked code snippet shows, the "bug" is in the web code. That means you'll continue to see results marked as "Error" on WCG's web pages until WCG upgrades its web code.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
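One way to picture how a display-side lookup could produce this symptom, as a sketch only: assume the page that renders result status maps each exit code to a label and falls back to "Error" for any code it does not recognize. The function name, the dictionaries, and the placeholder codes below are hypothetical; this is not WCG's or BOINC's web code, and the real exit-code values are not shown.

```python
# Placeholder exit codes; the real BOINC values are deliberately not used here.
OLD_SERVER_ABORT = "old_abort_code"
NEW_SERVER_ABORT = "new_abort_code"


def status_label(exit_code, known_labels):
    """Return the label a results page would show for a finished result."""
    # Any exit code the page does not know about falls through to "Error".
    return known_labels.get(exit_code, "Error")


# Web code written before the client started reporting the new code.
outdated_web_code = {OLD_SERVER_ABORT: "Server Aborted"}
# The same lookup after the web code has been taught the new code.
updated_web_code = {**outdated_web_code, NEW_SERVER_ABORT: "Server Aborted"}

before = status_label(NEW_SERVER_ABORT, outdated_web_code)
after = status_label(NEW_SERVER_ABORT, updated_web_code)
print(before, "->", after)
```

Under that assumption, upgrading the client alone changes nothing on the page; only updating the lookup does.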
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote:
    I would be willing to bet that if you used the 6.10.58 (recommended by WCG, after all) or at the very least 6.12.34, things like this possibly wouldn't even happen. Just a guess, as I only use 6.12.34 as that's what I found before I started back with WCG and so have stuck with it. :)

Well, I can live with it without any problem; at least it doesn't consume any crunch time at all. Anyway, this is one of the costs of using "cutting-edge" versions, which is not unexpected.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote:
    Quote:
        This bug was introduced in BOINC client version 7.0.27 because of new exit codes. The bug is still there up to version 7.0.31, but meanwhile David Anderson has made a fix in BOINC's source code. It should be fixed in version 7.0.32, but that version is not available yet, so we have to live with 'error' instead of 'server aborted' or 'cancelled by server'.
    The release of BOINC client v7.0.32 or later won't have any effect in this case because, as the linked code snippet shows, the "bug" is in the web code. That means you'll continue to see results marked as "Error" on WCG's web pages until WCG upgrades its web code.

Oh, so this is a server-side issue? Anyway, I guess we should let them resolve that upload/download issue first before dealing with somewhat trivial stuff like this.
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline Project Badges: |
Quote:
    Oh, so this is a server-side issue? Anyway, I guess we should let them resolve that upload/download issue first before dealing with somewhat trivial stuff like this.

The latter is surely more important!

I'm not sure whether it's a server issue. It could be that the server code is OK, but the client is falsely returning code 202 where it should be 203. I've seen it happen with other projects too, like SIMAP, SETI and Yoyo.
||
|
|