This topic has been viewed 4237 times and has 8 replies.
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Suspicious errors

I brought this up in the server upgrade thread but it quickly got buried and ignored.

Several results are going to error status when passing through the first validation check instead of being marked inconclusive.


Result Name                                                     App Version  Status        Sent Time         Due / Return Time  CPU Time (hours)  Claimed / Granted Credit
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 3--  -            In Progress   3/19/12 17:09:40  3/25/12 17:09:40   0.00              0.0 / 0.0
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 2--  640          Error         3/18/12 21:32:51  3/19/12 15:04:25   10.71             53.0 / 0.0
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 0--  640          Error         3/15/12 12:47:54  3/18/12 21:28:15   3.60              62.0 / 0.0
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 1--  640          Inconclusive  3/15/12 12:47:31  3/15/12 21:16:12   4.04              88.5 / 0.0

The work unit above is a very good example of the problem. The two results with error status show a normal completion log.
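For anyone who wants to scan their own result lists for the same pattern, here is a minimal Python sketch. It assumes the flattened row layout of the results quoted in this post (result name, app version, status, sent time, due or return time, CPU hours, claimed / granted credit), which is inferred from those rows rather than taken from any official export format. The function names and the "suspicious" rule (status Error, yet CPU time used and credit claimed) are hypothetical illustrations, not the validator's actual logic.

```python
import re

# Assumed layout of one flattened status row, inferred from the rows quoted
# in this thread; this is not an official World Community Grid format.
ROW = re.compile(
    r"^(?P<name>.+?--)\s+(?P<version>\d+|-)\s+"
    r"(?P<status>In Progress|Pending Validation|Inconclusive|Valid|Error)\s+"
    r"(?P<sent>\S+ \S+)\s+(?P<due_or_returned>\S+ \S+)\s+"
    r"(?P<cpu_hours>\d+\.\d+)\s+(?P<claimed>\d+\.\d+)\s*/\s*(?P<granted>\d+\.\d+)\s*$"
)


def parse_row(line):
    """Return a dict for one status row, or None if the line does not match."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    for key in ("cpu_hours", "claimed", "granted"):
        row[key] = float(row[key])
    return row


def looks_suspicious(row):
    """Hypothetical heuristic: marked Error although it ran and claimed credit."""
    return row["status"] == "Error" and row["cpu_hours"] > 0 and row["claimed"] > 0


# The four rows quoted in this post, copied as they appear.
lines = [
    "CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 3-- - In Progress 3/19/12 17:09:40 3/25/12 17:09:40 0.00 0.0 / 0.0",
    "CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 2-- 640 Error 3/18/12 21:32:51 3/19/12 15:04:25 10.71 53.0 / 0.0",
    "CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 0-- 640 Error 3/15/12 12:47:54 3/18/12 21:28:15 3.60 62.0 / 0.0",
    "CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 1-- 640 Inconclusive 3/15/12 12:47:31 3/15/12 21:16:12 4.04 88.5 / 0.0",
]

rows = [parse_row(line) for line in lines]
suspicious = [row for row in rows if row is not None and looks_suspicious(row)]
for row in suspicious:
    print(row["name"], row["status"], row["cpu_hours"])
```

The heuristic only flags candidates for a closer look; whether a flagged result really finished normally still has to be confirmed from its result log.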

If the techs could look into this, I would appreciate it.
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Mar 20, 2012 4:17:52 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Suspicious errors

What version of BOINC are you running? A 3rd party version or the latest one for WCG? 6.10.58 is the latest for WCG. If you are using an older version, or one tailored for another project you work with, errors can happen with some projects.
[Mar 20, 2012 11:19:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Suspicious errors

I have seen cases where both original results were Pending Validation (PV) and then, when the validator got to them, one went to Error and the other to Inconclusive. In my opinion it would be preferable to leave the provisionally good one at Pending Validation. This is seen irrespective of client version [the quorum participants' clients can be checked]. The agents either flat-out break the science app/task that does the work, or they just manage the task traffic... 6.10.58 eliminates any doubt there may be.

Fourth party, by the way, as the Berkeley-recommended versions are mostly fine too [though sometimes they are too quick to declare a new point release ready for general production]. I'm surprised, though, to read that YUM for various Linux distros actually carries the version 7 alpha clients. That's flat-out foolish... use at your own risk. I'm testing them and, except for one documented case in the first hour after the new server software went live, have not had a single bad result, and it cannot even be proven that the client caused that one.

--//--
[Mar 20, 2012 11:46:35 AM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Suspicious errors

Quote: "What version of BOINC are you running? A 3rd party version or the latest one for WCG? 6.10.58 is the latest for WCG. If you are using an older version, or one tailored for another project you work with, errors can happen with some projects."

Clearly people misunderstood my post. These work units are not erroring on the client. They are not all mine; many times it is the wingman who is assigned error status.

The problem is with the server code and needs to be looked at from that side. If they're invalid, they need to be marked as such, but I suspect they aren't even that.

I remember one unit in particular where the only two results returned, both with normal finishes, were marked error.

If a substantial percentage of the work (and from what I can tell this is running at about 2%) is being wrongly marked as error, then there is a tremendous waste of CPU power across the grid.
[Mar 20, 2012 3:56:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Suspicious errors

How many "people" were there responding? Yes, the thought had occurred before in the other thread that these results were erroneously rated as failed. That would be a science-specific validator rule, not necessarily a widespread one. Let's see what the techs have to say first.

--//--
[Mar 20, 2012 4:33:43 PM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Suspicious errors

And it may be something specific to HCMD2. I certainly haven't seen this behavior prior to the server code upgrade. In the other thread my reply followed a totally different concern and was probably just skipped over.

As long as this has reached the attention of the techs, I'll be happy. As a CA you have the power to bring it to their attention. When one is losing ~72 hours per day to a bug, it's difficult to consider it trivial.

Also, I would never consider using a 7.x client. I'm dissatisfied enough with the 6.12.x series.
[Mar 20, 2012 4:53:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Suspicious errors

Of course, given the "just skipped over" and "ignored" comments, you might consider sending a support email. That works for me when there is good reason to use that pathway... 72 hours/day is not chicken little.

In the nearly 1000 results I've completed since the version 7 server went up, I've seen 2 inconclusive for a quorum of 2, most completed for SN2S, so I'm optimistic.

--//--

edit: And seeing a tech reading this thread, consider it reported up the ladder.
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 20, 2012 5:17:49 PM]
[Mar 20, 2012 5:16:13 PM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Suspicious errors

Here are a couple more results for comparison. These come from machines running both Ubuntu 11.10 and Windows 7.

https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=418201304


Result Name                                                     App Version  Status  Sent Time         Due / Return Time  CPU Time (hours)  Claimed / Granted Credit
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 3--  640          Valid   3/19/12 17:09:40  3/20/12 11:25:40   5.49              78.0 / 83.3
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 2--  640          Error   3/18/12 21:32:51  3/19/12 15:04:25   10.71             53.0 / 0.0
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 0--  640          Error   3/15/12 12:47:54  3/18/12 21:28:15   3.60              62.0 / 0.0
CMD2_ 2243-1C1Y_ B.clustersOccur-1ZG2_ A.clustersOccur_ 0_ 1--  640          Valid   3/15/12 12:47:31  3/15/12 21:16:12   4.04              88.5 / 83.3

https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=419359568

Result Name                                                                    App Version  Status              Sent Time         Due / Return Time  CPU Time (hours)  Claimed / Granted Credit
CMD2_ 2175-1NZW_ A.clustersOccur-3DPT_ B.clustersOccur_ 45_ 65905_ 66049_ 2--  640          Pending Validation  3/16/12 23:12:08  3/17/12 18:08:00   4.11              97.3 / 0.0
CMD2_ 2175-1NZW_ A.clustersOccur-3DPT_ B.clustersOccur_ 45_ 65905_ 66049_ 1--  640          Error               3/16/12 18:07:30  3/16/12 22:31:40   0.79              28.4 / 0.0
CMD2_ 2175-1NZW_ A.clustersOccur-3DPT_ B.clustersOccur_ 45_ 65905_ 66049_ 0--  -            In Progress         3/16/12 18:07:26  3/28/12 18:07:26   0.00              0.0 / 0.0

https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=418969170


Result Name                                                     App Version  Status  Sent Time         Due / Return Time  CPU Time (hours)  Claimed / Granted Credit
CMD2_ 2249-1BBN_ A.clustersOccur-1RW6_ A.clustersOccur_ 3_ 3--  640          Valid   3/19/12 05:25:33  3/19/12 22:07:14   5.24              67.4 / 204.1
CMD2_ 2249-1BBN_ A.clustersOccur-1RW6_ A.clustersOccur_ 3_ 2--  640          Valid   3/19/12 05:25:29  3/20/12 00:45:51   7.63              340.8 / 204.1
CMD2_ 2249-1BBN_ A.clustersOccur-1RW6_ A.clustersOccur_ 3_ 1--  640          Error   3/15/12 22:52:06  3/17/12 15:47:22   4.82              308.6 / 0.0
CMD2_ 2249-1BBN_ A.clustersOccur-1RW6_ A.clustersOccur_ 3_ 0--  640          Error   3/15/12 22:51:45  3/19/12 03:48:54   6.18              142.1 / 0.0
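
To put a number on the CPU time involved, the following Python sketch totals the CPU hours per status for the three work units listed in this post. The (status, hours) pairs are copied by hand from those rows; the helper name `cpu_hours_by_status` is a hypothetical illustration. It only adds up hours on results marked Error and makes no claim about which of those errors were wrongful.

```python
# (status, CPU hours) pairs copied from the three work units listed above.
results = [
    ("Valid", 5.49), ("Error", 10.71), ("Error", 3.60), ("Valid", 4.04),
    ("Pending Validation", 4.11), ("Error", 0.79), ("In Progress", 0.00),
    ("Valid", 5.24), ("Valid", 7.63), ("Error", 4.82), ("Error", 6.18),
]


def cpu_hours_by_status(rows):
    """Sum CPU hours for each status label."""
    totals = {}
    for status, hours in rows:
        totals[status] = totals.get(status, 0.0) + hours
    return totals


totals = cpu_hours_by_status(results)
error_hours = totals.get("Error", 0.0)
all_hours = sum(totals.values())
share = error_hours / all_hours
print(f"Error: {error_hours:.2f} h of {all_hours:.2f} h ({share:.0%})")
```

For these three work units, roughly 26 of about 53 CPU hours sit on results marked Error. This sample was picked to show the problem, so it says nothing about the grid-wide rate.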
[Mar 20, 2012 5:25:32 PM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Suspicious errors

I haven't seen any results wrongfully go to error status in the last 24 hours, and I only lost 39.5 hours in the 24 hours prior to that. It looks like the problem has either been fixed or resolved itself.
[Mar 25, 2012 4:07:24 AM]