Total posts in this thread: 13
Posts: 13   Pages: 2   [ 1 2 | Next Page ]
This topic has been viewed 2760 times and has 12 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Why Is the Server Using Replication 3 on Some WUs?

Project Name: Smash Childhood Cancer
Created: 09/10/2019 17:25:27
Name: SCC1_0003451_OPN-b_D_13491
Minimum Quorum: 2
Replication: 3

If _0 and _1 are determined to be invalid, wouldn't just _2 and _3 be sufficient to make a valid work unit (quorum 2, replication 2), assuming both return as valid? Why is the _4 WU being sent? Are we doing more work than is necessary?
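
One way to frame the question is as a toy model in Python. This is only a sketch under an assumed rule, not the actual server logic: it supposes the server tops up the number of viable copies to the "Replication" value whenever copies fail, rather than to the "Minimum Quorum" value. The names `WorkUnit` and `copies_to_issue` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    min_quorum: int   # results that must agree ("Minimum Quorum")
    replication: int  # copies the server is assumed to keep viable ("Replication")


def copies_to_issue(wu, unsent, in_progress, returned_ok):
    """Assumed rule: top up the viable copies to the replication target."""
    viable = unsent + in_progress + returned_ok
    return max(0, wu.replication - viable)


# Quorum 2, replication 3, with _0 and _1 invalid: nothing viable remains.
print(copies_to_issue(WorkUnit(min_quorum=2, replication=3), 0, 0, 0))  # 3

# If the target were the quorum itself, two new copies would be enough.
print(copies_to_issue(WorkUnit(min_quorum=2, replication=2), 0, 0, 0))  # 2
```

Under that assumption, three new copies (_2, _3 and _4) go out even though two matching results would satisfy the quorum, which matches what is described here. Whether the server really works this way is exactly what is being asked.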
----------------------------------------
[Edit 1 times, last edit by Doneske at Sep 11, 2019 9:16:48 PM]
[Sep 11, 2019 7:50:58 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2155
Status: Offline
High rate of Invalids

There is suddenly a high rate of Invalids on several of my devices in the past hour!
It's a mix of Valid and Invalid. Also, I see several WUs "Waiting to be sent".

Result Name                     OS                  AVN  Status              Sent Time         Due / Return Time  CPUh  Claimed/Gr.
SCC1_0003416_OPN-b_C_41563_4--  Linux               708  Server Aborted      9/11/19 19:19:57  9/11/19 20:34:21   0.00  0.0/0.0
SCC1_0003416_OPN-b_C_41563_2--  Linux               708  Valid               9/11/19 19:19:55  9/11/19 19:27:19   0.12  0.7/0.7
SCC1_0003416_OPN-b_C_41563_3--  Linux CentOS Linux  -    In Progress         9/11/19 19:19:55  9/21/19 19:19:55   0.00  0.0/0.0
SCC1_0003416_OPN-b_C_41563_1--  Linux Fedora        708  Invalid             9/11/19 09:54:23  9/11/19 19:05:14   0.15  12.4/0.7
SCC1_0003416_OPN-b_C_41563_0--  Linux Fedora        708  Invalid             9/11/19 09:47:37  9/11/19 09:58:56   0.11  1.0/0.7

Result Name                     OS                  AVN  Status              Sent Time         Due / Return Time  CPUh  Claimed/Gr.
SCC1_0003456_OPN-b_D_44226_2--  Linux               -    In Progress         9/11/19 19:06:41  9/21/19 19:06:41   0.00  0.0/0.0
SCC1_0003456_OPN-b_D_44226_4--  Linux Ubuntu        -    In Progress         9/11/19 19:06:40  9/21/19 19:06:40   0.00  0.0/0.0
SCC1_0003456_OPN-b_D_44226_3--  Linux               -    In Progress         9/11/19 19:06:38  9/21/19 19:06:38   0.00  0.0/0.0
SCC1_0003456_OPN-b_D_44226_1--  Linux Fedora        708  Invalid             9/11/19 11:28:26  9/11/19 19:03:44   0.07  8.1/0.0
SCC1_0003456_OPN-b_D_44226_0--  Linux Ubuntu        708  Invalid             9/10/19 09:33:17  9/11/19 11:28:10   0.15  11.8/0.0

Result Name                     OS                  AVN  Status              Sent Time         Due / Return Time  CPUh  Claimed/Gr.
SCC1_0003438_OPN-b_C_3264_2--   Linux               -    In Progress         9/11/19 19:06:41  9/21/19 19:06:41   0.00  0.0/0.0
SCC1_0003438_OPN-b_C_3264_3--   Linux Fedora        -    Detached            9/11/19 19:06:41  9/11/19 19:25:10   0.00  0.0/0.0
SCC1_0003438_OPN-b_C_3264_4--   Linux Ubuntu        -    In Progress         9/11/19 19:06:40  9/21/19 19:06:40   0.00  0.0/0.0
SCC1_0003438_OPN-b_C_3264_1--   Linux Fedora        708  Invalid             9/11/19 11:28:26  9/11/19 19:03:44   0.06  7.0/0.0
SCC1_0003438_OPN-b_C_3264_0--   Linux Ubuntu        708  Invalid             9/10/19 09:33:16  9/11/19 11:28:10   0.13  10.2/0.0

Result Name                     OS                  AVN  Status              Sent Time         Due / Return Time  CPUh  Claimed/Gr.
SCC1_0003453_OPN-b_D_28452_4--  Linux               -    In Progress         9/11/19 19:06:41  9/21/19 19:06:41   0.00  0.0/0.0
SCC1_0003453_OPN-b_D_28452_2--  Linux Ubuntu        -    In Progress         9/11/19 19:06:40  9/21/19 19:06:40   0.00  0.0/0.0
SCC1_0003453_OPN-b_D_28452_3--  Linux openSUSE      -    In Progress         9/11/19 19:06:35  9/21/19 19:06:35   0.00  0.0/0.0
SCC1_0003453_OPN-b_D_28452_1--  Linux Fedora        708  Invalid             9/11/19 11:28:26  9/11/19 19:03:44   0.15  17.9/0.0
SCC1_0003453_OPN-b_D_28452_0--  Linux Ubuntu        708  Invalid             9/10/19 09:33:16  9/11/19 11:28:10   0.34  26.6/0.0

Result Name                     OS                  AVN  Status              Sent Time         Due / Return Time  CPUh  Claimed/Gr.
SCC1_0003453_OPN-b_D_47232_2--  Linux Fedora        -    Detached            9/11/19 19:06:41  9/11/19 19:25:10   0.00  0.0/0.0
SCC1_0003453_OPN-b_D_47232_3--  Linux               -    In Progress         9/11/19 19:06:41  9/21/19 19:06:41   0.00  0.0/0.0
SCC1_0003453_OPN-b_D_47232_4--  Linux Ubuntu        -    In Progress         9/11/19 19:06:40  9/21/19 19:06:40   0.00  0.0/0.0
SCC1_0003453_OPN-b_D_47232_1--  Linux Fedora        708  Invalid             9/11/19 11:28:26  9/11/19 19:03:44   0.05  6.6/0.0
SCC1_0003453_OPN-b_D_47232_0--  Linux Ubuntu        708  Invalid             9/10/19 11:40:25  9/11/19 11:28:10   0.13  9.6/0.0

Result Name                     OS                  AVN  Status              Sent Time         Due / Return Time  CPUh  Claimed/Gr.
SCC1_0003451_OPN-b_D_40459_2--  Linux Ubuntu        -    In Progress         9/11/19 19:33:18  9/21/19 19:33:18   0.00  0.0/0.0
SCC1_0003451_OPN-b_D_40459_1--  Linux Ubuntu        708  Pending Ver.        9/11/19 19:33:09  9/11/19 20:26:53   0.03  5.6/0.0
SCC1_0003451_OPN-b_D_40459_0--  Linux Fedora        708  Invalid             9/11/19 14:22:49  9/11/19 18:59:50   0.04  5.4/0.0
SCC1_0003451_OPN-b_D_40459_3--                      -    Waiting to be sent  -                 -                  0.00  0.0/0.0
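
For anyone who wants to tally a pasted result list like the ones above, here is a minimal Python sketch. It assumes the whitespace-separated row layout shown in this post and a fixed set of status labels taken from these rows. The function names (`status_of`, `work_unit_of`, `tally`) are made up for illustration and are not part of any World Community Grid tool.

```python
from collections import Counter

# Status labels as they appear in the rows of this post (assumed complete).
STATUSES = ("Waiting to be sent", "Server Aborted", "Pending Ver.",
            "In Progress", "Detached", "Invalid", "Valid")


def status_of(row):
    """Return the status label found in one pasted row, or None."""
    padded = " " + " ".join(row.split()) + " "  # collapse runs of spaces
    for label in STATUSES:
        if " " + label + " " in padded:
            return label
    return None


def work_unit_of(row):
    """Drop the trailing copy suffix (e.g. '_4--') from the result name."""
    name = row.split()[0]
    return name.rstrip("-").rsplit("_", 1)[0]


def tally(rows):
    """Count statuses per work unit."""
    counts = {}
    for row in rows:
        counts.setdefault(work_unit_of(row), Counter())[status_of(row)] += 1
    return counts


# Rows copied from the first list in this post.
rows = [
    "SCC1_0003416_OPN-b_C_41563_4-- Linux 708 Server Aborted 9/11/19 19:19:57 9/11/19 20:34:21 0.00 0.0/0.0",
    "SCC1_0003416_OPN-b_C_41563_2-- Linux 708 Valid 9/11/19 19:19:55 9/11/19 19:27:19 0.12 0.7/0.7",
    "SCC1_0003416_OPN-b_C_41563_3-- Linux CentOS Linux - In Progress 9/11/19 19:19:55 9/21/19 19:19:55 0.00 0.0/0.0",
    "SCC1_0003416_OPN-b_C_41563_1-- Linux Fedora 708 Invalid 9/11/19 09:54:23 9/11/19 19:05:14 0.15 12.4/0.7",
    "SCC1_0003416_OPN-b_C_41563_0-- Linux Fedora 708 Invalid 9/11/19 09:47:37 9/11/19 09:58:56 0.11 1.0/0.7",
]

for wu, counter in tally(rows).items():
    print(wu, dict(counter))
```

Matching on the status label with surrounding spaces keeps "Valid" and "Invalid" apart, and collapsing whitespace first means the same code works whether the rows are pasted with single spaces or aligned columns.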

----------------------------------------
[Edit 2 times, last edit by adriverhoef at Sep 11, 2019 9:01:14 PM]
[Sep 11, 2019 8:32:17 PM]
UBT - JohnR
Cruncher
Joined: Apr 30, 2006
Post Count: 35
Status: Offline
Re: High rate of Invalids

The validator is not working correctly.
[Sep 11, 2019 8:52:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High rate of Invalids

Since I got 2 invalids, the supply has been drying up, and the message for SCC, which would normally show because I set the profile to limit the supply,

222 World Community Grid 9/11/2019 8:52:22 PM No tasks are available for Smash Childhood Cancer

has disappeared... there's simply nothing coming for SCC. Of course, because of the 2 invalids, any new work requires a wingman, and all the completed work is being changed to Pending Verification, the host having lost its reliable status. That means piles of sudden repair jobs are being sent out, but where are the reliable hosts, since all are losing reliability?

Edit: got some, but the "No tasks are available" message remains gone.
----------------------------------------
[Edit 3 times, last edit by Former Member at Sep 11, 2019 9:48:15 PM]
[Sep 11, 2019 9:40:20 PM]
Stiwi
Advanced Cruncher
Joined: May 19, 2012
Post Count: 75
Status: Offline
Re: High rate of Invalids

I also got many invalids / pending verifications :(
[Sep 11, 2019 10:05:33 PM]
JMrkvicka
Cruncher
United States
Joined: Jul 14, 2005
Post Count: 33
Status: Offline
Re: High rate of Invalids

Something interesting is going on. I am set to crunch only SCC (HSTB is turned on, but tasks for this project are as rare as honest politicians). I have averaged ~100-120 completed tasks per day. My average for Monday and Tuesday this week is ~2500, and it appears I am on track for another big day today. Is the huge number of 15-minute tasks expected?
----------------------------------------

[Sep 11, 2019 10:39:05 PM]
JMrkvicka
Cruncher
United States
Joined: Jul 14, 2005
Post Count: 33
Status: Offline
Re: High rate of Invalids

And now it appears I shot the golden goose!

09/11/19 6:38:32 PM | World Community Grid | Reporting 1 completed tasks
09/11/19 6:38:32 PM | World Community Grid | Requesting new tasks for CPU
09/11/19 6:38:33 PM | World Community Grid | Scheduler request completed: got 0 new tasks
09/11/19 6:38:33 PM | World Community Grid | No tasks sent
09/11/19 6:38:33 PM | World Community Grid | No tasks are available for Help Stop TB
09/11/19 6:38:33 PM | World Community Grid | No tasks are available for the applications you have selected.
09/11/19 6:38:33 PM | World Community Grid | Tasks are committed to other platforms
----------------------------------------

[Sep 11, 2019 10:46:13 PM]
DrMason
Senior Cruncher
Joined: Mar 16, 2007
Post Count: 153
Status: Offline
Re: High rate of Invalids

The OPN-b_C_, _D_, and _E_ sets are all very short tasks. That is probably why JMrkvicka is seeing surprisingly high result counts all of a sudden. That's not necessarily a problem.

I can also report that a small but significant number of my returns are being labeled invalid: around 11:22:15, from 13:08:50 to 13:13:43, from 16:07:25 to 16:32:35, and from 18:00:36 to the present (18:54:32). Many results are valid, but I have gotten about 210 invalid results, so I'm slightly concerned as well, since I did not see this issue before. All my machines have a track record of returning valid results, and I haven't changed anything today. So UBT - JohnR might be onto something with the validator.
----------------------------------------

[Sep 11, 2019 11:05:42 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7662
Status: Offline
Re: High rate of Invalids

Between 2 days ago and yesterday I had about 18000 units done, and I see about 45 invalid units for those 2 days and today. I am on track to do about 8000 units today. That comes to about 0.173%. Granted, that is higher than my normal 0% invalid, but I don't think it is necessarily bad considering the number of units processed. Given the number of work units in play, there are bound to be a few errors. The invalids are coming from the machines which have processed the most units and, with one exception, are all from machines running Linux. Several machines show no invalids.
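
The percentage above can be checked with a few lines of Python. This is just the arithmetic from the figures in this post (all of them approximate), with variable names chosen for illustration:

```python
done_previous_two_days = 18000  # "about 18000 units" over the two earlier days
expected_today = 8000           # "on track to do about 8000 units today"
invalid_units = 45              # "about 45 invalid units" across all three days


def invalid_rate_percent(invalid, total):
    """Share of invalid units, expressed as a percentage."""
    return 100.0 * invalid / total


rate = invalid_rate_percent(invalid_units, done_previous_two_days + expected_today)
print(f"{rate:.3f}%")  # 0.173%
```

So the figure is 45 out of roughly 26000 units, a little under two invalids per thousand.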
I seem to recall that on the single-quorum projects, once the number of units which are either errors or invalids reaches 5, the unit is pulled from further processing and the techs (or scientists) decide if the unit is just bad or can be fixed and resubmitted.
Because of the redundancies built into the BOINC framework, at this time it does not appear to be of great concern.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 11, 2019 11:57:49 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High rate of Invalids

I have 10 machines running various Intel processors and 2 machines running AMD processors. I currently have about 250 WUs flagged Invalid and 4200 flagged as Pending Verification. On Sunday, I uploaded almost 60,000 WUs that resulted in 0 invalids and fewer than 15 pending verification. All of the WUs pending verification and/or marked invalid are from the Intel machines. That is not to say Intel is the problem: one of the AMD machines isn't running SCC1 at all, because the project cannot keep it busy, so it is exclusively on MIP1, and the other is an 8-core AMD machine that doesn't run that much. The interesting thing to me is that I don't have a single anomaly prior to 18:52 UTC today on any machine. I don't think it is WU-related, as each machine was working on different WUs at the time the invalids started occurring. Did something happen on the server side at about 18:52 to cause results to start becoming invalid? I don't think it is a member problem, as several members are reporting the same thing.
----------------------------------------
[Edit 1 times, last edit by Doneske at Sep 12, 2019 1:14:43 AM]
[Sep 12, 2019 1:13:56 AM]