Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: Work Available

However, one of the problems relates to the definition of 'reliable'. Because of the length of time needed to complete each unit and the need to hold a cache, very few machines can classify as 'reliable' even if they never have an error.

I would say they need to tighten the criteria. If you keep the default buffer of 0.1 + 0.5 days, you would probably qualify with no problem. "Reliable" should be special, not ordinary.

The intent should be to get good results back early of course.
----------------------------------------
[Edit 1 times, last edit by Jim1348 at Nov 26, 2019 4:03:49 PM]
[Nov 26, 2019 4:02:52 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Jim

If I were to implement your settings, which would mean a minimum cache of 2.4 hours and a maximum cache of 14.4 hours, I would never get any WUs, as they are taking 27 hours without counting any queuing time.

Owing to the paucity of availability, the settings need to be at least 1.5 days + 1.5 days in order to get one WU and have another waiting. That would mean a turnaround of 3 days, which is less than half the allowed time. I think a better definition of 'reliable' would be half the allowed time, which could be implemented as an across-the-board definition.
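
For anyone who wants to check the cache arithmetic, here is a rough Python sketch. It only converts the two-part buffer settings to hours and compares them with the 27 hour crunch time quoted above. The 7 day deadline is an assumption (the post only says "the allowed time"), and the "room for one WU" test is a simplification of the argument in the post, not BOINC's actual work-fetch logic.

```python
HOURS_PER_DAY = 24


def buffer_hours(min_days, extra_days):
    """Return (minimum, maximum) hours of cached work for a two-part buffer."""
    minimum = min_days * HOURS_PER_DAY
    maximum = (min_days + extra_days) * HOURS_PER_DAY
    return minimum, maximum


TASK_HOURS = 27      # crunch time per WU quoted in the post
DEADLINE_DAYS = 7    # assumption: the post only says "the allowed time"

for label, min_days, extra_days in [("0.1 + 0.5", 0.1, 0.5), ("1.5 + 1.5", 1.5, 1.5)]:
    low, high = buffer_hours(min_days, extra_days)
    # Simplified test: does the largest buffer cover one whole task?
    holds_a_task = high >= TASK_HOURS
    print(f"{label} days: {low:.1f} h to {high:.1f} h, "
          f"room for one 27 h WU: {holds_a_task}")

turnaround_days = 1.5 + 1.5
print(f"turnaround {turnaround_days} days vs half deadline {DEADLINE_DAYS / 2} days")
```

Under these assumptions the 0.1 + 0.5 setting tops out below one task's run time, while 1.5 + 1.5 still comes in under half of a 7 day deadline.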

Mike
----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Nov 26, 2019 6:26:51 PM]
[Nov 26, 2019 5:48:35 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: Work Available

Jim

If I were to implement your settings, which would mean a minimum cache of 2.4 hours and a maximum cache of 14.4 hours, I would never get any WUs, as they are taking 27 hours without counting any queuing time.

Owing to the paucity of availability, the settings need to be at least 1.5 days + 1.5 days in order to get one WU and have another waiting. That would mean a turnaround of 3 days, which is less than half the allowed time. I think a better definition of 'reliable' would be half the allowed time, which could be implemented as an across-the-board definition.

I have no problem with half the allowed time, if that works for you. But I have not had problems with the 0.1 + 0.5 day cache settings on any of my machines, except for work unit availability. I run Ryzens (1700, 2600, 2700) mainly, but also Coffee Lakes, all under Ubuntu 18.04.3. I think they should tailor "reliable" for the better machines, which are becoming more common anyway.

However, whatever works for them is OK with me.

EDIT: The bottleneck seems to be identifying "reliable" machines. There may be no solution except sending out enough to find them. The more the "unreliable" machines suck them up, the longer that will take.
----------------------------------------
[Edit 2 times, last edit by Jim1348 at Nov 26, 2019 7:06:33 PM]
[Nov 26, 2019 6:38:53 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Work Available

Greetings,

I am making a few configuration changes after discussing them with the team. The rules for reliable hosts (per app version) had a clause requiring an average turnaround of 1.5 days, and this was set for all projects. We are relaxing that to an average turnaround of 2.5 days to see whether it gets us more reliable hosts. This change is project wide. We are also changing the reliable host time to complete from 35% of the original deadline to 50% of the original deadline. This means a 7 day workunit will have 3.5 days instead of the 2.8 we allocated before. We will be trying these settings for the next few weeks, as this problem didn't show up right away for this project. That will give us a chance to evaluate whether this is the right solution project wide or whether changes in the code are needed to do it application by application.
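
As a quick check of the deadline figures, here is a minimal Python sketch. The function name is made up for illustration and is not part of the WCG or BOINC server code. It also shows that 35% of 7 days works out to 2.45 days, while the quoted 2.8 days would correspond to 40%. The post does not say which of the two figures is the intended one.

```python
def reliable_window_days(deadline_days, fraction):
    """Days a reliable host gets when its deadline is a fraction of the original."""
    return deadline_days * fraction


ORIGINAL_DEADLINE_DAYS = 7  # the 7 day workunit used as the example in the post

old_window = reliable_window_days(ORIGINAL_DEADLINE_DAYS, 0.35)
new_window = reliable_window_days(ORIGINAL_DEADLINE_DAYS, 0.50)

# Fraction implied by the "2.8 days" figure quoted in the post.
implied_old_fraction = 2.8 / ORIGINAL_DEADLINE_DAYS

print(f"35% of 7 days = {old_window:.2f} days")
print(f"50% of 7 days = {new_window:.2f} days")
print(f"2.8 days is {implied_old_fraction:.0%} of 7 days")
```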

I'm working on the deployment now, so it should be in place in about 30-45 minutes.

Thanks,
-Uplinger
[Nov 26, 2019 7:42:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

Thanks for reading the comments 🙌

PS, could not fathom 21 per 30 minutes (1008 a day), until seeing the noon stats

Statistics Date  Total Run Time (y:d:h:m:s)  Points Generated  Results Returned
11/26/2019       1:171:12:41:04              2,325,425         439
11/25/2019       3:308:18:05:40              6,817,641         1,309
11/24/2019       6:183:00:58:27              11,161,094        2,087
11/23/2019       8:002:16:09:25              13,827,241        2,450
11/22/2019       8:273:23:31:07              15,276,844        2,603
11/21/2019       8:281:07:34:31              15,097,973        2,536


Quite a bit less than the daily validation suggested... 1700-2000 before the randomization, then catching up to 2500.
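
For reference, a rough Python sketch of how these numbers can be cross-checked. It converts the y:d:h:m:s run time to hours (assuming a 365 day year, which the stats page does not state) and divides by the results returned to get an approximate run time per result. It also redoes the 21-per-30-minutes sum. Only two rows of the table are used, to keep it short.

```python
def run_time_hours(stamp, days_per_year=365):
    """Convert a 'y:d:h:m:s' total run time string to hours."""
    years, days, hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    total_days = years * days_per_year + days  # assumption: 365 day years
    return total_days * 24 + hours + minutes / 60 + seconds / 3600


# (date, total run time, results returned) copied from the table above
rows = [
    ("11/22/2019", "8:273:23:31:07", 2603),
    ("11/25/2019", "3:308:18:05:40", 1309),
]

hours_per_result = {}
for date, stamp, results in rows:
    hours_per_result[date] = run_time_hours(stamp) / results
    print(f"{date}: about {hours_per_result[date]:.1f} h of run time per result")

# 21 results every 30 minutes, scaled up to a full day
half_hours_per_day = 24 * 60 // 30
results_per_day = 21 * half_hours_per_day
print(f"21 per 30 minutes = {results_per_day} per day")
```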
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 26, 2019 7:52:22 PM]
[Nov 26, 2019 7:44:25 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Work Available

Since the average time to process one of these units is a little over a day, now that Uplinger has loosed the log jam, I expect the daily totals to rebound in a day or two. Not only is the availability back to a steady trickle, but since he has tweaked the reliable host issue, throughput should also increase. Hopefully we will see the effect in completed units by tomorrow.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 26, 2019 8:26:02 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Keith

Thank you for listening to our raving. If you consider an analogy with marketing, Delft would be the manufacturer, WCG would be the intermediary (or shopkeeper) and we would be the customer. Whilst the customer is not always right, woe betide a shopkeeper who doesn't listen to his customers.

I have now changed my cache settings in device profile to connecting every 1.5 days with 1.5 days extra cache to allow for my 27 hours crunching time in order to have a WU waiting for crunching to finish.

Mike
----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Nov 26, 2019 9:02:23 PM]
[Nov 26, 2019 9:01:36 PM]
littlepeaks
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 748
Status: Offline
Re: Work Available

Keith --

Thanks for making work units available again for this project.
I wondered why I hadn't received any in a long time.
And I had rigged a solenoid to press my left mouse button every 15 seconds to hit the update button. (Just kidding, of course.)
But anyway, I did receive 2 WUs this afternoon, which should give me my bronze badge.
[Nov 26, 2019 11:07:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

I'd like to talk a little more about what "reliable" means, if I may.

I've come to the conclusion that I have a problem with any system which relies upon saying that a reliable host is one that returns a result within x days. That immediately excludes any machine with a cache larger than x. Why exclude them?

BOINC Manager is designed to try to ensure that all tasks are returned by their deadline. Surely the test should be more along the lines of 'How many WUs has this machine returned which were over deadline?', or perhaps 'How long ago is it since this machine returned a WU which was over deadline?'.

Example: A (sub-)project has a deadline of 7 days. WUs average 1 day to process. I have a machine which is always on and has a cache of three days to guard against outages over a weekend. Most WUs will be returned after around 3 days -- too late to record the machine as reliable if the definition is 'within 2 days'. BUT, if you send my machine a WU with a deadline of 2 days, BOINC Manager will panic and start that WU straight away, and you'll get it back in 1 day - well within the two day deadline.
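
The example can be put into a toy model. This is only a sketch in Python: it compares first-in-first-out order with earliest-deadline-first order on a single core, and it is not BOINC's real scheduler, which is more involved. The task names, the two queued tasks ahead of the new one, and their remaining deadlines are made up for illustration.

```python
def completion_times(tasks, order_key):
    """Run tasks back to back on one core and return {name: finish_day}."""
    clock = 0.0
    finished = {}
    for task in sorted(tasks, key=order_key):
        clock += task["run_days"]
        finished[task["name"]] = clock
    return finished


# Hypothetical queue: two cached tasks plus a new one with a 2 day deadline.
queue = [
    {"name": "old-1", "arrival": 0, "run_days": 1.0, "deadline": 5.0},
    {"name": "old-2", "arrival": 1, "run_days": 1.0, "deadline": 6.0},
    {"name": "rush", "arrival": 2, "run_days": 1.0, "deadline": 2.0},
]

fifo = completion_times(queue, lambda task: task["arrival"])
edf = completion_times(queue, lambda task: task["deadline"])

print(f"FIFO: rush task done after {fifo['rush']} days")
print(f"EDF:  rush task done after {edf['rush']} days")

# Under EDF every task in this toy queue still meets its own deadline.
all_on_time = all(edf[task["name"]] <= task["deadline"] for task in queue)
print(f"all deadlines met under EDF: {all_on_time}")
```

In this toy queue the short-deadline task waits behind the cache under FIFO, but runs first once the client orders work by deadline, which is the behaviour described above.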

Shouldn't you be measuring how well machines do what they're told to do, and not just performance against some unknowable deadline? If BOINC Manager doesn't know about it, and so cannot react to it, it is simply arbitrary and not a useful measure.
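
To make the difference between the two measures concrete, here is a minimal Python sketch. Both rules and their names are hypothetical; neither is claimed to be what the WCG servers actually do. The history is the always-on machine from the example: every WU comes back after about 3 days against a 7 day deadline.

```python
from dataclasses import dataclass


@dataclass
class Result:
    turnaround_days: float  # time from download to return
    deadline_days: float    # deadline the client was actually told about


def reliable_by_turnaround(results, limit_days):
    """Hypothetical rule: average turnaround must be within a fixed limit."""
    average = sum(r.turnaround_days for r in results) / len(results)
    return average <= limit_days


def reliable_by_deadline(results, max_misses=0):
    """Hypothetical rule: count results returned after their own deadline."""
    misses = sum(1 for r in results if r.turnaround_days > r.deadline_days)
    return misses <= max_misses


# Twenty results, each back in 3 days on a 7 day deadline.
history = [Result(turnaround_days=3.0, deadline_days=7.0) for _ in range(20)]

print("within 2 days on average:", reliable_by_turnaround(history, 2.0))
print("never over deadline:     ", reliable_by_deadline(history))
```

The same machine fails the turnaround rule and passes the deadline rule, even though it never returned anything late.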
[Nov 27, 2019 12:15:51 AM]
RTorpey
Advanced Cruncher
Joined: Aug 24, 2005
Post Count: 67
Status: Offline
Re: Work Available

Agreed. I have a mix of older and newer machines, so the return time varies from 10 to 60 hours. But they are solid machines and I very rarely get an error (so far, none on ARP)!
[Nov 27, 2019 1:27:37 AM]