Thread Status: Active
Total posts in this thread: 30
This topic has been viewed 2514 times and has 29 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Shortage of Work - Why no Announcement

Thanks for the reply Kevin!
[May 29, 2009 6:21:03 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Shortage of Work - Why no Announcement

Here is some more detail about what goes on.

HCMD2 uses homogeneous redundancy (read about it here: http://boinc.berkeley.edu/trac/wiki/HomogeneousRedundancy )

We determined during the beta that we had to divide the work into the following classes:
Class ID = Class Description
257 = Linux, unknown CPU type
258 = Linux, Intel CPU
259 = Linux, AMD CPU
385 = Windows, unknown CPU type
386 = Windows, Intel CPU
387 = Windows, AMD CPU
514 = Mac (pre-Leopard), Intel
516 = Mac (pre-Leopard), PPC
770 = Mac (Leopard and newer), Intel

There is a shared memory segment that is used to buffer results available to send to the clients. When a request comes in, the scheduler looks up a result for the client in this shared memory segment (instead of running an expensive database query). The shared memory for World Community Grid has 3600 'slots' for results. When we start the scheduler, the slots are divided among the projects. If HR is being used for a project, then only X slots can be filled with results that have already been assigned to an HR class. (A result is assigned to an HR class when the first result for the workunit is distributed. All later results for that workunit must be sent only to clients that share the same HR class.)
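
To illustrate the HR class rule in the paragraph above, here is a minimal sketch. It is not BOINC's scheduler code: the `Workunit` type and the `can_send` / `send` helpers are hypothetical names, and treating class 0 as "not yet assigned" is an assumption based on the `class 0` line in the allocation listing.

```python
from dataclasses import dataclass

UNASSIGNED = 0  # assumption: class 0 means "no HR class assigned yet"


@dataclass
class Workunit:
    name: str
    hr_class: int = UNASSIGNED


def can_send(wu: Workunit, client_hr_class: int) -> bool:
    """A result may go to a client if its workunit is still unassigned
    or is already pinned to the client's own HR class."""
    return wu.hr_class in (UNASSIGNED, client_hr_class)


def send(wu: Workunit, client_hr_class: int) -> bool:
    """Send one result of `wu`; the first send pins the workunit's class."""
    if not can_send(wu, client_hr_class):
        return False
    wu.hr_class = client_hr_class
    return True


wu = Workunit("hcmd2_example")
print(send(wu, 386))  # True: first copy goes to Windows/Intel and pins the class
print(send(wu, 259))  # False: a Linux/AMD client is refused for the second copy
print(send(wu, 386))  # True: another Windows/Intel client is accepted
```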

The slots in the shared memory segment are filled by a 'feeder'. It periodically queries the database and assigns a result to each slot that is empty.

The issue is that if too many results are assigned to a given HR class, then when the feeder queries the database it only gets results assigned to that HR class (the query uses a LIMIT clause). Eventually the only results available to send are the ones for that particular HR class, and other clients cannot fetch work. This is what happened to us.
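
The starvation effect described above can be reproduced with a toy model. This is only a sketch under assumed numbers (10 slots, a query limit of 20, made-up per-class caps); `feeder_pass` is a hypothetical helper, not the real feeder.

```python
from collections import Counter


def feeder_pass(unsent, slots, max_slots, limit):
    """One feeder pass: look at the first `limit` unsent results (the LIMIT
    clause) and place each into an empty slot if its HR class still has room.
    Placed results are removed from `unsent` and returned."""
    in_use = Counter(r["hr_class"] for r in slots if r is not None)
    placed = []
    for result in unsent[:limit]:
        cls = result["hr_class"]
        if in_use[cls] >= max_slots[cls]:
            continue  # class is full; the result stays in the database
        if None not in slots:
            break  # no empty slot left
        slots[slots.index(None)] = result
        in_use[cls] += 1
        placed.append(result)
    for result in placed:
        unsent.remove(result)
    return placed


# Assumed toy numbers: class 259 is capped at 2 slots, class 386 at 8.
max_slots = {259: 2, 386: 8}
# The database holds 50 class-259 results ahead of 50 class-386 results.
unsent = [{"id": i, "hr_class": 259} for i in range(50)]
unsent += [{"id": 50 + i, "hr_class": 386} for i in range(50)]
slots = [None] * 10

placed = feeder_pass(unsent, slots, max_slots, limit=20)
print(len(placed))                    # 2: only two class-259 results fit
print(sum(s is None for s in slots))  # 8: the other slots stay empty
```

Running the pass again returns the same kind of rows, so nothing more is placed and class 386 clients find no work even though 50 results are waiting for them.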

We are working on resolving the underlying issue so that there are not too many workunits to send.

The current slot allocation for HCMD2 looks like this:

HR type wcg_proc: weight 101.000000 nslots 642
class 0: rac 0.000000 max_slots 321 cur_slots 0
class 257: rac 1228.441244 max_slots 1 cur_slots 0
class 258: rac 2519412.954252 max_slots 28 cur_slots 0
class 259: rac 558776.017246 max_slots 6 cur_slots 0
class 385: rac 1539.069244 max_slots 1 cur_slots 0
class 386: rac 21111229.380889 max_slots 235 cur_slots 0
class 387: rac 3139239.463209 max_slots 35 cur_slots 0
class 514: rac 173896.347886 max_slots 1 cur_slots 0
class 516: rac 51921.605863 max_slots 1 cur_slots 0
class 770: rac 1169106.633711 max_slots 13 cur_slots 0
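
The numbers in the listing are consistent with a simple rule: half of the 642 slots go to class 0, and the other half are split in proportion to each class's rac, rounded down, with a minimum of one slot. That rule is inferred from the printed figures only; it is an assumption, not a description of the actual feeder code, and `allocate` is a hypothetical name.

```python
import math


def allocate(nslots, rac_by_class):
    """Give half the slots to class 0, then split the rest in proportion
    to each class's rac, rounding down, with a minimum of 1 slot."""
    class0 = nslots // 2
    remaining = nslots - class0
    total_rac = sum(rac_by_class.values())
    alloc = {0: class0}
    for cls, rac in rac_by_class.items():
        alloc[cls] = max(1, math.floor(remaining * rac / total_rac))
    return alloc


# rac values copied from the listing above
rac = {
    257: 1228.441244,
    258: 2519412.954252,
    259: 558776.017246,
    385: 1539.069244,
    386: 21111229.380889,
    387: 3139239.463209,
    514: 173896.347886,
    516: 51921.605863,
    770: 1169106.633711,
}

alloc = allocate(642, rac)
for cls, n in alloc.items():
    print(f"class {cls}: max_slots {n}")
```

Under this rule Linux - AMD (class 259) gets 6 of the 642 slots, which is why a flood of results pinned to that one class leaves so little that can be handed out.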


By the way - I identified Linux - Intel above as the issue. It is actually Linux - AMD, and in particular one specific computer that was sending lots of bad results. I have throttled that computer manually so that it is no longer making things worse and worse. I am also going to drop the per-day limit for computers so that the automated mechanisms take effect faster.
----------------------------------------
[Edit 1 times, last edit by knreed at May 29, 2009 6:37:24 PM]
[May 29, 2009 6:36:55 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2977
Status: Offline
Re: Shortage of Work - Why no Announcement

Thanks Kevin for your more than detailed response.

Hopefully, with all this extra background information, people will be a little more tolerant and patient when issues such as this arise.
----------------------------------------

[May 29, 2009 7:10:25 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Shortage of Work - Why no Announcement

Ok - one last follow up post.

There was one additional issue: results returned by version 6.11 do not match results returned by version 6.13. This is causing extra copies to be generated. It will decrease rapidly going forward and should not be an issue in the future.
[May 29, 2009 7:27:35 PM]
nasher
Veteran Cruncher
USA
Joined: Dec 2, 2005
Post Count: 1422
Status: Offline
Re: Shortage of Work - Why no Announcement

Wow, thanks for the information. There are a lot of things I didn't realize happened with the queue.

Oh, is there a way for us to see (and check) whether our system is listed as Intel, AMD or other, for instance?
----------------------------------------

[May 29, 2009 8:12:09 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Shortage of Work - Why no Announcement

Message log:

28/05/2009 16:54:26 Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU
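
As a rough sketch, a line like this can be mapped to one of the class IDs from the table earlier in the thread. The assumption here is that the class follows the CPU vendor string in the log (`GenuineIntel` or `AuthenticAMD`, anything else counting as unknown); the server's real classification may differ. `hr_class_from_log` is a hypothetical helper, and the Mac classes are left out.

```python
import re

# Class IDs copied from the table earlier in the thread (Linux and Windows only)
HR_CLASSES = {
    ("linux", "unknown"): 257,
    ("linux", "intel"): 258,
    ("linux", "amd"): 259,
    ("windows", "unknown"): 385,
    ("windows", "intel"): 386,
    ("windows", "amd"): 387,
}

# Assumption: the vendor string decides the CPU part of the class
VENDOR_TO_CPU = {"GenuineIntel": "intel", "AuthenticAMD": "amd"}


def hr_class_from_log(line, os_name):
    """Pull the vendor out of a 'Processor:' message log line and look up
    the class ID for the given OS ('linux' or 'windows')."""
    m = re.search(r"Processor:\s+\d+\s+(\S+)", line)
    vendor = m.group(1) if m else ""
    cpu = VENDOR_TO_CPU.get(vendor, "unknown")
    return HR_CLASSES[(os_name, cpu)]


line = "28/05/2009 16:54:26 Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU"
print(hr_class_from_log(line, "windows"))  # 386
```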
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 29, 2009 8:14:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Shortage of Work - Why no Announcement

(...)
I am also going to drop the per-day limit for computers as well so that automated mechanisms take effect faster.



By "drop the per-day limit" do you mean you intend to "lower the limit" or "abandon the limit"?

Thanks.
[May 29, 2009 11:47:26 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Shortage of Work - Why no Announcement

BOINC can cycle through 1 result per second and fail them... that's a lot of broken results moving into the 'rush' queue, particularly on multi-core devices. The device quota was 320, then increased to 640 to cater for HCMD2 shorties, so it's safe to think it's going to be in that ballpark.
[May 29, 2009 11:53:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
out of WU's again.

I have not been getting any new WU's for over an hour now. Is the queue empty again?
[May 30, 2009 5:03:06 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: out of WU's again.

I have started a new thread about it, as this thread was about the lack of an announcement.
[May 30, 2009 5:48:46 PM]