World Community Grid - View Thread - Shortage of Work

World Community Grid Forums

Category: Completed Research

Forum: Help Cure Muscular Dystrophy - Phase 2 Forum

Thread: Shortage of Work - Why no Announcement

Thread Status: Active
Total posts in this thread: 30

This topic has been viewed 5333 times and has 29 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Shortage of Work - Why no Announcement

Thanks for the reply, Kevin!

[May 29, 2009 6:21:03 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline


Re: Shortage of Work - Why no Announcement

Here is some more detail about what goes on.

HCMD2 uses homogeneous redundancy (read about it here: http://boinc.berkeley.edu/trac/wiki/HomogeneousRedundancy )

We determined during the beta that we had to divide the work into the following classes:
Class ID = Class Description
257 is Linux, unknown CPU type
258 is Linux, Intel CPU
259 is Linux, AMD CPU
385 is Windows, unknown CPU type
386 is Windows, Intel CPU
387 is Windows, AMD CPU
514 is Mac (pre-Leopard), Intel
516 is Mac (pre-Leopard), PPC
770 is Mac (Leopard and newer), Intel
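
The class IDs listed above appear to follow a pattern: each one splits into a multiple of 128 (the operating system part) plus a small remainder (the CPU part). Here is a minimal Python sketch of that decomposition. It is inferred only from the nine IDs in this post, not from the scheduler source, and the names `OS_NAMES`, `CPU_NAMES` and `describe_hr_class` are hypothetical.

```python
# Hypothetical lookup tables, inferred only from the class list in this post.
OS_NAMES = {
    2: "Linux",
    3: "Windows",
    4: "Mac (pre-Leopard)",
    6: "Mac (Leopard and newer)",
}
CPU_NAMES = {1: "unknown CPU type", 2: "Intel", 3: "AMD", 4: "PPC"}


def describe_hr_class(class_id: int) -> str:
    """Split a class ID into an OS part (multiple of 128) and a CPU part."""
    os_code, cpu_code = divmod(class_id, 128)
    os_name = OS_NAMES.get(os_code, f"unrecognized OS code {os_code}")
    cpu_name = CPU_NAMES.get(cpu_code, f"unrecognized CPU code {cpu_code}")
    return f"{os_name}, {cpu_name}"


for class_id in (257, 258, 259, 385, 386, 387, 514, 516, 770):
    print(class_id, "is", describe_hr_class(class_id))
```

Any ID outside the nine listed here falls through to the "unrecognized" labels, since nothing in the post says what other values mean.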

There is a shared memory segment that buffers results available to send to the clients. When a request comes in, the scheduler looks up a result for the client in this segment instead of running an expensive database query. The shared memory for World Community Grid has 3600 'slots' for results. When we start the scheduler, the slots are divided among the projects. If HR is being used for a project, then only a limited number (X) of slots can be filled with results that have already been assigned to an HR class. (A result is assigned to an HR class when the first result for the workunit is distributed; all later results in that workunit must be sent only to clients that share the same HR class.)

The slots in the shared memory segment are filled by a 'feeder'. It periodically queries the database and assigns results to a slot when the slot is empty.

The issue is that if too many results become assigned to a given HR class, then when the feeder queries the database it gets back only results assigned to that HR class (the query uses a LIMIT clause). Eventually the only results available to send are those for that particular HR class, and other clients cannot fetch work. This is what happened to us.
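
The starvation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the real feeder: each result is reduced to its HR class ID, the database is a plain list read in order, and `feeder_pass`, `max_slots` and the numbers used are all made up for illustration.

```python
from collections import Counter


def feeder_pass(db_queue, slots, max_slots, limit):
    """Fill empty slots from the first `limit` queued results.

    Each result is represented only by its HR class ID (an assumption
    made to keep the model small).
    """
    candidates = db_queue[:limit]  # stands in for a query with a LIMIT clause
    in_use = Counter(s for s in slots if s is not None)
    for hr_class in candidates:
        if None not in slots:
            break  # no empty slot left
        if in_use[hr_class] >= max_slots.get(hr_class, 0):
            continue  # class is at its cap; the result stays in the database
        slots[slots.index(None)] = hr_class
        in_use[hr_class] += 1
        db_queue.remove(hr_class)  # this result now lives in a slot


# A backlog of class 259 results sits in front of everything else.
db_queue = [259] * 50 + [386] * 50
slots = [None] * 10
max_slots = {259: 2, 386: 8}

feeder_pass(db_queue, slots, max_slots, limit=20)
print(slots)
```

In this model the first 20 rows are all class 259, so the feeder fills the two slots that class is allowed and leaves the other eight empty. Class 386 hosts find nothing to fetch, and that stays true on every later pass until the class 259 backlog shrinks below the limit.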

We are working on resolving the underlying issue so that there are not too many workunits to send.

The current allocation for HCMD2 allocates the results in this fashion:

HR type wcg_proc: weight 101.000000 nslots 642
class 0: rac 0.000000 max_slots 321 cur_slots 0
class 257: rac 1228.441244 max_slots 1 cur_slots 0
class 258: rac 2519412.954252 max_slots 28 cur_slots 0
class 259: rac 558776.017246 max_slots 6 cur_slots 0
class 385: rac 1539.069244 max_slots 1 cur_slots 0
class 386: rac 21111229.380889 max_slots 235 cur_slots 0
class 387: rac 3139239.463209 max_slots 35 cur_slots 0
class 514: rac 173896.347886 max_slots 1 cur_slots 0
class 516: rac 51921.605863 max_slots 1 cur_slots 0
class 770: rac 1169106.633711 max_slots 13 cur_slots 0
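
The `max_slots` figures above can be reproduced with simple arithmetic. This is a sketch, assuming (it is not confirmed in this thread) that half of the 642 slots are held for results with no class yet (class 0) and the other half are shared in proportion to each class's `rac`, rounded down, with a minimum of one slot per class. The function name `allocate` is hypothetical.

```python
import math

# rac values copied from the allocation listing in this post.
RAC = {
    257: 1228.441244,
    258: 2519412.954252,
    259: 558776.017246,
    385: 1539.069244,
    386: 21111229.380889,
    387: 3139239.463209,
    514: 173896.347886,
    516: 51921.605863,
    770: 1169106.633711,
}


def allocate(rac_by_class, nslots):
    """Assumed rule: half to class 0, the rest pro rata by rac, at least 1 each."""
    unassigned = nslots // 2
    shared = nslots - unassigned
    total = sum(rac_by_class.values())
    allocation = {0: unassigned}
    for hr_class, rac in rac_by_class.items():
        allocation[hr_class] = max(1, math.floor(shared * rac / total))
    return allocation


for hr_class, max_slots in allocate(RAC, 642).items():
    print(f"class {hr_class}: max_slots {max_slots}")
```

For example, class 386 holds about 73.5% of the total `rac`, and 73.5% of 321 rounds down to 235. The tiny classes (257, 385, 516) would round down to zero and are lifted to one slot by the minimum.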

By the way - I identified Linux - Intel above as the issue. It is actually Linux - AMD, and in particular a specific computer that was sending lots of bad results. I have throttled that computer manually so that it no longer keeps making things worse. I am also going to drop the per-day limit for computers as well so that automated mechanisms take effect faster.

----------------------------------------
[Edit 1 times, last edit by knreed at May 29, 2009 6:37:24 PM]

[May 29, 2009 6:36:55 PM]

gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Status: Offline


Re: Shortage of Work - Why no Announcement

Thanks, Kevin, for your more than detailed response.

Hopefully, with all this extra background information, people will be a little more tolerant and patient when issues such as this arise.

----------------------------------------

[May 29, 2009 7:10:25 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline


Re: Shortage of Work - Why no Announcement

OK - one last follow-up post.

There was one additional issue: results returned with version 6.11 do not match results returned with version 6.13. This is causing extra copies to be generated. This will decrease rapidly going forward and should not be an issue in the future.

[May 29, 2009 7:27:35 PM]

nasher
Veteran Cruncher
USA
Joined: Dec 2, 2005
Post Count: 1423
Status: Offline


Re: Shortage of Work - Why no Announcement

Wow, thanks for the information. There is a lot about the queue that I didn't realize was happening.

Oh, is there a way for us to see (and check) whether our system is listed as Intel, AMD, or other?

----------------------------------------

[May 29, 2009 8:12:09 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Shortage of Work - Why no Announcement

Message log:

28/05/2009 16:54:26 Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU
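
A message log line like the one above can also be read with a few lines of Python. This is a sketch that assumes the line layout shown here (a CPU count followed by the vendor string). `cpu_vendor` is a hypothetical helper, and how the server itself maps vendors to classes is not stated in this thread.

```python
import re

LOG_LINE = "28/05/2009 16:54:26 Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU"


def cpu_vendor(log_line):
    """Return 'Intel', 'AMD' or 'unknown', or None if no Processor entry is found."""
    match = re.search(r"Processor:\s+(\d+)\s+(\S+)", log_line)
    if match is None:
        return None
    vendor = match.group(2)
    if vendor == "GenuineIntel":  # CPUID vendor string for Intel
        return "Intel"
    if vendor == "AuthenticAMD":  # CPUID vendor string for AMD
        return "AMD"
    return "unknown"


print(cpu_vendor(LOG_LINE))
```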

----------------------------------------

WCG

Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!

[May 29, 2009 8:14:43 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Shortage of Work - Why no Announcement

(...)
I am also going to drop the per-day limit for computers as well so that automated mechanisms take effect faster.

By "drop the per-day limit", do you mean you intend to 'lower the limit' or 'abandon the limit'?

Thanks.

[May 29, 2009 11:47:26 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Shortage of Work - Why no Announcement

BOINC can cycle through 1 result per second and fail them... that's lots of broken results moving into the 'rush' queue, particularly on multi-core devices. The device quota was 320, then increased to 640 to cater for HCMD2 shorties, so it's safe to think it's going to be in that ballpark.

----------------------------------------


[May 29, 2009 11:53:57 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


out of WU's again.

I have not been getting any new WU's for over an hour now. Is the queue empty again?

[May 30, 2009 5:03:06 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: out of WU's again.

I have started a new thread about it, as this thread was about no announcement...

[May 30, 2009 5:48:46 PM]
