World Community Grid Forums
Category: Completed Research Forum: Help Cure Muscular Dystrophy - Phase 2 Forum Thread: Shortage of Work - Why no Announcement |
Thread Status: Active Total posts in this thread: 30
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for the reply Kevin!
|
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
Here is some more detail about what goes on.
HCMD2 uses homogeneous redundancy (read about it here: http://boinc.berkeley.edu/trac/wiki/HomogeneousRedundancy ).

We determined during the beta that we had to divide the work into the following classes:

Class ID = Class Description
257 is Linux, unknown cpu type
258 is Linux, Intel cpu
259 is Linux, AMD cpu
385 is Windows, unknown cpu type
386 is Windows, Intel cpu
387 is Windows, AMD cpu
514 is Mac (pre-Leopard), Intel
516 is Mac (pre-Leopard), PPC
770 is Mac (Leopard and newer), Intel

There is a shared memory segment that is used to buffer results available to send to the clients. When a request comes in, this shared memory segment is used to look up a result for the client (instead of an expensive database query). The shared memory for World Community Grid has 3600 'slots' for results. When we start the scheduler, the slots are divided by projects. If HR is being used for a project, then only X number of slots can be filled with a result that has already been assigned to a HR class (a result is assigned to a HR class when the first result for the workunit is distributed - all later assignments for results in the workunit must only be sent to clients that share the same hr class).

The slots in the shared memory segment are filled by a 'feeder'. It periodically queries the database and assigns results to a slot when the slot is empty. The issue is that if too many results start to be assigned to a given hr class, then when the feeder queries the database, it only gets results assigned to that hr class (it uses the limit clause). This eventually results in only results being available to send for that particular hr_class. Other clients cannot fetch work.

This is what happened to us. We are working on resolving the underlying issue so that there are not too many workunits to send.
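For readers who want to see the starvation effect in miniature, here is a toy model of it. It is only a sketch and not the BOINC feeder code: the row layout, the function names `feeder_pass` and `client_can_fetch`, the 10 slots, the query limit of 20 and the per-class caps are all invented for illustration. It also assumes the database query hands back the results already tied to the overloaded class (259 here) ahead of everything else.

```python
from collections import Counter

def feeder_pass(db_rows, slots, max_per_class, limit):
    """One hypothetical feeder pass: fetch at most `limit` rows, fill empty slots."""
    batch = db_rows[:limit]  # stands in for "SELECT ... LIMIT n"
    in_use = Counter(s["hr_class"] for s in slots if s is not None)
    for i in range(len(slots)):
        if slots[i] is not None:
            continue
        for row in batch:
            cls = row["hr_class"]
            # Class 0 means "not yet assigned to an HR class" and is not capped here.
            if cls != 0 and in_use[cls] >= max_per_class.get(cls, 0):
                continue
            slots[i] = row
            in_use[cls] += 1
            batch.remove(row)
            db_rows.remove(row)
            break

def client_can_fetch(slots, client_class):
    """A client may take an unassigned result or one from its own HR class."""
    return any(s is not None and s["hr_class"] in (0, client_class) for s in slots)

# 50 results already tied to class 259 sit in front of 50 unassigned ones.
db = [{"id": i, "hr_class": 259} for i in range(50)]
db += [{"id": 50 + i, "hr_class": 0} for i in range(50)]
slots = [None] * 10
caps = {259: 2, 386: 6}  # made-up caps

for _ in range(3):
    feeder_pass(db, slots, caps, limit=20)

filled = [s for s in slots if s is not None]
print(len(filled), "slots filled, classes:", {s["hr_class"] for s in filled})
print("Windows/Intel client can fetch:", client_can_fetch(slots, 386))
```

In this model every pass sees only class 259 rows inside the limit, so once the cap for 259 is reached the remaining slots stay empty and a class 386 client finds nothing to fetch, even though plenty of unassigned work is waiting further down in the database.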
The current allocation for HCMD2 allocates the results in this fashion:

HR type wcg_proc: weight 101.000000 nslots 642
class 0: rac 0.000000 max_slots 321 cur_slots 0
class 257: rac 1228.441244 max_slots 1 cur_slots 0
class 258: rac 2519412.954252 max_slots 28 cur_slots 0
class 259: rac 558776.017246 max_slots 6 cur_slots 0
class 385: rac 1539.069244 max_slots 1 cur_slots 0
class 386: rac 21111229.380889 max_slots 235 cur_slots 0
class 387: rac 3139239.463209 max_slots 35 cur_slots 0
class 514: rac 173896.347886 max_slots 1 cur_slots 0
class 516: rac 51921.605863 max_slots 1 cur_slots 0
class 770: rac 1169106.633711 max_slots 13 cur_slots 0

By the way - above I identified Linux - Intel as the issue. It is actually Linux - AMD, and in particular a specific computer that was sending lots of bad results. I have throttled that computer manually so that it is no longer making things worse and worse. I am also going to drop the per-day limit for computers so that automated mechanisms take effect faster.
[Edit 1 times, last edit by knreed at May 29, 2009 6:37:24 PM] |
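The max_slots figures in the allocation listing can be reproduced with a simple rule, sketched below. This is a guess that happens to match the posted numbers, not a confirmed description of the scheduler: it assumes half of nslots is reserved for class 0 (results not yet assigned to an HR class) and the other half is split in proportion to each class's rac, rounded down, with a minimum of one slot. The function name `allocate_hr_slots` is made up for this example.

```python
import math

# rac values copied from the allocation listing in the post
RAC = {
    257: 1228.441244,
    258: 2519412.954252,
    259: 558776.017246,
    385: 1539.069244,
    386: 21111229.380889,
    387: 3139239.463209,
    514: 173896.347886,
    516: 51921.605863,
    770: 1169106.633711,
}

def allocate_hr_slots(nslots, rac_by_class):
    """Assumed rule: half for class 0, the rest proportional to rac (floor, min 1)."""
    unassigned = nslots // 2
    shared = nslots - unassigned
    total = sum(rac_by_class.values())
    alloc = {0: unassigned}
    for cls, rac in rac_by_class.items():
        alloc[cls] = max(1, math.floor(shared * rac / total))
    return alloc

alloc = allocate_hr_slots(642, RAC)
for cls in sorted(alloc):
    print(f"class {cls}: max_slots {alloc[cls]}")
```

With the posted rac values this gives 321 for class 0 and 1, 28, 6, 1, 235, 35, 1, 1, 13 for the other classes, the same as the listing. Because of the one-slot minimum, the per-class caps are not guaranteed to add up to exactly half of nslots for other inputs.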
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2977 Status: Offline Project Badges: |
Thanks Kevin for your more than detailed response.
Hopefully, with all this extra background information, people will be a little more tolerant and patient when issues such as this arise. |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
Ok - one last follow up post.
There was one additional issue: results returned by version 6.11 do not match results returned by version 6.13. This is causing extra copies to be generated. This will decrease rapidly going forward and should not be an issue in the future. |
||
|
nasher
Veteran Cruncher USA Joined: Dec 2, 2005 Post Count: 1422 Status: Offline Project Badges: |
Wow, thanks for the information. There are a lot of things I didn't realize happened with the queue.
Oh, is there a way for us to see (and check) whether our system lists as Intel, AMD or other, for instance? |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Message log:
28/05/2009 16:54:26 Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU
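To tie that log line back to the class table Kevin posted, a lookup along the following lines could be used. It is a sketch only: the table values come from his post, but how the scheduler really derives the class from the vendor string is not stated in this thread, so the matching in `cpu_family` and the operating system labels are assumptions made for illustration.

```python
# (os label, cpu family) -> HR class ID, taken from the class list earlier in the thread
HR_CLASSES = {
    ("linux", "unknown"): 257,
    ("linux", "intel"): 258,
    ("linux", "amd"): 259,
    ("windows", "unknown"): 385,
    ("windows", "intel"): 386,
    ("windows", "amd"): 387,
    ("mac_pre_leopard", "intel"): 514,
    ("mac_pre_leopard", "ppc"): 516,
    ("mac_leopard", "intel"): 770,
}

def cpu_family(vendor):
    """Assumed matching on the vendor string shown in the BOINC message log."""
    v = vendor.lower()
    if "intel" in v:
        return "intel"
    if "amd" in v:
        return "amd"
    if "power" in v or "ppc" in v:
        return "ppc"
    return "unknown"

def hr_class(os_name, vendor):
    """Return the class ID from the table, or None if the combination is not listed."""
    return HR_CLASSES.get((os_name, cpu_family(vendor)))

print(hr_class("windows", "GenuineIntel"))
print(hr_class("linux", "AuthenticAMD"))
```

So a Windows machine whose log shows GenuineIntel would land in class 386 under this table, and a Linux machine reporting AuthenticAMD in class 259.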
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"(...) I am also going to drop the per-day limit for computers as well so that automated mechanisms take effect faster."
By "drop the per-day limit" do you mean you intend to 'lower the limit' or 'abandon the limit'? Thanks. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
BOINC can cycle through 1 result per second and fail them... that's lots of broken results moving into the 'rush' queue, particularly on multi-core devices. The device quota was 320, then increased to 640 to cater for HCMD2 shorties, so it's safe to assume the new limit is going to be in that ballpark.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have not been getting any new WU's for over an hour now. Is the queue empty again?
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have started a new thread about it as this thread was about no announcement....
|
||
|
|