World Community Grid - View Thread - Work unit availability

World Community Grid Forums

Category: Active Research

Forum: OpenPandemics - COVID-19 Project

Thread: Work unit availability


Thread Status: Active
Total posts in this thread: 822


This topic has been viewed 1941842 times and has 821 replies

Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Status: Offline
Project Badges:

90 day badge for OpenPandemics - COVID-19


Re: Work unit availability

If that’s the case, I have to wonder why you’d intentionally delay the science results just to placate slower devices

That's why Keith is the lead tech and you're not. shhh

because he's the authority and you're not, is precisely why I asked him and not you shhh

so let's set the ad hominem aside and let him speak for himself.

----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti

[Apr 14, 2021 2:13:53 AM]

Ian-n-Steve C.

Re: Work unit availability

Since this impromptu stress test has occurred, and the world didn't end, would it make any sense just to leave the configuration parameter at the higher value and unleash the grid to do what it does best? Who would complain if project progress surged ahead?

say it louder for the people in the back.


[Apr 14, 2021 2:15:50 AM]

Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2567
Status: Offline
Project Badges:

10 year badge for Mapping Cancer Markers

14 day badge for FightAIDS@Home - Phase 2

1 year badge for Microbiome Immunity Project

90 day badge for Africa Rainfall Project

2 year badge for OpenPandemics - COVID-19


Re: Work unit availability

Look at your invalids and server-aborted results first...
Things did not end well.

This is not normal: https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=618783001

And: https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=618613464

And there's tons more out there.

----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 14, 2021 2:20:18 AM]

[Apr 14, 2021 2:18:24 AM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Project Badges:

10 year badge for Human Proteome Folding

2 year badge for Human Proteome Folding - Phase 2

45 day badge for Help Cure Muscular Dystrophy

2 year badge for Discovering Dengue Drugs - Together

20 year badge for Nutritious Rice for the World

2 year badge for The Clean Energy Project

5 year badge for Help Fight Childhood Cancer

2 year badge for Influenza Antiviral Drug Search

2 year badge for Help Cure Muscular Dystrophy - Phase 2

2 year badge for Discovering Dengue Drugs - Together - Phase 2

10 year badge for The Clean Energy Project - Phase 2

5 year badge for Computing for Clean Water

10 year badge for Drug Search for Leishmaniasis

20 year badge for GO Fight Against Malaria

2 year badge for Computing for Sustainable Water

50 year badge for Mapping Cancer Markers

50 year badge for Uncovering Genome Mysteries

20 year badge for Outsmart Ebola Together

100 year badge for FightAIDS@Home - Phase 2

20 year badge for Smash Childhood Cancer

50 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

50 year badge for OpenPandemics - COVID-19


Re: Work unit availability

Good evening,

Our plan is to maintain the pace (which was increased to 2000 work units every 30 minutes) so that we do not overload the pipeline. All of this requires consideration and planning. For example, creating too many results at one time could overload a database, overload the file upload handler, fill up the researchers' servers, etc. There are lots of parts that need to be considered in everything. Some of the things we need to consider could cause harm to other projects running here on World Community Grid. As with many of our research projects, we start them at a pace that makes sense for us and the researchers.

This heavy scheduling did, however, bring to light an issue with the scheduler that I will need to fix going forward.
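
As a rough illustration of the pacing described above, here is a minimal Python sketch of a fixed-rate release schedule. The 2000-unit batch and the 30-minute interval come from the post; the function names, the backlog figure, and the start time are hypothetical, and this is not World Community Grid's actual release code.

```python
from datetime import datetime, timedelta

BATCH_SIZE = 2000                 # work units per release (figure from the post)
INTERVAL = timedelta(minutes=30)  # time between releases (figure from the post)


def release_schedule(backlog, start, batch_size=BATCH_SIZE, interval=INTERVAL):
    """Yield (release_time, count) pairs until the backlog is exhausted.

    Hypothetical helper: it only models the pacing, not the real pipeline.
    """
    when = start
    remaining = backlog
    while remaining > 0:
        count = min(batch_size, remaining)
        yield when, count
        remaining -= count
        when += interval


def daily_capacity(batch_size=BATCH_SIZE, interval=INTERVAL):
    """Upper bound on work units released in 24 hours at a fixed pace."""
    return batch_size * (timedelta(days=1) // interval)


if __name__ == "__main__":
    # Assumed backlog of 4500 work units, released from an arbitrary start time.
    for when, count in release_schedule(4500, datetime(2021, 4, 14)):
        print(when.isoformat(), count)
    print("max per day:", daily_capacity())
```

Under these assumptions the pace caps releases at 2000 x 48 = 96,000 work units per day, which is the kind of ceiling the downstream upload, validation, and storage stages would need to absorb.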

Thanks,
-Uplinger

[Apr 14, 2021 3:30:30 AM]

Ian-n-Steve C.

Re: Work unit availability

thanks for the reply. sounds like you'll continue to increase WU availability as your confidence in your process and system stability grows, and that's good to hear.

you could also likely leave the WU distribution quantity the same (2000/30mins) and just increase the WU size (more jobs per WU, or harder jobs per WU) to increase the work being done without negatively impacting your infrastructure. that's kind of a win-win.


[Apr 14, 2021 3:47:39 AM]

uplinger

Re: Work unit availability

The only part of the pipeline that this work tested was the ability to send out work units to the members. I think it sent out about 70-80k before it was caught and fixed. This means that during that time, we sent out about 3 times the work we generally send out for WCG (normally we send around 40-50k between all projects). This part of the pipeline is probably the easiest part, so it was handled well by the servers. We did not have to worry about validating 3x as many results or the uploads of 3x the results on the grid, etc. These can be run over time on the backend and are not things the members notice (other than validation).

I do understand the questions folks have, but I cannot cause the entire infrastructure to crash. The team at World Community Grid is very keen on keeping everything running smoothly, from HSTB to ARP to OPNG and everything in between.

So please, there is no need for quarrels between members. I hold the keys, and if we can increase it, trust me, I will let everyone know.

Thanks,
-Uplinger

[Apr 14, 2021 3:55:28 AM]

uplinger

Re: Work unit availability

The complexity of the work units (harder jobs) is given to us by the researchers. These are the target/ligand pairs, or jobs, in a given work unit. They need to identify which combinations they would like to expand upon. Currently, to get things flowing and to make sure the work being done is validated on their servers in the end, getting a baseline for these is best. I do not know what a really difficult combination is yet, and I imagine it may cause problems for my work unit generator. Tweaks will probably be needed on that end. Just adding more jobs into a work unit does not solve post-processing, for example. We still need to validate each job within a work unit, and then they perform more analysis on each pair.
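
To make that last point concrete, here is a toy accounting sketch in Python. It assumes, purely for illustration, that sending and uploading cost one operation per work unit while validation costs one operation per job. The function name, the dictionary keys, and the jobs-per-unit figures are hypothetical, and replication of results is ignored.

```python
def backend_load(work_units, jobs_per_unit):
    """Count backend operations for one release under a toy cost model.

    Assumption: scheduler sends and uploads scale with the number of
    work units, while validation scales with the total number of jobs.
    """
    return {
        "scheduler_sends": work_units,
        "result_uploads": work_units,
        "job_validations": work_units * jobs_per_unit,
    }


if __name__ == "__main__":
    # Same 2000-unit release, with an assumed 10 versus 20 jobs per unit.
    print(backend_load(2000, 10))
    print(backend_load(2000, 20))
```

In this model, doubling the jobs per work unit leaves the send and upload counts unchanged but doubles the validation count, which is the post-processing cost the post says does not go away.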

Thanks,
-Uplinger

[Apr 14, 2021 4:01:45 AM]

Grumpy Swede

Re: Work unit availability

Thanks Uplinger!

So, what went wrong with all these invalids, and Server Aborts?

[Apr 14, 2021 4:01:52 AM]

uplinger

Re: Work unit availability

I need to go into the validator logs to dig these out. Give me a few minutes to check...

Edit: As for server aborts, it's because they got to 5 total results sent out. This causes the work unit to be marked as an error. I should probably increase that to 7, as we've done in the past for other projects. I'll review that value tomorrow.
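
The rule described in the edit above can be sketched roughly as follows. This is a simplified Python model, not the project's server code: the limit of 5 and the proposed 7 come from the post, while the function name, the parameter names, the status strings, and the quorum of 2 matching results are assumptions made for illustration.

```python
def next_action(outcomes, max_total_results=5, quorum=2):
    """Decide what happens to a work unit given the results sent so far.

    outcomes: list of "valid", "invalid", or "in_progress", one per copy sent.
    max_total_results and quorum are hypothetical parameter names.
    """
    valid = outcomes.count("valid")
    if valid >= quorum:
        return "validated"
    in_progress = outcomes.count("in_progress")
    if valid + in_progress >= quorum:
        return "wait"  # enough copies are still out to possibly reach quorum
    if len(outcomes) >= max_total_results:
        return "error"  # limit reached: no further copies can be sent
    return "send_another"


if __name__ == "__main__":
    history = ["invalid", "invalid", "invalid", "invalid", "in_progress"]
    print(next_action(history))                       # limit of 5
    print(next_action(history, max_total_results=7))  # proposed limit of 7
```

With four invalid results and one copy still running, the model marks the work unit as an error at a limit of 5, and the copy still in progress would be the one a member sees as server aborted. At a limit of 7, the same history results in another copy being sent instead.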

Thanks,
-Uplinger

----------------------------------------
[Edit 1 times, last edit by uplinger at Apr 14, 2021 4:06:04 AM]

[Apr 14, 2021 4:04:59 AM]

Grumpy Swede

Re: Work unit availability

There are tons of them out there. I reacted because I never have any invalids from my computers, so when the server aborts began, I went and checked. I think I have at least two more such WUs with wingmen also going invalid. I'm not the only one who has them, and it began after the big dump of WUs.

Edit: Also, these "invalids" seem to run much shorter than usual, as if they weren't created correctly.

----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 14, 2021 4:09:42 AM]

[Apr 14, 2021 4:07:50 AM]
