World Community Grid - View Thread - When ARP runs on your system

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: When ARP runs on your system


Thread Status: Active
Total posts in this thread: 19


This topic has been viewed 5031 times and has 18 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: When ARP runs on your system

Running 128 simultaneously and averaging 33 hours. I was thinking it would have been a lot worse than that. I'll take it and run with it.

[Jun 10, 2020 6:59:53 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: When ARP runs on your system

Once upon a time, during Clean Energy, there was a proposal for staggered starting. Now that I read of 128 running concurrently with no results crashing, there is no need for it, but I do wonder what happens if this beast is shut down and all 128 are started simultaneously.

[Jun 10, 2020 7:17:20 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline


Re: When ARP runs on your system

I haven't seen anything about starting up or restarting after a checkpoint. As I see it, a lot of those 128 would be trying to checkpoint at the same time when the next 12.5% has been completed and that might well be a problem. And if they were all to be at the same stage then there would be considerable bandwidth required when they all try to report at about the same time. Or a lot of queuing would take place.

Of course, if the machine were to be hibernated instead of being shut down you would not have the same problem with bunching because they would all restart from where they left off instead of back to the last checkpoint.

Mike

[Jun 10, 2020 8:56:56 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: When ARP runs on your system

Not much. I put maintenance on it this morning and it came right back up; all 128 were in a running state after about 2 minutes. Bandwidth isn't a problem with 1G fibre to the premises. The machine has 256GB of memory and all 8 memory channels are populated. HD averages about 4Mb writes per second. All very manageable. The only real anomaly I have noticed is that hardware interrupts are very high and take about 4% of the processing time. These WUs do not bunch up. Even if you started all 128 at the same time, the inherent variability in run times guarantees they end and report singly. Same thing with checkpoints.

----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 11, 2020 2:41:11 PM]

[Jun 11, 2020 2:36:51 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline


Re: When ARP runs on your system

With 128 units all starting at the same time you would inevitably get some bunching. Say the difference in run times between first and last was 2 hours. In a perfect world, they would all be finishing at about 1 minute intervals, but we don't live in a perfect world. There would be bunching, especially near the middle of the spread. Maybe seconds apart, but still bunching, so uploading/reporting would overlap.
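As a rough check of the figures in this post, here is a minimal Python sketch. It works out the even spacing for 128 units over a 2 hour spread, then simulates run times to count how many consecutive finishes land close together. The 33 hour mean comes from earlier in the thread; the normal distribution, the 0.5 hour standard deviation, the 30 second window, and all function names are assumptions made purely for illustration.

```python
import random


def even_spacing_minutes(n_tasks, spread_hours):
    """Gap between consecutive finishes if they were spread perfectly evenly."""
    return spread_hours * 60 / (n_tasks - 1)


def count_close_pairs(finish_minutes, window_minutes):
    """Count consecutive finishes that land within window_minutes of each other."""
    ordered = sorted(finish_minutes)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a <= window_minutes)


def simulate_finishes(n_tasks, mean_hours, sd_hours, seed=0):
    """Draw finish times in minutes from a normal distribution (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_hours, sd_hours) * 60 for _ in range(n_tasks)]


if __name__ == "__main__":
    # 128 units over a 2 hour spread: a little under 1 minute apart.
    print(round(even_spacing_minutes(128, 2), 2))
    # Hypothetical spread: mean 33 h, standard deviation 0.5 h.
    finishes = simulate_finishes(128, 33, 0.5)
    print(count_close_pairs(finishes, 0.5), "pairs finish within 30 seconds")
```

Changing `sd_hours` shows how strongly the amount of overlap depends on the real spread of run times, which is the open question at this point in the thread.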

Mike

[Jun 11, 2020 2:56:01 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: When ARP runs on your system

Define bunching... I never have more than 3 end within 5 minutes of each other. Is that a "bunch"? I say nay nay. It only takes 2 to 5 seconds per WU to transmit the entire set (60M) of files to WCG. Even if I had 20 uploading at the same time (which I never do), they would be gone in less than a minute. My experience has been that the spread in runtimes is considerable: a min of 28 hours and a max of 54 hours, though the graph would look like a bell curve, with 80% running in the 32 to 39 hour range. Bear in mind that the 128-thread machine is just one machine; there are 11 others running ARP1, varying between 8 and 16 simultaneous WUs, so they are uploading and downloading at the same time. The network link is mostly idle. Maxed out, the link can do about 130MB per second. So, unless there is a simultaneous upload of about 50 work units (which will never happen except at the end of a maintenance window), it isn't any kind of a problem.
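The upload figures above can be sanity-checked with a short Python sketch. It assumes "60M" means roughly 60 megabytes per work unit and that the full 130MB per second is available to the uploads, ignoring protocol overhead and any server-side limits. The function name and defaults are illustrative only.

```python
def upload_seconds(n_units, mb_per_unit=60.0, link_mb_per_s=130.0):
    """Time to push n_units result sets through a saturated link.

    Assumes 60 MB per unit and 130 MB/s, the figures quoted in the post,
    with no overhead or server-side throttling.
    """
    return n_units * mb_per_unit / link_mb_per_s


if __name__ == "__main__":
    for n in (1, 20, 50):
        print(n, "units:", round(upload_seconds(n), 1), "seconds")
```

Under these assumptions even 50 simultaneous uploads would clear in well under a minute, in line with the claim that the link is not the bottleneck.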

----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 11, 2020 7:32:11 PM]

[Jun 11, 2020 7:31:08 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline


Re: When ARP runs on your system

So the 128 thread machine is a slow one. Working from the 80% figure, we have, say, 102 units ending in a 7 hour window, so averaging about 4 minutes apart. I had presumed it to be much faster than that because of your bandwidth.
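The arithmetic behind that estimate, as a small Python sketch. The inputs (128 units, 80% of them, a 32 to 39 hour range) come from the thread; the function names are made up for illustration, and even spacing within the window is an idealisation.

```python
def units_in_band(total_units, fraction):
    """Number of units falling in the central band, rounded to a whole unit."""
    return round(total_units * fraction)


def average_gap_minutes(units, window_hours):
    """Average time between finishes if units were spread evenly over the window."""
    return window_hours * 60 / units


if __name__ == "__main__":
    units = units_in_band(128, 0.80)       # about 102 units
    gap = average_gap_minutes(units, 39 - 32)  # 7 hour window
    print(units, "units, about", round(gap, 1), "minutes apart")
```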

Even spacing never happens in the real world but your upload speed seems to be sufficient to compensate for that and the spread of computing times is higher than I imagined. So my apologies.

Mike

[Jun 11, 2020 8:23:25 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2361
Status: Offline


Re: When ARP runs on your system

"So the 128 thread machine is a slow one."

You can't be serious, Mike, how long does it take for your machine to run 128 ARP1 tasks? (laughing)

Let's be real, entity's device is blowing yours out of the water. (devilish)

Executing many ARP1s at the same time has a serious, detrimental impact on their runtimes on a machine.
When running only one ARP1, my machine will mostly finish it in 16 hours; however, when I run 5 ARP1s simultaneously, they will only finish in 22-24 hours. (sad)
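To put those runtimes in terms of output, here is a minimal Python sketch comparing tasks completed per day. It assumes the machine crunches around the clock at a steady state and that all concurrent tasks take the quoted time; the function name is illustrative.

```python
def tasks_per_day(concurrent, hours_per_task):
    """Steady-state throughput: concurrent tasks, each taking hours_per_task."""
    return concurrent * 24 / hours_per_task


if __name__ == "__main__":
    print("1 at a time, 16 h each:", tasks_per_day(1, 16))
    print("5 at a time, 24 h each:", tasks_per_day(5, 24))
    print("5 at a time, 22 h each:", round(tasks_per_day(5, 22), 2))
```

Under these assumptions each task takes longer when five run together, but the machine as a whole still finishes more tasks per day.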

----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 12, 2020 2:06:46 PM]

[Jun 12, 2020 12:50:04 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline


Re: When ARP runs on your system

By slow I was simply referring to the time per unit and not the huge output that 128 threads bring. I would not normally recommend using more than 50% of threads for ARP. Some of the problem is alleviated by the huge bandwidth that entity has.

If he only wants to run ARP then that is fair enough, but if he wants to run other projects as well, it is better to spread them across all machines so each has a mixture rather than one project per machine.

Personally, I have an i7-3770 with 8 threads which crunches 4 ARP units almost as fast as 1, but performance drops off above that, so I run a mix. My priority, currently, is OPN followed by MCM, but I am keeping ARP ticking over.

Mike

[Jun 12, 2020 1:24:51 PM]
