Total posts in this thread: 19
This topic has been viewed 4938 times and has 18 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: When ARP runs on your system

Running 128 simultaneously and averaging 33 hours. I was thinking it would have been a lot worse than that. I'll take it and run with it.
[Jun 10, 2020 6:59:53 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: When ARP runs on your system

Once upon a time, during Clean Energy, there was a proposal for staggered starting. Now that I read of 128 running concurrently with no results crashing, there seems to be no need, but I do wonder what happens if this beast is shut down and all 128 are started simultaneously.
[Jun 10, 2020 7:17:20 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: When ARP runs on your system

I haven't seen anything about starting up or restarting after a checkpoint. As I see it, a lot of those 128 would be trying to checkpoint at the same time when the next 12.5% has been completed, and that might well be a problem. And if they were all at the same stage, considerable bandwidth would be required when they all try to report at about the same time, or a lot of queuing would take place.
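
To make the checkpoint timing concrete, here is a minimal sketch. The 12.5% step is from the post above; the assumption that progress is linear in time, the function name, and the two run times are illustrative only.

```python
def progress_marks(run_hours, step=0.125):
    """Elapsed hours at which a task reaches each `step` of progress,
    assuming progress is linear in time (a simplification)."""
    n = round(1 / step)
    return [run_hours * step * k for k in range(1, n + 1)]

# Two tasks started together with different total run times
# (32 h and 39 h are illustrative values).
fast = progress_marks(32)
slow = progress_marks(39)
for k, (a, b) in enumerate(zip(fast, slow), start=1):
    print(f"mark {k}: {a:6.3f} h vs {b:6.3f} h  (gap {b - a:.3f} h)")
```

Under these assumptions the marks coincide only if the run times are identical; any difference in total run time pushes the later marks further apart.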

Of course, if the machine were to be hibernated instead of being shut down you would not have the same problem with bunching because they would all restart from where they left off instead of back to the last checkpoint.

Mike
[Jun 10, 2020 8:56:56 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: When ARP runs on your system

"Once upon a time, during Clean Energy, there was a proposal for staggered starting. Now that I read of 128 running concurrently with no results crashing, there seems to be no need, but I do wonder what happens if this beast is shut down and all 128 are started simultaneously."

Not much. I just put maintenance on it this morning and it came right back up; all 128 were in a running state after about 2 minutes. Bandwidth isn't a problem with 1G fibre to the premises. The machine has 256GB of memory and all 8 memory channels are populated. The HD averages about 4Mb of writes per second. All very manageable. The only real anomaly I have noticed is that the hardware interrupts are very high and take about 4% of the processing time. These WUs do not bunch up. Even if you started all 128 at the same time, the inherent variability in run times guarantees they end and report singly. Same thing with checkpoints.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 11, 2020 2:41:11 PM]
[Jun 11, 2020 2:36:51 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: When ARP runs on your system

With 128 units all starting at the same time you would inevitably get some bunching. Say the difference in run times between first and last was 2 hours: in a perfect world, they would all be finishing at about 1 minute intervals. But we don't live in a perfect world. There would be bunching, especially near the middle of the spread. Maybe seconds apart, but still bunching, so uploading/reporting would overlap.
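
A rough way to look at this is a small simulation. The sketch below takes the 128 tasks and the 2 hour window from the post, and assumes finish times fall uniformly at random across that window; the uniform spread, the seed, and the function names are assumptions for illustration, not measured data.

```python
import random

def even_spacing_minutes(n_tasks, window_minutes):
    """Gap between finishes if they were spread perfectly evenly."""
    return window_minutes / n_tasks

def simulated_gaps(n_tasks, window_minutes, seed=0):
    """Gaps (minutes) between consecutive finishes when finish times
    are drawn uniformly at random across the window (an assumption)."""
    rng = random.Random(seed)
    finishes = sorted(rng.uniform(0, window_minutes) for _ in range(n_tasks))
    return [b - a for a, b in zip(finishes, finishes[1:])]

print(f"even spacing: {even_spacing_minutes(128, 120):.2f} min")
gaps = simulated_gaps(128, 120)
print(f"smallest gap: {min(gaps) * 60:.1f} s")
print(f"gaps under 10 s: {sum(g * 60 < 10 for g in gaps)}")
```

Changing the seed or swapping in a bell-shaped distribution would show how sensitive the smallest gaps are to the assumed spread.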

Mike
[Jun 11, 2020 2:56:01 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: When ARP runs on your system

Define bunching... I never have more than 3 end within 5 minutes of each other. Is that a "bunch"? I say nay nay. It only takes 2 to 5 seconds per WU to transmit the entire set (60M) of files to WCG. Even if I had 20 (which I never do) uploading at the same time, they would be gone in less than a minute. My experience has been that the spread in runtimes is considerable: a minimum of 28 hours and a maximum of 54 hours, though the graph would look like a bell curve, with 80% running in the 32 to 39 hour range. Bear in mind that the 128 thread machine is just one machine; there are 11 others running ARP1, varying between 8 and 16 simultaneous WUs, so they are uploading and downloading at the same time. The network link is mostly idle. Maxed out, the link can do about 130MB per second. So, unless there is a simultaneous upload of about 50 work units (which will never happen except at the end of a maintenance window), it isn't any kind of a problem.
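
The link arithmetic can be sketched as below. It assumes "60M" means 60 megabytes per work unit and that the link really sustains 130MB per second; it gives only the link-limited lower bound and ignores server-side limits and per-file overhead. The function name is hypothetical.

```python
def upload_seconds(n_units, mb_per_unit=60, link_mb_per_s=130):
    """Lower bound on the time to push n_units result sets through
    a fully saturated link (assumes 60 MB per unit, 130 MB/s)."""
    return n_units * mb_per_unit / link_mb_per_s

for n in (1, 20, 50):
    print(f"{n:3d} units: {upload_seconds(n):5.1f} s")
```

The observed 2 to 5 seconds per WU is longer than this bound for a single unit, which suggests something other than the local link sets the pace for individual uploads.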
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 11, 2020 7:32:11 PM]
[Jun 11, 2020 7:31:08 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: When ARP runs on your system

So the 128 thread machine is a slow one. Working on the 80%, we have, say, 102 ending in a 7 hour window, so averaging 4 minutes apart. I had presumed it to be much faster than that because of your bandwidth.
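
The figures above work out as follows; this is just the arithmetic from the post (80% of 128 tasks finishing inside the 32 to 39 hour range), with a hypothetical function name.

```python
def mean_gap_minutes(total_tasks, fraction, window_hours):
    """Number of tasks in the window and the average gap between
    finishes if they were spread evenly across it."""
    tasks = round(total_tasks * fraction)
    return tasks, window_hours * 60 / tasks

tasks, gap = mean_gap_minutes(128, 0.80, 39 - 32)
print(f"{tasks} tasks, about {gap:.1f} minutes apart on average")
```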

Even spacing never happens in the real world but your upload speed seems to be sufficient to compensate for that and the spread of computing times is higher than I imagined. So my apologies.

Mike
[Jun 11, 2020 8:23:25 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: When ARP runs on your system

"So the 128 thread machine is a slow one."
You can't be serious, Mike. How long does it take for your machine to run 128 ARP1 tasks?

Let's be real, entity's device is blowing yours out of the water.
Executing many ARP1s at the same time has a serious, detrimental impact on their runtimes on a machine.
When running only one ARP1, my machine will mostly finish it in 16 hours; however, when I run 5 ARP1s simultaneously, they will only finish in 22-24 hours.
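
Runtime per task and overall output pull in different directions here. A small sketch using only figures quoted in this thread (the function name is hypothetical, and it assumes the machine runs the same number of tasks around the clock):

```python
def tasks_per_day(concurrent, hours_each):
    """Completed tasks per day when `concurrent` tasks run at once
    and each takes `hours_each` hours."""
    return concurrent * 24 / hours_each

print(f"1 at 16 h:    {tasks_per_day(1, 16):.2f} per day")
print(f"5 at 22 h:    {tasks_per_day(5, 22):.2f} per day")
print(f"5 at 24 h:    {tasks_per_day(5, 24):.2f} per day")
print(f"128 at 33 h:  {tasks_per_day(128, 33):.2f} per day")
```

So each task takes longer when several run together, but the number of tasks finished per day still goes up.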
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 12, 2020 2:06:46 PM]
[Jun 12, 2020 12:50:04 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: When ARP runs on your system

By slow I was simply referring to the time per unit and not the huge output that 128 threads brings. I would not normally recommend more than 50% of threads for arp. Some of the problem is alleviated by the huge bandwidth that entity has.

If he only wants to run arp then that is fair enough, but if he wants to run other projects as well, it is better to spread them across all machines so each has a mixture rather than one project per machine.

Personally, I have an i7-3770 with 8 threads which crunches 4 arp almost as fast as 1 but performance drops off above that, so I run a mix. My priority, currently, is opn followed by mcm but am keeping arp ticking over.

Mike
[Jun 12, 2020 1:24:51 PM]