World Community Grid Forums
Category: Completed Research | Forum: Help Conquer Cancer | Thread: GPU Optimisations
Thread Status: Active | Total posts in this thread: 198
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Quote: My first HD 7770 (on the E8400 CPU) is doing a work unit in 4 minutes 54 seconds. The second HD 7770 (on a Core i7-3770) is doing a work unit in 4 minutes 40 seconds. I don't think you are gaining much.

Approximately 30% more returned results per day.

andzgrid, Post #743
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1316 | Status: Offline
Quote: All 24 went w/o problem in approx. 1h10m each

It's about how much CPU you want to spend. Spending 3 full cores of my non-overclocked i7 2600 and running 3 tasks concurrently on my non-overclocked HD 7770, I make 31 tasks in those 70 minutes, with 5 cores left for other CPU tasks.
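For reference, a setup like the one described here (3 concurrent GPU tasks, one full CPU core each) would be expressed in BOINC's app_config.xml roughly as follows. This is a hedged sketch: the app name hcc1 is taken from a later post in this thread, and the exact fractions are assumptions.

```xml
<!-- Sketch: 3 concurrent HCC GPU tasks on one card.
     BOINC fits tasks per GPU while the gpu_usage fractions sum to <= 1,
     so 0.33 allows 3 tasks per GPU; cpu_usage 1 reserves one full
     CPU core for each running GPU task (3 cores total here). -->
<app_config>
  <app>
    <name>hcc1</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```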
coolstream
Senior Cruncher | SCOTLAND | Joined: Nov 8, 2005 | Post Count: 475 | Status: Offline
Quote: 23 of 32 errored at the end of crunching (exceeded maximum elapsed time, which is probably set to 1.5 hours). App_info worked fine even if elapsed time for some tasks was approx. 1.6 hours. Now testing 24. ... All 24 went w/o problem in approx. 1h10m (≈1.17 hours) each (ATM 2 Valid, 22 PVal). Good setting for shrubbing during the night; tomorrow I will try 28 concurrent. Cheers and NI!

Unless things have changed or I have misinterpreted the information, the maximum elapsed time is more like 10 hours: https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=401797

Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Quote: 23 of 32 errored at the end of crunching (exceeded maximum elapsed time, which is probably set to 1.5 hours). App_info worked fine even if elapsed time for some tasks was approx. 1.6 hours. Now testing 24. ... All 24 went w/o problem in approx. 1h10m (≈1.17 hours) each (ATM 2 Valid, 22 PVal). Good setting for shrubbing during the night; tomorrow I will try 28 concurrent. Cheers and NI!

Quote: Unless things have changed or I have misinterpreted the information, the maximum elapsed time is more like 10 hours: https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=401797

It's probably a painful consequence of the single-pool sourcing of work for GPU+CPU. The CPU versions have a time-out of [guess] 5 times the original runtime estimate. If a task with the same header info is then assigned to a GPU...
branjo
Master Cruncher | Slovakia | Joined: Jun 29, 2012 | Post Count: 1892 | Status: Offline
Quote: My first HD 7770 (on the E8400 CPU) is doing a work unit in 4 minutes 54 seconds. The second HD 7770 (on a Core i7-3770) is doing a work unit in 4 minutes 40 seconds. I don't think you are gaining much. Approximately 30% more returned results per day.

If I were doing this for money, I would definitely make a thorough analysis of my expenses as well as my production. Since I am not, I am satisfied with a rough analysis based on small-sample observations and some logic.

1. Noise is a really subjective category, and what I perceive as "no increase" could be significant for somebody else. So I am going to agree with you that this part of my claim might be misleading. Anyway, I repeat that, in my view, there is no significant fan noise increase.

2. When I am running 32 concurrent GPU tasks, CPU utilization is well south of 100% (thus less power consumption). If I run only one GPU task + 7 CPU tasks, the CPU utilization on those 7 cores is 99-100% (thus higher power consumption). That is why I assume (and claim) that system power consumption does not increase significantly (if at all) when I run 32x GPU WUs compared to 1x GPU + 7x CPU.

Cheers and NI!

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006
branjo
Master Cruncher | Slovakia | Joined: Jun 29, 2012 | Post Count: 1892 | Status: Offline
Quote: All 24 went w/o problem in approx. 1h10m each ... It's about how much CPU you want to spend. Spending 3 full cores of my non-overclocked i7 2600 and running 3 tasks concurrently on my non-overclocked HD 7770, I make 31 tasks in those 70 minutes, with 5 cores left for other CPU tasks.

This is exactly what I expect from the 7770: to be significantly more productive than my 7750.

Cheers

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006
branjo
Master Cruncher | Slovakia | Joined: Jun 29, 2012 | Post Count: 1892 | Status: Offline
Quote: 23 of 32 errored at the end of crunching (exceeded maximum elapsed time, which is probably set to 1.5 hours). App_info worked fine even if elapsed time for some tasks was approx. 1.6 hours. Now testing 24. ... All 24 went w/o problem in approx. 1h10m (≈1.17 hours) each (ATM 2 Valid, 22 PVal). Good setting for shrubbing during the night; tomorrow I will try 28 concurrent. Cheers and NI!

Quote: Unless things have changed or I have misinterpreted the information, the maximum elapsed time is more like 10 hours: https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=401797

Maybe it is because of app_config, I don't know. But all 24 concurrently crunched tasks finished in less than the evil 1.5 hours, thus without errors.

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I know, I know - Nvidia cards are not as well suited as the 7xxx series.

BUT I've got 2 x GTX 690's on the crunch, and want to optimize their performance. They are in an SR-2 board with dual 5650 CPUs. Disabled SLI, as that was erroring everything out. GPU idle time was and still is too high: everything stops on GPU WUs far too often at the halfway point and when finishing off each WU. 8 GPU WUs running simultaneously = stopping frequently.

The point of my post is to ask if anyone knows how much CPU work is involved with each GPU WU. Is it maybe that the GPU part of the WU is finishing and having to wait for the CPU bit to catch up? I've got 16 CPU cores running at the same time, so I could cut them back and allocate 1.25 or maybe 1.5 cores to each GPU WU. Something like this:

<app_config>
  <app>
    <name>hcc1</name>
    <max_concurrent>24</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>1.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Currently got 24 CPU threads: 2 WUs per GPU with 1 CPU core each, leaving 16 CPU cores crunching non-GPU. Any thoughts on this idea of more than 1 core per GPU WU? Been tried before? Or is the CPU portion small and I've just got to put up with constant idle GPU time?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi,

Not sure you fully understand the use of these two:

<gpu_usage>.5</gpu_usage>
<cpu_usage>1.25</cpu_usage>

You've set the GPU to run 2 at a time and the CPU allocation *per* GPU job to 1.25. The highest any WCG project uses, being single-threaded, is 1 CPU processor thread, so reduce that value to 1.00 at least. But, given the 5650 CPUs, you can try <cpu_usage>0.5</cpu_usage> and see if throughput per hour increases, meaning if one task is in its GPU phase, the other task can use the CPU, in alternation. Then later try <gpu_usage>.25</gpu_usage>, meaning 4 GPU tasks, using a combined 4 * .5 = 2 CPU cores together. Increment that in small steps to find the biggest hourly production, without the tasks going error/invalid.

edit: To add, with 2 such cards and CPUs [16 processor threads] and a <max_concurrent> of 24, you could theoretically pump the <gpu_usage> way down to .083333. Any spare instance that is not being run on the GPU will run as a CPU task, provided you've selected to also run HCC CPU-only tasks. The idea though is to experiment with the control values to find the maximum throughput while maintaining valid results [Pending Validation is usually a good indicator that the task is OK].

edit2: I've never read [can't remember seeing it] whether, with 2 cards of the same or different capability, .083333 is a control value *per* GPU card or whether it works for the combination, i.e. 12 for the 2 cards together. With that value, a total max of 12 GPU tasks of hcc1 would then mean 6 concurrent on each. Anyone in the know from hands-on experience?

[Edit 3 times, last edit by Former Member at Feb 9, 2013 4:18:26 PM]
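The arithmetic behind these two knobs, as a sketch: BOINC runs up to floor(1 / gpu_usage) tasks per GPU, and reserves cpu_usage CPU threads for each running GPU task. The values below are the ones suggested in the reply above; the hcc1 app name is taken from the earlier post in this thread.

```xml
<!-- Sketch: the suggested next step of 4 concurrent GPU tasks per card.
     tasks per GPU  = floor(1 / gpu_usage) = floor(1 / 0.25) = 4
     CPU reserved   = 4 * 0.5 = 2 threads per card
     max_concurrent caps the total number of hcc1 tasks in progress. -->
<app_config>
  <app>
    <name>hcc1</name>
    <max_concurrent>24</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```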
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Quote: Hi, Not sure you fully understand the use of these two: <gpu_usage>.5</gpu_usage> <cpu_usage>1.25</cpu_usage> ... The idea though is to experiment with the control values to find the maximum throughput while maintaining valid results [Pending Validation is usually a good indicator that the task is OK] ...

G'Day, appreciate all the input I can get (and info for others to work with also). Took me a few days on and off to get it working. I am currently running this:

<max_concurrent>24</max_concurrent>
<gpu_usage>.5</gpu_usage>
<cpu_usage>1</cpu_usage>

The 1.25 was asking the question (not using it), i.e. whether throwing extra CPU at the GPU tasks would decrease the long inactive periods at the 49.707% and 99.707% points. Most often I wait about a minute if EITHER WU from the same card (2 out of 4 WUs) is at either of those percentage spots - hope that makes sense. Are others having such long waits at the halfway point and end of GPU WUs? I have not seen any posts about my concern (or is it normal with multiple concurrent tasks? I don't get long waits on single GPU WUs).

The 2 CPUs have 24 threads in total, so I currently have 8 dedicated to GPU and 16 to CPU WUs. I will experiment a little, see how I go, and report back with my findings. At least I found out that SLI on the 690's (quad SLI) needs to be disabled before too many error out.

Input always welcome, thanks.

SR-2 Ghetto Style!! Poor quality pics - sorry (RAM fans @ 3500 rpm & front rad fans @ 2100)
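Putting this poster's fragments together, the configuration they report running would look like this as a complete app_config.xml. A sketch only: the <app_config>/<app> wrapper and the hcc1 app name are taken from the earlier post in this thread, not restated by this poster.

```xml
<!-- The reported working setup: 2 tasks per GPU (gpu_usage 0.5),
     one CPU thread reserved per GPU task, capped at 24 tasks total.
     On 2 x GTX 690 (each a dual-GPU card, so 4 GPUs) that is 8 GPU
     tasks, reserving 8 of the 24 CPU threads and leaving 16 for
     CPU-only work units. -->
<app_config>
  <app>
    <name>hcc1</name>
    <max_concurrent>24</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```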