World Community Grid Forums
Thread Status: Active. Total posts in this thread: 8
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
HGW = HCCv7.08-GPU-WUs
CTG = CPU-thread-count to HGW-task-count (ratio)

On my machine*, I have observed that if I manually dovetail** two HGW*** as tightly as possible, each HGW in the pair runs in under 5 minutes nominal elapsed time. When the HGW are not ideally dovetailed, that is, when the CPU phases or the GPU phases of the pair are both active at the same time, the elapsed time of each HGW in the pair rises above 5 minutes nominal.

It seems reasonable that performance improves when the workload is distributed between the processing units (PUs) spatially and temporally, rather than each PU getting hit by twice the load at the same time. It's akin to working 12 hours and then resting for 12 hours, as opposed to working 24 hours and then resting for 24 hours.

The dovetail condition is lost after some time, though, and the HGW pair enters a 'lock' from which regaining the dovetail is difficult under a 1:2 CTG, but less so under a 2:2 (mathematically reduced to 1:1) CTG.

It would be nice if an effort were planned to add code so that a dovetail of two HGW is orchestrated programmatically.

Notes:
* Ubu12.10, HD7770, AMD 1090T (6 cores), running BOINC v7.0.42.
** When the CPU phase of one GPU WU is on, the GPU phase of the other GPU WU is also on -- that is the first of the two syncs in a dovetail; the second sync is when the CPU phase of one GPU WU switches off as close as possible to the moment the GPU phase of the other WU switches off. In short, an ideal GPU-WU dovetail is one where there is as little time as possible during which two CPU phases or two GPU phases are simultaneously on or off.
*** (1 CPU + 0.5 ATI GPUs) effected via app_config: two cores for two HGW.

; ; andzgridPost#771 ;
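As a rough sketch, the app_config.xml behind note *** might look something like this (the hcc1 app name and the file location are assumptions from memory, so confirm the exact short name in client_state.xml before relying on it):

<!-- projects/www.worldcommunitygrid.org/app_config.xml (location assumed) -->
<app_config>
  <app>
    <name>hcc1</name>                     <!-- assumed short name of the HCC GPU app -->
    <max_concurrent>2</max_concurrent>    <!-- run two HGW at once -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>          <!-- each task claims half the GPU, so two tasks share the card -->
      <cpu_usage>1.0</cpu_usage>          <!-- each task budgets a full CPU core: two cores for two HGW -->
    </gpu_versions>
  </app>
</app_config>

Telling the BOINC client to re-read its config files (or restarting it) should apply the new values without aborting tasks already in progress.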
OldChap
Veteran Cruncher | UK | Joined: Jun 5, 2009 | Post Count: 978 | Status: Offline
You might consider a few more experiments in the meantime, andzgrid...

I have Q6600, 2600K, and 3770K rigs running where, on the lesser cards like the 5870, some will allow up to 9 tasks to run concurrently. Unfortunately, not all cards will do this without errors at the start. The result of running more WUs concurrently is that at any one time there will be some in the CPU phase and others in the GPU phase. It is not perfect: because each WU carries a different load there will still be a tendency toward grouping, but I have found that a once-a-day re-alignment is enough. In doing this it is not necessary to give each WU a full CPU core.

I think that until there are no mid-WU pauses it will be difficult to implement any fix to keep the timing the way you want it, so I offer these ideas as a possible way to go forward.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hello OldChap, and thanks for your response.
Two seems to be the magic number for dovetailing HCC 7.08 GPU WUs:
1] The CPU-phase to GPU-phase time ratio of the HCC 7.08 GPU WUs is around 1:1.
2] An even number of GPU WUs divided by two gives the number of dovetailed pairs.
3] To accommodate an odd number of GPU WUs in a dovetail, the CPU-phase to GPU-phase time ratio of a GPU WU needs to fall around 2:3. That ratio demands a counterpart of 2 CPU threads to 3 GPU tasks.

If a pair of WUs can be made to sync programmatically in a dovetail, the 0.5 CPU-thread setting is a perfect match. The pauses (mid-WU and between WUs within an HCC 7.08 GPU WU) are needed as slack to:
1] re-align sync timings to account for the variability within a WU;
2] adjust the hunt for the optimum timings, given that processing-unit performance varies with loading.

Because we currently don't have such programmatic fine-tuning, and while it may not be necessary to give each WU a full CPU core, I found that a 1:1 CPU-thread to GPU-WU ratio is better able to absorb variances, much like what I imagine programmatic control would provide. The dovetailed state remains stable for longer under a 1:1 ratio than under a 1:2 ratio (or 0.5 CPU).

In any case... Happy New Year 2013, everyone!

; ; andzgridPost#772 ;
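To try the odd-count arrangement in point 3], the 3-concurrent setup (roughly 2 CPU threads shared across 3 GPU tasks) could be sketched in app_config.xml along these lines (again, the hcc1 app name is an assumption; check client_state.xml for the exact short name):

<app_config>
  <app>
    <name>hcc1</name>                     <!-- assumed HCC GPU app short name -->
    <max_concurrent>3</max_concurrent>    <!-- three GPU tasks at once -->
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>         <!-- 3 tasks share one GPU -->
      <cpu_usage>0.66</cpu_usage>         <!-- 3 x 0.66 is roughly 2 CPU threads in total -->
    </gpu_versions>
  </app>
</app_config>

Note that <cpu_usage> only tells the BOINC scheduler how much CPU time to budget per task; it does not pin or throttle the science application itself.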
kateiacy
Veteran Cruncher | USA | Joined: Jan 23, 2010 | Post Count: 1027 | Status: Offline
I was glad to see this thread, as I, too, have been trying to figure out how to keep 2 HCC1 GPU WUs from both entering the CPU phase at the same time, since that slows things down enormously on my power-efficient but not terribly fast hardware (AMD HD 7750 with Phenom II X4 910e). Even if I manually "dovetail" them, to borrow andzgrid's term, they seem to slip back into the slow, overlapped timing within a few hours.

I have tried 2 GPU WUs, each getting its own CPU core, as well as 2 GPU WUs sharing a CPU core. Maybe I'll give the 2:3 ratio a try.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
"I have tried 2 GPU WUs, each getting its own CPU core, as well as 2 GPU WUs sharing a CPU core. Maybe I'll give the 2:3 ratio a try."

Yep, I can confirm the tendency for the phases to lock up together, which makes getting into a dovetail difficult. If so, that leaves a machine needing the capacity to adequately handle the chosen number of concurrent GPU WUs. My experience is that three concurrent HCC 7.08 GPU WUs made my HD7770 struggle to provide a responsive UI. Load up some more UI interaction and the UI risks freezing, or I may have to provide more cooling to the GPU or the CPU or both after overclocking them. While this may be doable and workable, it goes against the 'theme' of the Radeon HD77xx series: performance is what the HD79xx targets, efficiency is right at home with the HD77xx series, and that leaves the HD78xx as the middle ground.

The only way I see to take the baby-sitting and/or the guesswork out of making dovetailed OpenCL GPU WUs work is to introduce programmatic control. Or, like a couple, to have the dovetail seamlessly integrated right from the start, so that there would be no need for programmatic control.

; ; andzgridPost#788 ;
kateiacy
Veteran Cruncher | USA | Joined: Jan 23, 2010 | Post Count: 1027 | Status: Offline
Right now I'm running 4 HCC GPU WUs on my 7750, giving each half a CPU core. AMD Overdrive shows 96-97% GPU usage. At the moment I'm not doing anything with that machine except crunching, and to my surprise the UI isn't unreasonably slow when running BOINC Manager.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
"Right now I'm running 4 HCC GPU WUs on my 7750, giving each half a CPU core. AMD Overdrive shows 96-97% GPU usage. At the moment I'm not doing anything with that machine except crunching, and to my surprise the UI isn't unreasonably slow when running BOINC Manager."

Four concurrent on an HD7750 left alone to crunch? Hmm... OK, that's good. Maybe I got a defective HD7770, or I inadvertently damaged it in some way, I don't know. I once tried four concurrent on that card and the UI froze. Also, I do some other stuff while crunching, so I can't afford to load my GPU to the max.

I could have used the onboard GPU in my Ubu machine, but with Ubu12.10 the UI became mangled, and that left me needing to use the HD7770. I'll see if I can perhaps downgrade to Ubu12.04 (where the onboard GPU used to work) or do some research on how to enable the onboard GPU under Ubu12.10 -- then I'll take a shot at loading four concurrent HCC 7.08 GPU WUs on my HD7770.

; ; andzgridPost#789 ;
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Update to andzgridPost#789
I first tried 3 concurrent (0.666 CPU + 0.333 ATI) and stabilized that, and then I tried 4 concurrent (0.5 CPU + 0.25 ATI), which has held so far, unlike before when my Ubu12.10 machine crashed on 4 concurrent. I guess it was the GPU cooling, or the lack of it to be exact. So I nudged up the GPU fan RPM and set all GPU hardware clocks to default for my HD7770 -- and that seemed to stabilize the machine.

However, the runtimes for 4 concurrent are only slightly faster than those for 3 concurrent. It's that diminishing returns I read about. In all cases of 2, 3, or 4 concurrent on my HD7770, the better the dovetailing of WUs, the faster the performance, with 3 or 4 concurrent doing a good job of breaking the lock-step of phases. I guess I need not use another GPU to render the Ubu12.10 UI after all!

; ; andzgridPost#792 ;
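For completeness, the two settings above map onto app_config.xml roughly as follows (a sketch only; the hcc1 app name is assumed, so verify it against client_state.xml):

<app_config>
  <app>
    <name>hcc1</name>                     <!-- assumed HCC GPU app short name -->
    <!-- 4 concurrent: 0.5 CPU + 0.25 ATI per task -->
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
    <!-- for 3 concurrent instead, use:
         <gpu_usage>0.333</gpu_usage>
         <cpu_usage>0.666</cpu_usage>
    -->
  </app>
</app_config>

Switching between the two only needs an edit plus a config re-read (or a client restart), which makes it easy to compare runtimes for the 3- and 4-concurrent cases.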