| World Community Grid Forums
Thread Status: Active Total posts in this thread: 15
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
AMD Platform Optimization
Please read, for all developers: https://community.amd.com/thread/213045

Processor: 8 AuthenticAMD AMD FX-8320E Eight-Core Processor [Family 21 Model 2 Stepping 0]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1
Memory: 15.95 GB physical, 18.82 GB virtual

http://esa-space.blogspot.com/2017/ - BOINC optimization thoughts...
http://32ipi028l5q82yhj72224m8j.wpengine.netd...imizing-For-AMD-Ryzen.pdf

****
"AMD Software Optimization Guide for Ryzen
chromatix, Apr 22, 2017 4:46 PM (in response to tagoo)

A slide deck on the subject got leaked a while ago. The executive summary, as far as I can remember it:

Don't use non-temporal accesses (unless you REALLY know what you're doing, and you probably don't).

Don't use manual prefetching. The automatic prefetchers work better, and don't consume decode bandwidth or op-cache space.

Organise your data in memory so that the automatic prefetchers are maximally effective. This may involve using structs-of-arrays instead of arrays-of-structs, or vice versa, depending on access patterns.

Minimise data movement between CCXes, as the bandwidth available between them is significantly less than within them. This may involve careful choice of worker-thread count and affinity.

SMT is new to AMD, but works similarly to Intel's HT and has similar tradeoffs. Ensure any thread affinity settings account for this.

Aside from the above, it is implied that Ryzen mostly responds well to code optimised for Intel CPUs. If the older AMD-specific ISA extensions are avoided, code optimised for older AMD CPUs should also run well, as long as the above guidelines are also accounted for.

Interestingly, adjusting existing code for the above guidelines seems to have a small net positive effect on Intel CPUs as well. This may obviate the need to have separate Intel and AMD code paths.

Agner Fog says he's nearly finished adding his analysis of Ryzen to his own famous optimisation manuals. This will no doubt be very illuminating."
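To make the structs-of-arrays versus arrays-of-structs point in the quote concrete, here is a minimal sketch in Python. The particle fields (`x`, `y`, `mass`) and the function names are hypothetical and are not taken from the slide deck or from any BOINC project. Python only illustrates the two layouts; the cache and prefetcher effects the quote describes would show up in compiled code (C, C++ and similar), not here.

```python
from array import array

N = 1000  # hypothetical record count, for illustration only

# Array-of-structs: each record keeps its x, y and mass side by side,
# so reading only x still walks past y and mass.
aos = [(float(i), float(2 * i), 1.5) for i in range(N)]

# Struct-of-arrays: one contiguous array of doubles per field,
# so reading only x touches one dense block of memory.
soa = {
    "x": array("d", (float(i) for i in range(N))),
    "y": array("d", (float(2 * i) for i in range(N))),
    "mass": array("d", (1.5 for _ in range(N))),
}


def sum_x_aos(records):
    """Sum the x field by visiting every whole record."""
    return sum(record[0] for record in records)


def sum_x_soa(fields):
    """Sum the x field by reading a single contiguous array."""
    return sum(fields["x"])
```

Both functions return the same value; only the memory layout differs. Which layout suits a given task depends on the access pattern, as the quote says: code that uses every field of a record together may be better served by the array-of-structs form.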
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
For further links and thoughts, visit: http://bit.ly/HPC-Dev

PC/Mac/Windows/Linux/Android
https://www.khronos.org/news/events/2016-isc-high-performance
https://www.khronos.org/assets/uploads/develo...IGGRAPH%20BOF%20Aug08.pdf - HPC Report
https://www.microsoft.com/en-us/download/details.aspx?id=54507 - Microsoft HPC Pack 2016, including Linux
https://technet.microsoft.com/en-us/library/cc514029(v=ws.11).aspx - all HPC Packs, 2016 and 2012 to 2008: info and download
https://msdn.microsoft.com/en-us/library/ff976568.aspx - Microsoft High Performance Computing for Developers: info and downloads

** OpenVX for high performance computing: multi-platform spec
https://www.khronos.org/news/tags/tag/OpenVX
https://www.khronos.org/news/press/openvx-1.2...on-power-efficient-vision
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
For a comparison of the GFlops/MIPS throughput of various BOINC tasks:
Here we show the relevance of the code or function used. AVX, for example, works on several values per instruction (SIMD), and the FPU pipeline of the AMD FX and Ryzen processors serves more than one thread.
http://bit.ly/HPCImpact (original, non-edited photos) and set 2 (newer): http://bit.ly/2HPCImpact
See the work throughput in GFlops compared to code efficiency per task! Sometimes entropy is needed to fulfil the task, one would imagine (for example on Android): http://bit.ly/tRNG-Dev
The improvement of the BOINC and World Community Grid projects has been observed and noted and, one feels, improved upon. Further improvement should be implemented as soon as possible, to improve work versus output efficiency.
Thank you kindly, programmers, workers and scientists, for your perseverance and effort.
RS
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
High Performance Computing best practice http://bit.ly/HPCBestPrac
|
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
https://www.youtube.com/watch?v=mLQGXlxemlg - Optimizing HPC Service Delivery, by a lifetime supercomputing tec
|
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
CPU Optimisation - utility and function.
http://gpuopen.com/compute-product/codexl/ - CodeXL is a code efficiency analyser, optimiser and debugger for GPU, CPU and system.
https://github.com/GPUOpen-Tools/CodeXL/releases/latest
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
http://bit.ly/CoXLPhoto - CodeXL in action photos
http://support.amd.com/TechDocs/24593.pdf - AMD64 Architecture Programmer’s Manual Volume 2: System Programming
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
http://www.noamross.net/blog/2013/4/25/faster-talk.html - a guide to speeding up code: profiling and benchmarking.
http://www.pgroup.com/doc/pgi17ug-x64.pdf - PGI Compiler guide
http://www.agner.org/optimize/ - code optimisation for all programmers on x86, x86-64 and some others. This is a terrific resource!
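As a minimal sketch of the measure-before-you-optimise idea behind profiling and benchmarking, the following uses Python's standard `timeit` module to compare two ways of computing the same result. The two functions and the `benchmark` helper are hypothetical examples written for this post and are not taken from any of the linked guides.

```python
import timeit


def sum_squares_loop(n):
    """Sum of squares below n with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    """Same result, using the built-in sum over a generator."""
    return sum(i * i for i in range(n))


def benchmark(func, n=10_000, repeat=5, number=20):
    """Return the best observed time per call, in seconds."""
    times = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other system activity.
    return min(times) / number


if __name__ == "__main__":
    for func in (sum_squares_loop, sum_squares_builtin):
        print(f"{func.__name__}: {benchmark(func) * 1e6:.1f} us per call")
```

Timings vary from machine to machine, so no expected output is shown. The point is to check that both versions agree on the result first, and only then compare their speed.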
||
|
|
QuantumEthos
Senior Cruncher Joined: Jul 2, 2011 Post Count: 336 Status: Offline Project Badges:
|
http://hgpu.org - information; interesting learning & source
http://dspace.princeton.edu/jspui/bitstream/8...princeton_0181D_11168.pdf - Optimization for parallel computing information.
https://arxiv.org/pdf/1705.05249 - CLBlast: A Tuned OpenCL BLAS Library demonstration.
||
|
|
|