World Community Grid Forums
Category: Completed Research Forum: Drug Search for Leishmaniasis Forum Thread: 64-bit app on the way? |
Thread Status: Active Total posts in this thread: 8
cw64
Advanced Cruncher Joined: Oct 6, 2007 Post Count: 120 Status: Offline Project Badges: |
Think I remember reading that developing 64-bit apps was reserved for projects with a certain amount of estimated runtime. Does this project fall into said category?
---------------------------------------- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Asked the same question off-line. The plan, IIRC, covered existing projects with enough life left, plus any new project. Of course, we've only just come out of the 32-bit beta version.
----------------------------------------
Suggest we give it time to stabilize, this being an engine entirely new to WCG [VINA], so they'll likely want to get some production out of it before getting onto the next step. Meantime, work on the next project started long before this one launched, and it will run on VINA too, so carrying it to launch should be a much less arduous process now that there is a working BOINC-wrapped version on multiple platforms and a diverse set of CPUs.

It's a silent launch, BTW. The PR wave will come in a little while, so those joining from press announcements and entirely new to crunching will find a reasonably well-running science with a steady supply of work, and techs who have had a chance to unwind a little from weekend and night work, ready for the next wave :D

--//--
[Edit 1 times, last edit by Former Member at Sep 1, 2011 10:32:28 AM] |
||
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline Project Badges: |
Thanks for the info, SekeRob!
---------------------------------------- |
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges: |
Thanks, Sek, for the inside info re silent launch, etc.
----------------------------------------
I was wondering why there had been no post about the launch of DSFL in the Member News thread, and was about to post a query on that.

Back to the 64-bit topic, I gather that only minimal speed gains were achieved by the migration of C4CW to 64-bit. 64-bit floating-point-intensive programs are limited by the same FPU that limits the 32-bit versions, and they use more memory bandwidth and more CPU cache.

I suggest that if WCG have resources to allocate to speeding up the science programs, they should look in the places that will give the greatest and easiest gains. If that means converting to 64-bit, fine, but there may be other optimisations that give better yields. A gain in just one project provides long-term gains in all the other projects, because the duration of the faster project is shortened, leaving more computing resources for all other projects.

Examples: About 1 year ago, a new version of HCC was released that cut the crunching time of WUs by about 70%. I think it improved the usage of CPU memory.

Another example: The documentation for AutoDock VINA, used in DSFL, says that it runs several times faster than the ordinary AutoDock that is used for FAAH. Converting FAAH to VINA, already on the to-do list, might be more productive than switching to 64-bit.

Yet another: It is known, though I have not verified it, that running multiple CEP2 WUs simultaneously slows down all processing on the machine, including WUs of other projects. I have not seen any explanation of this, but perhaps fixing it would be very worthwhile. I have a theory on it, but I have no inside info and I am not always right: When a CEP2 WU starts up, it generates "zillions" of small data files. Accessing these would cause considerable overhead for the operating system, and clog up the system disk cache. On my machines I do not see much activity from the HDD LEDs once the initial file creation period has finished, so any bottleneck is not due to the slowness of mechanical HDDs, but it could be due to CPU time spent by the o/s in dealing with the cache. I have not looked at the code of a multi-tasking multi-processor o/s, but I think that handling the disk cache would be single-threaded, and this would cause a bottleneck if tasks on multiple processors were requiring cache activity simultaneously.

My other bit of info is that I once had a brief peek in the Slot directories of a few of the CEP2 WUs. In the few subdirectories that I examined, the contents were identical across slots, which suggests that they are only accessed for reading. If this is so for a large proportion of these little CEP2 data files, there might be a gain if these subdirectories were moved from the Slot directories to another directory that provides common access for all WUs. If the data change periodically, these common directories could be given names that encode a version number. This would vastly decrease the number of files needing to be manipulated by the o/s kernel & its disk cache, and speed it up. These data directory trees would only need to be created once for each version change, and would thus do away with the need for every WU to create all the files on startup, which does cause mechanical HDDs a few histrionics. Just a theory ...

And: Many times, members have asked for projects that use the number-crunching capabilities of high-end graphics cards (keywords GPU crunching, GPGPU, CUDA, Brook). The pros and cons have been discussed here many times. I don't know whether any of the WCG projects use the multimedia/vector extension instructions of x86 CPUs, but there might be gains to be had if they did. There might be compatibility problems between CPU models that would affect validation of WUs within each quorum, but this might be much easier to manage than incompatibilities between GPUs. And using the vector instructions would be much easier to program than GPGPU. Furthermore, only a minority of members might run GPGPU projects, while most or all x86 CPUs have some MMX instructions.

Please send me my 2c at ... no, donate it to WCG.
[Edit 2 times, last edit by Rickjb at Sep 2, 2011 12:43:59 PM] |
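A rough sketch of the shared-directory idea in the post above, for anyone who wants to play with it. Everything named here is an assumption made up for illustration: the `ensure_shared_data` helper, the `data_v<N>` naming, and the `populate` callback are hypothetical and are not taken from CEP2 or BOINC. The first task to need a given version builds the tree in a private staging directory and publishes it with a single rename; later tasks just reuse it.

```python
import os
import shutil
import tempfile

def ensure_shared_data(base_dir, version, populate):
    """Return the shared read-only tree for `version`, building it only once."""
    target = os.path.join(base_dir, f"data_v{version}")
    if os.path.isdir(target):
        return target                      # already built by an earlier work unit
    os.makedirs(base_dir, exist_ok=True)
    # Build in a private staging directory so readers never see a half-written tree.
    staging = tempfile.mkdtemp(prefix="staging_", dir=base_dir)
    try:
        populate(staging)                  # caller writes the data files here
        os.rename(staging, target)         # publish the finished tree in one step
    except OSError:
        # Most likely another task published the tree first; discard our copy.
        shutil.rmtree(staging, ignore_errors=True)
        if not os.path.isdir(target):
            raise
    return target
```

Each work unit would call this at startup and read from the returned path instead of unpacking its own copy into the Slot directory. Whether that actually relieves the slowdown is untested; it only removes the repeated file creation.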
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges: |
Rickjb wrote:
> Yet another: It is known, though I have not verified it, that running multiple CEP2 WUs simultaneously slows down all processing on the machine, including WUs of other projects. I have not seen any explanation of this, but perhaps fixing it would be very worthwhile.

Why don't you mention your ideas on the CEP2 forum? I see a slowdown with multiple CEP2 work units even though I have a fast SSD (Vertex 2) and use a read/write cache, so practically all of my operations are out of main memory. It seems to be limited by the I/O capabilities of my quad-core at this point. Any help would be useful. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Rickjb,
There are 2 motivations:
1. There is a gain in speed... 4-8% on my Linux and W7-64 (some even saw slower operation, so WCG could, at some uncertain point in the future, test [see post] which is fastest for a device and then send either 32 or 64 bit).
2. There are devices sniffing at WCG that are 'pure' 64-bit, i.e. they can currently only participate in Clean Water.
There's a technical post explaining this... can't seem to find it this quickly... oh wait, there is also the searchable News: http://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=157
Don't think I need to add more atm.
--//-- |
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges: |
Thanks, Sek. I didn't know about 64-bit using the extra CPU registers, a plus for 64-bit. But in 64-bit, data objects smaller than 32 bits get put on 64-bit (8-byte) boundaries in memory instead of 32-bit boundaries, and gobble up more cache. CPUs with limited cache would be worst affected, e.g. notebook and low-end desktop CPUs such as Intel Celeron & Pentium, plus most AMDs.
@Jim1348: I thought of posting in the CEP2 forum, but hesitated because forum administrators frown on duplicate postings. But after reading your comment I posted a link there in the thread "Re: RAMDisk". Perhaps it belongs in the Suggestions/Feedback Forum. I would welcome a comment on this from the CEP people or WCG tech staff. Perhaps a CA will convey a message. |
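On the padding point above: how much a data layout grows in 64-bit depends on the compiler and ABI, so it is easier to measure than to argue. Here is a small sketch using Python's `ctypes`, which follows the platform's C layout rules. The two structs (`Small`, `WithPointer`) are made up for illustration and are not taken from any WCG application.

```python
import ctypes

class Small(ctypes.Structure):
    # One byte followed by a 32-bit integer.
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int)]

class WithPointer(ctypes.Structure):
    # One byte followed by a pointer, whose size follows the build.
    _fields_ = [("tag", ctypes.c_char), ("next", ctypes.c_void_p)]

# Pointer width tells us whether this Python is a 32-bit or 64-bit build.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8
print(f"{pointer_bits}-bit build")
print("Small:", ctypes.sizeof(Small), "bytes")
print("WithPointer:", ctypes.sizeof(WithPointer), "bytes")
```

Run it under a 32-bit and a 64-bit Python on the same machine and compare. On the usual x86 ABIs I would expect `Small` to come out the same size either way and `WithPointer` to double on 64-bit, which would put the extra cache pressure mainly on pointer-heavy data rather than on small objects, but check it rather than take my word.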
||
|
robertmiles
Senior Cruncher US Joined: Apr 16, 2008 Post Count: 443 Status: Offline Project Badges: |
Rickjb wrote:
> Back to the 64-bit topic, I gather that only minimal speed gains were achieved by the migration of C4CW to 64-bit. 64-bit floating-point-intensive programs are limited by the same FPU as limits the 32-bit versions, and they use more memory bandwidth and amount of CPU cache memory.

I've seen signs that BOINC can only assign a total of 4 GB of memory to all the 32-bit applications running at once, even if it is running on a 64-bit computer where it is allowed to use considerably more memory as long as the rest is for 64-bit applications. Nothing definite, but it looks worth checking whether this is correct, since 64-bit applications would not be subject to this memory restriction.

Rickjb wrote:
> Yet another: It is known, though I have not verified it, that running multiple CEP2 WUs simultaneously slows down all processing on the machine, including WUs of other projects. I have not seen any explanation of this, but perhaps fixing it would be very worthwhile. I have a theory on it but have no inside info and I am not always right: When a CEP2 starts up, it generates "zillions" of small data files. Accessing these would cause considerable overhead for the operating system, and clog up the system disk cache. On my machines I do not see much activity from the HDD LEDs once the initial file creation period has finished, so any bottleneck is not due to the slowness of mechanical HDDs, but it could be due to CPU time spent by the o/s in dealing with the cache. I have not looked at the code of a multi-tasking multi-processor o/s, but I think that handling the disk cache would be single-threaded, and this would cause a bottleneck if tasks on multiple processors were requiring cache activity simultaneously. My other bit of info is that I once had a brief peek in the Slot directories of a few of the CEP2 WUs. In the few subdirectories that I examined, the contents were identical across slots, meaning that they are only accessed for reading. If this is so for a large proportion of these little CEP2 data files, there might be a gain if these subdirectories were moved from the Slot directories to another directory that provides common access for all WUs. If the data change periodically, these common directories could be given names that code for a version-number. This would vastly decrease the number of files needing to be manipulated by the o/s kernel & its disk cache and speed it up. These data directory trees would only need to be created once for each version change and would thus do away with the need for every WU to create all the files on startup, which does cause mechanical HDDs a few histrionics. Just a theory ....

I've found that, for some Windows versions (Vista and probably 7), the system disk cache tries to continue to hold any file that has been accessed by an application for as long as that application is running, due to the SuperFetch feature. I don't know if it also holds multiple copies. If so, such a change might also call for making sure that the application avoids accessing any file it does not need to access, for as long as it can.

Rickjb wrote:
> And: Many times, members have asked for projects that use the number-crunching capabilities of high-end graphics cards (keywords GPU crunching, GPGPU, CUDA, Brook). The pros and cons have been discussed here many times. I don't know whether any of the WCG projects use the multimedia/vector extension instructions of x86 CPUs, but there might be gains to be had if they did. There might be compatibility problems between CPU models that would affect validation of WUs within each quorum, but this might be much easier to manage than incompatibilities between GPUs. And using the vector instructions would be much easier to program than GPGPU. Furthermore, only a minority of members might run GPGPU projects, while most or all x86 CPUs have some MMX instructions.

Another variation of this to consider: Include TWO or more versions of the application program in each workunit, compiled from the same source code but with different choices of which higher-level CPU instructions are available. Also include a small wrapper program to check which CPU it is running on and decide which version of the application program actually runs. Note that this would call for some extra beta testing to make sure that all the versions give results that are close enough, and to decide whether this speeds up workunits enough on the higher-level CPUs to be worthwhile.
[Edit 3 times, last edit by robertmiles at Sep 3, 2011 11:26:55 AM] |
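The wrapper idea above could look roughly like this. It is only a sketch under stated assumptions: the build names in `VARIANTS` are hypothetical, the flag names are the ones Linux reports in `/proc/cpuinfo`, and on systems without that file the sketch simply falls back to the generic build. It is not how WCG or BOINC actually select applications.

```python
import os

# Hypothetical builds, best first: (binary name, CPU flags it requires).
VARIANTS = [
    ("app_sse2", {"sse2"}),
    ("app_mmx", {"mmx"}),
    ("app_generic", set()),
]

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags on Linux, or an empty set elsewhere."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                # Line looks like "flags : fpu vme ... mmx sse sse2 ..."
                return set(line.split(":", 1)[1].split())
    return set()

def pick_variant(flags, variants=VARIANTS):
    """Return the first build whose required flags are all present."""
    for name, required in variants:
        if required <= flags:
            return name
    raise ValueError("no compatible build")

if __name__ == "__main__":
    print(pick_variant(read_cpu_flags()))
```

A real wrapper would then launch the chosen binary. The beta-testing caveat in the post still applies: every build has to give results close enough to validate against the others in a quorum.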
||
|
|