World Community Grid Forums
Thread Status: Active | Total posts in this thread: 23
Rickjb
Veteran Cruncher, Australia | Joined: Sep 17, 2006 | Post Count: 666 | Status: Offline
(MM and JB) Thanks for your answers about the effects of OS word size (32/64-bit) on WU throughput. It seems we have a difference of experience, at least with HCC. Even if the speed difference lies mostly in integer code, all projects would be affected to some degree. I would like to see some actual CPU-time comparisons for FAAH, DDDT, etc., and I would also like to see some logical explanations. There is a potential speed gain for all of WCG in this question.
I could possibly explain a small difference with floating point, but not for integer code. AFAIK, under many flavours of Unix (TM), when a user-mode application such as a WCG science app executes an FPU instruction, it may trap to the OS kernel via a software interrupt, and on a 64-bit OS the FPU would then run in 64-bit mode. That could yield small speed gains. However, an app's integer instructions are executed entirely in user mode, and I don't see how their speed could differ. We need a real hardware/OS-software guru here; perhaps someone out there in Linux-land knows.

MM: Re the Nehalem: it may (or may not) be possible to copy the entire BOINC data directory from an LGA775 machine onto the Nehalem and run some of the same WUs there. The CPU times for the same WUs on the two machines could then be compared. Run the test with the Nehalem offline, and scrub the data afterwards. (BOINC 6.2.xx uses a separate data directory.)

Thanks, too, d_v. See also thread http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=22018

[Edit]: Any 32-bit app that makes a large number of system calls, e.g. for I/O, could run faster under a 64-bit OS because those system calls will be executed in 64-bit mode. Maybe HCC spends a lot of time in system calls.

[Edit 2 times, last edit by Rickjb at Oct 18, 2008 4:32:33 PM]
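A minimal sketch of a test for the integer-width idea, assuming nothing about the actual WCG apps: the loop below hammers 64-bit integer multiply-adds, which an x86-64 build executes as single instructions while a 32-bit build must synthesize from pairs of 32-bit operations. The loop count and multiplier constant are arbitrary illustration values, not taken from the thread.

```c
/* Micro-benchmark sketch: 64-bit integer multiply-add throughput.
 * Build the same file twice and compare the reported CPU times:
 *   gcc -O2 -m32 bench64.c -o bench32   (requires 32-bit libc)
 *   gcc -O2 -m64 bench64.c -o bench64
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    uint64_t acc = 1;
    clock_t t0 = clock();
    /* Serial dependency on acc prevents the compiler from
     * eliminating or vectorizing the loop. */
    for (uint64_t i = 1; i <= 400000000ULL; i++)
        acc = acc * 2862933555777941757ULL + i;  /* 64-bit mul + add */
    clock_t t1 = clock();
    printf("acc=%llu  cpu=%.2fs\n",
           (unsigned long long)acc,
           (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}
```

The system-call hypothesis in the [Edit] could likewise be probed on Linux by running strace -c -p <pid> against a crunching science app, which tallies the calls made and the time spent in each.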
JmBoullier
Former Community Advisor, Normandy - France | Joined: Jan 26, 2007 | Post Count: 3716 | Status: Offline
MM,
Since we are discussing real throughput and not granted credits here, you should forget that I am using Linux for the 64-bit mode. I mentioned it because I do not want to hide anything about the experiment conditions, but I think the key point in "Ubuntu 64" is the "64", not the "Ubuntu". As I said earlier in this thread, I think the performance benefit would be the same between XP64 and XP32. It is only that I have no recycled XP64 at hand and no intention of paying Microsoft for yet another software licence.

To both Rick and MM: I have been too lazy to look for an authoritative article describing the main features of the Q6600's architecture, so I can only guess at why the benchmark and the real throughput are so much better for integer crunching in 64-bit mode. Maybe the processors can do some kind of parallelism when working on integers in 64-bit mode? Or maybe it is simpler than that, and the difference is just in feeding the CPUs: integers being half the size of FP numbers, you can keep twice as many of them in the same cache and push twice as many at once between RAM, cache and the CPUs (and that would not be possible in 32-bit mode?). Too simple, or even simplistic? Maybe. Anybody working in processor design is welcome to bring us sound explanations, but the results are there.

Regarding "some actual CPU time comparisons for FAAH, DDDT, etc.": I have made some for DDDT, but I prefer to keep them to myself because the "weight" of DDDT units is not homogeneous. First, the nature of the process itself is not the same and, DDDT being non-deterministic, anything can happen. Next, and more importantly, DDDT WUs encapsulate several elementary jobs, to keep WUs from being too short and to avoid overloading the servers, so you cannot assume that the figures for one DDDT WU can be fairly compared with those of another. Especially in my case: since I am using Linux on one side and XP on the other, the WUs come from two different pools which may differ considerably. I have also observed the same discrepancy between two sets of DDDT WUs on the same system at different dates. In any case, I have never been able to find a marked advantage for either 64-bit or 32-bit mode on this project.

I have also done a few with FAAH, but honestly far fewer, because the spread of durations seems even bigger for FAAH than for DDDT, and that leads nowhere. I also do not like the way this diversity disrupts the scheduling of the BOINC client on my quad.

Cheers. Jean.
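One concrete way to see what Jean is guessing at (a sketch added here, not from the thread; the assembly in the comments is roughly what gcc -O2 emits and varies with compiler version): the same one-line 64-bit addition becomes a single instruction in 64-bit mode but an add/add-with-carry pair plus extra stack traffic in 32-bit mode.

```c
#include <stdint.h>

/* One 64-bit integer addition.  Compile with "gcc -O2 -S -m64 add64.c"
 * and "gcc -O2 -S -m32 add64.c" and compare the generated assembly. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
    /* -m64 (arguments arrive in registers, one add):
     *     leaq  (%rdi,%rsi), %rax
     *     ret
     *
     * -m32 (arguments on the stack as two 32-bit halves,
     *       add plus add-with-carry):
     *     movl  4(%esp), %eax
     *     movl  8(%esp), %edx
     *     addl  12(%esp), %eax
     *     adcl  16(%esp), %edx
     *     ret
     */
}
```

On top of the wider operations, x86-64 also doubles the general-purpose register count from 8 to 16, letting the compiler keep more intermediate values out of memory; both effects point the same way as Jean's cache argument.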
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
Just wanted to stop in and thank both Jean and Sek for their insight and advice.
I heard today from WCG that I was correct in my assumption that hiding the computers doesn't mask the CPU credit per second on the page I linked to earlier in this thread. I'll wait till the NDA is lifted and then get this beast up and on the program.

Just thinking: is there any way to store 6-7 days of work at the initial contact with the BOINC app, so I can work it the week before the NDA lifts and then dump all the work at once? Or is that credit-per-second work-done number based on the BOINC benchmark app? Either way, this sucker will get some work done this fall!