Thread Status: Active
Total posts in this thread: 24
Posts: 24   Pages: 3   [ Previous Page | 1 2 3 | Next Page ]
This topic has been viewed 2769 times and has 23 replies
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: hyperthreading support

This one comes up time and time again. So once more, here it goes...

Short answer:
Rosetta IS using 100% of your CPU. There is nothing going to waste. The task manager is displaying incorrect information...as it does on ALL HT processors.

This question comes up because people don't understand how hyperthreading works. They believe the Intel marketing spin and think they know what's going on.

Detailed Answer:
HyperThreading (HT) is Intel's brand name for Simultaneous MultiThreading (SMT), the generic name for this capability in computer science. Intel is not the first chip maker to use this technology...not even close. It's been around for years.

In order to understand HT, you need to understand how an operating system schedules work on the CPU. A CPU can only run one program at a time. We can run multiple programs because modern operating systems multi-task: the OS runs one program after another very quickly, but the CPU is only ever running a single program at any given time.

The operating system sees multi-threaded programs no differently than having to schedule multiple processes. In short, it sees each thread as a separate "task". So again, at any one time, the CPU is only ever running a single "task".
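
To make the time-slicing idea concrete, here is a minimal sketch of a round-robin scheduler on a single CPU. It is a toy model, not how any real OS is implemented; the task names, the run lengths, and the `quantum` parameter are all made up for illustration.

```python
from collections import deque

def round_robin(tasks, quantum=2):
    """Simulate one CPU time-slicing between tasks.

    tasks: dict mapping a task name to the number of time units it needs.
    Returns the timeline: one task name per time unit.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.extend([name] * run)  # only this task occupies the CPU
        if remaining > run:
            queue.append((name, remaining - run))  # back of the line
    return timeline

timeline = round_robin({"A": 3, "B": 2, "C": 4}, quantum=2)
print("".join(timeline))  # AABBCCACC
```

Every position in the timeline holds exactly one task name, which is the point: the tasks appear to run together only because the switches happen quickly.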

Intel developed HT primarily because their CPUs were becoming less efficient as they were climbing the GHz ladder. In order to bump their CPUs higher and higher, they needed to increase the length of their instruction pipeline. A longer pipeline has several problems though:
1. Some instructions take multiple cycles to execute, thus leaving gaps in the pipeline.
2. If there is a wrong guess in the branch prediction unit, the whole pipeline has to be flushed and started again. Thus when Intel processors guess wrong, they really take a hit compared to other processors.

They can't do anything about #2 except get better at branch prediction, but #1 can sort of be solved by introducing HT to the CPU.

HT works by inserting another thread's instructions into the gaps in the pipeline left by the currently running thread. This makes the pipeline more efficient. So really, at best, you're getting a few extra instructions done per cycle...which, if perfect, would be quite a boost in efficiency. There is one big drawback to HT though:
Another thread's instructions could conflict with the currently running thread causing thrashing. (contention with integer or floating point units, RAM, cache, disk, etc.)
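
The gap-filling idea can be sketched with a toy model. This assumes a pipeline is just a list of slots where `None` marks a bubble, and it deliberately ignores the contention problem described above; the instruction names are hypothetical.

```python
def fill_gaps(primary_slots, secondary):
    """Fill idle pipeline slots (None) with instructions from a second thread.

    primary_slots: list of instruction names, with None marking a bubble.
    secondary: list of instructions waiting from the other thread.
    Returns (merged slots, leftover secondary instructions).
    """
    waiting = list(secondary)
    merged = []
    for slot in primary_slots:
        if slot is None and waiting:
            merged.append(waiting.pop(0))  # borrow the idle slot
        else:
            merged.append(slot)
    return merged, waiting

def utilization(slots):
    """Fraction of slots that hold an instruction."""
    return sum(s is not None for s in slots) / len(slots)

primary = ["p1", None, "p2", "p3", None, None, "p4", "p5"]
merged, leftover = fill_gaps(primary, ["s1", "s2"])
print(utilization(primary), utilization(merged))  # 0.625 0.875
```

In this made-up case the pipeline goes from 5 of 8 slots busy to 7 of 8. A real gain depends on how many gaps the running thread leaves and on whether the borrowed slots fight over the same execution units.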

This is why some benchmarks show improvement with HT and some show a loss in performance. It all depends on what's running on the system at any given time and how the instructions are laid out in each thread going through the CPU. Remember, you have many processes and threads running on the system. Any one that's runnable can be run at the same time as any other runnable thread.

But how do another thread's instructions get inserted into the pipeline if the OS can only run 1 task at a time? The answer is that the CPU tricks the OS into thinking there are 2 processors. By tricking the OS into thinking there are 2 processors, the OS will schedule 2 tasks to run at the same time.

That's why the Task Manager shows 2 CPUs when there is really only 1. It's also why it shows 50% when the CPU is actually running at 100%. 1 CPU is running to capacity and the "other" is not (and it can't...because there isn't one). The numbers for the 2nd CPU are not reliable as it's a complete fake. It's an indication of some HT efficiency, but that's really all you can get out of it (from Task Manager anyway).
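
The 50% figure is just arithmetic. Here is a simplified sketch, assuming the overall number is a plain average of the per-logical-CPU busy percentages (an assumption for illustration, not a description of Task Manager's internals):

```python
def overall_percent(per_logical_cpu_busy):
    """Average the busy percentage across logical CPUs."""
    return sum(per_logical_cpu_busy) / len(per_logical_cpu_busy)

# One single-threaded task with HT on: it can only occupy one logical CPU.
print(overall_percent([100.0, 0.0]))  # 50.0
# Same task bouncing between the two logical CPUs over the sample period.
print(overall_percent([50.0, 50.0]))  # 50.0
# Same task with HT off: one CPU, fully busy.
print(overall_percent([100.0]))       # 100.0
```

The same single-threaded workload reads as 50% or 100% depending only on how many logical CPUs the OS thinks it has.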

Rosetta is not multi-threaded nor is it likely to be. Changing something to be multi-threaded doesn't mean it will run any faster and sometimes it actually slows the program down if it's not done properly.

Rosetta is still utilizing your HT processor to its full extent; Task Manager just isn't telling you the real story. However, it also means that other programs running on your system may get an ever so slight boost as other threads' instructions are being inserted into the gaps in the Rosetta pipeline.

SMP is the most efficient way of running multiple threads, because there are 2 of every resource...there is no contention (except for possibly RAM issues...but hey, that's why SMP boards are so expensive :-)

Dual Core is sort of a poor man's SMP. It's not as efficient as SMP, but it's good enough and the price is right.

HT is a fool's SMP. It's all a trick. While there is some benefit to it, there is just as much of a chance for performance loss as there is gain...in either case it's usually not that much compared to the same CPU with HT turned off. There are a few applications which lend themselves to really benefiting from HT (and a few which really die on HT). If you're using one of these, then by all means use HT.

The only way you can really compare efficiency is to run a reproducible, broad-spectrum benchmark of some kind that closely resembles your own workload. Remember, you can't dictate which threads get run together...the OS decides that...so sometimes you'll gain...and sometimes you'll lose.
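
A rough sketch of what such a comparison could look like: time the same workload once with HT off and once with HT on (the setting usually lives in the BIOS setup), then compare. Everything here is hypothetical, including `sample_workload`, which is only a stand-in for your real work, and the timings fed to `percent_change`, which are made up.

```python
import time

def time_workload(workload, repeats=3):
    """Return the best wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

def percent_change(seconds_ht_off, seconds_ht_on):
    """Positive means the HT-on run finished faster; negative means slower."""
    return (seconds_ht_off - seconds_ht_on) / seconds_ht_off * 100.0

def sample_workload():
    # Hypothetical stand-in; replace with something resembling your real work.
    total = 0.0
    for i in range(1, 200_000):
        total += 1.0 / i
    return total

elapsed = time_workload(sample_workload)
print(f"best of 3: {elapsed:.4f} s")

# Made-up timings, recorded by hand from two separate boots:
print(percent_change(8.0, 6.0))   # 25.0  (HT helped)
print(percent_change(8.0, 10.0))  # -25.0 (HT hurt)
```

Taking the best of several runs reduces noise from whatever else the OS decided to schedule in the meantime, but it does not remove it, so repeat the whole comparison a few times before trusting the result.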

Hmmm...this seems to have turned into a bit of a rant. Sorry if it got confusing. I really hope it helped clarify what HT is and how it works and that you now understand why the Task Manager is basically lying to you on an HT system.
----------------------------------------
Rick Alther
Former World Community Grid Developer
----------------------------------------
[Edit 3 times, last edit by Alther at Nov 8, 2005 2:08:04 AM]
[Nov 8, 2005 2:05:59 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hyperthreading support

Interesting, and I believe you are correct, but Intel did claim that HT was more efficient at multitasking than SMP. It must depend on what's running, like you state.
[Nov 8, 2005 3:35:04 PM]
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: hyperthreading support

Interesting, and I believe you are correct, but Intel did claim that HT was more efficient at multitasking than SMP. It must depend on what's running, like you state.

Could you please point me to that claim? Maybe their marketing dept. is narrowing the scope of the word "efficient" (e.g. price-wise).

As I said in my previous post, HT is only sometimes slightly faster than a non-HT processor. HT (and SMT in general) will likely get better in the future, but now it's hit or miss.
----------------------------------------
Rick Alther
Former World Community Grid Developer
----------------------------------------
[Edit 1 times, last edit by Alther at Nov 8, 2005 6:49:09 PM]
[Nov 8, 2005 6:48:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hyperthreading support

Alther,

I think some of your comments are misleading. Mostly true, but misleading.

I have a good bit of experience with SMT on the POWER platform and I can tell you that in our benchmarks, for most multi-threaded applications, there IS a substantial increase in throughput when SMT is utilized.

I understand perfectly well about how Windoze calculates CPU util and how that is misleading. Calculating % CPU util requires the algorithm to predict the remaining capacity in the system, which can't be done unless the contents of the instruction stream are completely known. They never are.

Even if there is only a single FP unit, it is possible that you could get more throughput with your app, IF your app could be made multi-threaded and IF there is a significant amount of time spent outside the FPU. In fact, I have read that there is even some possible parallelism in the FPU if instructions in the two streams are using different resources.

The thing that bothers me about your rant is that it implies that other apps will not benefit much from SMT or HT. That is simply not true. While it is clearly NOT as good as SMP, as you point out, it is NOT a fake and is NOT useless. It just 'depends'. (as with so many performance-related things)

Anyway, why not simply allow 2 instances of the app to run and test it. Make it a config option. Dual core systems are coming out now anyway, so you might as well exploit them.


Jim Cioffi
[Nov 23, 2005 2:59:35 PM]
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: hyperthreading support

Alther,

I think some of your comments are misleading. Mostly true, but misleading.

I have a good bit of experience with SMT on the POWER platform and I can tell you that in our benchmarks, for most multi-threaded applications, there IS a substantial increase in throughput when SMT is utilized.

I don't believe it's misleading at all. I don't know much about the POWER architecture, but SMT on Pentium at least is a hit or miss thing, primarily due to their very long pipeline. Sometimes it helps individual apps, sometimes it hurts. Sometimes it helps the overall system, sometimes it slows the overall system down.

I understand perfectly well about how Windoze calculates CPU util and how that is misleading. Calculating % CPU util requires the algorithm to predict the remaining capacity in the system, which can't be done unless the contents of the instruction stream are completely known. They never are.

It's nowhere near that complicated. The OS doesn't need to know what's in the pipeline. It just queries the performance counters on the chip. Intel does the work for them...as it should.

Even if there is only a single FP unit, it is possible that you could get more throughput with your app, IF your app could be made multi-threaded and IF there is a significant amount of time spent outside the FPU. In fact, I have read that there is even some possible parallelism in the FPU if instructions in the two streams are using different resources.

Yes, Rosetta and Fight AIDS, and just about any other scientific application we are likely to run, spend most of their time doing FP calculations. Unless the application can do some significant calculations independent of other calculations, it makes no sense to even bother trying to make Rosetta, or any other scientific application, multi-threaded. Most of these applications can't do this because they rely on prior calculations. That is, it's all sequential.
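
A tiny illustration of the difference between "relies on prior calculations" and "independent of other calculations". The formula `0.5 * x + 1.0` is an arbitrary stand-in, not anything taken from Rosetta or AutoDock.

```python
def dependent_chain(x0, steps):
    """Each step needs the previous result, so steps cannot run side by side."""
    x = x0
    for _ in range(steps):
        x = 0.5 * x + 1.0  # step k cannot start until step k-1 finishes
    return x

def independent_items(values):
    """Each item is computed on its own, so the work could be split across threads."""
    return [0.5 * v + 1.0 for v in values]

print(dependent_chain(0.0, 3))             # 1.75
print(independent_items([0.0, 2.0, 4.0]))  # [1.0, 2.0, 3.0]
```

Only the second shape gives extra threads something to do. In the first, a second thread would just sit waiting for the result of the step before it.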

The thing that bothers me about your rant is that it implies that other apps will not benefit much from SMT or HT. That is simply not true. While it is clearly NOT as good as SMP, as you point out, it is NOT a fake and is NOT useless. It just 'depends'. (as with so many performance-related things)

The common perception among the non-technical folks seems to be that if they have an HT processor, they are running twice as fast (or significantly faster) than a non-HT processor with the same speed rating. This simply is NOT true. Not even close. The confusion comes from Intel's marketing spin and from most people not really understanding how the technology works.

All I'm saying is that HT may or may not be beneficial to you. It depends entirely on the set of applications you have running at any given time. Some will work well together and you'll see a small performance boost, others don't work well and you'll see degraded performance. As you say, "it depends", on your individual computing habits.

Anyway, why not simply allow 2 instances of the app to run and test it. Make it a config option. Dual core systems are coming out now anyway, so you might as well exploit them.

That's also out of our hands with the UD client and UD isn't going to change it. The BOINC client, however, allows you to allocate all of your processors (including fake HT "processors") to a project and it will run n projects at the same time.
----------------------------------------
Rick Alther
Former World Community Grid Developer
[Dec 5, 2005 4:41:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hyperthreading support

You may not get exactly twice the CPU power but darn close to it lol. Haven't you run any other apps that support HT and seen them peg both monitors at 100%? WCG runs 50% on both on average. Now unless it is programmed to look like that on purpose, it is NOT utilizing all CPU cycles available, plain and simple. I can also tell this because I can multitask just fine and even run some fairly demanding video games with WCG going. I like that but also know that means it's not using all that's available.


--
(This post has been edited for profanity - nelsoc)


This is because these programs are written with multiple threads. In the case of DVD Decrypter, one thread encodes, one thread talks to the hard drive, one thread does the video and one thread manages the threads (lol on that one). That is why they are indeed faster on HT systems. That doesn't change the fact that your CPU can still only do one FP thread at a time, and that's where almost all of your crunching on this project is done.
[Dec 5, 2005 6:03:37 AM]
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: hyperthreading support

Here's a more everyday analogy and maybe it will help people visualize how HT works.

Think of the CPU as a major roadway. Lots of traffic on it and they don't have stop signs or traffic lights. The traffic just flows.

Think of the "2nd" processor (the HT thread) as a secondary roadway that intersects the main highway.

The vehicles on each represent CPU instructions, with the major roadway being the primary thread that's running and the vehicles on the secondary roadway waiting for a gap in traffic so they can turn onto the major road.

When there is a large enough gap in the traffic on the major roadway, a car from the secondary roadway can merge into the traffic and "execute". If the vehicle happens to be a tractor trailer (a large instruction), that gap has to be pretty large and several smaller gaps may pass by while waiting for a gap of the correct size.

Thus, you can see that only a few vehicles from the secondary road ever get a chance to merge into the primary traffic for any given time period. This explains why you can only get a slight boost from HT.

Now, say there is a tollbooth up ahead on the primary road and the only currency (a resource) it accepts is guilders. However, a car from the secondary roadway only has florins. The tollbooth won't let the car through and all traffic now stops completely (the pipeline gets flushed and all the instructions in it must be re-executed). This is where HT can cause a performance hit.

Turning HT off is just like putting a roadblock up on the secondary roadway. The main traffic goes through just as it always does, but there is now no chance for resource contention from the other traffic (yes, the pipeline can still be flushed if a branch prediction or something else goes wrong, but that's always there).

Also, you need to remember that the "2nd CPU" in the Task Manager measures this secondary roadway, so "100% utilization" here means a few cars were able to merge, while "100% utilization" for CPU 1 means many, many cars have passed by. The 2 "cpus" are not equal by any means.

Some specialized and highly tuned applications can code their software up so that traffic is nicely distributed on both roadways so there are more gaps in the primary traffic so more cars from the secondary roadway can merge in. Video editing software comes to mind here where most of the time is spent in SSE instructions (which are "large", and thus leave a larger gap in traffic allowing more secondary vehicles to merge).

Finally, remember that our grid apps run at the lowest priority on the system. Rosetta/AutoDock run only when there is absolutely nothing else left to run. They are immediately kicked from the CPU when something else needs it. This explains why you can run very demanding software (games, compilers, encoders, etc.) with a grid app running and see no performance hit (unless you start paging due to RAM constraints...but that's another story).
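
Here is a toy sketch of that lowest-priority behavior, assuming a strict "highest priority wins" rule on a single CPU. The task names `game` and `grid` and the tick numbers are made up, and real schedulers are more involved than this.

```python
def run_ticks(foreground_demand, ticks):
    """Simulate strict priority scheduling on one CPU.

    foreground_demand: set of tick numbers at which a normal-priority
    program wants the CPU. The lowest-priority grid task is always runnable.
    Returns a list saying who ran at each tick.
    """
    timeline = []
    for tick in range(ticks):
        if tick in foreground_demand:
            timeline.append("game")  # higher priority always wins
        else:
            timeline.append("grid")  # grid task only gets leftover ticks
    return timeline

timeline = run_ticks(foreground_demand={2, 3, 4, 7}, ticks=10)
print(timeline.count("game"), timeline.count("grid"))  # 4 6
```

The game gets every tick it asks for, and the grid task soaks up whatever is left, which is why the CPU reads as busy while the game still feels smooth.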

Does this help visualize HT for folks?
----------------------------------------
Rick Alther
Former World Community Grid Developer
[Dec 8, 2005 5:20:52 AM]
nealschim
Cruncher
Joined: Nov 24, 2005
Post Count: 14
Status: Offline
Re: hyperthreading support

Yes :D good explanation...
----------------------------------------

[Dec 9, 2005 6:36:53 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hyperthreading support

Excellent!!! One of the best explanations I've seen/heard.

Just joined today (partly 'cause SETI Classic has been beamed-up :D), so be gentle with a newbie.

I have several machines, one is a HT 3.2 running 2003 Ent Ed. This machine was running the SETI command-line client - 2 instances, each set with an affinity for a specific 'processor'. Periodic checking of Task Mangler showed both 'processors' (virtual processors? threads?) were running 100%.

So my question is, since running other apps can keep both threads at 100%, is there a way to have the GUI WCG client run 100% on both?

Or at least trick us hardware newbies into thinking the system is 100% busy? <grin>
[Dec 11, 2005 9:03:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hyperthreading support

Hello Steerpike_ca,
No. Bad bad idea. The problem is with the Global OS variable(s) that the UD Windows agent uses. So two or more instances will all be using the same synchronization variable. BOINC can run multiple instances (it is designed to run on multiple processors / cores) but UD.exe cannot. You have to establish a full virtual environment for UD.exe using VMware or a similar virtual machine environment.
[Dec 12, 2005 12:28:36 AM]