Guidance on choosing CPUs

This is a short guide to some of the issues affecting modern CPU performance, and to choosing the best CPU for your task.

General Advice

Broadly speaking, when purchasing hardware you should consider what kind of applications you are going to run on it. If you have a single process or a small number of processes, it's unlikely that a processor with a large number of cores will improve performance. This is particularly true of NUMA-type architectures, where not only may you be choosing a slower processor, you may also be slowing down that processor's access to data in memory. If you are choosing hardware for a specific task, it is worth spending some time looking at what processes will run on it, what their memory footprints will be and what kind of IO they will perform. Ideally, you should test on existing hardware before purchase.
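One way to test on existing hardware is a small timing harness run on each candidate machine. The following is a minimal Python sketch, assuming Python is installed on the machines being compared. `cpu_bound_task` is a hypothetical placeholder and should be replaced by a representative piece of your real workload, since a synthetic loop will not reflect your application's memory or IO behaviour.

```python
import statistics
import time
from typing import Callable


def cpu_bound_task(n: int) -> int:
    """Hypothetical placeholder workload: sum of squares of 0..n-1 in a
    pure-Python loop. Replace with a representative slice of your real job."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(func: Callable[..., object], *args: object, repeats: int = 5) -> float:
    """Call func(*args) `repeats` times; return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    # The median is less affected than the mean by one-off interference,
    # such as other processes competing for the CPU during a run.
    return statistics.median(timings)


if __name__ == "__main__":
    seconds = benchmark(cpu_bound_task, 2_000_000)
    print(f"median of 5 runs: {seconds:.3f} s")
```

Running the same script on each machine and comparing the median times gives a rough single-threaded comparison; it says nothing about how a multithreaded workload would scale.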

If you are going to run a small number of single-threaded processes, particularly CPU-bound ones, you should seriously consider single-processor solutions with the smallest number of cores and the fastest clock speeds.

Sizing processors for general-purpose use is harder, but you should probably avoid pairing a large number of cores with a relatively small amount of memory, as the overhead of accessing "non-local" memory may offset any advantage you get from the extra cores.

Clock speed and single-threaded performance

Historically, processor performance (at least within the x86 architecture) tracked clock speed, which improved steadily year on year: in the '80s and '90s CPU performance doubled every 18-20 months. In the mid 2000s, however, chip design appears to have hit a number of limits (heat dissipation, die size, ...) which effectively stalled clock speeds at around 3-4GHz (1). Since then CPU development has largely concentrated on hyperthreading and multicore designs, and to a certain extent desktop CPU design has followed server architectures (NUMA etc.). In some cases single-threaded, CPU-bound applications appear to run slower on newer CPU cores. In a 2011 talk at the Salishan Conference on High Speed Computing, Chuck Moore suggested that single-threaded performance may actually be decreasing on newer hardware, as CPU manufacturers trade individual core speed for overall processor throughput.

Figures: decpref.png; singlethreadedfloat.png (SPECfp figures for various CPUs plotted against their release date).

Over the same period, however, multithreaded applications have seen dramatic increases in system performance as core density has grown.

We have seen similar indications running local benchmarks on a range of hardware.

NUMA Architecture

NUMA (Non-Uniform Memory Access) architecture dedicates individual banks of memory to specific processors, in an attempt to speed up memory access by avoiding contention when multiple processors try to access memory at the same time. Conversely, if your process needs to access memory belonging to another processor, you will take a performance hit, as the data has to be passed between processors. In 4-processor Intel designs this can take up to twice as long, because (assuming the processors are laid out in a square, with links along the sides) the request may have to go through an intermediary processor. AMD avoid this by interconnecting all the processors.

When sizing NUMA servers it's important to specify the hardware so that all the processes that will access common data can run on the same processor, and to specify enough memory that at least all the common data fits in the memory bank allocated to that processor. By default the memory is usually shared equally amongst the processors, but it may be possible to allocate more memory to individual processors at the BIOS or hardware (DIMM slot) level. In Linux it is possible to control the memory allocation policy (see numactl).
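As a sketch of keeping a group of processes on one processor under Linux, the Python below reads the CPUs belonging to a NUMA node from sysfs (`/sys/devices/system/node/nodeN/cpulist`) and restricts the current process to them with `os.sched_setaffinity`. It assumes a Linux kernel with NUMA support and Python 3.9 or later; `parse_cpulist`, `node_cpus` and `pin_to_node` are hypothetical helper names. It only sets CPU affinity: new memory is then normally allocated from the local bank because of Linux's default local-allocation policy, but strict memory binding needs numactl (e.g. `numactl --cpunodebind=0 --membind=0 ./your_program`).

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list means the node has no CPUs
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int) -> set[int]:
    """Return the CPUs attached to NUMA node `node`, as reported by sysfs."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def pin_to_node(node: int) -> None:
    """Restrict this process (pid 0 means "self") to one NUMA node's CPUs.

    This sets CPU affinity only. Pages touched from now on will normally be
    allocated from that node's memory under the default local policy, but
    memory already allocated is not moved and placement is not enforced.
    """
    os.sched_setaffinity(0, node_cpus(node))
```

Calling `pin_to_node(0)` early in a parent process, before starting its workers, keeps them on node 0 as well, since child processes inherit the CPU affinity mask.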

Xeon vs Pentium (server vs desktop)

In general, Intel brings new technology through on the desktop platform first, and it reaches the server market 6-12 months afterwards. If you are looking for the best single-threaded performance this may be particularly relevant, as desktop Core i-series processors may suit your needs better than Xeons. It is possible to buy some rack-mounted servers with what are effectively desktop (workstation) motherboards and processors. Indeed, if you need very fast performance it may be attractive to buy a desktop, since more of the cost can then be spent on the processor. Buying a desktop in such circumstances is appealing, particularly as the hardware could be refreshed several times over the lifetime of an equivalent server chassis, but it comes with a number of management and housing issues. Currently there doesn't seem to be enough demand for such a machine for us to have identified an appropriate rack-mounted solution.

Executive Summary

For CPU-bound processes, size the kit so that you get one core per process (plus one or two for the OS). Maximise the processors, then maximise the core count. When doing this, size the processors so that all the processes accessing shared data can run on a single processor and the shared data fits in that processor's local memory. Finally, choose the fastest processor from the group you are left with, using a reliable set of benchmarks (assuming you can obtain them).
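The core and memory rules above can be expressed as a simple filter over candidate configurations. This Python sketch assumes that memory is split evenly across sockets and that each group of processes sharing data should sit on a single socket; the `Candidate` type and all the figures in the example are made up for illustration, not real products or measurements. It does not attempt the final step of picking the fastest remaining processor, which still needs real benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical server configuration being considered for purchase."""
    name: str
    sockets: int
    cores_per_socket: int
    mem_per_socket_gb: float  # assumes memory is split evenly across sockets


def suits(c: Candidate, n_procs: int, shared_gb: float,
          private_gb: float, os_cores: int = 2) -> bool:
    """Rule-of-thumb check for one group of CPU-bound processes sharing data.

    - the machine has one core per process plus os_cores for the OS;
    - the whole group fits on one socket, so it never reaches across to
      another processor's memory bank;
    - the shared data plus each process's private memory fits in that
      socket's local memory.
    """
    total_cores_ok = c.sockets * c.cores_per_socket >= n_procs + os_cores
    group_on_one_socket = c.cores_per_socket >= n_procs
    memory_ok = c.mem_per_socket_gb >= shared_gb + n_procs * private_gb
    return total_cores_ok and group_on_one_socket and memory_ok


# Made-up example: 6 processes sharing 20 GB of data, 1 GB private each.
candidates = [
    Candidate("A", sockets=1, cores_per_socket=8, mem_per_socket_gb=32),
    Candidate("B", sockets=2, cores_per_socket=4, mem_per_socket_gb=16),
    Candidate("C", sockets=2, cores_per_socket=12, mem_per_socket_gb=16),
]
shortlist = [c.name for c in candidates
             if suits(c, n_procs=6, shared_gb=20, private_gb=1)]
print(shortlist)  # ['A']
```

B cannot hold the whole group on one socket, while C has plenty of cores but too little memory per socket to hold the shared data, the situation warned against under General Advice.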

If you are writing your own code, consider whether the task can be parallelised and, if so, invest time in doing this. If it is highly parallel, invest in learning CUDA/OpenCL and buy a GPU.
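For CPU-bound work that splits into independent tasks, one simple way to use every core from Python is a process pool from the standard library's `concurrent.futures`. In this sketch `simulate` is a hypothetical stand-in for a real task, and the tasks are assumed to share no state. Separate processes are used rather than threads because CPython's global interpreter lock stops pure-Python threads from running CPU-bound code in parallel.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional


def simulate(seed: int) -> int:
    """Hypothetical CPU-bound, independent task: iterate a simple
    linear congruential generator and return its final state."""
    x = seed
    for _ in range(100_000):
        x = (x * 1103515245 + 12345) % 2**31
    return x


def run_parallel(seeds: List[int], workers: Optional[int] = None) -> List[int]:
    """Run simulate() on each seed in a pool of worker processes.

    pool.map returns results in the same order as the inputs.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    seeds = list(range(16))
    parallel = run_parallel(seeds)
    serial = [simulate(s) for s in seeds]
    assert parallel == serial  # same answers, computed across cores
    print(f"{len(parallel)} tasks, {os.cpu_count()} CPUs available")
```

With no `workers` argument the pool sizes itself to the number of CPUs the OS reports; with the one-core-per-process rule in mind, you may want to pass a smaller value to leave a core or two free for the OS.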

References

Multi-core and multi-threading performance (the multi-core myth?)

https://blogs.oracle.com/cmt/entry/a_few_thoughts_about_single
http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/

-- IainRae - 03 Aug 2014

Topic attachments
decpref.png (87.0 K, 25 Aug 2014, IainRae)
singlethreadedfloat.png (80.6 K, 04 Aug 2014, IainRae): graphs showing SPECfp figures for various CPUs against their release date
singlethreadedint.png (74.1 K, 04 Aug 2014, IainRae): graphs showing SPECint figures for various CPUs against their release date
Topic revision: r6 - 23 Oct 2014 - 11:28:55 - IainRae
 