The simulations revealed a considerable speedup in moving from two to four cores but only a marginal gain from four to eight. When systems with more than eight cores were tested, performance began to decline; by sixteen cores, overall speed was on par with that of only two cores. As more cores were added, the team registered an even steeper decline.
The researchers attributed this behavior to a shortage of memory bandwidth and to contention among processors over the memory bus available to each. The memory bus is an essential component of the system: the set of wires that carries memory addresses and data to and from the system’s random access memory (RAM).
A simple supermarket analogy helps: with only two cashiers working, the checkout line should in theory move faster with four, eight, or even sixteen cashiers. In practice, however, throughput falls if the cashiers cannot reach the groceries immediately or keep getting in one another’s way.
Similarly, cores that lack direct access to their own memory caches (which feed data to the processor) lose performance. That is the finding of simulations of high-performance computers conducted at Sandia by Richard Murphy, Arun Rodrigues, and former student Megan Vance.
“The difficulty is contention among modules,” says James Peery, director of Sandia’s Computation, Computers, Information, and Mathematics Center. “The cores are all asking for memory through the same pipe. It’s like having one, two, four, or eight people all talking to you at the same time, saying, ‘I want this information.’ Then they have to wait until the answer to their request comes back. This causes delays.”
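The dynamic Peery describes can be captured in a small toy model, sketched below in Python. This is purely illustrative and is not Sandia’s simulation: the bus bandwidth, per-core demand, and contention parameters are invented numbers chosen only to reproduce the qualitative shape of the result (gains up to a point, then decline as the shared bus saturates and arbitration overhead grows).

```python
def modeled_speedup(cores, bus_bandwidth=8.0, per_core_demand=1.0,
                    contention_cost=0.02):
    """Toy model of speedup limited by a shared memory bus.

    A compute-bound workload would speed up linearly with `cores`,
    but the shared bus can feed only `bus_bandwidth / per_core_demand`
    cores at full rate, and arbitration overhead (modeled here as a
    simple quadratic penalty) grows with the number of cores asking
    for memory "through the same pipe". All parameters are invented
    for illustration.
    """
    bandwidth_cap = bus_bandwidth / per_core_demand
    raw = min(cores, bandwidth_cap)          # bus caps useful parallelism
    return raw / (1.0 + contention_cost * cores * cores)

if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(f"{n:2d} cores -> modeled speedup {modeled_speedup(n):.2f}")
```

With these example parameters the model peaks around eight cores and, by sixteen, drops back near the two-core level, echoing the trend the Sandia team reported.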
Moore’s Law, the famous observation that the number of transistors on an integrated circuit doubles roughly every two years, can now be put to use chiefly through multicore designs. “Multicore gives chip manufacturers something to do with the extra transistors successfully predicted by Moore’s Law,” Rodrigues says. “The bottleneck now is getting the data off the chip to or from memory or the network.”
Another way to attack the problem would be to raise the clock speed of single cores, since the vast majority of applications, such as word processors and music and video software, are written for single-core platforms. However, power consumption, heat, and the basic physics of parasitic currents have pushed clock speeds in common silicon processes close to their practical limits.
TFOT has previously written about “Larrabee,” Intel’s upcoming many-core GPU, expected to be the market’s first “many-core” chip, likely containing ten or more individual x86 processor cores in a single silicon package; the company expects the product to be ready for launch sometime in the next couple of years. AMD is also developing a 12-core processor, codenamed Magny-Cours, targeted for release in the first half of 2010.
Additional information on Sandia’s research can be obtained here.
Image credit: Randy Montoya, Sandia National Laboratories