‘Larrabee’ – Intel’s Upcoming Many-Core GPU

Intel Corporation will unveil a paper detailing its first-ever “many-core” architecture, named “Larrabee.” The paper describes a new approach to the software-rendered 3-D pipeline, a “many-core” programming model, and performance analysis for several applications. Larrabee would be the market’s first “many-core” chip, likely containing ten or more individual x86 processor cores inside the silicon package, and the company expects it to be ready in 2009 or 2010.

Larrabee slide block diagram (Credit: Intel)

The personal computer graphics market would be Larrabee’s first application. The “many-core” design is based on an array of processors, each similar to the Intel processors that power laptops, PCs, and servers.

With dozens, hundreds, and eventually thousands of cores expected to power future computers, Larrabee is the starting point for an industry-wide endeavour to develop and optimize new software that can utilize the numerous cores. Intel’s largest investment is focused on this tera-scale research, and Intel is simultaneously working with over 400 universities, DARPA, and companies such as Microsoft and HP. In addition, Intel has a number of internal teams, projects, and software-related efforts underway to speed the advancements.

This project gives developers the freedom to expand into many areas and market segments. For example, even as games and applications become more realistic, they remain restricted to a limited framework. The many-core system would give a developer far more freedom and a blank canvas on which to produce some of the best 3-D applications.

The basic architecture of Larrabee is derived from the now outdated Intel Pentium processor, which uses a short execution pipeline with a fully coherent cache structure. A dual-issue processor allows two instructions that meet certain criteria to be executed in parallel, and this very simple form of superscalar execution takes a minimal amount of logic to implement. Important modern improvements have been made to the architecture, such as a wide vector processing unit (VPU), multi-threading, 64-bit extensions, and sophisticated pre-fetching. Combined, these increase the available computational power while preserving the familiarity and ease of programming of the Intel architecture.
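To make the dual-issue idea concrete, here is a minimal C++ sketch of an in-order core that issues up to two instructions per cycle. The `Instr` type and the pairing rule in `can_pair` are illustrative assumptions, not Intel's actual pairing criteria, which the article does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction: one destination and two source registers.
struct Instr {
    int dst;
    int src1;
    int src2;
};

// Illustrative pairing rule (an assumption, not Intel's real criteria):
// the second instruction may issue alongside the first only if it neither
// reads nor overwrites the first instruction's destination register.
bool can_pair(const Instr& first, const Instr& second) {
    const bool reads_result = second.src1 == first.dst || second.src2 == first.dst;
    const bool same_dst = second.dst == first.dst;
    return !reads_result && !same_dst;
}

// Count the cycles needed to issue a program strictly in order,
// with at most two instructions issued per cycle.
std::size_t issue_cycles(const std::vector<Instr>& program) {
    std::size_t cycles = 0;
    std::size_t i = 0;
    while (i < program.size()) {
        if (i + 1 < program.size() && can_pair(program[i], program[i + 1])) {
            i += 2;  // both instructions issue in the same cycle
        } else {
            i += 1;  // only the older instruction issues
        }
        ++cycles;
    }
    return cycles;
}
```

Because the check only compares adjacent instructions and never reorders them, the pairing logic stays small, which is the point the paragraph above makes about minimal implementation cost.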

The system also includes a number of fixed-function logic blocks to support graphics and other applications. These units are carefully selected to balance strong performance per watt against the flexibility and programmability of the architecture. A coherent on-die second-level cache enables efficient inter-processor communication and provides high-bandwidth local data access for the CPU cores, which makes writing software easier.

The Larrabee native programming model supports a range of highly parallel applications, including those that use irregular data structures. This allows development of graphics APIs, rapid innovation in new graphics algorithms, and true general-purpose computation on the graphics processor with established PC software development tools. Task scheduling is performed entirely in software rather than in fixed-function logic. Therefore, rendering pipelines and other complex software systems can adjust their resource scheduling based on each workload’s unique computing demand.
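As a rough illustration of what scheduling in software can mean, the following C++ sketch assigns tasks to whichever core currently has the least work queued. The greedy policy, the function name `schedule`, and the abstract cost units are all assumptions made for this example. The article does not describe Larrabee's actual scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy software scheduler (illustrative only): each task is handed to the
// core with the smallest accumulated load. Returns the final load per core.
std::vector<long> schedule(const std::vector<long>& task_costs, std::size_t num_cores) {
    assert(num_cores > 0);
    std::vector<long> load(num_cores, 0);
    for (long cost : task_costs) {
        // min_element returns the first least-loaded core on ties.
        auto least_loaded = std::min_element(load.begin(), load.end());
        *least_loaded += cost;
    }
    return load;
}
```

Since the policy is ordinary code, a renderer could replace it with a different rule for each workload, which a fixed-function scheduler would not allow.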

The Larrabee architecture executes four threads per core, with a separate register set per thread. This permits the use of a simple, efficient in-order pipeline while preserving many of the latency-hiding advantages of more complex out-of-order pipelines when running highly parallel applications.
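The latency-hiding effect can be seen in a toy model. The C++ sketch below simulates a single in-order core that issues one operation per cycle from any ready hardware thread, where every operation is followed by a fixed stall (for example, a memory access). The model, the `simulate` function, and the stall length are assumptions chosen for illustration and do not reflect Larrabee's real timings.

```cpp
#include <cassert>
#include <vector>

// Per-thread state in the toy model.
struct HwThread {
    int remaining;   // operations still to issue
    long ready_at;   // first cycle at which the thread may issue again
};

// Returns the number of cycles until the last operation has been issued.
// Each cycle, the core issues one operation from the next ready thread in
// round-robin order; that thread then stalls for `stall` cycles.
long simulate(int num_threads, int ops_per_thread, int stall) {
    assert(num_threads > 0 && ops_per_thread >= 0 && stall >= 0);
    std::vector<HwThread> threads(num_threads, HwThread{ops_per_thread, 0});
    int unfinished = (ops_per_thread > 0) ? num_threads : 0;
    long cycle = 0;
    int next = 0;  // round-robin starting point
    while (unfinished > 0) {
        for (int k = 0; k < num_threads; ++k) {
            HwThread& t = threads[(next + k) % num_threads];
            if (t.remaining > 0 && t.ready_at <= cycle) {
                --t.remaining;
                t.ready_at = cycle + 1 + stall;
                if (t.remaining == 0) --unfinished;
                next = (next + k + 1) % num_threads;
                break;  // only one issue slot per cycle in this model
            }
        }
        ++cycle;  // an idle cycle if no thread was ready
    }
    return cycle;
}
```

In this model, one thread issuing 4 operations with a 3-cycle stall takes 13 cycles, while four threads issuing 16 operations in total take 16 cycles, because another thread is always ready while the others wait.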

The architecture also uses a 1024-bit-wide, bi-directional ring network (i.e., 512 bits in each direction) that lets agents communicate with each other at low latency, producing very fast communication between cores. Furthermore, it fully supports the IEEE standards for single- and double-precision floating-point arithmetic. Support for these standards is a prerequisite for many types of tasks, including financial applications.
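One reason a bi-directional ring keeps latency low is that a message can travel in whichever direction is shorter. The C++ sketch below computes that shortest hop count and the per-direction width in bytes. The agent numbering and the ring size used in the usage note are assumptions, since the article does not say how many agents sit on the ring.

```cpp
#include <algorithm>
#include <cassert>

// 512 bits per direction, as stated in the article, expressed in bytes.
constexpr int kBytesPerDirection = 512 / 8;

// Shortest number of hops between two agents on a bi-directional ring.
// Agents are assumed to be numbered 0 .. num_agents - 1 around the ring.
int ring_hops(int src, int dst, int num_agents) {
    assert(num_agents > 0);
    // Distance travelling in one direction, normalised to 0 .. num_agents - 1.
    const int forward = ((dst - src) % num_agents + num_agents) % num_agents;
    // Distance travelling the other way round.
    const int backward = num_agents - forward;
    return std::min(forward, backward);
}
```

On a hypothetical 16-agent ring, `ring_hops(0, 13, 16)` is 3 rather than 13, so no message ever needs more than 8 hops, and each hop can carry 64 bytes in that direction.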

Initial implementations of the architecture will target discrete graphics applications, support DirectX and OpenGL, and run existing games and programs. Moreover, highly parallel applications, including scientific and engineering software, will benefit from the Larrabee native C/C++ programming model.

TFOT has previously written about Intel’s new system-on-a-chip, the EP80579 Integrated Processor family, which can be applied to security, mobile Internet devices, storage, communications, and industrial robotics applications. TFOT also recently covered AMD’s upcoming 12-Core Processor, which serves as competition to Intel and is targeted for release in the first half of 2010. You can also check out our article about the Qosmio quad core HD processor from Toshiba Corporation, which uses the streaming media processing power of Toshiba’s Quad Core HD Processor SpursEngine SE1000 and is planned to reach the Japanese market in late July 2008.

Additional information on Intel’s Larrabee can be obtained from Intel’s website.