One of the most interesting questions surrounding the next generation of video game consoles is what sort of GPU designs they might field. This time around, all three manufacturers are rumored to be using AMD technology.
The appropriately named VGLeaks.com website has a full write-up on what it claims is the GPU inside the next-generation Xbox (codenamed Durango). It’s extremely thorough (if this is a fake, it’s an excellent one), but the one thing the article doesn’t do is detail exactly how Durango differs from the Graphics Core Next architecture powering AMD’s Radeon HD 7000 family.
Original image by VGLeaks
The short answer? Not very much. If VGLeaks is correct, Durango’s GPU contains 12 “SC” blocks, where “SC” is analogous to AMD’s CU (Compute Unit). Each SC/CU contains four SIMDs, and each SIMD contains a 16-lane vector unit. That works out to 768 “cores” in AMD parlance, or a bit under 40% as many as you’ll find in a Radeon 7970. Each SIMD contains 256 vector general purpose registers (vGPR) and 512 scalar general purpose registers, just like Tahiti.
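The core count above is straightforward multiplication, sketched here for anyone who wants to check it. The block, SIMD, and lane counts are the leaked figures. The 2,048-core total for the Radeon 7970 is AMD's published spec rather than something from the leak, and the variable names are purely illustrative.

```python
# Leaked Durango figures as reported by VGLeaks.
SC_BLOCKS = 12        # "SC" blocks, analogous to GCN Compute Units
SIMDS_PER_SC = 4      # SIMDs in each SC/CU
LANES_PER_SIMD = 16   # lanes in each SIMD's vector unit

# Not from the leak: AMD's published stream processor count for the 7970.
RADEON_7970_CORES = 2048

durango_cores = SC_BLOCKS * SIMDS_PER_SC * LANES_PER_SIMD
share_of_7970 = durango_cores / RADEON_7970_CORES

print(durango_cores)            # 768
print(f"{share_of_7970:.1%}")   # 37.5%
```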
Texture mapping units (TMUs) and raster outputs (ROPs) appear to be allocated in a similar manner. Durango is said to be capable of outputting 16 pixels per clock cycle. With the chip clocked at 800MHz, that gives the core a fill rate of 12.8 Gpixels/second. Durango’s bilinear texture fetch rate is given as 38.4 Gtexels/second when retrieving data from main memory and 153.6GB/second when reading from the 32MB ESRAM cache.
That main memory figure is in line with the fill rate on a Radeon 7770 (pre-GHz Edition).
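Both peak rates follow from a per-clock throughput multiplied by the clock speed. The sketch below reproduces them. The 16 pixels per clock and the 800MHz clock are the leaked figures, but the texture unit count is an assumption: it supposes four texture units per SC, as in GCN, which happens to match the quoted 38.4 Gtexels/second.

```python
# Leaked figures (VGLeaks): 16 pixels per clock at 800 MHz.
CLOCK_MHZ = 800
PIXELS_PER_CLOCK = 16

# Assumption, not stated in the leak: 4 texture units per SC (as in GCN) x 12 SCs.
TEXELS_PER_CLOCK = 12 * 4


def rate_giga_per_second(per_clock, clock_mhz):
    """Peak throughput in billions of operations per second."""
    return per_clock * clock_mhz / 1000


pixel_fill = rate_giga_per_second(PIXELS_PER_CLOCK, CLOCK_MHZ)
texel_rate = rate_giga_per_second(TEXELS_PER_CLOCK, CLOCK_MHZ)

print(f"{pixel_fill:.1f} Gpixels/s")   # 12.8 Gpixels/s
print(f"{texel_rate:.1f} Gtexels/s")   # 38.4 Gtexels/s
```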
Where are the differences?
Assuming that this information is accurate, the major difference between Durango and a standard GCN is in the cache structure. Modern Radeons have a 16K L1 cache that’s four-way set associative and a shared L2 cache that’s 16-way associative. Durango reportedly has a 64-way associative L1 cache (still at 16K) and an 8-way associative L2.
Here’s why that’s significant. A CPU/GPU cache has two goals: First, it needs to provide the data the processor is looking for as quickly as possible. Second, it needs a high hit rate, meaning the requested data is actually in the cache when the processor asks for it. Increasing the set associativity of a cache increases the chance that the processor will find the data it is looking for, but it also increases search time, because more locations have to be checked on every lookup.
GCN cache design.
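To make the associativity trade-off concrete, here is a toy model of a set-associative cache with least-recently-used replacement. It is a sketch for illustration, not a description of how either GPU's cache works: the 64-byte line size and the LRU policy are assumptions, and the class name and access pattern are invented. It loops over eight addresses that all land in the same set of a 16K cache, first with 4 ways and then with 64.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU cache: `num_lines` lines grouped into sets of `ways` lines each."""

    def __init__(self, num_lines, ways, line_bytes=64):
        assert num_lines % ways == 0
        self.ways = ways                  # tags that must be compared per lookup
        self.line_bytes = line_bytes      # assumed line size
        self.num_sets = num_lines // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes
        index = line % self.num_sets      # which set the line maps to
        tag = line // self.num_sets       # identifies the line within that set
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used line
        entries[tag] = True
        return False


def run(ways, rounds=10):
    # 16K cache with 64-byte lines = 256 lines in total.
    cache = SetAssociativeCache(num_lines=256, ways=ways)
    # Eight addresses 4096 bytes apart all map to set 0 in both configurations.
    addresses = [k * 4096 for k in range(8)]
    for _ in range(rounds):
        for address in addresses:
            cache.access(address)
    return cache.hits, cache.misses


print(run(ways=4))    # (0, 80): eight lines fight over four ways and thrash
print(run(ways=64))   # (72, 8): all eight lines fit, only the first pass misses
```

The pattern is deliberately the worst case for the 4-way configuration. What the model does not capture is the other side of the trade-off: in hardware, each lookup has to compare as many tags as there are ways, which is where the extra search cost comes from.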
The other major difference between GCN and Durango is the amount of L2 cache per SC/CU. The Radeon 7970 has 32 Compute Units and 768K of L2 cache total split into six 128K blocks. Durango has 12 Compute Units and 512K of L2 cache, broken into four 128K blocks. Proportionally, there’s a great deal more L2 cache serving each CU with Durango.
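The per-CU comparison can be worked out directly from the totals quoted above. This is only the division spelled out, with illustrative variable names.

```python
# Totals as quoted in the text, in kilobytes and Compute Units.
RADEON_7970_L2_KB, RADEON_7970_CUS = 768, 32
DURANGO_L2_KB, DURANGO_CUS = 512, 12

radeon_l2_per_cu = RADEON_7970_L2_KB / RADEON_7970_CUS   # 24.0 KB per CU
durango_l2_per_cu = DURANGO_L2_KB / DURANGO_CUS          # about 42.7 KB per CU

print(f"Radeon 7970: {radeon_l2_per_cu:.1f}K of L2 per CU")
print(f"Durango:     {durango_l2_per_cu:.1f}K of L2 per CU")
print(f"Ratio:       {durango_l2_per_cu / radeon_l2_per_cu:.2f}x")
```

On these numbers, each Durango CU would have roughly 1.8 times as much L2 behind it as a CU in the Radeon 7970.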
GCN and Durango also have different memory structures. Modern graphics cards have 1-2GB frame buffers, typically high-speed GDDR5. The next-generation Xbox eschews this for 32MB of ESRAM cache and shared bandwidth to main memory. At first glance, this looks similar to the Xbox 360’s arrangement, but the situation here is, I think, rather different.
The Xbox 360: Original image courtesy of Beyond3D
The 10MB EDRAM buffer inside the Xbox 360 provides the Xenos GPU with a huge amount of bandwidth (256GB/s) but a comparatively small amount of storage. This substantially reduces the amount of GPU traffic that would otherwise be carried over the relatively anemic system bus.
Here’s where I have to pause and note an eyebrow-raising claim for the next-generation Xbox. According to leaked specs, the console will offer 8GB of RAM and 68GB/s of memory bandwidth. To put that in perspective, Intel’s Sandy Bridge-E processors, with quad-channel memory support, only offer up to 51.2GB/s of bandwidth using DDR3-1600. The only way to hit 68GB/s is to use a quad-channel memory controller and DDR3-2133. Is that technically possible? Absolutely. But given that console manufacturers are reportedly pursuing $399 and $499 SKUs for launch, it’s a surprisingly aggressive target.
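The bandwidth figures above come from a simple formula: channels, times bytes per transfer, times transfers per second. The sketch below assumes standard 64-bit (8-byte) DDR3 channels. The function name is illustrative, and the results are theoretical peaks rather than measured throughput.

```python
def peak_bandwidth_gb_s(channels, megatransfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s.

    Assumes each channel is 64 bits (8 bytes) wide, as with standard DDR3.
    """
    return channels * bytes_per_transfer * megatransfers_per_s / 1000


quad_ddr3_1600 = peak_bandwidth_gb_s(channels=4, megatransfers_per_s=1600)
quad_ddr3_2133 = peak_bandwidth_gb_s(channels=4, megatransfers_per_s=2133)

print(f"Quad-channel DDR3-1600: {quad_ddr3_1600:.1f} GB/s")
print(f"Quad-channel DDR3-2133: {quad_ddr3_2133:.1f} GB/s")
```

Quad-channel DDR3-1600 works out to 51.2GB/s, and quad-channel DDR3-2133 to a little over 68GB/s, which lines up with the leaked figure.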
Having that much system bandwidth to play with would mean Durango wouldn’t be as reliant on a small, high-speed data cache to hit its performance targets, and that likely creates additional flexibility for programmers to play with. The VGLeaks website implies that the benefits of the 32MB ESRAM cache are related to latency rather than bandwidth, and that makes sense given the figures we’ve seen.