Here's the article I referenced earlier, where AnandTech projects Scorpio's real-world bandwidth usage to be in the same class as the Pro's despite the significant theoretical difference.
"What makes things especially interesting though is that Microsoft didn’t just switch out DDR3 for GDDR5, but they’re using a wider memory bus as well; expanding it by 50% to 384-bits wide. Not only does this even further expand the console’s memory bandwidth – now to a total of 326GB/sec, or 4.8x the XB1’s DDR3 – but it means we have an odd mismatch between the ROP backends and the memory bus. Briefly, the ROP backends and memory bus are typically balanced 1-to-1 in a GPU, so a single memory controller will feed 1 or two ROP partitions. However in this case, we have a 384-bit bus feeding 32 ROPs, which is not a compatible mapping.
What this means is that at some level, Microsoft is running an additional memory crossbar in the SoC, which would be very similar to what AMD did back in 2012 with the Radeon HD 7970. Because the console SoC needs to split its memory bandwidth between the CPU and the GPU, things aren’t as cut and dry here as they are with discrete GPUs. But, at a high level, what we saw from the 7970 is that the extra bandwidth + crossbar setup did not offer much of a benefit over a straight-connected, lower bandwidth configuration. Accordingly, AMD has never done it again in their dGPUs. So I think it will be very interesting to see if developers can consistently consume more than 218GB/sec or so of bandwidth using the GPU."
So... 326 vs 218. I feel like we're missing something. MS's engineers are not idiots.
The ROP count isn't the whole story on its own; the clock speed affects ROP throughput too. So let's break this down.
32 ROPs at 911 MHz = 218 GB/s
32 ROPs at 1172 MHz = X GB/s
This treats 218 GB/s as the theoretical maximum memory bandwidth the ROP units can use. It doesn't really work that way, but since AnandTech assumes it does, we'll do it too. In practice there is overhead, and you'd never want to spend your entire bandwidth on the raster operations pipeline alone; the CPU also wants some of that juicy bandwidth, you know. But let's just say, for the sake of AnandTech's argument, that the ROPs do fill it up completely, and we do know that the ROPs are affected by the GPU's clock speed.
The 32s we can strip away, as they are the same for either system. So we can just solve for X.
X = (1172/911) * 218 = 280.456641, or roughly 281 GB/s
So if we keep the constants, we arrive at about 281 GB/s of bandwidth for the Scorpio, which makes sense considering it is clocked about 29% faster.
Now if we subtract this value from the 326 GB/s number:
326 GB/s - 281 GB/s = 45 GB/s.
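For anyone who wants to check the numbers, here is a quick sketch of the same back-of-the-envelope math. It assumes, like the post does, that the 218 GB/s figure scales linearly with GPU clock, which is a simplification and not a measured fact. The function and variable names are just made up for illustration.

```python
# Figures taken from the post; linear scaling with clock is an assumption.
PRO_CLOCK_MHZ = 911
SCORPIO_CLOCK_MHZ = 1172
PRO_ROP_BANDWIDTH_GBS = 218        # AnandTech's "218GB/sec or so"
SCORPIO_TOTAL_BANDWIDTH_GBS = 326  # Scorpio's quoted total memory bandwidth


def scale_by_clock(baseline_gbs, base_clock_mhz, new_clock_mhz):
    """Scale a bandwidth figure linearly with clock speed (hypothetical helper)."""
    return baseline_gbs * new_clock_mhz / base_clock_mhz


scorpio_rop_gbs = scale_by_clock(
    PRO_ROP_BANDWIDTH_GBS, PRO_CLOCK_MHZ, SCORPIO_CLOCK_MHZ
)
leftover_gbs = SCORPIO_TOTAL_BANDWIDTH_GBS - scorpio_rop_gbs
clock_gain_pct = (SCORPIO_CLOCK_MHZ / PRO_CLOCK_MHZ - 1) * 100

print(f"Scaled ROP-side bandwidth: {scorpio_rop_gbs:.2f} GB/s")
print(f"Left over from 326 GB/s:   {leftover_gbs:.2f} GB/s")
print(f"Clock speed increase:      {clock_gain_pct:.1f}%")
```

Without rounding up to 281 first, the leftover comes out a little above 45 GB/s, so the conclusion doesn't change.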
So the "crossbar" is 45 GB/s and is probably reserved for the CPU to access the memory without bothering the raster operations pipeline. It can probably also be used by the GPU itself (as it goes through the GPU) for non-graphical tasks, for instance physics calculations. This is actually a smart design, as it would allow the CPU to access and manage its memory via its own dedicated controller/path while having plenty of bandwidth to keep itself fed. 45 GB/s is plenty, even more than enough for a Jaguar. In fact, AMD chips are notorious for performing better with higher-clocked memory/more bandwidth; unlike Intel, where it usually matters less, on an AMD chip it can give you a nice boost. So if it is indeed a dedicated 45 GB/s for the Jaguar+ (the power management stuff is something I really want to learn more about; it seems very interesting, novel, and unlike the normal Jaguar/Puma chips), then we might have found the reason why it punches above its weight.
So yes, by just going "OMG it's 218 just like the Pro", AnandTech doesn't do itself any favours and is actively spreading FUD. They of all people should know that there is more to ROPs than their count alone and that clock speed affects ROP throughput. That is assuming that MS put 32 ROPs on there (that isn't even certain yet). It could be more, and it could be less. I expect MS to cheap out on these things; they've done that too much as of late (OneDrive and Windows phones are recent examples).
PS: With ROPs we are talking about fill rate, not bandwidth. But since AnandTech chose to go this route, I found it funny to stick with it. The ROP fill rate should align with the memory controller's bandwidth. So you can't just say it has <x> bandwidth and, since it is 32 in both, they are the same speed. No: with a higher clock speed the fill rate increases, and with a higher clock speed the memory controller's speed increases as well. It just isn't as simple as "32 = 32"; there are a lot more factors to compare.
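To put a number on the fill rate point, here is a rough sketch using the usual textbook estimate of peak pixel fill rate as ROP count times clock, assuming one pixel per ROP per clock. The 32 ROP count for Scorpio is the assumption from the quoted article, not a confirmed spec, and the helper name is made up for illustration.

```python
def pixel_fill_rate_gpix(rops, clock_mhz, pixels_per_rop_per_clock=1):
    """Peak pixel fill rate in Gpixels/s (simple ROPs x clock estimate)."""
    return rops * clock_mhz * pixels_per_rop_per_clock / 1000


# Same assumed ROP count on both sides, different clocks (figures from the post).
pro_fill = pixel_fill_rate_gpix(32, 911)
scorpio_fill = pixel_fill_rate_gpix(32, 1172)

print(f"32 ROPs at 911 MHz:  {pro_fill:.3f} Gpixels/s")
print(f"32 ROPs at 1172 MHz: {scorpio_fill:.3f} Gpixels/s")
print(f"Ratio: {scorpio_fill / pro_fill:.3f}x")
```

Same ROP count, but the peak fill rate differs by the clock ratio, which is the whole point of "32 = 32" being too simple.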