
  • Petykemano

    veteran

    in reply to Petykemano's post #5859

    IC: One of the big design choices in a modern x86 core is the decode width for its variable-length instruction set. Intel and AMD's highest-performance cores have been 4-wide all the way back to the first Ryzen. However, we're now seeing dual 3-wide designs, or 6-wide designs that rely on the op-cache to save power. Obviously 4-wide was great for AMD in Zen 1, and we're still at 4-wide for Zen 3: where does the roadmap go from here, and from a holistic perspective, how does the decode width of x86 change the fundamental IPC modelling?

    MC: I think it comes back to that balance aspect, in the sense that, with the number of transistors and the smarts we have in our branch predictor, four-wide and the ability to feed it worked fine. But we are going to go wider; you're going to see us go wider, and to be efficient, we'll have the transistors around the front end of the machine to make it the right architectural decision. So it's really the continuous increase in transistors that we get, allowing us to beef up the whole design to continue to get more and more IPC out of it.
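    (As an aside: the effect Clark describes can be sketched with a toy throughput model, where the front end's effective width is a blend of the op-cache path and the legacy decode path. This is purely an illustrative sketch; the widths and hit rates below are assumptions, not AMD figures.)

    ```python
    # Toy front-end model for an x86 core: decoded micro-ops come either
    # from the op-cache (wide, cheap) or the legacy decoders (narrower).
    # The effective supply of micro-ops per cycle is the hit-rate-weighted
    # blend of the two paths. All numbers are illustrative assumptions.

    def effective_frontend_width(decode_width, opcache_width, opcache_hit_rate):
        """Average micro-ops per cycle the front end can supply."""
        return (opcache_hit_rate * opcache_width
                + (1.0 - opcache_hit_rate) * decode_width)

    # A hypothetical 4-wide decoder paired with an 8-wide op-cache path:
    for hit_rate in (0.5, 0.7, 0.9):
        width = effective_frontend_width(4, 8, hit_rate)
        print(f"op-cache hit rate {hit_rate:.0%}: ~{width:.1f} uops/cycle")
    ```

    In this toy model, a narrow decoder can still feed a wide core: most fetches hit the op-cache, and the decoders only have to cover the misses.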

    IC: On the concept of cache: AMD's 3D cache announcement, leading to products coming next year, is obviously quite big. I'm not going to ask you about specific products, but the question is more about how much cache is the right amount? It's a stupidly open-ended question, but that's the way it's intended!

    MC: It's a great question! It's not even just about how much is the right amount, but at what level, at what latency, what is sharing the cache, and so on. As you know, those are all trade-offs that we have to decide how to make, and understand what they will mean for software.

    We have chosen that our core complex is going to have a split L3 (in V-Cache). If we had one gigantic L3 shared across all the threads, then the more you share that giant L3 across threads, the longer the latency of a given thread gets. So you're making a trade-off there: sharing to get more capacity at a lower thread count, versus the latency it takes to get at it. We balanced for hitting that lower latency while providing great capacity at the L3 level. That's the optimization point we've chosen, and as we continue to go forward, getting more cores, and getting more cores in a shared L3 environment, we'll still try to manage that latency so that when there are lower thread counts in the system, you're still getting good latency out of that L3. Then there's the L2: if your L2 is bigger, then you can cut back some on your L3 as well.
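    (The latency-versus-capacity trade-off can be made concrete with the standard average memory access time formula, AMAT = hit rate x hit latency + miss rate x miss latency. The hit rates and cycle counts below are made-up illustrative values, not Zen measurements.)

    ```python
    # Average memory access time (AMAT) for the L3 trade-off Clark
    # describes: a bigger shared L3 hits more often, but each hit costs
    # more cycles. Latencies and hit rates are made-up values, not Zen data.

    def amat(hit_rate, hit_latency, miss_latency):
        """Average cycles per L3 access, counting misses that go to DRAM."""
        return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency

    DRAM = 300  # assumed cycles to memory on an L3 miss

    smaller_faster_l3 = amat(hit_rate=0.60, hit_latency=40, miss_latency=DRAM)
    bigger_slower_l3  = amat(hit_rate=0.75, hit_latency=55, miss_latency=DRAM)

    print(f"smaller, lower-latency L3: {smaller_faster_l3:.1f} cycles")  # 144.0
    print(f"bigger, higher-latency L3: {bigger_slower_l3:.1f} cycles")   # 116.2
    ```

    Which side wins depends entirely on the workload's hit rates, which is exactly the balance being weighed: at low thread counts the latency term dominates, while cache-hungry workloads reward the extra capacity.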

    IC: TSMC has showcased an ability to stack 12 dies with TSVs, similar to the V-Cache concept. Realistically, how many layers could be supported before, say, the thermals of the base die become an issue?

    MC: There's a lot to architecting those levels beyond the base architecture, such as dealing with temperature, and there's a lot of cost too. That probably doesn't answer your question, but different workloads obviously have different sensitivities to the amount of cache, so being flexible with it, being able to have designs both with stacking and without stacking, is critical, because for some use cases [always having stacked cache] would be way too expensive for the performance uplift it would bring. I can't really comment on how many levels of stacking we can do, or will do, but it's an exciting technology that kind of continues to grow.
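    (A crude way to see why base-die thermals bound the layer count: treat each stacked die as a power source and a thermal resistance in series with the path to the heatsink. Every value below is hypothetical; this is a back-of-the-envelope sketch, not a model of any real stack.)

    ```python
    # Crude series thermal-resistance model of a die stack: heat leaving
    # the base die must cross every stacked layer to reach the heatsink,
    # so adding layers raises both the power and the resistance of the
    # path. All values are hypothetical back-of-the-envelope numbers.

    def base_die_temp(n_stacked, p_base, p_per_layer, r_layer, r_sink, t_ambient):
        """Worst-case steady-state temperature of the base (compute) die."""
        total_power = p_base + n_stacked * p_per_layer   # watts
        path_resistance = r_sink + n_stacked * r_layer   # kelvin per watt
        return t_ambient + total_power * path_resistance

    for n in (0, 1, 4, 12):
        t = base_die_temp(n, p_base=60.0, p_per_layer=3.0,
                          r_layer=0.1, r_sink=0.2, t_ambient=35.0)
        print(f"{n:2d} stacked layers: ~{t:.0f} °C at the base die")
    ```

    Even with modest per-layer numbers, the base die's temperature climbs superlinearly with layer count (the product of a power term and a resistance term that both grow with n), which is why thermals, rather than TSV capability, tend to be the practical ceiling.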

    [link]

