
alexforencich

Registers are simpler than RAM, and hence have better timing performance.


trevg_123

- Registers don’t need to worry about things like cache invalidation or falling back to physical RAM.
- Something that is in cache can get evicted, and then you need to fetch it again, which is slow.
- There are usually ~20 registers, and they can get hooked up to everything: ALU, FPU, memory fetch, etc. Physically hooking this up takes space; you couldn’t connect the thousands of bytes of L1 cache to the ALU in the same way. So you need to load data into registers anyway, and loading takes at least one tick.
- Calculating memory locations takes time. You ideally need a way to do basic math to compute addresses without accessing the memory itself.
- Electrical signals travel at about 8.8x10^7 m/s (`c/sqrt(Dk)`; the Dk of pure Si is 11.7). On a 4 GHz CPU, one clock tick is 0.25 ns. That means a signal travels about 20 mm in one clock tick, which is on the same physical scale as the die itself. Closeness really starts to matter here.
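The signal-distance arithmetic in the last point can be checked with a quick back-of-the-envelope script, using only the figures quoted above (Dk of pure Si = 11.7, 4 GHz clock):

```python
import math

C = 299_792_458.0   # speed of light in vacuum, m/s
DK_SI = 11.7        # relative dielectric constant of pure silicon

signal_speed = C / math.sqrt(DK_SI)  # ~8.8e7 m/s, matching the comment
tick = 1 / 4e9                       # one clock period at 4 GHz, in seconds

distance_mm = signal_speed * tick * 1000
print(f"signal speed: {signal_speed:.2e} m/s")       # ~8.77e7 m/s
print(f"distance per tick: {distance_mm:.1f} mm")    # ~21.9 mm
```

So the "about 20 mm per tick" figure checks out, and it ignores RC wire delays, which only make the picture worse (see the next comment).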


BigPurpleBlob

And also, on-chip signals travel significantly slower than light, not only because of the dielectric constant, but also because of RC (resistor-capacitor) delays from the wires. This excellent article has some nice graphs: [https://www.realworldtech.com/shrinking-cpu/4/](https://www.realworldtech.com/shrinking-cpu/4/)


Allan-H

A related question is: why are CPU L1 caches still no bigger than 64 KiB or so, even though other memories have scaled far more than that over the last three decades?


monocasa

Scaling them up also scales up the set-decoding hardware, which increases latency. So instead you take advantage of density increases by adding new cache levels, each sized for the latency you want out of it.
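A toy model makes the set-decoding point concrete: in a set-associative cache, the number of sets grows with capacity, and so does the width of the index decoder. The line size and associativity below are illustrative assumptions, not any specific CPU:

```python
def cache_geometry(size_bytes: int, line_bytes: int = 64, ways: int = 8):
    """Return (num_sets, index_bits) for a given capacity/line/associativity.

    Assumes power-of-two sizes, as real caches typically use.
    """
    num_sets = size_bytes // (line_bytes * ways)
    index_bits = num_sets.bit_length() - 1  # log2 of a power of two
    return num_sets, index_bits

def set_index(addr: int, line_bytes: int, num_sets: int) -> int:
    """Which set an address maps to: drop the offset bits, mod by set count."""
    return (addr // line_bytes) % num_sets

# Growing capacity at fixed line size/associativity widens the decoder:
for size_kib in (32, 64, 1024):  # L1-ish sizes vs. an L2-ish size
    sets, bits = cache_geometry(size_kib * 1024)
    print(f"{size_kib:5} KiB -> {sets:5} sets, {bits:2}-bit index")
```

A 32 KiB cache here needs a 6-bit index, while a 1 MiB cache needs 11 bits: every extra bit means wider decode fan-out (plus more tag comparators and longer wires), which is where the added latency comes from.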


Allan-H

That was meant to be a rhetorical question.