removed by mod
fedilink
hendrik
link
fedilink
English
4
edit-2
4d

AI inference is memory-bound. So, memory bus width is the main bottleneck. I also do AI on an (old) CPU, but the CPU itself is mainly idle and waiting for the memory. I’d say it’ll likely be very slow, like waiting 10 minutes for a longer answer. I believe all the AI people use Apple silicon because of the unified memory and it’s bus width. Or some CPU with multiple memory channels. The CPU speed doesn’t really matter, you could choose a way slower one, because the actual multiplications aren’t what slows it down. But you seem to be doing the opposite, get a very fast processor with just 2 memory channels.

NeatoBuilds
link
fedilink
English
24d

So if I had more memory channels it would be better to have say ollama use the cpu versus the gpu?

hendrik
link
fedilink
English
2
edit-2
4d

Well, the numbers I find on google are: a Nvidia 4090 can transfer 1008 GB/s. And a i9 does something like 90 GB/s. So you’d expect the CPU to be roughly 11 times slower than that GPU at fetching an enormous amount of numbers from memory.

I think if you double the amount of DDR channels for your CPU, and if that also meant your transfer rate would double to 180 GB/s, you’d just be roughly 6 times slower than the 4090. I’m not sure if it works exactly like that. But I’d guess so. And there doesn’t seem to be a recent i9 with quad channel. So you’re stuck with a small fraction of the speed of a GPU if you’re set on an i9. That’s why I mentioned AMD Epyc or Apple processors. Those have a way higher memory throughput.

And a larger model also means more numbers to transfer. So if you now also use your larger memory to use a 70B parameter model instead of an 12B parameter model (or whatever fits on a GPU), your tokens will now come in at a 65th of the speed in the end. Or phrased differently: you don’t wait 6 seconds, but 6 and a half minutes.

battlesheep
creator
link
fedilink
English
24d

The i9-10900 has 4 channels (Quadro-Channel DDR4-2933 (PC4-23466, 93.9GB/​s). would this be better in this way than an i9-14xxx (Dual-Channel DDR5-5600 (PC5-44800, 89.6GB/​s))?

does the numbers (93 GB/s and 89GB/s) mean the speed for a RAM-stick or the speed all together? maybe an old i9-10xxx with 4channel-ram was better than a new dual-channel.

hendrik
link
fedilink
English
1
edit-2
4d

Seems it means all together. (5600MT/s / 1000) x 2 sticks simultaneously x 64bit / 8bits/Byte = 89.6 GB/s

or 2933/1000 x 4 x 64bit / 8 = 93.9 GB/s

so they calculated with double the DDR bus width in the one example, and 4 times the bus width in the other one. That means dual or quad channel is already factored in in those numbers. And yes, the old one seems to be slightly better than the new one. At least regarding memory throughput. I suppose everything else has been improved on. And you need to put in 4 correct RAM sticks to make use of it in the first place.

Create a post

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don’t control.

Rules:

  1. Be civil: we’re here to support and learn from one another. Insults won’t be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it’s not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don’t duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

  • 1 user online
  • 118 users / day
  • 548 users / week
  • 1.39K users / month
  • 3.89K users / 6 months
  • 1 subscriber
  • 4.17K Posts
  • 86.7K Comments
  • Modlog