Self hosting an LLM for research

lemmyvore

Can they not get a TPU on USB, like the Coral Accelerator or something?

Terrasque

It’s less about the calculations and more about memory bandwidth. To generate a token you need to go through all the model data, and that’s usually many, many gigabytes. So the time it takes to read through it all in memory is usually longer than the compute time. GPUs have gigabytes of RAM that’s many times faster than the CPU’s RAM, which is the main reason they’re faster for LLMs.

Most TPUs don’t have much RAM, especially the cheap ones.
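
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes every generated token requires reading all the model weights once, and it ignores compute time, caching, and batching entirely. The model size and the bandwidth figures are round, illustrative assumptions, not measurements of any particular hardware, and the function name is made up for this example.

```python
def max_tokens_per_second(model_size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on tokens/s if each token needs one full pass over the weights."""
    if model_size_gb <= 0 or bandwidth_gb_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_gb_per_s / model_size_gb


# Illustrative assumption: a model whose weights take 20 GB in memory.
MODEL_SIZE_GB = 20.0

# Illustrative, assumed bandwidths (GB/s); real hardware varies widely.
ASSUMED_BANDWIDTH_GB_PER_S = {
    "cpu_ram": 50.0,
    "gpu_vram": 500.0,
}

estimates = {
    name: max_tokens_per_second(MODEL_SIZE_GB, bandwidth)
    for name, bandwidth in ASSUMED_BANDWIDTH_GB_PER_S.items()
}

for name, tokens_per_s in estimates.items():
    print(f"{name}: at most ~{tokens_per_s:.1f} tokens/s")
```

With these assumed numbers the ceiling scales directly with bandwidth, so memory that is ten times faster gives a ten times higher upper bound regardless of how much compute is available. A device that cannot hold the weights at all has to fetch them over an even slower link, which is the problem with small accelerators.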
