@uis@lemm.ee
110d

> If you “talk to” a LLM on your GPU, it is not making any calls to the internet

No, I’m talking about https://en.m.wikipedia.org/wiki/External_memory_algorithm

Unrelated to RAG.

@brucethemoose@lemmy.world
10d

> https://en.m.wikipedia.org/wiki/External_memory_algorithm

Unfortunately that’s not really relevant to LLMs beyond inserting things into the text you feed them. For every single token they predict, they make a full pass through the multi-gigabyte weights. It’s largely memory-bound, and not integrated with any kind of sane external-memory algorithm.
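The memory-bound claim can be sketched with back-of-envelope arithmetic (the bandwidth and model-size figures below are made-up illustrative assumptions, not measurements): if every generated token requires streaming all the weights through the GPU’s memory bus, then memory bandwidth divided by model size gives a rough ceiling on decode speed.

```python
# Rough sketch of why single-stream LLM decoding is memory-bandwidth-bound:
# each token forces a read of (approximately) the entire weight tensor, so
# tokens/sec is capped by bandwidth / model size. Numbers are hypothetical.

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when weight reads dominate."""
    return bandwidth_gb_s / model_size_gb

# e.g. a ~7 GB (7B params, 8-bit) model on a GPU with ~1000 GB/s bandwidth:
print(tokens_per_second(1000, 7))  # ~143 tokens/s ceiling, regardless of FLOPs
```

Real throughput is lower (KV-cache reads, kernel overhead), but the point stands: the bottleneck is moving weights, which is why an external-memory algorithm operating on disk-resident data doesn’t map onto per-token inference.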

There are some techniques that muddy this a bit, like MoE and dynamic LoRA loading, but the principle is the same.
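To illustrate how MoE muddies the picture without changing the principle: only a subset of expert weights is read per token, so the memory-bound estimate applies to the *active* parameter footprint rather than the total. The figures below are hypothetical assumptions for illustration only.

```python
# MoE variant of the same memory-bound estimate: per-token reads cover only
# the active experts, not the full weight set. All numbers are made up.

def memory_bound_tps(bandwidth_gb_s: float, bytes_read_per_token_gb: float) -> float:
    """Decode-speed ceiling given how many GB must be streamed per token."""
    return bandwidth_gb_s / bytes_read_per_token_gb

total_gb, active_gb = 47.0, 13.0  # hypothetical: big MoE, small active slice
dense_ceiling = memory_bound_tps(1000, total_gb)   # if all weights were read
moe_ceiling = memory_bound_tps(1000, active_gb)    # only active experts read
print(dense_ceiling, moe_ceiling)
```

Same formula, smaller numerator of reads per token: the model is still memory-bound, it just streams less of itself per step.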
