Can I run local LLMs on Intel ARC/AMD with 8GB of RAM?

Terrasque

Another thing: llama.cpp supports offloading layers to the GPU, and for non-NVIDIA GPUs you could try the OpenCL backend for that. llama.cpp can also run CPU-only at usable speed. On my system, it takes about 150 ms per token on a 13B model.
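To put the 150 ms per token figure in perspective, here is a rough back-of-the-envelope sketch. The helper names and the 200-token reply length are just assumptions for illustration, and real speed will vary with hardware, model, and prompt length.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


def seconds_for_reply(ms_per_token: float, n_tokens: int) -> float:
    """Estimate generation time for a reply of n_tokens tokens."""
    return ms_per_token * n_tokens / 1000.0


# 150 ms per token is the figure quoted above for a 13B model on CPU.
rate = tokens_per_second(150)
# 200 tokens is an assumed reply length, not a measured value.
reply_time = seconds_for_reply(150, 200)

print(f"{rate:.1f} tokens/s")
print(f"{reply_time:.0f} s for a 200-token reply")
```

So that speed works out to roughly 6 to 7 tokens per second, or about half a minute for a reply of a couple hundred tokens.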

koboldcpp is probably the most straightforward to get running: you don’t have to compile it, it has a simple UI for setting launch parameters, and it also has a web UI for chatting with the bot. And since it uses llama.cpp, it supports everything llama.cpp does, including OpenCL (CLBlast in the launcher).

@MigratingtoLemmy@lemmy.world

Thanks, I’ll take a look

