Altima NEO

I’ve got a 3090, and I feel ya. Even 24 gigs is hitting the cap pretty often and slowing to a crawl once system ram starts being used.

@brucethemoose@lemmy.world

You can’t let it overflow if you’re using LLMs on windows. There’s a toggle for it in the Nvidia settings, and get llama.cpp to offload though its settings (or better yet, use exllama instead).

But…. Yeah. Qwen 32B fits in 24GB perfectly, and it’s great, but 72B really feels like the intelligence tipping point where I can dump so many API models, and that won’t fit in 24GB.

removed by mod

removed by mod

Selfhosted