cross-posted from: https://lemmy.world/post/1474932
Hi there.
I want to run LLMs locally on my server (for better privacy), and was wondering:
- Could I use Intel Arc or AMD GPUs? They are often less expensive, and AMD has open source drivers, which is something I like.
- Would a PCIe Gen 3 x4 slot be enough? (It’s a physical x16 slot that runs at x4 speeds.) This is an important consideration for me.
- Would 8GB of RAM on the GPU (I believe it’s called VRAM?) be enough?
I’m looking at language models to train on my Reddit and Lemmy content, with the aim of making them write like me (and maybe even better than me? Who knows). I don’t quite know which models I will train, or how I will do so (I certainly won’t be writing anything from scratch). But with the explosion of FOSS AI models, maybe something like this would be possible within the hardware constraints I mentioned above?
Does the speed of the connection between the GPU and the CPU really matter in such applications?
Thanks!
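On the question of link speed, here is a rough back-of-the-envelope sketch in Python. It assumes the theoretical PCIe 3.0 rate of 8 GT/s per lane with 128b/130b encoding (about 0.985 GB/s per lane). The function names and the 8 GB example size are made up for illustration, and real throughput will be lower than these figures. It only estimates how long copying a model's weights to the GPU takes over x4 versus x16; it says nothing about speed once the model is loaded.

```python
# Theoretical PCIe 3.0 throughput per lane:
# 8 GT/s, 128b/130b encoding, 8 bits per byte -> ~0.985 GB/s.
PCIE3_GB_S_PER_LANE = 8.0 * (128 / 130) / 8


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a lane count."""
    return PCIE3_GB_S_PER_LANE * lanes


def load_time_seconds(model_size_gb: float, lanes: int) -> float:
    """Best-case time to copy model_size_gb of weights across the link."""
    return model_size_gb / pcie3_bandwidth_gb_s(lanes)


if __name__ == "__main__":
    model_size_gb = 8.0  # hypothetical: a model that just fills an 8 GB card
    for lanes in (4, 16):
        print(
            f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s, "
            f"{load_time_seconds(model_size_gb, lanes):.1f} s to load "
            f"{model_size_gb:.0f} GB"
        )
```

If the whole model fits in VRAM, this one-time copy is typically where a narrower link is felt most.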
Just ask if you want some clarification.
As for the GPU, I’m waiting… IMHO they’re just too expensive right now. And sadly, Nvidia is currently the only game in town. Some software works on AMD, but just about everything works on Nvidia.
That said, my PC has 48GB of system RAM, and I can run 65B models on it at about 1s per token, with a few layers offloaded to my 10GB GPU. Running them entirely on GPU would otherwise require 2x 3090 or 2x 4090 (2x 4090 would be about 20x faster though…)
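To make the numbers above easier to sanity-check, here is a minimal Python sketch of how much memory the weights alone need at different precisions. The commenter does not say which precision they run, so the 4-bit case is an assumption. The helper name is hypothetical, and the estimate ignores the context cache and any runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # 16-bit floats, 8-bit and 4-bit quantization (the 4-bit case is an assumption)
    for bits in (16, 8, 4):
        print(f"65B at {bits}-bit: ~{weight_memory_gb(65, bits):.1f} GB")
```

Under that assumption, a 65B model at 4 bits per weight comes to roughly 32.5 GB, which would fit in 48GB of system RAM plus a 10GB GPU, or in the 48GB of combined VRAM on two 24GB cards.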
I certainly will! I’m just not very good with maths either, and although I know what floating point numbers are, I would have to read more about it to make sure I understand your comment.
Those are some insane requirements to run models haha. How long does it take for you to train your models on datasets (for me, a “dataset” would be my entire Reddit/Lemmy comment history)?