I’m currently shopping around for something a bit faster than ollama and because I could not get it to use a different context and output length, which seems to be a known and long ignored issue. Somehow everything I’ve tried so far did miss one or more critical features, like:
I’d be happy about any recommendations!
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don’t control.
Rules:
Be civil: we’re here to support and learn from one another. Insults won’t be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it’s not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don’t duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.
Resources:
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
I’m also aware of LocalAI with automatic model swapping and OpenAI compatible API.
But unless I’m mistaken, they all use ggml behind the scenes? So you might want to look for something that uses vllm or exllama or something if you want a completely different backend.
Vllm unfortunately doesn’t support switching the model without a restart.
I would not recommend LocalAI. There documentation is somewhat lacking and it’s an all in one utility with many moving parts. The parts also tend to break, quite often.