Serge
What is Serge?
Serge is a chat interface built on llama.cpp for running GGUF models. No API keys, entirely self-hosted!
- 🌐 SvelteKit frontend
- 💾 Redis for storing chat history & parameters
- ⚙️ FastAPI + LangChain for the API, wrapping calls to llama.cpp using the Python bindings
⚡️ Quick start
🐳 Docker:
🐙 Docker Compose:
Then, just visit http://localhost:8008. You can find the API documentation at http://localhost:8008/api/docs.
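To confirm that the container is actually serving requests, a small script can probe both URLs. This is a minimal sketch using only the Python standard library. The helper names `serge_urls` and `is_reachable` are hypothetical and not part of Serge, and the check assumes both pages answer a plain GET request with a non-error status.

```python
import http.client
import urllib.error
import urllib.request


def serge_urls(host="localhost", port=8008):
    """Build the web UI and API docs URLs mentioned above (hypothetical helper)."""
    base = f"http://{host}:{port}"
    return {"ui": base, "api_docs": f"{base}/api/docs"}


def is_reachable(url, timeout=3.0):
    """Return True if a GET on `url` succeeds with a 2xx/3xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name, url in serge_urls().items():
        state = "up" if is_reachable(url) else "not reachable"
        print(f"{name}: {url} -> {state}")
```

If either URL is reported as not reachable right after startup, the container may still be initializing, so it can be worth retrying after a few seconds.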
🖥️ Windows
Ensure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models.
☁️ Kubernetes
Instructions for setting up Serge on Kubernetes can be found in the wiki.
🧠 Supported Models
Category | Models |
---|---|
Alfred | 40B-1023 |
Code | 13B, 33B |
CodeLLaMA | 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python |
Falcon | 7B, 7B-Instruct, 40B, 40B-Instruct |
LLaMA 2 | 7B, 7B-Chat, 7B-Coder, 13B, 13B-Chat, 70B, 70B-Chat, 70B-OASST |
Med42 | 70B |
Medalpaca | 13B |
Medicine-LLM | 13B |
Meditron | 7B, 7B-Chat, 70B |
Mistral | 7B-v0.1, 7B-Instruct-v0.2, 7B-OpenOrca |
MistralLite | 7B |
Mixtral | 8x7B-v0.1, 8x7B-Dolphin-2.7, 8x7B-Instruct-v0.1 |
Neural-Chat | 7B-v3.3 |
Notus | 7B-v1 |
Notux | 8x7b-v1 |
OpenChat | 7B-v3.5-1210 |
OpenLLaMA | 3B-v2, 7B-v2, 13B-v2 |
Orca 2 | 7B, 13B |
Phi 2 | 2.7B |
Python Code | 13B, 33B |
PsyMedRP | 13B-v1, 20B-v1 |
Starling LM | 7B-Alpha |
Vicuna | 7B-v1.5, 13B-v1.5, 33B-v1.3, 33B-Coder |
WizardLM | 7B-v1.0, 13B-v1.2, 70B-v1.0 |
Zephyr | 3B, 7B-Alpha, 7B-Beta |
Additional models can be requested by opening a GitHub issue.