ExLlama 3.0k
A fast inference library for running LLMs locally on modern consumer-class GPUs.
vLLM 18.9k
A high-throughput and memory-efficient inference and serving engine for LLMs.
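Beyond serving, vLLM also exposes an offline Python API for batch generation; a minimal sketch, where the model name is only an example and is downloaded from Hugging Face on first use:

```python
# Minimal sketch of vLLM's offline Python API; the model is an example.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```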
gpt4all 64.8k
Open-source large language models that run locally on your CPU and nearly any GPU.
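The gpt4all Python bindings make local generation a few lines of code; a minimal sketch, where the model filename is just an example (it is downloaded automatically on first use):

```python
from gpt4all import GPT4All

# Example model file; GPT4All downloads it on first use.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    print(model.generate("Why is the sky blue?", max_tokens=128))
```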
KoboldCpp ROCm 276
AI Inferencing at the Edge. A simple one-file way to run various GGML models with KoboldAI's UI, using AMD ROCm offloading.
KoboldCpp 3.8k
KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models.
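Once started with a model, KoboldCpp serves an HTTP generation API; a hedged sketch of calling it from Python, assuming a default local install on port 5001:

```python
import requests

# Assumes KoboldCpp is running locally on its default port (5001).
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Once upon a time", "max_length": 80, "temperature": 0.7},
)
print(resp.json()["results"][0]["text"])
```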
Mudler LocalAGI 310
LocalAGI is a small virtual assistant that you can run locally, made by the LocalAI author and powered by it.
Mudler LocalAI 19.9k
Drop-in replacement for the OpenAI API, running LLMs on consumer-grade hardware. No GPU required.
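Because LocalAI speaks the OpenAI API, the standard OpenAI Python client can simply point at it; a minimal sketch, assuming a default local install on port 8080 and a model name you have configured:

```python
from openai import OpenAI

# base_url assumes LocalAI's default port; the API key is not checked.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-4",  # an alias mapped to a local model in LocalAI's config
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(resp.choices[0].message.content)
```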
Ollama 63.3k
Ollama is an LLM backend that lets you get up and running with large language models locally.
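With the server running and a model pulled (e.g. `ollama pull llama3`), the official Python client makes chat a few lines; a minimal sketch:

```python
import ollama

# Assumes the Ollama server is running and `llama3` has been pulled.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize what Ollama does."}],
)
print(response["message"]["content"])
```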
Text Gen WebUI 36.5k
Oobabooga Text Generation WebUI is a Gradio browser interface for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
Llama.cpp 57.4k
The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.
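llama.cpp itself is a C/C++ project, but the community llama-cpp-python bindings expose the same quantized GGUF models to Python; a minimal sketch, with the model path as a placeholder:

```python
from llama_cpp import Llama  # community bindings, installed separately

# The GGUF path is a placeholder for any 4-bit quantized model file.
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: What does llama.cpp do? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```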