Ollama

What is Ollama?

Ollama is an LLM backend that lets you get up and running with large language models locally.

Run Llama 2, Code Llama, and other models. Customize and create your own.


Install Ollama

macOS

Download the latest release from GitHub.

Linux & WSL2

Terminal window
curl https://ollama.ai/install.sh | sh

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.
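To start the server in a container, here is a minimal sketch; the volume and container names are arbitrary, and the GPU variant assumes the NVIDIA Container Toolkit is installed:

Terminal window
# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Nvidia GPU (requires the NVIDIA Container Toolkit)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Once the container is up, models can be run inside it, for example with docker exec -it ollama ollama run llama2.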


Quickstart

To run and chat with Llama 2:

Terminal window
ollama run llama2
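This drops you into an interactive chat session. Slash commands control the session itself; /? lists them, and /bye exits:

Terminal window
>>> /bye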

Manual Install on Linux

Download the ollama binary

Ollama is distributed as a self-contained binary. Download it to a directory in your PATH:

Terminal window
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama

Create a user for Ollama:

Terminal window
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama

Create a service file in /etc/systemd/system/ollama.service:

[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target

Then start the service:

Terminal window
sudo systemctl daemon-reload
sudo systemctl enable ollama

Install CUDA drivers (optional – for Nvidia GPUs)

Download and install CUDA.

Verify that the drivers are installed by running the following command, which should print details about your GPU:

Terminal window
nvidia-smi

Start Ollama

Start Ollama using systemd:

Terminal window
sudo systemctl start ollama
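To confirm that the service is running, a quick (optional) status check:

Terminal window
sudo systemctl status ollama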

Update Ollama

Update ollama by running the install script again:

Terminal window
curl https://ollama.ai/install.sh | sh

Or by downloading the ollama binary:

Terminal window
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama

Viewing logs

To view logs of Ollama running as a startup service, run:

Terminal window
journalctl -u ollama
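To follow the log output live, journalctl's usual -f flag applies:

Terminal window
journalctl -u ollama -f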

Uninstall Ollama

Remove the ollama service:

Terminal window
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service

Remove the ollama binary from your bin directory (either /usr/local/bin, /usr/bin, or /bin):

Terminal window
sudo rm $(which ollama)

Remove the downloaded models and Ollama service user:

Terminal window
sudo rm -r /usr/share/ollama
sudo userdel ollama

Model library

Ollama supports a list of open-source models available at ollama.ai/library.

Here are some example open-source models that can be downloaded:

| Model              | Parameters | Size  | Download                     |
| ------------------ | ---------- | ----- | ---------------------------- |
| Llama 2            | 7B         | 3.8GB | ollama run llama2            |
| Mistral            | 7B         | 4.1GB | ollama run mistral           |
| Dolphin Phi        | 2.7B       | 1.6GB | ollama run dolphin-phi       |
| Phi-2              | 2.7B       | 1.7GB | ollama run phi               |
| Neural Chat        | 7B         | 4.1GB | ollama run neural-chat       |
| Starling           | 7B         | 4.1GB | ollama run starling-lm       |
| Code Llama         | 7B         | 3.8GB | ollama run codellama         |
| Llama 2 Uncensored | 7B         | 3.8GB | ollama run llama2-uncensored |
| Llama 2 13B        | 13B        | 7.3GB | ollama run llama2:13b        |
| Llama 2 70B        | 70B        | 39GB  | ollama run llama2:70b        |
| Orca Mini          | 3B         | 1.9GB | ollama run orca-mini         |
| Vicuna             | 7B         | 3.8GB | ollama run vicuna            |
| LLaVA              | 7B         | 4.5GB | ollama run llava             |

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.


Customize a model

Import from GGUF

Ollama supports importing GGUF models in the Modelfile:

  1. Create a file named Modelfile, with a FROM instruction pointing to the local filepath of the model you want to import.

Modelfile
FROM ./vicuna-33b.Q4_0.gguf
  2. Create the model in Ollama

Terminal window
ollama create example -f Modelfile
  3. Run the model

Terminal window
ollama run example

Import from PyTorch or Safetensors

See the guide on importing models for more information.

Customize a prompt

Models from the Ollama library can be customized with a prompt. For example, to customize the llama2 model:

Terminal window
ollama pull llama2

Create a Modelfile:

Modelfile
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Next, create and run the model:

Terminal window
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.

For more examples, see the examples directory. For more information on working with a Modelfile, see the Modelfile documentation.
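Modelfiles accept further PARAMETER directives beyond temperature, such as num_ctx for the context window. A minimal sketch, with illustrative values and an arbitrary model name (llama2-4k):

Terminal window
# write a Modelfile that raises the context window, then build a model from it
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
EOF
ollama create llama2-4k -f ./Modelfile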


CLI Reference

Create a model

ollama create is used to create a model from a Modelfile.

Terminal window
ollama create mymodel -f ./Modelfile

Pull a model

Terminal window
ollama pull llama2

This command can also be used to update a local model. Only the diff will be pulled.

Remove a model

Terminal window
ollama rm llama2

Copy a model

Terminal window
ollama cp llama2 my-llama2

Multiline input

For multiline input, you can wrap text with """:

Terminal window
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

Multimodal models

Terminal window
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.

Pass in prompt as arguments

Terminal window
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

List models on your computer

Terminal window
ollama list

Start Ollama

ollama serve is used when you want to start ollama without running the desktop application.
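By default the server listens on 127.0.0.1:11434. If it needs to be reachable from other machines, the OLLAMA_HOST environment variable changes the bind address (example value shown):

Terminal window
OLLAMA_HOST=0.0.0.0:11434 ollama serve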


Building

Install cmake and go:

Terminal window
brew install cmake go

Then generate dependencies:

Terminal window
go generate ./...

Then build the binary:

Terminal window
go build .

More detailed instructions can be found in the developer guide.

Running local builds

Next, start the server:

Terminal window
./ollama serve

Finally, in a separate shell, run a model:

Terminal window
./ollama run llama2

REST API

Ollama has a REST API for running and managing models.

Generate a response

Terminal window
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
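Responses stream back as a series of JSON objects by default; setting "stream": false in the request body returns a single response object instead:

Terminal window
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'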

Chat with a model

Terminal window
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
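Other endpoints follow the same pattern; for example, the models available locally can be listed over the API (the /api/tags endpoint mirrors ollama list):

Terminal window
curl http://localhost:11434/api/tags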

See the API documentation for all endpoints.


Community Integrations

Community integrations are available across several categories: Web & Desktop, Terminal, Database, Package managers, Libraries, Mobile, and Extensions & Plugins.