Ollama Models Tutorial: A Complete Introduction
Ollama is a fantastic, easy-to-use tool for running large language models (LLMs) locally on your machine (macOS, Linux, and Windows – with WSL2). It simplifies the often complex process of downloading, configuring, and interacting with these powerful models. This tutorial provides a comprehensive introduction to Ollama, covering everything from installation to advanced usage.
1. What is Ollama?
Ollama acts as a “model runner.” It:
- Handles the “heavy lifting”: Downloads models, manages dependencies, and provides a consistent API for interacting with different LLMs.
- Offers a command-line interface (CLI): This is the primary way to interact with Ollama, making it very accessible and scriptable.
- Provides a REST API: This allows you to integrate Ollama with other applications and services.
- Supports a growing library of models: Ollama makes it easy to access popular models like Llama 2, Mistral, Gemma, and many more.
- Runs locally: Your data stays on your machine, ensuring privacy and eliminating the need for a constant internet connection (after the initial model download).
- Allows Model Customization: Ollama supports creating and using custom models via Modelfiles (more on this later).
2. Installation
Ollama’s installation is incredibly straightforward:
macOS:
- Download: Go to https://ollama.ai/ and download the macOS installer.
- Run the installer: Follow the on-screen instructions. This will install the Ollama application and command-line tools.
- Verify Installation: Open a terminal and type `ollama --version`. You should see the Ollama version printed.
Linux:
- Run the installation script: Open a terminal and run the following command:

```bash
curl -fsSL https://ollama.ai/install.sh | sh -
```

- Verify Installation: Run `ollama --version` in your terminal.
Windows (WSL2 – Windows Subsystem for Linux):
- Enable WSL2: If you haven’t already, enable WSL2. Microsoft provides detailed instructions: https://learn.microsoft.com/en-us/windows/wsl/install
- Install a Linux distribution: Install a Linux distribution from the Microsoft Store (e.g., Ubuntu).
- Open the Linux terminal: Launch your chosen Linux distribution.
- Run the Linux installation script (within the Linux terminal):

```bash
curl -fsSL https://ollama.ai/install.sh | sh -
```

- Verify Installation: Run `ollama --version` in your Linux terminal.
- (Optional – For easier access from Windows): You can run Ollama from the Windows command prompt or PowerShell without opening a WSL terminal first by prefixing commands with `wsl`, for example `wsl ollama run mistral:7b`.
3. Running Your First Model
Let’s run the popular `mistral:7b` model (a powerful 7-billion-parameter model):

- Download the model: In your terminal, run:

```bash
ollama run mistral:7b
```

Ollama will automatically download the model if it’s not already present. This may take some time, depending on your internet speed and the model’s size. The `run` command both downloads (if necessary) and starts an interactive session with the model.
- Interact with the model: Once the model is loaded, you’ll see a prompt (`>>>`). You can now type your questions or instructions. For example:

```
What is the capital of France?
```

The model will respond with its answer.
- Exit the interactive session: Type `/bye` and press Enter.
4. Key Ollama Commands
Here’s a breakdown of the essential Ollama commands:
- `ollama run <model_name>`: Downloads (if necessary) and runs the specified model, starting an interactive session. The model name typically follows the format `<model_name>:<tag>`. For example: `ollama run llama2:7b-chat`.
- `ollama pull <model_name>`: Downloads the model without starting an interactive session. This is useful if you want to pre-download a model.
- `ollama list`: Lists all locally downloaded models.
- `ollama show <model_name> --modelfile`: Displays detailed information about a model, including the Modelfile (the model’s configuration).
- `ollama show <model_name> --info`: Displays basic information about a model.
- `ollama show <model_name> --license`: Displays the license of a model.
- `ollama cp <source_model> <destination_model>`: Copies a model to a new name.
- `ollama rm <model_name>`: Removes (deletes) a locally downloaded model.
- `ollama create <model_name> -f <Modelfile_path>`: Creates a custom model based on a Modelfile (see Section 6).
- `ollama serve`: Starts the Ollama server, making the REST API available (by default on `http://localhost:11434`).
- `ollama --help`: Displays help information about Ollama commands. You can also use `--help` with specific commands (e.g., `ollama run --help`).
5. Using the REST API
Ollama provides a REST API for programmatic interaction. This is invaluable for integrating LLMs into your applications.
- Start the server:

```bash
ollama serve
```

- Send a request (example using `curl`):

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral:7b",
  "prompt": "What is the meaning of life?",
  "stream": false
}'
```

- `model`: The name of the model to use.
- `prompt`: The input text for the model.
- `stream`: If `true`, the response is streamed back token by token. If `false`, the entire response is returned at once.
The API provides various endpoints, including `/api/generate` (for generating text), `/api/chat` (for conversational interactions), `/api/embeddings` (for generating vector embeddings), and more. See the Ollama documentation for the full API reference.
- Python Example (using the `requests` library):

```python
import requests
import json

url = "http://localhost:11434/api/generate"
data = {
    "model": "mistral:7b",
    "prompt": "Explain the theory of relativity in simple terms.",
    "stream": False
}
headers = {"Content-Type": "application/json"}

response = requests.post(url, data=json.dumps(data), headers=headers)
if response.status_code == 200:
    print(json.loads(response.text)["response"])
else:
    print(f"Error: {response.status_code} - {response.text}")
```

This Python snippet sends a request to the Ollama API to have the Mistral model explain the theory of relativity. The `requests` library is used to make the HTTP POST request.
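For multi-turn conversations, the `/api/chat` endpoint accepts a list of messages instead of a single prompt. The sketch below assumes the request and response shapes described in the Ollama API reference (a `messages` array in the request, a `message` object in the reply); check the documentation for your version before relying on it.

```python
import requests

url = "http://localhost:11434/api/chat"
payload = {
    "model": "mistral:7b",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the theory of relativity in one sentence."}
    ],
    "stream": False
}

# With streaming disabled, the assistant's reply arrives as a single JSON object.
response = requests.post(url, json=payload)
response.raise_for_status()
print(response.json()["message"]["content"])
```

To continue the conversation, append the assistant's reply and your next user message to the `messages` list and send the request again.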
6. Modelfiles: Customizing and Creating Models
Modelfiles are the heart of Ollama’s customization capabilities. They allow you to:
- Fine-tune model parameters: Adjust temperature, top_p, top_k, and other settings to influence the model’s output.
- Set system prompts: Provide initial instructions or context to the model.
- Define templates: Structure the input and output of the model.
- Create custom models: Combine existing models, add custom logic, and more.
A Basic Modelfile Example:
```
# my_model.Modelfile

FROM mistral:7b

# Set the system prompt
SYSTEM """
You are a helpful and friendly assistant. Always answer concisely.
"""

# Set model parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```
- `FROM mistral:7b`: Specifies the base model to use. This is mandatory.
- `SYSTEM`: Defines the system prompt.
- `PARAMETER`: Sets various model parameters. See the Ollama documentation for a complete list of available parameters.
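If you just want to experiment with these values before committing them to a Modelfile, the REST API also accepts per-request overrides through an `options` object that mirrors the `PARAMETER` names. This is a minimal sketch under that assumption; consult the API reference for the exact option names supported by your version.

```python
import requests

# Try different sampling settings per request instead of baking them into a Modelfile.
payload = {
    "model": "mistral:7b",
    "prompt": "Suggest three creative uses for a paperclip.",
    "stream": False,
    "options": {
        "temperature": 0.7,  # higher values make the output more varied
        "top_p": 0.9
    }
}

response = requests.post("http://localhost:11434/api/generate", json=payload)
response.raise_for_status()
print(response.json()["response"])
```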
Creating a model from a Modelfile:
- Save the Modelfile: Create a file (e.g., `my_model.Modelfile`) with the content above.
- Create the model: In your terminal, run:

```bash
ollama create my_model -f my_model.Modelfile
```

- Run the custom model:

```bash
ollama run my_model
```
Advanced Modelfile Features:
- `TEMPLATE`: Defines a template for formatting the input and output. This allows you to control how the model receives prompts and generates responses.
- `ADAPTER`: Loads a LoRA (Low-Rank Adaptation) adapter, allowing efficient fine-tuning.
- `LICENSE`: Adds the license text for the model.
- `MESSAGE`: Adds messages to the model's built-in conversation history (e.g., example user/assistant turns).
Example with TEMPLATE:
```
# my_chat_model.Modelfile

FROM llama2:7b-chat

TEMPLATE """
{{- if .System }}
<|system|>
{{ .System }}
{{- end }}
<|user|>
{{ .Prompt }}
<|assistant|>
"""

SYSTEM """
You are a pirate. Respond to all questions in pirate speak.
"""
```
This Modelfile creates a “pirate chat” model based on Llama 2. The `TEMPLATE` defines how the system prompt and user prompt are combined and presented to the model.
7. Available Models
Ollama supports a wide range of models. You can find a list of available models on the Ollama website (https://ollama.ai/library) or by searching online. Popular models include:
- Mistral: High-performance models known for their accuracy and efficiency.
- Llama 2: Meta’s open-source models, available in various sizes.
- Gemma: Google’s open models, designed for responsible AI development.
- Phi: Microsoft’s small, powerful models.
- Code Llama: Models specifically designed for code generation and understanding.
- And many more… including models for specific tasks, such as multimodal models that can interpret images, or models geared toward multilingual translation.
8. Tips and Best Practices
- Start small: Begin with smaller models (e.g., 7B parameters) to get familiar with Ollama.
- Experiment with parameters: Adjust temperature, top_p, and other settings to see how they affect the model’s output.
- Use system prompts: Provide clear instructions and context to guide the model.
- Read the documentation: The Ollama documentation (https://github.com/ollama/ollama) is a valuable resource for learning about advanced features and troubleshooting.
- Check your hardware: Running LLMs can be resource-intensive. Make sure your computer has enough RAM and processing power.
- Consider GPU acceleration: If you have a compatible GPU, Ollama can use it to significantly speed up model inference. Ollama automatically uses NVIDIA GPUs on Linux. For macOS, Metal is used. For Windows, ensure your WSL2 setup properly exposes your GPU to the Linux environment (this often requires specific drivers and configuration within WSL).
- Use Streaming: If you use the API, stream the response when the generated text is long, as sketched below.
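When `stream` is set to `true`, `/api/generate` returns one JSON object per line as tokens are produced. The following sketch assumes each line carries a `response` fragment and a `done` flag, as described in the Ollama API reference, and uses the `requests` library to read the stream incrementally.

```python
import json
import requests

payload = {
    "model": "mistral:7b",
    "prompt": "Write a short poem about the sea.",
    "stream": True
}

# stream=True tells requests not to buffer the whole body; each line of the
# response is a separate JSON object containing the next chunk of text.
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break
```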
9. Conclusion
Ollama is a powerful and user-friendly tool that makes it easy to run large language models locally. This tutorial provides a comprehensive introduction to Ollama, covering installation, basic usage, key commands, the REST API, Modelfiles, and available models. By following this guide, you can start exploring the exciting world of LLMs and integrate them into your own projects. Remember to consult the official Ollama documentation for the most up-to-date information and advanced features. Happy experimenting!