Agentic Playbook
ollama·Beginner·Last tested: 2026-03·~5 min read

Ollama

Ollama is a local runtime for large language models. It provides a simple interface to run and manage models like Gemma, Qwen, DeepSeek, and Llama locally without requiring cloud APIs.

Key Features

  • Local model execution - Run LLMs entirely on your machine
  • REST API - HTTP endpoints for integration with applications
  • Multi-platform support - Works on macOS, Windows, Linux, and Docker
  • Model library - Access to dozens of open models from the community
  • Language bindings - Official Python and JavaScript SDKs
  • Built-in integrations - Serve local models to coding tools such as Claude Code and Codex

Installation

macOS:

Download the Ollama app from ollama.com/download/mac.

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:

Download and run OllamaSetup.exe from ollama.com/download/windows.

Docker:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
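
The -v flag keeps downloaded models in a named volume (the container stores them under /root/.ollama) so they survive restarts, and -p 11434:11434 exposes the API on the host.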

Info

The installer automatically starts the Ollama service on port 11434.
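
Once it is running, a quick sanity check is to open http://localhost:11434 in a browser or curl it; a live server replies with "Ollama is running".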

Basic Usage

Run a model:

ollama run gemma3
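
A few other everyday commands: ollama pull gemma3 downloads a model without starting a chat, ollama list shows what is installed, and ollama rm gemma3 removes it.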

REST API:

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}'
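
Setting "stream": false asks for one complete JSON response; omit it and /api/chat streams the reply as a series of JSON objects, one chunk at a time.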

Python SDK:

# Requires the official client: pip install ollama
from ollama import chat

response = chat(model='gemma3', messages=[
  {'role': 'user', 'content': 'Why is the sky blue?'}
])
print(response.message.content)
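
For interactive use you often want tokens as they arrive; the same chat helper supports this with stream=True. A minimal sketch, assuming the ollama package is installed and the gemma3 model has been pulled:

from ollama import chat

# stream=True yields partial responses as the model generates them
stream = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    # each chunk carries the next piece of the assistant's reply
    print(chunk.message.content, end='', flush=True)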

Tip

Browse available models at ollama.com/library, which includes models from Meta, Google, Microsoft, and others.
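
Most entries come in several sizes selected by tag - for example, ollama run gemma3:27b runs the 27-billion-parameter variant.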

Notable Details

  • License: MIT
  • Language: Go
  • Community: 166K+ GitHub stars, active Discord and Reddit communities
  • Backend: Built on llama.cpp for efficient inference
  • Ecosystem: Extensive community integrations including web UIs, desktop apps, and IDE extensions