Local AI models with Ollama

Why local AI models?

Running AI models locally on your desktop provides several important advantages:

| Aspect  | Local models                     | Cloud API                      |
| ------- | -------------------------------- | ------------------------------ |
| Privacy | Your data stays on your computer | Data sent to cloud servers     |
| Cost    | Free after installation          | Pay per API call               |
| Speed   | No internet latency              | Depends on connection          |
| Offline | Works without internet           | Requires internet connection   |
| Control | Full control over your data      | Data handled by third parties  |

Why Ollama?

Ollama is a widely used open-source tool for running AI models locally. Key features:

  • ✅ Easy installation and setup
  • ✅ Thousands of available models
  • ✅ Lightweight and fast
  • ✅ Cross-platform (Windows, macOS, Linux)
  • ✅ Simple model management
  • ✅ OpenAI-compatible API (see the example after this list)
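For example, once Ollama is installed and a model has been pulled (both covered below), any OpenAI-style client can talk to it over HTTP on Ollama's default port, 11434. A minimal sketch with curl, assuming you have already pulled qwen2.5:7b:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'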

Installation

1. Download and install Ollama

Visit ollama.ai and download the installer for your operating system.

2. Verify installation

After installation, verify Ollama is working:

ollama --version

On Windows, restart your terminal after installation so the ollama command is available on your PATH.
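You can also list the models Ollama knows about; on a fresh install the list will be empty:

ollama list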

3. Pull a model

Download a model (example with qwen2.5):

ollama pull qwen2.5:7b

This downloads the model; depending on your internet speed, it may take several minutes.
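To check that the model actually works, run it directly from the terminal. ollama run starts an interactive chat (type /bye to exit), or answers a one-off prompt passed as an argument:

ollama run qwen2.5:7b "Explain in one sentence what a local AI model is."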

Model installation

Quick start

For the best balance between quality and performance, we recommend:

ollama pull qwen3-vl:4b

This is the recommended model because it:

  • ✅ Requires only 4GB of RAM
  • ✅ Includes vision capabilities (see images)
  • ✅ Offers good speed and quality balance
  • ✅ Works well on most hardware
  • ✅ Fully open and free to use
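To try it out, start an interactive session. For vision-capable models, the Ollama CLI lets you include an image by adding its file path to your prompt; the path below is just a placeholder:

ollama run qwen3-vl:4b
>>> Describe this image: ./photos/example.png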

Install other models

You can install additional models:

# Other popular models
ollama pull llama2:7b # Excellent all-purpose model
ollama pull mistral:7b # Fast and capable
ollama pull neural-chat:7b # Great for conversations

Hardware recommendations

Ollama works on various hardware. Here's what you need for different models:

| Model size | RAM needed         | Graphics card            | Performance             |
| ---------- | ------------------ | ------------------------ | ----------------------- |
| 3-4B       | 4 GB minimum       | Not required             | Fast (5-10 tokens/sec)  |
| 7B         | 8 GB recommended   | Optional (faster)        | Good (2-5 tokens/sec)   |
| 13B+       | 16 GB+ recommended | GPU strongly recommended | Slower without a GPU    |

GPU acceleration: If you have a supported GPU (NVIDIA, AMD via ROCm, or Apple Silicon), Ollama will detect it and use it automatically for faster inference.
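To confirm the GPU is actually being used, check the PROCESSOR column of ollama ps while a model is loaded, for example right after answering a prompt:

ollama ps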

Configuration in the desktop app

After installing Ollama and models:

  1. Open the AI-School Desktop application
  2. Go to Settings → Local Models
  3. Check that Ollama is detected (if it isn't, see the check below)
  4. Select your model from the dropdown
  5. You're ready to use local AI!
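If Ollama is not detected, verify that the Ollama server is running and reachable on its default port. This queries the local API for the models you have installed:

curl http://localhost:11434/api/tags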

Available models

Popular models available via Ollama:

Vision models (can see images)

  • qwen3-vl:4b (recommended) - Fast vision model
  • llama3.2-vision:11b - More powerful vision model
  • minicpm-v:latest - Compact vision model

Text models

  • qwen2.5:7b - Excellent for all tasks
  • llama2:7b - Classic, well-tested
  • mistral:7b - Fast and efficient
  • neural-chat:7b - Conversational focus
  • openchat:7b - Good all-rounder

Specialized models

  • codegemma:7b - For programming tasks
  • sqlcoder:7b - SQL database queries
  • dolphin-mixtral:8x7b - Powerful mixture model

Start with qwen3-vl:4b and explore other models based on your needs!