Local AI Agent Guide

Build a Local AI Work Agent Template with Python, Ollama, and Qwen3 4B

Use this guide to set up the same starter pattern shown in Clean Build Studio's video: a small, local-first Python project that talks to Ollama on your own computer, keeps work files inside an intentional workspace, and verifies the foundation with one clean response: Local model connected.

Updated: 2026-05-18 | Difficulty: Beginner | Time: 30-60 minutes | Primary path: framework-free Python + Ollama
1

Outcome

By the end, you will have a reusable local AI work-agent starter template that:

  • Runs a small local language model through Ollama.
  • Uses qwen3:4b as the default model, matching the video and repository.
  • Calls Ollama's local chat endpoint at http://localhost:11434/api/chat from Python.
  • Keeps configuration in .env instead of hard-coding model and path settings.
  • Separates connectors, which decide what the agent is allowed to see, from processors, which clean and validate outputs.
  • Proves the base connection with a deterministic first test.
Flow: Python project → localhost Ollama → Qwen3 4B → Clean response
Important framing: this is not a full autonomous work agent yet. It is the foundation for later experiments such as summarizing notes, extracting action items, cleaning documents, and creating daily briefings without uploading private work data to a cloud AI tool.
2

Privacy and access model

The key design choice from the video is not merely "use a local model." It is "use a local model inside a controlled workspace." Treat the project like a workbench: local, limited, explicit, and auditable.

Local model

Python sends prompts to Ollama on localhost, not to an external cloud API.

Intentional inputs

Files should go into a narrow inbox folder before the agent is allowed to process them.

Safe defaults

Do not give the template full-computer access by default. Expand access only when a future task requires it.

Do not skip the boundary: the template is most useful when it prevents accidental exposure. A local AI agent should not scan your whole computer just because it technically can.
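One way to enforce that boundary in code is to resolve every requested path and refuse anything that lands outside the inbox. The sketch below is an illustration under assumptions, not code from the video or the repository: the helper name resolve_inside_inbox and the default data/inbox location are hypothetical.

```python
from pathlib import Path


def resolve_inside_inbox(candidate: str, inbox: str = "data/inbox") -> Path:
    """Return the resolved path if it sits inside the inbox, else raise ValueError.

    Hypothetical helper: the name and the default folder are illustrative assumptions.
    """
    inbox_root = Path(inbox).resolve()
    # Interpret the candidate relative to the inbox, so "notes.txt"
    # means "<inbox>/notes.txt". An absolute candidate replaces the root entirely.
    target = (inbox_root / candidate).resolve()
    # resolve() collapses ".." segments, so traversal attempts end up
    # outside inbox_root and fail this check.
    if inbox_root not in target.parents:
        raise ValueError(f"Refusing to read outside the inbox: {candidate}")
    return target
```

A connector could call this helper before opening any file, so that a prompt or a typo such as ../../secrets.txt fails loudly instead of silently widening access.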
3

Prerequisites

Required

  • A laptop or desktop that can run Ollama.
  • Python 3 with venv support.
  • Git, if you want to clone the starter repository.
  • Basic terminal comfort.
  • Enough disk space for the Qwen3 4B model.

Assumptions

  • Model: qwen3:4b.
  • Ollama base URL: http://localhost:11434.
  • Python dependencies: requests and python-dotenv.
  • The real .env, virtual environment, and private data folders stay out of Git.
4

Setup steps

1. Install Ollama

Install Ollama from the official site, then confirm it is available:

ollama --version

On Linux, the official install pattern is:

curl -fsSL https://ollama.com/install.sh | sh

If Ollama is installed but not running on Linux, start and check the service:

sudo systemctl start ollama
sudo systemctl status ollama

2. Pull and test Qwen3 4B

ollama pull qwen3:4b
ollama run qwen3:4b

Try a simple prompt such as:

Say hello in one sentence and confirm the local model is working.

Exit the Ollama chat with:

/bye

3. Clone the starter template

git clone https://github.com/clean-build-studio/local-ai-work-agent-template.git
cd local-ai-work-agent-template
Note: the repository README uses a placeholder clone path in one section. Use the actual Clean Build Studio repository above if you are following the video source directly.

4. Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

On Windows PowerShell:

.venv\Scripts\Activate.ps1

5. Install dependencies

pip install -r requirements.txt

For this first template, the dependency list is intentionally small:

requests
python-dotenv

6. Create your local .env

cp .env.example .env

Use settings like these:

OLLAMA_MODEL=qwen3:4b
OLLAMA_BASE_URL=http://localhost:11434
AGENT_NAME=Local Work Agent

ENABLE_THINKING=false
ENABLE_FILE_ACCESS=true
ENABLE_CALENDAR_ACCESS=false
ENABLE_WEATHER_ACCESS=false

DATA_INBOX_PATH=data/inbox
DATA_PROCESSED_PATH=data/processed
DATA_OUTPUTS_PATH=data/outputs

ALLOW_FULL_COMPUTER_ACCESS=false
SAVE_OUTPUTS=true
Git safety: commit .env.example, but do not commit your real .env, your virtual environment, or private files placed in local data folders.
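Values in .env arrive in Python as strings, so the true/false flags above need explicit parsing. A minimal sketch, assuming the variables have already been loaded into the environment (for example by python-dotenv) and using a hypothetical env_flag helper that is not part of the repository:

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a true/false setting from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Only the literal word "true" (any case, surrounding spaces ignored)
    # switches a flag on; anything else counts as off.
    return raw.strip().lower() == "true"


# Access flags default to off, so a missing or misspelled value
# never widens what the agent is allowed to touch.
ALLOW_FULL_COMPUTER_ACCESS = env_flag("ALLOW_FULL_COMPUTER_ACCESS", default=False)
ENABLE_FILE_ACCESS = env_flag("ENABLE_FILE_ACCESS", default=False)
```

The design choice here is that permissive settings must be requested explicitly; a value such as yes or 1 is treated as off rather than guessed at.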
5

Recommended project structure

The video uses a framework-free layout so the moving parts are visible. A practical starter structure looks like this:

local-ai-work-agent-template/
├── app/
│   ├── main.py
│   ├── agent.py
│   ├── config.py
│   ├── prompts/
│   │   └── system_prompt.md
│   ├── processors/
│   └── connectors/
├── data/
│   ├── inbox/
│   ├── processed/
│   └── outputs/
├── examples/
├── scripts/
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md

Connectors = sensors

Connectors decide what the agent is allowed to access: a file inbox, a calendar export, or later maybe a weather API. Keep them narrow.
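As an illustration of a narrow connector, the sketch below lists only the files the agent may see in the inbox. It is a hypothetical example, not the repository's connector: the function name list_inbox_files and the allowed suffixes are assumptions.

```python
from pathlib import Path

# Assumption: this first connector only exposes plain-text formats.
ALLOWED_SUFFIXES = {".txt", ".md"}


def list_inbox_files(inbox: str = "data/inbox") -> list[Path]:
    """Return allowed files directly inside the inbox, sorted by name.

    Hypothetical connector: name, default folder, and suffix list are illustrative.
    """
    inbox_root = Path(inbox)
    if not inbox_root.is_dir():
        return []
    # iterdir() does not recurse, so nested folders are ignored
    # rather than walked.
    return sorted(
        p for p in inbox_root.iterdir()
        if p.is_file() and p.suffix.lower() in ALLOWED_SUFFIXES
    )
```

Keeping the connector non-recursive and suffix-limited means that widening access later is a deliberate code change, not a side effect.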

Processors = cleanup and validation

Processors clean model responses, validate output formats, transform data, and protect downstream code from messy model behavior.
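To make the validation role concrete, here is a sketch of a processor that accepts a model reply only if it is a JSON object with the expected keys. It assumes a later task that asks the model for JSON; the function name parse_json_reply and the key names in the usage note are hypothetical.

```python
import json
import re

# Built from single backticks so this example does not contain a literal fence.
FENCE = "`" * 3


def parse_json_reply(raw: str, required_keys: set[str]) -> dict:
    """Return the reply as a dict, or raise ValueError if the format is wrong.

    Hypothetical processor: the name and behavior are illustrative assumptions.
    """
    text = raw.strip()
    # Models often wrap JSON in a markdown fence; unwrap one if present.
    fenced = re.fullmatch(
        FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, text, flags=re.DOTALL
    )
    if fenced:
        text = fenced.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Model reply must be a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Model reply is missing keys: {sorted(missing)}")
    return data
```

Downstream code can then call parse_json_reply(reply, {"summary", "actions"}) and handle one exception type instead of guessing at whatever the model produced.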

Data folder rule

Use data/inbox for files you intentionally give the agent, data/processed for intermediate or cleaned files, and data/outputs for final summaries, task lists, and briefings.

6

Core code pattern

The first version of the agent has one job: send a message to the local model and get a response back. It is not reading files or using tools yet.

app/config.py: read local settings

import os
from dotenv import load_dotenv

load_dotenv()

OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen3:4b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
AGENT_NAME = os.getenv("AGENT_NAME", "Local Work Agent")
ENABLE_THINKING = os.getenv("ENABLE_THINKING", "false").lower() == "true"

app/agent.py: post to Ollama's chat endpoint

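import requests

# Assumes the settings module shown above lives at app/config.py.
from app.config import ENABLE_THINKING, OLLAMA_BASE_URL, OLLAMA_MODEL

# system_prompt and user_prompt are plain strings prepared by the caller.
payload = {
    "model": OLLAMA_MODEL,
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    "think": ENABLE_THINKING,  # False asks Ollama to skip thinking output
    "stream": False,  # return one complete JSON reply instead of chunks
    "options": {
        "num_predict": 80,  # cap on generated tokens
        "temperature": 0,  # minimize randomness for the connection test
        "top_p": 0.1,
        "num_ctx": 1024,  # a small context window is enough for this test
    },
}

response = requests.post(
    f"{OLLAMA_BASE_URL}/api/chat",
    json=payload,
    timeout=180,  # seconds; the first call may wait while the model loads
)
response.raise_for_status()  # raise on HTTP 4xx/5xx

# With "stream": False, the reply text is under message.content.
reply = response.json()["message"]["content"]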

app/prompts/system_prompt.md: keep the first test strict

You are a local AI work assistant.

This is a connection test.

You must return only the exact text requested by the user.

Do not explain.
Do not reason.
Do not include thinking.
Do not include markdown.
Do not include extra words.
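A short sketch of how the prompt file could be loaded and paired with a user message before the request is sent. The helper name build_messages is hypothetical, and the default path simply follows the project structure shown earlier in this guide.

```python
from pathlib import Path


def build_messages(
    user_prompt: str,
    prompt_path: str = "app/prompts/system_prompt.md",
) -> list[dict]:
    """Load the system prompt file and pair it with one user message.

    Hypothetical helper: the name is illustrative, not taken from the repository.
    """
    system_prompt = Path(prompt_path).read_text(encoding="utf-8").strip()
    # An empty prompt file would silently remove the strict instructions,
    # so treat it as an error instead.
    if not system_prompt:
        raise ValueError(f"System prompt file is empty: {prompt_path}")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

Keeping the prompt in a file rather than in a Python string lets you tighten the wording without touching the agent code.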
7

Guardrails for Qwen3 overthinking

Qwen3 can support thinking-style output. That is useful for harder tasks, but it can be noisy for a simple connection test. The video's practical lesson is to treat this as a predictable local-model behavior to manage, not as a failure.

  1. Turn thinking off in configuration: set ENABLE_THINKING=false.
  2. Reinforce with the user prompt: if thinking is disabled, prefix the test prompt with /no_think.
  3. Make the system prompt strict: no explanation, no reasoning, no markdown, no extra words.
  4. Reduce randomness: use temperature: 0 and a low top_p for the connection test.
  5. Limit output length: use a small num_predict so the model has enough room to answer, but not enough room to ramble.
  6. Clean known leakage: strip <think>...</think> blocks and reject responses that begin like a reasoning trace.
Deterministic test idea: because this first test expects exactly Local model connected., your cleanup function can return only that phrase whenever it appears anywhere in the model output. That keeps the question "does the connection work?" separate from the question "did the model follow the format perfectly?"
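Two of the guardrails above, the /no_think prefix and the cleanup of leaked thinking blocks, can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function names and the exact wording of the test prompt are assumptions, and the heuristic for rejecting replies that open like a reasoning trace is left out.

```python
import re

EXPECTED = "Local model connected."
# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL | re.IGNORECASE)


def build_test_prompt(thinking_enabled: bool) -> str:
    """Build the connection-test prompt (hypothetical wording).

    Adds the /no_think prefix when thinking is disabled.
    """
    prompt = f"Return exactly this text: {EXPECTED}"
    return prompt if thinking_enabled else f"/no_think {prompt}"


def clean_response(raw: str) -> str:
    """Strip thinking blocks, then reduce the reply to the expected phrase if present."""
    text = THINK_BLOCK.sub("", raw)
    # An unclosed <think> tag means the output was cut off mid-reasoning,
    # so drop everything from that tag onward.
    text = re.split(r"<think>", text, flags=re.IGNORECASE)[0].strip()
    if EXPECTED in text:
        return EXPECTED
    return text
```

If clean_response returns anything other than the expected phrase, the connection itself still worked; the remaining work is in the prompt, the thinking settings, or the cleanup rules.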
8

Verify the first agent test

From the project root, with the virtual environment activated, run:

python3 -m app.main

Expected output:

Local Work Agent is starting...
Using local model: qwen3:4b
Thinking enabled: False

Agent response:
Local model connected.
Success check: this proves Python can talk to Ollama locally, Ollama can load Qwen3 4B, and the agent layer can clean the response enough to return the exact output expected.

If the model responds but the format is wrong, the local model path is alive; focus next on prompt strictness, thinking settings, and cleanup logic.

9

Troubleshooting

  • python command not found: use python3 on macOS/Linux.
  • venv is missing on Linux: on Debian or Ubuntu, install it with sudo apt-get install python3-venv, then recreate the virtual environment.
  • ModuleNotFoundError: No module named 'requests': activate the virtual environment and run pip install -r requirements.txt.
  • Ollama install error mentioning zstd: install zstd, then rerun the Ollama install command.
  • Cannot connect to Ollama: confirm Ollama is running and that OLLAMA_BASE_URL is http://localhost:11434.
  • Model not found: run ollama pull qwen3:4b and confirm the model tag in .env matches exactly.
  • Qwen returns reasoning text: verify ENABLE_THINKING=false, use /no_think, lower randomness, limit num_predict, and keep the system prompt strict.
  • The response gets cut off: raise num_predict slightly. If the model rambles, lower it again.
  • Private files are at risk of being committed: update .gitignore so .env, .venv/, and private files under data/ are excluded.
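The two most common failures in this list, "cannot connect" and "model not found", can be told apart with a small check against Ollama's /api/tags endpoint, which lists locally installed models. The sketch below uses only the standard library; the function names installed_models and diagnose are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def installed_models(
    base_url: str = "http://localhost:11434", timeout: float = 5.0
) -> Optional[list]:
    """Return model names reported by Ollama, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, and non-JSON replies.
        return None
    return [m.get("name", "") for m in data.get("models", [])]


def diagnose(model: str, models: Optional[list]) -> str:
    """Map the check result to one of the troubleshooting entries above."""
    if models is None:
        return "Cannot connect to Ollama: confirm it is running."
    if model not in models:
        return f"Model not found: run ollama pull {model}"
    return "OK"
```

For example, print(diagnose("qwen3:4b", installed_models())) reports which of the two problems applies before you start adjusting prompts or settings.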
10

What to build next

Once the connection template is stable, add useful work one controlled capability at a time:

  1. Email summary agent: start with exported or sample email text in data/inbox, not a live inbox.
  2. Meeting notes summarizer: read one local note, produce a concise summary and action-item list.
  3. Document cleanup: normalize messy text and save a cleaned version to data/processed.
  4. Daily briefing: combine a few intentionally provided files into a short work brief.
Expansion rule: add one connector, one processor, and one success check at a time. The point is practical local AI: small models, real tasks, private data, and low overhead.