Outcome
By the end, you will have a reusable local AI work-agent starter template that:
- Runs a small local language model through Ollama.
- Uses qwen3:4b as the default model, matching the video and repository.
- Calls Ollama's local chat endpoint at http://localhost:11434/api/chat from Python.
- Keeps configuration in .env instead of hard-coding model and path settings.
- Separates connectors, which decide what the agent is allowed to see, from processors, which clean and validate outputs.
- Proves the base connection with a deterministic first test.
Privacy and access model
The key design choice from the video is not merely "use a local model." It is "use a local model inside a controlled workspace." Treat the project like a workbench: local, limited, explicit, and auditable.
Local model
Python sends prompts to Ollama on localhost, not to an external cloud API.
Intentional inputs
Files should go into a narrow inbox folder before the agent is allowed to process them.
Safe defaults
Do not give the template full-computer access by default. Expand access only when a future task requires it.
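One way to enforce the narrow-inbox rule in code is to refuse any path that does not resolve to a location inside the inbox folder. The following is a minimal sketch, not part of the template; the function name is hypothetical and the default path assumes the data/inbox folder used in this guide.

```python
from pathlib import Path


def is_inside_inbox(candidate: str, inbox: str = "data/inbox") -> bool:
    """Return True only if candidate resolves to a location inside inbox."""
    inbox_root = Path(inbox).resolve()
    target = Path(candidate).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a path such
    # as "data/inbox/../../.env" is judged by where it really points.
    return inbox_root in target.parents


print(is_inside_inbox("data/inbox/notes.txt"))   # True
print(is_inside_inbox("data/inbox/../../.env"))  # False
```

Checking the resolved path rather than the raw string is the point of the sketch: a string prefix test would accept the second example.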
Prerequisites
Required
- A laptop or desktop that can run Ollama.
- Python 3 with venv support.
- Git, if you want to clone the starter repository.
- Basic terminal comfort.
- Enough disk space for the Qwen3 4B model.
Assumptions
- Model: qwen3:4b.
- Ollama base URL: http://localhost:11434.
- Python dependencies: requests and python-dotenv.
- The real .env, virtual environment, and private data folders stay out of Git.
Setup steps
1. Install Ollama
Install Ollama from the official site, then confirm it is available:
ollama --version
On Linux, the official install pattern is:
curl -fsSL https://ollama.com/install.sh | sh
If Ollama is installed but not running on Linux, start and check the service:
sudo systemctl start ollama
sudo systemctl status ollama
2. Pull and test Qwen3 4B
ollama pull qwen3:4b
ollama run qwen3:4b
Try a simple prompt such as:
Say hello in one sentence and confirm the local model is working.
Exit the Ollama chat with:
/bye
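The same check can be scripted from Python. This is a sketch under two assumptions: that the Ollama server is reachable at the default base URL, and that its /api/tags endpoint lists pulled models in a "models" array with a "name" field. The function names are illustrative and do not come from the template.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"


def model_is_listed(tags: dict, model: str) -> bool:
    """Check a parsed /api/tags response for an exact model tag."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


# Usage (needs a running Ollama server):
#     model_is_listed(fetch_tags(), "qwen3:4b")
```

The tag comparison is exact on purpose, because a request for qwen3:4b fails if only a different tag of the model has been pulled.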
3. Clone the starter template
git clone https://github.com/clean-build-studio/local-ai-work-agent-template.git
cd local-ai-work-agent-template
4. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
On Windows PowerShell:
.venv\Scripts\Activate.ps1
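If you are unsure whether activation worked, a quick check from Python is possible. This sketch relies on the standard behavior of environments created by venv, where sys.prefix differs from sys.base_prefix; the function name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
```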
5. Install dependencies
pip install -r requirements.txt
For this first template, the dependency list is intentionally small:
requests
python-dotenv
6. Create your local .env
cp .env.example .env
Use settings like these:
OLLAMA_MODEL=qwen3:4b
OLLAMA_BASE_URL=http://localhost:11434
AGENT_NAME=Local Work Agent
ENABLE_THINKING=false
ENABLE_FILE_ACCESS=true
ENABLE_CALENDAR_ACCESS=false
ENABLE_WEATHER_ACCESS=false
DATA_INBOX_PATH=data/inbox
DATA_PROCESSED_PATH=data/processed
DATA_OUTPUTS_PATH=data/outputs
ALLOW_FULL_COMPUTER_ACCESS=false
SAVE_OUTPUTS=true
Commit .env.example, but do not commit your real .env, your virtual environment, or private files placed in local data folders.
Recommended project structure
The video uses a framework-free layout so the moving parts are visible. A practical starter structure looks like this:
local-ai-work-agent-template/
├── app/
│   ├── main.py
│   ├── agent.py
│   ├── config.py
│   ├── prompts/
│   │   └── system_prompt.md
│   ├── processors/
│   └── connectors/
├── data/
│   ├── inbox/
│   ├── processed/
│   └── outputs/
├── examples/
├── scripts/
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
Connectors = sensors
Connectors decide what the agent is allowed to access: a file inbox, a calendar export, or later maybe a weather API. Keep them narrow.
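A narrow file connector might look like the sketch below. It is an illustration rather than the repository's code: the function name and the suffix allow-list are hypothetical, and the flag argument stands in for the ENABLE_FILE_ACCESS setting.

```python
from pathlib import Path

# Hypothetical allow-list: only plain-text formats are exposed to the agent.
ALLOWED_SUFFIXES = {".txt", ".md"}


def list_inbox_files(inbox: str, file_access_enabled: bool) -> list[Path]:
    """Return text files placed directly in the inbox, or nothing if access is off."""
    if not file_access_enabled:
        return []
    root = Path(inbox)
    if not root.is_dir():
        return []
    # iterdir() is deliberately non-recursive: nested folders stay invisible.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in ALLOWED_SUFFIXES
    )
```

Returning an empty list when access is disabled keeps the calling code simple: the agent sees no files instead of needing a separate permission check.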
Processors = cleanup and validation
Processors clean model responses, validate output formats, transform data, and protect downstream code from messy model behavior.
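As an example of format validation, the sketch below assumes a later task asks the model for a JSON object and checks the reply before other code uses it. The function name and the key names in the usage comment are hypothetical.

```python
import json


def parse_json_reply(text: str, required_keys: set[str]) -> dict:
    """Extract a JSON object from a model reply and check that required keys exist."""
    # Small models often add chatter or a code fence around the JSON, so
    # keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(text[start:end + 1])  # raises ValueError on malformed JSON
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Usage: parse_json_reply(reply_text, {"summary", "tasks"})
```

Raising an error instead of returning partial data lets the caller decide whether to retry the prompt or stop.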
Data folder rule
Use data/inbox for files you intentionally give the agent, data/processed for intermediate or cleaned files, and data/outputs for final summaries, task lists, and briefings.
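Writing a final result into the outputs folder can be sketched as follows. The timestamped file name is an assumed convention, not one the template prescribes, and the function name is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def save_output(text: str, name: str, outputs_dir: str = "data/outputs") -> Path:
    """Write text to a timestamped Markdown file inside the outputs folder."""
    out_dir = Path(outputs_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # A timestamp prefix keeps earlier results from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = out_dir / f"{stamp}-{name}.md"
    path.write_text(text, encoding="utf-8")
    return path
```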
Core code pattern
The first version of the agent has one job: send a message to the local model and get a response back. It is not reading files or using tools yet.
app/config.py: read local settings
import os
from dotenv import load_dotenv
load_dotenv()
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen3:4b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
AGENT_NAME = os.getenv("AGENT_NAME", "Local Work Agent")
ENABLE_THINKING = os.getenv("ENABLE_THINKING", "false").lower() == "true"
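The remaining .env settings can be read in the same style. The sketch below is an assumption about how that might look, not the repository's config.py: env_flag is a hypothetical helper, it expects load_dotenv() to have run already, and it defaults every access flag to off so that a missing setting never widens access.

```python
import os
from pathlib import Path


def env_flag(name: str, default: bool = False) -> bool:
    """Read a true/false setting; anything other than "true" counts as False."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


ENABLE_FILE_ACCESS = env_flag("ENABLE_FILE_ACCESS")
ALLOW_FULL_COMPUTER_ACCESS = env_flag("ALLOW_FULL_COMPUTER_ACCESS")
SAVE_OUTPUTS = env_flag("SAVE_OUTPUTS")
DATA_INBOX_PATH = Path(os.getenv("DATA_INBOX_PATH", "data/inbox"))
DATA_OUTPUTS_PATH = Path(os.getenv("DATA_OUTPUTS_PATH", "data/outputs"))
```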
app/agent.py: post to Ollama's chat endpoint
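import requests

# Assumed import path, given the app/ package layout and `python3 -m app.main`.
from app.config import ENABLE_THINKING, OLLAMA_BASE_URL, OLLAMA_MODEL

# system_prompt and user_prompt are plain strings prepared elsewhere in agent.py.
payload = {
    "model": OLLAMA_MODEL,
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    "think": ENABLE_THINKING,  # turn thinking output on or off
    "stream": False,           # return one complete JSON response
    "options": {
        "num_predict": 80,     # cap on generated tokens
        "temperature": 0,      # minimize randomness for a repeatable test
        "top_p": 0.1,
        "num_ctx": 1024,       # small context window for a short test
    },
}

# Post to the local chat endpoint and raise on any HTTP error status.
response = requests.post(
    f"{OLLAMA_BASE_URL}/api/chat",
    json=payload,
    timeout=180,
)
response.raise_for_status()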
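Once the request succeeds, the reply text still has to be pulled out of the JSON body. The sketch below assumes the non-streaming /api/chat response carries the text under message.content; the function name is hypothetical and the sample dictionary is hand-written for illustration, not captured output.

```python
def extract_reply(data: dict) -> str:
    """Pull the assistant text out of a parsed non-streaming /api/chat response."""
    message = data.get("message") or {}
    content = message.get("content")
    if not isinstance(content, str):
        raise ValueError("Ollama response has no message.content string")
    return content.strip()


# Usage after the request: reply = extract_reply(response.json())
sample = {
    "model": "qwen3:4b",
    "message": {"role": "assistant", "content": " Local model connected.\n"},
    "done": True,
}
print(extract_reply(sample))  # Local model connected.
```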
app/prompts/system_prompt.md: keep the first test strict
You are a local AI work assistant.
This is a connection test.
You must return only the exact text requested by the user.
Do not explain.
Do not reason.
Do not include thinking.
Do not include markdown.
Do not include extra words.
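Keeping the prompt in its own file means the agent has to load it at startup. A minimal sketch, assuming the file lives at the path shown in the project structure and that the process runs from the project root; the function name is hypothetical.

```python
from pathlib import Path

DEFAULT_PROMPT_PATH = Path("app/prompts/system_prompt.md")


def load_system_prompt(path: Path = DEFAULT_PROMPT_PATH) -> str:
    """Read the system prompt file and fail early if it is empty."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"system prompt file is empty: {path}")
    return text
```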
Guardrails for Qwen3 overthinking
Qwen3 can support thinking-style output. That is useful for harder tasks, but it can be noisy for a simple connection test. The video's practical lesson is to treat this as a predictable local-model behavior to manage, not as a failure.
- Turn thinking off in configuration: set ENABLE_THINKING=false.
- Reinforce with the user prompt: if thinking is disabled, prefix the test prompt with /no_think.
- Make the system prompt strict: no explanation, no reasoning, no markdown, no extra words.
- Reduce randomness: use temperature: 0 and a low top_p for the connection test.
- Limit output length: use a small num_predict so the model has enough room to answer, but not enough room to ramble.
- Clean known leakage: strip <think>...</think> blocks and reject responses that begin like a reasoning trace.
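Two of these guardrails, the /no_think prefix and the leakage cleanup, can be sketched in a few lines. This is an illustration rather than the template's processor: the function names are hypothetical, and the list of reasoning-style openers is a guess that you would tune against real output.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Hypothetical openers that suggest a leaked reasoning trace.
REASONING_OPENERS = ("okay,", "let me", "the user", "first,", "hmm")


def build_user_prompt(prompt: str, thinking_enabled: bool) -> str:
    """Prefix the prompt with /no_think when thinking is disabled."""
    return prompt if thinking_enabled else f"/no_think {prompt}"


def clean_reply(raw: str) -> str:
    """Strip complete <think> blocks and reject replies that still look like reasoning."""
    text = THINK_BLOCK.sub("", raw).strip()
    # An unclosed <think> tag usually means the reply was cut off mid-reasoning.
    if "<think>" in text.lower():
        raise ValueError("reply contains an unfinished <think> block")
    if text.lower().startswith(REASONING_OPENERS):
        raise ValueError("reply looks like a reasoning trace")
    return text
```

Rejecting a suspicious reply, instead of trying to salvage it, keeps downstream code from treating reasoning text as an answer.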
When the expected reply is exactly Local model connected., your cleanup function can return only that phrase whenever it appears anywhere in the model output. That distinguishes "the connection works" from "the model followed the format perfectly."
Verify the first agent test
From the project root, with the virtual environment activated, run:
python3 -m app.main
Expected output:
Local Work Agent is starting...
Using local model: qwen3:4b
Thinking enabled: False
Agent response:
Local model connected.
If the model responds but the format is wrong, the local model path is alive; focus next on prompt strictness, thinking settings, and cleanup logic.
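The difference between a working connection and a perfectly formatted reply can be made explicit in code. The sketch below is one possible classification; the function name and the result labels are hypothetical.

```python
EXPECTED = "Local model connected."


def classify_test_reply(reply: str, expected: str = EXPECTED) -> str:
    """Label a connection-test reply by how closely it matches the expected text."""
    stripped = reply.strip()
    if stripped == expected:
        return "exact"                   # connection works and format was followed
    if expected in reply:
        return "connected-but-noisy"     # phrase present, surrounded by extra text
    if stripped:
        return "connected-wrong-format"  # the model answered, so the path is alive
    return "empty"                       # nothing came back; check Ollama itself


print(classify_test_reply("Local model connected."))        # exact
print(classify_test_reply("Sure! Local model connected."))  # connected-but-noisy
```

Only the "empty" case points at the connection; the two middle cases point at prompt strictness, thinking settings, or cleanup logic.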
Troubleshooting
- python command not found: use python3 on macOS/Linux.
- venv is missing on Linux: install it with sudo apt-get install python3-venv, then recreate the virtual environment.
- ModuleNotFoundError: No module named 'requests': activate the virtual environment and run pip install -r requirements.txt.
- Ollama install error mentioning zstd: install zstd, then rerun the Ollama install command.
- Cannot connect to Ollama: confirm Ollama is running and that OLLAMA_BASE_URL is http://localhost:11434.
- Model not found: run ollama pull qwen3:4b and confirm the model tag in .env matches exactly.
- Qwen returns reasoning text: verify ENABLE_THINKING=false, use /no_think, lower randomness, limit num_predict, and keep the system prompt strict.
- The response gets cut off: raise num_predict slightly. If the model rambles, lower it again.
- Private files are at risk of being committed: update .gitignore so .env, .venv/, and private files under data/ are excluded.
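The two Ollama-related entries can also be surfaced by the agent itself. The sketch below maps a request outcome to a hint; it assumes that Ollama answers a request for a missing model with HTTP 404, and the function name is hypothetical.

```python
from typing import Optional


def troubleshooting_hint(status_code: Optional[int], base_url: str, model: str) -> str:
    """Map a request outcome to a hint; None means no HTTP response was received."""
    if status_code is None:
        return f"Cannot connect to Ollama at {base_url}: confirm it is running."
    if status_code == 404:  # assumed status for a model that has not been pulled
        return f"Model not found: run 'ollama pull {model}' and check the tag in .env."
    if status_code >= 400:
        return f"Ollama returned HTTP {status_code}: check the request payload."
    return "Request succeeded."


# Usage with requests (not executed here):
#     try:
#         response = requests.post(url, json=payload, timeout=180)
#         response.raise_for_status()
#     except requests.exceptions.ConnectionError:
#         print(troubleshooting_hint(None, OLLAMA_BASE_URL, OLLAMA_MODEL))
#     except requests.exceptions.HTTPError as err:
#         print(troubleshooting_hint(err.response.status_code, OLLAMA_BASE_URL, OLLAMA_MODEL))
```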
What to build next
Once the connection template is stable, add useful work one controlled capability at a time:
- Email summary agent: start with exported or sample email text in data/inbox, not a live inbox.
- Meeting notes summarizer: read one local note, produce a concise summary and action-item list.
- Document cleanup: normalize messy text and save a cleaned version to data/processed.
- Daily briefing: combine a few intentionally provided files into a short work brief.
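As a starting point for the meeting notes summarizer, the request payload could be built like this. Everything here is a sketch: the function name, the prompt wording, the character limit, and the larger num_predict and num_ctx values are assumptions to adjust for your own notes and hardware.

```python
def build_summary_payload(note_text: str, model: str = "qwen3:4b",
                          max_chars: int = 4000) -> dict:
    """Build an /api/chat payload that asks for a summary and action items."""
    # Clip long notes by character count so the prompt stays within the context window.
    clipped = note_text.strip()[:max_chars]
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You summarize meeting notes. Reply with a short summary "
                        "followed by a list of action items."},
            {"role": "user",
             "content": f"/no_think Summarize these notes:\n\n{clipped}"},
        ],
        "think": False,
        "stream": False,
        "options": {"temperature": 0, "num_predict": 400, "num_ctx": 4096},
    }
```

The payload has the same shape as the connection test, so the existing request code can send it unchanged.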
Sources and related links
- Clean Build Studio video: Can a Small Local AI Model Do Real Work? Python + Ollama Agent Template — primary walkthrough and design rationale.
- Clean Build Studio GitHub repository — starter template, setup commands, and project files.
- Ollama — local model runner used by the template.
- Ollama API documentation — REST API reference for local chat requests.
- Ollama Qwen3 4B model page — model tag and run/API examples.