Beyond the Hype: Building Your Own Local, Context-Aware "Scout" AI Assistant with Ollama and LangChain

If you’ve been scrolling through Hacker News this week, you probably saw the headlines about Microsoft’s new AI initiative, Scout. Designed to be a deeply integrated, highly proactive "addictive" AI personal assistant, Scout represents Big Tech’s latest push to insert agentic AI into every nook and cranny of our daily operating systems. While the marketing teams are busy pitching this as the ultimate productivity hack, those of us in the developer community are looking at this with a healthy dose of skepticism—and curiosity.

As developers, we don't just want to use these tools; we want to know how they work under the hood. More importantly, we care deeply about data privacy, offline capabilities, and avoiding vendor lock-in. Why should we stream our keystrokes, terminal commands, and private repository structures to a remote corporate cloud just to get context-aware assistance?

Today, we are going to demystify the tech behind proactive AI assistants like Microsoft Scout. Better yet, we’re going to build our own local, privacy-first developer assistant—let's call it DevScout. By the end of this post, you'll have a running terminal assistant that watches your local workspace, reads your active code context, and uses a local Large Language Model (LLM) to provide proactive assistance without sending a single byte of data to the cloud.

Understanding the Architecture of an Active AI Assistant

What makes an assistant like Scout different from a standard chat interface like ChatGPT or Claude? It boils down to two concepts: proactivity and ambient context.

A standard LLM chat is reactive; you paste code, ask a question, and get an answer. An active assistant, however, operates on a continuous feedback loop. It runs quietly in the background, observes state changes in your development environment (file saves, terminal errors, git branch changes), processes this context, and suggests solutions before you even ask.

To build our local alternative, we need three core components:

The Brain (Local LLM): We’ll use Ollama to run a capable open-source coding model (such as llama3.1 or qwen2.5-coder) locally on our machine.
The Context Engine (File Watcher): We’ll write a Python daemon that watches our local project workspace for changes and hands the modified file to the agent as context. A vector database is a natural extension for indexing a whole codebase, but this post does not build one.
The Agentic Loop (LangChain): We will use LangChain to orchestrate prompt construction and model invocation.

Here is a high-level look at the system architecture we are building:

+-----------------------------------------------------------+
|                      Local Workspace                      |
|  [main.py]  --->  (File System Events)  ---> [Watchdog]   |
+-----------------------------------------------------|-----+
                                                      | (Detects Changes)
                                                      v
+-----------------------------------------------------------+
|                     Context Manager                       |
|  - Reads modified file context                            |
|  - Truncates & formats payload                            |
+-----------------------------------------------------|-----+
                                                      | (Payload)
                                                      v
+-----------------------------------------------------------+
|                     DevScout Agent                        |
|  - System Prompt (Rules of engagement)                    |
|  - Local LLM via Ollama (qwen2.5-coder / llama3.1)        |
+-----------------------------------------------------------+
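The "Truncates & formats payload" box deserves a concrete look, because a large file can exceed a local model's context window. Here is a minimal sketch of what that step might look like. The build_payload helper, the MAX_CHARS budget, and the keep-the-head-and-tail strategy are all assumptions for illustration, not something the tools above require.

```python
import os

MAX_CHARS = 8000  # hypothetical budget; tune it to your model's context window


def build_payload(file_path: str, content: str, max_chars: int = MAX_CHARS) -> dict:
    """Return the fields the agent needs, truncating oversized files."""
    truncated = len(content) > max_chars
    if truncated:
        # Keep the start and the end of the file and drop the middle.
        half = max(1, max_chars // 2)
        content = content[:half] + "\n# ... [truncated] ...\n" + content[-half:]
    return {
        "file_path": os.path.basename(file_path),
        "code_content": content,
        "truncated": truncated,
    }
```

Keeping both ends of the file is only one possible choice. Imports usually sit at the top and the most recent edits are often near the bottom, but a smarter version could keep the region around the changed lines instead.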

Step 1: Setting Up the Local LLM with Ollama

Before writing any code, we need to spin up our local inference engine. Ollama makes this incredibly straightforward. If you haven't installed it yet, head over to their website or use Homebrew if you're on macOS/Linux:

brew install ollama
ollama serve

Once Ollama is running (ollama serve stays in the foreground, so open a second terminal for the next commands), we want to download a model optimized for coding tasks. qwen2.5-coder:7b is a strong open-weight option for local code generation and reasoning at this size, and llama3.1:8b is a solid general-purpose alternative. Let's pull the Qwen coder model:

ollama pull qwen2.5-coder:7b

You can verify the model is working locally by running:

ollama run qwen2.5-coder:7b "Write a Python function to check if a number is prime."
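If you would rather verify the setup from Python, Ollama also exposes a local HTTP API. The sketch below assumes the server listens on its default address (http://localhost:11434) and uses the /api/tags endpoint, which lists the models you have pulled. The helper names installed_models and fetch_tags are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if you changed it


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> dict:
    """Ask the local Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Calling installed_models(fetch_tags()) should return a list that includes "qwen2.5-coder:7b" once the pull has finished. If the server is not running, urlopen raises an OSError subclass, which is a handy signal that you forgot ollama serve.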

Step 2: Building the Workspace Watcher

To make our assistant proactive, it needs to monitor our workspace. We'll use Python's watchdog library to monitor file changes in our project directory. When a file is saved, our agent will analyze the changes and proactively offer optimizations, security fixes, or test cases.

Let's set up our project directory and install the necessary dependencies:

mkdir devscout-agent
cd devscout-agent
pip install watchdog langchain-community langchain-core requests

Now, let's create our watcher script. Save this file as watcher.py. This script will listen for file modification events, filter out unwanted directories (like .git, __pycache__, or node_modules), and pass the file contents to our local LLM agent.

import time
import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from agent import DevScoutAgent

class WorkspaceHandler(FileSystemEventHandler):
    def __init__(self, agent):
        self.agent = agent
        self.last_triggered = {}  # file path -> time of the last analysis
        self.cooldown = 3  # Seconds; avoids re-triggering on rapid saves

    def on_modified(self, event):
        if event.is_directory:
            return

        # Ignore common dependency and hidden directories. Compare whole path
        # components so that a file such as "environment.py" is not skipped.
        ignored_dirs = {'.git', '__pycache__', 'node_modules', '.venv', 'env'}
        path_parts = os.path.normpath(event.src_path).split(os.sep)
        if ignored_dirs.intersection(path_parts):
            return

        # Simple per-file debounce, so saving one file does not mute another
        current_time = time.time()
        if current_time - self.last_triggered.get(event.src_path, 0) < self.cooldown:
            return
        self.last_triggered[event.src_path] = current_time

        print(f"\n[DevScout] Detected modification in: {os.path.basename(event.src_path)}")
        self.process_change(event.src_path)

    def process_change(self, file_path):
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
            
            # Send the changed code to our local agent
            self.agent.analyze_code(file_path, content)
        except Exception as e:
            print(f"[Error] Failed to read {file_path}: {e}")

def monitor_directory(path_to_watch):
    agent = DevScoutAgent()
    event_handler = WorkspaceHandler(agent)
    observer = Observer()
    observer.schedule(event_handler, path=path_to_watch, recursive=True)
    observer.start()
    
    print(f"🕵️ DevScout is actively watching: {os.path.abspath(path_to_watch)}")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
        print("\nDevScout has stopped watching.")
    observer.join()

if __name__ == "__main__":
    # Watch the current directory
    monitor_directory(".")
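One refinement to consider: editors and operating systems can emit more than one modification event for a single save, and a time-based cooldown cannot tell a real edit from a repeat. A content hash can. The sketch below is an assumption-laden add-on (the ChangeFilter class and its is_new method are hypothetical names) that you could call from process_change before handing the file to the agent.

```python
import hashlib


class ChangeFilter:
    """Remembers a digest per file so identical saves are skipped."""

    def __init__(self):
        # file path -> SHA-256 hex digest of the last content we analyzed
        self._digests = {}

    def is_new(self, file_path: str, content: str) -> bool:
        """Return True only if the content differs from the last call for this path."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._digests.get(file_path) == digest:
            return False
        self._digests[file_path] = digest
        return True
```

With this in place, re-saving a file without changes costs one hash computation instead of a full model invocation.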

Step 3: Creating the Agentic Engine

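Now we need to write the logic that powers our agent. Create a file named agent.py in the same directory. This script will interface with our local Ollama instance using LangChain's Ollama integration. Depending on your installed LangChain version, the langchain_community import below may print a deprecation warning, since newer releases ship this integration in the separate langchain-ollama package.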

We'll configure our prompt to ensure the agent acts as an experienced software engineer and remains concise. We don't want our console flooded with fluff; we want direct, actionable feedback.

import os
from langchain_community.llms import Ollama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class DevScoutAgent:
    def __init__(self):
        # We target our local Ollama instance running the qwen2.5-coder model
        self.llm = Ollama(
            model="qwen2.5-coder:7b",
            temperature=0.2 # Low temperature for more deterministic, precise code generation
        )
        self.parser = StrOutputParser()
        self.prompt_template = ChatPromptTemplate.from_messages([
            ("system", """You are "DevScout", a highly skilled, silent pair-programmer. 
You are observing a developer write code in real-time. 
Your task is to analyze the file they just saved and provide high-value, actionable feedback.

Rules of Engagement:
1. ONLY respond if you see critical bugs, performance issues, security vulnerabilities, or obvious refactoring opportunities.
2. If the code looks clean, robust, and well-written, respond with exactly: "No issues detected. Code is clean!"
3. Be incredibly concise. Do not explain basic concepts. Provide code blocks for suggested improvements.
4. Focus heavily on security (e.g., hardcoded secrets, SQL injection, unsafe parsing)."""),
            ("user", "File: {file_path}\n\nCode Content:\n```\n{code_content}\n```")
        ])
        
        # Compose the LCEL (LangChain Expression Language) chain
        self.chain = self.prompt_template | self.llm | self.parser

    def analyze_code(self, file_path, code_content):
        # Skip empty files
        if not code_content.strip():
            return

        print("[DevScout] Analyzing code quality and security...")
        try:
            # Invoke the local chain
            response = self.chain.invoke({
                "file_path": file_path,
                "code_content": code_content
            })
            
            # Format the console output
            output = response.strip()
            if "No issues detected" in output:
                print("🟢 DevScout: Everything looks great!")
            else:
                print("\n🔔 DevScout Alert & Suggestions:")
                print("================================================================================")
                print(output)
                print("================================================================================\n")
        except Exception as e:
            print(f"[Error] Agent invocation failed: {e}")
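Since we care about what happens under the hood, it helps to see that the model call is ultimately an HTTP request to the local Ollama server. The sketch below talks to Ollama's /api/generate endpoint directly with the standard library. It assumes the default address, and build_request and generate are hypothetical helper names. It illustrates the kind of request involved and does not reproduce LangChain's internals.

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(prompt: str, model: str = "qwen2.5-coder:7b",
                  temperature: float = 0.2) -> dict:
    """Build the JSON body for a single, non-streaming completion."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

The request only ever targets localhost, which is the whole privacy argument in a few lines of code.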

Testing Our Local "Scout"

Let's test our local system and see how it performs. Run the watcher script in your terminal:

python watcher.py

Now, open a new terminal tab and create a file in the same folder named vulnerable_app.py. We will write some deliberately bad code to see if our local scout catches it:

import sqlite3

def get_user_data(user_id):
    # Intentional SQL injection vulnerability and unclosed connection
    conn = sqlite3.connect("database.db")
    cursor = conn.cursor()
    query = f"SELECT * FROM users WHERE id = '{user_id}'"
    cursor.execute(query)
    return cursor.fetchall()

Save the file, then check the terminal where watcher.py is running. After a few seconds (depending on your hardware), DevScout should print a security alert along with a code block suggesting parameterized queries. The exact wording varies from run to run; a typical response looks like this:

[DevScout] Detected modification in: vulnerable_app.py
[DevScout] Analyzing code quality and security...

🔔 DevScout Alert & Suggestions:
================================================================================
CRITICAL ISSUES DETECTED:

1. SQL Injection Vulnerability:
You are interpolating the `user_id` parameter directly into the SQL string. 
This allows for malicious SQL injection attacks. Use parameterized queries instead.

2. Resource Leak:
The SQLite database connection (`conn`) is not being closed after execution. Wrap it in `contextlib.closing` so it is closed even if the query fails.

Recommended Refactoring:

```python
import sqlite3
from contextlib import closing

def get_user_data(user_id):
    query = "SELECT * FROM users WHERE id = ?"
    # "with conn:" alone only manages the transaction; closing() closes the connection
    with closing(sqlite3.connect("database.db")) as conn:
        cursor = conn.cursor()
        cursor.execute(query, (user_id,))
        return cursor.fetchall()
```
================================================================================

Now, rewrite vulnerable_app.py using the suggested refactored code and save it again. DevScout re-analyzes the whole file (it does not compute a diff) and, if the model finds nothing else to flag, prints:

🟢 DevScout: Everything looks great!
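To see why those two findings matter, here is a small self-contained experiment using an in-memory SQLite database. The table contents and the helper names (make_db, lookup_unsafe, lookup_safe) are invented for this demo. It shows a classic injection string leaking every row through the f-string query while the parameterized query returns nothing, and it shows contextlib.closing actually closing the connection. One detail worth knowing: using a sqlite3 connection directly in a with statement commits or rolls back the transaction but leaves the connection open.

```python
import sqlite3
from contextlib import closing


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("1", "alice"), ("2", "bob")])
    return conn


def lookup_unsafe(conn, user_id):
    # Vulnerable: the input becomes part of the SQL text.
    return conn.execute(f"SELECT * FROM users WHERE id = '{user_id}'").fetchall()


def lookup_safe(conn, user_id):
    # Safe: the input is bound as a value and never parsed as SQL.
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()


payload = "x' OR '1'='1"  # turns the WHERE clause into an always-true condition

with closing(make_db()) as conn:
    leaked = lookup_unsafe(conn, payload)
    blocked = lookup_safe(conn, payload)

# After closing(), any further use of the connection fails.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(len(leaked), len(blocked), still_open)  # prints: 2 0 False
```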

Why Running Local AI is the Real Developer Win

While tech giants are trying to convince us that we need massive server farms to run helpful AI features, our custom implementation shows that a useful assistant can run on a single developer machine. Running a local, proactive assistant offers several key advantages for developers:

1. True Data Privacy

When you use local models via Ollama, your code is sent only to the Ollama server on your own machine, not to a third-party API. That makes it much easier to run this agent inside proprietary enterprise repositories, financial codebases, or sensitive client applications, though you should still confirm that it fits your NDAs and compliance policies.

2. No Per-Token Cost or Rate Limits

There are no API keys, monthly subscriptions, or rate limits. You can run as many evaluation loops as your hardware allows, build deep retrieval pipelines (RAG), and process large directories without ever seeing an API bill. The real limits are your hardware, your electricity, and the model's context window.

3. No Network Latency

Because there is no network round trip, a local 7B model running on Apple Silicon (M-series) or a modern NVIDIA GPU can often respond within a few seconds. Inference itself still takes time, and the first request after startup is slower while the model loads into memory, but the feedback loop is tight enough to make the tool feel "ambient" rather than intrusive.
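Rather than take latency claims on faith, you can measure them. A non-streaming response from Ollama's /api/generate endpoint includes timing fields reported in nanoseconds, such as total_duration, eval_count, and eval_duration. The sketch below turns those into readable numbers. The helper names are hypothetical, and the sample values in the usage note are made up, not a benchmark.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed: tokens produced divided by generation time in seconds."""
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0  # field missing or zero, so no meaningful rate
    return response.get("eval_count", 0) / (duration_ns / 1e9)


def total_seconds(response: dict) -> float:
    """Wall-clock time Ollama spent on the whole request, in seconds."""
    return response.get("total_duration", 0) / 1e9
```

For an illustrative response of {"eval_count": 100, "eval_duration": 2_000_000_000, "total_duration": 3_500_000_000}, these helpers report 50 tokens per second and 3.5 seconds in total. Logging both per save gives you a realistic picture of how "ambient" the assistant feels on your own hardware.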

Conclusion: The Future of Developer Tools is Open

While Microsoft's Scout aims to capture consumers by integrating deeply into the Windows ecosystem, developers have the unique skill set to build and customize our own environments. By combining open-source building blocks like Ollama, LangChain, and Python file watchers, we can construct custom agents that fit our specific workflows perfectly—without compromising our privacy.

The code we wrote today is just the baseline. You can easily expand DevScout to read compile errors from your terminal logs, parse your Git diffs to auto-generate pull request descriptions, or integrate with a vector database (like Chroma or Qdrant) to index your entire codebase for semantic search.
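As a first step toward the diff-based ideas, you do not even need Git. Python's difflib can compare the previous and current contents of a file, so the agent could be sent only what changed. This is a rough sketch under the assumption that you keep the last seen content of each file in memory; changed_lines is a hypothetical helper name.

```python
import difflib


def changed_lines(old: str, new: str, file_path: str = "file.py", context: int = 3) -> str:
    """Return a unified diff between two versions of a file, or "" if identical."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{file_path}",
        tofile=f"b/{file_path}",
        n=context,  # number of unchanged lines to show around each change
    )
    return "".join(diff)
```

An empty result means nothing changed, which doubles as a cheap reason to skip the model call entirely.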

How do you feel about the push towards deeply integrated, proactive AI assistants? Would you trust an out-of-the-box system like Microsoft's Scout, or do you prefer the control of building your own local tooling? Let me know in the comments below!

If you enjoyed this tutorial and want more deep dives into open-source AI, system design, and practical developer tools, don't forget to subscribe to the "Coding with Alex" RSS feed or drop your email below!