Beyond Chatbots: How to Build Domain-Specific AI Agents with Claude's Tool Use

We’ve all seen the headlines. Anthropic recently published a fascinating paper on "Making Claude a Chemist," detailing how they coaxed their frontier model into successfully navigating complex chemical synthesis, molecular design, and scientific literature. It’s an impressive feat for the scientific community, but as software engineers, DevOps practitioners, and system architects, it raises a much larger, highly practical question: How do we take a general-purpose LLM and turn it into a highly specialized, domain-specific expert for our own production systems?

Whether you need an AI agent that understands the nuances of your proprietary codebase, an automated SRE that can debug Kubernetes clusters, or a financial analyst that parses complex tax codes, the blueprint is the same. It’s not about retraining a multi-billion-parameter model from scratch. Instead, it’s about mastering Tool Use (Function Calling), Retrieval-Augmented Generation (RAG), and stateful agentic workflows.

Today, we’re going to look under the hood of how domain-specific AI works. We’ll build a working agentic workflow in Python using Claude’s Tool Use API, simulating a "DevOps Cloud Architect" that can inspect (mock) infrastructure data, query APIs, and safely recommend architectural fixes.

The Architecture of Domain Expertise

To make an LLM like Claude 3.5 Sonnet act like a specialized professional, you cannot rely solely on the system prompt. If you tell Claude, "You are a world-class AWS Cloud Architect," it will do its best based on its training data up to its knowledge cutoff. But it cannot see your actual AWS environment, it doesn't know your company's naming conventions, and it can't check if a subnet has available IP addresses.

To bridge this gap, we must build an agent loop. Here is the conceptual architecture of a domain-specific agent:


+-------------------------------------------------------------+
|                         User Input                          |
+------------------------------+------------------------------+
                               |
                               v
+------------------------------+------------------------------+
|                Orchestrator (Claude 3.5 Sonnet)             |
|  - System Prompt (Rules, Guardrails, Domain Knowledge)      |
|  - Available Tools (Definitions & JSON Schemas)             |
+------------------------------+------------------------------+
                               |
            +------------------+------------------+
            | Decides to call a tool              | Returns final answer
            v                                     v
+-----------+-----------+               +---------+-----------+
|  Tool Execution Engine|               |     User Output     |
|  - Query Database     |               +---------------------+
|  - Run CLI Command    |
|  - Call Internal API  |
+-----------+-----------+
            |
            | Sends execution results back to model
            v
+-----------+-----------+
|   Claude processes    |
|   tool output         |
+-----------+-----------+
            |
            | Loops back to the Orchestrator until Claude returns a final answer

In this loop, the LLM acts as the central processing unit (CPU), deciding which "peripherals" (tools) to trigger based on the user's intent. Let's look at how to implement this pattern in Python.

Setting Up Claude's Tool Use API

To follow along, make sure you have the official Anthropic SDK installed in your environment:

pip install anthropic

Let's write a script that defines two custom tools for our DevOps Cloud Architect Agent: one to fetch the current VPC configuration, and another to estimate AWS costs for proposed resources. By giving the model access to these mock APIs, we elevate it from a generic copy-paster of Terraform templates to a context-aware systems engineer.

Step 1: Define the Domain Tools

First, we define the Python functions that represent the "real-world" actions our agent can take. The JSON schemas that describe these tools to Claude come in Step 2.

import os
from anthropic import Anthropic

# Initialize the client. Ensure ANTHROPIC_API_KEY is in your environment variables.
client = Anthropic()

# Mock database of our current cloud infrastructure
MOCK_VPC_DATABASE = {
    "vpc-0a1b2c3d": {
        "cidr_block": "10.0.0.0/16",
        "region": "us-west-2",
        "subnets": [
            {"id": "subnet-111", "az": "us-west-2a", "available_ips": 12},
            {"id": "subnet-222", "az": "us-west-2b", "available_ips": 250}
        ]
    }
}

def get_vpc_details(vpc_id: str) -> dict:
    """Retrieve live configuration data for a specific VPC."""
    print(f"\n[SYSTEM] Executing tool: get_vpc_details for {vpc_id}")
    return MOCK_VPC_DATABASE.get(vpc_id, {"error": "VPC not found"})

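def calculate_monthly_ec2_cost(instance_type: str, quantity: int) -> dict:
    """Calculate the estimated monthly on-demand cost for EC2 instances."""
    print(f"\n[SYSTEM] Executing tool: calculate_monthly_ec2_cost for {quantity}x {instance_type}")
    # Illustrative monthly prices in USD; check current AWS pricing before relying on them
    prices = {"t3.medium": 30.12, "m5.large": 70.08, "c5.xlarge": 124.10}
    if instance_type not in prices:
        # Report unknown types instead of silently guessing a price
        return {"error": f"No pricing data for instance type {instance_type}"}
    total = round(prices[instance_type] * quantity, 2)  # avoid float noise like 210.24000000000001
    return {"instance_type": instance_type, "quantity": quantity, "estimated_monthly_cost_usd": total}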

Step 2: Declare the Tools to Claude

Now, we describe these tools to Claude. Each entry has a name, a description, and an input_schema written in JSON Schema, which tells the model exactly which arguments our functions accept and what types they expect.

tools = [
    {
        "name": "get_vpc_details",
        "description": "Retrieves the CIDR block, region, and subnets (with available IP counts per subnet) for a specified AWS VPC ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "vpc_id": {
                    "type": "string",
                    "description": "The AWS VPC ID, starting with vpc-"
                }
            },
            "required": ["vpc_id"]
        }
    },
    {
        "name": "calculate_monthly_ec2_cost",
        "description": "Calculates estimated monthly pricing for running EC2 instances based on instance type and quantity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "instance_type": {
                    "type": "string",
                    "description": "The AWS EC2 instance type (e.g., t3.medium, m5.large)"
                },
                "quantity": {
                    "type": "integer",
                    "description": "Number of instances to provision"
                }
            },
            "required": ["instance_type", "quantity"]
        }
    }
]
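
Because each schema is written by hand, separately from its Python function, the two can drift apart: rename a parameter in one place and Claude will send arguments the function no longer accepts. A cheap startup check can catch that. The helper below, `check_tool_definition`, is hypothetical (not part of the Anthropic SDK) and uses only the standard library's `inspect.signature`; it compares names and required fields, not types.

```python
import inspect
from typing import Callable


def check_tool_definition(tool: dict, func: Callable) -> list[str]:
    """Compare a tool's name and input_schema against the function that implements it."""
    problems = []
    if tool["name"] != func.__name__:
        problems.append(f"name mismatch: {tool['name']} vs {func.__name__}")

    params = inspect.signature(func).parameters
    schema = tool["input_schema"]
    props = set(schema.get("properties", {}))
    required = set(schema.get("required", []))

    # Every schema property must map to a real parameter, and vice versa
    for name in sorted(props - set(params)):
        problems.append(f"schema property '{name}' is not a function parameter")
    for name in sorted(set(params) - props):
        problems.append(f"function parameter '{name}' is missing from the schema")

    # Parameters without a default value must be marked as required
    for name, param in params.items():
        if name in props and param.default is inspect.Parameter.empty and name not in required:
            problems.append(f"parameter '{name}' has no default but is not listed as required")
    return problems


# Demo: a stand-in for the real tool function and its declaration
def get_vpc_details(vpc_id: str) -> dict:
    return {"vpc_id": vpc_id}


vpc_tool = {
    "name": "get_vpc_details",
    "input_schema": {
        "type": "object",
        "properties": {"vpc_id": {"type": "string"}},
        "required": ["vpc_id"],
    },
}

print(check_tool_definition(vpc_tool, get_vpc_details))  # []
```

In the script, you could run it over zip(tools, [get_vpc_details, calculate_monthly_ec2_cost]) at startup and refuse to launch the agent if any problems come back.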

The Agentic Execution Loop

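The core of the agent is managing conversation state across turns. When Claude decides to use a tool, the response comes back with stop_reason set to "tool_use" and one or more tool_use blocks in its content, often preceded by a text block describing its plan. Our application code must intercept these blocks, execute the matching Python functions, and feed the results back as tool_result blocks so Claude can formulate its final, expert response. Because Claude may need several rounds of tool calls, we repeat this until it stops asking for tools.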

import json

MODEL = "claude-3-5-sonnet-20241022"

# Route tool names to the local Python functions that implement them
TOOL_HANDLERS = {
    "get_vpc_details": get_vpc_details,
    "calculate_monthly_ec2_cost": calculate_monthly_ec2_cost,
}

def run_devops_agent(user_prompt: str, max_turns: int = 5):
    print(f"User Question: {user_prompt}\n")

    # Establish the domain context in the system prompt
    system_prompt = (
        "You are an expert DevOps and AWS Cloud Architect. Help the user optimize, "
        "debug, and plan their infrastructure using the tools provided. Always base "
        "your advice on actual tool outputs when available."
    )

    # The API is stateless, so the full history is resent on every turn
    message_history = [{"role": "user", "content": user_prompt}]

    for _ in range(max_turns):
        response = client.messages.create(
            model=MODEL,
            max_tokens=2000,
            system=system_prompt,
            tools=tools,
            messages=message_history,
        )
        # Keep Claude's whole reply (text and tool_use blocks) in the history
        message_history.append({"role": "assistant", "content": response.content})

        # No tool requested: Claude has produced its final answer
        if response.stop_reason != "tool_use":
            print("\n[CLAUDE'S ANALYSIS]:")
            print("".join(block.text for block in response.content if block.type == "text"))
            return response

        # Execute every tool_use block from this turn (Claude may request several)
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            handler = TOOL_HANDLERS.get(block.name)
            try:
                if handler is None:
                    result = {"error": f"Unknown tool: {block.name}"}
                else:
                    result = handler(**block.input)  # input is a dict matching input_schema
            except TypeError as exc:
                result = {"error": f"Invalid arguments for {block.name}: {exc}"}

            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),  # JSON, not Python's str() repr
                "is_error": "error" in result,
            })

        # Return all results in one user turn, then let Claude continue
        message_history.append({"role": "user", "content": tool_results})

    print("\n[SYSTEM] Stopped after max_turns without a final answer.")
    return None

# Execute the agent
run_devops_agent(
    "Can you check if we can safely deploy 3 new m5.large instances in VPC vpc-0a1b2c3d? "
    "And how much will that add to our monthly bill?"
)
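
The loop only works if the message history keeps a strict shape: the Messages API expects every assistant message containing tool_use blocks to be followed immediately by a user message whose tool_result blocks reference those blocks by id. The SDK returns content blocks as objects; for readability, this sketch uses plain dicts in the same shape, with made-up ids. `check_tool_pairing` is a hypothetical debugging helper, not part of the Anthropic SDK.

```python
def check_tool_pairing(messages: list[dict]) -> list[str]:
    """Report tool_use blocks that are not answered in the immediately following user message."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or not isinstance(msg["content"], list):
            continue
        use_ids = [b["id"] for b in msg["content"] if b.get("type") == "tool_use"]
        if not use_ids:
            continue
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        if nxt is None or nxt["role"] != "user" or not isinstance(nxt["content"], list):
            problems.append(f"message {i}: tool_use without a following user turn of tool results")
            continue
        result_ids = {b["tool_use_id"] for b in nxt["content"] if b.get("type") == "tool_result"}
        for use_id in use_ids:
            if use_id not in result_ids:
                problems.append(f"message {i}: no tool_result for {use_id}")
    return problems


# Illustrative history after one tool round (the ids are made up)
history = [
    {"role": "user", "content": "Can we deploy 3 m5.large instances in vpc-0a1b2c3d?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll check the VPC's subnet capacity first."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_vpc_details",
         "input": {"vpc_id": "vpc-0a1b2c3d"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": '{"cidr_block": "10.0.0.0/16", "region": "us-west-2"}'},
    ]},
]

print(check_tool_pairing(history))  # []
```

A check like this is handy when the API starts rejecting requests after you change how the loop builds its history.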

Why This Design Pattern Wins

When you run the code above, watch your console. Claude will typically parse the user’s request, recognize that it cannot answer without real-world data, call get_vpc_details to verify subnet capacity and calculate_monthly_ec2_cost to estimate the price (sometimes in a single turn, sometimes across successive turns), and then write a tailored report grounded in those specific tool outputs.

This approach solves the three biggest issues developers face when implementing generative AI in production:

Hallucination Control: By grounding Claude’s reasoning in deterministic, real-world data returned from API calls, we dramatically reduce its tendency to make up configuration specifications or IP ranges.
Up-to-Date Systems: A model's knowledge is frozen at its training cutoff. Connecting tools to live endpoints (like Kubernetes clusters, Datadog APIs, or internal databases) lets the agent work with current, real-time state.
Auditability and Security: Because the execution of the Python function happens in your environment (not on Anthropic's servers), you have complete control over read/write permissions, database credentials, and execution boundaries.

Taking it to Production: Crucial Developer Practices

Building a prototype in a local script is easy. Deploying a domain-expert agent to production requires robust engineering. Here are the rules of thumb to keep your agents stable and secure:

1. Enforce strict JSON schemas

If your tool schemas are vague, the model may invent or mistype arguments. Libraries such as Pydantic can generate strict JSON schemas from typed Python classes, and the same classes can validate Claude's arguments before your tool runs. This greatly reduces parsing errors, though no schema can eliminate them entirely.
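
Here is a dependency-free sketch of the idea, assuming a tightened version of the calculate_monthly_ec2_cost schema. The `enum`, `minimum`, `maximum`, and `additionalProperties` keywords are standard JSON Schema; the cap of 100 instances and the hypothetical `validate_tool_input` helper are illustrative choices. The validator handles only the keywords used here. In a real project, Pydantic v2's `model_json_schema()` (to generate the schema) and `model_validate()` (to check the arguments) cover the same ground far more completely.

```python
# Strict schema: a closed set of instance types, bounded quantity, no unexpected keys
EC2_COST_SCHEMA = {
    "type": "object",
    "properties": {
        "instance_type": {
            "type": "string",
            "enum": ["t3.medium", "m5.large", "c5.xlarge"],
            "description": "A supported AWS EC2 instance type",
        },
        "quantity": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,  # illustrative policy limit
            "description": "Number of instances to provision",
        },
    },
    "required": ["instance_type", "quantity"],
    "additionalProperties": False,
}

_JSON_TYPES = {"string": str, "integer": int}


def validate_tool_input(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means args satisfy this schema subset."""
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field '{key}'")
    if schema.get("additionalProperties") is False:
        errors.extend(f"unexpected field '{k}'" for k in args if k not in props)

    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(value, _JSON_TYPES[spec["type"]]) or isinstance(value, bool):
            errors.append(f"'{key}' should be of type {spec['type']}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{key}' must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"'{key}' must be >= {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"'{key}' must be <= {spec['maximum']}")
    return errors


print(validate_tool_input(EC2_COST_SCHEMA, {"instance_type": "m5.large", "quantity": 3}))  # []
# Flags the extra "region" key and the string-typed quantity
print(validate_tool_input(EC2_COST_SCHEMA, {"instance_type": "m5.large", "quantity": "3", "region": "us-west-2"}))
```

Validation errors can be returned to Claude as a tool_result with is_error set, which usually prompts it to retry with corrected arguments instead of crashing your tool.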

2. Design for Read-Only Defaults (The Least Privilege Principle)

If you build an agent that connects to AWS or your production databases, start with strictly read-only tools. If your agent only needs to analyze logs, do not give it a tool that can run kubectl delete pod. If write-access tools are necessary (e.g., automated deployments), require a human-in-the-loop confirmation before any state-changing tool executes.
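
A minimal sketch of that gate, assuming a deny-by-default policy: any tool not on an explicit read-only allowlist needs a human decision before it runs. `execute_tool`, `console_approval`, and the `restart_deployment` tool are hypothetical names for illustration; a production system might collect approvals through a chat or ticketing workflow rather than a terminal prompt.

```python
from typing import Callable

# Deny by default: only tools on this allowlist run without a human decision
READ_ONLY_TOOLS = {"get_vpc_details", "calculate_monthly_ec2_cost"}


def execute_tool(
    name: str,
    args: dict,
    handlers: dict[str, Callable[..., dict]],
    approve: Callable[[str, dict], bool],
) -> dict:
    """Run a tool, asking for human approval unless it is known to be read-only."""
    handler = handlers.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    if name not in READ_ONLY_TOOLS and not approve(name, args):
        # Hand the refusal back to Claude as a tool result rather than raising
        return {"error": f"A human reviewer declined to run {name}"}
    return handler(**args)


def console_approval(name: str, args: dict) -> bool:
    """Ask an operator at the terminal to confirm a state-changing action."""
    answer = input(f"Agent wants to run {name} with {args}. Approve? [y/N] ")
    return answer.strip().lower() == "y"


# Hypothetical write-capable tool, for illustration only
def restart_deployment(deployment: str) -> dict:
    return {"status": f"restart requested for {deployment}"}


handlers = {
    "get_vpc_details": lambda vpc_id: {"vpc_id": vpc_id},  # stand-in read-only tool
    "restart_deployment": restart_deployment,
}
```

In the agent loop, you would call execute_tool(block.name, block.input, handlers, console_approval) instead of invoking the handler directly. Returning the refusal as an error result, rather than raising, lets Claude explain to the user why the action did not happen.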

3. Keep Tool Outputs Within Your Token Budget

Agentic loops can consume tokens quickly, because every tool result is appended to the conversation history and resent on each subsequent turn. Paginate API responses, and summarize or truncate large payloads inside your tools before returning them to Claude's context window.
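
One way to enforce this inside a tool is to cap the size of what it returns. The sketch below assumes a list-shaped result (log lines, pods, rows) and uses a character count as a rough stand-in for tokens; `compact_tool_result` and its default limits are hypothetical. It always reports the total count and a `truncated` flag so Claude knows it is looking at a partial view and can ask a narrower question.

```python
import json


def compact_tool_result(items: list, max_items: int = 20, max_chars: int = 4000) -> str:
    """Serialize a list-shaped tool result as JSON while staying under a rough size budget."""
    page = items[:max_items]
    payload = {
        "items": page,
        "returned": len(page),
        "total": len(items),
        # Tells Claude there is more data it could request with a narrower query
        "truncated": len(page) < len(items),
    }
    text = json.dumps(payload)
    # Drop items from the end until the JSON fits the character budget
    while len(text) > max_chars and payload["items"]:
        payload["items"] = payload["items"][:-1]
        payload["returned"] = len(payload["items"])
        payload["truncated"] = True
        text = json.dumps(payload)
    return text


# Example: 500 log lines shrink to a small, clearly labeled page
logs = [{"line": i, "message": f"GET /health 200 ({i} ms)"} for i in range(500)]
summary = json.loads(compact_tool_result(logs, max_items=5))
print(summary["returned"], summary["total"], summary["truncated"])  # 5 500 True
```

A character budget is crude, but it is cheap to compute inside every tool and keeps a single oversized response from crowding out the rest of the conversation.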

Conclusion

You don't need to train a customized language model from scratch to build domain-specific intelligence. By leveraging Claude's advanced reasoning capabilities alongside structured Tool Use, you can create resilient, specialized engineering agents that seamlessly interface with your existing internal systems, databases, and APIs.

The "AI Chemist" of tomorrow isn't just a model with specialized weights; it's a general-purpose model with the right set of tools, guardrails, and context. What tasks in your current deployment or development workflow could you automate by building a specialized agent today?

Have you experimented with Claude's Tool Use API or built an agentic workflow in your DevOps stack? Let's discuss in the comments below! Don't forget to subscribe to "Coding with Alex" for weekly deep-dives into cloud infrastructure, AI engineering, and modern web development.