Building a Tool Server for Your AI Assistant: Lobster’s Architecture

🇨🇳 中文版

AI assistants can talk, but they can’t do things. Everyone knows this problem. OpenAI provided a solution with the Function Calling protocol — letting LLMs recognize which tools to call during a conversation and having an external service execute them. But the protocol only solves half the problem: who provides these tools? Who executes them? Who guarantees safety?

I built a tool server for my AI assistant (OpenClaw, a.k.a. “Lobster”) called Lobster. It runs on a cloud server, and Lobster calls Lobster’s API through Function Calling, transforming from “just a talker” to “someone who gets things done.” This article isn’t about Lobster’s specific tool list — it’s about the architectural patterns: the ToolRegistry pattern, Function Calling bridging, security layering, observability, and plugin extension. You can reuse these patterns to build a tool server for any AI assistant.

Problem 1: How Does the AI “See” Your Tools?

Before an AI assistant can call a tool, it needs to know what tools are available and what parameters each requires. OpenAI defined the Function Calling format — a JSON Schema containing the tool name, description, and parameter definitions. But your tools are Python functions, CLI commands, HTTP requests — all in different formats. How do you unify them into a standard format the AI can understand?

Lobster’s approach is the ToolRegistry. The core is simple: when registering each tool, you provide four things — name, description, parameter definition (JSON Schema), and execution function. Once registered, the ToolRegistry automatically converts all tools into OpenAI Function Calling format.


registry.register(Tool(
    name="stock_quote",
    description="Get real-time stock quote",
    parameters={
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Stock code (e.g. sh600519, sz000001, hk00700)"},
        },
        "required": ["code"]
    },
    handler=lambda code: investment_tools.get_stock_quote(code),
    category="investment"
))

After registration, the AI assistant gets the tool list with a single HTTP request:


curl http://localhost:8000/tools/openai

The response is standard Function Calling JSON that can be fed directly to any LLM supporting function calls — GPT, Claude, DeepSeek, local Ollama models, all of them.


{
  "type": "function",
  "function": {
    "name": "stock_quote",
    "description": "Get real-time stock quote",
    "parameters": {
      "type": "object",
      "properties": {
        "code": {"type": "string", "description": "Stock code (e.g. sh600519, sz000001, hk00700)"}
      },
      "required": ["code"]
    }
  }
}

The beauty of this pattern: your tool implementation is a Python function, but the format exposed to the AI is standardized. Adding a new tool is just one register() call — no format conversion worries. The ToolRegistry handles mapping from registration info to Function Calling JSON; you just focus on the business logic.

Problem 2: How Does the AI “Call” Your Tools?

Once the AI assistant knows the tool list and decides during conversation that it needs stock data, it returns a function call instruction:


{"name": "stock_quote", "arguments": {"code": "sh600519"}}

Your server needs to receive this instruction, find the corresponding handler, execute it, and return the result. Lobster uses FastAPI to build an API server where each registered tool is automatically mapped to an execution endpoint:


curl -X POST http://localhost:8000/tools/stock_quote/execute \
  -H "Content-Type: application/json" \
  -d '{"code": "sh600519"}'

The ToolRegistry does three things upon receiving a request: looks up the registered handler, executes it, and returns the result. Lookup is O(1) by name index, execution is a direct Python function call, and the return is in unified JSON format.

The complete interaction chain: LLM identifies intent → generates function call → Lobster API receives it → ToolRegistry looks up and executes → result returned to LLM → LLM organizes a response using real data. Throughout the process, the user only converses with the AI assistant — tool calls are transparent.

A design choice here: your API server could also expose just a single unified /execute endpoint without routing by tool name. Lobster chose per-tool-name routing (/tools/{tool_name}/execute) because it makes per-tool API call logs cleaner — debugging instantly reveals which tool was called. If you don’t have many tools, a unified endpoint works fine too.

Problem 3: Letting AI Do Things Without Letting It Do Anything Reckless

This is the most easily overlooked problem. Your AI assistant can now read/write files, execute commands, make HTTP requests — which means it can also accidentally delete files, run dangerous commands, or request internal network addresses. When giving AI execution capabilities on a cloud server, the security boundary is non-negotiable.

Lobster adds a security validation layer before every tool execution, across three dimensions:

Path Security. File operation tools like file_read and file_write can only access allowed directories. By default, they’re limited to the current working directory. You can extend the allowed scope through configuration, but the AI cannot escape the boundary to read /etc/passwd or write /root/.bashrc.

Command Security. The run_shell tool has a blocklist: rm -rf /, mkfs, dd if=, fork bombs, chmod 777 /, wget — these dangerous commands are intercepted directly. You can also configure an allowlist mode that only permits pre-approved safe commands.

URL Security. http_get and http_post only allow http/https protocols and block localhost, 127.0.0.1, and 0.0.0.0 — preventing the AI from being tricked into requesting internal services on your server (SSRF protection).

There’s also a special check for Python code: the run_python tool scans for five high-risk patterns before execution — os.system, subprocess, eval(), exec(), import — and refuses to execute if any match is found. The calculator tool uses ast.parse + operator mapping instead of eval(), allowing only arithmetic operations.

These validations aren’t decorative optional layers — they’re mandatory checkpoints for every tool execution. The more capable your AI assistant becomes, the more you need to constrain what it can do. Security validation and tool capability should grow together — for every new tool you add, think about how it could be abused, then add corresponding protections.

Problem 4: How Do You Know What the AI Called and How It Went?

Your tool server is running on the cloud — you’re not there watching it. Which tools did the AI call? Which one was the slowest? Which has the highest failure rate? You need to be able to see this information.

Lobster implements two layers of observability:

Execution Statistics. Every tool call is recorded to ~/.lobster/stats/, persisted across sessions. Statistics include: total call count, success/failure counts, average execution time, and recent call history. You can see the most-used tool rankings, the slowest calls, and the tools with the highest error rates. This data helps you decide which tools are worth optimizing and which might have issues.

Caching. The ToolRegistry has a built-in LRU cache with TTL expiration. The same stock code won’t trigger repeated requests to the Sina Finance API within one minute — because market data doesn’t change much in 60 seconds, and cache hit rates directly reduce external API call volume. The cache also tracks statistics: hit rate, cache size, and eviction count.

These two layers are complementary: statistics tell you macro trends (which tools are used most, which are slow), and caching gives you micro optimizations (reducing duplicate calls, lowering latency). If you’re building your own tool server, both layers are worth implementing — it doesn’t need to be complex. A JSON file for stats + an OrderedDict for caching is enough to get started.

Problem 5: How to Add New Tools

Your AI assistant’s needs will keep changing. Today it only needs stock quotes, tomorrow it might need to read emails, and the day after it might need to call your company’s internal API. The tool server can’t be a closed system — it needs a low-cost extension mechanism.

Lobster provides two layers of extension:

Registration Extension. Adding a new tool is a single line of registry.register(). You write a handler function in any Python module, call register, and the ToolRegistry automatically includes it in the Function Calling format. No need to modify API server code, no need to manually write JSON Schema — the parameter definitions provided at registration are automatically converted.

Plugin Extension. ~/.lobster/plugins/*/plugin.py is auto-loaded. You create a directory, put a plugin.py file in it, define your tools, and the next time Lobster starts, they’re automatically discovered and registered. A template generator helps you quickly create plugin skeletons.

The difference between these two layers: registration extension is for tools you develop yourself (you know where the code is, just import it directly). Plugin extension is for tools contributed by others or tools you want to isolate (independent directory, not coupled to the main code). Your project might only need registration extension, but retaining the plugin mechanism means you can add third-party tools with zero cost in the future.

About MCP

You may have noticed that Anthropic has introduced MCP (Model Context Protocol), which is becoming a new trend in tool integration. Cursor, VS Code, and various agent frameworks are adopting MCP. What’s the relationship with Function Calling?

MCP and Function Calling solve the same problem — letting AI call external tools. But the scenarios differ. MCP is designed for IDE and local tool integration — AI agents connect to local tool servers through the MCP protocol, suited for development scenarios. Function Calling is designed for cloud AI assistants — LLMs generate tool call instructions in the conversation flow, executed by your server, suited for assistants deployed in the cloud.

Lobster uses the Function Calling format because it runs on a cloud server, and Lobster is also in the cloud. That’s the right choice. But if you’re doing IDE integration or local agents, MCP is the more appropriate protocol.

The good news: the ToolRegistry pattern is protocol-agnostic. Your tool registration, security validation, caching, and statistics architecture layers don’t depend on the specific Function Calling format. Switching to MCP, you only need to change one layer — from Function Calling JSON to MCP Tool definition format — while all the handlers, security validation, caching, and statistics underneath are fully reusable. The architectural thinking is universal; the protocol format is replaceable.

Architecture Summary

Tying the five problems above together, Lobster’s architecture looks like this:

The Tool Registration Layer (ToolRegistry) centrally manages metadata and execution functions for all tools, automatically converting them into a standard format LLMs can understand. The API Server Layer (FastAPI) maps each registered tool to an HTTP endpoint, allowing the AI assistant to call them over the network. The Security Validation Layer performs three-dimensional checks (path, command, URL) before every execution, ensuring the AI can do things without doing reckless things. The Observability Layer records execution statistics and cache data, giving you control over every tool call the AI makes. The Extension Layer adds new tools through both registration and plugin mechanisms, ensuring the toolset can continuously grow.

These five layers are each independent — you can adopt them as needed. If you only need the minimal setup: one ToolRegistry + one FastAPI server is enough. Security validation and caching can be added later. The plugin system can be skipped. What matters first is getting the “AI call → tool execution → result return” chain working, then gradually hardening it.

Building a tool server for an AI assistant — the core isn’t the tools themselves, but the architecture. Your toolset will change — today it’s stocks and code analysis, tomorrow it might be email and calendar. But the ToolRegistry pattern, security layering, observability, and extension mechanisms — these architectural layers hold steady regardless of how your tools evolve.

Source Code Navigation

Architecture Layer	Source File	Description
Tool Registration	tools.py	ToolRegistry, 14 built-in tools, security validation
API Server	api_cmd.py	FastAPI endpoints, OpenAI Function Calling format
Security Validation	tools.py	validate_path / validate_command / validate_url / _check_python_dangerous
Execution Statistics	stats.py	ToolStatsTracker, persisted statistics
Caching	cache.py	LRU cache, TTL expiration, hit rate stats
Financial Tools	investment.py	stock_quote and investment tool registration
Serena Integration	serena_client.py	Code intelligence framework integration
Memory Store	memory_store.py	TF-IDF + Levenshtein fuzzy search
Plugin System	plugin.py	Auto-discovery and loading of custom plugins
Configuration	config.py	.env loading, LobsterConfig
LLM Client	llm_client.py	EnhancedLLMClient, conversation management, caching, retry

Project repository: https://github.com/erishen/lobster