AI-Analyze: Five Non-Obvious Design Decisions


Most articles about code analysis tools talk about “how many languages they support, how many rules they have, how many vulnerabilities they can detect.” ai-analyze supports 7 languages, 12 security rules, and 4-dimensional quality scoring — these feature numbers are right there in the README. This article doesn’t rehash the feature list. Instead, it explores five easily overlooked yet deeply worthwhile design decisions — each with a “why not” story behind it.

Decision 1: MCP Dual-Strategy Is Not About Performance — It’s About Security Boundaries

ai-analyze implements two MCP integration strategies. On the surface, calling Serena’s Python API directly (in-process import) is faster and simpler, while the protocol-compliant stdio client (JSON-RPC 2.0 over subprocess communication) is slower and more cumbersome. It seems like a performance decision.

But the real difference is the security boundary, not performance. The direct-call strategy exposes all of Serena’s tools, including rename_symbol and replace_symbol_body, which modify source code. A code analyzer that can rename symbols and replace function bodies is one misjudged analysis result away from corrupting your code. The stdio client deliberately does not implement these two modification operations. Everything that crosses the subprocess boundary has to be encoded as a request by the client, so an operation the client never implements cannot be triggered through it. The pipes themselves do not stop a server process from touching the filesystem; the protection comes from the narrow interface the client exposes. This is architecture constraining capability: not “I don’t want to implement this,” but “I shouldn’t let you do this.”


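import asyncio
import json


class StdioMCPClient:
    """MCP JSON-RPC 2.0 over stdio transport"""

    def __init__(self, server_command):
        # Constructor is assumed here; the original excerpt omits it.
        self.server_command = server_command
        self.process = None
        self._next_id = 0

    async def connect(self):
        # Spawn the MCP server as a child process and talk to it over pipes.
        self.process = await asyncio.create_subprocess_exec(
            *self.server_command,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )

    async def send_request(self, method, params=None):
        # Each request gets a fresh id so its response can be matched to it.
        self._next_id += 1
        request = {
            "jsonrpc": "2.0",
            "id": self._next_id,
            "method": method,
            "params": params or {}
        }
        # Newline-delimited JSON: one message per line.
        self.process.stdin.write(json.dumps(request).encode() + b"\n")
        await self.process.stdin.drain()
        response = await self.process.stdout.readline()
        if not response:
            raise ConnectionError("MCP server closed its stdout")
        return json.loads(response.decode())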

The core judgment behind the two coexisting strategies: is your MCP service for yourself, or for others? For yourself, direct calls are more efficient, and you can trust yourself to avoid dangerous operations. For others, protocol compliance is mandatory, and you can’t assume other people’s agents will use modification capabilities correctly. Constraining them through architectural boundaries is more reliable than constraining them through documentation.
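One way to picture the "constrain through architecture" idea is an allowlist inside the client: tools that are not on the list simply cannot be encoded as a request. The sketch below is an illustration, not ai-analyze's actual code. The two write tool names come from the text above; the read-only tool names, the exception class, and build_tool_call are hypothetical.

```python
# Tools that modify source code (names taken from the article).
WRITE_TOOLS = frozenset({"rename_symbol", "replace_symbol_body"})

# Hypothetical set of read-only tools the client chooses to expose.
READ_ONLY_TOOLS = frozenset({"find_symbol", "get_symbols_overview"})

# Sanity check: no write tool may ever appear in the exposed set.
assert WRITE_TOOLS.isdisjoint(READ_ONLY_TOOLS)


class ToolNotExposedError(Exception):
    """Raised when a caller asks for a tool the client does not expose."""


def build_tool_call(tool_name, arguments=None):
    """Return the params for a tool-call request, or raise if not allowed."""
    if tool_name not in READ_ONLY_TOOLS:
        raise ToolNotExposedError(f"{tool_name!r} is not exposed by this client")
    return {"name": tool_name, "arguments": arguments or {}}
```

With this shape, a request for rename_symbol fails before anything is written to the pipe, so the restriction does not depend on callers reading the documentation.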

Decision 2: The Merge Layer Uses AST as the Backbone — Serena Is Just Supplementary Fields

Serena provides symbol structure — class inheritance, function call chains, cross-file reference relationships. The AST analyzer provides complexity scoring, code smell detection, parameter analysis, and async/static annotations. When merging the two data sources, which one takes the lead?

Intuition might pick Serena — symbol structure is the skeleton of code, while complexity and smells are attributes. But ai-analyze’s UnifiedAnalyzer chose AST as the backbone. The reason: AST produces actionable metrics — “this function’s cognitive complexity is 15, exceeding the threshold of 10” is a finding you can act on immediately. “This class is referenced by 3 modules” is structural information, but it can’t tell you what to do. The merge logic loops primarily through AST’s file list, attaching Serena’s symbol data as supplementary attributes.

Even more interesting: the UnifiedSymbol dataclass has a serena_data field, but the merge code never fills it. This isn’t a bug — it’s an architectural placeholder. The current pipeline can already produce useful analysis reports using AST data alone; Serena data is a future enhancement point that doesn’t block current functionality. Zero runtime overhead for architectural foresight.

The design philosophy this decision conveys: the main loop of data merging should revolve around actionable dimensions, not structural ones. Structural information provides context; metric information drives decisions.
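A minimal sketch of an AST-led merge, under stated assumptions: only UnifiedSymbol and its serena_data field are named in the text; the other fields, the merge function, and the shape of both inputs are hypothetical. The point is that the loop iterates over AST results and only looks Serena data up along the way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnifiedSymbol:
    name: str
    file: str
    complexity: int                      # actionable metric from the AST analyzer
    reference_count: int = 0             # supplementary attribute, from Serena if known
    serena_data: Optional[dict] = None   # placeholder field, left unfilled


def merge(ast_results, serena_refs):
    """AST results drive the loop; Serena data is attached per symbol."""
    merged = []
    for file, functions in ast_results.items():
        for fn in functions:
            merged.append(UnifiedSymbol(
                name=fn["name"],
                file=file,
                complexity=fn["complexity"],
                # A symbol Serena knows nothing about still gets a record.
                reference_count=serena_refs.get((file, fn["name"]), 0),
            ))
    return merged
```

A symbol that appears only in Serena's data never enters the result here, which is the practical meaning of "AST as the backbone".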

Decision 3: The AI Prompt Includes an “Anti-Hype” Instruction

ai-analyze has three DeepSeek AI calls: code quality assessment, Docker strategy suggestions, and framework upgrade analysis. Every call’s prompt embeds structured facts from AST analysis: Top 5 complex functions, Top 10 code smells, Top 20 dependencies. The AI isn’t asked to discover complexity problems; it’s handed confirmed facts to interpret. This “rules-first, AI-interprets” pattern reduces hallucination risk: a claim like “this project has no complexity issues” would directly contradict the prompt, which lists the most complex functions and their scores.
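The "rules-first, AI-interprets" pattern can be sketched as a prompt builder that embeds the facts before asking for interpretation. This is an illustrative sketch; the function name, input shapes, and prompt wording are assumptions, and only the Top 5 and Top 10 limits come from the text.

```python
def build_quality_prompt(complex_functions, code_smells):
    """Embed confirmed static-analysis facts, then ask for interpretation."""
    # Keep only the five most complex functions and the first ten smells.
    top_functions = sorted(
        complex_functions, key=lambda f: f["complexity"], reverse=True
    )[:5]
    top_smells = code_smells[:10]

    lines = ["Confirmed facts from static analysis:", "", "Most complex functions:"]
    lines += [f"- {f['name']} (complexity {f['complexity']})" for f in top_functions]
    lines += ["", "Code smells:"]
    lines += [f"- {smell}" for smell in top_smells]
    lines += ["", "Interpret these findings; do not contradict them."]
    return "\n".join(lines)
```

Capping the lists keeps the prompt short and puts the highest-impact facts in front of the model first.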

But the least obvious design decision lies in the framework upgrade analysis prompt. There’s one explicit instruction:

> “If the current version is already the latest stable release, state that clearly — don’t push for an upgrade.”

This instruction counters a fundamental LLM tendency: always suggesting changes. LLM training data is full of “upgrade to the latest version” and “migrate to the new framework” suggestions because technical articles and Stack Overflow answers naturally lean toward recommending change. But in real-world scenarios, a React 18 project doesn’t need to upgrade to React 19 RC, and a Python 3.11 project doesn’t need to upgrade to 3.12. Without this instruction, the AI will almost certainly recommend an upgrade — even when the upgrade has no tangible benefit, even when it introduces compatibility risks.

This follows the same thinking as the temperature gradient across the three AI calls: code quality assessment uses temperature=0.3 (suggestions can have some creativity), Docker strategy and framework upgrades use temperature=0.2 (output must be conservative). The temperature gradient maps to consequence severity — a bad quality suggestion is merely annoying, a bad Docker configuration causes deployment failures, and a bad upgrade recommendation causes production incidents. The more severe the consequences, the lower the randomness.
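The temperature mapping and the anti-hype instruction could be wired together roughly as follows. The temperatures are the ones stated above; the task keys, the request_settings helper, and the instruction text (a paraphrase of the quoted instruction, not the exact prompt) are assumptions made for illustration.

```python
# Temperatures from the article, keyed by hypothetical task names.
TEMPERATURES = {
    "code_quality": 0.3,       # some creativity is acceptable
    "docker_strategy": 0.2,    # bad output breaks deployments
    "framework_upgrade": 0.2,  # bad output risks production incidents
}

# Paraphrase of the anti-hype instruction quoted above.
ANTI_HYPE = (
    "If the current version is already the latest stable release, "
    "say so clearly and do not recommend an upgrade."
)


def request_settings(task, base_prompt):
    """Pick the temperature for a task and append task-specific guardrails."""
    prompt = base_prompt
    if task == "framework_upgrade":
        prompt = f"{base_prompt}\n\n{ANTI_HYPE}"
    return {"temperature": TEMPERATURES[task], "prompt": prompt}
```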

Decision 4: Security Dimension Weight Is Only 0.15 — Because It’s a “Weak Metric”

The quality scoring formula: overall = complexity × 0.25 + maintainability × 0.35 + reliability × 0.25 + security × 0.15

Why is maintainability weighted highest (0.35)? Because it’s the long-term cost driver: unmaintainable code gets harder to modify with every change, and the technical debt compounds. Complexity is a “current state” metric, reliability is a “risk” metric, and maintainability is a “trend” metric. Trends predict future costs, so they deserve the highest weight.

But the more interesting question is why security is weighted lowest (0.15). The security score is calculated as max(0, 100 - code_smells // 2 * 10): halve the number of code smells with integer division, subtract 10 points for each resulting unit, and floor the result at 0. Code smells serve here as a rough proxy for security issues. The code author acknowledged this in a comment: “should actually use a dedicated security analysis tool.” The security score is an immature metric. Weighting it at 0.25 would give this rough proxy as much say as complexity or reliability and make the overall score less trustworthy. Giving a weak metric a low weight is more honest than giving it a high one: you don’t want a smell-count-divided-by-two approximation deciding whether a project can be deployed.
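The two formulas above are small enough to work through directly. The sketch below transcribes them; the function names and the dictionary layout are assumptions, while the weights and the security expression come from the text.

```python
WEIGHTS = {
    "complexity": 0.25,
    "maintainability": 0.35,
    "reliability": 0.25,
    "security": 0.15,
}


def security_score(code_smells):
    # Operator precedence: (code_smells // 2) * 10, then subtract from 100.
    return max(0, 100 - code_smells // 2 * 10)


def overall_score(scores):
    """Weighted sum of the four dimension scores (each on a 0-100 scale)."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
```

For example, 7 code smells give 7 // 2 = 3, so the security score is 100 - 30 = 70, and anything from 20 smells upward floors at 0. Because the weight is 0.15, even a swing of the security score from 100 to 0 moves the overall score by at most 15 points.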

The security scanner itself uses 12 regex-matched rules. Its risk score weights each finding by severity (INFO=0.1, LOW=0.25, MEDIUM=0.5, HIGH=0.75, CRITICAL=1.0) and applies a saturation curve (10 CRITICAL findings reach the maximum score of 100, and further findings do not raise it). Severity weighting blunts the gaming strategy of “fixing a bunch of LOW-level issues to inflate the score,” since a LOW finding counts for only a quarter of a CRITICAL one. The saturation curve prevents large legacy projects from getting inflated scores just because they have many findings. These designs make the security scanner itself quite reliable, but the security dimension’s scoring formula is a rough proxy, so its weight has to be conservative.
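A rough sketch of severity-weighted scoring with saturation follows. The weights and the "10 CRITICAL findings = 100" anchor come from the text; the additive combination and the hard cap via min() are assumptions, since the exact shape of the curve is not given.

```python
SEVERITY_WEIGHTS = {
    "INFO": 0.1,
    "LOW": 0.25,
    "MEDIUM": 0.5,
    "HIGH": 0.75,
    "CRITICAL": 1.0,
}


def risk_score(severities):
    """Sum severity weights, scale so 10 CRITICAL findings reach 100, then cap."""
    weighted = sum(SEVERITY_WEIGHTS[s] for s in severities)
    # Assumed saturation: a hard cap at 100 once the weighted total reaches 10.
    return min(100.0, weighted * 10)
```

Under these assumptions, 20 CRITICAL findings score the same as 10, and four LOW findings contribute as much as a single CRITICAL one.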

Decision 5: Cache Isn’t “Read From the Fastest Layer” — It’s “Auto-Migrate to Faster Layers on Read”

The standard pattern for a three-level cache system (memory → file → Redis) is “write to all layers on write, read from the fastest layer on read.” ai-analyze adds an extra behavior: read-backfill. When a read hits L2 (file cache), the system copies that data to L1 (memory). When it hits L3 (Redis), it backfills to both L1 and L2.

This means the cache warms itself according to usage patterns. A cold lookup checks L1, then L2, then L3. If it hits L3, the value is backfilled into L2 and L1, so the next read in the same process hits L1 directly. After a restart empties memory, the read hits L2 and copies the value back into L1. No warmup step and no manual migration are needed: cached data flows toward the faster layers with use.
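Read-backfill can be sketched with three plain dictionaries standing in for memory, file, and Redis. The class and method names are hypothetical, and real layers would add serialization and TTLs; only the lookup-then-copy-upward behavior is taken from the text.

```python
class BackfillCache:
    """Three-layer cache where a hit in a slow layer populates the faster ones."""

    def __init__(self):
        # Index 0 = L1 (memory), 1 = L2 (file), 2 = L3 (Redis); dicts as stand-ins.
        self.layers = [{}, {}, {}]

    def set(self, key, value):
        # Write-through: every layer receives the value.
        for layer in self.layers:
            layer[key] = value

    def get(self, key):
        """Return (value, level_hit), or (None, None) on a full miss."""
        for level, layer in enumerate(self.layers):
            if key in layer:
                value = layer[key]
                # Backfill every layer that is faster than the one that hit.
                for faster in self.layers[:level]:
                    faster[key] = value
                return value, level + 1
        return None, None
```

A value that initially lives only in the third layer is returned from level 3 on the first read and from level 1 on the second.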

Redis also has an “absorptive” fault-tolerance design: when the connection fails, _client is set to None, and every subsequent operation checks if not self._client: return None. If Redis completely goes down, no exceptions are thrown — the system silently degrades to L1+L2 and continues running. This doesn’t mean “Redis is optional” — it means “Redis failure should not block the analysis pipeline.”
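The absorptive pattern can be sketched without a real Redis dependency by injecting the connection step. Here connect is a hypothetical factory that returns any client object with a get method; the _client attribute and the "if not self._client" guard mirror the description above, and everything else is an assumption.

```python
class SafeRemoteLayer:
    """Remote cache layer that degrades to 'always miss' instead of raising."""

    def __init__(self, connect):
        try:
            self._client = connect()
        except Exception:
            # Connection failure is absorbed; the layer becomes a no-op.
            self._client = None

    def get(self, key):
        if not self._client:
            return None
        try:
            return self._client.get(key)
        except Exception:
            # A failure mid-session also disables the layer from here on.
            self._client = None
            return None
```

Callers treat a None result as an ordinary miss and fall back to the other layers, so a Redis outage costs speed but never interrupts the analysis.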

The incremental analyzer has an additional layer of optimization: when merging cached results with new analysis results, it keeps deserialized Python objects in a memory-based _file_result_cache dictionary. The MultiLevelCache stores serialized JSON, but the merge phase needs objects. Deserialization is an expensive operation — keeping object state in memory avoids the overhead of repeatedly reading from cache and deserializing. This is a third, unlabeled cache layer, existing specifically for the merge phase.
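The in-memory object layer might look roughly like this. Only the _file_result_cache name and the use of MD5 change detection appear in the article; the class, the load method, its parameters, and the call counter are hypothetical and exist to make the saving visible.

```python
import hashlib
import json


class IncrementalMergeCache:
    """Keeps deserialized results in memory, keyed by path and content hash."""

    def __init__(self):
        self._file_result_cache = {}   # path -> (md5 digest, deserialized result)
        self.deserialize_calls = 0     # counter for illustration only

    def load(self, path, content, serialized):
        digest = hashlib.md5(content).hexdigest()
        cached = self._file_result_cache.get(path)
        if cached and cached[0] == digest:
            # Unchanged file: reuse the object, skip deserialization.
            return cached[1]
        self.deserialize_calls += 1
        result = json.loads(serialized)
        self._file_result_cache[path] = (digest, result)
        return result
```

Loading the same unchanged file twice deserializes once and returns the same object both times; a content change invalidates the entry.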

The common thread across these five decisions: every “why not” is more worth telling than the “how.” Don’t expose modification operations (architectural constraint); don’t center on structure (actionability first); don’t let AI push for upgrades (counteracting innate tendencies); don’t give weak metrics high weight (honesty over precision); don’t let caches wait passively (proactive warming). These decisions aren’t code implementation details — they’re expressions of design philosophy. They determine that ai-analyze isn’t just another analysis tool, but a system with genuine engineering judgment.

Source Code Navigation

Module	Source File	Description
MCP Protocol Client	serena_stdio_client.py	Deliberately excludes modification operations for security boundary
MCP Direct Call Client	serena_client.py	In-process full-feature calls, performance-first
Unified Merge Layer	unified_analyzer.py	AST as backbone, Serena as supplement, serena_data placeholder
DeepSeek AI Integration	ai_enhanced_analyzer.py	Rules-first prompts, anti-hype instructions, temperature gradient
Quality Scoring	quality_score.py	Maintainability 0.35 highest, Security 0.15 low-weight honest proxy
Security Scanner	security_scanner.py	Severity-weighted geometric scoring + saturation curve
Three-Layer Cache	multi_level_cache.py	Read-backfill auto-warming, Redis absorptive fault tolerance
Incremental Analysis	incremental_analyzer.py	MD5 change detection + file cache progressive migration + memory object layer
AST Analyzer	ast_analyzer.py	Single tree traversal replacing 3 traversals, longest-task-first scheduling
Pipeline Orchestration	full_analyzer.py	Skip flags as cost-control grid
Docker Generation	docker_generator.py	Content detection over structure detection, Husky removal practical details
Plugin System	plugin_system.py	shared_data pipeline, namespace collision prevention
Exception System	exceptions.py	error_code + context dict, parseable log format

Project repository: https://github.com/erishen/ai-analyze