tech/architecture

ARCHITECTURE

Code architecture AI expert.

production Any — outputs Markdown, Mermaid, JSON
improves: tech

Code Architecture AI

Good architecture decisions are hard to make alone and expensive to reverse. This skill gives engineers an AI partner that knows architecture patterns, trade-off frameworks, and documentation standards — producing first-draft ADRs, identifying anti-patterns, and generating C4 diagrams from verbal descriptions so engineers focus on the judgment calls that require domain expertise.

Whether you are a solo founder choosing a database, a staff engineer reviewing a proposed microservices split, or a CTO documenting the decisions that led to your current stack — the Code Architecture AI structures the conversation and does the heavy documentation lifting.

Pairing by Role

RolePrimary Use
Software EngineerAI reviews proposed design against known patterns, generates trade-off analysis, and drafts the ADR for team review
Staff / Principal EngAI generates C4 context and container diagrams from system descriptions, identifies missing NFRs, spots anti-patterns
Engineering ManagerAI produces architecture decision records backlog, documents existing system decisions, surfaces technical debt patterns
CTO / ArchitectAI structures RFC documents, generates migration roadmaps, produces board-level system topology summaries

The AI does not make architecture decisions — engineers do. The AI prepares the analysis, generates the documentation, and structures the trade-offs. The human applies judgment, organisational context, and technical intuition.

ADR Generation (MADR Format)

Describe the decision context and constraints — the AI produces a complete MADR-format Architecture Decision Record ready for team review and version control.

# ADR-001: Use Event Sourcing for Order Management

## Status
Accepted

## Context
Our order management system needs to support audit trails, temporal queries,
and eventual consistency between inventory, payments, and fulfilment services.
Current approach uses direct DB updates with no event history.

## Decision
Implement event sourcing using Cloudflare Queues + Durable Objects for the
order aggregate. Events are the source of truth; current state is a projection.

## Consequences
### Positive
- Complete audit trail of all order state changes
- Temporal queries ("what was the order state at T?") become trivial
- Event replay enables recovery and new projection creation

### Negative
- Increased complexity in query patterns (CQRS required)
- Event schema evolution requires versioning strategy
- Higher initial development effort (~2 sprint overhead)

## Alternatives Considered
- Change Data Capture (CDC): rejected — tighter DB coupling
- Audit log table: rejected — doesn't enable replay or projections

C4 System Context Diagram (Mermaid)

Describe your system's actors and external dependencies — the AI generates a Mermaid C4 context diagram that renders directly in GitHub, Notion, or any Mermaid-compatible renderer.

C4Context
  title System Context — Order Platform
  Person(customer, "Customer", "Places and tracks orders")
  System(orderSystem, "Order Platform", "Manages the full order lifecycle")
  System_Ext(paymentGateway, "Payment Gateway", "Stripe / PayFast")
  System_Ext(warehouseWMS, "WMS", "Inventory and fulfilment")
  System_Ext(notificationSvc, "Notifications", "Email / SMS / Push")

  Rel(customer, orderSystem, "Places order, tracks status", "HTTPS")
  Rel(orderSystem, paymentGateway, "Authorises payment", "REST/TLS")
  Rel(orderSystem, warehouseWMS, "Sends pick instruction", "Event")
  Rel(orderSystem, notificationSvc, "Triggers notifications", "Event")

Trade-off Matrix

State the architectural choice — the AI generates a structured trade-off matrix covering the dimensions that matter most for your context:

| Dimension                   | Microservices         | Modular Monolith       | Decision                           |
|-----------------------------|----------------------|------------------------|------------------------------------|
| Development speed (early)   | Slower               | Faster                 | Monolith wins at <10 devs         |
| Independent deployment      | Yes                  | No                     | Microservices if CI/CD is mature  |
| Operational complexity      | High                 | Low                    | Monolith until scale demands      |
| Team autonomy               | Yes                  | Coupling risk          | Microservices with strong ownership|
| Data consistency            | Distributed          | ACID                   | Monolith for strong consistency   |

Anti-pattern Detection

Describe your system's current structure — the AI identifies known anti-patterns with specific indicators and remediation recommendations:

## Distributed Monolith (HIGH RISK)
Indicator: 12 microservices with synchronous HTTP calls between all of them.
           Any single service failure cascades to full system outage.
Remediation: Introduce async messaging (queues/events) for non-latency-critical
             paths. Add circuit breakers on synchronous calls. Consider merging
             services that are always deployed together.

## Chatty Microservices (MEDIUM RISK)
Indicator: Product page requires 8 sequential API calls to render.
           p95 latency = sum of all downstream p95s.
Remediation: Introduce a BFF (Backend for Frontend) that aggregates calls
             server-side. Or use GraphQL with dataloader batching.

## God Service (LOW-MEDIUM RISK)
Indicator: OrderService owns orders, inventory, notifications, and reporting.
           Single team bottleneck for all product changes.
Remediation: Extract NotificationService. Extract ReportingService.
             Keep OrderService focused on the order aggregate lifecycle.

MCP Tools

ToolDescription
eng_generate_adrGenerate a complete MADR or Nygard-format ADR from decision context and constraints
eng_review_architectureReview a system description against known patterns and identify missing concerns
eng_c4_diagramGenerate C4 context, container, or component diagrams in Mermaid syntax
eng_tradeoff_analysisProduce a structured trade-off matrix for a given architectural choice
eng_detect_antipatternsIdentify architectural anti-patterns from a system description with remediation guidance
eng_db_selectionRecommend database technology based on workload, consistency, and scale requirements
eng_migration_planGenerate a migration roadmap with approach, phases, risk profile, and rollback strategy
eng_nfr_analysisStructure non-functional requirements (availability, latency, throughput, security) for a system
eng_pattern_recommendRecommend architecture patterns matching a described problem with implementation notes

Tool Schema: eng_generate_adr

{
  "name": "eng_generate_adr",
  "description": "Generate a complete Architecture Decision Record in MADR format",
  "inputSchema": {
    "type": "object",
    "required": ["title", "context", "decision"],
    "properties": {
      "title":        { "type": "string", "description": "Short decision title" },
      "context":      { "type": "string", "description": "Problem statement and constraints" },
      "decision":     { "type": "string", "description": "The chosen approach" },
      "alternatives": { "type": "array",  "items": { "type": "string" }, "description": "Other options considered" },
      "format":       { "type": "string", "enum": ["madr", "nygard"], "default": "madr" },
      "adr_number":   { "type": "integer", "description": "ADR sequence number" }
    }
  }
}

Architecture Patterns Reference

Core patterns the AI knows in detail — problem, solution, and key trade-offs:

PatternProblemSolutionTrade-offs
Event SourcingNeed audit trail and temporal queriesStore events as source of truth; derive state via projection+ Auditability, replay. - Query complexity, event schema evolution
CQRSRead and write models have different shape/scale needsSeparate command (write) and query (read) models and data stores+ Independent scaling. - Eventual consistency, sync overhead
SagaDistributed transactions across servicesChoreography or orchestration of compensating transactions+ No 2PC lock. - Complex failure handling, difficult to debug
OutboxAtomic DB write + event publish (dual-write problem)Write event to outbox table in same transaction; relay polls and publishes+ At-least-once delivery. - Polling latency, relay is a new component
Circuit BreakerCascading failures from unhealthy downstream dependencyTrack failure rate; open circuit to fail fast instead of queuing+ Resilience, fast failure. - False positives, state management
API GatewayClients need a single entry point to multiple servicesGateway handles routing, auth, rate limiting, and protocol translation+ Centralised concerns. - Single point of failure, latency hop
BFFMobile and web need different API shapes from same backendBackend for Frontend aggregates and shapes data per client type+ Client-optimised responses. - Duplicate logic risk if not abstracted
SidecarCross-cutting concerns (logging, mTLS, tracing) pollute service codeDeploy proxy container alongside service; handles cross-cutting concerns+ Language-agnostic. - Resource overhead, network hop within pod
Strangler FigRewriting a legacy system without big-bang cutoverProxy intercepts traffic; gradually route paths to new system+ Low risk migration. - Long-running parallel systems, proxy complexity
BulkheadOne slow consumer exhausts all thread pool / connection resourcesIsolate resource pools per consumer category to contain failure+ Blast radius containment. - Resource underutilisation if pools idle
Retry with BackoffTransient failures cause permanent errors without retry logicExponential backoff with jitter; retry budget to prevent thundering herd+ Resilience to transient faults. - Amplified load if misconfigured
Rate LimiterUncontrolled request volume degrades or crashes the serviceToken bucket or sliding window algorithm enforces request rate per client+ Abuse protection, fair use. - Legitimate traffic throttled under burst

Common Gotchas

See Also