AI Agent Communication Protocols

production
requires: tech/architecture
improves: tech

Multi-agent systems move AI from isolated question-answering to distributed work: one agent plans, another retrieves data, another executes code, another verifies output. Getting agents to communicate reliably requires agreed-upon protocols — the same reason HTTP exists for browsers and servers.

As of 2025, three open protocols have emerged as the dominant standards:

| Protocol | Originator | Scope | Status |
| --- | --- | --- | --- |
| MCP (Model Context Protocol) | Anthropic | Model ↔ tools/data | Widely adopted; de facto standard |
| A2A (Agent-to-Agent) | Google | Agent ↔ Agent | Open spec; growing adoption |
| ACP (Agent Communication Protocol) | Linux Foundation / IBM | Agent ↔ Agent (REST) | Early adoption; enterprise focus |

These protocols are complementary, not competing. MCP handles how an agent accesses tools and data. A2A and ACP handle how agents talk to each other.


MCP — Model Context Protocol

What It Is

MCP is an open standard (MIT licence) for connecting AI models to external context — tools, data sources, and prompts. Think of it as USB-C for AI: one protocol that any model can use to plug into any tool.

Published by Anthropic in November 2024. Adopted by Claude, Cursor, Windsurf, Cline, Sourcegraph, and Zed, with dozens of third-party servers available.

Architecture

┌─────────────────────────────────┐
│         MCP HOST                │
│  (Claude Desktop, Cursor, etc.) │
│                                 │
│  ┌──────────────────────────┐   │
│  │      MCP CLIENT          │   │
│  │  (manages connections)   │   │
│  └──────┬───────────────────┘   │
└─────────│───────────────────────┘
          │  JSON-RPC 2.0
          │  (stdio / SSE / HTTP)
          ▼
┌─────────────────────┐
│     MCP SERVER      │
│  (your tool/data)   │
│                     │
│  ├── Resources      │  ← files, DB rows, API responses
│  ├── Tools          │  ← functions the model can call
│  ├── Prompts        │  ← reusable prompt templates
│  └── Sampling       │  ← server asks model to generate text
└─────────────────────┘

The Four MCP Primitives

Resources — Expose data the model can read:

{
  "uri": "file:///project/src/agents.ts",
  "mimeType": "text/typescript",
  "text": "export const AGENTS = [...]"
}

Tools — Functions the model can invoke:

{
  "name": "query_database",
  "description": "Run a SQL query against the data warehouse",
  "inputSchema": {
    "type": "object",
    "properties": {
      "sql": { "type": "string" },
      "database": { "type": "string", "enum": ["production", "analytics"] }
    },
    "required": ["sql"]
  }
}
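
Because MCP messages travel as JSON-RPC 2.0, invoking the tool above amounts to sending a `tools/call` request whose `params` carry the tool name and its arguments. The sketch below shows one way a client might assemble that request. `buildToolCall` and `ToolInputSchema` are hypothetical names, and the check is deliberately shallow (required keys and `enum` values only), not full JSON Schema validation.

```typescript
// Subset of JSON Schema that is sufficient for the query_database example.
interface ToolInputSchema {
  type: 'object';
  properties: Record<string, { type: string; enum?: string[] }>;
  required?: string[];
}

interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: checks required keys and enum membership, then wraps
// the call in a JSON-RPC 2.0 envelope.
function buildToolCall(
  id: number,
  name: string,
  schema: ToolInputSchema,
  args: Record<string, unknown>
): ToolCallRequest {
  for (const key of schema.required ?? []) {
    if (!(key in args)) throw new Error(`Missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const allowed = schema.properties[key]?.enum;
    if (allowed && !allowed.includes(String(value))) {
      throw new Error(`Invalid value for ${key}: ${String(value)}`);
    }
  }
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Schema copied from the tool definition above.
const querySchema: ToolInputSchema = {
  type: 'object',
  properties: {
    sql: { type: 'string' },
    database: { type: 'string', enum: ['production', 'analytics'] },
  },
  required: ['sql'],
};

const toolCall = buildToolCall(1, 'query_database', querySchema, {
  sql: 'SELECT 1',
  database: 'analytics',
});
```

Validating on the client side is optional; a server should still validate arguments itself, since it cannot assume every client does.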

Prompts — Reusable prompt templates with arguments:

{
  "name": "code_review",
  "arguments": [
    { "name": "language", "required": true },
    { "name": "focus", "required": false }
  ]
}

Sampling — Server requests a model completion (enables recursive agent patterns):

{
  "method": "sampling/createMessage",
  "params": {
    "messages": [...],
    "maxTokens": 1024
  }
}

Transport Options

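| Transport | Use Case | Notes |
| --- | --- | --- |
| stdio | Local processes; CLI tools | Most common for local MCP servers |
| HTTP + SSE | Remote servers; cloud-hosted tools | Use for production deployments; the 2025-03-26 spec revision replaces it with Streamable HTTP |
| WebSocket | Bidirectional real-time | Not a standard transport in the MCP spec; possible as a custom transport where lower overhead than SSE matters for high-frequency calls |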
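
On the stdio transport, each JSON-RPC message is written as a single line of JSON terminated by a newline. The sketch below shows how a reader might reassemble messages from arbitrary chunks. `LineDecoder` and `encodeMessage` are hypothetical names rather than part of any SDK, and error handling for malformed JSON is left out.

```typescript
// Buffers incoming text chunks and returns one parsed message per complete line.
class LineDecoder {
  private buffer = '';

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is an incomplete line (or ''); keep it for the next chunk.
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as unknown);
  }
}

// Serialise one message as a single line. JSON.stringify without indentation
// never emits raw newlines, so the framing stays intact.
function encodeMessage(message: object): string {
  return JSON.stringify(message) + '\n';
}

// Usage: a message split across two chunks is only emitted once it is complete.
const decoder = new LineDecoder();
const wire = encodeMessage({ jsonrpc: '2.0', id: 1, method: 'ping' });
const firstBatch = decoder.push(wire.slice(0, 10)); // partial line, nothing emitted yet
const secondBatch = decoder.push(wire.slice(10));   // line completed
```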

Building an MCP Server (TypeScript)

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ReadResourceRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

// Application-specific loader for skill content, assumed to be defined elsewhere
declare function fetchSkill(path: string): Promise<string>;

const server = new Server(
  { name: 'skills-server', version: '1.0.0' },
  { capabilities: { tools: {}, resources: {} } }
);

// Handle tools/call requests
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'resolve_skill') {
    const { path } = (request.params.arguments ?? {}) as { path: string };
    const content = await fetchSkill(path);
    return { content: [{ type: 'text', text: content }] };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});

// Handle resources/read requests for skill:// URIs
server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
  const skill = await fetchSkill(request.params.uri.replace('skill://', ''));
  return { contents: [{ uri: request.params.uri, text: skill }] };
});

// Serve over stdio
const transport = new StdioServerTransport();
await server.connect(transport);

MCP Security Model


A2A — Agent-to-Agent Protocol

What It Is

A2A (published by Google, April 2025, open spec at google.github.io/A2A) defines how AI agents discover and communicate with other AI agents. Where MCP connects agents to tools, A2A connects agents to agents.

Core Concepts

Agent Card — A machine-readable capability manifest. Every A2A agent publishes one at /.well-known/agent.json:

{
  "name": "Grant — Financial Agent",
  "description": "Cash flow, P&L, runway, and South African tax compliance.",
  "url": "https://agents.2nth.ai/agents/grant",
  "version": "1.0.0",
  "capabilities": {
    "streaming": true,
    "pushNotifications": false
  },
  "skills": [
    {
      "id": "fin/cash-flow",
      "name": "Cash Flow Management",
      "description": "13-week rolling forecast, working capital, runway planning",
      "examples": ["What is my current runway?", "Build a 13-week cash flow forecast"]
    }
  ],
  "authentication": {
    "schemes": ["bearer"]
  }
}

Task — The unit of work in A2A:

{
  "id": "task-uuid-1234",
  "sessionId": "session-uuid-5678",
  "status": {
    "state": "working",
    "timestamp": "2025-04-01T10:00:00Z"
  },
  "message": {
    "role": "user",
    "parts": [{ "text": "Analyse our Q1 cash position and flag any risks." }]
  }
}

Task lifecycle:

submitted → working → [input-required] → working → completed
                                                  ↘ failed
                                                  ↘ canceled
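
The lifecycle above can be enforced as a small state machine so that a task never moves backwards out of a terminal state. This is a sketch under assumptions: the transition table is read off the diagram only (for instance, it does not allow cancelling a task that is still `submitted`), state names follow the A2A spelling `canceled`, and `canTransition` and `advance` are hypothetical helpers.

```typescript
type TaskState =
  | 'submitted'
  | 'working'
  | 'input-required'
  | 'completed'
  | 'failed'
  | 'canceled';

// Allowed transitions, taken from the lifecycle diagram.
const TRANSITIONS: Record<TaskState, TaskState[]> = {
  submitted: ['working'],
  working: ['input-required', 'completed', 'failed', 'canceled'],
  'input-required': ['working'],
  completed: [],
  failed: [],
  canceled: [],
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Terminal states have no outgoing transitions.
function isTerminal(state: TaskState): boolean {
  return TRANSITIONS[state].length === 0;
}

// Returns the next state, or throws if the move is not allowed.
function advance(current: TaskState, next: TaskState): TaskState {
  if (!canTransition(current, next)) {
    throw new Error(`Illegal task transition: ${current} -> ${next}`);
  }
  return next;
}

// Usage: a task that pauses for user input, resumes, then completes.
let taskState: TaskState = 'submitted';
for (const next of ['working', 'input-required', 'working', 'completed'] as TaskState[]) {
  taskState = advance(taskState, next);
}
```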

A2A Request/Response Flow

Client Agent (Katharine — CRO)
    │
    │  POST to the agent's URL (JSON-RPC 2.0, method "tasks/send")
    │  { message: "What is our current runway?", sessionId: "..." }
    ▼
Server Agent (Grant — CFO)
    │
    ├── Returns: { status: "working" }  (if streaming: SSE begins)
    │
    ├── SSE events during processing:
    │   data: { type: "working", artifact: { parts: [{ text: "Analysing..." }] } }
    │
    └── Final: { status: "completed", artifact: { parts: [{ text: "Runway: 8.3 months..." }] } }
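
On the client side, the streamed part of this flow arrives as Server-Sent Events: blocks of `data:` lines separated by a blank line. The sketch below parses a fully received stream into events and picks out the final answer. The `StreamEvent` shape mirrors the simplified events in the diagram, not the exact A2A wire format, and `parseSseData` and `finalText` are hypothetical names.

```typescript
// Simplified event shape, mirroring the diagram above.
interface StreamEvent {
  type: string;
  artifact?: { parts: { text: string }[] };
}

// Parse complete SSE text into JSON events. Assumes every event carries
// a JSON payload in its data: line(s) and that the text is not cut mid-event.
function parseSseData(raw: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const dataLines = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, '')); // drop "data:" and one optional space
    if (dataLines.length > 0) {
      events.push(JSON.parse(dataLines.join('\n')) as StreamEvent);
    }
  }
  return events;
}

// Text of the completed event, or undefined if the task has not finished.
function finalText(events: StreamEvent[]): string | undefined {
  const done = events.find((event) => event.type === 'completed');
  return done?.artifact?.parts.map((part) => part.text).join('');
}

const rawStream =
  'data: {"type":"working","artifact":{"parts":[{"text":"Analysing..."}]}}\n\n' +
  'data: {"type":"completed","artifact":{"parts":[{"text":"Runway: 8.3 months..."}]}}\n\n';

const streamEvents = parseSseData(rawStream);
const answer = finalText(streamEvents);
```

A production client would parse incrementally as chunks arrive; parsing the whole text at once keeps the sketch short.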

Agent Discovery

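Agents discover each other in one of two ways:

  1. Static: agent URLs are hard-coded in an agent registry
  2. Dynamic: the client searches an Agent Directory (a catalogue of Agent Cards)

// Fetch an agent's capabilities before routing
const requiredSkillId = 'fin/cash-flow'; // example skill ID from the Agent Card above
const response = await fetch('https://agents.2nth.ai/.well-known/agent.json');
if (!response.ok) throw new Error(`Agent Card fetch failed: ${response.status}`);
const agentCard = await response.json();
const canHandleTask = agentCard.skills.some((s: { id: string }) => s.id === requiredSkillId);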
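
For dynamic discovery, the same skill check can run across a whole directory of Agent Cards. The sketch below assumes the directory has already been loaded into memory. `AgentCardSummary` covers only the fields used here, `findAgentsForSkill` is a hypothetical helper, and the second card and its `leg/contracts` skill are invented for illustration.

```typescript
// Only the Agent Card fields this lookup needs.
interface AgentCardSummary {
  name: string;
  url: string;
  skills: { id: string; name: string }[];
}

// Return every agent in the directory that advertises the given skill ID.
function findAgentsForSkill(directory: AgentCardSummary[], skillId: string): AgentCardSummary[] {
  return directory.filter((card) => card.skills.some((skill) => skill.id === skillId));
}

const directory: AgentCardSummary[] = [
  {
    name: 'Grant',
    url: 'https://agents.2nth.ai/agents/grant',
    skills: [{ id: 'fin/cash-flow', name: 'Cash Flow Management' }],
  },
  {
    // Illustrative entry; this skill ID does not appear in the source material.
    name: 'Leo',
    url: 'https://agents.2nth.ai/agents/leo',
    skills: [{ id: 'leg/contracts', name: 'Contract Review' }],
  },
];

const cashFlowAgents = findAgentsForSkill(directory, 'fin/cash-flow');
```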

ACP — Agent Communication Protocol

What It Is

ACP (Linux Foundation AI & Data, 2025) is a REST-based protocol for agent-to-agent communication in enterprise environments. It is more prescriptive than A2A about message structure, error handling, and observability.

Key Differences from A2A

| Aspect | A2A | ACP |
| --- | --- | --- |
| Message format | JSON with parts array | Structured Message with typed attachments |
| Discovery | Agent Cards at /.well-known/agent.json | Agent Registry service |
| Auth | OAuth 2.1 | OAuth 2.1 + API keys |
| Streaming | SSE | SSE + WebSocket |
| Focus | General agent interop | Enterprise, regulated industries |

ACP Message Structure

{
  "message_id": "msg-uuid",
  "created_at": "2025-04-01T10:00:00Z",
  "sender": {
    "agent_id": "agent-katharine",
    "run_id": "run-uuid"
  },
  "content": [
    { "type": "text/plain", "data": "What is our Q1 pipeline conversion rate?" }
  ],
  "metadata": {
    "session_id": "session-uuid",
    "correlation_id": "corr-uuid"
  }
}

Agent Orchestration Topologies

Protocols define how agents communicate. Topology defines how they are arranged.

1. Sequential Chain

Each agent processes output from the prior agent. Simple, predictable.

User → Agent A → Agent B → Agent C → Response

Use when: Tasks have a clear linear dependency. Example: Research → Summarise → Format.

Risk: A failure at any step blocks the whole chain.

2. Supervisor / Orchestrator

One agent decomposes a task and routes subtasks to specialist agents, then collects and synthesises their results.

         ┌─── Specialist A (data retrieval)
User → Orchestrator ─── Specialist B (analysis)     → Synthesised Response
         └─── Specialist C (formatting)

Use when: Tasks require multiple independent capabilities. Example: 2nth.ai agents — Leo routes legal questions, Grant handles financials, Katharine handles revenue.

Implementation: Orchestrator holds conversation context; specialists are stateless. Orchestrator decides routing based on intent classification.
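
A minimal sketch of that routing step follows. It assumes a keyword-based classifier standing in for real intent classification (which would more likely use a model), stub specialists that return canned text, and synthesis by simple concatenation. `classifyIntents` and `orchestrate` are hypothetical names.

```typescript
// A specialist is stateless: it receives the task text and returns a result.
type Specialist = (task: string) => Promise<string>;

// Hypothetical classifier: an intent matches if any of its keywords appears.
function classifyIntents(task: string, keywords: Record<string, string[]>): string[] {
  const lower = task.toLowerCase();
  return Object.entries(keywords)
    .filter(([, words]) => words.some((word) => lower.includes(word)))
    .map(([intent]) => intent);
}

// Route the task to every matching specialist in parallel, then combine results.
async function orchestrate(
  task: string,
  specialists: Record<string, Specialist>,
  keywords: Record<string, string[]>
): Promise<string> {
  const intents = classifyIntents(task, keywords).filter((intent) => intent in specialists);
  if (intents.length === 0) throw new Error('No specialist matched the task');
  const results = await Promise.all(
    intents.map(async (intent) => `[${intent}] ${await specialists[intent](task)}`)
  );
  return results.join('\n');
}

// Stub specialists for illustration only.
const intentKeywords = { fin: ['runway', 'cash'], leg: ['contract'] };
const stubSpecialists: Record<string, Specialist> = {
  fin: async () => 'Runway: 8.3 months',
  leg: async () => 'No contract risks found',
};

const matchedIntents = classifyIntents('What is our runway and contract exposure?', intentKeywords);
```

Running specialists with `Promise.all` only fits subtasks that are independent of each other; dependent steps belong in a sequential chain instead.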

3. Peer-to-Peer with Handoff

Agents handle tasks directly but can hand off to peers when out of domain. Conversation history travels with the handoff.

User → Agent A
       Agent A detects out-of-domain
       Agent A → [handoff with context] → Agent B
                                          Agent B continues

This is what agents.2nth.ai implements via the __handoff__ JSON signal in responses.

4. Hierarchical Multi-Tier

Director agents manage specialist agents; specialist agents may manage worker agents. Mirrors human org structure.

Director Agent (strategic decomposition)
    ├── Domain Agent A (tactical planning)
    │       ├── Worker A1 (tool execution)
    │       └── Worker A2 (tool execution)
    └── Domain Agent B (tactical planning)
            └── Worker B1 (tool execution)

Use when: Tasks are large enough to require independent sub-teams. Example: a software project where an architect agent delegates to frontend, backend, and data agents.

5. Critic / Debate Pattern

Two agents produce competing outputs; a judge (critic) agent evaluates them and either selects one or synthesises a final answer.

User → Generator A → Critic
       Generator B ↗       → Final Response

Use when: Output quality and accuracy are paramount. Example: legal contract review where two analysis agents identify issues and a synthesis agent produces the final report.


Practical Implementation Patterns

Structured Handoff Payload

When an agent hands off to another, include full context:

interface Handoff {
  to_agent: string;          // target agent identifier
  reason: string;            // why the handoff is happening
  conversation: Message[];   // full conversation history
  context: {                 // structured context to prime the new agent
    original_intent: string;
    completed_steps: string[];
    pending_task: string;
    relevant_artifacts?: Record<string, unknown>;
  };
}
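
One way the receiving agent might use this payload is to turn the structured context into a short briefing that precedes the conversation history. The sketch below is an assumption about usage rather than the 2nth.ai implementation: `Message` is given a minimal hypothetical shape, `renderHandoffBriefing` is a made-up helper, and the sample values are invented.

```typescript
// Hypothetical minimal message shape; the original leaves Message undefined.
interface Message {
  role: 'user' | 'agent';
  text: string;
}

interface Handoff {
  to_agent: string;
  reason: string;
  conversation: Message[];
  context: {
    original_intent: string;
    completed_steps: string[];
    pending_task: string;
    relevant_artifacts?: Record<string, unknown>;
  };
}

// Build a plain-text briefing the receiving agent can read before the history.
function renderHandoffBriefing(handoff: Handoff): string {
  const { context } = handoff;
  const done = context.completed_steps.length > 0 ? context.completed_steps.join('; ') : 'none';
  return [
    `Handoff to ${handoff.to_agent}. Reason: ${handoff.reason}`,
    `Original intent: ${context.original_intent}`,
    `Completed steps: ${done}`,
    `Pending task: ${context.pending_task}`,
    `Prior messages attached: ${handoff.conversation.length}`,
  ].join('\n');
}

const sampleHandoff: Handoff = {
  to_agent: 'grant',
  reason: 'Question is about runway, which is outside the sales domain',
  conversation: [{ role: 'user', text: 'What is our current runway?' }],
  context: {
    original_intent: 'Understand the current cash runway',
    completed_steps: [],
    pending_task: 'Calculate runway from the latest cash flow forecast',
  },
};

const briefing = renderHandoffBriefing(sampleHandoff);
```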

Agent Registry

A registry maps capability requirements to agent endpoints:

import { minimatch } from 'minimatch';

interface AgentEndpoint {
  url: string;
  card: string;
}

// Glob patterns over intent paths, mapped to the agent that serves them
const AGENT_REGISTRY: Record<string, AgentEndpoint> = {
  'fin/*':    { url: 'https://agents.2nth.ai/agents/grant',    card: '...' },
  'leg/*':    { url: 'https://agents.2nth.ai/agents/leo',      card: '...' },
  'mkt/*':    { url: 'https://agents.2nth.ai/agents/penny',    card: '...' },
  'data/*':   { url: 'https://agents.2nth.ai/agents/alex',     card: '...' },
  'biz/sales': { url: 'https://agents.2nth.ai/agents/katharine', card: '...' },
};

// Return the first registered agent whose pattern matches the intent path
async function routeTask(task: string, intentPath: string): Promise<AgentEndpoint> {
  const match = Object.entries(AGENT_REGISTRY).find(([pattern]) =>
    minimatch(intentPath, pattern)
  );
  if (!match) throw new Error(`No agent registered for: ${intentPath}`);
  return match[1];
}

Idempotent Task IDs

In distributed agent chains, the same task may be submitted more than once (network retries, orchestrator restarts). Design task handling to be idempotent:

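// Derive a deterministic task ID from session + content, so retries map to the same task
async function deriveTaskId(sessionId: string, taskContent: string): Promise<string> {
  const hash = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(`${sessionId}:${taskContent}`)
  );
  return btoa(String.fromCharCode(...new Uint8Array(hash)));
}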
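
A deterministic ID only helps if the receiving side deduplicates on it. The sketch below shows one possible approach: an in-memory runner that executes each task ID at most once and hands repeated submissions the same pending or completed result. `IdempotentTaskRunner` is a hypothetical name, and a real deployment would likely need a shared store and an expiry policy instead of a process-local `Map`.

```typescript
class IdempotentTaskRunner<T> {
  private readonly results = new Map<string, Promise<T>>();

  // Run the task once per ID; duplicates receive the original promise.
  run(taskId: string, execute: () => Promise<T>): Promise<T> {
    const existing = this.results.get(taskId);
    if (existing) return existing;

    const started = execute();
    this.results.set(taskId, started);
    // Forget failed attempts so a later retry is allowed to execute again.
    started.catch(() => this.results.delete(taskId));
    return started;
  }
}

// Usage: the second submission with the same ID does not execute the work again.
let executions = 0;
const runner = new IdempotentTaskRunner<string>();
const work = async (): Promise<string> => {
  executions += 1;
  return 'done';
};
const firstRun = runner.run('task-1', work);
const duplicateRun = runner.run('task-1', work);
```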

Streaming in Agent Chains

Prefer streaming at every hop so the user sees progress as it happens, not just a final result after all agents finish:

User's browser ←SSE── Orchestrator ←SSE── Specialist Agent
                           ↑
                       Re-streams specialist events
                       with agent attribution metadata

Each streamed event should include agent_id so the UI can attribute partial outputs to the correct agent.
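
A sketch of the re-streaming step, assuming specialist events arrive as an async iterable and the orchestrator writes SSE frames to the browser. The event fields other than `agent_id` are hypothetical, as are the names `toSseFrame` and `restream`.

```typescript
// Event as emitted by a specialist, before attribution.
interface SpecialistEvent {
  type: string;
  text: string;
}

// Event as forwarded to the UI, tagged with its source agent.
interface AttributedEvent extends SpecialistEvent {
  agent_id: string;
}

// Serialise one event as a Server-Sent Events frame.
function toSseFrame(event: AttributedEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Orchestrator side: tag each upstream event and forward it immediately.
async function* restream(
  agentId: string,
  upstream: AsyncIterable<SpecialistEvent>
): AsyncGenerator<string> {
  for await (const event of upstream) {
    yield toSseFrame({ agent_id: agentId, ...event });
  }
}

const sampleFrame = toSseFrame({ agent_id: 'grant', type: 'working', text: 'Analysing...' });
```

Forwarding each event as it arrives, rather than buffering the specialist's full response, is what keeps progress visible to the user across hops.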

Error Propagation

Define what happens when an agent in a chain fails:

enum AgentFailurePolicy {
  FAIL_FAST = 'fail_fast',      // Abort the whole chain
  SKIP = 'skip',                // Continue with remaining agents
  RETRY = 'retry',              // Retry up to N times with backoff
  FALLBACK = 'fallback',        // Route to a backup agent
  HUMAN_ESCALATION = 'escalate' // Pause chain; notify human
}

For production agent chains: use RETRY (max 3, exponential backoff) for transient failures, FALLBACK for agent unavailability, HUMAN_ESCALATION for ambiguous inputs that no agent can handle confidently.
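
The RETRY and FALLBACK policies can be combined in one wrapper, sketched below. It assumes every error is transient (a real chain would classify errors first), uses a base delay of 200 ms as an arbitrary illustrative value, and omits jitter. `backoffDelayMs` and `callWithPolicy` are hypothetical names.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff: attempt 1 waits baseMs, attempt 2 waits 2x, attempt 3 waits 4x.
function backoffDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** (attempt - 1);
}

// Try the primary agent up to maxAttempts times, then fall back if one is given.
async function callWithPolicy<T>(
  primary: () => Promise<T>,
  fallback?: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await primary();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  if (fallback) return fallback();
  throw lastError;
}
```

Usage might look like `callWithPolicy(() => callGrant(task), () => callBackupAgent(task))`, where both callables are placeholders for real agent calls.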


Observability for Agent Chains

Distributed agents fail in distributed ways. Observability is non-negotiable.

Trace Context

Every task must carry a trace ID through the entire chain (W3C Trace Context format):

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
              │  └── trace-id (16 bytes)           └── span-id   │
              │                                                   └── flags
              └── version

Inject this header at the chain entry point; propagate to every downstream agent call.
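
Propagation means each agent parses the incoming header, keeps the trace ID, and sends downstream calls with a fresh span ID. A minimal sketch follows, with hypothetical helper names and a non-cryptographic random span ID that is adequate for illustration only.

```typescript
interface TraceContext {
  version: string;
  traceId: string; // 32 hex characters (16 bytes)
  spanId: string;  // 16 hex characters (8 bytes)
  flags: string;
}

const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a traceparent header; returns null if it is malformed.
function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // All-zero IDs are invalid in W3C Trace Context.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, flags };
}

function formatTraceparent(ctx: TraceContext): string {
  return `${ctx.version}-${ctx.traceId}-${ctx.spanId}-${ctx.flags}`;
}

// Illustrative span ID generator (Math.random, not cryptographically secure).
function randomSpanId(): string {
  let id = '';
  for (let i = 0; i < 16; i++) id += Math.floor(Math.random() * 16).toString(16);
  return id;
}

// Context for a downstream call: same trace, new span.
function childContext(parent: TraceContext, spanId: string = randomSpanId()): TraceContext {
  return { ...parent, spanId };
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
const outgoingHeader = incoming ? formatTraceparent(childContext(incoming)) : null;
```

The incoming span ID becomes the `parent_span_id` in the downstream agent's log record, which is what lets a tracing backend rebuild the call tree.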

What to Log per Agent Invocation

{
  trace_id: string,
  span_id: string,
  parent_span_id: string,
  agent_id: string,
  task_id: string,
  input_tokens: number,
  output_tokens: number,
  model: string,
  latency_ms: number,
  status: 'success' | 'failure' | 'timeout',
  error?: string,
  tool_calls?: { name: string; latency_ms: number }[],
  timestamp: string
}

Metrics to Track

| Metric | Alert Threshold |
| --- | --- |
| End-to-end chain latency | P95 > 30s |
| Per-agent success rate | < 98% |
| Token consumption rate | > budget per task type |
| Handoff rate by agent | Sudden spike = routing bug |
| Task abandonment rate | > 5% = UX or capability gap |
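
The numeric thresholds can be checked directly against collected samples. The sketch below covers latency, success rate, and abandonment only. It assumes latencies are recorded in milliseconds, rates are fractions between 0 and 1, and P95 is computed with the nearest-rank method, while `ChainMetrics`, `percentile`, and `evaluateAlerts` are hypothetical names.

```typescript
interface ChainMetrics {
  latenciesMs: number[];   // end-to-end chain latencies, in milliseconds
  successRate: number;     // per-agent success rate, 0..1
  abandonmentRate: number; // abandoned tasks divided by started tasks, 0..1
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('No samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Return a human-readable alert for every threshold that is breached.
function evaluateAlerts(metrics: ChainMetrics): string[] {
  const alerts: string[] = [];
  if (percentile(metrics.latenciesMs, 95) > 30000) alerts.push('chain latency P95 above 30s');
  if (metrics.successRate < 0.98) alerts.push('agent success rate below 98%');
  if (metrics.abandonmentRate > 0.05) alerts.push('task abandonment above 5%');
  return alerts;
}

// Twenty samples from 1s to 20s: nearest-rank P95 is the 19th sample.
const sampleLatencies = Array.from({ length: 20 }, (_, i) => (i + 1) * 1000);
const sampleP95 = percentile(sampleLatencies, 95);
const sampleAlerts = evaluateAlerts({
  latenciesMs: sampleLatencies,
  successRate: 0.97,
  abandonmentRate: 0.01,
});
```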

Protocol Selection Guide

| Scenario | Recommended Protocol |
| --- | --- |
| Agent needs to call external tools (APIs, DBs, files) | MCP |
| Agent needs to delegate to another agent | A2A |
| Enterprise regulated environment; audit trail required | ACP |
| Internal agents in the same codebase | Direct function calls + typed interfaces |
| Agent chain with streaming to end-user | A2A over SSE |
| Discovery of unknown agents at runtime | A2A Agent Cards |
| Custom protocol within a single platform | Handoff JSON + SSE (current 2nth.ai pattern) |