When we launched Data Legion earlier this year, the API was the only way to access our person and company data. A few months later, there are four: a REST API, SDKs in Python and Node.js, an MCP server, and now a CLI. Each one exists because a different class of consumer needed a different interface to the same data.
This post covers why we built each layer, what we learned along the way, and why we think the CLI is the best interface for AI agents working with data today.
Layer 1: The API
The REST API came first. A POST to /person/enrich with an email address returns a full professional profile. A POST to /company/enrich with a domain returns firmographics. SQL-based search endpoints let you query 190M+ person records and 71M+ company records. Utility endpoints clean, validate, and hash data fields for free.
```bash
curl -X POST "https://api.datalegion.ai/person/enrich" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"email": "jane.doe@techcompany.com"}'
```

The API is the foundation everything else builds on. It handles authentication, rate limiting, and credit tracking, and it returns structured JSON with per-field confidence scores. Every other layer is a different interface to the same endpoints.
For teams building real-time enrichment pipelines or integrating enrichment into automation platforms like Clay, the API is the right choice. It gives you full control over request construction, error handling, and response parsing.
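If you are calling the API without an SDK, the request is plain HTTP. Here is a minimal sketch using Python's standard library that mirrors the curl example above; the request is constructed but not sent, so it needs no key or network access:

```python
import json
import urllib.request

def build_enrich_request(email: str, api_key: str) -> urllib.request.Request:
    """Build a POST to /person/enrich, mirroring the curl example."""
    payload = json.dumps({"email": email}).encode("utf-8")
    return urllib.request.Request(
        "https://api.datalegion.ai/person/enrich",
        data=payload,
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )

req = build_enrich_request("jane.doe@techcompany.com", "YOUR_API_KEY")
# urllib.request.urlopen(req) would send it; here we only inspect the request.
print(req.method, req.full_url)
```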
But not every consumer wants to write HTTP requests from scratch.
Layer 2: SDKs
The Python and Node.js SDKs wrap the API with typed clients, structured response models, and proper error handling.
```python
from datalegion import DataLegion

client = DataLegion(api_key="legion_...")

person = client.person.enrich(email="jane.doe@techcompany.com")
print(person.full_name)
print(person.job_title)
print(person.company_name)
```

The Python SDK uses Pydantic v2 for response models and supports both sync and async clients. The Node.js SDK is written in TypeScript with zero external dependencies. Both expose typed error classes (AuthenticationError, RateLimitError, InsufficientCreditsError) so you can handle failures precisely.
SDKs matter when enrichment is embedded in application code: a webhook handler that enriches leads at intake, a scoring model that pulls person attributes, a pipeline that backfills company firmographics. The SDK removes the boilerplate of constructing HTTP requests and parsing JSON responses, and your IDE's autocomplete knows every field and method.
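Those typed error classes make targeted retry logic straightforward. Here is a sketch of backing off on rate limits; the exception class below is a stand-in so the snippet is self-contained, whereas real code would import RateLimitError from the SDK:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's RateLimitError."""

def enrich_with_retry(enrich, email, max_attempts=3):
    """Call an enrich function, backing off exponentially on rate limits."""
    for attempt in range(max_attempts):
        try:
            return enrich(email)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts

# Fake enrich function that is rate-limited once, then succeeds.
calls = {"n": 0}
def flaky_enrich(email):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError
    return {"email": email, "full_name": "jane doe"}

result = enrich_with_retry(flaky_enrich, "jane@acme.com")
```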
The limitation is that SDKs are language-specific. A Python SDK doesn't help a TypeScript project. Neither helps an AI agent that operates outside your codebase entirely.
Layer 3: MCP Server
The Model Context Protocol emerged in late 2024 as a standard for connecting AI assistants to external tools. Anthropic published the spec, and within months Claude, ChatGPT, GitHub Copilot, Cursor, and VS Code all added support. The ecosystem grew to over 20,000 MCP servers by early 2026.
MCP works like this: an AI assistant connects to an MCP server, discovers what tools are available, and calls them as needed during a conversation. The server advertises its capabilities through structured tool definitions with JSON Schema inputs. The assistant picks the right tool based on the user's request.
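Concretely, a tool definition is a name, a description, and a JSON Schema for its inputs. A hypothetical sketch of what a person_enrich definition could look like, written as a Python dict in the shape MCP clients discover (the field names here are illustrative, not our server's actual schema):

```python
import json

# Illustrative MCP tool definition; the real server's schema may differ.
person_enrich_tool = {
    "name": "person_enrich",
    "description": "Enrich a person from an email address.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Work email to look up"},
        },
        "required": ["email"],
    },
}

print(json.dumps(person_enrich_tool, indent=2))
```

The assistant reads this schema, decides the tool matches the user's request, and fills in the arguments itself.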
We shipped a remote MCP server at https://api.datalegion.ai/mcp that exposes nine tools mapping directly to our API endpoints.
```json
{
  "mcpServers": {
    "datalegion": {
      "type": "http",
      "url": "https://api.datalegion.ai/mcp",
      "headers": {
        "API-Key": "YOUR_API_KEY"
      }
    }
  }
}
```

Once configured, you can ask Claude Desktop or ChatGPT to "look up the person with email jane@acme.com" and the assistant calls person_enrich automatically. You can say "find software engineers in Austin" and it calls person_discover with your natural language query. The assistant picks the tool, constructs the arguments, and presents the results. No code written.
MCP solves the distribution problem. Build one server, and it works in every MCP-compatible client. That is a meaningful advantage over building separate plugins for each AI platform.
But after shipping the MCP server and watching how AI agents actually use data tools in practice, we noticed something.
What We Observed
The AI coding agents that developers use most (Claude Code, Cursor, Windsurf, GitHub Copilot) all have one thing in common. They run shell commands. When an agent needs to look something up, check a status, or call an external service, it reaches for the terminal.
Claude Code has a Bash tool that can execute any shell command. Cursor runs commands in its integrated terminal. These agents think in terms of commands and their output. The terminal is where they live.
MCP works well for conversational AI assistants like Claude Desktop and ChatGPT, where a user asks a question and the assistant selects a tool. But for coding agents that are building, debugging, and automating, the shell is the natural interface. A CLI binary in the PATH that takes flags and returns JSON is immediately accessible to every one of these agents with zero configuration.
No MCP server setup. No protocol negotiation. No configuration file pointing at a URL. Just a command.
That's why we built the CLI last, after seeing how agents actually work, and why we think it's the best interface for AI agents interacting with data.
Layer 4: The CLI
datalegion-cli is a fully async, agent-friendly command-line tool for person and company data. Install it, set an API key, and every person and company enrichment, search, and utility operation is available from the terminal.
Installation
```bash
brew install datalegion-ai/tap/datalegion-cli
```

Or via pip:

```bash
pip install datalegion-cli
```

Set your API key once
The key is encrypted at rest using Fernet with a machine-tied, PBKDF2-derived key, and the config file is chmod 600.

```bash
datalegion-cli config set api_key legion_your_key_here
```

Enrich a person
```bash
datalegion-cli person enrich --email jane.doe@techcompany.com
```

Returns the full person profile as JSON to stdout:

```json
{
  "matches": [
    {
      "person": {
        "full_name": "jane doe",
        "job_title": "senior product manager",
        "company_name": "tech company",
        "seniority_level": "senior",
        "company_size": "1001-5000",
        "work_email": "jane.doe@techcompany.com",
        "mobile_phone": "+15551234567"
      },
      "match_metadata": {
        "match_confidence": "high"
      }
    }
  ],
  "total": 1
}
```

Enrich a company
```bash
datalegion-cli company enrich --domain google.com
```

Search with SQL

```bash
datalegion-cli person search \
  "SELECT * FROM people WHERE job_title ILIKE '%engineer%' AND state = 'california'" \
  --limit 25
```

Clean, validate, and hash (free, no credits)
```bash
datalegion-cli utility clean \
  '{"email": " JANE.DOE+work@GMAIL.COM ", "phone": "(555) 123-4567"}'

datalegion-cli utility validate --email "test@example.com" --phone "+15551234567"

datalegion-cli utility hash jane@example.com
```

Pipe and compose
```bash
# Extract a specific field with jq
datalegion-cli person enrich --email jane@acme.com | jq '.matches[0].person.job_title'

# Feed JSON from stdin
echo '{"email": "jane@acme.com"}' | datalegion-cli person enrich --stdin

# Suppress all non-JSON output
datalegion-cli --quiet person enrich --email jane@acme.com
```

Why the CLI Works Best for Agents
A few design decisions make the CLI agent-native rather than just agent-compatible.
The most important one: all structured data goes to stdout, and everything else goes to stderr. Logs, progress indicators, credit usage summaries. All on stderr. An agent parsing the output never sees noise mixed into the JSON. Most CLIs get this wrong. They mix human-readable messages into their output, which breaks machine parsing.
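The convention is easy to state in code. Here is a minimal sketch of the discipline (not the CLI's actual implementation): diagnostics go through stderr, and stdout carries nothing but the JSON document:

```python
import json
import sys

def render(result: dict) -> str:
    """Serialize the one thing stdout is allowed to carry."""
    return json.dumps(result)

def log(message: str) -> None:
    """Diagnostics, progress, and credit summaries go to stderr only."""
    sys.stderr.write(message + "\n")

log("enriching 1 record (1 credit)")   # never pollutes stdout
payload = render({"matches": [], "total": 0})
sys.stdout.write(payload + "\n")       # an agent can parse this blindly
```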
Agents can also construct a JSON payload and pipe it directly into the CLI via --stdin. No flag-per-field overhead. No shell escaping issues with complex queries. Just echo '{"email": "..."}' | datalegion-cli person enrich --stdin. This matches how agents naturally compose commands.
The --quiet flag suppresses all log output entirely. Combined with the stdout/stderr separation, it means the only bytes on stdout are the JSON response. An agent calling datalegion-cli --quiet person enrich --email jane@acme.com gets parseable output with zero noise.
And the API key is encrypted at rest, tied to the machine identity via PBKDF2. Set it once with config set api_key and every agent on the machine can authenticate without passing secrets through command-line arguments.
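The derivation itself is standard. A sketch of deriving a machine-tied key with PBKDF2 from the Python standard library; the machine identifier, salt, and iteration count below are illustrative, not the CLI's actual parameters:

```python
import base64
import hashlib

def derive_key(machine_id: str, salt: bytes, iterations: int = 480_000) -> bytes:
    """Derive a 32-byte key from a machine identifier via PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", machine_id.encode(), salt, iterations)
    return base64.urlsafe_b64encode(raw)  # Fernet keys are urlsafe-base64 encoded

key = derive_key("example-machine-id", b"example-salt")
```

Because the key is derived from the machine identity rather than stored alongside the ciphertext, copying the config file to another machine yields nothing decryptable.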
None of these are unusual patterns individually. They're standard Unix conventions. But most developer tools don't follow them consistently, which is why most CLIs are frustrating for agents to use. The difference between "agent-compatible" and "agent-native" is the difference between "it technically works if you parse around the noise" and "it works cleanly out of the box."
How Agents Use It in Practice
A developer asks Claude Code to research a prospect. The agent calls the CLI directly:
```bash
datalegion-cli person enrich --email sarah.chen@acmecorp.com \
  --fields full_name,job_title,seniority_level,company_name,company_size
```

Need to enrich a list of leads? The agent writes a loop that calls the CLI for each record and parses the JSON output. No SDK installation needed. No import statements. No dependency management.
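A sketch of that loop pattern in Python, with the CLI invocation stubbed so the snippet runs without the binary installed; in practice `run` would be a subprocess call to `datalegion-cli --quiet person enrich`:

```python
import json

def enrich_batch(emails, run):
    """Enrich each email via `run`, which returns the CLI's JSON stdout."""
    results = {}
    for email in emails:
        data = json.loads(run(email))
        match = data["matches"][0] if data["matches"] else None
        results[email] = match["person"]["job_title"] if match else None
    return results

# Stub standing in for the real CLI call, e.g.:
#   subprocess.run(["datalegion-cli", "--quiet", "person", "enrich",
#                   "--email", email], capture_output=True, text=True).stdout
def fake_cli(email):
    return json.dumps({"matches": [{"person": {"job_title": "engineer"}}], "total": 1})

titles = enrich_batch(["jane@acme.com"], fake_cli)
```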
The debugging story is worth noting too. A developer can run the exact same command the agent ran. The human and the agent share the same interface. That means the developer can verify, reproduce, and trust the agent's data lookups.
The CLI also works for workflows that don't involve agents at all. A bash script that enriches new Stripe customers via webhook. A cron job that re-enriches stale CRM records. A quick lookup during a sales call. The same tool serves all of these because the design is clean enough for both humans and machines.
The Full Stack
Here's how the four layers map to different consumers:
| Layer | Best for | Why |
|---|---|---|
| API | Custom integrations, automation platforms, direct HTTP | Full control, works with any language, integrates with Clay/Zapier/Make |
| SDK | Application code in Python or Node.js | Type safety, structured responses, async support, error classes |
| MCP | Conversational AI assistants (Claude Desktop, ChatGPT, Copilot) | Standard protocol, tool discovery, works across all MCP clients |
| CLI | AI coding agents, shell scripts, quick lookups, automation | Zero config, stdout/stderr separation, pipes, universal compatibility |
None of these layers replaces another. A team might use the SDK in their webhook handler, the CLI in their deployment scripts, the MCP server in their Claude Desktop setup, and the API directly from their automation platform. They all hit the same endpoints, use the same API key, consume the same credits, and return the same data.
Which layer you use matters less than whether the data product you depend on meets you where you work. For a growing number of developers and teams, that's in an AI agent's terminal.
Get Started
The CLI is available now via Homebrew and PyPI.
```bash
# Install
brew install datalegion-ai/tap/datalegion-cli

# Authenticate
datalegion-cli config set api_key legion_your_key_here

# Look up a person
datalegion-cli person enrich --email jane@example.com

# Look up a company
datalegion-cli company enrich --domain stripe.com

# Check your credits
datalegion-cli credits balance
```