When we launched Data Legion earlier this year, the API was the only way to access our person and company data. A few months later, there are four: a REST API, SDKs in Python and Node.js, an MCP server, and now a CLI. Each one exists because a different class of consumer needed a different interface to the same data.
This post covers why we built each layer, what we learned along the way, and why we think the CLI is the best interface for AI agents working with data today.
Layer 1: The API
The REST API came first. A POST to /person/enrich with an email address returns a full professional profile. A POST to /company/enrich with a domain returns firmographics. SQL-based search endpoints let you query 190M+ person records and 71M+ company records. Utility endpoints clean, validate, and hash data fields for free.
```bash
curl -X POST "https://api.datalegion.ai/person/enrich" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_API_KEY" \
  -d '{"email": "jane.doe@techcompany.com"}'
```

The API is the foundation everything else builds on. It handles authentication, rate limiting, and credit tracking, and it returns structured JSON with per-field confidence scores. Every other layer is a different interface to the same endpoints.
For teams building real-time enrichment pipelines or integrating enrichment into automation platforms like Clay, the API is the right choice. It gives you full control over request construction, error handling, and response parsing.
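If you are calling the API without an SDK, the request is plain HTTP. Here is a minimal sketch using Python's standard library that mirrors the curl example above; the request is constructed but not sent, so it needs no key or network access:

```python
import json
import urllib.request

def build_enrich_request(email: str, api_key: str) -> urllib.request.Request:
    """Build a POST to /person/enrich, mirroring the curl example."""
    payload = json.dumps({"email": email}).encode("utf-8")
    return urllib.request.Request(
        "https://api.datalegion.ai/person/enrich",
        data=payload,
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )

req = build_enrich_request("jane.doe@techcompany.com", "YOUR_API_KEY")
# urllib.request.urlopen(req) would send it; here we only inspect the request.
print(req.method, req.full_url)
```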
But not every consumer wants to write HTTP requests from scratch.
Layer 2: SDKs
The Python and Node.js SDKs wrap the API with typed clients, structured response models, and proper error handling.
```python
from datalegion import DataLegion

client = DataLegion(api_key="legion_...")

person = client.person.enrich(email="jane.doe@techcompany.com")
print(person.full_name)
print(person.job_title)
print(person.company_name)
```

The Python SDK uses Pydantic v2 for response models and supports both sync and async clients. The Node.js SDK is written in TypeScript with zero external dependencies. Both expose typed error classes (AuthenticationError, RateLimitError, InsufficientCreditsError) so you can handle failures precisely.
SDKs matter when enrichment is embedded in application code: a webhook handler that enriches leads at intake, a scoring model that pulls person attributes, a pipeline that backfills company firmographics. The SDK removes the boilerplate of constructing HTTP requests and parsing JSON responses, and your IDE's autocomplete knows every field and method.
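Those typed error classes make targeted retry logic straightforward. Here is a sketch of backing off on rate limits; the exception class below is a stand-in so the snippet is self-contained, whereas real code would import RateLimitError from the SDK:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's RateLimitError."""

def enrich_with_retry(enrich, email, max_attempts=3):
    """Call an enrich function, backing off exponentially on rate limits."""
    for attempt in range(max_attempts):
        try:
            return enrich(email)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts

# Fake enrich function that is rate-limited once, then succeeds.
calls = {"n": 0}
def flaky_enrich(email):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError
    return {"email": email, "full_name": "jane doe"}

result = enrich_with_retry(flaky_enrich, "jane@acme.com")
```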
The limitation is that SDKs are language-specific. A Python SDK doesn't help a TypeScript project. Neither helps an AI agent that operates outside your codebase entirely.
Layer 3: MCP Server
The Model Context Protocol emerged in late 2024 as a standard for connecting AI assistants to external tools. Anthropic published the spec, and within months Claude, ChatGPT, GitHub Copilot, Cursor, and VS Code all added support. The ecosystem grew to over 20,000 MCP servers by early 2026.
MCP works like this: an AI assistant connects to an MCP server, discovers what tools are available, and calls them as needed during a conversation. The server advertises its capabilities through structured tool definitions with JSON Schema inputs. The assistant picks the right tool based on the user's request.
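Concretely, a tool definition is a name, a description, and a JSON Schema for its inputs. A hypothetical sketch of what a person_enrich definition could look like, written as a Python dict in the shape MCP clients discover (the field names here are illustrative, not our server's actual schema):

```python
import json

# Illustrative MCP tool definition; the real server's schema may differ.
person_enrich_tool = {
    "name": "person_enrich",
    "description": "Enrich a person from an email address.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Work email to look up"},
        },
        "required": ["email"],
    },
}

print(json.dumps(person_enrich_tool, indent=2))
```

The assistant reads this schema, decides the tool matches the user's request, and fills in the arguments itself.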
We shipped a remote MCP server at https://api.datalegion.ai/mcp that exposes nine tools mapping directly to our API endpoints.
```json
{
  "mcpServers": {
    "datalegion": {
      "type": "http",
      "url": "https://api.datalegion.ai/mcp",
      "headers": {
        "API-Key": "YOUR_API_KEY"
      }
    }
  }
}
```

Once configured, you can ask Claude Desktop or ChatGPT to "look up the person with email jane@acme.com" and the assistant calls person_enrich automatically. You can say "find software engineers in Austin" and it calls person_discover with your natural language query. The assistant picks the tool, constructs the arguments, and presents the results. No code written.
MCP solves the distribution problem. Build one server, and it works in every MCP-compatible client. That is a meaningful advantage over building separate plugins for each AI platform.
But after shipping the MCP server and watching how AI agents actually use data tools in practice, we noticed something.
What We Observed
The AI coding agents that developers use most (Claude Code, Cursor, Windsurf, GitHub Copilot) all have one thing in common. They run shell commands. When an agent needs to look something up, check a status, or call an external service, it reaches for the terminal.
Claude Code has a Bash tool that can execute any shell command. Cursor runs commands in its integrated terminal. These agents think in terms of commands and their output. The terminal is where they live.
MCP works well for conversational AI assistants like Claude Desktop and ChatGPT, where a user asks a question and the assistant selects a tool. But for coding agents that are building, debugging, and automating, the shell is the natural interface. A CLI binary in the PATH that takes flags and returns JSON is immediately accessible to every one of these agents with zero configuration.
No MCP server setup. No protocol negotiation. No configuration file pointing at a URL. Just a command.
That's why we built the CLI last, after seeing how agents actually work, and why we think it's the best interface for AI agents interacting with data.
Layer 4: The CLI
datalegion-cli is a fully async, agent-friendly command-line tool for person and company data. Install it, set an API key, and every person and company enrichment, search, and utility operation is available from the terminal.
Installation
```bash
brew install datalegion-ai/tap/datalegion-cli
```

Or via pip:

```bash
pip install datalegion-cli
```

Set your API key once
The key is encrypted at rest using Fernet with a machine-tied, PBKDF2-derived key, and the config file is chmod 600.

```bash
datalegion-cli config set api_key legion_your_key_here
```

Enrich a person
```bash
datalegion-cli person enrich --email jane.doe@techcompany.com
```

Returns the full person profile as JSON to stdout:

```json
{
  "matches": [
    {
      "person": {
        "full_name": "jane doe",
        "job_title": "senior product manager",
        "company_name": "tech company",
        "seniority_level": "senior",
        "company_size": "1001-5000",
        "work_email": "jane.doe@techcompany.com",
        "mobile_phone": "+15551234567"
      },
      "match_metadata": {
        "match_confidence": "high"
      }
    }
  ],
  "total": 1
}
```

Enrich a company
```bash
datalegion-cli company enrich --domain google.com
```

Search with SQL

```bash
datalegion-cli person search \
  "SELECT * FROM people WHERE job_title ILIKE '%engineer%' AND state = 'california'" \
  --limit 25
```

Clean, validate, and hash (free, no credits)
```bash
datalegion-cli utility clean \
  '{"email": " JANE.DOE+work@GMAIL.COM ", "phone": "(555) 123-4567"}'

datalegion-cli utility validate --email "test@example.com" --phone "+15551234567"

datalegion-cli utility hash jane@example.com
```

Pipe and compose
```bash
# Extract a specific field with jq
datalegion-cli person enrich --email jane@acme.com | jq '.matches[0].person.job_title'

# Feed JSON from stdin
echo '{"email": "jane@acme.com"}' | datalegion-cli person enrich --stdin

# Suppress all non-JSON output
datalegion-cli --quiet person enrich --email jane@acme.com
```

Why the CLI Works Best for Agents
A few design decisions make the CLI agent-native rather than just agent-compatible.
The most important one: all structured data goes to stdout, and everything else goes to stderr. Logs, progress indicators, credit usage summaries. All on stderr. An agent parsing the output never sees noise mixed into the JSON. Most CLIs get this wrong. They mix human-readable messages into their output, which breaks machine parsing.
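The convention is easy to state in code. Here is a minimal sketch of the discipline (not the CLI's actual implementation): diagnostics go through stderr, and stdout carries nothing but the JSON document:

```python
import json
import sys

def render(result: dict) -> str:
    """Serialize the one thing stdout is allowed to carry."""
    return json.dumps(result)

def log(message: str) -> None:
    """Diagnostics, progress, and credit summaries go to stderr only."""
    sys.stderr.write(message + "\n")

log("enriching 1 record (1 credit)")   # never pollutes stdout
payload = render({"matches": [], "total": 0})
sys.stdout.write(payload + "\n")       # an agent can parse this blindly
```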
Agents can also construct a JSON payload and pipe it directly into the CLI via --stdin. No flag-per-field overhead. No shell escaping issues with complex queries. Just echo '{"email": "..."}' | datalegion-cli person enrich --stdin. This matches how agents naturally compose commands.
The --quiet flag suppresses all log output entirely. Combined with the stdout/stderr separation, it means the only bytes on stdout are the JSON response. An agent calling datalegion-cli --quiet person enrich --email jane@acme.com gets parseable output with zero noise.
And the API key is encrypted at rest, tied to the machine identity via PBKDF2. Set it once with config set api_key and every agent on the machine can authenticate without passing secrets through command-line arguments.
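The derivation itself is standard. A sketch of deriving a machine-tied key with PBKDF2 from the Python standard library; the machine identifier, salt, and iteration count below are illustrative, not the CLI's actual parameters:

```python
import base64
import hashlib

def derive_key(machine_id: str, salt: bytes, iterations: int = 480_000) -> bytes:
    """Derive a 32-byte key from a machine identifier via PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", machine_id.encode(), salt, iterations)
    return base64.urlsafe_b64encode(raw)  # Fernet keys are urlsafe-base64 encoded

key = derive_key("example-machine-id", b"example-salt")
```

Because the key is derived from the machine identity rather than stored alongside the ciphertext, copying the config file to another machine yields nothing decryptable.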
None of these are unusual patterns individually. They're standard Unix conventions. But most developer tools don't follow them consistently, which is why most CLIs are frustrating for agents to use. The difference between "agent-compatible" and "agent-native" is the difference between "it technically works if you parse around the noise" and "it works cleanly out of the box."
How Agents Use It in Practice
A developer asks Claude Code to research a prospect. The agent calls the CLI directly:
```bash
datalegion-cli person enrich --email sarah.chen@acmecorp.com \
  --fields full_name,job_title,seniority_level,company_name,company_size
```

Need to enrich a list of leads? The agent writes a loop that calls the CLI for each record and parses the JSON output. No SDK installation needed. No import statements. No dependency management.
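A sketch of that loop pattern in Python, with the CLI invocation stubbed so the snippet runs without the binary installed; in practice `run` would be a subprocess call to `datalegion-cli --quiet person enrich`:

```python
import json

def enrich_batch(emails, run):
    """Enrich each email via `run`, which returns the CLI's JSON stdout."""
    results = {}
    for email in emails:
        data = json.loads(run(email))
        match = data["matches"][0] if data["matches"] else None
        results[email] = match["person"]["job_title"] if match else None
    return results

# Stub standing in for the real CLI call, e.g.:
#   subprocess.run(["datalegion-cli", "--quiet", "person", "enrich",
#                   "--email", email], capture_output=True, text=True).stdout
def fake_cli(email):
    return json.dumps({"matches": [{"person": {"job_title": "engineer"}}], "total": 1})

titles = enrich_batch(["jane@acme.com"], fake_cli)
```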
The debugging story is worth noting too. A developer can run the exact same command the agent ran. The human and the agent share the same interface. That means the developer can verify, reproduce, and trust the agent's data lookups.
The CLI also works for workflows that don't involve agents at all. A bash script that enriches new Stripe customers via webhook. A cron job that re-enriches stale CRM records. A quick lookup during a sales call. The same tool serves all of these because the design is clean enough for both humans and machines.
The Full Stack
Here's how the four layers map to different consumers:
| Layer | Best for | Why |
|---|---|---|
| API | Custom integrations, automation platforms, direct HTTP | Full control, works with any language, integrates with Clay/Zapier/Make |
| SDK | Application code in Python or Node.js | Type safety, structured responses, async support, error classes |
| MCP | Conversational AI assistants (Claude Desktop, ChatGPT, Copilot) | Standard protocol, tool discovery, works across all MCP clients |
| CLI | AI coding agents, shell scripts, quick lookups, automation | Zero config, stdout/stderr separation, pipes, universal compatibility |
None of these layers replaces another. A team might use the SDK in their webhook handler, the CLI in their deployment scripts, the MCP server in their Claude Desktop setup, and the API directly from their automation platform. They all hit the same endpoints, use the same API key, consume the same credits, and return the same data.
Which layer you use matters less than whether the data product you depend on meets you where you work. For a growing number of developers and teams, that's in an AI agent's terminal.
Get Started
The CLI is available now via Homebrew and PyPI.
```bash
# Install
brew install datalegion-ai/tap/datalegion-cli

# Authenticate
datalegion-cli config set api_key legion_your_key_here

# Look up a person
datalegion-cli person enrich --email jane@example.com

# Look up a company
datalegion-cli company enrich --domain stripe.com

# Check your credits
datalegion-cli credits balance
```