A prospect research workflow at most B2B companies looks like this. Someone identifies a target account. They open LinkedIn, scrape titles and tenure for the team. They pull funding history from Crunchbase. They cross-reference the website. They take notes in a doc. The notes get pasted into a CRM record. A rep reads them before a call. The whole cycle takes 30 to 90 minutes per account, and the result is stale by the next quarter.
Most of that work is structured data lookups. The judgment, the message, the timing of outreach, those still belong to a human. The lookups don't.
This post walks through building a prospect research agent that does the lookup half. The agent runs on Claude Code and shells out to `datalegion-cli` for company firmographics and decision-maker discovery. The output is a structured brief: a research dossier per account, plus per-contact email-sequence outlines. A rep reads the whole thing in 90 seconds, not 90 minutes.
The agent's prompt and tool list fit on one screen. A working version takes about 10 minutes to set up. We'll use a fictional target company called Larkspur Software so the example is fully reproducible. Replace it with a real domain when you ship.
What you'll have built
By the end:
- A Claude Code agent that, given a target company domain, produces a research dossier with firmographics, recent hiring trends, and a shortlist of decision-makers with sourced contact data and per-field confidence scores.
- A structured email-sequence outline per shortlist contact: a hook angle drawn from the research, body bullets, and a CTA suggestion for an initial email plus two follow-ups.
- A tool surface (`datalegion-cli`) the agent calls via Bash. No MCP setup, no SDK install, no API key passed through arguments.
- A prompt template you can adapt to your own ICP and brand.
What we're not building: a prose generator. The agent emits structured outlines, not finished email copy. Voice, tone, and final wording are still your job; the reasons are covered in the closing section.
Why a CLI for the tool layer
Claude Code's primary tool is Bash. When an agent needs to look something up, check a status, or call an external service, it reaches for the terminal. We made the case for CLI as the agent-native interface when we shipped the CLI in April. The short version: stdout for data, stderr for logs, encrypted machine-tied API key, JSON in and JSON out. Pipes and jq work the way Unix expects.
That design is what makes this agent loop fit on one screen. The agent never constructs an HTTP request, parses a response wrapper, or imports a client library. It runs `datalegion-cli person search …` and gets parseable output.
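That contract is easy to see from a plain script's point of view: captured stdout is pure JSON, nothing else. A minimal Python sketch; the payload here is a hand-trimmed imitation of the sample responses later in this post, not live CLI output:

```python
import json

# Simulated stdout from a `datalegion-cli company enrich` call. The
# {"matches": [...], "total": N} envelope mirrors the sample responses
# shown later in this post.
stdout = (
    '{"matches": [{"company": {"name": {"display": "Larkspur Software"},'
    ' "size": "11-50"}}], "total": 1}'
)

data = json.loads(stdout)  # stdout is data only: no banner, no log lines
company = data["matches"][0]["company"]
print(company["name"]["display"], company["size"])
```

Logs go to stderr, so a pipeline like `datalegion-cli … | jq` (or the `json.loads` above) never has to strip non-JSON noise first.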
Prereqs
You need:
- Claude Code installed locally (`npm install -g @anthropic-ai/claude-code`, or via the JetBrains/VS Code plugin).
- A Data Legion API key (start a free trial if you don't have one).
- macOS or Linux. Python 3.11+ if you install via pip; nothing extra if you use Homebrew.
Setup (about 30 seconds)
Install the CLI:

```
brew install datalegion-ai/tap/datalegion-cli
```

Or via pip:

```
pip install datalegion-cli
```

Authenticate once. The key is encrypted at rest using a machine-tied derived key, so you don't paste it into prompts or commit it to a repo:

```
datalegion-cli config set api_key legion_your_key_here
```

Verify:

```
datalegion-cli credits balance
```

If that returns your credit balance, the CLI is working.
The agent loop
Open Claude Code in any directory. Paste the following prompt, replacing `larkspur.example` with your target domain:
```
You are a B2B prospect research assistant. The target is larkspur.example.

Use datalegion-cli to:

1. Pull company firmographics (industry, size, growth signals).
2. Search for decision-makers (VP+ in engineering, product, or operations)
   with the contact fields you need in one call.
3. Produce a markdown brief with two parts:

   PART A: Research dossier.
   Company snapshot, hiring/attrition trend summary, and a shortlist of
   3-5 decision-makers with role, seniority, years of experience, and
   a sourced work email + mobile phone (with confidence) where
   available.

   PART B: Email-sequence outlines (one set per shortlist contact).
   For an initial outreach email and two follow-ups, propose a hook angle
   anchored in the research (e.g., a growth signal, a specific role
   responsibility, a verifiable fact about the company), 2-3 body bullet
   points, and a CTA suggestion. Outline only. No finished prose. The
   human will write the actual copy.

All structured data comes from datalegion-cli. Do not fabricate fields that
the CLI didn't return. If a field is missing, write "not available" rather
than inferring. For email outlines, only suggest hooks that are directly
supported by the data; do not invent context.
```

That's the full agent specification. The "do not fabricate" line matters. Models that read JSON output from a tool will sometimes invent plausible-looking fields when the real data is sparse. Pinning the agent to "what the CLI returned, nothing more" is the simplest hallucination guard you can write. Agents passing structured JSON between tools hallucinate less than agents that parse natural-language tool output, which is the whole reason datalegion-cli returns parseable JSON to stdout.
Claude Code reads the prompt, picks datalegion-cli from the available tools, and runs the loop. Here's what each step looks like.
Step 1: Company firmographics
The agent's first call:
```
datalegion-cli company enrich \
  --domain larkspur.example \
  --fields name,domain,industry,type,size,founded,legion_employee_count,legion_employee_growth_rate,legion_seniority_distribution,legion_job_function_distribution
```

Larkspur Software is fictional, so the live command returns no match. Here's the response shape you'd see for a real target:
```json
{
  "matches": [
    {
      "company": {
        "legion_id": "f8c1d2e3-a4b5-5c6d-8e7f-9a0b1c2d3e4f",
        "name": { "display": "Larkspur Software", "cleaned": "larkspur software" },
        "domain": "larkspur.example",
        "industry": "software",
        "type": "private",
        "size": "11-50",
        "founded": 2022,
        "legion_employee_count": 47,
        "legion_employee_growth_rate": {
          "1m": 0.043,
          "3m": 0.097,
          "6m": 0.21,
          "12m": 0.42
        },
        "legion_seniority_distribution": {
          "c_level": 4,
          "vp": 5,
          "director": 7,
          "manager": 8,
          "senior": 12,
          "junior": 9,
          "intern": 2
        },
        "legion_job_function_distribution": {
          "engineering": 24,
          "product": 8,
          "operations": 6,
          "sales": 4,
          "marketing": 3,
          "finance": 2
        }
      },
      "match_metadata": {
        "matched_on": ["domain"],
        "match_type": "exact",
        "match_confidence": "high"
      }
    }
  ],
  "total": 1
}
```

The 6-month employee growth rate of 21% and the 12-month rate of 42% are the kind of signals a rep notices. The seniority distribution gives the agent a map of who exists before it asks for names.
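Turning that payload into headline signals is a few lines of arithmetic. A sketch over the same fields, with the numbers copied from the sample response above:

```python
# Firmographics from the sample `company enrich` response, trimmed to
# the fields this snippet uses.
company = {
    "legion_employee_count": 47,
    "legion_employee_growth_rate": {"6m": 0.21, "12m": 0.42},
    "legion_job_function_distribution": {
        "engineering": 24, "product": 8, "operations": 6,
        "sales": 4, "marketing": 3, "finance": 2,
    },
}

growth_12m = company["legion_employee_growth_rate"]["12m"]
eng = company["legion_job_function_distribution"]["engineering"]
eng_share = eng / company["legion_employee_count"]  # 24 / 47 ≈ 51%

print(f"+{growth_12m:.0%} YoY, engineering {eng_share:.0%} of headcount")
# → +42% YoY, engineering 51% of headcount
```

Those two numbers become the "growth signals" line of the dossier and, later, the hook for Email 1.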
Step 2: Find decision-makers
The agent runs a search query against the people database. Two refinements worth calling out:

- We filter on `company_domain` (rather than `company_name`) because the domain is the exact key returned by step 1's company enrich, and it avoids the false positives a fuzzy name match invites.
- We add `work_email IS NOT NULL` so the shortlist only contains contacts a rep can reach. The CLI's `--required` flag does the equivalent on `person enrich`, but `person search` doesn't carry that flag, so `IS NOT NULL` in the SQL is the search-side substitute.
```
datalegion-cli person search \
  "SELECT * FROM people WHERE company_domain = 'larkspur.example' AND seniority_level IN ('vp', 'c_level', 'director') AND job_function IN ('engineering', 'product', 'operations') AND work_email IS NOT NULL" \
  --limit 5 \
  --fields full_name,job_title,seniority_level,job_function,years_of_experience,linkedin_url,work_email,mobile_phone,emails,phones
```

The response includes a `total` field that tells you the full WHERE-clause cardinality across the database, not just this page. So if the search matches 12 people but `--limit 5` returns 5, `total` reads 12. That's how the agent knows whether to widen the filter or accept the shortlist.
A trimmed sample (one match shown; with `--limit 5` the array carries up to 5 entries):
```json
{
  "matches": [
    {
      "person": {
        "legion_id": "a1b2c3d4-e5f6-5a1b-8c3d-4e5f6a1b2c3d",
        "full_name": "marcus delaney",
        "job_title": "vp of engineering",
        "seniority_level": "vp",
        "job_function": "engineering",
        "years_of_experience": 14,
        "linkedin_url": "https://www.linkedin.com/in/marcusdelaney",
        "work_email": "marcus.delaney@larkspur.example",
        "mobile_phone": "+15125551234",
        "emails": [
          {
            "address": "marcus.delaney@larkspur.example",
            "type": "professional",
            "current": true,
            "confidence": "high",
            "validated": false,
            "last_seen": "2026-04-30",
            "num_sources": 3
          },
          {
            "address": "marcus@previousco.example",
            "type": "professional",
            "current": false,
            "confidence": "high",
            "validated": false,
            "last_seen": "2024-11-12",
            "num_sources": 2
          }
        ],
        "phones": [
          {
            "number": "+15125551234",
            "type": "mobile",
            "current": true,
            "confidence": "high",
            "last_seen": "2026-04-30",
            "num_sources": 4
          }
        ]
      }
    }
  ],
  "total": 12
}
```

The agent reads `total: 12`, sees that the five returned results are a representative sample, and moves on.
The `confidence` field on each email reflects sourcing strength: how many independent sources agree the address belongs to this person. That's the signal the agent surfaces in the brief.
Step 3: Synthesize the brief
The agent takes everything it has accumulated and writes a single markdown file with two parts: the research dossier, then per-contact email-sequence outlines. Output looks something like:
```markdown
# Larkspur Software: research brief

**Industry:** software · **Size:** 11-50 (47 employees) · **Founded:** 2022

## Part A: Research dossier

### Growth signals (last 12 months)

- Employee growth: +42% YoY, +21% in last 6 months
- Hiring concentrated at the senior IC level (12 senior) with steady
  C-level (4) and VP (5) bench
- Engineering: 24 of 47 employees (~51%); product: 8; operations: 6

### Shortlist

| Name | Role | Seniority | Experience | Email | Mobile | Email confidence |
|---|---|---|---|---|---|---|
| marcus delaney | VP of Engineering | vp | 14 yrs | marcus.delaney@larkspur.example | +1 512-555-1234 | high |
| ... | ... | ... | ... | ... | ... | ... |

### Notes

- Total VP/C-level/Director matches across engineering/product/ops: 12.
  Shortlist shows the top 5 by seniority.
- Mobile phone not available for 2 of 5 contacts; email confidence varies.

## Part B: Email-sequence outlines

### marcus delaney, VP of Engineering

**Email 1 (initial)**

- Hook: 42% YoY engineering growth implies infra scaling pain
- Body bullets:
  - Connect engineering scale to a problem your product solves
  - One concrete artifact (case study, customer name, datapoint)
  - Frame as a 15-minute conversation, not a pitch
- CTA: ask for a short call

**Email 2 (first follow-up)**

- Hook: a different angle (don't restate Email 1)
- Body bullets:
  - New value or insight relevant to a 14-yr engineering VP
  - One specific question that invites a low-effort reply
- CTA: a softer ask (one-line reply, async)

**Email 3 (final follow-up)**

- Hook: brief, time-aware, no pressure
- Body bullets:
  - Acknowledge silence
  - One-line summary of the value you'd offer
- CTA: leave the door open
```

The agent emits this for every contact in the shortlist. Two datalegion-cli calls per target (company enrich plus a single shortlist search) keep the round trip tight. Each call returns in under a second; the bulk of the latency is the model thinking between calls. A single target completes in roughly 10 to 15 seconds end-to-end. With a list of 100 targets, you'd run the batch during a coffee break.
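To scale past one target, drive the same two-call pattern from a script. A sketch that only builds the command lines, so it's testable without spending credits; actually executing them with `subprocess.run` is left out, and the field list is trimmed for brevity:

```python
import shlex

# Trimmed field list; extend it to match the enrich call shown earlier.
FIELDS = ("name,domain,industry,size,founded,"
          "legion_employee_count,legion_employee_growth_rate")


def enrich_argv(domain: str) -> list[str]:
    """Build the argv for one company-enrich call.

    Keeping command construction in a pure function makes the batch
    loop trivial to test, and passing an argv list (rather than a
    shell string) avoids quoting bugs when a domain is unusual.
    """
    return ["datalegion-cli", "company", "enrich",
            "--domain", domain, "--fields", FIELDS]


targets = ["larkspur.example", "acme.example"]  # your real target list
for domain in targets:
    # In a real run: subprocess.run(enrich_argv(domain), capture_output=True)
    print(shlex.join(enrich_argv(domain)))
```

From there it's one more function for the shortlist search, and the loop feeds each pair of responses to the agent for synthesis.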
What the agent gets right vs. what still needs you
The agent is good at:
- Pulling structured firmographics consistently (no transcription errors, no copy-paste from LinkedIn).
- Sizing the decision-maker pool with `total` so you know whether the shortlist is exhaustive or a sample.
- Reading the `confidence` field on each email and surfacing it in the brief, instead of guessing.
- Producing an identical brief format every time, so the rep's reading speed compounds.
- Anchoring email-outline hooks to specific data points (a growth rate, years of experience in a role, a job-function mix) rather than inventing them.
The agent is bad at:
- Picking the right person to reach out to. Seniority and job function get you a candidate list. Which one is closest to a decision today is judgment.
- Reading signals the database doesn't carry. A founder tweet about a hiring problem won't show up in the API.
- Writing the prose. The outlines hand you a hook, bullets, and a CTA suggestion. The actual wording, the voice, and the brand-specific rules are yours. Most teams have a tone the agent shouldn't try to imitate from defaults.
- Knowing when not to reach out at all. If the company just laid off 20%, the data shows the attrition spike but won't tell you "wait six weeks before you email."
This pattern, agent for retrieval and structure, human for judgment and voice, is the right split for sales work right now. Build the agent to make the human's first 90 seconds with a new account fast and accurate. Don't build the agent to replace the next 90 seconds.
Run it on your own ICP
Replace `larkspur.example` with a real target domain. Adjust the `seniority_level` and `job_function` filters in the search query to match your buyer. Tune the email-outline prompt with your own voice rules and brand-specific requirements (this post ships with a generic structure; your brand will want specifics). The CLI surface stays the same; the things that change are the search predicate and the outline prompt.
The full surface is documented at docs.datalegion.ai/integrations/cli. The thinking behind the four-layer product (API, SDK, MCP, CLI) and why CLI is the agent-friendly default is in API, SDK, MCP, CLI: Building a Data Product for the Agent Era.
If you ship something with this, we'd like to see it. Send it to success@datalegion.ai.