Detailed format specifications for all field types.

Cleaned vs Raw Variants

Free-text data points include both cleaned (normalized) and raw (source) variants:
  • Cleaned: Standardized, normalized text for consistent filtering. Lowercased, accents stripped, symbols normalized.
  • Raw: Original source formatting preserved. Contains all source variations as an array.
Person example (job title):
{
  "title": {
    "cleaned": "senior product manager",
    "raw": [
      "Senior Product Manager",
      "Sr. PM",
      "Sr Product Manager"
    ]
  }
}
Company example (company name):
{
  "name": {
    "cleaned": "stripe inc",
    "raw": [
      "Stripe, Inc.",
      "Stripe"
    ]
  }
}
Use cleaned for filtering and matching. Use raw[] to see original source data. In person data, cleaned/raw is used for: job titles, organization names, headlines, summaries, skills, languages, degrees, and fields of study. In company data, it is used for: company name, headline, and description.
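The recommended pattern (match on cleaned, read raw for display) can be sketched in Python as follows. The helper names and sample records are hypothetical and follow the person example above. The sketch assumes a query only needs lowercasing and trimming to line up with cleaned values; queries with accents or punctuation would need the fuller cleaning described under String Fields.

```python
from typing import Any

# Hypothetical sample records shaped like the person example above.
people: list[dict[str, Any]] = [
    {"title": {"cleaned": "senior product manager",
               "raw": ["Senior Product Manager", "Sr. PM", "Sr Product Manager"]}},
    {"title": {"cleaned": "staff engineer", "raw": ["Staff Engineer"]}},
]


def filter_on_cleaned(records: list[dict[str, Any]], field: str, value: str) -> list[dict[str, Any]]:
    """Keep records whose field.cleaned equals value after lowercasing and trimming."""
    target = value.strip().lower()
    return [r for r in records if r.get(field, {}).get("cleaned") == target]


def raw_variants(records: list[dict[str, Any]], field: str) -> list[str]:
    """Collect every original source spelling of field across the given records."""
    return [variant for r in records for variant in r.get(field, {}).get("raw", [])]


matches = filter_on_cleaned(people, "title", "Senior Product Manager")
print(raw_variants(matches, "title"))
# ['Senior Product Manager', 'Sr. PM', 'Sr Product Manager']
```

Matching stays on the single canonical value, while the raw array keeps every source spelling available for display or auditing.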

String Fields

All string fields are:
  • Lowercased (except where case matters)
  • Trimmed of leading/trailing whitespace
  • Normalized for accents and special characters
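The exact cleaning rules are not published here, so the following is only a rough Python approximation of the three rules above, useful for preparing user input before comparing it to stored values. The function name is hypothetical, and it always lowercases, ignoring the "except where case matters" exception. Cleaned job titles also appear to involve canonicalization beyond this (the title example maps "Sr. PM" to "senior product manager"), which plain normalization does not reproduce.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Approximate string cleaning: lowercase, strip accents,
    turn symbols/punctuation into spaces, then collapse and trim whitespace."""
    lowered = value.lower()
    # NFKD splits accented letters into a base letter plus combining marks; drop the marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat any run of non-word characters (and underscores) as a single space.
    spaced = re.sub(r"[\W_]+", " ", no_accents)
    return spaced.strip()


print(normalize_text("  Café  Zürich "))  # cafe zurich
print(normalize_text("Stripe, Inc."))     # stripe inc
```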

Date Fields

Dates preserve precision and use ISO 8601 format:
| Format | Example | Description |
|---|---|---|
| YYYY | 2020 | Year only |
| YYYY-MM | 2020-03 | Year and month |
| YYYY-MM-DD | 2020-03-15 | Full date |
When multiple sources provide different precisions for the same date, we keep the most specific version.
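A minimal Python sketch of the "keep the most specific version" rule, assuming all candidates already describe the same date. The helper names are hypothetical, and the sketch checks only the shape of each value, not whether months and days are in range or whether candidates disagree.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD, matching the three formats listed above.
ISO_PARTIAL_DATE = re.compile(r"\d{4}(?:-\d{2}(?:-\d{2})?)?")


def precision(date: str) -> int:
    """Return 1 for a year, 2 for year-month, 3 for a full date."""
    if ISO_PARTIAL_DATE.fullmatch(date) is None:
        raise ValueError(f"not a supported ISO 8601 date: {date!r}")
    return date.count("-") + 1


def most_specific(candidates: list[str]) -> str:
    """Pick the most precise of several source values for the same date."""
    return max(candidates, key=precision)


print(most_specific(["2020", "2020-03-15", "2020-03"]))  # 2020-03-15
```

Because partial dates keep their precision, a value such as "2020" should be read as "sometime in 2020", not as January 1.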

Phone Numbers

All phone numbers are in E.164 international format:
  • Format: +[country code][number]
  • Example: +15551234567 (US number)
  • Example: +442071234567 (UK number)
All phone numbers include the country code prefix.
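Since every number follows E.164, a syntax check reduces to one regular expression: a leading "+", a country code that does not start with 0, and at most 15 digits in total. The sketch below is a hypothetical helper that checks shape only; confirming that a country code exists or that a number has the right length for its country needs a dedicated library such as Google's libphonenumber.

```python
import re

# E.164 shape: "+", first digit 1-9, then up to 14 more digits (15 digits max).
E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """True if number has the E.164 shape, with no spaces or punctuation."""
    return E164.fullmatch(number) is not None


for n in ["+15551234567", "+442071234567", "15551234567", "+1 555 123 4567"]:
    print(n, is_e164(n))
```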

Email Addresses

All email addresses are:
  • Lowercased
  • Trimmed of whitespace
  • Checked for valid syntax
Example: JANE.DOE@COMPANY.COM → jane.doe@company.com
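To compare user-supplied addresses against stored ones, apply the same trimming and lowercasing first. This Python sketch uses a hypothetical helper and a deliberately loose syntax check (one "@", no whitespace, a dot in the domain); it is an assumption for illustration, not the validation actually applied to the data, and not the full RFC 5322 grammar.

```python
import re
from typing import Optional

# Loose syntax check (assumed for illustration): local part, "@", domain containing a dot.
EMAIL_SYNTAX = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize_email(raw: str) -> Optional[str]:
    """Trim and lowercase an address; return None if it fails the syntax check."""
    email = raw.strip().lower()
    return email if EMAIL_SYNTAX.fullmatch(email) else None


print(normalize_email("  JANE.DOE@COMPANY.COM "))  # jane.doe@company.com
print(normalize_email("not-an-email"))            # None
```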

Geographic Codes

State Codes

ISO 3166-2 format:
  • US-CA (California, United States)
  • US-NY (New York, United States)
  • GB-ENG (England, United Kingdom)

Country Codes

ISO 3166-1 alpha-2 format:
  • US (United States)
  • GB (United Kingdom)
  • CA (Canada)

Continent Codes

Standard continent codes:
  • NA (North America)
  • EU (Europe)
  • AS (Asia)
  • SA (South America)
  • AF (Africa)
  • OC (Oceania)
  • AN (Antarctica)

URLs

All URLs are normalized:
  • Protocol included (https://)
  • Trailing slashes removed (except root paths)
  • Consistent formatting
Examples:
  • linkedin.com/in/janedoe → https://www.linkedin.com/in/janedoe
  • github.com/user → https://github.com/user
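A rough Python sketch of the first two URL rules, built on the standard library's urllib.parse: add https:// when no scheme is present and drop trailing slashes except on a bare root path. The function name is hypothetical. It does not reproduce host canonicalization such as adding "www." for LinkedIn, so "linkedin.com/in/janedoe" becomes "https://linkedin.com/in/janedoe" rather than the stored form.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Default to https:// and strip trailing slashes, keeping a lone root "/"."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    path = parts.path
    if path != "/":
        path = path.rstrip("/")
    # Scheme and host are case-insensitive, so lowercase them; leave path and query alone.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, parts.fragment))


print(normalize_url("github.com/user/"))     # https://github.com/user
print(normalize_url("https://github.com/"))  # https://github.com/
```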

Boolean Fields

Boolean fields use true/false values. Some fields can also be null where the flag does not apply:
  • is_decision_maker: true or false
  • current: true, false, or null (for personal emails)
  • validated: true or false
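Because current can be null, a plain truthiness test (`if not email["current"]`) lumps personal emails together with past ones. This Python sketch compares against True, False, or None explicitly. The sample records and the "address" key are hypothetical; only the current flag comes from the list above.

```python
from typing import Any, Optional

# Hypothetical email objects; only the "current" flag is documented above.
emails: list[dict[str, Any]] = [
    {"address": "jane.doe@company.com", "current": True},
    {"address": "jane@oldcorp.com", "current": False},
    {"address": "jane.doe@gmail.com", "current": None},  # personal email
]


def by_current(items: list[dict[str, Any]], flag: Optional[bool]) -> list[str]:
    """Select addresses whose current value is exactly flag (True, False, or None).
    A missing key is treated the same as null."""
    return [e["address"] for e in items if e.get("current") is flag]


print(by_current(emails, False))  # ['jane@oldcorp.com']
print(by_current(emails, None))   # ['jane.doe@gmail.com']
```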

Integer Fields

Integer fields contain whole numbers:
  • age: Years as integer
  • birth_year: Year as integer
  • years_of_experience: Years as integer

Enum Fields

Enum fields use lowercase, underscore-separated values:
  • seniority_level: c_level, vp, director, etc.
  • job_function: engineering, sales, marketing, etc.
  • confidence: high, moderate, low
See Enum Values for complete lists.
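Of the enums above, only confidence is listed in full here, so it is the one modeled in this Python sketch. A `str`-based Enum parses raw values and rejects unknown strings. The low < moderate < high ranking and the `at_least` helper are assumptions for illustration, not documented behavior.

```python
from enum import Enum


class Confidence(str, Enum):
    """The three confidence values listed above, in assumed ascending order."""
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"

    @property
    def rank(self) -> int:
        # Position in definition order: LOW=0, MODERATE=1, HIGH=2.
        return list(type(self)).index(self)


def at_least(value: str, minimum: Confidence) -> bool:
    """Parse a raw enum string (ValueError if unknown) and compare it to a minimum level."""
    return Confidence(value).rank >= minimum.rank


print(at_least("high", Confidence.MODERATE))  # True
print(at_least("low", Confidence.MODERATE))   # False
```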

Array Fields

Arrays contain objects with consistent structure. Person arrays:
  • phones[]: Array of phone objects
  • emails[]: Array of email objects
  • locations[]: Array of location objects
  • experience[]: Array of experience objects
  • education[]: Array of education objects
  • socials[]: Array of social profile objects
  • skills[]: Array of skill objects
  • languages[]: Array of language objects
Company arrays:
  • domains[]: Array of domain objects
  • socials[]: Array of social profile objects
  • tickers[]: Array of ticker objects
  • legion_employee_count_by_month[]: Array of monthly headcount snapshots
All arrays are sorted deterministically. See Array Ordering for details.