Cleaned vs Raw Variants
All text data points include both cleaned (normalized) and raw (source) variants:
- Cleaned: Standardized, normalized text for consistent filtering. Lowercased, accents stripped, symbols normalized.
- Raw: Original source formatting preserved. Contains all source variations as an array.

Use cleaned for filtering and matching. Use raw[] to see original source data.
In person data, cleaned/raw is used for: job titles, organization names, headlines, summaries, skills, languages, degrees, and fields of study. In company data, it is used for: company name, headline, and description.
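As a rough illustration of how a cleaned value relates to its raw variants, the sketch below lowercases, strips accents, and collapses whitespace. The function names `clean_text` and `to_variants`, and the `{"cleaned": ..., "raw": [...]}` shape, are assumptions made for this example; the actual normalization pipeline and field layout may differ.

```python
import unicodedata
from typing import Dict, List, Union


def clean_text(value: str) -> str:
    """Hypothetical normalizer: lowercase, strip accents, collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def to_variants(raw_values: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Pair one cleaned value with all raw source variations (assumed shape)."""
    return {"cleaned": clean_text(raw_values[0]), "raw": list(raw_values)}


variants = to_variants(["Développeur Sénior", "DEVELOPPEUR SENIOR"])
# Match on the cleaned value; inspect raw[] for the original formatting.
is_match = variants["cleaned"] == clean_text("developpeur  senior")
```

Matching against the cleaned value means a query does not need to anticipate every casing or accent variation that appears in the source data.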
String Fields
All string fields are:
- Lowercased (except where case matters)
- Trimmed of leading/trailing whitespace
- Normalized for accents and special characters
Date Fields
Dates preserve precision and use ISO 8601 format:

| Format | Example | Description |
|---|---|---|
| YYYY | 2020 | Year only |
| YYYY-MM | 2020-03 | Year and month |
| YYYY-MM-DD | 2020-03-15 | Full date |
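Because a date value may carry year, month, or day precision, a consumer usually needs to keep track of which one it received. The sketch below is one possible approach, with `parse_partial_date` as a hypothetical helper name; filling missing components with 1 is a convention chosen for this example, not something the data format prescribes.

```python
from datetime import date
from typing import Tuple


def parse_partial_date(value: str) -> Tuple[date, str]:
    """Parse YYYY, YYYY-MM, or YYYY-MM-DD and report the precision found."""
    parts = value.split("-")
    precision = {1: "year", 2: "month", 3: "day"}.get(len(parts))
    if precision is None:
        raise ValueError(f"unsupported date format: {value!r}")
    year = int(parts[0])
    # Missing components default to 1 so a date object can still be built.
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    return date(year, month, day), precision


parsed, precision = parse_partial_date("2020-03")
```

Returning the precision alongside the date keeps a year-only value such as 2020 from being mistaken for January 1, 2020.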
Phone Numbers
All phone numbers are in E.164 international format:
- Format: +[country code][number]
- Example: +15551234567 (US number)
- Example: +442071234567 (UK number)
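A quick shape check for E.164 values can be sketched with a regular expression: a plus sign, a first digit that is not zero, and at most 15 digits in total. The helper name `looks_like_e164` is made up for this example, and the check only covers the format; it does not confirm that a number is assigned or reachable.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
_E164_SHAPE = re.compile(r"\+[1-9]\d{1,14}")


def looks_like_e164(value: str) -> bool:
    """Return True if the value has the E.164 shape (format check only)."""
    return _E164_SHAPE.fullmatch(value) is not None


us_ok = looks_like_e164("+15551234567")
uk_ok = looks_like_e164("+442071234567")
local_ok = looks_like_e164("555-123-4567")
```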
Email Addresses
All email addresses are:
- Lowercased
- Trimmed of whitespace
- Checked for syntax/format

Example: JANE.DOE@COMPANY.COM → jane.doe@company.com
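The three steps above can be sketched as a small helper. `normalize_email` is a hypothetical name, and the regular expression is a deliberately loose stand-in for the syntax check, whose actual rules are not specified here.

```python
import re
from typing import Optional

# Loose shape check (assumption): something@something.something, no spaces.
_EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize_email(value: str) -> Optional[str]:
    """Trim and lowercase an address; return None if the shape looks wrong."""
    candidate = value.strip().lower()
    return candidate if _EMAIL_SHAPE.fullmatch(candidate) else None


normalized = normalize_email("  JANE.DOE@COMPANY.COM ")
rejected = normalize_email("not-an-email")
```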
Geographic Codes
State Codes
ISO 3166-2 format:
- US-CA (California, United States)
- US-NY (New York, United States)
- GB-ENG (England, United Kingdom)
Country Codes
ISO 3166-1 alpha-2 format:
- US (United States)
- GB (United Kingdom)
- CA (Canada)
Continent Codes
Standard continent codes:
- NA (North America)
- EU (Europe)
- AS (Asia)
- SA (South America)
- AF (Africa)
- OC (Oceania)
- AN (Antarctica)
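Since an ISO 3166-2 state code begins with the ISO 3166-1 alpha-2 country code, the two parts can be separated on the first hyphen. The sketch below assumes that layout; `split_state_code` and `CONTINENT_NAMES` are illustrative names, and the continent mapping simply restates the list above.

```python
from typing import Tuple

# Continent codes as listed in this section.
CONTINENT_NAMES = {
    "NA": "North America",
    "EU": "Europe",
    "AS": "Asia",
    "SA": "South America",
    "AF": "Africa",
    "OC": "Oceania",
    "AN": "Antarctica",
}


def split_state_code(code: str) -> Tuple[str, str]:
    """Split an ISO 3166-2 style code into (country, subdivision)."""
    country, sep, subdivision = code.partition("-")
    if sep != "-" or len(country) != 2 or not subdivision:
        raise ValueError(f"not an ISO 3166-2 style code: {code!r}")
    return country, subdivision


country, subdivision = split_state_code("GB-ENG")
```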
URLs
All URLs are normalized:
- Protocol included (https://)
- Trailing slashes removed (except root paths)
- Consistent formatting

Examples:
- linkedin.com/in/janedoe → https://www.linkedin.com/in/janedoe
- github.com/user → https://github.com/user
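The first two rules can be approximated with the standard library, as in the sketch below. `normalize_url` is a hypothetical helper; it does not reproduce site-specific host canonicalization such as the www prefix in the LinkedIn example, and lowercasing the host is an assumption about what "consistent formatting" covers.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(value: str) -> str:
    """Add https:// when no protocol is given and drop trailing slashes."""
    candidate = value.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    path = parts.path
    # Keep a bare root path ("/") but strip trailing slashes elsewhere.
    if path != "/":
        path = path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment)
    )


profile_url = normalize_url("github.com/user/")
```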
Boolean Fields
Boolean fields use true/false values:
- is_decision_maker: true or false
- current: true, false, or null (for personal emails)
- validated: true or false
Integer Fields
Integer fields are numeric:
- age: Years as integer
- birth_year: Year as integer
- years_of_experience: Years as integer
Enum Fields
Enum fields use lowercase, underscore-separated values:
- seniority_level: c_level, vp, director, etc.
- job_function: engineering, sales, marketing, etc.
- confidence: high, moderate, low
Array Fields
Arrays contain objects with consistent structure.

Person arrays:
- phones[]: Array of phone objects
- emails[]: Array of email objects
- locations[]: Array of location objects
- experience[]: Array of experience objects
- education[]: Array of education objects
- socials[]: Array of social profile objects
- skills[]: Array of skill objects
- languages[]: Array of language objects

Company arrays:
- domains[]: Array of domain objects
- socials[]: Array of social profile objects
- tickers[]: Array of ticker objects
- legion_employee_count_by_month[]: Array of monthly headcount snapshots
Related Documentation
- Array Ordering - How arrays are sorted
- Enum Values - All possible enum values