How data is delivered and what formats are available.

Delivery Methods

Cloud Storage

Delivery to your cloud storage bucket:
  • Amazon S3: Direct delivery to your S3 bucket
  • Google Cloud Storage: Delivery to your Google Cloud Storage bucket
  • Azure Blob Storage: Delivery to your Azure Blob Storage container
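For S3 delivery, a pull from the bucket can be scripted with boto3. This is a minimal sketch: the `deliveries/YYYY-MM/` key layout and the helper names are assumptions for illustration, not the actual delivery path, which is agreed during setup.

```python
import os


def monthly_prefix(year: int, month: int) -> str:
    """Object-key prefix for one monthly delivery (layout is an assumption)."""
    return f"deliveries/{year:04d}-{month:02d}/"


def download_delivery(bucket: str, year: int, month: int, dest_dir: str = ".") -> list:
    """Download every file in one monthly delivery from S3."""
    # boto3 is imported here so the path helper above works without the AWS SDK
    import boto3

    s3 = boto3.client("s3")
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=monthly_prefix(year, month)):
        for obj in page.get("Contents", []):
            dest = os.path.join(dest_dir, os.path.basename(obj["Key"]))
            s3.download_file(bucket, obj["Key"], dest)
            downloaded.append(dest)
    return downloaded
```

The same pattern applies to Google Cloud Storage and Azure Blob Storage with their respective SDKs.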

Secure File Transfer (SFTP)

Traditional SFTP delivery to your server:
  • Secure, encrypted transfer
  • Automated delivery scheduling
  • Compatible with all standard SFTP clients
We’ll work with you to set up the delivery method that works best for your infrastructure.
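An SFTP pull can be automated with Paramiko. A minimal sketch, assuming a `/deliveries/YYYY-MM` directory layout on the remote server; the host, credentials, and path are placeholders to adjust to your configured delivery.

```python
import os


def remote_delivery_dir(year: int, month: int) -> str:
    """Remote directory for one monthly delivery (layout is an assumption)."""
    return f"/deliveries/{year:04d}-{month:02d}"


def fetch_delivery(host: str, username: str, password: str,
                   year: int, month: int, dest_dir: str = ".") -> list:
    """Fetch every file in one monthly delivery over SFTP."""
    # paramiko is imported here so the path helper works without the library
    import paramiko

    transport = paramiko.Transport((host, 22))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    remote_dir = remote_delivery_dir(year, month)
    fetched = []
    try:
        for name in sftp.listdir(remote_dir):
            dest = os.path.join(dest_dir, name)
            sftp.get(f"{remote_dir}/{name}", dest)
            fetched.append(dest)
    finally:
        sftp.close()
        transport.close()
    return fetched
```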

File Formats

CSV

Comma-separated values format:
  • Easy to import into spreadsheets and databases
  • Flat structure (nested data flattened)
  • Compatible with Excel, Google Sheets, and all major databases
Best for: Quick analysis, spreadsheet tools, simple database imports

JSON

Structured JSON format:
  • Preserves nested object structure
  • Arrays maintained as arrays
  • Human-readable format
Best for: Applications, APIs, structured data processing

Parquet

Columnar Parquet format:
  • Optimized for analytics and data warehouses
  • Efficient compression
  • Schema embedded in file
Best for: Data warehouses (Snowflake, BigQuery, Redshift), analytics platforms, large-scale processing

Delta

Delta Lake format:
  • Schema enforcement
  • Built on Parquet
  • Native support in Databricks, Spark, and other lakehouse platforms
Best for: Databricks, lakehouse architectures, teams already using Delta Lake

Update Frequency

Monthly Refreshes

Each month you receive a complete, full-file delivery of all records matching your criteria. There are no incremental or diff files. Every delivery is a full snapshot of the dataset.
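Because each delivery is a complete snapshot, the simplest load pattern replaces last month's table rather than merging: load into a staging table, then swap. A sketch using SQLite for illustration; the table and column names are hypothetical, not the real schema.

```python
import sqlite3


def load_monthly_snapshot(conn: sqlite3.Connection, rows: list) -> int:
    """Replace the table with the latest full-file delivery via a staging swap.

    Every delivery is a full snapshot, so no incremental/diff logic is needed;
    the staging swap avoids serving an empty table mid-load.
    """
    conn.execute("DROP TABLE IF EXISTS data_legion_staging")
    conn.execute("CREATE TABLE data_legion_staging (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO data_legion_staging VALUES (?, ?)", rows)
    conn.execute("DROP TABLE IF EXISTS data_legion")
    conn.execute("ALTER TABLE data_legion_staging RENAME TO data_legion")
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM data_legion").fetchone()[0]
```

Running this for two consecutive months leaves only the most recent snapshot in place, since the second load replaces (not appends to) the first.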

Data Organization

Versioning

Each delivery includes:
  • build_version field in each record (schema version)
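A pipeline can use the build_version field to detect schema changes before loading. A minimal sketch; the version strings shown are made up for illustration.

```python
def check_build_version(records: list, expected: str) -> bool:
    """Return True when every record carries the expected schema version."""
    versions = {r.get("build_version") for r in records}
    if versions != {expected}:
        # Surface the mismatch so the load can be halted or reviewed
        print(f"schema version mismatch: found {versions}, expected {expected!r}")
        return False
    return True
```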

Integration Examples

Loading into Database

PostgreSQL (COPY reads a path on the database server; use psql's \copy to load a file from the client machine):
COPY data_legion FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
BigQuery:
LOAD DATA INTO `project.dataset.data_legion`
FROM FILES (format='PARQUET', uris=['gs://bucket/data-legion-*.parquet']);

Processing with Python

CSV:
import pandas as pd
df = pd.read_csv('data-legion.csv')
JSON (newline-delimited, one record per line):
import json
with open('data-legion.jsonl') as f:
    for line in f:
        record = json.loads(line)
Parquet:
import pandas as pd
df = pd.read_parquet('data-legion.parquet')
Delta:
from deltalake import DeltaTable
dt = DeltaTable('data-legion-delta/')
df = dt.to_pandas()