How data is delivered and what formats are available.

Delivery Methods

Cloud Storage

Delivery to your cloud storage bucket:
  • Amazon S3: Direct delivery to your S3 bucket
  • Google Cloud Storage: Delivery to your Google Cloud Storage bucket
  • Azure Blob Storage: Delivery to your Azure Blob Storage container
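For S3 delivery, a pull from the bucket can be scripted with boto3. This is a minimal sketch: the `deliveries/YYYY-MM/` key layout and the helper names are assumptions for illustration, not the actual delivery path, which is agreed during setup.

```python
import os


def monthly_prefix(year: int, month: int) -> str:
    """Object-key prefix for one monthly delivery (layout is an assumption)."""
    return f"deliveries/{year:04d}-{month:02d}/"


def download_delivery(bucket: str, year: int, month: int, dest_dir: str = ".") -> list:
    """Download every file in one monthly delivery from S3."""
    # boto3 is imported here so the path helper above works without the AWS SDK
    import boto3

    s3 = boto3.client("s3")
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=monthly_prefix(year, month)):
        for obj in page.get("Contents", []):
            dest = os.path.join(dest_dir, os.path.basename(obj["Key"]))
            s3.download_file(bucket, obj["Key"], dest)
            downloaded.append(dest)
    return downloaded
```

The same pattern applies to Google Cloud Storage and Azure Blob Storage with their respective SDKs.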

Secure File Transfer (SFTP)

Traditional SFTP delivery to your server:
  • Secure, encrypted transfer
  • Automated delivery scheduling
  • Compatible with all standard SFTP clients
We’ll work with you to set up the delivery method that works best for your infrastructure.
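An SFTP pull can be automated with Paramiko. A minimal sketch, assuming a `/deliveries/YYYY-MM` directory layout on the remote server; the host, credentials, and path are placeholders to adjust to your configured delivery.

```python
import os


def remote_delivery_dir(year: int, month: int) -> str:
    """Remote directory for one monthly delivery (layout is an assumption)."""
    return f"/deliveries/{year:04d}-{month:02d}"


def fetch_delivery(host: str, username: str, password: str,
                   year: int, month: int, dest_dir: str = ".") -> list:
    """Fetch every file in one monthly delivery over SFTP."""
    # paramiko is imported here so the path helper works without the library
    import paramiko

    transport = paramiko.Transport((host, 22))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    remote_dir = remote_delivery_dir(year, month)
    fetched = []
    try:
        for name in sftp.listdir(remote_dir):
            dest = os.path.join(dest_dir, name)
            sftp.get(f"{remote_dir}/{name}", dest)
            fetched.append(dest)
    finally:
        sftp.close()
        transport.close()
    return fetched
```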

File Formats

CSV

Comma-separated values format:
  • Easy to import into spreadsheets and databases
  • Flat structure (nested data flattened)
  • Compatible with Excel, Google Sheets, and all major databases
Best for: Quick analysis, spreadsheet tools, simple database imports

JSON

Structured JSON format:
  • Preserves nested object structure
  • Arrays maintained as arrays
  • Human-readable format
Best for: Applications, APIs, structured data processing

Parquet

Columnar Parquet format:
  • Optimized for analytics and data warehouses
  • Efficient compression
  • Schema embedded in file
Best for: Data warehouses (Snowflake, BigQuery, Redshift), analytics platforms, large-scale processing

Delta

Delta Lake format:
  • Schema enforcement
  • Built on Parquet
  • Native support in Databricks, Spark, and other lakehouse platforms
Best for: Databricks, lakehouse architectures, teams already using Delta Lake

Update Frequency

Monthly Refreshes

Each month you receive a complete, full-file delivery of all records matching your criteria. There are no incremental or diff files. Every delivery is a full snapshot of the dataset.
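Because each delivery is a complete snapshot, the simplest load pattern replaces last month's table rather than merging: load into a staging table, then swap. A sketch using SQLite for illustration; the table and column names are hypothetical, not the real schema.

```python
import sqlite3


def load_monthly_snapshot(conn: sqlite3.Connection, rows: list) -> int:
    """Replace the table with the latest full-file delivery via a staging swap.

    Every delivery is a full snapshot, so no incremental/diff logic is needed;
    the staging swap avoids serving an empty table mid-load.
    """
    conn.execute("DROP TABLE IF EXISTS data_legion_staging")
    conn.execute("CREATE TABLE data_legion_staging (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO data_legion_staging VALUES (?, ?)", rows)
    conn.execute("DROP TABLE IF EXISTS data_legion")
    conn.execute("ALTER TABLE data_legion_staging RENAME TO data_legion")
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM data_legion").fetchone()[0]
```

Running this for two consecutive months leaves only the most recent snapshot in place, since the second load replaces (not appends to) the first.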

Data Organization

Versioning

Each delivery includes:
  • build_version field in each record (schema version)
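A pipeline can use the build_version field to detect schema changes before loading. A minimal sketch; the version strings shown are made up for illustration.

```python
def check_build_version(records: list, expected: str) -> bool:
    """Return True when every record carries the expected schema version."""
    versions = {r.get("build_version") for r in records}
    if versions != {expected}:
        # Surface the mismatch so the load can be halted or reviewed
        print(f"schema version mismatch: found {versions}, expected {expected!r}")
        return False
    return True
```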

Integration Examples

Loading into Database

PostgreSQL (COPY reads a path on the database server; use psql's \copy to load a file from the client machine):
COPY data_legion FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
BigQuery:
LOAD DATA INTO `project.dataset.data_legion`
FROM FILES (format='PARQUET', uris=['gs://bucket/data-legion-*.parquet']);

Processing with Python

CSV:
import pandas as pd
df = pd.read_csv('data-legion.csv')
JSON (newline-delimited, one record per line):
import json
with open('data-legion.jsonl') as f:
    for line in f:
        record = json.loads(line)
Parquet:
import pandas as pd
df = pd.read_parquet('data-legion.parquet')
Delta:
from deltalake import DeltaTable
dt = DeltaTable('data-legion-delta/')
df = dt.to_pandas()