Data Pipelines API

The Data Pipelines API provides the Mixpanel endpoints you use to create and manage your data pipelines.

A pipeline is an end-to-end unit that exports Mixpanel data and moves it into a data warehouse.

📘

Trial Version

The Data Warehouse Export API offers a one-time trial. You can schedule a trial export by passing trial=true when creating a pipeline. The trial export stops automatically after 30 calendar days.

Data Pipeline Types

Mixpanel currently supports a data warehouse export pipeline and a raw data pipeline. When you create a pipeline, the type parameter determines whether a data warehouse export pipeline or a raw data pipeline is created.

Export to Data Warehouse

The data warehouse export pipeline is a fully managed pipeline that includes transformations and scheduling. Visit the data warehouse export documentation for more information.

Raw Export Pipeline

The raw export pipeline is a scheduled export that moves your unaltered Mixpanel data to a blob storage destination. Visit the raw export pipeline documentation for more information.

Configure the Destination to Receive Mixpanel Data

Before exporting data from Mixpanel, you must configure your data warehouse to accept the data.

For additional information on configuring the Mixpanel export for each type of data warehouse, see the destination-specific documentation.

Authentication

To ensure the security of your data, the Mixpanel API requires basic access authentication.

Required Parameter

api_secret - This can be found by clicking on the settings gear in the upper right-hand corner and selecting Project Settings.

Authorization Steps

The Data Export API accepts basic access authentication over HTTPS as an authorization method. To make an authorized request, put your project's API Secret in the "username" field of the basic access authentication header. Make sure you use HTTPS and not HTTP - our API rejects requests made over HTTP, since this sends your API Secret over the internet in plain text.
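As a sketch, the basic access authentication described above (API Secret in the "username" field, empty password) can be built with Python's standard library. YOUR_API_SECRET is a placeholder, and the jobs endpoint URL is taken from the examples later on this page:

```python
import base64
import urllib.request

API_SECRET = "YOUR_API_SECRET"  # placeholder - use your project's API Secret

def basic_auth_header(api_secret: str) -> str:
    # The API Secret goes in the "username" field; the password is left empty.
    token = base64.b64encode(f"{api_secret}:".encode("ascii")).decode("ascii")
    return f"Basic {token}"

# Attach the header to any Data Pipelines API request (HTTPS only):
req = urllib.request.Request("https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs")
req.add_header("Authorization", basic_auth_header(API_SECRET))
```

This mirrors what `curl -u API_SECRET:` does in the examples below.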

Create a Pipeline


**Request Type:** POST

This request creates the export pipeline. The type parameter defines the kind of pipeline that is initiated. The following data warehouse types are supported:

  1. bigquery - Mixpanel exports events and/or user data into Google BigQuery.

  2. aws - This option creates the S3 data export and Glue schema pipeline. Mixpanel exports events and/or user data as JSON packets, and also creates a schema for the exported data in AWS Glue. Customers can use AWS Glue to query the exported data with AWS Athena or AWS Redshift Spectrum.

  3. snowflake - This option creates the Snowflake export pipeline. Mixpanel exports events and/or user data into Snowflake.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/create

Headers:

Content-Type: application/x-www-form-urlencoded

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| type | array of strings, required | Data Warehouse Export: type can be bigquery, snowflake, azure-blob, or aws. Raw Export Pipeline: type can be s3-raw, gcs-raw, or azure-raw; each initializes the corresponding raw pipeline. |
| trial | boolean, optional | Default: false. A trial pipeline is created if the value is true. The trial exports all of your events and user data for thirty calendar days, starting from one day before the API call was made. A trial pipeline uses default values for the following parameters: data_source: events and people; sync: false; from_date: previous day; to_date: none; frequency: daily; events: none. |
| schema_type | string, optional | Default: monoschema. Allowed options are monoschema and multischema. monoschema loads all events into a single table. multischema loads every event into its own dedicated table. All user data is exported as monoschema. |
| data_source | string, optional | Default: events. data_source can be either events or people. events exports Mixpanel event data. people exports Mixpanel user data. |
| sync | boolean, optional | Default: false. A value of true updates exported data with any changes that occur in your Mixpanel dataset. These changes include deletions, late data, and imports that fall into your export window. |
| from_date | string, required | The starting date of the export window, formatted as YYYY-MM-DD. |
| to_date | string, optional | The ending date of the export window, formatted as YYYY-MM-DD. The export continues indefinitely if to_date is empty. |
| frequency | string, optional | Default: daily. frequency can be either hourly or daily. hourly exports the data every hour. daily exports the data at midnight (based on the project's timezone). frequency should only be passed if your export window is indefinite. |
| events | string, optional | A whitelist of the events you intend to export. Pass this parameter multiple times to whitelist multiple events. All events in the project are exported if no events are specified. |
| where | string, optional | A selector expression used to filter events data, such as event properties. Learn more about how to construct event selector expressions here. This parameter is only valid when data_source is events. |
| data_format | string, optional | Default: json. The file format of the exported data. data_format can be either json or parquet. |

Return: The create request returns the name of the pipeline created. Use this name to check the status of the pipeline or to cancel it.

For BigQuery pipelines, the request returns the BigQuery dataset name and URL. Use this URL to access the BigQuery dataset.

Mixpanel creates the dataset within its own Google Cloud Platform project. The service shares a read-only view of the created dataset with the user/group provided to the API endpoint.

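Because the create endpoint accepts application/x-www-form-urlencoded parameters, a repeated parameter such as events is encoded as multiple key=value pairs. A minimal sketch with Python's standard library (the parameter values are illustrative):

```python
from urllib.parse import urlencode

# Illustrative parameter values for a BigQuery export with an event whitelist.
params = [
    ("type", "bigquery"),
    ("trial", "true"),
    ("from_date", "2021-01-01"),
    ("events", "Page View"),      # repeated key whitelists
    ("events", "Item Purchase"),  # multiple events
]

body = urlencode(params)
print(body)
# type=bigquery&trial=true&from_date=2021-01-01&events=Page+View&events=Item+Purchase
```

This is the same body the curl examples below produce with repeated -d flags.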

Additional BigQuery Parameters

📘

Mixpanel creates a dataset in its own BigQuery instance and gives "View" access to the account(s) provided at the time of creating the pipeline.

The following parameters are specific to BigQuery exports.

| Parameter | Type | Description |
| --- | --- | --- |
| bq_region | string, required | Default: US. The following regions are supported for BigQuery: US, US_EAST_1, US_WEST_2, US_EAST_4, NORTH_AMERICA_NORTHEAST_1, SOUTH_AMERICA_EAST_1, EU, EUROPE_NORTH_1, EUROPE_WEST_2, EUROPE_WEST_3, EUROPE_WEST_6, ASIA_SOUTH_1, ASIA_EAST_1, ASIA_EAST_2, ASIA_NORTHEAST_1, ASIA_NORTHEAST_2, ASIA_NORTHEAST_3, ASIA_SOUTHEAST_1, AUSTRALIA_SOUTHEAST_1. |
| bq_share_with_group | string, required | Group account email address to share the dataset with. |

Example Request

#Replace API_SECRET with your project's API secret
#The events parameter is passed twice to whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="bigquery" \
-d bq_region="US_EAST_4" \
-d trial=true \
-d bq_share_with_group="[email protected]" \
-d events="Page View" \
-d events="Item Purchase"

Example Response

Use the URL that returns as the bigquery_dataset_name to access the BigQuery dataset.

{  
   "pipeline_names":[  
      "trial-events-daily-bigquery-monoschema",
      "trial-people-daily-bigquery-monoschema"
   ],
   "bigquery_dataset_name":"https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}

Additional Snowflake Parameters

The following parameters are specific to Snowflake exports.

| Parameter | Type | Description |
| --- | --- | --- |
| snowflake_share_with | string, required | Name of the account with which the dataset should be shared. |
| region | string, required | The valid region for the Snowflake instance: us-west-aws or us-east-aws. |

Example Request

#Replace API_SECRET with your project's API secret
#The events parameter is passed twice to whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="snowflake" \
-d region="us-west-aws" \
-d trial=true \
-d snowflake_share_with="mysnowflakeaccountname" \
-d events="Page View" \
-d events="Item Purchase"

Additional AWS S3 and Glue Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| s3_bucket | string, required | The S3 bucket to which the data should be exported. |
| s3_region | string, required | The valid S3 region for the bucket. The following regions are supported for AWS S3: us-east-2, us-east-1, us-west-1, us-west-2, ap-south-1, ap-northeast-3, ap-northeast-2, ap-southeast-1, ap-southeast-2, ap-northeast-1, ca-central-1, cn-north-1, cn-northwest-1, eu-central-1, eu-west-1, eu-west-2, eu-west-3, eu-north-1, sa-east-1. |
| s3_role | string, required | No default value. The AWS role the writer should assume when writing to S3. |
| s3_prefix | string, optional | No default value. The path prefix for the export. |
| s3_encryption | string, optional | Default: none. Options are none, aes, and kms. The at-rest encryption used by the S3 bucket. |
| s3_kms_key_id | string, optional | No default value. If s3_encryption is set to kms, this can specify the custom key ID you want to use. |
| use_glue | boolean, optional | Default: false. Uses Glue schema export when true. |
| glue_database | string, conditionally required | The Glue database to which the schema should be exported. Required if use_glue is true. |
| glue_role | string, conditionally required | No default value. The role that needs to be assumed for updating Glue. Required if use_glue is true. |
| glue_table_prefix | string, optional | No default value. Prefix to add to table names when creating them. |

Example Request

#Replace API_SECRET with your project's API secret
#The events parameter is passed twice to whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="aws" \
-d trial=true \
-d s3_bucket="example-s3-bucket" \
-d s3_region="us-east-1" \
-d s3_prefix="example_custom_prefix" \
-d s3_role="arn:aws:iam::<account-id>:role/example-s3-role" \
-d use_glue=true \
-d glue_database="example-glue-db" \
-d glue_role="arn:aws:iam::<account-id>:role/example-glue-role" \
-d glue_table_prefix="example_table_prefix" \
-d events="Page View" \
-d events="Item Purchase"

Additional Azure Parameters

The following parameters are specific to Azure Blob Storage, Azure Data Lake, and Azure Raw exports.

| Parameter | Type | Description |
| --- | --- | --- |
| storage_account | string, required | The Blob Storage account where the data will be exported. |
| container_name | string, required | The Blob container within the account where data will be exported. |
| prefix | string, optional | A custom prefix for all the data being exported to the container. |
| client_id | string, required | clientId from the Service Principal credentials. |
| client_secret | string, required | clientSecret from the Service Principal credentials. |
| tenant_id | string, required | tenantId from the Service Principal credentials. This is specific to the Active Directory instance where the Service Principal resides. |

Example Request

#Replace API_SECRET with your project's API secret
#The events parameter is passed twice to whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="azure-blob" \
-d trial="true" \
-d data_format="parquet" \
-d storage_account="mystorageaccount" \
-d container_name="mixpanel-export" \
-d prefix="custom_prefix/for/data" \
-d schema_type="multischema" \
-d client_id="REDACTED" \
-d client_secret="REDACTED" \
-d tenant_id="REDACTED" \
-d events="Page View" \
-d events="Item Purchase"

Additional GCS Raw Scheduled Export Parameters

The following parameters are specific to raw scheduled exports to GCS blob storage.

| Parameter | Type | Description |
| --- | --- | --- |
| gcs_bucket | string, required | The GCS bucket to export the Mixpanel data to. |
| gcs_prefix | string, required | The GCS path prefix of the bucket. |
| gcs_region | string, required | The GCS region for the bucket. The following regions are supported for GCS: northamerica-northeast1, us-central1, us-east1, us-east4, us-west1, us-west2, southamerica-east1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4, europe-west6, asia-east1, asia-east2, asia-northeast1, asia-northeast2, asia-northeast3, asia-south1, asia-southeast1, australia-southeast1. |

Cancel a Pipeline


**Request Type:** POST

For a given pipeline name, this request cancels the pipeline and stops any future jobs from being scheduled for it.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel

Headers:

Content-Type: application/x-www-form-urlencoded

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| name | string, required | The name that uniquely identifies the pipeline. |

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel \
-u API_SECRET: \
-d name="sample_job_name"

Return: 200 OK indicates a successful cancellation. Any other message indicates failure of the cancellation.

Check the Status of a Pipeline


**Request Type:** POST

Given the name of a pipeline, this endpoint returns the status of the pipeline, including a summary and the status of all recently run export jobs.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/status

Headers:

Content-Type: application/x-www-form-urlencoded

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| name | string, required | The name that uniquely identifies the pipeline. |
| summary | string, optional | Default: false. If true, lists only task counts by status, with no details. |
| status | array of strings, optional | Filters the tasks by the given status. Valid options for status are pending, running, retried, failed, canceled, and timed_out. |

Example Request: Status with Summary

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d summary="true"

Example Return: Status With Summary

// with summary
{
"canceled": 933,
"retried": 80,
"succeeded": 1
}
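A summary response like the one above is plain JSON, so it can be aggregated with the standard library. A minimal sketch using the example counts shown above:

```python
import json

# The example summary response from the status endpoint.
summary_json = '''
{
  "canceled": 933,
  "retried": 80,
  "succeeded": 1
}
'''

summary = json.loads(summary_json)
total_tasks = sum(summary.values())  # total tasks across all statuses
print(total_tasks)
# 1014
```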

Example Request: Status with no Summary and a Filter

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d status="running"

Example Return: Status with no Summary and a Filter

//no summary.
{
"canceled": [
{
"name": "company-july-2016-backfill-hourly-monoschema",
"state": "canceled",
"last_finish": "0000-12-31T16:00:00-08:00",
"run_at": "2016-07-26T00:00:00-07:00",
"from_date": "2016-07-26T00:00:00-07:00",
"to_date": "2016-07-26T00:00:00-07:00"
},
{
"name": "company-july-2016-backfill-hourly-monoschema",
...

Get a List of Scheduled Pipelines


**Request Type:** GET

This API endpoint returns the list of all the pipelines scheduled for a project.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs \
-u API_SECRET:

Example Result

{
  "9876543210": [
    {
      "name": "events-daily-bigquery-monoschema",
      "Dispatcher": "backfill",
      "last_dispatched": "2019-02-01 12:00:00 US/Pacific",
      "frequency": "hourly",
      "sync_enabled": "true"
    }
  ]
}

Get a Timeline of All Previous Syncs


**Request Type:** GET

This endpoint returns the timestamps of all syncs grouped by date.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/timeline

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| name | string, required | The name that uniquely identifies the pipeline. |

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/timeline \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME"

Example Return:

{
  "day_syncs": [
    {
      "date": "2019-08-19",
      "sync_times": [
        "2019-08-19 14:27:46.044605 -0700 PDT"
      ],
      "status": "synced"
    },
    {
      "date": "2019-08-20",
      "sync_times": [
        "2019-08-20 14:33:09.315098 -0700 PDT"
      ],
      "status": "synced"
    }
  ]
}
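A timeline response like the one above can be filtered with the standard library, for example to list the dates that have synced:

```python
import json

# The example timeline response from the timeline endpoint.
timeline_json = '''
{
  "day_syncs": [
    {"date": "2019-08-19",
     "sync_times": ["2019-08-19 14:27:46.044605 -0700 PDT"],
     "status": "synced"},
    {"date": "2019-08-20",
     "sync_times": ["2019-08-20 14:33:09.315098 -0700 PDT"],
     "status": "synced"}
  ]
}
'''

timeline = json.loads(timeline_json)
synced_dates = [d["date"] for d in timeline["day_syncs"] if d["status"] == "synced"]
print(synced_dates)
# ['2019-08-19', '2019-08-20']
```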
