Export to Data Warehouse

Data Warehouse Export Overview

Mixpanel enables you to export your Mixpanel data to three popular data warehouse solutions:

This documentation targets users with intermediate or advanced knowledge of databases; and specific knowledge of Redshift Spectrum, BigQuery, or Snowflake technology.

Note

Data Warehouse Export is currently available as a 30 day trial. You can create one data pipe during this period. Data pipelines will close when the trial period ends.

Mixpanel Data Warehouse Export Overview

Before exporting data from Mixpanel you must configure your data warehouse to accept the data.

For additional information on configuring the Mixpanel export for each type of data warehouse, see:

After configuring the data warehouse, you can use Mixpanel's data warehouse export API to export the data by creating a pipeline.

After configuring the warehouse and creating a pipeline with the API, you can begin to query Mixpanel data from the data warehouse.

Data Sources

Mixpanel can export event and people data.

The event data that is exported using the data warehouse export API is the same data exported from the Mixpanel data export API.

The user data that is exported using the data warehouse export API is the same data exported from the Mixpanel engage API.

Scheduling

Mixpanel supports hourly and daily exports.

When an export is started, a backfill can be scheduled so that historical data is also exported to the data warehouse.

Historical data will be exported one day at a time over the course of a few days when a pipeline is created.

The completion time for a backfill depends on the number of days and the amount of data in the project.

Note

Scheduling is daily as part of the trial.

People Data Support

People data is exported to a single table named mp_people_data.

Since people profiles are mutable, the data in the table is replaced every time an export happens based on schedule (daily or hourly).

Service Level Agreement

Mixpanel has the following policy for data latency:

  1. Mixpanel adds 24 hours of end to end latency between when the data is exported from Mixpanel until the data reaches the data warehouse.

  2. Mixpanel adds an additional 24 hours for data that reaches the pipeline late. Mixpanel defines late data as any data point or people profile update that reaches Mixpanel servers later than two hours after the end of export window.

Data Sync

Event data stored in Mixpanel’s datastore and event data in the data warehouse can fall out of sync.

The discrepancy can be be attributed to several different causes:

  • Late data can arrive multiple days later due to a mobile client being offline.
  • The import API can add data to previous days.
  • Delete requests related to GDPR can cause deletion of events and event properties.

Mixpanel is able to detect any changes in your data and replace the old data with the latest version. Data sync helps keep the data fresh, minimizes missing data points, and most importantly keeps your data warehouse GDPR compliant.

Note

Sync is not available as part of the trial.

Transformation Rules

Event data needs to be cleaned when some characters are not legal for table or column names, or when collisions occur in the dataset. This section provides the rules on how Mixpanel cleans data.

Table and Column Names

Mixpanel applies these rules for table and column names:

  • Special characters and whitespaces are replaced with _(underscore).
  • Letters are converted to lowercase.
  • Maximum name length is 128 characters. (The name is truncated after 128 characters.)
  • Properties that start with a $ (dollar sign) will have a prefix of mp_.
  • Properties that conflict with reserved keywords are prefixed with_res_.

Naming Conflicts

There are several naming transformations that occur based on character conflicts.

Capitalized letters are converted to lower case. Any duplicate strings will have a numeric value appended to them to differentiate.

For example if values “XY” and “Xy” are sent in:

  • Both are transformed to “xy”.
  • If “Xy” is sent in after "XY", it becomes “xy_1”.
  • Any subsequent “xy” values inherit incremental numeric values (i.e. xy_2, xy_3, etc.).

Type Conflicts

Mixpanel transforms values to resolve type conflicts.

If a property value is passed with a particular data type and is passed subsequently with a different data type, then the new data type appends to the property name.

For example, if “X” appears with type INT first, then subsequently appears as type STRING (or VARCHAR), then the property name will be transformed to "x_string" with a data type of string.

New Properties

New properties that were not present in previous imports will append to the old rows with a NULL value.

Trial Version

Mixpanel offers a 30-day trial version of the Data Warehouse Export feature. The trial allows for one data export pipeline per project to be created. The pipeline will close and no more data will be exported from Mixpanel after the trial period has expired.

Pass trial=true when creating a pipeline to create a trial export pipeline. As is the case with the non-trial version, you must first configure your data warehouse to receive the data.

Deleting the one pipeline for a project will not allow you to make a second pipeline for that same project.

Trial Limitations

A trial pipeline has several limitations when compared to the full feature. In the trial:

  • Export scheduling is daily only.
  • Data sync is unavailable.
  • You can only create one pipeline per project.
  • All exported schema will be exported as one table.
  • Data is only backfilled for the one day, being the previous day from when the pipeline is created.
  • Pipelines will, by default, include both event and people data.
  • The pipeline cannot filter by event name.
  • The “Create Pipeline” parameters will default to the values highlighted to list in the parameters table.

Export to Data Warehouse


Data Warehouse Export Overview

Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.