Azure Blob Storage

Mixpanel’s Azure Blob Export is one type of Data Warehouse Export that exports Mixpanel events and people data and imports them into an Azure Blob storage instance.

Azure Export enables the use of SQL to query your event data, either for expressing new types of queries or for joining against other data already present in your data environment.

This Azure Blob export provides the following functionality:

  • Schedule hourly or daily exports
  • Export, parse, and extract schemas from the data on schedule
  • Clean and transform data into JSON or parquet formats
  • Provide a backwards compatible AVRO schema

Integration Set Up

In order to export data to Azure Blob Storage, you must first give Mixpanel permission to write to Azure. Next you can create a pipeline to export the data. Mixpanel then uploads the data to Azure Blob Storage on a recurring basis. The data is a slightly transformed version of the raw data stored in Mixpanel (For example, some names will be updated if there are collisions).

Edit Mixpanel Permission

Since the Azure authentication mechanisms do not allow for cross-account access, you must provide Azure credentials to use with their Blob storage container.

To achieve this, Mixpanel uses Service Principals created in your Azure Active Directory deployment. To create a Service Principal, use the Azure command line tool and run the following command (with redacted output):

$ az ad sp create-for-rbac --sdk-auth          
{
  "clientId": "redacted",
  "clientSecret": "redacted",
  "subscriptionId": "redacted",
  "tenantId": "redacted",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

If the above step is successful, go to the Azure. Navigate to a blob storage container of your choice and assign the “Storage Blob Data Contributor” to the Service Principal from above.

Mixpanel needs the clientId, clientSecret, and tenantId to access the blob container as the Service Principal. You must provide the blob account name, container name, and (optional) path prefix you want to choose for the export.

You can share the Service Principal’s credentials over the TLS encrypted API at the time that you create your Mixpanel to Azure pipeline.

Schemas and Data Transformation

There are two schema options: all of your Mixpanel events in a single table, or a separate table per event, which can get around some limitations on the number of columns allowed in certain data warehouses.

We recommend putting all events into a single table for ease of querying.

All Events in a Single Table (Option 1)

If the single table option is selected, the resulting table is named mixpanel_events.

The table will have the following columns:

Column Name

Type

Description

mp_event_name

varchar(255)

The name of the event.

time

int

The timestamp representing when the event occurred.

distinct_id

varchar(255)

The unique ID of the user who triggered the event.

The table will also have one column per unique property name seen across all events in the history of the dataset. For example, if you have an event with a “purchase_amount” property and one with a “referrer” property, the table will have a “purchase_amount” column and a “referrer” column. Events that do not have the property will have a NULL value in that column.

One Table Per Event (Option 2)

If the multi-table option is selected, there will be one table created per unique event name.

Each table will be named after the event (following the process described below), so if you have an event called “signup” there will be a table named “signup”.

Each event table will always contain the following columns:

Column Name

Type

Description

time

int

The timestamp representing when the event occurred.

distinct_id

varchar(255)

The unique ID of the user who triggered the event.

The table for each event will also have one column per unique property name seen for that event across the entire history of the dataset. For example, if you have an event with a “purchase_amount” property and one with a “referrer” property, the table will have a “purchase_amount” column and a “referrer” column. Events that do not have the property will have a NULL value in that column.

Example Tables

The following is an example dataset that highlights differences between a one-table schema containing all events and a multi-table schema where each table contains one event.

Take the following raw Mixpanel event stream:

{
    “event”: “Signup”, 
    “properties”: {
        “plan”: “free”, 
        “browser”: “chrome”, 
        “distinct_id”: 1, 
        “time”: 123
    }
}

{
    “event”: “Purchase”, 
    “properties”: {
        “product_id”: “123”, 
        “browser”: “chrome”, 
        “distinct_id”: 1, 
        “time”: 124
    }
}

{
    “event”: “Signup”, 
    “properties”: {
        “plan”: “paid”, 
        “browser”: “firefox”, 
        “ab_test”: 1, 
        “distinct_id”: 2, 
        “time”: 125
    }
}

One Table For All Events

If you export all events to a single table, the table will include the following information :

mp_event_name

time

distinct_id

plan

browser

product_id

ab_test

“Signup”

123

1

"free"

"chrome"

“Purchase”

124

1

"chrome"

"123"

“Signup”

125

2

"paid"

"firefox"

1

One Table Per Event

If you export each event to one table then you will have multiple tables. Continuing with the example, you will have two tables, signup and purchase. Those tables will include the following information:

Table Name: signup

time

distinct_id

plan

browser

ab_test

123

1

"free"

"chrome"

125

2

"paid"

"firefox"

1

Table Name: purchase

time

distinct_id

browser

product_id

124

1

"chrome"

"123"

Transformation Rules

Because some characters used in event data may not be legal in table or column names, and because there may be collisions within the dataset, event data is cleaned using the following rules.

Updated 6 months ago


Azure Blob Storage


Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.