Activate Data with API-Managed Reverse ETL

December 10, 2024
Written by
Ravi Singh
Twilion
Nolan Chan
Twilion

Activate Data with API-Managed Reverse ETL

In today's data-driven world, APIs are the cornerstone of data sharing and accessibility. They offer a structured way for different applications and systems to communicate. APIs are key for teams to share or access data stored in databases.

Circular logo with mint green and white shapes, a red asterisk, and two white bars on a dark grid background.

At its core, ETL ( Extract, Transform, Load) is the process of collecting raw data from various sources, transforming it into a usable format, and loading it into a data warehouse for consumption. In 2015, Segment launched its Data Warehouse Destination product, which allowed businesses to seamlessly send data to their data warehouses.

With advancements in Data Warehouse capabilities, there is a trend to access data directly from the Warehouse. And as the data landscape continued to evolve, a new paradigm emerged: Reverse ETL.

Diagram showing two databases connected to a process with four smiley faces indicating output.

Embracing this shift, Segment announced its Reverse ETL offering in 2022, marking a significant step in its journey to democratize access to customer data and address growing demand for more dynamic and actionable data workflows.

Understanding Reverse ETL

While traditional ETL pipelines bring data into a data warehouse from various sources, RETL does the opposite. It takes transformed data from a warehouse and sends it back to operational systems, such as CRMs and marketing platforms.

This enables you to use customers' valuable data to drive better decision-making, optimize processes, and deliver a personalized customer experience across marketing, sales, and other critical touchpoints.

Flowchart showing the customer data pipeline process from application DB to various customer apps.

The Friction of UI-Based RETL

Many existing RETL solutions rely heavily on user interfaces (UIs) for configuration and scheduling. This approach, while user-friendly for business users, can create friction for developers who prioritize automation and streamlined workflows that can be version controlled. UI-centric manual processes hamper their ability to iterate quickly at scale with confidence when managing data activation pipelines. This is where an API-Managed Reverse ETL solution comes into play.

API-Managed Reverse ETL: A Programmatic approach

Increasingly, data activation capabilities can be managed via API. With the launch of new Segment Public APIs for Reverse ETL, data teams can configure and run their Reverse ETL workflows programmatically via powerful APIs. This unlocks a number of valuable use cases.

Key Use Cases for API-Managed RETL

  • DevOps for Data: API-Managed integrates seamlessly into existing CI/CD (continuous integration/continuous deployment) pipelines, enabling data infrastructure to be treated like any other software codebase.
  • Custom Workflows: Build highly tailored data synchronization scenarios that push data into systems where pre-configured connectors might not exist.
  • Advanced Automation: Automate complex tasks by triggering data syncs based on events or changes in your data warehouse. For example, update your marketing automation tool when a lead becomes qualified.
  • Versioning and Approvals: Reduce the risk of a mistake by persisting configuration changes in modern code control systems like GitHub, leveraging these collaborative tools to enforce change control.
  • Test-to-Production: Managing data pipelines in code means they can be easily deployed and iterated on in a test environment before “promoting” to the main business impacting workflows.
Diagram showing data input, processing unit, and output leading to different results.

How can Twilio Segment help?

Segment provides extensive public APIs for automating workflows and activating data through Reverse ETL, as well as managing syncs and integrating with dbt and Git for streamlined data management.

Public APIs

Segment offers a wide range of public APIs for Reverse ETL which you can use to automate your workflow and activate your data from its natural habitat i.e. warehouse. Steps to set up Reverse ETL using Segment Public APIs:

1. Fetch Reverse ETL Source settings

Retrieve your desired Source’s metadataId from catalog/sources as well as understand the settings for the source ( link).

metadataId is a unique identifier used to reference and manage metadata for Source (Snowflake in this case).

curl -i -X GET \
  'https://api.segmentapis.com/catalog/sources?count=0&cursor=string' \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>'

Below is a sample from the response which will give metadataId for the source which will be used to create Source:

{
"id": "BQp3zGUgjV",
	"name": "Snowflake",
	"categories": ["Warehouse"],
	"description": "Warehouse Source for Reverse ETL"
}

2. Create a new Source for your Reverse ETL Job

Once we have the Source metadata, the next step is to create a new source which is typically a Warehouse in case of Reverse ETL. ( Public API Link), and save the sourceID from the response for later.

curl -i -X POST \
  https://api.segmentapis.com/sources \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>' \
  -H 'Content-Type: application/json' \
  -d '{
    "slug": "my-snowflake-source",
    "name": "My Snowflake Source",
    "enabled": true,
    "metadataId": "BQp3zGUgjV",
    "settings": {
		"account" : "segment",
		"database": "segment_reverse_etl",
		"warehouse": "engineering",
		"username": "segment_reverse_etl_user",
		"password": <REDACTED>
	}
  }

3. Create a new Reverse ETL Model

After Source creation, the next step is to write a SQL query model which will be used to extract data to activate from the Warehouse.

curl -i -X POST \
  https://api.segmentapis.com/reverse-etl-models \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>' \
  -H 'Content-Type: application/vnd.segment.v1alpha+json' \
  -d '{
    "sourceId": "c4pVfZGYjzramEVbh4hMZ6",
    "name": "Sample Customer Data Model",
    "description": "This model extracts customer details from sample table.",
    "enabled": true,
    "query": "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 500",
    "queryIdentifierColumn": "C_CUSTKEY",
    "scheduleStrategy": "MANUAL",
    "scheduleConfig": {}
  }
  • Get the sourceId From the POST /sources response in step 2 or from /sources (Public API)
  • scheduleStrategy is set to “MANUAL” as we will be triggering a sync programmatically (see step 6)
// sourceId Request
curl -i -X GET \
  'https://api.segmentapis.com/sources?count=0&cursor=string' \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>'

4. Create Destination

This creates a new Destination where the data from your warehouse is sent to be activated. In the example below, we are creating a webhook destination for which metadataId is known beforehand.

curl -i -X POST \
  https://api.segmentapis.com/destinations \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>' \
  -H 'Content-Type: application/json' \
  -d '{
    "sourceId": "rh5BDZp6QDHvXFCkibm1pR",
    "metadataId": "54521fd525e721e32a72ee91",
    "name": "my destination v1",
    "settings": {
      "apiKey": "3cb9b589437d3904f19b2b791c2cdada",
      "retarget": true
    }
  }'

5. Create Destination Subscription

Once the Destination is created, we then create a Destination Subscription which will send the data extracted via ReverseETL Model to this subscription

curl -i -X POST \
'https://api.segmentapis.com/destinations/{destinationId}/subscriptions' \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>' \
  -H 'Content-Type: application/vnd.segment.v1alpha+json' \
  -d '{
    "name": "Send to webhook",
    "destinationId": "66dba4dfc177d9a9fbdec1b4",
    "actionId": "nFPnRozhz1mh4Gbx4MLvT5",
    "modelId": "rKNgRdfMFeWANFG9K6x3P9",
    "trigger": "event = \"new\"",
    "settings": {
  	  "url": "https://httpstat.us/200",
	  "method": "POST"
    },
    "enabled": true
  }'
  • When creating a Reverse ETL connection, modelIdindicates the Model being used to extract data. 
  • Retrieve destination’s destinationId & actionIdfrom /catalog/destinations ( PAPI link). 
  • There is an actions array inside the object that lists all actions associated with the destination. You can get the actionId from there.
// destinationId & actionId Request
curl -i -X GET \
  'https://api.segmentapis.com/catalog/destinations?count=0&cursor=string' \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>'

6. Trigger Sync for a Reverse ETL Connection

After the set up is done, you can trigger a Reverse ETL sync depending upon your requirements. e.g., a sync is triggered when a load to the warehouse is complete.

curl -i -X POST \
  https://api.segmentapis.com/reverse-etl-syncs \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>' \
  -H 'Content-Type: application/vnd.segment.v1alpha+json' \
  -d '{
    "sourceId": "sourceId",
    "modelId": "modelId",
    "subscriptionId": "subscriptionId"
  }'

7. List Reverse ETL Sync Statuses

curl -i -X GET \
'https://api.segmentapis.com/reverse-etl-models/{modelId}/subscriptionId/{subscriptionId}/syncs?count={number}&cursor={string}' \
  -H 'Authorization: Bearer <YOUR_TOKEN_HERE>'
{
  "data": {
    "syncStatuses": [
      {
        "sourceId": "rh5BDZp6QDHvXFCkibm1pR",
        "modelId": "rKNgRdfMFeWANFG9K6x3P9",
        "syncId": "1",
        "syncStatus": "SUCCESS",
        "duration": "21.481s",
        "error": "",
        "errorCode": "",
        "startedAt": "2024-01-25T18:58:10.16064Z",
        "finishedAt": "2024-01-25T18:58:31.64138Z",
        "extractPhase": {
          "addedCount": "1000",
          "updatedCount": "0",
          "deletedCount": "0",
          "unmodifiedCount": "0",
          "extractCount": "1000",
          "errorCode": "",
          "startedAt": "2024-01-25T18:58:10.31623Z",
          "finishedAt": "2024-01-25T18:58:20.692479Z"
        },
        "loadPhase": {
          "deliverSuccessCount": "1000",
          "deliverFailureCount": "0",
          "errorCode": "",
          "startedAt": "2024-01-25T18:58:20.84456Z",
          "finishedAt": "2024-01-25T18:58:31.64138Z"
        }
      }
    ],
    "pagination": {
      "current": "2024-01-25T18:58:20.84456Z",
      "next": "2024-01-25T19:58:20.84456Z"
    }
  }
}

dbt Extension

With Segment’s dbt extension, you can:

  • Securely connect Segment to a Git repository that stores your dbt models.
  • Use centralized dbt models to set up Reverse ETL.
  • Trigger Reverse ETL syncs from dbt jobs.

Manage Segment Workspace with Terraform & Git Sync

Segment’s Git extension lets you manage versioning by syncing changes you make to various resources in your Segment workspace including Reverse ETL to a Git repository.

With Segment Terraform, setting up and managing Segment Reverse ETL, along with CDP as code is easier and more scalable than ever.

Wrapping up

In conclusion, API-Managed Reverse ETL offers a flexible, programmatic solution for data activation, ideal for developer teams seeking seamless integration with CI/CD pipelines and custom workflows. Unlike UI-based RETL, it provides greater control, automation, and scalability, making it perfect for optimizing data processes. Segment’s public APIs, dbt, and Git extensions further enhance the efficiency of managing and activating data, empowering businesses to make data-driven decisions and improve customer experiences more effectively.

Need more help?

Reach out to friends@segment.com for to learn how you can enable API-Managed Reverse ETL for your organization.