Meiro Pipes Integration
Amplitude captures product behavior. BigQuery has the rest — CRM records, billing tiers, BQML model outputs. Pipes resolves identity across both and closes the loop in both directions.
Free trial · No credit card · Live in minutes
You're in Amplitude. Retention funnels, feature adoption, experiment results — all there. What's missing is commercial context: which of the users who dropped off were on a paid plan? Which power users are up for renewal next quarter?
That data is in BigQuery, joined from CRM and billing tables. Getting it into Amplitude as user properties means a reverse sync — which works until you realize the BigQuery records are keyed on email and Amplitude has those users tracked by device_id from their first anonymous session. The properties sync. They land on partial matches. Your experiment results look clean. Your targeting cohorts aren't.
Amplitude exports behavioral events to BigQuery natively. That leg works. The return leg — enriched BigQuery data back into Amplitude user properties — is where it breaks.
BigQuery stores Amplitude events with nested STRUCT fields and repeated records. When you build enrichment models on top of this, you're joining against nested structures. A generic reverse ETL connector flattens them back on the return trip — and Amplitude's schema validation silently drops anything that doesn't match the expected property type or name.
Amplitude's identity model doesn't reach BigQuery. A user tracked as device_id: abc123 in Amplitude who exists in your CRM as email: [email protected] is two records in your warehouse. Your enrichment model joins on email. That user gets skipped. You don't find out until the cohort is 40% smaller than it should be and someone asks why.
The Real Problem
Amplitude's native BigQuery export handles the outbound leg. The gaps are in the return direction and in identity.
BigQuery stores Amplitude event exports with nested STRUCT fields — event properties live inside a repeated record, not as flat columns. Building enrichment models on top requires unpacking those nested structures before joining against CRM or product tables. When enriched records sync back into Amplitude, they need to be flat, correctly typed, and mapped to Amplitude's exact property schema. A FLOAT64 column syncing to an Amplitude integer property fails silently. A nested field that wasn't fully flattened gets dropped. BigQuery's per-byte billing means the change detection query — finding which enriched records changed since the last sync — is a real GCP cost if written without partition pruning.
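As a minimal sketch of that pre-sync step — flattening nested records and validating each property against a declared type before the API call. The field names, the nesting shape, and the schema mapping here are illustrative, not Amplitude's actual export layout:

```python
def flatten(record, prefix=""):
    """Recursively flatten nested dicts into dot-delimited flat keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Illustrative: property name -> expected type on the Amplitude side.
AMPLITUDE_SCHEMA = {"plan_tier": str, "churn_risk_score": int}

def validate(flat_props, schema):
    """Split properties into (valid, rejected) instead of dropping silently."""
    valid, rejected = {}, {}
    for name, value in flat_props.items():
        expected = schema.get(name)
        if expected is not None and isinstance(value, expected):
            valid[name] = value
        else:
            rejected[name] = value
    return valid, rejected
```

The point of returning `rejected` explicitly is the inverse of the failure mode above: a FLOAT64 value headed for an integer property surfaces as a rejected record you can inspect, rather than vanishing inside Amplitude's ingestion.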
Identity is the deeper problem. Amplitude builds its internal identity graph within Amplitude: anonymous device sessions merge into authenticated user records when a user logs in. That graph doesn't extend to BigQuery. CRM records are keyed on email. Product database records use account_id. Billing uses customer_id. A reverse ETL connector configured to match on user_id misses every warehouse record carrying a different identifier. Enriched properties land on a subset of the intended users — and Amplitude's UI gives no indication that the sync was incomplete.
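The stitching this requires can be sketched as a union-find over every pair of identifiers observed together on the same record: a login event ties device_id to user_id, a CRM row ties user_id to email, and transitively the anonymous device resolves to the email. The identifier names and sample records below are illustrative, not Pipes' internal model:

```python
class IdentityGraph:
    """Union-find over (identifier_type, value) pairs seen on the same record."""

    def __init__(self):
        self.parent = {}

    def _find(self, key):
        self.parent.setdefault(key, key)
        while self.parent[key] != key:
            # Path halving keeps lookups near-constant time.
            self.parent[key] = self.parent[self.parent[key]]
            key = self.parent[key]
        return key

    def observe(self, record):
        """Union every identifier that appears together on one record."""
        keys = [(k, v) for k, v in record.items() if v is not None]
        for a, b in zip(keys, keys[1:]):
            root_a, root_b = self._find(a), self._find(b)
            if root_a != root_b:
                self.parent[root_b] = root_a

    def same_user(self, key_a, key_b):
        return self._find(key_a) == self._find(key_b)
```

With this in place, a record keyed only on email resolves to the same profile as the anonymous device session, which is exactly the link a user_id-only match misses.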
Pipes resolves identity across device_id, user_id, email, account_id, and any other identifier before data moves. Nested field flattening and type validation against Amplitude's schema happen in the transform layer before the API call. When the sync runs, the right properties reach the right Amplitude user, and failures surface before ingestion rather than silently inside it.
Pipes connects to Amplitude via its export API and warehouse connector. Events are ingested on a scheduled or near-real-time basis — no replacement of your existing Amplitude SDK or tracking plan required.
Events land in your BigQuery warehouse automatically. Pipes connects directly — browse tables, map columns, model data. Your warehouse stays your source of truth.
Pipes stitches user profiles across Amplitude events and BigQuery records using deterministic matching on email, user_id, device_id, or any identifier you define. Configurable merge limits prevent false matches on shared devices. No probabilistic guesswork.
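One way such a merge limit can work, sketched with an assumed max_identifiers cap (the actual maxIdentifiers behavior in Pipes may differ): refuse any union that would grow a cluster past the cap, so a shared kiosk device can't glue dozens of users into one profile.

```python
class StitchedProfiles:
    """Deterministic stitching with a hard cap on cluster size."""

    def __init__(self, max_identifiers=5):
        self.max_identifiers = max_identifiers
        self.clusters = {}  # identifier -> its cluster (a shared set)

    def merge(self, id_a, id_b):
        a = self.clusters.setdefault(id_a, {id_a})
        b = self.clusters.setdefault(id_b, {id_b})
        if a is b:
            return True  # already one profile
        if len(a | b) > self.max_identifiers:
            return False  # suspicious merge: keep the profiles separate
        a |= b
        for ident in b:
            self.clusters[ident] = a
        return True
```

The refused merge is the deterministic alternative to probabilistic matching: an ambiguous link stays unlinked instead of being scored and guessed at.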
Enriched profiles and segments flow back into Amplitude via scheduled or real-time sync. Your growth team gets warehouse-enriched cohorts directly in the tool they already use — no reverse ETL vendor required.
Your data science team builds a churn propensity model using BQML. It combines Amplitude behavioral signals — feature adoption, session frequency, last active date — with commercial data from BigQuery: contract value, support ticket volume, renewal date proximity. The model writes a churn_risk_score per user back to a BigQuery table.
You want product and growth teams to filter Amplitude cohorts by churn_risk_score directly — without a SQL query on every request.
Without Pipes: you write a reverse ETL job that reads the BQML output table and calls Amplitude's Identify API. The model output carries email and account_id. Amplitude tracks users by device_id pre-login and user_id post-login. The join fails for anonymous users and multi-device users. Amplitude silently drops churn scores typed as FLOAT64 when the Amplitude property was created as an integer. The cohort your PM builds on "high churn risk + enterprise plan" misses 30% of the at-risk users.
With Pipes: the BQML output is a warehouse source. Pipes resolves email and account_id to the correct Amplitude user_id via the identity graph. FLOAT64 gets coerced to the right Amplitude property type in the transform layer before the API call. The churn_risk_score lands on the correct profile. Cohorts built on it are complete.
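The coercion step can be sketched as below. The property type map and payload shape are assumptions for illustration (they loosely follow Amplitude's Identify API but are not a verified client), and the score is assumed to be on a 0–100 scale so truncation to an integer is meaningful:

```python
# Illustrative: declared Amplitude property types for the synced columns.
PROPERTY_TYPES = {"churn_risk_score": int, "contract_value": float}

def coerce_row(row):
    """Coerce warehouse values (e.g. FLOAT64) to their declared property types."""
    out = {}
    for name, value in row.items():
        target = PROPERTY_TYPES.get(name)
        if target and not isinstance(value, target):
            value = target(value)
        out[name] = value
    return out

def identify_payload(user_id, row):
    """Build an Identify-style payload after coercion, keyed on the resolved user_id."""
    return {"user_id": user_id, "user_properties": coerce_row(row)}
```

Running the coercion before the API call is what turns the silent-drop failure into a deterministic transform: the FLOAT64 score arrives as the integer the property expects.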
"Extracting full value usually requires a dedicated analyst or someone with strong technical skills to manage schemas, plan taxonomies, and validate events." — Amplitude user review, G2
"ETL tools often run into problems with the ever-changing nature of customer behavioral data, making this a sticking point where single source of truth initiatives break down." — Data engineering community, 2024
Connects to Amplitude via its export API and warehouse connector. Ingests events on a scheduled or near-real-time basis. Supports event filtering and transformation via Pipes sandbox functions. No replacement of your existing Amplitude SDK.
Direct BigQuery connection via service account credentials. Browse datasets, tables, and nested `STRUCT` columns. Map identifier columns to Meiro identity types. Handles nested and repeated field flattening natively — no manual `UNNEST` before sync. Uses partition pruning for change detection to avoid full-table scans and unnecessary GCP query costs.
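A hedged sketch of what a partition-pruned change detection query can look like. The table and column names (`enriched_users`, `updated_at`) are placeholders; `_PARTITIONDATE` is BigQuery's pseudo-column for ingestion-time partitioned tables, and the filter on it is what lets BigQuery skip untouched partitions instead of billing a full-table scan on every sync:

```python
from datetime import date

def change_detection_sql(table, since, watermark):
    """Build a change-detection query constrained to recent partitions.

    since: earliest partition date worth scanning (prunes storage scanned).
    watermark: row-level high-water mark from the previous sync run.
    """
    return f"""
    SELECT *
    FROM `{table}`
    WHERE _PARTITIONDATE >= '{since.isoformat()}'  -- partition pruning
      AND updated_at > TIMESTAMP('{watermark}')    -- row-level watermark
    """
```

For production use you would pass `since` and `watermark` as query parameters rather than interpolating strings; the interpolation here just keeps the sketch self-contained.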
Deterministic stitching across identifier types: email, user_id, device_id, cookie. Configurable merge limits (`maxIdentifiers`) and a priority hierarchy prevent false merges. No probabilistic matching.
Scheduled exports or real-time Live Profile Sync. Push enriched profiles and audience segments back to Amplitude or any downstream destination via custom send functions.
Sandboxed JavaScript functions for event transformation, filtering, and enrichment. Run inline — no external orchestrator needed.
Deploy on your own infrastructure for full data sovereignty. Or use Meiro Cloud. Your data never leaves your perimeter unless you want it to.
Add Amplitude as a Source via its export API or warehouse connector. Events start landing in your pipeline.
Add your BigQuery credentials. Browse tables, map identifiers, start modeling.
Pipes stitches identity across both systems. Push enriched profiles back to Amplitude or anywhere in your stack.
Connect Amplitude and BigQuery through Pipes. Resolve identity across device, email, and account. Start free.