BigQuery + Google Ads data pipeline 2026: build your reporting warehouse

Every serious Google Ads operation eventually outgrows the platform UI and exported spreadsheets. The reporting questions that matter most — what is my true cost per closed-won deal, which campaigns drive sessions that convert 60 days later, how do Google, GA4, and the CRM reconcile — cannot be answered inside Google Ads because the data lives in three different systems. A BigQuery reporting warehouse is how analytics teams solve this in 2026, and Google has made the on-ramp dramatically easier with the managed Data Transfer Service.

This is a hands-on tutorial for analysts and engineers. We will build the pipeline end to end: provision the Cloud project, configure the Data Transfer Service, enable the GA4 export, design a layered schema, join CRM revenue on GCLID, schedule the daily refresh, put Looker Studio on top, and keep BigQuery costs under control. We assume SQL fluency and basic Google Cloud familiarity. If you are coming from a tracking-first foundation, our GA4 setup and conversion import guide is the right prerequisite.

The pipeline is cheap; the discipline is everything :

The single biggest mistake teams make is letting reporting tools query raw, unpartitioned tables. BigQuery bills on bytes scanned, so one badly built Looker Studio dashboard refreshing every few minutes against a full-history table can quietly burn hundreds of euros a month. The Data Transfer Service is free, storage is nearly free at this scale, and query processing is cheap — if you partition by date, cluster sensibly, and force every downstream report to read curated tables. Architecture discipline, not Google Cloud pricing, determines whether this pipeline costs €40 or €800 a month.

Why build a BigQuery pipeline for Google Ads

The Google Ads UI is excellent for managing campaigns and weak for cross-system analysis. Three structural limits push teams toward a warehouse.

First, the revenue gap. Google Ads knows what it spent and how many conversions it counted, but it does not know your closed-won pipeline, your sales-cycle length, or your refund rate. A lead that Google counts as a €0 conversion event might become a €40,000 enterprise deal nine months later. Without joining CRM revenue back to the click, you optimize toward conversion volume instead of profit — the most common and most expensive reporting failure in B2B PPC.

Second, the cross-channel blend. Modern acquisition runs across Google, Meta, LinkedIn, and more. Each platform reports in its own walled garden with its own attribution. A warehouse is the only place to build a single blended view where spend and outcomes from every channel sit in the same table with consistent definitions. Our cross-channel attribution guide covers the methodology; BigQuery is where you implement it.

Third, the analysis ceiling. Incrementality testing, customer lifetime value modeling, cohort retention, and marketing mix modeling all require event-level and historical data that the UI simply does not expose. A warehouse with two to three years of clean history is the substrate every advanced measurement technique sits on. Google's own open-source MMM framework, covered in our Meridian guide, reads directly from BigQuery.

The economics are favorable. The Data Transfer Service is free. BigQuery's first 10 GB of storage and first 1 TB of query processing each month are free, and beyond that storage runs about €0.02 per GB per month and queries about €5-6 per TB scanned. A well-built single-account warehouse rarely exceeds €150/month and frequently runs under €50. Against the cost of misallocated ad spend, that is a rounding error. The investment is engineering time, not cloud bills.

The payoff is durable. Once the pipeline exists, every new question becomes a SQL query instead of a manual export-and-pivot exercise. Reports refresh themselves. New channels plug into the same schema. And the data foundation compounds: the longer the warehouse runs, the more valuable its history becomes for modeling.

Architecture overview and the three data sources

The reference architecture is a layered warehouse fed by three sources, transformed by scheduled SQL, and surfaced in Looker Studio.

The three data sources:

The layered dataset convention keeps the warehouse maintainable and cost-efficient:

raw_google_ads — untouched DTS exports. Never queried directly by reports. Treat as immutable landing zone.
raw GA4 export — the events_ tables GA4 lands in their own dataset, partitioned by event date.
staging — cleaned, standardized views and tables. Micros cast to currency, columns renamed, refresh-window duplicates removed, date spine and account mapping joined.
reporting — curated, denormalized tables purpose-built for dashboards. This is the only layer Looker Studio touches.

This separation matters for three reasons: it isolates raw data so a transformation bug never corrupts the source, it concentrates expensive joins into scheduled builds rather than per-dashboard-refresh, and it gives you a clean permissions boundary — analysts get read access to reporting only.

The orchestration is deliberately simple. The Data Transfer Service and GA4 export run on Google's schedule, landing fresh data each morning. A few hours later, BigQuery scheduled queries rebuild the staging and reporting tables. Looker Studio reads the result. No external orchestrator, no servers, no containers. You can graduate to dbt and Cloud Composer later, but you do not need them to start, and adding them prematurely is over-engineering.

A note on regions: pick one BigQuery location (EU multi-region for European data residency) and use it for every dataset. Cross-region queries are not allowed, so a location mismatch between your DTS dataset and your CRM dataset will break joins. Decide once, up front.

Setting up Google Ads Data Transfer Service

The Data Transfer Service (DTS) is the foundation, and it is genuinely a few clicks of configuration plus a wait for the backfill.

Prerequisites:

A Google Cloud project with billing enabled
The BigQuery API and BigQuery Data Transfer API enabled
A Google account with read access to the target Google Ads account (or MCC)
The Google Ads account's customer ID (10 digits, no dashes)

Configuration steps:

In the BigQuery console, open Data transfers and click Create transfer.
Choose Google Ads as the source.
Name the transfer, set the schedule to daily, and pick a run time in the early morning.
Set the destination dataset to raw_google_ads.
Enter the Customer ID — use the MCC ID to pull all child accounts in one transfer.
Configure the refresh window (the trailing number of days DTS re-pulls each run) to capture conversion backfill and late attribution.
Authenticate with an account that has the necessary Google Ads access and save.

DTS will create a set of tables in raw_google_ads — campaign, ad group, ad group criteria (keywords), conversions, and more — each suffixed by date or partitioned, depending on the table. The naming follows Google's documented schema; keep that documentation open as a reference because column names are verbose and stat tables separate metrics from attributes.

Run a backfill immediately. A fresh transfer only pulls forward from today. To get history, trigger a manual backfill for the last 12 months (or as far back as you need and the account supports). Backfills run as a series of daily jobs and can take a while for long ranges, but they only need to run once.

Validate the first load. After the initial run completes, sanity-check it: sum cost (in micros, then divide by 1,000,000) for a recent week and compare against the Google Ads UI for the same window. Small discrepancies are normal because of the attribution refresh window and time-zone boundaries, but the totals should be close. If they are wildly off, check that you selected the right customer ID and time zone.

MCC considerations. An MCC transfer tags every row with its customer ID. This is powerful for agencies but multiplies data volume. With many accounts, partitioning by date and clustering by customer ID in your staging layer stops being a nice-to-have and becomes the difference between a €40 and a €400 monthly bill. Plan for it from the first staging table.

Designing the BigQuery schema and staging layer

Raw DTS tables are accurate but awkward: costs in micros, cryptic column names, separate stats and attribute tables, and refresh-window overlap. The staging layer fixes all of this once so downstream queries stay clean.

Core staging transformations:

-- staging.campaign_performance_daily
CREATE OR REPLACE TABLE staging.campaign_performance_daily
PARTITION BY date
CLUSTER BY customer_id, campaign_id AS
SELECT
  _DATA_DATE                         AS date,
  customer_id,
  campaign_id,
  campaign_name,
  metrics_cost_micros / 1000000      AS cost,
  metrics_impressions                AS impressions,
  metrics_clicks                     AS clicks,
  metrics_conversions                AS conversions,
  metrics_conversions_value          AS conversion_value
FROM `raw_google_ads.ads_CampaignBasicStats_*` s
JOIN `raw_google_ads.ads_Campaign_*` c USING (campaign_id)
WHERE _DATA_DATE = c._DATA_DATE;

The specifics of table names vary with the DTS schema version, but the pattern holds: cast micros, rename to readable columns, join stats to attributes, partition by date, cluster by the columns you filter on most.

Deduplicating the refresh window. Because DTS re-pulls trailing days, the same date can appear in multiple snapshot loads. Select the latest snapshot per logical date using a window function or by reading the partition that DTS marks current. Skipping this step double-counts spend in your most recent days — a subtle bug that erodes trust in the warehouse.

Supporting tables you will need:

Partitioning and clustering are not optional. Partition every materialized staging and reporting table by date. Cluster by customer_id and campaign_id (or whatever your reports filter on). This is the lever that controls cost: a query for last month against a date-partitioned table scans only last month's bytes, not three years of history. The difference is often 30x on a mature warehouse.

Views versus tables. For light transformations, views are fine and incur no storage cost — but they re-scan source data on every read. For anything queried repeatedly by dashboards, materialize a partitioned table via scheduled query. The rule of thumb: cheap transformation read once equals view; expensive transformation read many times equals materialized table.

Keep the staging layer boring and deterministic. It should do exactly one job — turn raw DTS and GA4 exports into clean, well-typed, partitioned building blocks — so the reporting layer can focus on the interesting joins.

Joining GA4, Google Ads, and CRM data

This is where the warehouse earns its keep. Three sources, joined correctly, answer questions no single platform can.

The GCLID is the spine of revenue attribution. When someone clicks a Google ad with auto-tagging on, Google appends a GCLID to the landing URL. Capture it in a hidden form field, persist it to your CRM against the lead, and it becomes the join key between Google Ads clicks and closed-won revenue.

-- reporting.campaign_to_revenue
SELECT
  ga.date,
  ga.campaign_name,
  ga.cost,
  ga.conversions,
  COUNT(DISTINCT crm.deal_id)        AS deals_closed,
  SUM(crm.closed_won_amount)         AS crm_revenue,
  SAFE_DIVIDE(ga.cost, SUM(crm.closed_won_amount)) AS cost_to_revenue_ratio
FROM staging.click_with_gclid ga
LEFT JOIN crm.deals crm
  ON ga.gclid = crm.gclid
GROUP BY ga.date, ga.campaign_name, ga.cost, ga.conversions;

The LEFT JOIN is deliberate: unmatched spend must stay visible. If you inner-join, every click without a matched deal silently vanishes, and your cost-per-revenue math flatters itself. Keep all spend in the result and treat the match rate as its own health metric.

Joining GA4 for behavioral depth. The GA4 export lands event-level rows with nested event_params. Unnest them to recover the campaign, session quality, and landing page, then aggregate to the grain you need.

-- staging.ga4_sessions (campaign and landing page recovered from event_params)
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign,
  COUNTIF(event_name = 'session_start') AS sessions,
  COUNTIF(event_name = 'sign_up')       AS signups
FROM `analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20261231'
GROUP BY date, campaign;

Always constrain the GA4 _TABLE_SUFFIX to a date range — the events tables are large and an unbounded wildcard scan is the classic GA4 cost trap.

Loading the CRM. Get deals into BigQuery with whatever fits your stack: a managed connector like Fivetran, a scheduled Cloud Function hitting your CRM's API, or a nightly CSV load. The minimum columns are deal ID, GCLID, closed-won amount, close date, and stage. Refresh daily so revenue stays current.

When GCLID coverage is poor, match rates suffer and revenue looks understated. The upstream fixes are enhanced conversions and offline conversion import, both of which improve how reliably clicks tie back to outcomes — see our enhanced conversions setup guide. As a fallback in the warehouse, a UTM-based join recovers some unmatched rows, but treat it as a lower-confidence supplement, not a replacement for GCLID.

The output of this section is a single unified table carrying spend, Google conversions, GA4 engagement, and CRM revenue side by side — the table every meaningful PPC report is built on.

Scheduled queries and the daily refresh

With staging and reporting logic written, automate the daily build so the warehouse maintains itself.

Native scheduled queries are the right starting tool. BigQuery lets you schedule any SQL statement on a cron-like cadence at no extra cost beyond the query processing it consumes. Schedule the staging build first, then the reporting build, both timed a few hours after DTS and the GA4 export typically complete in the morning. That ordering ensures each layer reads fresh upstream data.

Use the right write semantics:

For most PPC reporting, rebuilding the trailing N partitions with WRITE_TRUNCATE scoped to those partitions is the simplest correct approach — it naturally absorbs the DTS refresh-window backfill without duplicating older data.

Add a freshness check. A small scheduled query that compares the max date in each source table against today and writes an alert (or fails loudly) catches the silent failure mode where DTS or the GA4 export skips a day and your dashboards quietly show stale numbers. This single guard prevents the most embarrassing class of reporting incident.

-- monitoring.freshness_check
SELECT
  'google_ads' AS source,
  MAX(date)    AS latest_date,
  DATE_DIFF(CURRENT_DATE(), MAX(date), DAY) AS days_behind
FROM staging.campaign_performance_daily
HAVING days_behind > 1;

If that query returns rows, something upstream is stale and needs attention.

When to graduate to dbt. Scheduled queries handle a single-account warehouse comfortably for a long time. Move to dbt when transformation logic exceeds roughly 10-15 interdependent models, when you want automated tests and column-level documentation, or when several analysts edit the same SQL and need version control and lineage. dbt orchestrated by Cloud Composer (managed Airflow) is the common next step — but adopt it because complexity demands it, not because it is fashionable.

Backfill discipline. When you change a transformation, re-run it across history so old partitions reflect the new logic. Parameterize your scheduled queries by date so the same SQL serves both the daily incremental run and a manual full backfill.

Looker Studio reporting and cost management

The warehouse is only useful if people can read it, and Looker Studio is the natural front end for BigQuery.

Connect to curated tables only. Point every Looker Studio data source at the reporting dataset, never at raw exports or wide staging tables. This is the most important cost decision in the entire pipeline. Looker Studio refreshes queries on interaction and on a cache schedule; if a popular dashboard queries a full-history unpartitioned table, the bytes scanned — and the bill — compound fast.

A practical starter dashboard set:

Executive overview — blended spend, conversions, and CRM revenue across channels, trended over time.
Account overview — per-account performance for MCC setups, using the account-map friendly names.
Campaign drill-down — spend, CPA, cost-per-revenue, and GA4 engagement by campaign and ad group.

Cost controls that actually matter:

Monitor the most expensive queries. The INFORMATION_SCHEMA.JOBS view exposes every query, who ran it, and how many bytes it processed. A weekly look at the top consumers surfaces the one report or scheduled query quietly dominating your bill, so you can optimize it — usually by adding a partition filter or materializing a join.

SELECT
  user_email,
  query,
  total_bytes_processed / POW(10, 12) AS tb_processed
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY total_bytes_processed DESC
LIMIT 20;

Set a billing budget and alert on the Cloud project regardless of how careful you are. It is the backstop that turns a runaway query from a month-end surprise into a same-day notification.

With curated tables, partition filters, BI Engine for hot dashboards, and weekly cost monitoring, a single-account warehouse comfortably stays in the €30-150/month range while serving fast, trustworthy reports.

SQL patterns for incrementality and LTV foundations

The warehouse is not just prettier reporting — it is the substrate for the measurement work that drives real budget decisions. Here are the foundational patterns.

Geo-holdout incrementality. Incrementality testing asks what conversions would have happened without the ads. The warehouse makes the analysis trivial once a geo experiment runs: tag regions as test or control, then compare conversion rates across the two groups for the test window.

-- incrementality lift by region group
SELECT
  geo_group,
  SUM(conversions) / SUM(sessions) AS conversion_rate
FROM reporting.geo_experiment
WHERE date BETWEEN @test_start AND @test_end
GROUP BY geo_group;

The lift between test and control, normalized for group size, estimates the incremental contribution. The methodology and test design live in our incrementality testing guide; BigQuery is where you compute it at scale and re-run it every quarter without manual effort.

Cohort retention and LTV. Lifetime value modeling starts with cohorting customers by acquisition month and tracking revenue forward.

-- monthly acquisition cohorts with cumulative revenue
SELECT
  DATE_TRUNC(first_close_date, MONTH) AS cohort_month,
  DATE_DIFF(revenue_month, first_close_date, MONTH) AS month_index,
  SUM(revenue) AS cohort_revenue
FROM reporting.customer_revenue_monthly
GROUP BY cohort_month, month_index
ORDER BY cohort_month, month_index;

This cohort matrix is the raw material for LTV curves, payback-period analysis, and CAC-to-LTV ratios by acquisition channel — the metrics that should govern budget allocation. Joined back to the campaign that produced each customer's GCLID, you can compute LTV by campaign, not just CPA by campaign, which is a far better optimization target. Our CAC payback by vertical analysis shows the benchmarks these queries feed.

Blended cross-channel reporting. Once Meta, LinkedIn, and other channels land in the same warehouse with consistent definitions, a single UNION ALL of each channel's daily table followed by GROUP BY channel with SAFE_DIVIDE(SUM(revenue), SUM(cost)) for blended ROAS produces the cross-channel view that no platform UI can. The hard part is consistent definitions, not the SQL.

Marketing mix modeling input. Advanced teams feed the warehouse straight into an MMM. Google's open-source Meridian framework reads weekly aggregated spend and outcomes — exactly the shape your reporting tables already produce. A single scheduled query pivots daily data into the weekly channel matrix Meridian expects, closing the loop from raw click data to statistical budget modeling.

The teams that win are not the ones with the fanciest dashboards — they are the ones who join CRM revenue to the click. The day a Google Ads warehouse stops reporting conversion volume and starts reporting cost-per-closed-won-deal by campaign is the day the marketing team starts making materially better budget decisions. Everything else in the pipeline exists to make that one join reliable, repeatable, and trustworthy.

— The recurring pattern across analytics teams that get this right

These patterns are foundations, not finished models, but they are the hard part to get right structurally. With clean, joined, partitioned data in BigQuery, layering incrementality, LTV, and MMM on top becomes an analytics exercise rather than a data-plumbing nightmare.

For broader context on the privacy and tracking shifts that make a first-party warehouse increasingly essential, see our first-party data strategy guide and the cookieless future analysis.

A reporting warehouse tells you where the money went; an optimization layer decides where it goes next. If you want AI-driven Google Ads optimization that can act on the cost-per-revenue signals your BigQuery pipeline surfaces, SteerAds runs a free 14-day audit on Google and Microsoft Ads accounts.

Sources

cloud.google.com/bigquery/docs/google-ads-transfer
— Google Ads Data Transfer Service documentation
cloud.google.com/bigquery/docs
— BigQuery documentation
support.google.com/analytics
— GA4 BigQuery export documentation
cloud.google.com/looker/docs/studio
— Looker Studio documentation
cloud.google.com/blog/products/data-analytics
— Google Cloud data analytics blog

FAQ

What does the Google Ads to BigQuery pipeline actually cost to run?

For a mid-market account, expect €30-150/month all-in. The Google Ads Data Transfer Service itself is free. BigQuery costs split into storage (around €0.02 per GB per month after the first 10 GB free tier) and query processing (€5-6 per TB scanned, with 1 TB free per month). A typical single-account pipeline stores 5-50 GB and scans well under 1 TB monthly if you partition and cluster tables correctly. The biggest cost risk is unpartitioned tables scanned by ad-hoc Looker Studio queries — that is where accounts accidentally spend €500+/month.

How is the Data Transfer Service different from the Google Ads API?

The Data Transfer Service (DTS) is a managed, scheduled export that lands raw Google Ads reporting tables into BigQuery daily with zero code. The Google Ads API is a programmatic interface you call yourself for real-time or custom data needs. For reporting warehouses, DTS is almost always the right choice — it handles authentication, schema, backfill, and retries automatically. Use the API only when you need data DTS does not export, sub-daily freshness, or write operations like bid changes. Most analysts never touch the API for reporting.

How fresh is Data Transfer Service data?

DTS runs once per day and lands data for the previous day, typically completing in the early morning in your configured time zone. There is a built-in refresh window: DTS re-pulls the trailing several days (configurable, default and maximum vary) to capture conversion backfill and late attribution. This means a conversion attributed three days after the click still appears once the refresh window covers it. For PPC, daily freshness is sufficient — bid and budget decisions rarely need intraday warehouse data. If you need same-day numbers, read the Google Ads UI directly.

Do I need GA4 BigQuery export as well, or is Google Ads DTS enough?

You need both if you want true funnel analysis. Google Ads DTS gives you spend, impressions, clicks, and conversions as Google Ads sees them. The GA4 BigQuery export gives you event-level user behavior — landing pages, session quality, micro-conversions, and cross-session journeys. Joining them lets you answer questions Google Ads alone cannot, such as which campaigns drive high-engagement sessions that convert later. The GA4 export is free to enable and lands an events table partitioned by day. For most reporting warehouses, enable both from day one.

How do I join Google Ads data to my CRM revenue?

The cleanest join key is the GCLID (Google Click Identifier), captured at click time and stored against the lead or deal in your CRM. Export your CRM deals with their GCLID and closed-won revenue into a BigQuery table, then join to the Google Ads click data on GCLID. This connects actual pipeline and revenue back to the campaign, ad group, and keyword that produced the click. If GCLID capture is incomplete, fall back to a UTM-based join, but expect lower match rates. Enhanced conversions and offline conversion import are the upstream fixes for poor GCLID coverage.

Should I transform data with scheduled queries or a tool like dbt?

Start with native BigQuery scheduled queries — they are free to schedule, require no extra infrastructure, and handle the daily build of reporting tables well. Move to dbt when your transformation logic grows past roughly 10-15 interdependent models, when you need testing and documentation, or when multiple analysts edit the same SQL. dbt adds version control, lineage, and data tests that scheduled queries lack. For a single-account PPC warehouse, scheduled queries are usually enough for the first year; introduce dbt when complexity demands it.

Can I run this pipeline for multiple Google Ads accounts under an MCC?

Yes. The Data Transfer Service supports MCC (manager account) level transfers that export all child accounts into a single BigQuery dataset, with each row tagged by customer ID. This is the standard setup for agencies and multi-brand advertisers. Build your staging views to filter or group by customer ID, and add an account-mapping table so reports can show friendly account names. Be mindful of cost: an MCC with 50 accounts produces far more data, so partitioning and clustering by date and customer ID become essential rather than optional.