On the comparative blind tests we run continuously on accounts observed in public benchmarks in 2026 — aggregated Google Ads data — applied to major LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro), a naive prose prompt like "write me 15 RSA headlines for B2B CRM" produces 40 to 55% directly usable output — the rest to fix or discard for redundancy, off-topic, character overflow, or hallucinations on non-existent features. Per tested verticals, the gap sits between 28% (technical B2B verticals where AI hallucinates the most) and 67% (mass-market B2C verticals where language is more standardized). The same case in structured JSON prompt with character_max, theme_distribution, excluded_terms, format_output constraints climbs to 75 to 88% exploitable output. The gain isn't in model quality — it's in constraint precision. A 2026 LLM follows JSON constraints drastically better than prose constraints. ChatGPT isn't magic on Google Ads; structured, it becomes productive.
This article delivers 30 JSON prompts grouped by use case: 6 RSA generation, 5 negatives discovery, 4 narrative audit, 5 executive reporting, 5 tactical optimization, plus 5 annex bonuses. All tested on real accounts and calibrated on the 3 major 2026 LLMs. For pure RSA writing, see our RSA writing method. For complete account audit, our Google Ads audit checklist. And for client reporting, our 10 KPI reporting guide. Our free CTR calculator compares your click-through rate to 2026 medians by vertical.
Why structured JSON prompts (vs prose)
A structured JSON prompt is a prompt formatted in JSON with explicit keys for the model role, business context, constraints (character count, exclusions, format output), few-shot examples and validation instructions. Compared to a natural-language prose prompt, it forces the model to follow verifiable rules — which is much more robust on 2026 LLMs, optimized to follow formalized structures via their RLHF fine-tuning.
Differences observed in blind testing on 200 paired prompts (RSA, negatives, audit):
The 6 critical components of a Google Ads JSON prompt:
role— who the model is ("You're a senior Google Ads specialist"). Frames tone and depth.context— account business data (vertical, ICP, budget, objectives). Short but precise, no vague prose.task— the specific mission with action verb.constraints— hard rules (character count, format, exclusions). This is where the magic happens.examples(optional) — 2 to 4 few-shot examples of expected style. Increases quality by 15-25%.output_format— JSON, markdown, CSV. Imposing the format facilitates automated parsing.
2026 LLMs hallucinate confidently on Google Ads benchmarks. A "what's the average Search CTR in the US 2026" prompt produces a plausible figure but without verifiable source — typically 2.8% or 3.1%. Strict rule: for any benchmark question, either explicitly inject real account data into the prompt (CSV or JSON), or cite an official external source (Search Engine Land, Wordstream, ThinkWithGoogle) you've verified. Adding "no_external_benchmarks": true in constraints reduces stat hallucinations by 80%+ in our tests.
Official references to go further: OpenAI prompt engineering documentation on platform.openai.com, Anthropic guide on docs.anthropic.com and Google AI Studio documentation on ai.google.dev. All three converge on the value of structured prompts — JSON or XML — vs free prose.
The exploitable output gap (40-55% vs 75-88%) doesn't come from a model or LLM raw capability difference. It's the same GPT-5, the same Claude Opus 4.7, the same Gemini 2.5 Pro that produce both results. The explanatory variable is the formal structure of constraints. A 2026 LLM was fine-tuned by RLHF on millions of formal structure examples (JSON, XML, function calling, tool use). When you give it a list of constraints in free prose, it interprets them with significant variance. When you give it the same constraints in JSON, it processes them as a schema to validate — and the compliance rate mechanically rises. It's an emergent property of post-2024 training, not prompt magic.
RSA generation: 6 prompts per format
RSA generation is the #1 use case where AI saves measurable time. The key: don't ask "write 15 headlines", but impose a strict thematic matrix (cf. RSA method) with character count, thematic distribution and term exclusions.
The Responsive Search Ad format combines up to 15 titles and 4 descriptions that Google permutes dynamically at display. The Google Ads learning machine optimizes title/description combinations based on user, query and bid context — but it can only do so correctly if it receives enough thematic variety. On aggregated 2025-2026 Google Ads data, RSAs with less than 10 titles or with poor thematic diversity (3 themes or less of 7 expected) cap their Ad Strength at "Average" and their CTR below the vertical median. The six prompts below aim to automate production while preserving thematic diversity — key to Search Quality Score.
Prompt 1 — Standard Search RSA 15 headlines + 4 descriptions
{
"role": "You're a senior Google Ads copywriter, English-speaking, RSA expert.",
"context": {
"vertical": "B2B SaaS CRM",
"icp": "SMB 20-200 employees, services sector",
"differentiators": ["GDPR compliant", "No commitment", "Made in USA"],
"competitors": ["HubSpot", "Pipedrive", "Salesforce Essentials"],
"tone": "professional direct, factual, evidence-based"
},
"task": "Generate 15 RSA headlines and 4 descriptions for ad group 'B2B SMB CRM'.",
"constraints": {
"headline_max_chars": 30,
"description_max_chars": 90,
"theme_distribution": {
"main_keyword": 3,
"quantified_benefits": 3,
"proof_points": 2,
"direct_cta": 2,
"urgency_offer": 2,
"differentiation": 2,
"brand_only": 1
},
"no_repetition_keyword_exact": true,
"no_external_benchmarks": true,
"include_keyword_in_3_headlines": "B2B CRM"
},
"examples_few_shot": [
{"headline": "B2B SMB CRM · 15-min Demo", "theme": "keyword_cta"},
{"headline": "Save 8 hours/week in 2026", "theme": "quantified_benefit"}
],
"output_format": "JSON array with keys: headline, theme, char_count"
}
Prompt 2 — Simplified Display RSA assets
{
"role": "Display Google Ads copywriter.",
"task": "Generate 8 headlines (30 char max) and 4 descriptions (90 char max) for Display retargeting cart abandoners campaign.",
"context": {
"vertical": "Women's fashion e-commerce",
"audience": "Cart abandoners last 7 days",
"incentive": "-15% code WELCOME15",
"deadline": "48h"
},
"constraints": {
"tone": "urgency without abusive pressure",
"headline_themes": ["incentive", "deadline", "forgotten_product", "social_proof"],
"no_caps_lock": true,
"no_emojis": true
},
"output_format": "JSON object with arrays headlines and descriptions"
}
Prompt 3 — Complete PMax asset group RSA
{
"role": "PMax asset group designer.",
"task": "Generate complete asset group for PMax campaign 'Premium Men's Sneakers'.",
"constraints": {
"headlines_short": {"count": 5, "max_chars": 30},
"headlines_long": {"count": 5, "max_chars": 90},
"descriptions": {"count": 5, "max_chars": 90},
"callouts": {"count": 4, "max_chars": 25},
"structured_snippets": {"count": 3, "header": "brands", "values_max": 4},
"image_brief": "5 prompts to generate images via DALL-E or Midjourney: lifestyle product, close-up detail, 6s video loop, square 1:1, 9:16 vertical"
},
"context": {
"vertical": "Premium sneakers e-commerce",
"price_range": "$200-500",
"audience": "Urban men 25-45, premium-conscious"
},
"output_format": "JSON object structured by asset type"
}
Prompt 4 — Brand defense RSA (competitor bidding on your name)
{
"role": "Brand defense PPC strategist.",
"task": "Generate 15 RSA headlines for Brand Defense campaign — a competitor is bidding on our brand name.",
"context": {
"brand_name": "AcmeCRM",
"competitor_name": "RivalCRM",
"differentiators_vs_competitor": ["10 years older", "Rated 4.8/5", "US Support 7 days/week"]
},
"constraints": {
"headline_max_chars": 30,
"include_brand_in_5_headlines_minimum": true,
"tone": "confident without aggressive (no direct bashing)",
"implicit_comparison": true,
"no_competitor_name_mention": true
},
"output_format": "JSON array with headline, theme, brand_present"
}
Prompt 5 — Seasonal RSA variant
{
"role": "Seasonal campaign copywriter.",
"task": "Adapt existing RSA in Black Friday version.",
"input_existing_rsa": "[Paste here current 15 headlines]",
"constraints": {
"preserve_brand_voice": true,
"preserve_3_brand_headlines": true,
"replace_urgency_offer_with": "Black Friday -40% until Nov 30",
"replace_proof_with": "2026 best-sellers of the year",
"add_urgency_counter": "Only X days left"
},
"output_format": "JSON with original + new version side-by-side"
}
Prompt 6 — Multi-language RSA brand voice consistency
{
"role": "Multilingual copywriter EN/FR/ES/DE.",
"task": "Adapt the following RSA in 4 languages preserving thematic matrix and brand tone.",
"input_rsa_en": "[Paste here 15 headlines EN + 4 desc]",
"constraints": {
"no_literal_translation": true,
"preserve_theme_distribution": true,
"preserve_proof_points_quantified": true,
"adapt_local_idioms": true,
"respect_char_count_per_language": {
"EN": 30, "FR": 30, "ES": 30, "DE": 30
},
"warn_if_translation_doesnt_fit_char_count": true
},
"output_format": "JSON object with key per locale"
}
For these 6 prompts, the time gain observed per public benchmarks we accompany: initial production 45-60 min vs 2-3h pure human, then 15-20 min of human editing to calibrate message-market. Productivity ROI confirmed on standardized ad groups (mass market e-com, volume lead gen); marginal benefit on strategic ad groups (brand premium, B2B niche).
An operational vigilance point: brand voice isn't learned by an LLM in a single prompt. To achieve durable consistency on a volume of RSAs (typically 50 to 200 per month on a mid-market account), you need either to build a library of 8 to 12 representative few-shot examples and inject them systematically in the examples_few_shot block, or use ChatGPT's Projects system (or Claude Projects) to store a brand voice brief shared across all prompts. On accounts observed in public Google Ads benchmarks, the second approach reduces tone variance between RSAs by 60-70% compared to isolated prompts. It implies, however, governance discipline: brand voice brief updates must be tracked and versioned like code.
Negative discovery: 5 prompts per source
AI negative discovery is a high-leverage use case — a mid-market account typically has 200 to 800 hidden negative keywords to discover in its search query report. Doing it by hand = 4 to 8h. With a well-built JSON prompt + clustering, 30 to 45 min. For complete discovery + clustering mechanics, see our article AI negatives discovery + clustering.
The "negatives" discipline is one of the maturity indicators of a Google Ads account. By vertical, the CPA gap between an account with an up-to-date shared negatives list and an account without systematic negatives sits between 15 and 28% — for the same budget, the same Smart Bidding, the same RSAs. The reason is mechanical: without negative filter, broad match pushes budget toward informational or off-topic queries ("how to", "free", "definition"), which consume clicks without converting. The five prompts below address the five complementary signal sources: Google Ads search query report (the base), GA4 bounce rate, Meta Ads search bar, competitor trademark exclusions, and embeddings thematic clustering. Combining all five gives a much more robust discovery discipline than search query alone. For the quick calculation with 2026 benchmarks by vertical, see our free CPA calculator.
Prompt 7 — Negatives from search query report (bulk 500 lines)
{
"role": "PPC negative keywords analyst.",
"task": "Analyze the attached search query report and identify negative candidates.",
"input_csv": "[Paste search query report CSV: query, impressions, clicks, conversions, cost]",
"context": {
"vertical": "B2B SaaS CRM",
"icp_keywords_positive": ["CRM", "software", "SMB", "B2B"],
"icp_keywords_negative": ["free", "open source", "tutorial", "how to"],
"intent_filter": "transactional only"
},
"constraints": {
"min_impressions_threshold": 50,
"min_clicks_threshold": 5,
"max_conv_rate_threshold": 0.005,
"exclude_brand_terms": ["AcmeCRM", "Acme"],
"match_type_recommendation": "broad or phrase per volume",
"no_external_benchmarks": true
},
"output_format": "JSON array with query, recommended_negative, match_type, reason, priority"
}
Prompt 8 — Negatives from GA4 landing page bounce
{
"role": "GA4 + Google Ads correlation analyst.",
"task": "Cross-reference Google Ads queries with GA4 bounce rate per landing page to detect mismatches.",
"input_ga4_csv": "[Page path, sessions, bounce rate, avg session duration]",
"input_gads_csv": "[Search query, landing page, impressions, clicks, conversions]",
"constraints": {
"bounce_rate_threshold": 0.75,
"min_sessions_for_signal": 30,
"correlation_window_days": 30
},
"task_detail": "Identify Google Ads queries leading to landing pages with bounce above 75% — negative candidates.",
"output_format": "JSON array with query, landing_page, bounce_rate, sessions, recommended_action"
}
Prompt 9 — Meta Ads search bar interest analysis negatives
{
"role": "Cross-channel negative keywords strategist.",
"task": "Analyze Meta Ads search bar queries to identify intents not relevant for Google Ads.",
"input_meta_search_terms": "[Paste Meta Ads Search bar terms export]",
"context": {
"google_ads_vertical": "Real estate broker lead gen",
"google_ads_icp": "First-time home buyers 28-45 years"
},
"constraints": {
"intent_categories_to_extract": ["info_only", "wrong_persona", "wrong_geography", "wrong_product"],
"exclude_already_in_negative_list": "[Paste current list]"
},
"output_format": "JSON object grouped by intent_category"
}
Prompt 10 — Competitor trademark exclusion negatives
{
"role": "Trademark negative keywords legal-aware.",
"task": "Generate exhaustive list of negative keywords to exclude my campaigns from competitor brand searches.",
"context": {
"competitors_to_exclude": ["HubSpot", "Pipedrive", "Salesforce", "Zoho", "monday.com"],
"include_misspellings": true,
"include_branded_keyword_combos": true
},
"constraints": {
"match_types": ["exact", "phrase"],
"exclude_generic_terms": ["CRM", "software"],
"include_typo_variants": true
},
"output_format": "JSON array with negative_keyword, match_type, reason"
}
Prompt 11 — Embeddings thematic clustering negatives
{
"role": "Embeddings + clustering negative keywords specialist.",
"task": "Group the list of 500 non-converted search queries by semantic clusters to bundle negatives.",
"input_queries": "[Paste 500 search queries CSV list]",
"constraints": {
"embedding_model": "text-embedding-3-small (suggested)",
"clustering_algorithm": "DBSCAN or KMeans k=15",
"min_cluster_size": 5,
"output_one_negative_per_cluster": true
},
"task_detail": "For each cluster, propose ONE phrase-match negative covering 80%+ of the cluster.",
"output_format": "JSON array with cluster_id, sample_queries, recommended_negative_phrase, coverage_estimate"
}
On accounts we track, these 5 prompts allow discovering typically 150 to 400 candidate negatives per quarterly audit, of which 60-75% retained after human review. Huge time-saving volume vs manual review.
The application granularity remains a human decision. An AI-detected negative can apply at four distinct levels: account (via shared list), campaign, ad group, or campaign group via a scope-restricted shared list. A broad negative ("free", "tutorial") goes systematically at account level; a vertical-specific negative ("divorce attorney" in a multi-practice account) goes at campaign level; a fine-intent negative goes at ad group level. Documented on the official Google Ads page on shared lists, this hierarchy avoids conflicts where a B2C-relevant negative mistakenly applies to a B2B campaign where it loses volume.
Narrative audit: 4 prompts per dimension
AI narrative audit is different from quantified audit (which we generate via script or API). The narrative produces the prose that contextualizes numbers for a business stakeholder. It's where Claude Opus 4.7 particularly excels — its long-distance coherence beats GPT-5 and Gemini on 2-5 page prose reports.
A narrative audit distinguishes itself from quantified audit by its function: produce explanatory text that the recipient can read and arbitrate, not a series of KPIs. The four dimensions audited below (structure, creative, tracking, budget) cover together in the majority of cases more than 80% of operational problems of a mid-market Google Ads account. To frame these audits, two principles must be strictly respected: directly inject account data into the prompt (CSV or JSON export, not a prose description), and prohibit external benchmarks (no_external_benchmarks: true). Without these two rules, the LLM produces a falsely convincing audit that mixes real observations on your data and hallucinations on non-existent benchmarks. The boundary between useful audit and dangerous audit holds to this detail.
Prompt 12 — Account structure audit
{
"role": "Senior Google Ads auditor with 10 years experience.",
"task": "Analyze the attached account structure and produce an 800-word narrative report.",
"input_account_structure_csv": "[Paste campaigns + ad_groups + keywords counts export]",
"dimensions_to_audit": [
"naming_convention_consistency",
"campaign_budget_allocation",
"ad_group_size_balance",
"match_types_distribution",
"shared_negative_lists_usage"
],
"constraints": {
"tone": "factual, no complacency, no alarmism",
"include_priority_actions": "top 3 quick wins + top 2 strategic",
"no_external_benchmarks": true,
"use_only_provided_data": true
},
"output_format": "Structured Markdown: Executive Summary, Findings per dimension, Priority Actions, Risks"
}
Prompt 13 — Creative audit (RSA, Ad Strength, pinning)
{
"role": "Creative Google Ads auditor.",
"task": "Audit creative quality of the attached account, focus RSA and Asset Reports.",
"input_rsa_export": "[Paste RSA export all campaigns: ad_group, headlines, descriptions, ad_strength, pinning]",
"input_asset_report": "[Paste Asset Report: asset, performance_label]",
"checks": [
"headlines_count_per_rsa (target 15)",
"thematic_diversity (7 themes expected)",
"pinning_excessive (warning if >1 pin per RSA)",
"ad_strength_poor_count",
"low_performing_assets_count"
],
"output_format": "CSV table (ad_group, issue, severity, recommended_fix) + synthesis paragraph"
}
Prompt 14 — Conversion tracking audit
{
"role": "Conversion tracking auditor.",
"task": "Detect tracking anomalies on the account.",
"input_conversions_export": "[Paste Tools > Conversions export]",
"input_gtm_setup": "[Paste GTM tags summary]",
"checks": [
"duplicates_conversion_actions",
"missing_enhanced_conversions",
"inconsistent_attribution_models",
"stale_conversion_actions_no_data",
"consent_mode_status"
],
"constraints": {
"include_remediation_steps": true,
"include_estimated_signal_loss_percent": true
},
"output_format": "Markdown report with sections per check"
}
Prompt 15 — Budget pacing audit
{
"role": "Budget pacing analyst.",
"task": "Detect over/underspend per campaign over the last 30 days.",
"input_daily_spend_csv": "[Paste daily spend per campaign 30d export]",
"input_target_budgets": "[Daily budget target per campaign]",
"checks": [
"deviation_from_target_per_day",
"weekday_vs_weekend_pattern",
"early_month_overspend",
"ramping_campaigns_unstable"
],
"constraints": {
"tolerance_threshold_percent": 8,
"flag_if_consecutive_overspend_days": 3
},
"output_format": "Markdown with drift table + explanatory paragraph"
}
For the complete pillar audit, see our Google Ads audit checklist. The prompts above chain in a semi-automated audit workflow, with human validation between each step.
Executive reporting: 5 prompts per stakeholder
Reporting is the other use case where AI massively saves time — an account manager spends on average 6 to 12h/month on client reports. With per-stakeholder persona prompts, this drops to 1-2h. See our 10 KPI client reporting guide for indicators to include.
The golden rule of executive reporting is that a report doesn't summarize numbers: it summarizes the recipient's mental frame. A CEO wants a 1-page business view (300 words max, no jargon, with a headline figure and a plan). A CFO wants payback period and LTV:CAC ratio, in finance vocabulary not marketing. A sales team wants lead quality expressed in MQL→SQL→deal, not in CTR. A marketing director wants weekly anomalies, proposed tests and contextual ratios. On accounts observed in public Google Ads benchmarks, the gap between generic reporting (the same report for all stakeholders) and per-persona personalized reporting is measured in recipient attention duration — which goes on average from 90 seconds to 4-6 minutes per report. The real ROI isn't in hours saved by the account manager but in finer decisions made by recipients.
Prompt 16 — CEO executive summary reporting
{
"role": "CEO-grade executive reporter.",
"task": "1-page exec summary synthesis from monthly data attached.",
"input_monthly_data": "[Paste dashboard data: spend, conversions, CPA, ROAS, vs target]",
"audience": "Non-technical CEO, 2-minute attention",
"constraints": {
"max_length_words": 300,
"no_jargon": true,
"structure": ["headline_metric_vs_target", "what_drove_change", "next_month_plan"],
"tone": "factual without embellishment",
"include_risks": true
},
"output_format": "1-page Markdown with 3 sections"
}
Prompt 17 — Marketing team weekly reporting
{
"role": "Performance marketing weekly briefer.",
"task": "Weekly operational brief for marketing team.",
"input_weekly_data": "[CSV last 7 days vs previous 7 days]",
"audience": "Mid-level marketing team, technique-friendly",
"constraints": {
"max_length_words": 500,
"include_anomalies_first": true,
"include_test_recommendations": "1-2 per week",
"use_jargon_authorized": ["CTR", "CPA", "ROAS", "LTV", "Smart Bidding"]
},
"output_format": "Markdown: Highlights / Lowlights / Anomalies / Tests proposed"
}
Prompt 18 — Sales team lead quality reporting
{
"role": "MQL/SQL pipeline analyst.",
"task": "Brief for sales team on Google Ads lead quality.",
"input_crm_export": "[Paste CRM export with source = Google Ads]",
"audience": "Sales team, focus quality not quantity",
"metrics_to_include": [
"MQL_count",
"SQL_conversion_rate_from_MQL",
"deal_velocity_days",
"closed_won_count",
"average_deal_value"
],
"constraints": {
"include_lead_scoring_distribution": true,
"flag_underperforming_campaigns_lead_quality": true
},
"output_format": "Sales-friendly Markdown report"
}
Prompt 19 — CFO LTV:CAC reporting
{
"role": "CFO-grade financial reporter PPC.",
"task": "Financial report focus payback period and LTV:CAC.",
"input_data": "[CAC per monthly cohort 12 last months + LTV cohort 12m]",
"audience": "CFO, focus cash flow and margins",
"constraints": {
"include_payback_period_calculation": true,
"include_ltv_cac_ratio_per_cohort": true,
"include_blended_vs_paid_only_cac": true,
"no_marketing_jargon": true,
"use_finance_vocabulary": true
},
"output_format": "Structured Markdown finance sections"
}
Prompt 20 — Agency client monthly QBR reporting
{
"role": "Agency QBR reporter.",
"task": "Quarterly business review report for agency client.",
"input_quarter_data": "[Paste quarter data + Q-1 comparison + Y-1 comparison]",
"audience": "Client decision-maker + ops team",
"constraints": {
"include_strategic_recommendations_top_3": true,
"include_competitive_benchmark_directional_only": true,
"include_next_quarter_roadmap": true,
"tone": "partner not vendor",
"max_length_words": 1500
},
"output_format": "Long form QBR-style Markdown"
}
Tactical optimization: 5 prompts per decision
AI tactical optimization is the most delicate use case — it's where AI can give high-impact recommendations, but also dangerous recommendations if poorly framed. Always validate humanly before execution. For general optimization mechanics, see our complete Performance Max 2026 guide.
The critical distinction between analysis prompts and decision prompts must be explicit in your workflow. Analysis prompts (RSA, negatives, audit, reporting sections above) produce content you validate before publication or distribution. The cost of a bad analytical prompt is editing delay. Decision prompts (the five below) produce recommendations that, executed, durably modify account performance: a premature Target CPA→Target ROAS switch can cost three weeks of relearning; poorly calibrated ad groups consolidation can break relevant audiences; a campaign pause based on 30 days of noise can cut a healthy learning trajectory. For these five prompts, the absolute rule is to explicitly ask the model for a confidence score and a rollback plan. Below 0.75 confidence, don't deploy. Without documented rollback, don't deploy either.
Prompt 21 — Smart Bidding rebid decision (Target CPA → Target ROAS)
{
"role": "Smart Bidding strategy advisor.",
"task": "Advise on the Target CPA → Target ROAS switch for the attached campaign.",
"input_campaign_data": "[Paste 90d data: conversions, value, CPA, ROAS, learning phase status]",
"decision_criteria": [
"min_50_value_based_conv_per_week",
"value_signal_reliability",
"learning_phase_stable_30_days",
"target_ROAS_realistic_vs_history"
],
"constraints": {
"give_go_no_go_recommendation": true,
"include_target_ROAS_initial_value": true,
"include_rollback_plan": true,
"include_monitoring_metrics_first_14_days": true
},
"output_format": "JSON: recommendation, target_ROAS_initial, rollback_trigger, kpi_to_monitor"
}
Prompt 22 — Restructure decision (ad groups consolidation)
{
"role": "Account restructure strategist.",
"task": "Identify ad groups to consolidate to reach Smart Bidding signal threshold.",
"input_ad_groups_data": "[Paste ad groups export: conv 30d, spend, thematic structure]",
"criteria": {
"min_conv_per_ad_group_week": 5,
"thematic_proximity_threshold": 0.75,
"preserve_separate_match_types": true,
"preserve_separate_audiences": true
},
"constraints": {
"max_ad_groups_per_consolidation": 4,
"preserve_naming_convention": true,
"include_keyword_remap_plan": true
},
"output_format": "JSON array with consolidation_group, source_ad_groups, target_ad_group_name, keywords_to_migrate"
}
Prompt 23 — New negative launch decision
{
"role": "Negative keyword scope advisor.",
"task": "For each candidate negative attached, advise the application granularity.",
"input_negatives_candidates": "[Paste candidate negatives list with query history]",
"decision_levels": ["account_level", "campaign_level", "ad_group_level", "shared_negative_list"],
"criteria": {
"applies_to_all_campaigns": "account_level",
"applies_to_specific_vertical": "campaign_level",
"applies_to_specific_match_type_intent": "ad_group_level",
"reusable_pattern": "shared_negative_list"
},
"output_format": "JSON array with negative, recommended_level, justification"
}
Prompt 24 — Campaign pause decision
{
"role": "Campaign pause/keep decision advisor.",
"task": "Analyze if campaign should be paused or reworked.",
"input_campaign_60d": "[Paste 60d campaign data + account benchmarks]",
"decision_criteria": [
"CPA_vs_target_3x_above",
"conversion_rate_below_account_avg_50pct",
"trajectory_30d_improving_or_degrading",
"strategic_value_brand_or_test"
],
"constraints": {
"include_alternatives_to_pause": ["restructure", "rebid", "creative_refresh", "audience_pivot"],
"include_estimated_recovery_time_per_alternative": true
},
"output_format": "JSON: recommendation, alternatives_ranked, rationale"
}
Prompt 25 — Cross-channel budget allocation decision
{
"role": "Cross-channel budget allocator.",
"task": "Recommend budget shift Google Ads vs Meta Ads vs Microsoft Ads per marginal ROI.",
"input_channels_data": "[Paste spend, conv, CAC, marginal CAC last 30d per channel]",
"context": {
"total_budget_usd_monthly": 27500,
"current_split": {"google": 0.65, "meta": 0.25, "microsoft": 0.10},
"constraints_business": ["Google brand minimum $2.2k/month", "Microsoft B2B priority"]
},
"constraints": {
"max_shift_percent_per_iteration": 0.15,
"include_marginal_CAC_logic": true,
"no_external_benchmarks": true
},
"output_format": "JSON: recommended_split, shift_per_channel_usd, rationale_per_shift"
}
Best practices: guardrails, validation, A/B
JSON prompt usage best practices aren't optional — without them, AI produces falsely convincing outputs that cause real damage in account. Three pillars: guardrails, systematic human validation, A/B vs naive version to measure real gain.
A guardrail is an explicit constraint you impose on the model to limit its output space: prohibit external benchmarks, require a confidence score, refuse to produce if provided data is insufficient. On accounts observed in public Google Ads benchmarks, agencies that don't impose these guardrails publish on average 12 to 18% of AI content containing at least one factual hallucination (invented figure, non-existent feature, fictional source). With the five guardrails below activated systematically, the hallucination rate drops to 2-4% — and those that remain are almost always detected in schema validation because the model explicitly marks its hypotheses. Guardrail discipline is cumulative: each guardrail added eliminates an error class without additional cost, and the initial investment (5-10 min to write the constraints) is amortized from the second use of the prompt.
The 5 essential guardrails to include in each prompt:
no_external_benchmarks— prevents the model from inventing sectoral stats. Forces use of only provided data.use_only_provided_data— strict variant of the previous. Any data not provided = unknown, not invented.flag_assumptions_explicitly— the model must explicitly list the assumptions it makes. Allows validation.include_confidence_score— for decisions, ask the model for a 0-1 confidence score. Filter below 0.7.request_clarification_if_data_insufficient— instead of inventing, ask for clarifications.
Systematic human validation pipeline:
- AI output — collect the raw JSON.
- Schema validation — verify the JSON parses, character_count constraints hold, exclusions are respected.
- Semantic spot check — human review 10-20% of output for message-market coherence.
- Pilot test — deploy on 1 ad group or 1 campaign for 7 days before industrialization.
- A/B measurement vs baseline — compare to non-AI equivalent outputs.
Naive vs structured prompt A/B test — clear methodology:
# Pseudo-code workflow A/B prompt comparison
import openai
def run_ab_prompts(naive_prompt, structured_prompt, n_runs=20):
naive_outputs = [openai.chat.completions.create(
model="gpt-5", messages=[{"role": "user", "content": naive_prompt}]
) for _ in range(n_runs)]
structured_outputs = [openai.chat.completions.create(
model="gpt-5", messages=[{"role": "user", "content": structured_prompt}]
) for _ in range(n_runs)]
metrics = {
"char_count_compliance": compare_char_compliance(naive_outputs, structured_outputs),
"theme_diversity": compare_diversity(naive_outputs, structured_outputs),
"human_edit_time_avg": measure_edit_time(naive_outputs, structured_outputs),
"hallucination_rate": detect_hallucinations(naive_outputs, structured_outputs),
}
return metrics
Quality tests run on 200 prompt pairs (RSA, negatives, audit):
Per public benchmarks where we compared well-prompted AI RSAs vs pure human RSAs over 21 days: CTR comes out equivalent to +5-8% in favor of AI, but conversion rate 0 to 3% lower (AI optimizes the hook, not complex message-market matching). The net business gain is on productivity (45 min vs 2-3h per RSA) and multi-account consistency, not on pure performance. Industrialize on standardized ad groups; keep humans on strategic ad groups (premium brand, B2B niche, top revenue).
To automate prompt deployment in production pipeline, see our complementary articles AI RSA + ad rotation, AI negatives discovery + clustering, and AI images for Google Ads. For infrastructure-side automation (n8n, Zapier, MCP), see n8n Google Ads and Google Ads API Python.
For advertisers who want to deploy this AI discipline without building the prompt infrastructure themselves, our SteerAds audit integrates the 30 prompts above into its pipeline and delivers a ready-to-action report in 72h, with systematic human validation and A/B pilot test on 1 ad group before industrialization. The 30 JSON prompts aren't a finished product — they're templates to adapt to your vertical context, your brand voice and your tracking maturity. The discipline that matters isn't the prompt itself but the workflow around: guardrails, validation, A/B measurement, monthly iteration. Without this discipline, ChatGPT effectively remains magic — that works one time in two and costs dearly when it fails — see also Microsoft Advertising Research for more details.
Sources
Official sources consulted for this guide:
FAQ
What does a naive ChatGPT prompt produce on Google Ads vs a structured JSON prompt?
On the tests we run continuously, a naive prose prompt like 'write me 15 RSA headlines for B2B CRM' typically produces 40 to 55% directly usable output — the rest to fix or discard (redundancy, off-topic, character overflow, hallucinations on non-existent features). The same case in structured JSON prompt (with character_max constraints, theme_distribution, explicit exclusions, format_output) climbs to 75 to 88% exploitable output. The gain isn't in model quality — it's in constraint precision. A 2026 LLM (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro) follows JSON constraints much better than prose constraints. Official documentation: platform.openai.com/docs/guides/prompt-engineering.
Does ChatGPT hallucinate on Google Ads stats? What precautions?
Yes, systematically and confidently. Asking ChatGPT 'what's the average Search CTR on Google Ads in the US 2026' produces a falsely precise figure (often in the plausible zone but without verifiable source) — typically 2.8% or 3.1% out of context vertical. Strict rule: never use LLM outputs on benchmark stat questions without verifying the source. For audit or reporting prompts, you must explicitly inject the real account data into the prompt (CSV or pasted JSON) and ask the model to analyze ONLY this data — not produce external benchmarks. The JSON guardrail `data_source: account_csv_only, no_external_benchmarks: true` reduces stat hallucinations by 80%+ in our tests.
Which model to use for which Google Ads use case?
On accounts observed in public benchmarks in 2026, Claude Opus 4.7 dominates on narrative audit and executive reporting tasks (prose coherence, stakeholder-aware tone, 1M token context window length). GPT-5 remains the robust default on RSA generation and negatives (bounded textual creativity, precise character count constraint following). Gemini 2.5 Pro is the best for tasks requiring real-time web grounding (competitor verification, Google Ads features news). Practical recommendation: industrialize on Claude for repetitive multi-account tasks (consistency), keep GPT-5 and Gemini in alternation for blind A/B output quality tests. No 2026 model is strictly superior on all axes — diversifying limits the bias of one provider.
Should you fine-tune a model on your own Google Ads data?
No, in 95% of cases, fine-tuning is over-engineering for the majority of advertisers. Typical OpenAI 2026 fine-tuning cost: $880 to $4,400 setup + recurring. To beat a well-built JSON prompt, you need 500+ quality examples from your account — which most advertisers don't have on their own. The pragmatic 2026 path: structured JSON prompts + few-shot examples (3 to 5 account examples injected directly into the prompt) + retrieval augmented generation (RAG) for internal knowledge bases. RAG runs around $22 to $88/month per volume. It's 95% of fine-tuning value at 5% of the cost. Fine-tuning relevant only for agencies industrializing on 100+ accounts with constrained brand voice.
How to measure if AI really improves my Google Ads performance vs just editorial effort?
Classic holdout test: 14 to 21 days on the same ad group, alternate AI-generated RSAs (structured prompt) and pure human-generated RSAs. Measure CTR, conversion rate, CPA. On accounts we track, well-prompted AI RSAs deliver an equivalent CTR 5-8% higher than human RSAs, but with a 0 to 3% lower conversion rate (AI optimizes the hook, not the message-market matching). The net gain is on production time (45 min AI vs 2h human per ad group), not on pure performance. Conclusion: AI is a production accelerator, not a performance magician. Industrialize on standardized ad groups, keep humans on strategic ad groups (brand, top sales).