What this does
You keep a Google Sheet of target keywords – the ones each of your clients is trying to rank for. The workflow watches that sheet. Whenever you add or edit a keyword, it fires: it pulls 30 days of Google Search Console data for the right client, finds every page on that client's site ranking for the keyword, and asks GPT-4o to assess cannibalization risk.
The output goes back into the sheet next to each keyword: High / Moderate / Low / None risk, plus a sentence explaining the reasoning ("5 pages from the same domain rank for this keyword, top page is at position 8, others scattered 14-42 – classic cannibalization") and a remediation step ("Consolidate the 3 weakest pages into the strongest, 301 redirect the others").
About 2-4 minutes per keyword from edit-the-sheet to risk-assessment-written-back. Across 100-200 keywords for an agency portfolio, you have a full cannibalization audit within a day – not a week.
The problem this solves
Cannibalization is one of those SEO problems that's obvious in hindsight and invisible until you go looking. You write a post about "B2B email marketing best practices" in 2022. In 2023 a new strategist writes a similar post titled "Email marketing for B2B SaaS". In 2024 someone updates a product page with the H1 "B2B email marketing software". Now Google sees three pages from your domain all kinda-sorta targeting the same intent, and none of them rank because Google can't decide which one is the real answer.
Most teams find out about cannibalization in one of three ways:
- They don't. The pages quietly underperform. Traffic that should be going to one page at position 2 is instead split across three pages at positions 14, 27, and 41. None of them earns meaningful traffic.
- A consultant audits the site and points it out. That's a $5-15k engagement, takes 2-4 weeks, and produces a one-time report. Six months later there's new cannibalization the audit didn't catch.
- They run cannibalization tools. Existing tools (Keylogs, Ahrefs cannibalization report, etc.) flag pairs of URLs ranking for the same keyword. But "two URLs rank for the same keyword" isn't cannibalization on its own – sometimes it's totally fine (a category page + a deep article that target related but distinct intents). The signal-to-noise ratio is bad enough that most teams turn the alerts off.
This workflow gets the signal-to-noise right because of the AI step. GPT-4o looks at the full ranking pattern – how many pages, at what positions, with what click distribution – and judges whether it's actually a cannibalization problem or just two pages doing different jobs. It's the difference between "1 page at position 4 + 1 page at position 38 = probably fine, they target different intents" and "5 pages between positions 8-42 = clear cannibalization, consolidate".
For an agency with multiple retainers, this turns cannibalization auditing from a quarterly $5k consultant engagement into a permanent monitor that catches new cannibalization within days of it appearing.
What you put in
A one-time setup:
- A Google Sheet with two tabs:
  - Tab 1: Client URLs – one row per client (domain + GSC property)
  - Tab 2: Target Keywords – one row per keyword (keyword text + which client it's for)
- GSC access to each client's property (service account)
- An OpenAI API key with GPT-4o access
Per-run inputs: none. The workflow runs every time someone edits the Target Keywords sheet.
What you get out
For each keyword you add or edit, within 2-4 minutes the sheet row gets these columns populated:
- Risk level – High / Moderate / Low / None
- Reasoning – 1-2 sentences explaining the risk assessment (which pages compete, at what positions, what the pattern suggests)
- Observations – Specific data points the AI noticed (e.g. "Top-ranking page has 4x the impressions but worst CTR – likely outdated meta description")
- Remediation steps – Concrete next actions (consolidate-and-redirect, rewrite-the-loser, distinguish-the-intents, leave-alone)
- The competing URLs themselves – Each cannibalizing page's URL, current position, clicks, impressions, CTR
The sheet becomes your live cannibalization dashboard. Sort by risk level, filter to a single client, hand the "High" rows to your strategist as a fix list. The "None" rows are already audited – no work needed.
How long per keyword
Workflow time: 2-4 minutes per keyword. GSC API call (1 min), grouping + matching logic (a few seconds), GPT-4o analysis (1-2 min), write back to sheet (a few seconds).
Your time: seconds. You add a keyword, the workflow does the rest while you go do something else. Come back later, the row is populated.
For a full portfolio audit of 200 keywords across 4 clients: about 8-12 hours of workflow runtime (parallel where possible), zero hours of human time during the run. Down from 16-32 hours of manual cannibalization checking for the same coverage.
When this is a good fit
- You manage multiple client sites (the workflow's 4-client routing is built for agencies). For a single site, the manual version is fine.
- Your clients have at least 50 indexed pages each. Below that, cannibalization is rare and easy to spot by eye.
- You're already maintaining a target keyword sheet per client. The workflow piggybacks on that – it doesn't ask you to create new infrastructure.
- You're willing to act on the output. Cannibalization risks need a strategist's call (consolidate? rewrite? leave alone?). The workflow does the detection + suggestion, not the execution.
When this isn't a good fit
- You manage one site. Build the manual version in an afternoon, save 1-2 weeks of setup.
- You don't have target keywords sheeted out. The workflow needs that as input – without it, there's no scope of what to check.
- You're trying to automate the fix as well as the detection. The remediation steps are suggestions; the actual consolidating / redirecting / rewriting needs human judgement and access to your CMS. Different workflow.
- Your clients' SEO data lives exclusively outside Google Search Console. The workflow runs on the GSC API; Bing Webmaster Tools data is out of scope.
What's actually under the hood
The workflow runs on n8n. About 25 nodes. The shape:
- Google Sheets trigger – watches the Target Keywords tab, fires on edit
- Fetch client URLs from the Client URLs tab
- Router – 4-way client routing based on which client the edited keyword belongs to
- Per-client branch: fetch 30 days of GSC data for that client
- Per-client branch: group the GSC response by keyword (each keyword gets an array of pages that rank for it, with position + clicks + impressions + CTR) – this fetch-and-group step is sketched just after this list
- Merge – combine all 4 client branches back into one stream
- Match the target keywords from the sheet against the actual GSC data – flag keywords not found in GSC, only continue with the ones that have ranking data
- GPT-4o agent – for each matched keyword, ask GPT-4o to assess cannibalization risk given the page rankings
- Structured output parser – force the AI response into a fixed JSON shape (risk level + reasoning + observations + remediation + competing URLs)
- Write the parsed result back to the Target Keywords sheet
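To make the middle of that list concrete, here's a minimal sketch of the per-client fetch-and-group step. It assumes the standard GSC Search Analytics API request/response shape (rows keyed by query + page, each with clicks, impressions, CTR, position); the helper names are illustrative, not the actual node names in the workflow.

```typescript
// Sketch of the per-client branch: build a 30-day Search Analytics request,
// then group the response rows by keyword so each keyword carries the full
// list of pages that rank for it. Helper names are illustrative.

interface GscRow {
  keys: string[];       // [query, page] given the dimensions requested below
  clicks: number;
  impressions: number;
  ctr: number;
  position: number;
}

interface RankingPage {
  url: string;
  position: number;
  clicks: number;
  impressions: number;
  ctr: number;
}

// Body for POST /sites/{property}/searchAnalytics/query
function buildQuery(today: Date = new Date()) {
  const end = today.toISOString().slice(0, 10);
  const start = new Date(today.getTime() - 30 * 24 * 60 * 60 * 1000)
    .toISOString()
    .slice(0, 10);
  return { startDate: start, endDate: end, dimensions: ["query", "page"], rowLimit: 25000 };
}

// keyword -> every page on the property that ranked for it in the window
function groupByKeyword(rows: GscRow[]): Map<string, RankingPage[]> {
  const grouped = new Map<string, RankingPage[]>();
  for (const row of rows) {
    const [query, url] = row.keys;
    const pages = grouped.get(query) ?? [];
    pages.push({ url, position: row.position, clicks: row.clicks, impressions: row.impressions, ctr: row.ctr });
    grouped.set(query, pages);
  }
  // Best position first, so the AI prompt sees the dominant page at the top
  for (const pages of grouped.values()) pages.sort((a, b) => a.position - b.position);
  return grouped;
}
```

The matching step after the merge is then just a lookup: each target keyword from the sheet is checked against this map, and keywords with no ranking data are flagged and dropped before the GPT-4o call.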
The 4-way routing is what makes this scalable. Each client's GSC property has its own service-account access and its own rate-limit budget – running them in parallel branches means the workflow doesn't choke on the slowest client.
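The routing itself is a plain lookup against the Client URLs tab. In n8n it's a router node rather than code, but the logic it encodes is roughly this (field names assumed from the sheet template):

```typescript
// Illustrative routing logic: pick the client branch for an edited keyword
// row by matching its client column against the Client URLs tab.
interface ClientRow { client: string; domain: string; gscProperty: string; }
interface KeywordRow { keyword: string; client: string; }

function routeToClient(row: KeywordRow, clients: ClientRow[]): ClientRow {
  const match = clients.find(c => c.client === row.client);
  if (!match) throw new Error(`No Client URLs entry for "${row.client}"`);
  return match; // downstream branch uses match.gscProperty for the GSC call
}
```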
The AI prompt is where the signal-to-noise problem gets solved. The prompt encodes the rules: 5+ pages from the same domain ranking for the keyword = High risk. 3-4 pages with overlapping intent = Moderate. 2 pages with clear dominance (one at position 1-3, one below 20) = Low or None. The prompt also gets fed the click + impression distribution, so it can tell the difference between "2 pages competing but one is winning" (often fine) and "2 pages competing and splitting the clicks" (real problem).
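The fixed shape the structured output parser enforces looks roughly like this – the exact field names are illustrative, but the structure mirrors the sheet columns described above:

```typescript
// Illustrative output contract for the GPT-4o step. The structured output
// parser rejects anything that doesn't fit, so the sheet always receives
// the same columns no matter how the model phrased its answer.

type RiskLevel = "High" | "Moderate" | "Low" | "None";

interface CompetingUrl {
  url: string;
  position: number;
  clicks: number;
  impressions: number;
  ctr: number;
}

interface CannibalizationAssessment {
  keyword: string;
  riskLevel: RiskLevel;        // High: 5+ same-domain pages ranking
  reasoning: string;           // 1-2 sentences on which pages compete and why it matters
  observations: string[];      // specific data points the model noticed
  remediationSteps: string[];  // consolidate / rewrite / differentiate / leave alone
  competingUrls: CompetingUrl[];
}
```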
What you own at handover
- The full n8n workflow file
- The Google Sheet templates (Client URLs tab + Target Keywords tab) with the column structure the workflow expects
- The GPT-4o prompt + structured output schema, documented in plain English
- A setup doc covering service account creation per client, GSC permission grants, OpenAI key setup, Sheets sharing
- A runbook for the common errors: GSC quota exceeded for a single client, AI parser failure when GPT-4o invents fields, sheet-row-conflict when two edits hit close together
- A Loom showing one keyword going through the full workflow
- Optional add-on: scale beyond 4 clients (the workflow can be parameterized to N clients with a different routing approach – mention if you have 5+ retainers)
- Optional add-on: bulk-import of historic target keywords (so you can audit 1,000+ legacy keywords in one batch run instead of one-at-a-time)
Why I can help
The mechanics themselves are simple: GSC API + GPT-4o + Google Sheets. The hard part is the risk-classification logic, which is where most cannibalization tools (including the well-funded ones) fail.
Three specific decisions in this workflow that took iteration:
- The "is this actually cannibalization or just multiple pages doing different jobs" check. GPT-4o is good at this if you give it the click + impression + position distribution. Naive versions just count pages ("3 pages = cannibalization") and produce 80% false positives.
- The 30-day window. Too short (7 days) and the data is too noisy – Google's SERP shuffles a lot week-to-week. Too long (90 days) and you miss recent cannibalization. 30 days is the sweet spot for B2B SaaS sites; we'd retune for sites with very different traffic profiles.
- The structured output parsing. GPT-4o's natural output drifts in format – sometimes risk levels are "High / Med / Low", sometimes "1/2/3/4", sometimes "Critical / Warning / OK / Fine". The structured output parser locks the shape so the sheet always gets exactly the same column structure regardless of how GPT-4o phrased the response.
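For a sense of what the first of those decisions looks like in practice, here's a stripped-down version of the kind of classification prompt involved. Treat it as a sketch – the production prompt is longer and gets tuned against real examples from your portfolio:

```typescript
// Skeleton of the risk-classification prompt. The thresholds are the ones
// described above; the tuned version adds portfolio-specific examples.
const systemPrompt = `
You are an SEO analyst assessing keyword cannibalization risk.
You will receive one target keyword and every page on the client's domain
that ranked for it over the last 30 days, with position, clicks,
impressions and CTR for each page.

Classify the risk as High, Moderate, Low or None:
- High: 5+ pages from the same domain rank for the keyword.
- Moderate: 3-4 pages with overlapping intent.
- Low / None: 2 pages where one clearly dominates (top 3 vs. below 20),
  or where the pages serve clearly distinct intents.

Use the click and impression distribution to separate "two pages competing
but one is winning" (usually fine) from "two pages splitting the clicks"
(a real problem). Return a risk level, 1-2 sentences of reasoning,
specific observations, and concrete remediation steps.
`;
```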
These are the things that turn the workflow from "interesting demo" into "thing that runs unattended and produces output your strategist trusts". The setup phase tunes them for your specific portfolio.
What it costs to run
Per keyword: about $0.02-0.05 in OpenAI tokens (GPT-4o is the cost driver; the GSC API is free). For 200 keywords audited per month, that's roughly $4-10/month in OpenAI cost.
n8n hosting: about $10-15/month if self-hosted at the volume an agency uses.
Build cost: 1-2 weeks of my time. The AI prompt tuning is the part that takes the longest – it gets tuned against real cannibalization examples from your client portfolio so the risk-classification matches what your strategist would have called manually.
How to start
Book a call. Bring one client's target-keyword sheet + access to their GSC. We'll run the workflow on 10-20 keywords during the call so you can see what the output looks like for a site you already understand. From there, the rollout to additional clients is a setup per client (mostly service-account plumbing).
More SEO automations
Content decay detection
A weekly GSC pull that compares the last 7 days to the previous 28, classifies every page by decay severity, and sends a Slack + email digest of what to refresh.
Read the build →
Automated indexing for new pages
Walk your sitemap, ask GSC which URLs are still not indexed, and submit them to Google's Indexing API on a schedule.
Read the build →
SEO competitor research
Track competitor publishing cadence, ranking shifts, and content gaps automatically.
Page coming soon