How to Find Orphan Pages on Your Website

An orphan page is any page on your website that has no internal links pointing to it.

It might exist in your sitemap. It might even be indexed by Google. But because nothing on your site links to it, crawlers can only find it by going directly to the URL - there is no navigable path from the rest of your site.

For SEO, this creates two problems. First, orphan pages receive no internal PageRank. Every page that earns authority through backlinks and content quality can pass that authority to other pages through internal links - but not to orphans. They sit isolated, disconnected from your site’s link graph.

Second, Googlebot relies heavily on internal links to discover and recrawl content. A page with no internal links will be recrawled far less frequently than one embedded in your site’s structure. New content on that page takes longer to be indexed. Updates take longer to surface.

The good news is that orphan pages are findable, and fixing them is mostly a matter of adding the right internal links.


What Causes Orphan Pages

They accumulate in predictable ways:

Content published without being linked from anywhere. A blog post goes live and gets added to the sitemap, but no one adds a link to it from a related post, a relevant guide, or a category page. This is the most common cause on content-heavy sites.

Pages created for specific campaigns that were never removed. Examples include a landing page for a seasonal promotion, an event page, or a PPC-specific page. The campaign ends and the links pointing to the page are removed, but the page itself is never redirected or deleted.

Old content left behind during a redesign or navigation update. When nav menus are rebuilt, some pages that were previously reachable through the old navigation fall off. They stay in the sitemap but nothing in the new structure links to them.

Pagination pages and filtered views. Page 2, page 3, filtered search result pages - these are often generated dynamically but not explicitly linked from anywhere.

Tags and categories with very few posts. A tag page with one post, no description, and no links pointing to it is essentially an orphan that adds little value.


Step 1: Crawl Your Site to Map Reachable Pages

Start by running a full-site crawl that begins from your homepage and follows every internal link it finds. The output is the list of pages reachable through your internal link structure.

Using redCacti: Add your site and run a crawl.

Screenshot: adding a website in redCacti to crawl.

The pages report shows every URL discovered through internal links. Export this as a CSV.

Screenshot: redCacti crawl report showing crawled pages along with internal and external links, broken links, orphan pages, and link suggestions.

Using Screaming Frog (free up to 500 URLs): Enter your homepage URL and run a standard crawl. Export all crawled URLs from the Internal tab.

What this crawl represents: This approximates what a crawler such as Googlebot can reach by following internal links from your homepage. Every URL in this list, other than the homepage itself, has at least one internal link pointing to it. Every URL on your site that is not in this list is a candidate orphan.
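To make the idea of a "reachable" page concrete, here is a minimal sketch of a breadth-first crawl using only the Python standard library. It is not how redCacti or Screaming Frog work internally; the names `crawl_reachable`, `fetch_html`, and `max_pages` are made up for illustration, and the sketch ignores robots.txt, JavaScript-rendered links, and redirects.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_reachable(start_url, fetch=fetch_html, max_pages=500):
    """Follow same-host links breadth-first and return every URL reached."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # page could not be fetched: skip it
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host or absolute in seen:
                continue
            if len(seen) >= max_pages:
                return seen  # safety cap reached
            seen.add(absolute)
            queue.append(absolute)
    return seen
```

Calling `crawl_reachable("https://yoursite.com/")` would return the set of URLs that play the role of "column A" in the comparison later on. Passing a custom `fetch` function makes it possible to try the logic on saved HTML without touching the network.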


Step 2: Export Your Sitemap URL List

Your sitemap is your declared inventory of pages - the URLs you have told Google and other crawlers about.

Getting the sitemap URL list:

Most sites publish their sitemap at yoursite.com/sitemap.xml. Open this in a browser and you will see the list of URLs, or it may point to a sitemap index with multiple child sitemaps.

For a site with a large sitemap, use a tool to extract all URLs:

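# Extract all URLs from a sitemap using curl and grep.
# Note: -P needs GNU grep, and for a sitemap index this prints the child sitemap URLs, not page URLs.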
curl -s https://yoursite.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+'

Or paste the sitemap URL into a sitemap parser tool to get a clean list.

Export this list to a separate column in the same spreadsheet as your crawl results.
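If your sitemap is a sitemap index, the extraction has to follow each child sitemap. The sketch below shows one way to do that with Python's standard XML parser. The function names `parse_sitemap` and `collect_urls` are illustrative, and `fetch` is assumed to be any function you supply that returns the XML text for a URL.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) found in one sitemap document."""
    root = ET.fromstring(xml_text)
    pages, children = [], []
    for entry in root:
        kind = local_name(entry.tag)  # "url" in a sitemap, "sitemap" in an index
        for child in entry:
            if local_name(child.tag) != "loc" or not child.text:
                continue
            if kind == "sitemap":
                children.append(child.text.strip())
            elif kind == "url":
                pages.append(child.text.strip())
    return pages, children


def collect_urls(sitemap_url, fetch):
    """Follow sitemap index files recursively; fetch(url) must return XML text."""
    pages, children = parse_sitemap(fetch(sitemap_url))
    for child_url in children:
        pages.extend(collect_urls(child_url, fetch))
    return pages
```

The resulting list is the "column B" data: one page URL per entry, with child sitemaps already expanded.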


Step 3: Compare Crawl Results Against the Sitemap

The orphan pages are the URLs that appear in your sitemap but do not appear in your crawl results.

In a spreadsheet:

  1. Column A: all URLs from your site crawl
  2. Column B: all URLs from your sitemap
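  3. Column C: use VLOOKUP or COUNTIF to flag URLs in column B that have no match in column A. For example, enter this in C2 and fill it down:

=IF(COUNTIF($A:$A, B2)=0, "ORPHAN", "OK")

COUNTIF compares the cell text, so a URL listed with a trailing slash in one column and without it in the other will not match. Make sure both lists use the same protocol, host, and trailing-slash convention before comparing.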

Any row marked “ORPHAN” is a page that exists in your sitemap but cannot be reached by following internal links from your homepage.
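The same comparison can be sketched in Python as a set difference. The `normalize` helper below is an assumption about what counts as "the same URL" (lowercase scheme and host, no trailing slash, no fragment); adjust it if your site treats `/page` and `/page/` as different pages.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def find_orphans(sitemap_urls, crawled_urls):
    """Return sitemap URLs that the crawl never reached, sorted for stable output."""
    crawled = {normalize(u) for u in crawled_urls}
    declared = {normalize(u) for u in sitemap_urls}
    return sorted(declared - crawled)
```

Here `sitemap_urls` corresponds to column B and `crawled_urls` to column A; every URL returned is one the spreadsheet formula would have marked as an orphan.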

In redCacti:

The orphan pages report does this comparison automatically. Pages flagged as orphans are those present in your sitemap or discovered through other means but not reachable through your internal link graph.

Screenshot: redCacti Orphan Pages report listing crawled pages that were identified as orphans.

Because the comparison runs as part of the crawl, this is a convenient way to check for orphan pages regularly rather than as a one-off audit.

If you would like to try out redCacti for free, sign up here.


Step 4: Cross-Reference with Google Search Console

GSC adds a third data set: pages Google has actually indexed. This is valuable because it can reveal orphan pages that are not even in your sitemap.

Finding indexed pages not in your sitemap:

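In GSC -> Indexing -> Pages, look for indexed pages that were not submitted in a sitemap. Older versions of the report list these as “Indexed, not submitted in sitemap”; in the current report, the “Unsubmitted pages only” filter should show the same set. These pages are indexed but you have not declared them in your sitemap. Some may have internal links (fine); others may be genuine orphans that Google found through external links or URLs it crawled in the past.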

Finding indexed orphans:

The overlap you want to identify:

  • In GSC index: yes
  • In your sitemap: yes (or no)
  • In your crawl results: no

These are the highest-priority orphans - Google knows they exist but cannot reliably reach them through your site’s link structure.
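Expressed as set operations, the overlap looks like the sketch below. It assumes you have already loaded the three URL lists into Python sets with consistent formatting; the function name `indexed_orphans` is illustrative.

```python
def indexed_orphans(indexed, sitemap, crawled):
    """Map each indexed-but-uncrawled URL to whether it is also in the sitemap.

    All three arguments are sets of URLs:
      indexed - pages exported from the GSC indexing report
      sitemap - pages declared in the sitemap
      crawled - pages reached by the internal-link crawl
    """
    return {url: url in sitemap for url in sorted(indexed - crawled)}
```

A value of `False` in the result marks a page Google has indexed that is in neither your sitemap nor your internal link graph, which is the case the sitemap comparison alone cannot surface.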


Step 5: Classify Each Orphan

Once you have your list, classify each orphan before deciding what to do. Not every orphan is the same kind of problem.

Type A - Valuable content that should be linked: Good pages that simply got published without being linked from anywhere. These should get internal links added pointing to them from relevant pages. This is the most common type and the most straightforward fix.

Type B - Intentionally standalone pages: Privacy policy, terms of service, some legal pages. These may legitimately have no contextual internal links beyond a footer link. Check whether a footer or navigation link counts as an internal link in your crawl - it often does. If these pages show as orphans, it may mean your footer links are not being followed by the crawler.

Type C - Old campaign and landing pages: Pages that served a past purpose but have no ongoing value. Assess whether to: delete and redirect to a relevant page, keep with a noindex tag, or link from a relevant archive or resources page.

Type D - Duplicate or thin content: Near-duplicate pages, very thin pages with no unique content, or generated pages (empty tag/category pages). These are candidates for noindex or deletion rather than internal linking.

Type E - Pages that should not be indexed: Internal tools, admin-adjacent pages, draft content that was published by mistake. These should have a noindex tag added or be password protected, not fixed with internal links.
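If you track the classification in a script rather than a spreadsheet, a simple lookup table is usually enough. The sketch below paraphrases the five types above into short action labels; the classification itself is still a manual judgment, and the names `ACTIONS` and `build_plan` are illustrative.

```python
from collections import defaultdict

# Action labels paraphrase the Type A-E descriptions above.
ACTIONS = {
    "A": "add internal links from relevant pages",
    "B": "confirm footer/navigation links are being crawled",
    "C": "redirect, noindex, or link from an archive page",
    "D": "noindex or delete",
    "E": "noindex or password-protect",
}


def build_plan(classified):
    """Group URLs by recommended action.

    `classified` maps each orphan URL to the type letter you assigned it.
    """
    plan = defaultdict(list)
    for url, orphan_type in sorted(classified.items()):
        plan[ACTIONS[orphan_type]].append(url)
    return dict(plan)
```

Grouping by action rather than by type keeps the output close to a to-do list: one entry per kind of fix, with the affected URLs underneath.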


A Simpler Mental Model

Think of your website as a building. Internal links are the corridors. A page with no internal links pointing to it is a room with no corridor connecting it to the rest of the building. A visitor who finds the room from outside (a direct URL) can enter, but anyone walking through the building will never discover it exists.

Fixing orphan pages means adding corridors - placing internal links on relevant pages that naturally lead readers (and crawlers) to the orphaned content.


How Many Orphan Pages Is Normal?

Our analysis of 95 enterprise SaaS companies found orphan pages across virtually every site audited - even well-maintained ones. The issue is not whether you have orphan pages (you almost certainly do) but the proportion.

A site with 200 blog posts and 15 orphans is in reasonable shape. A site with 200 posts and 80 orphans has a structural problem - it means a significant portion of its content is disconnected from the internal link graph.

As a rough benchmark: if more than 15-20% of your indexed pages have no internal links, orphan page cleanup should be a priority.
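The arithmetic behind those examples is a simple ratio. The sketch below uses 15% (the lower end of the range above) as the default threshold; that default and the function names are choices made for illustration.

```python
def orphan_share(orphan_count, total_pages):
    """Fraction of pages that have no internal links pointing to them."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return orphan_count / total_pages


def needs_cleanup(orphan_count, total_pages, threshold=0.15):
    """True when the orphan share exceeds the rough benchmark."""
    return orphan_share(orphan_count, total_pages) > threshold
```

With the numbers from the examples, 15 orphans out of 200 posts is 7.5%, below the benchmark, while 80 out of 200 is 40%, well above it.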


Summary Checklist

  • Run full-site crawl starting from homepage - export URL list
  • Export sitemap URL list
  • Compare the two lists to identify URLs in sitemap but not in crawl
  • Cross-reference with GSC indexed pages for additional orphans
  • Classify each orphan: valuable content / intentional standalone / old campaign / thin content / should not be indexed
  • Prioritise orphans with existing backlinks or historical traffic
  • Plan internal linking fixes for Type A orphans
  • Plan deletion or noindex for Type C, D, E orphans


Orphan pages are one of the easier SEO wins available on most sites because the fix - adding internal links - does not require content creation, technical changes, or external outreach. It is purely a matter of connecting what you already have.

The challenge is finding them systematically. Manual browsing will not surface orphans, because by definition there is no link to follow to them.

Find orphan pages on your site ->

The free sitemap audit compares your sitemap against crawl data to surface orphan pages alongside broken links.


Also in this series: How to Fix Orphan Pages That Google Can’t Find - How to Identify Pages Missing from Your Sitemap


redCacti Team

The team behind redCacti - helping websites improve their SEO through better internal linking.
