Internal Linking Audit: A Step-by-Step Guide

How long has it been since you checked if your internal links actually make sense?

Key Takeaways

  • Step 1: Crawl Your Site
  • Step 2: Find Your Orphan Pages
  • Step 3: Analyze Link Depth and Distribution
  • Step 4: Check for Broken and Redirected Internal Links
  • Step 5: Evaluate Your Anchor Text
  • Step 6: Map Your Content Clusters
  • Step 7: Build a Fix List and Prioritize

I'm not talking about running a quick broken link checker and calling it a day. I mean really sitting down, pulling up your site's link structure, and asking yourself whether the way pages connect to each other still reflects what your site is about and where you want visitors to go. If you're like most people, the answer is "never" or "maybe once, a long time ago." That's not judgment — it's just reality. Internal links are one of those things that accumulate quietly, like dust on a shelf. You don't notice the mess until you really look.

So let's look. Here's how to run an internal linking audit that actually tells you something useful. Fair warning: some of these steps are quick and some will eat up an entire afternoon. That's just how it goes.

Step 1: Crawl Your Site

You need a crawler. You probably already know about Screaming Frog — it's the industry standard and it's been around long enough that most SEO professionals could run it in their sleep. The free version handles up to 500 URLs, which is fine for smaller sites. If you've got more than that, you'll need the paid license or an alternative like Sitebulb or the crawler built into Ahrefs. There are also open-source options like Scrapy if you're comfortable with Python, though setting those up takes more time than most people want to spend on this.

Point your crawler at your homepage and let it go. What you're doing is simulating what a search engine does — following every internal link on every page, building a map of your entire site from the inside out. This takes anywhere from thirty seconds for a small site to several hours for a large one. Let it finish completely. Don't stop it early because you're impatient. You need the full picture. This ties directly into How to Fix Orphan Pages with Internal Links, which is worth reading next.
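If you'd rather see what a crawler is actually doing (or you're going the do-it-yourself Python route), here's a minimal breadth-first sketch using only the standard library. It's an illustration, not a replacement for Screaming Frog or Sitebulb: it assumes a caller-supplied `fetch` function (hypothetical; in practice you might wrap `urllib.request.urlopen`), and it ignores robots.txt, rate limiting, nofollow, canonical tags, and JavaScript-rendered links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl of internal links, starting at start_url.

    fetch(url) is a caller-supplied function returning the page's HTML as a
    string, or None if the page can't be fetched. Returns (depth, inlinks):
      depth[url]   -> fewest clicks from the start page (start page = 0)
      inlinks[url] -> set of internal pages that link to url
    """
    host = urlparse(start_url).netloc
    depth = {start_url: 0}
    inlinks = {start_url: set()}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        html = fetch(page)
        if html is None:
            continue  # e.g. a 404; the URL stays in the graph with its inlinks
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and strip #fragments.
            target, _fragment = urldefrag(urljoin(page, href))
            if urlparse(target).netloc != host or target == page:
                continue  # skip external links, mailto: links, and self-links
            inlinks.setdefault(target, set()).add(page)
            if target not in depth:
                # BFS reaches each page first via a shortest path,
                # so this is its click depth.
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth, inlinks
```

Breadth-first order is the important design choice: because each page is first reached along a shortest path, the recorded depth is the click depth you'll look at in Step 3, and the `inlinks` sets are what Steps 2 and 3 count.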

While that's running, I want to take a quick detour and talk about why crawling from your homepage specifically matters. Your homepage is almost certainly the most-linked-to page on your site, both internally and externally. It's where most visitors land first, it's usually the page with the most authority, and it's the natural root of your navigation. When you crawl from the homepage, you're approximating how a search engine discovers your site by following links: starting from the strongest point and working outward. If a page can't be reached by following links from the homepage, that's a significant finding. It doesn't necessarily mean the page is invisible to Google (it might be in your sitemap, or external sites might link to it directly), but it means your internal architecture isn't supporting it.

Some people like to also crawl from their XML sitemap, which gives you a list of every URL you've told search engines about. Comparing the sitemap crawl to the homepage crawl can reveal interesting discrepancies — pages in the sitemap that aren't linked to from anywhere, pages that are well-linked but missing from the sitemap. Both scenarios are problems worth knowing about.
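Once you have both URL lists, that comparison is just set arithmetic. A minimal sketch, assuming a standard XML sitemap (not a sitemap index file) and a set of URLs from your homepage crawl. In practice you'd also normalize URLs (trailing slashes, http versus https) before comparing; this sketch only trims whitespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def compare(sitemap, crawled):
    """Split URLs into the two discrepancy buckets described above."""
    return {
        # Candidates for orphan pages: you told search engines about them,
        # but nothing reachable from the homepage links to them.
        "in_sitemap_not_linked": sorted(sitemap - crawled),
        # Linked internally but missing from the sitemap.
        "linked_not_in_sitemap": sorted(crawled - sitemap),
    }
```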

Step 2: Find Your Orphan Pages

Once the crawl is done, the first thing to look for is orphan pages. These are pages that exist on your site but have zero internal links pointing to them. They're islands. Disconnected from everything. And they're shockingly common.

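A crawl that only follows links can't find these on its own, because by definition nothing links to them. You have to compare the crawl against other lists of URLs you know exist: your XML sitemap, landing pages from your analytics, or Search Console data. In Screaming Frog, that means supplying those sources alongside the crawl and running crawl analysis, which populates its orphan URL filters. In Sitebulb, there's a dedicated orphan pages report that makes it even easier. However you find them, the output is going to be a list of URLs that nothing on your site links to.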

Now here's where it gets interesting, and where I tend to go down a rabbit hole every single time. Not all orphan pages are equal. Some are genuinely forgotten content — old blog posts that fell off the front page and were never linked to from anywhere else, product pages for discontinued items, landing pages from campaigns that ended two years ago. These are easy decisions. Either link to them from somewhere relevant, redirect them to a better page, or just remove them entirely.

But some orphan pages are surprising. Sometimes you'll find a page that gets significant organic traffic despite having no internal links. This happens when external sites link to it directly, giving it enough authority to rank on its own. These pages are gold mines in a way: they're already performing, and adding internal links to them (and from them to other relevant pages) could boost their performance even further while spreading some of that externally earned authority to the rest of your site.

I once found a blog post during an audit that was driving 15% of a client's total organic traffic, and literally nothing on the site linked to it. Not the homepage, not the blog index, not a single other post. It was ranking purely on backlinks from a Reddit thread that went viral. We added it to the site's main content hub, linked to it from five related posts, and linked from it to the client's money pages. Traffic to those money pages went up measurably within a few weeks. That's not a guaranteed outcome (nothing in SEO ever is), but it illustrates why orphan page discovery matters.
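To surface orphans like that one quickly, join your orphan list against an analytics export so the pages already earning traffic float to the top. A sketch, assuming a CSV with hypothetical `url` and `sessions` columns (rename them to match whatever your analytics tool actually exports):

```python
import csv
import io


def rank_orphans(orphans, analytics_csv):
    """Sort orphan URLs by organic sessions, highest first.

    analytics_csv is CSV text with (assumed) columns 'url' and 'sessions'.
    Orphans missing from the export are treated as having 0 sessions.
    """
    sessions = {}
    for row in csv.DictReader(io.StringIO(analytics_csv)):
        sessions[row["url"]] = int(row["sessions"])
    ranked = [(url, sessions.get(url, 0)) for url in orphans]
    # Highest traffic first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

The top of the list is where you add links; the zero-traffic tail is where you decide between linking, redirecting, or removing.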

The less exciting orphan pages are the ones that shouldn't exist at all. Old tag pages, parameter-generated URLs, test pages someone forgot to delete, pages created by plugins that nobody asked for. WordPress is notorious for creating pages you didn't know about: author archives, date-based archives, attachment pages for every image you've ever uploaded. These phantom pages aren't just useless; they're actively wasteful. On a large site, every one of them that Google crawls uses up part of your crawl budget, and a pile of thin, near-empty URLs can dilute your site's overall quality signals. Get rid of them. Noindex them, redirect them, delete them. Whatever makes sense for each case. We cover this in more detail in The Ultimate Guide to Internal Linking for SEO.
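A rough first pass at spotting these phantom URLs is pattern matching on the path. The patterns below are illustrative assumptions based on common WordPress permalink defaults (tag, author, and date archives, plus explicit attachment URLs); check them against your own permalink settings, and review every flagged URL by hand before noindexing or deleting anything.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns for common WordPress-generated URLs (assumed
# defaults; adjust to your permalink structure).
PHANTOM_PATTERNS = {
    "tag archive": re.compile(r"^/tag/"),
    "author archive": re.compile(r"^/author/"),
    # /2019/, /2019/05/, /2019/05/12/
    "date archive": re.compile(r"^/\d{4}/(\d{2}/){0,2}$"),
    # Only catches the explicit '/attachment/' form; many WordPress
    # attachment URLs have no such marker.
    "attachment page": re.compile(r"/attachment/"),
}


def classify_phantoms(urls):
    """Map each URL that looks auto-generated to a reason; others are skipped."""
    flagged = {}
    for url in urls:
        parts = urlparse(url)
        if parts.query:
            # e.g. ?replytocom= or ?attachment_id=; some parameter URLs are
            # legitimate, so treat this as "review", not "delete".
            flagged[url] = "parameter URL"
            continue
        for reason, pattern in PHANTOM_PATTERNS.items():
            if pattern.search(parts.path):
                flagged[url] = reason
                break
    return flagged
```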

This step alone, honestly, could take you half a day on a site of any real size. It's worth the time. Orphan pages are probably the single most common structural problem I see in audits, and fixing them has a surprisingly large impact relative to the effort involved.

Step 3: Analyze Link Depth and Distribution

Now we get into the stuff that requires a little more patience and a little more thought. Link depth is the number of clicks it takes to reach a page from the homepage. A page linked directly from the homepage has a depth of 1. A page linked from a page that's linked from the homepage has a depth of 2. And so on.

Pull up the crawl depth report from your crawler. Most decent SEO crawlers will show you a distribution — how many pages are at depth 1, depth 2, depth 3, and beyond. What you want to see is a pyramid shape, or something close to it. A relatively small number of top-level pages at depth 1 (your main categories, your most important content), a larger number at depth 2 (subcategories, key blog posts), and the bulk of your content at depth 3 or 4. What you don't want to see is a large chunk of pages at depth 5, 6, 7, or deeper. Those deep pages are hard for crawlers to find and they receive very little internal link equity.
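If your crawler can export a URL-to-depth mapping, the distribution check is a simple count. A sketch, with the depth-5 cutoff taken from the paragraph above (treat it as a tunable assumption, not a rule):

```python
from collections import Counter


def depth_report(depth, too_deep=5):
    """Summarize a {url: click_depth} map.

    Returns (histogram, deep_pages): page counts per depth, sorted by depth,
    and the URLs at or beyond the too_deep threshold.
    """
    histogram = dict(sorted(Counter(depth.values()).items()))
    deep_pages = sorted(url for url, d in depth.items() if d >= too_deep)
    return histogram, deep_pages
```

A healthy site's histogram should rise and then taper off around depth 3 or 4; a long tail past the threshold is your list of pages that need shortcuts from hub pages.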

But — and this is where I always start second-guessing the simple version of this advice — depth isn't purely a function of hierarchy. It's a function of linking. If your homepage links to a blog post directly, that post is at depth 1 regardless of where it sits in your content hierarchy. Internal links create shortcuts that can dramatically reduce the effective depth of pages that are hierarchically deep. This is why hub pages, "best of" roundups, and curated resource pages are so powerful from an architectural standpoint. They create direct pathways from high-authority, shallow pages to deep content that would otherwise be buried.

Link distribution is the related but distinct question of how evenly internal links are spread across your pages. In most sites, the distribution is extremely uneven. The homepage and a few popular pages have hundreds of internal links. Most pages have a handful. And some pages — the orphans we already discussed — have none. A certain amount of unevenness is natural and even desirable; you want your most important pages to have the most internal links. But extreme imbalances suggest a structural problem.

Look for pages that should be important but have very few internal links. These are often your "money pages" — product pages, service pages, conversion-oriented landing pages. Ironically, these are the pages that get the least editorial attention because they're not blog posts or news articles that get regularly published and linked to. Your blog might produce three new posts a week, each one getting links from the homepage feed, from category pages, from related posts widgets. Meanwhile, your core service page hasn't received a new internal link in six months. This imbalance is almost universal, and correcting it is one of the highest-impact things you can do in an internal linking audit.

I'd suggest creating a simple spreadsheet at this point. List your most important pages — the ones you most want to rank — and note how many internal links each one has. Then compare that to your average. If your average page has 15 internal links and your most important page has 4, something needs to change. If you want to go further, How to Create a Site Architecture That Search Engines Love has you covered.
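The spreadsheet comparison can also be sketched in a few lines. Here `ratio=0.5` is an assumed threshold for "well below average", and the example in the test mirrors the 15-versus-4 case above.

```python
def underlinked(inlink_counts, important, ratio=0.5):
    """Flag important pages whose inlink count falls below ratio * site average.

    inlink_counts: {url: number of internal links pointing at url}
    important:     URLs you most want to rank
    ratio:         assumed threshold; 0.5 means "fewer than half the average"
    Returns (average, [(url, count), ...]) for the flagged pages.
    """
    if not inlink_counts:
        return 0.0, []
    average = sum(inlink_counts.values()) / len(inlink_counts)
    flagged = [
        (url, inlink_counts.get(url, 0))
        for url in important
        if inlink_counts.get(url, 0) < ratio * average
    ]
    return average, flagged
```

Because the homepage and navigation-heavy pages inflate the mean, you might prefer the median (`statistics.median`) as the baseline on sites with very uneven distributions.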

Step 4: Check for Broken and Redirected Internal Links

This one's straightforward in concept but tedious in practice. Broken internal links — links that point to pages returning 404 errors — are obviously bad. They waste crawl budget, they create dead ends for users, and they squander whatever link equity the linking page was trying to pass. Every SEO crawler will flag these for you. Fix them. Either update the link to point to the correct URL, redirect the broken URL to the right destination, or remove the link entirely if the content it pointed to no longer exists.

Redirected internal links are a subtler problem. These are links that point to URLs that don't 404 but do redirect, usually with a 301 from an old URL to a new one. The link still works, technically. The user ends up in the right place. But there's a small efficiency loss. Every redirect adds a tiny bit of latency (a chain of several redirects adds more), and while Google says 301 redirects pass link equity, there's long-standing debate about whether some equity is lost in the process. More importantly, having a lot of internal redirects is just sloppy. It suggests a site that's been restructured without updating its internal links, which is a maintenance problem that tends to compound over time.

The fix is easy: update the links to point directly to the final destination URL instead of the old URL that redirects. It's just tedious because you have to find every instance of the old URL across your entire site and update it. If you're on WordPress, a search-and-replace plugin like Better Search Replace can handle this in bulk. If you're on a custom CMS, you might need to hit the database directly. Either way, it's unglamorous work but it tightens up the technical health of your link structure.
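Before you start search-and-replacing, it helps to have the whole fix list in one place. A sketch, assuming your crawler gives you the internal links as (source, target) pairs, each URL's status code, and each redirect's target; those input shapes are hypothetical, not any particular crawler's export format, and `max_hops=10` is an arbitrary safety cap against loops.

```python
def resolve(url, redirects, max_hops=10):
    """Follow a {old_url: new_url} redirect map to the final URL.

    Returns (final_url, hops). Raises ValueError on a loop or overly long chain.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
    return url, hops


def link_fixes(links, status, redirects):
    """Build a fix list for broken and redirected internal links.

    links:     iterable of (source_page, target_url) pairs from a crawl
    status:    {url: HTTP status code} for every crawled URL
    redirects: {url: redirect target} for URLs that returned a 3xx
    Returns (source, old_target, action, new_target) tuples.
    """
    fixes = []
    for source, target in links:
        code = status.get(target)
        if code in (404, 410):
            fixes.append((source, target, "broken: fix, redirect, or remove link", None))
        elif code is not None and 300 <= code < 400:
            final, hops = resolve(target, redirects)
            note = "redirect chain" if hops > 1 else "redirect"
            fixes.append((source, target, f"{note}: point link at final URL", final))
    return fixes
```

Each redirect row tells you exactly which old URL to search for and what to replace it with, which is the input a bulk search-and-replace needs.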

Step 5: Evaluate Your Anchor Text

Anchor text — the clickable text of a link — is something people obsess over for external links but largely ignore for internal links. That's a mistake. The anchor text of your internal links tells search engines what the target page is about. If every internal link to your "hiking boots" category page uses the anchor text "click here," you're wasting a signal. If those links instead use anchor text like "hiking boots," "best hiking boots for beginners," or "our hiking boot collection," you're giving Google clear information about the page's topic.

During your audit, export the anchor text data from your crawler. Look at the most-linked-to pages and examine what anchor text is being used. You're looking for a few things here. First, is the anchor text relevant? Does it describe what the target page is actually about? Second, is there reasonable variety? Using the exact same anchor text for every link to a page looks unnatural, though for internal links this is less of a concern than it is for external links. Third, are there instances of meaningless anchor text — "click here," "read more," "learn more," "this page" — that could be replaced with something descriptive?
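A rough way to run those three checks over an anchor text export. The generic-phrase list is just the four examples above, and the input shape (target URL, anchor text pairs) is an assumption about how you've exported the data.

```python
import re
from collections import Counter, defaultdict

# The generic phrases called out above; extend as needed.
GENERIC_ANCHORS = {"click here", "read more", "learn more", "this page"}


def anchor_report(links):
    """Summarize anchor text per target page.

    links: iterable of (target_url, anchor_text) pairs.
    Returns {target: {"anchors": Counter, "generic": int, "variety": int}}.
    """
    report = defaultdict(lambda: {"anchors": Counter(), "generic": 0})
    for target, anchor in links:
        # Normalize case and whitespace so "Read  More" matches "read more".
        text = re.sub(r"\s+", " ", anchor).strip().lower()
        entry = report[target]
        entry["anchors"][text] += 1
        if text in GENERIC_ANCHORS:
            entry["generic"] += 1
    for entry in report.values():
        entry["variety"] = len(entry["anchors"])  # number of distinct anchors
    return dict(report)
```

Reading the `anchors` counter answers the relevance question, `variety` answers the variety question, and a high `generic` count marks the links worth rewriting first.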

Don't go crazy with this. You don't need to hand-craft every single anchor text on your site. But you should make sure your most important pages are being linked to with relevant, descriptive anchor text at least most of the time. It's a small signal, but small signals add up.

Step 6: Map Your Content Clusters

This is the step where the audit transitions from technical inspection to strategic thinking. Content clusters — or topic clusters, or hub-and-spoke models, whatever you want to call them — are groups of related pages connected by internal links. The idea is that you have a central "pillar" page covering a broad topic, surrounded by supporting pages that cover specific subtopics in detail, all linked together. This connects to what we discuss in Pillar Pages and Topic Clusters: The Complete Guide.

Does your site actually have these clusters? Or does it just have a bunch of blog posts floating around independently? During your audit, pick a few of your most important topics and trace the internal links. Start at what should be the hub page and see where the links go. Then visit the pages that should be supporting content and see if they link back to the hub and to each other. In my experience, this is where most sites fall apart. They might have a great pillar page and great supporting content, but the links between them are sparse or nonexistent. The content exists but the cluster doesn't, because clustering is a function of links, not just topical relevance.

If you find that your clusters are poorly connected — and you almost certainly will — the fix is to go through each cluster and add the missing links. Every supporting page should link to the pillar page. The pillar page should link to every supporting page. Supporting pages should link to each other where it makes sense contextually. This creates a tight web of related content that signals topical authority to search engines and helps users find related information.
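For a cluster you've already defined by hand, the "every supporting page links to the pillar and the pillar links to every supporting page" rule is easy to check mechanically. A sketch, assuming you have a set of (source, target) link pairs from your crawl; links between supporting pages are deliberately left out, since whether they make sense contextually is an editorial call.

```python
def missing_cluster_links(links, pillar, supporting):
    """List the hub-and-spoke links a cluster is missing.

    links:      set of (source, target) internal link pairs
    pillar:     URL of the cluster's pillar page
    supporting: URLs of the cluster's supporting pages
    Returns the missing (source, target) pairs, in the order checked.
    """
    missing = []
    for page in supporting:
        if (page, pillar) not in links:
            missing.append((page, pillar))  # supporting page -> pillar
        if (pillar, page) not in links:
            missing.append((pillar, page))  # pillar -> supporting page
    return missing
```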

I've gone back and forth over the years about how formal this cluster mapping needs to be. Some SEO professionals create elaborate spreadsheets mapping every page to a cluster, tracking every link between them, color-coding by status. Others just eyeball it. I think the right approach depends on the size of your site. If you've got 50 blog posts, you can probably hold the cluster map in your head. If you've got 500, you need a spreadsheet. If you've got 5,000, you might need a proper visualization tool — something like Gephi or even a custom script that renders your internal link graph as a network diagram. Seeing the clusters visually can reveal patterns that spreadsheets hide.
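For the visualization route, the simplest way into Gephi is an edge-list CSV. A sketch, assuming Gephi's spreadsheet importer picks up `Source`, `Target`, and `Weight` column headers (worth confirming against your Gephi version); repeated links between the same two pages are collapsed into a single weighted edge.

```python
import csv
import io
from collections import Counter


def gephi_edges_csv(links):
    """Turn (source, target) link pairs into CSV text for a Gephi edge import."""
    weights = Counter(links)  # duplicate links become one edge with a count
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

Save the result with something like `pathlib.Path("edges.csv").write_text(gephi_edges_csv(links))` and import it as an edges table; Gephi's layout and community-detection features can then make the clusters (and the gaps between them) visible.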

There's something else worth mentioning here that I don't see discussed enough. When you're mapping clusters, you'll sometimes find pages that don't fit neatly into any cluster. They're about a topic that's tangentially related to several of your main themes but not squarely within any one of them. These interstitial pages are tricky. You can try to force them into a cluster, but that feels artificial and the links won't make much contextual sense. You can leave them unclustered, but then they risk becoming orphans. Or you can use them as bridges between clusters — pages that legitimately connect two different topic areas and link to both. That last option is often the best, but it requires some editorial judgment about whether the connections are genuine.

Step 7: Build a Fix List and Prioritize

By now you've got a lot of data and probably a long list of problems. Orphan pages. Pages buried too deep. Broken links. Redirect chains. Weak anchor text. Disconnected content clusters. The temptation is to try to fix everything at once. Don't. You'll burn out or make mistakes or both.

Instead, sort your findings by impact. The highest-impact fixes are usually the ones that affect your most important pages — the pages that drive the most traffic, the most conversions, the most revenue. If your top-performing blog post has no internal links to your product pages, that's a high-priority fix. If an obscure tag page from 2019 has a broken link, that can wait.

I generally work through the fix list in this rough order: broken links first (because they're actively harmful and easy to fix), orphan pages next (because they're either wasted assets or wasted crawl budget), then link depth and distribution issues (because they affect your most important pages' ability to rank), and finally anchor text and cluster optimization (because these are more about refinement than repair). There's more to get into, and How to Handle Links During a Site Migration is a great place to start.
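One way to combine the two rules above (the affected page's importance first, then the repair-before-refinement order) is a two-level sort. A sketch, assuming a findings list you've assembled yourself, a traffic lookup from analytics, and an arbitrary 1,000-session cutoff for "high traffic":

```python
# Category order from the paragraph above: repair before refinement.
PRIORITY = {
    "broken link": 0,
    "orphan page": 1,
    "depth/distribution": 2,
    "anchor text": 3,
    "cluster": 3,
}


def prioritize(findings, traffic, high_traffic=1000):
    """Sort findings: high-traffic pages first, then by category, then by traffic.

    findings:     list of (category, url, note) tuples
    traffic:      {url: monthly organic sessions}; missing URLs count as 0
    high_traffic: assumed session cutoff separating important pages
    """
    def key(finding):
        category, url, _note = finding
        sessions = traffic.get(url, 0)
        tier = 0 if sessions >= high_traffic else 1
        return (tier, PRIORITY[category], -sessions)

    return sorted(findings, key=key)
```

With this ordering, a missing product link on your top blog post outranks a broken link on an obscure tag page, while broken links still come first among pages of similar importance.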

Set up a recurring reminder to rerun this audit. Quarterly is ideal. Twice a year is acceptable. Once a year is the bare minimum. Sites change constantly — new content gets published, old content gets updated or removed, redesigns shuffle navigation around — and every change has the potential to introduce new linking problems. An internal linking audit isn't a one-time project. It's a practice. And if you approach it that way, each audit gets easier because you're catching problems early instead of letting them compound for years.

The truth about internal linking audits is that they're never really done. Every time you publish a new page, you're changing the link graph. Every time you update an old page, you might be adding or removing links without thinking about the structural implications. Every time your CMS updates and changes how it generates certain pages, your architecture shifts slightly. You could audit today, fix everything, and find new issues three months from now. That's not a sign of failure — that's just how websites work. They're living systems, always growing and shifting, and there's always more to find.

Written by Anurag Sinha

Web developer and technical SEO expert. Passionate about helping businesses improve their online presence through smart linking strategies.
