Find and Redirect Broken Links to Your Website

Have you ever tried Opensiteexplorer.com by Moz? It’s a tool that spiders and indexes the web, much like Google, Yahoo and Bing, but Moz uses its data to give webmasters information about links. Links are still one of the driving forces in rankings today.

So why am I telling you about the free 30-day trial they offer? Because you can get a quick export of your top pages and find any that return a “404 Page Not Found” status code. These are broken links: pages that no longer exist on your website but that other sites may still be linking to. By redirecting them to a live page on your website, you can reclaim some lost potential. Bonus: this is not just an SEO thing. Keeping old URLs working is good web practice that the W3C has long encouraged.

Identify the Problem Pages

[Image: Export Top Pages Opensiteexplorer]
  1. Enter your website URL.
  2. Navigate to Top Pages.
  3. Export the results as a .csv file.
  4. Once the export has finished, open the .csv file.

Map Where Those Old URLs Should Be Pointing

Sort by status code. We are looking for pages with a status code of 404, Page Not Found. Then sort by Number of Linking Root Domains. Sometimes Moz shows only 1 linking root domain, and that is your own domain. This is definitely worth investigating, but let’s start with the easy pages.
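If you would rather not sort by hand, the same filtering can be scripted. The snippet below is only a rough sketch: the column names ("URL", "HTTP Status Code", "Linking Root Domains") and the sample rows are assumptions, so check them against the header row of your own export before running it.

```python
import csv
import io

def find_broken_pages(csv_text, url_col="URL", status_col="HTTP Status Code",
                      domains_col="Linking Root Domains"):
    """Return (url, linking_root_domains) for every 404 row, most-linked first.

    The column names are assumptions; adjust them to match your export.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    broken = [
        (row[url_col], int(row[domains_col]))
        for row in rows
        if row[status_col].strip() == "404"
    ]
    # Most linked-to pages first, since they have the most value to reclaim.
    return sorted(broken, key=lambda pair: pair[1], reverse=True)

# Made-up sample data standing in for the exported file.
sample = """URL,HTTP Status Code,Linking Root Domains
www.example.com/,200,40
www.example.com/old-page.htm,404,12
www.example.com/gone.htm,404,1
"""

for url, domains in find_broken_pages(sample):
    print(domains, url)
```

To use it on a real export, read the file's contents into a string and pass that in place of the sample.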

Set up your spreadsheet with the old URLs in the first column, and the new URLs in the second column.
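The old URLs should be entered as relative URLs, not absolute ones.

A relative URL is the part of the address that follows the domain name. For example, in http://www.example.com/the-old/relative-url.htm (www.example.com is just a placeholder domain), the relative URL is /the-old/relative-url.htm.
[Image: Absolute vs Relative URL]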

The second column for the new URLs should be absolute URLs.

[Image: 301 Redirects Excel]
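If your export lists full addresses, trimming each one down to a relative URL by hand gets tedious. Here is a minimal sketch of doing it in Python with the standard library. The function name to_relative and the example.com address are made up for illustration, and query strings are simply dropped.

```python
from urllib.parse import urlsplit

def to_relative(url):
    """Strip the scheme and domain, keeping only the path."""
    # Exports sometimes omit the scheme; add one so urlsplit can find the host.
    if "://" not in url:
        url = "http://" + url
    # An address with no path at all is the home page.
    return urlsplit(url).path or "/"

print(to_relative("http://www.example.com/the-old/relative-url.htm"))
# prints /the-old/relative-url.htm
```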

Don’t know what used to exist on the page? Check whether a cached copy exists on the Wayback Machine.

[Image: Old Website on Wayback Machine]
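For a long list of URLs, the lookup can be scripted against the Wayback Machine's availability endpoint. Treat this as a sketch under assumptions: the response layout ("archived_snapshots", then "closest") is how I understand the service to answer, the helper names are my own, and the page address is a placeholder. Verify against a live response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"

def availability_url(page_url):
    """Build the lookup address for one old page."""
    return API + "?" + urlencode({"url": page_url})

def closest_snapshot(response):
    """Return the snapshot address from a parsed response, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def lookup(page_url):
    """Ask the service about one page (needs network access)."""
    with urlopen(availability_url(page_url), timeout=10) as resp:
        return closest_snapshot(json.load(resp))

print(availability_url("www.example.com/the-old/relative-url.htm"))
```

Calling lookup() with one of your old URLs should return the address of a cached copy if one exists, or None if it does not.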

Don’t have a cached copy? Try putting that URL back into Opensiteexplorer.com and clicking through to some of the pages that still link to it. The text around those links often tells you what the page was about.

If you can’t find a close match, that is okay. Redirect to the home page instead, or to a parent page that offers users similar content.

It is also okay to leave the 404 errors in place. If a page is truly no longer on your website, or if the tool is picking up a URL that you believe never existed, then leave it to return a 404 error. (These are third-party spiders showing you raw data. They sometimes get lost and report URLs that are just crawl errors on their part. You can ignore these.)

Once you have the URLs mapped out into two columns, you need to create some simple .htaccess code, with one line per redirect, such as:

Redirect 301 /the-old/relative-url.htm /the-new-url/

You may also speed this up with a spreadsheet formula like:

=CONCATENATE("Redirect 301 ",A2," ",B2)

After you’ve applied your formula to the whole list, do a SORT BY, Z-A. Why? Sorting from Z-A makes sure that a longer, more specific URL ends up above any shorter URL that it begins with.

Otherwise, if on line 1 you have:

Redirect 301 /the-old/ /new/

And on line 2 you have:

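Redirect 301 /the-old/relative-url.htm /the-new-url/

You won’t get what you expect. Redirect matches any URL that begins with the old path and appends the rest of the path to the new URL, and the first matching line wins. So users who request /the-old/relative-url.htm are caught by line 1 and end up at:
/new/relative-url.htm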
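To see why the order matters, here is a small Python sketch that builds the Redirect lines in Z-A order and then imitates the matching. The resolve function is a deliberately simplified model (plain prefix match, first rule wins), not Apache's actual implementation, and both function names are made up for this example.

```python
def build_redirects(pairs):
    """Turn (old_relative_url, new_url) pairs into Redirect lines, sorted Z-A by old URL."""
    ordered = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return ["Redirect 301 {} {}".format(old, new) for old, new in ordered]

def resolve(path, rules):
    """Simplified model: the first rule whose old path is a prefix of `path` wins."""
    for old, new in rules:
        if path.startswith(old):
            # Whatever follows the matched prefix is appended to the target.
            return new + path[len(old):]
    return None  # no rule matched

pairs = [
    ("/the-old/", "/new/"),
    ("/the-old/relative-url.htm", "/the-new-url/"),
]

print("\n".join(build_redirects(pairs)))

# Short rule first: the specific page is swallowed by the folder rule.
print(resolve("/the-old/relative-url.htm", pairs))

# Z-A order: the specific page is matched first, as intended.
z_to_a = sorted(pairs, key=lambda pair: pair[0], reverse=True)
print(resolve("/the-old/relative-url.htm", z_to_a))
```

The first resolve call returns /new/relative-url.htm and the second returns /the-new-url/, which is the difference the Z-A sort is there to guarantee.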

Create Redirects

[Image: Example 301 Redirects htaccess File]
Copy and paste this code into your .htaccess file. You will need to connect to your website by FTP to do this. You can use a third-party tool like FileZilla with the login credentials provided by your web host. Or, if your host offers a control panel like cPanel, it will usually include a built-in file manager that you can use instead.

NOTE: Be sure to create a backup of the .htaccess file in case you need to restore it. And don’t overwrite anything in there. Copy and paste your code at the bottom of the file, then save it.
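If you are working on a local copy of the file that you downloaded over FTP, the backup-then-append routine can be scripted. This is a minimal sketch, not a finished tool: the function name and the .bak suffix are my own choices, and you still have to upload the edited file yourself.

```python
import shutil
from pathlib import Path

def append_redirects(htaccess_path, redirect_lines):
    """Back up a local .htaccess copy, then add the redirect lines at the bottom."""
    htaccess = Path(htaccess_path)
    backup = htaccess.with_name(htaccess.name + ".bak")
    shutil.copy2(htaccess, backup)  # keep an untouched copy to restore from

    existing = htaccess.read_text()
    # Start on a fresh line if the file does not already end with one.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with htaccess.open("a") as f:  # "a" appends, so nothing already there is overwritten
        f.write(separator + "\n".join(redirect_lines) + "\n")
    return backup
```

Calling append_redirects(".htaccess", lines) in the folder that holds your local copy leaves a .htaccess.bak file beside it for a quick restore.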

Caution

After you have completed the above, make sure to click through your website and see whether you created any redirect problems. You may have a page stuck in what is known as an infinite loop, where it never stops redirecting. This can happen when a URL redirects to a second URL that in turn redirects back to the first. Be thorough and check your website manually or, if you feel comfortable, with an automated tool like Xenu’s Link Sleuth.
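You can also catch loops in your spreadsheet before anything goes live. The sketch below assumes you have reduced both columns to relative paths and treats each redirect as an exact match, which is simpler than what the server really does. The function names are hypothetical.

```python
def loops_forever(start, redirects):
    """Follow redirects from `start`; True if we ever come back to a path already visited."""
    seen = set()
    current = start
    while current in redirects:
        if current in seen:
            return True
        seen.add(current)
        current = redirects[current]
    return False  # reached a path that has no redirect, so the chain ends

def find_looping_paths(redirects):
    """List every old path whose redirect chain never ends."""
    return sorted(path for path in redirects if loops_forever(path, redirects))

# Made-up mapping: /a and /b point at each other, /c is fine.
mapping = {"/a": "/b", "/b": "/a", "/c": "/d"}
print(find_looping_paths(mapping))
```

For the mapping above this reports /a and /b. An empty list means no loops were found among the rows you supplied.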

Back up Your Work, and Have FTP Access to Your Website

It is possible to break the .htaccess file, especially if this is your first time changing it. The file format is very strict, and extra spaces or incorrectly formatted lines can take the whole site down with a server error. Please have the original copy backed up and available so that you can quickly restore it while you work out the kinks. As an additional word of caution, if you do break it and you were editing through a content management system like WordPress, you won’t be able to get back into the content management system. You will be forced to restore the backup by FTP, so make sure you have FTP access before doing this.

Estimated time: this depends on the size of your site and whether it has (or had) a blog, but plan on 2-5 hours.

Gains: minor – medium (if you find a heavily linked-to page that is now a broken link)

Advice: It may be easier and safer to have your web developer perform this maintenance activity for you. Feel free to share this blog post with him or her.

Please use the Comments section below for any questions you might have.
