How to Perform a Technical SEO Audit Like a Boss!
SEO is dead. SEO is not what it used to be in [Insert Year Here]. You might have come across a number of articles discussing these or similar topics. What has actually happened is that Google’s take-no-prisoners approach towards spam has made it harder for sites to rank using run-of-the-mill SEO tactics. Over the last few years, SEO has finally started heading towards a synergy between what the user/visitor wants to see and what a search engine like Google wants to see. Google’s mission statement reads, “Google’s mission is to organize the world’s information and make it universally accessible and useful”. “Useful” is the keyword here. Just throwing a bunch of $$$ at building links to your site doesn’t drive rankings anymore. Stuffing your content with keywords doesn’t help either. What does work is creating value and delivering an experience to both users and search engines. And it all starts with a thorough technical audit of the site to make sure that there are no bottlenecks affecting user experience or crawling of the site.

In this article we will walk you through the exact same steps that we follow here at Digital Marketing Sapiens. Let’s face it, we do add some secret sauce to it; drop a line if you want us to work our magic on your site.

Note – The audit assumes that you have set up and verified a Search Console (Google Webmaster Tools) account for your site(s). If you have not, please visit Search Console and follow the instructions to verify your site. Enough talking, let’s get our hands dirty, shall we?
Disillusionment Audit
The Disillusionment Charm makes you blend in with your surroundings, making it impossible for anyone to spot you or to communicate with you. The charm was used intentionally by Professor Dumbledore and, the fan favorite, Lord Voldemort when they didn’t want to be found. If you are wondering why people just can’t find your website even after all the links you have built, maybe the answer is that you have, however unintentionally, cast a Disillusionment Charm on your site. The steps mentioned below help you cast a counter-charm and make the website more visible to search engines and users alike.
Robots.txt File
The Robots.txt file controls how search engine bots can access a site. There are two major reasons why every site should have a properly configured Robots.txt file.
- To make sure that all the pages that need to be indexed are available for search engines.
- To make sure that private/sensitive pages do not get indexed accidentally.

How does the Robots.txt file work?

User-agent: *
Disallow: /
Explanation
- User-agent: Refers to the bot that is trying to access the site.
- *: Means the directive applies to all user-agents.
- Disallow: Means the bots should not visit the pages/directories mentioned here.
- /: Means the bot is not allowed to access any of the pages on the site.

Typically only the Disallow directive is used, followed by a directory path (for example, /dont-go-here/). A webmaster can specify different access permissions for different bots based on their user-agent string.

How to verify if it is working as expected?
- Try to access the file manually – Visit http://yourdomain.com/robots.txt and make sure that the file is accessible.
- Log in to your Search Console account. Navigate to Crawl => Robots.txt Tester in the left pane. You will see a window similar to the one below.
Enter your important pages in the box (highlighted) and click test. You need to make sure that all your important pages are accessible to the bots. The snapshot below shows the expected results.
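If you prefer to script this check, the sketch below uses Python’s standard urllib.robotparser module to test a handful of important URLs against your live robots.txt file. The domain and the page list are placeholders – swap in your own.

# Quick robots.txt sanity check (a minimal sketch; replace the example URLs with your own).
from urllib.robotparser import RobotFileParser

SITE = "http://yourdomain.com"           # placeholder domain
IMPORTANT_PAGES = [                      # pages that must stay crawlable
    "/",
    "/products/",
    "/blog/",
]

rp = RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()                                # fetches and parses the live robots.txt

for path in IMPORTANT_PAGES:
    allowed = rp.can_fetch("Googlebot", SITE + path)
    status = "OK" if allowed else "BLOCKED"
    print(f"{status:8} {path}")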
XML Sitemap
Simply put, an XML sitemap is a collection of all the pages in your site. It helps in improving the indexation of the site and also provides information about the site hierarchy to search engines.

How to verify if the sitemap is working as expected?
- Try to access the file manually – Visit http://yourdomain.com/sitemap.xml and make sure that the file is accessible without any errors. If the sitemap is missing, you can use a free tool like the one here or here. Click here for a comprehensive list of XML sitemap generator tools.
- Log in to Search Console and navigate to Crawl => Sitemaps in the left pane. Check if there are any issues with the sitemap(s) that you have submitted, or submit one if you haven’t already.
Pro-tip: Run the pages listed in the XML sitemap through Screaming Frog to make sure that there are no 301 redirects or 404 error pages present.
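If you don’t have Screaming Frog handy, a short script can do a rough version of the same check. The sketch below (placeholder sitemap URL, requests library assumed installed) parses the sitemap and reports every URL that redirects or returns an error.

# Rough sitemap health check: flag URLs that redirect or return errors.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "http://yourdomain.com/sitemap.xml"    # placeholder
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

sitemap = requests.get(SITEMAP_URL, timeout=10)
sitemap.raise_for_status()
urls = [loc.text.strip() for loc in ET.fromstring(sitemap.content).iter(NS + "loc")]

for url in urls:
    r = requests.get(url, allow_redirects=False, timeout=10)
    if r.status_code != 200:                         # 3xx, 4xx and 5xx all deserve a look
        print(r.status_code, url)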
HTML Sitemap
An HTML sitemap is, again, a collection of all the important links in the site. A well-defined HTML sitemap helps visitors locate content on your site.

How to check if the HTML sitemap is working as expected?
Try to access the page manually and make sure it is accessible.

Pro-tip: Run the pages listed in the HTML sitemap through Screaming Frog to make sure that there are no 301 redirects or 404 error pages present.
Canonicalization
Canonicalization refers to making sure that each page/file on the site is accessible through only one URL.

How to make sure that canonicalization is working as expected?
Check if the pages on your site are accessible through multiple URL combinations. For example,
http://www.yourdomain.com/
http://www.yourdomain.com/index.html
http://yourdomain.com/
http://yourdomain.com/index.html
Serving the same content through multiple URLs results in duplicate content issues, and search engines do not like being presented with duplicate content.

How to fix canonicalization issues?
- Set the preferred domain – Log in to Search Console. Navigate to Site Settings (click the gear icon) and, in the preferred domain section, select the option you want.
- Set up 301 redirects – Once you decide on your preferred domain, make sure that you redirect all the other variations to the preferred one. For the example mentioned above:
  http://www.yourdomain.com/ – Preferred domain
  http://www.yourdomain.com/index.html – 301 redirect to preferred domain
  http://yourdomain.com/ – 301 redirect to preferred domain
  http://yourdomain.com/index.html – 301 redirect to preferred domain
- Use the rel=canonical directive – Make sure that each page on your site points to itself through a rel=canonical directive. Refer to the screenshot below to see it in action.
A self-referencing canonical also helps in combating scrapers and/or avoiding duplication issues if you use query parameters for tracking or other purposes, resulting in multiple URLs serving the same page content. To learn more about various canonicalization issues and their solutions click here.
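For a quick scripted spot-check, the sketch below (placeholder URLs, requests assumed installed) requests the common URL variations, confirms that the non-preferred ones 301 to the preferred domain, and pulls out the rel=canonical tag with a simple regex. It is a crude check, not a substitute for a full crawl.

# Spot-check canonicalization: redirect behaviour plus the rel=canonical tag.
import re
import requests

PREFERRED = "http://www.yourdomain.com/"             # placeholder preferred domain
VARIATIONS = [
    "http://www.yourdomain.com/index.html",
    "http://yourdomain.com/",
    "http://yourdomain.com/index.html",
]

for url in VARIATIONS:
    r = requests.get(url, allow_redirects=True, timeout=10)
    redirected = bool(r.history) and r.history[0].status_code == 301
    print(f"{url} -> {r.url} (301: {redirected})")

# Crude rel=canonical extraction for the preferred page itself.
html = requests.get(PREFERRED, timeout=10).text
match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)', html, re.I)
print("rel=canonical:", match.group(1) if match else "not found")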
Meta Robots Tags
These tags are included within the page code and act as directives to control the behavior of search engine crawlers at the page level. There are four different combinations of meta robots tags.
- <meta name="robots" content="INDEX, FOLLOW"> – Default tag, assumed to be present even when not explicitly defined.
- <meta name="robots" content="NOINDEX, FOLLOW"> – Asks the crawlers not to index the page but to follow all the pages linked from it.
- <meta name="robots" content="INDEX, NOFOLLOW"> – Asks the crawlers to index the page but ignore (nofollow) the pages linked from it.
- <meta name="robots" content="NOINDEX, NOFOLLOW"> – Neither is the page to be indexed nor are the links to be crawled.

What are the implications of incorrect meta robots tags?
An incorrectly specified meta robots tag might cause the page to drop out of the search engine index or might affect the crawling and indexing of the pages linked from it. An unintentional (NOINDEX, NOFOLLOW) tag on the homepage might cause the entire site to fall out of the search engine index.

How to verify if the tags are working as expected?
- Fire up the Screaming Frog tool. Set the mode to Spider and enter the site you want to crawl.
- Once the crawl completes, navigate to the “Directives” tab and click “Export”.
- Open your exported CSV file and look at the “meta-robots” column to verify that you have set up the tags correctly.
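To spot-check a single page without a full crawl, the sketch below (placeholder URL, requests assumed installed) parses the page with Python’s built-in HTMLParser and prints the value of any meta robots tag it finds.

# Print the meta robots directive for one page, if present.
import requests
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.robots = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots = attrs.get("content")

url = "http://yourdomain.com/some-page/"             # placeholder
parser = MetaRobotsParser()
parser.feed(requests.get(url, timeout=10).text)
print(parser.robots or "No meta robots tag (INDEX, FOLLOW is assumed by default)")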
4XX Errors
These errors refer to requests that the server cannot fulfill because of bad syntax or some other fault on the part of the client making the request. Hence, these are also known as client-side errors. There are a number of 4XX errors, with 404 being the most common when it comes to website usability. Having a large number of 404 errors is generally considered a bad thing, as it negatively affects user experience and also impacts proper crawling and indexing of the site.

How to check and fix 404 errors?
- Run a XENU scan and check if any 404 errors (both internal and external) are present in the site.
- Log in to Search Console and navigate to Crawl => Crawl Errors to see if Google has encountered any errors on the site.
- List out all the 404 errors and see if appropriate redirects can be set up for those.
5XX Errors
These errors refer to some issue on the server side due to which the server is unable to fulfill the request. These are also known as server-side errors. A large number of 5XX errors indicates low uptime on the server side.

How to check for 5XX errors?
Log in to Search Console and navigate to Crawl => Crawl Errors to see if Google has encountered any errors on the site.
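Both the 4XX and 5XX checks boil down to looking at HTTP status codes. The sketch below (requests assumed installed, URL list is a placeholder – feed it your sitemap or crawl export) buckets every URL into client-side (4XX) and server-side (5XX) errors.

# Bucket a list of URLs by HTTP status family (4XX = client-side, 5XX = server-side).
import requests

URLS = [                                             # placeholder list; use your own export
    "http://yourdomain.com/",
    "http://yourdomain.com/old-page/",
    "http://yourdomain.com/contact/",
]

client_errors, server_errors = [], []
for url in URLS:
    try:
        code = requests.get(url, timeout=10).status_code
    except requests.RequestException as exc:
        print("Request failed:", url, exc)
        continue
    if 400 <= code < 500:
        client_errors.append((code, url))
    elif code >= 500:
        server_errors.append((code, url))

print("4XX errors:", client_errors)
print("5XX errors:", server_errors)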
Site Depth
Site depth refers to how many times a user will have to click to reach a particular page from the homepage. The closer a page is to the root (homepage), the better its chances of getting crawled and receiving link juice or pageviews.

How to measure page depth?
- Fire up Screaming Frog. Set the mode to “Spider”.
- Enter your domain name and wait for the crawl to finish.
- Click on “Site Structure” in the right-hand corner. You will see a window similar to the one below.
See if the URLs that are 4 or more clicks away from the root can be linked from pages closer to it.
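If you want to approximate click depth without a crawler GUI, the sketch below does a small breadth-first crawl from the homepage using the standard library plus requests (placeholder domain, depth capped at 4, and a very naive link extractor). Treat it as a rough estimate, not a Screaming Frog replacement.

# Naive breadth-first crawl to estimate click depth from the homepage.
import re
from collections import deque
from urllib.parse import urljoin, urlparse
import requests

START = "http://yourdomain.com/"                     # placeholder homepage
MAX_DEPTH = 4
HREF_RE = re.compile(r'href=["\']([^"\'#]+)', re.I)  # crude link extractor

domain = urlparse(START).netloc
depths = {START: 0}
queue = deque([START])

while queue:
    url = queue.popleft()
    if depths[url] >= MAX_DEPTH:
        continue
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue
    for href in HREF_RE.findall(html):
        link = urljoin(url, href)
        if urlparse(link).netloc == domain and link not in depths:
            depths[link] = depths[url] + 1
            queue.append(link)

for url, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, url)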
Blocked Resources
These refer to the support files (JS, CSS, images, etc.) that Googlebot needs in order to fully render the page.

Why does it matter?
Blocking these resources can affect Googlebot’s ability to crawl and index the site. The page layout Googlebot renders will also differ substantially from what the user is able to view.

How to fix this issue?
- Log in to Search Console and navigate to Google Index => Blocked Resources in the left pane.
- See if Googlebot has highlighted any resources that need to be unblocked.
- Update and verify the robots.txt file to make the resources accessible to Googlebot.
Cloaking/SPAM Detection
Cloaking refers to the practice of showing different content to users and search engines depending on the user agent. Cloaking can be intentional or an unwanted result of the website getting hit by a spam attack.

Why does it matter?
Search engines recommend providing the exact same content to both users and crawlers. Any attempt at cloaking, when detected, results in penalties and/or a substantial ranking drop. When a website gets hit by Pharma or other types of spam, the spammers make use of user-agent strings to serve the spam version of the site to search engines and the clean version to users. This makes it difficult to detect the attacks until the webmasters get notices in Search Console.

How to detect cloaking or SPAM?
Use the UA Switcher plugin for Chrome to mimic different user-agents while visiting the site’s pages. The plugin has an extensive database of different user-agents that can be spoofed to check and verify the pages.
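You can also script a rough comparison: fetch the same page once with a Googlebot user-agent string and once with a regular browser string, then compare the responses. The sketch below (placeholder URL, requests assumed installed) hashes both responses; identical hashes do not rule out cloaking (IP-based cloaking would pass this test), and small differences are normal on dynamic pages, but a large gap is worth investigating.

# Compare a page as seen by "Googlebot" versus a regular browser user-agent.
import hashlib
import requests

URL = "http://yourdomain.com/"                       # placeholder
USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

digests = {}
for name, ua in USER_AGENTS.items():
    body = requests.get(URL, headers={"User-Agent": ua}, timeout=10).text
    digests[name] = hashlib.md5(body.encode("utf-8")).hexdigest()
    print(f"{name:10} length={len(body):7}  md5={digests[name]}")

if digests["googlebot"] != digests["browser"]:
    print("Responses differ - inspect the HTML manually for cloaked or injected content.")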
Yule Ball Readiness Audit
Now that you have rid your site of the Disillusionment Charm and are ready to receive crawlers and visitors, it is time to make the site more presentable. The action items covered below will help you in improving user experience and site usability.
URL Audit
A URL audit covers the following checks.
- Friendliness – Are the URLs straightforward, meaningful and short wherever possible? A simple Screaming Frog crawl can tell you whether the URLs are friendly or not. Run a Screaming Frog crawl, export the URLs to a CSV or Excel file, and look for long, unwieldy URL strings. Create friendly URLs and set up appropriate redirects wherever required.
- Use of word separator – Check if the site is using a hyphen (-) or an underscore (_) as the word separator. It is recommended to use hyphens, as search engines tend to treat a hyphen as a space rather than as a connector like the underscore. Run a Screaming Frog crawl, export the results to a CSV or Excel file, and filter for underscores. Check if the URLs that use underscores can be changed to the hyphenated version; make sure you take a look at other factors, like the number of links to the page, before making the change.
- Keyword Usage – Check if the URLs contain your target keywords. It is recommended to include your most important keyword in the URL, as it clearly highlights the page context to the crawlers as well as the visitors.
- Consistent Linking Pattern – Make sure that you follow either relative or absolute addressing across the site and don’t mix and match randomly. An absolute address refers to the full URL path including the domain name; relative addressing refers to only the directory path and does not include the domain name. We recommend using absolute addressing instead of relative.
- Canonical URLs – Also make sure that only the canonical version of the URLs is used for internal linking, be it through the main navigation or the content. This saves unnecessary redirection delays. Run a Screaming Frog crawl, navigate to the “Response Codes” tab, export the data to a CSV file and check for internal 301 redirects. Replace the redirected URLs with their destination URLs.

A few of these checks can be scripted as well; see the sketch below.
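A minimal sketch (assuming you have a plain text file with one URL per line, for example pasted from a crawl export) that flags underscores, uppercase characters, query parameters and very long URLs:

# Flag URLs that break common friendliness rules (underscores, uppercase, params, length).
URL_FILE = "urls.txt"                                # placeholder: one URL per line

with open(URL_FILE) as fh:
    urls = [line.strip() for line in fh if line.strip()]

for url in urls:
    issues = []
    if "_" in url:
        issues.append("underscore")
    if any(ch.isupper() for ch in url):
        issues.append("uppercase")
    if "?" in url:
        issues.append("query parameters")
    if len(url) > 100:
        issues.append("very long")
    if issues:
        print(url, "->", ", ".join(issues))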
Main navigation
Usually displayed at the top of your page(s), the main navigation contains links to your internal pages.
- Check if all the important product/services pages are linked from the main navigation.
- In the case of large sites, check if all the major directory index pages are linked from the main navigation.
Breadcrumb Navigation
The term breadcrumb comes from the fairy tale Hansel and Gretel, in which the children used breadcrumbs to find their way home when they were left alone in the woods. Breadcrumb navigation refers to a prominently placed navigation trail that helps users understand where they are in the site and the corresponding site hierarchy.

How do breadcrumbs help?
- A clearly defined navigation hierarchy makes it easier for visitors to see where they are and to browse upwards easily if required.
- In a content-heavy site or an e-commerce site, breadcrumb navigation helps in improving user flow between various pages.
Fold Analysis
Search engines recommend placing important blocks of content above the fold. Google actively looks at the layout of the page and rates sites based on the above-the-fold user experience.

Why does it matter?
Users come to the site looking for information or for the services/products that they are interested in. If the area above the fold is dominated by ads or doesn’t really have much content, it results in a bad user experience and, more often than not, a bounce.

How to verify that content is visible above the fold?
The easiest way to do that is by manually visiting the site and checking if the content is at least partially visible. Tools like Where is the fold help in checking the site at multiple screen resolutions at once. Getting a list of typical screen resolutions from Google Analytics and combining that with the Where is the fold tool would be the best way to confirm that the site content is visible above the fold.
Meta Tags
Meta tags contain the information that is generally displayed in search results. Meta tags (title and description) can and do influence the click-through rate (CTR) to your site.

How to detect Meta tag issues?
- Run a Screaming Frog crawl. Export the results into a CSV or Excel file.
- Look for pages with duplicate or missing Meta tags.
- Look for pages where titles are longer than 60 characters and descriptions are longer than 150 characters.
- Log in to Search Console. Navigate to Search Appearance => HTML Improvements and check if Google has highlighted any Meta errors.
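For a single page, the same length check can be scripted. The sketch below (placeholder URL, requests assumed installed) pulls the title and meta description with simple regexes and reports their lengths against the 60/150 character guidelines mentioned above.

# Report title and meta description lengths for one page.
import re
import requests

URL = "http://yourdomain.com/"                       # placeholder
html = requests.get(URL, timeout=10).text

title_match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
desc_match = re.search(r'<meta[^>]+name=["\']description["\'][^>]+content=["\']([^"\']*)', html, re.I)

title = title_match.group(1).strip() if title_match else ""
description = desc_match.group(1).strip() if desc_match else ""

print(f"Title ({len(title)} chars, aim for <= 60): {title}")
print(f"Description ({len(description)} chars, aim for <= 150): {description}")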
Header (H1) Tags
Header tags (H1, H2, H3, etc.) are used to define headings and hierarchy within the page content. The H1 tag has the highest weight and should be:
- Unique for each page.
- Used only once per page.
- Different from the Title tag (recommended).

Why does the H1 tag matter?
The H1 tag is the most important heading on the page. You can think of it as the digital equivalent of a front-page headline in a newspaper. Being the most visible element on the page, the H1 tag plays an extremely important role in retaining user attention and highlighting the context of the page.

How to identify and fix H1 tag issues?
- Run a Screaming Frog crawl for the site. Export the crawl results to either a CSV or an Excel file.
- Check for pages with multiple H1 tags.
- Check for pages where the H1 tag is the same as the Meta title tag.
- Check for pages where no H1 tag is present.
- Check for pages with duplicate H1 tags.
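The per-page version of these checks is easy to script. The sketch below (placeholder URL, requests assumed installed) counts the H1 tags on a page and compares the first one against the title tag.

# Count H1 tags on a page and compare the first one with the <title>.
import requests
from html.parser import HTMLParser

class H1TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.first_h1 = ""
        self.title = ""
        self._in = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
            self._in = "h1"
        elif tag == "title":
            self._in = "title"

    def handle_endtag(self, tag):
        if tag in ("h1", "title"):
            self._in = None

    def handle_data(self, data):
        if self._in == "h1" and self.h1_count == 1:
            self.first_h1 += data
        elif self._in == "title":
            self.title += data

url = "http://yourdomain.com/"                       # placeholder
parser = H1TitleParser()
parser.feed(requests.get(url, timeout=10).text)

print("H1 count:", parser.h1_count)
print("First H1 matches title:", parser.first_h1.strip() == parser.title.strip())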
Structured data
Structured data refers to special snippets of code that are added to the page code. The markup varies depending on the website context and can refer to, for example, a recipe, a video or a local business.

Why does it matter?
These snippets provide additional context to the search engines, and the reviews/ratings also show up in search results and have been proven to influence CTR positively.

How to check for structured data?
Use Google’s structured data testing tool to check if your set-up is working as expected. You can also log in to Search Console and navigate to Search Appearance => Structured Data to see if there are any issues that need to be fixed.
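If your markup uses the JSON-LD format, the sketch below (placeholder URL, requests assumed installed) extracts the script blocks of type application/ld+json from a page and prints the declared @type values – a quick way to confirm the markup is actually present before running it through Google’s tool.

# List the JSON-LD structured data types declared on a page.
import json
import requests
from html.parser import HTMLParser

class JsonLdParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type", "").lower() == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

url = "http://yourdomain.com/"                       # placeholder
parser = JsonLdParser()
parser.feed(requests.get(url, timeout=10).text)

for block in parser.blocks:
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        print("Found a JSON-LD block that is not valid JSON.")
        continue
    if isinstance(data, dict):
        print("Structured data type:", data.get("@type", "unknown"))
    else:
        print("Structured data block with", len(data), "items")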
Thin Content
Content plays an extremely important role in achieving good ranking positions in search results. Having in-depth content related to the central page theme is one of the major ranking signals and also helps in retaining users on the page by helping them achieve their search objective. Thin content refers to content pieces of inadequate length that do not fulfill the user’s search objective and, more often than not, result in a bounce.

How does thin content affect the site?
- Search engines tend to ignore pages with thin content. The crawl/indexation frequency for thin content pages is usually very low.
- Pogosticking effect – Even if the page gets ranked for some reason, the users don’t stick around and immediately go back to the search page, resulting in a short click. Short clicks usually indicate low quality and result in a ranking drop.

How to identify thin content pages?
Run a Screaming Frog crawl on the site. Export the crawl results to a CSV or an Excel file. Navigate to the “Word Count” column and look for pages with fewer than 250 words.
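A rough scripted equivalent (placeholder URLs, requests assumed installed): strip the HTML tags from each page and count the remaining words, flagging anything under 250. Tag stripping with a regex is crude, so treat the counts as approximate.

# Approximate word counts per page; flag pages under 250 words.
import re
import requests

URLS = [                                             # placeholder list
    "http://yourdomain.com/",
    "http://yourdomain.com/blog/some-post/",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.I | re.S)  # drop scripts/styles
    text = re.sub(r"<[^>]+>", " ", html)             # strip remaining tags
    words = len(text.split())
    flag = "THIN" if words < 250 else "ok"
    print(f"{flag:5} {words:6} words  {url}")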
How to fix thin content pages?

- Define a central theme for your page.
- Do thorough keyword research and find related terms around the central topic.
- Go through search suggestions and Q&A sites to get a gist of what is being asked about your central topic.
- Identify top competitors and analyze their content.
Duplicate content
Duplicate content might refer to duplication within the site or to content that is plagiarised from external sites.

Why does it matter?
Search engines do not like duplicate content. If your page(s) have duplicate content, the chances of getting a good ranking drop substantially. Duplicate content does not necessarily mean plagiarised content; the site might also accumulate duplicate content due to canonicalization issues, shared content across subdomains or even regional sites sharing the same content.

How to identify duplicate content?
- Use copyscape.com or a similar tool to identify if the content is present on any external source.
- Use siteliner.com to identify duplication within the site.
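For a quick internal check between two pages you suspect are near-duplicates, the sketch below (placeholder URLs, requests assumed installed) strips the tags and compares the visible text with difflib from the standard library; a ratio close to 1.0 suggests substantial duplication.

# Compare the visible text of two pages for near-duplication.
import difflib
import re
import requests

PAGE_A = "http://yourdomain.com/page-a/"             # placeholder
PAGE_B = "http://yourdomain.com/page-b/"             # placeholder

def visible_text(url):
    html = requests.get(url, timeout=10).text
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.I | re.S)
    return re.sub(r"<[^>]+>", " ", html)

ratio = difflib.SequenceMatcher(None, visible_text(PAGE_A), visible_text(PAGE_B)).ratio()
print(f"Similarity ratio: {ratio:.2f} (values close to 1.00 indicate duplicate content)")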
How to fix duplicate content pages?
- Define a central theme for your page.
- Do thorough keyword research and find related terms around the central topic.
- Go through search suggestions and Q&A sites to get a gist of what is being asked about your central topic.
- Identify top competitors and analyze their content.

Page Speed
Page speed refers to the time a webpage takes to fully render when requested.

Why does it matter?
- Google considers page speed as a ranking factor. [Source]
- Nearly half of users expect sites to load in less than 2 seconds. [Source]
- With more and more people browsing the web on mobile devices, it is imperative for webmasters to improve site loading speeds.

How to improve page speed?
Head over to Google PageSpeed Insights or http://webpagetest.org/ to measure your pages’ loading performance.
Note down and implement the recommendations or share with your developer for implementation.
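As a very rough first signal before reaching for those tools, the sketch below (placeholder URLs, requests assumed installed) times the raw HTML response for a few pages. It only measures server response and download time, not full rendering, so use it for spotting slow outliers rather than as a substitute for PageSpeed Insights.

# Crude timing of the HTML response for a few pages (server response only, not full render).
import time
import requests

URLS = [                                             # placeholder list
    "http://yourdomain.com/",
    "http://yourdomain.com/products/",
]

for url in URLS:
    start = time.perf_counter()
    r = requests.get(url, timeout=30)
    elapsed = time.perf_counter() - start
    print(f"{elapsed:6.2f}s  {r.status_code}  {len(r.content):8} bytes  {url}")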
The action items described in the audit and the checklist should help you in identifying and fixing any issues that might be affecting your website’s visibility in search results. You should also be able to significantly improve the user experience. If you find the audit helpful, please show us your appreciation by sharing it on social media. If you want to receive our blog posts via email, subscribe to our mailing list.