On the 11th of September, Brighton SEO held their Crawl and Indexation summit, a fantastic opportunity for SEO professionals and enthusiasts to learn everything involved in a search engine’s crawling and indexation process.

Throughout the summit, a number of experienced professionals presented a variety of case studies. These showcased some of the most common challenges of optimising a website for crawling and indexation and, more importantly, how to overcome them using a range of techniques, tools and approaches.

Before getting into the findings, it’s worth understanding why crawling and indexation deserve so much focus in SEO.

Crawling the internet

There is no doubt that the internet has become a fundamental part of our lives. As it continues to evolve, it is becoming far easier for anyone to quickly create new content and publish it online. In fact, in 2016 there were around 1.7 billion live websites, and this number is now estimated to be around 2 billion. However, of this huge number of sites, only around 400 million are estimated to be active.

The question you may find yourself asking is: how exactly does Google find and organise such a massive volume of information?

The process begins with the crawl, which is essentially how Google discovers new pages across the internet. This is done by programs called crawlers or spiders, which are specifically designed to look for new URLs linked from pages that have already been crawled or listed in sitemaps. Once a crawler finds a new URL, it follows the links on that page to discover and crawl further pages.
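To make this discovery loop concrete, here is a minimal Python sketch of link-following crawling as a breadth-first search. It is an illustration only, not how Google’s crawler is actually implemented: the in-memory `LINK_GRAPH` is a hypothetical stand-in for fetching each page over HTTP and extracting its links, and all URLs are made up.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real crawler would download each page and parse its links instead.
LINK_GRAPH = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://other.example/"],
    "https://example.com/blog/post-1": ["https://example.com/blog"],
    "https://other.example/": [],
}


def crawl(seeds, link_graph):
    """Breadth-first discovery: start from known URLs (for example, pages
    already crawled or listed in a sitemap) and follow links to new ones.
    Returns URLs in the order they were first discovered."""
    discovered = list(seeds)   # discovery order
    seen = set(seeds)          # never queue the same URL twice
    frontier = deque(seeds)    # URLs waiting to be crawled
    while frontier:
        url = frontier.popleft()
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                frontier.append(link)
    return discovered


if __name__ == "__main__":
    for url in crawl(["https://example.com/"], LINK_GRAPH):
        print(url)
```

Starting from a single seed, the crawler reaches every page linked directly or indirectly from it; a page that nothing links to, and that is not listed in a sitemap, would never be discovered this way.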

It can take anywhere from a day to six months for a site to be crawled. That said, new websites and recently updated URLs are often given priority and crawled more frequently. An efficient way to help ensure that a website is crawled is to tell Google it exists using Google Search Console.

Indexing crawl data

Once the URLs have been crawled, the information from them is stored on Google’s servers to be organised, or indexed. This process starts with Google’s systems rendering the content of each page while taking key signals, such as keywords and how fresh the content is, into account. This information is then stored and continuously updated in the Search Index (powered by Google’s Caffeine indexing system).

Once this is done, Google adds entries for the words contained in each page to its index, which makes the page eligible to appear in the search results.
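The idea of adding entries for each word can be pictured as an inverted index, which maps every word to the pages that contain it. The Python sketch below is a deliberately simplified illustration using hypothetical pages; Google’s real index also stores rendered content, word positions and many ranking signals that are not modelled here.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "https://example.com/crawl": "How search engines crawl the web",
    "https://example.com/index": "How search engines index crawled pages",
    "https://example.com/about": "About our agency",
}


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return, sorted, the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "search engines"))
```

Looking up a query then only requires reading the entries for its words, rather than rescanning every stored page.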

Why are crawling and indexation important for SEO?

The main purpose of SEO is to improve a website’s ranking position in the search results. This can only be achieved if the content is visible to search engines in the first instance. If a website cannot be found by the search engines, it will not appear in the search results.

With this in mind, the top priority for an SEO specialist is to help search engines find a website’s content, which can be achieved using several tactics:

Building a functional site architecture

Site architecture is the structure that specifies how a website’s pages are linked together. A clear and concise website structure is fundamental to the crawling and indexing process: once a crawler finds a website, it needs to be able to easily reach all the information worth indexing. The architecture of a site should therefore be clear and straightforward, with an obvious path to its most relevant sections. Duplicate pages, broken links and pages buried too deep in the structure are common issues that can seriously stall or block the crawl process, ultimately harming indexation.
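Two of the issues above, pages buried too deep and broken links, can be checked with a simple breadth-first walk of a site’s internal links. This Python sketch assumes a hypothetical `SITE_LINKS` map (page to the pages it links to) and an illustrative `MAX_DEPTH` of three clicks from the homepage; that threshold is an assumption for the example, not a limit published by Google.

```python
from collections import deque

# Hypothetical internal link graph: page path -> pages it links to.
SITE_LINKS = {
    "/": ["/category", "/contact"],
    "/category": ["/category/sub"],
    "/category/sub": ["/category/sub/product"],
    "/category/sub/product": ["/category/sub/product/reviews"],
    "/category/sub/product/reviews": ["/old-page"],  # /old-page does not exist
    "/contact": [],
}

MAX_DEPTH = 3  # assumed, illustrative threshold (clicks from the homepage)


def click_depths(links, home="/"):
    """Return the minimum number of clicks needed to reach each existing page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in links and target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(depths, max_depth=MAX_DEPTH):
    """Pages that sit deeper than max_depth clicks from the homepage."""
    return sorted(page for page, depth in depths.items() if depth > max_depth)


def broken_links(links):
    """(source, target) pairs where the target page does not exist on the site."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in links
    )


if __name__ == "__main__":
    print("Too deep:", too_deep(click_depths(SITE_LINKS)))
    print("Broken:", broken_links(SITE_LINKS))
```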

Optimising the crawl budget

Depending on the website, you may find that not all of the available content is relevant enough to be crawled and indexed. That is why controlling the crawl budget matters: the more content a search engine needs to crawl on a website, the longer the process takes. On an ecommerce site with thousands or even millions of pages, for instance, this can seriously affect crawling and indexation. There are several techniques for telling search engines which pages are relevant to crawl and which are not, such as using canonical tags to indicate the preferred version of duplicate content, or blocking access to non-relevant pages with robots.txt.
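For the robots.txt side, Python’s standard-library `urllib.robotparser` can show which URLs a given rule set blocks. The robots.txt content and shop URLs below are hypothetical. This parser follows the original robots.txt conventions (the first matching path-prefix rule wins) and does not replicate Google’s wildcard handling or most-specific-rule precedence, so treat its answers as an approximation of how Google would read the same file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an ecommerce site: keep crawlers out of
# internal search results and the basket, which add no value in search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""


def build_parser(robots_txt):
    """Parse robots.txt content supplied as a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser, urls, user_agent="Googlebot"):
    """Map each URL to True if the parsed rules allow user_agent to fetch it."""
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    urls = [
        "https://shop.example/products/blue-shoes",
        "https://shop.example/search?q=shoes",
        "https://shop.example/cart",
    ]
    for url, allowed in crawlable(parser, urls).items():
        print("crawl" if allowed else "skip ", url)
```

Blocking a URL in robots.txt stops compliant crawlers from fetching it, but it does not by itself remove the URL from the index; canonical tags work differently, telling Google which of several duplicate URLs is the preferred one to index.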

Setting up a sitemap

A sitemap is a blueprint of a website’s structure that lists its relevant pages and helps Google understand which pages are important to crawl. It is usually an XML file that lists and organises the site’s URLs, signalling to Google that those pages are good-quality landing pages that should be crawled and indexed. Although setting up a sitemap does not guarantee that a website will be crawled, it does increase the chances of Google finding its pages and considering them for indexation.
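A minimal Python sketch that generates such a file with the standard-library `xml.etree.ElementTree`, following the sitemaps.org protocol (`urlset`, `url`, `loc` and the optional `lastmod` element). The page list and dates are hypothetical; a real site would pull its indexable URLs from its CMS or database. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages: (URL, last modification date as YYYY-MM-DD, or None).
SITEMAP_PAGES = [
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/services/technical-seo", "2024-01-10"),
    ("https://example.com/blog/crawl-and-indexation", None),
]


def build_sitemap(pages):
    """Return an XML sitemap string listing the given pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & automatically.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print for readability
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(SITEMAP_PAGES))
```

A single sitemap file is limited to 50,000 URLs and 50 MB uncompressed, so large ecommerce sites typically split their URLs across several sitemaps referenced from a sitemap index file.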


Polaris is an award-winning B2B SEO agency in London specialising in B2B, PPC, e-commerce and the healthcare industry.
