Google’s John Mueller recently addressed the issue of pages on large websites that are discovered but not indexed. He provided insights on how to address this problem, emphasizing what indexing status in Search Console tells publishers and the potential reasons why Google may discover a page but decline to index it.
In Search Console, indexing status tells publishers how much of their site is indexed and eligible to rank. If a page is discovered but not indexed, there may be an underlying problem that needs attention.
Google’s official documentation lists only one reason for this status: Google found the page but has not crawled it yet. This typically happens when Google wanted to crawl the URL but expected that doing so would overload the site, so it rescheduled the crawl. John Mueller, however, expands on this and offers further explanations for why a page may be discovered but not indexed.
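If you want to confirm a URL’s coverage state programmatically rather than checking each page by hand, the Search Console URL Inspection API can report it. Below is a minimal Python sketch using google-api-python-client; the site URL, page URL, and token file are placeholders, and it assumes you already have a Google Cloud project with the API enabled and OAuth credentials authorized for the property.

```python
# Minimal sketch: check a URL's coverage state with the Search Console
# URL Inspection API (google-api-python-client). Assumes authorized OAuth
# credentials with the webmasters.readonly scope already exist.
from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials

SITE_URL = "https://www.example.com/"             # property as registered in Search Console (placeholder)
PAGE_URL = "https://www.example.com/product/123"  # page to inspect (placeholder)

creds = Credentials.from_authorized_user_file("token.json")  # hypothetical token file
service = build("searchconsole", "v1", credentials=creds)

response = service.urlInspection().index().inspect(
    body={"inspectionUrl": PAGE_URL, "siteUrl": SITE_URL}
).execute()

status = response["inspectionResult"]["indexStatusResult"]
# coverageState returns strings such as "Discovered - currently not indexed"
print(PAGE_URL, "->", status.get("coverageState"), "| last crawl:", status.get("lastCrawlTime"))
```

Run against a list of sample URLs, a script like this can show how widespread the “Discovered – currently not indexed” state is across a site section.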
Removing Pages to Improve Overall Indexing
There is an idea that removing certain pages from a website can improve crawling by reducing the number of pages Google has to crawl. It is commonly believed that Google allocates a limited crawl capacity to each site, also known as a crawl budget. However, Google has said there is no crawl budget in the sense that SEOs commonly understand it. Instead, how many pages Google crawls is influenced by various factors, such as the server’s capacity to handle extensive crawling.
One of the main reasons why Google is selective about crawling is that it doesn’t have the capacity to store every webpage on the internet. Therefore, it tends to prioritize the indexing of pages that have value and exclude others.
A question was posed to John Mueller asking whether de-indexing and aggregating 8 million used-product pages into 2 million unique, indexable product pages would improve crawlability and indexability. Mueller acknowledged that he couldn’t address the specific case, but he offered general recommendations.
He first suggested reviewing Google’s crawl budget guide for large sites in its documentation. In many cases, how much Google can crawl is limited by how well the website handles increased crawling. The overall quality of the website, however, is the more important factor: unless actual quality improves, simply reducing the number of indexable pages won’t benefit search rankings.
Mueller then offered two main reasons for the discovered but not indexed problem:
1. Server Capacity
Google’s crawling and indexing capability can be limited by a website’s ability to handle increased crawling. As a website grows larger, more bots are required to crawl it, and it’s not only Google’s bots that crawl large sites. There are other legitimate bots as well as malicious ones attempting to access the site. This means that the server resources can be stretched thin, especially during peak hours when there may be thousands of bots crawling the website.
If a website has millions or hundreds of thousands of pages, it may require a dedicated server or cloud hosting that offers scalable resources to handle the crawling demand. Troubleshooting server issues involves analyzing server error logs and ensuring that the server’s memory and resources are properly allocated.
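One practical way to gauge whether crawling is straining the server is to tally Googlebot requests and 5xx responses by hour from the access logs. The sketch below assumes a standard combined log format and a hypothetical log path; adjust the regular expression to your server’s format, and note that user-agent strings can be spoofed (verifying with reverse DNS is omitted here).

```python
# Minimal sketch: tally Googlebot requests and server errors per hour from an
# access log in the common/combined format. The log path and regex are
# assumptions; adapt them to your server. User agents can be spoofed, so
# treat the counts as indicative rather than verified Googlebot traffic.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path
line_re = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}).*?\] "(?:[A-Z]+) \S+ \S+" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

hits_per_hour = Counter()
errors_per_hour = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = line_re.search(line)
        if not match:
            continue  # skip lines that don't match the expected format
        hour, status, user_agent = match.groups()
        if "Googlebot" in user_agent:
            hits_per_hour[hour] += 1
            if status.startswith("5"):
                errors_per_hour[hour] += 1

for hour in sorted(hits_per_hour):
    print(f"{hour}:00  Googlebot hits={hits_per_hour[hour]}  5xx={errors_per_hour[hour]}")
```

Spikes in 5xx responses during hours of heavy bot traffic are a sign that the server, rather than the site’s content, is limiting how much Google can crawl.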
2. Overall Website Quality
Another reason why Google may not index enough pages is the overall quality of the website. Google assesses site quality by considering various factors, including layout, design, how images are integrated, page speed, and more. If significant portions of a site are deemed low quality, it affects Google’s perception of the website as a whole.
It’s important to note that Google takes time to determine the overall site quality, and this process can take months. Therefore, it’s crucial to focus on improving the quality of the website, rather than solely reducing the number of indexable pages.
Optimizing a Website for Crawling and Indexing
When addressing crawling and indexing issues on a website, it’s essential to optimize individual pages at scale. Here are some key optimization strategies:
- Ensure that the main menu is optimized to lead users to the important sections and popular pages of the site.
- Link to popular pages and sections prominently on the homepage to signal their importance to both users and Google.
- Improve thin content pages by providing meaningful and relevant information for site visitors. This includes measurements, weight, available colors, related product suggestions, compatible brands, links to manuals and FAQs, ratings, and other valuable information.
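As a starting point for finding thin content at scale, a simple crawl of your own product URLs can flag pages with very little visible text. The sketch below uses requests and BeautifulSoup with hypothetical URLs and an arbitrary word-count threshold; it is a rough heuristic for prioritizing pages to improve, not a measure of quality as Google sees it.

```python
# Minimal sketch: flag potentially thin product pages by visible word count.
# The URL list and the 150-word threshold are assumptions for illustration;
# word count is only a rough proxy for content depth.
import requests
from bs4 import BeautifulSoup

PRODUCT_URLS = [  # hypothetical URLs
    "https://www.example.com/product/widget-a",
    "https://www.example.com/product/widget-b",
]
THIN_THRESHOLD = 150  # words; tune for your page templates

for url in PRODUCT_URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()  # drop boilerplate elements before counting
    word_count = len(soup.get_text(separator=" ").split())
    if word_count < THIN_THRESHOLD:
        print(f"THIN ({word_count} words): {url}")
```

Pages flagged this way are candidates for the enrichment described above: specifications, related products, FAQs, and other details that make the page genuinely useful to visitors.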
By implementing these optimization techniques, website owners can enhance crawlability and indexability, ultimately driving more online sales.
For professional SEO services and to learn more about optimizing your website for search engines, visit SEO Augusta.