The company announced on its webmaster blog that, as of September 1, 2019, it will no longer support certain rules in robots.txt. As justification, Google stated that these rules were used too rarely.
These directives are no longer respected by Googlebot
“In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019. For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options,” the company said.
As alternatives, Google has compiled the following list of indexing controls, which it recommends using anyway:
- Noindex in robots meta tags: If crawling is allowed, the most reliable way to keep a page out of the index is the noindex directive. It is supported both as an HTML meta tag and as an HTTP response header (X-Robots-Tag).
- HTTP status codes 404 and 410: Both codes tell Google that the page no longer exists, so the URLs are dropped from the index after they are next crawled.
- Password-protected pages: Hiding a page behind a login generally removes it from the Google index. The exception is content that uses markup for subscription or paywalled content.
- Use the Search Console removal tool: This tool is an easy way to remove a URL from Google’s search results temporarily. As soon as you open the tool and submit a removal request, the URL disappears from the results.
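For the first option above, a minimal sketch of the noindex meta tag might look like this (the page content is a placeholder; only the meta tag in the head matters):

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Keeps this page out of Google's index while still allowing crawling -->
    <meta name="robots" content="noindex">
    <title>Example page</title>
  </head>
  <body>…</body>
</html>
```

For non-HTML resources such as PDFs, the same signal can be sent as an HTTP response header instead: `X-Robots-Tag: noindex`.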
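The 404/410 option can be sketched as a small status-code helper. This is an illustrative example, not Google code; `REMOVED_PATHS` and `status_for` are hypothetical names for a site's own bookkeeping:

```python
# Hypothetical set of URLs the site owner has deliberately retired.
REMOVED_PATHS = {"/old-landing-page", "/discontinued-product"}

def status_for(path: str, page_exists: bool) -> int:
    """Pick an HTTP status code for a requested path.

    410 ("Gone") signals a permanent, intentional removal; 404 ("Not Found")
    a generic missing page. Google treats both as a signal to drop the URL
    from the index after the next crawl.
    """
    if page_exists:
        return 200
    return 410 if path in REMOVED_PATHS else 404
```

A framework's route handler would call something like `status_for(request.path, page_exists)` before rendering a response.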
How should I proceed now?
First, make sure you are not using the noindex directive in robots.txt. If you are, set up one of Google’s suggested alternatives before September 1. Also check whether your robots.txt contains directives such as crawl-delay or nofollow, and replace them with instructions that Google supports.
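The audit above can be sketched as a short script that flags robots.txt rules Googlebot no longer honors. The directive list reflects the rules named in this article; `find_unsupported` is a hypothetical helper name:

```python
# Directives Googlebot stopped honoring per the September 1, 2019 change.
UNSUPPORTED = ("noindex", "nofollow", "crawl-delay")

def find_unsupported(robots_txt: str) -> list:
    """Return a list of robots.txt lines that use unsupported directives."""
    hits = []
    for lineno, line in enumerate(robots_txt.splitlines(), start=1):
        rule = line.split("#", 1)[0].strip()          # drop comments
        directive = rule.split(":", 1)[0].strip().lower()
        if directive in UNSUPPORTED:
            hits.append("line %d: %s" % (lineno, rule))
    return hits
```

Running it over a robots.txt file (for example, `find_unsupported(open("robots.txt").read())`) lists every line that should be migrated to one of the supported alternatives above.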