The SEO market is increasingly mature. Gone are the days when stuffing a page with keywords was enough to reach the top of the search results. Google’s algorithm has evolved to make sense of the words in a text and even to interpret users’ search intent.
With all this search engine intelligence, do you think the SEO basics alone are still enough to get results?
Professionals in the field must understand how the algorithm “thinks” and adopt optimization approaches that meet its expectations in order to rank well. And these approaches are increasingly sophisticated.
That is the case with the subject of this article: TF-IDF, an on-page optimization approach. The acronym refers to a way of statistically determining the importance of a keyword or phrase by analyzing hundreds or thousands of documents.
By understanding the logic behind this technique, you can adopt better on-page SEO strategies and get ahead of the competition.
What is TF-IDF?
TF-IDF is a statistical calculation that Google’s algorithm uses to measure which terms are most relevant to a topic, by comparing how often they appear on a page with how often they appear across a larger set of pages.
TF-IDF is not exclusive to SEO. It is used in many information retrieval systems: web search engines, but also library and text mining systems, for example.
The calculation serves as a weighting factor for terms, that is, to understand the importance of a specific term or phrase for a given document.
But when you read the title of this article, you probably asked yourself: “TF-what?!” So let’s look at what the acronym means.
TF-IDF stands for Term Frequency – Inverse Document Frequency. That may still not be very clear, so let’s take it in parts.
TF refers to the “frequency of the term”. This part of the calculation answers the question: how often does the term appear in this document? The greater the frequency in the document, the greater the importance of the term.
IDF means “inverse document frequency”. This part of the calculation answers the question: in how many documents of the collection does the term appear? The more documents contain the term, the less important it is.
The IDF calculation assumes that terms repeated frequently across texts, such as articles and conjunctions (a, the, and, but, that, etc.), carry little relevance for documents and, in Google’s case, for indexing and ranking.
So, when the IDF factor is incorporated, the calculation decreases the weight of terms that occur very frequently in the set of documents and increases the weight of terms that occur rarely.
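For readers who like to see the arithmetic, here is one common textbook formulation of the two factors. It is an assumption made for illustration: the article does not say which variant Google or any SEO tool actually uses. Here $f_{t,d}$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents in the collection, and $n_t$ is the number of documents that contain $t$.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\cdot\mathrm{idf}(t)
```

In this variant, a term that appears in every document gets $\mathrm{idf}(t) = \log 1 = 0$, so its weight vanishes no matter how often it is repeated on the page.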
We will not go any deeper into the statistical calculations here.
We can summarize it like this: the importance of a term (its TF-IDF value) increases with the number of times the word appears in the document (TF), but is offset by how widespread the word is across the document collection (IDF), which corrects for the fact that some words simply appear more often in general.
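The summary above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the three-document collection is invented, the tokenizer simply splits on whitespace, and the helper names (`tokenize`, `tf`, `idf`, `tf_idf`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf(term, tokens):
    """Term frequency: share of the document's tokens that are `term`."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, corpus):
    """Inverse document frequency: log(N / number of documents containing `term`)."""
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / containing) if containing else 0.0


def tf_idf(term, tokens, corpus):
    """Weight of `term` in one document, relative to the whole collection."""
    return tf(term, tokens) * idf(term, corpus)


# Hypothetical three-document collection, invented for illustration.
corpus = [tokenize(d) for d in (
    "content marketing brings results and leads",
    "social networks and content go together",
    "email and blog posts",
)]

doc = corpus[0]
score_and = tf_idf("and", doc, corpus)              # "and" occurs in every document
score_marketing = tf_idf("marketing", doc, corpus)  # "marketing" occurs in only one
```

Because “and” appears in all three documents, its IDF is zero and so is its final weight, while “marketing”, which appears in a single document, keeps a positive weight.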
How is TF-IDF calculation used by Google?
In Google’s case, the TF-IDF calculation helps the search engine identify the terms and phrases in the content of websites and blogs that really matter for indexing and ranking.
Remember that Google uses a robot to crawl web content. On its own, a robot does not have the human capacity to understand the meanings of words and the context of content. Or rather: today it can do much of that, thanks to technology that brings it closer to human understanding.
The TF-IDF calculation is one example of the language processing technology built into the robot. Google adopts systems that run these calculations automatically on millions of web documents to make sense of what they say.
TF-IDF is used as part of latent semantic indexing (LSI). Google uses this indexing approach to understand the relationships between words, phrases and concepts, that is, the semantics of the texts on a website or blog.
This is especially important when there are words with similar meanings (synonymy) or words with more than one meaning (polysemy).
Remember the time when websites repeated the keyword they wanted to rank for thousands of times?
It was to curb this kind of black hat practice, called keyword stuffing and harmful to the user experience, that Google adopted LSI. With it, the search engine is better able to reward content that is valuable to the visitor.
Within this logic, TF-IDF is used to process the language of the content. Its role is not to work out what the terms mean, but to gauge their importance by assigning them different weights.
Before that, Google considered only keyword density, a very widespread concept in SEO that looks only at how often a term appears on the page, without assessing its relevance.
By that measure, the word “that” could be treated as relevant in a post about “content marketing”, simply because it tends to appear a lot.
TF-IDF adjusts this calculation: it gauges the importance of a term by comparing its frequency on the page with its frequency in thousands of other documents. In this way, Google can refine its indexing toward the right keywords.
So, when a user searches on Google, the search engine can point to the most relevant pages for the query, taking other ranking factors into account, of course.
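The contrast between keyword density and TF-IDF can be sketched with a toy example. Everything here is an assumption made for illustration: the four short “pages” are invented, `density` and `tf_idf` are hypothetical helper names, and the formula is one common textbook variant rather than anything Google has published.

```python
import math

# Hypothetical toy pages, invented for illustration.
pages = [
    "content marketing is a strategy that attracts customers that trust content that helps",
    "recipes that use fresh vegetables",
    "a college that offers evening classes",
    "news that matters today",
]
docs = [p.split() for p in pages]
page = docs[0]  # the page we want to evaluate


def density(term, tokens):
    """Keyword density: occurrences of the term divided by the page length."""
    return tokens.count(term) / len(tokens)


def tf_idf(term, tokens, collection):
    """Density scaled down by how many documents in the collection contain the term."""
    df = sum(1 for d in collection if term in d)
    return density(term, tokens) * math.log(len(collection) / df)


# Most "important" term on the page according to each measure.
by_density = max(set(page), key=lambda t: density(t, page))
by_tf_idf = max(set(page), key=lambda t: tf_idf(t, page, docs))
```

On this toy data, keyword density picks “that” as the top term because it is repeated most often, while TF-IDF gives it a weight of zero (it appears on every page) and picks “content” instead.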
How can this on-page optimization approach help your blog?
After knowing what TF-IDF is, you may ask yourself: ok, but how can it help with my SEO strategies?
First of all, knowing the logic of TF-IDF is important to understand how Google works and how it has evolved over the years. This is the first step in establishing your SEO strategies according to the latest algorithm updates.
Unfortunately, however, we do not have access to the exact calculations that Google does on your blog. This is kept under lock and key in the search engine’s algorithm.
The good news is that there are tools that do the TF-IDF calculation for the term and the URL you define, compared to other sites well positioned in the ranking.
So the TF-IDF approach can be put into practice in your on-page optimizations, which you can now do using the same logic that Google uses. With these tools, you can do:
- keyword research (identifying which terms and subjects are vital to a topic);
- competition analysis (identifying which terms weigh more so that your competitor is ahead of you in the ranking);
- semantic optimization of new content or old publications (identifying key words vital to the topic and inserting them naturally in the content).
Ryte, Seobility and Link Assistant are some tools that work with TF-IDF. They usually work like this: you enter a URL and the keywords (or just one) that you want to rank for.
The tool then scans the top-ranked pages on Google for these keywords, analyzes their content and does the TF-IDF calculation on all terms to identify the most relevant ones.
That way, you get a list of related keywords, also called co-occurrences. With this list in hand, you can plan your content, compare with competitors and optimize your texts semantically.
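The workflow these tools follow can be approximated in a short sketch. How Ryte, Seobility or Link Assistant actually compute their scores is not described here, so treat every detail as an assumption: the page texts are hard-coded stand-ins for pages a real tool would fetch, the `reference` collection used to estimate general word frequency is invented, and the smoothed IDF is just one common variant.

```python
import math

# Hypothetical texts standing in for top-ranked pages (a real tool would fetch them).
top_pages = [
    "content marketing uses social networks to reach customers",
    "content marketing on social networks brings results",
    "content marketing strategy for better results",
]
# Hypothetical reference collection, used only to estimate how common a term is in general.
reference = [
    "how to cook rice for dinner",
    "news for today",
    "tips to buy a used car",
    "college results for students",
]


def tf(term, tokens):
    """Share of the page's tokens that are `term`."""
    return tokens.count(term) / len(tokens)


def idf(term, collection):
    """Smoothed inverse document frequency, so a missing term never divides by zero."""
    df = sum(1 for d in collection if term in d)
    return math.log((1 + len(collection)) / (1 + df))


def average_tf_idf(pages, collection):
    """Average TF-IDF of every term over the given pages."""
    docs = [p.split() for p in pages]
    coll = [c.split() for c in collection] + docs
    vocab = set().union(*docs)
    return {
        t: sum(tf(t, d) * idf(t, coll) for d in docs) / len(docs)
        for t in vocab
    }


scores = average_tf_idf(top_pages, reference)
# Terms with the highest average weight: candidate co-occurrences for the topic.
top_terms = sorted(scores, key=scores.get, reverse=True)[:4]
```

On this toy data, the four highest-scoring terms are “content”, “marketing”, “social” and “networks”, while a generic word such as “for” ends up near the bottom of the list.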
Next, you will understand how to use these tools in your optimizations.
How to do a TF-IDF optimization?
Want to better understand how to do a TF-IDF optimization? Here is a step-by-step guide to using a TF-IDF tool to optimize your blog.
1. Write your content or choose a page to optimize
Remember that natural writing communicates better with the user than text written with the robot in mind from the start. Ideally, write first and then optimize with the TF-IDF approach.
2. Choose a TF-IDF analysis tool
As an example, we chose Seobility, which offers three free analyses without the need for registration. Enter the URL you want to analyze and the keyword you want to rank for.
You can also define the country of the search (in the free version) and the number of Google results that the tool will analyze (only in the paid version).
In the example, we used the term “Content Marketing” and the URL “www.rockcontent.com” for a search on “Google.com.br”. The tool then generates a graph of the results.
The graph shows the terms related to “Content Marketing” that have more relevance on the pages best positioned for it.
The blue bars show the average TF-IDF value of each term across the results that contain it: the larger the blue bar, the greater the importance of the term for the topic.
The orange line, in turn, shows the TF-IDF value of the URL being analyzed, in relation to competitors.
3. Identify which terms and co-occurrences are most relevant
From this graph, it is already possible to draw some conclusions. You can see, for example, that the term “results” is well optimized in the URL in relation to competitors.
However, the URL has a very low TF-IDF for the terms “networks” and “social”, which are relatively important co-occurrences. The keyword “social networks” is therefore a good target for on-page optimization.
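The same reading of the graph can be expressed as a small comparison. This is only a sketch: the numbers below are invented for illustration and are not taken from the Seobility report, `find_gaps` is a hypothetical helper, and the 50% threshold is an arbitrary assumption you would tune yourself.

```python
# Hypothetical TF-IDF values, invented for illustration (not taken from any real report).
competitor_average = {"results": 0.040, "social": 0.055, "networks": 0.050, "strategy": 0.030}
your_page = {"results": 0.045, "social": 0.005, "networks": 0.004, "strategy": 0.028}


def find_gaps(yours, competitors, ratio=0.5):
    """Return terms whose score on your page is below `ratio` times the competitor
    average, ordered from the largest shortfall to the smallest."""
    gaps = {
        term: avg - yours.get(term, 0.0)
        for term, avg in competitors.items()
        if yours.get(term, 0.0) < ratio * avg
    }
    return sorted(gaps, key=gaps.get, reverse=True)


under_optimized = find_gaps(your_page, competitor_average)
```

With these illustrative values, “social” and “networks” are flagged as under-optimized, while “results” and “strategy” are left alone because the page already scores close to or above the competitor average.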
4. Make on-page optimizations with the identified terms
Once you’ve identified which terms need to be optimized, it’s time to go back to your text. Insert or replace words in the content so that the identified terms gain more relevance.
In on-page optimization, it is important to consider not only the body of the text, but also page fields such as the title, heading tags, URL and image tags. In these fields, the keyword gains even more weight.
Remember, too, that there is no point in cluttering the text with keywords. Google is smart enough to identify and penalize keyword stuffing, okay?
So use the terms naturally, so that the reader has a good experience and you still win over the algorithm.
In the case of Seobility, the tool also offers the ability to edit the text of the page right there, with recommendations on which terms should be optimized.
When to use TF-IDF optimization?
TF-IDF optimization can be used to optimize content and guide the writing of new blog texts. However, when you already have hundreds of publications, how do you know where to start?
Ideally, start with the pages that have the greatest potential, for faster results. You can focus on:
High-potential content on the second page of the SERP
Check whether you have content that was published a long time ago but has not managed to reach the first page of Google results.
In such cases, the optimization of content via TF-IDF, together with technical adjustments and link building, tends to bring positive results.
Content slowly losing positions
Check which pages have lost positions and traffic over the last year. They are probably suffering from competition or from a change in the weight the algorithm gives to the terms.
In this case, revisiting the content to optimize the words, using the TF-IDF approach, helps to recover the positions in the ranking.
Content targeting similar search terms
Identify pages that are optimized for similar search terms, such as “car” and “buy car”.
In that case, you can search for relevant terms that can be used on these pages to differentiate content and prevent cannibalization of keywords.
On the other hand, generic keywords with high competition (e.g. “news”, “college”, “recipes”) tend not to yield as many results with the TF-IDF approach.
In such cases, other criteria, such as backlinks and the authority of the site, tend to carry more weight in the ranking, and content optimization is unlikely to be a differentiator.
In short, understanding what TF-IDF is helps you understand how Google thinks and optimize your pages according to the search engine’s logic. However, never forget that the focus is the user. Google doesn’t want you to please only the robot; it wants your website visitors to have a good experience.
The TF-IDF approach, therefore, should not be above user experience. It is just a tool to “adjust the screws” in content optimization and SEO on page.
Now that you know the TF-IDF approach, also read our full article on technical SEO, which is essential for Google to reach your content.