With the growing importance of spoken search queries (voice search), modern search engines try to answer directly, without the user having to navigate through the ten blue links on the first search results page. To do this, they must identify the meaning of the search query and extract relevant information from structured and unstructured data sources. This is the job of entity retrieval.
The task of entity retrieval is to identify relevant entities from a catalog in response to a written or spoken search query and to output them in a list sorted by relevance to the search query. To output an answer, an excerpt is required that briefly explains the entity.
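To make the task concrete, here is a minimal sketch in Python. It assumes a tiny hand-made catalog and a simple word-overlap (Jaccard) score; the `Entity` class, the `CATALOG` entries and the `retrieve_entities` function are hypothetical names chosen for illustration. A real search engine ranks with far richer signals, so treat this only as a picture of the input and output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    description: str  # short excerpt that can be shown as the answer


# Hypothetical mini catalog, invented for this example.
CATALOG = [
    Entity("Redirect", "A redirect forwards a URL request to another URL."),
    Entity("Featured snippet",
           "A featured snippet is a short answer shown above the search results."),
    Entity("Knowledge panel",
           "A knowledge panel is an info box about an entity shown next to the search results."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_entities(query, catalog=CATALOG, top_k=3):
    """Return up to top_k entities, sorted by descending overlap with the query."""
    query_tokens = tokenize(query)
    scored = []
    for entity in catalog:
        entity_tokens = tokenize(entity.name + " " + entity.description)
        overlap = len(query_tokens & entity_tokens)
        if overlap:
            # Jaccard similarity: shared tokens divided by all distinct tokens.
            score = overlap / len(query_tokens | entity_tokens)
            scored.append((score, entity))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entity for _, entity in scored[:top_k]]


if __name__ == "__main__":
    for entity in retrieve_entities("what is a redirect"):
        print(entity.name, "-", entity.description)
```

The first entry of the returned list is the candidate for a direct answer, and its description plays the role of the excerpt.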
We know this type of description from the featured snippets as well as the entity descriptions in the knowledge panels, which are usually extracted from Wikipedia or DBpedia. The featured snippets also extract information from unstructured sources such as magazines, glossaries or blogs. However, Google currently seems to only use the descriptions from unstructured sources as a stopgap solution and prefers descriptions from Wikipedia.
Below is an analysis by Malte Landwehr from March 2019 on the question of how the information sources for featured snippets on the topic of online marketing are distributed.
Distribution of sources for featured snippets for 1095 search terms from online marketing, source: Malte Landwehr (Searchmetrics)
Only occasionally can you find information that does not come from Wikipedia, as in the example of the search query "redirect". As soon as Wikipedia provides information on a term, it is often preferred. In this case, fortunately for our colleagues at Ryte, Google does not seem to have "linked" the entities redirect and forwarding to each other, although there is already a special forwarding page in Wikipedia.
Example: Featured Snippet and Knowledge Panel for the search query “redirect”
Example: Featured Snippet and Knowledge Panel for the search query “forwarding”
Google currently relies most heavily on information from Wikipedia to populate featured snippets. There are several reasons for this. Firstly, due to Wikipedia's clear structure, it is easy to extract the introductory text of an article. This describes the topic briefly and concisely.
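To illustrate why a clearly structured page makes this easy, the following sketch pulls the first non-empty paragraph out of an HTML document using only Python's standard library. The sample markup and the names `FirstParagraphParser` and `extract_intro` are assumptions made for this example; it does not claim to reflect how Google actually processes Wikipedia.

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collects the text of the first non-empty <p> element."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._chunks = []
        self.first_paragraph = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting while no intro paragraph has been found yet.
        if tag == "p" and self.first_paragraph is None:
            self._in_paragraph = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_paragraph:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            if text:  # skip empty placeholder paragraphs
                self.first_paragraph = text


def extract_intro(html):
    """Return the first non-empty paragraph of the HTML, or None."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.first_paragraph


if __name__ == "__main__":
    # Simplified, made-up markup standing in for an encyclopedia article.
    sample = (
        "<h1>Redirect</h1>"
        '<p class="empty"> </p>'
        "<p>A <b>redirect</b> forwards one URL to another.</p>"
        "<p>Second paragraph.</p>"
    )
    print(extract_intro(sample))
```

On a blog or magazine page, the defining sentence can sit anywhere in the text, which is why the same shortcut does not carry over to unstructured sources.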
How exactly Google extracts information from unstructured website text for the featured snippets is a matter of speculation. There are many different assumptions. I believe that it is primarily related to the triples of subject, predicate and object that occur in the passage. But more on that in the next post in my series.
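For readers unfamiliar with triples, here is a deliberately naive sketch of what such a structure looks like. It assumes sentences of the form "X is a Y" and uses a single regular expression; the pattern and the `extract_triple` function are hypothetical, and real systems would rely on proper linguistic parsing. It says nothing about Google's actual method.

```python
import re

# Hypothetical pattern for definition-style sentences:
# optional article, subject, "is"/"are", optional article, object.
DEFINITION_PATTERN = re.compile(
    r"^(?:an?\s+|the\s+)?(?P<subject>.+?)\s+(?P<predicate>is|are)\s+"
    r"(?:an?\s+|the\s+)?(?P<object>.+?)[.!?]?$",
    re.IGNORECASE,
)


def extract_triple(sentence):
    """Return (subject, predicate, object) for a simple definition, else None."""
    match = DEFINITION_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return (match["subject"], match["predicate"].lower(), match["object"])


if __name__ == "__main__":
    print(extract_triple(
        "A redirect is a technique for forwarding one URL to another."
    ))
```

A sentence that yields a clean triple about the queried entity is a plausible candidate for a short definition, which is the intuition behind the assumption above.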
The frequency with which Wikipedia information appears in the featured snippets suggests that Google is not yet satisfied with the results of extracting information from unstructured data and/or has not yet gotten manipulation attempts under control.