Formerly, Google's search engine relied only on unstructured data, a large collection of search terms, to provide results to its users. Google identified what a website was about based on the search terms that appeared on it; whenever users entered those terms in a query, Google would show that website on its results page. This procedure allowed meaningless, spammy websites to appear on the results page as long as they contained the relevant terms. The policy, which let black-hat SEO techniques prevail, was eventually abandoned as Google took more strategic steps toward adopting semantic web technology. The rollout of Google Panda and Google Penguin, and their updates over the last three years, proved Google's seriousness about making its search engine friendlier to human users. On September 27, 2013, while celebrating its 15th birthday, Google introduced its new semantic policy: Google Hummingbird.
What’s New with the Hummingbird?
Google Hummingbird doesn't entirely abandon the old system of textual search terms and text links; it does, however, reduce the importance of search terms in producing results. In the past, search results consisted mostly of websites that were rich in search terms. With the Hummingbird update, Google no longer merely returns generic result items; it tries to answer what users are actually asking about. Because it tries to answer questions, it inevitably has to give meaning to all the search terms in its database. When a user searches with a term that indicates a person or a place, Google knows that the term is associated with a specific person or place and returns results accordingly. Hummingbird ultimately allows Google not only to answer, but also to converse with users and to anticipate questions they may want to ask.
Entities Replace Search Terms
Search terms by themselves are meaningless. They are only textual data that help a search engine identify a website without giving it any real clue about what the website is about. The semantic technology Google has adopted imbues those terms with meaning: they become data that are structured and classified into specific categories. In other words, the terms are transformed into entities. Google has been developing these entities through its Knowledge Graph, a large database in which search terms are categorized so that, as entities, they can refer to people, places, and other specific, meaningful things. Entities unleash Google's semantic search power and make its search capability far better than before.
The More Reliable Structured Data
As noted earlier, data consisting of meaningless search terms are unstructured. Such terms are not classified into semantic categories that indicate their meanings, and they can easily be abused through black-hat SEO techniques. Transforming those terms into entities turns that unstructured data into more sophisticated, structured data. The terms now have meanings, and they are interconnected with other terms in the same semantic category. Semantic search relies more on the meanings of those terms than on their textual form. As a result, the search results may also contain items that are not textually rich in those terms, as long as those items provide relevant answers based on the meanings the terms imply.
Triplestore, Triples and Their Role in Improving Search Engine’s Precision
To provide precise answers, the search engine refers to triplestores. A triplestore is a database containing triples; a single triplestore can hold billions of them. But what exactly is a triple? A triple is a bridge through which entities are semantically interconnected. It links entities according to the basic subject–predicate–object pattern. For instance, if the entity "Andrew" refers to a person, it can be connected to the entity "book" through a predicate such as "write" or "read." The combination of the three parts (Andrew as subject, the verb as predicate, and book as object) is called a triple. Already-meaningful entities carry even more precise meaning when they are linked through triples. Because a triplestore can contain billions of triples, it is a very reliable resource for a search engine to draw on when providing precise answers to the questions users ask, and even to questions the search engine can anticipate.
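The subject–predicate–object pattern can be pictured as a toy triplestore. The Python sketch below is only an illustration of the idea; real triplestores hold billions of triples and are queried with dedicated languages such as SPARQL, and the entities and predicates here (Andrew, wrote, book) are made up for the example.

```python
# A toy triplestore: each triple is a (subject, predicate, object) tuple.
triples = {
    ("Andrew", "wrote", "book"),
    ("Andrew", "lives_in", "Boston"),
    ("book", "is_a", "publication"),
}

def query(subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(triples)
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Everything the store knows about Andrew:
print(query(subject="Andrew"))
```

Even this tiny example shows why triples sharpen meaning: asking for the subject "Andrew" returns not just pages containing the word, but the specific facts linked to that entity.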
From Text Links to Answers
Websites contain large amounts of textual content, and the search engine does not look only at the surface form of that content. If the content is meaningless and there is no proper semantic link between one term and another, the search engine can easily dismiss the website and its content from its index. Instead, it looks for answers by examining a website's content and comparing it with its large database of entities and triples. If there is no semantic relationship between them, the website will not be considered when the search results are built.
Google’s Resources for Moving from Indexing Data to Understanding Them
To this point, we have talked a lot about how Google has transformed meaningless search terms and text links into meaningful entities, and how it has developed semantic relationships between entities by using triples. This tremendous, ambitious effort is meant to let Google understand what users intend. Rather than indexing web pages like a robot, Google tries to read and understand what a website contains so that it can build relevant answers to users' questions.
We haven't talked very much, however, about how Google carries out such an ambitious effort. To build one database containing millions of entities and another containing billions of possible semantic relationships between them, Google uses all of its available resources and at the same time relies on databases provided by third parties. The primary resource Google uses is its Knowledge Base. Google also acquired Metaweb in 2010; Metaweb provided around 12 million entities, which Google later folded into its Knowledge Base. At that point, the Knowledge Base was already tracking more than 500 million entities and 3.5 billion semantic relationships between them. Today, three years on, those numbers must have grown enormously.
Relying on such a gigantic database, Google's capability to understand websites' content must now be comparable to its capability to index them. If that is the case, generic indexing of websites is virtually useless on its own. Google has more than enough capability to understand websites, to understand users' questions, and to provide users with precise answers.
Keywords Are Not Merely Indexed, but Semantically Read and Understood
With such capability, meaningless search terms and keywords are no longer merely indexed; they are semantically understood. Google's search engine reads websites much as human users read them, understanding their content naturally, just as human beings understand language.
Entity Extraction
Semantic search allows entities to be categorized. The categories are not created generically; they are produced through semantic extraction. Remember that a semantic tree can be formed by scrutinizing the meanings of entities. From the large semantic category "places," for instance, smaller categories such as "courthouse," "market," "apartment complex," and "embassy" can be extracted. The extraction process can be repeated on those smaller categories, creating still smaller categories containing specific entities with specific meanings. There are various methods for carrying out the extraction process; they are elaborated comprehensively in Sandro Hawke's video, Introduction to Linked Data with Sandro Hawke.
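The idea of extracting ever-smaller categories from a larger one can be pictured as walking a tree. The Python sketch below uses the "places" example from above; the concrete entity names under each category are hypothetical, added only to show how leaves of the tree map categories to specific entities.

```python
# A hypothetical fragment of a semantic category tree, as nested dicts.
# Inner dicts are sub-categories; lists at the leaves hold concrete entities.
categories = {
    "places": {
        "courthouse": ["Supreme Court Building"],
        "market": ["Pike Place Market"],
        "apartment complex": [],
        "embassy": ["Embassy of France, Washington"],
    }
}

def extract(tree, path=()):
    """Yield (category_path, entity) pairs for every entity in the tree."""
    for name, child in tree.items():
        if isinstance(child, dict):
            # A sub-category: recurse deeper, extending the path.
            yield from extract(child, path + (name,))
        else:
            # A leaf: emit each concrete entity with its full category path.
            for entity in child:
                yield path + (name,), entity

for cat_path, entity in extract(categories):
    print(" > ".join(cat_path), "->", entity)
```

Each entity comes back with its full category path (e.g. places > market), which is exactly the kind of classification that lets a semantic search engine know what an entity *is*, not just what it is called.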
Making Your Business Data & Content Visible With Semantic Markup
Although Google's latest search policy appears very sophisticated, you can still make your website readable by using business data and semantic markup. Business data are always valuable and relevant because they consist of video content, product reviews and ratings, locations, contact information, business specialties, business details, product information, and other informative content that the search engine considers valuable. Semantic markup helps search engines understand the content of your website by supplying entities and their relationships to Google's Knowledge Base. Your website's readability will therefore improve if you use both strategies in your SEO campaign.
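One common form of semantic markup is a schema.org description embedded in a page as JSON-LD. The Python sketch below builds such a snippet; schema.org's LocalBusiness and PostalAddress types are real vocabulary, but the business details themselves are invented for the example.

```python
import json

# Hypothetical business data, shaped as a schema.org LocalBusiness entity.
business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Coffee House",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
    },
}

# Serialize to JSON-LD; on a real page this would sit inside a
# <script type="application/ld+json"> tag in the HTML.
snippet = json.dumps(business, indent=2)
print(snippet)
```

The point of markup like this is precisely what the paragraph above describes: it hands the search engine ready-made entities (the business, its address) and their relationships, instead of leaving them to be guessed from free text.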
Final Thoughts
Hummingbird makes Google a truly sophisticated search engine: helpful for users, and a motivation for website developers to improve the quality of their sites. Developers don't need to worry much about the new policy, because it is relatively easy to deal with. Its primary goal is to ensure that only high-quality content gets surfaced. As long as your website's content is of high quality, the new policy should not be a problem for you; in fact, it can help you reach the top if you really have a high-quality website. To improve the quality of your website, you can use the two strategies explained above.
Be our client today to get a Pre-SEO Analysis jump-start for your website. It's a free assessment and 100% awesome.
Let's do it