Semantic SEO: Using Structured Data to Drive Website Traffic
October 21, 2019 | Contributed by: Brady Amerson
LSA Insider periodically invites experts in a field relevant to our community to share practical advice on how to do the business of local better. This ExpertTake post features Boostability’s Brady Amerson explaining the Semantic Web and why it’s an essential component of effective SEO today.
Rich Snippets, Schema.org (Schema), Structured Data. These terms have been all the buzz in the SEO industry over the last few years, and for good reason.
But too often I come across misunderstandings or vague explanations of what they are, what they can do for your website, how to implement them, and how they improve SEO.
In a nutshell, structured data (often referred to simply as Schema) is markup that gives search engines a better understanding of your content. That understanding can lead the search engine results page (SERP) to display your content in a richer format called a Rich Snippet. Or, as some of us put it, rank #0.
This post will address the necessity of a semantic web by exploring what structured data is, why it is needed, and how it is used by search engines.
Before we dive in, it helps to understand the context that makes adding structured data to web content necessary. This requires a brief look at the evolution of the internet and the history of Google’s search engine.
Why We Need a Semantic Web
Since I began in web development nearly 20 years ago, web technology has advanced exponentially. What was once just a network of static HTML documents using hyperlinks to connect to other HTML documents eventually evolved into interactive web applications. The introduction of AJAX brought about the rise of social media platforms, and with that we could finally realize the true purpose of the internet: sharing photos of our dinner and pets.
Additionally, web frameworks and Content Management Systems allowed users to build e-commerce stores, contribute to wikis, and post ideas and opinions on forums. We could easily create any type of web content we wanted without having to write a single line of code or configure complex server software.
These advancements, which marked the shift from Web 1.0 to Web 2.0, created an explosion of web content that made web visibility increasingly competitive for businesses and content creators. They also made it challenging for Google to quickly index and rank the vast amount of available content. Consequently, Google Search users needed to become fluent in “searchese” to find the information they needed.
This had to do with the way Google indexed and ranked pages at the time. For example, if you had searched for “Toyota model starting with an s”, Google operated in the following way. It would start by searching its index for all the pages that contained the term “Toyota” and collect them in what’s called a posting list. It would then create a posting list for each of the other terms: “model”, “starting”, and “s”. Once it had a posting list for each term, Google would intersect the lists to find the documents that contained every term.
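The posting-list lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Google’s actual implementation: the four-page corpus, the page ids, and the function names build_index, intersect, and search are all invented for the example, and the query drops the filler words “with” and “an” just as the walkthrough does.

```python
from collections import defaultdict

# Hypothetical mini-corpus (page id -> page text), invented for illustration.
PAGES = {
    1: "toyota model prices starting at low rates",
    2: "toyota sienna review",
    3: "new toyota model s trim starting soon",
    4: "honda model starting with an s",
}


def build_index(pages):
    """Map each term to a sorted posting list of the page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def search(index, query):
    """Return the ids of pages that contain every term in the query."""
    postings = [index.get(term, []) for term in query.lower().split()]
    if not postings:
        return []
    # Starting from the shortest list keeps the intermediate results small.
    postings.sort(key=len)
    result = postings[0]
    for posting in postings[1:]:
        result = intersect(result, posting)
    return result


INDEX = build_index(PAGES)
print(INDEX["toyota"])                             # [1, 2, 3]
print(search(INDEX, "toyota model starting s"))    # [3]
```

Page 3 is the only match because it happens to contain every term, even though it says nothing useful about the question. Page 2, the one a human would pick, never makes it into the result.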
Once Google had identified all the pages that contained all the terms, it would apply several ranking algorithms to determine which page was the most relevant and authoritative for the query. One of these was PageRank, which evaluates the number of backlinks (external hyperlinks pointing to the page) and the quality of the websites those links come from.
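To make the backlink idea concrete, here is a rough sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is an assumption-laden illustration and not Google’s production ranking: the four-page link graph, the damping value of 0.85, and the iteration count are all chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b, and d all link to c; c links back to a.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

RANKS = pagerank(LINKS)
for page, score in sorted(RANKS.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, page c ranks highest because it has the most backlinks, and page a outranks b and d because its single backlink comes from the highly ranked page c. That is the “quality of the linking site” effect in miniature.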
This method was successful for most queries. However, if Google couldn’t find web pages that contained all the terms, the results were less desirable. Take a look at the following image of the query: “Toyota model starting with an s”.
The results did not provide the information that I expected. I would need to go through the frustrating process of altering the terms until I found what I needed. This SERP would have been perfect, though, if the query had been: “what price are Toyota models starting at.”
Now, in contrast, if I were to ask you, “Toyota model starting with an s”, you would likely have responded with Sequoia, Sienna, or one of my favorite cars, the Supra. Humans understand the semantics of language (the meaning and intent of the vocabulary). For the most part, we can accurately assume, infer, and imply. If we need more context, we can ask follow-up questions. Context allows us to distinguish between the meanings of ambiguous terms like the word “bass”, which could be a fish or an instrument.
HTML: Defining structure and syntax, not semantics
The foundation of the web is HTML (Hypertext Markup Language). It is the standard language for all websites, and it provides a set of tags that are limited to defining the structural building blocks and user interface elements of a webpage. The markup tells user agents, such as Google’s crawler and web browsers, how the document is structured and how to display it.
For example, the <h1> tag tells the browser to display the text “Contact Us” in a Heading 1 format. The <a> tags, through their href attributes, create links for the company’s phone number and email. The <p> tags tell the browser to display the enclosed text as paragraphs.
When parsed in a browser, the document appears as such:
Immediately, you and I understand that this document contains relevant information on the various ways we can contact Boostability. Additional indicators are not necessary to understand the meaning of the content. We know that “call us” relates to the phone number (801) 261-1537, that “our email” relates to the company email listed as email@example.com, and that “our location” relates to the company’s listed address.
However, search engines would not automatically understand these relationships. All Google understands is that the “Contact Us” element is a string wrapped in an H1 tag. The HTML tags do not specify that the page is a contact page for a business named Boostability or that the element containing the address is a physical location.
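A short Python sketch makes this limitation visible. It assumes a hypothetical reconstruction of the contact markup (the phone number and email come from this article, the address paragraph is left out, and the tel: and mailto: link targets are guesses), and uses the standard library’s html.parser to list what a machine can read from plain HTML: tag names and the strings inside them, nothing more.

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the contact markup described above.
# The exact original markup is not known; the address block is omitted.
CONTACT_HTML = """
<section>
  <h1>Contact Us</h1>
  <p>Call us: <a href="tel:+18012611537">(801) 261-1537</a></p>
  <p>Our email: <a href="mailto:email@example.com">email@example.com</a></p>
</section>
"""


class TagTextCollector(HTMLParser):
    """Record each non-empty text string with the tag that directly wraps it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.found = []   # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.found.append((self.stack[-1], text))


collector = TagTextCollector()
collector.feed(CONTACT_HTML)
for tag, text in collector.found:
    print(f"<{tag}>: {text!r}")
```

The output is a flat list of tags and strings. Nothing in it says that “(801) 261-1537” is a telephone number belonging to an organization, which is exactly the gap structured data is meant to fill.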
Web 3.0: The Semantic Web
Human language is complex. Google understood this and was already working on solutions to better understand the meaning of web content and the queries made by search users. Typed keyword searches were not the only queries it needed to understand, either. Voice search has been growing rapidly, and because voice queries use natural language, Google needed to interpret them accurately.
In 2011, the major search engines Google, Bing, and Yahoo, joined later that year by Yandex, introduced Schema.org, a standardized vocabulary. The vocabulary uses an ontology for naming the types and characteristics of entities and their relationships with other entities.
The aim was to help search engines better interpret information on web pages. To do this, content creators could include the vocabulary within their HTML markup as metadata, using syntaxes like RDFa, Microdata, or JSON-LD. Where HTML’s limited tag set purely defined syntax, integrating this vocabulary could now define the semantics. It allowed content creators to explicitly state the significance of the text by identifying it as an entity.
Let’s take a look at the previous HTML example but with the Schema.org vocabulary implemented within:
What you see here is the HTML markup from our previous example but with structured data included.
It uses the Microdata syntax and the Schema.org vocabulary to provide additional meaning to the content. Although Google’s preferred syntax is JSON-LD, it accepts Microdata as well, and using Microdata here better illustrates the difference between non-structured HTML and structured HTML.
Similar to our previous example, the very first line of code contains the HTML tag <section> but unlike our first example, we see our first bit of microdata included within the tag.
The term “itemscope” tells search engines that the content within this element describes an item or, more accurately, a thing. The next term, “itemtype”, names the vocabulary type we are using, http://schema.org/Organization, and so defines what the thing is. In this case, “itemtype” tells search crawlers that the thing is an organization.
As you look through the code, you will see the term “itemprop”, or item property. This term allows us to add and define properties of the thing, here the organization. The content is now structured data. Search engines can understand that this particular content is an entity: an organization named Boostability that has a telephone number, an email address, a postal address, and geocoordinates.
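The three Microdata terms can be seen at work in a small Python sketch. It assumes a hypothetical, flattened version of the structured markup (only name, telephone, and email; the nested postal address and geocoordinates are left out), and the MicrodataExtractor class and to_json_ld helper are names invented for this example. The sketch reads itemscope, itemtype, and itemprop with the standard library parser, then prints the same facts in JSON-LD form.

```python
import json
from html.parser import HTMLParser

# Hypothetical, simplified Microdata markup; not the article's exact example.
MICRODATA_HTML = """
<section itemscope itemtype="http://schema.org/Organization">
  <h1>Contact <span itemprop="name">Boostability</span></h1>
  <p>Call us: <a href="tel:+18012611537" itemprop="telephone">(801) 261-1537</a></p>
  <p>Our email: <a href="mailto:email@example.com" itemprop="email">email@example.com</a></p>
</section>
"""


class MicrodataExtractor(HTMLParser):
    """Collect the itemtype and the text of every itemprop in one flat item.

    Assumes a single itemscope and no void elements such as <br> or <img>.
    """

    def __init__(self):
        super().__init__()
        self.item = {}
        self._props = []  # itemprop name (or None) for each open tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item["@type"] = attrs.get("itemtype")
        self._props.append(attrs.get("itemprop"))

    def handle_endtag(self, tag):
        if self._props:
            self._props.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._props and self._props[-1]:
            self.item[self._props[-1]] = text


def to_json_ld(item):
    """Express the extracted item as a JSON-LD document string."""
    context, _, type_name = item["@type"].rpartition("/")
    doc = {"@context": context, "@type": type_name}
    doc.update({key: value for key, value in item.items() if key != "@type"})
    return json.dumps(doc, indent=2)


extractor = MicrodataExtractor()
extractor.feed(MICRODATA_HTML)
print(extractor.item)
print(to_json_ld(extractor.item))
```

Unlike plain HTML, the markup now yields labeled facts: a type (Organization) and named properties with values. A real page would place the JSON-LD output inside a script element of type application/ld+json.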
Creating the Relationship of Things
Structured data helped Google get closer to understanding the semantics of data on the web. However, Google also needed to understand the relationships between all of these entities. Its next initiative was to use the metadata to provide more accurate results for complex queries and to enhance its SERP with richer results.
In 2012, Google introduced the Knowledge Graph in an announcement that coined the phrase “things, not strings”. The Knowledge Graph is essentially a knowledge base, and at the surface it does three things: it disambiguates words by understanding context, it summarizes information or facts and displays them in knowledge cards, and it provides results that do not contain keywords from the query but still relate to the entity.
At its core is technology that uses concepts like “triples” to build a database that focuses on entities and their attributes and connects entities that share common attributes. To some degree, Google can infer whether entities are related based on the type and number of attributes they share.
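The idea of triples can be illustrated with a toy example in Python. This is only a sketch under stated assumptions: the entities, predicates, and the helper names attributes, related, and find are invented here, and a real knowledge base is vastly larger and more sophisticated. Each fact is stored as a (subject, predicate, object) triple, and entities are treated as related when they share attribute pairs.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
TRIPLES = [
    ("Sienna", "type", "Car"),
    ("Sienna", "manufacturer", "Toyota"),
    ("Supra", "type", "Car"),
    ("Supra", "manufacturer", "Toyota"),
    ("Sequoia", "type", "Car"),
    ("Sequoia", "manufacturer", "Toyota"),
    ("Civic", "type", "Car"),
    ("Civic", "manufacturer", "Honda"),
    ("Boostability", "type", "Organization"),
]


def attributes(triples, subject):
    """Return the set of (predicate, object) pairs known about a subject."""
    return {(p, o) for s, p, o in triples if s == subject}


def related(triples, subject):
    """Rank other entities by how many attribute pairs they share."""
    mine = attributes(triples, subject)
    scores = {}
    for s, p, o in triples:
        if s != subject and (p, o) in mine:
            scores[s] = scores.get(s, 0) + 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def find(triples, required, prefix=""):
    """Find entities that have all required attributes and a name prefix."""
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if required <= attributes(triples, s)
        and s.lower().startswith(prefix.lower())
    )


print(related(TRIPLES, "Sienna"))
# [('Sequoia', 2), ('Supra', 2), ('Civic', 1)]

toyota_cars = {("type", "Car"), ("manufacturer", "Toyota")}
print(find(TRIPLES, toyota_cars, prefix="s"))
# ['Sequoia', 'Sienna', 'Supra']
```

With facts stored this way, the earlier query “Toyota model starting with an s” becomes a question about entities and attributes instead of a hunt for pages that contain matching strings.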
With the addition of the Hummingbird update and RankBrain, Google’s improved understanding of semantics and natural language launched Web 3.0 in full force. Machines can now understand that text is more than just text. It represents objects, organizations, people, and products.
However, machines need our help to specify the meaning. Does Google know that your business is an entity or an organization? Does Google know that the text describing your products refers to actual products with their own attributes? Adding structured data allows Google to understand your content and relate it to other entities. It is essential if you want your content to be featured as a Rich Snippet.