Why Learn SQL for SEO?

How to give yourself data analysis superpowers

This is a guest post from my friend Cody West, who’s a talented SEO and a member over at The Blueprint Training. Cody surprised me with his skills when he re-built one of our Sheets templates in BigQuery SQL, so I’m happy to have him share his learning journey here.

He’s the Chief Strategist and Founder at Assisted Reach, where he’s automated a large portion of his agency deliverables using SQL and BigQuery, allowing him to grow his agency to over six figures while working a full-time job. Take it away, Cody!

If you’re an SEO, you’re probably running a ton of different audits and analyses.

There are content audits, technical audits, internal link analyses, on-page audits, keyword gap analyses, etc.

To build these audits and analyses, you have to use various SEO tools. At my agency, we’re using ~10 right now, and I bet that’s on the lower end compared to most agencies.

Each tool has a specific use case (which is why you need so many).

You use Ahrefs or SEMRush for keyword and backlink data, Screaming Frog and Sitebulb for crawl data, Google Analytics for session and UX metrics, Google Search Console for query and click data, etc.

Out of the box, these types of tools are helpful, but the insights you can derive from them are limited.

Why?

Because these tools are built for large audiences with broad use cases, it’s simply not feasible for a company like SEMRush or Ahrefs to build every little analysis and reporting view into their tool.

The real value of these tools is their data. Luckily, most of these tool providers allow you to export their raw data in one way or another.

Once you have these raw datasets, if you know how to combine them, filter them down, aggregate them, and then create custom logic, the depth of analysis you can do is miles more advanced than if you’re using each tool’s UI.

That’s where SQL (Structured Query Language) comes in. Now before you get scared off by the thought of learning a programming language (if I can even call it that), let me ease that fear.

If you feel confident you can learn Excel, I’m confident you can learn SQL.

I’ve automated ~80% of the deliverables my agency creates by writing SQL in BigQuery. It has saved me countless hours of work and has been a major contributing factor in growing my agency to over six figures while working full-time.

In this post, I’m going to show you why knowing basic SQL is such a powerful skill for an SEO to have and how to start learning it.

One of the issues with learning technical skills is that they are often WAY overcomplicated by the teacher. I want to avoid that and make this as simple as possible for you.

What Is A Database?

Before we talk about SQL, let’s define a database.

A database is simply a place where structured information (i.e. data) is stored.

Picture a folder on your computer with two Excel files in it. The folder is the database, and each Excel file is what is called a table.

You can also think of an Excel file as a database. Each sheet, in this case, would be a table.
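If you want to see this folder-and-files analogy in action without installing anything, Python’s built-in sqlite3 module lets you spin up a tiny database in memory. This is just an illustrative sketch; the table names are made up:

```python
import sqlite3

# A database is the "folder"; ":memory:" keeps it temporary instead of a file on disk
conn = sqlite3.connect(":memory:")

# Each table is like one sheet in an Excel file
conn.execute("CREATE TABLE pages (url TEXT, sessions INTEGER)")
conn.execute("CREATE TABLE backlinks (url TEXT, referring_domain TEXT)")

# List the tables that live inside our "folder"
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['backlinks', 'pages']
```

Everything else in this post builds on that mental model: databases hold tables, and SQL is how you ask those tables questions.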

What Is SQL and Why You Should Care

SQL is an intuitive “programming” language that allows you to access and manipulate one or more databases and the data that lives within tables in those databases.

It’s very similar to using an Excel formula that references different cells in one or more sheets. It’s just much more dynamic and powerful.

What SEOs Can Do With a Bit of SQL

As an SEO, SQL can help you automate manual tasks, speed up processes and workflows, and ultimately get more out of the data you are already using.

The Website Quality Audit

The website quality audit is a tool developed by The Blueprint Training that is essentially a content audit, technical audit, and overall website quality report all in one. It’s built by taking six different datasets, manipulating the data, and then joining it together (we’ll talk about “joins” later) on URL.

CIFL has an excellent service where you can get a single website quality audit report built for just $30. This is a fantastic option for many agencies.

For me, however, I wanted the ability to customize the report and build it into various workflows and processes within my agency.

When this tool was released, because I knew databases and SQL, I could quickly see how to rebuild it in BigQuery and tweak it to fit my needs.

For example, instead of using the SEMRush keyword dataset, I wanted to build it using the Ahrefs Keyword and GSC query dataset. I also wanted to automate a portion of the manual steps required to categorize pages and URL actions.

There were some additional columns I wanted to add to the tool like the date a blog post was published.

I also wanted to build in keyword research for each URL into the website quality audit and then add rank tracking data to make sure the right URLs were ranking.

To give you an idea of how fast you can build cool tools like the website quality audit once you know SQL, this took me ~5 hours to put together. Now it’s a process I have someone on my team run that takes less than 30 minutes.

Other Reports You Can Automate With SQL

There are always going to be manual tasks in SEO (at least in the foreseeable future).

There’s no way around that, at least for now. However, you can automate the majority of the analysis and reporting you’re doing as an SEO with a little bit of SQL.

In addition to the website quality audit, we’ve built tools in BigQuery for the following:

  • Keyword Gap Analysis
  • Internal Link Analysis
  • Competitor Backlink Analysis
  • URL Prioritization
  • Various link building processes
  • SEO Opportunity Model
  • Editorial Calendar

Learning SQL is really about learning how to think about data and databases. Once you have a solid understanding of how they work, it’ll be hard to stop the ideas from coming to you!

Specific Examples of SQL For Automation & Analysis

SQL allows you to stitch together data and build handy reports and tools. But it doesn’t just stop there.

You can also use SQL to do analysis and build various automations. Once I finish building a website quality audit, I push the final dataset into a new table in BigQuery that I can run analysis and automations on.

Below I show specific examples of SQL for automation and analysis. The goal is not to have you understand the actual SQL, but to see how powerful a short piece of code can be.

At the end of this article, I dumb down the basics of SQL and then provide you with some learning resources.

1. Automate Category Tagging

As part of the website quality audit, you have to manually tag pages by page category. For example: is the URL a blog post, a landing page, a blog category page, etc.?

You can easily automate much of this process using simple IF/THEN logic.

In SQL, IF statements are called CASE statements.

Here’s an example of a CASE statement that automates the categorization of blog posts, blog category pages, and blog tag pages.

CASE
 WHEN URL LIKE '%/blog/%' THEN 'Blog Post'
 WHEN URL LIKE '%/category/%' THEN 'Blog Category'
 WHEN URL LIKE '%/tag/%' THEN 'Blog Tag'
  ELSE '' END

You can customize this same logic to match the URL structure of the site you’re working on and automate a vast portion of this process.
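To see how a CASE expression like this behaves once it’s dropped into a full query, here’s a self-contained sketch using Python’s built-in sqlite3. The table and URLs are hypothetical, but the CASE syntax is the same one you’d write in BigQuery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.executemany("INSERT INTO urls VALUES (?)", [
    ("/blog/my-post/",),
    ("/category/seo/",),
    ("/tag/sql/",),
    ("/about/",),
])

# The CASE expression from above, wrapped in a SELECT so it runs end to end
rows = conn.execute("""
    SELECT url,
           CASE
             WHEN url LIKE '%/blog/%'     THEN 'Blog Post'
             WHEN url LIKE '%/category/%' THEN 'Blog Category'
             WHEN url LIKE '%/tag/%'      THEN 'Blog Tag'
             ELSE ''
           END AS page_category
    FROM urls
""").fetchall()
```

Any URL that matches none of the patterns falls through to the empty-string ELSE bucket, which is how you spot pages that still need manual tagging.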

2. Automate URL Action Rules

Another part of the website quality audit that’s manual is the URL action portion, which is essentially a content audit.

You determine what to do with a URL based on various data points. Instead of manually looking at every URL, let’s create some rules to automate this. We’ll develop rules within a CASE statement similar to the section above.

If a URL has significant traffic going to it, we’ll want to tag the URL with “Update On-Page” so that we can look at the page in more detail, make on-page changes, and increase traffic. Here’s how we’ll do that:

CASE
 WHEN Sessions > 100 THEN 'Update On-Page'
  ELSE '' END

Now let’s add logic to the CASE statement that noindexes blog category and blog tag pages that aren’t already noindexed.

CASE
 WHEN Sessions > 100 THEN 'Update On-Page'
 WHEN PageCategory IN ('Blog Category', 'Blog Tag') AND Indexability <> 'Nonindexable' THEN 'NoIndex'
  ELSE '' END

Now let’s add logic that tags thin blog posts that have backlinks for a 301 redirect:

CASE
 WHEN Sessions > 100 THEN 'Update On-Page'
 WHEN PageCategory IN ('Blog Category', 'Blog Tag') AND Indexability <> 'Nonindexable' THEN 'NoIndex'
 WHEN PageCategory = 'Blog Post' AND Backlinks > 0 AND WordCount < 400 THEN '301 Redirect'
  ELSE '' END

You can write this custom logic for each site you’re working on depending on how things are set up.

3. Check For Duplicate Page Issues

Duplicate pages can hurt your rankings. We can write a simple query that checks for duplicate pages.

SELECT
 PageTitle,
 COUNT(PageTitle) AS TotalPageTitles
FROM
 WebsiteQualityAuditFinalDataset
WHERE
 StatusCode = 200
 AND Indexability = 'Indexable'
GROUP BY
 PageTitle
ORDER BY 2 DESC

This query takes the final website quality audit dataset and counts how many pages share the same page title, limited to pages with a 200 status code that are indexable.

It’s then ordering the TotalPageTitles column in descending order, so the titles with the most duplicates are shown at the top.
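If you want to try this query locally before pointing it at BigQuery, here’s the same duplicate-title check run against a toy table using Python’s built-in sqlite3 (the rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (PageTitle TEXT, StatusCode INTEGER, Indexability TEXT)")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", [
    ("SEO Guide", 200, "Indexable"),
    ("SEO Guide", 200, "Indexable"),   # duplicate title
    ("About Us",  200, "Indexable"),
    ("Old Post",  301, "Indexable"),   # filtered out by the StatusCode condition
])

# Same duplicate-title query as above, run against the toy table
rows = conn.execute("""
    SELECT PageTitle, COUNT(PageTitle) AS TotalPageTitles
    FROM audit
    WHERE StatusCode = 200 AND Indexability = 'Indexable'
    GROUP BY PageTitle
    ORDER BY 2 DESC
""").fetchall()
```

The duplicated title floats to the top with a count of 2, and the redirected page never enters the count at all.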

4. Create Buckets To Make Reporting Easier

Buckets are simple groupings. For example, let’s say I wanted to group Domain Rating into groups of 10. I would create the following buckets:

  • 0 to 10
  • 11 to 20
  • 21 to 30
  • 31 to 40
  • 41 to 50
  • 51 to 60
  • 61 to 70
  • 71 to 80
  • 81 to 90
  • 91 to 100

These are useful when you are building reporting from your data.

For example, I wrote a post that shows how to automate a competitor backlink analysis using a Google Sheets template we built. One of the views I built into the report shows link counts by DR bucket and anchor text type:

[Image: report view showing link counts by DR bucket and anchor text type]

Here’s how you would create buckets using SQL:

CASE
   WHEN DR BETWEEN 0 AND 10 THEN '0 to 10'
   WHEN DR BETWEEN 11 AND 20 THEN '11 to 20'
   WHEN DR BETWEEN 21 AND 30 THEN '21 to 30'
   WHEN DR BETWEEN 31 AND 40 THEN '31 to 40'
   WHEN DR BETWEEN 41 AND 50 THEN '41 to 50'
   WHEN DR BETWEEN 51 AND 60 THEN '51 to 60'
   WHEN DR BETWEEN 61 AND 70 THEN '61 to 70'
   WHEN DR BETWEEN 71 AND 80 THEN '71 to 80'
   WHEN DR BETWEEN 81 AND 90 THEN '81 to 90'
   WHEN DR BETWEEN 91 AND 100 THEN '91 to 100'
      ELSE NULL END

You can read the above snippet as: IF the DR is between 0 and 10, THEN create the bucket ‘0 to 10’, and so on.

5. Use Prebuilt BigQuery Functions

When performing analysis on SEO datasets (especially as it relates to link building), you often want to take ugly URLs and parse out the root domain or TLD. You can use a complex piece of REGEX, or you can use the following functions:

  • NET.REG_DOMAIN(): This will take any URL and parse out the root domain
  • NET.PUBLIC_SUFFIX(): This will parse out the TLD from the URL

For example, here’s my URL: https://codingisforlosers.com/analysis-recipes/

If I write NET.REG_DOMAIN("https://codingisforlosers.com/analysis-recipes/"), the output would be the following:

codingisforlosers.com

If I write NET.PUBLIC_SUFFIX("https://codingisforlosers.com/analysis-recipes/"), the output would be the following:

com

Simple but extremely useful.
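Outside of BigQuery, you can approximate these two functions in a few lines of Python. Note this is a naive sketch: BigQuery’s NET functions consult the real public-suffix list, so multi-part suffixes like .co.uk need a library such as tldextract; this version only handles simple cases:

```python
from urllib.parse import urlparse

def reg_domain(url: str) -> str:
    """Rough local stand-in for NET.REG_DOMAIN: the hostname minus a leading 'www.'.

    Only handles simple cases like example.com or www.example.com; BigQuery
    uses the public-suffix list and handles domains like example.co.uk correctly.
    """
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def public_suffix(url: str) -> str:
    """Rough local stand-in for NET.PUBLIC_SUFFIX: everything after the last dot."""
    return reg_domain(url).rsplit(".", 1)[-1]

print(reg_domain("https://codingisforlosers.com/analysis-recipes/"))    # codingisforlosers.com
print(public_suffix("https://codingisforlosers.com/analysis-recipes/")) # com
```

For quick link-building cleanup on ordinary .com domains this is often enough; for anything international, reach for the public-suffix-aware tooling.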

6. Automate Funnel Tagging

As an SEO, you often want to tag which part of the marketing funnel a keyword belongs to.

The logic that goes into this is simple.

First, import a table into BigQuery that looks like this:

[Image: table with a Keyword column alongside TOFU, MOFU, and BOFU funnel-word columns]

Run this code:

WITH
  Keywords AS (

  SELECT
    DISTINCT Keyword
  FROM
    `ciflexample.CIFL.tblFunnel` ),

  Funnel AS (

  SELECT
    DISTINCT TOFU,
    MOFU,
    BOFU
  FROM
    `ciflexample.CIFL.tblFunnel`
  WHERE
    TOFU IS NOT NULL ),

  Classify AS (

  SELECT
    DISTINCT Keyword,
    CASE WHEN REGEXP_CONTAINS(Keyword, TOFU) THEN 1 ELSE 0 END AS IsTofu,
    CASE WHEN REGEXP_CONTAINS(Keyword, MOFU) THEN 1 ELSE 0 END AS IsMofu,
    CASE WHEN REGEXP_CONTAINS(Keyword, BOFU) THEN 1 ELSE 0 END AS IsBofu
  FROM
    Keywords k
  CROSS JOIN
    Funnel f ),

  Aggregate AS (

  SELECT
    DISTINCT Keyword,
    SUM(IsTofu) AS IsTofu,
    SUM(IsMofu) AS IsMofu,
    SUM(IsBofu) AS IsBofu
  FROM
    Classify
  GROUP BY
    Keyword )

SELECT
  *,
  CASE
    WHEN IsTofu + IsMofu + IsBofu > 1 THEN 'DoubleCheck'
    WHEN IsTofu = 1 AND IsMofu = 0 AND IsBofu = 0 THEN 'TOFU'
    WHEN IsTofu = 0 AND IsMofu = 1 AND IsBofu = 0 THEN 'MOFU'
    WHEN IsTofu = 0 AND IsMofu = 0 AND IsBofu = 1 THEN 'BOFU'
    ELSE NULL END AS Funnel
FROM Aggregate

All this is doing is saying: if the keyword contains any of the words in the TOFU, MOFU, or BOFU columns, classify it as TOFU, MOFU, or BOFU.

Here’s the output:

[Image: query output with Keyword, IsTofu, IsMofu, IsBofu, and Funnel columns]

For keywords that flag as more than one part of the funnel, I created logic to tag the keyword as “DoubleCheck.” It simply adds the IsTofu, IsMofu, and IsBofu columns together and flags the keyword as DoubleCheck if the sum is greater than one.

If you look at the “how to buy this” keyword, it gets flagged as “DoubleCheck” because “how” is a TOFU keyword and “buy” is a BOFU keyword. I’d want someone to manually look at this one to determine its position in the funnel.
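The same classification logic is easy to prototype in plain Python before moving it into BigQuery. The funnel word lists below are hypothetical stand-ins for the tblFunnel table:

```python
import re

# Hypothetical funnel-word lists; in the BigQuery version these live in tblFunnel
TOFU, MOFU, BOFU = ["what", "how", "why"], ["best", "review"], ["buy", "price"]

def classify(keyword: str) -> str:
    """Mirror the SQL: flag each funnel stage the keyword matches, then resolve."""
    hits = {
        "TOFU": any(re.search(rf"\b{w}\b", keyword) for w in TOFU),
        "MOFU": any(re.search(rf"\b{w}\b", keyword) for w in MOFU),
        "BOFU": any(re.search(rf"\b{w}\b", keyword) for w in BOFU),
    }
    matched = [stage for stage, hit in hits.items() if hit]
    if len(matched) > 1:
        return "DoubleCheck"   # ambiguous: route to a human
    return matched[0] if matched else ""

print(classify("how to buy this"))    # DoubleCheck ("how" is TOFU, "buy" is BOFU)
print(classify("best running shoes")) # MOFU
```

Once the word lists stabilize, porting this to the REGEXP_CONTAINS version above is mostly mechanical.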

Dumbing Down The Basics of SQL

I’m not going to go too deep into the basics of SQL because David already has a post you can check out here.

However, I do want to try to show you how simple a lot of the foundational SQL concepts are.

SQL is a very intuitive language, meaning it reads much like English. When I’m explaining it, I’ll read the queries as if they were English.

Connecting A Google Sheet To BigQuery

BigQuery is my go-to platform for writing SQL. It’s free unless you’re doing some serious data wrangling.

Once you’ve set it up, you can hook Google Sheets up directly to it.

Here’s a Screaming Frog crawl I ran on CIFL. I kept the following six columns:

[Image: Screaming Frog crawl data in a Google Sheet, showing six columns]

Here’s how you can connect Google Sheets and BigQuery.

SELECT, FROM, WHERE

Once you have your Google Sheet connected, you can write the following foundational SQL statement to view your data.

SELECT * FROM `ciflexample.CIFL.tblsfdata`

The snippet above simply means “select (i.e. SELECT) all (i.e. *) of the columns from (i.e. FROM) the Google Sheet we just connected, which I named ciflexample.CIFL.tblsfdata.”

If we click RUN, the output will be the same data as we have in our Google Sheet.

[Image: BigQuery results showing the same rows as the Google Sheet]

Now, let’s say we want to audit CIFL’s 301 redirects. We’ll use the WHERE clause to create logic that says to only show us URLs with a status code equal to 301. Here’s what that looks like:

[Image: query filtering the crawl data to StatusCode = 301]
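The filtering query itself only appears in the screenshot, but it would look something like the one below. Here’s a runnable sketch using Python’s built-in sqlite3, with toy rows and the table name from the SELECT example above (the status code is stored as text, matching the join query later in this post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblsfdata (Address TEXT, StatusCode TEXT)")
conn.executemany("INSERT INTO tblsfdata VALUES (?, ?)", [
    ("https://codingisforlosers.com/old-page/", "301"),          # toy redirect row
    ("https://codingisforlosers.com/analysis-recipes/", "200"),  # toy live page
])

# Filter down to just the 301s, as in the screenshot above
redirects = conn.execute(
    "SELECT * FROM tblsfdata WHERE StatusCode = '301'").fetchall()
```

The WHERE clause works exactly the same way in BigQuery; only the table reference changes.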

Aggregation

Now let’s say we want to see how many of each status code CIFL has. To do that, we’ll need to count the URLs by status code. This is called “aggregating” your data. If you were to do a SUM, AVERAGE, etc., those would be aggregations as well.

[Image: query counting URLs grouped by status code]

In the query above, I’m counting Address (i.e. URL) and then grouping the count by (i.e. GROUP BY) status code. This shows me how many URLs fall into each status code category.
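Here’s a runnable sketch of that aggregation using Python’s built-in sqlite3 (toy rows; the column names mirror the Screaming Frog export above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblsfdata (Address TEXT, StatusCode TEXT)")
conn.executemany("INSERT INTO tblsfdata VALUES (?, ?)", [
    ("/a/", "200"), ("/b/", "200"), ("/c/", "301"), ("/d/", "404"),
])

# Count URLs per status code -- the aggregation described above
counts = conn.execute("""
    SELECT StatusCode, COUNT(Address) AS URLCount
    FROM tblsfdata
    GROUP BY StatusCode
    ORDER BY URLCount DESC
""").fetchall()
```

GROUP BY collapses the rows down to one per status code, and COUNT tells you how many URLs landed in each group.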

Joining Data

One of the last foundational concepts in SQL is joins. Let’s say we have our Screaming Frog data set, and we want to see how many unique referring domains each URL has.

Here’s a Google Sheet that contains the Link URL for each unique referring domain, imported from Ahrefs:

[Image: Ahrefs referring-domain export with a Link URL column]

We’re going to use a couple of the concepts above to get the total link count by URL:

[Image: query counting links grouped by URL]

Now, all we need to do is get the backlink counts appended to the Screaming Frog dataset. It would look like this:

[Image: Screaming Frog data with a LinkCount column appended]

We do this through what’s called a “join” in SQL. Notice how our Screaming Frog dataset and our backlink dataset both have the URL column in common. A “join” will match each dataset on whatever column or columns you specify. Here’s the join:

[Image: join matching the two datasets on the URL column]

In plain English, we are joining the AggregateLinkData (i.e. the backlink dataset) table to the SF (i.e. the Screaming Frog dataset) table on (i.e. ON) the URL column.

Here’s the final set of code to do this:

WITH
  sf AS (

  SELECT
    *,
  REPLACE(REPLACE(REPLACE(REPLACE(Address,'https://www.',''),'https://',''),'http://www.',''),'http://','') AS URL
  FROM
    `ciflexample.CIFL.tblsfdata` sf
  WHERE
    StatusCode = '200'),

  LinkData AS (

  SELECT REPLACE(REPLACE(REPLACE(REPLACE(LinkURL,'https://www.',''),'https://',''),'http://www.',''),'http://','') AS URL
  FROM
    `ciflexample.CIFL.tblbacklinkdata` ),

  AggregateLinkData AS (

  SELECT
    URL,
    COUNT(URL) AS LinkCount
  FROM
    LinkData
  GROUP BY
    URL )

SELECT
  s.*,
  ld.LinkCount
FROM
  SF s
LEFT JOIN
  AggregateLinkData ld
ON
  s.URL = ld.URL
ORDER BY
  ld.LinkCount DESC
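If you’d like to experiment with the join itself without a BigQuery project, here’s a stripped-down version of the same pattern using Python’s built-in sqlite3 (toy URLs; the URL-cleaning REPLACE chain is omitted to keep the focus on the LEFT JOIN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sf (URL TEXT, StatusCode TEXT)")
conn.execute("CREATE TABLE links (URL TEXT)")
conn.executemany("INSERT INTO sf VALUES (?, ?)",
                 [("codingisforlosers.com/", "200"),
                  ("codingisforlosers.com/recipes/", "200")])
conn.executemany("INSERT INTO links VALUES (?)",
                 [("codingisforlosers.com/",),   # two backlinks to the homepage
                  ("codingisforlosers.com/",)])

# LEFT JOIN keeps every crawled URL, even ones with no backlinks at all
rows = conn.execute("""
    WITH AggregateLinkData AS (
      SELECT URL, COUNT(URL) AS LinkCount FROM links GROUP BY URL
    )
    SELECT s.URL, ld.LinkCount
    FROM sf s
    LEFT JOIN AggregateLinkData ld ON s.URL = ld.URL
""").fetchall()
```

The homepage comes back with a LinkCount of 2, while the page with no backlinks comes back with a NULL LinkCount instead of disappearing from the results — that’s the reason to use a LEFT JOIN here rather than an inner join.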

How To Learn SQL

I hope this article showed you that learning SQL is very attainable. It can make you a more efficient and data-driven SEO.

To learn SQL, you have to have projects to apply it to. Don’t be the person who’s taken 100 hours of online programming classes, but still can’t build anything on their own. The best advice I can give you to learn SQL is:

  1. Understand the type of problems you can use SQL to solve (this article should have given you a great head start on that)
  2. Begin learning SQL through the resources I provide below, but only what you need to solve the problem you’re working on
  3. Use Google and anyone else you know who knows SQL to help you through
  4. Rinse and repeat

If you follow the advice given above, you’ll learn SQL in no time AND build a lot of cool stuff in the process.

Learning Resources

Final Thoughts

With SQL, you can build your own SEO tools, process automations, and workflows. You won’t have to rely on third-party tools anymore to provide direction. You can use their datasets to derive the insights you need to make decisions.

Daniel Cuttridge makes an excellent point in an article he wrote about his methodology for on-page optimization.

In it, he’s talking about some of the fundamental problems with SEO tools and says, “They contain many useless recommendations that are built in to compete with their competitors in an inflating game of ‘total items checked’.”

This is a bold, but valid statement.

Instead of confusing yourself trying to learn all the nuances of these tools (only to get useless or misleading information), export the raw data, throw it into a database like BigQuery, and then write some SQL to answer specific questions you have.

Happy SQLing!