There’s an SEO tool for pretty much everything these days, giving you access to everything you need to run your campaigns.
These tools provide dashboards to make data analysis simple (e.g. Google Search Console’s performance data).
These dashboards are great, but they’re limited. If you’re looking for deeper insights you’ll want to access the raw data.
With the raw data we can combine data sources to get insights no tool in the market can provide.
For example, instead of just looking at keyword rankings in Google Search Console, we can crosswalk data from Google AdWords to see which keywords are driving paid and organic clicks.
To access and analyze raw data, we have 2 options:
You’re here because you want to learn more about the fast way, aka pulling data using a tool’s API. The remainder of this article takes a deep dive into the best APIs available for SEO professionals, along with my personal reviews and ratings.
In layman’s terms, an API (aka Application Programming Interface) is a way to access a tool’s data and pull it into some form of database (whether that’s Google Sheets or a ‘real’ SQL-style database like Google BigQuery).
Using APIs allows you to avoid a few data analysis traps:
Spending time poking around each tool’s dashboard to export reports is a bummer, any way you slice it.
If you’re exporting data from a tool into CSVs, you’ll eventually end up with folders stuffed full of strangely named export files collecting dust.
These tools can be quite expensive (several hundred dollars per month), so you’ll want to wring as much value as you can out of the data. Pulling data from an API into some form of database makes that possible.
Every SEO we’ve ever worked with combines data from two or more of these tools – mashing up SEMrush keyword rankings data with Google Analytics traffic with Majestic backlinks.
The manual exports from each service often change format, which breaks the spreadsheet formulas you might’ve configured to mash up data (trust us – maintaining this SEO Content Audit template that uses manual exports has been a bear).
Pulling data from APIs into a standard ‘database’ format (whether in Sheets or SQL) allows you to configure a standardized recipe for your data analysis, that can scale to be used across your team.
Internally at CIFL, we do this through our Agency Data Pipeline process, which allows an SEO audit-style analysis to be implemented consistently by an entire agency team.
Let’s dive into how you can evaluate an SEO API and implement it in your analysis process.
We’ve used these tools *a lot* at CIFL, so we’ll also share our personal opinion in a review of the API for each tool listed above (Ahrefs, DeepCrawl, Google Search Console, Majestic, Moz and SEMrush).
Before you pay up for any SEO tool’s API subscription, there are three tires to kick:
For any API you’re considering, you’ll want to first pick out *how* you’d be able to pull the data.
Generally there are four ways to pull data from an API – ranked in order of ease of use:
With both options 1 and 2, our next move is usually to push data from Sheets up to BigQuery using the CIFL Sheets <> BigQuery connector.
We’ve built a Sheets template that comes pre-loaded with API connectors + BigQuery configuration for Google Search Console, Moz, Majestic and SEMrush, which you can grab from the Template Vault here.
Data coverage across SEO APIs can differ widely, depending on the size of the site you’re analyzing, and how often its underlying keyword rankings or backlinks are indexed.
At the end of the day, choosing which APIs you trust and prefer for a given dataset is really based on feel.
Some SEOs will only use Majestic for backlinks data, where others find Ahrefs or SEMrush data to be sufficient for sites they’re analyzing – given the difference in indexation frequency between domains, it’s impossible to issue a blanket statement “X API is better than Y API for backlinks data.”
We recommend playing around with a trial account of each service you’re considering and analyzing data integrity by hand before making a decision.
We recommend considering the *total price* of the APIs your SEO analysis package requires, rather than the individual price of each API.
That’s because many of these services overlap – Ahrefs, Majestic, Moz and SEMrush all provide some form of backlinks data.
So if you need keyword rankings + backlinks data, you could use:
Data accessibility and integrity are non-negotiables when working with APIs – so you’d likely choose the APIs whose data you trust and can access easily (via Supermetrics or otherwise).
Minimizing your total cost of analysis is more a matter of selecting from your menu of options once you decide which APIs will get the job done.
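To make the "total price" idea concrete, here is a rough sketch that enumerates every way of covering keyword rankings + backlinks and sums the base prices quoted in this article. The tool-to-dataset mapping and the `bundle_costs` helper are our own illustration, and the figures are base prices only: credits, row limits, and custom plans are ignored.

```python
from itertools import product

# Monthly base prices quoted in this article (USD). Credits, row limits,
# and custom plans are NOT included, so treat these as lower bounds.
MONTHLY_PRICE = {
    "Google Search Console": 0,
    "SEMrush": 399,
    "Majestic": 399,
    "Ahrefs": 500,
    "Moz": 250,
}

# Which tools the article names as a source for each dataset.
PROVIDES = {
    "keyword_rankings": ["Google Search Console", "SEMrush", "Ahrefs"],
    "backlinks": ["Ahrefs", "Majestic", "Moz", "SEMrush"],
}


def bundle_costs(needs):
    """Return sorted (total_price, tools) pairs for every way of covering `needs`."""
    bundles = set()
    for choice in product(*(PROVIDES[need] for need in needs)):
        # A tool that covers both datasets is only paid for once.
        tools = tuple(sorted(set(choice)))
        bundles.add((sum(MONTHLY_PRICE[tool] for tool in tools), tools))
    return sorted(bundles)


costs = bundle_costs(["keyword_rankings", "backlinks"])
for total, tools in costs:
    print(f"${total:>4}/month  {' + '.join(tools)}")
```

The cheapest line on paper is not automatically the right pick: a low base price with a tight row or credit limit can end up costing more in practice, so run the numbers only after you have settled on the APIs whose data you trust.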
I’m not going to review every SEO API on the market – just the ones we use and recommend.
I’ll be reviewing how each SEO API stacks up against those 3 criteria:
DeepCrawl’s API is currently not integrated by any 3rd-party provider (Supermetrics, Stitch) to allow you to fetch data without writing code.
Their API is well-documented though, so it’s straightforward to roll your own script integration if you have a developer on your team (this is how we connect to DeepCrawl internally at CIFL as part of the Agency Data Pipeline service).
DeepCrawl runs a live crawl on your site, so the data is accurate at the time you kick off the crawl.
DeepCrawl prices on a sliding scale ($14 and up) based on the number of projects (sites) and URLs crawled for the month ($62 per month for 3 projects + 40,000 URLs).
For the most part, if you’re using DeepCrawl with more than 3 sites, they’ll likely end up building a custom plan for you.
API access is included in each plan, and there is no difference between in-app usage and API usage – which in our opinion is the way life should be.
Given the breadth of datapoints provided by DeepCrawl, we’d say their pricing is completely fair.
Each plan also includes the Majestic backlink count for every page crawled (your own Majestic API subscription would run $399 / month).
We also use DeepCrawl’s regex crawl functionality, which allows you to pluck out specific pieces of HTML on a page to identify which type of page it is – for example, crawling this course page for the number of ‘$’ present on the page helps us identify it as a product page.
The most commonly-used endpoint is the Query, which returns the impressions, clicks and average position for a given URL and search keyword combination.
API access is available openly via almost any method (Supermetrics or script), although we generally use Supermetrics to pull it internally at CIFL, then push it up to BigQuery using our Sheets to BigQuery Connector Add-on.
Since their API is popular, if you’re looking to go the custom script route, there are plenty of examples out there for pushing data from Search Console up to your database of choice.
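If you do go the script route, the core of it is small. The sketch below builds a request body for the Search Analytics query method and flattens the rows it returns into a table-friendly shape. The helper names and the sample response are ours (made up for illustration), and the authenticated call itself is only shown as a comment since it needs credentials.

```python
from datetime import date, timedelta


def build_query_body(end, days=28, dimensions=("page", "query"), row_limit=1000):
    """Build the request body for a Search Analytics query over the last `days` days."""
    start = end - timedelta(days=days - 1)
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": list(dimensions),
        "rowLimit": row_limit,
    }


def flatten_rows(response, dimensions=("page", "query")):
    """Turn API rows ({'keys': [...], 'clicks': ...}) into flat dicts."""
    flat = []
    for row in response.get("rows", []):
        record = dict(zip(dimensions, row["keys"]))
        record.update(
            clicks=row["clicks"],
            impressions=row["impressions"],
            position=row["position"],
        )
        flat.append(record)
    return flat


body = build_query_body(date(2020, 3, 1))

# With google-api-python-client and an authorized `service`, the call would be
# roughly: service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()
# The response below is a hand-written stand-in, not real data.
sample_response = {
    "rows": [
        {
            "keys": ["https://example.com/", "seo api"],
            "clicks": 12,
            "impressions": 340,
            "ctr": 0.035,
            "position": 4.2,
        }
    ]
}
table = flatten_rows(sample_response)
print(table)
```

Each flattened record maps cleanly onto one row in Sheets or BigQuery, which is the shape you want before pushing it anywhere.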
Google made a *huge* improvement when they opened up Search Console data availability to the previous 16 months (was previously limited to 90 days).
The only downside of the Search Console API is that Google samples data when it’s returned at a keyword level – i.e. it may not return results (or complete results) for *every single keyword* that your site is ranking for.
For this reason, summing keyword-level data from GSC won’t add up to the totals displayed in your GSC dashboard.
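A quick way to see how big that gap is for your own site: compare the sum of keyword-level clicks against the property-level total for the same date range. This is a minimal sketch with made-up numbers; `keyword_coverage` is a hypothetical helper, not part of any Google library.

```python
def keyword_coverage(keyword_rows, property_total_clicks):
    """Share of property-level clicks that the keyword-level rows account for."""
    if property_total_clicks == 0:
        return 0.0
    keyword_clicks = sum(row["clicks"] for row in keyword_rows)
    return keyword_clicks / property_total_clicks


# Illustrative figures only: keyword-level rows vs. the dashboard total.
rows = [
    {"query": "seo api", "clicks": 120},
    {"query": "semrush api", "clicks": 60},
]
coverage = keyword_coverage(rows, property_total_clicks=240)
print(f"Keyword-level rows cover {coverage:.0%} of total clicks")
```

Tracking this ratio month to month tells you how much weight to put on keyword-level totals in a report.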
In the opinion of CIFL, this isn’t the end of the world – all of these APIs return approximate data in some form, rather than absolute truth.
Free! Can’t beat it.
SEMrush data is accessible via almost any method – it returns data in a CSV format from a URL that includes your API key:
Meaning it can be pulled into Sheets via Supermetrics, via the IMPORTDATA function, or with a simple custom script that pings your URL and returns data.
Their API URL format also allows you to return only specific columns in your response, so you can avoid returning unnecessary data.
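As a rough sketch of the custom script route, the snippet below builds a SEMrush request URL with only the columns you want and parses the semicolon-delimited text that comes back. The report type (`domain_organic`), the column codes (`Ph`, `Po`, `Nq`), and the parameter names reflect our understanding of SEMrush's URL format, so check them against their current docs; the API key and sample response are placeholders.

```python
import csv
import io
from urllib.parse import urlencode

SEMRUSH_BASE = "https://api.semrush.com/"


def build_semrush_url(api_key, domain, columns, database="us", limit=10,
                      display_date=None):
    """Build a request URL for the organic keywords report."""
    params = {
        "type": "domain_organic",
        "key": api_key,
        "domain": domain,
        "database": database,
        "display_limit": limit,
        # Only request the columns we need, to keep the response small.
        "export_columns": ",".join(columns),
    }
    if display_date:
        # Historical snapshot in YYYYMM15 form (costs more credits).
        params["display_date"] = display_date
    return SEMRUSH_BASE + "?" + urlencode(params, safe=",")


def parse_semrush_csv(text):
    """Parse the semicolon-delimited response into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


url = build_semrush_url("YOUR_API_KEY", "example.com", ["Ph", "Po", "Nq"])

# To fetch for real: urllib.request.urlopen(url).read().decode("utf-8")
# The text below is a hand-written stand-in for a response.
sample_text = "Keyword;Position;Search Volume\nseo api;3;1900\n"
keywords = parse_semrush_csv(sample_text)
print(url)
print(keywords)
```

The same URL can be dropped straight into IMPORTDATA in Sheets; the script version just gives you a place to add scheduling and error handling.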
SEMrush’s indices are updated once a month, which is generally sufficient for keyword rankings where intra-month moves are generally spiky noise.
They also allow you to specifically query historical data for a previous month (with the query string &display_date=YYYYMM15), which costs a higher number of credits but can be useful when looking at a site’s data for the first time.
SEMrush offers a ton of datapoints via their API outside of search keywords – some higher quality than others.
For example, their Domain Organic Search Keywords generally has solid coverage across domains we’ve analyzed – but endpoints like Related Keywords or Backlinks can be spotty for some domains.
$399 per month for their ‘Business’ plan, plus the cost of credits.
They used to offer an introductory $15 per month API plan, which allowed lots of SEOs to get started using SEMrush data in Google Sheets – but unfortunately they discontinued that plan a while back.
This ‘base price + credit’ setup does make it difficult to estimate your total cost of ownership for the month, and you have to keep an eye on how quickly you’re burning through your credits.
The upside though, is that SEMrush’s dashboard does provide a lot of functionality that your team likely already uses – so the cost isn’t just attributed to the cost of fetching raw data.
Majestic’s API needs no separate authentication flow, meaning you can simply pass a URL request containing an API key:
And return data in JSON format to pretty much anywhere – via Supermetrics’ JSON connector, a custom Sheets formula like IMPORTJSON, or a Python script run on the command line.
At CIFL, we’ve built an internal template that behaves much like IMPORTJSON, but allows us to return only specific columns from the result (saving you a lot of space against Google Sheets’ 5 million cell limit).
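Here is a minimal sketch of that column-picking idea in script form (it is not our internal template). It builds a Majestic request URL and keeps only the fields you ask for from the JSON response. The command name, parameter names, and the `DataTables` response layout reflect our understanding of Majestic's JSON API and should be checked against their docs; the payload shown is a hand-written stand-in.

```python
import json
from urllib.parse import urlencode

MAJESTIC_BASE = "https://api.majestic.com/api/json"


def build_majestic_url(api_key, item, datasource="fresh", count=100):
    """Build a backlinks request URL; datasource is 'fresh' or 'historic'."""
    params = {
        "app_api_key": api_key,
        "cmd": "GetBackLinkData",
        "item": item,
        "Count": count,
        "datasource": datasource,
    }
    return MAJESTIC_BASE + "?" + urlencode(params)


def pick_columns(payload, table, columns):
    """Keep only the requested columns from one data table in the response."""
    rows = payload["DataTables"][table]["Data"]
    return [[row.get(column) for column in columns] for row in rows]


url = build_majestic_url("YOUR_API_KEY", "example.com")

# Assumed response shape, trimmed to two fields of interest plus one we drop.
sample_payload = json.loads("""
{
  "Code": "OK",
  "DataTables": {
    "BackLinks": {
      "Data": [
        {"SourceURL": "https://blog.example.org/post",
         "AnchorText": "seo apis",
         "SourceTrustFlow": 32}
      ]
    }
  }
}
""")
links = pick_columns(sample_payload, "BackLinks", ["SourceURL", "AnchorText"])
print(links)
```

Dropping unused columns before writing to Sheets is what keeps a large backlinks pull from eating into your cell limit.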
Majestic provides freshness explicitly by providing two separate indices: fresh and historic (denoted by the “&datasource=” section of your query string).
Fresh are backlinks crawled within the last 90 days, and historic includes all-time data – their historic index goes back to 2012, about as long as you’d humanly need given how much the internet has changed in the last 5 years.
Majestic’s pricing is based on a sliding scale of “analysis units” ($399 per month for 100 million units, up to $2,999 per month for 3 billion units). In our experience, we’ve never seen an agency run out of units in a month on the lowest tier API plan ($399 per month).
Ahrefs recently released their open API, which we’re excited about (previously it could only be accessed via an apps marketplace).
It returns data as JSON, and has straight token-based authentication, meaning you can pull backlinks data in the same ways you can from Majestic (the IMPORTJSON function in Sheets, the Supermetrics JSON connector, or a command-line script):
Ahrefs is known for having a frequently updated, rich index of backlinks, which lots of our members over at the Blueprint Training use and love.
The Ahrefs API is a bit more expensive than the other backlinks providers, with pricing starting at $500 per month and going up based on volume. As of this moment (March 5, 2020), their base plan is priced about the same as Moz’s Low Volume plan.
Moz’s API used to be accessible via pretty much any connection method – but since they made a change to their authentication setup, it’s currently not available in our starter template (we’d recommend using Supermetrics).
Backlinks data is only updated once per month, which is too slow for many SEOs we work with (if we’re wrong on this, please Tweet at us).
And given that domain authority is a proprietary algorithm, we have no choice but to take their word for it on integrity. It’s nice that folks generally accept DA + PA as a standard metric of authority, but as far as we know there’s no way to vet it.
Ranges from $250 per month for 120,000 rows, up to $10,000 per month for 40 million rows (a row is equal to one backlink or Moz metrics for one URL).
In our experience, the 120,000 rows would likely be eaten up by pulling backlinks for a medium-sized agency – so the all-in cost of use is likely to match that of Ahrefs’ base plan ($500), Majestic’s base plan ($399 / month), or SEMrush ($399) unless you’re working at a relatively small scale.
Now it’s time to put this data to work – what can we do with all this glorious data? The options are limitless, but here are a few ways we leverage it for our clients at Coding is for Losers.
Using the DeepCrawl API and DeepCrawl’s “custom extractions” feature, we pull in schemas present on each of a site’s pages.
We use this as part of our Website Quality Audit BigQuery Recipe, in order to generate recommendations about which schemas should be present on a given page:
To pull that via the API (or the DeepCrawl dashboard), we pass the following regex custom extraction (under Advanced Settings -> Custom Extractions) when setting up the crawl:
Which returns each of the schemas for our pages like so:
This saves us *a ton* of time when making schema recommendations.
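If you want to prototype a schema extraction before setting up a crawl, you can test patterns locally first. The two patterns below are hypothetical examples we wrote for this sketch, not the exact regex we pass to DeepCrawl, and the crawler's regex flavor may differ slightly from Python's.

```python
import re

# Hypothetical patterns: one for microdata (itemtype="https://schema.org/X"),
# one for JSON-LD ("@type": "X"). Neither is the article's original regex.
MICRODATA_TYPE = re.compile(
    r'itemtype\s*=\s*["\']https?://schema\.org/([A-Za-z]+)["\']'
)
JSONLD_TYPE = re.compile(r'"@type"\s*:\s*"([A-Za-z]+)"')


def extract_schema_types(html):
    """Return the distinct schema.org types found in a page's HTML."""
    found = MICRODATA_TYPE.findall(html) + JSONLD_TYPE.findall(html)
    return sorted(set(found))


sample_html = """
<div itemscope itemtype="https://schema.org/Product"></div>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BreadcrumbList"}
</script>
"""
schema_types = extract_schema_types(sample_html)
print(schema_types)
```

A regex is a blunt tool for HTML, so expect to miss edge cases like nested or array-valued `@type` fields; for an audit-level overview, that trade-off is usually acceptable.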
A bunch of APIs provide keyword-level ranking data: SEMrush, Ahrefs and, of course, Google Search Console.
Generally, what we’ll do with this data is build a database (either in Sheets or BigQuery) to pull in keyword rankings each month, so that we can see progress over time.
We use this data downstream as part of our Monthly SEO Report at the Blueprint Training, which pulls monthly keyword data into a Google Data Studio report.
Search Console is, of course, the cheapest way to access this data, and you can pull up to 16 months of history at a time – but we still like to use either SEMrush or Ahrefs as a secondary check on those average position numbers.
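The monthly snapshot idea can be sketched in a few lines. Here we use an in-memory SQLite database purely as a stand-in for Sheets or BigQuery; the table layout, function names, and ranking figures are illustrative assumptions, not our production setup.

```python
import sqlite3

# SQLite stands in for BigQuery here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rankings (
        month TEXT,
        keyword TEXT,
        position REAL,
        PRIMARY KEY (month, keyword)
    )
    """
)


def load_snapshot(conn, month, rows):
    """Store one month's (keyword, position) pairs; re-running overwrites the month."""
    conn.executemany(
        "INSERT OR REPLACE INTO rankings VALUES (?, ?, ?)",
        [(month, keyword, position) for keyword, position in rows],
    )


def position_changes(conn, previous, current):
    """Compare two months; a positive improvement means the keyword moved up."""
    query = """
        SELECT cur.keyword,
               prev.position,
               cur.position,
               prev.position - cur.position AS improvement
        FROM rankings AS cur
        JOIN rankings AS prev ON prev.keyword = cur.keyword
        WHERE prev.month = ? AND cur.month = ?
        ORDER BY improvement DESC
    """
    return conn.execute(query, (previous, current)).fetchall()


# Made-up rankings for two consecutive months.
load_snapshot(conn, "2020-02", [("seo api", 8.0), ("semrush api", 3.0)])
load_snapshot(conn, "2020-03", [("seo api", 5.0), ("semrush api", 4.0)])
changes = position_changes(conn, "2020-02", "2020-03")
for keyword, before, after, improvement in changes:
    print(f"{keyword}: {before} -> {after} ({improvement:+.1f})")
```

Keying each row on month + keyword means you can safely re-run a month's pull without creating duplicates, which matters once the pull is scheduled.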
When you’re doing a content audit, or analyzing your internal link graph, it’s critical to have an understanding of what topic each page on your site covers.
In our Internal Linking Optimization Sheets template over at the Blueprint Training, we do this by:
At the end of the day, this makes it very easy to match up potential internal link pairs, since we know roughly which pages are relevant for each topic the site focuses on:
Ready to take the next step? I’ve got 2 options for you:
As always, drop us a note on Twitter (@losersHQ) if you have any questions.