Stitch vs Supermetrics for BigQuery

An ETL tool cook-off

Warning! For this review, we’re focused on the pros and cons of Stitch and Supermetrics for analyzing digital marketing data in a BigQuery pipeline, since that’s how we use them as part of our Agency Data Pipeline service and Build your Agency Data Pipeline course.

The opinions below are not generalized reviews of the tools, but specific to the digital marketing analyst use case.

To make this review as fair and balanced as possible, below we’ll walk through 2 potential data pipeline setups:

1. An individual e-commerce company

2. A blended SEO + PPC agency with 50 clients

But before we dive in too deeply, let’s put both Stitch and the new Supermetrics for BigQuery data transfer tool in context.

They’re both what we’ll call ETL tools - they Extract, Transform and Load data from APIs like Facebook Ads, Google Analytics, and Shopify into your data warehouse.

Any database can serve as an analytics warehouse, but some are specifically designed for quickly querying large analytics datasets: Amazon Redshift, Snowflake, and of course Google BigQuery.

We use BigQuery internally at CIFL, because of its tight integration with the rest of our Google Stack (Sheets + Data Studio).

ETL is just the beginning

These tools are not data pipelines in themselves. A data pipeline is a process that runs from ETL through to actual reporting - our process for the Agency Data Pipeline looks like so:

alt text

So no matter what ETL tool you choose to implement, consider it to be merely the first step in building your data pipeline. You’ll also need to handle:

  • Modeling raw data into metrics using SQL (we use DBT extensively)
  • Visualizating reports (in Google Data Studio, Looker, Tableau, etc)

If your team lacks someone proficient in SQL, first I’d recommend you hire or train someone to be your team’s data linchpin.
Spending the effort (and cash) to get your raw data into BigQuery doesn’t make much sense unless you’re able to use it profitably once it’s there.

If hiring or training is truly not possible, then you’re better off sticking to Google Sheets or Excel for analyzing data. Supermetrics has Add-ons for both, and there are plenty of resources on this site to help you along the way.

The Four Questions

Businesses are like people - we’re all slightly different.

So when we’re implementing a data pipeline on behalf of a client, we ask ourselves 4 key questions to drive our ETL tool decision-making:

1. What data sources need to be loaded?

Each data transfer tool integrates with a specific set of APIs (Google Analytics, Facebook Ads, etc).

Needless to say, your team will go with the ETL tool that best covers your API requirements.

As of this moment on April 26th, 2019, this is the breakdown of relevant marketing API coverage across Stitch and Supermetrics for BigQuery (X = covered):

API Stitch Supermetrics for BigQuery
Adobe Analytics X
AdRoll X X
Bing Ads X X
Doubleclick Campaign Manager (aka Google Display & Video 360) X X
Facebook Ads X X
Facebook Insights X
Google Ads X X
Google Analytics X X
Google My Business X
Google Search Console X
Instagram Insights X
LinkedIn Ads X
LinkedIn Company Pages X
Mixpanel X
Quora Ads X
Salesforce X X
Shopify X
Stripe X X
Twitter Ads X
Yahoo Gemini X

As you can see, Supermetrics covers some analytics (Adobe Analytics), organic social media (Facebook + Instagram Insights) and paid social (LinkedIn Ads, Twitter Ads) data sources that Stitch doesn’t.

For this reason, the choice between ETL tools isn’t a pure either / or - often you’ll have to mix and match ETL tools to get full coverage of your APIs.

For most of the pipelines we build, we also end up writing custom API integrations in Google Apps Script + our Sheets to BigQuery Add-on, to support APIs that aren’t covered by any off-the-shelf tool.

2. What metrics + dimensions are needed, and for what time period?

The ETL tool must support the level of data granularity + lookback period you’re looking for in your pipeline.

Generally, an ETL tool will do one of two things:

  1. Allow you to select specifically the tables and columns that you’d like to pull from each API (ie pull campaign metrics by ad / ad set / landing page / date)
  2. Pull all of the columns from the API by default

Either method requires some data modeling in SQL downstream, but you want to make sure that the ETL tool at least allows you to write your required metrics / dimensions to BigQuery.

It’s also critical that the ETL tool allows you to easily backfill data to your desired lookback period (ie the last 2 years), to perform any YoY trend analysis you’re looking to do.

3. How easily can you add new data feeds?

Setting up a pipeline for a single company is much different than doing so for an agency where you work on behalf of many clients.

For a single company, a pipeline setup is generally set it and forget it - you wire up the ETL connections, write some SQL models, and then build reporting. You likely won’t need to add many new data sources once it goes live.

At an agency, the client roster and the type of marketing work you’re doing is always shifting - it needs to be easy to add new clients, remove old ones, and layer in new data feeds over time.

If this process of modifying feeds is laborious, you’ll give your analyst carpal tunnel syndrome.

4. What’s it all cost?

The C question is last for a reason - there’s no point in comparing on price until you’ve assembled data pipeline options that meet your requirements.

And like I mentioned earlier, you’ll often end up assembling a stack of ETL tools rather than using a single tool.

So to get at total cost of ownership you really have to consider how much you’ll be using each tool, rather than looking at the plan sticker prices.

ALSO! Don’t forget, the labor cost of analyzing data is by far the largest expense line item in your data pipeline stack - so consider maintenance time from your team in the cost of owning the tool.

Saving $100 / month on a tool that makes life more difficult for your analyst is penny-wise, pound-foolish.

Let’s dive into the example pipelines, and walk through these 4 questions for each.

Pipeline 1: An Individual E-commerce Business

Let’s say you’re the CEO of a $100mm revenue e-commerce business.

Your marketing, merchandising and sales teams are using separate datasets scattered throughout spreadsheets and legacy systems.

Teams are starting to differ in their definitions of important metrics - meaning it’s time to get it together in one cohesive pipeline.

The data sources + granularity you’ll need to join together are:

  • Shopify: Transaction-level
  • Google Analytics: By transaction, source, medium, campaign and landing page
  • Ad platforms (Bing Ads, Facebook Ads, Google Ads): By campaign and landing page

What ETL tool should you use, and what will it cost?

Data Source Coverage

Stitch covers all of the required data sources (Shopify, GA, Bing, Facebook and Google Ads), which would make them a convenient one-stop shop for this pipeline.

Supermetrics covers Google Analytics, Bing Ads and Facebook Ads, but not yet Shopify (they’ve hinted at supporting it in the future).

Advantage: Stitch for one-stop shopping

Data Granularity

Stitch allows you to select specific tables + columns to push from each API to BigQuery - this helps keep your data size (and therefore cost of using BigQuery) down.

alt text

However, this does mean you have a data modeling learning curve up front - you have to know for each API:

  • What columns + tables do you need to select?
  • How do you join those tables together with SQL to make them into coherent metrics?
  • How far back do you need to backfill data?

This will take some guess-and-check up front by your team, but once you stand up the Stitch feeds + SQL models (in DBT or otherwise) they’ll be easy to maintain.

[ At CIFL, we make this easy by maintaining an internal recipe book of Stitch setups + DBT models for each API we use. It’s a bit complex to share publicly, but drop us a note to help@codingisforlosers.com with some info on your pipeline and we’ll do our best to lend a hand. ]

Supermetrics handles this setup process completely differently. By default, they write a constant set of tables + columns to BigQuery for each API.

alt text

This makes the initial SQL modeling more straightforward, but you do lose flexibility in terms of being able to specify exact metric / dimension combos for an API.

For example, in the Supermetrics BigQuery connector for Google Analytics, I couldn’t find a way to create a view for revenue by source / medium + landing page path, which was a requirement for this specific pipeline.

For backfilling of historical data, Stitch allows you to set a ‘backfill date’ during the feed setup process, and automatically pulls data from the API for you. Usually we backfill 2 years prior, to have relevant YoY comparisons (so Jan 2017 if we’re in 2019).

In Supermetrics for BigQuery, backfills must be manually set up as separate data transfer runs - this means for each data source you add, you’ll also have to run backfill jobs in slices of roughly 6 months at a time.

This can be pretty tedious, since you also have to monitor those jobs to make sure they succeed.

Advantage: Stitch, for flexible coverage of all Google Analytics columns + ease of setting backfill period

Ease of Modification

Since we’re building this pipeline for an individual business, the data feeds won’t need to be modified too frequently.

However, say we hire a digital agency and want to create a new Bing, Facebook or Google Ads account specifically for them to use.

With Stitch’s Bing and Google Ads integrations, adding that new account to your pipeline is as simple as checking a box in your existing feed, which will kick off the account to sync with BigQuery.

alt text

In Stitch’s Facebook Ads integrations, you’ll need to create a new feed entirely - this also means selecting individual tables and columns.

Supermetrics for BigQuery operates on the premise of 1 feed = 1 account, meaning you have to set up a new feed for each new account that comes on line.

alt text

This wouldn’t be a dealbreaker for our e-com shop given the low number of new accounts required, and it comes with the advantage of not having to select columns + tables for each new account at setup.

Advantage: Stitch, for their ability to select multiple accounts for an API via checkbox

Cost

Stitch and Supermetrics differ widely on their pricing models.

Stitch charges based on usage: you pay for the total number of rows written across all of your API connections each month.

alt text

Supermetrics charges per account: you pay a flat fee per account synced per month for each API.

alt text

Like we mentioned before, price is often not the determining factor of which ETL tool to use for a data pipeline.

In the case of our e-com shop, Stitch is really the only option we have amongst the two, given the need for a Shopify connection.

If this was a small e-Commerce shop, the cost would be $100 per month for the base Stitch subscription. But given their larger size, we estimate they’d pay $180 - $350 per month based on Stitch’s sliding scale pricing (less in slower months, more in busier months with more traffic, transactions, etc).

If we were to cobble together a stack composed of Stitch + Supermetrics, it could cost:

  • Shopify + Google Analytics via Stitch: Free if <5mm rows per month (small sites), or $180 if larger
  • FB Ads via Supermetrics: $190 per month for one account
  • Bing Ads for Supermetrics: $190 per month for one account
  • Google Ads: $190 per month for one account

As you can see, the price will scale higher if the company has multiple FB Ads / Bing Ads accounts to set up with Supermetrics. Let’s call the cost ballpark for this Stitch / Supermetrics / Google Data Transfer stack $750 per month.

With Supermetrics, you will save time up front setting up SQL models to process your raw data (in the case of Bing, Facebook and Google Ads), as they do some of that for you.

However, since this is a one-time setup by your team, the recurring roughly $5k per year difference in cost doesn’t seem to be worth that upfront savings.

Advantage: Stitch

E-commerce Pipeline: the Final Verdict

Based on data source coverage + flexible price of $180 - $350 per month, Stitch is a no brainer to serve as this e-commerce businesses ETL tool.

Pipeline 2: A Blended SEO + PPC Agency with 50 Clients

Now let’s put ourselves in the shoes of a digital agency CEO.

Digital agencies have some unique challenges when building data pipelines:

Your agency will always be onboarding new clients, and rolling off of expired contracts with old clients. Your pipeline needs to keep up with this pace of change.

Each client will have their own idiosyncrasies - distinct analytics setups, marketing channel mappings, and campaign strategies.

Your pipeline needs to handle these idiosyncrasies gracefully, and allow non-technical users to update settings when they shift.

So let’s construct an ETL stack that’ll meet these needs, pulling data from:

  • Google Analytics: Transactions, revenue + goal completions, by source, medium, campaign and landing page
  • Paid search (Bing Ads, Google Ads): Conversions by campaign and landing page
  • Paid social (Twitter Ads, Facebook Ads, LinkedIn Ads): Conversions by campaign and landing page

What ETL tool should you use, and what will it cost?

Data Source Coverage

Stitch covers Google Analytics, paid search (Bing + Google Ads), as well as FB Ads. It doesn’t cover paid social for Twitter or LinkedIn Ads.

Supermetrics does cover Twitter and LinkedIn Ads, as well as Google Analytics, Bing, Facebook and Google Ads.

In addition, since we have distinct settings (channel mappings, data feeds, conversion goals) for each client, we’ll need some way of pushing that manually entered data up to BigQuery.

At CIFL, we use our Google Sheets agency analytics tracking plan template in combination with our internal Sheets Add-on to push these settings up to BigQuery.

To maintain a clean and up-to-date pipeline for your agency, you’ll need to maintain a similar setup.

But for API data alone, either Supermetrics, or a combination of Stitch + Supermetrics will do the trick here.

Advantage: Supermetrics, for full data source coverage

Data Granularity

Similarly to our e-commerce pipeline, Stitch’s ability to flexibly select specific columns for Google Analytics feeds is a big plus here.

Each client we’re setting up will likely require different Google Analytics goals to be added to their feeds, and different dimensions (country, device) that are valuable for analyzing their campaigns.

Supermetrics for BigQuery’s hands-free setup, which automatically writes a set of 10 Google Analytics reports to BigQuery, is definitely easier to set up initially, but just doesn’t lend the same element of control that’s often required at agencies.

In addition, Stitch’s ease of setting a ‘backfill date’ for data to populate is a big help - Supermetrics requires that backfill jobs be configured separately of the initial feed setup, rather than being kicked off automatically for you as Stitch does.

Advantage: Stitch, for flexible construction of Google Analytics reports + ease of setting backfill period

Ease of Modification

For the most part, no matter whether using Stitch or Supermetrics, each new client’s data feeds will require setting up a new feed for each new account.

Facebook Ads (Stitch + SM), Google Analytics (Stitch +SM), LinkedIn Ads (SM) and Twitter Ads (SM) will all have this 1:1 account setup structure.

The exceptions are Bing Ads + Google Ads via Stitch, which have a checkbox-style workflow that allows you to easily add new clients to existing feeds.

Advantage: Push

Cost

Let’s build our potential stack and map out the total price range.

Let’s use Stitch for analytics + paid search:

  • Google Analytics
  • Google Ads
  • Bing Ads

And Supermetrics for paid social:

  • Facebook Ads
  • LinkedIn Ads
  • Twitter Ads

In our experience at CIFL, an agency with 50 PPC clients will require between 25 and 50 million rows per month of Stitch credit, for a price range of $350 - $500.

Supermetrics is a bit more complicated to price out - their sticker pricing is quoted per account per month, with a cost of $490 per month per API for 5 accounts.

But for more complex setups, their sales team negotiates volume discounts on a case-by-case basis. For this entire pipeline, they ballparked the cost at roughly $20k per year, or $1700 per month.

What if we wanted to use Supermetrics just for pushing paid social data up to BigQuery, and use Stitch for analytics + paid search?

If we have 10 clients active on all 3 paid social platforms (Facebook Ads, LinkedIn Ads, Twitter Ads), the Supermetrics team quoted a ballpark price of $10k - $15k per year ($833 - $1250 per month).

That brings our total ETL tool cost to between $1183 and $1750 per month ($350 - $500 for Stitch, $833 - $1250 for Supermetrics).

That’s quite steep for an ETL stack - at CIFL, we’ve never paid more than $600 per month on behalf of an agency for their ETL stack.

How do we accomplish that? By using Google Sheets as an intermediary to BigQuery.

It’s less reliable than a direct transfer from an API to BigQuery, but in our Agency Data Pipeline service we use:

This is clunkier than a direct BQ connection, and it requires you to sacrifice data granularity (as Sheets won’t handle large ad-level datasets) - but it’s tough to beat the price.

If you were using this alternative stack, the ETL tool price would come out to $419 - $569 per month.

Digital Agency Pipeline: the Final Verdict

No matter what, at a complex digital agency you’ll need multiple ETL tools in your toolbelt.

This gives you a lot of advantage price-wise, as you can shop between providers instead of being locked into a single tool.

In the past, we’ve built this exact pipeline using a combination of Stitch, Supermetrics for Sheets, and our CIFL <> BigQuery Connector Add-on, for between $419 - $569 per month.

If you require more detailed data granularity (ie at the ad + date-level), then you can roll a stack of Supermetrics for BigQuery + Stitch for $1183 - $1750 per month.

Supermetrics for BigQuery is a great product, and we enjoy it’s built-into-BigQuery UI, but the cost vs their Sheets Add-on is tough to swallow.

CIFL Recommends

As much as each tool on the market (including Fivetran, Segment and Funnel.io) loves to market themselves as a one-stop shop for your ETL needs, there is no such thing.

There are always new APIs that need integrating. There’s always a new analysis to run, that might require a slightly different tool.

And most importantly, because none of these tools are delivering truly done for you data analysis, they all require downstream work:

  • SQL data modeling using a framework like DBT, and
  • Report construction using Google Data Studio, Looker, Tableau or your favorite data viz tool

So our recommendation is: shop around, mix-and-match your ETL stack, train your team in SQL, and control your destiny.

That said, the pros and cons of Stitch and Supermetrics as we see them are:

Stitch

Pros:

  • Generous pricing
  • Live chat support
  • Full display of error logs for easy correcting of feed errors
  • Simple checkbox-style account setup for select data sources (available for Bing + Google Ads)
  • Automatic backfilling of historical data to a selected backfill date

Cons: * Doesn’t cover Adobe Analytics, paid social data sources like Twitter + Linkedin Ads, or organic social data sources like Facebook + Instagram Insights

Supermetrics for BigQuery

Pros:

  • Setup data feeds from within the BigQuery UI
  • Data feeds come pre-configured with columns and pre-modeled in BigQuery, making for a faster initial setup
  • For less technical users, Data Studio connector shortcuts a lot of data wrangling in SQL

Cons:

  • Account-based pricing leads to significantly higher costs vs Stitch
  • Inability to select specific columns for each feed limits flexibility
  • Backfilling historical data requires setting up separate jobs

Policies + features of both Stitch and Supermetrics are constantly changing - we’ll do our best to keep this review updated as they do.

Thanks much to both the Supermetrics and Stitch teams for their hard work on these tools, and for always furnishing demo accounts for our courses.

Have questions on building your data pipeline? Drop a note to help@codingisforlosers.com, check out our Agency Data Pipeline service or Build your Agency Data Pipeline course.