5 Advanced Shopify Reports in BigQuery

advanced shopify reports

David Krevitt

Lover of laziness, connoisseur of lean-back capitalism. Potentially the #1 user of Google Sheets in the world.

Here at CIFL, we love Shopify.

But let’s be honest, the standard reports are mediocre (at best).

You can try exporting the data to Sheets for deeper analysis, but if your store is doing any sort of volume (>1k sales) you’ll need a more powerful solution.

That’s where cloud data warehouses like Google BigQuery come in.


What is Google BigQuery?

Here’s how Google explains it:

“BigQuery is a cloud-based data warehouse for storing + processing large datasets.”

Here’s how we explain it:

BigQuery lets you just push all your data (via API) to a cloud based project where it can be processed and formatted at lightning speed. No more slow browsers or crashed spreadsheets, BigQuery can handle limitless data.

setting up bigquery

On top of that, you can sync your BigQuery database with the visualization tool of your choice.

You’re able to combine data from different platforms into custom reports.

bigquery outputs


Using BigQuery for Shopify analysis

We’ve helped our ecommerce clients setup dozens of custom reports in BigQuery, this post will detail the 5 of the most powerful Ecommerce analyses out there:

  1. Cohort analysis
  2. Buyer segmentation (aka RFM analysis)
  3. Sales + Margin Forecasting
  4. Basket analysis
  5. Retention & churn modeling

Before we dive into how you can run each of these in BigQuery, let’s lay the foundation.

Believe it or not, we use a single BigQuery SQL query to feed all of these metrics.

We call this query the “power pack,” and having a query like it makes your life a *lot* easier.


Unpacking the power pack

Say a buyer orders a pair of socks and a t-shirt.

Answering a few questions about that order tells you everything you need to know:

In BigQuery, you can tag each of your orders with these attributes, so that your base orders table contains all of the underlying info you need to perform any analysis.

As part of our Ecommerce Data Pipeline service, we build these analyses using the dbt framework.

Let’s dive into how the power pack translates into each of the 5 analysis recipes mentioned above:



1. Cohort analysis

Cohort analysis is just a fancy term for a pivot table, where you pivot based on a buyer’s first order month (their “cohort”).

To facilitate this, we tag each order with:

Having all of those attributes at your fingertips makes building cohort charts like this one very straightforward:

shopify cohort analysis

There are a couple key concepts that’ll allow you to get this done:


a. Joining Google Analytics (or other attribution data)

It’s critical to know what marketing channel acquired this buyer, because this may result in wildly different lifetime behavior.

Buyers who first purchase from organic search may be stickier than those finding your store in Google shopping ads, or vice versa.

This lifetime behavior dynamic (aka cohort analysis) is key to knowing how your team’s efforts are paying off, and what areas need attention.

Thankfully, BigQuery (or any cloud warehouse) makes it easy to pipe Google Analytics attribution data into BigQuery, and join them together based on transactionid (GA) and order_number (Shopify).

Note: our BigQuery Recipe for Shopify buyer segmentation performs this join Google Analytics and Shopify data in BigQuery, if you’re looking for an example to start with.

Once you’ve got those two datasets joined up, you’ll be ready to tag your orders.


b. BigQuery window functions

Window (aka analytic) functions in BigQuery are the core of the power pack.  They allow you to quickly tag an individual row (an order), with attributes from the rest of your dataset.

You can think about them like running an unlimited number of GROUP BYs from within the same query.

We use them to tag an order with a handful of attributes:

Having these attributes accessible across your entire order history makes generating *any* cohort analysis pivot table or chart extremely straightforward.

If you’re interested in diving deeper into window functions, check out our free BigQuery tutorial.


2. Buyer Segmentation (RFM Analysis)

Who are your best customers, and where did they come from?  This is the fundamental question we seek to answer with buyer segmentation.

shopify buyer segmentation

It helps to give these segments names, so that your team has a shorthand to work from when discussing customer personas:

To codify these segments, we calculate RFM for each buyer over a given period (usually annual):

Once you’ve calculated RFM for each buyer, we generally define segments based on a combination of percentile distributions of RFM and hardcoded attributes:

If you’re looking to take a deeper dive into buyer segmentation, check out our BigQuery recipe on this topic.


3. Sales + Margin Forecasting

Forecasting doesn’t require a machine learning algorithm for it to be useful – in fact, using a black box algo for forecasting can actually be more harmful than helpful.

That’s because your strategy changes over time, in terms of merchandising and marketing mix.  It’s important to have control to dial a forecast to these anticipated changes.

ecommerce sales forecasting

Before we forecast, we always join a few datasets together to calculate key forecast inputs:

We recommend joining these datasets together in BigQuery, but pulling data back down into Sheets to forecast.


Calculating projected revenues + margins

Ultimately, your revenue equals trafficconversion rateAOV.

You can further break it down into what we call marketing margin: trafficconversion rate * ( AOV – CPA ).

This will tell you how much space you have to increase ad spend, or go after more expensive paid channels.

So once we’ve calculated these on a backward-looking basis, how do we forecast forward?  It really depends on the business, but generally we’ll allow for dialing each of those inputs:

Outside of organic traffic growth rate (which needs to be multiplied by your current base traffic), the rest of these are static variables – you can dial them to your expectations, and see how those dials impact your revenue + margin picture going forward.

Generally we’ll wire up an ‘Assumptions’ box within a Sheets forecast, that displays suggested assumptions based on the last 3-12 months of history.

forecast assumptions

If you’re looking for help specifically with forecasting, ping us on Twitter or in the CIFL Slack group – we haven’t built a specific template / recipe around forecasting yet (given its variability between businesses), but would be happy to help.


4. Basket Analysis

You know those product recommendations on Amazon?  People who bought this also bought…

There’s no reason you can’t do that same analysis for your own shop.

These basket recommendations are extremely powerful when deployed in cross-sell and upsell messaging – whether that’s on your site, in drip email campaigns, or as display ads to custom audiences.

market basket analysis

How do we perform market basket analysis?  It’s ultimately pretty simple – we join every order line item for a buyer to every other order line item.  The rough SQL looks like so:

b.product_id complement_product_id,
b.product_name complement_product_name,
b.final_price complement_final_price
FROM agg_transaction_line_item_type a
LEFT JOIN agg_transaction_line_item_type b
ON (
a.store = b.store AND
a.order_number = b.order_number AND
a.product_id != b.product_id

Once you have order line items joined together, you can pivot out product * complement product, to get a sense of which products are purchased together.

This is also really helpful when done at the product type level, or at the buyer lifetime rather than individual basket level.

You can also calculate lift at the product combination level (essentially the additional $ spent on the combination of two or more products, versus them being purchased alone).

What we’ve found with lift calculations though, that the outcomes aren’t always relevant – even if a niche combination of products had a relatively high lift on a dollar basis, the sample size may be so small that it’s not actionable in real life.

Beware of small sample sizes when performing basket analysis.


5. Customer Retention & Churn Modeling

Shopify outputs a list of orders, out of which you’ll need to derive whether a buyer churned or repeated.

The date window (monthly, quarterly, annually) over which you measure retention will depend, of course, on the products you sell.

Retention modeling is a bit of an odd duck versus these other metrics. It requires bootstrapping a monthly (or weekly / annual, whatever your window is) time series of each of your buyers’ history.

shopify retention modeling

Rather than just count up the months in which they *had* orders, we need to also fill in the gaps of where they *didn’t* order (to count them as churn):

The long and short of how this is done – join your orders data to a date series (we use GENERATE_DATE_ARRAY in BigQuery for this).

This allows you to tag each month for that buyer with a retention / churn status:

Substitute ‘month’ for quarter / year there, based on the cadence you expect buyers to need your product.

Again, generating this type of time series is very difficult to do in Sheets, but in BigQuery can be boiled down to a few lines of SQL.

If you’re interested in getting your hands dirty with the retention modeling code, check out our deep dive into retention & churn modeling.


Standing up your BigQuery Warehouse

If you’re looking to fire up these analysis recipes for your own Shopify store(s), there’s a couple ways to get started:

As always, if you’d like help getting pointed in the right direction, feel free to book a time to chat with David, the Commissioner of CIFL.

Ready for BigQuery?

Get off the ground quickly, with customizable data pipeline Recipes for BigQuery.