Marketing Optimization with Media Mix Modeling: Strong's Approach

Spatial Optimization, Strategy - April 26, 2022

Atmos Data

SENIOR ML SCIENTIST

Introduction

At Atmos we’ve done a lot of work in the marketing space, building both models and applications to help clients meet their needs. On the modeling side, we’ve productionized analyses that help our clients understand the impact of their marketing efforts on key performance indicators (KPIs) such as sales, brand awareness, customer acquisition and retention, and more. One of the tools we’ve implemented on multiple occasions is media mix modeling (MMM), a form of analysis that helps distill how spend across a variety of marketing channels results in actual change in a KPI, and furthermore, how to optimize that budget to maximize the KPI. Though not exactly new, MMM has caught on rapidly in recent years, so much so that we’ve even talked about it here at Atmos before! In this post we’ll discuss the basics of media mix modeling just to get our bearings, but we defer to that post for a more detailed overview. In that previous post, James also mentioned bespoke MMM as something that is likely needed to tailor MMM to client-specific needs, and we’ll dive more into that here. We’ll also discuss some of the challenges of MMM, and how we’ve addressed them in our work. Finally, we’ll discuss some of the benefits of MMM, and how it can be used to help clients make better decisions.

Media Mix Modeling

What is MMM?

The basic idea in MMM is that we have a set of marketing channels, and we want to understand how much each channel contributes to a given KPI. For example, we might have a set of channels like TV, radio, print, and digital, and we want to understand how much each channel contributes to sales. A key aspect of MMM is that it is a very interpretation-focused approach. That is, we want to understand the specifics of the impact of each channel on the KPI. This is important because, along with being able to make predictions about the future, we want to be able to use the model to make decisions about how to allocate our marketing budget right now. For example, if we know that TV has a large impact on sales, but radio has a small impact, then we might want to begin allocating more of our budget to TV and less to radio.

Challenges

MMM is not without its challenges though, and there are many. Some of these include:

Adding non-spend features while still teasing out the impact of spend
Dealing with non-linear spend-KPI relationships
Handling lagged contributions of spend
Irregular spend patterns (e.g. weekly vs. monthly budgets)
Dealing with regional marketing and KPIs
Incorporating seasonality
Setting numerous constraints on budgets, spend effects, and more
Accurately assessing uncertainty in predictions
Maintaining interpretability of the model

Different tools and implementations will handle these challenges differently, or maybe not at all. However, it’s important to understand the limitations of any approach, and to understand the implications of those limitations. For example, if an approach does not allow for non-linear spend-KPI relationships, then it may be difficult to capture the true impact of spend on KPI at higher spending levels. Likewise, if a tool does not allow for group structure, then it may be difficult to capture the impact of marketing spend at different levels, such as national vs. regional vs. store level.

Modeling Aspects

How is MMM Implemented?
Starting out, the basic approach of an MMM can be seen as a standard linear model. Here is an example predicting some KPI with different media channels:
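As a minimal sketch, assuming four channels named after those mentioned earlier (TV, radio, print, digital) and conventional regression notation, such a model could be written as:

```latex
\mathrm{KPI}_t = \beta_0
  + \beta_{\mathrm{tv}}\,\mathrm{TV}_t
  + \beta_{\mathrm{radio}}\,\mathrm{Radio}_t
  + \beta_{\mathrm{print}}\,\mathrm{Print}_t
  + \beta_{\mathrm{digital}}\,\mathrm{Digital}_t
  + \epsilon_t
```

Here $t$ indexes the time period, each channel variable is the spend in that period, $\beta_0$ is an intercept, and $\epsilon_t$ is the error term. The channel set and symbols are illustrative assumptions.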

In this model, we interpret the coefficients (the betas) as the impact of each channel on the KPI, often called the efficiency of the channel (or return on investment, ROI). An issue is that we want most of the interpretability of this simple linear model, but we have to take many other factors into account such as seasonality, the lagged effect of marketing spend, factors unrelated to marketing, and more. We’ll discuss some of these in more detail below.

Additional MMM Components

TIME SERIES

MMM is inherently a time-series or longitudinal model. As such, the initial linear model will be extended to include time series components such as seasonality and trend, or at least modified to meet the underlying assumptions. This could be a straightforward inclusion of a linear trend, or it could get more involved, such as including a Fourier series to capture seasonality, or even a time-series-specific component such as ARIMA. In this case, we might now have a model like this:

where we now have added some function of time, possibly in more than one way.
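To make the Fourier idea concrete, here is a small sketch of how trend and seasonal columns might be built for a weekly series. The function name `fourier_features`, the 52-week period, and the choice of two harmonics are all assumptions made for illustration.

```python
import math


def fourier_features(t, period, order):
    """Return sin/cos pairs for harmonics k = 1..order at time index t."""
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


# Two years of weekly data, yearly seasonality (assumed period of 52 weeks).
weeks = range(104)

# Each row: intercept, linear trend, then 2 * order seasonal columns.
design = [[1.0, float(t)] + fourier_features(t, period=52, order=2) for t in weeks]
```

These columns would sit alongside the media spend columns in the model. More harmonics allow a wigglier seasonal pattern at the cost of more coefficients to estimate.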

ADSTOCKING AND SATURATION

Another key aspect of MMM is that we want to account for the lagged effect of marketing spend. That is, we want to account for the fact that the impact of marketing spend is not immediate, but rather comes in staggered impacts from previous time points, with the current time point typically being most influential, and marketing spend at previous time points being less so.

Accumulating the past effects in some form of weighted fashion is typically referred to as adstocking. One of the simplest forms of adstocking we might employ is a geometric decay, where the impact of marketing spend decays exponentially over time. While this is a good conceptual starting point, it has its drawbacks. However, any sort of probability distribution could be used to accumulate effects over a specified time window, with some suggestions including Poisson, negative binomial, or Weibull distributions. The following shows the impact of different distributions on the adstocking of a given time point. Even with the same mean, the distributions can have potentially different impacts on the adstocking [1,2].

Impact of different distributions on adstocking

If we look closer at some distributions, here is what it might look like at different parameter settings. We see that generally most of the adstock occurs early, and is practically nonexistent at the latest time points. However, the weight given to the earliest points can be dramatically different with different parameter settings, even for otherwise similar distributions [3]. Regardless of adstocking approach, the parameters should at a minimum be allowed to vary by media channel, and potentially by other factors as well.
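The weighted accumulation described above can be sketched in a few lines. This is a minimal illustration, assuming a geometric decay with a three-period window; the function names `geometric_weights` and `adstock` and the example spend series are hypothetical.

```python
def geometric_weights(decay, window):
    """Weights decay**0, decay**1, ... normalized to sum to 1."""
    raw = [decay ** lag for lag in range(window)]
    total = sum(raw)
    return [w / total for w in raw]


def adstock(spend, weights):
    """Weighted sum of current and past spend; weights[0] applies to the current period."""
    out = []
    for t in range(len(spend)):
        acc = 0.0
        for lag, w in enumerate(weights):
            if t - lag >= 0:
                acc += w * spend[t - lag]
        out.append(acc)
    return out


spend = [100.0, 0.0, 0.0, 0.0, 50.0, 0.0]
weights = geometric_weights(decay=0.5, window=3)  # 4/7, 2/7, 1/7
carried = adstock(spend, weights)
```

Because `adstock` only needs a list of weights, the probability mass function of another distribution (evaluated over the window and normalized) could be passed in place of the geometric weights.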

It’s at this point we start to get into the art of MMM and the realm of bespoke MMM, where we need to tailor the model to the specific needs of the client. For example, we might want to use a distribution to allow the impact of marketing spend to be felt for a longer period of time, because that makes more sense for a particular industry. The distribution may not be the same for each type of media channel, such that TV might have a more immediate impact than mailers. Incorporating adstocking into the model, we might now have something like the following for a set of media channels, each with their own adstocking parameters/application:

The next thing we want to consider is that spend on media cannot continue to add to the KPI indefinitely. That is, there is a limit to the impact of marketing spend, and we want to account for that. This is typically referred to as saturation. For example, if we spend $10,000 on TV one month, we might expect a certain increase in sales, but if we spend $100,000, we probably would not expect a 10x increase in sales. In fact, we might expect a smaller increase in sales after a certain allotment of budget, and this is because we’ve reached a saturation point in terms of marketing effectiveness. Common ways to deal with this include using a logistic/sigmoid or Hill function to model the saturation, transforming an otherwise linear spend-KPI relationship to a nonlinear one. Depending on the method chosen, there may be specific parameters to estimate, and these would, like the adstocking parameters, at a minimum need to vary by channel.

The following shows the impact of a logistic function and a Hill function on the spend-KPI relationship. Note that the Hill function can be seen as a generalization of the logistic function [4]. The bottom plot shows the two-parameter Hill function with different parameter values, and we see potentially dramatic differences.
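For readers who prefer code to plots, the two saturation curves can be sketched as follows. The parameter names (`half_saturation`, `shape`, `midpoint`, `rate`) and the example values are assumptions, and the Hill curve is written in its common two-parameter form.

```python
import math


def hill(spend, half_saturation, shape):
    """Two-parameter Hill curve: 0 at zero spend, 0.5 at half_saturation, approaching 1."""
    if spend <= 0:
        return 0.0
    return spend ** shape / (spend ** shape + half_saturation ** shape)


def logistic(x, midpoint, rate):
    """Logistic (sigmoid) curve centered at midpoint."""
    return 1.0 / (1.0 + math.exp(-rate * (x - midpoint)))


# Ten times the spend yields far less than ten times the (scaled) response.
low = hill(10_000.0, half_saturation=50_000.0, shape=1.0)
high = hill(100_000.0, half_saturation=50_000.0, shape=1.0)
ratio = high / low  # roughly 4, not 10

# The Hill curve is a logistic curve in log-spend.
same = abs(
    hill(30_000.0, 50_000.0, 2.0)
    - logistic(math.log(30_000.0), math.log(50_000.0), 2.0)
) < 1e-12
```

The last comparison shows one way to see the link between the two: a Hill curve in spend is a logistic curve in the logarithm of spend, with the shape parameter acting as the rate.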

Now our model is looking something like this for a set of media channels, each with their own adstocking and saturation parameters:
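One way to write this down, as a sketch with assumed notation ($M$ channels with spend $x_{m,t}$, adstock weights $w_{m,l}$ over a window of $L$ lags, a channel-specific saturation function $S_m$ such as the Hill curve, and a function of time $f(t)$ for trend and seasonality):

```latex
\begin{aligned}
a_{m,t} &= \sum_{l=0}^{L-1} w_{m,l}\, x_{m,t-l},
  \qquad \sum_{l=0}^{L-1} w_{m,l} = 1 \\
\mathrm{KPI}_t &= \beta_0 + f(t)
  + \sum_{m=1}^{M} \beta_m\, S_m\!\left(a_{m,t}\right) + \epsilon_t
\end{aligned}
```

The ordering shown here, adstock first and saturation second, is one choice; applying saturation before adstock is also possible and may suit some settings better.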

NON-MEDIA FEATURES

Along with temporal features, non-media features also come into play. Such features could be related to general economic conditions, product ratings, or even the weather. For example, you’re likely to sell more ice cream in the summer relative to other times regardless of your spend on marketing. The idea is that we don’t want to attribute to marketing what would already be a KPI increase or change due to other factors, as that could result in misallocated budgets down the line. These features can be included in the model in a variety of ways, but as a starting point we can just include them as an additional linear component. So now we have:

where we have added a matrix of non-marketing/media covariates and their corresponding betas.

CONSTRAINTS
A common assumption in marketing is that marketing has only a potentially positive impact. In terms of our model, this means that marketing can only have a positive impact on the KPI, since with no marketing, the presumed worst case scenario is that there would be no additional KPI. While a reasonable assumption, it’s not always the case, as some very recent history suggests [5]. However, typical practice is to constrain the betas related to media to be positive. This is easily implemented in some contexts. It is important to note that this is not the same as constraining the predicted KPI to be positive, which is a different issue altogether. The former is a constraint on the parameters, while the latter is a constraint on the predictions, and could be done by assuming a different distribution for the KPI.
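As a small illustration of a constraint on the parameters, the sketch below fits a least squares model while keeping every coefficient at or above zero. It uses projected gradient descent purely as a simple stand-in; in a Bayesian model the same idea would more likely be expressed through priors with positive support. The function name `fit_nonnegative` and the toy data are assumptions.

```python
def fit_nonnegative(X, y, steps=2000, lr=0.1):
    """Least squares via gradient descent, clipping coefficients at zero after each step."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        # Residuals of the current fit.
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the mean squared error with respect to each coefficient.
        grad = [2.0 / n * sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto the non-negative orthant.
        beta = [max(0.0, beta[j] - lr * grad[j]) for j in range(p)]
    return beta


# Toy data generated as y = 2 * x1 - 0.5 * x2, so an unconstrained fit
# would return a negative second coefficient.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -0.5, 1.5, 3.5]

beta = fit_nonnegative(X, y)  # second coefficient is held at zero
```

Note that the first coefficient also changes (to 1.75 rather than 2) once the second is held at zero, which is a reminder that constraints on one channel affect the estimates for the others.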

The Atmos Approach
Here at Atmos we’ve developed a tool in-house to help us implement complex MMM that brings together all the previous aspects, and more. The basic idea is that we want to be able to take into account the complexity, but we want to do so in a way that is flexible, interpretable, and efficient.

Hierarchical or Other Group Effects
One common aspect of MMM is that we want to be able to model the impact of marketing spend at different levels. For example, we might have a national campaign or regional campaigns. In addition, perhaps we have data for many different stores in those regions. Any model could potentially add this to the mix, and at Atmos, we are very well versed in models that include random effects for different types of groupings. Depicting such models gets into very verbose notation to account for a single observation, at a single time point, for a single group etc. Let’s focus on the set of observations for just a single group, and we can think of the model as something like this:
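As a rough sketch with assumed notation (the symbols are illustrative and not a specification of the tool), the model for the observations of group $g$ at time $t$ could be written as:

```latex
\mathrm{KPI}_{g,t} = \beta_{0,g} + f_g(t)
  + \sum_{m=1}^{M} \beta_{m,g}\, S_m\!\left(a_{m,g,t}\right)
  + \sum_{k=1}^{K} \gamma_k\, S^{\mathrm{nat}}_k\!\left(a^{\mathrm{nat}}_{k,t}\right)
  + \mathbf{z}_{g,t}^{\top} \boldsymbol{\delta}_g
  + \epsilon_{g,t}
```

In this sketch, $a_{m,g,t}$ is the adstocked spend for channel $m$ in group $g$, $a^{\mathrm{nat}}_{k,t}$ is adstocked national spend whose coefficients $\gamma_k$ carry no group subscript, $\mathbf{z}_{g,t}$ holds the non-marketing covariates, and $f_g(t)$ collects trend and seasonal terms.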

Here we add a subscript g to denote the group, and in this case, adstocked effects of media are allowed to vary by group. In addition, we have a national spend component that is shared across all groups. On top of this, we have non-marketing covariates with betas that can vary by group, be shared across groups, or both, and likewise for seasonal components.

So while we started with a simple linear MMM, it easily expands to handle a lot of additional complexity, and this can be quite a challenge to implement!

Bayesian
To help with all this complexity, keep things on the interpretable side, and have solid estimates of uncertainty after model estimation, recent MMM endeavors employ a Bayesian approach to inference, and Atmos does too. Bayesian methods have long been implemented to deal with grouped structure in otherwise standard linear models, typically under the guise of ‘mixed’ or ‘random effect’ models. And with the right tool, other complexities such as adstocking can be straightforwardly implemented in a Bayesian approach. Not all aspects of Bayesian modeling are obvious though. For example, we must have suitable priors for the parameters in the model, and we also have to set the positivity constraints for some of the spend parameters. We also need to allow some parameters to vary by group, while allowing others to be constant across all groups. Bayesian approaches allow us to do all this, but they are slower to converge with increasing complexity.
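To give a feel for how a prior can encode the positivity constraint while also yielding uncertainty estimates, here is a toy sketch for a single media coefficient. It assumes a half-normal prior, a known noise scale, and a basic random-walk Metropolis sampler; production tools typically rely on more efficient samplers, and every name and number below is hypothetical.

```python
import math
import random


def log_posterior(beta, x, y, sigma=1.0, prior_scale=5.0):
    """Unnormalized log posterior: half-normal prior on beta, normal likelihood."""
    if beta <= 0:
        return -math.inf  # the half-normal prior puts no mass on non-positive values
    log_prior = -0.5 * (beta / prior_scale) ** 2
    log_lik = -0.5 * sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / sigma ** 2
    return log_prior + log_lik


def metropolis(x, y, n_iter=5000, step=0.3, seed=1):
    """Random-walk Metropolis sampler for the single coefficient beta."""
    rng = random.Random(seed)
    beta = 1.0
    current = log_posterior(beta, x, y)
    draws = []
    for _ in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, x, y)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(beta)
    return draws


# Toy data: adstocked, saturated spend (x) and KPI (y), roughly y = 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

kept = metropolis(x, y)[1000:]  # drop burn-in draws
post_mean = sum(kept) / len(kept)
ordered = sorted(kept)
lower = ordered[int(0.05 * len(ordered))]
upper = ordered[int(0.95 * len(ordered))]
```

The interval from `lower` to `upper` is the kind of uncertainty statement that can be carried forward into budget decisions, rather than relying on a single point estimate of channel efficiency.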

Optimization
With the model in place, we can now start to think about optimization. This means taking the model and what we know about current marketing spend to find better allocations that can maximize KPI. The basic idea is that we can get predicted KPI via the model, we know something about the default budget allocation, and we know which media appear to have more ROI than others. With this knowledge in place, we can find an optimal allocation of the budget to maximize KPI. This is typically done via some sort of optimization algorithm, of which there are several, but we also take into account constraints that may limit our allocation. For example, there may be a minimum or maximum spend for a given channel. We can also take into account the fact that we may not be able to, or may not want to, spend the entire budget. These sorts of factors must be taken into account as part of the optimization process, so that we can find a ‘best’ allocation given the circumstances.
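A simple way to see constrained allocation in action is a greedy rule that hands out the budget in small increments, each time to the channel with the largest predicted gain. The sketch below assumes concave Hill response curves (shape fixed at 1), per-channel minimum and maximum spend, and made-up coefficients; it is one of several possible algorithms, and the names `allocate` and `predicted_kpi` are hypothetical.

```python
def response(spend, beta, half_saturation):
    """Predicted KPI contribution of one channel: beta times a concave Hill curve."""
    if spend <= 0:
        return 0.0
    return beta * spend / (spend + half_saturation)


def predicted_kpi(spend, channels):
    """Total predicted media contribution for a given allocation."""
    return sum(
        response(spend[name], c["beta"], c["half_saturation"])
        for name, c in channels.items()
    )


def allocate(budget, channels, step=1000.0):
    """Greedy allocation in increments of `step`, respecting min and max spend."""
    spend = {name: c["min"] for name, c in channels.items()}
    remaining = budget - sum(spend.values())
    if remaining < 0:
        raise ValueError("budget is smaller than the sum of minimum spends")
    while remaining >= step:
        best, best_gain = None, 0.0
        for name, c in channels.items():
            if spend[name] + step > c["max"]:
                continue  # channel is at its cap
            gain = response(spend[name] + step, c["beta"], c["half_saturation"]) \
                - response(spend[name], c["beta"], c["half_saturation"])
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # nothing can absorb more spend, so part of the budget stays unspent
        spend[best] += step
        remaining -= step
    return spend


channels = {
    "tv": {"beta": 100.0, "half_saturation": 50_000.0, "min": 10_000.0, "max": 80_000.0},
    "radio": {"beta": 40.0, "half_saturation": 20_000.0, "min": 5_000.0, "max": 50_000.0},
}
plan = allocate(100_000.0, channels)
```

With concave curves, each extra increment is worth less than the one before it, so this greedy rule lands on the best allocation available on the step grid. S-shaped curves or constraints that couple channels together would call for a more general optimizer.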

Tools for MMM
We covered MMM tools in our previous post, but can revisit them briefly here.

Robyn is a comprehensive R package for MMM from the Facebook team. It has options for modeling of many of the aspects mentioned above such as adstocking and saturation, non-media covariates, seasonality, and more. It also has a built-in optimization process to help with budget allocation and nifty visualization functionality. However, its approach actually returns many possible solutions, and it’s up to the user to decide which one to ultimately focus on, and they may not be consistent with one another. Robyn also seems sensitive to even slight data changes, which would then provide more potential model solutions. It’s not clear how well it would work with a different time scale (e.g. monthly), or how it would take into account grouped structure, and it can take a very long time to run, but this may be true of any particular MMM implementation.
LightweightMMM is Google’s foray into the MMM world. It’s a Python package built on numpyro that is open source and freely available to use (Robyn is as well but is mostly developed in R). It also uses a Bayesian approach to account for the many aspects of the MMM model simultaneously (Robyn has a multi-stage approach), and offers more options for adstocking/saturation. Its output is a bit bare bones though, especially compared to Robyn. However, it also provides support for ‘geographic’ specific results, something Robyn does not. Though it’s touted to be ‘lightweight’ and speedier than Robyn, in actual practice your mileage may vary.
HOW Atmos COMPARES
Atmos MMM provides a custom approach to MMM that can be tailored to the client’s needs, and combines the best that other tools have to offer while adding a few more benefits. Some key aspects of our approach include:

Our approach can handle grouped structure (regions, stores, etc.)
We can have group-specific effects for media efficiencies and non-media effects, while leaving others constant across groups (e.g. national spend)
We can implement a variety of adstocking and saturation functions, which can also have group-specific parameters
Our approach deals with missing data and uneven time series across groups
We have multiple ways to incorporate time-based effects
We have an optimization approach that reflects the client’s specific needs (e.g. potentially group-specific constraints)

But aside from this, we engage with the client to better understand how to implement the model. A first stab may uncover budget issues that were not initially revealed, data problems, or other challenges. By continuously engaging with the client, we can further modify the model approach to their needs, and provide a solution that is interpretable, efficient, and actionable. Once the model has been mostly settled, we can further discuss with the client action steps for making decisions about budget allocation.

On top of all this, we can provide a host of other services related to AI, machine learning, and predictive analytics that have nothing specifically to do with MMM, and we have a wealth of knowledge for a variety of industry domains. Many clients will have needs that extend beyond their marketing spend, and Atmos is able to seamlessly integrate those efforts into a multi-faceted solution to the challenges a client faces.

Benefits of an Atmos MMM
The ability to optimally allocate your budget pays dividends for any company, large or small. It potentially saves money by not spending as much on things that aren’t paying off, and makes money by investing in those that have a larger return on investment. In addition, allocation optimization can be context specific, and we are able to understand how non-media features also contribute to KPI. On top of all this, we can provide a sense of the limitations of model-based optimization recommendations. Getting a prediction doesn’t mean things will definitely turn out that way, and having more at your disposal to make an informed decision puts you at an advantage.

Atmos provides production-level MMM solutions that can be tailored to your needs. We can provide expertise regarding the model, the optimization, and the interpretation. We can help you understand the impact of your marketing spend, and help you optimize your budget to maximize your KPI. We can also help you understand the limitations of the results, helping you make better decisions about your marketing spend. If you’re interested in learning more about how Atmos can help you, please reach out to us!

Footnotes
[1] The Weibull distribution is a generalization of the exponential distribution, and is often used to model the lifetime of an object. Unlike the others, it is a continuous distribution by default, though a discrete version has been derived and is used here, and it still has the same mean as the other distributions. In fact, the geometric distribution can be seen as a special case of both the negative binomial and the (discrete) Weibull distributions.
[2] All distributions depicted have the same mean. The geometric and Weibull distributions shown also share the same variance. Relative to those, the Poisson has less variance, and the negative binomial (NB) has more. As an aside, I’ve also toyed with the Dirichlet so as not to assume any particular decay at all, though this would require more parameters to estimate.
[3] The Poisson and negative binomial depicted have the same means, but while the Poisson variance equals the mean, this is not the case for the negative binomial.
[4] Note that these methods assume a specific functional form, which, especially outside of physical/chemical realms and into marketing ones, is unlikely to hold exactly. A more flexible approach would be to use something like a generalized additive model (GAM) with monotonicity constraints to model the spend-KPI relationship, which would not fix the functional form in advance. Unfortunately this could also be more difficult to implement, as it may increase the number of parameters notably.
[5] In 2023, Anheuser-Busch ran an advertising campaign for its Bud Light product that ultimately resulted in a large negative impact on sales. In addition, in uplift modeling there is an explicit attempt to account for ‘sleeping dogs’ or ‘do not disturbs’, who, for example, might purchase a product unless they are marketed to. As one more thing to consider, statistically speaking, even if all effects were positive on their own, without constraints you almost certainly would have negative effects when holding other spend effects constant due to their collinearity. This is not the same as saying the overall effect of marketing would be negative, but it is a potential issue to consider.