Employing the Law of Large Numbers in Bottom-Up Forecasting

It is utterly implausible that a mathematical formula should make the future known to us, and those who think it can would once have believed in witchcraft. – Jakob Bernoulli (1655-1705)

This is a topic I’ve touched on numerous times in the past, but I’ve never really taken the time to tackle the subject comprehensively.

Before diving in, I just want to make clear that I’m going to stay in my lane: the frame of reference for this entire piece is around forecasting sales at the point of consumption in retail.

In that context, here are some truths that I consider to be self evident:

Consumers buy specific items in specific stores at specific times. Therefore, in order to plan the retail supply chain from consumer demand back, forecasts are needed by item by store.
Any retailer has a large enough percentage of intermittent demand streams at item/store level (e.g. fewer than 1 sale per week) that they can’t simply be ignored in the forecasting process.
Any given item can have continuous demand in some locations and intermittent demand in other locations.
“Intermittent” doesn’t mean the same thing as “random”. An intermittent demand stream could very well have a distinct pattern that is not visible to the naked eye (nor to most forecast algorithms that were designed to work with continuous demands).
Because of points 1 to 4 above, the Law of Large Numbers needs to be employed to see any patterns that exist in intermittent demand streams.

On this basis, it seems to be a foregone conclusion that the only way to forecast at item/store is by employing a top-down approach (i.e. aggregate sales history to some higher level(s) than item/store so that a pattern emerges, calculate an independent forecast at that level, then push down the results proportionally to the item/stores that participated in the original aggregation of history).

So now the question becomes: How do you pick the right aggregation level for forecasting?

This recent (and conveniently titled) article from Institute of Business Forecasting by Eric Wilson called How Do You Pick the Right Aggregation Level for Forecasting? captures the considerations and drawbacks quite nicely and provides an excellent framework to discuss the problem in a retail context.

A key excerpt from that article is below (I recommend that you read the whole thing – it’s very succinct and captures the essence about how to think about this problem in a short few paragraphs):

When To Go High Or Low?

Despite all the potential attributes, levels of aggregation, and combinations of them, historically the debate has been condensed down to only two options, top down and bottom up.

The top-down approach uses an aggregate of the data at the highest level to develop a summary forecast, which is then allocated to individual items on the basis of their historical relativity to the aggregate. This can be any generated forecast as a ratio of their contribution to the sum of the aggregate or on history which is in essence a naïve forecast.

More aggregated data is inherently less noisy than low-level data because noise cancels itself out in the process of aggregation. But while forecasting only at higher levels may be easier and provides less error, it can degrade forecast quality because patterns in low level data may be lost. High level works best when behavior of low-level items is highly correlated and the relationship between them is stable. Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.

The major challenge is that the required level of aggregation to get meaningful statistical information may not match the precision required by the business. You may also find that the requirements of the business may not need a level of granularity (i.e. Customer for production purposes) but certain customers may behave differently, or input is at the item/customer or lower level. More often than not it is a combination of these and you need multiple levels of aggregation and multiple levels of inputs along with varying degrees of noise and signals.

These are the two most important points:

“High level works best when behavior of low-level items is highly correlated and the relationship between them is stable.”
“Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.”

Now, here’s the conundrum in retail:

The behaviour of low level items is very often NOT highly correlated, making forecasting at higher levels a dubious proposition.
Most popular forecasting methods only work well with continuous demand history data, which can often be scarce at item/store level (i.e. they’re not “good at picking up these patterns”).

My understanding of this issue was firmly cemented about 19 years ago when I was involved in a supply chain planning simulation for beer sales at 8 convenience stores in the greater Montreal area. During that exercise, we discovered that 7 of those 8 stores had a sales pattern that one would expect for beer consumption in Canada (repeated over 2 full years): strong sales during the summer months, lower sales in the cooler months and a spike around the holidays. The actual data is long gone, but for those 7 stores, it looked something like this:

The 8th store had a somewhat different pattern.

And by “somewhat different”, I mean exactly the opposite:

Remember, these stores were all located within about 30 kilometres of each other, so they all experienced generally the same weather and temperature at the same time. We fretted over this problem for awhile, thinking that it might be an issue with the data. We even went so far as to call the owner of the 8 store chain to ask him what might be going on.

In an exasperated tone that is typical of many French Canadians, he impatiently told us that of course that particular store has slower beer sales in the summer… because it is located in the middle of 3 downtown university campuses: fewer students in the summer months = a decrease in sales for beer during that time for that particular store.

If we had visited every one of those 8 stores before we started the analysis (we didn’t), we may have indeed noticed the proximity of university campuses to one particular store. Would we have pieced together the cause/effect relationship to beer sales? My guess is probably not. Yet the whole story was right there in the sales data itself, as plain as the nose on your face.

We happened upon this quirk after studying a couple dozen SKUs across 8 locations. A decent sized retailer can sell tens of thousands of SKUs across hundreds or thousands of locations. With millions of item/store combinations, how many other quirky criteria like that could be lurking beneath the surface and driving the sales pattern for any particular item at any particular location?

My primary conclusion from that exercise was that aggregating sales across store locations is definitely NOT a good idea.

So in terms of figuring out the right level of aggregation, that just leaves us with the item dimension – stay at store level, but aggregate across categories of similar items. But in order for this to be a good option for the top level, we now have another problem: “behavior of low-level items is highly correlated and the relationship between them is stable“.

That second part becomes a real issue when it comes to trying to aggregate across items. Retailers live every day on the front line of changing consumer sentiment and behaviour. As a consequence of that, it is very uncommon to see a stable assortment of items in every store year in and year out.

Let’s say that a category currently has 10 similar items in it. After an assortment review, it’s decided that 2 of those items will be leaving the category and 4 new products will be introduced into the category. This change is planned to be executed in 3 months’ time. This is a very simple variation of a common scenario in retail.

Now think about what that means with regard to managing the aggregated sales history for the top level (category/store):

The item/store sales history currently includes 2 items that will be leaving the assortment. But you can’t simply exclude those 2 items from the history aggregation, because this would understate the category/store forecast for the next 3 months, during which time those 2 items will still be selling.
The item/store level sales history currently does not include the 4 new items that will be entering the assortment. But you can’t simply add surrogate history for the 4 new items into the aggregation, because this would overstate the category/store forecast for next 3 months before those items are officially launched.

In this scenario, how would one go about setting up the category/store forecast in such a way that:

It accounts for the specific items participating in the aggregation at different future times (before, during and after the anticipated assortment change)?
The category/store forecast is being pushed down to the correct items at different future times (before, during and after the anticipated assortment change)?

And this is a fairly simple example. What if the assortment changes above are being rolled out to different stores at different times (e.g. a test market launch followed by a staged rollout)? What if not every store is carrying the full 10 SKU assortment today? What if not every store will be carrying the full 12 SKU assortment in the future?

The complexity of trying to deal with this in a top-down structure can be nauseating.

So it seems that we find ourselves in a bit of a pickle here:

The top-down approach is unworkable in retail because the behaviour between locations for the same item are not correlated (beer in Montreal stores) and the relationships among items for the same location are not stable (constantly changing assortments).
In order for the bottom-up approach to work, there needs to be some way of finding patterns in intermittent data. It’s a self-evident truth that the only way to do this is by aggregating.

So the Law of Large Numbers is still needed to solve this problem, but in a retail setting, there is no “right level” of aggregation above item/store at which to develop reliable independent top level forecasts that are also manageable.

Maybe we haven’t been thinking about this problem in the right way.

This is where Darryl Landvater comes in. He’s a long time colleague and mentor of mine best known as a “manufacturing guy” (he’s the author of World Class Production and Inventory Management, as well as co-author of The MRP II Standard System), but in reality he’s actually a “planning guy”.

A number of years ago, Darryl recognized the inherent flaws with using a top-down approach to apply patterns to intermittent demand streams and broke the problem down into two discrete parts:

What is the height of the curve (i.e. rate of sale)?
What is the shape of the curve (i.e. selling profile)?

His contention was that it’s not necessary to use aggregation to calculate completely independent sales forecasts (i.e. height + shape) to achieve this. Instead, what’s needed is to aggregate to calculate selling profiles to be used in cases where the discrete demand history for an item at a store is insufficient to determine one. We’re still using the Law of Large Numbers, but only to solve for the specific problem inherent in slow selling demands – finding the shape of the curve.

It’s called Profile Based Forecasting and here’s a very simplified explanation of how it works:

Calculate an annual forecast quantity for each independent item/store based on sales history from the last 52+ weeks (at least 104 weeks of rolling history is ideal). For example, if an item in a store sold 25 units 2 years ago and 30 units over the most current 52 weeks, then the total forecast for the upcoming 52 weeks might be around 36 units with a calculated trend applied.
Spread the annual forecast into individual time periods as follows:
- If the item/store has a sufficiently high rate of sale that a pattern can be discerned from its own unique sales history (for example, at least 70 units per year), then calculate the selling pattern from only that history and multiply it through the item/store’s selling rate.
- If the item/store’s rate of sale is below the “fast enough to use its own history” threshold, then calculate a sales pattern using a category of similar items at the same store and multiply those percentages through the independently calculated item/store annual forecast.

There is far more to it than that, but the separation of “height of the curve” from “shape of the curve” as described above is the critical design element that forms the foundation of the approach.

Think about what that means:

If an item/store’s rate of sale is sufficient to calculate its own independent sales profile at that level, then it will do so.
If the rate of sale is too low to discern a pattern, then the shape being applied to the independent item/store’s rate of sale is derived by looking at similar items in the category within the same store. Because the profiles are calculated from similar products and only represent the weekly percentages through which to multiply the independent rate of sale, they don’t need to be recalculated very often and are generally immune to the “ins and outs” of specific products in the category. It’s just a shape, remember.
All forecasting is purely bottom-up. Every item at every store can have its own independent forecast with a realistic selling pattern and there are no forecasts to be calculated or managed above the item/store level.
The same forecast method can be used for every item at every store. The only difference between fast and slow selling items is how the selling profile is determined. As the selling rate trends up or down over time, the appropriate selling profile will be automatically applied based on a comparison to the threshold. This makes the approach very “low touch” – demand planners can easily oversee several hundred thousand item/store combinations by managing only exceptions.

With realistic, properly shaped forecasts for every item/store enabled without any aggregate level modelling, it’s now possible to do top-down stuff that makes sense, such as applying promotional lifts or overrides for an item across a group of stores and applying the result proportionally based on each store’s individual height and shape for those specific weeks, rather than using a naive “flat line” method.

Simple. Intuitive. Practical. Consistent. Manageable. Proven.

When To Go High Or Low?

Related

Leave a Reply Cancel reply