Employing the Law of Large Numbers in Bottom-Up Forecasting

 

It is utterly implausible that a mathematical formula should make the future known to us, and those who think it can would once have believed in witchcraft. – Jakob Bernoulli (1655-1705)

Seeing the forest for the trees

This is a topic I’ve touched on numerous times in the past, but I’ve never really taken the time to tackle the subject comprehensively.

Before diving in, I just want to make clear that I’m going to stay in my lane: the frame of reference for this entire piece is forecasting sales at the point of consumption in retail.

In that context, here are some truths that I consider to be self-evident:

  1. Consumers buy specific items in specific stores at specific times. Therefore, in order to plan the retail supply chain from consumer demand back, forecasts are needed by item by store.
  2. Any retailer has a large enough percentage of intermittent demand streams at item/store level (e.g. fewer than 1 sale per week) that they can’t simply be ignored in the forecasting process.
  3. Any given item can have continuous demand in some locations and intermittent demand in other locations.
  4. “Intermittent” doesn’t mean the same thing as “random”. An intermittent demand stream could very well have a distinct pattern that is not visible to the naked eye (nor to most forecast algorithms that were designed to work with continuous demands).
  5. Because of points 1 to 4 above, the Law of Large Numbers needs to be employed to see any patterns that exist in intermittent demand streams.

On this basis, it seems to be a foregone conclusion that the only way to forecast at item/store is by employing a top-down approach (i.e. aggregate sales history to some higher level(s) than item/store so that a pattern emerges, calculate an independent forecast at that level, then push down the results proportionally to the item/stores that participated in the original aggregation of history).
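To make the mechanics concrete, here’s a minimal sketch of that top-down push-down step. The item/store names and quantities are made up purely for illustration, and real implementations vary in how they calculate the proportions.

```python
# Minimal sketch of top-down disaggregation: forecast at an aggregate level,
# then push the result down to each item/store in proportion to its share of
# the aggregated history. All numbers are illustrative.

history = {                                   # units sold last year, by item/store
    ("itemA", "store1"): 120,
    ("itemA", "store2"): 30,
    ("itemB", "store1"): 50,
}

aggregate_history = sum(history.values())     # 200 units
aggregate_forecast = 220                      # independent forecast made at the aggregate level

# Each item/store receives the aggregate forecast in proportion to its
# contribution to the aggregated history.
pushed_down = {
    key: aggregate_forecast * units / aggregate_history
    for key, units in history.items()
}

print(pushed_down)
# {('itemA', 'store1'): 132.0, ('itemA', 'store2'): 33.0, ('itemB', 'store1'): 55.0}
```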

So now the question becomes: How do you pick the right aggregation level for forecasting?

This recent (and conveniently titled) article from the Institute of Business Forecasting by Eric Wilson, How Do You Pick the Right Aggregation Level for Forecasting?, captures the considerations and drawbacks quite nicely and provides an excellent framework for discussing the problem in a retail context.

A key excerpt from that article is below (I recommend that you read the whole thing – it’s very succinct and captures the essence of how to think about this problem in a few short paragraphs):


When To Go High Or Low?

Despite all the potential attributes, levels of aggregation, and combinations of them, historically the debate has been condensed down to only two options, top down and bottom up.

The top-down approach uses an aggregate of the data at the highest level to develop a summary forecast, which is then allocated to individual items on the basis of their historical relativity to the aggregate. This can be any generated forecast as a ratio of their contribution to the sum of the aggregate or on history which is in essence a naïve forecast.

More aggregated data is inherently less noisy than low-level data because noise cancels itself out in the process of aggregation. But while forecasting only at higher levels may be easier and provides less error, it can degrade forecast quality because patterns in low level data may be lost. High level works best when behavior of low-level items is highly correlated and the relationship between them is stable. Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.

The major challenge is that the required level of aggregation to get meaningful statistical information may not match the precision required by the business. You may also find that the requirements of the business may not need a level of granularity (i.e. Customer for production purposes) but certain customers may behave differently, or input is at the item/customer or lower level. More often than not it is a combination of these and you need multiple levels of aggregation and multiple levels of inputs along with varying degrees of noise and signals.


These are the two most important points:

  • “High level works best when behavior of low-level items is highly correlated and the relationship between them is stable.”
  • “Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.”

Now, here’s the conundrum in retail:

  • The behaviour of low level items is very often NOT highly correlated, making forecasting at higher levels a dubious proposition.
  • Most popular forecasting methods only work well with continuous demand history data, which can often be scarce at item/store level (i.e. they’re not “good at picking up these patterns”).

My understanding of this issue was firmly cemented about 19 years ago when I was involved in a supply chain planning simulation for beer sales at 8 convenience stores in the greater Montreal area. During that exercise, we discovered that 7 of those 8 stores had a sales pattern that one would expect for beer consumption in Canada (repeated over 2 full years): strong sales during the summer months, lower sales in the cooler months and a spike around the holidays. The actual data is long gone, but for those 7 stores, it looked something like this:

The 8th store had a somewhat different pattern.

And by “somewhat different”, I mean exactly the opposite:

Remember, these stores were all located within about 30 kilometres of each other, so they all experienced generally the same weather and temperature at the same time. We fretted over this problem for a while, thinking that it might be an issue with the data. We even went so far as to call the owner of the 8-store chain to ask him what might be going on.

In an exasperated tone that is typical of many French Canadians, he impatiently told us that of course that particular store has slower beer sales in the summer… because it is located in the middle of 3 downtown university campuses: fewer students in the summer months means lower beer sales at that store during that time.

If we had visited every one of those 8 stores before we started the analysis (we didn’t), we may have indeed noticed the proximity of university campuses to one particular store. Would we have pieced together the cause/effect relationship to beer sales? My guess is probably not. Yet the whole story was right there in the sales data itself, as plain as the nose on your face.

We happened upon this quirk after studying a couple dozen SKUs across 8 locations. A decent sized retailer can sell tens of thousands of SKUs across hundreds or thousands of locations. With millions of item/store combinations, how many other quirky criteria like that could be lurking beneath the surface and driving the sales pattern for any particular item at any particular location?

My primary conclusion from that exercise was that aggregating sales across store locations is definitely NOT a good idea.

So in terms of figuring out the right level of aggregation, that just leaves us with the item dimension – stay at store level, but aggregate across categories of similar items. But in order for this to be a good option for the top level, we run into the second requirement: “behavior of low-level items is highly correlated and the relationship between them is stable.”

That second part becomes a real issue when it comes to trying to aggregate across items. Retailers live every day on the front line of changing consumer sentiment and behaviour. As a consequence of that, it is very uncommon to see a stable assortment of items in every store year in and year out.

Let’s say that a category currently has 10 similar items in it. After an assortment review, it’s decided that 2 of those items will be leaving the category and 4 new products will be introduced into the category. This change is planned to be executed in 3 months’ time. This is a very simple variation of a common scenario in retail.

Now think about what that means with regard to managing the aggregated sales history for the top level (category/store):

  • The item/store sales history currently includes 2 items that will be leaving the assortment. But you can’t simply exclude those 2 items from the history aggregation, because this would understate the category/store forecast for the next 3 months, during which time those 2 items will still be selling.
  • The item/store level sales history currently does not include the 4 new items that will be entering the assortment. But you can’t simply add surrogate history for the 4 new items into the aggregation, because this would overstate the category/store forecast for the next 3 months before those items are officially launched (a small numeric sketch of this mismatch follows).
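To put rough numbers on that mismatch, here is a small hypothetical sketch. It assumes every item in the category sells about 10 units per month at this store; the figures are invented purely to show why no single static aggregation is right on both sides of the changeover.

```python
# Hypothetical illustration of the category/store aggregation problem around an
# assortment change 3 months out. Assumes ~10 units/month per item at this store;
# only the mismatch matters, not the numbers.

RATE = 10                 # assumed units/month per item
CHANGE_MONTH = 3          # last month in which the 2 exiting items still sell

continuing, leaving, new = 8, 2, 4

def true_category_demand(month: int) -> int:
    """Units the store will actually sell in the category in a future month."""
    active = continuing + (leaving if month <= CHANGE_MONTH else new)
    return active * RATE

drop_leavers_now  = continuing * RATE                     # 80/month: understates months 1-3
add_new_items_now = (continuing + leaving + new) * RATE   # 140/month: overstates months 1-3

for month in range(1, 7):
    print(month, true_category_demand(month), drop_leavers_now, add_new_items_now)
# Months 1-3: actual ~100/month, so dropping the leavers understates it and adding
# surrogate history for the new items overstates it. Months 4-6: actual ~120/month.
# No single static aggregation of history is right in both windows.
```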

In this scenario, how would one go about setting up the category/store forecast in such a way that:

  1. It accounts for the specific items participating in the aggregation at different future times (before, during and after the anticipated assortment change)?
  2. The category/store forecast is being pushed down to the correct items at different future times (before, during and after the anticipated assortment change)?

And this is a fairly simple example. What if the assortment changes above are being rolled out to different stores at different times (e.g. a test market launch followed by a staged rollout)? What if not every store is carrying the full 10 SKU assortment today? What if not every store will be carrying the full 12 SKU assortment in the future?

The complexity of trying to deal with this in a top-down structure can be nauseating.

So it seems that we find ourselves in a bit of a pickle here:

  1. The top-down approach is unworkable in retail because the behaviour between locations for the same item is not correlated (beer in Montreal stores) and the relationships among items for the same location are not stable (constantly changing assortments).
  2. In order for the bottom-up approach to work, there needs to be some way of finding patterns in intermittent data. It’s a self-evident truth that the only way to do this is by aggregating.

So the Law of Large Numbers is still needed to solve this problem, but in a retail setting, there is no “right level” of aggregation above item/store at which to develop reliable independent top level forecasts that are also manageable.

Maybe we haven’t been thinking about this problem in the right way.

This is where Darryl Landvater comes in. He’s a long time colleague and mentor of mine best known as a “manufacturing guy” (he’s the author of World Class Production and Inventory Management, as well as co-author of The MRP II Standard System), but in reality he’s actually a “planning guy”.

A number of years ago, Darryl recognized the inherent flaws with using a top-down approach to apply patterns to intermittent demand streams and broke the problem down into two discrete parts:

  1. What is the height of the curve (i.e. rate of sale)?
  2. What is the shape of the curve (i.e. selling profile)?

His contention was that it’s not necessary to use aggregation to calculate completely independent sales forecasts (i.e. height + shape). Instead, aggregation is only needed to calculate selling profiles for cases where the discrete demand history for an item at a store is insufficient to determine one. We’re still using the Law of Large Numbers, but only to solve the specific problem inherent in slow selling demands – finding the shape of the curve.

It’s called Profile Based Forecasting and here’s a very simplified explanation of how it works:

  1. Calculate an annual forecast quantity for each independent item/store based on sales history from the last 52+ weeks (at least 104 weeks of rolling history is ideal). For example, if an item in a store sold 25 units 2 years ago and 30 units over the most current 52 weeks, then the total forecast for the upcoming 52 weeks might be around 36 units with a calculated trend applied.
  2. Spread the annual forecast into individual time periods as follows:
    • If the item/store has a sufficiently high rate of sale that a pattern can be discerned from its own unique sales history (for example, at least 70 units per year), then calculate the selling pattern from only that history and multiply it through the item/store’s selling rate.
    • If the item/store’s rate of sale is below the “fast enough to use its own history” threshold, then calculate a sales pattern using a category of similar items at the same store and multiply those percentages through the independently calculated item/store annual forecast (see the sketch after this list).
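Here is a minimal sketch of those two steps in code. The simple trend calculation, the 70-units-per-year threshold and the sample histories are taken from (or invented around) the examples above; this is my illustration of the idea, not Darryl’s actual implementation.

```python
# Simplified sketch of profile-based forecasting: the "height" (annual rate of
# sale) is always calculated from the item/store's own history; the "shape"
# (weekly profile) is borrowed from similar items at the same store when the
# item sells too slowly to show a pattern of its own. Numbers are illustrative.

def annual_forecast(units_two_years_ago: float, units_last_year: float) -> float:
    """Height of the curve: project next year's total with a simple trend."""
    trend = units_last_year / units_two_years_ago if units_two_years_ago else 1.0
    return units_last_year * trend

def weekly_profile(weekly_history: list) -> list:
    """Shape of the curve: each week's share of the total."""
    total = sum(weekly_history)
    return [week / total for week in weekly_history]

def spread(total: float, item_store_history: list,
           category_store_history: list, threshold: float = 70.0) -> list:
    """Spread the annual forecast using the item/store's own shape if it sells
    fast enough, otherwise the shape of similar items at the same store."""
    source = (item_store_history if sum(item_store_history) >= threshold
              else category_store_history)
    return [total * pct for pct in weekly_profile(source)]

# Example from the text: 25 units two years ago, 30 last year -> 36 next year.
total = annual_forecast(25, 30)                                         # 36.0

# A slow seller (well under 70 units/year) borrows the category-at-this-store shape.
item_hist = [1 if week % 2 == 0 else 0 for week in range(52)]           # ~26 units, no visible pattern
category_hist = [15 if 22 <= week <= 34 else 5 for week in range(52)]   # summer peak
forecast = spread(total, item_hist, category_hist)
print(round(sum(forecast), 1))   # 36.0 -- the height is its own; only the shape is borrowed
```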

There is far more to it than that, but the separation of “height of the curve” from “shape of the curve” as described above is the critical design element that forms the foundation of the approach.

Think about what that means:

  1. If an item/store’s rate of sale is sufficient to calculate its own independent sales profile at that level, then it will do so.
  2. If the rate of sale is too low to discern a pattern, then the shape being applied to the independent item/store’s rate of sale is derived by looking at similar items in the category within the same store. Because the profiles are calculated from similar products and only represent the weekly percentages through which to multiply the independent rate of sale, they don’t need to be recalculated very often and are generally immune to the “ins and outs” of specific products in the category. It’s just a shape, remember.
  3. All forecasting is purely bottom-up. Every item at every store can have its own independent forecast with a realistic selling pattern and there are no forecasts to be calculated or managed above the item/store level.
  4. The same forecast method can be used for every item at every store. The only difference between fast and slow selling items is how the selling profile is determined. As the selling rate trends up or down over time, the appropriate selling profile will be automatically applied based on a comparison to the threshold. This makes the approach very “low touch” – demand planners can easily oversee several hundred thousand item/store combinations by managing only exceptions.

With realistic, properly shaped forecasts for every item/store, and no aggregate level modelling required, it’s now possible to do top-down things that actually make sense. For example, a promotional lift or override for an item across a group of stores can be spread proportionally based on each store’s individual height and shape for those specific weeks, rather than with a naive “flat line” method.
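As a rough sketch of what that kind of sensible top-down override could look like (the store baselines and the 500-unit override are invented for illustration):

```python
# Illustrative only: spread a group-level promotional override across stores in
# proportion to each store's own baseline forecast for the promoted weeks,
# rather than a naive flat split.

baseline = {                        # item/store baseline forecast for the two promo weeks
    "store1": [12.0, 14.0],
    "store2": [3.0, 5.0],
    "store3": [6.0, 10.0],
}

group_override = 500.0              # merchandising's total for the item across these stores

group_baseline = sum(sum(weeks) for weeks in baseline.values())   # 50.0 units
scale = group_override / group_baseline                           # 10.0

promo_forecast = {
    store: [week * scale for week in weeks]
    for store, weeks in baseline.items()
}

print(promo_forecast)
# {'store1': [120.0, 140.0], 'store2': [30.0, 50.0], 'store3': [60.0, 100.0]}
# Each store keeps its own height and weekly shape; only the group total changes.
```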

Simple. Intuitive. Practical. Consistent. Manageable. Proven.

Noise is expensive


Did you know that the iHome alarm clock, common in many hotels, shows a small “PM” indicator when the time is after noon?  You wonder how many people fail to notice that the tiny “PM” isn’t showing when they set the alarm, and miss their planned wake-up.  Seems a little complicated and unnecessary, wouldn’t you agree?

Did you also know that most microwaves depict AM or PM? If you need the clock in the microwave to tell you whether it’s morning or night, something’s a tad wrong.

More data/information isn’t always better. In fact, in many cases, it’s a costly distraction or even provides the opportunity to get the important stuff wrong.

Contrary to current thinking, data isn’t free.

Unnecessary data is actually expensive.

If you’re like me, then your life is being subjected to lots of data and noise…unneeded and unwanted information that just confuses and adds complication.

Just think about shopping for a moment.  In a recent and instructive study sponsored by Oracle (see below), the disconnect between the noise and what consumers really want is startling:

  1. 95% of consumers don’t want to talk or engage with a robot
  2. 86% have no desire for other shiny new technologies like AI or virtual reality
  3. 48% of consumers say that these new technologies will have ZERO impact on whether they visit a store, and even worse, only 14% said these things might influence their purchasing decisions

What this is telling us, and especially supply chain technology firms, is that we don’t seem to understand what’s noise and what’s actually relevant from the consumer’s point of view. I’d argue we’ve got big time noise issues in supply chain planning, especially when it relates to retail.

I’m talking about forecasting consumer sales at a retail store/webstore or point of consumption.  If you understand retail and analyze actual sales you’ll discover something startling:

  1. 50%+ of product/store combinations sell fewer than 20 units per year, or about 1 unit every 2-3 weeks.

Many of the leading supply chain planning companies believe that the answer to forecasting and planning at store level is more data and more variables…in many cases, more noise. You’ll hear many of them proclaim that their solution takes hundreds of variables into account, simultaneously processing hundreds of millions of calculations to arrive at a forecast.  A forecast, apparently, that is cloaked in beauty.

As an example, consider the weather.  According to these companies not only can they forecast the weather, they can also determine the impact the weather forecast has on each store/item forecast.

Now, since you live in the real world with me, here’s a question for you:  How often is the weather forecast (from the weather network that employs weather specialists and very sophisticated weather models) right?  Half the time?  Less?  And that’s just trying to predict the next few days, let alone a long term forecast.  Seems like noise, wouldn’t you agree?

Now, don’t get me wrong.  I’m not saying the weather does not impact sales, especially for specific products.  It does.  What I’m saying is that people claiming to predict it with any degree of accuracy are really just adding noise to the forecast.

Weather.  Facebook posts.  Tweets.  The price of tea in China.  All noise, when trying to forecast sales by product at the retail store.

All this “information” needs to be sourced.  Needs to be processed and interpreted somehow.  And it complicates things for people as it’s difficult to understand how all these variables impact the forecast.

Let’s contrast that with a recent retail implementation of Flowcasting.

Our most recent retail implementation of Flowcasting factors none of these variables into the forecast and resulting plans.  No weather forecasts, social media posts, or sentiment data are factored in at all.

None. Zip. Zilch.  Nada.  Heck, it’s so rudimentary that it doesn’t even use any artificial intelligence – I know, you’re aghast, right?

The secret sauce is an intuitive forecasting solution that produces integer forecasts over varying time periods (monthly, quarterly, semi-annually) and consumes these forecasts against actual sales. So the forecasts and their consumption can be thought of like a probability.  Think of it like someone managing a retail store, who can say fairly confidently: “I know this product will sell one this month, I just don’t know what day!”

The solution also includes simple replenishment logic to ensure all dependent plans are sensible, and ordering for slow selling products is based on how probable you think a sale is in the short term (i.e., orders are only triggered for a slow selling item when the probability of making a sale is high).
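To make that a bit more concrete, here is one way the “integer forecast consumed by actuals” idea could be sketched in code. This is purely my own illustration of the concept, not the actual solution’s logic; the quarterly bucket, the even spreading of the remainder, and the 0.5 probability threshold are all assumptions.

```python
# Illustrative sketch only -- not the actual solution's logic. A slow seller is
# forecast as an integer over a long bucket (e.g. "2 units this quarter") and
# the remainder is consumed as real sales come in. An order is only triggered
# when a near-term sale is judged likely enough.

def remaining_forecast(quarter_forecast: int, actuals_so_far: int) -> int:
    """Units still expected this quarter after consuming actual sales."""
    return max(quarter_forecast - actuals_so_far, 0)

def weekly_sale_probability(quarter_forecast: int, actuals_so_far: int,
                            weeks_left: int) -> float:
    """Rough chance of at least one sale next week, spreading what's left evenly."""
    left = remaining_forecast(quarter_forecast, actuals_so_far)
    return min(left / max(weeks_left, 1), 1.0)

def should_order(quarter_forecast: int, actuals_so_far: int, weeks_left: int,
                 on_hand: int, threshold: float = 0.5) -> bool:
    """Order only when stock cover is gone and a near-term sale looks probable."""
    likely = weekly_sale_probability(quarter_forecast, actuals_so_far, weeks_left)
    return on_hand == 0 and likely >= threshold

# "I know this will sell a couple this quarter, I just don't know what day."
print(remaining_forecast(2, actuals_so_far=1))          # 1 unit still expected
print(weekly_sale_probability(2, 1, weeks_left=4))      # 0.25
print(should_order(2, 1, weeks_left=4, on_hand=0))      # False -- a sale isn't likely enough yet
print(should_order(2, 1, weeks_left=1, on_hand=0))      # True  -- a sale is now probable
```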

In addition to the simple, intuitive system capabilities above, the process also employs a different kind of intelligence: human.  Planners and category managers, since they are speaking the same language (sales), easily come to consensus for situations like promotions and new product introductions.  Once the system is updated, the solution automatically translates and communicates the impact of these events to all partners.

So, what are the results of using such a simple, intuitive process and solution?

The process is delivering world class results in terms of in-stock, inventory performance and costs.  Better results, from what I can tell, than what’s being promoted today by the more sophisticated solutions.  And, importantly, it’s enormously simpler, at obscenely lower cost.

Noise is expensive.

The secret for delivering world class performance (supply chain or otherwise) is deceptively simple…

Strip away the noise.

Small data

Lowes Foods, a family-owned grocery chain with stores located throughout North and South Carolina, is one of the region’s largest retailers, but in the late 2000s they had a problem.

Declining revenues, triggered by the onslaught of ultra-competitors like Walmart and Amazon, threatened the very existence of this chain of 100 or so stores. Unless something was done, Management would have to contemplate closing a number of stores, which, as everyone knew, would really flip the switch on the inevitable death spiral of cost cutting and downsizing.

Fortunately, Management took action.  They turned to data analytics to help – even before analytics was in vogue. But instead of utilizing what is now known as Big Data, they retained an expert in a different, and more important, field of analytics: Martin Lindstrom.

Martin is one of the world’s leading branding experts and, arguably, the leading guru in a different form of analytics.

He’s a genius at using Small Data to uncover stunning and brilliant insights that, in turn, form the basis of new strategies and tactics that help organizations like Lowes Foods thrive.

Instead of slicing and dicing volumes of data – which, as a retailer, Lowes had – his work focuses on very small learnings and observations about customers and indeed, Americans in general.

It’s his contention that Small Data, done well, provides insights and clarity that are almost impossible to find in volumes of data.

Big Data is about information. Small Data is about people – finding the needle in a haystack.

For Lowes, he studied American culture – everything from values and beliefs to, importantly, the very small clues that helped formulate his insight and strategy for Lowes.

As an example, he noticed that in American hotels the windows are locked.  Coupling that with the number of gated communities and a few other small clues, he concluded the following: despite what they tell you, Americans live in fear.

Studying people around the world, one person at a time, he concluded that the last time people were not afraid was when they were children.  Kids, regardless of culture, are by and large care-free.

So his strategy centered on making Lowes Foods more kid-like.  More fun. More entertaining. The place to go.

The entrance was revamped to include both ChickenWorks and SausageWorks, where busy shoppers could buy ready-made meals.  However, in keeping with the kids theme, the purveyors behind each offering were dressed-up characters, complete with costumes, who put on a show all day long.  They would argue, shout at each other and generally give each other the gears.  It was pure retail-tainment.

Now, Lowes Foods was always known for the quality of its fresh prepared chicken. In keeping with his insights, Lowes also implemented a new ritual.  When a new batch of chickens was removed from the oven, a notice came over the loudspeaker and all employees, including Management, would break into their “happy chicken dance”, accompanied by a specific ditty to celebrate hot, fresh, quality chicken.

Another important piece of Small Data Martin leveraged was the passing of business cards. In many cultures, how you hand your business card to someone is an important sign of respect, and it is done by bowing slightly and passing the card with two hands.

This Small Data insight led to a change in how ChickenWorks dealt with customers. Now, purveyors of the chicken would pass it to customers using two hands, while slightly bowing to acknowledge the customer in the process. This signals respect and that what is being bought is of high value, both for the customer and for the employee.

All small insights. All based on Small Data – observations and learnings by watching and talking to one person at a time.

And it worked. Sales of ChickenWorks and SausageWorks skyrocketed. And Lowes Foods became known as the place to shop – a fun, unpredictable establishment where customers could buy good quality products, but enjoy themselves in the process.

As supply chain planners, we’re bombarded every day with people touting the virtues and significance of Big Data. And, to be fair, Big Data is and will be important.

But so is Small Data. Small Data provides insights about people. Small Data opens clues to problem resolution that Big Data would struggle to uncover.

Small Data often provides the tiny clues and insights that drive real, significant change. Here’s a great example from supply chain planning.

If you’re a long time reader of our blog, you know that the capability now exists to forecast and plan slow and very slow selling products at store level (or any final point of consumption). Hopefully you’re also aware that this allows us to Flowcast every product – that is, create time-phased sales, inventory, supply and dollar projections, thereby providing the business with a consistent planning process and a single set of numbers across the organization.

Did you ever wonder where the solution for slow selling products came from?  It came from Small Data (a data point from a single person).

The story goes like this. The architect of the solution was talking about planning at store level with a retail store manager in Canada, when the manager proclaimed the following, “I have no idea when these products will sell, all I know is that I’ll sell about 2 every quarter”.

Boom!

And the idea for integer forecasting and forecasting using different planning horizons (e.g., weekly, monthly, quarterly) was planted. Eventually this nugget of Small Data would be parlayed into the world’s leading and, to date, best solution for planning slow and very slow selling products at store level.

The architect of the slow selling solution didn’t get all sorts of slow selling data and then slice and dice the data to try to uncover a solution. Had he tried that approach, he’d likely still be working on it – just like most technology firms and academics.

When it comes to planning slow selling products, we owe Small Data some thanks:

First, Ken for the small data insight.

And then, Darryl for turning insight to solution.