Probabilistic Forecasting – One Man’s (Somewhat Informed) Opinion

A reasonable probability is the only certainty. – E. W. Howe

My, how forecasting methods for supply chain planning have evolved over time:

  • Naive, flat line forecasts (e.g. moving averages) were once used to estimate demand for triggering orders.
  • Time series decomposition models added the ability to detect trends and seasonality, enabling better long term forecasting.
  • Causal forecasting models allowed different time series to influence each other (e.g. the effect of future planned price changes on forecasted volumes).

All of these methods are deterministic, meaning that their output is a single value representing the “most likely outcome” for each future time period. Ironically, the “most likely outcome” almost never actually materializes.

This brings us to probabilistic forecasting. In addition to calculating a mean (or median) value for each future time period, which can be interpreted as the most likely outcome, probabilistic methods also calculate a distinct confidence interval for each individual future forecast period. In essence, instead of a single point for each future time period, you have a cloud of “good forecasts” to use for scenario modeling and decision making.

But how do you apply this in supply chain management where all of the physical activities driven by the forecast are discrete and deterministic? You can’t submit a purchase order line to a supplier that reads “there’s a 95% chance we’ll need 1 case, a 66% chance we’ll need 2 cases and a 33% chance we’ll need 3 cases”. They need to know exactly how many cases they need to pick, full stop.
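To make that tension concrete, here is a minimal Python sketch of one way a distribution could be collapsed into a single order quantity: pick the smallest quantity whose cumulative probability reaches a target service level. The function name `order_quantity`, the service level targets and the assumption that demand never exceeds 3 cases are all illustrative. The probabilities are simply the ones from the purchase order example above, restated as a distribution.

```python
def order_quantity(demand_pmf, service_level):
    """Return the smallest quantity q with P(demand <= q) >= service_level."""
    cumulative = 0.0
    for qty in sorted(demand_pmf):
        cumulative += demand_pmf[qty]
        # Small tolerance guards against floating point rounding in the running sum.
        if cumulative >= service_level - 1e-9:
            return qty
    return max(demand_pmf)


# Hypothetical weekly demand in cases, derived from "95% chance of at least 1,
# 66% chance of at least 2, 33% chance of at least 3", assuming never more than 3.
demand_pmf = {0: 0.05, 1: 0.29, 2: 0.33, 3: 0.33}

for target in (0.50, 0.90):
    print(f"service level {target:.0%}: order {order_quantity(demand_pmf, target)} case(s)")
```

With these made-up numbers, a 50% target yields an order of 2 cases and a 90% target yields 3. The supplier still receives one discrete number; the distribution only informs which number it is.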

The probabilistic forecasting approach can address many “self evident truths” about forecasting that have plagued supply chain planners for decades by better informing the discrete decisions in the supply chain:

  • That not only is demand variable, but variability in demand is also variable over time. Think about a product that is seasonal or highly promotional in nature. The amount of safety stock you need to cover demand variability for a garden hose is far greater in the summer than it is in the winter. By knowing how both demand and demand variability change over time, you can properly set discrete safety stock levels at different times of the season.
  • That uncertainty is inherent in every prediction. Measuring forecasts using the standard “every forecast is wrong, but by how much” method provides little useful information and causes us to chase ghosts. By incorporating a calculated expectation of uncertainty into forecast measurements, we can instead make meaningful determinations about whether or not a “miss” calculated by traditional means was within an expected range and not really a miss at all. The definition of accuracy changes from an arbitrary percentage to a clear judgment call, forecast by forecast, because the inherent and unavoidable uncertainty is treated as part of the signal (which it actually is), allowing us to focus on the true noise.
  • That rollups of granular unit forecasts by item/location to higher levels for capacity and financial planning can be misleading and costly. The ability to also roll up the specific uncertainty by item/location/day allows management to make much more informed decisions about risk before committing resources and capital.
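The last two points can be sketched in a few lines of Python. This is only an illustration under a strong assumption: forecast errors across item/locations are treated as independent, so variances add. Real items are often correlated, which would need a covariance term. The names `roll_up` and `within_expected_range`, the two standard deviation threshold and the sample numbers are all hypothetical.

```python
import math


def roll_up(forecasts):
    """Combine (mean, standard deviation) pairs into one aggregate pair.

    Assumes the forecast errors are independent, so variances (not standard
    deviations) add. Correlated items would need a covariance term.
    """
    total_mean = sum(mean for mean, _ in forecasts)
    total_sd = math.sqrt(sum(sd ** 2 for _, sd in forecasts))
    return total_mean, total_sd


def within_expected_range(actual, mean, sd, z=2.0):
    """True if the actual falls within mean +/- z standard deviations."""
    return abs(actual - mean) <= z * sd


# Hypothetical item/location forecasts for one day: (mean units, standard deviation)
item_location_forecasts = [(10.0, 3.0), (20.0, 4.0)]
total_mean, total_sd = roll_up(item_location_forecasts)
print(total_mean, total_sd)
print(within_expected_range(36.0, total_mean, total_sd))
```

Here the rolled-up forecast is 30 units with a standard deviation of 5, so an actual of 36 sits inside the expected range and would not count as a real miss under this yardstick.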

Now here’s the “somewhat informed” part. In order to gain widespread adoption, proponents of probabilistic methods really do need to help us old dogs learn their new tricks. It’s my experience that demand planners can be highly effective without knowing every single rule and formula driving their forecast outputs. If they use off the shelf software packages, the algorithms are proprietary and they aren’t able to get that far down into the details anyhow.

What’s important is that – when looking at all of the information available to the model – a demand planner can look at the output and understand what it was “thinking”, even if they may disagree with it. All models make the general assumption that patterns of the past will continue into the future. Knowing that, a demand planner can quickly address cases where that assumption won’t hold true (i.e. they know something about why the future will be different from the past that the model does not) and take action.

As the pool of early adopters of probabilistic methods grows, I’m looking forward to seeing heaps of case studies and real world examples covering a wide range of business scenarios from the perspective of a retail demand planner – without having to go back to school for 6 more years to earn a PhD in statistics. Some of us are just too old for that shit.

I see great promise, but for the time being, I remain only somewhat informed.

Your Sales Plan is NOT a Forecast!

Man is the only animal that laughs and weeps, for he is the only animal that is struck with the difference between what things are and what they ought to be. – William Hazlitt (1778-1830)

A Ferrari has a steering wheel. A fire truck also has a steering wheel.

A Ferrari has a clutch, brake and accelerator. A fire truck also has a clutch, brake and accelerator.

Most Ferraris are red. Most fire trucks are also red.

A new Ferrari costs several hundred thousand dollars. A new fire truck also costs several hundred thousand dollars.

Ergo, Ferrari = Fire Truck.

That was an absurd leap to make, I know, but no more absurd than using the terms “sales plan” and “sales forecast” interchangeably in a retail setting. Yes, they are each intended to represent a consensus view of future sales, but that’s pretty much where the similarity ends. They differ significantly with regard to purpose, level of detail and frequency of update.

Purpose

The purpose of the sales plan is to set future goals for the business that are grounded in strategy and (hopefully) realism. Its job is to quantify and articulate the “Why”, with a light touch on the “What” and the “How”. It’s about predicting what we’re trying to make happen.

The purpose of the operational sales forecast is to subjectively predict future customer behaviour based on observed customer demand to date, augmented with information about known upcoming occurrences – such as near term weather events, planned promotions and assortment changes – that may make customers behave differently. It’s all about the “What” and the “How” and its purpose is to foresee what we think is going to happen based on all available information at any one time.

Level of Detail

The sales plan is an aggregate weekly or monthly view of expected sales for a category of goods in dollars. The plan factors in category strategies and assumptions (“we’ll promote this category very heavily in the back half” or “we will expand the assortment by 20% to become more dominant”), but it usually lacks the specific details, which will be worked out as the year unfolds.

The operational sales forecast is a detailed projection by item/location/week in units, which is how customers actually demand product. It incorporates all of the specific details that flow out of the sales plan whenever they become available.

Frequency of Update

The sales plan is generally drafted once toward the end of a fiscal year so as to get approval for the strategies that will be employed to drive toward the plan for the upcoming year.

The operational sales forecast is updated and rolled forward at least weekly so as to drive the supply chain to respond to what’s expected to happen based on everything that has happened to date up to and including yesterday.

“Reconciling” the Plan and the Forecast

Being more elemental, the operational forecast can be easily converted to dollars and rolled up to the same level at which the sales plan was drafted for easy comparison.

Whenever this is done, it’s not uncommon to see that the rolled up operational forecast does not match the sales plan for any future time period. Nor should it. And based on the differences between them discussed above, how could it?
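The conversion and roll-up described above could look something like the following Python sketch. The data layout (dictionaries keyed by item/location/week), the function names and every number are assumptions made for illustration, not a description of any particular planning system.

```python
from collections import defaultdict


def roll_up_to_dollars(unit_forecast, price, category_of):
    """Convert item/location/week unit forecasts to dollars by category/week."""
    dollars = defaultdict(float)
    for (item, location, week), units in unit_forecast.items():
        dollars[(category_of[item], week)] += units * price[item]
    return dict(dollars)


def gap_to_plan(sales_plan, rolled_up_forecast):
    """Forecast minus plan for every category/week in the plan (negative = shortfall)."""
    return {key: rolled_up_forecast.get(key, 0.0) - planned
            for key, planned in sales_plan.items()}


# Hypothetical data: two items in one category, one week, two stores.
unit_forecast = {
    ("hose", "store_1", 1): 10,
    ("hose", "store_2", 1): 6,
    ("sprinkler", "store_1", 1): 4,
}
price = {"hose": 20.0, "sprinkler": 12.5}
category_of = {"hose": "watering", "sprinkler": "watering"}
sales_plan = {("watering", 1): 400.0}

rolled_up = roll_up_to_dollars(unit_forecast, price, category_of)
print(gap_to_plan(sales_plan, rolled_up))
```

In this toy case the forecast rolls up to $370 against a plan of $400, a $30 shortfall that is visible before the week has happened.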

This should not be panic inducing, rather a call to action:

“According to the sales plan that was drafted months ago, Category X should be booking $10 million in sales over the next 13 weeks.”

“According to the sales forecast that was most recently updated yesterday to include all of the details that are driving customer behaviour for the items in Category X, that ain’t gonna happen.”

Valuable information to have, is it not? Especially since the next 13 weeks are still out there in a future that has yet to transpire.

Clearly assumptions were made when the sales plan was drafted that are not coming to pass. Which assumptions were they and what can we do about them?

While a retailer can’t directly control customer behaviour (wouldn’t that be grand?), they have many weapons in their arsenal to influence it significantly: advertising, pricing, promotions, assortment, cross-selling – the list goes on.

The predicted gap between the plan and the forecast drives tactical action to close the gap.

Maybe it turns out that the tactics you employ will not close the gap completely. Maybe you’re okay with it because the category is expected to track ahead later in the year. Maybe another category will pick up the slack, making the overall plan whole. Or maybe you still don’t like what you’re seeing and need to sharpen your pencil again on your assumptions and tactics.

Good thing your sales plan is separate and distinct from your sales forecast so that you can know about those gaps in advance and actually do something about them.

Your Forecast is Wrong (and That’s Okay)

Just because you made a good plan, doesn’t mean that’s what’s gonna happen. – Taylor Swift

I was 25 years old the first time I met with a financial advisor. I was unmarried, living in a small midtown Toronto apartment and working in my first full time job out of university. 

I can’t say I remember all of the details, but we did go through all of the standard questions:

  • Will I be getting married? Having kids? How many kids?
  • How do I see my career progressing?
  • When might I want to retire?
  • What kind of a lifestyle do I want to have in retirement?

On the basis of that interview, we developed a savings plan and I started executing on it.

The following is an abridged list of events that have happened since that initial plan was created a quarter century ago, only a couple of which were accounted for (vaguely) in my original plan:

  • I left my stable job to pursue a not-so-stable career in consulting
  • I moved from my first apartment to a slightly larger apartment
  • I got married
  • We moved into an even bigger apartment
  • We had a kid
  • We moved into a house
  • We had two more kids
  • I co-authored a book
  • My wife went back to school for her Masters
  • The 2008 financial crisis happened
  • The Canadian government made numerous substantial changes to personal and corporate tax rules and registered savings programs
  • We sold our house and built a new house
  • Numerous cars were bought, many of which died unexpectedly
  • COVID-19 happened

You get the idea. Many of these events (and numerous others not listed) required a re-evaluation of our goals, a change in the plan to achieve those goals or both.

The key takeaway from all of this is obvious: That because the original plan bears no resemblance to what it is today, planning for an unknown and unknowable future is a complete waste of time. 

At this point, you may be feeling a bit bewildered and thinking that this conclusion is – to put it kindly – somewhat misinformed. 

I want you to recall that feeling of bewilderment whenever you hear or read people saying things (in a supply chain context) like “You shouldn’t be forecasting because forecasts are always wrong” or “Forecasting is a waste of time because you can’t predict the future anyhow”.

This viewpoint seems to hinge on the notion that a forecast is not needed if your minimum stock levels are properly calculated. To replenish a location, you just need to wait until the actual stock level is about to breach the minimum stock level and automatically trigger an order. No forecasting required!

Putting aside the fact that properly constructed and maintained forecasts drive far more than just stock replenishment to a location, a bit of trickery was employed to make the argument.

Did you catch it?

It’s the “minimum stock levels are properly calculated” part.

In order for the minimum stock level for an item at a location at any point in time to be “properly calculated”, it would by necessity need to account for (at a minimum):

  • The expected selling rate
  • Expected trends
  • Selling pattern (upcoming peaks and troughs)
  • Planned promotional and event impacts
  • Planned price changes
  • Etc.

Do those elements look at all familiar to you? A forecast by any other name is still a forecast.
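The point can be shown with a deliberately simple Python sketch. The formula used here (expected demand over the lead time plus a fixed safety stock) is a common textbook simplification assumed for illustration; the function name and the numbers are hypothetical. What matters is the first argument: the "minimum" cannot be computed without a forecast.

```python
def minimum_stock(weekly_forecast, start_week, lead_time_weeks, safety_stock):
    """Expected demand over the lead time, plus safety stock.

    weekly_forecast is a list of forecast units per week; the minimum is
    recalculated from whichever weeks the lead time covers.
    """
    horizon = weekly_forecast[start_week:start_week + lead_time_weeks]
    return sum(horizon) + safety_stock


# Hypothetical seasonal item: a quiet start, then a peak in weeks 3 and 4.
weekly_forecast = [5, 5, 8, 20, 20, 6]

for week in (0, 2, 3):
    print(week, minimum_stock(weekly_forecast, week, lead_time_weeks=2, safety_stock=3))
```

The same item needs a minimum of 13 units in week 0, 31 in week 2 and 43 in week 3. A single static minimum would be wrong for most of the season.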

The simple fact is that customers don’t like to wait. They’re expecting product to be available to purchase at the moment they make the purchase decision. Unless someone has figured out how to circumvent the laws of time and space, the only way to achieve that is to anticipate customer demand before it happens.

It’s true that any given prediction will be “wrong” to one degree or another as time passes and the correctness of your assumptions about the future is revealed. That’s not just a characteristic of a business forecasting process – it’s a characteristic of life in general. Casting aspersions on forecasting because of that fact is tantamount to casting aspersions upon God Himself.

It’s one thing to recognize that forecasts have error; it’s quite another to argue that because forecasts have error, the forecasting process itself has no value.

Forecasting is not about trying to make every forecast exactly match every actual. Rather it’s a voyage of discovery about your assumptions and continuously changing course as you learn.

Changing the game

In 1972, for my 10th birthday, my Mom bought me a wooden chess set and a chess book to teach me the basics of the game.  Shortly after, I was hooked, and the timing was perfect: it coincided with Bobby Fischer’s ascendancy in September 1972 to chess immortality – becoming the 11th World Champion.

As a chess aficionado, I was recently intrigued by a new and different chess book, Game Changer, by International Grandmaster Matthew Sadler and International Master Natasha Regan.

The book chronicles the evolution and rise of computer chess super-grandmaster AlphaZero – a completely new chess algorithm developed by British artificial intelligence (AI) company DeepMind.

Until the emergence of AlphaZero, the king of chess algorithms was Stockfish.  Stockfish was architected by providing the engine the entire library of recorded grandmaster games, along with the entire library of chess openings, middle game tactics and endgames.  It would rely on this incredible database of chess knowledge and its monstrous computational abilities.

And the approach worked.  Stockfish was the king of chess machines and its official chess rating of around 3200 is higher than that of any human in history.  In short, a match between current World Champion Magnus Carlsen and Stockfish would see the machine win every time.

Enter AlphaZero.  What’s intriguing and instructive about AlphaZero is that the developers took a completely different approach to enabling its chess knowledge.  The approach would use machine learning.

Rather than try to provide the sum total of chess knowledge to the engine, all that was provided were the rules of the game.

AlphaZero would be architected by learning from examples, rather than drawing on pre-specified human expert knowledge.  The basic approach is that the machine learning algorithm analyzes a position and determines move probabilities for each possible move to assess the strongest move.

And where did it get examples from which to learn?  By playing itself, repeatedly. Over the course of 9 hours, AlphaZero played 44 million games against itself – during which it continuously learned and adjusted the parameters of its machine learning neural network.

In 2017 AlphaZero would play a 100 game match against Stockfish and the match would result in a comprehensive victory for AlphaZero.  Imagine, a chess algorithm, architected based on a probabilistic machine learning approach would teach itself how to play and then smash the then algorithmic world champion!

What was even more impressive to the plethora of interested grandmasters was the manner in which AlphaZero played.  It played like a human, like the great attacking players of all time – a more precise version of Tal, Kasparov, and Spassky, complete with pawn and piece sacrifices to gain the initiative.

The AlphaZero story is very instructive for us supply chain planners and retail Flowcasters in particular.

As loyal disciples know, retail Flowcasting requires the calculation of millions of item/store forecasts – a staggering number.  Not surprisingly, people cannot manage that number of forecasts and even attempting to manage by exception is proving to have its limits.

What’s emerging, and is consistent with the AlphaZero story and learning, is that algorithms (either machine learning or a unified model approach) can shoulder the burden of grinding through and developing item/store specific baseline forecasts of sales, with little to no human touch required.

It’s not as far-fetched as you might think.  It will facilitate a game changing paradigm shift in demand planning.

First, it will relieve demand planners of the burden of learning and understanding different algorithms and approaches for developing a reasonable baseline forecast. Keep in mind that I said a reasonable forecast.  When we work with retailers helping them design and implement Flowcasting, most folks are shocked that we don’t worship at the feet of forecast accuracy – at least not in the traditional sense.

In retail, with so many slow selling items, chasing traditional forecast accuracy is a bit of a fool’s game.  What’s more important is to ensure the forecast is sensible and assess it on some sort of a sliding scale.  To wit, if you usually sell between 20-24 units a year for an item at a store with a store-specific selling pattern, then a reasonable forecast and selling pattern would be in that range.

Slow selling items (indeed, perhaps all items) should be forecasted almost like a probability…for example, you’re fairly confident that 2 units will sell this month, you’re just not sure when.  That’s why, counter-intuitively, daily re-planning is more important than forecast accuracy to sustain exceptionally high levels of in-stock…whew, there, I said it!
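Treating a slow seller “almost like a probability” can be sketched with a Poisson distribution. That choice of distribution is an assumption made here for illustration, not something the text prescribes, and the function name `prob_at_least` and the 30 day month are hypothetical too.

```python
import math


def prob_at_least(k, expected_units):
    """P(demand >= k) when demand is Poisson with the given expected units."""
    prob_below_k = sum(
        math.exp(-expected_units) * expected_units ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - prob_below_k


monthly_rate = 2.0             # "fairly confident that 2 units will sell this month"
daily_rate = monthly_rate / 30  # assuming a 30 day month and an even spread

print(f"P(at least 1 sale this month): {prob_at_least(1, monthly_rate):.3f}")
print(f"P(at least 1 sale on a given day): {prob_at_least(1, daily_rate):.3f}")
```

Under these assumptions a sale sometime in the month is quite likely, while a sale on any particular day is not. That gap is one way to see why frequent re-planning against actual sales matters more than pinning the forecast to a specific day.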

What an approach like this means is that planners will no longer be dilly-dallying around tuning models and learning intricacies of various forecasting approaches.  Let the machine do it and review/work with the output.

Of course, sometimes, demand planners will need to add judgment to the forecast in certain situations – where the future will be different and this information and resulting impacts would be unknowable to the algorithm.  Situations where planners have unique market insights – be it national or local.

Second, and more importantly, it will allow demand planners to shift their role/work from analytic to strategic – spending considerably more time on working to pick the “winners” and developing strategies and tactics to drive sales, customer loyalty and engagement.

In reality, spending more time shaping the demand, rather than forecasting it.

And that, in my opinion, will be a game changing shift in thinking, working and performance.

Is the juice worth the squeeze?

A little over 10 years ago I was on a project to help one of Canada’s largest grocery and general merchandise retailers design and implement new planning processes and technology. My role was the co-lead of the Integrated Planning, Forecasting & Replenishment Team and, shockingly, we ended up with a Flowcasting-like design.

The company was engaged in a massive supply chain transformation and the planning component was only one piece of the puzzle. As a result of this, one of the world’s preeminent consulting firms, Accenture, was retained to help oversee and guide the entire program.

One of the partners leading the transformation was a chap named Gary. Gary was a sports lover, a really decent person, great communicator and good listener. He also had a number of “southern sayings” – nuggets of wisdom gleaned from growing up in the southern United States.

One of his sayings that’s always stuck with me is his question, “is the juice worth the squeeze?”, alluding to the fact that sometimes the result is not worth the effort.

I can remember the exact situation when this comment first surfaced. We were trying to help him understand that even for slow and very slow selling items, creating a long term forecast by item/store was not only worth the squeeze, but also critical. As loyal and devoted Flowcasting disciples know, this is needed for planning completeness and to be able to provide a valid simulation of reality and work to a single set of numbers – two fundamental principles of Flowcasting.

The good news was that our colleague did eventually listen to us and understood that the squeeze was not too onerous and today, this client is planning and using Flowcasting – for all items, regardless of sales velocity.

But Gary’s question is an instructive one and one that I’ve been pondering quite a bit recently, particularly with respect to demand planning. Let me explain.

The progress that’s been made by leading technology vendors in forecasting by item/store has been impressive. The leading solutions utilize a unified model/approach (sometimes based on AI/ML, and in other cases not), essentially allowing demand planners to take their hands off the wheel in terms of generating a baseline forecast.

The implications of this are significant as it allows the work of demand planning to be more focused and value added – that is, instead of learning and tuning forecasting models, they are working with Merchants and Leaders to develop and implement programs and strategies to drive sales and customer loyalty.

But, I think, perhaps we might be reaching the point where we’re too consumed with trying to squeeze the same orange.

My point is this: how much better, or more accurate, can you make an item/store forecast when most retailers’ assortments have 60%+ of items selling fewer than 26 units per year, by item/store? It’s a diminishing return for sure.

Delivering exceptional levels of daily in-stock and inventory performance is not solely governed by the forecast. Integrating and seamlessly connecting the supply chain from the item/store forecast to factory is, at this stage, I believe, even more crucial.

Of course, I’m talking about the seamless integration of arrival-based, time-phased, planned shipments from consumption to supply, and updated daily (or even in real time if needed) based on the latest sales and inventory information. This allows all partners in the supply chain to work to a single set of numbers and provides the foundation to make meaningful and impactful improvements in lead times and ordering parameters that impede product flow.

The leading solutions and enabling processes need to produce a decent and reasonable forecast, but that’s not what’s going to make a difference, in my opinion. The big difference, now, will be in planning flexibility and agility – for example, how early and easily supply issues can be surfaced and resolved and/or demand re-mapped to supply.

You and your team can work hard on trying to squeeze an extra 1-3% in terms of forecast accuracy. You could also work to ensure planning flexibility and agility. Or you could work hard on both.

It’s a bit like trying to get great orange juice. To get the best juice, you need to squeeze the right oranges.

Which ones are you squeezing?

Keep Calm And Blame It On The Lag

A good forecaster is no smarter than everyone else, he merely has his ignorance better organized. – Anonymous

I’ve written on the topic of forecast performance measurement from many different angles, particularly in the context of forecasting sales at the point of consumption in retail.

Over the years, I’ve opined that:

  • Forecast accuracy (in the traditional sense) is a useless measure
  • Reasonableness is more important than accuracy, given that forecasts are, by their nature, forgiving planning elements
  • The outsized importance placed on forecast accuracy in supply chain planning is a myth
  • Accuracy and precision must be considered simultaneously
  • Forecasts should be judged against what is a reasonable expectation for accuracy
  • Forecasting at higher levels of aggregation to achieve higher levels of “accuracy” is a waste of time

After going back and re-reading all of that stuff, I see that these are all really just different angles and approaches for delivering the message “popular methods of comparing forecasts and actuals may not be as useful as you think, especially in a retail context”.

But in all of this time there is one key aspect of forecast measurement that I have not addressed: forecast lags. In other words, which forecast (or forecasts) should you be comparing to the actual?

Assuming, for example, that you have a rolling 52 week forecasting process where forecasts and actuals are in weekly buckets, then for any given week, you would have 52 choices of forecasts to compare to a single actual. So which one(s) do you choose?

Let’s get the easy one out of the way first. Considering that the forecast is being used to drive the supply chain, the conventional wisdom is that the most important lag to capture for measurement is the order lead time, when a firm commitment to purchase must be made based on the forecast. For example, if the lead time is 4 weeks, you’d capture the forecast for 4 weeks from now and measure its accuracy when the actual is posted 4 weeks later.

Nope. To all of that.

While it’s true that measuring the cumulative forecast error over the lead time can be useful for determining safety stock levels, it’s not very useful for measuring the performance of the forecasting process itself, for a couple of reasons:

  1. It is a flagrant violation of demand planning principle. Nothing on the supply side of the equation (inventory levels, lead times, pack rounding, purchasing constraints, etc.) has anything to do with true demand. Customers want the products they want, where they want them and when they want them at a price they’re willing to pay, period. The amount of time it happens to take to get from the point of origin to a customer accessible location is completely immaterial to the customer.
  2. A demand planner’s job is to manage the entire continuum of forecasts over the forecast horizon. If they know about something that will affect demand at any point (or at all points) over the next 52 weeks, the forecasts should be amended accordingly.

Suppose that you’re a demand planner who manages the following item/location. The black line is 3 years’ worth of demand history and a weekly baseline forecast is calculated for the next 52 weeks.


Because you’re a very good demand planner who keeps tabs on the drivers of demand for this product, you know that:

  • The warm weather that drives the demand pattern for this item/location has arrived early and it looks like it’s going to stay that way between now and when the season was originally expected to start.
  • There are 2 one week price promotions coming up that have just been signed off and all of the pertinent details (particularly timing and discount) are known.
  • For the last 3 years, there have been 3 similar products to this one being offered at this location. A decision has just been made to broaden the assortment with 2 additional similar products half way through the selling season.

On that basis, I have 2 questions:

  1. How does the baseline forecast need to change in order to incorporate this new information?
  2. How would your answer to question 1 change if you also knew that the order-to-delivery lead time for this item/location was 1 week? 2 weeks? 12 weeks?

Hint: Because it was established at the outset that “you’re a very good demand planner who keeps tabs on the drivers of demand for this product”, the answer to question 2 is: “Not at all.”

So if measuring forecast error at the lead time isn’t the right way to go, then what lag(s) should be captured for measurement?

As with all things forecasting related, there is no definitive answer to this question. But as a matter of principle, the lags chosen to measure the performance of a demand planning process should be based on when facts become “knowable” that could affect future demand and would prompt a demand planner to “grab the stick” and override a baseline forecast modeled based on historical patterns.

In some cases, upstream processes that create or shape demand can provide very specific input to the forecasting process.

For example, it’s common for retailers to have promotional planning processes with specific milestones, for example:

  • Product selection and price discounts are decided 12 weeks out
  • Final design of media to support the ad is decided 8 weeks out
  • Last minute adds, deletes and switches are finalized 3 weeks out

At each of those milestones, decisions can be made that might impact a demand planner’s expectation of demand for the promotion, so in this case, it would be valuable to store forecasts at lags 3, 8 and 12. Similar milestone schedules generally exist for assortment decisions as well.
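Storing forecasts at milestone lags could be sketched as follows in Python. The archive layout (a dictionary keyed by target week and lag), the function names and the sample numbers are assumptions for illustration only. A lag of 12 here means the forecast was captured 12 weeks before the week it predicts.

```python
MILESTONE_LAGS = (3, 8, 12)  # weeks out, matching the promotional milestones above


def capture_snapshots(archive, current_week, forecast_by_week, lags=MILESTONE_LAGS):
    """Store today's forecast for each week that is exactly `lag` weeks away."""
    for lag in lags:
        target_week = current_week + lag
        if target_week in forecast_by_week:
            archive[(target_week, lag)] = forecast_by_week[target_week]


def errors_by_lag(archive, actuals):
    """Group forecast minus actual by lag, for weeks whose actuals are posted."""
    errors = {}
    for (target_week, lag), forecast in archive.items():
        if target_week in actuals:
            errors.setdefault(lag, []).append(forecast - actuals[target_week])
    return errors


# Hypothetical promotion in week 20, with the forecast revised at each milestone.
archive = {}
capture_snapshots(archive, current_week=8, forecast_by_week={20: 100})
capture_snapshots(archive, current_week=12, forecast_by_week={20: 140})
capture_snapshots(archive, current_week=17, forecast_by_week={20: 150})

print(errors_by_lag(archive, actuals={20: 155}))
```

In this made-up case the error shrinks from 55 units under at lag 12 to 5 units under at lag 3, which suggests each milestone added useful information to the forecast.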

In other cases, what’s “knowable” to the demand planner can be subject to judgment. For example, if actuals come in higher than forecast for 3 weeks in a row, is that a trend change or a blip? How about 4 weeks in a row?

Lags that are closer in time (say 0 through 4) are often useful in this regard, as they can show error trends forming while they are still fresh.
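One simple way to watch an error trend forming is to count how many recent weeks in a row the error has had the same sign. The Python sketch below does only that. It is an illustrative helper with a hypothetical name, and it does not answer the "trend change or blip" question, which stays a judgment call for the planner.

```python
def trailing_same_sign_run(errors):
    """Length of the most recent streak of errors sharing the latest error's sign.

    errors are forecast minus actual, oldest first; a zero error ends a streak.
    """
    if not errors or errors[-1] == 0:
        return 0
    latest_is_over = errors[-1] > 0
    run = 0
    for error in reversed(errors):
        if error == 0 or (error > 0) != latest_is_over:
            break
        run += 1
    return run


# Hypothetical short-lag errors for one item/location, oldest week first.
weekly_errors = [2, -1, -4, -3, -6]
print(trailing_same_sign_run(weekly_errors))
```

Here actuals have come in above forecast for 4 weeks running, which is the kind of signal a planner might want surfaced while it is still fresh.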

Unless tied to a demand shaping process with specific milestones as described above, long term lags are virtually useless. Reviewing actuals posted over the weekend and comparing them to a forecast for that week that was created 6 months ago might be an interesting academic exercise, but it’s a complete waste of time otherwise.

The point of measuring is to inform so as to improve the process over the long term.

With the right tools and mindset, today’s “I wish I knew that ahead of time” turns into tomorrow’s knowable information.

Employing the Law of Large Numbers in Bottom-Up Forecasting

It is utterly implausible that a mathematical formula should make the future known to us, and those who think it can would once have believed in witchcraft. – Jakob Bernoulli (1655-1705)

This is a topic I’ve touched on numerous times in the past, but I’ve never really taken the time to tackle the subject comprehensively.

Before diving in, I just want to make clear that I’m going to stay in my lane: the frame of reference for this entire piece is around forecasting sales at the point of consumption in retail.

In that context, here are some truths that I consider to be self evident:

  1. Consumers buy specific items in specific stores at specific times. Therefore, in order to plan the retail supply chain from consumer demand back, forecasts are needed by item by store.
  2. Any retailer has a large enough percentage of intermittent demand streams at item/store level (e.g. fewer than 1 sale per week) that they can’t simply be ignored in the forecasting process.
  3. Any given item can have continuous demand in some locations and intermittent demand in other locations.
  4. “Intermittent” doesn’t mean the same thing as “random”. An intermittent demand stream could very well have a distinct pattern that is not visible to the naked eye (nor to most forecast algorithms that were designed to work with continuous demands).
  5. Because of points 1 to 4 above, the Law of Large Numbers needs to be employed to see any patterns that exist in intermittent demand streams.

On this basis, it seems to be a foregone conclusion that the only way to forecast at item/store is by employing a top-down approach (i.e. aggregate sales history to some higher level(s) than item/store so that a pattern emerges, calculate an independent forecast at that level, then push down the results proportionally to the item/stores that participated in the original aggregation of history).
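As a rough illustration of the top-down mechanics just described, the sketch below pushes an aggregate forecast down to stores in proportion to each store's share of history. The function name, store names and numbers are all hypothetical, and the aggregate forecast is simply given rather than calculated.

```python
def top_down_allocate(history_by_store, aggregate_forecast):
    """Split an aggregate forecast across stores by share of total history.

    history_by_store: {store: [units per past period]}
    aggregate_forecast: [forecast units per future period] at the aggregate level
    """
    store_totals = {store: sum(units) for store, units in history_by_store.items()}
    grand_total = sum(store_totals.values())
    if grand_total <= 0:
        raise ValueError("no sales history to derive shares from")
    return {
        store: [period * total / grand_total for period in aggregate_forecast]
        for store, total in store_totals.items()
    }


# Hypothetical two-period history for two stores, and a given aggregate forecast.
history = {"store_A": [30, 50], "store_B": [5, 15]}
aggregate_forecast = [40, 60]

allocated = top_down_allocate(history, aggregate_forecast)
print(allocated)  # {'store_A': [32.0, 48.0], 'store_B': [8.0, 12.0]}
```

Notice that every store ends up with the same 40/60 shape, even though store_B's own history was split 25/75. That is the weakness the rest of this piece is concerned with.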

So now the question becomes: How do you pick the right aggregation level for forecasting?

This recent (and conveniently titled) article from the Institute of Business Forecasting by Eric Wilson called How Do You Pick the Right Aggregation Level for Forecasting? captures the considerations and drawbacks quite nicely and provides an excellent framework to discuss the problem in a retail context.

A key excerpt from that article is below (I recommend that you read the whole thing – it’s very succinct and captures the essence of how to think about this problem in a few short paragraphs):


When To Go High Or Low?

Despite all the potential attributes, levels of aggregation, and combinations of them, historically the debate has been condensed down to only two options, top down and bottom up.

The top-down approach uses an aggregate of the data at the highest level to develop a summary forecast, which is then allocated to individual items on the basis of their historical relativity to the aggregate. This can be any generated forecast as a ratio of their contribution to the sum of the aggregate or on history which is in essence a naïve forecast.

More aggregated data is inherently less noisy than low-level data because noise cancels itself out in the process of aggregation. But while forecasting only at higher levels may be easier and provides less error, it can degrade forecast quality because patterns in low level data may be lost. High level works best when behavior of low-level items is highly correlated and the relationship between them is stable. Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.

The major challenge is that the required level of aggregation to get meaningful statistical information may not match the precision required by the business. You may also find that the requirements of the business may not need a level of granularity (i.e. Customer for production purposes) but certain customers may behave differently, or input is at the item/customer or lower level. More often than not it is a combination of these and you need multiple levels of aggregation and multiple levels of inputs along with varying degrees of noise and signals.


These are the two most important points:

  • “High level works best when behavior of low-level items is highly correlated and the relationship between them is stable.”
  • “Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.”

Now, here’s the conundrum in retail:

  • The behaviour of low level items is very often NOT highly correlated, making forecasting at higher levels a dubious proposition.
  • Most popular forecasting methods only work well with continuous demand history data, which can often be scarce at item/store level (i.e. they’re not “good at picking up these patterns”).

My understanding of this issue was firmly cemented about 19 years ago when I was involved in a supply chain planning simulation for beer sales at 8 convenience stores in the greater Montreal area. During that exercise, we discovered that 7 of those 8 stores had a sales pattern that one would expect for beer consumption in Canada (repeated over 2 full years): strong sales during the summer months, lower sales in the cooler months and a spike around the holidays. The actual data is long gone, but for those 7 stores, it looked something like this:

The 8th store had a somewhat different pattern.

And by “somewhat different”, I mean exactly the opposite:

Remember, these stores were all located within about 30 kilometres of each other, so they all experienced generally the same weather and temperature at the same time. We fretted over this problem for a while, thinking that it might be an issue with the data. We even went so far as to call the owner of the 8-store chain to ask him what might be going on.

In an exasperated tone that is typical of many French Canadians, he impatiently told us that of course that particular store has slower beer sales in the summer… because it is located in the middle of 3 downtown university campuses: fewer students in the summer months = a decrease in sales for beer during that time for that particular store.

If we had visited every one of those 8 stores before we started the analysis (we didn’t), we may have indeed noticed the proximity of university campuses to one particular store. Would we have pieced together the cause/effect relationship to beer sales? My guess is probably not. Yet the whole story was right there in the sales data itself, as plain as the nose on your face.

We happened upon this quirk after studying a couple dozen SKUs across 8 locations. A decent sized retailer can sell tens of thousands of SKUs across hundreds or thousands of locations. With millions of item/store combinations, how many other quirky criteria like that could be lurking beneath the surface and driving the sales pattern for any particular item at any particular location?

My primary conclusion from that exercise was that aggregating sales across store locations is definitely NOT a good idea.

So in terms of figuring out the right level of aggregation, that just leaves us with the item dimension – stay at store level, but aggregate across categories of similar items. But in order for this to be a good option for the top level, we now have another problem: “behavior of low-level items is highly correlated and the relationship between them is stable”.

That second part becomes a real issue when it comes to trying to aggregate across items. Retailers live every day on the front line of changing consumer sentiment and behaviour. As a consequence of that, it is very uncommon to see a stable assortment of items in every store year in and year out.

Let’s say that a category currently has 10 similar items in it. After an assortment review, it’s decided that 2 of those items will be leaving the category and 4 new products will be introduced into the category. This change is planned to be executed in 3 months’ time. This is a very simple variation of a common scenario in retail.

Now think about what that means with regard to managing the aggregated sales history for the top level (category/store):

  • The item/store sales history currently includes 2 items that will be leaving the assortment. But you can’t simply exclude those 2 items from the history aggregation, because this would understate the category/store forecast for the next 3 months, during which time those 2 items will still be selling.
  • The item/store level sales history currently does not include the 4 new items that will be entering the assortment. But you can’t simply add surrogate history for the 4 new items into the aggregation, because this would overstate the category/store forecast for the next 3 months before those items are officially launched.

In this scenario, how would one go about setting up the category/store forecast in such a way that:

  1. It accounts for the specific items participating in the aggregation at different future times (before, during and after the anticipated assortment change)?
  2. The category/store forecast is being pushed down to the correct items at different future times (before, during and after the anticipated assortment change)?

And this is a fairly simple example. What if the assortment changes above are being rolled out to different stores at different times (e.g. a test market launch followed by a staged rollout)? What if not every store is carrying the full 10 SKU assortment today? What if not every store will be carrying the full 12 SKU assortment in the future?

The complexity of trying to deal with this in a top-down structure can be nauseating.

So it seems that we find ourselves in a bit of a pickle here:

  1. The top-down approach is unworkable in retail because the behaviour between locations for the same item is not correlated (beer in Montreal stores) and the relationships among items for the same location are not stable (constantly changing assortments).
  2. In order for the bottom-up approach to work, there needs to be some way of finding patterns in intermittent data. It’s a self-evident truth that the only way to do this is by aggregating.

So the Law of Large Numbers is still needed to solve this problem, but in a retail setting, there is no “right level” of aggregation above item/store at which to develop reliable independent top level forecasts that are also manageable.

Maybe we haven’t been thinking about this problem in the right way.

This is where Darryl Landvater comes in. He’s a long time colleague and mentor of mine best known as a “manufacturing guy” (he’s the author of World Class Production and Inventory Management, as well as co-author of The MRP II Standard System), but in reality he’s actually a “planning guy”.

A number of years ago, Darryl recognized the inherent flaws with using a top-down approach to apply patterns to intermittent demand streams and broke the problem down into two discrete parts:

  1. What is the height of the curve (i.e. rate of sale)?
  2. What is the shape of the curve (i.e. selling profile)?

His contention was that it’s not necessary to use aggregation to calculate completely independent sales forecasts (i.e. height + shape) to achieve this. Instead, what’s needed is to aggregate to calculate selling profiles to be used in cases where the discrete demand history for an item at a store is insufficient to determine one. We’re still using the Law of Large Numbers, but only to solve for the specific problem inherent in slow selling demands – finding the shape of the curve.

It’s called Profile Based Forecasting and here’s a very simplified explanation of how it works:

  1. Calculate an annual forecast quantity for each independent item/store based on sales history from the last 52+ weeks (at least 104 weeks of rolling history is ideal). For example, if an item in a store sold 25 units 2 years ago and 30 units over the most current 52 weeks, then the total forecast for the upcoming 52 weeks might be around 36 units with a calculated trend applied.
  2. Spread the annual forecast into individual time periods as follows:
    • If the item/store has a sufficiently high rate of sale that a pattern can be discerned from its own unique sales history (for example, at least 70 units per year), then calculate the selling pattern from only that history and multiply it through the item/store’s selling rate.
    • If the item/store’s rate of sale is below the “fast enough to use its own history” threshold, then calculate a sales pattern using a category of similar items at the same store and multiply those percentages through the independently calculated item/store annual forecast.
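The two steps above can be sketched in a few lines. This is only an illustration of the "height times shape" idea, not Darryl's actual implementation. The trend calculation (latest year scaled by its ratio to the prior year) is my assumption, chosen because it reproduces the 25, 30, 36 example. The 70 unit threshold is the example figure from the text, and four periods stand in for 52 weeks to keep the numbers readable.

```python
OWN_HISTORY_THRESHOLD = 70  # units per year; the example threshold from the text


def annual_forecast(prior_year_units, latest_year_units):
    """Height of the curve: latest year scaled by its ratio to the prior year.

    This simple ratio trend is an assumption made for illustration.
    """
    if prior_year_units <= 0:
        return float(latest_year_units)
    return latest_year_units * latest_year_units / prior_year_units


def selling_profile(units_by_period):
    """Shape of the curve: each period's share of the total (shares sum to 1)."""
    total = sum(units_by_period)
    if total == 0:
        return [1.0 / len(units_by_period)] * len(units_by_period)
    return [units / total for units in units_by_period]


def item_store_forecast(item_history, prior_year_units, category_history,
                        threshold=OWN_HISTORY_THRESHOLD):
    """Spread the item/store's own annual forecast using the appropriate profile."""
    latest_year_units = sum(item_history)
    height = annual_forecast(prior_year_units, latest_year_units)
    # Fast sellers use their own history for shape; slow sellers borrow the
    # shape of similar items in the same store.
    source = item_history if latest_year_units >= threshold else category_history
    return [height * share for share in selling_profile(source)]


# Hypothetical slow seller: 30 units in the latest year, 25 the year before.
slow_item = [0, 10, 20, 0]
category_same_store = [100, 200, 500, 200]
slow_forecast = item_store_forecast(slow_item, 25, category_same_store)
print([round(x, 1) for x in slow_forecast])  # [3.6, 7.2, 18.0, 7.2]

# Hypothetical fast seller: 100 units in both years, so it keeps its own shape.
fast_item = [10, 20, 40, 30]
fast_forecast = item_store_forecast(fast_item, 100, category_same_store)
print([round(x, 1) for x in fast_forecast])  # [10.0, 20.0, 40.0, 30.0]
```

In both cases the height comes only from the item/store's own history. The category is only ever consulted for the shape.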

There is far more to it than that, but the separation of “height of the curve” from “shape of the curve” as described above is the critical design element that forms the foundation of the approach.

Think about what that means:

  1. If an item/store’s rate of sale is sufficient to calculate its own independent sales profile at that level, then it will do so.
  2. If the rate of sale is too low to discern a pattern, then the shape being applied to the independent item/store’s rate of sale is derived by looking at similar items in the category within the same store. Because the profiles are calculated from similar products and only represent the weekly percentages through which to multiply the independent rate of sale, they don’t need to be recalculated very often and are generally immune to the “ins and outs” of specific products in the category. It’s just a shape, remember.
  3. All forecasting is purely bottom-up. Every item at every store can have its own independent forecast with a realistic selling pattern and there are no forecasts to be calculated or managed above the item/store level.
  4. The same forecast method can be used for every item at every store. The only difference between fast and slow selling items is how the selling profile is determined. As the selling rate trends up or down over time, the appropriate selling profile will be automatically applied based on a comparison to the threshold. This makes the approach very “low touch” – demand planners can easily oversee several hundred thousand item/store combinations by managing only exceptions.

With realistic, properly shaped forecasts for every item/store enabled without any aggregate level modelling, it’s now possible to do top-down stuff that makes sense, such as applying promotional lifts or overrides for an item across a group of stores and applying the result proportionally based on each store’s individual height and shape for those specific weeks, rather than using a naive “flat line” method.
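Here is one way the proportional spreading of a promotional lift might look, as a minimal sketch. The function name, the stores and the quantities are hypothetical, and a real system would presumably handle rounding and overrides in more detail.

```python
def spread_promo_lift(base_by_store, total_extra_units):
    """Add a total promotional lift across stores and weeks, in proportion to
    each store's existing base forecast for those specific weeks.

    base_by_store: {store: [base forecast for each promo week]}
    """
    grand_total = sum(sum(weeks) for weeks in base_by_store.values())
    if grand_total <= 0:
        raise ValueError("base forecasts are all zero; nothing to spread against")
    return {
        store: [base + total_extra_units * base / grand_total for base in weeks]
        for store, weeks in base_by_store.items()
    }


# Hypothetical base forecasts for a two-week promotion at three stores.
base = {"store_A": [6, 4], "store_B": [1, 1], "store_C": [0, 8]}

lifted = spread_promo_lift(base, total_extra_units=40)
print(lifted)
# {'store_A': [18.0, 12.0], 'store_B': [3.0, 3.0], 'store_C': [0.0, 24.0]}
```

Because the base forecasts already carry each store's own height and shape, the lift lands where the item actually sells. A flat allocation would have handed store_C volume in a week where it expects no sales at all.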

Simple. Intuitive. Practical. Consistent. Manageable. Proven.

Noise is expensive

Did you know that the iHome alarm clock, common in many hotels, shows a small PM when the time is after 12 noon?  You wonder how many people fail to note the tiny ‘pm’ isn’t showing when they set the alarm, and miss their planned wake up.  Seems a little complicated and unnecessary, wouldn’t you agree?

Did you also know that most microwaves also depict AM or PM? If you need the clock in the microwave to tell you whether it’s morning or night, something’s a tad wrong.

More data/information isn’t always better. In fact, in many cases, it’s a costly distraction or even provides the opportunity to get the important stuff wrong.

Contrary to current thinking, data isn’t free.

Unnecessary data is actually expensive.

If you’re like me, then your life is being subjected to lots of data and noise…unneeded and unwanted information that just confuses and adds complication.

Just think about shopping now for a moment.  In a recent and instructive study sponsored by Oracle (see below), the disconnect between noise and what consumers really want is startling:

  1. 95% of consumers don’t want to talk or engage with a robot
  2. 86% have no desire for other shiny new technologies like AI or virtual reality
  3. 48% of consumers say that these new technologies will have ZERO impact on whether they visit a store and even worse, only 14% said these things might influence them in their purchasing decisions

From the consumer’s view, what this is telling us, and especially supply chain technology firms, is that we don’t seem to understand what’s noise and what’s actually relevant. I’d argue we’ve got big time noise issues in supply chain planning, especially when it relates to retail.

I’m talking about forecasting consumer sales at a retail store/webstore or point of consumption.  If you understand retail and analyze actual sales you’ll discover something startling:

  1. 50%+ of product/store combinations sell fewer than 20 units per year, or about 1 every 2-3 weeks.

Many of the leading supply chain planning companies believe that the answer to forecasting and planning at store level is more data and more variables…in many cases, more noise. You’ll hear many of them proclaim that their solution takes hundreds of variables into account, simultaneously processing hundreds of millions of calculations to arrive at a forecast.  A forecast, apparently, that is cloaked in beauty.

As an example, consider the weather.  According to these companies not only can they forecast the weather, they can also determine the impact the weather forecast has on each store/item forecast.

Now, since you live in the real world with me, here’s a question for you:  How often is the weather forecast (from the weather network that employs weather specialists and very sophisticated weather models) right?  Half the time?  Less?  And that’s just trying to predict the next few days, let alone a long term forecast.  Seems like noise, wouldn’t you agree?

Now, don’t get me wrong.  I’m not saying the weather does not impact sales, especially for specific products.  It does.  What I’m saying is that people claiming to predict it with any degree of accuracy are really just adding noise to the forecast.

Weather.  Facebook posts.  Tweets.  The price of tea in China.  All noise, when trying to forecast sales by product at the retail store.

All this “information” needs to be sourced.  Needs to be processed and interpreted somehow.  And it complicates things for people as it’s difficult to understand how all these variables impact the forecast.

Let’s contrast that with a recent retail implementation of Flowcasting.

Our most recent retail implementation of Flowcasting factors none of these variables into the forecast and resulting plans.  No weather forecasts, social media posts, or sentiment data is factored in at all.

None. Zip. Zilch.  Nada.  Heck, it’s so rudimentary that it doesn’t even use any artificial intelligence – I know, you’re aghast, right?

The secret sauce is an intuitive forecasting solution that produces integer forecasts over varying time periods (monthly, quarterly, semi-annually) and consumes these forecasts against actual sales. So, the forecasts and consumption could be considered like a probability.  Think of it like someone managing a retail store. They can say fairly confidently that “I know this product will sell one this month, I just don’t know what day”!
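A minimal sketch of what "consuming" an integer forecast against actual sales could look like is below. The exact consumption rules of the solution aren't described here, so the rule used (subtract sales from the forecast for the same period, never going below zero) is my assumption, and the periods and quantities are invented.

```python
def consume_forecast(period_forecasts, sales):
    """Reduce each period's integer forecast by the sales posted in that period.

    period_forecasts: {period: forecast units}
    sales: iterable of (period, units sold)
    Assumed rule: the remaining forecast never drops below zero.
    """
    remaining = dict(period_forecasts)
    for period, units in sales:
        remaining[period] = max(0, remaining.get(period, 0) - units)
    return remaining


# Hypothetical slow seller forecast in monthly periods.
forecast = {"Jan": 1, "Feb": 1, "Mar": 2}
posted_sales = [("Jan", 1), ("Mar", 1)]

remaining = consume_forecast(forecast, posted_sales)
print(remaining)  # {'Jan': 0, 'Feb': 1, 'Mar': 1}
```

The January sale uses up January's forecast no matter which day it happened on, which is the store manager's "I just don't know what day" logic in code form.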

The solution also includes simple replenishment logic to ensure all dependent plans are sensible and that ordering for slow selling products is based on how probable a sale is in the short term (i.e., orders are only triggered for a slow selling item if the probability of making a sale is high).
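To give a feel for "how probable a sale is", here is a small sketch that assumes sales arrive as a Poisson process. That is my assumption for illustration only, and so are the 50% cut-off and the function names; the actual replenishment logic isn't spelled out here.

```python
import math


def prob_at_least_one_sale(expected_units_in_period, fraction_of_period):
    """P(at least one sale) over part of a forecast period, assuming Poisson sales."""
    expected_units = expected_units_in_period * fraction_of_period
    return 1.0 - math.exp(-expected_units)


def should_order(on_hand, expected_units_in_period, fraction_of_period,
                 min_probability=0.5):
    """Trigger an order only when the shelf is empty and a sale is likely soon.

    The min_probability cut-off is a hypothetical planner setting.
    """
    likely = prob_at_least_one_sale(expected_units_in_period, fraction_of_period)
    return on_hand < 1 and likely >= min_probability


# 1 unit forecast for the month, looking one week (a quarter of the period) ahead.
print(round(prob_at_least_one_sale(1, 0.25), 3))  # 0.221
print(should_order(on_hand=0, expected_units_in_period=1, fraction_of_period=0.25))  # False

# 3 units forecast for the quarter, looking one month (a third of the period) ahead.
print(round(prob_at_least_one_sale(3, 1 / 3), 3))  # 0.632
print(should_order(on_hand=0, expected_units_in_period=3, fraction_of_period=1 / 3))  # True
```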

In addition to the simple, intuitive system capabilities above, the process also employs and utilizes a different kind of intelligence – human.  Planners and category managers, since they are speaking the same language – sales – easily come to consensus for situations like promotions and new product introductions.  Once the system is updated then the solution automatically translates and communicates the impact of these events for all partners.

So, what are the results of using such a simple, intuitive process and solution?

The process is delivering world class results in terms of in-stock, inventory performance and costs.  Better results, from what I can tell, than what’s being promoted today by the more sophisticated solutions.  And, importantly, enormously simpler, for obscenely less cost.

Noise is expensive.

The secret for delivering world class performance (supply chain or otherwise) is deceptively simple…

Strip away the noise.

Unvarnished

It’s an altercation that’s stuck with me for decades.

Roughly twenty years ago I was leading a retail team that would eventually design what we now call Flowcasting. We were an eclectic team, full of passion and dedicated to designing and implementing something new, and much better.

After a particularly explosive team session – that saw tensions and ideas run hot – everyone went back to their workstations to let sleeping dogs lie. One business team member, who’d really gotten into it with one of the IT associates, could not contain his passion. He promptly walked over to the team member’s cubicle and said…

“Oh, one more thing…F**k You!!”

Like most of the team, I was a little startled. I went over and talked to the team member and we had a good chat about how inappropriate his actions were. Luckily the IT team member was one cool dude and he didn’t take offence to it – the event just rolled off his back. To his credit, the next day my team member formally apologized and all was forgiven.

Now, please don’t think I’m condoning this type of action. I’m not. However, as a student of business, change and innovation I’ve been actively learning and trying to understand what really seeds innovation and, in particular, what types of people seem to be able to make change happen.

And, during my research and studies, I keep coming back to this event. It’s evidence of what seems to be a key trait and characteristic of innovative teams and people. They are what many refer to as…

Unvarnished.

If I think back to that team from two decades ago, we were definitely unvarnished. We called a spade a spade. Had little to no respect for the company hierarchy and even less for the status quo. And, as a team, we were brutally honest with each other and everyone on the team felt very comfortable letting me know when I was full of shit – which was, and continues to be, often.

But that team moved, as Steve Jobs would say, mountains – not only designing what would later morph into Flowcasting, but implementing a significant portion of the concept and, as a result, changing the mental model of retail planning.

I had no idea at the time, but being unvarnished was the key trait we had. Francesca Gino has extensively studied what makes great teams and penned a brilliant book about her learnings, entitled “Rebel Talent”.

She dedicates considerable time to unvarnishment and quotes extensively from Ed Catmull, famed leader of Pixar Animation Studios who’s worked brilliantly with another member of the unvarnished hall of fame – Steve Jobs.

According to Catmull, “a hallmark of creative cultures is that people feel free to share ideas, opinions and criticisms. When the group draws on the unvarnished perspectives of all its members, the collective knowledge and decision making benefits.”

According to Catmull, and others (including me), “Candor is the key to constructive collaboration”. The KEY to disruptive innovation.

Here’s another example to prove my point. When I was consulting at a national western Canadian retailer, our team was lucky to have an Executive Sponsor who was, as I now understand, unvarnished as well.

As the project unfolded I was amazed how he operated and the way he encouraged and responded to what I’d call dissent. Most leaders of teams absolutely abhor dissent – having been unfortunately schooled over time that company hierarchy was there for a reason and was the tie-breaker on decision making and direction setting.

Our Sponsor openly encouraged people to dissent with him and readily and openly changed his mind whenever required. I vividly remember a very tense and rough session around job design and rollout in which he was at loggerheads with the team, including me. When I think back, it was amazing to see how “safe” team members felt disagreeing with him – and, in this case, very passionately.

As it turned out, over the next few days, we continued the dialogue and he changed his opinion 180 degrees – eventually agreeing with his direct report.

Neuroscience refers to this as being able to work with “psychological safety” – which is a fancier way of saying people are free to be unvarnished. To say what they believe, why and to whom with no consequences whatsoever.

Without question, as I’ve been thinking about and studying great teams and innovation, I realize just how brilliant this Sponsor was and the environment he helped to foster.

How many Executives, Leaders or teams are really working in an unvarnished environment – with complete psychological safety? I think you’d agree, not many.

If you, your company and your supply chain are going to compete and continually evolve and improve, won’t ongoing innovation need to become a way of life? And that means people need to collaborate better, disrupt faster and feel completely comfortable challenging and destroying the status quo.

Now, I’m not saying that when you don’t agree with someone you should tell them to go F-themselves.

What I am saying – along with other folks who are a lot smarter than me – is that hiring, promoting, encouraging and fostering people and a working environment that is unvarnished will be crucial!

So here’s to being unvarnished. To being and working in safety. To real collaboration and candor.

And to looking your status quo in the eye and saying…“F**k you!”

Questions and Answers

Did you know that most, if not all, organizations and innovations started with a question, or series of questions?

Reed Hastings concocted Netflix by asking himself a simple question…“What if DVDs could be rented through a subscription-type service, so no one ever had to pay late fees?” (Rumor was that this was just after he’d been hit with a $40 late fee).

Apple Computer was forged by Woz and Jobs asking, “Why aren’t computers small enough for people to have them in their homes and offices?”

In the 1940s, the Polaroid instant camera was conceived based on the question of a three-year-old. Edwin H. Land’s daughter grew impatient after her father had taken a photo and asked, “Why do we have to wait for the picture, Daddy?”

Harvard child psychologist Dr. Paul Harris estimated that between the ages of two and five, a child asks about 40,000 questions. Yup, forty thousand!

Questions are pretty important. They lead to thinking, reflection, discovery and sometimes breakthrough ideas and businesses.

The problem is that we’re not five years old anymore and, as a result, we just don’t seem to ask enough questions – especially the “why” and “what if” kinds of questions. We should.

Turns out our quest for answers and solutions would be much better served by questions. To demonstrate the power of questions, let’s consider the evolution of solutions to develop a forward looking, time-phased forecast of consumer demand by item/store.

Early solutions realized that at item/store level a significant number of products sold at a very slow rate. Using just that item’s sales history, at that store, made it difficult to determine a selling pattern – how the forecasted demand would happen over the calendar year.

To solve this dilemma, many of the leading solutions used the concept of the “law of large numbers” – whereby they could aggregate a number of similar products into a grouping of those products to determine a sales pattern.

I won’t bore you with the details, but the essence of the thinking is that, at item/store level, the forecast pattern would need to be derived from a higher level forecast, and then each individual store forecast would be that store’s contribution to the forecast, spread across time using the higher level forecast’s selling pattern.

It’s the standard approach used by many solutions, one of which has even labelled it as multi-level forecasting. Most retail clients who are developing a time-phased forecast at item/store are using this approach.

Although the approach does produce a time-phased item/store forecast, it has glaring and significant problems – most notably in terms of complexity, manageability and reasonableness of using the same selling pattern for a product across a number of stores.

To help you understand, consider a can of pork and beans at a grocery retailer. What level of aggregation would you pick so that the aggregate selling pattern could be used in every store for that product? If you think about it for a while, you’ll understand that two stores even within a few miles of each other could easily have very different selling patterns. Using the same pattern to spread each store’s forecast would yield erroneous and poor results. And, in practice, it does.

Not only that, but you need to manage many different levels of a system calculated forecast and ensure that these multi-level forecasts are synchronized amongst each level – which requires more system processing. Trying to determine the appropriate levels to forecast in order to account for the myriad of retail planning challenges has also been a big problem – which has tended to make the resulting implementations more complex.

As an example, for most of these implementations, it’s not uncommon to have 3 or more forecasting levels to “help” determine a selling pattern for the item/store. Adding to the issue is that as the multi-level implementation becomes more complex, it’s harder for planners to understand and manage.

Suffice it to say, this approach has not worked well. It’s taken a questioner, at heart, to figure out a better, simpler and more effective way.

Instead of accepting the conventional wisdom, much like our 3-year-old above, he asked some simple questions…

“What if I calculated a rolling, annual forecast first?” “Couldn’t I then spread that forecast into the weekly/daily selling pattern?”

As it turns out, he was right.

Then, another question…

“Why do I have to create a higher level forecast to determine a pattern?” “Couldn’t I just aggregate sales history for like items, in the same store, to determine the selling pattern?”

Turns out, he could.

Finally, a last question…

“Couldn’t I then multiply the annual forecast by the selling pattern to get my time-phased, item/store forecast?”

Yes, indeed he could.

Now, the solution he developed also included some very simple and special thinking around slow selling items and using a varying time period to forecast them – fast sellers in weekly periods, slower sellers monthly and even slower sellers in quarterly or semi-annual periods.

The questions he asked himself were around the ideas of “Why does every item, at the retail store, need to be forecast in weekly time periods?”

Given the very slow rate of sales for most item/stores the answer is they don’t and shouldn’t.
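One simple way to picture the varying time periods is to pick, for each item/store, the shortest period in which at least one unit is expected to sell. That rule and the function below are my own illustration, not the actual logic of the solution. Only the period names (weekly, monthly, quarterly, semi-annual) come from the description above.

```python
# Candidate forecast periods, shortest first, with periods per year.
PERIODS_PER_YEAR = [("weekly", 52), ("monthly", 12), ("quarterly", 4), ("semi-annual", 2)]


def choose_forecast_period(annual_units, min_units_per_period=1.0):
    """Return the shortest period expected to see at least min_units_per_period.

    The "at least one unit per period" rule is a hypothetical choice.
    """
    for name, periods in PERIODS_PER_YEAR:
        if annual_units / periods >= min_units_per_period:
            return name
    # Slower than anything above: fall back to the longest period.
    return PERIODS_PER_YEAR[-1][0]


for annual_units in (100, 20, 6, 3, 1):
    print(annual_units, choose_forecast_period(annual_units))
# 100 weekly
# 20 monthly
# 6 quarterly
# 3 semi-annual
# 1 semi-annual
```

With a rule like this, an item/store selling 20 units a year gets a whole-number forecast of 1 or 2 per month instead of a fractional 0.38 per week.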

The solution described above was arrived at by asking questions. It works beautifully and if you’re interested in learning more and perhaps asking a few questions of your own, you know how to find me.

So, if you’re a retailer and are using the complicated, hard-to-manage, multi-level forecasting approach outlined above, perhaps you should ask a question or two as well…

1. “Why are we doing it like this?”
2. “Who is using the new approach and how’s it working?”

They’re great questions and, as you now know, questions will lead you to the answers!