A Forecast By Any Other Name

What’s in a name? That which we call a rose
By any other name would smell as sweet. – William Shakespeare (1564-1616), Romeo and Juliet, Act 2 Scene 2

Scenario 1: A store associate walks down the aisles. She sees 6 units of an item on the shelf and determines that more is needed on the next shipment, so she orders another case pack of 12 units.

Scenario 2: In the overnight batch run, a centralized store min/max system averages the last 6 weeks of sales for every item at every store. This average selling rate is used to set a replenishment policy – a replenishment request is triggered when the stock level reaches 2 weeks’ worth of on hand (based on the 6 week average) and the amount ordered is enough to get up to 4 weeks’ worth of on hand, rounded to the nearest pack size.

Scenario 3: In the overnight batch run, a centralized store reorder point system calculates a total sales forecast over the next 2 shipping cycles. It uses 2 years’ worth of sales history so that it can capture a trend and weekly selling pattern for each item/store being replenished and calculate a proper safety stock based on demand variability. On designated ordering days, the replenishment system evaluates the current stock position against the total of expected sales plus safety stock over the next two ordering cycles and triggers replenishment requests as necessary to ensure that safety stock will not be breached between successive replenishment days.

Scenario 4: In the overnight batch run, a centralized supply chain planning system calculates a sales forecast (with expected trend and weekly selling pattern) for the next 52 weeks. Using this forecast, merchandising minimums, store receiving calendars and the current stock position, it calculates when future arrivals of stock are needed at the store to ensure that the merchandising minimums won’t be breached over the next 52 weeks. Using the transit lead time, it determines when each of those planned arrivals will need to be shipped from the supplying distribution centre over the next 52 weeks. The rolled up store shipment projections become the outbound plans for each item/DC, which then performs the same logic to calculate when future inbound arrivals are needed and their corresponding ship dates. Finally the projected inbound shipments to the DC are communicated to suppliers so that they can properly plan their finished goods inventory, production and raw material procurement. For both stores and DCs, the plans are turned into firm replenishment requests at the ordering lead time.

With that out of the way, let’s do some audience participation. I have a question for you: Which of the above replenishment methods are forecast based? (You can pause here to scroll up to read each scenario again before deciding, or you can just look down to the very next line for the answer).

The answer is… they are ALL forecast based.

Don’t believe me?

In Scenario 1, how did the store associate know that a visual stock position of 6 units meant they were “getting low”? And why did she order a single case of 12 in response? Why didn’t she wait until there were 3 units? Or 1 unit? And why did she order 12? Why not 120?

For Scenario 2, you’re probably saying to yourself: “Averaging the past 6 weeks’ worth of sales is looking backward – that’s NOT forecasting!” Au contraire. By deciding to base your FUTURE replenishment on the basis of the last 6 weeks’ worth of sales, an assumption is being made that upcoming sales will be similar to past sales. That assumption IS the forecast. I’m not saying it’s a good assumption or that it will be a good forecast. I’m just saying that the method is forecast based.
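
To make that concrete, here is a minimal sketch of the Scenario 2 policy in Python. The function name, parameter names and sample numbers are all made up for illustration, and rounding halves up is my own assumption. The point is simply that the "average of the last 6 weeks" ends up sitting in a variable that can only honestly be called a forecast.

```python
import math

def min_max_order(weekly_sales, on_hand, pack_size, min_weeks=2, max_weeks=4):
    """Return (implied_weekly_forecast, order_quantity) for one item at one store."""
    # The implicit forecast: upcoming weeks will sell like the last 6 did.
    recent = weekly_sales[-6:]
    implied_weekly_forecast = sum(recent) / len(recent)

    min_level = min_weeks * implied_weekly_forecast   # trigger point
    max_level = max_weeks * implied_weekly_forecast   # order-up-to level

    if on_hand > min_level:
        return implied_weekly_forecast, 0              # nothing to order yet

    # Order enough to get back up to the max, rounded to the nearest pack
    # (halves round up, which is an assumption of this sketch).
    packs = math.floor((max_level - on_hand) / pack_size + 0.5)
    return implied_weekly_forecast, packs * pack_size


# Hypothetical item: 6 weeks of sales, 10 units on hand, case pack of 12.
forecast, order = min_max_order([5, 7, 6, 4, 8, 6], on_hand=10, pack_size=12)
print(forecast, order)  # 6.0 12
```

Every number that drives the order (the trigger point, the order-up-to level, the quantity) is derived from that one assumed selling rate.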

Using the terms “trend” and “selling pattern” in Scenarios 3 and 4 probably spoiled the surprise for those ones.

So why did I go through such pains to make this point?

Quite simply, to counter the (foolish and naive) narrative that “forecasts are always wrong, so you shouldn’t bother forecasting at all”. 

The simple fact is that unless you are in a position where you don’t need to replenish stock until AFTER your customer has already committed to buying it, any stock replenishment method you use must by definition be forecast based. I have yet to run across a retailer in the last 28+ years that has that luxury.

A forecast that happens in someone’s head, isn’t recorded anywhere and only manifests itself physically as a replenishment request is still a forecast.

An assumption that next week will be like the average of the last few weeks is still a forecast.

As the march continues and retailers gradually transition from Scenarios 1, 2 or 3 to Scenario 4, the forecasting process will become more formalized and measurable. And it can be a lot of work to maintain the forecasts (along with the replenishment plans that are driven by them).

But the overall effort pays off handsomely. Retailers in Scenarios 1, 2 and 3 have experienced in stock rates of 92-93% with wild swings in inventory levels and chronic stock imbalances. This has been documented time and again for 30 years.

Only by formalizing your forecasting process, using those forecasts to drive long term plans and sharing those plans up and down the supply chain can you achieve 97-98% in stock while simultaneously reducing inventory investment, with reduced overall effort.

What Demand Planners Really Need

Necessity never made a good bargain. – Benjamin Franklin (1706-1790)

If you ask someone who thinks they know what a retail demand planner needs from a forecasting system, the response will likely be a list of features and gadgets that they believe will make forecasts “more accurate”. On the surface, this makes some sense – a more accurate forecast has greater planning value than a less accurate one.

Based on this perceived need, the hunt is on to buy a shiny new forecasting system for the demand planners to use. After some evaluations, the list is narrowed down to a couple of front runners. You send them your historical sales data and challenge them to a “bakeoff” – whoever produces the most “accurate” weekly forecast over a few cycles wins (or at least significantly improves their odds of winning).

And what do you learn from this process? You learn how good a bunch of nerds working for the pre-sales team are at fine tuning the inner workings of their system to produce the desired result they’re looking for – a new customer for their solution. How many person hours did they spend trying to win the sale? What exactly did they do to the models? Is any of what they did even remotely close to what a real demand planner can (and should) do on a daily basis to manage a large number of forecasts? You probably won’t learn any of this until the implementation team arrives after you sign on the dotted line.

What you will probably also learn is that each of the front runners produces a more accurate forecast for about 50% of the forecasts – likely with no clear reason as to why one did better than the other for a particular item in a particular location in a particular week. After rolling up all of the results, you find that one software provider’s accuracy is 0.89% higher overall than the other for the sample set used.

That’s when someone creates a fancy spreadsheet to “prove” that this extra 0.89% of “accuracy” actually equates to millions of dollars of additional benefits when you multiply it through all items at all locations and do a 10 year net present value on it. It’s all complete nonsense of course, but because it’s based on a tiny kernel of “truth” from the evaluation, it’s given outsized weight.

Fast forward to 3 years later. All of the real business challenges rear their ugly heads during the implementation and are solved with some compromises. Actual demand planners can’t seem to get the same “accuracy” results that were touted in the bakeoff. They don’t really understand all of the inner workings and don’t have the time that the pre-sales team had to fine tune everything in the same way. All of the press releases say that they now use Software X for demand planning, but in reality, most of the real work is being done in Excel spreadsheets, which the demand planners actually know how to use.

Now what if you asked demand planners directly what they actually need from a forecasting system? It’s really only 2 things: Comprehension and control.

Comprehension

When a demand planner is reviewing a system calculated forecast, they want to be able to say one thing: “Given the same inputs as the model, I would have come up with the same forecast on my own.”

That doesn’t mean that they agree with the forecast, it just means that they understand what the model was “thinking” to come up with the result. They don’t need code level understanding of the algorithms in order to do this, just knowledge of how the model interprets data and how it can be influenced.

Before they move a dial or switch to alter the model, they want to be able to reliably predict the outcome of their actions.

Control

So long as the behaviour of the model can be understood, a demand planner will want to work with it to get the output they want, rather than just give up and work against it with manual overrides they calculated in Excel.

Knowing what the model did and why it did it is important, but demand planners also need to know how to effect changes in the model so that it behaves differently, yet predictably, and produces forecasts that they agree with and for which they are willing to be held accountable.

Accuracy is a rearview mirror measure. Demand planners need to be able to live in the future, not the past. In order to support them, a forecasting system needs to be both understandable and directly controllable so that they can fully accept accountability for the outcome.

Probabilistic Forecasting – One Man’s (Somewhat Informed) Opinion

A reasonable probability is the only certainty. – E. W. Howe

My, how forecasting methods for supply chain planning have evolved over time:

  • Naive, flat line forecasts (e.g. moving averages) were once used to estimate demand for triggering orders.
  • Time series decomposition type mathematical models added more intelligence around detecting trends and seasonality to enable better long term forecasting.
  • Causal forecasting models allowed different time series to influence each other (e.g. the effect of future planned price changes on forecasted volumes)

All of these methods are deterministic, meaning that their output is a single value representing the “most likely outcome” for each future time period. Ironically, the “most likely outcome” almost never actually materializes.

This brings us to probabilistic forecasting. In addition to calculating a mean (or median) value for each future time period (which can be interpreted as the most likely outcome), probabilistic methods also calculate a distinct confidence interval for each individual future forecast period. In essence, instead of having an individual point for each time period into the future, you have a cloud of “good forecasts” for various types of scenario modeling and decision making.

But how do you apply this in supply chain management where all of the physical activities driven by the forecast are discrete and deterministic? You can’t submit a purchase order line to a supplier that reads “there’s a 95% chance we’ll need 1 case, a 66% chance we’ll need 2 cases and a 33% chance we’ll need 3 cases”. They need to know exactly how many cases they need to pick, full stop.

The probabilistic forecasting approach can address many “self evident truths” about forecasting that have plagued supply chain planners for decades by better informing the discrete decisions in the supply chain:

  • That not only is demand variable, but variability in demand is also variable over time. Think about a product that is seasonal or highly promotional in nature. The amount of safety stock you need to cover demand variability for a garden hose is far greater in the summer than it is in the winter. By knowing how not just demand but demand variability changes over time, you can properly set discrete safety stock levels at different times of the season. 
  • That uncertainty is inherent in every prediction. Measuring forecasts using the standard “every forecast is wrong, but by how much” method provides little useful information and causes us to chase ghosts. By incorporating a calculated expectation of uncertainty into forecast measurements, we can instead make meaningful determinations about whether or not a “miss” calculated by traditional means was within an expected range and not really a miss at all. The definition of accuracy changes from an arbitrary percentage to a clear judgment call, forecast by forecast, because the inherent and unavoidable uncertainty is treated as part of the signal (which it actually is), allowing us to focus on the true noise.
  • That rollups of granular unit forecasts by item/location to higher levels for capacity and financial planning can be misleading and costly. The ability to also roll up the specific uncertainty by item/location/day allows management to make much more informed decisions about risk before committing resources and capital.
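
For us old dogs, the first two points above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the sample "clouds" of demand outcomes are invented, the nearest-rank quantile is just one simple way to read an interval off a set of samples, and safety stock is taken as the gap between a high quantile and the median. It is not how any particular probabilistic engine works.

```python
import math

def quantile(samples, q):
    """Nearest-rank quantile of a list of numbers, for 0 < q <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def summarize_period(samples, lower=0.05, upper=0.95):
    """Turn a cloud of possible outcomes for one period into discrete numbers."""
    median = quantile(samples, 0.5)
    high = quantile(samples, upper)
    return {
        "median": median,                 # the point forecast
        "low": quantile(samples, lower),  # bottom of the expected range
        "high": high,                     # top of the expected range
        "safety_stock": high - median,    # cover for this period's variability
    }

def within_expected_range(actual, summary):
    """A 'miss' inside the expected range is not really a miss at all."""
    return summary["low"] <= actual <= summary["high"]


# Hypothetical weekly outcome clouds for a garden hose at one store.
winter = summarize_period([0, 0, 1, 1, 1, 1, 2, 2, 2, 3])
summer = summarize_period([4, 6, 8, 9, 10, 11, 12, 14, 17, 22])

print(winter["safety_stock"], summer["safety_stock"])
print(within_expected_range(15, summer))  # 15 vs. a median of 10: still in range
```

The same item gets a different discrete safety stock in each season, and an actual of 15 against a point forecast of 10 is judged against the range rather than against an arbitrary percentage.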

Now here’s the “somewhat informed” part. In order to gain widespread adoption, proponents of probabilistic methods really do need to help us old dogs learn their new tricks. It’s my experience that demand planners can be highly effective without knowing every single rule and formula driving their forecast outputs. If they use off the shelf software packages, the algorithms are proprietary and they aren’t able to get that far down into the details anyhow.

What’s important is that – when looking at all of the information available to the model – a demand planner can look at the output and understand what it was “thinking”, even if they may disagree with it. All models make the general assumption that patterns of the past will continue into the future. Knowing that, a demand planner can quickly address cases where that assumption won’t hold true (i.e. they know something about why the future will be different from the past that the model does not) and take action.

As the pool of early adopters of probabilistic methods grows, I’m looking forward to seeing heaps of case studies and real world examples covering a wide range of business scenarios from the perspective of a retail demand planner – without having to go back to school for 6 more years to earn a PhD in statistics. Some of us are just too old for that shit.

I see great promise, but for the time being, I remain only somewhat informed.

Your Sales Plan is NOT a Forecast!

Man is the only animal that laughs and weeps, for he is the only animal that is struck with the difference between what things are and what they ought to be. – William Hazlitt (1778-1830)

A Ferrari has a steering wheel. A fire truck also has a steering wheel.

A Ferrari has a clutch, brake and accelerator. A fire truck also has a clutch, brake and accelerator.

Most Ferraris are red. Most fire trucks are also red.

A new Ferrari costs several hundred thousand dollars. A new fire truck also costs several hundred thousand dollars.

Ergo, Ferrari = Fire Truck.

That was an absurd leap to make, I know, but no more absurd than using the terms “sales plan” and “sales forecast” interchangeably in a retail setting. Yes, they are each intended to represent a consensus view of future sales, but that’s pretty much where the similarity ends. They differ significantly with regard to purpose, level of detail and frequency of update.

Purpose

The purpose of the sales plan is to set future goals for the business that are grounded in strategy and (hopefully) realism. Its job is to quantify and articulate the “Why”, with a bit of a light touch on the “What” and the “How”. It’s about predicting what we’re trying to make happen.

The purpose of the operational sales forecast is to subjectively predict future customer behaviour based on observed customer demand to date, augmented with information about known upcoming occurrences – such as near term weather events, planned promotions and assortment changes – that may make customers behave differently. It’s all about the “What” and the “How” and its purpose is to foresee what we think is going to happen based on all available information at any one time.

Level of Detail

The sales plan is an aggregate weekly or monthly view of expected sales for a category of goods in dollars. Factored into the plan are category strategies and assumptions (“we’ll promote this category very heavily in the back half” or “we will expand the assortment by 20% to become more dominant”), but usually lacking in the specific details which will be worked out as the year unfolds.

The operational sales forecast is a detailed projection by item/location/week in units, which is how customers actually demand product. It incorporates all of the specific details that flow out of the sales plan whenever they become available.

Frequency of Update

The sales plan is generally drafted once toward the end of a fiscal year so as to get approval for the strategies that will be employed to drive toward the plan for the upcoming year.

The operational sales forecast is updated and rolled forward at least weekly so as to drive the supply chain to respond to what’s expected to happen based on everything that has happened to date up to and including yesterday.

“Reconciling” the Plan and the Forecast

Being more elemental, the operational forecast can be easily converted to dollars and rolled up to the same level at which the sales plan was drafted for easy comparison.
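
As a rough sketch of that conversion (in Python, with invented item names, prices and plan figures, and assuming a simple units times retail price conversion), the rollup and comparison might look like this:

```python
from collections import defaultdict

def roll_up_forecast(forecast_rows, retail_price, category_of):
    """Convert item/location/week unit forecasts to dollars by (category, week)."""
    totals = defaultdict(float)
    for item, location, week, units in forecast_rows:
        totals[(category_of[item], week)] += units * retail_price[item]
    return dict(totals)

def plan_gaps(sales_plan, rolled_up):
    """Forecast minus plan, in dollars, for every (category, week) in the plan."""
    return {
        key: rolled_up.get(key, 0.0) - planned
        for key, planned in sales_plan.items()
    }


# Hypothetical data: unit forecasts by item/location/week.
forecast_rows = [
    ("hose", "store1", 1, 10),
    ("hose", "store2", 1, 5),
    ("sprinkler", "store1", 1, 4),
]
retail_price = {"hose": 20.0, "sprinkler": 15.0}
category_of = {"hose": "garden", "sprinkler": "garden"}
sales_plan = {("garden", 1): 400.0}   # category dollars by week

rolled_up = roll_up_forecast(forecast_rows, retail_price, category_of)
print(plan_gaps(sales_plan, rolled_up))  # {('garden', 1): -40.0}
```

The rollup only ever flows one way, from the detailed forecast up to the level of the plan. Nothing in it forces the two numbers to agree.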

Whenever this is done, it’s not uncommon to see that the rolled up operational forecast does not match the sales plan for any future time period. Nor should it. And based on the differences between them discussed above, how could it?

This should not be panic inducing, rather a call to action:

“According to the sales plan that was drafted months ago, Category X should be booking $10 million in sales over the next 13 weeks.”

“According to the sales forecast that was most recently updated yesterday to include all of the details that are driving customer behaviour for the items in Category X, that ain’t gonna happen.”

Valuable information to have, is it not? Especially since the next 13 weeks are still out there in a future that has yet to transpire.

Clearly assumptions were made when the sales plan was drafted that are not coming to pass. Which assumptions were they and what can we do about them?

While a retailer can’t directly control customer behaviour (wouldn’t that be grand?), they have many weapons in their arsenal to influence it significantly: advertising, pricing, promotions, assortment, cross-selling – the list goes on.

The predicted gap between the plan and the forecast drives tactical action to close the gap.

Maybe it turns out that the tactics you employ will not close the gap completely. Maybe you’re okay with it because the category is expected to track ahead later in the year. Maybe another category will pick up the slack, making the overall plan whole. Or maybe you still don’t like what you’re seeing and need to sharpen your pencil again on your assumptions and tactics.

Good thing your sales plan is separate and distinct from your sales forecast so that you can know about those gaps in advance and actually do something about them.

Your Forecast is Wrong (and That’s Okay)

Just because you made a good plan, doesn’t mean that’s what’s gonna happen. – Taylor Swift

I was 25 years old the first time I met with a financial advisor. I was unmarried, living in a small midtown Toronto apartment and working in my first full time job out of university. 

I can’t say I remember all of the details, but we did go through all of the standard questions:

  • Will I be getting married? Having kids? How many kids?
  • How do I see my career progressing?
  • When might I want to retire?
  • What kind of a lifestyle do I want to have in retirement?

On the basis of that interview, we developed a savings plan and I started executing on it.

The following is an abridged list of events that have happened since that initial plan was created a quarter century ago, only a couple of which were accounted for (vaguely) in my original plan:

  • I left my stable job to pursue a not-so-stable career in consulting
  • I moved from my first apartment to a slightly larger apartment
  • I got married
  • We moved into an even bigger apartment
  • We had a kid
  • We moved into a house
  • We had two more kids
  • I co-authored a book
  • My wife went back to school for her Masters
  • The 2008 financial crisis happened
  • The Canadian government made numerous substantial changes to personal and corporate tax rules and registered savings programs
  • We sold our house and built a new house
  • Numerous cars were bought, many of which died unexpectedly
  • COVID-19 happened

You get the idea. Many of these events (and numerous others not listed) required a re-evaluation of our goals, a change in the plan to achieve those goals or both.

The key takeaway from all of this is obvious: That because the original plan bears no resemblance to what it is today, planning for an unknown and unknowable future is a complete waste of time. 

At this point, you may be feeling a bit bewildered and thinking that this conclusion is – to put it kindly – somewhat misinformed. 

I want you to recall that feeling of bewilderment whenever you hear or read people saying things (in a supply chain context) like “You shouldn’t be forecasting because forecasts are always wrong” or “Forecasting is a waste of time because you can’t predict the future anyhow”.

This viewpoint seems to hinge on the notion that a forecast is not needed if your minimum stock levels are properly calculated. To replenish a location, you just need to wait until the actual stock level is about to breach the minimum stock level and automatically trigger an order. No forecasting required!

Putting aside the fact that properly constructed and maintained forecasts drive far more than just stock replenishment to a location, a bit of trickery was employed to make the argument.

Did you catch it?

It’s the “minimum stock levels are properly calculated” part.

In order for the minimum stock level for an item at a location at any point in time to be “properly calculated”, it would by necessity need to account for (at a minimum):

  • The expected selling rate
  • Expected trends
  • Selling pattern (upcoming peaks and troughs)
  • Planned promotional and event impacts
  • Planned price changes
  • Etc.

Do those elements look at all familiar to you? A forecast by any other name is still a forecast.

The simple fact is that customers don’t like to wait. They’re expecting product to be available to purchase at the moment they make the purchase decision. Unless someone has figured out how to circumvent the laws of time and space, the only way to achieve that is to anticipate customer demand before it happens.

It’s true that any given prediction will be “wrong” to one degree or another as the passage of time unfolds and the correctness of your assumptions about the future is revealed. That’s not just a characteristic of a business forecasting process – it’s a characteristic of life in general. Casting aspersions on forecasting because of that fact is tantamount to casting aspersions upon God Himself.

It’s one thing to recognize that forecasts have error; it’s quite another to argue that because forecasts have error, the forecasting process itself has no value.

Forecasting is not about trying to make every forecast exactly match every actual. Rather it’s a voyage of discovery about your assumptions and continuously changing course as you learn.

Changing the game

In 1972, for my 10th birthday, my Mom bought me a wooden chess set and a chess book to teach me the basics of the game.  Shortly after, I was hooked, and the timing was perfect as it coincided with Bobby Fischer’s ascendancy in September 1972 to chess immortality – becoming the 11th World Champion.

As a chess aficionado, I was recently intrigued by a new and different chess book, Game Changer, by International Grandmaster Matthew Sadler and International Master Natasha Regan.

The book chronicles the evolution and rise of computer chess super-grandmaster AlphaZero – a completely new chess algorithm developed by British artificial intelligence (AI) company DeepMind.

Until the emergence of AlphaZero, the king of chess algorithms was Stockfish.  Stockfish was architected by providing the engine the entire library of recorded grandmaster games, along with the entire library of chess openings, middle game tactics and endgames.  It would rely on this incredible database of chess knowledge and its monstrous computational abilities.

And, the approach worked.  Stockfish was the king of chess machines and its official chess rating of around 3200 is higher than that of any human in history.  In short, a match between current World Champion Magnus Carlsen and Stockfish would see the machine win every time.

Enter AlphaZero.  What’s intriguing and instructive about AlphaZero is that the developers took a completely different approach to enabling its chess knowledge.  The approach would use machine learning.

Rather than try to provide the sum total of chess knowledge to the engine, all that was provided were the rules of the game.

AlphaZero would be architected by learning from examples, rather than drawing on pre-specified human expert knowledge.  The basic approach is that the machine learning algorithm analyzes a position and determines move probabilities for each possible move to assess the strongest move.

And where did it get examples from which to learn?  By playing itself, repeatedly. Over the course of 9 hours, AlphaZero played 44 million games against itself – during which it continuously learned and adjusted the parameters of its machine learning neural network.

In 2017 AlphaZero would play a 100 game match against Stockfish and the match would result in a comprehensive victory for AlphaZero.  Imagine: a chess algorithm, architected based on a probabilistic machine learning approach, would teach itself how to play and then smash the then algorithmic world champion!

What was even more impressive to the plethora of interested grandmasters was the manner in which AlphaZero played.  It played like a human, like the great attacking players of all time – a more precise version of Tal, Kasparov, and Spassky, complete with pawn and piece sacrifices to gain the initiative.

The AlphaZero story is very instructive for us supply chain planners and retail Flowcasters in particular.

As loyal disciples know, retail Flowcasting requires the calculation of millions of item/store forecasts – a staggering number.  Not surprisingly, people cannot manage that number of forecasts and even attempting to manage by exception is proving to have its limits.

What’s emerging, and is consistent with the AlphaZero story and learning, is that algorithms (either machine learning or a unified model approach) can shoulder the burden of grinding through and developing item/store specific baseline forecasts of sales, with little to no human touch required.

If you think about it, it’s not as far-fetched as you might think.  It will facilitate a game changing paradigm shift in demand planning.

First, it will relieve demand planners of the burden of learning and understanding different algorithms and approaches for developing a reasonable baseline forecast. Keep in mind that I said a reasonable forecast.  When we work with retailers helping them design and implement Flowcasting, most folks are shocked that we don’t worship at the feet of forecast accuracy – at least not in the traditional sense.

In retail, with so many slow selling items, chasing traditional forecast accuracy is a bit of a fool’s game.  What’s more important is to ensure the forecast is sensible and assess it on some sort of a sliding scale.  To wit, if you usually sell between 20 and 24 units a year for an item at a store with a store-specific selling pattern, then a reasonable forecast and selling pattern would be in that range.

Slow selling items (indeed, perhaps all items) should be forecasted almost like a probability…for example, you’re fairly confident that 2 units will sell this month, you’re just not sure when.  That’s why, counter-intuitively, daily re-planning is more important than forecast accuracy to sustain exceptionally high levels of in-stock…whew, there, I said it!
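
Here is a small sketch, in Python, of what "confident about the month, unsure about the day" looks like in numbers. It assumes demand for the item follows a Poisson distribution with an average of 2 units per 30-day month. That distribution is my own simplifying assumption for illustration, not something Flowcasting prescribes.

```python
import math

def poisson_pmf(k, rate):
    """Probability of selling exactly k units when the average is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def prob_at_least_one(rate):
    """Probability of selling one or more units when the average is `rate`."""
    return 1.0 - math.exp(-rate)


monthly_rate = 2.0               # hypothetical: about 2 units a month
daily_rate = monthly_rate / 30   # the same expectation spread over 30 days

print(f"At least one sale this month: {prob_at_least_one(monthly_rate):.1%}")
print(f"At least one sale on a given day: {prob_at_least_one(daily_rate):.1%}")
print(f"Exactly 2 units this month: {poisson_pmf(2, monthly_rate):.1%}")
```

Under that assumption a sale sometime this month is very likely, yet a sale on any particular day is a long shot. No amount of model tuning will tell you which day it lands on, which is why re-planning every day off the latest sales and inventory does more for in-stock than squeezing the forecast.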

What an approach like this means is that planners will no longer be dilly-dallying around tuning models and learning intricacies of various forecasting approaches.  Let the machine do it and review/work with the output.

Of course, sometimes, demand planners will need to add judgment to the forecast in certain situations – where the future will be different and this information and resulting impacts would be unknowable to the algorithm.  Situations where planners have unique market insights – be it national or local.

Second, and more importantly, it will allow demand planners to shift their role/work from analytic to strategic – spending considerably more time on working to pick the “winners” and developing strategies and tactics to drive sales, customer loyalty and engagement.

In reality, spending more time shaping the demand, rather than forecasting it.

And that, in my opinion, will be a game changing shift in thinking, working and performance.

Is the juice worth the squeeze?

A little over 10 years ago I was on a project to help one of Canada’s largest grocery and general merchandise retailers design and implement new planning processes and technology. My role was the co-lead of the Integrated Planning, Forecasting & Replenishment Team and, shockingly, we ended up with a Flowcasting-like design.

The company was engaged in a massive supply chain transformation and the planning component was only one piece of the puzzle. As a result of this, one of the world’s preeminent consulting firms, Accenture, was retained to help oversee and guide the entire program.

One of the partners leading the transformation was a chap named Gary. Gary was a sports lover, a really decent person, great communicator and good listener. He also had a number of “southern sayings” – nuggets of wisdom gleaned from growing up in the southern United States.

One of his sayings that’s always stuck with me is his question, “is the juice worth the squeeze?”, alluding to the fact that sometimes the result is not worth the effort.

I can remember the exact situation when this comment first surfaced. We were trying to help him understand that even for slow and very slow selling items, creating a long term forecast by item/store was not only worth the squeeze, but also critical. As loyal and devoted Flowcasting disciples know, this is needed for planning completeness and to be able to provide a valid simulation of reality and work to a single set of numbers – two fundamental principles of Flowcasting.

The good news was that our colleague did eventually listen to us and understood that the squeeze was not too onerous and today, this client is planning and using Flowcasting – for all items, regardless of sales velocity.

But Gary’s question is an instructive one and one that I’ve been pondering quite a bit recently, particularly with respect to demand planning. Let me explain.

The progress that’s been made by leading technology vendors in forecasting by item/store has been impressive. The leading solutions utilize a unified model/approach (sometimes based on AI/ML, and in other cases not), essentially allowing demand planners to largely take their hands off the wheel in terms of generating a baseline forecast.

The implications of this are significant as it allows the work of demand planning to be more focused and value added – that is, instead of learning and tuning forecasting models, they are working with Merchants and Leaders to develop and implement programs and strategies to drive sales and customer loyalty.

But, I think, perhaps we might be reaching the point where we’re too consumed with trying to squeeze the same orange.

My point is how much better, or more accurate, can you make an item/store forecast when most retailers’ assortments have 60%+ of items selling fewer than 26 units per year, by item/store? It’s a diminishing return for sure.

Delivering exceptional levels of daily in-stock and inventory performance is not solely governed by the forecast. Integrating and seamlessly connecting the supply chain from the item/store forecast to factory is, at this stage, I believe, even more crucial.

Of course, I’m talking about the seamless integration of arrival-based, time-phased, planned shipments from consumption to supply, and updated daily (or even in real time if needed) based on the latest sales and inventory information. This allows all partners in the supply chain to work to a single set of numbers and provides the foundation to make meaningful and impactful improvements in lead times and ordering parameters that impede product flow.

The leading solutions and enabling processes need to produce a decent and reasonable forecast, but that’s not what’s going to make a difference, in my opinion. The big difference, now, will be in planning flexibility and agility – for example, how early and easily supply issues can be surfaced and resolved and/or demand re-mapped to supply.

You and your team can work hard on trying to squeeze an extra 1-3% in terms of forecast accuracy. You could also work to ensure planning flexibility and agility. Or you could work hard on both.

It’s a bit like trying to get great orange juice. To get the best juice, you need to squeeze the right oranges.

Which ones are you squeezing?

Keep Calm And Blame It On The Lag

A good forecaster is no smarter than everyone else, he merely has his ignorance better organized. – Anonymous

I’ve written on the topic of forecast performance measurement from many different angles, particularly in the context of forecasting sales at the point of consumption in retail.

Over the years, I’ve opined that:

  • Forecast accuracy (in the traditional sense) is a useless measure
  • Reasonableness is more important than accuracy, given that forecasts are, by their nature, forgiving planning elements
  • The outsized importance placed on forecast accuracy in supply chain planning is a myth
  • Accuracy and precision must be considered simultaneously
  • Forecasts should be judged against what is a reasonable expectation for accuracy
  • Forecasting at higher levels of aggregation to achieve higher levels of “accuracy” is a waste of time

After going back and re-reading all of that stuff, I see that they are all really just different angles and approaches for delivering the same message: “popular methods of comparing forecasts and actuals may not be as useful as you think, especially in a retail context”.

But in all of this time there is one key aspect of forecast measurement that I have not addressed: forecast lags. In other words, which forecast (or forecasts) should you be comparing to the actual?

Assuming, for example, that you have a rolling 52 week forecasting process where forecasts and actuals are in weekly buckets, then for any given week, you would have 52 choices of forecasts to compare to a single actual. So which one(s) do you choose?

Let’s get the easy one out of the way first. Considering that the forecast is being used to drive the supply chain, the conventional wisdom is that the most important lag to capture for measurement is the order lead time, when a firm commitment to purchase must be made based on the forecast. For example, if the lead time is 4 weeks, you’d capture the forecast for 4 weeks from now and measure its accuracy when the actual is posted 4 weeks later.

Nope. To all of that.

While it’s true that measuring the cumulative forecast error over the lead time can be useful for determining safety stock levels, it’s not very useful for measuring the performance of the forecasting process itself, for a couple of reasons:

  1. It is a flagrant violation of demand planning principle. Nothing on the supply side of the equation (inventory levels, lead times, pack rounding, purchasing constraints, etc.) has anything to do with true demand. Customers want the products they want, where they want them and when they want them at a price they’re willing to pay, period. The amount of time it happens to take to get from the point of origin to a customer accessible location is completely immaterial to the customer.
  2. A demand planner’s job is to manage the entire continuum of forecasts over the forecast horizon. If they know about something that will affect demand at any point (or at all points) over the next 52 weeks, the forecasts should be amended accordingly.

Suppose that you’re a demand planner who manages the following item/location. The black line is 3 years’ worth of demand history and a weekly baseline forecast is calculated for the next 52 weeks.


Because you’re a very good demand planner who keeps tabs on the drivers of demand for this product, you know that:

  • The warm weather that drives the demand pattern for this item/location has arrived early and it looks like it’s going to stay that way between now and when the season was originally expected to start.
  • There are 2 one week price promotions coming up that have just been signed off and all of the pertinent details (particularly timing and discount) are known.
  • For the last 3 years, there have been 3 similar products to this one being offered at this location. A decision has just been made to broaden the assortment with 2 additional similar products half way through the selling season.

On that basis, I have 2 questions:

  1. How does the baseline forecast need to change in order to incorporate this new information?
  2. How would your answer to question 1 change if you also knew that the order-to-delivery lead time for this item/location was 1 week? 2 weeks? 12 weeks?

Hint: Because it was established at the outset that “you’re a very good demand planner who keeps tabs on the drivers of demand for this product”, the answer to question 2 is: “Not at all.”

So if measuring forecast error at the lead time isn’t the right way to go, then what lag(s) should be captured for measurement?

As with all things forecasting related, there is no definitive answer to this question. But as a matter of principle, the lags chosen to measure the performance of a demand planning process should be based on when facts become “knowable” that could affect future demand and would prompt a demand planner to “grab the stick” and override a baseline forecast modeled based on historical patterns.

In some cases, upstream processes that create or shape demand can provide very specific input to the forecasting process.

For example, it’s common for retailers to have promotional planning processes with specific milestones, for example:

  • Product selection and price discounts are decided 12 weeks out
  • Final design of media to support the ad is decided 8 weeks out
  • Last minute adds, deletes and switches are finalized 3 weeks out

At each of those milestones, decisions can be made that might impact a demand planner’s expectation of demand for the promotion, so in this case, it would be valuable to store forecasts at lags 3, 8 and 12. Similar milestone schedules generally exist for assortment decisions as well.
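
As a rough sketch of what storing forecasts at those lags could look like, assuming weekly buckets and a simple in-memory store. The names `snapshot` and `error_by_lag` and all of the numbers are hypothetical, made up for illustration rather than taken from any particular system:

```python
from collections import defaultdict

# Milestone lags (weeks before the target week) from the promotional example above.
MILESTONE_LAGS = (12, 8, 3)

# snapshots[(target_week, lag)] = forecast for target_week as it stood `lag` weeks earlier
snapshots = {}

def snapshot(current_week, forecast_by_week):
    """Store the current forecast for each week that is exactly a milestone lag away."""
    for lag in MILESTONE_LAGS:
        target = current_week + lag
        if target in forecast_by_week:
            snapshots[(target, lag)] = forecast_by_week[target]

def error_by_lag(actual_by_week):
    """Return {lag: [(target_week, forecast - actual), ...]} for weeks with posted actuals."""
    errors = defaultdict(list)
    for (target, lag), forecast in sorted(snapshots.items()):
        if target in actual_by_week:
            errors[lag].append((target, forecast - actual_by_week[target]))
    return dict(errors)

# Illustrative numbers only: the forecast for week 20 is revised at each milestone.
snapshot(8, {20: 100})    # 12 weeks out: product and discount decided
snapshot(12, {20: 140})   # 8 weeks out: media design finalized
snapshot(17, {20: 150})   # 3 weeks out: last minute switches
result = error_by_lag({20: 155})
```

Comparing the error at each lag shows how much each milestone contributed to the final number, which is the kind of information that can feed back into the upstream process.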

In other cases, what’s “knowable” to the demand planner can be subject to judgment. For example, if actuals come in higher than forecast for 3 weeks in a row, is that a trend change or a blip? How about 4 weeks in a row?

Lags that are closer in time (say 0 through 4) are often useful in this regard, as they can show error trends forming while they are still fresh.
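
One hedged way to picture this: track the signed error at a short lag and count how many weeks in a row it has pointed in the same direction. The function names and the 3 week threshold below are assumptions for illustration; whether a run is a trend change or a blip remains a judgment call.

```python
REVIEW_AFTER_WEEKS = 3  # hypothetical threshold; the right value is a judgment call

def longest_recent_run(errors):
    """Length of the run of same-signed errors ending at the most recent week.

    `errors` is a list of (actual - forecast) values, oldest first.
    A positive result means actuals above forecast; negative means below.
    """
    run = 0
    for e in reversed(errors):
        if e == 0:
            break
        sign = 1 if e > 0 else -1
        if run == 0 or (run > 0) == (sign > 0):
            run += sign
        else:
            break
    return run

def needs_review(errors, threshold=REVIEW_AFTER_WEEKS):
    """Flag the item/store for a planner to look at; it does not decide anything."""
    return abs(longest_recent_run(errors)) >= threshold
```

For example, `longest_recent_run([-2, 3, 1, 4])` counts three consecutive weeks of actuals above forecast, which would put the item in front of a planner under the assumed threshold.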

Unless tied to a demand shaping process with specific milestones as described above, long term lags are virtually useless. Reviewing actuals posted over the weekend and comparing them to a forecast for that week that was created 6 months ago might be an interesting academic exercise, but it’s a complete waste of time otherwise.

The point of measuring is to inform, so as to improve the process over the long term.

With the right tools and mindset, today’s “I wish I knew that ahead of time” turns into tomorrow’s knowable information.

Employing the Law of Large Numbers in Bottom-Up Forecasting

It is utterly implausible that a mathematical formula should make the future known to us, and those who think it can would once have believed in witchcraft. – Jakob Bernoulli (1655-1705)

This is a topic I’ve touched on numerous times in the past, but I’ve never really taken the time to tackle the subject comprehensively.

Before diving in, I just want to make clear that I’m going to stay in my lane: the frame of reference for this entire piece is around forecasting sales at the point of consumption in retail.

In that context, here are some truths that I consider to be self evident:

  1. Consumers buy specific items in specific stores at specific times. Therefore, in order to plan the retail supply chain from consumer demand back, forecasts are needed by item by store.
  2. Any retailer has a large enough percentage of intermittent demand streams at item/store level (e.g. fewer than 1 sale per week) that they can’t simply be ignored in the forecasting process.
  3. Any given item can have continuous demand in some locations and intermittent demand in other locations.
  4. “Intermittent” doesn’t mean the same thing as “random”. An intermittent demand stream could very well have a distinct pattern that is not visible to the naked eye (nor to most forecast algorithms that were designed to work with continuous demands).
  5. Because of points 1 to 4 above, the Law of Large Numbers needs to be employed to see any patterns that exist in intermittent demand streams.

On this basis, it seems to be a foregone conclusion that the only way to forecast at item/store is by employing a top-down approach (i.e. aggregate sales history to some higher level(s) than item/store so that a pattern emerges, calculate an independent forecast at that level, then push down the results proportionally to the item/stores that participated in the original aggregation of history).
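
The “push down” step in that description can be sketched in a few lines. This is a minimal illustration assuming allocation by share of historical sales; `push_down` and the sample numbers are hypothetical:

```python
def push_down(aggregate_forecast, history_by_member):
    """Allocate a weekly aggregate forecast to members by share of historical sales."""
    total = sum(history_by_member.values())
    if total == 0:
        raise ValueError("no history to allocate on")
    return {
        member: [week * hist / total for week in aggregate_forecast]
        for member, hist in history_by_member.items()
    }

# Illustrative: a two week aggregate forecast split between two item/stores.
allocated = push_down([10, 20], {"A": 30, "B": 10})
```

Note that every member receives the same shape as the aggregate (here, week 2 is always double week 1); only the height differs. That property is what the rest of this piece calls into question.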

So now the question becomes: How do you pick the right aggregation level for forecasting?

This recent (and conveniently titled) article from the Institute of Business Forecasting by Eric Wilson called How Do You Pick the Right Aggregation Level for Forecasting? captures the considerations and drawbacks quite nicely and provides an excellent framework to discuss the problem in a retail context.

A key excerpt from that article is below (I recommend that you read the whole thing – it’s very succinct and captures the essence of how to think about this problem in a few short paragraphs):


When To Go High Or Low?

Despite all the potential attributes, levels of aggregation, and combinations of them, historically the debate has been condensed down to only two options, top down and bottom up.

The top-down approach uses an aggregate of the data at the highest level to develop a summary forecast, which is then allocated to individual items on the basis of their historical relativity to the aggregate. This can be any generated forecast as a ratio of their contribution to the sum of the aggregate or on history which is in essence a naïve forecast.

More aggregated data is inherently less noisy than low-level data because noise cancels itself out in the process of aggregation. But while forecasting only at higher levels may be easier and provides less error, it can degrade forecast quality because patterns in low level data may be lost. High level works best when behavior of low-level items is highly correlated and the relationship between them is stable. Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.

The major challenge is that the required level of aggregation to get meaningful statistical information may not match the precision required by the business. You may also find that the requirements of the business may not need a level of granularity (i.e. Customer for production purposes) but certain customers may behave differently, or input is at the item/customer or lower level. More often than not it is a combination of these and you need multiple levels of aggregation and multiple levels of inputs along with varying degrees of noise and signals.


These are the two most important points:

  • “High level works best when behavior of low-level items is highly correlated and the relationship between them is stable.”
  • “Low level tends to work best when behavior of the data series is very different from each other (i.e. independent) and the method you use is good at picking up these patterns.”

Now, here’s the conundrum in retail:

  • The behaviour of low level items is very often NOT highly correlated, making forecasting at higher levels a dubious proposition.
  • Most popular forecasting methods only work well with continuous demand history data, which can often be scarce at item/store level (i.e. they’re not “good at picking up these patterns”).

My understanding of this issue was firmly cemented about 19 years ago when I was involved in a supply chain planning simulation for beer sales at 8 convenience stores in the greater Montreal area. During that exercise, we discovered that 7 of those 8 stores had a sales pattern that one would expect for beer consumption in Canada (repeated over 2 full years): strong sales during the summer months, lower sales in the cooler months and a spike around the holidays. The actual data is long gone, but for those 7 stores, it looked something like this:

The 8th store had a somewhat different pattern.

And by “somewhat different”, I mean exactly the opposite:

Remember, these stores were all located within about 30 kilometres of each other, so they all experienced generally the same weather and temperature at the same time. We fretted over this problem for a while, thinking that it might be an issue with the data. We even went so far as to call the owner of the 8-store chain to ask him what might be going on.

In an exasperated tone that is typical of many French Canadians, he impatiently told us that of course that particular store has slower beer sales in the summer… because it is located in the middle of 3 downtown university campuses: fewer students in the summer months = a decrease in sales for beer during that time for that particular store.

If we had visited every one of those 8 stores before we started the analysis (we didn’t), we may have indeed noticed the proximity of university campuses to one particular store. Would we have pieced together the cause/effect relationship to beer sales? My guess is probably not. Yet the whole story was right there in the sales data itself, as plain as the nose on your face.

We happened upon this quirk after studying a couple dozen SKUs across 8 locations. A decent sized retailer can sell tens of thousands of SKUs across hundreds or thousands of locations. With millions of item/store combinations, how many other quirky criteria like that could be lurking beneath the surface and driving the sales pattern for any particular item at any particular location?

My primary conclusion from that exercise was that aggregating sales across store locations is definitely NOT a good idea.
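
To make the problem concrete, here is a small sketch with invented quarterly numbers (the real data is long gone, so these figures are purely illustrative). It spreads the odd store’s annual volume using the shape of the 8 store aggregate:

```python
# Invented quarterly sales (Q1..Q4) for illustration only.
typical_store = [20, 30, 40, 30]   # peaks in summer (Q3)
campus_store = [40, 30, 10, 40]    # dips in summer

stores = [typical_store] * 7 + [campus_store]

def profile(series):
    """Share of the total that falls in each period."""
    total = sum(series)
    return [x / total for x in series]

aggregate = [sum(quarter) for quarter in zip(*stores)]
aggregate_profile = profile(aggregate)
campus_profile = profile(campus_store)

# Top-down: spread the campus store's annual volume using the aggregate shape.
campus_total = sum(campus_store)
top_down = [campus_total * p for p in aggregate_profile]
```

With these made-up numbers, the aggregate peaks in Q3, so the top-down result gives the campus store its largest forecast in exactly the quarter where its own history shows its lowest sales.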

So in terms of figuring out the right level of aggregation, that just leaves us with the item dimension – stay at store level, but aggregate across categories of similar items. But in order for this to be a good option for the top level, we now have another problem: “behavior of low-level items is highly correlated and the relationship between them is stable”.

That second part becomes a real issue when it comes to trying to aggregate across items. Retailers live every day on the front line of changing consumer sentiment and behaviour. As a consequence of that, it is very uncommon to see a stable assortment of items in every store year in and year out.

Let’s say that a category currently has 10 similar items in it. After an assortment review, it’s decided that 2 of those items will be leaving the category and 4 new products will be introduced into the category. This change is planned to be executed in 3 months’ time. This is a very simple variation of a common scenario in retail.

Now think about what that means with regard to managing the aggregated sales history for the top level (category/store):

  • The item/store sales history currently includes 2 items that will be leaving the assortment. But you can’t simply exclude those 2 items from the history aggregation, because this would understate the category/store forecast for the next 3 months, during which time those 2 items will still be selling.
  • The item/store level sales history currently does not include the 4 new items that will be entering the assortment. But you can’t simply add surrogate history for the 4 new items into the aggregation, because this would overstate the category/store forecast for the next 3 months before those items are officially launched.

In this scenario, how would one go about setting up the category/store forecast in such a way that:

  1. It accounts for the specific items participating in the aggregation at different future times (before, during and after the anticipated assortment change)?
  2. The category/store forecast is being pushed down to the correct items at different future times (before, during and after the anticipated assortment change)?

And this is a fairly simple example. What if the assortment changes above are being rolled out to different stores at different times (e.g. a test market launch followed by a staged rollout)? What if not every store is carrying the full 10 SKU assortment today? What if not every store will be carrying the full 12 SKU assortment in the future?

The complexity of trying to deal with this in a top-down structure can be nauseating.

So it seems that we find ourselves in a bit of a pickle here:

  1. The top-down approach is unworkable in retail because the behaviour between locations for the same item is not correlated (beer in Montreal stores) and the relationships among items for the same location are not stable (constantly changing assortments).
  2. In order for the bottom-up approach to work, there needs to be some way of finding patterns in intermittent data. It’s a self-evident truth that the only way to do this is by aggregating.

So the Law of Large Numbers is still needed to solve this problem, but in a retail setting, there is no “right level” of aggregation above item/store at which to develop reliable independent top level forecasts that are also manageable.

Maybe we haven’t been thinking about this problem in the right way.

This is where Darryl Landvater comes in. He’s a long time colleague and mentor of mine best known as a “manufacturing guy” (he’s the author of World Class Production and Inventory Management, as well as co-author of The MRP II Standard System), but in reality he’s actually a “planning guy”.

A number of years ago, Darryl recognized the inherent flaws with using a top-down approach to apply patterns to intermittent demand streams and broke the problem down into two discrete parts:

  1. What is the height of the curve (i.e. rate of sale)?
  2. What is the shape of the curve (i.e. selling profile)?

His contention was that it’s not necessary to use aggregation to calculate completely independent sales forecasts (i.e. height + shape) to achieve this. Instead, what’s needed is to aggregate to calculate selling profiles to be used in cases where the discrete demand history for an item at a store is insufficient to determine one. We’re still using the Law of Large Numbers, but only to solve for the specific problem inherent in slow selling demands – finding the shape of the curve.

It’s called Profile Based Forecasting and here’s a very simplified explanation of how it works:

  1. Calculate an annual forecast quantity for each independent item/store based on sales history from the last 52+ weeks (at least 104 weeks of rolling history is ideal). For example, if an item in a store sold 25 units 2 years ago and 30 units over the most current 52 weeks, then the total forecast for the upcoming 52 weeks might be around 36 units with a calculated trend applied.
  2. Spread the annual forecast into individual time periods as follows:
    • If the item/store has a sufficiently high rate of sale that a pattern can be discerned from its own unique sales history (for example, at least 70 units per year), then calculate the selling pattern from only that history and multiply it through the item/store’s selling rate.
    • If the item/store’s rate of sale is below the “fast enough to use its own history” threshold, then calculate a sales pattern using a category of similar items at the same store and multiply those percentages through the independently calculated item/store annual forecast.
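
The two steps above can be sketched roughly as follows. This is a simplified illustration, not the actual Profile Based Forecasting implementation: the year-over-year trend calculation is an assumption (the description only says “a calculated trend”), the function names are hypothetical, and 4 periods stand in for 52 weeks to keep the example short.

```python
OWN_PROFILE_THRESHOLD = 70  # units per year; the example threshold quoted above

def annual_forecast(prior_year, latest_year):
    """Latest year's sales with a simple year-over-year trend (an assumed trend model)."""
    if prior_year <= 0:
        return float(latest_year)
    return latest_year * latest_year / prior_year

def profile(history):
    """Share of the total that falls in each period (the shape of the curve)."""
    total = sum(history)
    if total == 0:
        raise ValueError("no sales history to build a profile from")
    return [h / total for h in history]

def profile_based_forecast(prior_year, latest_history, category_history):
    """Height from the item/store itself; shape from its own history or the category's."""
    latest_year = sum(latest_history)
    height = annual_forecast(prior_year, latest_year)
    fast_enough = latest_year >= OWN_PROFILE_THRESHOLD
    shape_source = latest_history if fast_enough else category_history
    return [height * p for p in profile(shape_source)]

# Slow seller from the example: 25 units two years ago, 30 units in the latest year.
slow = profile_based_forecast(25, [5, 10, 10, 5], [100, 300, 400, 200])

# Faster seller (80 units per year, flat trend) keeps its own shape.
fast = profile_based_forecast(80, [10, 20, 30, 20], [100, 300, 400, 200])
```

In this sketch the slow seller’s 36 unit annual forecast is spread 10/30/40/20 percent following the category at the same store, while the faster seller’s forecast follows its own history.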

There is far more to it than that, but the separation of “height of the curve” from “shape of the curve” as described above is the critical design element that forms the foundation of the approach.

Think about what that means:

  1. If an item/store’s rate of sale is sufficient to calculate its own independent sales profile at that level, then it will do so.
  2. If the rate of sale is too low to discern a pattern, then the shape being applied to the independent item/store’s rate of sale is derived by looking at similar items in the category within the same store. Because the profiles are calculated from similar products and only represent the weekly percentages through which to multiply the independent rate of sale, they don’t need to be recalculated very often and are generally immune to the “ins and outs” of specific products in the category. It’s just a shape, remember.
  3. All forecasting is purely bottom-up. Every item at every store can have its own independent forecast with a realistic selling pattern and there are no forecasts to be calculated or managed above the item/store level.
  4. The same forecast method can be used for every item at every store. The only difference between fast and slow selling items is how the selling profile is determined. As the selling rate trends up or down over time, the appropriate selling profile will be automatically applied based on a comparison to the threshold. This makes the approach very “low touch” – demand planners can easily oversee several hundred thousand item/store combinations by managing only exceptions.

With realistic, properly shaped forecasts for every item/store enabled without any aggregate level modelling, it’s now possible to do top-down stuff that makes sense, such as applying promotional lifts or overrides for an item across a group of stores and applying the result proportionally based on each store’s individual height and shape for those specific weeks, rather than using a naive “flat line” method.
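
A minimal sketch of that kind of proportional spread, assuming each store already has its own weekly baseline forecast. The function name `spread_override` and the numbers are hypothetical:

```python
def spread_override(total_units, baseline_by_store, week):
    """Split a group-level override for one week in proportion to each store's baseline."""
    week_baselines = {store: f[week] for store, f in baseline_by_store.items()}
    total_baseline = sum(week_baselines.values())
    if total_baseline == 0:
        raise ValueError("no baseline demand in that week to allocate on")
    return {
        store: total_units * b / total_baseline
        for store, b in week_baselines.items()
    }

# Illustrative: a 40 unit promotional override for week index 1 across two stores.
baselines = {"store_1": [2, 6], "store_2": [2, 2]}
spread = spread_override(40, baselines, week=1)
```

A flat split would give each store 20 units. Here store_1 gets 30 and store_2 gets 10, because store_1’s own forecast for that specific week is three times higher.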

Simple. Intuitive. Practical. Consistent. Manageable. Proven.

Noise is expensive

Did you know that the iHome alarm clock, common in many hotels, shows a small PM when the time is after 12 noon?  You wonder how many people fail to note the tiny ‘pm’ isn’t showing when they set the alarm, and miss their planned wake up.  Seems a little complicated and unnecessary, wouldn’t you agree?

Did you also know that most microwaves also depict AM or PM? If you need the clock in the microwave to tell you whether it’s morning or night, something’s a tad wrong.

More data/information isn’t always better. In fact, in many cases, it’s a costly distraction or even provides the opportunity to get the important stuff wrong.

Contrary to current thinking, data isn’t free.

Unnecessary data is actually expensive.

If you’re like me, then your life is being subjected to lots of data and noise…unneeded and unwanted information that just confuses and adds complication.

Just think about shopping now for a moment.  In a recent and instructive study sponsored by Oracle (see below), the disconnect between noise and what consumers really want is startling:

  1. 95% of consumers don’t want to talk or engage with a robot
  2. 86% have no desire for other shiny new technologies like AI or virtual reality
  3. 48% of consumers say that these new technologies will have ZERO impact on whether they visit a store and even worse, only 14% said these things might influence them in their purchasing decisions

From the consumer’s view, what this is telling us, and especially supply chain technology firms, is that we don’t seem to understand what’s noise and what’s actually relevant. I’d argue we’ve got big time noise issues in supply chain planning, especially when it relates to retail.

I’m talking about forecasting consumer sales at a retail store/webstore or point of consumption.  If you understand retail and analyze actual sales you’ll discover something startling:

  1. 50%+ of product/store sales are less than 20 per year, or about 1 every 2-3 weeks.

Many of the leading supply chain planning companies believe that the answer to forecasting and planning at store level is more data and more variables…in many cases, more noise. You’ll hear many of them proclaim that their solution takes hundreds of variables into account, simultaneously processing hundreds of millions of calculations to arrive at a forecast.  A forecast, apparently, that is cloaked in beauty.

As an example, consider the weather.  According to these companies not only can they forecast the weather, they can also determine the impact the weather forecast has on each store/item forecast.

Now, since you live in the real world with me, here’s a question for you:  How often is the weather forecast (from the weather network that employs weather specialists and very sophisticated weather models) right?  Half the time?  Less?  And that’s just trying to predict the next few days, let alone a long term forecast.  Seems like noise, wouldn’t you agree?

Now, don’t get me wrong.  I’m not saying the weather does not impact sales, especially for specific products.  It does.  What I’m saying is that people claiming to predict it with any degree of accuracy are really just adding noise to the forecast.

Weather.  Facebook posts.  Tweets.  The price of tea in China.  All noise, when trying to forecast sales by product at the retail store.

All this “information” needs to be sourced.  Needs to be processed and interpreted somehow.  And it complicates things for people as it’s difficult to understand how all these variables impact the forecast.

Let’s contrast that with a recent retail implementation of Flowcasting.

Our most recent retail implementation of Flowcasting factors none of these variables into the forecast and resulting plans.  No weather forecasts, social media posts, or sentiment data is factored in at all.

None. Zip. Zilch.  Nada.  Heck, it’s so rudimentary that it doesn’t even use any artificial intelligence – I know, you’re aghast, right?

The secret sauce is an intuitive forecasting solution that produces integer forecasts over varying time periods (monthly, quarterly, semi-annually) and consumes these forecasts against actual sales. So, the forecasts and consumption could be considered like a probability.  Think of it like someone managing a retail store. They can say fairly confidently that “I know this product will sell one this month, I just don’t know what day”!
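
To illustrate the idea of consuming an integer forecast, here is a very small sketch. It assumes the simplest possible rule (each sale reduces what is left of the period forecast, never below zero); the actual solution’s consumption logic is not described here, and `consume` is a hypothetical name.

```python
def consume(period_forecast, daily_sales):
    """Remaining integer forecast for the period after each day's sales are posted."""
    remaining = period_forecast
    trace = []
    for sold in daily_sales:
        remaining = max(remaining - sold, 0)
        trace.append(remaining)
    return trace

# "I know this product will sell one this month, I just don't know what day."
one_this_month = consume(1, [0, 0, 1])

# Selling more than forecast simply exhausts the period forecast.
oversold = consume(2, [0, 1, 0, 0, 2])
```

In the first case the forecast of 1 stays open until the sale happens on the third day and then drops to 0, regardless of which day it turned out to be.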

The solution also includes simple replenishment logic to ensure all dependent plans are sensible and that ordering for slow selling products is based on how probable you think a sale is in the short term (i.e., orders are only triggered for a slow selling item if the probability of making a sale is high).
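
One way to picture a probability-based trigger is sketched below. It is an assumption-heavy illustration: it models demand as Poisson at a constant annual rate, uses a hypothetical 50% threshold, and only considers the case where the shelf is empty. None of that is claimed to be how the solution itself works.

```python
import math

def prob_of_sale(annual_rate, weeks):
    """P(at least one sale in `weeks`), assuming Poisson demand at a constant rate."""
    return 1.0 - math.exp(-annual_rate * weeks / 52.0)

def should_order(annual_rate, weeks, on_hand, threshold=0.5):
    """Trigger an order only when stock is out and a near-term sale is likely enough."""
    return on_hand == 0 and prob_of_sale(annual_rate, weeks) >= threshold

# Illustrative: a 2 week horizon for an item selling 4 per year vs. 26 per year.
slow_item = should_order(4, 2, on_hand=0)
faster_item = should_order(26, 2, on_hand=0)
```

Under these assumptions the item selling 4 per year has roughly a 14% chance of a sale in the next 2 weeks and is not ordered yet, while the item selling 26 per year has roughly a 63% chance and is.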

In addition to the simple, intuitive system capabilities above, the process also employs a different kind of intelligence – human.  Planners and category managers, since they are speaking the same language – sales – easily come to consensus for situations like promotions and new product introductions.  Once the system is updated, the solution automatically translates and communicates the impact of these events for all partners.

So, what are the results of using such a simple, intuitive process and solution?

The process is delivering world class results in terms of in-stock, inventory performance and costs.  Better results, from what I can tell, than what’s being promoted today by the more sophisticated solutions.  And, importantly, enormously simpler, for obscenely less cost.

Noise is expensive.

The secret for delivering world class performance (supply chain or otherwise) is deceptively simple…

Strip away the noise.