
Int. J. Production Economics 231 (2021) 107828


International Journal of Production Economics journal homepage: www.elsevier.com/locate/ijpe

Intraday shelf replenishment decision support for perishable goods
Jakob Huber ∗, Heiner Stuckenschmidt
Data and Web Science Group, University of Mannheim, B6 26, 68159 Mannheim, Germany

ARTICLE INFO
Keywords: Forecasting, Scheduling, Decision support, Intraday demand, Retailing, Machine learning

ABSTRACT Retailers that offer perishable items are required to make hundreds of ordering decisions on a daily basis. For certain products, it is even necessary to make intraday decisions in order to increase the freshness of the goods while still serving the demand. We present a use case from the bakery domain where a part of the assortment has to be baked during the day as the delivered goods are not ready for sale. Hence, the operational performance depends on the decisions of the store personnel which can be optimized by a decision support system. Our approach to tackle this problem consists of two distinct phases: First, we forecast the hourly demand for each product. Second, the forecasts are input for a scheduling problem whose solution represents the baking plan that is provided to the store personnel. Based on our empirical evaluation, we conclude that forecasting accuracy has the biggest impact on the operational performance. More enhanced prediction methods noticeably outperform the reference methods. In particular, the machine learning based forecasting model significantly outperforms established time series models. If the computed schedules are executed as suggested, the customers can be served with freshly baked goods.

1. Introduction The freshness of goods has a significant impact on the buying decisions of customers and can be even more important than the price of a product (Ali et al., 2010). Thus, the perceived freshness is the subject of several studies (e.g. Heenan et al. (2009) and Gvili et al. (2017)). Retailers that offer perishable items implement agile supply chains in order to be able to offer the goods as fresh as possible. Consequently, the time between decisions gets shorter, which also increases the volume of decisions. This is particularly true for perishable goods that can only be sold for a limited number of days, which makes frequent replenishment necessary (van Donselaar et al., 2006). For highly perishable goods (e.g. baked goods), the store manager is typically responsible for making the decisions. According to domain experts and practitioners, this has some drawbacks and represents an area for improvement: The decisions are not reliable across all stores as not every store manager has the required experience and skills. Moreover, the manual decision process is time consuming as the number of decisions that have to be made on a day-to-day basis is quite high. However, the advances in large-scale data analysis and the availability of large datasets (Hofmann and Rutschmann, 2018) enable the development of data-driven decision support systems for operational short-term decisions in the retail industry (van Donselaar et al., 2006; Huber et al., 2017). Hence, a retailer can gain a competitive advantage by digitizing the decision process and consequently improving the decisions (van Donselaar et al., 2010; Ehrenthal and Stölzle, 2013).

In this study, we present a real-world application scenario of a German bakery chain that primarily sells highly perishable goods like buns, baguettes, pretzels, and breads. The company operates a central production facility from which the stores are supplied on a daily basis. Each store has to place orders for each product one day in advance, and items that are not sold on the day of delivery have to be discarded. Some products are not ready for sale when they arrive at the store and need to be processed during the day, i.e., baked and placed on the shelves. For this purpose, each store is equipped with up to three ovens. Baking goods during the day is necessary as the items deteriorate quickly and should be provided as fresh as possible in order to increase customer satisfaction. Besides determining the daily order quantity, a challenge is to provide a suitable baking plan that can be executed by the store personnel. A baking plan is a schedule that outlines when the different products have to be baked and consequently placed on the shelves (see Table 1). The baking plan shows which oven has to be used and which baking program has to be started. The number of items of an article that have to be baked is given as a number of baking trays. The number of items per tray is fixed for every article. The capacity of an oven corresponds to the number of baking trays that can be processed at the same time. The overall objective of this study is to provide a solution approach for the computation of a baking plan in a real-world setting by leveraging available empirical data. At first glance, the considered application

∗ Corresponding author. E-mail addresses: jakob@informatik.uni-mannheim.de (J. Huber), heiner@informatik.uni-mannheim.de (H. Stuckenschmidt).

https://doi.org/10.1016/j.ijpe.2020.107828 Received 2 August 2019; Received in revised form 20 April 2020; Accepted 6 June 2020 Available online 9 June 2020 0925-5273/© 2020 Elsevier B.V. All rights reserved.


Table 1 An example of a baking plan (schedule). The plan outlines the number of baking trays that have to be placed in a specific oven and baked at a given point in time. Therefore, it is necessary to start the baking program that is suitable for the items on the baking trays.

Time | Oven [ID] | Program [ID] | Article | Baking trays [Qty.]
05:30 | | | 11, 12 | 6, 2
05:30 | | | 31, 32, 33 | 1, 2, 1
05:55 | … | | |

(2009)). In the context of time series forecasting, ML methods have been compared to various traditional time series models. There is no empirical evidence that ML methods are able to constantly outperform established time series models (Ahmed et al., 2010; Makridakis et al., 2018a,b). For instance, Carbonneau et al. (2008) argue that ML methods are not better than traditional approaches like moving average or linear regression in the context of monthly demand forecasting. Gür Ali et al. (2009) present a study for weekly forecasts of perishable goods having durability of several days at store–product level. However, they conclude that only ML methods benefit from detailed data. Hence, we study the performance of daily and also intra-day demand forecasts using ML to address this research gap. While we are not aware of literature that discusses intraday forecasting for perishable goods in the retail industry, there are other application areas that are concerned with intraday forecasting. A non-exhaustive list of application areas includes energy load forecasting (Hong and Fan, 2016; Marino et al., 2016), load forecasting of cooling systems (Li et al., 2015), water usage forecasting (Quevedo et al., 2014), forecasting call arrivals in call centers (Barrow and Kourentzes, 2018; Ibrahim et al., 2016), or short-term traffic forecasting (Lv et al., 2015; Ma et al., 2015). In the aforementioned studies, ML methods are frequently applied and are at least a viable contender for traditional time series models. Moreover, models based on long short-term memory (LSTM) neural networks (Hochreiter and Schmidhuber, 1997) are frequently used for intraday forecasting in recent studies (e.g. Ma et al. (2015), Ke et al. (2017), Tian et al. (2018) and Qing and Niu (2018)). Hence, we evaluate if the promising performance of ML also carries over to the retail domain. Therefore, we rely on LSTM models as representatives of ML methods in our empirical evaluation. 
The consideration of a data-driven ML method is reasonable as various studies suggest that the demand is subject to a variety of factors, and ML methods tend to perform well if the data foundation is large (Chu and Zhang, 2003; Arunraj and Ahrens, 2015; Ramanathan and Muyldermans, 2010; Beutel and Minner, 2012; van Donselaar et al., 2016; Kourentzes and Petropoulos, 2016; Huber et al., 2019; Huber and Stuckenschmidt, 2020). Hierarchical forecasting (Gross and Sohl, 1990) can be applied to connect the daily level and the hourly level. The standard approaches are top-down and bottom-up forecasting. However, the results in empirical studies are not conclusive (e.g. Dangerfield and Morris (1992), Zotteri et al. (2005), Widiarta et al. (2009) and Williams and Waller (2011)). In general, top-down forecasting is preferred if the bottom level is very noisy or the demand processes at different levels are comparable such that the information loss is negligible. While most studies are concerned with leveraging the organizational structure, Athanasopoulos et al. (2017) and Kourentzes et al. (2017) discuss the challenges related to temporal hierarchies. They emphasize that the signal-to-noise ratio can be strengthened and that the effect of outliers at lower levels can be reduced. In particular, temporal aggregation allows to reduce intermittency in the time series data (Nikolopoulos et al., 2011; Petropoulos et al., 2016). Reviews on inventory systems for deteriorating goods are provided by Bakker et al. (2012) and Janssen et al. (2016). Hübner and Kuhn (2012) elaborate on in-store logistics planning in the context of shelf space management. van Donselaar et al. (2006) emphasize the importance of developing automated ordering system for perishable goods as their characteristics differ from other product categories. Broekmeulen and van Donselaar (2019) discuss measures to assess the performance of fresh food departments in supermarkets. 
Recent inventory models acknowledge the costs for in-store logistics (van Zelst et al., 2009; Curşeu et al., 2009; van Donselaar et al., 2010; Taube and Minner, 2018; Mou et al., 2018; Turgut et al., 2018). Turan et al. (2017) show that demand uncertainty of perishable goods can be reduced by conducting transshipments and thereby balancing the inventories among the stores. Hofer et al. (2016) and Teller et al. (2018) report that poor on-shelf availability is caused by poor forecasting, inefficient backroom

scenario seems to be very specific and rather unimportant. However, the main characteristics of bakeries and other retail facilities that rely on intraday baking are very similar. Consequently, our proposed solution approach is not limited to the specific bakery but is applicable in several thousand stores. In recent years, all major supermarkets and discounters in Germany have introduced a bake-off section in their stores. The revenue of baked goods in Germany was around 20 billion euros in 2018. Hence, it is reasonable to study this application and to introduce and discuss solution approaches that improve decision quality and enhance operational performance. Moreover, the sustainability of the supply chain can be improved by decreasing the amount of food waste (Parfitt et al., 2010) as a side effect of more precise decisions. In order to compute a baking schedule that supports operational decisions, we need an intraday (e.g. hourly) demand estimation that has to match the daily delivery quantity. Based on the forecasts, we compute a schedule representing the baking plan. In particular, we address the following research questions:
• Is a Machine Learning (ML) method suitable for hourly demand forecasting considering the given application scenario?
• How can the baking plan generation be formulated as a scheduling problem?
• What is the effect of the forecast accuracy on the operational performance?
The remainder of this paper is structured as follows: In Section 2, we outline related work concerning forecasting and in-store logistics. Our solution approach for intraday decision support for baked goods is introduced in Section 3. We present and discuss the results of the empirical evaluation in Section 4. We conclude this paper with a brief summary of the most important results and an outlook on future work in Section 5.

2. Related work Newsvendor models are suitable for setting optimal inventory levels for perishable goods as they capture the trade-off between ordering too much and ordering too little (e.g. see Qin et al. (2011)). Various data-driven solution approaches have been proposed and successfully applied in application scenarios that are similar to our study (Beutel and Minner, 2012; Sachs and Minner, 2014; Ban and Rudin, 2018; Huber et al., 2019). A recent study suggests that improved forecasting dominates other potential benefits of data-driven solution methods with respect to setting ordering quantities of perishable goods (Huber et al., 2019). In this study, we investigate whether this result also applies to the operational performance of intraday baking. Thus, we implicitly rely on a newsvendor model as we evaluate the effect for different target service levels. Fildes et al. (2019) offer the most recent literature survey on retail forecasting. Most existing studies are based on weekly data as daily forecasts are typically not required for most decisions (e.g. Chu and Zhang (2003), Carbonneau et al. (2008) and Gür Ali et al.


Fig. 1. Overview on the different phases of our approach.

Fig. 2. The charts depict the demand distribution of a single product in a store on Sunday at different levels: The left figure illustrates the daily sales distribution. The middle figure shows the hourly sales where each day is represented by its own line. The distribution of the sum of the hourly sales matches the distribution of the daily sales (left). The right figure depicts the relative profiles, i.e., instead of showing the absolute values (center), we provide the percentage of sales in each hour of the daily sales.
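The relation between the three levels shown in Fig. 2 can be illustrated with a minimal NumPy sketch. The sales figures, the function names, and the use of a simple mean profile as the "profile forecast" are assumptions made for illustration only; they are not the forecasting method of this study.

```python
import numpy as np

def relative_profiles(hourly_sales):
    """Convert absolute hourly sales (days x hours) into shares of each day's total."""
    hourly_sales = np.asarray(hourly_sales, dtype=float)
    daily_totals = hourly_sales.sum(axis=1, keepdims=True)
    # Days without any sales get an all-zero profile instead of a division by zero.
    return np.divide(hourly_sales, daily_totals,
                     out=np.zeros_like(hourly_sales), where=daily_totals > 0)

def distribute_top_down(daily_forecast, profile):
    """Spread a daily quantity over the hours according to a relative profile."""
    profile = np.asarray(profile, dtype=float)
    # Renormalize so that the hourly values always sum to the daily quantity.
    return daily_forecast * profile / profile.sum()

# Hypothetical sales of one product on three Sundays, four opening hours each.
sales = [[10, 20, 15, 5], [8, 24, 12, 4], [12, 18, 18, 12]]
profiles = relative_profiles(sales)        # right panel of Fig. 2
mean_profile = profiles.mean(axis=0)       # naive stand-in for a profile forecast
hourly = distribute_top_down(60.0, mean_profile)
```

Because the profile is renormalized before it is applied, the hourly quantities add up to the daily quantity by construction.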

• We explicitly perform hourly forecasting in the retail domain and assess the forecasting performance.
• We conduct an empirical evaluation of our solution approach.
• We evaluate the influence of the prediction model on the operational performance.

operations, and replenishment policies. Reiner et al. (2013) and Teller et al. (2018) also claim that measures taken at store level are highly effective and have an immediate impact. Different characteristics of the scheduling problem that reflects a baking plan have been extensively discussed in the literature. Our objective is to minimize the temporal deviation of baking processes from a target time that depends on the customer demand. This can be modeled by considering earliness and tardiness penalties for the baking processes. Scheduling problems with earliness and/or tardiness penalties have been studied in the context of single machines (e.g. Wan and Yen (2002) and Kedad-Sidhoum and Sourd (2010)), two-machine flow shops (e.g. İşler et al. (2012)), multi-machines (e.g. Zhu and Heady (2000)), permutation flow shop settings (e.g. Schaller and Valente (2013), Hendel and Sourd (2007), Schaller and Valente (2019a,b)), or parallel machines (e.g. Sivrikaya-Şerifoǧlu and Ulusoy (1999), Radhakrishnan and Ventura (2000) and Bilge et al. (2004)). Our application scenario is a parallel machine setting because the stores operate more than one oven (i.e. machine). However, in contrast to multiprocessor tasks (e.g. Wu and Wang (2018)), a job needs to be processed by only one machine, but each machine can process comparable jobs at the same time. Such parallel batch machines with incompatible job families are discussed by Balasubramanian et al. (2004), Mönch et al. (2005, 2006) and Almeder and Mönch (2011). Within the same application domain, Hecker et al. (2013, 2014) study optimization methods to compute the production schedule for baked goods covering all phases from raw materials to distribution by minimizing either the makespan or the total idle time of machines. While the considered production model is more complex, they do not address the challenges related to intraday baking and demand estimation.
With this study, we want to contribute to existing literature in various ways:

3. Methodology We present a solution approach for intraday baking that consists of two distinct phases: forecasting and scheduling (see Fig. 1). The forecasting phase (see Section 3.1) is concerned with providing intraday demand estimations that serve as input for the subsequent scheduling phase (see Section 3.2). The initial forecasts are transformed into jobs associated with deadlines and costs for earliness and tardiness. A job refers to the placement of a baking tray into an oven. Consequently, the jobs are assigned to machines (i.e. ovens) according to certain requirements. The resulting schedule represents the baking plan that can be executed in the stores. We also provide a more detailed example for the baking plan generation as part of the supplemental materials.

3.1. Forecasting An essential input for the baking plan generation is the intraday demand estimation. The considered application scenario requires that the daily delivered order quantity is baked completely on the same day. The goods are ordered on the previous day, which means that we also need to compute the daily demand of each product for the next day. As the sum of the hourly predictions has to match the forecast at the daily level, we exploit the temporal hierarchy. In order to connect both temporal levels, we consider a bottom-up approach and a top-down approach. The bottom-up approach requires the direct computation of the demand forecasts at the hourly level. This

• We propose an intraday decision support system for baked goods that supports intraday baking.


Table 2 Overview of the features used for the machine learning forecasting model.

approach has the advantage that the hourly predictions are directly given and the daily demand can also be easily obtained. However, the data at the hourly level is rather noisy, e.g., some products are not sold or demanded in every hour. Consequently, the accuracy of the resulting predictions at the daily level can be negatively affected. Another option is to compute the daily demand directly, which subsequently has to be distributed to the different hours. The top-down distribution is closely connected to the actual operational process. However, in order to obtain the hourly forecasts, we additionally need to forecast an intraday demand profile. Intraday profiles reflect the percentage of the daily demand that is sold in each hour and can be used to distribute the daily quantity top-down. The advantage of this approach is that the data at the daily level is less noisy than data at the hourly level. Moreover, the intraday demand profiles are more robust to changes in the demand level. Hence, we evaluate forecasts for three different target levels: (1) daily demand, (2) hourly intraday profiles, and (3) hourly demand (see Fig. 2). However, for productive usage of the system, we need either daily forecasts and hourly profile forecasts, or just hourly forecasts. The only requirement is that the sum of the hourly forecasts matches the daily order quantity. For this purpose, we employ a long short-term memory network (LSTM) (Hochreiter and Schmidhuber, 1997), which is a recurrent neural network that processes the input features in sequential order by applying the same network to each step in a sequence. In the context of this work, a sequence $(x_1, \dots, x_t, \dots, x_T)$ of $n$-dimensional vectors ($x_t \in \mathbb{R}^n$) contains information about past demand and additional explanatory data.
The output of the LSTM is also a sequence of vectors, but we are only interested in the last output vector of the sequence, which contains the demand forecast for the next day. An LSTM cell maintains a sophisticated memory concept based on input gates $i_t$, output gates $o_t$, forget gates $f_t$, and a cell state $c_t$ that allows tracking dynamic patterns:

$f_t = \sigma_{sigmoid}(W_f x_t + U_f h_{t-1} + b_f)$   (1)

$i_t = \sigma_{sigmoid}(W_i x_t + U_i h_{t-1} + b_i)$   (2)

$o_t = \sigma_{sigmoid}(W_o x_t + U_o h_{t-1} + b_o)$   (3)

$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_{tanh}(W_c x_t + U_c h_{t-1} + b_c)$   (4)

$h_t = o_t \circ \sigma_{tanh}(c_t)$   (5)
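Equations (1)–(5) can be traced step by step with a plain NumPy sketch. The parameter layout (dictionaries keyed by gate), the helper names, and the random initialization are assumptions for illustration; training (backpropagation through time) and the final linear output layer are not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Eqs. (1)-(5); `*` is the element-wise product."""
    W, U, b = params["W"], params["U"], params["b"]
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate, Eq. (2)
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate, Eq. (3)
    candidate = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f_t * c_prev + i_t * candidate                    # cell state, Eq. (4)
    h_t = o_t * np.tanh(c_t)                                # output, Eq. (5)
    return h_t, c_t

def init_params(n_features, n_hidden, seed=0):
    """Hypothetical small random initialization of W, U and b for all gates."""
    rng = np.random.default_rng(seed)
    gates = "fioc"
    return {
        "W": {g: rng.normal(scale=0.1, size=(n_hidden, n_features)) for g in gates},
        "U": {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for g in gates},
        "b": {g: np.zeros(n_hidden) for g in gates},
    }

def run_sequence(xs, params, n_hidden):
    """Apply the same cell to every vector of the sequence and return h_T."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h

params = init_params(n_features=5, n_hidden=4)
xs = np.random.default_rng(1).normal(size=(8, 5))  # sequence of 8 feature vectors
h_T = run_sequence(xs, params, n_hidden=4)
```

Only the last output `h_T` is kept, mirroring the statement that the last output vector of the sequence carries the forecast.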

Data source | Features
Master data | Store class, product category, opening times (day, hours/duration)
Transactional data | Target variable: lagged sales, rolling median of sales, binary promotional information
External: Calendar | Day of year, month, day of month, weekday, public holiday, day type, bridge day, nonworking day, indicators for each special day, school holidays
External: Weather | Temperature (minimum, mean, maximum) and cloud cover of target day
External: Location | General location (city, suburb, town); in proximity to the store: shops (bakeries, butcher, grocery, kiosk, fast-food, car repair), amenities (worship, medical doctors, hospitals), leisure (playground, sport facility, park), education (kindergarten, school, university)

also consists of external data like calendar information and weather. Moreover, some feature data, such as the characterization of the location, is not specific to a single time series. Such feature data is useful as we train global models that are able to forecast any time series of the dataset. Hence, the model is able to learn patterns across products and stores. An additional advantage of this approach is that we have more training data and need to train fewer models. Hourly forecasts are not sufficient for the generation of a baking plan as the planning granularity of the baking plan is finer than one hour. In this study, the planning granularity of the baking plan is five minutes. However, we only compute hourly forecasts as the data is already quite noisy and difficult to forecast at the hourly level. Moreover, we only have access to aggregated hourly point-of-sales data. In order to obtain a demand estimation for shorter intervals, we linearly distribute the hourly forecasts.

Adding safety stock. A point forecast that is typically provided by an unbiased prediction method leads to a service level of 50%, i.e., the demand is as frequently underestimated as overestimated. However, higher service levels are often desired in the retail industry, which can be achieved by adding safety stock or by directly predicting the respective quantile (see Section 2). We rely on sample average approximation (Kleywegt et al., 2002; Shapiro, 2003) to obtain the order quantities $\hat{q}$ for higher service levels. Hence, the order quantity is computed by adding the quantile that reflects the target service level $SL$ from the empirical distribution of past forecast errors $\epsilon_i$ to the point forecast $\hat{y}$:

$\hat{q} = \hat{y} + \inf\left\{ p : \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(\epsilon_i \le p) \ge SL \right\}$.   (6)
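The two computations in this passage, the split of hourly forecasts into five-minute steps and the safety stock of Eq. (6), can be sketched as follows. The sketch assumes that errors are defined as actual demand minus forecast and that "linearly distribute" means an even split across the steps of an hour; both are interpretations, and the function names and numbers are hypothetical.

```python
import math
import numpy as np

def order_quantity(point_forecast, past_errors, service_level):
    """Eq. (6): add the empirical service-level quantile of past forecast errors."""
    errors = np.sort(np.asarray(past_errors, dtype=float))
    n = len(errors)
    # The infimum in Eq. (6) is the k-th smallest error with k = ceil(SL * n);
    # the small tolerance guards against floating-point noise in SL * n.
    k = max(math.ceil(service_level * n - 1e-12), 1)
    return point_forecast + errors[k - 1]

def split_hour_evenly(hourly_forecast, steps_per_hour=12):
    """Distribute each hourly value evenly over its planning steps (5 min each)."""
    hourly_forecast = np.asarray(hourly_forecast, dtype=float)
    return np.repeat(hourly_forecast / steps_per_hour, steps_per_hour)

# Hypothetical past errors (actual minus forecast) of one product.
errors = [-4, -3, -2, -1, 0, 1, 2, 3, 5, 8]
q_50 = order_quantity(40.0, errors, 0.5)   # median error is added
q_90 = order_quantity(40.0, errors, 0.9)   # 90% of past errors are covered
steps = split_hour_evenly([6.0, 12.0])     # two hours -> 24 five-minute steps
```

With these numbers, the 90% service level adds the ninth-smallest error (5) to the point forecast, whereas the 50% level adds the fifth-smallest error (0).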

The operator $\circ$ is the Hadamard product, i.e., element-wise multiplication of matrices and vectors having the same dimension. The parameters of an LSTM unit are $W$, $U$, and $b$. An LSTM cell processes the current feature vector $x_t$ and combines it with information that has already been extracted from previous vectors of the sequence and is encoded in the cell state $c_{t-1}$. New information is consequently added to the cell state, while the network learns during training how to extract relevant information. The output of an LSTM cell $h_t$ can be passed through a linear activation function, which is suitable for a regression task, i.e., the output after the final vector of the sequence $h_T$ reflects the prediction. The parameters are trained by applying backpropagation through time in combination with stochastic gradient descent. For this purpose, we rely on a variant called ADAM (Kingma and Ba, 2015), which has been shown to be effective as it maintains and adapts learning rates for each parameter. The input features of the LSTM are lagged time series observations (lags: 14, 7, 6, …, 1), which cover the complete last week and the observation on the same weekday two weeks ago, as the demand for baked goods is subject to a strong weekly seasonality. For daily forecasts, the network only forecasts the demand for the next day. At the hourly level (demand and profile), the network predicts all hourly values at once, i.e., the output dimension reflects the maximum number of opening hours. Consequently, the lagged observations are also included for each hour. Additionally, we enhance each step of the sequence with explanatory feature data (see Table 2). The feature data is not only derived from the enterprise resource planning system of the company but

3.2. Scheduling The output of the forecasting phase is the expected demand at the hourly level. In the next phase, those forecasts are used to compute the actual recommendation for action with respect to intraday baking. The store personnel need to know when the different items have to be baked and consequently placed on the shelves. Hence, our goal is to provide a baking plan, i.e., a schedule that supports the decisions. A baking plan considers all relevant articles and needs to be provided per store and day. We formulate a mixed-integer program for this problem that is closely connected to the actual process in the store. The main task for the personnel is to place the baking trays into the ovens. Hence, we model this problem such that placing a baking tray into an oven is a job that needs to be scheduled. The subsequent terminology is based on Pinedo (2016). The jobs ($j$) can be derived from the forecasts by distributing the expected demand to baking trays. Only items of one product can be placed on a baking tray. The number of items that can be placed on a baking tray is fixed per product. Additionally, each product is assigned


a job ($T_j^{max} = T - r_j + 1$) as a started job has to be finished during the

to a baking program that specifies the duration of a job ($r_j$). The runtime of a program has to be provided as a number of planning steps (i.e. an integer value), whereby the planning steps have a fixed duration (e.g. 5 min). Products belonging to the same program, e.g., similar buns, can be processed in parallel and form job families ($i$). There is no service time for program changes after successful completion, and the time for (un)loading the machine can be disregarded. The deadline $d_j$ of a job is the time at which the first item on a baking tray is expected to be sold minus the duration of the baking program. It is desired to bake the products as close to the deadline as possible because late jobs cause stock-outs, while the freshness of the goods is reduced if they are baked earlier than necessary. Thus, we introduce costs for earliness $e_j$ and tardiness $t_j$ for each job. The costs $w^e_j$ and $w^t_j$ reflect the expected average revenue per time instant of the job. We consider symmetric penalties, but also study the effects of increased tardiness costs. In summary, the jobs are associated with the following information: job id, article id, quantity, family id, deadline, duration, earliness costs, tardiness costs. The jobs need to be assigned to the ovens, which are the machines of the scheduling problem. The ovens are only characterized by their capacity, which is the number of baking trays that can be processed at the same time: machine id, capacity. The stores typically operate more than one oven, which means that we have a parallel machine environment. The ovens should be loaded to their full capacity. A job has to be processed by exactly one machine, and a started baking process cannot be interrupted.
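The derivation of jobs from the demand estimate can be sketched as follows. This is a minimal illustration under stated assumptions: planning steps are indexed from 0, the last tray may be only partially filled, deadlines are clipped at step 0, and the cost fields are omitted. The `Job` class and `derive_jobs` function are hypothetical names, not part of the original system.

```python
import math
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    article_id: int
    quantity: int    # items on the tray
    family_id: int   # baking program shared by the job family
    deadline: int    # planning step at which baking should start
    duration: int    # program runtime in planning steps

def derive_jobs(article_id, family_id, step_demand, tray_size, duration):
    """Split the expected demand of one article into tray-sized jobs."""
    total_items = math.ceil(sum(step_demand) - 1e-9)
    jobs = []
    cumulative, step = 0.0, 0
    for first_item in range(0, total_items, tray_size):
        # Advance to the step in which the first item of this tray is expected
        # to be sold, i.e. where cumulative demand first exceeds `first_item`.
        while (step < len(step_demand) - 1
               and cumulative + step_demand[step] <= first_item):
            cumulative += step_demand[step]
            step += 1
        quantity = min(tray_size, total_items - first_item)
        # Deadline = expected time of first sale minus the baking duration.
        deadline = max(step - duration, 0)
        jobs.append(Job(len(jobs), article_id, quantity, family_id,
                        deadline, duration))
    return jobs

# Hypothetical demand per planning step for one article, 8 items per tray.
jobs = derive_jobs(article_id=11, family_id=1,
                   step_demand=[2, 3, 5, 5, 5], tray_size=8, duration=1)
```

In this example the 20 expected items yield three trays; the second tray's first item is expected in step 2, so baking should start in step 1.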
After completion of a baking process, the items on the baking trays are placed on the shelves and can be purchased by the customers. We formulate the following integer linear program to solve the scheduling problem:

planning horizon. Hence, we define the following decision variables:
(V1) $0 \le s_j \le T$, $\forall j$
(V2) $0 \le e_j \le T$, $\forall j$
(V3) $0 \le t_j \le T$, $\forall j$
(V4) $x^s_j \in \{0, 1\}$, $\forall j$
(V5) $x^f_j \in \{0, 1\}$, $\forall j \in [1, J]$
(V6) $x_{jt} \in \{0, 1\}$, $\forall j \in [1, J], \forall t \in [1, T_j^{max}]$
(V7) $x_{jkt} \in \{0, 1\}$, $\forall j \in [1, J], \forall k \in [1, K], \forall t \in [1, T]$
(V8) $y_{jk} \in \{0, 1\}$, $\forall j \in [1, J], \forall k \in [1, K]$
(V9) $m_{ikt} \in \{0, 1\}$, $\forall i \in [1, I], \forall k \in [1, K], \forall t \in [1, T_i^{max}]$
(V10) $m_{kt} \in \{0, 1\}$, $\forall k \in [1, K], \forall t \in [2, T]$

minimize

đ??˝ â&#x2C6;&#x2018;

(đ?&#x2018;¤đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; + đ?&#x2018;¤đ?&#x2018;Ąđ?&#x2018;&#x2014; đ?&#x2018;Ąđ?&#x2018;&#x2014; ) +

đ?&#x2018;&#x2014;=1

đ??˝ â&#x2C6;&#x2018; (100 â&#x2039;&#x2026; đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; â&#x2039;&#x2026; đ?&#x2018;&#x2021; ) đ?&#x2018;Ľđ?&#x2018;&#x201C;đ?&#x2018;&#x2014; đ?&#x2018;&#x2014;=1

The objective function is subject to a set of constraints (C1â&#x20AC;&#x201C;C22). The first group of constraints comprises cardinality and count constraints: (C1a)

â&#x2C6;&#x2018;đ?&#x2018;&#x2021;đ?&#x2018;&#x2014;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ

(C1b)

đ?&#x2018;Ľđ?&#x2018;&#x201C;đ?&#x2018;&#x2014;

(C1c)

â&#x2030;¤1

â&#x2C6;&#x20AC;đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ]

đ?&#x2018;Ľđ?&#x2018; đ?&#x2018;&#x2014; + đ?&#x2018;Ľđ?&#x2018;&#x201C;đ?&#x2018;&#x2014;

â&#x2C6;&#x20AC;đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ]

(C2)

â&#x2C6;&#x2018;đ??ž

â&#x2C6;&#x20AC;đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ]

(C3)

â&#x2C6;&#x2018;đ??ž

â&#x2030;¤1

â&#x2C6;&#x20AC;đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ], â&#x2C6;&#x20AC;đ?&#x2018;Ą â&#x2C6;&#x2C6; [1, đ?&#x2018;&#x2021; ]

â&#x2C6;&#x20AC;đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ]

â&#x2030;¤ đ?&#x2018;?đ?&#x2018;&#x2DC;

â&#x2C6;&#x20AC;đ?&#x2018;&#x2DC; â&#x2C6;&#x2C6; [1, đ??ž], â&#x2C6;&#x20AC;đ?&#x2018;Ą â&#x2C6;&#x2C6; [1, đ?&#x2018;&#x2021; ]

(C4)

â&#x20AC;˘ time đ?&#x2018;Ą â&#x2C6;&#x2C6; [1, đ?&#x2018;&#x2021; ]; đ?&#x2018;&#x2021; also reflects the planning horizon â&#x20AC;˘ machine đ?&#x2018;&#x2DC; â&#x2C6;&#x2C6; [1, đ??ž]

(C5)

đ?&#x2018;Ą=1

đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą

â&#x2C6;&#x2018;đ?&#x2018;&#x2021;đ?&#x2018;&#x2014;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ đ?&#x2018;Ą=1

đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą

đ?&#x2018; đ?&#x2018;&#x2DC;=1 đ?&#x2018;Śđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC; â&#x2C6;&#x2019; đ?&#x2018;Ľđ?&#x2018;&#x2014;

đ?&#x2018;&#x2DC;=1 đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC;đ?&#x2018;Ą â&#x2C6;&#x2018;đ??ž â&#x2C6;&#x2018;đ?&#x2018;&#x2021;

đ?&#x2018; đ?&#x2018;Ą=1 đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC;đ?&#x2018;Ą â&#x2C6;&#x2019; đ?&#x2018;&#x;đ?&#x2018;&#x2014; đ?&#x2018;Ľđ?&#x2018;&#x2014;

đ?&#x2018;&#x2DC;=1

â&#x2C6;&#x2018;đ??˝ đ?&#x2018;&#x2014;=1

đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC;đ?&#x2018;Ą

Constraint C1a ensures that each job is started at most once and constraints C1b and C1c monitor if the job could be successfully scheduled. In general, a job is processed by at most one machine if the job could be scheduled (C2) and also per time instant (C3). The sum of the activities of a job has to match its processing duration (C4) if a job has been successfully scheduled. Moreover, the capacity of a machine cannot be exceeded (C5). We define the following constraints in order to obtain the starting time đ?&#x2018; đ?&#x2018;&#x2014; of a scheduled job (C7) and to determine if the job is early or late (C6):

â&#x20AC;&#x201C; đ?&#x2018;?đ?&#x2018;&#x2DC; : capacity of machine đ?&#x2018;&#x2DC; â&#x20AC;˘ job đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ] â&#x20AC;&#x201C; đ?&#x2018;&#x2018;đ?&#x2018;&#x2014; : deadline of job đ?&#x2018;&#x2014; â&#x20AC;&#x201C; đ?&#x2018;&#x;đ?&#x2018;&#x2014; : duration of job đ?&#x2018;&#x2014; â&#x20AC;&#x201C; đ?&#x2018;&#x201C;đ?&#x2018;&#x2014; : job family of job đ?&#x2018;&#x2014; â&#x20AC;&#x201C; đ?&#x2018;¤đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; , đ?&#x2018;¤đ?&#x2018;Ąđ?&#x2018;&#x2014; : penalties for earliness, tardiness of job đ?&#x2018;&#x2014;

(C6)

â&#x20AC;˘ job family đ?&#x2018;&#x2013; â&#x2C6;&#x2C6; [1, đ??ź]

(C7) â&#x20AC;&#x201C; đ?&#x2018;&#x;đ?&#x2018;&#x2013; : duration of jobs in job family đ?&#x2018;&#x2013;

đ?&#x2018; đ?&#x2018;&#x2014; â&#x2C6;&#x2019; đ?&#x2018;Ą đ?&#x2018;&#x2014; + đ?&#x2018;&#x2019; đ?&#x2018;&#x2014; â&#x2C6;&#x2018;đ?&#x2018;&#x2021;đ?&#x2018;&#x2014;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ (đ?&#x2018;Ą đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą ) â&#x2C6;&#x2019; đ?&#x2018; đ?&#x2018;&#x2014; đ?&#x2018;Ą=1

= đ?&#x2018;&#x2018;đ?&#x2018;&#x2014;

â&#x2C6;&#x20AC;đ?&#x2018;&#x2014; â&#x2C6;&#x2C6; [1, đ??˝ ]

The constraint C7 connects the binary starting variables of the jobs đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą to a numeric value đ?&#x2018; đ?&#x2018;&#x2014; that can be used to calculate the earliness đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; and tardiness đ?&#x2018;Ąđ?&#x2018;&#x2014; (C6) which are used in the objective function of the linear program. The next group of constraints models the machine activity:

Variables: â&#x20AC;˘ đ?&#x2018; đ?&#x2018;&#x2014; : start time, đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; : earliness, đ?&#x2018;Ąđ?&#x2018;&#x2014; : tardiness of job đ?&#x2018;&#x2014; â&#x20AC;˘ đ?&#x2018;Ľđ?&#x2018; đ?&#x2018;&#x2014; : job đ?&#x2018;&#x2014; could be scheduled â&#x20AC;˘

đ??˝] đ??˝] đ??˝] đ??˝]

The objective is to minimize the earlinessâ&#x20AC;&#x201C;tardiness costs of the jobs while it should be avoided that jobs are not scheduled (i.e. due to capacity constraints):

Sets, Indices, Parameters:

đ?&#x2018;Ľđ?&#x2018;&#x201C;đ?&#x2018;&#x2014; :

â&#x2C6;&#x2C6; [1, â&#x2C6;&#x2C6; [1, â&#x2C6;&#x2C6; [1, â&#x2C6;&#x2C6; [1,

â&#x2C6;&#x2018;đ??ź

(C8)

đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;đ?&#x2018;Ą +

đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą

â&#x2030;¤1

â&#x2C6;&#x20AC;đ?&#x2018;&#x2DC; â&#x2C6;&#x2C6; [1, đ??ž], â&#x2C6;&#x20AC;đ?&#x2018;Ą â&#x2C6;&#x2C6; [1, đ?&#x2018;&#x2021;đ?&#x2018;&#x2013;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ ]

(C9)

â&#x2030;¤1

â&#x2C6;&#x20AC;đ?&#x2018;&#x2013; â&#x2C6;&#x2C6; [1, đ??ź], â&#x2C6;&#x20AC;đ?&#x2018;&#x2DC; â&#x2C6;&#x2C6; [1, đ??ž], â&#x2C6;&#x20AC;đ?&#x2018;Ą â&#x2C6;&#x2C6; [1, đ?&#x2018;&#x2021;đ?&#x2018;&#x2013;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ ]

â&#x2030;¤1

đ?&#x2018;&#x2013;=1

â&#x20AC;˘ đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC;đ?&#x2018;Ą : job đ?&#x2018;&#x2014; is processed by machine đ?&#x2018;&#x2DC; at time đ?&#x2018;Ą

(C11)

đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;đ?&#x2018;Ą + đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą â&#x2C6;&#x2018;đ?&#x2018;Ą+đ?&#x2018;&#x;đ?&#x2018;&#x2013; â&#x2C6;&#x2019;1 đ?&#x2018;&#x;đ?&#x2018;&#x2013; đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą â&#x2C6;&#x2019; đ?&#x2018;Ąâ&#x20AC;˛ =đ?&#x2018;Ą+1 đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;đ?&#x2018;Ąâ&#x20AC;˛ â&#x2C6;&#x2018;đ?&#x2018;Ąâ&#x2C6;&#x2019;1 đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;đ?&#x2018;Ą â&#x2C6;&#x2019; đ?&#x2018;Ąâ&#x20AC;˛ =đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ(1,đ?&#x2018;Ąâ&#x2C6;&#x2019;đ?&#x2018;&#x; +1) đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ąâ&#x20AC;˛

â&#x2030;¤0

â&#x20AC;˘ đ?&#x2018;Śđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC; : job đ?&#x2018;&#x2014; is processed by machine đ?&#x2018;&#x2DC;

(C12)

đ?&#x2018;&#x161;đ?&#x2018;&#x2013;â&#x20AC;˛ đ?&#x2018;&#x2DC;đ?&#x2018;Ą + đ?&#x2018;&#x161;đ?&#x2018;&#x2013;â&#x20AC;˛â&#x20AC;˛ đ?&#x2018;&#x2DC;đ?&#x2018;Ą

â&#x2030;¤1

â&#x2C6;&#x20AC;đ?&#x2018;&#x2013;â&#x20AC;˛ , đ?&#x2018;&#x2013;â&#x20AC;˛â&#x20AC;˛ â&#x2C6;&#x2C6; [1, đ??ź], â&#x2C6;&#x20AC;đ?&#x2018;&#x2DC; â&#x2C6;&#x2C6; [1, đ??ž],

â&#x20AC;˘ đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą : program đ?&#x2018;&#x2013; is started on machine đ?&#x2018;&#x2DC; at time đ?&#x2018;Ą â&#x20AC;˘ đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;đ?&#x2018;Ą : machine đ?&#x2018;&#x2DC; is active at time đ?&#x2018;Ą

(C13)

đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą + đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;(đ?&#x2018;Ą+đ?&#x2018;&#x;đ?&#x2018;&#x2013; )

â&#x2030;¤1

job đ?&#x2018;&#x2014; could not be scheduled

â&#x20AC;˘ đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą : job đ?&#x2018;&#x2014; is started at time đ?&#x2018;Ą

(C10)

â&#x2C6;&#x20AC;đ?&#x2018;Ą â&#x2C6;&#x2C6; [1, đ?&#x2018;&#x2021;đ?&#x2018;&#x2013;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ ], đ?&#x2018;&#x2013;â&#x20AC;˛ â&#x2030; đ?&#x2018;&#x2013;â&#x20AC;˛â&#x20AC;˛

If a machine is not idle at a given point in time, it can either be active or a program is started (C8, C9). The machine has to be active during the duration of a program, i.e., a program cannot be interrupted (C10), and a machine can only be active if a program has been started before (C11). Only one program can be started at a time on each machine

The variables đ?&#x2018; đ?&#x2018;&#x2014; , đ?&#x2018;&#x2019;đ?&#x2018;&#x2014; , and đ?&#x2018;Ąđ?&#x2018;&#x2014; are integer variables in the range from 0 to đ?&#x2018;&#x2021; . The other variables

(đ?&#x2018;Ľđ?&#x2018; đ?&#x2018;&#x2014; , đ?&#x2018;Ľđ?&#x2018;&#x201C;đ?&#x2018;&#x2014; ,

đ?&#x2018;&#x2013;

đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą , đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC;đ?&#x2018;Ą , đ?&#x2018;Śđ?&#x2018;&#x2014;đ?&#x2018;&#x2DC; , đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą , đ?&#x2018;&#x161;đ?&#x2018;&#x2DC;đ?&#x2018;Ą ) are all binary and

only one if the respective event is true. The variables đ?&#x2018;Ľđ?&#x2018;&#x2014;đ?&#x2018;Ą and đ?&#x2018;&#x161;đ?&#x2018;&#x2013;đ?&#x2018;&#x2DC;đ?&#x2018;Ą are only defined from 1 to đ?&#x2018;&#x2021;đ?&#x2018;&#x2014;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ (đ?&#x2018;&#x2021;đ?&#x2018;&#x2013;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ ). đ?&#x2018;&#x2021;đ?&#x2018;&#x2014;đ?&#x2018;&#x161;đ?&#x2018;&#x17D;đ?&#x2018;Ľ is the latest starting time of 5


Fig. 3. The figures illustrate the rolling scheduling approach. For each planning window ($w_1$, $w_2$), we consider jobs having deadlines between A (or earlier) and B. The selected jobs are then scheduled within the time window between A and C. After schedule optimization, jobs having starting times between A and B will be fixed. Scheduled jobs that are planned to start after point B will be postponed to the next window (see Fig. 3a). Hence, planning window $w_2$ depends on all previous planning windows, e.g., $w_1$. An alternative is to fix point A for all windows (see Fig. 3b), which allows a job to be scheduled at any earlier time and increases flexibility.

if it is not yet used for the duration of the job or if the correct program is started and the machine is not fully occupied. As the starting time of a job is most important with respect to the objective function, we sort the starting times by their absolute deviation from the deadline of the job, i.e., times that are close to the deadline are tested first. For each possible starting time, we iterate over the list of ovens, which is sorted in descending order of the oven capacity. The algorithm continues with the next job as soon as a suitable assignment is found.
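The greedy heuristic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Job` class, the function names, and the slot-based time axis are hypothetical, all jobs of one family are assumed to share the same duration, and each job is assumed to occupy one unit of oven capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    family: str      # baking program f_j; jobs of one family share a duration
    deadline: int    # d_j, the slot in which the job should ideally start
    duration: int    # r_j, number of consecutive slots the oven is occupied
    w_early: float = 1.0
    w_tardy: float = 1.0


def fits(job, oven, start, runs, busy, capacities):
    """True if the job can join or open a program on this oven at this start."""
    run = runs[oven].get(start)
    if run is not None:
        # A program already starts here: join it only if it is the correct
        # program and the oven is not fully occupied.
        family, load = run
        return family == job.family and load < capacities[oven]
    # Otherwise the oven must be unused for the whole duration of the job.
    free = busy[oven].isdisjoint(range(start, start + job.duration))
    return capacities[oven] >= 1 and free


def greedy_schedule(jobs, capacities, horizon):
    """Return {job name: (oven index, start slot)}; jobs without a slot are omitted."""
    # Ovens in descending order of capacity.
    ovens = sorted(range(len(capacities)), key=lambda k: -capacities[k])
    runs = {k: {} for k in ovens}      # oven -> {start slot: [family, load]}
    busy = {k: set() for k in ovens}   # oven -> occupied slots
    plan = {}
    # Ascending deadline, then descending penalties.
    ordered = sorted(jobs, key=lambda j: (j.deadline, -j.w_tardy, -j.w_early))
    for job in ordered:
        # Candidate start slots, closest to the deadline first.
        starts = sorted(range(1, horizon - job.duration + 2),
                        key=lambda t: abs(t - job.deadline))
        choice = next(((k, t) for t in starts for k in ovens
                       if fits(job, k, t, runs, busy, capacities)), None)
        if choice is None:
            continue  # the job could not be scheduled
        oven, start = choice
        if start in runs[oven]:
            runs[oven][start][1] += 1
        else:
            runs[oven][start] = [job.family, 1]
            busy[oven].update(range(start, start + job.duration))
        plan[job.name] = (oven, start)
    return plan
```

With two ovens of capacity 1 and 2 and four jobs that all share the same deadline, the sketch fills the larger oven first, moves a job of a different family to the smaller oven, and shifts the remaining job to the closest free start slot.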

(C12), and a machine cannot be active directly after a program ends (C13). The remaining constraints ensure that the jobs are assigned to machines and processed accordingly:

(C14) $y_{jk} - x_{jkt} \ge 0$
(C15) $x_{jkt} + m_{ikt} - x_{jt} \le 1 \quad \forall j \in [1, J], \forall k \in [1, K], \forall t \in [1, T_j^{max}], i = f_j$
(C16) $x_{jt} - \sum_{k=1}^{K} m_{ikt} \le 0 \quad \forall j \in [1, J], \forall t \in [1, T_j^{max}], i = f_j$
(C17) $\sum_{k=1}^{K} x_{jkt'} - x_{jt} \ge 0 \quad \forall j \in [1, J], \forall t \in [1, T_j^{max}], \forall t' \in \{t + s \mid 0 \le s < r_j\}$
(C18) $\sum_{t=1}^{T_j^{max}} m_{ikt} - y_{jk} \ge 0 \quad \forall j \in [1, J], \forall k \in [1, K], i = f_j$
(C19) $x_{jkt} - m_{ikt} - m_{kt} \le 0$
(C20) $m_{ikt} + x_{jkt} - m_{kt'} \le 1 \quad \forall j \in [1, J], \forall k \in [1, K], \forall t \in [1, T_j^{max}], t' \in \{t + s \mid 1 \le s < r_j\}, i = f_j$
(C21) $\sum_{j=1}^{J} x_{jkt} - m_{kt} \ge 0 \quad \forall k \in [1, K], \forall t \in [1, T]$
(C22) $\sum_{j=1}^{J} x_{jkt} - m_{ikt} \ge 0$

Reduction of the problem size
The problem size of the aforementioned scheduling program can be reduced in order to make solving it more feasible. It is a requirement to fully load the ovens as the store staff should not unnecessarily interrupt their other tasks, which are mostly related to serving the customers, and energy costs for running the ovens are also a factor. Jobs belonging to the same family can be processed concurrently by an oven. Hence, jobs of a job family can be grouped by the size of the smallest oven after they are ordered by their deadlines. As an oven is able to process at least four baking trays at a time and the capacities of the larger ovens are multiples of that of the smallest oven, the problem size with respect to the jobs that need to be scheduled can be reduced by roughly 75%. The deadline of a derived job is the earliest deadline, and the penalties for earliness and tardiness are the average penalties of the grouped jobs. Additionally, the capacity of the ovens is also reduced accordingly, i.e., the capacity of the smallest oven is one, which also reduces the complexity of the problem.
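The grouping step for reducing the problem size can be illustrated with a short Python sketch. The dictionary layout of a job, the `trays` field, and the function name are assumptions made for this illustration only; the grouping rule itself (order by deadline within a family, group by the size of the smallest oven, take the earliest deadline and the average penalties, rescale the oven capacities) follows the description in the text.

```python
from collections import defaultdict
from statistics import mean


def group_jobs(jobs, capacities):
    """Group jobs of a family into derived jobs of the smallest oven's size.

    jobs: list of dicts with the keys family, deadline, w_early, w_tardy.
    capacities: oven capacities in trays, assumed to be multiples of the smallest.
    """
    size = min(capacities)
    by_family = defaultdict(list)
    for job in jobs:
        by_family[job["family"]].append(job)

    derived = []
    for family, members in by_family.items():
        members.sort(key=lambda j: j["deadline"])   # order by deadline first
        for i in range(0, len(members), size):
            chunk = members[i:i + size]
            derived.append({
                "family": family,
                "deadline": min(j["deadline"] for j in chunk),   # earliest deadline
                "w_early": mean(j["w_early"] for j in chunk),    # average penalties
                "w_tardy": mean(j["w_tardy"] for j in chunk),
                "trays": len(chunk),
            })
    # The smallest oven gets capacity one, larger ovens scale accordingly.
    reduced = [c // size for c in capacities]
    return derived, reduced
```

With a smallest oven of four trays, every full group replaces four original jobs by one derived job, which corresponds to the reduction of roughly 75% mentioned above.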

The constraint (C14) ensures that each job is assigned to the machine it is processed on. If a job is active and associated with a machine that starts a program at the same time, the job has to be started at this time instant (C15). Constraint C16 states that a machine has to be started when a job is started. If a job is started, it has to be active during its duration, i.e., a running job cannot be interrupted (C17). A suitable program has to be started on a machine if a job is assigned to it (C18) and a machine has to be either started or active if a job is active (C19). A machine must be active during the subsequent steps after the start of a program (C20). Finally, a job has to be active if a machine is active (C21) or started (C22). A linear program solver can solve this scheduling problem. There are different possibilities to obtain the baking plan (see Table 1) from the schedule as several variables provide information about the assignment of the jobs to the machines and the starting times of the jobs. For instance, $x_{jt}$ or $s_j$ in combination with $y_{jk}$ can be used for

Moreover, planning the full day at once is not necessary as decisions in the morning hardly influence decisions in the afternoon. This allows us to employ a rolling scheduling approach (Sridharan et al., 1987) to solve the scheduling problem as illustrated in Fig. 3. We divide a day into windows that are optimized in sequence. As the windows overlap, it is possible that jobs that are scheduled in a previous window are still in process in the subsequent windows. In order to block the machines in the subsequent (and previous) planning steps, we manually set the values of $m_{ikt}$ and $m_{kt}$ to zero. The support of rolling scheduling also makes the problem formulation directly applicable to an application scenario where intraday updates are possible (e.g. bake-off in supermarkets). At any point during the day, it is not only possible to update the schedule for the remaining part of the day, but the jobs that need to be scheduled can also be changed. Moreover, we also consider a variant which fixes the start of the planning window and allows jobs to be scheduled earlier and not only later (see Fig. 3b). In the empirical evaluation of this study, we always compute the schedule at the beginning of the day and rely on the latter variant of the rolling approach.
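The window logic of the rolling approach can be sketched as follows. This is an illustration under assumptions, not the procedure used in the study: the parameters `step` and `overlap`, the callback signature `solve(jobs, window_start, window_end, fixed)`, and the single-machine stand-in solver `toy_solve` are all hypothetical; in practice the callback would wrap the integer program or the heuristic.

```python
def rolling_schedule(deadlines, horizon, step, overlap, solve, fixed_start=False):
    """Schedule jobs window by window and return {job: fixed start slot}.

    deadlines: {job: deadline slot}. For each window, A is its first slot,
    B its last slot, and C = B + overlap the end of the scheduling range.
    """
    fixed = {}
    pending = dict(deadlines)
    a = 1
    while a <= horizon and pending:
        b = min(a + step - 1, horizon)
        c = min(b + overlap, horizon)
        lower = 1 if fixed_start else a          # variant of Fig. 3b vs. Fig. 3a
        # Jobs with deadlines up to B, including earlier ones still pending.
        selected = {j: d for j, d in pending.items() if d <= b}
        starts = solve(selected, lower, c, fixed)
        for job, start in starts.items():
            if start <= b:                        # fix jobs starting up to B
                fixed[job] = start
                del pending[job]                  # later starts are postponed
        a = b + 1
    return fixed


def toy_solve(jobs, window_start, window_end, fixed):
    """Stand-in solver: one machine, one job per slot, closest free slot wins."""
    taken = set(fixed.values())
    result = {}
    for job, deadline in sorted(jobs.items(), key=lambda item: item[1]):
        slots = sorted(range(window_start, window_end + 1),
                       key=lambda t: abs(t - deadline))
        for t in slots:
            if t not in taken:
                taken.add(t)
                result[job] = t
                break
    return result
```

In this sketch a job whose tentative start lies beyond B simply stays in the pending set and is reconsidered in the next window, which mirrors the postponement shown in Fig. 3a.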

this purpose. The resulting schedule is provided to the personnel in the stores who are responsible for filling the ovens and the shelves.

Heuristic
The linear program can be solved by a commercial solver, but the required runtime and computational costs can be an obstacle for practical application. Hence, we also consider a simple heuristic that is based on a greedy assignment of the jobs to time slots and machines. For each job, we test combinations of starting times and machines until a suitable free slot in a machine is found. For this purpose, we sort the jobs in ascending order of their deadline ($d_j$) and in descending order of their penalty ($w^t$, $w^e$). Subsequently, we iterate over the sorted list of jobs. For each job, we try to assign it to a time slot on a machine, i.e., we test if a machine is available at specific times. A machine is available


Table 3
Daily forecast performance based on 24,194 observations per method. The bottom-up predictions are obtained from the direct hourly forecasts (see Table 6). The best method in each column is statistically different from the other evaluated methods.

| Method   | Mode      | MASE  | SMAPE | RMSE  | MAE   | Service level (%) | Fill rate (%) | Overage rate (%) | Loss (%) |
|----------|-----------|-------|-------|-------|-------|-------------------|---------------|------------------|----------|
| S-Naïve  | Direct    | 0.980 | 29.47 | 66.27 | 21.39 | 52.3              | 88.2          | 21.5             | 33.3     |
| S-Naïve  | Bottom-up | 0.980 | 29.47 | 66.27 | 21.39 | 52.3              | 88.2          | 21.5             | 33.3     |
| S-Mean   | Direct    | 0.814 | 23.74 | 47.94 | 17.83 | 55.7              | 92.1          | 20.3             | 28.2     |
| S-Mean   | Bottom-up | 0.818 | 24.05 | 48.41 | 17.86 | 54.3              | 91.5          | 19.5             | 28.0     |
| S-Median | Direct    | 0.773 | 22.77 | 43.96 | 16.30 | 53.8              | 91.4          | 17.4             | 26.1     |
| S-Median | Bottom-up | 0.852 | 27.96 | 44.95 | 16.84 | 37.0              | 84.0          | 10.1             | 26.1     |
| ETS      | Direct    | 0.726 | 21.72 | 42.42 | 15.62 | 55.7              | 92.5          | 17.5             | 24.9     |
| ETS      | Bottom-up | 0.735 | 22.06 | 42.15 | 15.77 | 58.4              | 93.0          | 18.5             | 25.5     |
| LSTM     | Direct    | 0.674 | 20.06 | 34.38 | 14.67 | 50.4              | 92.0          | 14.9             | 23.0     |
| LSTM     | Bottom-up | 0.881 | 27.56 | 33.08 | 16.65 | 19.4              | 80.0          | 4.9              | 24.9     |

4. Empirical evaluation
The empirical evaluation aims to assess the performance of the introduced approach and falls into two parts: First, we focus on the forecast accuracy (see Section 4.2.1) of the intraday and daily forecasts. Second, we evaluate the impact of the forecast performance on the inventory, or rather the operational performance, in Section 4.2.2.

  – top-down: We distribute the daily forecasts using hourly profile predictions (mode: one).

We evaluate all possible approaches and only use the best approach for the computation of the baking plans. In addition to the forecasts, we also investigate the effect of higher service levels with respect to the daily order quantity. In order to evaluate the forecasts, we compute the mean absolute scaled error (MASE) (Hyndman and Koehler, 2006), the symmetric mean absolute percentage error (SMAPE), the root mean square error (RMSE), and the mean absolute error (MAE).

$\mathrm{MASE} = \frac{1}{N} \sum_{n} \frac{|Y_n - \hat{Y}_n|}{\frac{1}{T-m} \sum_{t=m+1}^{T} |Y_t - Y_{t-m}|}$   (7)

$\mathrm{SMAPE} = \frac{1}{N} \sum_{n} \frac{|Y_n - \hat{Y}_n|}{(Y_n + \hat{Y}_n)/2}$   (8)

$\mathrm{RMSE} = \sqrt{\frac{\sum_{n} (Y_n - \hat{Y}_n)^2}{N}}$   (9)

$\mathrm{MAE} = \frac{1}{N} \sum_{n} |Y_n - \hat{Y}_n|$   (10)
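The four accuracy measures can be computed with plain Python as sketched below. The function names and argument order are assumptions made for this illustration; `train` stands for the training part of a series and `m` for the seasonal period used by the seasonal naïve reference. SMAPE is returned as a fraction here, whereas the tables appear to report it in percent.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


def smape(actual, forecast):
    """Symmetric mean absolute percentage error (as a fraction).

    Assumption: y + f > 0 for every pair; two zeros would divide by zero.
    """
    terms = (abs(y - f) / ((y + f) / 2) for y, f in zip(actual, forecast))
    return sum(terms) / len(actual)


def mase(actual, forecast, train, m):
    """Mean absolute scaled error.

    The scaling factor is the in-sample MAE of the seasonal naive forecast
    with period m, computed on the training series only.
    """
    diffs = [abs(train[t] - train[t - m]) for t in range(m, len(train))]
    scale = sum(diffs) / len(diffs)
    return mae(actual, forecast) / scale
```

A MASE below one then means that the forecast is, on average, more accurate than the seasonal naïve forecast was on the training data.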

The denominator of MASE is the scaling factor of the absolute errors and is determined within the training set for each time series by comparing the seasonal naïve forecast to the actual values. Moreover, we compute the fill rate (FR), the overage rate (OR), and the service level (SL) in order to assess the supply chain performance. The expected service level for unbiased point forecasts is 50%. The overage rate indicates the percentage of goods that need to be discarded in relation to the actual sales, the fill rate represents the percentage of demand that can be fulfilled, and the service level specifies the probability that the demand of a product can be completely fulfilled in a single planning step. At daily level (see Tables 3 and 4), we compare the daily order quantity with the daily demand and a planning step refers to a day. At intraday level (see Tables 7 and 9), a planning step covers only 5 min and the shelf load depends on the execution of the baking plan. As either the fill rate or the overage rate can be manipulated, we sum the deviation from the optimum of both key figures and refer to it as total loss (Loss).

$\mathrm{SL} = \frac{1}{N} \sum_{n} I(Y_n \le \hat{Y}_n)$   (11)

$\mathrm{FR} = \frac{1}{N} \sum_{n} \min\left(1, \frac{\hat{Y}_n}{Y_n}\right)$   (12)

$\mathrm{OR} = \frac{1}{N} \sum_{n} \frac{(\hat{Y}_n - Y_n)^{+}}{Y_n}$   (13)

4.1. Experimental design
We use a real-world dataset that comprises hourly sales data of 14 different baked goods from 9 stores over a period of 987 days (i.e. 141 weeks or 2.7 years). As the assortment varies among the stores, we only need to consider 121 time series. Moreover, information about the available ovens in the different stores as well as the baking program assignment of the products is available. The first 110 weeks (≈ 78%) of the data serve as training data while only the latter 22% (31 weeks) are used for the evaluation. We compute hourly forecasts and baking plans in order to demonstrate the viability of our approach. The computed baking plans consider all articles and empirical constraints with respect to the available ovens and the duration of the baking processes. Hence, we meet all requirements of a productive application. We compute a baking plan for each store and date when the store is not closed, i.e., 1820 baking plans per forecasting method. The baking plans are based on 24,194 daily forecasts (store × product × day) and 324,948 forecasts at hourly level (store × product × day × opening hours). We evaluate different approaches to obtain the hourly demand forecasts. We can either compute the hourly forecasts directly or distribute the daily forecasts to the hourly level. Thus, we consider the following targets and different modes:

$\mathrm{Loss} = (1 - \mathrm{FR}) + \mathrm{OR}$   (14)
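The operational key figures and the total loss can be illustrated with a small Python sketch. The function names are hypothetical, `demand` plays the role of $Y_n$ and `supply` the role of $\hat{Y}_n$ (the order quantity or shelf load), and the sketch assumes strictly positive demand in every planning step because FR and OR divide by it. Values are returned as fractions rather than percentages.

```python
def service_level(demand, supply):
    """Share of planning steps in which the demand is completely fulfilled."""
    return sum(y <= q for y, q in zip(demand, supply)) / len(demand)


def fill_rate(demand, supply):
    """Average share of the demand that can be fulfilled (capped at one)."""
    return sum(min(1.0, q / y) for y, q in zip(demand, supply)) / len(demand)


def overage_rate(demand, supply):
    """Average surplus relative to the demand; shortages count as zero."""
    return sum(max(q - y, 0) / y for y, q in zip(demand, supply)) / len(demand)


def total_loss(demand, supply):
    """Deviation of both key figures from their optimum: (1 - FR) + OR."""
    return (1 - fill_rate(demand, supply)) + overage_rate(demand, supply)
```

For example, the S-Naïve row of Table 3 reports a fill rate of 88.2% and an overage rate of 21.5%, which gives a loss of (100 - 88.2) + 21.5 = 33.3 percentage points.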

We test if the performance differences are statistically significant by applying the Wilcoxon signed-rank test. It is a rank-based test that does not require assumptions on the distributions of the key figures. In addition to the ML method (LSTM) described in Section 3.1, we also consider baseline forecasting approaches (see Hyndman and Athanasopoulos (2014)) and the time series model ETS(ANA) (Hyndman et al., 2008). All baseline methods consider the weekly seasonality and simulate the typical decision process of store managers who are likely to take the previous weeks into account if they have access to this information. We cannot consider the decision quality based on judgmental forecasts because this information is not available. However, it is unreasonable to assume that untrained store clerks, some of whom are part-time employees, are able to outperform statistical methods. Moreover, the store personnel are also responsible for other tasks, and they cannot dedicate time to the optimization of baking plans. The forecast of S-Naïve is the last observation from the same part of the season (e.g. weekday) and S-Mean (S-Median) computes the mean (median) of the last four observations. For the intraday forecasts, each hour is represented by a separate time series, i.e., we only have

• target: daily
  – direct: We compute the daily demand directly, i.e., using daily data.
  – bottom-up: We obtain the daily forecast by accumulating the hourly forecasts.
• target: hourly profile
  – direct: We compute the hourly profiles directly, i.e., based on past hourly profiles.
  – one: We scale the sum of the hourly profile forecasts obtained from mode “direct” to one in order to be able to distribute the daily quantity.
• target: hourly
  – direct: We compute the hourly demand directly, i.e., using hourly data.
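How these modes relate to each other can be shown with a few lines of Python. The sketch is illustrative and the function names are hypothetical: `scale_to_one` corresponds to mode "one" of the hourly profile, `top_down` distributes a daily quantity with the scaled profile, and `bottom_up` accumulates hourly forecasts into a daily forecast.

```python
def scale_to_one(profile):
    """Rescale hourly profile forecasts so that they sum to one (mode "one")."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must have a positive sum")
    return [share / total for share in profile]


def top_down(daily_forecast, profile):
    """Distribute a daily forecast to the hours using the scaled profile."""
    return [daily_forecast * share for share in scale_to_one(profile)]


def bottom_up(hourly_forecasts):
    """Obtain a daily forecast by accumulating hourly forecasts."""
    return sum(hourly_forecasts)
```

Because the profile is scaled to one before it is applied, the hourly values produced by `top_down` always add up to the daily quantity again, whereas an unscaled profile would lose or add demand.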


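Table 4
Operational performance at daily level based on 24,194 observations per method and target service level. The key figures are based on the direct forecasts (see Table 3). Besides the forecast (FC), we also evaluate the performance for the target service levels 75% and 90%. SL: service level, FR: fill rate, OR: overage rate; all values in %.

| Method   | SL (FC) | SL (75%) | SL (90%) | FR (FC) | FR (75%) | FR (90%) | OR (FC) | OR (75%) | OR (90%) | Loss (FC) | Loss (75%) | Loss (90%) |
|----------|---------|----------|----------|---------|----------|----------|---------|----------|----------|-----------|------------|------------|
| S-Naïve  | 52.3    | 77.4     | 92.2     | 88.2    | 95.4     | 98.4     | 21.5    | 39.9     | 66.4     | 33.3      | 44.5       | 68.0       |
| S-Mean   | 55.7    | 78.2     | 92.4     | 92.1    | 96.7     | 98.8     | 20.3    | 35.3     | 59.3     | 28.2      | 38.6       | 60.4       |
| S-Median | 53.8    | 78.3     | 92.5     | 91.4    | 96.6     | 98.8     | 17.4    | 33.5     | 58.1     | 26.1      | 36.9       | 59.2       |
| ETS      | 55.7    | 78.1     | 91.9     | 92.5    | 96.9     | 98.8     | 17.5    | 31.6     | 53.9     | 24.9      | 34.7       | 55.1       |
| LSTM     | 50.4    | 74.7     | 89.3     | 92.0    | 96.9     | 98.9     | 14.9    | 28.7     | 46.4     | 23.0      | 31.8       | 47.5       |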

Table 5
Hourly profile forecast performance based on 324,948 observations per method. As the sum of the direct prediction is not necessarily equal to one per day, we also scale them to one in order to be able to distribute the full daily demand. The best method is shown in boldface and underlined while methods that are not significantly different at the 0.05 significance level are only printed in bold.

| Method   | Mode   | MASE  | MAE   | RMSE  |
|----------|--------|-------|-------|-------|
| S-Naïve  | Direct | 1.108 | 0.063 | 0.104 |
| S-Naïve  | One    | 1.108 | 0.063 | 0.104 |
| S-Mean   | Direct | 0.935 | 0.053 | 0.083 |
| S-Mean   | One    | 0.939 | 0.053 | 0.083 |
| S-Median | Direct | 0.881 | 0.051 | 0.085 |
| S-Median | One    | 0.925 | 0.054 | 0.092 |
| ETS      | Direct | 0.873 | 0.049 | 0.076 |
| ETS      | One    | 0.880 | 0.049 | 0.078 |
| LSTM     | Direct | 0.875 | 0.048 | 0.075 |
| LSTM     | One    | 0.876 | 0.048 | 0.075 |

to deal with one primary seasonality. For the ETS model, we employ a rolling origin evaluation, which means that the model is re-fitted every day. The hyper-parameters of the LSTM model are determined by a random search in combination with cross-validation on the training set (Bergstra and Bengio, 2012). We train an ensemble of 50 LSTM models for each target (i.e. daily, hourly, hourly profile) and employ the median ensemble operator to obtain the final prediction (Barrow et al., 2010; Kourentzes et al., 2014). In order to estimate the safety stock for higher target service levels, we exploit the residuals of each method and employ sample average approximation as described in Section 3.1. We do not consider other prediction methods as our goal is to investigate the influence of the forecasting phase on the operational performance. The selected methods offer different characteristics and are sufficient to answer the research questions. We evaluate simple baseline methods (S-Naïve, S-Mean, S-Median), a popular time series method (ETS), and an ML method (LSTM). Thus, we are able to assess the suitability and potential benefits of ML and more accurate predictions in general. The experiments are conducted on a virtual machine having 32 Intel vCPU cores with 2.6 GHz and 24 GB memory, and running the operating system Ubuntu 14.04.6 LTS. For the ETS forecasts, we use the ets() function from the forecast (version: 8.9) package (Hyndman and Khandakar, 2008) for the statistical software R (version: 3.6.1). The LSTM models are implemented with PyTorch (version: 1.0.1) (Paszke et al., 2017) for Python (version: 3.7) and trained on a GPU (NVIDIA GeForce GTX 1080 Ti). In order to solve the linear programs, we use the commercial solver Gurobi (version: 8.1.1) (Gurobi Optimization, 2019). We restrict the resources for each Gurobi process to two cores in order to be able to solve multiple optimization problems in parallel.
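The seasonal baselines (S-Naïve, S-Mean, S-Median) and the median ensemble operator from the experimental design can be sketched in Python as follows. The function names, the list-based series layout, and the default weekly period of seven observations are assumptions made for this illustration; the series is assumed to be gap-free and to contain at least one full season.

```python
from statistics import mean, median


def same_season(history, season=7, k=4):
    """Last k observations from the same part of the season as the next step.

    history: past observations in chronological order, one per period.
    The result is ordered from the most recent to the oldest observation.
    """
    return history[::-1][season - 1::season][:k]


def s_naive(history, season=7):
    """Last observation from the same part of the season (e.g. weekday)."""
    return same_season(history, season, 1)[0]


def s_mean(history, season=7):
    """Mean of the last four observations from the same part of the season."""
    return mean(same_season(history, season, 4))


def s_median(history, season=7):
    """Median of the last four observations from the same part of the season."""
    return median(same_season(history, season, 4))


def median_ensemble(member_forecasts):
    """Combine the forecasts of several models by the median per time step."""
    return [median(values) for values in zip(*member_forecasts)]
```

The median-based variants are less sensitive to a single unusual week than the mean, which is the robustness argument made for S-Median in the results.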

the hourly forecasts are reasonably accurate. A reason for this is that the data on the hourly level is quite noisy. For example, some products are not sold or there is no demand for goods in every hour causing frequent zeros in the time series. Consequently, the service level for LSTM (bottom-up) is only 19.4% which is far below the expected service level of unbiased forecasts (50%). While LSTM (bottom-up) has the lowest RMSE, it also has the lowest overage rate and lowest fill rate. Despite the huge bias, it is still the second best method with respect to the total loss but it is significantly outperformed by the LSTM model that computes daily forecasts directly. Based on the conducted experiments, we conclude that the daily demand should be directly forecast. The good forecasting performance of LSTM carries over to higher target service levels (see Table 4). While all methods roughly match the desired service level, the differences are most noticeable for the overage rate. For high service levels, the advantage of LSTM increases as the estimation of the quantiles is sharper, which also leads to a significantly smaller amount of food waste.

4.2. Results

We subsequently present the results of the two main parts of the proposed approach.

4.2.1. Forecasting

We compute and evaluate forecasts for three different targets: daily demand (see Table 3), hourly profile (see Table 5), and hourly demand (see Table 6). The goal is to provide hourly forecasts that are consolidated with the daily demand and serve as input for the baking plan generation. The main observation is that the ranking of the methods is mostly independent of the forecast accuracy measure and forecast target. The ML method LSTM consistently outperforms all other approaches, but there is also a significant gap between ETS and the baseline methods, which supports the plausibility of the results. With respect to the baselines, we can report that it is not sufficient to only consider a single value as S-Naïve provides worse results than any other evaluated method. S-Median is more accurate than S-Mean as it is more robust with respect to outliers. For the daily forecasts (see Table 3), the bottom-up approach to calculate the forecasts is either comparable with direct forecasts or slightly worse. However, the exceptions are S-Median and LSTM which systematically underestimate the demand at the daily level even though

The relative advantage of LSTM diminishes for the profile forecasts (see Table 5), which means that the intraday demand profile is quite robust over time. As the profile forecasts are intended to be used for a top-down distribution of the daily quantity, we also scale the sum per day to one. The post-processing step only negatively affects S-Median, whose sum was frequently smaller than one. The forecast accuracy for hourly sales is presented in Table 6. We compute the hourly demand top-down by relying on the direct daily forecasts and the hourly profile forecasts. The top-down approach leads to comparable results for most approaches. Only S-Median and LSTM, which underestimate the daily demand, perform noticeably worse. The results show that it is important to measure the performance indicators at different levels of aggregation. Nevertheless, LSTM (top-down) is still more accurate at the hourly level than any other forecasting method except LSTM (direct). In order to make the intraday forecasts more reliable, it may be helpful to consider the data at a coarser granularity (e.g. morning, midday, afternoon) for products that are usually not sold every hour.
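The top-down computation described above can be sketched in a few lines. The sketch assumes that a profile is a list of non-negative hourly shares and that scaling means dividing by the sum of the shares; the function names and the example numbers are hypothetical.

```python
def normalise_profile(profile):
    """Scale an hourly profile so that its shares sum to one."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive mass")
    return [share / total for share in profile]


def top_down(daily_forecast, profile):
    """Distribute a daily forecast over the hours of the day."""
    return [daily_forecast * share for share in normalise_profile(profile)]


# A raw profile whose shares sum to 0.8 only, as can happen for S-Median.
raw_profile = [0.10, 0.30, 0.25, 0.15]
hourly = top_down(200.0, raw_profile)
```

After scaling, the hourly forecasts add up to the daily forecast again, which is the property the scheduling phase relies on.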


Table 6 Hourly forecast performance based on 324,948 observations per method. The top-down predictions are obtained from the daily forecasts (mode: direct) (see Table 3) and the predicted day profiles (mode: one) (see Table 5). The best method in each column is statistically different from the other evaluated methods.

| Method | Mode | MASE | MAE | RMSE |
|---|---|---|---|---|
| S-Naïve | Direct | 1.053 | 4.586 | 12.639 |
| S-Naïve | Top-down | 1.053 | 4.586 | 12.639 |
| S-Mean | Direct | 0.898 | 3.741 | 9.607 |
| S-Mean | Top-down | 0.902 | 3.712 | 9.405 |
| S-Median | Direct | 0.835 | 3.585 | 9.287 |
| S-Median | Top-down | 0.876 | 3.674 | 9.292 |
| ETS | Direct | 0.835 | 3.375 | 8.586 |
| ETS | Top-down | 0.835 | 3.345 | 8.575 |
| LSTM | Direct | 0.778 | 3.099 | 7.456 |
| LSTM | Top-down | 0.822 | 3.181 | 7.490 |


Table 7 Scheduling: Operational key figures without and with intraday baking. The best method in each column and block is statistically different from the other evaluated methods. The key figures are given for intraday baking (IB: yes) and alternatively for no intraday baking (IB: no), i.e., all items are baked before the store opens. The variant no intraday baking is not desirable as the goods deteriorate too quickly but it illustrates the performance loss due to intraday decisions. The average age of goods at selling time is given in the format "hour:minute".

| IB | Optimizer | Method | Age | Service level | Fill rate | Overage rate | Loss |
|---|---|---|---|---|---|---|---|
| No | – | Perfect | 04:37 | 100.0 | 100.0 | 0.0 | 0.0 |
| No | – | S-Naïve | 04:06 | 95.9 | 89.3 | 21.8 | 32.5 |
| No | – | S-Mean | 04:13 | 96.8 | 92.0 | 20.1 | 28.1 |
| No | – | S-Median | 04:11 | 96.5 | 91.4 | 17.3 | 25.9 |
| No | – | ETS | 04:15 | 96.9 | 92.5 | 17.4 | 24.8 |
| No | – | LSTM | 04:12 | 96.6 | 92.0 | 14.8 | 22.8 |
| Yes | Heuristic | Perfect | 02:07 | 99.3 | 97.5 | 2.5 | 5.0 |
| Yes | Heuristic | S-Naïve | 02:23 | 95.4 | 84.5 | 26.6 | 42.1 |
| Yes | Heuristic | S-Mean | 02:30 | 96.5 | 89.0 | 23.2 | 34.2 |
| Yes | Heuristic | S-Median | 02:26 | 96.3 | 88.1 | 20.6 | 32.5 |
| Yes | Heuristic | ETS | 02:31 | 96.7 | 89.7 | 20.2 | 30.4 |
| Yes | Heuristic | LSTM | 02:29 | 96.5 | 89.3 | 17.5 | 28.2 |
| Yes | MIP | Perfect | 02:07 | 99.3 | 97.8 | 2.2 | 4.5 |
| Yes | MIP | S-Naïve | 02:23 | 95.4 | 84.7 | 26.4 | 41.7 |
| Yes | MIP | S-Mean | 02:30 | 96.5 | 89.1 | 23.1 | 34.0 |
| Yes | MIP | S-Median | 02:27 | 96.3 | 88.2 | 20.4 | 32.2 |
| Yes | MIP | ETS | 02:31 | 96.7 | 89.8 | 20.1 | 30.3 |
| Yes | MIP | LSTM | 02:29 | 96.5 | 89.4 | 17.4 | 28.1 |

In summary, we rely on the hourly forecasts that are obtained from the top-down distribution of the daily forecast (order quantity) using intraday profile forecasts for the generation of the schedules. For the present use case, it is a requirement that the daily order quantity has to be processed during the same day, which makes it reasonable to put more emphasis on the forecast accuracy at the daily level.

4.2.2. Intraday baking

While the demand forecasts are a necessary input for the scheduling problem, they do not represent the decisions. We want to investigate the effect of the forecast accuracy on the operational performance. Thus, we compute the fill rate and overage rate at the end of the day. Moreover, we determine the average age of goods at the selling time if the computed schedule is executed accordingly. In order to obtain the key figures, we iteratively compute the shelf load. The shelves are empty at the beginning of the day and directly filled after a job ends, i.e., starting time of the job plus its duration. If the demand can be fulfilled, the shelf load is reduced accordingly. The shelf load cannot be below zero and items are only removed if they are sold, i.e., they are not removed during the day by the store personnel. However, items that are not sold by the end of the day have to be discarded. We use the hourly forecasts (mode: top-down) presented in the previous section in order to create the instances of the scheduling problem as described in Section 3.2. Additionally, we also consider the perfect forecast (i.e. sales) to validate our results. A planning step of the schedule comprises 5 min. Hence, we linearly distribute the hourly data to the planning steps. As the scheduling problem is based on demand given in integers, we add fractions to the earlier time step and reduce the succeeding steps accordingly. Moreover, baking should end a couple of hours before the store closes, which makes it necessary to bring forward the quantities that are required to fulfill the expected demand of the last opening hours. More precisely, the baking processes should terminate three (two; one) hours before the store closes if the store is open for more than seven (between five and seven; less than five) hours.
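The conversion of hourly demand into integer quantities per planning step can be sketched as follows. This is one possible reading of the rounding rule stated above, namely rounding the cumulative demand up so that fractions are assigned to the earlier step; the function names are hypothetical, and a fractional hourly total is rounded up under this reading.

```python
import math


def distribute_hour(hourly_demand, steps_per_hour=12):
    """Split one hour of demand into integer quantities per 5-min step."""
    quantities = []
    allocated = 0
    for step in range(1, steps_per_hour + 1):
        cumulative = hourly_demand * step / steps_per_hour
        # Round the cumulative demand up (with a small tolerance) so that
        # fractions land in the earlier step and later steps are reduced.
        target = math.ceil(cumulative - 1e-9)
        quantities.append(target - allocated)
        allocated = target
    return quantities


def baking_cutoff_hours(opening_hours):
    """Hours before closing at which baking should end (rule from the text)."""
    if opening_hours > 7:
        return 3
    if opening_hours >= 5:
        return 2
    return 1


steps = distribute_hour(5)
```

For an hourly demand of five units the sketch places one unit in the first step and spreads the remaining units over the hour, while the total of the hour is preserved.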
We will investigate the impact of the prediction model on the operational performance (1), the general benefits gained due to intraday baking (2), and also discuss the characteristics of the scheduling tasks as well as the optimization of the scheduling problem (3).
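The shelf-load bookkeeping used to obtain the key figures can be illustrated with a minimal simulation. The sketch assumes "first in, first out" selling, goods that are available in the step in which their job ends, and an overage rate measured relative to the total demand; these definitions and all names are assumptions made for the illustration.

```python
from collections import deque


def simulate_shelf(arrivals, demand, step_minutes=5):
    """Simulate the shelf load for integer arrivals and demand per step."""
    shelf = deque()      # FIFO queue of [step_baked, units_left]
    sold = 0
    unmet = 0
    age_minutes = 0
    for step, (baked, wanted) in enumerate(zip(arrivals, demand)):
        if baked:
            shelf.append([step, baked])
        while wanted and shelf:
            batch = shelf[0]
            take = min(wanted, batch[1])
            age_minutes += take * (step - batch[0]) * step_minutes
            sold += take
            wanted -= take
            batch[1] -= take
            if batch[1] == 0:
                shelf.popleft()
        unmet += wanted  # demand that cannot be served is lost
    leftover = sum(units for _, units in shelf)  # discarded at closing time
    total_demand = sold + unmet
    return {
        "fill_rate": sold / total_demand if total_demand else 1.0,
        "overage_rate": leftover / total_demand if total_demand else 0.0,
        "avg_age_minutes": age_minutes / sold if sold else 0.0,
    }


figures = simulate_shelf(arrivals=[4, 0, 0, 3], demand=[1, 2, 3, 1])
```

In the toy run, two units of demand in the third step cannot be served because the second tray arrives one step too late, which lowers the fill rate and leaves two units as overage at the end of the day.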

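Table 8 Scheduling: Operational key figures for higher service level as indicated in column "SL-Day".

| SL-Day | IB | Optimizer | Method | Age | Service level | Fill rate | Overage rate | Loss |
|---|---|---|---|---|---|---|---|---|
| | No | MIP | S-Naïve | 04:25 | 98.3 | 96.1 | 40.2 | 44.1 |
| | No | MIP | S-Mean | 04:27 | 98.5 | 96.7 | 35.0 | 38.3 |
| | No | MIP | S-Median | 04:27 | 98.5 | 96.6 | 33.1 | 36.6 |
| | No | MIP | ETS | 04:28 | 98.6 | 96.8 | 31.4 | 34.6 |
| | No | MIP | LSTM | 04:27 | 98.5 | 96.9 | 28.4 | 31.5 |
| | Yes | Heuristic | S-Naïve | 02:38 | 97.3 | 90.6 | 45.7 | 55.1 |
| | Yes | Heuristic | S-Mean | 02:41 | 97.9 | 93.5 | 38.2 | 44.7 |
| | Yes | Heuristic | S-Median | 02:39 | 97.9 | 92.9 | 36.8 | 43.8 |
| | Yes | Heuristic | ETS | 02:43 | 98.0 | 93.9 | 34.3 | 40.4 |
| | Yes | Heuristic | LSTM | 02:41 | 98.0 | 94.0 | 31.3 | 37.3 |
| | Yes | MIP | S-Naïve | 02:38 | 97.3 | 90.8 | 45.5 | 54.6 |
| | Yes | MIP | S-Mean | 02:42 | 97.9 | 93.6 | 38.1 | 44.4 |
| | Yes | MIP | S-Median | 02:40 | 97.9 | 93.1 | 36.6 | 43.5 |
| | Yes | MIP | ETS | 02:43 | 98.0 | 94.1 | 34.2 | 40.1 |
| | Yes | MIP | LSTM | 02:41 | 98.0 | 94.1 | 31.2 | 37.1 |
| | No | MIP | S-Naïve | 04:34 | 99.3 | 98.7 | 66.4 | 67.6 |
| | No | MIP | S-Mean | 04:34 | 99.4 | 98.8 | 58.9 | 60.1 |
| | No | MIP | S-Median | 04:34 | 99.4 | 98.8 | 57.7 | 58.8 |
| | No | MIP | ETS | 04:34 | 99.4 | 98.8 | 53.5 | 54.7 |
| | No | MIP | LSTM | 04:33 | 99.4 | 98.9 | 46.0 | 47.1 |
| | Yes | Heuristic | S-Naïve | 02:50 | 98.2 | 93.3 | 72.0 | 78.7 |
| | Yes | Heuristic | S-Mean | 02:53 | 98.7 | 95.9 | 61.8 | 65.9 |
| | Yes | Heuristic | S-Median | 02:51 | 98.7 | 95.2 | 61.2 | 66.0 |
| | Yes | Heuristic | ETS | 02:54 | 98.8 | 96.2 | 56.1 | 60.0 |
| | Yes | Heuristic | LSTM | 02:51 | 98.7 | 96.2 | 48.8 | 52.6 |
| | Yes | MIP | S-Naïve | 02:51 | 98.3 | 93.6 | 71.5 | 77.9 |
| | Yes | MIP | S-Mean | 02:53 | 98.8 | 96.0 | 61.7 | 65.6 |
| | Yes | MIP | S-Median | 02:52 | 98.7 | 95.4 | 61.1 | 65.7 |
| | Yes | MIP | ETS | 02:55 | 98.8 | 96.4 | 55.9 | 59.5 |
| | Yes | MIP | LSTM | 02:51 | 98.8 | 96.3 | 48.6 | 52.3 |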

Effect of the forecasting model. An important observation is that the operational performance is directly linked to the accuracy of the provided forecasts (see Table 7 (intraday baking: yes)). While this observation is obvious in many settings, the effect of the scheduling phase might diminish the benefits of more accurate predictions. For instance, the main characteristics of resulting jobs are comparable (see Table 10). However, the performance gains due to better predictions are very apparent as perfect forecasts (i.e. sales) substantially outperform the actual forecasts with respect to all key figures. A comparison of the

forecasting methods reveals that not only the ranking based on the forecast accuracy (see Section 4.2.1) is preserved but also that the relative difference with respect to the total loss is fairly comparable. For higher target service levels, we observe similar results but the relative performance improvements of a more precise prediction method increase (see Table 8). The choice of the schedule optimization method has no significant impact on the operational performance. Hence, we


Table 9 Closing the performance gap between intraday baking (IB: yes) and no intraday baking (IB: no) based on forecasts without safety stock. We vary the ratio between the penalty for earliness $w_e$ and tardiness $w_t$ (column: $w_e : w_t$). Moreover, we execute the jobs earlier by adding a time buffer of 15 or 30 min (column: buffer). All schedules are optimized using the MIP.

| Method | IB | $w_e : w_t$ | Buffer | Age | Service level | Fill rate | Overage rate | Loss |
|---|---|---|---|---|---|---|---|---|
| Perfect | No | 1:1 | 00:00 | 04:37 | 100.0 | 100.0 | 0.0 | 0.0 |
| Perfect | Yes | 1:1 | 00:00 | 02:07 | 99.3 | 97.8 | 2.2 | 4.5 |
| Perfect | Yes | 1:3 | 00:00 | 02:08 | 99.8 | 99.4 | 0.6 | 1.2 |
| Perfect | Yes | 1:9 | 00:00 | 02:10 | 99.9 | 99.9 | 0.1 | 0.3 |
| Perfect | Yes | 1:1 | 00:15 | 02:18 | 99.7 | 99.2 | 0.8 | 1.6 |
| Perfect | Yes | 1:1 | 00:30 | 02:32 | 99.9 | 99.7 | 0.3 | 0.7 |
| Perfect | Yes | 1:9 | 00:30 | 02:39 | 100.0 | 100.0 | 0.0 | 0.0 |
| ETS | No | 1:1 | 00:00 | 04:15 | 96.9 | 92.5 | 17.4 | 24.8 |
| ETS | Yes | 1:1 | 00:00 | 02:31 | 96.7 | 89.8 | 20.1 | 30.3 |
| ETS | Yes | 1:3 | 00:00 | 02:35 | 96.9 | 90.6 | 19.3 | 28.6 |
| ETS | Yes | 1:9 | 00:00 | 02:36 | 96.9 | 90.8 | 19.1 | 28.3 |
| ETS | Yes | 1:1 | 00:15 | 02:40 | 97.0 | 91.1 | 18.8 | 27.8 |
| ETS | Yes | 1:1 | 00:30 | 02:52 | 97.1 | 91.6 | 18.3 | 26.7 |
| ETS | Yes | 1:9 | 00:30 | 03:00 | 97.1 | 91.8 | 18.1 | 26.4 |
| LSTM | No | 1:1 | 00:00 | 04:12 | 96.6 | 92.0 | 14.8 | 22.8 |
| LSTM | Yes | 1:1 | 00:00 | 02:29 | 96.5 | 89.4 | 17.4 | 28.1 |
| LSTM | Yes | 1:3 | 00:00 | 02:33 | 96.6 | 90.1 | 16.7 | 26.6 |
| LSTM | Yes | 1:9 | 00:00 | 02:35 | 96.6 | 90.3 | 16.5 | 26.2 |
| LSTM | Yes | 1:1 | 00:15 | 02:37 | 96.7 | 90.5 | 16.2 | 25.7 |
| LSTM | Yes | 1:1 | 00:30 | 02:49 | 96.8 | 91.1 | 15.7 | 24.6 |
| LSTM | Yes | 1:9 | 00:30 | 02:58 | 96.8 | 91.2 | 15.6 | 24.4 |

can report that the choice of the prediction method is the most crucial decision, while the scheduling phase has a significantly smaller impact. Moreover, we notice that the scheduling phase has only a minor impact on the operational performance as perfect forecasts (i.e. sales) lead to an almost perfect performance.

A major reason to bake the goods during the day is to provide them as fresh as possible. Hence, we compare the average age of sold goods in order to measure this effect (see Table 7). When comparing the average age of goods between intraday baking and no intraday baking, it has to be considered that the difference is underestimated. First, the baking process of all provided items cannot exactly end when the stores open. The goods would need to be baked in a separate facility, which would also add additional delivery time. Second, at the beginning of the day, the difference between both approaches is negligible while larger differences are expected in the later parts of the day. We note that the average age of sold goods can be reduced by roughly 54%, i.e., from 04:37 h to 02:07 h, for perfect forecasts (i.e. sales) and 41%, i.e., from 04:12 h to 02:29 h, for the schedules based on forecasts (see Table 7). Hence, intraday baking significantly reduces the age of sold goods. It has to be noted that the age is correlated with the fill rate, e.g., by underestimating the demand, a very low average age can be measured. Thus, the age has to be viewed together with other key figures like the fill rate. For instance, for S-Naïve the average age of sold goods is comparably low, but this is also true for the fill rate as the predictions are not well aligned with the demand, which also leads to high overages.
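The reported age reductions follow directly from the ages in Table 7, as the short calculation below illustrates. The helper names are hypothetical; the ages are the values quoted above.

```python
def to_minutes(hhmm):
    """Convert an age given as 'hour:minute' into minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def relative_reduction(before, after):
    """Relative reduction of the average age of sold goods."""
    return 1 - to_minutes(after) / to_minutes(before)


perfect = relative_reduction("04:37", "02:07")  # 150 of 277 min saved
lstm = relative_reduction("04:12", "02:29")     # 103 of 252 min saved
```

Rounded to full percentage points, the two ratios correspond to the 54% and 41% stated in the text.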

The baseline S-Naïve underestimates the demand more often, which leads to a lower age of goods compared to the more advanced prediction models. Conversely, overestimating the demand at the beginning of the day makes customers buy older items later due to the "first in, first out" assumption. However, the absolute average age difference is under 10 min for all evaluated forecasting methods and has no practical relevance. Moreover, we want to point out that the reported service level is measured based on the observations on each planning step (i.e. 5 min) and thus rather comparable to the fill rate. Even without dedicated safety stocks, it is possible to serve around 90% of the customers. A comparison between LSTM and ETS reveals that the fill rate of ETS is only 0.4 percentage points higher but the overage rate of LSTM is 13.4% lower while the average age of goods is similar. By setting a higher target service level for the daily order quantity (see Table 8), the fill rate can be noticeably increased. But at the same time, the loss rate also increases. Hence, the decision on the optimal service level depends on the costs and profit margins of the offered products. However, the LSTM model outperforms all reference methods across all service levels. Hence, we can conclude that the LSTM is the best approach among the evaluated methods, which is also reflected in the lowest total loss. This also underlines the suitability of ML methods for this application domain.

With respect to the other key figures, we notice that the scheduling part has a negative impact. This is also observable for higher target service levels, but the relative disadvantage decreases (see Table 8). The reason for this is that some items are not baked in time. Hence, a part of the demand cannot be fulfilled, which negatively influences the fill rate, overage rate, and total loss. For the perfect forecast (i.e. sales), the total loss only increases by 4.5 percentage points as 2.2% of the demand cannot be fulfilled. For the actual forecasting methods (e.g. LSTM), the increase of total loss is on average 5.9 percentage points (22%). In order to mitigate the negative effect of intraday baking, we study possibilities to close the performance gap.

Effect of intraday baking. We want to measure the effect of the scheduling part of our solution approach by comparing the operational key figures with a no intraday baking approach (see Table 7). No intraday baking means that the full predicted daily demand is placed on the shelves when the store opens. In practice, this might not be feasible as the available shelf space is limited, which makes filling up shelves still a requirement.

The relative increase of the overage rate is most noticeable and can be decreased by increasing the age of goods, e.g., by baking the goods earlier. Moreover, it is reasonable to set asymmetric penalties, i.e., setting a higher penalty for tardiness compared to earliness. In consequence, the jobs are more likely to be executed earlier if an oven


Table 10 Scheduling: Number of jobs based on forecasts without safety stock. The average planning horizon comprises 143.3 steps (≈ 12 h). Due to the aggregation, the number of jobs that need to be scheduled can be reduced by 73.5%.

| Method | Jobs | Avg. penalty | Jobs (grouped) | Avg. penalty (grouped) |
|---|---|---|---|---|
| Sales | 120.8 | 1.294 | 32.0 | 1.366 |
| S-Naïve | 122.5 | 1.308 | 32.5 | 1.370 |
| S-Mean | 124.1 | 1.225 | 32.9 | 1.281 |
| S-Median | 121.0 | 1.195 | 32.1 | 1.249 |
| ETS | 123.0 | 1.206 | 32.7 | 1.267 |
| LSTM | 122.9 | 1.228 | 32.6 | 1.275 |

heuristic as introduced in Section 3.2. We also considered solving the whole day in a single step, but preliminary experiments indicated that the runtime is much higher (factor > 100) while the objectives of the solved linear programs were fairly comparable. The presented results show that the operational performance of both optimization methods is almost identical (see Tables 7 and 8) and that perfect forecasts lead to a near-optimal operational performance. The results with respect to the optimization of the scheduling problems are provided in Table 12. In general, the objectives of the optimized schedules cannot be compared among different methods as each scheduling instance depends on the initial forecasts, which determine the number of jobs and their deadlines. Hence, the objectives are not directly comparable if the forecasts differ, but we can compare the optimization methods. The use of the heuristic leads to a roughly 60% higher objective across all methods and service levels. However, this hardly affects the operational performance as both methods offer reasonably accurate schedules. The average penalty per job is 0.59 (1.00) using the MIP (Heuristic) after the optimization. We link this result to the average earliness and tardiness penalty per job, which is 1.30 (see Table 10), and infer that the average deviation of a job from the deadline is only 0.45 (0.77) planning steps, i.e., less than 3 (4) min. Thus, a slightly increased objective of the optimization problem has no noticeable effect on the operational performance. Moreover, the equipment of the stores with respect to the available ovens is sufficient to fulfill the demand, which contributes to the negligible effect of the optimization method. The application of the heuristic also has the advantage that scheduling problems can be solved on average in only 1.05 s compared to 53.43 s using the MIP. This is a reduction by more than 98%. While the runtime increases for higher service levels as more jobs need to be scheduled, the relative difference remains roughly stable.
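The deviation from the deadline and the runtime reduction reported for the MIP and the heuristic can be retraced with a short calculation. The sketch assumes that the objective per job equals the average penalty weight times the deviation in planning steps; the variable names are hypothetical, and the numbers are the averages quoted in the text.

```python
STEP_MINUTES = 5

avg_penalty_weight = 1.30                          # per job after grouping
penalty_per_job = {"MIP": 0.59, "Heuristic": 1.00}

# Assumption: objective per job = penalty weight * deviation in steps.
deviation_steps = {
    name: value / avg_penalty_weight for name, value in penalty_per_job.items()
}
deviation_minutes = {
    name: steps * STEP_MINUTES for name, steps in deviation_steps.items()
}

# Relative runtime reduction of the heuristic (1.05 s) versus the MIP (53.43 s).
runtime_reduction = 1 - 1.05 / 53.43
```

Both deviations stay below one planning step, which is consistent with the observation that the choice of the optimizer hardly affects the operational key figures.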

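Table 11 Scheduling: Average number of jobs (N) per program (P1–P5) before and after grouping based on forecasts without safety stock.

| | | P1 | P2 | P3 | P4 | P5 |
|---|---|---|---|---|---|---|
| Jobs | N | 59.2 | 22.9 | 24.5 | 6.4 | 9.6 |
| Jobs | pct. | 47.7% | 18.4% | 20.6% | 5.1% | 8.2% |
| Jobs (grouped) | N | 15.2 | 6.1 | 6.5 | 2.0 | 2.8 |
| Jobs (grouped) | pct. | 46.0% | 18.5% | 20.5% | 6.0% | 9.0% |
| Duration | | | | | | |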

is not available at the required time. We study the effect of those approaches and report the results in Table 9. The effect of asymmetric penalties is most noticeable for perfect forecasts, where the negative effects of intraday baking can be almost resolved. With respect to actual forecasts, the improvements are noticeable but rather limited. Overall, the performance is not very sensitive to the ratio of the earliness and tardiness penalties. Hence, the number of severe scheduling conflicts is low and only adjusting the penalties is not sufficient. The influence of an erroneous distribution of the daily quantity can be more effectively addressed by adding a time buffer to the jobs. This approach works quite well and reduces the total loss by around 4 percentage points if a time buffer of only 30 min is applied. A combination of both approaches leads to the best results, i.e., time buffer (30 min) and asymmetric penalties (ratio 1:9). For perfect forecasts, the performance is then equivalent, while a performance difference of roughly 2 percentage points remains for the evaluated forecast methods. However, intraday baking based on LSTM with additional time buffer and asymmetric penalties outperforms ETS without intraday baking.
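The two levers studied in Table 9 can be illustrated with a small sketch. It assumes a linear earliness and tardiness penalty with weights w_e and w_t and a time buffer that simply shifts the deadline of a job forward; the actual objective is the one defined in Section 3.2, and all names and numbers below are hypothetical.

```python
def earliness_tardiness_penalty(completion, deadline, w_e=1, w_t=1):
    """Linear penalty for finishing before or after the deadline (in steps)."""
    if completion <= deadline:
        return w_e * (deadline - completion)
    return w_t * (completion - deadline)


def buffered_deadline(deadline, buffer_minutes, step_minutes=5):
    """Shift a deadline forward by a time buffer, not before step zero."""
    return max(deadline - buffer_minutes // step_minutes, 0)


def preferred_slot(slots, deadline, w_e, w_t):
    """Pick the completion step with the lowest penalty among free slots."""
    return min(
        slots,
        key=lambda c: earliness_tardiness_penalty(c, deadline, w_e, w_t),
    )


# The oven is only free for completion at step 18 (early) or 21 (late).
free_slots = [18, 21]
symmetric = preferred_slot(free_slots, deadline=20, w_e=1, w_t=1)
asymmetric = preferred_slot(free_slots, deadline=20, w_e=1, w_t=9)
shifted = buffered_deadline(20, buffer_minutes=30)
```

With the ratio 1:1 the late slot is cheaper, whereas the ratio 1:9 moves the job to the early slot; the buffer of 30 min brings the deadline forward by six planning steps regardless of the weights.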

5. Conclusions

We introduced a solution approach for intraday shelf replenishment of perishable goods. Its purpose is to assist the store personnel in baking goods during the day. To this end, we compute hourly demand forecasts that are used to optimize a scheduling problem that reflects a baking plan. The baking plan can either be provided as part of an interactive mobile application or be printed on paper depending on the requirements and preferences of the bakery. Based on our empirical evaluation, we conclude that our solution approach serves its purpose. If the resulting schedules are executed as suggested, most customers can be served with freshly baked goods. The average age of goods is significantly lower due to intraday baking in comparison with baking before the store opens. The drawbacks of intraday baking caused by inaccurate hourly forecasts can be addressed by adding safety stock, e.g., higher order quantities or additional time buffer. With respect to the forecasting phase of our approach, we can report that the ML model outperformed the reference methods for all evaluated levels. A general observation for perishable goods is that the demand at hourly level is quite noisy and hard to predict. As the intraday demand profiles are more stable compared to the actual demand, it is advisable to follow a top-down forecasting approach. We also measured that the operational performance mostly depends on the accuracy of the demand estimation, which means that it is most beneficial to develop an accurate prediction model. In terms of the applicability of our approach in a large-scale application scenario that requires offering more than 100 baking plans per day, it is reasonable to rely on a heuristic to solve the scheduling problem as this leads to a much lower runtime and a comparable operational performance (see Table 12). The impact of the scheduling phase is overall negligible, which means that an exact solution might not be required.
Our proposed problem formulation and the considered evaluation criteria are closely linked to the process in the stores and do not depend on the actual optimization approach. Hence, the evaluation

Schedule optimization. The jobs of the scheduling problems are based on the forecasts. For every day and store, we create a schedule considering all articles that are relevant for intraday baking. On every day, we have to schedule on average more than 120 jobs per store, i.e., baking trays that have to be put in an oven (see Table 10). The jobs can be grouped by their program assignment, which reduces the problem size by 73.5% and ensures a high utilization of the ovens. The average penalty for earliness or tardiness is around 1.30 after grouping, i.e., for a job comprising several initial jobs. The majority of the jobs belong to program P1 (46%), P2 and P3 each cover around 20% while P4 and P5 account together for only 15% (see Table 11). Hence, most jobs have a short duration of less than 5 planning steps (25 min). Longer baking durations are not usual for goods that are baked in the stores. The average planning horizon depends on the opening hours of the stores and comprises 143.3 steps (≈ 12 h). We notice that the average number of jobs is fairly comparable among the forecasting methods while the operational performance is still significantly different (see Table 7). Hence, jobs derived from more accurate prediction models are better aligned with the actual demand, which translates to a better performance. In order to solve the scheduling problems, we employ the rolling approach with a fixed starting time (see Fig. 3b) and compare it with the


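Table 12 Scheduling: The results concerning the optimization of the scheduling problem based on forecasts without safety stock.

| Optimizer | Method | Runtime [s] (mean) | Runtime [s] (median) | Objective (mean) | Objective (median) | Objective per job (mean) | Objective per job (median) |
|---|---|---|---|---|---|---|---|
| Heuristic | Perfect | 1.053 | 0.846 | 39.174 | 11.519 | 1.039 | 0.423 |
| Heuristic | S-Naïve | 1.117 | 0.807 | 56.738 | 11.784 | 1.321 | 0.434 |
| Heuristic | S-Mean | 1.001 | 0.820 | 36.344 | 12.204 | 0.964 | 0.428 |
| Heuristic | S-Median | 0.979 | 0.796 | 34.628 | 10.256 | 0.923 | 0.354 |
| Heuristic | ETS | 1.065 | 0.891 | 30.946 | 11.332 | 0.846 | 0.417 |
| Heuristic | LSTM | 1.076 | 0.874 | 33.53 | 10.904 | 0.885 | 0.408 |
| MIP | Perfect | 48.815 | 31.822 | 23.611 | 5.674 | 0.612 | 0.212 |
| MIP | S-Naïve | 64.756 | 32.384 | 37.874 | 5.745 | 0.843 | 0.216 |
| MIP | S-Mean | 52.465 | 35.026 | 21.575 | 5.575 | 0.554 | 0.208 |
| MIP | S-Median | 49.960 | 33.355 | 20.54 | 5.033 | 0.531 | 0.188 |
| MIP | ETS | 51.686 | 35.258 | 18.602 | 5.342 | 0.495 | 0.202 |
| MIP | LSTM | 52.931 | 34.923 | 19.994 | 5.248 | 0.525 | 0.204 |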

CRediT authorship contribution statement

criteria can be used to compare different optimization methods for the scheduling problem. A more complex objective, which considers additional operational costs, can also be considered. For instance, the ovens should be fully loaded in order to save energy costs and the number of starts of baking processes should be limited as this requires the staff to interrupt other tasks.

Jakob Huber: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing - original draft, Writing - review & editing, Visualization. Heiner Stuckenschmidt: Conceptualization, Methodology, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition.

While we focus on the case of a typical bakery that receives a daily delivery of goods that need to be baked during the same day, we want to highlight that our solution approach is also applicable for scenarios that are more flexible and enable real-time intraday adjustments. For instance, the bake-off sections in supermarkets also rely on intraday baking, but the unprocessed goods can be in storage at the store for several days. Hence, less emphasis can be put on the daily order quantity. Consequently, the forecasts as well as the resulting schedules can be updated during the day. For instance, it is possible to set a higher service level for the first half of the day in order to serve the full customer demand. At midday, the current shelf load can be aligned with the expected demand for the remaining opening hours. It is also possible to update the demand estimate for the remaining hours during which the store is open. For instance, the hourly demand profiles can be used to interpolate sales in order to obtain an updated estimation of the demand for the second half of the day (Lau and Lau, 1996). This should improve the operational performance as the absolute uncertainty associated with the second half of the day is less than for the whole day. Another possibility to mitigate the effect of demand uncertainty is to distribute goods among the stores during the day (Turan et al., 2017).
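A simple way to update the demand estimate at midday with the hourly profile is sketched below. It scales the sales observed so far by the profile share that has already elapsed; this is an illustrative assumption and not necessarily the procedure of Lau and Lau (1996), and the function name and numbers are hypothetical.

```python
def updated_remaining_demand(sales_so_far, profile, hours_elapsed):
    """Estimate the demand of the remaining hours from the observed sales."""
    total = sum(profile)
    observed_share = sum(profile[:hours_elapsed]) / total
    if observed_share <= 0:
        raise ValueError("no profile mass in the elapsed hours")
    # Daily demand implied by the sales so far and the elapsed profile share.
    implied_daily = sum(sales_so_far) / observed_share
    return [implied_daily * share / total for share in profile[hours_elapsed:]]


profile = [0.2, 0.3, 0.3, 0.2]        # hourly shares of the daily demand
remaining = updated_remaining_demand([30, 45], profile, hours_elapsed=2)
```

In the example, 75 units sold during the first half of the profile imply a daily demand of 150 units, so 45 and 30 units are expected for the two remaining hours.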

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by OPAL - Operational Analytics GmbH, Germany (https://www.opal-analytics.com).

Appendix A. Supplementary material

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.ijpe.2020.107828. We also provide a more detailed example for the baking plan generation as part of the supplemental materials.

References

Ahmed, N.K., Atiya, A.F., Gayar, N.E., El-Shishiny, H., 2010. An empirical comparison of machine learning models for time series forecasting. Econometric Rev. 29 (5–6), 594–621.
Ali, J., Kapoor, S., Moorthy, J., 2010. Buying behaviour of consumers for food products in an emerging economy. Br. Food J. 112 (2), 109–124.
Almeder, C., Mönch, L., 2011. Metaheuristics for scheduling jobs with incompatible families on parallel batching machines. J. Oper. Res. Soc. 62 (12), 2083–2096.
Arunraj, N.S., Ahrens, D., 2015. A hybrid seasonal autoregressive integrated moving average and quantile regression for daily food sales forecasting. Int. J. Prod. Econ. 170, 321–335.
Athanasopoulos, G., Hyndman, R.J., Kourentzes, N., Petropoulos, F., 2017. Forecasting with temporal hierarchies. European J. Oper. Res. 262 (1), 60–74.
Bakker, M., Riezebos, J., Teunter, R.H., 2012. Review of inventory systems with deterioration since 2001. European J. Oper. Res. 221 (2), 275–284.
Balasubramanian, H., Mönch, L., Fowler, J., Pfund, M., 2004. Genetic algorithm based scheduling of parallel batch machines with incompatible job families to minimize total weighted tardiness. Int. J. Prod. Res. 42 (8), 1621–1638.
Ban, G.-Y., Rudin, C., 2018. The big data newsvendor: Practical insights from machine learning. Oper. Res. 67 (1), 90–108.
Barrow, D., Crone, S., Kourentzes, N., 2010. An evaluation of neural network ensembles and model selection for time series prediction. In: The 2010 International Joint Conference on Neural Networks (IJCNN). pp. 1–8.
Barrow, D., Kourentzes, N., 2018. The impact of special days in call arrivals forecasting: A neural network approach to modelling special days. European J. Oper. Res. 264 (3), 967–977.

Baked goods are only a specific type of perishable goods that can benefit from an intraday decision support system. Hence, it would be interesting to study similar solution approaches, which are based on intraday demand estimations, for other product types. For instance, the prepacking of meat in supermarkets also depends on intraday decisions. Moreover, we suspect that operations in other application domains (e.g. restaurants) can be improved with similar solution approaches. Besides the application scenario, the work can be extended in multiple directions: We treat the demand estimation and scheduling aspect separately but it should be possible to integrate them, e.g., using reinforcement learning. Moreover, instead of setting the order quantities for each product independently, it is also possible to consider demand substitution (van Woensel et al., 2007) in the inventory model (Sachs, 2015; Schlapp and Fleischmann, 2018). Moreover, the distribution of the daily order quantity to the different hours is essentially a multi-period problem that could be addressed with a multi-period newsvendor model. With respect to temporal hierarchical forecasting, it would be interesting to investigate the effect of optimal hierarchical reconciliation (Athanasopoulos et al., 2017; Wickramasuriya et al., 2019).


J. Huber and H. Stuckenschmidt Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305. Beutel, A.L., Minner, S., 2012. Safety stock planning under causal demand forecasting. Int. J. Prod. Econ. 140 (2), 637–645. Bilge, m., Kıraç, F., Kurtulan, M., Pekgün, P., 2004. A tabu search algorithm for parallel machine total tardiness problem. Comput. Oper. Res. 31 (3), 397–414. Broekmeulen, R.A.C.M., van Donselaar, K.H., 2019. Quantifying the potential to improve on food waste, freshness and sales for perishables in supermarkets. Int. J. Prod. Econ. 209, 265–273. Carbonneau, R., Laframboise, K., Vahidov, R., 2008. Application of machine learning techniques for supply chain demand forecasting. European J. Oper. Res. 184 (3), 1140–1154. Chu, C.-W., Zhang, G.P., 2003. A comparative study of linear and nonlinear models for aggregate retail sales forecasting. Int. J. Prod. Econ. 86 (3), 217–231. Curşeu, A., van Woensel, T., Fransoo, J., van Donselaar, K., Broekmeulen, R., 2009. Modelling handling operations in grocery retail stores: an empirical analysis. J. Oper. Res. Soc. 60 (2), 200–214. Dangerfield, B.J., Morris, J.S., 1992. Top-down or bottom-up: Aggregate versus disaggregate extrapolations. Int. J. Forecast. 8 (2), 233–241. van Donselaar, K.H., Gaur, V., van Woensel, T., Broekmeulen, R.A., Fransoo, J.C., 2010. Ordering behavior in retail stores and implications for automated replenishment. Manage. Sci. 56 (5), 766–784. van Donselaar, K.H., Peters, J., de Jong, A., Broekmeulen, R.A.C.M., 2016. Analysis and forecasting of demand during promotions for perishable items. Int. J. Prod. Econ. 172, 65–75. van Donselaar, K., van Woensel, T., Broekmeulen, R., Fransoo, J., 2006. Inventory control of perishables in supermarkets. Int. J. Prod. Econ. 104 (2), 462–472. Ehrenthal, J.C., Stölzle, W., 2013. An examination of the causes for retail stockouts. Int. J. Phys. Distrib. Logist. Manage. 43 (1), 54–69. 
Fildes, R., Ma, S., Kolassa, S., 2019. Retail forecasting: research and practice. Int. J. Forecast. http://dx.doi.org/10.1016/j.ijforecast.2019.06.004.
Gross, C.W., Sohl, J.E., 1990. Disaggregation methods to expedite product line forecasting. J. Forecast. 9 (3), 233–254.
Gür Ali, Ö., Sayın, S., van Woensel, T., Fransoo, J., 2009. SKU demand forecasting in the presence of promotions. Expert Syst. Appl. 36 (10), 12340–12348.
Gurobi Optimization, LLC, 2019. Gurobi optimizer reference manual. URL: http://www.gurobi.com.
Gvili, Y., Tal, A., Amar, M., Wansink, B., 2017. Moving up in taste: Enhanced projected taste and freshness of moving food products. Psychol. Mark. 34 (7), 671–683.
Hecker, F.T., Hussein, W.B., Paquet-Durand, O., Hussein, M.A., Becker, T., 2013. A case study on using evolutionary algorithms to optimize bakery production planning. Expert Syst. Appl. 40 (17), 6837–6847.
Hecker, F.T., Stanke, M., Becker, T., Hitzmann, B., 2014. Application of a modified GA, ACO and a random search procedure to solve the production scheduling of a case study bakery. Expert Syst. Appl. 41 (13), 5882–5891.
Heenan, S.P., Hamid, N., Dufour, J.-P., Harvey, W., Delahunty, C.M., 2009. Consumer freshness perceptions of breads, biscuits and cakes. Food Qual. Preference 20 (5), 380–390.
Hendel, Y., Sourd, F., 2007. An improved earliness–tardiness timing algorithm. Comput. Oper. Res. 34 (10), 2931–2938.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780.
Hofer, C., Waller, M.A., Moussaoui, I., Williams, B.D., Aloysius, J.A., 2016. Drivers of retail on-shelf availability: Systematic review, critical assessment, and reflections on the road ahead. Int. J. Phys. Distrib. Logist. Manage. 46 (5), 516–535.
Hofmann, E., Rutschmann, E., 2018. Big data analytics and demand forecasting in supply chains: a conceptual analysis. Int. J. Logist. Manage. 29 (2), 739–766.
Hong, T., Fan, S., 2016. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 32 (3), 914–938.
Huber, J., Gossmann, A., Stuckenschmidt, H., 2017. Cluster-based hierarchical demand forecasting for perishable goods. Expert Syst. Appl. 76, 140–151.
Huber, J., Müller, S., Fleischmann, M., Stuckenschmidt, H., 2019. A data-driven newsvendor problem: From data to decision. European J. Oper. Res. 278 (3), 904–915.
Huber, J., Stuckenschmidt, H., 2020. Daily retail demand forecasting using machine learning with emphasis on calendric special days. Int. J. Forecast. http://dx.doi.org/10.1016/j.ijforecast.2020.02.005.
Hübner, A.H., Kuhn, H., 2012. Retail category management: State-of-the-art review of quantitative research and software applications in assortment and shelf space management. Omega 40 (2), 199–209.
Hyndman, R.J., Athanasopoulos, G., 2014. Forecasting: Principles and Practice. OTexts.
Hyndman, R.J., Khandakar, Y., 2008. Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 26 (3), 1–22.
Hyndman, R.J., Koehler, A.B., 2006. Another look at measures of forecast accuracy. Int. J. Forecast. 22 (4), 679–688.
Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D., 2008. Forecasting with Exponential Smoothing: the State Space Approach. Springer Science & Business Media.
Ibrahim, R., Ye, H., L’Ecuyer, P., Shen, H., 2016. Modeling and forecasting call center arrivals: A literature survey and a case study. Int. J. Forecast. 32 (3), 865–874.

İşler, M.C., Toklu, B., Çelik, V., 2012. Scheduling in a two-machine flow-shop for earliness/tardiness under learning effect. Int. J. Adv. Manuf. Technol. 61 (9), 1129–1137.
Janssen, L., Claus, T., Sauer, J., 2016. Literature review of deteriorating inventory models by key topics from 2012 to 2015. Int. J. Prod. Econ. 182, 86–112.
Ke, J., Zheng, H., Yang, H., Chen, X.M., 2017. Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. Transp. Res. C 85, 591–608.
Kedad-Sidhoum, S., Sourd, F., 2010. Fast neighborhood search for the single machine earliness–tardiness scheduling problem. Comput. Oper. Res. 37 (8), 1464–1471.
Kingma, D.P., Ba, J.L., 2015. Adam: A method for stochastic optimization. In: International Conference on Learning Representations.
Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T., 2002. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12 (2), 479–502.
Kourentzes, N., Barrow, D.K., Crone, S.F., 2014. Neural network ensemble operators for time series forecasting. Expert Syst. Appl. 41 (9), 4235–4244.
Kourentzes, N., Petropoulos, F., 2016. Forecasting with multivariate temporal aggregation: The case of promotional modelling. Int. J. Prod. Econ. 181, 145–153.
Kourentzes, N., Rostami-Tabar, B., Barrow, D.K., 2017. Demand forecasting by temporal aggregation: Using optimal or multiple aggregation levels? J. Bus. Res. 78, 1–9.
Lau, H.-S., Lau, A.H.L., 1996. Estimating the demand distributions of single-period items having frequent stockouts. European J. Oper. Res. 92 (2), 254–265.
Li, N., Wang, K., Cheng, J., 2015. A research on a following day load simulation method based on weather forecast parameters. Energy Convers. Manage. 103, 691–704.
Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F., 2015. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 16, 865–873.
Ma, X., Tao, Z., Wang, Y., Yu, H., Wang, Y., 2015. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. C 54, 187–197.
Makridakis, S., Spiliotis, E., Assimakopoulos, V., 2018a. The M4 competition: Results, findings, conclusion and way forward. Int. J. Forecast. 34 (4), 802–808.
Makridakis, S., Spiliotis, E., Assimakopoulos, V., 2018b. Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLOS ONE 13 (3), 1–26.
Marino, D.L., Amarasinghe, K., Manic, M., 2016. Building energy load forecasting using Deep Neural Networks. In: IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society. pp. 7046–7051.
Mönch, L., Balasubramanian, H., Fowler, J.W., Pfund, M.E., 2005. Heuristic scheduling of jobs on parallel batch machines with incompatible job families and unequal ready times. Comput. Oper. Res. 32 (11), 2731–2750.
Mönch, L., Zimmermann, J., Otto, P., 2006. Machine learning techniques for scheduling jobs with incompatible families and unequal ready times on parallel batch machines. Eng. Appl. Artif. Intell. 19 (3), 235–245.
Mou, S., Robb, D.J., DeHoratius, N., 2018. Retail store operations: Literature review and research directions. European J. Oper. Res. 265 (2), 399–422.
Nikolopoulos, K., Syntetos, A.A., Boylan, J.E., Petropoulos, F., Assimakopoulos, V., 2011. An aggregate–disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis. J. Oper. Res. Soc. 62 (3), 544–554.
Parfitt, J., Barthel, M., Macnaughton, S., 2010. Food waste within food supply chains: quantification and potential for change to 2050. Phil. Trans. R. Soc. B 365 (1554), 3065–3081.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A., 2017. Automatic differentiation in PyTorch. In: NeurIPS Autodiff Workshop.
Petropoulos, F., Kourentzes, N., Nikolopoulos, K., 2016. Another look at estimators for intermittent demand. Int. J. Prod. Econ. 181, 154–161.
Pinedo, M.L., 2016. Scheduling: Theory, Algorithms, and Systems, fifth ed. Springer International Publishing.
Qin, Y., Wang, R., Vakharia, A.J., Chen, Y., Seref, M.M.H., 2011. The newsvendor problem: Review and directions for future research. European J. Oper. Res. 213 (2), 361–374.
Qing, X., Niu, Y., 2018. Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM. Energy 148, 461–468.
Quevedo, J., Saludes, J., Puig, V., Blanch, J., 2014. Short-term demand forecasting for real-time operational control of the Barcelona water transport network. In: 22nd Mediterranean Conference on Control and Automation. pp. 990–995.
Radhakrishnan, S., Ventura, J.A., 2000. Simulated annealing for parallel machine scheduling with earliness-tardiness penalties and sequence-dependent set-up times. Int. J. Prod. Res. 38 (10), 2233–2252.
Ramanathan, U., Muyldermans, L., 2010. Identifying demand factors for promotional planning and forecasting: A case of a soft drink company in the UK. Int. J. Prod. Econ. 128, 538–545.
Reiner, G., Teller, C., Kotzab, H., 2013. Analyzing the efficient execution of in-store logistics processes in grocery retailing – The case of dairy products. Prod. Oper. Manage. 22 (4), 924–939.

Sachs, A.-L., 2015. Data-driven order policies with censored demand and substitution in retailing. In: Retail Analytics. In: Lecture Notes in Economics and Mathematical Systems, vol. 680, Springer International Publishing, pp. 57–78.
Sachs, A.-L., Minner, S., 2014. The data-driven newsvendor with censored demand observations. Int. J. Prod. Econ. 149, 28–36.
Schaller, J., Valente, J.M.S., 2013. An evaluation of heuristics for scheduling a nondelay permutation flow shop with family setups to minimize total earliness and tardiness. J. Oper. Res. Soc. 64 (6), 805–816.
Schaller, J., Valente, J., 2019a. Branch-and-bound algorithms for minimizing total earliness and tardiness in a two-machine permutation flow shop with unforced idle allowed. Comput. Oper. Res. 109, 1–11.
Schaller, J., Valente, J.M.S., 2019b. Heuristics for scheduling jobs in a permutation flow shop to minimize total earliness and tardiness with unforced idle time allowed. Expert Syst. Appl. 119, 376–386.
Schlapp, J., Fleischmann, M., 2018. Technical note – Multiproduct inventory management under customer substitution and capacity restrictions. Oper. Res. 66 (3), 740–747.
Shapiro, A., 2003. Monte Carlo sampling methods. In: Ruszczynski, A., Shapiro, A. (Eds.), Handbooks in Operations Research and Management Science, Vol. 10. Elsevier Science B.V., Boston, USA, pp. 353–425.
Sivrikaya-Şerifoğlu, F., Ulusoy, G., 1999. Parallel machine scheduling with earliness and tardiness penalties. Comput. Oper. Res. 26 (8), 773–787.
Sridharan, V., Berry, W.L., Udayabhanu, V., 1987. Freezing the master production schedule under rolling planning horizons. Manage. Sci. 33 (9), 1137–1149.
Taube, F., Minner, S., 2018. Data-driven assignment of delivery patterns with handling effort considerations in retail. Comput. Oper. Res. 100, 379–393.
Teller, C., Holweg, C., Reiner, G., Kotzab, H., 2018. Retail store operations and food waste. J. Cleaner Prod. 185, 981–997.
Tian, Y., Zhang, K., Li, J., Lin, X., Yang, B., 2018. LSTM-based traffic flow prediction with missing data. Neurocomputing 318, 297–305.
Turan, B., Minner, S., Hartl, R.F., 2017. A VNS approach to multi-location inventory redistribution with vehicle routing. Comput. Oper. Res. 78, 526–536.
Turgut, Ö., Taube, F., Minner, S., 2018. Data-driven retail inventory management with backroom effect. OR Spectrum 40 (4), 945–968.
Wan, G., Yen, B.P.C., 2002. Tabu search for single machine scheduling with distinct due windows and weighted earliness/tardiness penalties. European J. Oper. Res. 142 (2), 271–281.
Wickramasuriya, S.L., Athanasopoulos, G., Hyndman, R.J., 2019. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. J. Amer. Statist. Assoc. 114 (526), 804–819.
Widiarta, H., Viswanathan, S., Piplani, R., 2009. Forecasting aggregate demand: An analytical evaluation of top-down versus bottom-up forecasting in a production planning framework. Int. J. Prod. Econ. 118 (1), 87–94.
Williams, B.D., Waller, M.A., 2011. Top-down versus bottom-up demand forecasts: The value of shared point-of-sale data in the retail supply chain. J. Bus. Logist. 32 (1), 17–26.
van Woensel, T., van Donselaar, K., Broekmeulen, R., Fransoo, J., 2007. Consumer responses to shelf out-of-stocks of perishable products. Int. J. Phys. Distrib. Logist. Manage. 37 (9), 704–718.
Wu, L., Wang, S., 2018. Exact and heuristic methods to solve the parallel machine scheduling problem with multi-processor tasks. Int. J. Prod. Econ. 201, 26–40.
van Zelst, S., van Donselaar, K., van Woensel, T., Broekmeulen, R., Fransoo, J., 2009. Logistics drivers for shelf stacking in grocery retail stores: Potential for efficiency improvement. Int. J. Prod. Econ. 121 (2), 620–632.
Zhu, Z., Heady, R.B., 2000. Minimizing the sum of earliness/tardiness in multi-machine scheduling: a mixed integer programming approach. Comput. Ind. Eng. 38 (2), 297–305.
Zotteri, G., Kalchschmidt, M., Caniato, F., 2005. The impact of aggregation level on forecasting performance. Int. J. Prod. Econ. 93–94, 479–491.