MARKET BASKET ANALYSIS USING ASSOCIATION RULES
ANJA ĐURIĆ
FACULTY OF ECONOMICS AND BUSINESS
Zagreb, 2020.
UNIVERSITY OF ZAGREB - CROATIA
FACULTY OF ECONOMICS AND BUSINESS
MASTER STUDY – MANAGERIAL INFORMATICS
Subject: Knowledge Discovery in Data Bases
Title: Market Basket Analysis Using Association Rules
Academic year: 2019./2020.
Professor: Mirjana Pejić Bach, PhD.
Submitted by: Anja Đurić, Danijel Đurić
Zagreb, May 2020.
Table of Contents
1. Goal of the project
2. Similar research
3. Description of the methods used
4. Description of the data set
5. Results
5.1. Model 1
5.2. Model 2
5.3. Model 3
5.4. Model 4
6. Implementation of results
7. Conclusion
Works Cited
1. Goal of the project
The purpose of this project is to showcase how knowledge discovery aids the decision-making process in long-term and short-term planning. This report covers market basket analysis using association rules and the Weka software. Typically, a market basket is described as a set of consumer goods and services usually purchased by consumers. It is used to track changes in costs in order to calculate various indexes and values, most commonly inflation. This report, however, deals with shopping patterns that can be discovered with data mining. To gain insight into those patterns, the report is based on a ''Market Basket'' data set that contains information about the transactions of an unknown grocery store, which will be referred to as ''Shop'' throughout the paper. The data set is processed in Weka using the association rules method and the FPGrowth algorithm. This method shows which products remind the customer of another product. It is useful because it provides valuable information for the decision-making process in long-term planning, meaning the general layout of the store, and in short-term planning, meaning occasional promotions and discounts. To extract such valuable information we created four dependent models, which are presented and described in detail in the following chapters. The models showcase ''Reminder products'', which remind the customer of another product; ''Impulse products'', which are added to the cart because of previous choices; and ''Confidence'', a measure of certainty that the pattern will occur.
2. Similar research
The market basket method is a very common approach to observing shopping patterns and customer behaviour, and for that reason numerous research studies have been conducted on the topic. Research by Ayse Nur Sagın and Berk Ayvaz, conducted in 2018, dealt with a hardware store data set. It was based on five-and-a-half-year-old data of a hardware company operating in the retail sector. The goal of the study was to identify related product categories with the association rules method. To conduct the analysis, the authors used both the Apriori and FPGrowth algorithms. The research provides an in-depth analysis of transactions and concludes that the strongest connection is between Chisels and Cutters, Bits Tips, Drill Bits, Punch Group, Meters, Screw Adapters and Files [Sag18].
Another related market basket study was conducted by Isti Surjandari and Annury Citra Seruni in 2005. The paper deals with the design of a product placement layout in a retail shop using market basket analysis. The authors used the Weka software and the Apriori algorithm to identify connections between products and proposed a product placement layout for the particular store. The research concluded with a sketch of the proposed layout and emphasized that it benefits both the customers, through the convenience of the new product placement, and the store owners, through the higher probability that customers will add more products to their basket [Sur05].
S. Hariharan, M. Kannan and P. Raguraman examined seasonal trends in a retail store, providing an in-depth analysis of customers' purchase behaviour. The end goal was to provide retailers with better seasonal strategies that will drive the market. To conduct their research, the authors separated their data set into three seasons: January-April, May-August, and September-December. The data set covered 12000 transactions and 215 product categories.
The study provided four main points that must not be overlooked in the future [SHa13]:
''The festival seasons play a vital role in boosting a product sale as can be seen in this study for the season May-August, whereby there was a surge in the sale of Masala items.''
''The climatic seasons are a big factor not to be overlooked as depicted in the season January-April, when the sales of juice items soared high in dry weather conditions.''
''The seasons defined by socio-economic problems, mandate the profuse purchase of certain products. It was evident from the study that the onset of Dengue fever influenced the increased sale of Mosquito repellents for the season September-December.''
''Political factors also have an implicit bearing on the pulse of retail market as the power shortage crisis led to a dramatic increase in the purchase of candles, which in other cases would not have topped the customer’s must buy list.''
3. Description of the methods used
To conduct our market basket analysis we used the Weka software and the association rules method, more precisely the FPGrowth algorithm. Market basket analysis is a common technique, widely used by large retailers, for uncovering associations between products. By observing transactions, it is possible to discover which combinations of products frequently occur together, which allows retailers to identify relationships between the products that customers buy [LiS17]. According to Shah P., ''Named after a flightless New Zealand bird, Weka is a set of machine learning algorithms that can be applied to a data set directly, or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualisation.'' [Sha17] ''Association Rules are widely used to analyze retail basket or transaction data, and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.'' [LiS17]
The FPGrowth algorithm is used for extracting frequent patterns among the items listed in the data set. It is an alternative to the established Apriori algorithm. Generally, the algorithm is designed to be used on data sets that consist of transactions, such as store transactions made by customers. An itemset is considered "frequent" if it meets a user-specified support threshold. What makes FPGrowth different from the Apriori algorithm is that FPGrowth does not require candidate generation [Fre].
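The notions of support, a "frequent" itemset and rule confidence used above can be illustrated with a minimal Python sketch. It is a brute-force illustration over a handful of invented transactions, not the FPGrowth algorithm itself and not the Weka implementation; the function names and the toy data are assumptions made for this example.

```python
# Toy transactions, invented for illustration (not taken from the paper's data set).
transactions = [
    {"Hot Dogs", "Hot Dog Buns", "Sweet Relish"},
    {"Hot Dogs", "Hot Dog Buns"},
    {"Eggs", "White Bread"},
    {"Hot Dog Buns", "Sweet Relish", "Hot Dogs"},
    {"Eggs", "2pct. Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support meets the user-specified threshold."""
    return support(itemset, transactions) >= min_support


def confidence(premise, consequence, transactions):
    """Share of the transactions containing `premise` that also contain `consequence`."""
    both = set(premise) | set(consequence)
    return support(both, transactions) / support(premise, transactions)


print(support({"Hot Dogs", "Hot Dog Buns"}, transactions))       # 3 of 5 transactions
print(is_frequent({"Eggs", "White Bread"}, transactions, 0.4))   # 1 of 5, below the threshold
print(confidence({"Hot Dog Buns"}, {"Hot Dogs"}, transactions))  # every bun basket has hot dogs
```

Counting every itemset this way becomes expensive on larger data sets, which is the kind of cost that algorithms such as Apriori and FPGrowth are designed to reduce.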
4. Description of the data set
To conduct this research we used a ''Market basket'' data set which involves 255 attributes and 1361 instances in total. The attributes in the data set range from food products to cosmetics, showcasing typical grocery store products. In the original data set the attribute type was set to numeric. In order to process and analyze the data set in Weka using association rules, the attribute type needed to be changed to nominal. To do that we used the following sequence of commands: Filter – Choose – Filters – Unsupervised – Attribute – NumericToNominal – Apply. All attributes are now treated by Weka as nominal. In fact, they are all binary, having values either 0 (not purchased) or 1 (purchased). To complete the preparation of the data set we also needed to choose the ''No class'' option in the drop-down menu in the bottom right corner. The final interface before data set processing is shown in Figure 1.
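What the numeric-to-nominal conversion amounts to can be sketched at the level of an ARFF header. The snippet below is a hypothetical text-level illustration in Python, not the Weka filter itself: the helper name, the sample header and the assumption that every attribute takes only the values 0 and 1 are ours.

```python
import re


def numeric_to_nominal(arff_text, labels=("0", "1")):
    """Rewrite every '@attribute <name> numeric' declaration as a nominal one."""
    nominal = "{" + ",".join(labels) + "}"
    pattern = re.compile(
        r"^(@attribute[ \t]+.+?)[ \t]+numeric[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + " " + nominal, arff_text)


# Hypothetical two-attribute excerpt; the real data set has 255 attributes.
sample = """@relation marketbasket
@attribute 'Hair Conditioner' numeric
@attribute 'Hot Dogs' numeric
@data
0,1
1,1
"""

print(numeric_to_nominal(sample))
```

Only the attribute declarations change; the data rows stay the same, which is why the values remain 0 and 1 after the filter is applied.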
Figure 1 Preprocess Data Set Settings
To interpret a single attribute's role in the data set we will use the ''Hair Conditioner'' example from Figure 1. As previously mentioned, all of the attribute values are in fact binary, either 0 (not purchased) or 1 (purchased). The table in Figure 1 tells us that the total number of instances (transactions in this case) is 1361. We can also see that ''Hair Conditioner'' appeared in 80 transactions, while in the other 1281 transactions it did not. The same logic applies to the interpretation of all the other attributes.
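The per-attribute summary described above can be reproduced with a short Python sketch. The helper name and the eight-value column are hypothetical; only the counts 80 and 1361 come from the paper.

```python
def value_counts(column):
    """Count how often each value occurs in one attribute column."""
    counts = {}
    for value in column:
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical column: 1 = purchased, 0 = not purchased.
column = [0, 1, 0, 0, 1, 0, 0, 0]
print(value_counts(column))

# Counts reported for ''Hair Conditioner'': 80 purchases in 1361 transactions.
share = 80 / 1361
print(f"{share:.1%}")  # share of transactions containing the product
```

Expressed this way, ''Hair Conditioner'' appears in roughly 5.9% of all transactions.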
5. Results
This chapter provides insight into our in-depth analysis using association rules and the FPGrowth algorithm, as mentioned earlier. We created four dependent models, each playing an important role in the final conclusion regarding short-term and long-term planning. All of the models are based on the top 50 rules given in the output.
5.1. Model 1
Model 1 was crucial for our research, since it is the starting point of this study and provides valuable information on how to conduct further, deeper analysis of the data set. We started the analysis by adjusting the algorithm settings for the given data set. Model 1 algorithm settings:
Figure 2 Algorithm settings - Model 1
Two algorithm settings were crucial for successful processing of the data set: lowerBoundMinSupport and minMetric.

lowerBoundMinSupport -- Lower bound for minimum support as a fraction or number of instances.
minMetric -- Minimum metric score. Consider only rules with scores higher than this value.
These algorithm settings will vary depending on the model requirements.
The output:
=== Run information ===
Scheme: weka.associations.FPGrowth -P 2 -I -1 -N 50 -T 0 -C 0.6 -D 0.05 -U 1.0 -M 0.03
Relation: marketbasket-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 1361
Attributes: 255
[list of attributes omitted]
=== Associator model (full training set) ===
FPGrowth found 50 rules (displaying top 50)
1. [ Sweet Relish=1, Hot Dog Buns=1]: 49 ==> [ Hot Dogs=1]: 41 <conf:(0.84)> lift:(9.04) lev:(0.03) conv:(4.94)
2. [ Potato Chips=1, Toothpaste=1]: 51 ==> [ White Bread=1]: 41 <conf:(0.8)> lift:(6.75) lev:(0.03) conv:(4.08)
3. [ Eggs=1, Wheat Bread=1]: 55 ==> [ 2pct. Milk=1]: 42 <conf:(0.76)> lift:(6.98) lev:(0.03) conv:(3.5)
4. [ 2pct. Milk=1, Toothpaste=1]: 59 ==> [ White Bread=1]: 45 <conf:(0.76)> lift:(6.41) lev:(0.03) conv:(3.47)
5. [ 2pct. Milk=1, Potato Chips=1]: 61 ==> [ Eggs=1]: 46 <conf:(0.75)> lift:(6.15) lev:(0.03) conv:(3.34)
6. [ Eggs=1, Cola=1]: 55 ==> [ White Bread=1]: 41 <conf:(0.75)> lift:(6.26) lev:(0.03) conv:(3.23)
7. [ White Bread=1, Cola=1]: 55 ==> [ Eggs=1]: 41 <conf:(0.75)> lift:(6.08) lev:(0.03) conv:(3.22)
8. [ Eggs=1, Wheat Bread=1]: 55 ==> [ White Bread=1]: 41 <conf:(0.75)> lift:(6.26) lev:(0.03) conv:(3.23)
9. [ White Bread=1, Cola=1]: 55 ==> [ 2pct. Milk=1]: 41 <conf:(0.75)> lift:(6.81) lev:(0.03) conv:(3.27)
10. [ 2pct. Milk=1, Cola=1]: 55 ==> [ White Bread=1]: 41 <conf:(0.75)> lift:(6.26) lev:(0.03) conv:(3.23)
11. [ 2pct. Milk=1, Wheat Bread=1]: 58 ==> [ White Bread=1]: 43 <conf:(0.74)> lift:(6.23) lev:(0.03) conv:(3.19)
12. [ 2pct. Milk=1, Potato Chips=1]: 61 ==> [ White Bread=1]: 45 <conf:(0.74)> lift:(6.2) lev:(0.03) conv:(3.16)
13. [ White Bread=1, Wheat Bread=1]: 59 ==> [ 2pct. Milk=1]: 43 <conf:(0.73)> lift:(6.66) lev:(0.03) conv:(3.09)
14. [ 2pct. Milk=1, Wheat Bread=1]: 58 ==> [ Eggs=1]: 42 <conf:(0.72)> lift:(5.9) lev:(0.03) conv:(2.99)
15. [ Eggs=1, Toothpaste=1]: 61 ==> [ White Bread=1]: 44 <conf:(0.72)> lift:(6.06) lev:(0.03) conv:(2.99)
16. [ Hot Dogs=1, Hot Dog Buns=1]: 57 ==> [ Sweet Relish=1]: 41 <conf:(0.72)> lift:(8.44) lev:(0.03) conv:(3.07)
17. [ Hot Dog Buns=1]: 80 ==> [ Hot Dogs=1]: 57 <conf:(0.71)> lift:(7.7) lev:(0.04) conv:(3.02)
18. [ 2pct. Milk=1, Toothpaste=1]: 59 ==> [ Eggs=1]: 42 <conf:(0.71)> lift:(5.8) lev:(0.03) conv:(2.88)
19. [ Potato Chips=1, 98pct. Fat Free Hamburger=1]: 59 ==> [ White Bread=1]: 42 <conf:(0.71)> lift:(5.98) lev:(0.03) conv:(2.89)
20. [ White Bread=1, 2pct. Milk=1]: 70 ==> [ Eggs=1]: 49 <conf:(0.7)> lift:(5.7) lev:(0.03) conv:(2.79)
21. [ Eggs=1, Potato Chips=1]: 66 ==> [ White Bread=1]: 46 <conf:(0.7)> lift:(5.86) lev:(0.03) conv:(2.77)
22. [ Eggs=1, Potato Chips=1]: 66 ==> [ 2pct. Milk=1]: 46 <conf:(0.7)> lift:(6.37) lev:(0.03) conv:(2.8)
23. [ White Bread=1, Wheat Bread=1]: 59 ==> [ Eggs=1]: 41 <conf:(0.69)> lift:(5.66) lev:(0.02) conv:(2.72)
24. [ White Bread=1, Toothpaste=1]: 65 ==> [ 2pct. Milk=1]: 45 <conf:(0.69)> lift:(6.32) lev:(0.03) conv:(2.76)
25. [ Eggs=1, 2pct. Milk=1]: 71 ==> [ White Bread=1]: 49 <conf:(0.69)> lift:(5.8) lev:(0.03) conv:(2.72)
26. [ Eggs=1, Toothpaste=1]: 61 ==> [ 2pct. Milk=1]: 42 <conf:(0.69)> lift:(6.29) lev:(0.03) conv:(2.72)
27. [ Hamburger Buns=1]: 97 ==> [ 98pct. Fat Free Hamburger=1]: 66 <conf:(0.68)> lift:(7.29) lev:(0.04) conv:(2.75)
28. [ White Bread=1, Toothpaste=1]: 65 ==> [ Eggs=1]: 44 <conf:(0.68)> lift:(5.52) lev:(0.03) conv:(2.59)
29. [ Sugar Cookies=1]: 75 ==> [ Eggs=1]: 50 <conf:(0.67)> lift:(5.43) lev:(0.03) conv:(2.53)
30. [ Eggs=1, 98pct. Fat Free Hamburger=1]: 62 ==> [ White Bread=1]: 41 <conf:(0.66)> lift:(5.56) lev:(0.02) conv:(2.48)
31. [ White Bread=1, Potato Chips=1]: 70 ==> [ Eggs=1]: 46 <conf:(0.66)> lift:(5.36) lev:(0.03) conv:(2.46)
32. [ Eggs=1, White Bread=1]: 75 ==> [ 2pct. Milk=1]: 49 <conf:(0.65)> lift:(5.97) lev:(0.03) conv:(2.47)
33. [ Eggs=1, 2pct. Milk=1]: 71 ==> [ Potato Chips=1]: 46 <conf:(0.65)> lift:(6.63) lev:(0.03) conv:(2.46)
34. [ White Bread=1, 2pct. Milk=1]: 70 ==> [ Potato Chips=1]: 45 <conf:(0.64)> lift:(6.58) lev:(0.03) conv:(2.43)
35. [ White Bread=1, Potato Chips=1]: 70 ==> [ 2pct. Milk=1]: 45 <conf:(0.64)> lift:(5.87) lev:(0.03) conv:(2.4)
36. [ White Bread=1, 2pct. Milk=1]: 70 ==> [ Toothpaste=1]: 45 <conf:(0.64)> lift:(8.1) lev:(0.03) conv:(2.48)
37. [ Hot Dogs=1, Sweet Relish=1]: 64 ==> [ Hot Dog Buns=1]: 41 <conf:(0.64)> lift:(10.9) lev:(0.03) conv:(2.51)
38. [ Sugar Cookies=1]: 75 ==> [ White Bread=1]: 48 <conf:(0.64)> lift:(5.38) lev:(0.03) conv:(2.36)
39. [ White Bread=1, Toothpaste=1]: 65 ==> [ Potato Chips=1]: 41 <conf:(0.63)> lift:(6.45) lev:(0.03) conv:(2.35)
40. [ White Bread=1, 98pct. Fat Free Hamburger=1]: 67 ==> [ Potato Chips=1]: 42 <conf:(0.63)> lift:(6.41) lev:(0.03) conv:(2.33)
41. [ Salsa Dip=1]: 68 ==> [ Eggs=1]: 42 <conf:(0.62)> lift:(5.03) lev:(0.02) conv:(2.21)
42. [ White Bread=1, 2pct. Milk=1]: 70 ==> [ Wheat Bread=1]: 43 <conf:(0.61)> lift:(7.96) lev:(0.03) conv:(2.31)
43. [ Eggs=1, White Bread=1]: 75 ==> [ Potato Chips=1]: 46 <conf:(0.61)> lift:(6.28) lev:(0.03) conv:(2.26)
44. [ Hot Dog Buns=1]: 80 ==> [ Sweet Relish=1]: 49 <conf:(0.61)> lift:(7.19) lev:(0.03) conv:(2.29)
45. [ White Bread=1, 98pct. Fat Free Hamburger=1]: 67 ==> [ Eggs=1]: 41 <conf:(0.61)> lift:(4.99) lev:(0.02) conv:(2.18)
46. [ Tomatoes=1]: 90 ==> [ White Bread=1]: 55 <conf:(0.61)> lift:(5.13) lev:(0.03) conv:(2.2)
47. [ Canned Tuna=1]: 74 ==> [ White Bread=1]: 45 <conf:(0.61)> lift:(5.11) lev:(0.03) conv:(2.17)
48. [ Raisins=1]: 83 ==> [ Eggs=1]: 50 <conf:(0.6)> lift:(4.91) lev:(0.03) conv:(2.14)
49. [ Toothpaste=1]: 108 ==> [ White Bread=1]: 65 <conf:(0.6)> lift:(5.06) lev:(0.04) conv:(2.16)
50. [ White Bread=1, Potato Chips=1]: 70 ==> [ 98pct. Fat Free Hamburger=1]: 42 <conf:(0.6)> lift:(6.43) lev:(0.03) conv:(2.19)
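How the two settings relate to the rule counts in the output above can be sketched in Python. The sketch assumes that the -M 0.03 and -C 0.6 options in the scheme line correspond to lowerBoundMinSupport and minMetric (with confidence as the metric), and that the support fraction is rounded up to a whole number of transactions; the exact rounding inside Weka is not confirmed here, although no listed rule has a count below 41. The helper names are hypothetical.

```python
import math

N_INSTANCES = 1361   # from the run information
LOWER_BOUND = 0.03   # assumed to be the -M option (lowerBoundMinSupport)
MIN_METRIC = 0.6     # assumed to be the -C option (minMetric, here confidence)


def min_support_count(lower_bound, n_instances):
    """Smallest number of transactions that reaches the support fraction."""
    return math.ceil(lower_bound * n_instances)


def passes(premise_count, rule_count, n_instances=N_INSTANCES):
    """Check one rule against both thresholds using the counts Weka prints."""
    enough_support = rule_count >= min_support_count(LOWER_BOUND, n_instances)
    enough_confidence = rule_count / premise_count >= MIN_METRIC
    return enough_support and enough_confidence


print(min_support_count(LOWER_BOUND, N_INSTANCES))  # 0.03 * 1361 = 40.83, rounded up
print(passes(49, 41))    # rule 1: 41 of 49
print(passes(108, 65))   # rule 49: 65 of 108
print(passes(50, 40))    # hypothetical rule that falls below the support bound
```

Under these assumptions a rule needs at least 41 supporting transactions, which matches the smallest count that appears in the listing.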
Example of interpretation for rule no. 1:
1. [ Sweet Relish=1, Hot Dog Buns=1]: 49 ==> [ Hot Dogs=1]: 41 <conf:(0.84)> lift:(9.04) lev:(0.03) conv:(4.94)
- if a transaction contains both ''Sweet Relish'' and ''Hot Dog Buns'', there is an 84% chance that ''Hot Dogs'' will be in that same transaction
If you closely examine the right side of the output, you can notice 10 different products, which we are going to call ''Impulse products''; on the left side we have our ''Reminder products''. A product can hold both roles, depending on the particular transaction. Those 10 ''Impulse products'' are:
1) Hot Dogs
2) White Bread
3) 2pct. Milk
4) Eggs
5) Sweet Relish
6) 98pct. Fat Free Hamburger
7) Wheat Bread
8) Potato Chips
9) Toothpaste
10) Hot Dog Buns
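The figures reported for rule no. 1 can be re-derived from the printed counts with the textbook definitions of confidence, lift and leverage. The Python sketch below is our own check, not part of the Weka run; the count of 126 transactions containing ''Hot Dogs'' is taken from the Model 2 output, where that product appears alone on the left side of a rule.

```python
N = 1361            # transactions in the data set
premise = 49        # transactions with Sweet Relish and Hot Dog Buns
both = 41           # of those, transactions that also contain Hot Dogs
consequence = 126   # transactions with Hot Dogs (count shown in the Model 2 output)

confidence = both / premise
lift = confidence / (consequence / N)
leverage = both / N - (premise / N) * (consequence / N)

print(round(confidence, 2))  # 0.84
print(round(lift, 2))        # 9.04
print(round(leverage, 2))    # 0.03
```

A lift of about 9 means that ''Hot Dogs'' appear roughly nine times more often in baskets that already hold ''Sweet Relish'' and ''Hot Dog Buns'' than in the data set as a whole.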
To simplify the output we created tables presenting the dependence of our ''Impulse products''. The tables show the ''Impulse product'', its ''Reminder products'' and the confidence for the given rule.
Simplified table:
IMPULSE | REMINDER | CONFIDENCE
HOT DOGS | Sweet Relish, Hot Dog Buns | 0,84
HOT DOGS | Hot Dog Buns | 0,71
WHITE BREAD | Potato Chips, Toothpaste | 0,8
WHITE BREAD | 2pct. Milk, Toothpaste | 0,76
WHITE BREAD | Eggs, Cola | 0,75
WHITE BREAD | Eggs, Wheat Bread | 0,75
WHITE BREAD | 2pct. Milk, Cola | 0,75
WHITE BREAD | 2pct. Milk, Wheat Bread | 0,74
WHITE BREAD | 2pct. Milk, Potato Chips | 0,74
WHITE BREAD | Eggs, Toothpaste | 0,72
WHITE BREAD | Potato Chips, 98pct. Fat Free Hamburger | 0,71
WHITE BREAD | Eggs, Potato Chips | 0,7
WHITE BREAD | Eggs, 2pct. Milk | 0,69
WHITE BREAD | Eggs, 98pct. Fat Free Hamburger | 0,66
WHITE BREAD | Sugar Cookies | 0,64
WHITE BREAD | Tomatoes | 0,61
WHITE BREAD | Canned Tuna | 0,61
WHITE BREAD | Toothpaste | 0,6
2PCT. MILK | Eggs, Wheat Bread | 0,76
2PCT. MILK | White Bread, Cola | 0,75
2PCT. MILK | White Bread, Wheat Bread | 0,73
2PCT. MILK | Eggs, Potato Chips | 0,7
2PCT. MILK | White Bread, Toothpaste | 0,69
2PCT. MILK | Eggs, Toothpaste | 0,69
2PCT. MILK | Eggs, White Bread | 0,65
2PCT. MILK | White Bread, Potato Chips | 0,64
EGGS | 2pct. Milk, Potato Chips | 0,75
EGGS | White Bread, Cola | 0,75
EGGS | 2pct. Milk, Wheat Bread | 0,72
EGGS | 2pct. Milk, Toothpaste | 0,71
EGGS | White Bread, 2pct. Milk | 0,7
EGGS | White Bread, Wheat Bread | 0,69
EGGS | White Bread, Toothpaste | 0,68
EGGS | Sugar Cookies | 0,67
EGGS | White Bread, Potato Chips | 0,66
EGGS | Salsa Dip | 0,62
EGGS | White Bread, 98pct. Fat Free Hamburger | 0,61
EGGS | Raisins | 0,6
SWEET RELISH | Hot Dogs, Hot Dog Buns | 0,72
SWEET RELISH | Hot Dog Buns | 0,61
98PCT. FAT FREE HAMBURGER | Hamburger Buns | 0,68
98PCT. FAT FREE HAMBURGER | White Bread, Potato Chips | 0,6
POTATO CHIPS | Eggs, 2pct. Milk | 0,65
POTATO CHIPS | White Bread, 2pct. Milk | 0,64
POTATO CHIPS | White Bread, Toothpaste | 0,63
POTATO CHIPS | White Bread, 98pct. Fat Free Hamburger | 0,63
POTATO CHIPS | Eggs, White Bread | 0,61
TOOTHPASTE | White Bread, 2pct. Milk | 0,64
HOT DOG BUNS | Hot Dogs, Sweet Relish | 0,64
WHEAT BREAD | White Bread, 2pct. Milk | 0,61
Table 1 Model 1 - Output Table
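A table of this kind can be assembled from the rule lines by grouping them on the right-hand side. The following Python sketch is one possible way to do that and is not the procedure used by the authors; the regular expression is written against the line format shown in the output above, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Matches "[ premise ]: n ==> [ consequence ]: m <conf:(x)>" in a Weka rule line.
RULE = re.compile(
    r"\[\s*(?P<premise>.+?)\]:\s*\d+\s*==>\s*\[\s*(?P<consequence>.+?)\]:\s*\d+"
    r"\s*<conf:\s*\((?P<conf>[\d.]+)\)>"
)


def items(side):
    """Turn 'Sweet Relish=1, Hot Dog Buns=1' into 'Sweet Relish, Hot Dog Buns'."""
    return ", ".join(part.strip().split("=")[0] for part in side.split(","))


def group_by_impulse(lines):
    """Map each impulse product to its (reminder products, confidence) pairs."""
    table = defaultdict(list)
    for line in lines:
        match = RULE.search(line)
        if match:
            impulse = items(match["consequence"])
            table[impulse].append((items(match["premise"]), float(match["conf"])))
    return dict(table)


lines = [
    "1. [ Sweet Relish=1, Hot Dog Buns=1]: 49 ==> [ Hot Dogs=1]: 41 <conf:(0.84)> lift:(9.04) lev:(0.03) conv:(4.94)",
    "17. [ Hot Dog Buns=1]: 80 ==> [ Hot Dogs=1]: 57 <conf:(0.71)> lift:(7.7) lev:(0.04) conv:(3.02)",
    "27. [ Hamburger Buns=1]: 97 ==> [ 98pct. Fat Free Hamburger=1]: 66 <conf:(0.68)> lift:(7.29) lev:(0.04) conv:(2.75)",
]

table = group_by_impulse(lines)
for impulse, reminders in table.items():
    print(impulse, reminders)
```

Because the rules are already sorted by confidence in the output, each impulse product's reminders come out in descending order of confidence, as in the table.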
These tables are useful for the decision-making process regarding the layout of the store. With the given information we can decide which products should be placed near each other. Store layout plays an important role in increasing profit and should be part of a long-term plan for increasing revenue. Knowledge discovery from past transactions reveals shopping patterns and shows which products are likely to remind the shopper of another product. If that product is in close proximity, the chance of an extra product ending up in the shopper's cart is even higher, and no opportunity for profit is missed. Another way of increasing profit is a short-term plan built on marketing activities such as promotions and discounts. The end goal for the store is to sell more of the products that have a higher margin. Looking at our tables, we can assume that products such as white bread, eggs and milk have a lower margin than, for example, hot dogs and potato chips. If the store combines promotions with the store layout in mind, it has an opportunity to increase its profits by targeting products with a higher margin. We propose a chain reaction technique, which is explained below:
23. [ White Bread=1, Wheat Bread=1]: 59 ==> [ Eggs=1]: 41 <conf:(0.69)> lift:(5.66) lev:(0.02) conv:(2.72)
3. [ Eggs=1, Wheat Bread=1]: 55 ==> [ 2pct. Milk=1]: 42 <conf:(0.76)> lift:(6.98) lev:(0.03) conv:(3.5)
33. [ Eggs=1, 2pct. Milk=1]: 71 ==> [ Potato Chips=1]: 46 <conf:(0.65)> lift:(6.63) lev:(0.03) conv:(2.46)
50. [ White Bread=1, Potato Chips=1]: 70 ==> [ 98pct. Fat Free Hamburger=1]: 42 <conf:(0.6)> lift:(6.43) lev:(0.03) conv:(2.19)
Explanation: If a shopper decides to buy white bread and wheat bread, there is a 69% chance that they will also buy eggs. Eggs in combination with wheat bread might remind the shopper to add milk to the cart (76% confidence). Combining eggs and milk, there is a 65% chance that the shopper will be reminded of potato chips. This leads to the final rule, which states that if a transaction contains white bread and potato chips, there is a 60% chance that fat free hamburgers will be in that same transaction. The idea is to strategically put discounts on products that already have a lower margin and are likely to remind the buyer of a product with a higher margin, as in the example above.¹
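The chain described in the explanation can be traced step by step with a small Python sketch. It is an illustration under our own assumptions, not a procedure from the paper: the four rules are copied from the output (numbers 23, 3, 33 and 50), a rule fires once all of its reminder products are in the basket, and the function name is hypothetical.

```python
# (reminder products, impulse product, confidence) for rules 23, 3, 33 and 50.
rules = [
    ({"White Bread", "Wheat Bread"}, "Eggs", 0.69),
    ({"Eggs", "Wheat Bread"}, "2pct. Milk", 0.76),
    ({"Eggs", "2pct. Milk"}, "Potato Chips", 0.65),
    ({"White Bread", "Potato Chips"}, "98pct. Fat Free Hamburger", 0.60),
]


def chain(basket, rules):
    """Repeatedly apply the first rule whose premise is already in the basket."""
    basket = set(basket)
    steps = []
    progressed = True
    while progressed:
        progressed = False
        for premise, item, conf in rules:
            if premise <= basket and item not in basket:
                basket.add(item)
                steps.append((item, conf))
                progressed = True
                break
    return steps


for item, conf in chain({"White Bread", "Wheat Bread"}, rules):
    print(f"{item}: {conf:.0%}")
```

Each confidence is reported for its own step only; the sketch does not combine them into a probability for the whole chain.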
¹ Since we do not have insight into which products have a higher margin for this particular store, our ''chain reaction'' proposals are based on our own assumptions about margin levels.
5.2. Model 2
During the examination of Model 1 we decided that we needed to focus less on products that are essential in everyday life, such as bread products and milk. These products will almost always find their way into the shopper's basket, even if the shopper came to the store only to buy some detergent. Milk and bread products are essential food products in most households, so shoppers tend to buy more on every visit, even if they still have some left at home. This makes it hard to establish any real connection between milk or bread and products that do not belong to the same category.
After removing these products from our analysis we got a more precise picture of true ''impulse'' products and their connection to other non-essential products.
Removed attributes: 125. 2pct. Milk, 133. Eggs, 142. White Bread, 233. Wheat bread, 247. Plain White Bread
Model 2 algorithm settings:
Figure 3 Algorithm settings - Model 2
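The attribute removal used for Model 2 can be pictured as dropping columns by their 1-based position, which is how the indices 125, 133, 142, 233 and 247 are listed. The Python sketch below works on a hypothetical five-attribute miniature rather than the real data set, and the helper name is our own.

```python
def remove_attributes(names, rows, one_based_indices):
    """Drop the columns at the given 1-based positions from a header and its rows."""
    drop = {i - 1 for i in one_based_indices}
    keep = [j for j in range(len(names)) if j not in drop]
    return [names[j] for j in keep], [[row[j] for j in keep] for row in rows]


# Hypothetical miniature data set with five attributes.
names = ["Cola", "2pct. Milk", "Eggs", "Toothpaste", "White Bread"]
rows = [[1, 1, 0, 0, 1], [0, 0, 1, 1, 0]]

kept_names, kept_rows = remove_attributes(names, rows, [2, 3, 5])
print(kept_names)
print(kept_rows)

# Attribute count after removing the five indices listed for Model 2.
print(255 - len([125, 133, 142, 233, 247]))
```

Removing five of the 255 attributes leaves 250, which is the attribute count shown in the Model 2 run information.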
The output:
=== Run information ===
Scheme: weka.associations.FPGrowth -P 2 -I -1 -N 100 -T 0 -C 0.3 -D 0.05 -U 1.0 -M 0.03
Relation: marketbasket-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last-weka.filters.unsupervised.attribute.Remove-R125,133,142,233,247-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 1361
Attributes: 250
[list of attributes omitted]
=== Associator model (full training set) ===
FPGrowth found 68 rules (displaying top 68)
1. [ Sweet Relish=1, Hot Dog Buns=1]: 49 ==> [ Hot Dogs=1]: 41 <conf:(0.84)> lift:(9.04) lev:(0.03) conv:(4.94)
2. [ Hot Dogs=1, Hot Dog Buns=1]: 57 ==> [ Sweet Relish=1]: 41 <conf:(0.72)> lift:(8.44) lev:(0.03) conv:(3.07)
3. [ Hot Dog Buns=1]: 80 ==> [ Hot Dogs=1]: 57 <conf:(0.71)> lift:(7.7) lev:(0.04) conv:(3.02)
4. [ Hamburger Buns=1]: 97 ==> [ 98pct. Fat Free Hamburger=1]: 66 <conf:(0.68)> lift:(7.29) lev:(0.04) conv:(2.75)
5. [ Hot Dogs=1, Sweet Relish=1]: 64 ==> [ Hot Dog Buns=1]: 41 <conf:(0.64)> lift:(10.9) lev:(0.03) conv:(2.51)
6. [ Hot Dog Buns=1]: 80 ==> [ Sweet Relish=1]: 49 <conf:(0.61)> lift:(7.19) lev:(0.03) conv:(2.29)
7. [ Sweet Relish=1]: 116 ==> [ Hot Dogs=1]: 64 <conf:(0.55)> lift:(5.96) lev:(0.04) conv:(1.99)
8. [ Domestic Beer=1]: 92 ==> [ Pepperoni Pizza - Frozen=1]: 50 <conf:(0.54)> lift:(7.87) lev:(0.03) conv:(1.99)
9. [ Pepperoni Pizza - Frozen=1]: 94 ==> [ Domestic Beer=1]: 50 <conf:(0.53)> lift:(7.87) lev:(0.03) conv:(1.95)
10. [ 98pct. Fat Free Hamburger=1]: 127 ==> [ Hamburger Buns=1]: 66 <conf:(0.52)> lift:(7.29) lev:(0.04) conv:(1.9)
11. [ Hot Dog Buns=1]: 80 ==> [ Hot Dogs=1, Sweet Relish=1]: 41 <conf:(0.51)> lift:(10.9) lev:(0.03) conv:(1.91)
12. [ Hot Dogs=1]: 126 ==> [ Sweet Relish=1]: 64 <conf:(0.51)> lift:(5.96) lev:(0.04) conv:(1.83)
13. [ Popcorn Salt=1]: 87 ==> [ Potato Chips=1]: 44 <conf:(0.51)> lift:(5.18) lev:(0.03) conv:(1.78)
14. [ Tomatoes=1]: 90 ==> [ Potato Chips=1]: 44 <conf:(0.49)> lift:(5) lev:(0.03) conv:(1.73)
15. [ Toothpaste=1]: 108 ==> [ Potato Chips=1]: 51 <conf:(0.47)> lift:(4.83) lev:(0.03) conv:(1.68)
16. [ Toothpaste=1]: 108 ==> [ Sweet Relish=1]: 51 <conf:(0.47)> lift:(5.54) lev:(0.03) conv:(1.7)
17. [ Tomatoes=1]: 90 ==> [ Toothpaste=1]: 42 <conf:(0.47)> lift:(5.88) lev:(0.03) conv:(1.69)
18. [ Bologna=1]: 88 ==> [ Toothpaste=1]: 41 <conf:(0.47)> lift:(5.87) lev:(0.02) conv:(1.69)
19. [ 98pct. Fat Free Hamburger=1]: 127 ==> [ Potato Chips=1]: 59 <conf:(0.46)> lift:(4.75) lev:(0.03) conv:(1.66)
20. [ Tomatoes=1]: 90 ==> [ Sweet Relish=1]: 41 <conf:(0.46)> lift:(5.34) lev:(0.02) conv:(1.65)
21. [ Cola=1]: 106 ==> [ Potato Chips=1]: 48 <conf:(0.45)> lift:(4.63) lev:(0.03) conv:(1.62)
22. [ Hot Dogs=1]: 126 ==> [ Hot Dog Buns=1]: 57 <conf:(0.45)> lift:(7.7) lev:(0.04) conv:(1.69)
23. [ Onions=1]: 109 ==> [ Potato Chips=1]: 49 <conf:(0.45)> lift:(4.6) lev:(0.03) conv:(1.61)
24. [ Domestic Beer=1]: 92 ==> [ Potato Chips=1]: 41 <conf:(0.45)> lift:(4.56) lev:(0.02) conv:(1.6)
25. [ Domestic Beer=1]: 92 ==> [ Onions=1]: 41 <conf:(0.45)> lift:(5.56) lev:(0.02) conv:(1.63)
26. [ Toothpaste=1]: 108 ==> [ Onions=1]: 48 <conf:(0.44)> lift:(5.55) lev:(0.03) conv:(1.63)
27. [ Potato Chips=1]: 133 ==> [ 98pct. Fat Free Hamburger=1]: 59 <conf:(0.44)> lift:(4.75) lev:(0.03) conv:(1.61)
28. [ Cola=1]: 106 ==> [ Toothpaste=1]: 47 <conf:(0.44)> lift:(5.59) lev:(0.03) conv:(1.63)
29. [ Onions=1]: 109 ==> [ Toothpaste=1]: 48 <conf:(0.44)> lift:(5.55) lev:(0.03) conv:(1.62)
30. [ Sweet Relish=1]: 116 ==> [ Toothpaste=1]: 51 <conf:(0.44)> lift:(5.54) lev:(0.03) conv:(1.62)
31. [ Toothpaste=1]: 108 ==> [ Cola=1]: 47 <conf:(0.44)> lift:(5.59) lev:(0.03) conv:(1.61)
32. [ Toilet Paper=1]: 101 ==> [ Hot Dogs=1]: 43 <conf:(0.43)> lift:(4.6) lev:(0.02) conv:(1.55)
33. [ Sweet Relish=1]: 116 ==> [ Hot Dog Buns=1]: 49 <conf:(0.42)> lift:(7.19) lev:(0.03) conv:(1.61)
34. [ Toilet Paper=1]: 101 ==> [ Potato Chips=1]: 42 <conf:(0.42)> lift:(4.26) lev:(0.02) conv:(1.52)
35. [ Toilet Paper=1]: 101 ==> [ Sweet Relish=1]: 41 <conf:(0.41)> lift:(4.76) lev:(0.02) conv:(1.51)
36. [ Toilet Paper=1]: 101 ==> [ Cola=1]: 41 <conf:(0.41)> lift:(5.21) lev:(0.02) conv:(1.53)
37. [ Cola=1]: 106 ==> [ Onions=1]: 43 <conf:(0.41)> lift:(5.07) lev:(0.03) conv:(1.52)
38. [ Toothpaste=1]: 108 ==> [ Hot Dogs=1]: 43 <conf:(0.4)> lift:(4.3) lev:(0.02) conv:(1.48)
39. [ Sweet Relish=1]: 116 ==> [ Potato Chips=1]: 46 <conf:(0.4)> lift:(4.06) lev:(0.03) conv:(1.47)
40. [ Onions=1]: 109 ==> [ Cola=1]: 43 <conf:(0.39)> lift:(5.07) lev:(0.03) conv:(1.5)
41. [ Hot Dogs=1]: 126 ==> [ Potato Chips=1]: 49 <conf:(0.39)> lift:(3.98) lev:(0.03) conv:(1.46)
42. [ Toothpaste=1]: 108 ==> [ Tomatoes=1]: 42 <conf:(0.39)> lift:(5.88) lev:(0.03) conv:(1.51)
43. [ Cola=1]: 106 ==> [ 98pct. Fat Free Hamburger=1]: 41 <conf:(0.39)> lift:(4.15) lev:(0.02) conv:(1.46)
44. [ Cola=1]: 106 ==> [ Toilet Paper=1]: 41 <conf:(0.39)> lift:(5.21) lev:(0.02) conv:(1.49)
45. [ Potato Chips=1]: 133 ==> [ Toothpaste=1]: 51 <conf:(0.38)> lift:(4.83) lev:(0.03) conv:(1.48)
46. [ Toothpaste=1]: 108 ==> [ Bologna=1]: 41 <conf:(0.38)> lift:(5.87) lev:(0.02) conv:(1.49)
47. [ Onions=1]: 109 ==> [ Hot Dogs=1]: 41 <conf:(0.38)> lift:(4.06) lev:(0.02) conv:(1.43)
48. [ Onions=1]: 109 ==> [ Domestic Beer=1]: 41 <conf:(0.38)> lift:(5.56) lev:(0.02) conv:(1.47)
49. [ Potato Chips=1]: 133 ==> [ Hot Dogs=1]: 49 <conf:(0.37)> lift:(3.98) lev:(0.03) conv:(1.42)
50. [ Potato Chips=1]: 133 ==> [ Onions=1]: 49 <conf:(0.37)> lift:(4.6) lev:(0.03) conv:(1.44)
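Simplified table:
IMPULSE | REMINDER | CONFIDENCE
HOT DOGS | Sweet Relish, Hot Dog Buns | 0,84
HOT DOGS | Hot Dog Buns | 0,71
HOT DOGS | Sweet Relish | 0,55
HOT DOGS | Toilet Paper | 0,43
HOT DOGS | Toothpaste | 0,4
HOT DOGS | Onions | 0,38
HOT DOGS | Potato Chips | 0,37
SWEET RELISH | Hot Dogs, Hot Dog Buns | 0,72
SWEET RELISH | Hot Dog Buns | 0,61
SWEET RELISH | Hot Dogs | 0,51
SWEET RELISH | Toothpaste | 0,47
SWEET RELISH | Tomatoes | 0,46
SWEET RELISH | Toilet Paper | 0,41
98PCT. FAT FREE HAMBURGER | Hamburger Buns | 0,68
98PCT. FAT FREE HAMBURGER | Potato Chips | 0,44
98PCT. FAT FREE HAMBURGER | Cola | 0,39
HOT DOG BUNS | Hot Dogs, Sweet Relish | 0,64
HOT DOG BUNS | Hot Dogs | 0,45
HOT DOG BUNS | Sweet Relish | 0,42
PEPPERONI PIZZA - FROZEN | Domestic Beer | 0,54
DOMESTIC BEER | Pepperoni Pizza - Frozen | 0,53
DOMESTIC BEER | Onions | 0,38
HAMBURGER BUNS | 98pct. Fat Free Hamburger | 0,52
HOT DOGS, SWEET RELISH | Hot Dog Buns | 0,51
POTATO CHIPS | Popcorn Salt | 0,51
POTATO CHIPS | Tomatoes | 0,49
POTATO CHIPS | Toothpaste | 0,47
POTATO CHIPS | 98pct. Fat Free Hamburger | 0,46
POTATO CHIPS | Cola | 0,45
POTATO CHIPS | Onions | 0,45
POTATO CHIPS | Domestic Beer | 0,45
POTATO CHIPS | Toilet Paper | 0,42
POTATO CHIPS | Sweet Relish | 0,4
POTATO CHIPS | Hot Dogs | 0,39
TOOTHPASTE | Tomatoes | 0,47
TOOTHPASTE | Bologna | 0,47
TOOTHPASTE | Cola | 0,44
TOOTHPASTE | Onions | 0,44
TOOTHPASTE | Sweet Relish | 0,44
TOOTHPASTE | Potato Chips | 0,38
ONIONS | Domestic Beer | 0,45
ONIONS | Toothpaste | 0,44
ONIONS | Cola | 0,41
ONIONS | Potato Chips | 0,37
COLA | Toothpaste | 0,44
COLA | Toilet Paper | 0,41
COLA | Onions | 0,39
TOMATOES | Toothpaste | 0,39
TOILET PAPER | Cola | 0,39
BOLOGNA | Toothpaste | 0,38
Table 2 Model 2 - Output Table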
Example of given results:
3. [ Hot Dog Buns=1]: 80 ==> [ Hot Dogs=1]: 57 <conf:(0.71)> lift:(7.7) lev:(0.04) conv:(3.02)
The information gained from this analysis should definitely be taken into account when organizing the store layout, although it may be considered somewhat obvious. For example, we can already assume that a person who buys hot dog buns will also buy hot dogs (this occurs with 71% confidence in our data set).
''Chain reaction'' proposal:
48. [ Onions=1]: 109 ==> [ Domestic Beer=1]: 41 <conf:(0.38)> lift:(5.56) lev:(0.02) conv:(1.47)
24. [ Domestic Beer=1]: 92 ==> [ Potato Chips=1]: 41 <conf:(0.45)> lift:(4.56) lev:(0.02) conv:(1.6)
49. [ Potato Chips=1]: 133 ==> [ Hot Dogs=1]: 49 <conf:(0.37)> lift:(3.98) lev:(0.03) conv:(1.42)
22. [ Hot Dogs=1]: 126 ==> [ Hot Dog Buns=1]: 57 <conf:(0.45)> lift:(7.7) lev:(0.04) conv:(1.69)
Following the same logic as in the previous model, we propose another chain reaction, starting with the purchase of onions, which leads to a 38% chance of adding domestic beer to the customer's shopping cart. Domestic beer leads to a 45% chance of adding potato chips to the cart. Following the potato chips, there is a 37% chance that the customer will add hot dogs to the cart, and finally a 45% chance that hot dog buns will be added to the transaction as well. Since the goal of this research is to increase profits, we want to know more about connections between products that have a higher profit margin. We decided that our next step should be an analysis of products that are not in the food category, so for our third model we excluded food.
5.3. Model 3
The third model shows data about purchases of products that are not in the food category, with a minimum confidence level of 40%. We decreased the minimum confidence requirement in order to get more results, since we are now analyzing products that are bought in smaller quantities: their profit margin is higher and it takes the average customer longer to consume them. For example, hair conditioner and toilet paper are products that, unlike bread and milk, are not usually purchased on an everyday store visit.
Model 3 algorithm settings:
Figure 4 Algorithm settings - Model 3
The output:
=== Associator model (full training set) ===
FPGrowth found 78 rules (displaying top 50)
1. [ Hair Conditioner=1, Donuts=1]: 16 ==> [ Toothpaste=1]: 14 <conf:(0.88)> lift:(11.03) lev:(0.01) conv:(4.91)
2. [ Hair Conditioner=1, Salsa Dip=1]: 17 ==> [ Toothpaste=1]: 14 <conf:(0.82)> lift:(10.38) lev:(0.01) conv:(3.91)
3. [ Sandwich Bags=1, Plastic Forks=1]: 17 ==> [ Toothpaste=1]: 14 <conf:(0.82)> lift:(10.38) lev:(0.01) conv:(3.91)
4. [ Toilet Paper=1, Donuts=1]: 19 ==> [ Toothpaste=1]: 15 <conf:(0.79)> lift:(9.95) lev:(0.01) conv:(3.5)
5. [ Toilet Paper=1, Oven Cleaner=1]: 19 ==> [ Toothpaste=1]: 15 <conf:(0.79)> lift:(9.95) lev:(0.01) conv:(3.5)
6. [ Toilet Paper=1, 60 Watt Lightbulb=1]: 23 ==> [ Toothpaste=1]: 16 <conf:(0.7)> lift:(8.77) lev:(0.01) conv:(2.65)
7. [ Toothpaste=1, Salsa Dip=1]: 28 ==> [ Toilet Paper=1]: 19 <conf:(0.68)> lift:(9.14) lev:(0.01) conv:(2.59)
8. [ Toilet Paper=1, C Cell Batteries=1]: 21 ==> [ Toothpaste=1]: 14 <conf:(0.67)> lift:(8.4) lev:(0.01) conv:(2.42)
9. [ Toilet Paper=1, Deodorant=1]: 21 ==> [ Toothpaste=1]: 14 <conf:(0.67)> lift:(8.4) lev:(0.01) conv:(2.42)
10. [ Toothpaste=1, Donuts=1]: 23 ==> [ Toilet Paper=1]: 15 <conf:(0.65)> lift:(8.79) lev:(0.01) conv:(2.37)
11. [ Toilet Paper=1, Hair Conditioner=1]: 25 ==> [ Toothpaste=1]: 16 <conf:(0.64)> lift:(8.07) lev:(0.01) conv:(2.3)
12. [ Toilet Paper=1, Salsa Dip=1]: 30 ==> [ Toothpaste=1]: 19 <conf:(0.63)> lift:(7.98) lev:(0.01) conv:(2.3)
13. [ Toothpaste=1, Oven Cleaner=1]: 24 ==> [ Toilet Paper=1]: 15 <conf:(0.63)> lift:(8.42) lev:(0.01) conv:(2.22)
14. [ Toothpaste=1, 60 Watt Lightbulb=1]: 26 ==> [ Toilet Paper=1]: 16 <conf:(0.62)> lift:(8.29) lev:(0.01) conv:(2.19)
15. [ Pepper=1]: 41 ==> [ Toothpaste=1]: 25 <conf:(0.61)> lift:(7.68) lev:(0.02) conv:(2.22)
16. [ Toothpaste=1, Donuts=1]: 23 ==> [ Hair Conditioner=1]: 14 <conf:(0.61)> lift:(10.36) lev:(0.01) conv:(2.16)
17. [ Toothpaste=1, Plastic Forks=1]: 24 ==> [ Sandwich Bags=1]: 14 <conf:(0.58)> lift:(10.73) lev:(0.01) conv:(2.06)
18. [ Oven Cleaner=1]: 42 ==> [ Toothpaste=1]: 24 <conf:(0.57)> lift:(7.2) lev:(0.02) conv:(2.04)
19. [ Deodorant=1]: 48 ==> [ Toothpaste=1]: 27 <conf:(0.56)> lift:(7.09) lev:(0.02) conv:(2.01)
20. [ Hair Conditioner=1, Sandwich Bags=1]: 26 ==> [ Toothpaste=1]: 14 <conf:(0.54)> lift:(6.79) lev:(0.01) conv:(1.84)
21. [ Donuts=1]: 44 ==> [ Toothpaste=1]: 23 <conf:(0.52)> lift:(6.59) lev:(0.01) conv:(1.84)
22. [ Toothpaste=1, Deodorant=1]: 27 ==> [ Toilet Paper=1]: 14 <conf:(0.52)> lift:(6.99) lev:(0.01) conv:(1.79)
23. [ Toothpaste=1, Sandwich Bags=1]: 29 ==> [ Toilet Paper=1]: 15 <conf:(0.52)> lift:(6.97) lev:(0.01) conv:(1.79)
24. [ 60 Watt Lightbulb=1]: 51 ==> [ Toothpaste=1]: 26 <conf:(0.51)> lift:(6.42) lev:(0.02) conv:(1.81)
25. [ Toilet Bowl Cleaner=1]: 40 ==> [ Toothpaste=1]: 20 <conf:(0.5)> lift:(6.3) lev:(0.01) conv:(1.75)
26. [ Toilet Paper=1, Sandwich Bags=1]: 30 ==> [ Toothpaste=1]: 15 <conf:(0.5)> lift:(6.3) lev:(0.01) conv:(1.73)
27. [ Toothpaste=1, Salsa Dip=1]: 28 ==> [ Hair Conditioner=1]: 14 <conf:(0.5)> lift:(8.51) lev:(0.01) conv:(1.76)
28. [ Sponge=1]: 37 ==> [ Toothpaste=1]: 18 <conf:(0.49)> lift:(6.13) lev:(0.01) conv:(1.7)
29. [ Glass Cleaner=1]: 35 ==> [ Toothpaste=1]: 17 <conf:(0.49)> lift:(6.12) lev:(0.01) conv:(1.7)
30. [ Fingernail Clippers=1]: 29 ==> [ Toothpaste=1]: 14 <conf:(0.48)> lift:(6.08) lev:(0.01) conv:(1.67)
31. [ Toothpaste=1, C Cell Batteries=1]: 29 ==> [ Toilet Paper=1]: 14 <conf:(0.48)> lift:(6.51) lev:(0.01) conv:(1.68)
32. [ Toothpaste=1, Sandwich Bags=1]: 29 ==> [ Hair Conditioner=1]: 14 <conf:(0.48)> lift:(8.21) lev:(0.01) conv:(1.71)
33. [ Toothpaste=1, Sandwich Bags=1]: 29 ==> [ Plastic Forks=1]: 14 <conf:(0.48)> lift:(12.64) lev:(0.01) conv:(1.74)
34. [ Tissues=1]: 46 ==> [ Toothpaste=1]: 22 <conf:(0.48)> lift:(6.03) lev:(0.01) conv:(1.69)
35. [ Paper Towels=1]: 44 ==> [ Toothpaste=1]: 21 <conf:(0.48)> lift:(6.01) lev:(0.01) conv:(1.69)
36. [ Toothpaste=1, Toilet Paper=1]: 40 ==> [ Salsa Dip=1]: 19 <conf:(0.47)> lift:(9.51) lev:(0.01) conv:(1.73)
37. [ Toothbrush=1]: 38 ==> [ Toothpaste=1]: 18 <conf:(0.47)> lift:(5.97) lev:(0.01) conv:(1.67)
38. [ Wash Towels=1]: 34 ==> [ Toilet Paper=1]: 16 <conf:(0.47)> lift:(6.34) lev:(0.01) conv:(1.66)
39. [ Paper Plates=1]: 43 ==> [ Toothpaste=1]: 20 <conf:(0.47)> lift:(5.86) lev:(0.01) conv:(1.65)
40. [ AA Cell Batteries=1]: 43 ==> [ Toothpaste=1]: 20 <conf:(0.47)> lift:(5.86) lev:(0.01) conv:(1.65)
41. [ AA Cell Batteries=1]: 43 ==> [ Salsa Dip=1]: 20 <conf:(0.47)> lift:(9.31) lev:(0.01) conv:(1.7)
42. [ Plastic Forks=1]: 52 ==> [ Toothpaste=1]: 24 <conf:(0.46)> lift:(5.82) lev:(0.01) conv:(1.65)
43. [ Sponge=1]: 37 ==> [ Toilet Paper=1]: 17 <conf:(0.46)> lift:(6.19) lev:(0.01) conv:(1.63)
44. [ Liquid Laundry Detergent=1]: 46 ==> [ Toothpaste=1]: 21 <conf:(0.46)> lift:(5.75) lev:(0.01) conv:(1.63)
45. [ Tissues=1]: 46 ==> [ Toilet Paper=1]: 21 <conf:(0.46)> lift:(6.15) lev:(0.01) conv:(1.64)
46. [ Trash Bags=1]: 42 ==> [ Toothpaste=1]: 19 <conf:(0.45)> lift:(5.7) lev:(0.01) conv:(1.61)
47. [ Oven Cleaner=1]: 42 ==> [ Toilet Paper=1]: 19 <conf:(0.45)> lift:(6.1) lev:(0.01) conv:(1.62)
48. [ 60 Watt Lightbulb=1]: 51 ==> [ Toilet Paper=1]: 23 <conf:(0.45)> lift:(6.08) lev:(0.01) conv:(1.63)
49. [ Hair Conditioner=1]: 80 ==> [ Toothpaste=1]: 36 <conf:(0.45)> lift:(5.67) lev:(0.02) conv:(1.64)
50. [ Dishwasher Detergent=1]: 40 ==> [ Toilet Paper=1]: 18 <conf:(0.45)> lift:(6.06) lev:(0.01) conv:(1.61)
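For further processing, rule lines like the ones in the output above can be parsed into structured records. The following is a minimal sketch using a regular expression. The function names `parse_rule` and `split_items` are hypothetical helpers introduced here, and the pattern is only assumed to fit the line format shown in this report.

```python
import re

# Pattern for one rule line as printed above. Optional spaces after "lift:"
# are tolerated because the extracted text sometimes wraps there.
RULE = re.compile(
    r"\[(?P<ante>[^\]]*)\]:\s*(?P<n_ante>\d+)\s*==>\s*"
    r"\[(?P<cons>[^\]]*)\]:\s*(?P<n_both>\d+)\s*"
    r"<conf:\((?P<conf>[\d.]+)\)>\s*lift:\s*\((?P<lift>[\d.]+)\)"
)

def split_items(text):
    """Turn ' Toilet Paper=1, Donuts=1' into ['Toilet Paper', 'Donuts']."""
    return [part.strip().rsplit("=", 1)[0] for part in text.split(",")]

def parse_rule(line):
    """Parse one rule line into a dict, or return None if it does not match."""
    match = RULE.search(line)
    if match is None:
        return None
    return {
        "antecedent": split_items(match["ante"]),
        "consequent": split_items(match["cons"]),
        "n_antecedent": int(match["n_ante"]),
        "n_both": int(match["n_both"]),
        "confidence": float(match["conf"]),
        "lift": float(match["lift"]),
    }

line = ("6. [ Toilet Paper=1, 60 Watt Lightbulb=1]: 23 ==> [ Toothpaste=1]: 16 "
        "<conf:(0.7)> lift: (8.77) lev:(0.01) conv:(2.65)")
rule = parse_rule(line)
# The printed confidence is the joint count divided by the antecedent count.
recomputed = rule["n_both"] / rule["n_antecedent"]
print(rule["antecedent"], "->", rule["consequent"], round(recomputed, 2))
```

The recomputed value 16/23 rounds to the printed confidence of 0.7, which is a quick sanity check that the two counts on each line were read correctly.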
These are the products that we want to focus on. As we can see in our results, purchases of house supplies such as deodorant, oven cleaner or trash bags often result in a purchase of toothpaste. Example from the rule list:
„18. [ Oven Cleaner=1]: 42 ==> [ Toothpaste=1]: 24 <conf:(0.57)> lift:(7.2) lev:(0.02) conv:(2.04)
19. [ Deodorant=1]: 48 ==> [ Toothpaste=1]: 27 <conf:(0.56)> lift:(7.09) lev:(0.02) conv:(2.01)
20. [ Hair Conditioner=1, Sandwich Bags=1]: 26 ==> [ Toothpaste=1]: 14 <conf:(0.54)> lift:(6.79) lev:(0.01) conv:(1.84)“
This knowledge is not as obvious as what we learned from Models 1 and 2, and it will be very useful when arranging our store layout. We can already see that house supplies should be placed in the vicinity of cosmetic products so that customers make that connection more easily. For our next and final model, we decided to see whether we can find strong connections between other products that were not listed in the first two models, since those were very common and too similar.
5.4. Model 4
Since Models 1 and 2 revolve around similar products, we decided to explore the dataset further and exclude all of the attributes that were mentioned in them.
Model 4 algorithm settings:
Figure 5 Algorithm settings - Model 4
The output:
=== Run information ===
Scheme: weka.associations.FPGrowth -P 2 -I -1 -N 50 -T 0 -C 0.5 -D 0.05 -U 1.0 -M 0.01
Relation: marketbasket-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-lastweka.filters.unsupervised.attribute.Remove-R5-7,17,105,111,119,124125,130,133,142,154,170,176,197,202,218,239,247weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-lastweka.filters.unsupervised.attribute.Remove-R215weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 1361
Attributes: 234
[list of attributes omitted]
=== Associator model (full training set) ===
FPGrowth found 311 rules (displaying top 50)
1. [ Apples=1, Lemons=1]: 17 ==> [ Hair Conditioner=1]: 16 <conf:(0.94)> lift:(16.01) lev:(0.01) conv:(8)
2. [ Toilet Paper=1, Microwave Popcorn=1]: 15 ==> [ Popcorn Salt=1]: 14 <conf:(0.93)> lift:(14.6) lev:(0.01) conv:(7.02)
3. [ Bananas=1, Trash Bags=1]: 16 ==> [ Apples=1]: 14 <conf:(0.88)> lift:(16.31) lev:(0.01) conv:(5.05)
4. [ Sour Cream=1, Noodle Soup=1]: 18 ==> [ Toilet Paper=1]: 15 <conf:(0.83)> lift:(11.23) lev:(0.01) conv:(4.17)
5. [ Apples=1, AA Cell Batteries=1]: 18 ==> [ Ramen Noodles=1]: 15 <conf:(0.83)> lift:(13.34) lev:(0.01) conv:(4.22)
6. [ Apples=1, Manicotti=1]: 18 ==> [ Hair Conditioner=1]: 15 <conf:(0.83)> lift:(14.18) lev:(0.01) conv:(4.24)
7. [ Salsa Dip=1, Manicotti=1]: 17 ==> [ Toilet Paper=1]: 14 <conf:(0.82)> lift:(11.1) lev:(0.01) conv:(3.93)
8. [ Cantaloupe=1, Creamy Peanut Butter=1]: 17 ==> [ Popcorn Salt=1]: 14 <conf:(0.82)> lift:(12.88) lev:(0.01) conv:(3.98)
9. [ Lollipops=1, AA Cell Batteries=1]: 17 ==> [ Salsa Dip=1]: 14 <conf:(0.82)> lift:(16.48) lev:(0.01) conv:(4.04)
10. [ Orange Juice=1, Microwave Popcorn=1]: 20 ==> [ Popcorn Salt=1]: 16 <conf:(0.8)> lift:(12.51) lev:(0.01) conv:(3.74)
11. [ Hair Conditioner=1, Lemons=1]: 20 ==> [ Apples=1]: 16 <conf:(0.8)> lift:(14.92) lev:(0.01) conv:(3.79)
12. [ Sandwich Bags=1, Sour Cream=1]: 19 ==> [ Toilet Paper=1]: 15 <conf:(0.79)> lift:(10.64) lev:(0.01) conv:(3.52)
13. [ Apples=1, Cantaloupe=1]: 19 ==> [ Toilet Paper=1]: 15 <conf:(0.79)> lift:(10.64) lev:(0.01) conv:(3.52)
14. [ Orange Juice=1, Chicken Soup=1]: 19 ==> [ Popcorn Salt=1]: 15 <conf:(0.79)> lift:(12.35) lev:(0.01) conv:(3.56)
15. [ Sandwich Bags=1, Sour Cream=1]: 19 ==> [ Ramen Noodles=1]: 15 <conf:(0.79)> lift:(12.64) lev:(0.01) conv:(3.56)
16. [ Ramen Noodles=1, AA Cell Batteries=1]: 19 ==> [ Apples=1]: 15 <conf:(0.79)> lift:(14.72) lev:(0.01) conv:(3.6)
17. [ Apples=1, Oven Cleaner=1]: 18 ==> [ Toilet Paper=1]: 14 <conf:(0.78)> lift:(10.48) lev:(0.01) conv:(3.33)
18. [ Salsa Dip=1, Summer Sausage=1]: 18 ==> [ Toilet Paper=1]: 14 <conf:(0.78)> lift:(10.48) lev:(0.01) conv:(3.33)
19. [ Bananas=1, Pretzels=1]: 18 ==> [ Pancake Mix=1]: 14 <conf:(0.78)> lift:(13.57) lev:(0.01) conv:(3.39)
20. [ Apples=1, Trash Bags=1]: 18 ==> [ Bananas=1]: 14 <conf:(0.78)> lift:(13.07) lev:(0.01) conv:(3.39)
21. [ Sandwich Bags=1, Garlic Bread=1]: 18 ==> [ Chicken Soup=1]: 14 <conf:(0.78)> lift:(21.6) lev:(0.01) conv:(3.47)
22. [ Garlic Bread=1, Chicken Soup=1]: 18 ==> [ Sandwich Bags=1]: 14 <conf:(0.78)> lift:(14.3) lev:(0.01) conv:(3.4)
23. [ Sour Cream=1, Salsa Dip=1]: 22 ==> [ Toilet Paper=1]: 17 <conf:(0.77)> lift:(10.41) lev:(0.01) conv:(3.39)
24. [ Popcorn Salt=1, Apple Fruit Roll=1]: 21 ==> [ Toilet Paper=1]: 16 <conf:(0.76)> lift:(10.27) lev:(0.01) conv:(3.24)
25. [ French Fries=1, Cheese Crackers=1]: 21 ==> [ Popcorn Salt=1]: 16 <conf:(0.76)> lift:(11.92) lev:(0.01) conv:(3.28)
26. [ Orange Juice=1, Salsa Dip=1]: 21 ==> [ Popcorn Salt=1]: 16 <conf:(0.76)> lift:(11.92) lev:(0.01) conv:(3.28)
27. [ Sour Cream=1, Summer Sausage=1]: 20 ==> [ Toilet Paper=1]: 15 <conf:(0.75)> lift:(10.11) lev:(0.01) conv:(3.09)
28. [ Popcorn Salt=1, Chicken Soup=1]: 20 ==> [ Orange Juice=1]: 15 <conf:(0.75)> lift:(13.26) lev:(0.01) conv:(3.14)
29. [ Popcorn Salt=1, Apples=1]: 23 ==> [ Toilet Paper=1]: 17 <conf:(0.74)> lift:(9.96) lev:(0.01) conv:(3.04)
30. [ Toilet Paper=1, Frozen Chicken Thighs=1]: 19 ==> [ Popcorn Salt=1]: 14 <conf:(0.74)> lift:(11.53) lev:(0.01) conv:(2.96)
31. [ Popcorn Salt=1, Frozen Chicken Thighs=1]: 19 ==> [ Toilet Paper=1]: 14 <conf:(0.74)> lift:(9.93) lev:(0.01) conv:(2.93)
32. [ Apples=1, Salsa Dip=1]: 19 ==> [ Toilet Paper=1]: 14 <conf:(0.74)> lift:(9.93) lev:(0.01) conv:(2.93)
33. [ Toilet Paper=1, Oven Cleaner=1]: 19 ==> [ Apples=1]: 14 <conf:(0.74)> lift:(13.74) lev:(0.01) conv:(3)
34. [ Sour Cream=1, Cottage Cheese=1]: 19 ==> [ Toilet Paper=1]: 14 <conf:(0.74)> lift:(9.93) lev:(0.01) conv:(2.93)
35. [ Sour Cream=1, Cantaloupe=1]: 19 ==> [ Toilet Paper=1]: 14 <conf:(0.74)> lift:(9.93) lev:(0.01) conv:(2.93)
36. [ Sour Cream=1, Flavored Fruit Bars=1]: 19 ==> [ Toilet Paper=1]: 14 <conf:(0.74)> lift:(9.93) lev:(0.01) conv:(2.93)
37. [ Pancake Mix=1, Pretzels=1]: 19 ==> [ Bananas=1]: 14 <conf:(0.74)> lift:(12.38) lev:(0.01) conv:(2.98)
38. [ C Cell Batteries=1, Cantaloupe=1]: 19 ==> [ French Fries=1]: 14 <conf:(0.74)> lift:(12.54) lev:(0.01) conv:(2.98)
39. [ Orange Juice=1, Pretzels=1]: 19 ==> [ Waffles=1]: 14 <conf:(0.74)> lift:(14.33) lev:(0.01) conv:(3)
40. [ Waffles=1, Pretzels=1]: 19 ==> [ Orange Juice=1]: 14 <conf:(0.74)> lift:(13.02) lev:(0.01) conv:(2.99)
41. [ French Fries=1, Salsa Dip=1]: 22 ==> [ Toilet Paper=1]: 16 <conf:(0.73)> lift:(9.8) lev:(0.01) conv:(2.91)
42. [ Oranges=1, Apples=1]: 22 ==> [ Hair Conditioner=1]: 16 <conf:(0.73)> lift:(12.37) lev:(0.01) conv:(2.96)
43. [ Toilet Paper=1, Chardonnay Wine=1]: 21 ==> [ Ramen Noodles=1]: 15 <conf:(0.71)> lift:(11.44) lev:(0.01) conv:(2.81)
44. [ Toilet Paper=1, Summer Sausage=1]: 21 ==> [ Sour Cream=1]: 15 <conf:(0.71)> lift:(14.09) lev:(0.01) conv:(2.85)
45. [ Toilet Paper=1, Noodle Soup=1]: 21 ==> [ Sour Cream=1]: 15 <conf:(0.71)> lift:(14.09) lev:(0.01) conv:(2.85)
46. [ Bananas=1, Orange Flavored Fruit Bars=1]: 21 ==> [ Pancake Mix=1]: 15 <conf:(0.71)> lift:(12.46) lev:(0.01) conv:(2.83)
47. [ Bananas=1, Waffles=1]: 21 ==> [ Orange Juice=1]: 15 <conf:(0.71)> lift:(12.63) lev:(0.01) conv:(2.83)
48. [ Hair Conditioner=1, Manicotti=1]: 21 ==> [ Apples=1]: 15 <conf:(0.71)> lift:(13.32) lev:(0.01) conv:(2.84)
49. [ Toilet Paper=1, Licorice=1]: 20 ==> [ Popcorn Salt=1]: 14 <conf:(0.7)> lift:(10.95) lev:(0.01) conv:(2.67)
50. [ Ramen Noodles=1, Vegetable Oil=1]: 20 ==> [ Toilet Paper=1]: 14 <conf:(0.7)> lift:(9.43) lev:(0.01) conv:(2.65)
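The Scheme line in the run information above also explains why no rule in the list is supported by fewer than 14 transactions. The sketch below reads -M as the lower bound for minimum support, -C as the minimum metric score, -T 0 as selecting confidence as that metric, and -N as the requested number of rules. This reading follows the usual meaning of the FPGrowth options in Weka and should be checked against the installed version.

```python
from math import ceil

# Values read from the Scheme line and run information of Model 4.
instances = 1361
lower_min_support = 0.01  # -M
min_confidence = 0.5      # -C, with -T 0 selecting confidence as the metric
max_rules = 50            # -N

# A rule needs at least this many supporting transactions.
min_count = ceil(lower_min_support * instances)

# Joint counts and confidences of the first five printed rules.
top_rules = [(16, 0.94), (14, 0.93), (14, 0.88), (15, 0.83), (15, 0.83)]
all_ok = all(n >= min_count and c >= min_confidence for n, c in top_rules)
print(min_count, all_ok)  # prints: 14 True
```

A support bound of 1% of 1361 transactions corresponds to 13.61, so 14 is the smallest joint count a rule can have, which matches the smallest counts in the printed list.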
Simplified table:
IMPULSE            REMINDER                                CONFIDENCE
HAIR CONDITIONER   Apples, Lemons                          0.94
                   Apples, Manicotti                       0.83
                   Oranges, Apples                         0.73
POPCORN SALT       Toilet Paper, Microwave Popcorn         0.93
                   Cantaloupe, Creamy Peanut Butter        0.82
                   Orange Juice, Microwave Popcorn         0.80
                   Orange Juice, Chicken Soup              0.79
                   French Fries, Cheese Crackers           0.76
                   Orange Juice, Salsa Dip                 0.76
                   Toilet Paper, Frozen Chicken Thighs     0.74
                   Toilet Paper, Licorice                  0.70
APPLES             Bananas, Trash Bags                     0.88
                   Hair Conditioner, Lemons                0.80
                   Ramen Noodles, AA Cell Batteries        0.79
                   Toilet Paper, Oven Cleaner              0.74
                   Hair Conditioner, Manicotti             0.71
TOILET PAPER       Sour Cream, Noodle Soup                 0.83
                   Salsa Dip, Manicotti                    0.82
                   Sandwich Bags, Sour Cream               0.79
                   Apples, Cantaloupe                      0.79
                   Apples, Oven Cleaner                    0.78
                   Salsa Dip, Summer Sausage               0.78
                   Sour Cream, Salsa Dip                   0.77
                   Popcorn Salt, Apple Fruit Roll          0.76
                   Sour Cream, Summer Sausage              0.75
                   Popcorn Salt, Apples                    0.74
                   Popcorn Salt, Frozen Chicken Thighs     0.74
                   Apples, Salsa Dip                       0.74
                   Sour Cream, Cottage Cheese              0.74
                   Sour Cream, Cantaloupe                  0.74
                   Sour Cream, Flavored Fruit Bars         0.74
                   French Fries, Salsa Dip                 0.73
                   Ramen Noodles, Vegetable Oil            0.70
RAMEN NOODLES      Apples, AA Cell Batteries               0.83
                   Sandwich Bags, Sour Cream               0.79
                   Toilet Paper, Chardonnay Wine           0.71
SALSA DIP          Lollipops, AA Cell Batteries            0.82
PANCAKE MIX        Bananas, Pretzels                       0.78
                   Bananas, Orange Flavored Fruit Bars     0.71
BANANAS            Apples, Trash Bags                      0.78
                   Pancake Mix, Pretzels                   0.74
CHICKEN SOUP       Sandwich Bags, Garlic Bread             0.78
SANDWICH BAGS      Garlic Bread, Chicken Soup              0.78
ORANGE JUICE       Popcorn Salt, Chicken Soup              0.75
                   Waffles, Pretzels                       0.74
                   Bananas, Waffles                        0.71
FRENCH FRIES       C Cell Batteries, Cantaloupe            0.74
WAFFLES            Orange Juice, Pretzels                  0.74
SOUR CREAM         Toilet Paper, Summer Sausage            0.71
                   Toilet Paper, Noodle Soup               0.71
Table 3 Model 4 - Output Table
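The grouping used in Table 3, where each consequent becomes an "impulse" product and its antecedents become "reminder" products, can be sketched as follows. Only six of the fifty rules are copied in, and `group_by_impulse` is a hypothetical helper name introduced for this illustration.

```python
from collections import defaultdict

# A few rules copied from the Model 4 output:
# (antecedent items, consequent, confidence)
rules = [
    (("Apples", "Lemons"), "Hair Conditioner", 0.94),
    (("Toilet Paper", "Microwave Popcorn"), "Popcorn Salt", 0.93),
    (("Bananas", "Trash Bags"), "Apples", 0.88),
    (("Apples", "Manicotti"), "Hair Conditioner", 0.83),
    (("Cantaloupe", "Creamy Peanut Butter"), "Popcorn Salt", 0.82),
    (("Oranges", "Apples"), "Hair Conditioner", 0.73),
]

def group_by_impulse(rules):
    """Group reminder sets under their impulse product, strongest first."""
    table = defaultdict(list)
    for reminders, impulse, conf in rules:
        table[impulse].append((", ".join(reminders), conf))
    for rows in table.values():
        rows.sort(key=lambda row: row[1], reverse=True)
    return dict(table)

table = group_by_impulse(rules)
for impulse, rows in table.items():
    print(impulse.upper())
    for reminders, conf in rows:
        print(f"  {reminders:<40}{conf:.2f}")
```

Because the rule list is already sorted by confidence, impulse products appear in the order of their strongest rule, which is also the order used in the table.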
This model shows us new connections between products that were not visible in our first two models. Again, most products belong to the food category, but this time bread, milk, meat, eggs etc. were excluded. The results show that the purchase of one type of fruit often results in the purchase of other types of fruit. This is something we could have assumed, but we can also see a new connection that was not so obvious: fruit products often remind customers to buy cosmetic products. Example:
„1. [ Apples=1, Lemons=1]: 17 ==> [ Hair Conditioner=1]: 16 <conf:(0.94)> lift:(16.01) lev:(0.01) conv:(8)“
Customers who bought lemons and apples also bought hair conditioner with a 94% confidence level. This suggests that fruit reminds people to buy cosmetic products, which are often made from extracts of fruit and vegetables.
„10. [ Orange Juice=1, Microwave Popcorn=1]: 20 ==> [ Popcorn Salt=1]: 16 <conf:(0.8)> lift:(12.51) lev:(0.01) conv:(3.74)“
„40. [ Waffles=1, Pretzels=1]: 19 ==> [ Orange Juice=1]: 14 <conf:(0.74)> lift:(13.02) lev:(0.01) conv:(2.99)“
We can also notice that orange juice is often strongly connected to snacks. When a customer decides to buy waffles and pretzels, they will be reminded of orange juice. Orange juice is the „impulse“ product in this case and can also lead to the purchase of some type of fruit. This information is relevant to us because products in the cosmetics category have the highest profit margin. To maximize profits, we want to place „reminder“ products (in this case fruit products) as close as possible to cosmetics.
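The metrics printed for rule 1 can be reproduced from counts that appear in the outputs themselves: 17 baskets with apples and lemons, 16 of which also contain hair conditioner, 1361 instances, and 80 baskets with hair conditioner (the count shown in rule 49 of the Model 3 output, assuming both models use the same transactions). The conviction formula with +1 in the denominator is an assumption made here because it reproduces the printed value; the textbook definition omits it.

```python
def rule_metrics(n_total, n_ante, n_cons, n_both):
    """Return confidence, lift, leverage and conviction for one rule."""
    conf = n_both / n_ante
    p_cons = n_cons / n_total
    lift = conf / p_cons
    leverage = n_both / n_total - (n_ante / n_total) * p_cons
    # Assumption: +1 in the denominator, chosen to match the printed output.
    conviction = (n_ante * (1 - p_cons)) / (n_ante - n_both + 1)
    return conf, lift, leverage, conviction

# Rule 1 of Model 4: [Apples, Lemons]: 17 ==> [Hair Conditioner]: 16
conf, lift, lev, conv = rule_metrics(n_total=1361, n_ante=17, n_cons=80, n_both=16)
print(f"conf={conf:.2f} lift={lift:.2f} lev={lev:.2f} conv={conv:.2f}")
# prints: conf=0.94 lift=16.01 lev=0.01 conv=8.00
```

A lift of about 16 means that hair conditioner appears roughly sixteen times more often in baskets with apples and lemons than in baskets overall. Note that the rule rests on only 17 baskets, so the high confidence should be read with that sample size in mind.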
6. Implementation of results
Initial processing of the 1362 transactions using WEKA has given the results that we will now use to organize the store layout. The goal is to increase the profits of the store and make customers buy more products by highlighting connections between products that were found to be strongly linked in the minds of consumers. For this research to have the greatest impact on the store's revenue, we will focus on the products that have the highest confidence levels in different categories.
Table of linked products with the highest confidence from all 4 models:
IMPULSE PRODUCT    REMINDER PRODUCT/S
HOT DOGS           Hot dog buns, sweet relish
WHITE BREAD        Toothpaste, milk, cola, eggs
MILK               Eggs, white bread, potato chips
POTATO CHIPS       Eggs, milk
POTATO CHIPS       White bread, milk, eggs, tomatoes
HAMBURGER          Cola, potato chips
TOOTHPASTE         Onions
TOILET PAPER       Cola
HAIR CONDITIONER   Apples, lemons
TOILET PAPER       Apples
Table 4 Products with highest confidence
The products with the most confident connections (listed above) can be separated into their categories: ''Meat products'', ''Bakery products'', ''Dairy products'', ''Drinks'', ''Fruit'', ''Cosmetics'' and ''House supplies''. Taking these product connections into account, we can organize our store so that customers can easily act on a connection once they make it and decide to buy another, linked product. The impulse is activated if the ''impulse product'' is in sight while the customer is buying the ''reminder products''. We also need to consider placing the most often bought products in the furthest part of our store, so that customers pass by many other products and get as many reminders as possible to trigger an ''impulse purchase''. With this in mind, a store layout can be created.
Figure 6 Proposed store layout
We will explain why this setup would be of most benefit to the store with the following examples. If a customer comes to buy eggs and white bread, there is a high probability that they will also buy milk. Milk, white bread and eggs will in turn prompt the customer to buy snacks such as potato chips, which they pass on the way to the checkout.
Figure 7 Customer path - Example 1
If the customer wants to buy some fruit or vegetables, such as onions, they will most likely end up buying toothpaste or toilet paper as well, which is why these categories of products must be in close proximity. If the customer decides to buy cola while buying fruit, they might be prompted by the meat products nearby to buy hamburger, which will also lead them to purchase some bakery products (e.g. white bread or hamburger buns). Cola also reminds people to buy toilet paper, which the customer would come across on the way to the checkout.
Figure 8 Customer path - Example 2
7. Conclusion
By using transaction data from a store and data about customers' shopping behavior, this paper has discovered several shopping patterns that can be used to increase the profits of the store. During the analysis of the dataset, no products with absolute relations (confidence = 1.0) were found, so the next best results were taken into consideration. Products with high confidence were put into categories, which were arranged in aisles in a way that maximizes shopping impulses in customers. We conclude that market basket analysis in combination with data mining brings valuable information that can increase both profit and customer satisfaction. If done properly, it affects both long-term and short-term decision making and provides management with facts rather than assumptions.
Works Cited
Frequent Itemsets via the FP-Growth Algorithm. (n.d.). Retrieved from mlxtend: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/
Hariharan, S., Kannan, M., & Raguraman, P. (2013). A Seasonal Approach for Analysis of Temporal Trends. Tamilnadu: International Journal of Computer Applications.
Li, S. (25 Sep 2017). A Gentle Introduction on Market Basket Analysis — Association Rules. Retrieved from TowardsDataScience: https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce
Sagin, A. N., & Ayvaz, B. (2018). Determination of Association Rules with Market Basket Analysis:. Istanbul: Istanbul Commerce University.
Shah, P. (10 Jan 2017). An Introduction to Weka. Retrieved from OpenSourceForU: https://opensourceforu.com/2017/01/an-introduction-to-weka/
Surjandari, I., & Seruni, A. C. (2005). Design of product placement layout in retail shop. Depok: University of Indonesia.