FP-Growth Algorithm for Application in Research of the Basket Analysis

Transactions on Computer Science and Technology June 2013, Volume 2, Issue 2, PP.24-30

Qing Tian 1, Yongmei Liu 2#

1. School of Computer Science, Zhaoqing University, Zhaoqing, Guangdong 526061, China

2. College of Information Engineering, Capital Normal University, Beijing 100048, China

# Email: snowflash@163.com

Abstract: When frequent item sets are mined with a small minimum support, generating candidate sets is a time-consuming and frequently repeated operation in the mining algorithm. The FP-Growth algorithm does not need to generate candidate sets: the database that supplies the frequent item sets is compressed into a frequent pattern tree (FP-Tree), and the frequent item sets are mined from the FP-Tree. To study basket analysis, the frequent-pattern approach is introduced and a program written in Visual C++ is used to mine the frequent item sets. The results for the frequent K-item sets are compared, so that goods likely to be sold together in a supermarket can be placed in the same location.

Keywords: Candidate Sets; Frequent Item Sets; Basket Analysis; FP-Growth; FP-Tree

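Chinese title: Application of the FP-Growth Algorithm in Research on Basket Analysis*

Chinese abstract: In frequent item set mining, when the minimum support is small, candidate set generation is the algorithm's main time-consuming operation. The FP-Growth algorithm generates no candidate sets; it compresses the database that supplies the frequent item sets into a frequent pattern tree (FP-Tree) and mines the frequent item sets from that tree. To carry out basket analysis, following the theory of the FP-Growth algorithm, the VC++ development tool is used to mine frequent item sets from the data. For the frequent K-item sets obtained, the experimental data are compared to guide supermarkets in placing goods that are likely to be sold together side by side. Keywords: candidate sets; frequent item sets; basket analysis; FP-Growth; FP-Tree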
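The abstract describes two steps: compressing the transaction database into an FP-Tree, and mining frequent item sets from that tree without generating candidate sets. A minimal Python sketch of these two steps follows. It is not the authors' Visual C++ program; the names `FPNode`, `build_fp_tree`, `fp_growth`, and `mine_baskets`, as well as the sample baskets, are assumptions made purely for illustration.

```python
from collections import defaultdict


class FPNode:
    """One node of the FP-tree: an item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table {item: [nodes]} and the frequent item supports.
    """
    # First scan: count the support of every single item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    # Second scan: insert each transaction (frequent items only), sorted by
    # descending support so that common prefixes share tree nodes.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} without generating candidate sets."""
    header, frequent = build_fp_tree(weighted_transactions, min_support)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = frequent[item]
        # Conditional pattern base: the prefix path of every node holding `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recurse on the conditional database to extend the current suffix.
        result.update(fp_growth(conditional, min_support, itemset))
    return result


def mine_baskets(baskets, min_support):
    """Convenience wrapper: each basket is a plain list of item names."""
    return fp_growth([(basket, 1) for basket in baskets], min_support)


# Made-up baskets, used only to exercise the sketch.
baskets = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "butter"],
    ["milk", "bread", "eggs"],
]
result = mine_baskets(baskets, min_support=2)
```

With a minimum support count of 2, `result` maps the item set {milk, bread} to a support of 3, while {milk, butter}, which occurs in only one basket, is absent. The sketch uses an absolute support count rather than a percentage; that is a simplification chosen here, not something the paper specifies.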

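Introduction

Over the past few decades the economy has grown rapidly, the level of informatization has risen steadily, and organizations have routinely accumulated large volumes of business data. Holding large volumes of data, however, does not mean holding rich business information. Businesses urgently need to discover valuable information and knowledge in this mass of data. A typical example of frequent item set mining is basket analysis [1]: frequent item set mining is used to discover associations among the different goods that customers place in their "shopping baskets", and so to analyze customers' buying habits. Discovering such associations helps retailers learn which goods are frequently bought together, and thereby helps them develop better marketing strategies. For example, if customers buy milk while shopping at a supermarket, how likely are they to buy bread as well (and which kind of bread)? Such information helps retailers with selective marketing and shelf-space planning, which in turn increases sales. The goal of basket analysis is to determine, from customers' purchase transactions, the likelihood that a class or group of products is bought together (their mutual association). The knowledge gained from basket analysis is valuable.

In basket analysis, frequent item sets are mined from large volumes of sales data. When the minimum support is small, the main time-consuming operations in this mining process are, first, the generation of candidate sets and, second, the repeated scanning of the database. To address these two problems, Han et al. proposed FP-Growth, an association rule mining algorithm based on the FP-Tree, an approach that does not need

* Funding: supported by the Zhaoqing Science and Technology Innovation Project (grant no. 2012G26) and the Zhaoqing University Natural Science Project (grant no. 201212).

http://www.ivypub.org/cst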
