Efficient Algorithms for Mining Closed Frequent Itemset and Generating Rare Association Rules from Uncertain Databases

Authors

May 14, 2013

Dealing with uncertainty has gained increasing attention in the past few years in both static and streaming data management and mining. Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems and data integration. Specifically some privacy– preserving data stream mining applications like banking sectors operate on uncertain data and they need to mine frequent itemsets. There are many existing works on mining frequent itemsets from uncertain databases. They use the expected support of an itemset from an uncertain database to define whether it is frequent or not. Itemsets are considered frequent if its expected support exceeds the minimum support threshold. There are two main problems with this approach of mining frequent itemsets from uncertain databases. The first problem is that often too many frequent itemsets are generated many of which may prove to be irrelevant. The second is that only expected support is calculated to define frequent itemsets as being those whose expected support exceeds the minimum support. To overcome both these problems closed frequent itemset mining can be used. An itemset X is said to be closed, if and only if none of its superset has the same support. The mining process can be done by using Poisson Binomial Distribution. Closed frequent itemsets are extracted by an approximate mining algorithm which can effectively and accurately discover closed frequent itemsets in large uncertain databases. It can be extended to generate the rare association rules within the given support and confidence framework. In a practical scenario, there can be some itemsets which have significance and can provide useful knowledge but their support counts are relatively less. To extract the relationship among those itemsets having less frequency of occurrences, rare association rule mining came into existence.