New data mining models based on formal concept analysis and probability logic

Liying Jiang, University of Nebraska - Lincoln

Abstract

This dissertation enhances data mining processes by formalizing them in a logic framework, focusing on improving the efficiency of association rule mining and on extending the use of association rules to make predictions based on the proposed framework. Although data mining has been studied extensively, most studies concentrate on specific application domains, and a logic framework that formally represents the important notions and processes in data mining has attracted little attention. SPICE (Symbolic integration of Probability Inference and Concept Extraction) is therefore proposed, in which the logic representations of concepts, patterns, and previously unknown and potentially interesting patterns are formalized. Two primary data mining tasks, association rule mining and classification, are formally represented as pattern discovery processes in SPICE. Based on the SPICE framework, a new special type of pattern, the Maximal Potentially Useful (MaxPUF) pattern, is formalized. MaxPUF patterns lead to a new class of association rules, called MaxPUF rules. These rules have the minimum antecedents among all the high-confidence rules for the same consequent, and each minimum antecedent contains the most important factors that imply the consequent with high confidence. MaxPUF rules are therefore interesting and potentially useful to the user. Mining MaxPUF rules provides a solution to the rule redundancy problem in association rule mining, which occurs when a large number of rules are generated and many of them are uninteresting or unimportant. A new mining approach called Succinct Worthy Association Rule Mining (SWARM) is proposed to improve mining efficiency. Unlike previous mining approaches, which prune only infrequent itemsets, SWARM adopts a new pruning strategy that deletes less important items during the mining process. Because far fewer candidate itemsets are generated once these items have been deleted, SWARM is more efficient than previous approaches. In SWARM, the MaxPUF rules are used to help identify the less important items. In addition, the possible use of association rules for prediction is studied and a new prediction rule model is proposed. The experimental results show that the discovered prediction rules can be used for prediction with good results. Overall, this dissertation introduces a logic framework for data mining and develops methodologies based on the proposed framework to enhance data mining.
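The idea of keeping only the rules with minimum antecedents for a given consequent can be illustrated with a small brute-force sketch. This is not the dissertation's MaxPUF definition or the SWARM algorithm, whose details the abstract does not give; it only assumes the standard notion of rule confidence and treats a rule as redundant when a proper subset of its antecedent already reaches the confidence threshold. The function names, the toy transactions, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent in t) / len(covered)


def minimal_antecedent_rules(transactions, consequent, min_conf):
    """Brute-force illustration (assumption, not the SWARM algorithm):
    return the high-confidence antecedents that have no high-confidence
    proper subset, for a single consequent item."""
    items = sorted(set().union(*transactions) - {consequent})
    minimal = []
    # Visit candidates from small to large so that any smaller
    # high-confidence antecedent is found before its supersets.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            # A superset of an accepted antecedent is a redundant rule.
            if any(m <= antecedent for m in minimal):
                continue
            if confidence(transactions, antecedent, consequent) >= min_conf:
                minimal.append(antecedent)
    return minimal


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "jam"}),
    frozenset({"butter", "jam"}),
    frozenset({"bread", "butter", "jam", "milk"}),
]

rules = minimal_antecedent_rules(transactions, "milk", min_conf=0.9)
for antecedent in rules:
    print(sorted(antecedent), "->", "milk")
```

On this toy data, neither bread nor butter alone reaches the threshold, while the pair does, so the rule with antecedent {bread, butter} is kept and the larger rule with {bread, butter, jam} is dropped as redundant even though its confidence is equally high. The exhaustive search is exponential in the number of items and is meant only to make the redundancy idea concrete.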

Subject Area

Computer science

Recommended Citation

Jiang, Liying, "New data mining models based on formal concept analysis and probability logic" (2006). ETD collection for University of Nebraska-Lincoln. AAI3216105.
https://digitalcommons.unl.edu/dissertations/AAI3216105
