Association Rule eXtractor

AssRuleX (Association Rule eXtractor) is the second most important module of the Coron platform. This module is responsible for the extraction of different sets of association rules. With AssRuleX one can extract the following association rules:

1. all valid association rules               -rule:all
2. closed association rules                  -rule:closed
3. all informative association rules         -rule:all_inf
4. reduced informative association rules     -rule:inf
5. Generic Basis (GB)                        -rule:GB
6. (all) Informative Basis (IB)              -rule:all_IB
7. reduced Informative Basis (IB)            -rule:IB
8. rare informative association rules        -rule:rare

Note that by "all informative association rules" we mean the minimal non-redundant association rules (MNR); by "reduced informative association rules" we mean the transitive reduction of MNR (i.e. RMNR); and by "rare informative association rules" we mean the exact MRG rules. (The reason for this "confusion" is that Bastide et al. called their rules "informative rules". Kryszkiewicz, however, uses the term "informative" in a different sense and calls the same set of rules "minimal non-redundant rules". In (szathmary06t) we used the latter terminology, but since we started developing AssRuleX with the old terminology, we decided not to change it, for the sake of backward compatibility.)


I. Command-line Interface

Usage: ./assrulex.sh [switches] <database> <min_supp> <min_conf> -alg:<alg> -rule:<rule>

There are five compulsory parameters:

1. database file (in .basenum, .bool, or .rcf format)
2. minimum support
3. minimum confidence
4. name of the algorithm to be used
5. rule set that we want to extract with the previously specified algorithm

The minimum support can be given in either absolute or relative value, e.g. 2 or 40%.

The minimum confidence can be given as a real value (between 0 and 1.0, e.g. 0.5), or as a percentage (between 0% and 100%, e.g. 50%).
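
The two threshold notations can be illustrated with a short Python sketch. It is not taken from AssRuleX: the function names are hypothetical, and rounding a relative support up to the next whole object is an assumption.

```python
import math


def parse_min_supp(text, n_objects):
    """Return the minimum support as an absolute number of objects.

    Accepts an absolute value ("2") or a relative one ("40%").
    """
    text = text.strip()
    if text.endswith("%"):
        ratio = float(text[:-1]) / 100.0
        # Assumption: a relative threshold is rounded up to a whole object.
        # The small epsilon guards against floating-point noise.
        return math.ceil(ratio * n_objects - 1e-9)
    return int(text)


def parse_min_conf(text):
    """Return the minimum confidence as a real value between 0 and 1.0.

    Accepts a real value ("0.5") or a percentage ("50%").
    """
    text = text.strip()
    value = float(text[:-1]) / 100.0 if text.endswith("%") else float(text)
    if not 0.0 <= value <= 1.0:
        raise ValueError("minimum confidence out of range: " + text)
    return value
```

For a dataset of 5 objects, parse_min_supp("40%", 5) and parse_min_supp("2", 5) both give 2, and parse_min_conf("50%") and parse_min_conf("0.5") both give 0.5.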

There are two kinds of switches:

1. -option (example: -names)
2. -key:value (example: -alg:apriori)

Other options:

--help             help information
--version, -V      version information
--update           check for a new version

Verbosity options:

-v:m     memory usage
-v:f     function information (which function is called)
-v:t     time information (runtime)

These options can be combined with -vc:

-vc:mf equivalent to -v:m -v:f
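
The equivalence can be spelled out with a small Python sketch. The helper expand_vc is hypothetical and only illustrates the notation; it is not part of AssRuleX.

```python
def expand_vc(switch):
    """Expand a combined verbosity switch into individual -v: switches."""
    prefix = "-vc:"
    if not switch.startswith(prefix):
        # Not a combined switch: leave it unchanged.
        return [switch]
    # One -v: switch per letter after the prefix.
    return ["-v:" + letter for letter in switch[len(prefix):]]
```

Here expand_vc("-vc:mf") gives ["-v:m", "-v:f"].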

The following algorithm/association rules combinations can be used:

Apriori:
1) all association rules                     -rule:all
Close:
1) closed association rules                  -rule:closed
Pascal:
1) all association rules                     -rule:all
Pascal+:
1) all association rules                     -rule:all
2) closed association rules                  -rule:closed
Charm:
1) closed association rules                  -rule:closed
Zart:
1) all association rules                     -rule:all
2) closed association rules                  -rule:closed
3) all informative association rules         -rule:all_inf
4) reduced informative association rules     -rule:inf
5) Generic Basis (GB)                        -rule:GB
6) (all) Informative Basis (IB)              -rule:all_IB
7) reduced Informative Basis (IB)            -rule:IB
Eclat-Z:
1) all association rules                     -rule:all
2) closed association rules                  -rule:closed
3) all informative association rules         -rule:all_inf
4) reduced informative association rules     -rule:inf
5) Generic Basis (GB)                        -rule:GB
6) (all) Informative Basis (IB)              -rule:all_IB
7) reduced Informative Basis (IB)            -rule:IB
BtB:
1) rare association rules                    -rule:rare
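
The combinations above can also be written as a lookup table, for instance to check a choice before launching a run. The following Python sketch is only an illustration: the dictionary and helper names are hypothetical, and the algorithms are keyed by the names used in the list above, not by their -alg: values.

```python
# Rule sets that the seven-rule algorithms (Zart, Eclat-Z) can extract.
FULL_SET = {"all", "closed", "all_inf", "inf", "GB", "all_IB", "IB"}

# Algorithm name -> values accepted after -rule:, copied from the list above.
SUPPORTED_RULES = {
    "Apriori": {"all"},
    "Close": {"closed"},
    "Pascal": {"all"},
    "Pascal+": {"all", "closed"},
    "Charm": {"closed"},
    "Zart": set(FULL_SET),
    "Eclat-Z": set(FULL_SET),
    "BtB": {"rare"},
}


def is_supported(algorithm, rule):
    """True if the algorithm/rule pair appears in the list above."""
    return rule in SUPPORTED_RULES.get(algorithm, set())


def algorithms_for(rule):
    """Names of all algorithms that can extract the given rule set."""
    return [name for name, rules in SUPPORTED_RULES.items() if rule in rules]
```

For example, algorithms_for("inf") gives ["Zart", "Eclat-Z"], and is_supported("Apriori", "closed") is False.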

Example:

./start.sh sample/laszlo.rcf 4 50% -names -alg:zart -rule:inf

Result:

Database file name:                  sample/laszlo.rcf
Database file size:                  208 bytes
Number of lines:                     5
Largest attribute:                   5
Number of attributes:                5
Number of attributes in average:     3.4
min_supp:                            4, i.e 80%
min_conf:                            50%
Chosen algorithm:                    Zart
Rules to extract:                    reduced informative association rules

{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +

{e} => {b} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +

# Number of found rules: 2

# Number of FF rules: 2


At the beginning and at the end there are some statistics about the dataset and the number of found rules.

To analyze only the input dataset, without calculating the itemsets, use the -stat option:

./start.sh sample/laszlo.rcf 4 50% -names -alg:zart -rule:inf -stat

In this case the program terminates after showing the database statistics.

The -names option is highly recommended. It works only for .rcf files. With this option, attribute numbers are replaced with their names.

Let us see what a rule looks like:

{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +


This means: the antecedent is {b}, the consequent is {e}. The support of the rule is 4, which is equivalent to 80% in this dataset (see the sample dataset). Confidence: 100%. Support of the left part of the rule: 4; support of the right part of the rule: 4. The rule is in the FF class, i.e. both sides of the rule are frequent (frequent itemset implies frequent itemset). The rule is closed.

Some other quality measures are also available for the rules. They can be displayed with the -full or -measures switch.

{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%];
lift=1.250; conv=NOT_DEF; dep=0.200; nov=0.160; sat=1.000; class=FF) +


This means:

  1. left part of the rule ({b})
  2. right part of the rule ({e})
  3. support of the rule (4, i.e. 80%)
  4. confidence of the rule (1.0, i.e. 100%)
  5. support of the left part of the rule (4, i.e. 80%)
  6. support of the right part of the rule (4, i.e. 80%)
  7. lift (1.250)
  8. conviction (not defined in the case of exact association rules)
  9. dependency (0.200)
  10. novelty (0.160)
  11. satisfaction (1.000)
  12. classification of the rule (type FF, i.e. frequent itemset implies frequent itemset)
  13. is it a closed rule? (in the example the rule is closed)

Notes: in some cases a statistical measure cannot be calculated for a rule. In this case "NOT_DEF" is displayed. The '+' at the end means that the rule is closed, i.e. the union of the antecedent and consequent forms a closed itemset.
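
To make the fields above more concrete, here is a Python sketch that parses a single-line rule and recomputes the measures from the three supports. It is not AssRuleX code. The regular expression and function names are hypothetical, and the formulas are the usual textbook definitions of lift, conviction, dependency, novelty and satisfaction. They reproduce the values shown for {b} => {e}, but that AssRuleX uses exactly these formulas is an assumption.

```python
import re

# Hypothetical pattern for a rule printed on a single line.
RULE_RE = re.compile(
    r"\{(?P<left>[^}]*)\}\s*=>\s*\{(?P<right>[^}]*)\}\s*"
    r"\(supp=(?P<supp>\d+).*?conf=(?P<conf>[\d.]+).*?"
    r"suppL=(?P<supp_l>\d+).*?suppR=(?P<supp_r>\d+).*?"
    r"class=(?P<cls>\w+)\)\s*(?P<closed>\+?)"
)


def parse_rule(line):
    """Split a rule line into its fields."""
    m = RULE_RE.search(line)
    if m is None:
        raise ValueError("not a rule line: " + line)
    return {
        "left": [s.strip() for s in m["left"].split(",")],
        "right": [s.strip() for s in m["right"].split(",")],
        "supp": int(m["supp"]),
        "conf": float(m["conf"]),
        "supp_l": int(m["supp_l"]),
        "supp_r": int(m["supp_r"]),
        "class": m["cls"],
        "closed": m["closed"] == "+",  # trailing '+' marks a closed rule
    }


def measures(supp, supp_l, supp_r, n_objects):
    """Recompute the measures; None plays the role of NOT_DEF."""
    p_lr = supp / n_objects
    p_l = supp_l / n_objects
    p_r = supp_r / n_objects
    conf = supp / supp_l
    exact = supp == supp_l            # confidence is 100%
    trivial = supp_r == n_objects     # right side holds for every object
    return {
        "conf": conf,
        "lift": conf / p_r,
        "conv": None if exact else (1 - p_r) / (1 - conf),
        "dep": abs(conf - p_r),
        "nov": p_lr - p_l * p_r,
        "sat": None if trivial else ((1 - p_r) - (1 - conf)) / (1 - p_r),
    }
```

With supp=4, suppL=4, suppR=4 and 5 objects, the results rounded to three decimals are lift 1.25, dependency 0.2, novelty 0.16 and satisfaction 1.0, and conviction comes back as None because the rule is exact.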

With the -examples switch one can display the positive and negative examples of each rule. Positive examples are the objects that contain both the left and the right side of the rule. Negative examples are the objects that contain the left side but not the right side.

Example:

./start.sh sample/laszlo.rcf 2 50% -names -alg:zart -rule:inf -examples

Sample output:

{a} => {b, e} (supp=3 [60.00%]; conf=0.750 [75.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +

Positive examples (objects that contain left AND right sides of the rule): [o1, o3, o5]

Negative examples (objects that contain left, BUT NOT the right side of the rule): [o2]
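
The definition of positive and negative examples can be sketched in a few lines of Python. The dataset below is a hypothetical toy context chosen to agree with the statistics and rules shown in this section (5 objects, 5 attributes, 3.4 attributes per object on average). The actual contents of sample/laszlo.rcf are not reproduced here, and the function name is an assumption.

```python
# Hypothetical toy context: object name -> set of attributes.
DATASET = {
    "o1": {"a", "b", "d", "e"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c", "e"},
    "o4": {"b", "c", "e"},
    "o5": {"a", "b", "c", "e"},
}


def split_examples(dataset, left, right):
    """Return (positive, negative) object lists for the rule left => right."""
    left, right = set(left), set(right)
    # Only objects containing the whole left side are examples at all.
    covered = {o: items for o, items in dataset.items() if left <= items}
    positive = sorted(o for o, items in covered.items() if right <= items)
    negative = sorted(o for o, items in covered.items() if not right <= items)
    return positive, negative
```

For the rule {a} => {b, e}, split_examples(DATASET, {"a"}, {"b", "e"}) returns (["o1", "o3", "o5"], ["o2"]). The confidence follows as the number of positive examples divided by the number of all examples, here 3/4 = 0.75.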



II. Graphical User Interface

AssRuleX also has a graphical frontend. The graphical interface is very similar to that of Coron-base, so the following figures show only the screens that differ.

At step 1 the user chooses the input file. At step 2 we need to choose an output file because the result is saved in a file in all cases. It is possible to use a temporary file. After defining the minimum support and minimum confidence (step 3, Figure 1), we must choose the mining algorithm and the type of rules to be extracted (step 4, Figure 2). The software summarizes the user's choice at step 5. We can go back at each step to modify our choice. After pressing the "Start calculation!" button, the result is saved in a file, which can be visualized at the end (step 6, Figure 3).

The graphical interface uses a configuration file called .assrulex_gui.rc, which is placed in the HOME/.coron directory. When the GUI is launched for the very first time, this file is created automatically with the default values. This file can be edited by the user to customize the software.