Available algorithms

Here we present the available algorithms in Coron-base:



1. Apriori (-alg:apriori)

This is the basic algorithm for finding FIs. It is efficient for sparse datasets only. Since it is a levelwise algorithm, it produces itemsets in ascending order by length.

./start.sh sample/laszlo.rcf 3 -names -alg:apriori

Sample output:

{a} (4)

This means: {a} is a frequent itemset with support 4.



2. Apriori-Close (-alg:aprioriclose, -alg:ac)



A simple extension of Apriori. It also marks FCIs. It is almost as efficient as Apriori, i.e. the derivation of FCIs does not induce serious additional computation time.

./start.sh sample/laszlo.rcf 3 -names -alg:aprioriclose

{a} (4) +
{b} (4)

In each algorithm, the '+' sign means that the itemset is closed. That is: {a} is a frequent closed itemset with support 4; {b} is frequent (but not closed) with support 4.



3. Apriori-Inverse (-alg:aprioriinverse, -alg:ai)



The algorithm finds all perfectly rare itemsets (PRIs). Perfectly rare itemsets are a subset of all rare itemsets. This set contains rare itemsets such that all their subsets are rare (except the empty set!).

./start.sh test/laszlo.rcf 3 -names -alg:aprioriinverse

{d} (1)

This means: {d} is a perfectly rare itemset with support 1.



4. Apriori-Rare (-alg:apriorirare)



This algorithm is based on Apriori, thus its efficiency is like Apriori's. While enumerating FIs, it filters minimal rare itemsets (mRIs). An mRI is a rare itemset, and all its proper subsets are frequent.

There are two kinds of mRIs: mRIs with support 0 (zero itemsets), and mRIs with support higher than 0 (non-zero itemsets). It must be specified which group to extract:

* -all extract zero itemsets too, or
* -nonzero extract non-zero itemsets only

./start.sh sample/laszlo.rcf 3 -names -alg:apriorirare -nonzero

{d} (1)

This means: {d} is an mRI with support 1.



5. Arima (-alg:arima)



Arima (A Rare Itemset Miner Algorithm) finds rare itemsets (RIs) in the following way: it takes mRIs generated by Apriori-Rare. Then, using non-zero mRIs, it finds their proper supersets. In the result zero-itemsets are avoided because of their large number.

Minimal zero generators (mZGs) are used to reduce the search space (lots of zero itemsets are pruned thanks to this technique).

./start.sh sample/laszlo.rcf 3 -names -alg:arima

{a, d, e} (1)

This means: {a, d, e} is a rare itemset with support 1.



6. Carpathia-G (-alg:carpathiag)



Carpathia-G is a levelwise algorithm that finds frequent generators. The algorithm is similar to A-Close, but A-Close computes also the closures of generators. Carpathia-G concentrates on finding FGs only.

./start.sh sample/laszlo.rcf 3 -names -alg:carpathiag

{a} (4) [key: +]

This means: {a} is a frequent generator with support 4.



7. Carpathia-G-Rare (-alg:carpathiagrare)



Note: in our paper (szathmary07b), this algorithm is called MRG-Exp.

This algorithm is based on Carpathia-G. While enumerating FGs, it filters minimal rare generators (mRGs). In (szathmary07b) we showed that the sets of mRIs and mRGs are equivalent. An mRG (or mRI) is a rare itemset, and all its proper subsets are frequent. Moreover, an mRG (or mRI) is a rare generator, and all its proper subsets are frequent generators!

There are two kinds of mRGs: mRGs with support 0 (zero itemsets), and mRGs with support higher than 0 (non-zero itemsets). It must be specified which group to extract:

* -all extract zero itemsets too, or
* -nonzero extract non-zero itemsets only

./start.sh sample/laszlo.rcf 3 -names -alg:carpathiagrare -nonzero

{d} (1)

This means: {d} is an mRG with support 1.



8. Charm (-alg:charm)



A very efficient algorithm for finding FCIs. Not a breadth-first, but a depth-first algorithm. It needs to keep all FCIs in the main memory in order to decide if a newly found itemset is closed or not. When the algorithm finishes the enumeration of itemsets, it has all FCIs in the main memory in a hash structure. Normally the itemsets are not ordered by length, but if needed, the hash structure can be easily traversed in such a way that the result be ordered.

./start.sh sample/laszlo.rcf 3 -names -alg:charm

{a, b, e} (3) +

This means: {a,b,e} is an FCI with support 3.



9. Close (-alg:close)



A levelwise algorithm that finds FCIs. It is more efficient on dense datasets. Its authors do not mention that during the levelwise search the algorithm can find the same FCIs several times. Thus, if the result is written immediately to a file, then it will contain duplicates. To avoid it, FCIs must be kept in main memory in order to prune duplicates. It results in a large memory consumption.

./start.sh sample/laszlo.rcf 3 -names -alg:close

{b, c, e} (3) +

This means: {b,c,e} is an FCI with support 3.



10. Eclat (-alg:eclat)



Actually, Charm is a modified version of Eclat. Eclat is not a breadth-first, but a depth-first algorithm. It finds all FIs in a very efficient way. In the output, frequent itemsets are not ordered by their length.

./start.sh sample/laszlo.rcf 3 -names -alg:eclat

{a, b, e} (3)

This means: {a,b,e} is an FI with support 3.



11. Eclat-Z (-alg:eclatz)



Eclat-Z (Eclat-Zart) is a combination of Eclat with the idea introduced in Zart. Eclat finds efficiently all FIs. These itemsets are stored in a file. When done, the itemsets in the file are processed in ascending order by length (remember, Eclat produces FIs in an unordered way). This can be done efficiently with a special file indexing technique. During this levelwise post-process, FGs and FCIs are marked, and generators are associated to their closures.

./start.sh sample/laszlo.rcf 3 -names -alg:eclatz

{b, e} (4) +; [{b}, {e}]

This means: {b,e} is an FCI with support 4. It has two generators, {b} and {e} (with the same support).



12. Pascal (-alg:pascal)



An efficient levelwise algorithm for finding all FIs. It also marks which itemsets are key generators. Among levelwise FI-miner algorithms, it may be the most efficient.

./start.sh sample/laszlo.rcf 3 -names -alg:pascal

{a} (4) [key: +]
{b, e} (4) [key: -]

This means: {a} is an FI with support 4. It is an FG too. The itemset {b,e} is an FI with support 4, but it is not a key generator.



13. Pascal+ (-alg:pascalplus)



Pascal+ is a simple extension of Pascal. In addition to Pascal, it also tells which itemsets are closed.

./start.sh sample/laszlo.rcf 3 -names -alg:pascalplus

{a} (4) [key: +] +
{b, c} (3) [key: +]
{b, e} (4) [key: -] +

This means: {a} is an FI, an FG and an FCI. The itemset {b,c} is an FI, an FG, but not an FCI. The itemset {b,e} is an FI, not an FG, but an FCI.



14. Pseudo-Closed (-alg:pseudoclosed, -alg:pc)



Pseudo-Closed is a levelwise algorithm that finds pseudo-closed itemsets and their closures. This result is necessary for generating the Duquennes-Guigues basis for instance.

./start.sh sample/laszlo.rcf 3 -names -alg:pseudoclosed

{b} (4) [pseudo-closed: +; closure: {b, e}]

This means: {b} is a pseudo-closed itemset with support 4. Its closure is {b, e} (whose support is also 4).



15. Zart (-alg:zart)



This algorithm is based on Pascal. In addition to Pascal, it marks FCIs, and associates generators to their closures. It is almost as efficient as Pascal. The output of the algorithm is ideal for generating minimal non-redundant association rules.

./start.sh sample/laszlo.rcf 3 -names -alg:zart

{b, e} (4) +; [{b}, {e}]

This means: {b,e} is an FCI with support 4. It has two generators, {b} and {e} (with the same support).