The Coron Toolkit

Coron-base is the most important module of the Coron platform. It extracts various kinds of itemsets and provides the input for the other modules of the platform.
With Coron-base one can extract the following itemsets:

Coron-base has both a command-line interface and a graphical user interface.



I. Command-line Interface

Usage: ./coron.sh [switches] <database> <min_supp> [-alg:<alg>]

There are two compulsory parameters:

1. the database file (in .basenum, .bool or .rcf format), and
2. the minimum support (in absolute or relative value).

Throughout this guide we work with the dataset shown in Table 1. In the examples we assume that this dataset is stored in .rcf format in a file called laszlo.rcf. Table 2 shows the same dataset in each of the supported file formats. Line i of a .basenum file lists the items (attribute numbers) contained in object i. A .bool file represents the binary database as a matrix of 0s and 1s, with one row per object and one column per attribute. An .rcf file is very similar to a .bool file, but it has the advantage that names can be assigned to objects and attributes.

Table 1: A sample dataset for the examples.
    a(1)  b(2)  c(3)  d(4)  e(5)
o1   X     X           X     X
o2   X           X
o3   X     X     X           X
o4         X     X           X
o5   X     X     X           X

Table 2: Our example dataset (Table 1) in different file formats.
(1) .basenum

1 2 4 5
1 3
1 2 3 5
2 3 5
1 2 3 5

(2) .bool

1 1 0 1 1
1 0 1 0 0
1 1 1 0 1
0 1 1 0 1
1 1 1 0 1

(3) .rcf

[Relational Context]
Default Name
[Binary Relation]
Name_of_dataset
o1 | o2 | o3 | o4 | o5
a | b | c | d | e
1 1 0 1 1
1 0 1 0 0
1 1 1 0 1
0 1 1 0 1
1 1 1 0 1
[END Relational Context]
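The relationship between the three formats in Table 2 can be illustrated with a small conversion script. This is a minimal sketch, not Coron's own converter: the helper names basenum_to_bool and bool_to_rcf are hypothetical, and the .rcf layout (header lines, "|"-separated name lines, closing tag) is inferred only from the example in Table 2.

```python
# Sketch: convert .basenum rows into the .bool matrix and the .rcf layout
# of Table 2. Layouts are inferred from Table 2; helper names are hypothetical.


def basenum_to_bool(rows, num_attrs):
    """Turn lists of 1-based attribute numbers into rows of 0/1 values."""
    matrix = []
    for items in rows:
        present = set(items)
        matrix.append([1 if a in present else 0 for a in range(1, num_attrs + 1)])
    return matrix


def bool_to_rcf(matrix, object_names, attr_names, dataset_name="Name_of_dataset"):
    """Render a 0/1 matrix with object and attribute names as .rcf text."""
    lines = [
        "[Relational Context]",
        "Default Name",
        "[Binary Relation]",
        dataset_name,
        " | ".join(object_names),
        " | ".join(attr_names),
    ]
    # One line of space-separated 0/1 values per object.
    lines += [" ".join(str(v) for v in row) for row in matrix]
    lines.append("[END Relational Context]")
    return "\n".join(lines)


basenum = [[1, 2, 4, 5], [1, 3], [1, 2, 3, 5], [2, 3, 5], [1, 2, 3, 5]]
matrix = basenum_to_bool(basenum, num_attrs=5)
print(bool_to_rcf(matrix, [f"o{i}" for i in range(1, 6)], list("abcde")))
```

The printed text matches the .rcf column of Table 2; the intermediate matrix matches the .bool column.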



The minimum support can be given either as an absolute or as a relative value, e.g. 2 or 40%. For the example dataset, which has 5 objects, these two values are equivalent.
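The conversion from a relative to an absolute minimum support is simple arithmetic: 40% of 5 objects is 2, which matches the "2, i.e. 40%" line in the example output. The sketch below assumes that a relative value which does not give a whole number is rounded up; Coron's actual rounding rule is not stated in this guide, and absolute_min_supp is a hypothetical helper.

```python
# Sketch: turn a min_supp argument ("2" or "40%") into an absolute count.
# Assumption: fractional results are rounded up (Coron's rule is not documented here).
import math
from fractions import Fraction


def absolute_min_supp(value, num_objects):
    if value.endswith("%"):
        # Fraction keeps the arithmetic exact, avoiding floating-point surprises.
        return math.ceil(Fraction(value[:-1]) * num_objects / 100)
    return int(value)


print(absolute_min_supp("40%", 5))
print(absolute_min_supp("2", 5))
```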

There are two kinds of switches:

1. -option (example: -names)
2. -key:value (example: -alg:apriori)
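To make the two switch forms concrete, here is a minimal sketch of how such a command line could be split into positional parameters, plain options and key:value switches. The function parse_args is hypothetical and is not Coron's own parser; it only mirrors the conventions described above.

```python
# Sketch: split a command line into positional parameters, plain options
# (-names) and key:value switches (-alg:apriori). Hypothetical helper.


def parse_args(argv):
    """Return (positional, options, pairs) for a list of arguments."""
    positional, options, pairs = [], [], []
    for arg in argv:
        if not arg.startswith("-"):
            positional.append(arg)           # database file, min_supp
        elif ":" in arg:
            key, value = arg.lstrip("-").split(":", 1)
            pairs.append((key, value))       # e.g. ("alg", "apriori")
        else:
            options.append(arg.lstrip("-"))  # e.g. "names", "stat"
    return positional, options, pairs


print(parse_args(["sample/laszlo.rcf", "2", "-names", "-alg:apriori"]))
```

Key:value switches are collected as a list of pairs rather than a dictionary because a key may legitimately repeat, as in -v:m -v:f.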

The algorithm to be used is specified with the -alg:<alg> switch. The available algorithms are described in a separate document (see the end of this section).

Example:

./start.sh sample/laszlo.rcf 2 -names -alg:apriori

Result:

# Database file name: sample/laszlo.rcf
# Database file size: 208 bytes
# Number of lines: 5
# Largest attribute: 5
# Number of attributes: 5
# Number of attributes in average: 3.4
# min_supp: 2, i.e. 40%
# Chosen algorithm: Apriori

{a} (4)
{b} (4)
...
# FIs: 15

The lines at the beginning give statistics about the dataset, and the last line gives the number of frequent itemsets (FIs) found.
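To show what the example run computes, the following is a minimal level-wise (Apriori-style) miner applied to the dataset of Table 1 with a minimum support of 2. It is an illustrative sketch, not Coron's implementation. The printing format for itemsets with more than one item ("{a, b}") is an assumption, since the guide only shows single-item lines.

```python
# Sketch of a level-wise (Apriori-style) frequent itemset miner on Table 1.
# Not Coron's implementation; the multi-item output format is assumed.
from itertools import combinations

DATASET = [set("abde"), set("ac"), set("abce"), set("bce"), set("abce")]


def support(itemset, dataset):
    """Number of objects (rows) that contain every item of the itemset."""
    return sum(1 for row in dataset if itemset <= row)


def apriori(dataset, min_supp):
    items = sorted(set().union(*dataset))
    level = [frozenset([i]) for i in items if support({i}, dataset) >= min_supp]
    result, k = {}, 1
    while level:
        for s in level:
            result[s] = support(s, dataset)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in result for sub in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, dataset) >= min_supp]
    return result


fis = apriori(DATASET, 2)
for s in sorted(fis, key=lambda s: (len(s), sorted(s))):
    print("{" + ", ".join(sorted(s)) + "}", f"({fis[s]})")
print("# FIs:", len(fis))
```

The first two lines printed are {a} (4) and {b} (4), and the count is 15, in agreement with the example output above.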

To analyze only the input dataset without computing any itemsets, use the -stat option:

./start.sh sample/laszlo.rcf -stat

In this case the program terminates after showing the database statistics.
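The statistics printed in the example can be reproduced from the .basenum rows. In this sketch the meaning of each field is inferred from the example output (for instance, 17 items spread over 5 lines gives the average of 3.4); Coron may define some of these fields differently, and database_stats is a hypothetical helper.

```python
# Sketch: recompute the "#" statistics of the example run from .basenum rows.
# Field meanings are inferred from the example output, not from Coron's source.

BASENUM = [[1, 2, 4, 5], [1, 3], [1, 2, 3, 5], [2, 3, 5], [1, 2, 3, 5]]


def database_stats(rows):
    all_items = [a for row in rows for a in row]
    return {
        "Number of lines": len(rows),
        "Largest attribute": max(all_items),
        "Number of attributes": len(set(all_items)),
        "Number of attributes in average": round(len(all_items) / len(rows), 2),
    }


stats = database_stats(BASENUM)
for name, value in stats.items():
    print(f"# {name}: {value}")
```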

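The -names option is highly recommended, but it works only with .rcf files, since only they can contain attribute names. With this option, attribute numbers are replaced by the attribute names in the output. Without -names (and with the minimum support given as the equivalent relative value 40%), the example above looks like this: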

./start.sh sample/laszlo.rcf 40% -alg:apriori

Result:

{1} (4)
{2} (4)
...

That is, attribute 1 has support 4, attribute 2 also has support 4, and so on.
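The translation performed by -names amounts to looking up attribute numbers in the attribute-name line of the .rcf file. The sketch below parses the .rcf example from Table 2 and maps numeric itemsets to names. The parsing rules are inferred from that single example rather than from a format specification, and parse_rcf and to_names are hypothetical helpers.

```python
# Sketch: read object and attribute names from .rcf text (layout as in Table 2)
# and translate 1-based attribute numbers into names. Hypothetical helpers.

RCF_TEXT = """[Relational Context]
Default Name
[Binary Relation]
Name_of_dataset
o1 | o2 | o3 | o4 | o5
a | b | c | d | e
1 1 0 1 1
1 0 1 0 0
1 1 1 0 1
0 1 1 0 1
1 1 1 0 1
[END Relational Context]"""


def parse_rcf(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("[Binary Relation]")
    # The dataset name follows the tag, then the object and attribute name lines.
    objects = [s.strip() for s in lines[start + 2].split("|")]
    attributes = [s.strip() for s in lines[start + 3].split("|")]
    end = lines.index("[END Relational Context]")
    matrix = [[int(v) for v in ln.split()] for ln in lines[start + 4:end]]
    return objects, attributes, matrix


def to_names(itemset, attributes):
    """Map 1-based attribute numbers to their names."""
    return {attributes[i - 1] for i in itemset}


objects, attributes, matrix = parse_rcf(RCF_TEXT)
print(to_names({1}, attributes), to_names({2}, attributes))
```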

Other options:

--help help information
--version, -V version information
--update check for a new version

Verbosity options:

-v:m memory usage
-v:f function information (which function is called)
-v:t time information (runtime)

Several verbosity options can be combined into a single -vc switch:

-vc:mf equivalent to -v:m -v:f
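The equivalence above can be expressed as a simple expansion rule: each letter after -vc: stands for one -v: switch. The helper expand_verbosity is hypothetical and only illustrates the rule stated in the table; it is not part of Coron.

```python
# Sketch: expand a combined verbosity switch such as "-vc:mf" into the
# individual switches it stands for. Hypothetical helper, not part of Coron.


def expand_verbosity(arg):
    if arg.startswith("-vc:"):
        return [f"-v:{flag}" for flag in arg[len("-vc:"):]]
    return [arg]  # any other argument is passed through unchanged


print(expand_verbosity("-vc:mf"))
```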

Verbosity options display additional information while the program is running. This feedback is always written to the standard error, and each such line starts with a '>' sign. Because it goes to stderr, it does not get mixed with the normal result.

Statistical information is sent to the standard output, and these lines always start with a '#' sign, so they are easy to filter out.
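Because statistics lines start with '#', a script reading the standard output can separate them from the itemset lines. This sketch assumes itemset lines look like "{a} (4)" as in the examples, and that the items of a larger itemset are separated by commas and/or spaces inside the braces; that separator is an assumption, as the guide shows only single-item lines. split_output is a hypothetical helper.

```python
# Sketch: separate Coron's standard output into '#' statistics lines and
# parsed itemset lines. The multi-item separator inside braces is assumed.
import re

ITEMSET_LINE = re.compile(r"^\{(.*)\}\s*\((\d+)\)$")


def split_output(text):
    stats, itemsets = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            stats.append(line)
        elif (m := ITEMSET_LINE.match(line)):
            items = frozenset(x for x in re.split(r"[,\s]+", m.group(1)) if x)
            itemsets.append((items, int(m.group(2))))
    return stats, itemsets


sample = """# min_supp: 2, i.e. 40%
# Chosen algorithm: Apriori
{a} (4)
{b} (4)
# FIs: 15"""
stats, itemsets = split_output(sample)
print(len(stats), itemsets)
```

Verbosity lines ('>') never appear here, since they are written to stderr.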

A detailed description of the available algorithms is given in a separate document.



II. Graphical User Interface

Coron-base also has a graphical front end. Figures 1–7 show the successive steps of the interface. In step 1 the user chooses the input file. In step 2 the user chooses an output file, since the result is always saved to a file; a temporary file can also be used. After the minimum support is set (step 3) and the mining algorithm is chosen (step 4), the software summarizes the user's choices (step 5). At each step the user can go back and modify earlier choices. After the "Start calculation!" button is pressed, the result is saved to the file, which can be viewed at the end (step 6).

The graphical interface uses a configuration file called .coron_gui.rc, located in the HOME/.coron directory. When the GUI is launched for the first time, this file is created automatically with default values. The user can edit this file to customize the software.
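A script that needs to find the GUI configuration file can build its path from the user's home directory. This sketch assumes that HOME means the current user's home directory; since the contents of .coron_gui.rc are not described in this guide, it only reports whether the file exists yet. coron_gui_config_path is a hypothetical helper.

```python
# Sketch: locate the GUI configuration file, assuming HOME is the user's
# home directory. The file's format is not described, so it is not parsed.
from pathlib import Path


def coron_gui_config_path(home=None):
    base = Path(home) if home is not None else Path.home()
    return base / ".coron" / ".coron_gui.rc"


path = coron_gui_config_path()
print(path, "exists" if path.exists() else "not created yet (start the GUI once)")
```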