Determining the Number of Trace Clusters
=============================================
Determining the appropriate number of trace clusters for an event log by evaluating the stability of its clustering.
About
-----
A wide array of trace clustering techniques exists for partitioning an event log. However, none of them proposes an approach for determining an appropriate number of
trace clusters. We therefore propose a stability-based approach for choosing the number of clusters, using log perturbations and similarity metrics.
Our framework is implemented as an experimental plugin and includes multiple log-perturbation strategies, similarity metrics and trace clustering techniques for evaluation.
[](#TCDiagram)
The image above is a conceptual representation of the approach for calculating stability.
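
The idea of measuring stability can be illustrated with a minimal Java sketch. It is not the plugin's code: the `TraceClusterer` interface, the `perturb` method (which drops one random event from a fraction of the traces) and the class name are hypothetical, and the Rand index is used here only as a simple stand-in for the similarity metrics the plugin provides (such as HeurGED). Under those assumptions, stability for a given number of clusters k is the average similarity between the clustering of the original log and the clusterings of several perturbed copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class StabilitySketch {

    /** Hypothetical stand-in for any trace clustering technique. */
    public interface TraceClusterer {
        /** Returns one cluster label in [0, k) per trace, in log order. */
        int[] cluster(List<List<String>> log, int k);
    }

    /** Assumed perturbation: removes one random event from roughly `fraction` of the traces. */
    public static List<List<String>> perturb(List<List<String>> log, double fraction, Random rnd) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> trace : log) {
            List<String> copy = new ArrayList<>(trace);
            if (copy.size() > 1 && rnd.nextDouble() < fraction) {
                copy.remove(rnd.nextInt(copy.size())); // remove by index
            }
            out.add(copy);
        }
        return out;
    }

    /** Rand index: share of trace pairs that both labelings treat alike (together or apart). */
    public static double randIndex(int[] a, int[] b) {
        int agree = 0;
        int pairs = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                boolean sameA = a[i] == a[j];
                boolean sameB = b[i] == b[j];
                if (sameA == sameB) {
                    agree++;
                }
                pairs++;
            }
        }
        return pairs == 0 ? 1.0 : (double) agree / pairs;
    }

    /** Average similarity between the clustering of the log and of `rounds` perturbed copies. */
    public static double stability(List<List<String>> log, TraceClusterer clusterer,
                                   int k, int rounds, double fraction, long seed) {
        Random rnd = new Random(seed);
        int[] base = clusterer.cluster(log, k);
        double sum = 0.0;
        for (int r = 0; r < rounds; r++) {
            int[] perturbed = clusterer.cluster(perturb(log, fraction, rnd), k);
            sum += randIndex(base, perturbed);
        }
        return sum / rounds;
    }

    public static void main(String[] args) {
        List<List<String>> log = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "c"),
                Arrays.asList("a", "d", "e", "f"));
        // Toy clusterer for illustration only: label = trace length modulo n.
        TraceClusterer byLength =
                (traces, n) -> traces.stream().mapToInt(t -> t.size() % n).toArray();
        for (int k = 2; k <= 3; k++) {
            System.out.printf("k=%d stability=%.3f%n",
                    k, stability(log, byLength, k, 20, 0.10, 42L));
        }
    }
}
```

Comparing the resulting stability values across candidate values of k gives a basis for choosing the number of clusters; the exact perturbation strategies, metrics and selection procedure are those described in the referenced paper, not the simplified ones shown here.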
References
----------
The approach is detailed thoroughly in the following submission:
* De Koninck, P., De Weerdt, J. (2016). Determining the Number of Trace Clusters: A Stability-based Approach. International Workshop on Algorithms & Theories for the Analysis of Event Data, ATAED 2016 (in submission)
Implementation
--------------
The stability-based approach is implemented as a [ProM 6](http://www.promtools.org/) plugin. The following JAR file contains the plugin:
* [Version of 2016-04-13](downloads/ClusterStability20160413.jar)
ProM must be able to find the downloaded JAR on its classpath. To arrange this, create a folder named 'plugins' in the ProM installation directory, place the downloaded JAR file in that folder, and start ProM with the following command (Windows example):

    java -classpath ./plugins/*;./lib/*;./* -Djava.util.Arrays.useLegacyMergeSort=true -Djava.library.path=./lib -ea -Xmx2g -XX:MaxPermSize=512m -XX:+UseCompressedOops org.processmining.contexts.uitopia.UI

The implementation contains two plugins: "Determine the Number Of Trace Clusters (Defaults)" and
"Determine the Number Of Trace Clusters (Wizard)". In the default setting, stability is evaluated using the HeurGED metric, with a noise induction percentage of 10%. In the wizard version,
the user can select multiple trace clustering techniques to evaluate, each with a configuration of their choice. The results are presented in a basic table.
Contact
-------
Contact the authors at:
* [Pieter De Koninck](mailto:pieter.dekoninck@kuleuven.be) (corresponding author)
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium
* [Jochen De Weerdt](mailto:jochen.deweerdt@kuleuven.be) (corresponding author)
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium
Screenshots
-----------
[](#img01)
[](#img02)
[](#img03)