Determining the Number of Trace Clusters
=============================================

Determining the appropriate number of trace clusters for an event log by evaluating the stability of its clustering.
About
-----

A wide array of trace clustering techniques exists for partitioning an event log. However, none of them proposes an approach for determining an appropriate number of trace clusters. We therefore propose a stability-based approach for choosing the number of clusters, using log perturbations and similarity metrics. Our framework is implemented as an experimental plugin and includes multiple log-perturbation strategies, similarity metrics and trace clustering techniques for evaluation.

[](#TCDiagram)

The image above is a conceptual representation of the approach for calculating stability.

References
----------

The approach is described in detail in the following submission:

* De Koninck, P., De Weerdt, J. (2016). Determining the Number of Trace Clusters: A Stability-based Approach. International Workshop on Algorithms & Theories for the Analysis of Event Data (ATAED 2016), in submission.

Implementation
--------------

The stability-based approach is implemented as a [ProM 6](http://www.promtools.org/) plugin. The following JAR file contains the plugin:

* [Version of 2016-04-13](downloads/ClusterStability20160413.jar)

ProM must be able to find the downloaded JAR on its classpath. To achieve this, create a folder named 'plugins' in the ProM installation directory, place the downloaded JAR file in that folder, and start ProM with the following command (Windows example):

    java -classpath ./plugins/*;./lib/*;./* -Djava.util.Arrays.useLegacyMergeSort=true -Djava.library.path=./lib -ea -Xmx2g -XX:MaxPermSize=512m -XX:+UseCompressedOops org.processmining.contexts.uitopia.UI

The implementation contains two plugins: "Determine the Number Of Trace Clusters (Defaults)" and "Determine the Number Of Trace Clusters (Wizard)". In the default setting, stability is evaluated using the HeurGED metric, with a noise induction percentage of 10%.
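
The plugin's internals are not reproduced here. As a rough illustration of the idea behind a stability evaluation, the following Python sketch scores each candidate number of clusters k by clustering the original log and several perturbed copies of it, then averaging how similar the resulting partitions are. Everything in it is an assumption made for illustration: traces are tuples of activity labels; the hypothetical `perturb` function induces noise by deleting one random event from 10% of the traces; a plain k-means on activity-count vectors stands in for the trace clustering techniques; and the Rand index is swapped in for the HeurGED metric, which compares process models and is not implemented here.

```python
import random
from collections import Counter
from itertools import combinations


def perturb(log, noise=0.10, rng=None):
    """Hypothetical noise induction (assumption): delete one random event from a
    `noise` fraction of the traces. Trace i of the result matches trace i of `log`."""
    rng = rng or random.Random()
    traces = [list(t) for t in log]
    for i in rng.sample(range(len(traces)), max(1, round(noise * len(traces)))):
        if len(traces[i]) > 1:  # never empty a trace completely
            del traces[i][rng.randrange(len(traces[i]))]
    return [tuple(t) for t in traces]


def to_vectors(log):
    """Bag-of-activities profile: one count per activity label."""
    alphabet = sorted({a for t in log for a in t})
    return [[c[a] for a in alphabet] for c in map(Counter, log)]


def kmeans(vectors, k, rng, iters=20):
    """Plain Lloyd's k-means, a stand-in for a real trace clustering technique."""
    centers = [list(v) for v in rng.sample(vectors, k)]

    def nearest(v):
        return min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(v, centers[j])))

    labels = []
    for _ in range(iters):
        labels = [nearest(v) for v in vectors]
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Share of trace pairs on which two partitions agree (same cluster vs.
    different clusters). Swapped in for HeurGED; independent of label numbering."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)


def stability(log, k, noise=0.10, repeats=10, seed=0):
    """Average similarity between the clustering of the original log and the
    clusterings of `repeats` perturbed copies of it."""
    rng = random.Random(seed)
    base = kmeans(to_vectors(log), k, rng)
    scores = [rand_index(base, kmeans(to_vectors(perturb(log, noise, rng)), k, rng))
              for _ in range(repeats)]
    return sum(scores) / len(scores)


def choose_k(log, candidates, **kwargs):
    """Pick the most stable k; ties go to the candidate listed first."""
    return max(candidates, key=lambda k: stability(log, k, **kwargs))


if __name__ == "__main__":
    # Toy log with two clearly separated groups of behaviour.
    log = [("a", "b", "c")] * 10 + [("x", "y", "z")] * 10
    for k in (2, 3, 4):
        print(k, stability(log, k))
    print("most stable k:", choose_k(log, [2, 3, 4]))
```

Because the Rand index only checks whether pairs of traces end up together, it does not matter how the clusters happen to be numbered in each run. The toy log only illustrates the mechanics of perturbing, re-clustering and comparing; it says nothing about how the plugin's own perturbation strategies or similarity metrics behave.
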
In the wizard version, the user can select multiple trace clustering techniques to evaluate, each configured according to their preference. The results are presented in a basic table.

Contact
-------

Contact the authors at:

* [Pieter De Koninck](mailto:pieter.dekoninck@kuleuven.be) (corresponding author)
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium
* [Jochen De Weerdt](mailto:jochen.deweerdt@kuleuven.be) (corresponding author)
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium

Screenshots
-----------

[](#img01)
[](#img02)
[](#img03)