# Clustering

Clustering groups similar events together. It can be done automatically using several well-known algorithms, each with its own strengths and weaknesses.

### Create Clusters

To automatically generate new clusters for an FCS file, select the Clustering Tool, choose a clustering algorithm, fill in the desired parameters, and click "Apply".

#### K-Means

K-Means is a very basic clustering algorithm. One of its biggest benefits is that it is very fast, at least compared to other clustering algorithms. However, it can assign clusters somewhat crudely at times, and you must specify in advance how many clusters it should find. If you select K-Means as your clustering algorithm, you will need to provide the following:

- Number of Clusters - The number of clusters you want the algorithm to group the data into.
- Number of Iterations - How long the algorithm works on the data. Larger values may produce better clusters but take longer to run.
- Standardize - If applied, the data will be standardized so that each parameter has a mean of 0 and is expressed in units of standard deviation. This may help when clustering parameters that have different scales.
- Transform - If applied, the data will be transformed by the current transform for each parameter. This may help parameters displayed on a log scale cluster better.
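
To show roughly what these options correspond to, here is a minimal pure-Python sketch of K-Means with optional transform and standardization steps. It is not the tool's own code: the names `standardize`, `prepare`, and `kmeans` are hypothetical, the per-parameter transforms are passed in as functions (the tool uses whatever transform is currently set for each parameter), and the centroids are seeded with a deterministic farthest-point rule, whereas the tool may seed them randomly or with k-means++. Number of Iterations is treated here as an upper bound, and the loop stops early once assignments stop changing.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = List[float]


def standardize(data: Sequence[Sequence[float]]) -> List[Point]:
    """Rescale each parameter (column) to mean 0, in units of standard deviation."""
    columns = list(zip(*data))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(columns, means)]
    stds = [s if s > 0 else 1.0 for s in stds]  # avoid dividing by zero for constant parameters
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]


def prepare(
    data: Sequence[Sequence[float]],
    transforms: Optional[Sequence[Callable[[float], float]]] = None,
    standardize_data: bool = False,
) -> List[Point]:
    """Hypothetical helper: apply one transform per parameter, then optionally standardize."""
    points = [list(row) for row in data]
    if transforms is not None:
        points = [[f(x) for f, x in zip(transforms, row)] for row in points]
    return standardize(points) if standardize_data else points


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    data: Sequence[Sequence[float]], n_clusters: int, n_iterations: int = 100
) -> Tuple[List[int], List[Point]]:
    # Farthest-point seeding (an assumption): start at the first event, then repeatedly
    # add the event farthest from every centroid chosen so far.
    centroids: List[Point] = [list(data[0])]
    while len(centroids) < n_clusters:
        farthest = max(data, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(n_iterations):
        # Assignment step: each event joins its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centroids[k])) for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the events assigned to it.
        for k in range(n_clusters):
            members = [p for p, label in zip(data, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each row of the input is one event and each column one parameter. For example, `kmeans(prepare(events, transforms=[math.asinh, math.asinh], standardize_data=True), n_clusters=3)` would apply an arcsinh transform (a hypothetical stand-in for the current transform) to two parameters, standardize them, and group the events into three clusters.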

#### DBSCAN

DBSCAN (density-based spatial clustering of applications with noise) has some features that make it more convenient than K-Means. First, DBSCAN does not require you to specify the number of clusters; it determines how many clusters are in the data on its own. Second, DBSCAN handles outliers well: points that do not belong to any sufficiently dense group are labeled as noise rather than being forced into a cluster. If you select DBSCAN as your clustering algorithm, you will need to provide the following:

- Epsilon - The maximum distance between two points for them to be considered neighbors. This determines whether points end up in the same cluster.
- Minimum Points - The minimum number of neighboring points needed within Epsilon for a group to be considered a cluster.
- Standardize - If applied, the data will be standardized so that each parameter has a mean of 0 and is expressed in units of standard deviation. This may help when clustering parameters that have different scales.
- Transform - If applied, the data will be transformed by the current transform for each parameter. This may help parameters displayed on a log scale cluster better.
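
As a rough illustration of how Epsilon and Minimum Points interact, here is a minimal pure-Python sketch of textbook DBSCAN. It is not the tool's implementation: it assumes Euclidean distance, counts a point as its own neighbor when comparing against Minimum Points (as scikit-learn does), and labels noise as `-1`; the names `region_query` and `dbscan` are hypothetical. It works on raw values, so the Standardize and Transform options would be applied to the data before this step.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(data: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i, including i itself."""
    return [j for j in range(len(data)) if math.dist(data[i], data[j]) <= eps]


def dbscan(data: Sequence[Sequence[float]], eps: float, min_points: int) -> List[Optional[int]]:
    labels: List[Optional[int]] = [None] * len(data)  # None means "not visited yet"
    cluster = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point of a cluster
            continue
        # Point i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps)
            if len(j_neighbors) >= min_points:  # j is also a core point, so keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels
```

For example, with two tight groups of three events plus one distant event at (50, 50), `dbscan(events, eps=1.0, min_points=3)` assigns the two groups to clusters 0 and 1 and marks the distant event as noise (`-1`), without being told how many clusters to look for.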