This report summarizes how MicroTrace clusters pathogen whole-genome sequencing samples. Clusters are defined from pairwise SNP distances, which identifies genetically related isolates and supports outbreak detection and infection control.
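As a rough illustration of what clustering on pairwise SNP distances can look like, the sketch below counts differing positions between aligned sequences and groups isolates with base R hierarchical clustering. MicroTrace's actual algorithm and threshold are not described in this report, so the toy sequences, the 2-SNP threshold, and the choice of single linkage are all assumptions.

```r
# Hypothetical aligned sequences for four isolates (not real data)
seqs <- c(
  S1 = "ACGTACGTAC",
  S2 = "ACGTACGTAT",
  S3 = "TTGTACCTGA",
  S4 = "TTGTACCTGC"
)

# Pairwise SNP distance = number of aligned positions that differ
bases <- do.call(rbind, strsplit(seqs, ""))
n <- nrow(bases)
snp_dist <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    snp_dist[i, j] <- sum(bases[i, ] != bases[j, ])
  }
}

snp_threshold <- 2  # assumed threshold, in SNPs

# Single linkage (an assumption): an isolate joins a cluster when it is
# within the threshold of at least one existing member
hc <- hclust(as.dist(snp_dist), method = "single")
toy_clusters <- cutree(hc, h = snp_threshold)
toy_clusters
```

With these toy sequences, S1 and S2 fall into one cluster and S3 and S4 into another, because each pair differs by a single SNP while the two pairs are at least 4 SNPs apart.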
We read in the cluster assignments produced by MicroTrace, which include optional metadata (collection date, ward, etc.).
library(tidyverse)  # provides read_csv() and the dplyr verbs used below
clusters <- read_csv("data/cluster_assignments.csv")
## Rows: 10 Columns: 5
## ── Column specification ───────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Sample, Ward, Patient_ID
## dbl (1): Cluster
## date (1): Collection_Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(clusters)
## # A tibble: 6 × 5
##   Sample   Cluster Collection_Date Ward   Patient_ID
##   <chr>      <dbl> <date>          <chr>  <chr>
## 1 Sample_1       1 2023-01-01      Ward_A P001
## 2 Sample_2       1 2023-01-02      Ward_A P002
## 3 Sample_3       1 2023-01-03      Ward_A P003
## 4 Sample_4       1 2023-01-04      Ward_A P004
## 5 Sample_5       1 2023-01-05      Ward_A P005
## 6 Sample_6       2 2023-01-06      Ward_B P006
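The column-specification message printed by `read_csv()` can be avoided by declaring the column types explicitly. A minimal sketch, assuming the five columns shown in the output above; the temporary file holds two rows copied from that output so the example runs on its own, and reading `Cluster` as an integer is a suggestion rather than what the report does.

```r
library(readr)

# Small stand-in file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Sample,Cluster,Collection_Date,Ward,Patient_ID",
  "Sample_1,1,2023-01-01,Ward_A,P001",
  "Sample_6,2,2023-01-06,Ward_B,P006"
), tmp)

# Declaring types up front suppresses the message and avoids type guessing
demo <- read_csv(
  tmp,
  col_types = cols(
    Sample          = col_character(),
    Cluster         = col_integer(),
    Collection_Date = col_date(format = "%Y-%m-%d"),
    Ward            = col_character(),
    Patient_ID      = col_character()
  )
)
demo
```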
We count the samples in each SNP-defined cluster and list, for each cluster, the hospital wards represented and the range of collection dates.
summary_table <- clusters %>%
  group_by(Cluster) %>%
  summarise(
    n_samples = n(),
    wards = paste(unique(Ward), collapse = ", "),
    dates = paste(min(Collection_Date), max(Collection_Date), sep = " to ")
  )
summary_table
## # A tibble: 2 × 4
##   Cluster n_samples wards  dates
##     <dbl>     <int> <chr>  <chr>
## 1       1         5 Ward_A 2023-01-01 to 2023-01-05
## 2       2         5 Ward_B 2023-01-06 to 2023-01-10
The following dendrogram was generated by MicroTrace. Red dashed lines indicate the SNP threshold used to define clusters.
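For readers who want to draw a comparable plot in R, a minimal sketch with base graphics follows. The distance matrix, the 5-SNP threshold, and single linkage are illustrative assumptions, not MicroTrace settings.

```r
# Hypothetical pairwise SNP distances for four isolates (not real data)
ids <- paste0("S", 1:4)
snp_dist <- matrix(
  c( 0,  2, 30, 31,
     2,  0, 29, 30,
    30, 29,  0,  3,
    31, 30,  3,  0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)
snp_threshold <- 5  # assumed threshold, in SNPs

hc <- hclust(as.dist(snp_dist), method = "single")

# Draw the tree, then mark the threshold with a red dashed horizontal line
plot(hc, main = "Toy SNP dendrogram", xlab = "", sub = "", ylab = "SNP distance")
abline(h = snp_threshold, col = "red", lty = 2)
```

Branches that merge below the dashed line belong to the same cluster at that threshold.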
We summarize the intra-cluster SNP distances for each cluster, including mean, standard deviation, and min/max distances.
intra_stats <- read_csv("data/intra_cluster_stats.csv")
## Rows: 2 Columns: 6
## ── Column specification ───────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (6): Cluster, Size, Mean, SD, Min, Max
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
intra_stats
## # A tibble: 2 × 6
##   Cluster  Size  Mean    SD   Min   Max
##     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1       1     5   1.7 0.675     1     3
## 2       2     5   2   1.05      1     4
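The report reads these statistics from a file and does not say how MicroTrace computes them. One plausible definition, sketched below, summarises the distances between every unordered pair of isolates within a cluster; the distance matrix and cluster labels are made up for illustration.

```r
# Hypothetical SNP distance matrix and cluster labels (not real data)
ids <- paste0("S", 1:5)
snp_dist <- matrix(
  c( 0,  1,  2, 20, 21,
     1,  0,  3, 19, 20,
     2,  3,  0, 22, 21,
    20, 19, 22,  0,  4,
    21, 20, 21,  4,  0),
  nrow = 5, byrow = TRUE, dimnames = list(ids, ids)
)
cluster_of <- c(S1 = 1, S2 = 1, S3 = 1, S4 = 2, S5 = 2)

# For each cluster, summarise the distances between its members
intra <- do.call(rbind, lapply(sort(unique(cluster_of)), function(k) {
  members <- names(cluster_of)[cluster_of == k]
  sub <- snp_dist[members, members, drop = FALSE]
  d <- sub[upper.tri(sub)]  # each unordered pair counted once
  data.frame(
    Cluster = k, Size = length(members),
    Mean = mean(d), SD = sd(d),  # SD is NA when a cluster has a single pair
    Min = min(d), Max = max(d)
  )
}))
intra
```

Under this definition a cluster of five isolates contributes ten pairwise distances to each statistic.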
MicroTrace enables rapid and reproducible outbreak cluster detection from SNP distance matrices. This HTML report provides a clear summary of potential outbreak groupings, supporting infection prevention and genomic epidemiology workflows.