Comparison of cgMLST Pipelines for Hospital Outbreak Detection

Summary: Three commercial cgMLST pipelines were compared using 255 hospital outbreak isolates to evaluate their concordance in identifying outbreak-related clusters and whether they consistently distinguish “clustered” from “non-clustered” isolate pairs.
Different Software Pipelines Can Give Non-identical Genomic Distance Metrics
Why this matters:
  • Whole-genome sequencing (WGS) is increasingly adopted for investigating hospital outbreaks, where sequence analysis may involve SNP or cgMLST comparisons.
  • Compared with SNP analysis, cgMLST offers a standardized, gene-by-gene approach that handles more diverse strains: it assesses strain relatedness by comparing thousands of genes shared by nearly all strains of a species (the core genome).
  • Differences in allele calling, scheme definitions, and clustering thresholds between cgMLST software pipelines can lead to divergent interpretations.
  • This study provides empirical evidence on the degree of variability between pipelines and their reliability in identifying true transmission clusters.
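
To make the gene-by-gene comparison concrete, the following is a minimal Python sketch of an allele distance between two cgMLST profiles. It is an illustration only, not the method used by any of the pipelines discussed here: the locus names, the allele numbers, and the choice to skip loci that are missing in either isolate are all assumptions. How uncalled loci are handled is one of the places where real pipelines can differ.

```python
from typing import Mapping, Optional

# Hypothetical allele profile: locus name -> allele number (None = locus not called).
Profile = Mapping[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared = [
        locus for locus in a
        if locus in b and a[locus] is not None and b[locus] is not None
    ]
    return sum(1 for locus in shared if a[locus] != b[locus])


# Invented example profiles (not study data).
isolate_1 = {"locus_0001": 3, "locus_0002": 7, "locus_0003": 1, "locus_0004": None}
isolate_2 = {"locus_0001": 3, "locus_0002": 9, "locus_0003": 1, "locus_0004": 5}

# locus_0002 differs; locus_0004 is skipped because it is uncalled in isolate_1.
print(allele_distance(isolate_1, isolate_2))  # prints 1
```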

Key findings: Glasgow et al. used 255 clinical isolates of common bacterial healthcare-associated pathogens: Acinetobacter baumannii (N = 6), Escherichia coli (N = 64), Enterococcus faecalis (N = 25), Enterococcus faecium (N = 9), Klebsiella pneumoniae (N = 45), Pseudomonas aeruginosa (N = 36), Staphylococcus aureus (N = 65), and Serratia marcescens (N = 5). The isolates had previously been classified as “clustered” or “non-clustered” [1]. The authors compared cgMLST analyses performed using the Ridom SeqSphere+, 1928 Diagnostics, and Ares Genetics ARESdb software pipelines.

  • Concordance in cluster calling: Using suggested clustering thresholds, 1928 Diagnostics’ pipeline matched SeqSphere+ with 100% concordance, while ARESdb matched 99.5% overall. 
  • Allelic distance differences: ARESdb systematically reported significantly greater allele differences (i.e. more divergence) than SeqSphere+ or 1928 in same-patient clustered and different-patient clustered isolate pairs, though all 3 pipelines agreed on non-clustered pairs.
  • Mean (± SD) allele distances in same-patient clustered pairs were ~7.6 (7.2) for ARESdb, ~1.18 (1.6) for 1928, and ~1 for SeqSphere+.
  • Despite numerical distance discrepancies, the cgMLST analysis using these commercial pipelines generally agreed on outbreak clustering outcomes, which are most relevant for infection control interpretation.
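
The distinction between distance values and cluster calls can be sketched in a few lines of Python. This is a hedged illustration, not the study's analysis: the allele distances and both thresholds below are invented, and the function names are hypothetical. It shows how two pipelines can report different distances for the same isolate pairs yet still agree on most clustered/non-clustered calls once each pipeline's own threshold is applied.

```python
from typing import Sequence


def cluster_calls(distances: Sequence[int], threshold: int) -> list[bool]:
    """Call a pair 'clustered' when its allele distance is at or below the threshold."""
    return [d <= threshold for d in distances]


def concordance(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Fraction of isolate pairs given the same call by both pipelines."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree / len(calls_a)


# Invented allele distances for the same four isolate pairs (not study data).
pipeline_a = [1, 0, 40, 9]
pipeline_b = [7, 3, 55, 20]

calls_a = cluster_calls(pipeline_a, threshold=10)  # hypothetical threshold
calls_b = cluster_calls(pipeline_b, threshold=15)  # hypothetical threshold

# Three of the four pairs receive the same call despite differing distances.
print(f"{concordance(calls_a, calls_b):.0%}")  # prints 75%
```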

Bigger picture: While cgMLST is increasingly recognized as a preferred framework for standardized bacterial strain typing, this study shows that different software pipelines can yield non-identical genomic distance metrics. These discrepancies underscore that cgMLST results are pipeline-dependent, emphasizing the need for method harmonization and shared clustering thresholds to ensure consistent interpretation across hospitals and public health laboratories. Integrating cgMLST with SNP-based analyses or establishing universal allele-distance standards could further enhance reproducibility. Overall, this comparison strengthens the case for standardized WGS-based outbreak surveillance pipelines—a critical step toward scalable, interoperable genomic epidemiology across healthcare networks and public health systems.

References:

1. Glasgow et al. (2025). “Comparison of core genome multi-locus sequencing typing pipelines for hospital outbreak detection of common bacterial pathogens.” Journal of Clinical Microbiology Vol. 63, Issue 10: e0064625.