Case Study

High-Performance Clustering Engine

April 2021 | Utah State University | Architecture | Distributed Systems | CUDA

Summary

Engineered a multi-backend clustering pipeline, with benchmarking built in, to raise throughput on high-volume datasets.

Benchmarked parallel implementations and achieved multi-fold throughput gains for million-point datasets.

The central challenge was balancing algorithm quality, execution speed, and memory pressure across different hardware targets.

The workload was compute-intensive clustering, run through multiple execution backends (CUDA, MPI, OpenMP) behind a shared evaluation harness.
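One way to picture a shared harness over several backends is a small registry keyed by backend name, sketched here in Python. This is an illustration only, not the project's code: the names `BACKENDS`, `register`, `run`, and the toy `serial_backend` are hypothetical, and the "clustering" is a stand-in that assigns each point to the nearest of the first k points.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed contract: a backend maps (points, k) to one cluster label per point.
Backend = Callable[[Sequence[Point], int], List[int]]

# Hypothetical registry; real CUDA/MPI/OpenMP backends would be wrapped and added here.
BACKENDS: Dict[str, Backend] = {}


def register(name: str) -> Callable[[Backend], Backend]:
    """Decorator that stores a backend in the shared registry under `name`."""
    def wrap(fn: Backend) -> Backend:
        BACKENDS[name] = fn
        return fn
    return wrap


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid with the smallest squared distance to `point`."""
    return min(
        range(len(centroids)),
        key=lambda c: (point[0] - centroids[c][0]) ** 2
        + (point[1] - centroids[c][1]) ** 2,
    )


@register("serial")
def serial_backend(points: Sequence[Point], k: int) -> List[int]:
    # Toy stand-in: use the first k points as fixed centroids.
    centroids = list(points[:k])
    return [nearest(p, centroids) for p in points]


def run(name: str, points: Sequence[Point], k: int) -> List[int]:
    """Dispatch to a backend and check the contract every backend must honor."""
    labels = BACKENDS[name](points, k)
    if len(labels) != len(points):
        raise ValueError(f"backend {name!r} returned {len(labels)} labels")
    return labels
```

Because every backend is called through `run`, the same output checks apply to each one, which keeps later comparisons on equal footing.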

Standardized benchmark inputs and instrumented each backend to compare tradeoffs objectively before selecting defaults.
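A minimal sketch of what standardized inputs and instrumentation might look like, in Python. The helper names (`make_points`, `benchmark`, `toy_backend`), the seeded uniform input, and the use of the median over repeats are assumptions chosen for illustration, not details taken from the project.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def make_points(n: int, seed: int = 0) -> List[Point]:
    """Seeded generator so every backend is measured on identical input."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def toy_backend(points: Sequence[Point], k: int) -> List[int]:
    # Hypothetical workload: label each point by the nearest of the first k points.
    centroids = points[:k]
    return [
        min(
            range(len(centroids)),
            key=lambda c: (p[0] - centroids[c][0]) ** 2
            + (p[1] - centroids[c][1]) ** 2,
        )
        for p in points
    ]


def benchmark(
    fn: Callable[[Sequence[Point], int], List[int]],
    points: Sequence[Point],
    k: int,
    repeats: int = 3,
) -> Dict[str, float]:
    """Time `fn` several times and report the median, which resists outlier runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(points, k)
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)
    throughput = len(points) / median_s if median_s > 0 else float("inf")
    return {"median_s": median_s, "points_per_s": throughput}
```

Calling `benchmark(toy_backend, make_points(10_000, seed=42), k=8)` returns the median wall time and a points-per-second figure; running the same call with another backend function gives a directly comparable result.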

Optimized memory access patterns and batching strategy to keep performance stable at higher data volumes.
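The batching idea can be sketched as a centroid update that walks the data in fixed-size chunks and keeps only per-cluster running sums, so working memory depends on the batch size and cluster count rather than the dataset size. The source does not name the clustering algorithm, so this assumes a k-means-style update; `batches` and `update_centroids` are hypothetical names, and the Python version illustrates the accumulation pattern rather than GPU memory behavior.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def batches(points: Sequence[Point], batch_size: int) -> Iterator[Sequence[Point]]:
    """Yield consecutive slices of at most `batch_size` points."""
    for start in range(0, len(points), batch_size):
        yield points[start:start + batch_size]


def update_centroids(
    points: Sequence[Point], centroids: Sequence[Point], batch_size: int
) -> List[Point]:
    """One k-means-style update computed batch by batch (assumed algorithm)."""
    k = len(centroids)
    sums = [[0.0, 0.0] for _ in range(k)]  # running coordinate sums per cluster
    counts = [0] * k                       # running point counts per cluster
    for batch in batches(points, batch_size):
        for x, y in batch:
            c = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2
                + (y - centroids[i][1]) ** 2,
            )
            sums[c][0] += x
            sums[c][1] += y
            counts[c] += 1
    # Clusters that received no points keep their previous centroid.
    return [
        (sums[i][0] / counts[i], sums[i][1] / counts[i]) if counts[i] else centroids[i]
        for i in range(k)
    ]
```

Because only sums and counts carry over between batches, the batch size can be tuned to the memory available on a given target without changing which points are assigned to which cluster.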

Last updated: July 11, 2026