Can Topology See the Shape of Sound? Robust Feature Extraction from Noisy Speech Using TDA

Abstract

Understanding sound is not just about hearing; it is about recognizing structure. As audio data become central to smart devices, healthcare, and manufacturing, a key challenge grows: how can we extract meaningful, noise-robust features from speech or environmental sound, especially when annotated data are limited?

This project investigates Topological Data Analysis (TDA) as a geometry-driven approach to uncovering structure in sound. Using the publicly available Google Speech Commands and BirdSet datasets, we will explore multiple TDA-based embedding extraction methods, fuse them with classic spectrogram features, and benchmark the resulting classifiers against strong baselines, aiming to surpass them and to demonstrate noise robustness through synthetic-noise experiments. We also hypothesize that Mapper-based techniques can capture the “shape” of sound in ways that are resilient to noise and useful for classification.
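To make the idea of a waveform's "shape" concrete, here is a minimal sketch of one possible TDA embedding pipeline; it is an illustration under stated assumptions, not the project's prescribed method. It assumes a Takens-style time-delay embedding (the parameters `dim` and `tau` are illustrative choices), computes only 0-dimensional persistent homology of the Vietoris-Rips filtration (via Kruskal's minimum-spanning-tree merges), and vectorizes the resulting diagram with persistence landscapes into a fixed-length vector that could later be concatenated with spectrogram features. All function names are hypothetical; in practice, higher-dimensional features (H1 and above) would come from a dedicated library such as GUDHI or Ripser, whose APIs are not reproduced here.

```python
import numpy as np

# Hypothetical helpers; dim, tau, k_max and num are illustrative assumptions.


def delay_embed(x, dim=3, tau=4):
    """Time-delay embedding: row j is (x[j], x[j+tau], ..., x[j+(dim-1)*tau])."""
    x = np.asarray(x, dtype=np.float64)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for the chosen dim and tau")
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)


def h0_persistence(points):
    """(birth, death) pairs of 0-dim persistent homology of the Rips filtration.

    All points are born at 0; a component dies at the length of the edge that
    merges it into another one (Kruskal's algorithm with union-find). The one
    component that never dies is omitted. Uses O(n^2) memory for distances.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(dist[iu, ju], kind="stable")
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    deaths = []
    for e in order:
        ra, rb = find(iu[e]), find(ju[e])
        if ra != rb:  # edge joins two components: one of them dies here
            parent[ra] = rb
            deaths.append(dist[iu[e], ju[e]])
            if len(deaths) == n - 1:
                break
    deaths = np.array(deaths, dtype=np.float64)
    return np.column_stack([np.zeros_like(deaths), deaths])


def persistence_landscape(diagram, k_max=3, num=50, t_range=None):
    """First k_max persistence landscapes sampled on a grid, flattened.

    lambda_k(t) = k-th largest of max(0, min(t - b, d - t)) over (b, d) pairs.
    """
    b, d = diagram[:, 0], diagram[:, 1]
    if t_range is None:
        t_range = (0.0, float(d.max()) if len(d) else 1.0)
    t = np.linspace(t_range[0], t_range[1], num)
    tents = np.maximum(0.0, np.minimum(t[None, :] - b[:, None], d[:, None] - t[None, :]))
    tents = -np.sort(-tents, axis=0)  # sort each grid column in descending order
    out = np.zeros((k_max, num))
    k = min(k_max, tents.shape[0])
    out[:k] = tents[:k]
    return out.ravel()


# Usage: a 25 ms, 440 Hz tone sampled at 16 kHz with light noise.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.arange(400) / 16_000) + 0.05 * rng.standard_normal(400)
features = persistence_landscape(h0_persistence(delay_embed(x, dim=3, tau=9)))
print(features.shape)  # (150,)
```

The same `h0_persistence` and `persistence_landscape` functions could equally be applied to MFCC frames (one frame per point) instead of delay vectors of raw samples. Because the sketch builds a full distance matrix, long clips would need subsampling or a sparse filtration before this becomes practical.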

Objectives of the Work (specific tasks will be agreed individually with the student)

In this study, you will:
   • Survey and implement several TDA-based embedding extraction methods (e.g., persistent homology on raw waveforms, persistence landscapes on MFCCs, graph-based TDA features).
   • Design and evaluate fusion strategies that combine TDA embeddings with conventional feature representations such as spectrograms or mel-spectrograms.
   • Conduct classification experiments on multiple open audio datasets, most notably the Google Speech Commands corpus and the BirdSet bird-song dataset, to benchmark against standard baselines (e.g., CNNs trained on spectrograms).
   • Aim to outperform these baseline models in terms of accuracy and robustness.
   • Validate noise resilience by introducing controlled, synthetic noise into test audio and measuring the degradation (or stability) of TDA-augmented models compared to non-TDA baselines.
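The noise-resilience experiment can be sketched as follows. This is a minimal sketch assuming additive white Gaussian noise at a prescribed signal-to-noise ratio, SNR_dB = 10·log10(P_signal / P_noise), where power is the mean squared amplitude; real experiments may instead mix in recorded background noise (Google Speech Commands, for instance, includes a folder of background-noise recordings). The helper names and the `evaluate` callable are hypothetical placeholders for the project's own training and evaluation code.

```python
import numpy as np


def add_noise_at_snr(clean, snr_db, rng=None):
    """Return clean + white Gaussian noise rescaled to the requested SNR in dB.

    SNR_dB = 10 * log10(P_signal / P_noise), with power = mean squared amplitude.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    p_signal = np.mean(clean ** 2)
    if p_signal == 0:
        raise ValueError("SNR is undefined for an all-zero signal")
    noise = rng.standard_normal(clean.shape)
    target_p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    noise *= np.sqrt(target_p_noise / np.mean(noise ** 2))  # hit the target exactly
    return clean + noise


def robustness_curve(evaluate, clips, labels, snrs_db=(30, 20, 10, 5, 0), seed=0):
    """Map each SNR level to the score returned by `evaluate`.

    `evaluate(noisy_clips, labels) -> float` is a hypothetical callable provided
    by the experiment code (e.g. accuracy of a trained classifier).
    """
    rng = np.random.default_rng(seed)  # same seed -> same noise for every model
    curve = {}
    for snr in snrs_db:
        noisy = [add_noise_at_snr(c, snr, rng) for c in clips]
        curve[snr] = evaluate(noisy, labels)
    return curve
```

Reusing the same `seed` for a TDA-augmented model and a non-TDA baseline gives both models identical noise realizations, so the gap between their two curves reflects the models rather than the random draw.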

Benefits for the Student

   • Gain hands-on experience with modern TDA techniques for embedding extraction and their integration into deep-learning workflows.
   • Learn to architect and implement feature-fusion pipelines, combining topological and spectral methods.
   • Master the end-to-end machine-learning lifecycle: dataset preprocessing, model training, hyperparameter tuning, and evaluation.
   • Develop skills in robustness analysis through systematic noise experiments and ablation studies.
   • Contribute to cutting-edge research with potential applications in explainable AI and intelligent acoustic sensing on edge devices.

Resources

Datasets:

   • Google Speech Commands (https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html)
   • BirdSet (https://huggingface.co/datasets/DBD-research-group/BirdSet)

Papers & Tutorials:

  1. Tulchinskii, E., Kuznetsov, K., Kushnareva, L., Cherniavskii, D., Barannikov, S., Piontkovskaya, I., Nikolenko, S., & Burnaev, E. (2022). Topological data analysis for speech processing. arXiv preprint arXiv:2211.17223. https://arxiv.org/abs/2211.17223
  2. Yu, Z. (2025). Topological deep learning for speech data. arXiv preprint arXiv:2505.21173. https://arxiv.org/abs/2505.21173
  3. Tlachac, M. L., Sargent, A., Toto, E., Paffenroth, R., & Rundensteiner, E. (2020). Topological data analysis to engineer features from audio signals for depression detection. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 302-307). IEEE. DOI: 10.1109/ICMLA51294.2020.00056
Supervisor
KOS