ML-Based Audio Fingerprinting for Noisy Environment


Authors: Manthan Gavali, Om Malode, Shreeyash Jadhav, Yash Chaudhari, Assistant Professor Vaibhav Dabhade

Abstract: This paper proposes a machine learning-based audio fingerprinting system for robust audio content identification in noisy environments. To overcome the limitations of existing methods under noise, the approach uses a Convolutional Neural Network (CNN) to automatically learn compact, noise-invariant fingerprints from audio spectrograms. The system follows a multi-stage process. First, a diverse dataset of clean audio is augmented with various types of noise at different signal-to-noise ratios (SNRs). Spectrogram normalization and data-driven feature learning reduce the impact of background distortions, and a contrastive-learning objective encourages the noisy and clean versions of the same audio to have similar embeddings. The trained model then generates a unique fingerprint for each audio track in a database. Finally, these fingerprints are stored using a fast hashing mechanism and retrieved with an approximate nearest-neighbor search optimized for large-scale databases. Experimental results show improved identification accuracy and robustness over existing methods, particularly at low SNRs, with low computational cost for fingerprint generation and matching. The proposed approach thus provides a scalable, high-performance framework suitable for real-time audio identification in adverse acoustic environments, with applications such as music recognition, broadcast monitoring, and copyright enforcement.
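The noise-augmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so the following is only an assumption of how clean audio might be mixed with noise at a chosen SNR: the function names `mix_at_snr` and `measured_snr_db`, the tiling of short noise clips, and the synthetic test signals are all hypothetical. It relies on the standard definition SNR (dB) = 10 log10(P_signal / P_noise).

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB).

    Hypothetical helper: the paper's abstract does not specify how mixing is done.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise clip if it is shorter than the clean signal, then trim.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("clean and noise signals must both have non-zero power")

    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise


def measured_snr_db(clean, noisy):
    """Measure the SNR (dB) of a mixture, given the clean reference signal."""
    residual = np.asarray(noisy, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


# Usage: build noisy variants of one synthetic "track" at several SNRs.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
clean_track = np.sin(2.0 * np.pi * 440.0 * t)   # stand-in for real audio
noise_clip = rng.normal(size=3000)              # stand-in for recorded noise

for snr in (20.0, 10.0, 0.0, -5.0):
    noisy_track = mix_at_snr(clean_track, noise_clip, snr)
    print(f"target {snr:+.1f} dB, measured {measured_snr_db(clean_track, noisy_track):+.2f} dB")
```

In a training pipeline of the kind the abstract outlines, each clean excerpt and one of its noisy variants would form a positive pair for the contrastive objective. The SNR range used here is illustrative and is not taken from the paper.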

DOI: https://doi.org/10.5281/zenodo.20743337
