Authors: Venkata Nagamani Reddi, Charitha Pasumarthi, Mounika Mudavath, SriLaxmi Thurupu, Keerthana Vadagam
Abstract: The emergence of AI-generated voices poses significant problems for media authenticity and digital safety. Detecting fake audio has become critical in areas such as audio forensics and voice authentication. This paper presents a literature review of deepfake audio detection using deep learning. The current system uses Mel-frequency cepstral coefficients (MFCCs) as input features and a VGG16-based convolutional neural network (CNN), applied through transfer learning, to classify real and fake voices. VGG16 captures spectral variations effectively, but it cannot learn temporal dependencies. To overcome this limitation, hybrid CNN-LSTM models have been investigated; these combine spatial and temporal feature learning to improve accuracy and robustness.
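To make the input representation concrete, the following is a minimal NumPy-only sketch of MFCC extraction. It is an illustration under stated assumptions, not the implementation used by the reviewed system: the function names (`mel_filterbank`, `mfcc`), the 16 kHz sample rate, the 512-point FFT, the 160-sample hop, the 26 mel bands, and the 13 retained coefficients are all hypothetical choices made here for the example. The result is a (frames, coefficients) matrix, which a CNN such as VGG16 would treat as an image-like input and an LSTM would read one frame at a time.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):       # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    # All default parameter values are assumptions chosen for illustration.
    # Slice the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrum per frame, then log mel filterbank energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the mel axis; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T  # shape: (n_frames, n_mfcc)

# Usage: one second of a synthetic 440 Hz tone stands in for a voice clip.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone, sr=sr)
print(features.shape)
```

Each row of `features` describes the spectral envelope of one short frame, and the row order carries the temporal information. A purely convolutional classifier sees this matrix as a static 2-D pattern, whereas a CNN-LSTM hybrid additionally models how the rows evolve over time.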