Authors: Alim Shaikh, Dr. Santosh Gaikwad, Dr. A. A. Khan, Dr. R. S. Deshpande
Abstract: Speech-based interaction with large language models (LLMs) is transforming human-computer communication by enabling natural, voice-driven interfaces. This study explores methods for prompting LLMs through automatic speech recognition (ASR) while addressing challenges such as transcription errors, noise interference, latency, and prompt optimization. The proposed framework integrates ASR with LLMs using noise reduction, structured prompt engineering, and contextual adaptation. Experimental evaluations with models such as OpenAI Whisper and GPT-4 demonstrate improvements in performance metrics including Word Error Rate (WER) and response latency. Applications span healthcare, accessibility, and customer support; future work will focus on expanding multimodal capabilities and improving ethical safeguards and energy efficiency.
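To make the ASR-to-LLM pipeline described above concrete, here is a minimal sketch (not the paper's actual implementation). The `build_prompt` function illustrates structured prompt engineering over a raw ASR transcript; the Whisper and GPT-4 calls are shown only as commented usage, since they require external packages, an API key, and an audio file, all of which are assumptions here.

```python
# Sketch: wrapping an ASR transcript in a structured prompt before
# passing it to an LLM, so the model can compensate for likely
# recognition errors. build_prompt is a hypothetical helper, not
# an API from Whisper or OpenAI.

def build_prompt(transcript: str, context: str = "") -> str:
    """Wrap a raw ASR transcript in a structured prompt.

    ASR output often lacks punctuation and may contain recognition
    errors, so the prompt tells the model to interpret it robustly.
    """
    cleaned = " ".join(transcript.split())  # collapse stray whitespace
    parts = [
        "The following user request was transcribed from speech and "
        "may contain recognition errors:",
        f'"{cleaned}"',
    ]
    if context:
        # Prepend prior dialogue for contextual adaptation.
        parts.insert(0, f"Conversation context: {context}")
    parts.append("Answer the most plausible intended request.")
    return "\n".join(parts)


# Hypothetical end-to-end usage (requires the `openai-whisper` and
# `openai` packages plus an API key; the file name is illustrative):
#
#   import whisper
#   from openai import OpenAI
#   text = whisper.load_model("base").transcribe("query.wav")["text"]
#   reply = OpenAI().chat.completions.create(
#       model="gpt-4",
#       messages=[{"role": "user", "content": build_prompt(text)}],
#   )
```

The pure prompt-construction step is separated from the I/O-bound ASR and LLM calls so it can be tuned and tested independently of any model backend.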
DOI: http://doi.org/