Advanced Multimodal RAG Application
Authors: Professor Disha Nagpure, Sujal Pore, Shardul Deshmukh, Aditya Suryawanshi
Abstract: This paper presents a modular, context-aware multimodal Retrieval-Augmented Generation (RAG) application that leverages both chain-based and agentic execution strategies. Powered by Gemini 1.5 Flash as the core language model, the system integrates the LangChain and LangSmith frameworks to enable dynamic document retrieval, task orchestration, and seamless handling of multiple data sources. Key features include a YouTube summarizer built on transcript APIs, real-time web search via the Tavily search tool, and support for text, image, and audio inputs, with OpenAI's Whisper model providing speech-to-text conversion. The application's contextual awareness is enhanced by chat-memory fallback functions, ensuring continuous, coherent interaction across sessions. Additionally, vector databases are employed for efficient multimodal retrieval. Together, these components make the system a flexible, scalable RAG application that adapts across input modalities and real-time tasks.
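To make the core retrieve-then-generate loop described above concrete, the following is a minimal, self-contained sketch of the RAG retrieval step. It is not the paper's actual implementation: a toy bag-of-words similarity stands in for the real embedding model and vector database, and the assembled prompt stands in for the call to Gemini 1.5 Flash via LangChain. All function names here (`embed`, `retrieve`, `build_prompt`) are illustrative assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the described system would instead use
    # a real embedding model backed by a vector database (assumption).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the user query with retrieved context; in the full system
    # this prompt would be sent to the language model.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Whisper converts speech to text for audio inputs.",
    "Tavily provides real-time web search results.",
    "A transcript API supplies text for the YouTube summarizer.",
]
print(build_prompt("Which component handles audio speech", docs))
```

The chain-based strategy mentioned in the abstract corresponds to running these steps in a fixed sequence, while the agentic strategy would let the model decide at runtime whether to retrieve, search the web, or summarize a transcript.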
