Authors: D. Hari Priya, Ch. Charmi Sri, A. Rohit, K. Harika Sri, Ms. M. Soumya
Abstract: This paper presents PDFChatBot, an AI-driven system for automated PDF summarization and intelligent query answering. Our hybrid approach integrates Rhetorical Structure Theory (RST), transformer-based models (BERT, GPT-4, Gemini-1.5-Pro), and a FAISS vector database. Evaluated on 50 documents across five domains (research papers, legal contracts, medical reports, financial statements, and technical manuals), the system achieves a state-of-the-art ROUGE-L score of 0.51 and an F1-score of 0.87, outperforming the TextRank (ROUGE-L: 0.37), BART-large (0.44), and T5-3B (0.47) baselines. It processes 100-page documents in under 120 seconds, reducing document review time by 80% while maintaining semantic coherence. Production-ready deployment via FastAPI, Streamlit, Docker, and Redis caching provides scalability for enterprise applications, with 99.9% uptime and sub-second query latency.
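For readers unfamiliar with the ROUGE-L metric reported above, the following minimal sketch shows how such a score can be computed from the longest common subsequence (LCS) of a reference summary and a candidate summary. It is an illustration only: the abstract does not state which ROUGE-L variant or tokenization was used, so the whitespace tokenization, lowercasing, and balanced F-measure here are assumptions, and the function names `lcs_length` and `rouge_l` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L as a balanced F-measure over LCS precision and recall.

    Assumption: lowercase whitespace tokenization; the paper's exact
    preprocessing is not specified in the abstract.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the system summarizes long pdf documents"
    candidate = "the system summarizes pdf files"
    print(round(rouge_l(reference, candidate), 3))
```

A corpus-level figure such as the one reported would typically be an average of per-document scores, though the aggregation used in the paper is not stated here.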