Authors: Mukul Rane, Om Baviskar, Devendra Nikam, Tejaswi Malode, Associate Professor Vaibhav Dabhade
Abstract: This paper proposes DocInsight, a context-aware document analysis system that integrates preprocessing, Optical Character Recognition (OCR), layout analysis, and semantic processing into a unified pipeline. The system improves text extraction accuracy while preserving document structure, enabling efficient understanding of unstructured documents. By combining layout-aware OCR with transformer-based semantic models, DocInsight supports intelligent search, context-driven retrieval, and automated report generation. The framework is designed to improve accuracy and structural consistency while reducing the manual effort involved in document processing. The system is applicable across domains such as healthcare, legal systems, education, and enterprise environments, where efficient and intelligent document understanding is essential.