Authors: Olalekan Oluyinka
Abstract: This paper presents the design and implementation of an architecture for automated hybrid data extraction, integration, and entity resolution for sports event aggregation. The system consolidates inconsistent records from multiple heterogeneous sources into a centralized, deduplicated interface for sports event discovery and streaming access. Data is extracted in real time from seven sources and ingested directly into the automated pipeline. A multi-step entity resolution algorithm, combined with data pre-processing within a Single Source of Truth (SSOT) framework, transforms the heterogeneous data into a unified, deduplicated index. The architecture employs edge caching and batching to reduce latency and improve operational performance in constrained environments. A prototype demonstrates the practicality of automated multi-source sports event aggregation through entity resolution.