Authors: Srujana Parepalli
Abstract: Data quality has emerged as a critical challenge in modern enterprise information systems, as rapid growth in data volume, velocity, and heterogeneity amplifies issues such as inconsistency, incompleteness, redundancy, and semantic ambiguity across distributed platforms. Traditional rule-based data validation techniques, including integrity constraints and handcrafted business rules, offer strong interpretability and auditability but often fail to scale or adapt in dynamic environments where schemas, data sources, and usage patterns continuously evolve. In contrast, purely statistical and machine-learning-driven approaches excel at identifying latent patterns and anomalies in large datasets but frequently suffer from limited explainability, making governance, regulatory compliance, and root-cause analysis difficult. This article presents an integrated framework for Intelligent Data Quality Engineering that combines constraint-based validation, probabilistic modeling, and AI-driven anomaly detection to overcome these limitations. By grounding adaptive learning models in well-established research on conditional functional dependencies, probabilistic databases, and entity resolution, the framework enables predictive detection of quality issues and supports self-healing data pipelines that learn from historical errors and feedback. This hybrid approach bridges deterministic data rules with adaptive intelligence, delivering scalable, transparent, and governance-aligned data quality solutions suitable for enterprise-grade analytics and decision systems.
DOI: http://doi.org/10.5281/zenodo.17987694
Published by: vikaspatanker
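
To make the hybrid idea in the abstract concrete, the following is a minimal illustrative sketch, not the article's actual implementation: a deterministic, auditable rule layer is applied alongside a simple statistical outlier score, so that each record carries both interpretable rule violations and an adaptive anomaly flag. The column names (`amount`, `currency`), rules, and the robust z-score threshold are assumptions made for the example only.

```python
# Minimal sketch of a hybrid data quality check: deterministic rules
# (interpretable, auditable) combined with a statistical anomaly score
# (adaptive, data-driven). Field names and thresholds are illustrative.
from statistics import median
from typing import Callable

# Rule-based layer: hard constraints that are fully interpretable.
RULES: dict[str, Callable[[dict], bool]] = {
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
    "currency_present":    lambda r: bool(r.get("currency")),
}

def rule_violations(record: dict) -> list[str]:
    """Return the names of all deterministic rules the record violates."""
    return [name for name, check in RULES.items() if not check(record)]

def robust_z_scores(values: list[float]) -> list[float]:
    """Median/MAD-based z-scores; less sensitive to outliers than mean/std."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid divide-by-zero
    return [0.6745 * (v - med) / mad for v in values]

def assess(records: list[dict], numeric_field: str = "amount",
           z_threshold: float = 3.5) -> list[dict]:
    """Attach rule violations and a statistical outlier flag to each record."""
    zs = robust_z_scores([r.get(numeric_field) or 0.0 for r in records])
    return [
        {**r,
         "rule_violations": rule_violations(r),
         "is_statistical_outlier": abs(z) > z_threshold}
        for r, z in zip(records, zs)
    ]

if __name__ == "__main__":
    sample = [
        {"amount": 12.5, "currency": "USD"},
        {"amount": 13.1, "currency": "USD"},
        {"amount": -4.0, "currency": "USD"},   # violates a deterministic rule
        {"amount": 9000, "currency": "USD"},   # flagged as a statistical outlier
        {"amount": 11.9, "currency": ""},      # missing currency
    ]
    for row in assess(sample):
        print(row["rule_violations"], row["is_statistical_outlier"])
```

In a production pipeline the rule layer would be driven by declared constraints (for example, conditional functional dependencies) and the statistical layer by learned models, with flagged records routed to feedback and remediation rather than simply printed.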