Authors: Ritika Ghosh, Abhishek Dey, Sonali Mondal, Arjun Sen
Abstract: In today's data-intensive environments, the Zettabyte File System (ZFS) plays a central role in providing reliable, high-performance storage for workloads ranging from databases to high-performance computing and cloud services. However, predicting future storage consumption, ARC/L2ARC cache pressure, and snapshot bloat has become increasingly difficult because modern workload behavior is dynamic and non-linear. Traditional statistical approaches often fail to capture these complexities, motivating hybrid AI models that combine statistical, machine learning (ML), and deep learning techniques. Trained on detailed ZFS telemetry, such hybrid systems can model usage trends more accurately, recognize anomalous patterns, and adapt to previously unseen behavior. This review explores hybrid AI techniques for ZFS usage forecasting, focusing on time-series modeling, anomaly detection, snapshot growth prediction, and proactive capacity management. It begins with a foundational overview of ZFS architecture, highlighting the roles of the ARC, L2ARC, ZIL, and snapshot layers in the overall usage landscape. It then discusses the forecasting challenges specific to ZFS that arise from caching hierarchies, concurrent access patterns, and latency-sensitive applications. We present a taxonomy of AI models used in the domain and analyze how hybrid designs improve accuracy and adaptability. The review further details the construction of end-to-end pipelines for training, evaluating, and deploying predictive models on ZFS metrics. Case studies from healthcare, research clusters, and enterprise NAS environments demonstrate the operational impact of intelligent forecasting. Finally, we outline future directions, including federated learning, online retraining, and integration with AIOps platforms to support self-optimizing storage infrastructure.
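To make the hybrid-modeling idea above concrete, the minimal sketch below pairs a closed-form linear trend (the statistical component) with a learned day-of-week residual profile standing in for a simple ML component; the combined forecast captures both steady pool growth and periodic snapshot-driven usage spikes. The telemetry series is synthetic and all function names are illustrative assumptions, not part of ZFS tooling or the article's method.

```python
# Hybrid forecaster sketch (assumptions: synthetic daily "pool used GiB"
# telemetry; illustrative function names, not a ZFS or article API).

def fit_trend(y):
    """Ordinary least squares fit of y = a + b*t over t = 0..n-1 (statistical part)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    b = num / den
    a = y_mean - b * t_mean
    return a, b

def fit_seasonal_residuals(y, a, b, period=7):
    """Learn the average de-trended residual per weekly position (simple ML part)."""
    buckets = [[] for _ in range(period)]
    for t, v in enumerate(y):
        buckets[t % period].append(v - (a + b * t))
    return [sum(bk) / len(bk) if bk else 0.0 for bk in buckets]

def forecast(y, horizon, period=7):
    """Combine trend and seasonal components into an out-of-sample forecast."""
    a, b = fit_trend(y)
    season = fit_seasonal_residuals(y, a, b, period)
    n = len(y)
    return [a + b * t + season[t % period] for t in range(n, n + horizon)]

# Synthetic history: steady 2 GiB/day growth plus weekend snapshot bloat.
history = [100 + 2 * t + (15 if t % 7 in (5, 6) else 0) for t in range(28)]
print(forecast(history, 7))
```

In a real pipeline the residual model would be a proper learner (e.g. a gradient-boosted tree or an LSTM) fed with richer ZFS metrics, but the structure is the same: a cheap statistical baseline absorbs the trend, and the learned component models whatever the baseline misses.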