Authors: Pardeep Mehta, Sudhakar Ranjan
Abstract: In today’s digital age, data is being created at an extraordinary pace. Social media, online shopping platforms, IoT devices, mobile apps, and business systems all contribute to this growth. This expansion has given rise to big data, which is commonly characterized by five key features: volume, velocity, variety, veracity, and value. Handling such large and complex datasets requires storage systems that are flexible, scalable, and efficient at storing, managing, and retrieving information. Traditional storage models, such as centralized databases and file systems, often fall short for big data: they suffer from limited scalability, poor fault tolerance, data redundancy, and slow performance. To address these problems, newer storage designs have shifted toward distributed and cloud-based systems, which provide better scalability and high availability. As data continues to grow across industries, the need for advanced storage solutions has become more urgent, and legacy systems struggle to keep up with high ingestion rates, low-latency access requirements, and the evolving demands of large-scale analytics. This research explores ways to optimize modern storage systems to improve the performance of big data processing. It examines distributed file systems, object storage, and cloud-native approaches, focusing on data distribution, replication, metadata management, and efficient resource use. The study also considers how to balance scalability, fault tolerance, and consistency while integrating with platforms such as Hadoop and Spark. Through performance testing and evaluation, the research aims to develop solutions that improve speed, reliability, and cost-effectiveness. Ultimately, the findings are expected to guide the design of next-generation storage systems capable of supporting the rapid growth of big data.
DOI: https://doi.org/10.5281/zenodo.17285290