Authors: Rohit K. Basnet
Abstract: The rapid adoption of artificial intelligence (AI) has increased demand for scalable, efficient, and cost-effective computational infrastructure. Traditional server-based architectures often leave resources underutilized and compute time idle while adding operational overhead, limiting the performance and scalability of AI workloads. Serverless AI models offer a transformative alternative: event-driven, cloud-native architectures that allocate resources dynamically based on demand and abstract infrastructure management away from developers and organizations. Under these models, functions execute on demand, scale automatically, and terminate once their tasks complete, ensuring efficient use of computational resources. This review examines the concept, architecture, and methodologies underlying serverless AI, highlighting how it improves computational efficiency while reducing cost. Key enabling technologies, including Function-as-a-Service (FaaS), microservices, containerization, orchestration frameworks, and cloud-native pipelines, are explored. The paper also evaluates techniques for optimizing serverless AI performance, including dynamic scaling, resource-aware scheduling, asynchronous execution, and caching mechanisms. Challenges such as cold-start latency, state management, integration complexity, and vendor lock-in are addressed. Finally, the review surveys emerging trends in hybrid and edge serverless AI, predictive resource allocation, and energy-efficient model execution, positioning serverless AI as a strategic enabler of agile, cost-effective, and high-performance AI computing in modern cloud ecosystems.