Authors: Vishrut Nath Jha, Joanne Anto, Athira KK
Abstract: Recent advancements in vector databases and embedding-based retrieval have transformed how large unstructured datasets are indexed and searched. However, the performance of emerging storage and retrieval systems varies widely depending on their architectural design and optimization goals. This study presents a comparative evaluation of three distinct approaches (MemVid, Qdrant, and Amazon S3 Vectors) using a dataset of 10,417 medical-text chunks derived from research documents. Each system was assessed in terms of indexing efficiency, retrieval accuracy, latency, and resource utilization. Experimental results demonstrate that MemVid, which stores embeddings in a video-encoded FAISS-based format, achieved lower query latency and higher retrieval precision for this corpus, while Qdrant exhibited superior scalability and flexibility in handling dynamic updates and metadata filtering. Amazon S3 Vectors, though currently in preview, offered cloud-native durability and seamless AWS integration with moderate performance overhead. The analysis reveals that no single vector system universally outperforms the others; rather, each excels under specific workload conditions. These findings provide practical guidance for selecting an appropriate vector storage backend based on corpus size, update frequency, and deployment environment.