Custom GPT System


Authors: Prathibhavani P M, Pragathi S

Abstract: We propose a Custom GPT System that enables domain-specific adaptation of a large language model through a flexible, modular pipeline. The backend is a Python Flask REST API, facilitating a microservices-style deployment of the core components. A high-performance Groq API endpoint runs inference on Meta’s LLaMA 3 model (available in 8B and 70B parameter sizes) to generate responses. Crucially, the pipeline integrates a Retrieval-Augmented Generation (RAG) stage: each incoming query triggers semantic retrieval from a vector store of domain documents, and the returned context is combined with the user prompt to guide generation. Prompts and model settings are parameterized via human-readable YAML configuration files, enabling easy customization of system behaviour and personality. The architecture can be applied to sectors such as healthcare, education, and customer support by supplying the relevant documents and prompt templates. We describe the system architecture (see Fig. 1), implementation details, and an API-based deployment strategy. In evaluation, Groq-accelerated LLaMA 3 achieves up to 18× higher throughput than a GPU baseline, and the RAG component markedly reduces hallucinations by grounding outputs in up-to-date knowledge.
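The RAG stage outlined in the abstract can be illustrated with a short Python sketch. This is a minimal, hypothetical example, not the authors' implementation: the bag-of-words embedding is a toy stand-in for a real sentence-embedding model, `VectorStore` and `build_prompt` are invented names, and in the described system the assembled prompt would be forwarded to the Groq inference endpoint rather than used locally.

```python
# Illustrative sketch of a RAG retrieve-then-prompt step.
# Assumptions: toy bag-of-words embeddings in place of a real encoder,
# and an in-memory list in place of a real vector database.
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (stand-in for a sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

class VectorStore:
    """Minimal in-memory vector store over domain documents."""
    def __init__(self, docs):
        self.docs = [(d, embed(d)) for d in docs]

    def top_k(self, query, k=2):
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda dv: cosine(qv, dv[1]),
                        reverse=True)
        return [d for d, _ in ranked[:k]]

def build_prompt(template, query, context):
    """Combine retrieved context with the user query, as in the RAG stage."""
    return template.format(context="\n".join(context), question=query)

# The template would normally come from a YAML configuration file.
TEMPLATE = ("Answer using only the context below.\n"
            "Context:\n{context}\n\nQuestion: {question}")

store = VectorStore([
    "Aspirin is commonly used to reduce fever and relieve pain.",
    "Flask routes map URLs to Python view functions.",
])
query = "What is aspirin used for?"
ctx = store.top_k(query, k=1)
prompt = build_prompt(TEMPLATE, query, ctx)
# `prompt` would then be sent to the LLaMA 3 model for grounded generation.
```

The same retrieve-then-prompt pattern applies regardless of the embedding model or vector database used; only `embed` and `VectorStore` would change in a production deployment.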

DOI: http://doi.org/10.5281/zenodo.16444282
