Developing an Advanced domain-specific Chatbot with RAG Model and AWS Integration
Project Overview
Developed a comprehensive chatbot application that integrates PDFs and text data, utilizing AWS and Python for seamless end-to-end implementation. This project showcases expertise in advanced prompt engineering, data quality management, and cost-efficient solutions.
Key Achievements
- Chatbot Development: Created a sophisticated chatbot that integrates various data formats, ensuring efficient and accurate information retrieval.
- Advanced Prompt Engineering: Leveraged state-of-the-art prompt engineering techniques and the Batch GPT-4 mini API to clean unstructured in-house credit stories text, achieving a 95% accuracy in data quality and saving 90% in operational costs.
- RAG Model Implementation: Developed and implemented a Retrieval-Augmented Generation (RAG) model to generate CMBS-specific answers using ChromaDB on AWS SageMaker, significantly enhancing data relevance and precision.
- Evaluation Datasets: Conducted thorough research and development of experimental evaluation datasets for the RAG application with the assistance of the GPT API, establishing a foundation for future industry-standard evaluation datasets.
- Model Enhancement: Enhanced the RAG model with techniques such as metadata tagging, filtering, and post-retrieval reranking algorithms, boosting the domain-specific retrieval index accuracy from 60% to 90%.
Technical Stack
- Tools & Technologies: Python, Spacy, AWS SageMaker, ChromaDB, GPT API
Impact
The project not only improved data quality and operational efficiency but also set new benchmarks in domain-specific retrieval accuracy, showcasing the potential for future advancements in the field.