Developing an Advanced Domain-Specific Chatbot with a RAG Model and AWS Integration

Project Overview

Developed a chatbot application that answers questions over PDF and plain-text sources, built end to end in Python on AWS. The project demonstrates advanced prompt engineering, data quality management, and cost-efficient design.

Key Achievements

  • Chatbot Development: Created a sophisticated chatbot that integrates various data formats, ensuring efficient and accurate information retrieval.
  • Advanced Prompt Engineering: Used prompt engineering techniques with the Batch GPT-4 mini API to clean unstructured in-house credit-story text, achieving 95% data-quality accuracy and cutting operational costs by 90%.
  • RAG Model Implementation: Built a Retrieval-Augmented Generation (RAG) pipeline that generates answers specific to commercial mortgage-backed securities (CMBS), using ChromaDB as the vector store on AWS SageMaker, improving the relevance and precision of responses.
  • Evaluation Datasets: Researched and built experimental evaluation datasets for the RAG application with the help of the GPT API, laying the groundwork for future standardized evaluation datasets.
  • Model Enhancement: Enhanced the RAG model with metadata tagging, metadata filtering, and post-retrieval reranking, raising domain-specific retrieval accuracy from 60% to 90%.
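
The metadata filtering and post-retrieval reranking steps mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the corpus, metadata keys (property_type, year), and function names are hypothetical, bag-of-words cosine similarity stands in for the embedding search a vector store such as ChromaDB would perform, and query-term coverage stands in for whatever reranking model was actually used.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; texts and metadata keys are invented for illustration.
DOCS = [
    {"id": "d1", "text": "Office loan transferred to special servicing after tenant default",
     "meta": {"property_type": "office", "year": 2023}},
    {"id": "d2", "text": "Retail mall loan modified with maturity extension",
     "meta": {"property_type": "retail", "year": 2023}},
    {"id": "d3", "text": "Office tower refinanced and loan paid off in full",
     "meta": {"property_type": "office", "year": 2021}},
    {"id": "d4", "text": "Office property appraisal reduced after tenant vacated, special servicing continues",
     "meta": {"property_type": "office", "year": 2023}},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, where, k=3):
    """Stage 1: keep documents whose metadata equals every pair in `where`,
    then return the top-k by cosine similarity to the query."""
    q = Counter(tokenize(query))
    candidates = [d for d in docs
                  if all(d["meta"].get(key) == val for key, val in where.items())]
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

def rerank(query, candidates):
    """Stage 2: reorder candidates by the fraction of distinct query terms they contain."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return list(candidates)
    def coverage(doc):
        return len(q_terms & set(tokenize(doc["text"]))) / len(q_terms)
    return sorted(candidates, key=coverage, reverse=True)

query = "office loan special servicing"
top = rerank(query, retrieve(query, DOCS, {"property_type": "office", "year": 2023}))
print([d["id"] for d in top])  # ['d1', 'd4']
```

Filtering before similarity scoring keeps out-of-scope documents (here, the retail and 2021 records) from ever competing for a top-k slot, which is one plausible reason such tagging improves domain-specific retrieval.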

Technical Stack

  • Tools & Technologies: Python, spaCy, AWS SageMaker, ChromaDB, GPT API

Impact

The project improved data quality (95% accuracy after cleaning), cut operational costs by 90%, and raised domain-specific retrieval accuracy from 60% to 90%, showing the potential for further gains in domain-specific retrieval.
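
The source does not say how retrieval accuracy was measured. As one possible reading, the sketch below computes hit rate at k: the share of evaluation queries for which at least one relevant document appears in the top k results. The function name, query IDs, and document IDs are hypothetical, and the numbers are toy data, not project results.

```python
def hit_rate_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant ID among the top-k retrieved IDs.

    retrieved: dict mapping query ID -> ranked list of document IDs
    relevant:  dict mapping query ID -> set of relevant document IDs
    """
    if not retrieved:
        return 0.0
    hits = 0
    for query_id, ranked_ids in retrieved.items():
        if set(ranked_ids[:k]) & relevant.get(query_id, set()):
            hits += 1
    return hits / len(retrieved)

# Toy evaluation set, invented for illustration only.
retrieved = {
    "q1": ["d1", "d4", "d2"],
    "q2": ["d3", "d2", "d1"],
    "q3": ["d2", "d1", "d4"],
    "q4": ["d4", "d3", "d2"],
}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d9"}, "q4": {"d4"}}

print(hit_rate_at_k(retrieved, relevant, k=1))  # 0.5
print(hit_rate_at_k(retrieved, relevant, k=3))  # 0.75
```

Running the same metric over a fixed evaluation set before and after a change such as metadata filtering or reranking gives a like-for-like comparison of retrieval quality.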