Vector Database for ChatGPT

Review our ChatGPT architecture

Updated over a week ago

Large language models open up some very interesting opportunities for policies and procedures. Introducing a conversational approach means that your end users can get answers to their questions instead of a list of blue links to click on. Both search and chat have their own strengths. This document explains our architecture at a high level.

We store all of your existing policies and procedures in an Elasticsearch database. Elasticsearch is great at finding various word combinations and surfacing results. Large language models have a more sophisticated mechanism for indexing the relative meaning of words. Consider the word "bank" in these three sentences:

"I'm standing on the bank of a river."

"The airplane banked left."

"I'm going to deposit money into my bank account."

Traditional search models are not equipped to understand that the word "bank" has three completely different meanings in these sentences. This is where vector databases shine. Each word in the sentences above is associated with the surrounding words by a distance and an angle (a vector). By comparing these vectors with a huge open source training set, models like ChatGPT are able to decode what we perceive as meaning from these statements.
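To make the idea of comparing vectors by angle more concrete, here is a minimal sketch using cosine similarity. The three-number vectors are hand-picked, hypothetical values chosen only for illustration; real models produce vectors with hundreds or thousands of dimensions, and the names used here are not part of our product.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors: imagine the first number leans toward "nature"
# and the second toward "finance". These are NOT real model outputs.
river_bank = [0.9, 0.1, 0.0]      # "bank" as in the bank of a river
bank_account = [0.1, 0.9, 0.1]    # "bank" as in a bank account
deposit_money = [0.2, 0.8, 0.2]   # a phrase about depositing money

sim_money = cosine_similarity(bank_account, deposit_money)
sim_river = cosine_similarity(river_bank, deposit_money)

# The financial sense of "bank" points in nearly the same direction as
# "deposit money", so its similarity score is higher.
print(sim_money > sim_river)
```

A higher score means a smaller angle between the two vectors, which is how a vector search can tell the financial "bank" apart from the riverside one.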

How do we implement this technically?

We index all of your active and released policies and procedures into a vector database. This process analyzes the text and stores a mathematical representation of each policy as a vector, which is a group of numbers. These numbers allow for highly efficient search and retrieval of relevant content based on the question asked.

When you ask a question, ChatGPT converts it into a vector of the same kind and then searches our vector database to determine which documents most closely match. ChatGPT then composes a conversational response based on those documents.
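The retrieval step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: the `embed` function is a toy stand-in that simply counts words (a real embedding model captures meaning, not just word overlap), and the document titles and text are hypothetical.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Toy stand-in for an embedding model: count each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(question, documents, vocabulary, top_k=2):
    """Return the titles of the top_k documents closest to the question."""
    question_vector = embed(question, vocabulary)
    scored = [
        (cosine_similarity(question_vector, embed(text, vocabulary)), title)
        for title, text in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]

# Hypothetical policies used only for this example.
documents = {
    "Vacation Policy": "employees request vacation days through their manager",
    "Expense Policy": "submit expense receipts for travel within thirty days",
    "Badge Procedure": "lost badge must be reported to security",
}
vocabulary = sorted({word for text in documents.values() for word in text.split()})

results = search("how do I request vacation days", documents, vocabulary, top_k=1)
print(results)
```

In a real pipeline, the text of the best-matching documents would then be passed to the language model along with the question so it can compose the conversational answer.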

What information is stored in our vector database?

We store the following fields in our vector database. Our vector database is secured in our AWS environment on an EC2 instance. This instance is backed up daily, and backups are retained for 14 days.

  1. Article Sequence

  2. Article Title

  3. Article Content

  4. Policy Name

  5. Policy Effective Date

  6. Policy Version

  7. Policy Manager

  8. Procedure Sequence

  9. Procedure Title

  10. Procedure Content
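As a rough illustration, a single stored record with the fields listed above might be modeled like this. The class name, the field types, and the `embedding` field are assumptions made for this sketch; the actual schema may differ.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class PolicyVectorRecord:
    """Hypothetical shape of one record; names mirror the field list above."""
    article_sequence: int
    article_title: str
    article_content: str
    policy_name: str
    policy_effective_date: str   # assumed to be an ISO date string
    policy_version: str
    policy_manager: str
    procedure_sequence: int
    procedure_title: str
    procedure_content: str
    procedure_department: str
    embedding: List[float]       # assumption: the vector stored alongside the fields

# Placeholder values for illustration only.
record = PolicyVectorRecord(
    article_sequence=1,
    article_title="Purpose",
    article_content="This policy describes how vacation is requested.",
    policy_name="Vacation Policy",
    policy_effective_date="2023-01-01",
    policy_version="1.0",
    policy_manager="Jane Doe",
    procedure_sequence=1,
    procedure_title="Requesting Vacation",
    procedure_content="Submit a request to your manager.",
    procedure_department="Human Resources",
    embedding=[0.12, -0.03, 0.88],
)

field_names = [f.name for f in fields(PolicyVectorRecord)]
print(field_names)
```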

  11. Procedure Department
