Building a Deep Learning Retrieval-Based Chatbot with BERT and FAISS
Introduction
In this lecture, you will build a deep-learning retrieval-based chatbot using:
Sentence-BERT (SBERT) – to generate contextual sentence embeddings
FAISS (Facebook AI Similarity Search) – to perform fast similarity search among stored responses
A small intents dataset stored in JSON format
By the end, your chatbot will:
- ✅ Understand the meaning of a user’s message (not just keywords)
- ✅ Retrieve the most semantically relevant response
- ✅ Perform similarity search efficiently using FAISS
We’ll also discuss how this approach forms the foundation of modern retrieval-augmented chatbots (RAG systems).
Prepare the Dataset
Just as before, we’ll use a small intents.json file containing:
- tag → the intent category
- patterns → example user inputs
- responses → possible chatbot replies
import json

# Load intents.json
with open("./resources/intents.json", "r") as f:
    intents = json.load(f)

# Quick check
print(intents["intents"][0])
Expected Output:
{
  "tag": "greeting",
  "patterns": ["hello", "hi there", "hey", "good morning", "good evening"],
  "responses": ["Hello! How can I help you today?"]
}
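If you don't yet have the file, a minimal version can be generated with the standard library. This is an illustrative sketch: only the "greeting" entry matches the sample above, and the "goodbye" tag is a hypothetical placeholder, not taken from the course dataset.

```python
import json
import os

# Illustrative dataset -- only "greeting" matches the sample above;
# the "goodbye" entry is a hypothetical placeholder
sample_intents = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["hello", "hi there", "hey", "good morning", "good evening"],
            "responses": ["Hello! How can I help you today?"]
        },
        {
            "tag": "goodbye",
            "patterns": ["bye", "good night", "see you later"],
            "responses": ["Goodbye! Have a wonderful day!"]
        }
    ]
}

# Write it where the loading code above expects it
os.makedirs("./resources", exist_ok=True)
with open("./resources/intents.json", "w") as f:
    json.dump(sample_intents, f, indent=2)
```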
Installing Dependencies
We’ll need three major packages:
- sentence-transformers → for deep contextual embeddings
- faiss-cpu → for efficient similarity search
- numpy → for vector operations
Run these commands in your terminal or Jupyter environment:
pip install -U sentence-transformers faiss-cpu numpy
Load the BERT-Based Model
BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model developed by Google that understands the meaning of words in context by processing text bidirectionally, considering both the left and right surroundings of each word. Unlike earlier models that read text in one direction, BERT captures nuanced semantic relationships, enabling it to generate rich, contextualized embeddings that reflect sentence meaning rather than just word similarity.

In retrieval-based chatbots, using BERT (or an optimized variant like Sentence-BERT) is considered best practice because it lets the system retrieve responses based on semantic relevance rather than exact keyword overlap. The chatbot can recognize that "How's it going?" and "How are you?" express the same intent, leading to more natural, user-aligned responses.
We’ll use a lightweight Sentence-BERT variant called MiniLM (small but powerful).
from sentence_transformers import SentenceTransformer
# Load pretrained Sentence-BERT model
model = SentenceTransformer('all-MiniLM-L6-v2')
This model maps sentences to 384-dimensional embeddings that capture semantic meaning, not just word overlap.
Flatten and Encode All Patterns
We’ll create a list of all patterns (possible user phrases) and encode them into embeddings.
pattern_texts = []
pattern_tags = []

for intent in intents["intents"]:
    for pattern in intent["patterns"]:
        pattern_texts.append(pattern)
        pattern_tags.append(intent["tag"])

# Convert patterns to embeddings
pattern_embeddings = model.encode(pattern_texts, convert_to_numpy=True, normalize_embeddings=True)
normalize_embeddings=True scales every vector to unit length, so the dot product of two embeddings equals their cosine similarity.
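To see why this works, here is a small numpy-only sketch (independent of SBERT, using toy 2-D vectors) showing that for unit-length vectors the plain dot product equals cosine similarity:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Cosine similarity of the raw vectors
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize to unit length, then take a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot = a_unit @ b_unit

print(np.isclose(cosine, dot))  # True
```

This is why an inner-product FAISS index behaves like a cosine-similarity index once the embeddings are normalized.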
Each pattern_text now has a semantic vector (deep contextual meaning).
Building a FAISS Similarity Index
Now we’ll store all embeddings in a FAISS index, which enables instant nearest-neighbor search.
import faiss
import numpy as np
# Determine embedding dimension
embedding_dim = pattern_embeddings.shape[1]
# Create FAISS index using inner product (IP); since the embeddings
# are normalized, inner product equals cosine similarity
index = faiss.IndexFlatIP(embedding_dim)
index.add(pattern_embeddings)
print(f"Indexed {index.ntotal} patterns for retrieval.")
With a flat index, FAISS performs an exact, exhaustive search, but its optimized C++ implementation is far faster than manually looping through vectors in Python. For very large collections, FAISS also offers approximate indexes (e.g., IVF, HNSW) with sublinear query time.
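Conceptually, what IndexFlatIP computes can be reproduced in a few lines of numpy. This brute-force sketch uses toy 3-D vectors rather than real SBERT embeddings:

```python
import numpy as np

# Toy "pattern embeddings", already unit-length (3 vectors, 3 dimensions)
patterns = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# A normalized query vector, closest to the first pattern
query = np.array([0.9, 0.1, 0.0])
query = query / np.linalg.norm(query)

# Inner-product score against every stored vector
scores = patterns @ query

# Indices of the two most similar patterns, highest score first
top_k = np.argsort(-scores)[:2]
print(top_k)  # [0 1]
```

FAISS does the same ranking, just with SIMD-optimized batched math and without materializing all scores in Python.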
Implementing the Retrieval Function
The retrieval function takes a user's input and returns an appropriate response for the matching intent. It works in three steps:
- Encode the user input into an embedding
- Query FAISS for the most similar patterns
- Retrieve the corresponding intent and response
import random

def retrieve_response(user_input, top_k=3):
    # Step 1: Encode input into an embedding
    user_emb = model.encode([user_input], convert_to_numpy=True, normalize_embeddings=True)

    # Step 2: Search for the top-k most similar patterns
    distances, indices = index.search(user_emb, top_k)

    # Step 3: Retrieve the best matching intent
    best_idx = indices[0][0]
    best_tag = pattern_tags[best_idx]

    # Step 4: Return a random response from that intent
    for intent in intents["intents"]:
        if intent["tag"] == best_tag:
            return random.choice(intent["responses"])
Notes:
- FAISS returns the indices of the most similar embeddings.
- You can adjust top_k to analyze multiple potential matches.
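One way to use those extra matches is a majority vote over the top-k tags instead of trusting only the single best hit. A small sketch with made-up tag data (the tags and indices below are hypothetical, not from the course dataset):

```python
from collections import Counter

def vote_tag(pattern_tags, indices):
    """Pick the most common tag among the retrieved pattern indices."""
    tags = [pattern_tags[i] for i in indices]
    return Counter(tags).most_common(1)[0][0]

# Hypothetical tag list and FAISS result indices
tags = ["greeting", "greeting", "goodbye", "thanks", "goodbye"]
print(vote_tag(tags, [0, 4, 1]))  # greeting (2 votes vs 1 for goodbye)
```

Voting makes the choice more robust when the top-1 match is a borderline pattern from the wrong intent.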
Let’s try a few examples.
print(retrieve_response("hey there"))
print(retrieve_response("good night"))
print(retrieve_response("thanks!"))
print(retrieve_response("who are you"))
Example Output:
Hello! How can I help you today?
Goodbye! Have a wonderful day!
You're very welcome!
I'm your friendly retrieval-based chatbot!
Even if the user says “hey there!” instead of “hi”, SBERT understands the meaning — semantic similarity, not word matching.
Creating an Interactive Chat Loop
print("Chatbot is ready! Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        print("Chatbot: Goodbye!")
        break
    response = retrieve_response(user_input)
    print(f"Chatbot: {response}")
✅ Try paraphrasing phrases — you’ll see that Sentence-BERT still retrieves correct intents even with different wording.
Conceptual Diagram
User Input
↓
Sentence-BERT
↓
[User Embedding]
↓
FAISS Index ───→ [Pattern Embeddings]
↓
Retrieve Most Similar Pattern
↓
Get Corresponding Intent
↓
Return Response
Next-Step Improvements
To evolve this into a production-grade chatbot, you can:
- Add Confidence Thresholds
If the best similarity score is below a threshold, respond with a fallback (with an inner-product index over normalized embeddings, distances are cosine similarities, so higher means better):
if distances[0][0] < 0.5:
    return "I'm not sure I understand. Could you rephrase that?"
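The threshold logic can be isolated into a small helper that is easy to unit-test. The 0.5 cutoff is a starting guess to tune on your own data, and the function name is an assumption for illustration:

```python
FALLBACK = "I'm not sure I understand. Could you rephrase that?"

def respond_with_threshold(best_score, best_response, threshold=0.5):
    """Return the retrieved response only if similarity is high enough."""
    # With normalized embeddings and an inner-product index, scores are
    # cosine similarities in [-1, 1]; higher means more similar.
    if best_score < threshold:
        return FALLBACK
    return best_response

print(respond_with_threshold(0.82, "Hello! How can I help you today?"))
print(respond_with_threshold(0.21, "Hello! How can I help you today?"))
# First call returns the greeting; second falls back
```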
- Re-Rank with a Cross-Encoder
Use a cross-encoder model to re-score top results for more precision.
- Store in a Vector Database
Move FAISS into a persistent vector DB like:
  - Pinecone
  - Weaviate
  - Qdrant
- Retrieval-Augmented Generation (RAG)
Combine this retriever with an LLM generator (like GPT or LLaMA):
  - Retrieve the top 3 results with FAISS
  - Feed them as context to an LLM to generate a contextualized response
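As a sketch of the RAG idea, the retrieved patterns can be stitched into a prompt for a generator model. The prompt template below is an assumption, and the final LLM call is left as a hypothetical placeholder rather than a specific API:

```python
def build_rag_prompt(user_input, retrieved_contexts):
    """Assemble retrieved snippets plus the user question into one prompt."""
    context_block = "\n".join(f"- {c}" for c in retrieved_contexts)
    return (
        "Answer the user using the context below.\n"
        f"Context:\n{context_block}\n"
        f"User: {user_input}\n"
        "Assistant:"
    )

prompt = build_rag_prompt(
    "how do I reset my password?",
    ["reset password", "forgot my login", "account recovery"],  # top-3 FAISS hits
)
print(prompt)
# The prompt would then be sent to a generator, e.g. llm.generate(prompt)
# (hypothetical call -- depends on your LLM library)
```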
Summary
In this lecture, you:
✅ Built a retrieval-based chatbot powered by deep contextual embeddings
✅ Used Sentence-BERT to transform text into semantic vectors
✅ Integrated FAISS for fast similarity search
✅ Implemented a clean retrieval + response pipeline
✅ Learned how to upgrade spaCy-based bots into deep-learning systems