Implementing FAISS: Vector Similarity Search for Recommendations

Manan Garg
7 min read · Mar 10, 2024


FAISS, short for Facebook AI Similarity Search, is an open-source library created by Facebook AI Research (FAIR) to facilitate efficient similarity search and clustering of high-dimensional vectors. It is designed to excel in situations with extensive datasets and high-dimensional feature vectors, which are frequently encountered in tasks like image and text retrieval, recommendation systems, and natural language processing.

https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/

Let’s implement FAISS in Google Colab (T4 GPU) and use it for efficient vector similarity search to build a simple recommendation system.

GitHub Repository for the Project

1. Install and Import relevant Libraries

# Installing relevant libraries
!pip install sentence_transformers
!pip install datasets
!pip install faiss-gpu    # for a GPU runtime (e.g. the Colab T4)
# !pip install faiss-cpu  # use this instead on a CPU-only runtime

# Importing relevant libraries
from datasets import load_dataset
import pandas as pd
from sentence_transformers import SentenceTransformer
import faiss
import warnings
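
With everything installed, a quick sanity check (assuming the faiss-gpu build on a GPU runtime) confirms that FAISS can see the accelerator:

# Number of GPUs visible to FAISS; 0 means we are running on the CPU
faiss.get_num_gpus()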

2. Set Options

warnings.simplefilter('ignore')
pd.set_option("display.max_columns", None)
pd.options.display.float_format = '{:.2f}'.format

3. Load Dataset

Here we are using a small dataset of book titles and their descriptions for our use case.

dataset = load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='test')
df = pd.DataFrame(dataset)
df.head()

4. Basic Metrics

# Shape of the dataset
df.shape

# Removing duplicate rows
df.drop_duplicates(inplace=True)

# Checking the total number of nulls
df.isnull().sum().sum()

5. Embeddings

We’ll gather all the descriptions and create sentence embeddings for each one, which can subsequently be stored in FAISS.

# Creating list of sentences
sentences = df['description'].tolist()

Next, we’ll proceed with constructing the sentence embeddings.

# Creating sentence embeddings
model = SentenceTransformer('bert-base-nli-mean-tokens')
sentence_embeddings = model.encode(sentences)
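
FAISS expects a two-dimensional float32 array, which is what sentence-transformers returns by default, but it is worth verifying before indexing:

# Embeddings should be a (num_sentences, 768) float32 NumPy array
print(sentence_embeddings.shape, sentence_embeddings.dtype)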

6. Vector Similarity Search Implementation

We configure the dimensionality of our FAISS index to match these vector embeddings.

# Vector dimensionality
vec_dim = sentence_embeddings.shape[1]
vec_dim
# Setting number of nearest neighbours
k = 4
# Creating Search Query embedding
query = model.encode(["Aliens and Spaceships"])

6.1 Flat L2 Index

We begin by initializing the flat L2 distance index ‘IndexFlatL2’, where we only need to specify the vector dimensionality. In this case, the dimensionality is vec_dim = 768, which corresponds to the output embedding size of the sentence-BERT model.

# Creating index
index = faiss.IndexFlatL2(vec_dim)
index.add(sentence_embeddings)

‘IndexFlatL2’ requires no training, so once the index is initialized we can add our vectors directly with the add method, as in the code above.
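
We can verify this directly: a flat index reports itself as trained from the start, and ntotal shows how many vectors it now holds.

# Flat indexes require no training step
print(index.is_trained)   # True

# Number of vectors currently stored in the index
print(index.ntotal)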

Next, we’ll locate the nearest query embeddings to our search query embedding based on the specified value of ‘k’.

%%time
# Searching neighbours for the query
D, I = index.search(query, k)
print(I)

After selecting the descriptions that closely match the search query, we print the respective titles of the books associated with those descriptions.

# Printing nearest neighbours for book titles
[f'{i}: {df.iloc[i].title}' for i in I[0]]

Indeed, it appears we have found some promising matches. All the recommendations are related to either space, aliens, or both. Adjusting the number of recommendations can be achieved by updating the value of ‘k’.
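
To make this reusable, we can wrap the encode-and-search steps into a small helper (‘recommend’ is a hypothetical convenience function of our own, not part of FAISS):

# Hypothetical helper: encode a free-text query and return the top-k titles
def recommend(query_text, k=4):
    query_vec = model.encode([query_text])
    D, I = index.search(query_vec, k)
    return [f'{i}: {df.iloc[i].title}' for i in I[0]]

recommend('Aliens and Spaceships')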

6.2 Adding Partitioning to the Index

FAISS provides several ways to speed up the search, and partitioning the index into Voronoi cells is a popular one. With this method, we take our query vector, determine which cell it falls into, and then run the same flat L2 comparison against only the vectors indexed within that cell, optionally extending the search to nearby cells.

# Defining number of partitions
nlist = 200

In this updated setup, we’ve introduced a new parameter called ‘nlist’. This parameter allows us to specify the number of partitions we want our index to have.
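
There is no single correct value for ‘nlist’; one common heuristic, offered here only as a starting point to tune from, is to scale the number of partitions with the square root of the dataset size:

# One rough starting point for the number of partitions
import math
int(math.sqrt(len(sentence_embeddings)))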

# Creating index
quantizer = faiss.IndexFlatL2(vec_dim)
index = faiss.IndexIVFFlat(quantizer, vec_dim, nlist)

We initialize our new partitioned index by passing our previous ‘IndexFlatL2’ as the coarse quantizer, the component that assigns each vector to its Voronoi cell, and feed this into the new ‘IndexIVFFlat’ operation.
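
Unlike the flat index, this one is not ready to use immediately; it reports itself as untrained until we fit its cell centroids.

# IVF indexes must be trained before vectors can be added
print(index.is_trained)   # False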

# Training index
index.train(sentence_embeddings)

With the inclusion of partitioning using ‘IndexIVFFlat’, training becomes necessary: building this index involves a clustering step (k-means) to find the cell centroids. Therefore, before adding any data to the index, we need to train it on our dataset so that it can assign each vector to the appropriate cell.

# Adding sentence embeddings
index.add(sentence_embeddings)

After training our index, we proceed to add our data in the same manner as before. We can then conduct our search again using the same indexed sentence embeddings and the same ‘query’.

%%time
# Searching neighbours for the query
D, I = index.search(query, k)
print(I)

# Printing nearest neighbours for book titles
[f'{i}: {df.iloc[i].title}' for i in I[0]]

We can also adjust the number of nearby cells to search using the parameter ‘nprobe’.

# Increasing number of nearby cells to search
index.nprobe = 20

%%time
# Searching neighbours for the query
D, I = index.search(query, k)
print(I)

# Printing nearest neighbours for book titles
[f'{i}: {df.iloc[i].title}' for i in I[0]]

Increasing ‘nprobe’ improves the accuracy of our search at the expense of speed. Our earlier ‘IndexFlatL2’ search was exhaustive, comparing the query against every single vector, so it identified the exact closest matches; a small ‘nprobe’ limits the scope of the IVF search and may miss some of them. Here our results already match the exhaustive ‘IndexFlatL2’ output, but if the matches ever look off, raising ‘nprobe’ trades time for accuracy. Note also that measured search times vary somewhat from run to run.
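
To see this trade-off directly, we can time the same query at a few ‘nprobe’ settings (a minimal sketch using Python’s time module; exact timings will vary by run and hardware):

import time

# Timing the same query at increasing nprobe values
for nprobe in [1, 5, 20, 50]:
    index.nprobe = nprobe
    start = time.perf_counter()
    D, I = index.search(query, k)
    elapsed = time.perf_counter() - start
    print(f'nprobe={nprobe:>3}  time={elapsed * 1000:.2f} ms  ids={I[0]}')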

6.3 Quantization

Now that we have significantly reduced the search time, the next step is to address the issue of storage space consumption when storing full vectors, especially in large datasets.

Fortunately, FAISS offers a solution by providing the ability to compress vectors using transformations based on Product Quantization (PQ). But what exactly is PQ?

PQ can be seen as an additional approximation step, akin to our use of ‘IVF’, which allowed us to approximate by reducing the scope of our search. However, PQ differs slightly as it approximates the distance (or similarity) calculation instead.

PQ achieves this by compressing the vectors themselves, which involves several steps:

  1. We partition each original vector into several subvectors.
  2. For each set of subvectors, we conduct a clustering operation, generating numerous centroids for each set of subvectors.
  3. Within our vector of subvectors, we replace each subvector with the ID of its nearest centroid.

This compression process helps alleviate the storage space issue associated with storing full vectors.
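
To make the savings concrete, here is a back-of-the-envelope calculation using the parameters we set below (8 subvectors at 8 bits per code; actual FAISS storage adds some per-vector overhead):

# Original: 768 float32 values per vector
original_bytes = 768 * 4              # 3072 bytes
# PQ code: 8 subvectors x 8 bits = 8 bytes per vector
compressed_bytes = (8 * 8) // 8       # 8 bytes
print(original_bytes / compressed_bytes)  # 384x compression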

# Number of subvectors each vector is split into (m)
num_subvectors = 8

# Number of bits per subvector code (2**8 = 256 centroids per subquantizer)
code_bits = 8

# Keeping the same L2 distance flat index as the coarse quantizer
quantizer = faiss.IndexFlatL2(vec_dim)
index = faiss.IndexIVFPQ(quantizer, vec_dim, nlist, num_subvectors, code_bits)

Once more, we’ll have to train the index and add our vectors before proceeding further.

# Training index
index.train(sentence_embeddings)

# Adding sentence embeddings
index.add(sentence_embeddings)

Let’s compare it with our previous index, which didn’t utilize Product Quantization (PQ) and had a ‘nprobe’ value of 20.

# Defining number of nearby cells to search
index.nprobe = 20

%%time
# Searching neighbours for the query
D, I = index.search(query, k)
print(I)

By incorporating Product Quantization (PQ), we have successfully reduced our search time. Although the difference may be small on a dataset of this size, it can make a significant impact when scaled to larger datasets.

It’s important to note that with these speed optimization techniques (IVF and PQ), there might be slight differences in the results returned. However, upon printing out these results, we’ll observe that each item is still a relevant match.

# Printing nearest neighbours for book titles
[f'{i}: {df.iloc[i].title}' for i in I[0]]
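
If you want to quantify how much the approximation changes the results, you can measure the overlap with the exhaustive search (a minimal sketch; ‘flat_index’ is just a fresh IndexFlatL2 built on the same embeddings for comparison):

# Ground truth from an exhaustive flat index
flat_index = faiss.IndexFlatL2(vec_dim)
flat_index.add(sentence_embeddings)
_, I_exact = flat_index.search(query, k)

# Approximate results from the IVF+PQ index
_, I_approx = index.search(query, k)

# Fraction of exact neighbours recovered (recall@k)
print(len(set(I_exact[0]) & set(I_approx[0])) / k)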

In conclusion, despite not achieving perfect accuracy, approximation techniques like ‘IVF’ and ‘PQ’ return results very close to those of the exhaustive search while providing substantial speed enhancements.
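
As a practical closing note, a trained index can be saved to disk and reloaded with FAISS’s serialization helpers, so it does not have to be rebuilt on every run (the file name here is arbitrary):

# Persisting the trained index and loading it back
faiss.write_index(index, 'book_recs.index')
index = faiss.read_index('book_recs.index')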
