
Comparing Vector Stores for Retrieval-Augmented Generation

March 29, 2025

I have spent the last couple of months building an AI chatbot that answers questions based on the content of any website. Along the way, I tried several of the popular vector stores for storing the embeddings created from website data.

This article contains real-world takeaways from my research and experiments. Keep in mind that the ecosystem is evolving very quickly, so some of what follows may be out of date by the time you read it.

Pinecone

This is the easiest option to start with. They offer a free tier where you can create a single index that can store almost 2.5M embeddings (vectors) produced by OpenAI's text embedding models. That is enough for a fairly large site. The only downside of the free plan is that your index gets deleted after 7 days of inactivity.
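To give a feel for the workflow, here is a minimal sketch using the pinecone Python client as it worked during my testing. The client API has changed across versions, and the API key, environment, index name, and vectors below are all placeholders:

```python
import pinecone

# Placeholder credentials; in a real app these come from the Pinecone dashboard.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# 1536 dimensions matches OpenAI's text-embedding-ada-002 output.
pinecone.create_index("website-docs", dimension=1536, metric="cosine")
index = pinecone.Index("website-docs")

# Upsert (id, vector, metadata) tuples; the vector here is a dummy value.
index.upsert(vectors=[
    ("doc-1", [0.1] * 1536, {"url": "https://example.com/page"}),
])

# Fetch the 5 nearest neighbours of a query vector, with stored metadata.
results = index.query(vector=[0.1] * 1536, top_k=5, include_metadata=True)
print(results)
```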

In their dashboard you can track query traffic (reads, inserts, updates, and deletes) and how much storage you are consuming. None of the other options in this post give you this kind of built-in visibility.


Their paid plan starts at $70/month, which lifts the 2.5M-embedding limit and stops your index from being deleted automatically.


PGVector

If you are already using Postgres in your infrastructure, you may want to go with PGVector: it is a Postgres extension that turns your ordinary RDBMS into a vector store. You can enable the PGVector extension on Azure Database for PostgreSQL Flexible Server, for example. Check whether your cloud provider allows extensions on its managed Postgres instances before going further with this option.
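For a rough idea of what this looks like in practice, here is a minimal sketch using plain psycopg2. The connection string, table name, and dummy embedding are placeholders, and 1536 dimensions again matches OpenAI's embedding models:

```python
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb")  # placeholder DSN
cur = conn.cursor()

# Enable the extension and create a table with a 1536-dim vector column.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS documents ("
    "  id bigserial PRIMARY KEY,"
    "  content text,"
    "  embedding vector(1536));"
)

# pgvector accepts vectors as '[x1,x2,...]' text literals.
embedding = [0.1] * 1536  # dummy value; use a real OpenAI embedding here
literal = "[" + ",".join(map(str, embedding)) + "]"
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector);",
    ("some chunk of website text", literal),
)

# Nearest-neighbour search by cosine distance (the <=> operator).
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 5;",
    (literal,),
)
print(cur.fetchall())
conn.commit()
```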


While PGVector's feature set is quite good and it supports most of what you would expect from a vector store, I found it noticeably sluggish compared to the other options on this list. That might be down to my Azure Postgres instance, so run your own benchmarks before committing to this option.

Chroma

This is the closest competitor to Pinecone, in my opinion. It is also open source, so it can be hosted on-premises. Chroma offers an "in-memory" mode where an instance starts directly alongside your Python app, with nothing to install separately.
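That mode is about as simple as it sounds. Here is a minimal sketch; the collection name, documents, and metadata are placeholders, and by default Chroma embeds documents with its bundled embedding model:

```python
import chromadb

# In-memory client: lives inside the Python process, no separate server.
client = chromadb.Client()

collection = client.create_collection("website-docs")
collection.add(
    ids=["doc-1"],
    documents=["some chunk of website text"],
    metadatas=[{"customer": "acme"}],
)

# Chroma embeds the query text with the same default model and searches.
results = collection.query(
    query_texts=["what does this site say about pricing?"],
    n_results=3,
)
print(results["documents"])
```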

I found Chroma to be fast and reliable. My only gripe at the time of testing was that there was no easy way to host it; I had to rummage through a lot of docs to get it running on my Kubernetes cluster. In their town hall they said they were working on better documentation and an easier self-hosting story. Hopefully that lands soon.

Redis

If you already use Redis, you can turn it into a vector store with a Redis module called RediSearch. The module can be loaded on your self-hosted instances and is also supported by the major Redis cloud providers.
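As a minimal sketch with the redis-py client, assuming a server with RediSearch loaded (for instance the redis-stack Docker image); the index name, key, and random vectors are placeholders:

```python
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Create a search index with an HNSW vector field (1536-dim, cosine).
r.ft("docs").create_index(
    fields=[
        TextField("content"),
        VectorField("embedding", "HNSW", {
            "TYPE": "FLOAT32",
            "DIM": 1536,
            "DISTANCE_METRIC": "COSINE",
        }),
    ]
)

# Store a document as a hash; the vector goes in as raw float32 bytes.
vec = np.random.rand(1536).astype(np.float32)  # dummy embedding
r.hset("doc:1", mapping={"content": "some chunk", "embedding": vec.tobytes()})

# KNN query for the 3 nearest neighbours of the query vector.
q = (
    Query("*=>[KNN 3 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("content", "score")
    .dialect(2)
)
res = r.ft("docs").search(q, query_params={"vec": vec.tobytes()})
print([doc.content for doc in res.docs])
```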

I found it to be quite fast and reliable, but you can lose data if the Redis instance goes down, so look into persistence options to be safe. The other issue at the time of testing was that metadata filtering was not supported (at least through the LangChain integration I was using), which I needed in order to segregate data per customer. If your app has to keep different users' data apart, Redis might not work for you. You can track the feature request here:

[Feature] Redis Vectorestore - similarity_search filter by metadata · Issue #3967 · langchain-ai/langchain: https://github.com/langchain-ai/langchain/issues/3967

FAISS

FAISS comes from Facebook and can be self-hosted. It is also the option I have the least experience with; I only tried it while searching for the right vector store for my app. I found it fast, and it supports metadata filtering.
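For context, the core library runs in-process rather than as a server; serving it over the network means wrapping it in your own service. A minimal sketch with random placeholder vectors:

```python
import faiss
import numpy as np

dim = 1536
index = faiss.IndexFlatL2(dim)  # exact L2 search, no training step needed

# Build the index from a batch of embeddings (dummy values here).
vectors = np.random.rand(100, dim).astype(np.float32)
index.add(vectors)

# Find the 5 nearest neighbours of a query vector.
query = np.random.rand(1, dim).astype(np.float32)
distances, ids = index.search(query, 5)
print(ids[0], distances[0])

# Persist the index to disk and reload it later (e.g. on server start).
faiss.write_index(index, "site.index")
index = faiss.read_index("site.index")
```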

The reason I didn't go ahead with it was that it didn't support loading data automatically; that is, you cannot push more data into FAISS while it is running. You have to restart the server to load new data.