Find Relevant Answers: Semantic Search with Text Embeddings
Today, we are going to use a text embedding model to transform a list of phrases into vectors. When a user asks a question, we will convert it into a vector as well and find the phrases from the list that are semantically closest. This approach is useful, for example, for immediately suggesting relevant FAQ sections to the user and reducing the need for full support requests.
So, here's a plan:
Prepare the data: Create a numbered list of text phrases.
Generate embeddings: Use a model to embed each phrase into a vector.
Embed the question: When the user asks something, embed the question text.
Find similar phrases: Calculate the similarity (e.g., cosine similarity) between the question vector and the list vectors. Show the top 1–3 most similar phrases as the answer.
We have compiled the following list of FAQ headings:
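As a minimal sketch, the headings can be kept in a plain Python list and embedded in a single request. The headings below are placeholders rather than a real FAQ, and the endpoint URL and model name are assumptions based on the AI/ML API's OpenAI-compatible interface:

```python
from openai import OpenAI

# Hypothetical FAQ headings; substitute your own.
faq_headings = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "Do your cynologists offer individual dog-training sessions?",
    "How do I cancel my subscription?",
    "Where can I download my invoices?",
]

# OpenAI-compatible client pointed at the AI/ML API endpoint (assumed URL).
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_AIMLAPI_KEY>",
)

# One request embeds the whole list; response.data preserves the input order.
response = client.embeddings.create(
    model="text-embedding-3-large",  # assumed model name; use the model you picked
    input=faq_headings,
)
heading_embeddings = [item.embedding for item in response.data]
```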
Now each of our headings has a corresponding embedding vector.
Similarly, we process the user's query. We save the embedding vector generated by the model into a separate variable.
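Reusing the client and model from the sketch above, embedding the question might look like this (the question text is just an example):

```python
user_question = "I forgot my password, what should I do?"

question_response = client.embeddings.create(
    model="text-embedding-3-large",  # must be the same model used for the headings
    input=user_question,
)
question_embedding = question_response.data[0].embedding
```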
We calculate the similarity between the question vector and the list vectors.
There are different metrics and functions you can use for this, such as cosine similarity, dot product, or Euclidean distance.
In this example, we use cosine similarity because it measures the angle between two vectors and is a popular choice for comparing text embeddings, especially when the magnitude of the vectors is less important than their direction.
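One possible implementation uses SciPy's cosine distance (similarity = 1 - distance) to score every heading against the question and keep the top three. The `top_matches` helper is illustrative, and the variables come from the sketches above:

```python
from scipy.spatial.distance import cosine

def top_matches(question_vec, heading_vecs, headings, k=3):
    """Return the k headings whose embeddings are closest to the question."""
    scored = [
        (1 - cosine(question_vec, vec), heading)  # cosine similarity
        for vec, heading in zip(heading_vecs, headings)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]

for similarity, heading in top_matches(question_embedding, heading_embeddings, faq_headings):
    print(f"{similarity:.3f}  {heading}")
```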
In this section, you will find the complete Python code for the described use case, along with an example of the program's output.
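As a self-contained sketch under the same assumptions as above (OpenAI-compatible AI/ML API endpoint, an assumed embedding model name, SciPy for cosine similarity, placeholder FAQ headings), the whole flow might look like this:

```python
from openai import OpenAI
from scipy.spatial.distance import cosine

BASE_URL = "https://api.aimlapi.com/v1"   # assumed AI/ML API endpoint
API_KEY = "<YOUR_AIMLAPI_KEY>"            # replace with your actual key
MODEL = "text-embedding-3-large"          # assumed model name

FAQ_HEADINGS = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "Do your cynologists offer individual dog-training sessions?",
    "How do I cancel my subscription?",
    "Where can I download my invoices?",
]

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)

def embed(texts):
    """Embed a list of strings and return the vectors in the same order."""
    response = client.embeddings.create(model=MODEL, input=texts)
    return [item.embedding for item in response.data]

def answer(question, headings, heading_vectors, k=3):
    """Return the k headings most similar to the question, highest first."""
    question_vector = embed([question])[0]
    scored = [
        (1 - cosine(question_vector, vec), heading)
        for vec, heading in zip(heading_vectors, headings)
    ]
    scored.sort(reverse=True)
    return scored[:k]

if __name__ == "__main__":
    heading_vectors = embed(FAQ_HEADINGS)
    question = "I forgot my password, what should I do?"
    for similarity, heading in answer(question, FAQ_HEADINGS, heading_vectors):
        print(f"{similarity:.3f}  {heading}")
```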
Naturally, this is a simplified example. You can develop a more comprehensive implementation by introducing features such as:
A minimum similarity threshold to filter out irrelevant results (see the sketch after this list),
Cached embeddings, so vectors are not recalculated on every lookup,
Partial matches or fuzzy search for broader results,
Batch processing to handle multiple user questions at once, and more.
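For the first of these ideas, a minimum similarity threshold is a one-line filter on top of the ranking step. The sketch below reuses the `answer()` helper from the listing above; 0.5 is an arbitrary illustrative cutoff that you would tune on your own data:

```python
MIN_SIMILARITY = 0.5  # illustrative cutoff, not a recommended value

matches = answer(question, FAQ_HEADINGS, heading_vectors)
relevant = [(score, heading) for score, heading in matches if score >= MIN_SIMILARITY]

if relevant:
    for score, heading in relevant:
        print(f"{score:.3f}  {heading}")
else:
    print("No sufficiently similar FAQ heading found; routing to support.")
```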
Let's save our headings as a list and pass them to the model. We chose a large embedding model for this task: it has been trained on a large dataset and is powerful enough to build complex semantic connections.
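The model is chosen via the `model` parameter of the embeddings call; the name below is a stand-in, so substitute whichever embedding model you picked. `client` and `faq_headings` come from the earlier sketch:

```python
response = client.embeddings.create(
    model="text-embedding-3-large",  # stand-in model name
    input=faq_headings,              # the whole list is embedded in one request
)
heading_embeddings = [item.embedding for item in response.data]
```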
Please note that to use a ready-made cosine similarity function, you need to install the corresponding library separately. You can install it with the following command:
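The sketches above use SciPy for this, so under that assumption:

```bash
pip install scipy
```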
Do not forget to replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key, which you can obtain on our platform.
Here is the program output after we switched to the smaller version of the model:
Apparently, it was trained a bit less thoroughly and doesn't recognize who cynologists are. We didn't notice much difference in speed, but the larger version is somewhat more expensive.