Sentence Similarity

Compare the semantic similarity of two sentences

Similarity scores range from -1.00 (least similar) to 1.00 (most similar).


What is semantic similarity?

Semantic similarity is a measure of how similar two sentences are in concept and meaning, regardless of their length or structure.

For example, "The cat sat on the mat" and "The mat had a cat sitting on it" have a high similarity. The sentences convey the same information, just with different structure and perspective. If the second sentence were instead "The mat had a dog sitting on it", the similarity drops significantly as the meaning is different!

How can you extract the meaning from a sentence?

The best way to extract the meaning from sentences is by using some form of Natural Language Processing (NLP). The interactive tool above utilises Sentence-BERT (SBERT), which takes a sentence, analyses which words have been used and in what context, and creates an embedding. The embedding is a long list of numbers representing the main features of the text passage, which can then be used for comparison against other sentences.
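As a sketch, here's how you could create one of these embeddings yourself using the open-source sentence-transformers library (the model name below is an assumption; the tool may use a different SBERT model):

```python
# A minimal sketch, assuming the sentence-transformers library.
from sentence_transformers import SentenceTransformer

# "all-MiniLM-L6-v2" is an assumed model; any SBERT model would work here.
model = SentenceTransformer("all-MiniLM-L6-v2")

embedding = model.encode("The cat sat on the mat")
print(embedding.shape)  # a fixed-length vector, e.g. (384,) for this model
```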

How is semantic similarity measured?

On this site, the semantic similarity between sentences is calculated using the cosine similarity score. It ranges from -1 to 1, where a score of -1 demonstrates the sentences have opposite meanings and 1 demonstrates the sentences mean the same thing. The cosine similarity can be calculated quickly from the embeddings created by SBERT. There are other metrics that can be used to measure similarity, such as the dot product, Euclidean distance, and Manhattan distance, but cosine similarity is generally the default. Once you have your embeddings, you're really just finding ways to compare how similar the two lists of numerical features are.
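Cosine similarity is simply the dot product of the two embeddings divided by the product of their magnitudes: cos(θ) = (A · B) / (‖A‖ ‖B‖). A minimal sketch, assuming the same SBERT setup as above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def cosine_similarity(a, b):
    # (a . b) / (|a| * |b|): 1 means same direction, -1 means opposite
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, as above
a = model.encode("The cat sat on the mat")
b = model.encode("The mat had a cat sitting on it")
print(cosine_similarity(a, b))  # close to 1 for near-paraphrases
```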

What are the uses of semantic similarity?

Semantic search

Search is the obvious one. Understanding a user's intentions and the meaning of their search term enables you to provide much better results than if you were to just compare the words in the text. This can be useful when you have a large FAQ section or collection of blogs that you want a user to be able to search.
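A minimal sketch of an FAQ search along these lines, assuming the sentence-transformers library and an illustrative FAQ list:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

# Embed the FAQ entries once, up front.
faqs = [
    "How do I reset my password?",
    "Where can I view my invoices?",
    "How do I cancel my subscription?",
]
faq_embeddings = model.encode(faqs, convert_to_tensor=True)

# Embed the user's query and rank FAQs by cosine similarity.
query = model.encode("I forgot my login details", convert_to_tensor=True)
scores = util.cos_sim(query, faq_embeddings)[0]
best = int(scores.argmax())
print(faqs[best], float(scores[best]))
```

Note that the query never shares a word with the best match; the ranking works because the embeddings capture meaning rather than vocabulary.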

Semantic search can also be very efficient through an approximate nearest-neighbour search (ANNS). If you're setting up a book recommendation service based on someone's previous reads, there's no single correct answer; you just need some close matches to what they've previously read and enjoyed. Here, you could employ an ANNS to filter through millions of embeddings in milliseconds.
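For illustration, here's roughly how that might look with the faiss library and an HNSW index; the corpus, dimensions, and parameters below are all stand-ins:

```python
# A sketch of approximate nearest-neighbour search with faiss (pip install faiss-cpu).
import faiss
import numpy as np

dim = 384  # embedding size of the assumed SBERT model
book_embeddings = np.random.rand(100_000, dim).astype("float32")  # stand-in data
faiss.normalize_L2(book_embeddings)  # with unit vectors, inner product = cosine

# HNSW graph index: approximate, but far faster than brute-force at scale.
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
index.add(book_embeddings)

reader_profile = np.random.rand(1, dim).astype("float32")  # stand-in query
faiss.normalize_L2(reader_profile)
scores, ids = index.search(reader_profile, 5)  # the 5 closest matches
print(ids[0], scores[0])
```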

Topic analysis

Once your sentences have been converted into embeddings, it's possible to group them into related topics. Using a clustering algorithm, you can find clusters of related sentences based on how similar the features are. This could be really helpful in understanding your users' reviews, and a quick way to find the reviews which need addressing most urgently.
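A sketch of that clustering step with scikit-learn's k-means, using illustrative reviews and an assumed cluster count:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model
reviews = [
    "The app crashes every time I open it",
    "Crashes constantly since the last update",
    "Love the new dark mode",
    "Dark mode looks great",
    "Delivery took three weeks",
]
embeddings = model.encode(reviews)

# Group the embeddings into 3 topics (the count here is a guess;
# in practice you'd tune it or use a density-based method).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
for review, label in zip(reviews, kmeans.labels_):
    print(label, review)
```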

Translation

When translating a sentence, it's important to understand the full meaning of the sentence. A translation is rarely correct if you're just replacing each word like for like with its counterpart.





Data Policy

Data is transferred to a server for processing; however, no input data is stored. The site uses cookies for Google Analytics, with consent handled by Cookiebot.