Nov 25, 2025

[AI Math] Cosine similarity

Cosine similarity is a metric that measures how similar two things are once they have been represented mathematically as vectors.


1. The Core Concept: It's All About the Angle

Picture each vector as an arrow drawn from the origin. Cosine similarity is the cosine of the angle between the two arrows.

Identical: If the arrows point in exactly the same direction, the angle is 0°. The cosine of 0° is 1. (Maximum similarity.)
Unrelated: If the arrows are at a 90° angle (perpendicular), the cosine of 90° is 0. (No similarity.)
Opposite: If the arrows point in opposite directions (180°), the cosine is -1.
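
To make the three cases concrete, here is a minimal sketch using the standard formula cos(θ) = (A · B) / (|A| |B|), where A · B is the dot product and |A| is the vector's length. The function name cosine_similarity and the example vectors are illustrative choices, not something from the video.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative vectors (assumed for this example only).
same = cosine_similarity([3.0, 4.0], [6.0, 8.0])             # same direction
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # 90 degrees apart
opposite = cosine_similarity([3.0, 4.0], [-3.0, -4.0])       # 180 degrees apart

print(same, perpendicular, opposite)
```

The three results should land at the 1, 0, and -1 cases described above. A zero vector has no direction, so the sketch raises an error rather than returning a made-up value.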


2. The "Killer Feature": Magnitude Invariance

Why do we use this instead of just measuring distance (Euclidean distance)?

The problem with distance: Imagine you have two documents about "Apples."

Doc A: A short sentence: "I like apples."

Doc B: A massive 500-page book about apples.

If you count word frequencies, Doc A might become the vector [1, 0...] and Doc B might become [1000, 0...].

Euclidean Distance: These two vectors are very far apart, because Doc B contains far more words.

Cosine Similarity: These two vectors point in exactly the same direction, so their similarity is 1. Both documents are about apples.
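
The comparison can be checked with a short sketch. As a simplifying assumption, the vectors are cut down to two components, [1, 0] and [1000, 0], where the first slot counts "apples" and the second stands in for every other word. The helper names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumed two-component stand-ins for the word-count vectors.
doc_a = [1.0, 0.0]      # "I like apples."
doc_b = [1000.0, 0.0]   # the 500-page book about apples

distance = euclidean_distance(doc_a, doc_b)
similarity = cosine_similarity(doc_a, doc_b)

print("Euclidean distance:", distance)
print("Cosine similarity:", similarity)
```

Dividing by both vector lengths cancels the effect of document size, which is why the large gap in word counts shows up in the distance but not in the cosine.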




Reference:
https://youtu.be/zcUGLp5vwaQ?si=IhjCEhAj6YoEsCtA
