Cosine similarity is a metric that measures how similar two things are once they have been represented mathematically as vectors.
1. The Core Concept: It's All About the Angle
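Picture each vector as an arrow starting at the origin. Cosine similarity is the cosine of the angle between the two arrows.
Identical: If the arrows point in the exact same direction, the angle is 0°. The cosine of 0° is 1. (Maximum similarity.)
Unrelated: If the arrows are at a 90° angle (perpendicular), the cosine of 90° is 0. (No similarity.)
Opposite: If they point in opposite directions (180°), the cosine of 180° is -1. (Maximum dissimilarity.)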
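To make the three cases concrete, here is a minimal sketch in plain Python. It uses the standard formula, the dot product divided by the product of the two vector lengths. The function name `cosine_similarity` and the small 2-D example vectors are illustrative choices, not something from a particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either arrow has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative 2-D arrows (assumed values, chosen to hit each case exactly).
same = cosine_similarity([1, 0], [2, 0])            # same direction, 0 degrees
perpendicular = cosine_similarity([1, 0], [0, 3])   # 90 degrees apart
opposite = cosine_similarity([1, 0], [-4, 0])       # 180 degrees apart

print(same, perpendicular, opposite)  # prints: 1.0 0.0 -1.0
```

Notice that the arrows have different lengths in every pair, yet the results are exactly 1, 0, and -1. Only the angle matters.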
2. The "Killer Feature": Magnitude Invariance
Why do we use this instead of just measuring distance (Euclidean distance)?
The problem with distance: Imagine you have two documents about "Apples."
Doc A: A short sentence: "I like apples."
Doc B: A massive 500-page book about apples.
If you count the word frequencies, Doc A might be the vector [1, 0...] and Doc B might be the vector [1000, 0...].
Euclidean Distance: These two vectors are very far apart, simply because Doc B has way more words and therefore much larger counts.
Cosine Similarity: These two vectors point in the exact same direction, so their similarity is 1. They are both about apples.
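The contrast can be checked with a few lines of Python. This is a sketch under stated assumptions: the two-entry vectors `[1, 0]` and `[1000, 0]` are simplified stand-ins for the document vectors above (the elided entries are assumed to be zero for illustration), and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Assumed stand-ins for the word-count vectors of the two documents.
doc_a = [1, 0]      # short sentence: "apples" counted once
doc_b = [1000, 0]   # long book: "apples" counted 1000 times

# Straight-line distance between the arrow tips (math.dist needs Python 3.8+).
euclidean = math.dist(doc_a, doc_b)
cosine = cosine_similarity(doc_a, doc_b)

print(euclidean)  # prints: 999.0
print(cosine)     # prints: 1.0
```

The distance of 999 reflects nothing but the difference in length between the documents, while the cosine of 1 reflects that both vectors point the same way.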