The algorithm underlying BPE breaks down words that aren't in its learned
vocabulary into smaller subword units, or even individual characters, which lets it
handle out-of-vocabulary words. When the tokenizer encounters an unfamiliar word
during tokenization, it simply represents it as a sequence of subword tokens or
characters instead of mapping it to an unknown-token placeholder.
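
To make this concrete, here is a minimal sketch using the Hugging Face `transformers` package and the pretrained GPT-2 BPE tokenizer (both are illustrative assumptions; the post does not prescribe a specific toolkit). Tokenizing a made-up word shows how BPE falls back to subword pieces rather than an unknown token.

```python
# A minimal sketch, assuming the Hugging Face `transformers` package
# and the pretrained GPT-2 BPE tokenizer (illustrative choices).
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# A made-up word that is very unlikely to be in the learned vocabulary.
word = "hyperquantumization"

# BPE splits the unfamiliar word into smaller subword units
# instead of emitting an <unk> placeholder.
subwords = tokenizer.tokenize(word)
print(subwords)  # the exact split depends on the learned merge rules

# The subword token IDs can be decoded back to the original string.
ids = tokenizer.convert_tokens_to_ids(subwords)
print(tokenizer.decode(ids))  # 'hyperquantumization'
```

Decoding the subword IDs reproduces the original word exactly, which is the practical payoff: no information is lost even for words the tokenizer has never seen.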