  • That’s when you get into more of the nuance with tokenization. It’s not a simple lookup table, and the AI does not have access to the original definitions of the tokens. Also, tokens do not map 1:1 onto words, and a word might be broken into several tokens. For example “There’s” might be broken into “There” + “'s”, and “strawberry” might be broken into “straw” + “berry”.

    The reason we often simplify it as token = word is that the mapping does hold for most common words.
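
    The splitting above can be illustrated with a toy greedy longest-match tokenizer. This is a simplified sketch, not how real tokenizers like BPE actually work (they learn merge rules from data), and the vocabulary here is hypothetical, chosen just to reproduce the examples:

```python
# Toy greedy longest-match tokenizer. Real LLM tokenizers (e.g. BPE)
# learn their vocabulary from data; this hand-picked vocab is only
# illustrative.
VOCAB = {"There", "'s", "straw", "berry"}  # hypothetical vocabulary

def tokenize(text):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

print(tokenize("There's"))     # ['There', "'s"]
print(tokenize("strawberry"))  # ['straw', 'berry']
```

    Note the model only ever sees the token IDs for `straw` and `berry`, not the letters inside them, which is why letter-counting questions trip it up.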