Why AI language models choke on too much text
Large language models represent text using tokens, each of which is a few characters. Short words are represented by a single token (like "the" or "it"), whereas larger words may be represented by several tokens (GPT-4o represents "indivisible" with "ind," "iv," and "isible").
When OpenAI released ChatGPT two years ago, it had a memory, known as a context window, of just 8,192 tokens. That works out to roughly 6,000 words of text. This meant that if you fed it more than about 15 pages of text, it would "forget" information from the beginning of its context. This limited the size and complexity of tasks ChatGPT could handle.
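To make that arithmetic concrete, here is a rough Python sketch. The words-per-token and words-per-page ratios are not official figures; they are simply backed out of the numbers above (6,000 words for 8,192 tokens, spread over about 15 pages). The `sliding_window` helper is a hypothetical, simplified model of "forgetting" in which the oldest tokens fall off first; real chat systems may trim their context differently.

```python
from collections import deque

# Ratios implied by the figures in the text; treat them as rough averages.
WORDS_PER_TOKEN = 6000 / 8192  # about 0.73 words per token
WORDS_PER_PAGE = 400           # assumption: 6,000 words over roughly 15 pages


def capacity(context_tokens):
    """Estimate how many words and pages fit in a context window."""
    words = context_tokens * WORDS_PER_TOKEN
    pages = words / WORDS_PER_PAGE
    return round(words), round(pages)


def sliding_window(tokens, limit):
    """Keep only the most recent `limit` tokens, dropping the oldest ones."""
    window = deque(maxlen=limit)  # a full deque discards items from the front
    for token in tokens:
        window.append(token)
    return list(window)


print(capacity(8192))                  # estimated (words, pages) for 8,192 tokens
print(sliding_window(range(10), 4))    # only the last four items survive
```

With a four-token limit, feeding in ten tokens leaves only the last four; anything earlier is gone, which is the sense in which the model loses the start of a long document.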
Today's LLMs are far more capable: