LLM Tokenization Explained
Tokens are the atomic units LLMs process — not words, but subword pieces. Understanding tokens helps you write better prompts and manage API costs.
What is a Token?
A token is a chunk of text, roughly 3-4 characters of English on average. "ChatGPT" is 3 tokens: "Chat", "G", "PT". Common words like "the" are a single token, while rare words may be split into many pieces. Most LLM APIs bill per token, so token count drives cost.
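To make the splitting concrete, here is a toy greedy longest-match tokenizer over a hypothetical mini-vocabulary. Real GPT tokenizers use byte-pair encoding over vocabularies of tens of thousands of entries, so this is illustrative only, but it shows how "chatgpt" can fall apart into subword pieces when the whole word is not in the vocabulary.

```python
# Toy greedy longest-match subword tokenizer (illustrative only;
# real GPT tokenizers apply learned BPE merges, not longest-match).
def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit it alone
            i += 1
    return tokens

vocab = {"chat", "g", "pt", "the"}  # hypothetical mini-vocabulary
print(tokenize("chatgpt", vocab))   # ['chat', 'g', 'pt']
```

Because "chatgpt" is absent from the vocabulary, the tokenizer falls back to the largest known pieces, mirroring the 3-token split described above.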
Why Not Just Use Words?
Word-level tokenization struggles with rare words, typos, and other languages: any word outside the fixed vocabulary becomes an unknown token. Subword tokenization (such as Byte-Pair Encoding, or BPE, used by GPT models) handles any text by building a vocabulary of frequently co-occurring character sequences, so unseen words decompose into known pieces instead of being lost.
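The heart of BPE training fits in a few lines: start from individual characters, repeatedly find the most frequent adjacent pair, and merge it into a new vocabulary symbol. This is a minimal sketch of the training loop on a three-word corpus, not any production tokenizer.

```python
from collections import Counter

def most_frequent_pair(corpus):
    # Count adjacent symbol pairs across all words in the corpus.
    pairs = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(corpus, pair):
    # Replace every occurrence of the pair with one fused symbol.
    merged = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

# Start from characters; each merge adds one vocabulary entry.
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(2):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # after 2 merges, "low" is a single symbol
```

After two merges the shared prefix "low" becomes one symbol, while the rarer endings "er" and "est" remain split, which is exactly the frequency-driven behavior described above.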
Tokens and Context Windows
Every LLM has a maximum context window measured in tokens — GPT-4o's is 128K, Claude 3.5 Sonnet's is 200K. Both your prompt AND the model's response consume tokens from this budget.
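Because the prompt and the response draw on the same budget, a long prompt shrinks the room left for the answer. A minimal sketch of that bookkeeping (the function name and default window size are illustrative, not an API):

```python
def fits_context(prompt_tokens, max_response_tokens, window=128_000):
    # Prompt and response share one token budget (illustrative window size).
    return prompt_tokens + max_response_tokens <= window

print(fits_context(120_000, 4_000))   # True: 124K total fits in 128K
print(fits_context(120_000, 10_000))  # False: 130K exceeds the window
```

In practice, APIs reject or truncate requests that exceed the window, so checking the budget before sending avoids failed calls.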
Practical Token Tips
Use the OpenAIToolsHub Token Counter tool to estimate costs before sending large prompts. As a rule of thumb, 1,000 tokens ≈ 750 English words. Non-English text is usually less token-efficient: CJK text often costs one or more tokens per character, so the same content can consume noticeably more of your budget.
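The rule of thumb above translates directly into a rough estimator. This applies only the 1,000-tokens-per-750-words heuristic for English text; it is not a real tokenizer count, and the function name is hypothetical.

```python
def estimate_tokens(text):
    # Rule of thumb: 1,000 tokens per 750 English words,
    # i.e. roughly 4/3 tokens per word. English-only heuristic.
    words = len(text.split())
    return round(words * 1000 / 750)

print(estimate_tokens("one two three"))  # 4: three words -> ~4 tokens
```

For billing-accurate counts, use an actual tokenizer-backed counter rather than this heuristic, since token-per-word ratios vary with vocabulary, punctuation, and language.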