Question 1

If an LLM has a 128,000-token context window and your conversation uses 120,000 tokens, what happens?

Accepted Answer

Only 8,000 tokens remain for the model's next response. The full context (input + output) must fit within 128K tokens. With 120K used, only 8K tokens remain for the response. The model may also start dropping earlier messages depending on the API implementation.

Question 2

Based on the "lost in the middle" research, where should you put the most critical instructions in a long prompt?

Accepted Answer

At the start and end of the prompt. Studies show LLMs have recency and primacy bias — they best remember content at the very start and very end. Put critical instructions in both locations.

Question 3

For processing a 500-page PDF, what approach is most effective given context window limitations?

Accepted Answer

Use RAG to retrieve only relevant sections when answering questions. RAG is the most scalable approach — it embeds all 500 pages and only retrieves the most relevant chunks per question, avoiding context stuffing and the "lost in the middle" problem.

Context Window Explained — What LLMs Can "Remember"

What Is the Context Window?

Context Windows by Model (2026)

The "Lost in the Middle" Problem

Practical Context Window Strategy

Deep Dive Articles

Related Concepts