Chunking that respects structure — don't shred your own documents — step 7 of 9
The sentences are split cleanly into chunks, but the chunks have NO overlap. Suppose the user asks "Acme makes WHAT, and when was it founded?" The answer spans both chunks, so retrieval might pull only chunk 0 ("founded in 1958") OR only chunk 1 ("makes running shoes") and miss half the answer.
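Fix the chunks by copying the last 10 characters of chunk 0 onto the START of chunk 1, followed by a space. Chunk 0 stays unchanged; chunk 1 then carries the tail of the founding sentence ("d in 1958.") alongside its own, so a retriever that matches chunk 1 also sees the date.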
Expected output:
['Acme was founded in 1958.', 'd in 1958. The company makes running shoes.']
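One way to produce that output is sketched below. The exercise's starter code is not shown here, so the function name `add_overlap`, its parameters, and the assumption that the input is the two plain sentences joined by a single space are all hypothetical, chosen only to match the expected output above.

```python
def add_overlap(chunks, overlap=10, sep=" "):
    """Prefix every chunk after the first with the tail of the previous chunk.

    Hypothetical helper: the name, signature, and separator are assumptions.
    """
    if not chunks:
        return []
    result = [chunks[0]]  # the first chunk has no predecessor, so it is unchanged
    for prev, cur in zip(chunks, chunks[1:]):
        # Guard against overlap == 0: prev[-0:] would return the whole string.
        tail = prev[-overlap:] if overlap > 0 else ""
        result.append(tail + sep + cur if tail else cur)
    return result


# Assumed starting point: one sentence per chunk, no overlap.
chunks = ["Acme was founded in 1958.", "The company makes running shoes."]
print(add_overlap(chunks, overlap=10))
# ['Acme was founded in 1958.', 'd in 1958. The company makes running shoes.']
```

The tail is taken from the original previous chunk rather than from the already-overlapped one, so the overlap does not grow as it moves through a longer list. A 10-character window cuts mid-word ("d in 1958."), which is what the exercise asks for; in practice an overlap measured in whole sentences or tokens may retrieve better.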