CSV — the format every AI dataset and spreadsheet export uses

Pull any dataset off Kaggle, export anything from Notion or Salesforce, dump any database table to a file, and you get a CSV. Cursor ships CSV-handling code constantly: read this export, filter these rows, write a cleaned version. Knowing the one right shape of that code saves you from the wrong ones AI tends to produce.

The right shape is csv.DictReader. It reads each row as a dict keyed by the header column names. You don't index by position (row[2]), you index by column name (row["age"]), which means a new column showing up next month doesn't break your code.

The mental model

A CSV file is a plain text file where each line is a row and commas separate the fields. The first line is usually the header — column names — and every line after that is data:

name,role,age
maya,admin,29
marcus,viewer,34

csv.DictReader reads the first line, remembers the column names, then yields each subsequent line as a dict mapping column name to string value:

{"name": "maya", "role": "admin", "age": "29"}
{"name": "marcus", "role": "viewer", "age": "34"}

Three things to internalize:

All values are strings. Even "29". CSV has no type system — it's text. If you need an int, you int(row["age"]) yourself.
You iterate the reader. It doesn't load the whole file into memory. for row in reader: streams one row at a time, which means it works on a 2GB CSV the same way it works on a 50-line file.
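You need newline="" on the open call. This is the gotcha. Without it, Python's own line-ending translation runs before the csv module sees the data: line breaks inside quoted fields get rewritten on read, and on Windows every row you write ends in \r\r\n, which shows up as a blank row after each record. Always include it. Always.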
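The three points above can be sketched together in one small function. This is a minimal illustration, not the lesson's own code: the function name average_age, the SAMPLE data, and the temporary file location are assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data, same shape as the example above.
SAMPLE = "name,role,age\nmaya,admin,29\nmarcus,viewer,34\n"


def average_age(csv_path):
    """Stream the file one row at a time and average the age column."""
    total = 0
    count = 0
    # newline="" leaves line endings for the csv module to handle.
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # one dict per data row, never the whole file
            total += int(row["age"])   # values arrive as str, so convert explicitly
            count += 1
    return total / count if count else 0.0


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.csv"
    path.write_text(SAMPLE)
    result = average_age(path)

print(result)  # 31.5
```

Only the running total and count stay in memory, so the same loop should behave the same way on a much larger file.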

A worked example

The editor on the right writes a tiny CSV and reads it back:

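import csv
from pathlib import Path

# Write a three-line CSV: one header row, two data rows.
Path("/tmp/users.csv").write_text(
    "name,role,age\n"
    "maya,admin,29\n"
    "marcus,viewer,34\n"
)

# newline="" hands line endings to the csv module untouched.
with open("/tmp/users.csv", newline="") as f:
    # DictReader consumes the header, then yields one dict per data row.
    for row in csv.DictReader(f):
        print(row["name"], "->", row["role"])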

The with block opens the file for reading. csv.DictReader(f) wraps the file handle and yields one dict per data row. The loop prints name -> role for each. Output:

maya -> admin
marcus -> viewer

pathlib's read_text doesn't fit this case because csv.DictReader wants an iterable of lines, such as an open file, not one big string (hand it a string and it iterates character by character). So we drop back to open(...) here; that's the only time in this chapter you'll see the old open shape. Remember newline="".
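If the CSV text is already in memory, or you would rather stay in pathlib, two standard-library alternatives give DictReader the line-yielding object it wants. This is a hedged sketch: the variable names and the temporary directory are assumptions for illustration, and the lesson itself sticks with open(...).

```python
import csv
import io
import tempfile
from pathlib import Path

text = "name,role,age\nmaya,admin,29\nmarcus,viewer,34\n"

# Option 1: the CSV is already a string (for example, a downloaded body).
# io.StringIO wraps it in a file-like object that yields lines.
rows_from_string = list(csv.DictReader(io.StringIO(text, newline="")))

# Option 2: stay in pathlib. Path.open() takes the same newline argument
# as the built-in open().
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.csv"
    path.write_text(text)
    with path.open(newline="") as f:
        rows_from_path = list(csv.DictReader(f))

print(rows_from_string == rows_from_path)  # True
print(rows_from_path[0]["name"])           # maya
```

Both routes produce the same dicts; which one reads better depends on where the data comes from.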

Where AI specifically gets this wrong

Three patterns to watch for in code Cursor writes you.

One: csv.reader instead of csv.DictReader. csv.reader gives you a list per row: ["maya", "admin", "29"]. You then have to remember that index 0 is name, index 1 is role, index 2 is age. That's brittle and unreadable. AI often defaults to csv.reader, the module's lower-level interface. Reach for DictReader unless you're explicitly working with a header-less file.
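To see why positional indexing is brittle, here is a small sketch comparing the two readers on two versions of the same export. The export strings and the helper names ages_by_position and ages_by_name are hypothetical, invented for this illustration.

```python
import csv
import io

# Two hypothetical exports of the same data; the second gains an id column.
old_export = "name,role,age\nmaya,admin,29\n"
new_export = "id,name,role,age\n7,maya,admin,29\n"


def ages_by_position(text):
    """csv.reader style: skip the header by hand, then index by position."""
    rows = csv.reader(io.StringIO(text, newline=""))
    next(rows)  # discard the header row
    return [row[2] for row in rows]


def ages_by_name(text):
    """csv.DictReader style: look the column up by its header name."""
    return [row["age"] for row in csv.DictReader(io.StringIO(text, newline=""))]


print(ages_by_position(old_export))  # ['29']
print(ages_by_position(new_export))  # ['admin']
print(ages_by_name(old_export))      # ['29']
print(ages_by_name(new_export))      # ['29']
```

The positional version does not crash on the new export. It quietly returns the role column instead of the age, which is the harder kind of bug to notice.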

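Two: forgetting newline="". Cursor will write with open(p) as f: csv.DictReader(f) and ship it. On a simple file it works. Then a quoted field contains a line break and the value comes back altered, or the same habit carries over to a csv writer and on a teammate's Windows machine the output has a blank row after every record. Add newline="" every single time you open a file for csv work.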
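The effect of a missing newline="" can be reproduced on any platform with a short sketch. The file name notes.csv and its contents are assumptions for the example, and passing newline="\r\n" when writing is used here only to stand in for the translation Windows applies by default.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.csv"

    # Reading: one quoted field contains a Windows line break (\r\n).
    path.write_bytes(b'name,note\r\nmaya,"line one\r\nline two"\r\n')

    with open(path) as f:  # default: Python translates \r\n to \n first
        translated = next(csv.DictReader(f))["note"]
    with open(path, newline="") as f:  # csv sees the original characters
        preserved = next(csv.DictReader(f))["note"]

    # Writing: csv.writer ends each row with \r\n on its own. If the file
    # layer also translates \n to \r\n, each row ends in \r\r\n.
    with open(path, "w", newline="\r\n") as f:  # simulates the Windows default
        csv.writer(f).writerow(["maya", "admin"])
    doubled = path.read_bytes()

    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["maya", "admin"])
    clean = path.read_bytes()

print(repr(translated))  # 'line one\nline two'
print(repr(preserved))   # 'line one\r\nline two'
print(doubled)           # b'maya,admin\r\r\n'
print(clean)             # b'maya,admin\r\n'
```

The doubled \r is what spreadsheet tools and some parsers display as an empty row between records.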

Three: forgetting the values are strings. int(row["age"]) > 30 works. row["age"] > 30 is "29" > 30 — a TypeError (Python 3 won't compare strings to ints). When you see CSV data being treated like numbers without an explicit int() or float(), that's a bug about to happen.
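The read, filter, write pattern with an explicit conversion might look like the sketch below. The file names, the over-30 rule, and the use of csv.DictWriter for the output are assumptions chosen for illustration rather than code from the lesson.

```python
import csv
import tempfile
from pathlib import Path

SAMPLE = "name,role,age\nmaya,admin,29\nmarcus,viewer,34\n"

# The failure mode first: comparing the raw string to an int raises TypeError.
age_text = "29"
try:
    _ = age_text > 30
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "users.csv"
    dst = Path(tmp) / "over_30.csv"
    src.write_text(SAMPLE)

    # newline="" on both the read side and the write side.
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        # Reuse the input header so the output keeps the same columns.
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if int(row["age"]) > 30:  # convert before comparing
                writer.writerow(row)

    with open(dst, newline="") as f:
        kept = [row["name"] for row in csv.DictReader(f)]

print(error_name)  # TypeError
print(kept)        # ['marcus']
```

A real export may contain blank or non-numeric ages, in which case int() raises ValueError; how to handle those rows is a decision this sketch leaves open.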

Run the editor. Pyodide's virtual filesystem makes the file write real, and DictReader streams it back as dicts.

