promptdojo_

The ML/data package map

When an AI writes data or ML code, it reaches for a handful of packages, and the imports tell you what kind of work is happening. You don't need to use them to read them — you need the map of what each is for.

  • numpy — fast numeric arrays and math. The base layer most others build on.
  • pandas — tables (dataframes): load a CSV, filter rows, group and aggregate. The spreadsheet of Python.
  • scikit-learn — classic ML on tabular data: train a decision tree, a logistic regression, a clustering model. Not deep learning.
  • torch (PyTorch) — deep learning: tensors, neural networks, training loops. Heavyweight; needed for custom nets.
  • transformers — use a pretrained LLM or NLP model without training one yourself.

Run the editor for the map.
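As a rough illustration, the map above can be written down as plain data. This is only a sketch: the names PACKAGE_MAP and describe are hypothetical helpers made up for this example, and the code imports none of the packages, so it runs without installing anything.

```python
# A plain-data version of the package map.
# Nothing here imports the packages themselves, so no installs are needed.
PACKAGE_MAP = {
    "numpy": "fast numeric arrays and math",
    "pandas": "tables (dataframes): load a CSV, filter, group, aggregate",
    "scikit-learn": "classic ML on tabular data (not deep learning)",
    "torch": "deep learning: tensors, neural networks, training loops",
    "transformers": "use a pretrained LLM or NLP model",
}


def describe(package: str) -> str:
    """Return a one-line purpose for a package, or a fallback if it is not on the map."""
    purpose = PACKAGE_MAP.get(package.lower())
    if purpose is None:
        return f"{package}: not on this map"
    return f"{package}: {purpose}"


# Print the whole map, one package per line.
for name in PACKAGE_MAP:
    print(describe(name))
```

The lookup lowercases its input, so describe("NumPy") and describe("numpy") find the same entry.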

Reading the import line

import pandas as pd at the top of an AI-written script tells you "this is table wrangling." from sklearn... import says "classic model on tabular data." import torch says "neural nets — this is heavier." Recognizing the package is recognizing the kind of program before you read a line of logic.
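One way to make "reading the import line" concrete is to let Python's standard ast module list the imports in a script and label each one. The sketch below is an assumption-laden illustration, not a prescribed tool: KIND_BY_IMPORT, imported_packages, and kinds_of_work are hypothetical names, and the sample script is made up. Note that scikit-learn is imported under the name sklearn.

```python
import ast

# Top-level import name -> kind of work it signals.
# scikit-learn is imported as "sklearn".
KIND_BY_IMPORT = {
    "numpy": "numeric arrays and math",
    "pandas": "table wrangling",
    "sklearn": "classic model on tabular data",
    "torch": "neural nets (heavier)",
    "transformers": "pretrained LLM/NLP model",
}


def imported_packages(source: str) -> list[str]:
    """Return the top-level package names imported in `source`, without duplicates."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" imports; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "sklearn.tree" -> "sklearn"
            if top not in found:
                found.append(top)
    return found


def kinds_of_work(source: str) -> dict[str, str]:
    """Label each imported package that appears on the map; ignore the rest."""
    return {
        pkg: KIND_BY_IMPORT[pkg]
        for pkg in imported_packages(source)
        if pkg in KIND_BY_IMPORT
    }


# A made-up script header, parsed as text. Nothing in it is actually imported.
script = """
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import os
"""
print(kinds_of_work(script))
```

Because the script is only parsed, never executed, this works even when pandas or scikit-learn are not installed.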

Why a builder cares

Picking the wrong package is a classic AI mistake — reaching for torch to filter a CSV (overkill; pandas does it in a line), or hand-rolling math that numpy does faster. Knowing the map lets you sanity-check "is this the right tool for the task?" You'll match tasks to packages next — no installs, just the mapping.
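The "right tool for the task?" check can be sketched as a simple lookup. Everything here is an assumption for illustration: the task labels, the RIGHT_TOOL table, and the sanity_check function are hypothetical, and real tasks will rarely match a fixed string this neatly.

```python
# Hypothetical task labels mapped to the package on the map that usually fits.
RIGHT_TOOL = {
    "filter a CSV": "pandas",
    "array math": "numpy",
    "train a decision tree": "scikit-learn",
    "train a custom neural net": "torch",
    "run a pretrained language model": "transformers",
}


def sanity_check(task: str, package_used: str) -> str:
    """Compare the package a script uses against the usual choice for the task."""
    expected = RIGHT_TOOL.get(task)
    if expected is None:
        return f"unknown task: {task!r}"
    if package_used == expected:
        return f"ok: {package_used} fits {task!r}"
    return f"check: {task!r} usually calls for {expected}, not {package_used}"


# The overkill case from the text: torch used to filter a CSV.
print(sanity_check("filter a CSV", "torch"))
print(sanity_check("filter a CSV", "pandas"))
```

A mismatch is a prompt to look closer, not proof of a mistake; a script may have a good reason to use a heavier package.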