Dimensionality and distance traps
Similarity systems compare records across the fields you choose. If you choose poor fields, the system can group records for the wrong reasons: a customer ID, random row order, or pasted boilerplate can drown out the evidence that matters.
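To make the trap concrete, here is a minimal sketch of how one poorly chosen field can decide who counts as a "nearest" record. The records, field names (customer_id, billing_words, login_words), and the use of plain Euclidean distance are all assumptions made up for illustration, not a description of any particular tool.

```python
import math

# Hypothetical survey records. customer_id is an arbitrary identifier;
# the other two fields count issue words found in each response.
records = {
    "a": {"customer_id": 1001, "billing_words": 3, "login_words": 0},
    "b": {"customer_id": 1002, "billing_words": 0, "login_words": 4},
    "c": {"customer_id": 5000, "billing_words": 3, "login_words": 0},
}

def distance(r1, r2, fields):
    """Euclidean distance between two records over the chosen fields."""
    return math.sqrt(sum((r1[f] - r2[f]) ** 2 for f in fields))

def nearest(name, fields):
    """Return the key of the record closest to `name` over the chosen fields."""
    others = [k for k in records if k != name]
    return min(others, key=lambda k: distance(records[name], records[k], fields))

all_fields = ["customer_id", "billing_words", "login_words"]
issue_fields = ["billing_words", "login_words"]

# With customer_id included, "a" lands next to "b" only because the IDs are adjacent.
nearest_with_id = nearest("a", all_fields)

# With customer_id dropped, "a" lands next to "c", which describes the same issue.
nearest_without_id = nearest("a", issue_fields)

print(nearest_with_id, nearest_without_id)
```

In this toy data, records "a" and "c" describe the same billing problem, yet the large gap between their IDs pushes them far apart until the ID field is left out of the comparison.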
Metadata is part of the tool, not decoration. For messy survey responses, useful fields might include product area, issue words, role, region, and date. The right metadata helps reviewers understand why two records landed near each other.
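One way a reviewer might use metadata is sketched below: given two records that a system placed near each other, list which fields match and which issue words overlap. The field names, sample values, and the explain_match helper are hypothetical, chosen only to mirror the survey example above.

```python
# Hypothetical metadata fields for messy survey responses.
METADATA_FIELDS = ["product_area", "role", "region"]

response_1 = {
    "product_area": "billing",
    "role": "admin",
    "region": "EMEA",
    "issue_words": {"invoice", "refund", "late"},
}
response_2 = {
    "product_area": "billing",
    "role": "end user",
    "region": "EMEA",
    "issue_words": {"refund", "duplicate", "invoice"},
}

def explain_match(r1, r2):
    """Summarize which metadata fields and issue words two records share."""
    same_fields = [f for f in METADATA_FIELDS if r1[f] == r2[f]]
    shared_words = sorted(r1["issue_words"] & r2["issue_words"])
    return {"same_fields": same_fields, "shared_issue_words": shared_words}

explanation = explain_match(response_1, response_2)
print(explanation)
```

A summary like this gives the reviewer something to check against the workplace question: here the two responses share a product area, a region, and two issue words, which is a more defensible reason for grouping them than an accident of row order.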
The beginner goal is to ask: what should count as similar for this workplace decision?
Answer that question before picking fields, so that each field in the comparison is there because it bears on the decision.