Why the narrow scope still matters
A controlled benchmark can still answer a real research question
This benchmark asks a focused question: can current models recover harmful meaning when each item consists of a lexical form, its linked senses, and one controlled speech clip? The narrow setting is still useful because it keeps ambiguity, label structure, and speech grounding visible within a single auditable release.
Interpretation guide.
Speech-related tasks on this site should be read within this controlled lexical setting. They are not meant to stand in for open conversational moderation or broad speaker-variation studies.