Benchmark object

What the release contains

  • 4,823 form rows with adjudicated coarse, fine, and severity targets
  • 6,365 linked sense rows that preserve source-level evidence
  • 4,823 canonical audio files, one per retained form
  • Manifest, statistics, result summary, and reproducibility documentation
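As a sketch of how the inventory above could be sanity-checked against the manifest, the snippet below tallies rows by type in a small in-memory CSV. The filename-free setup, the `kind` column, and the row IDs are illustrative assumptions; the release's actual manifest schema may differ.

```python
import csv
import io

# Illustrative manifest rows; the real release's manifest schema may differ.
SAMPLE_MANIFEST = io.StringIO(
    "row_id,kind\n"
    "f001,form\n"
    "f002,form\n"
    "s001,sense\n"
    "s002,sense\n"
    "s003,sense\n"
)

def count_rows_by_kind(handle):
    """Tally manifest rows by their 'kind' column (hypothetical schema)."""
    counts = {}
    for row in csv.DictReader(handle):
        counts[row["kind"]] = counts.get(row["kind"], 0) + 1
    return counts

counts = count_rows_by_kind(SAMPLE_MANIFEST)
# Against the real manifest, one would expect 4,823 form rows
# and 6,365 sense rows.
print(counts)
```

Run against the released manifest, the same tally should reproduce the 4,823 / 6,365 split listed above.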

Non-claims

What this site does not ask reviewers to assume

  • No claim of in-the-wild conversational coverage
  • No claim of speaker-transfer or speaker-generalization evaluation
  • No claim that the released severity score is a universal harm scale
  • No claim that the benchmark is already sufficient for real moderation deployment

Why the narrow scope still matters

A controlled benchmark can still answer a real research question

This benchmark asks a focused question: can current models recover harmful meaning when each item is a lexical form with linked senses and one controlled speech clip? That focused setting is still useful because it keeps ambiguity, label structure, and speech grounding visible inside one auditable release.

Interpretation guide. Speech-related tasks on this site should be read within this controlled lexical setting. They are not meant to stand in for open conversational moderation or broad speaker-variation studies.

Speech validity boundary

Controlled audio facts

  • All 4,823 audio files are stereo 16-bit PCM waveforms.
  • 3,222 clips are stored at 44.1 kHz and 1,601 at 48 kHz, together covering all 4,823 files.
  • Median duration is 2.0647 s and the 95th percentile is 3.1096 s.
  • Per-speaker clip counts range from 108 to 637 across speakers SPK01 through SPK11.
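The format facts above can be verified with Python's standard `wave` module. The sketch below synthesizes a short stereo 16-bit PCM clip at 44.1 kHz in memory and reads its header back; the in-memory buffer is a stand-in assumption, and an actual audit would open the released files from disk instead.

```python
import io
import struct
import wave

# Write a 0.5 s stereo 16-bit PCM clip at 44.1 kHz into an in-memory buffer
# (a stand-in for one of the release's audio files).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)        # stereo
    w.setsampwidth(2)        # 16-bit PCM (2 bytes per sample)
    w.setframerate(44100)    # one of the two sample rates in the release
    n_frames = 22050         # 0.5 s of silence
    w.writeframes(struct.pack("<" + "h" * (2 * n_frames), *([0] * 2 * n_frames)))

# Read the header back, as one would when auditing the released clips.
buf.seek(0)
with wave.open(buf, "rb") as r:
    channels = r.getnchannels()
    sampwidth = r.getsampwidth()
    rate = r.getframerate()
    duration = r.getnframes() / rate

# The release's stated invariants: stereo, 16-bit, 44.1 or 48 kHz.
assert channels == 2 and sampwidth == 2
assert rate in (44100, 48000)
print(f"{channels} ch, {8 * sampwidth}-bit, {rate} Hz, {duration:.2f} s")
```

Looping the read-back step over all 4,823 files would also let a reviewer recompute the median and 95th-percentile durations reported above.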

Reviewer-safe framing

Recommended reading of the benchmark claim

Use the benchmark as a testbed for lexicon-grounded harmful-semantic reasoning under controlled Cantonese speech grounding. Do not read it as a substitute for contextual utterance datasets, spontaneous speech corpora, or real-world moderation logs.