Release structure

Sheet roles and benchmark meaning

Release object | Key fields | Role in the benchmark
Form rows | class_1_final, class_2_final, severity_score_final, jyutping, ipa, primary_filename | Defines the adjudicated benchmark gold for form-level tasks and the form side of the speech-conditioned tasks.
Sense rows | class_1_source, class_2_source, severity_score_source, source_dictionary, source_status, definitions | Preserves source-linked ambiguity, provenance, and cross-source disagreement for sense-level evaluation and audit.
Canonical audio | Fxxxxx_SPKyy.wav naming pattern, one waveform per form | Provides the controlled speech grounding for audio-only, fusion, ASR-cascade, and retrieval tasks.
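
The audio naming pattern can be checked mechanically. The sketch below is illustrative only: it assumes that "xxxxx" stands for five digits identifying the form and "yy" for two digits identifying the speaker, which the table does not state explicitly. The names parse_audio_name and AudioName are hypothetical helpers, not part of the release.

```python
import re
from typing import NamedTuple, Optional

# Assumption: "xxxxx" is a five-digit form index and "yy" a two-digit
# speaker index. Adjust the pattern if the release uses other widths.
AUDIO_NAME = re.compile(r"F(\d{5})_SPK(\d{2})\.wav")


class AudioName(NamedTuple):
    form_id: str      # digits following "F"
    speaker_id: str   # digits following "SPK"


def parse_audio_name(filename: str) -> Optional[AudioName]:
    """Return the parsed parts, or None if the name does not fit the pattern."""
    match = AUDIO_NAME.fullmatch(filename)
    if match is None:
        return None
    return AudioName(form_id=match.group(1), speaker_id=match.group(2))
```

Under these assumptions, parse_audio_name("F00012_SPK03.wav") returns AudioName(form_id='00012', speaker_id='03'), and a name that does not fit the pattern returns None.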

Supervision contract

Form gold versus sense evidence

Form-level evaluation uses adjudicated gold targets. Sense-level evaluation uses source-linked targets. The form gold should not be reconstructed by mechanically unioning or voting over linked sense rows. The counts below show how often the senses linked to a form agree or diverge:

  • 507 polysemous forms keep one coarse and one fine label across senses.
  • 199 polysemous forms keep one coarse label but multiple fine labels.
  • 408 forms remain coarse-diverse across senses.
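
The three groups above can be described by a simple rule over a form's linked sense labels. The following is a minimal sketch for auditing purposes, not a way to derive form gold. It assumes that class_1_source holds the coarse label and class_2_source the fine label, which the text does not state explicitly. The function name ambiguity_profile and the returned strings are hypothetical.

```python
from typing import Iterable, Tuple


def ambiguity_profile(sense_labels: Iterable[Tuple[str, str]]) -> str:
    """Describe how the (coarse, fine) labels of a form's senses relate.

    Assumption: each pair is (class_1_source, class_2_source) of one
    linked sense row. The result is descriptive only; it is not the
    adjudicated form-level gold.
    """
    pairs = list(sense_labels)
    if not pairs:
        raise ValueError("a form needs at least one linked sense")
    coarse = {c for c, _ in pairs}
    fine = {f for _, f in pairs}
    if len(coarse) > 1:
        return "coarse-diverse"
    if len(fine) > 1:
        return "one coarse, multiple fine"
    return "one coarse, one fine"
```

For example, two senses labelled ("Benign", "a") and ("Political", "b") yield "coarse-diverse", where "a" and "b" are placeholder fine labels.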

Layer relationship

Why both layers are released

  • One lexical form can link to more than one sense.
  • Source dictionaries can differ in granularity and wording.
  • Form-level targets are finalized for scoring, while sense rows preserve ambiguity and provenance.

The form layer and sense layer are therefore complementary views of the same release rather than duplicate tables.

Label ontology

Coarse categories in the locked release

Coarse label | Forms | Share
Insult-discrimination | 2470 | 51.21%
Illicit-illegal | 1206 | 25.01%
Sexual-obscene | 602 | 12.48%
Benign | 358 | 7.42%
Political | 181 | 3.75%
Terror-extremism | 6 | 0.12%
Schema note. Source provenance, review flags, missingness indicators, and other audit fields are retained so that ambiguity and curation history remain inspectable. They are not part of the benchmark gold definition and are not unrestricted model inputs.
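
The share column of the coarse-category table can be recomputed from the form counts. The sketch below assumes that the six rows are exhaustive, so that the total number of forms is their sum. The variable names are illustrative.

```python
# Form counts per coarse label, copied from the table above.
counts = {
    "Insult-discrimination": 2470,
    "Illicit-illegal": 1206,
    "Sexual-obscene": 602,
    "Benign": 358,
    "Political": 181,
    "Terror-extremism": 6,
}

# Assumption: the six categories cover every form in the locked release.
total = sum(counts.values())

# Percentage share of each label, rounded to two decimals as in the table.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<24}{n:>6}{shares[label]:>8.2f}%")
```

Under that assumption the counts sum to 4823 forms, and the rounded shares match the published column. Because each share is rounded independently, the percentages add up to 99.99% rather than exactly 100%.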