Task matrix

Locked benchmark views

Task | Benchmark unit | Released input view | Target | Split sizes (train / validation / test) | Metric
--- | --- | --- | --- | --- | ---
Form coarse classification | Form | Released form-side lexical input | class_1_final | 3375 / 483 / 965 | Macro-F1
Form fine classification | Form | Released form-side lexical input | class_2_final | 3371 / 482 / 964 | Macro-F1
Severity prediction | Form | Released form-side lexical input | severity_score_final | 3375 / 483 / 965 | Spearman
Audio coarse classification | Form | Canonical waveform only | class_1_final | 3375 / 483 / 965 | Macro-F1
Fusion coarse classification | Form | Form-side lexical input plus canonical waveform | class_1_final | 3375 / 483 / 965 | Macro-F1
ASR cascade coarse | Form | Canonical waveform to transcript inside cascade | class_1_final | 3375 / 483 / 965 | Macro-F1
ASR cascade fine | Form | Canonical waveform to transcript inside cascade | class_2_final | 3371 / 482 / 964 | Macro-F1
Spoken retrieval | Form | Surface-form text query and canonical waveform | paired retrieval | 3375 / 483 / 965 | Mean R@1
Sense coarse classification | Sense | Released sense-entry text | class_1_source | 4464 / 631 / 1270 | Macro-F1
Sense fine classification | Sense | Released sense-entry text | class_2_source | 4451 / 634 / 1266 | Macro-F1
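The metric column names three scoring rules. The sketch below shows one plain-Python way to compute them; it is not the benchmark's official scorer, and the function names are hypothetical. It assumes that macro-F1 averages per-class F1 over the union of gold and predicted labels (the scikit-learn default), that Spearman uses average ranks for ties, and that "Mean R@1" is the mean of text-to-audio and audio-to-text Recall@1 over a square similarity matrix whose diagonal holds the true pairs. All three readings are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over the union of gold and predicted labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    labels = set(gold_counts) | set(pred_counts)
    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (gold count + predicted count).
    f1s = [2 * tp[c] / (gold_counts[c] + pred_counts[c]) for c in labels]
    return sum(f1s) / len(f1s)


def _average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def mean_recall_at_1(sim: Sequence[Sequence[float]]) -> float:
    """sim[i][j] scores text query i against waveform j; the true pair is (i, i).

    Returns the mean of text-to-audio and audio-to-text Recall@1.
    Ties resolve to the lowest index, as Python's max() does.
    """
    n = len(sim)
    if n == 0 or any(len(row) != n for row in sim):
        raise ValueError("sim must be a non-empty square matrix")
    t2a = sum(max(range(n), key=lambda j: sim[i][j]) == i for i in range(n)) / n
    a2t = sum(max(range(n), key=lambda i: sim[i][j]) == j for j in range(n)) / n
    return (t2a + a2t) / 2
```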

Field-level contract

Allowed versus diagnostic-only inputs

Field group | Form text / severity | Sense text | Audio / fusion / ASR / retrieval | Policy note
--- | --- | --- | --- | ---
Canonical surface form / headword strings | Allowed | Allowed | Allowed for the fusion text side and the retrieval text query; not ASR side information | Lexical-unit text input.
Linked definition text (ZH / EN) | Allowed if present in the released lexical export | Allowed | Allowed only on the fusion text side; not used by ASR or retrieval | Must be reported as released lexical input, not as hidden metadata.
Normalized phonetics (Jyutping, IPA) | Allowed | Not a sense-side benchmark input | Allowed only on the fusion text side | Part of the released form object.
Canonical waveform | No | No | Allowed for audio-only, fusion, ASR cascade, and retrieval | One controlled clip per released form.
ASR transcript | No | No | Allowed only inside the ASR cascade, after speech decoding | Not an extra released lexical side channel.
Source provenance, review flags, missingness indicators | Diagnostic only | Diagnostic only | Diagnostic only | Used for audit and slice analysis, never as benchmark scoring inputs.
Speaker IDs, filenames, split IDs, ordering fields | Forbidden | Forbidden | Forbidden | Identity and bookkeeping fields are never model inputs.
Gold-label and adjudication fields | Forbidden | Forbidden | Forbidden | Targets are used only for evaluation.
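The contract can be enforced mechanically with a per-view allowlist. In this sketch every field name (`headword`, `definition_zh`, `jyutping`, `speaker_id`, and so on) other than the target names from the task matrix, and every view key, is hypothetical, chosen only to mirror the rows of the table; the released export's actual column names may differ. Fields not on a view's allowlist are dropped, so identity, bookkeeping, and gold-label fields never reach a model, and diagnostic fields are returned separately for slice analysis.

```python
from typing import Mapping

# Hypothetical field names; map them to the real export columns before use.
TEXT_FIELDS = {"headword", "definition_zh", "definition_en"}
PHONETIC_FIELDS = {"jyutping", "ipa"}

ALLOWED_FIELDS: dict[str, set[str]] = {
    "form_text": TEXT_FIELDS | PHONETIC_FIELDS,  # form classification and severity
    "sense_text": set(TEXT_FIELDS),              # phonetics are not a sense-side input
    "audio": {"waveform"},
    "fusion": TEXT_FIELDS | PHONETIC_FIELDS | {"waveform"},
    "asr_cascade": {"waveform"},                 # the transcript is produced inside the cascade
    "retrieval": {"headword", "waveform"},       # surface-form query plus waveform
}
DIAGNOSTIC_FIELDS = {"source", "review_flags", "missing_mask"}
FORBIDDEN_FIELDS = {
    "speaker_id", "filename", "split", "row_order",
    "class_1_final", "class_2_final", "severity_score_final",
    "class_1_source", "class_2_source", "adjudication",
}

# Sanity check: no view may list a diagnostic or forbidden field as an input.
for _view, _fields in ALLOWED_FIELDS.items():
    assert not _fields & (DIAGNOSTIC_FIELDS | FORBIDDEN_FIELDS), _view


def model_inputs(record: Mapping[str, object], view: str) -> dict[str, object]:
    """Return only the fields `view` may consume; all other fields are dropped."""
    if view not in ALLOWED_FIELDS:
        raise ValueError(f"unknown benchmark view: {view!r}")
    allowed = ALLOWED_FIELDS[view]
    return {k: v for k, v in record.items() if k in allowed}


def diagnostic_fields(record: Mapping[str, object]) -> dict[str, object]:
    """Fields kept for audit and slice analysis, never passed to a model."""
    return {k: v for k, v in record.items() if k in DIAGNOSTIC_FIELDS}
```

Because definitions are only copied when the record actually carries them, the "allowed if present" rule for form-side text falls out of the filter without a special case.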

Split note

Fine-label exclusions

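Six form rows and fourteen sense rows are excluded from fine-label evaluation because they carry a small number of ultra-rare fine labels that are too infrequent to support a reviewer-stable train / validation / test split. This is why the fine-task splits total 4817 form rows (versus 4823 for the coarse form tasks) and 6351 sense rows (versus 6365 for sense coarse classification). The excluded rows remain part of the released inventory and are not deleted from the dataset.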
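One way to build a fine-task evaluation view without touching the inventory is to derive the kept and excluded row IDs from label counts. The threshold `min_count=3` (enough for one example per split) is a hypothetical choice, not the criterion used for this release, and `fine_task_view` is an illustrative name.

```python
from collections import Counter
from typing import Hashable, Mapping


def fine_task_view(
    fine_labels: Mapping[str, Hashable], min_count: int = 3
) -> tuple[list[str], list[str]]:
    """Split row IDs into (kept, excluded) for fine-label evaluation.

    A row is excluded when its fine label occurs fewer than `min_count` times
    in total. The input mapping is not modified, so excluded rows stay in the
    inventory. The default `min_count=3` is hypothetical.
    """
    counts = Counter(fine_labels.values())
    kept: list[str] = []
    excluded: list[str] = []
    for row_id, label in fine_labels.items():
        (kept if counts[label] >= min_count else excluded).append(row_id)
    return kept, excluded
```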

Speaker policy

Not speaker-disjoint

All benchmark splits are assigned by lexical unit, so the same speaker can appear in train, validation, and test. This overlap is intentional: the benchmark does not include a speaker-generalization evaluation, and the absence of a speaker-disjoint split should not be read as an omission.
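A split audit consistent with this policy fails if a lexical unit appears in more than one split, but only reports speakers that span splits. The `(unit_id, speaker_id, split)` row layout and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable


def audit_splits(rows: Iterable[tuple[str, str, str]]) -> list[str]:
    """Check a lexical-unit split and report speakers that span splits.

    rows: (unit_id, speaker_id, split) triples.
    Raises ValueError if any lexical unit appears in more than one split.
    Returns the sorted speaker IDs seen in more than one split; this overlap
    is expected under the policy and is not treated as an error.
    """
    unit_splits: dict[str, set[str]] = defaultdict(set)
    speaker_splits: dict[str, set[str]] = defaultdict(set)
    for unit_id, speaker_id, split in rows:
        unit_splits[unit_id].add(split)
        speaker_splits[speaker_id].add(split)
    leaked = sorted(u for u, s in unit_splits.items() if len(s) > 1)
    if leaked:
        raise ValueError(f"lexical units found in more than one split: {leaked}")
    return sorted(spk for spk, s in speaker_splits.items() if len(s) > 1)
```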