CantHarm Protocol

Task matrix

Locked benchmark views

Task	Benchmark unit	Released input view	Target	Split sizes	Metric
Form coarse classification	Form	Released form-side lexical input	`class_1_final`	3375 / 483 / 965	Macro-F1
Form fine classification	Form	Released form-side lexical input	`class_2_final`	3371 / 482 / 964	Macro-F1
Severity prediction	Form	Released form-side lexical input	`severity_score_final`	3375 / 483 / 965	Spearman
Audio coarse classification	Form	Canonical waveform only	`class_1_final`	3375 / 483 / 965	Macro-F1
Fusion coarse classification	Form	Form-side lexical input plus canonical waveform	`class_1_final`	3375 / 483 / 965	Macro-F1
ASR cascade coarse	Form	Canonical waveform to transcript inside cascade	`class_1_final`	3375 / 483 / 965	Macro-F1
ASR cascade fine	Form	Canonical waveform to transcript inside cascade	`class_2_final`	3371 / 482 / 964	Macro-F1
Spoken retrieval	Form	Surface-form text query and canonical waveform	paired retrieval	3375 / 483 / 965	Mean R@1
Sense coarse classification	Sense	Released sense-entry text	`class_1_source`	4464 / 631 / 1270	Macro-F1
Sense fine classification	Sense	Released sense-entry text	`class_2_source`	4451 / 634 / 1266	Macro-F1

Field-level contract

Allowed versus diagnostic-only inputs

Field group	Form text / severity	Sense text	Audio / fusion / ASR / retrieval	Policy note
Canonical surface form / headword strings	Allowed	Allowed	Allowed for fusion text side and retrieval text query; not ASR side info	Lexical-unit text input.
Linked definition text (ZH / EN)	Allowed if present in the released lexical export	Allowed	Allowed only on the fusion text side; not used by ASR or retrieval	Must be reported as released lexical input, not as hidden metadata.
Normalized phonetics (`jyutping`, `ipa`)	Allowed	Not a sense-side benchmark input	Allowed only on the fusion text side	Part of the released form object.
Canonical waveform	No	No	Allowed for audio-only, fusion, ASR cascade, and retrieval	One controlled clip per released form.
ASR transcript	No	No	Allowed only inside the ASR cascade after speech decoding	Not an extra released lexical side channel.
Source provenance, review flags, missingness indicators	Diagnostic only	Diagnostic only	Diagnostic only	Used for audit and slice analysis, not for benchmark scoring inputs.
Speaker ids, filenames, split ids, ordering fields	Forbidden	Forbidden	Forbidden	Identity and bookkeeping fields are never model inputs.
Gold-label and adjudication fields	Forbidden	Forbidden	Forbidden	Targets define evaluation only.
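
One way to enforce the contract above is an allow-list filter that runs before any record reaches a model. The sketch below is an illustration under stated assumptions: the task-family names, the group names, and every column name other than `jyutping`, `ipa`, and the target columns are hypothetical, and the per-cell nuances of the table (for example "if present in the released lexical export") are reduced to a simple allowed or not-allowed decision.

```python
# Field groups mapped to the task families that may use them as model inputs.
# Diagnostic-only and forbidden groups are absent, so they are never passed on.
ALLOWED_TASKS = {
    "surface_form": {"form_text", "sense_text", "fusion", "retrieval"},
    "definition_text": {"form_text", "sense_text", "fusion"},
    "phonetics": {"form_text", "fusion"},
    "waveform": {"audio", "fusion", "asr_cascade", "retrieval"},
    "asr_transcript": {"asr_cascade"},
}

# HYPOTHETICAL column names (except jyutping, ipa, and the target columns).
FIELD_GROUP = {
    "surface_form": "surface_form",
    "definition_zh": "definition_text",
    "definition_en": "definition_text",
    "jyutping": "phonetics",
    "ipa": "phonetics",
    "waveform": "waveform",
    "asr_transcript": "asr_transcript",
    "source_provenance": "diagnostic",
    "review_flag": "diagnostic",
    "speaker_id": "bookkeeping",
    "filename": "bookkeeping",
    "split_id": "bookkeeping",
    "class_1_final": "gold",
    "class_2_final": "gold",
    "severity_score_final": "gold",
}


def model_inputs(record, task):
    """Return only the fields that the contract allows for this task family."""
    inputs = {}
    for name, value in record.items():
        if name not in FIELD_GROUP:
            # Fail closed: an unmapped field is never silently passed through.
            raise KeyError(f"unmapped field: {name}")
        if task in ALLOWED_TASKS.get(FIELD_GROUP[name], ()):
            inputs[name] = value
    return inputs
```

Raising on unmapped fields is a deliberate choice in this sketch: a newly added column has to be classified before it can reach a model.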

Split note

Fine-label exclusions

Six form rows and fourteen sense rows are excluded from stable fine-task evaluation because their fine labels are too rare to support a reviewer-stable train / validation / test split. These rows remain in the released inventory and are not deleted from the dataset.
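
The exclusion counts can be cross-checked against the split sizes in the task matrix. The sketch below assumes that each coarse task covers every released row for its benchmark unit, so the coarse total minus the fine total gives the number of excluded rows. The dictionary keys and the function name are hypothetical.

```python
# Train / validation / test sizes copied from the task matrix.
SPLITS = {
    "form_coarse": (3375, 483, 965),
    "form_fine": (3371, 482, 964),
    "sense_coarse": (4464, 631, 1270),
    "sense_fine": (4451, 634, 1266),
}


def excluded_rows(unit):
    """Rows in the coarse task but not the fine task (ASSUMED full coverage)."""
    return sum(SPLITS[f"{unit}_coarse"]) - sum(SPLITS[f"{unit}_fine"])


print(excluded_rows("form"))   # 4823 - 4817 = 6
print(excluded_rows("sense"))  # 6365 - 6351 = 14
```

Only totals are compared here. The sense fine validation split (634) is larger than the sense coarse validation split (631), so the per-split differences are not all non-negative.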

Speaker policy

Not speaker-disjoint

All benchmark splits are lexical-unit based. Speaker overlap across train, validation, and test is expected by design; the absence of a speaker-disjoint split is intentional and should not be read as an omitted speaker-generalization benchmark.
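
An audit for this policy might look like the sketch below. It reads "lexical-unit based" as meaning that each lexical unit is assigned to exactly one split, which is an assumption, and the row keys `split`, `form_id`, and `speaker_id` are hypothetical. Speaker ids appear here for auditing only and remain forbidden as model inputs.

```python
from itertools import combinations


def split_overlap(rows, key):
    """Map each pair of splits to the set of `key` values they share."""
    by_split = {}
    for row in rows:
        by_split.setdefault(row["split"], set()).add(row[key])
    return {
        (a, b): by_split[a] & by_split[b]
        for a, b in combinations(sorted(by_split), 2)
    }


# Toy rows with HYPOTHETICAL keys, for illustration only.
rows = [
    {"split": "train", "form_id": "f1", "speaker_id": "s1"},
    {"split": "train", "form_id": "f2", "speaker_id": "s2"},
    {"split": "validation", "form_id": "f3", "speaker_id": "s1"},
    {"split": "test", "form_id": "f4", "speaker_id": "s2"},
]

# Lexical units should not leak across splits under the assumed reading.
assert not any(split_overlap(rows, "form_id").values())

# Shared speakers are reported, not rejected: overlap is expected by design.
shared_speakers = split_overlap(rows, "speaker_id")
```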