| Bundle identity | cantharm_release_manifest.json, cantharm_release_readme.md | 2026-04-02 public release | Defines the single authoritative public release line used by the paper and this site. |
| Form-level manual label review | label_review_flag | 1376 form rows | Shows that a substantial subset of released forms was manually reviewed rather than carried over untouched from the source. |
| Variant / normalization review | variant_rule_manual_review | 67 sense rows | Confirms that some variant-handling decisions were explicitly reviewed. |
| Multi-source provenance fields | source_dictionary, source_status | Present in released sense rows | Source information remains visible in the public workbook for audit and interpretation. |
| Form and sense layers | Finalized form targets and linked sense fields | Both released | The benchmark provides a finalized form layer for scoring and a linked sense layer for interpretation. |
| Pronunciation normalization | jyutping, jyutping_status, ipa | Present in released form rows | Pronunciation is part of the released benchmark object. |
| Documentation pages | Site pages, README, reproducibility note | Included | The public bundle includes written guidance for scope, protocol, access, ethics, and release interpretation. |
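
The row counts and field-presence claims in the table could be re-checked against an exported copy of the released sheets. The following is a minimal sketch, not the release's own tooling. It assumes the rows have already been loaded as dictionaries keyed by the column names listed above (for example with `csv.DictReader`), and it assumes review flags are stored as markers such as `1`, `true`, or `yes`; the actual release may encode them differently. The `form` column and the toy rows are hypothetical placeholders.

```python
from collections import Counter

# Assumption: these strings mark a flagged row; adjust to the release's actual encoding.
TRUTHY = {"1", "true", "yes", "y"}


def count_flagged(rows, flag_column):
    """Count rows whose flag column holds one of the assumed truthy markers."""
    return sum(
        1
        for row in rows
        if str(row.get(flag_column, "")).strip().lower() in TRUTHY
    )


def missing_fields(rows, required):
    """Return a Counter of how often each required column is empty or absent."""
    gaps = Counter()
    for row in rows:
        for name in required:
            if not str(row.get(name) or "").strip():
                gaps[name] += 1
    return gaps


# Toy rows standing in for an exported form sheet (placeholder values, not release data).
form_rows = [
    {"form": "x1", "label_review_flag": "1",
     "jyutping": "p1", "jyutping_status": "s1", "ipa": "i1"},
    {"form": "x2", "label_review_flag": "",
     "jyutping": "", "jyutping_status": "s2", "ipa": ""},
]

flagged = count_flagged(form_rows, "label_review_flag")
gaps = missing_fields(form_rows, ["jyutping", "jyutping_status", "ipa"])
print(flagged, dict(gaps))
```

Run against the real form sheet, `count_flagged(rows, "label_review_flag")` would be compared with the 1376 figure above, and the same helper applied to `variant_rule_manual_review` on the sense sheet with the 67 figure, provided the flag encoding assumption holds.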