v1.1-gb provides a complete second line of canonical recordings for the original CantHarm benchmark. The selected manifest covers all 4,823 original forms, with exactly one second canonical recording per form and a SHA256 hash for each audio file so that files can be inspected and verified.
What v1.1-gb adds
- One selected second canonical recording for each original form.
- 4,823 selected manifest rows and 4,823 unique form IDs.
- SHA256 hashes for every selected audio file.
- Recorded sample rate, channel count, and duration metadata.
- A clean candidate line restricted to the original lexical inventory: the selected manifest contains no new lexical entries.
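
As a rough illustration of how the per-file SHA256 hashes could be used, the sketch below recomputes a file's digest and compares it with a value taken from the manifest. The function names `sha256_of_file` and `matches_manifest_hash` are hypothetical, and the sketch assumes the manifest stores each hash as a hexadecimal string; the release itself may ship different tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large audio files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest_hash(path, expected_hex):
    """Compare a recomputed digest with the manifest value (assumed hex)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A mismatch would suggest that the local copy differs from the file that was hashed for the manifest, for example after an incomplete download.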
Manifest validation
| Check | Result |
|---|---|
| Selected rows | 4,823 |
| Unique form IDs | 4,823 |
| Audio readability | All selected files exist and are readable. |
| File integrity | Each selected file has a SHA256 hash. |
| Audio metadata | Sample rate, channel count, and duration are present. |
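
The checks in the table can be approximated locally with a short script. This is a minimal sketch, not the validation code used for the release: it assumes the manifest is a CSV file with the hypothetical column names `form_id`, `path`, `sha256`, `sample_rate`, `channels`, and `duration`, and that `path` is relative to a local audio directory. It only confirms that files exist and that fields are filled in; it does not decode audio.

```python
import csv
from pathlib import Path

# Hypothetical column names; adjust to the actual manifest header.
REQUIRED_FIELDS = ("form_id", "path", "sha256", "sample_rate", "channels", "duration")


def check_manifest(manifest_path, audio_root, expected_rows):
    """Summarise row count, ID uniqueness, file presence, and field completeness."""
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    form_ids = [(row.get("form_id") or "").strip() for row in rows]

    # Paths listed in the manifest that do not resolve to a file on disk.
    missing_files = [
        row.get("path") or ""
        for row in rows
        if not (Path(audio_root) / (row.get("path") or "")).is_file()
    ]

    # Form IDs of rows where any required field is absent or blank.
    incomplete_rows = [
        (row.get("form_id") or "").strip()
        for row in rows
        if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS)
    ]

    return {
        "rows": len(rows),
        "unique_form_ids": len(set(form_ids)),
        "row_count_ok": len(rows) == expected_rows,
        "ids_unique": len(set(form_ids)) == len(form_ids),
        "missing_files": missing_files,
        "incomplete_rows": incomplete_rows,
    }


# Example call (file and directory names are placeholders):
# report = check_manifest("manifest.csv", "audio", expected_rows=4823)
```

For a manifest matching the table, `rows` and `unique_form_ids` would both be 4,823, and `missing_files` and `incomplete_rows` would be empty.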
Candidate channels
- GitHub pre-release
- HuggingFace candidate branch
- Zenodo candidate record / DOI
- OSF candidate component
- Revised-candidate result tables
How it relates to v1.0
v1.0 remains the stable public release and archival DOI. v1.1-gb is a separately versioned candidate line that lets reviewers and users inspect the revised-audio candidate without changing the v1.0 citation or release package.