# CantHarm 2026-04-02 Public Release

This README documents the locked CantHarm 2026-04-02 public release and its public mirrors, license scope, citation, and contact/takedown channels.

Official website: https://cantharm.dataset.aidimsum.com/

## Release Identity

- Release version: 2026-04-02 locked public release
- Forms: 4,823
- Senses: 6,365
- Canonical audio clips: 4,823
- Speaker IDs: 11
- Audio design: one canonical audio clip per retained form
- Primary split: lexical-unit based; this release is not a speaker-generalization benchmark
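
The one-clip-per-form design implies that the number of audio clips equals the number of forms (4,823) and that no form appears twice. A minimal sketch of that check follows. It assumes you have already extracted one form identifier per clip from `cantharm_audio_inventory.csv`; the column holding that identifier is not documented here, so the function takes a plain list, and the name `check_one_clip_per_form` is hypothetical.

```python
from collections import Counter

# Count stated under "Release Identity".
EXPECTED_FORMS = 4823


def check_one_clip_per_form(form_ids, expected_forms=EXPECTED_FORMS):
    """Summarize whether each form identifier occurs exactly once.

    `form_ids` is an iterable with one entry per audio clip. How you read
    it from the inventory file is up to you; the schema is an assumption.
    """
    counts = Counter(form_ids)
    duplicated = sorted(form for form, n in counts.items() if n > 1)
    return {
        "clips": sum(counts.values()),
        "distinct_forms": len(counts),
        "duplicated_forms": duplicated,
        "ok": not duplicated and len(counts) == expected_forms,
    }
```

A result with `ok` set to `False` may only mean the identifiers were read from the wrong column, so inspect `duplicated_forms` before drawing conclusions.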

## Included Public Files

- `cantharm_release_manifest.json`
- `cantharm_release_workbook.xlsx`
- `cantharm_release_metadata_bundle.zip`
- `cantharm_dataset_statistics.csv`
- `cantharm_benchmark_highlights.csv`
- `cantharm_audio_inventory.csv`
- `cantharm_audio_spk01.zip` to `cantharm_audio_spk11.zip`
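
To confirm that a local download is complete, the file list above can be expanded and compared against a directory. This is a sketch, not an official tool: it assumes the speaker archives are numbered consecutively with two digits (`spk01` through `spk11`), and the function names are hypothetical.

```python
from pathlib import Path

# Number of speaker archives, taken from "Speaker IDs: 11".
SPEAKER_COUNT = 11

FIXED_FILES = [
    "cantharm_release_manifest.json",
    "cantharm_release_workbook.xlsx",
    "cantharm_release_metadata_bundle.zip",
    "cantharm_dataset_statistics.csv",
    "cantharm_benchmark_highlights.csv",
    "cantharm_audio_inventory.csv",
]


def expected_release_files():
    """Return every file name listed in this README.

    Assumption: intermediate archives follow the same zero-padded
    pattern as the two endpoints named in the list.
    """
    audio = [f"cantharm_audio_spk{i:02d}.zip" for i in range(1, SPEAKER_COUNT + 1)]
    return FIXED_FILES + audio


def missing_release_files(directory):
    """Return the expected file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in expected_release_files() if not (root / name).is_file()]
```

Calling `missing_release_files("path/to/download")` returns an empty list when all 17 files are present.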

## Not Included

- Materials outside the 2026-04-02 locked public release
- Non-release operational notes, non-release result artifacts, trained checkpoints, or non-release adjudication notes
- Redistribution rights for dictionary-derived definitions or source text; redistributing these requires source-specific permission

## Public Mirrors And Archives

- GitHub repository: https://github.com/GZU-JK/CantHarm
- GitHub release: https://github.com/GZU-JK/CantHarm/releases/tag/v2026.04.02
- Hugging Face dataset: https://huggingface.co/datasets/jk-gjom/CantHarm
- Zenodo record: https://zenodo.org/records/20511573
- Zenodo DOI: https://doi.org/10.5281/zenodo.20511573
- OSF project: https://osf.io/3uhpx/

## Safety Notice

CantHarm contains harmful and offensive lexical material and waveform audio. It is for research and benchmark analysis, not production moderation or punitive decision-making.

Contact/takedown: yueyu_dimsum@163.com. Backup: qijiayin@139.com.
