Research access and use boundary
CantHarm contains offensive lexical material and speaker audio. Access is intended for research, audit, documentation, and benchmark reproduction within the documented release scope.
Research use
- Dataset inspection, documentation review, and reproducibility checks.
- Evaluation of harmful lexical-form and sense-aware classification.
- Controlled audio, ASR cascade, multimodal, and spoken retrieval experiments within the benchmark design.
- Scholarly citation and derived analysis that respect the license and source-text caveats.
Non-use
- Do not use CantHarm as a standalone basis for punitive moderation decisions, user enforcement, or production safety systems.
- Do not use the audio for speaker recognition, biometric modeling, voice cloning, or re-identification.
- Do not infer speaker demographics from the audio or pseudonymous speaker IDs.
- Do not treat dictionary-derived definitions or source text as unconditionally open-license material.
Access boundaries
The release documents a controlled benchmark with one canonical clip per retained form. It does not claim conversational coverage, spontaneous speech coverage, production moderation readiness, or demographic representativeness.
Contact and takedown
For questions, takedown requests, harmful-content concerns, or audio concerns, contact yueyu_dimsum@163.com. Backup author contact: qijiayin@139.com.