Ethics and safety

This page sets out how the harmful lexical categories and voice recordings in the current release should be handled.

Harmful-content warning

Research material may be offensive

The release includes entries in the harmful, offensive, sexual-obscene, illicit, political, extremist, and insulting lexical categories. These entries are provided for benchmarking.

Audio privacy

The current release provides one canonical audio clip per retained form, recorded by 11 speakers identified only by pseudonymous IDs. Speaker consent covers release of the audio under CC BY 4.0, and the audio is accompanied by separate ethics and acceptable-use guidance.

The release makes no claims about speaker demographics. Pseudonymous speaker IDs are provided for inventory and audit purposes, not for identity inference.

Unacceptable uses

  • Voice cloning, speaker recognition, biometric modeling, re-identification, or demographic inference from audio.
  • Using harmful examples outside a research, audit, or documentation context.
  • Using labels or severity scores as a standalone basis for punitive user decisions.

Maintenance and concerns

Report concerns or takedown requests to yueyu_dimsum@163.com; backup: qijiayin@139.com.

This ethics guidance is documentation for responsible use. It is not intended as an additional downstream license restriction on top of CC BY 4.0.