Frequently asked release questions

Answers here apply to the 2026-04-02 locked public release.

Questions

Is this conversational speech?

No. It is a controlled lexical form-sense-audio benchmark with one canonical clip per retained form.

Are the primary splits separated by speaker?

No. The primary split is based on lexical units. Speaker generalization is not claimed for the current release.

Does this page describe the current release only?

Yes. These pages and mirrors describe the 2026-04-02 locked public release.

Are dictionary definitions fully open?

No. Derived labels, metadata, and source references are distributed as release metadata. Dictionary-derived definitions and source text are restricted or excluded unless source-specific permissions have been confirmed.

Has an archive DOI been minted?

Yes. The Zenodo DOI is 10.5281/zenodo.20511573.

Where are the public mirrors hosted?

The official website remains the primary access point. Public mirrors are available on GitHub, Hugging Face, Zenodo, and OSF.

Can audio be used for voice cloning?

No. Voice cloning, speaker recognition, biometric modeling, re-identification, and demographic inference are unacceptable uses.

Contact

For concerns, contact yueyu_dimsum@163.com (backup: qijiayin@139.com).