23.10% polysemous forms retained after manual verification
95.25% form-side Chinese definition coverage
97.68% form-side English definition coverage
8.46% coarse-diverse forms across source dictionaries
12.59% fine-diverse forms across source dictionaries
100% audio coverage over retained released forms

What this benchmark is

Controlled, interpretable, and release-closed

The benchmark consists of three linked components: a released form layer with adjudicated gold labels, a sense layer carrying source-level evidence for each form, and one canonical audio clip per retained form. Within a single locked data release, it supports form-level text classification, sense-level classification, audio-only coarse classification, controlled text-audio fusion, and additional speech-conditioned benchmark views.
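Text-audio fusion can be implemented in several ways, and this page does not specify which one the benchmark uses. As a minimal sketch, assuming probability-level late fusion, the per-class probabilities of a text model and an audio model are averaged with a weight `w` tuned on a development split. The names `late_fusion`, `tune_weight`, and `argmax`, and the default weight grid, are hypothetical choices for illustration.

```python
from typing import Sequence


def late_fusion(p_text: Sequence[float], p_audio: Sequence[float], w: float) -> list[float]:
    """Convex combination of per-class probabilities: w * text + (1 - w) * audio."""
    if len(p_text) != len(p_audio):
        raise ValueError("text and audio models must score the same label set")
    if not 0.0 <= w <= 1.0:
        raise ValueError("fusion weight w must lie in [0, 1]")
    return [w * t + (1.0 - w) * a for t, a in zip(p_text, p_audio)]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def tune_weight(dev_text, dev_audio, dev_gold, grid=None):
    """Grid-search w on a development split and return (best_w, best_accuracy).

    Accuracy keeps the sketch short; a Macro-F1 criterion would match the
    metric reported for the fusion task.
    """
    grid = grid if grid is not None else [i / 10 for i in range(11)]
    best_w, best_acc = grid[0], -1.0
    for w in grid:
        preds = [argmax(late_fusion(t, a, w)) for t, a in zip(dev_text, dev_audio)]
        acc = sum(p == g for p, g in zip(preds, dev_gold)) / len(dev_gold)
        if acc > best_acc:  # strict '>' keeps the first (smallest) best weight
            best_w, best_acc = w, acc
    return best_w, best_acc
```

A convex weight keeps the fused scores on the same scale as each input, so `w = 1` recovers the text-only model and `w = 0` the audio-only model, which makes the fusion gain easy to read against both single-modality baselines.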

What this benchmark is not

Not a broad spoken moderation crawl

  • Not an in-the-wild conversational speech corpus
  • Not a speaker-generalization benchmark or a speaker-disjoint split design
  • Not a deployment-ready moderation standard
  • Not a dataset whose form-level gold labels should be inferred by flattening raw sense-level evidence
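Form gold labels are adjudicated at the form level, so they need not equal any simple aggregate of sense labels. The hypothetical audit below makes the risk concrete: it applies a naive majority vote over each form's sense labels (the shortcut the release rules out) and lists the forms where that shortcut would contradict the released gold. The record fields and label values are invented for illustration and are not the release schema.

```python
from collections import Counter


def flatten_sense_labels(sense_labels: list[str]) -> str:
    """The shortcut the release warns against: majority vote over a form's sense labels.

    Ties go to the label encountered first (Counter.most_common ordering).
    """
    if not sense_labels:
        raise ValueError("form has no sense labels to flatten")
    return Counter(sense_labels).most_common(1)[0][0]


def audit_flattening(forms: dict[str, dict]) -> list[str]:
    """Return ids of forms whose flattened label disagrees with the adjudicated form gold."""
    return [
        form_id
        for form_id, record in forms.items()
        if flatten_sense_labels(record["sense_labels"]) != record["form_gold"]
    ]


# Hypothetical records: field names and label values are illustrative only.
forms = {
    "f1": {"form_gold": "offensive", "sense_labels": ["neutral", "neutral", "offensive"]},
    "f2": {"form_gold": "neutral", "sense_labels": ["neutral"]},
}
print(audit_flattening(forms))  # ['f1']
```

Any form returned by such an audit is one where a model trained on flattened sense evidence would be learning a label that the adjudicated gold does not support.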

Start here

Where to read first

  1. Open Scope for the benchmark boundary and non-claims.
  2. Open Schema for form-versus-sense definitions and the supervision contract.
  3. Open Protocol for allowed inputs, splits, metrics, and speech policies.
  4. Open Downloads for the actual release files and bundle packages.

Release completeness

What the site closes explicitly

  • Public workbook alias and manifest-backed release identity
  • Readable public pages for scope, QC, ethics, access, and reproducibility
  • Download hub with stable public filenames and speaker-packaged audio
  • Benchmark summary aligned with the manuscript and public release files

Stable benchmark anchors

Representative benchmark summary

The overview highlights the main benchmark results summarized in the manuscript and public release notes.

Task                          Metric    Best configured model             Value
Form coarse classification    Macro-F1  XLM-R                             0.5353
Form fine classification      Macro-F1  Hierarchical TF-IDF + Linear SVM  0.4594
Severity prediction           Spearman  char TF-IDF + Ridge               0.6037
Fusion coarse classification  Macro-F1  Tuned late fusion                 0.5415
Sense coarse classification   Macro-F1  TF-IDF + LogReg                   0.5033
Sense fine classification     Macro-F1  TF-IDF + LogReg                   0.4057
Reading note. This table presents the representative benchmark results highlighted on this site. Full task definitions and release files are available on the Protocol and Downloads pages.
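The results table reports Macro-F1 for the classification tasks and a Spearman correlation for severity prediction. The stdlib-only sketch below shows what those two numbers measure. It assumes Macro-F1 averages over every label that appears in either the gold labels or the predictions (the default behaviour of scikit-learn's `f1_score` with `average="macro"`) and that tied values receive averaged ranks for Spearman; the release's own scoring scripts may fix the label inventory differently.

```python
import math
from collections.abc import Hashable, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    f1s = []
    for c in set(gold) | set(pred):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        # Denominator is positive because c occurs in gold or pred at least once.
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1s) / len(f1s)


def average_ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rg, rp = average_ranks(gold), average_ranks(pred)
    n = len(rg)
    mg, mp = sum(rg) / n, sum(rp) / n
    cov = sum((a - mg) * (b - mp) for a, b in zip(rg, rp))
    var_g = sum((a - mg) ** 2 for a in rg)
    var_p = sum((b - mp) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("Spearman's rho is undefined for a constant sequence")
    return cov / math.sqrt(var_g * var_p)


print(round(macro_f1(["a", "b", "a", "b"], ["a", "b", "b", "b"]), 4))  # 0.7333
print(spearman([1, 2, 3, 4], [2, 4, 6, 9]))  # 1.0
```

Because Macro-F1 weights every class equally, a rare fine-grained class counts as much as a frequent one, which is one reason fine-level scores sit below coarse-level scores in the table. Spearman only compares orderings, so a severity model is rewarded for ranking items correctly even if its raw scores are miscalibrated.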

Download Hub

Actual release files, not just descriptions

Open the download hub for the public workbook alias, release manifest, statistics, benchmark highlights, metadata bundle zip, and the speaker-packaged audio downloads.

Protocol

Clear task and input contract

Input availability, forbidden fields, split policy, and benchmark interpretations are centralized in one place.
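Because the protocol centralizes allowed inputs and forbidden fields, a data loader can enforce that contract mechanically instead of relying on convention. A minimal sketch, assuming records arrive as dictionaries; `select_inputs` and the field names in `FORBIDDEN_FIELDS` are hypothetical placeholders, not the release's actual schema.

```python
from collections.abc import Iterable, Mapping

# Hypothetical field names; the real allowed and forbidden fields are defined on the Protocol page.
FORBIDDEN_FIELDS = frozenset({"form_gold_coarse", "form_gold_fine", "severity_gold"})


def select_inputs(
    record: Mapping[str, object],
    allowed: Iterable[str],
    forbidden: frozenset = FORBIDDEN_FIELDS,
) -> dict:
    """Return only the allowed input fields of a record, refusing any forbidden one."""
    allowed = set(allowed)
    leaked = allowed & forbidden
    if leaked:
        raise ValueError(f"forbidden fields requested as model input: {sorted(leaked)}")
    missing = allowed - set(record)
    if missing:
        raise KeyError(f"record lacks allowed fields: {sorted(missing)}")
    return {name: record[name] for name in sorted(allowed)}


record = {"form_text": "example", "form_gold_coarse": "label"}
print(select_inputs(record, ["form_text"]))  # {'form_text': 'example'}
```

Failing loudly when a forbidden field is requested turns an accidental label leak into an immediate error rather than an inflated score.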

Package Identity

Paper truth and website truth aligned

Release snapshot, workbook hash, exclusions, and public package names are all tied back to the same locked release contract.
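A workbook hash only helps if downloaded files are checked against it. Assuming the release manifest records SHA-256 digests in a JSON object (the `sha256` key, the layout, and both function names are hypothetical; check the actual manifest on the Downloads page), a local integrity check could look like this:

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files are never loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest_path: Path, root: Path) -> dict[str, bool]:
    """Map each file listed in the manifest to True if it exists and its digest matches.

    Assumes a hypothetical layout: {"sha256": {"<relative filename>": "<hex digest>"}}.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    results = {}
    for name, expected in manifest["sha256"].items():
        path = root / name
        results[name] = path.is_file() and sha256_of(path) == expected.lower()
    return results
```

For example, `verify_against_manifest(Path("release_manifest.json"), Path("downloads"))` (both paths hypothetical) returns one boolean per listed file; any `False` marks a file that is missing or differs from the locked release and should be downloaded again.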