Revised-candidate final-test-once results

Transparent v1.0-versus-v1.1 candidate comparison for the versioned revised-audio line.

The v1.1-gb candidate result table compares the submitted v1.0 benchmark scores with validation-frozen, final-test-once candidate scores from the revised-audio workspace. It highlights where the revised candidate strengthens the table and where v1.0 remains the stronger retained reference.

Versioning note. These results are published as candidate evidence and reported separately from the v1.0 benchmark table.

Main result table

| Task | Metric | Submitted v1.0 score | v1.1 candidate score | Display decision |
| --- | --- | --- | --- | --- |
| form_coarse | macro-F1 | 0.535307 | 0.566938 | v1.1 candidate stronger; report separately. |
| form_fine | macro-F1 | 0.459412 | 0.435410 | v1.0 retained as stronger reference. |
| severity_form | Spearman | 0.603658 | 0.616618 | v1.1 candidate stronger; report separately. |
| fusion_coarse | macro-F1 | 0.541452 | n/a | No v1.1 candidate run; v1.0 retained. |
| sense_coarse | macro-F1 | 0.503278 | 0.475998 | v1.0 retained as stronger reference. |
| sense_fine | macro-F1 | 0.405704 | 0.517665 | v1.1 candidate stronger; report separately. |
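The display decisions in the main result table are consistent with a simple higher-score-wins comparison. The following is a minimal sketch of that reading, not the project's actual tooling: the names `ROWS` and `display_decision` are hypothetical, and the handling of exact ties (v1.0 retained) is an assumption, since the table contains no tied rows.

```python
from typing import Optional

# Scores copied from the main result table:
# (task, metric, submitted v1.0 score, v1.1 candidate score or None if no run).
ROWS = [
    ("form_coarse", "macro-F1", 0.535307, 0.566938),
    ("form_fine", "macro-F1", 0.459412, 0.435410),
    ("severity_form", "Spearman", 0.603658, 0.616618),
    ("fusion_coarse", "macro-F1", 0.541452, None),
    ("sense_coarse", "macro-F1", 0.503278, 0.475998),
    ("sense_fine", "macro-F1", 0.405704, 0.517665),
]


def display_decision(v10: float, v11: Optional[float]) -> str:
    """Return the display decision for one task (hypothetical rule).

    Assumption: the higher score wins, and a tie keeps v1.0.
    Both metrics in the table (macro-F1, Spearman) are higher-is-better.
    """
    if v11 is None:
        return "No v1.1 candidate run; v1.0 retained."
    if v11 > v10:
        return "v1.1 candidate stronger; report separately."
    return "v1.0 retained as stronger reference."


for task, metric, v10, v11 in ROWS:
    # Signed difference (v1.1 minus v1.0), or "n/a" when there is no v1.1 run.
    delta = "n/a" if v11 is None else f"{v11 - v10:+.6f}"
    print(f"{task:<14} {metric:<9} delta={delta:>9}  {display_decision(v10, v11)}")
```

Running the loop prints one line per task with the signed score difference and the decision string, which can be checked against the "Display decision" column.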

Interpretation

Read directly from the main result table: the v1.1 candidate is stronger on three tasks (form_coarse, severity_form, sense_fine), v1.0 remains the stronger reference on two (form_fine, sense_coarse), and fusion_coarse has no v1.1 candidate run, so v1.0 is retained there.

Additional task-coverage results

These rows cover additional benchmark usage scenarios and are best treated as appendix or task-coverage evidence.

| Task | Metric | Validation | Final test | Display policy |
| --- | --- | --- | --- | --- |
| form_binary_label_retrieval | MRR | 0.930379 | 0.930855 | appendix/task-coverage table |
| sense_binary_label_retrieval | MRR | 0.929527 | 0.919798 | appendix/task-coverage table |
| form_coarse_label_retrieval | MRR | 0.743787 | 0.746519 | appendix table |
| sense_coarse_label_retrieval | MRR | 0.721390 | 0.733313 | appendix table |
| form_binary_harm | macro-F1 | 0.673161 | 0.642728 | appendix/task-coverage table |
| sense_binary_harm | macro-F1 | 0.704624 | 0.667597 | appendix/task-coverage table |
| form_severity_bin4 | macro-F1 | 0.661919 | 0.627395 | appendix/task-coverage table |
| sense_severity_bin4 | macro-F1 | 0.669793 | 0.651957 | appendix/task-coverage table |
| form_pairwise_severity | Spearman | 0.620881 | 0.572182 | appendix with caveat |
| sense_pairwise_severity | Spearman | 0.574755 | 0.563883 | appendix with caveat |
| form_polysemy | macro-F1 | 0.599712 | 0.586502 | appendix with caveat |
| form_source_coarse_diversity | macro-F1 | 0.588994 | 0.587166 | appendix with caveat |
| form_source_fine_diversity | macro-F1 | 0.571627 | 0.510565 | appendix with caveat |
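
Because each task-coverage row reports both a validation score and a final-test score, the gap between the two is easy to inspect. The sketch below is illustrative only: `COVERAGE_ROWS` and `val_test_gap` are hypothetical names, and the gap is not claimed to be the basis for the "Display policy" column, whose criteria this page does not state.

```python
# Scores copied from the task-coverage table: (task, validation, final test).
COVERAGE_ROWS = [
    ("form_binary_label_retrieval", 0.930379, 0.930855),
    ("sense_binary_label_retrieval", 0.929527, 0.919798),
    ("form_coarse_label_retrieval", 0.743787, 0.746519),
    ("sense_coarse_label_retrieval", 0.721390, 0.733313),
    ("form_binary_harm", 0.673161, 0.642728),
    ("sense_binary_harm", 0.704624, 0.667597),
    ("form_severity_bin4", 0.661919, 0.627395),
    ("sense_severity_bin4", 0.669793, 0.651957),
    ("form_pairwise_severity", 0.620881, 0.572182),
    ("sense_pairwise_severity", 0.574755, 0.563883),
    ("form_polysemy", 0.599712, 0.586502),
    ("form_source_coarse_diversity", 0.588994, 0.587166),
    ("form_source_fine_diversity", 0.571627, 0.510565),
]


def val_test_gap(validation: float, final_test: float) -> float:
    """Signed gap: positive means the final-test score exceeds validation."""
    return final_test - validation


# Sort from the largest drop on final test to the largest gain.
by_gap = sorted(COVERAGE_ROWS, key=lambda row: val_test_gap(row[1], row[2]))

for task, validation, final_test in by_gap:
    print(f"{task:<30} gap={val_test_gap(validation, final_test):+.6f}")
```

Sorting by the signed gap puts the tasks whose final-test score falls furthest below validation at the top of the printout.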