This paper reports on experiments that demonstrate the importance of feature selection and the need for generalization toward deepfake methods that deviate from the training distribution; it presents CtrSVDD, a benchmark dataset curated for controlled singing voice deepfake detection with enhanced controllability, diversity, and data openness.
This paper discusses the impact of recent advancements in singing voice synthesis and conversion, and the resulting need for singing voice deepfake detection (SVDD) models. It introduces CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals synthesized with cutting-edge methods from publicly accessible singing voice datasets, comprising 47.64 hours of bonafide and 260.34 hours of deepfake singing vocals and spanning 14 deepfake methods and 164 singer identities. The CtrSVDD benchmark dataset was curated for controlled SVDD with enhanced controllability, diversity, and data openness, with the hope that it will accelerate SVDD research. The paper describes the CtrSVDD dataset design, the baseline systems, and the experiments and results obtained with them. The CtrSVDD dataset, baseline system implementations, and trained model weights are publicly accessible.
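To make the dataset-plus-baseline setup concrete, below is a minimal, illustrative sketch of how bonafide/deepfake vocal clips might be loaded and passed through a spectral front end before classification. The protocol-file layout (`train.txt` with `<utt_id> <attack_id> <bonafide|deepfake>` lines), the `wavs/` directory, and the log-mel front end are assumptions for illustration only and are not taken from the CtrSVDD release or its baseline code.

```python
# Illustrative SVDD-style data pipeline (hypothetical layout, not the CtrSVDD format).
import torch
import torchaudio

def load_protocol(path):
    """Parse a hypothetical protocol file into (utt_id, label) pairs,
    mapping bonafide -> 1 and deepfake -> 0."""
    items = []
    with open(path) as f:
        for line in f:
            utt_id, _attack, label = line.strip().split()
            items.append((utt_id, 1 if label == "bonafide" else 0))
    return items

def frontend_logmel(wav_path, sample_rate=16000, n_mels=80):
    """One possible front-end feature: a log-mel spectrogram.
    The paper's baselines compare several front ends; this is only a stand-in."""
    wav, sr = torchaudio.load(wav_path)
    if sr != sample_rate:
        wav = torchaudio.functional.resample(wav, sr, sample_rate)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_mels=n_mels)(wav)
    return torch.log(mel + 1e-6)  # shape: (channels, n_mels, frames)

if __name__ == "__main__":
    # Iterate over the (assumed) protocol and extract features per utterance.
    for utt_id, label in load_protocol("train.txt"):
        feats = frontend_logmel(f"wavs/{utt_id}.flac")
        print(utt_id, label, tuple(feats.shape))
```

In a full baseline, features like these would feed a classifier trained on the structured train/dev/eval split; the sketch stops at feature extraction to keep the example self-contained.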