High-Volume Fiction and Non-Fiction Longform Content Creation
The Challenge
Our client set out to commission ten million words of original longform content across eight languages, spanning both fiction and non-fiction genres. Each piece needed to meet specific requirements for length, plot structure, and character dialogue, with a strong emphasis on emotional resonance and reader engagement. The ultimate goal was to record the content and use it to train expressive text-to-speech (TTS) voices.
The Solution
The DataForce team designed a customized process to produce large volumes of high-quality, original content. This entailed:
- Dedicated Global Team
  - Assembled a team from DataForce’s worldwide offices to provide real-time support in each language’s time zone, ensuring around-the-clock progress to meet demanding timelines.
- Streamlined Writer Qualification and Training
  - Developed a scalable, multi-step qualification and training process that verified native-language proficiency, writing experience, and quality through sample reviews.
  - Refined this process through cross-team collaboration so that it remained efficient at the high writer volume required.
- Multipronged Approach to Detect AI-Generated Content
  - Continuously monitored evolving LLM capabilities to identify trends in AI-generated content early, ensuring that all submissions were human-generated.
  - Retrained the QA team as needed to stay ahead of developments in the latest LLMs.
- Quality Assurance and Performance Monitoring
  - Maintained close engagement with the QA team to monitor quality at the individual writer level.
  - Implemented a continuous feedback loop with writers to prevent quality dips as volumes scaled.
  - Validated quality through questionnaire-based testing.
- Customized Tech Workflow with Automations
  - Created a specialized schema on the DataForce platform for content submission.
  - Leveraged automations for text normalization, spelling and grammar checks, and file processing.
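As an illustration only, the text normalization step mentioned above could resemble the following minimal Python sketch. It is not DataForce's actual automation: the function names (`normalize_text`, `word_count`, `meets_minimum_length`) and the specific rules (Unicode NFC, whitespace cleanup, a whitespace-based word count) are assumptions chosen for the example.

```python
import re
import unicodedata

# Hypothetical rules for this sketch: runs of spaces, tabs, and
# non-breaking spaces collapse to one space; 3+ newlines collapse to 2.
_WHITESPACE = re.compile(r"[ \t\u00a0]+")
_BLANK_LINES = re.compile(r"\n{3,}")


def normalize_text(raw: str) -> str:
    """Return a cleaned copy of a submitted manuscript."""
    # Compose characters so that "e" + combining accent becomes "é".
    text = unicodedata.normalize("NFC", raw)
    # Use a single newline convention.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse in-line whitespace and trim each line.
    lines = [_WHITESPACE.sub(" ", line).strip() for line in text.split("\n")]
    text = "\n".join(lines)
    # Keep at most one blank line between paragraphs.
    return _BLANK_LINES.sub("\n\n", text).strip()


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumption that does not
    hold for languages written without spaces between words)."""
    return len(text.split())


def meets_minimum_length(text: str, minimum_words: int) -> bool:
    """Check a normalized submission against a minimum word count."""
    return word_count(normalize_text(text)) >= minimum_words


if __name__ == "__main__":
    sample = "Cafe\u0301  was\u00a0open.\r\n\r\n\r\n\r\nNext  line. "
    print(normalize_text(sample))
    print(word_count(normalize_text(sample)))
```

A check like `meets_minimum_length(text, 12500)` could then flag submissions that fall short of a length requirement before they reach human QA. Spelling and grammar checks would need language-specific tooling and are left out of this sketch.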
Results
- Achieved a 100% acceptance rate after a round of feedback on the pilot set.
- Met the original deadlines and later shortened turnaround times through efficiency gains.
- After successful project completion, the client requested an additional seven million words across five new languages.
- The client also commissioned an expanded English content project, featuring longform novellas (12,500+ words each) spanning genres such as romance, children’s literature, and self-help.
- The client now possesses millions of words of high-quality content, useful not only for training expressive text-to-speech voices, but also for a range of other features and services.
Thanks to DataForce’s focus on quality, scalable workflows, and agile methodologies, the client achieved its objectives. This case study exemplifies our ability to handle large-scale content creation projects with precision and efficiency.