Audio Collection for Toxic Speech Detection


The Challenge

Our client, an international technology company, was looking for a partner to assist with a large conversational speech data collection project. Toxic speech is a growing concern, driven by the rise of hate speech, online harassment, and other verbal attacks. The goal was to collect a minimum of 40 hours of highly toxic speech across two prescribed topics.

The Solution

Our proposed solution was to execute a moderated collection, both in person and remote, with more than 140 participants, each given real-life scenarios and prompts. We recorded participants in groups of one to four to encourage productive and genuine conversation. To ensure a robust and demographically diverse collection, we sourced participants aged 18 to 70 who varied in gender, education, and geographic location.

Working closely with the participants, we were able to collect 100 hours of data, exceeding our client's expectations. More than 40% of the recorded speech was toxic, which puts the toxic portion above the 40-hour minimum, and our client was very pleased with both the quality and diversity of the dataset.

This case study demonstrates the importance of accurate data collection for toxic speech detection technologies. With the rise of online hate speech, a diverse dataset is essential for identifying the nuances of harmful language and improving the detection of toxic speech in various contexts.

At DataForce, we pride ourselves on our ability to approach challenging projects with innovative solutions. Our success in this project illustrates our expertise in data collection and analysis. Through partnerships such as this, we can aid in the development of advanced technologies that empower individuals and organizations to combat the harmful effects of toxic speech.

DataForce has a global community of over 1,000,000 members and linguistic experts in over 250 languages. DataForce offers its own platform but can also work with client or third-party tools. This way, your data is always under control.