Crowd-Sourced Data
Crowd-sourced data leverages contributions from many individuals to build large datasets, primarily to overcome the cost, scale, and accessibility limits of traditional data collection. Current research focuses on improving data quality through iterative annotation strategies, quality-control measures at multiple stages of data acquisition, and algorithms for outlier detection and bias correction, often employing machine learning models such as neural networks (including deep mixture density networks and BERT variants) and Markov chains. These techniques have significant impact across diverse fields, enabling advances in language model benchmarking, 3D reconstruction, environmental modeling, and real-time crisis response by providing large-scale, diverse datasets that were previously unattainable through conventional means.
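As a minimal sketch of one quality-control step mentioned above (the papers listed below use their own, more sophisticated pipelines), crowd labels can be aggregated by majority vote and annotators whose answers rarely match the consensus can be flagged as outliers. The annotator names, labels, and the 0.5 agreement threshold here are illustrative assumptions, not taken from any of the cited work.

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate crowd labels for each item by majority vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in labels_per_item]

def annotator_agreement(annotations, consensus):
    """Fraction of items on which each annotator matches the consensus.

    annotations: dict mapping annotator id -> list of labels (one per item).
    """
    return {
        a: sum(l == c for l, c in zip(labels, consensus)) / len(consensus)
        for a, labels in annotations.items()
    }

def flag_outliers(agreement, threshold=0.5):
    """Return annotators whose consensus agreement falls below the threshold."""
    return [a for a, score in agreement.items() if score < threshold]

# Three hypothetical annotators label four items; "spam_bot" mostly
# disagrees with the others and should be flagged.
annotations = {
    "ann_1":    ["cat", "dog", "cat", "bird"],
    "ann_2":    ["cat", "dog", "dog", "bird"],
    "spam_bot": ["dog", "cat", "dog", "cat"],
}
items = list(zip(*annotations.values()))  # regroup labels per item
consensus = majority_vote(items)
agreement = annotator_agreement(annotations, consensus)
print(consensus)                 # majority label for each item
print(flag_outliers(agreement))  # low-agreement annotators
```

Real pipelines typically extend this with weighted voting, gold-standard check items, or model-based aggregation, but the agreement-against-consensus idea is the common core.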
Papers
Building a Luganda Text-to-Speech Model From Crowdsourced Data
Sulaiman Kagumire, Andrew Katumba, Joyce Nakatumba-Nabende, John Quinn
Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare
P. Barai, G. Leroy, P. Bisht, J. M. Rothman, S. Lee, J. Andrews, S. A. Rice, A. Ahmed