Paper ID: 2407.13071
Analysing the Public Discourse around OpenAI's Text-To-Video Model 'Sora' using Topic Modeling
Vatsal Vinay Parikh
The recent introduction of OpenAI's text-to-video model Sora has sparked widespread public discourse across online communities. This study aims to uncover the dominant themes and narratives surrounding Sora by conducting topic modeling analysis on a corpus of 1,827 Reddit comments from five relevant subreddits (r/OpenAI, r/technology, r/singularity, r/vfx, and r/ChatGPT). The comments were collected over a two-month period following Sora's announcement in February 2024. After preprocessing the data, Latent Dirichlet Allocation (LDA) was employed to extract four key topics: 1) AI Impact and Trends in Sora Discussions, 2) Public Opinion and Concerns about Sora, 3) Artistic Expression and Video Creation with Sora, and 4) Sora's Applications in Media and Entertainment. Visualizations including word clouds, bar charts, and t-SNE clustering provided insights into the importance of topic keywords and the distribution of comments across topics. The results highlight prominent narratives around Sora's potential impact on industries and employment, public sentiment and ethical concerns, creative applications, and use cases in the media and entertainment sectors. While limited to Reddit data within a specific timeframe, this study offers a framework for understanding public perceptions of emerging generative AI technologies through online discourse analysis.
Submitted: May 30, 2024