Paper ID: 2411.08885 • Published Oct 26, 2024
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
Inaccuracies in polygraph tests often lead to wrongful convictions, false
information, and bias, all of which have significant consequences for both
legal and political systems. Recently, analyzing facial micro-expressions has
emerged as a method for detecting deception; however, current models have not
reached high accuracy and generalizability. This study aims to help remedy
these problems. The multimodal transformer architecture
used in this study improves upon previous approaches by using auditory inputs,
visual facial micro-expressions, and manually transcribed gesture annotations,
moving closer to a reliable non-invasive lie detection model. Visual and
auditory features were extracted using the Vision Transformer and openSMILE
models, respectively, and then concatenated with the transcriptions of
participants' micro-expressions and gestures. Various models were trained for
the classification of lies and truths using these processed and concatenated
features. The CNN Conv1D multimodal model achieved an average accuracy of
95.4%. However, further research is still required to create higher-quality
datasets and even more generalized models for more diverse applications.
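The fusion-and-classification pipeline described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the feature dimensions (a 768-dim ViT embedding, 88 openSMILE eGeMAPS-style functionals, a 300-dim transcript/gesture embedding), the single convolutional layer, and all weights are assumptions made for the sketch.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid-mode 1-D convolution.
    x: (channels_in, length); kernels: (channels_out, channels_in, k)."""
    c_out, c_in, k = kernels.shape
    length = x.shape[1] - k + 1
    out = np.zeros((c_out, length))
    for o in range(c_out):
        for i in range(length):
            out[o, i] = np.sum(kernels[o] * x[:, i:i + k]) + bias[o]
    return out

rng = np.random.default_rng(0)

# Hypothetical per-sample features (dimensions are assumptions):
visual = rng.standard_normal(768)  # e.g. a ViT embedding of facial frames
audio = rng.standard_normal(88)    # e.g. openSMILE acoustic functionals
text = rng.standard_normal(300)    # e.g. an embedding of gesture annotations

# Concatenate the modalities into one fused feature vector, as in the study.
fused = np.concatenate([visual, audio, text])  # shape: (1156,)

# One Conv1D layer + ReLU + global average pooling + a sigmoid head (sketch).
kernels = rng.standard_normal((8, 1, 5)) * 0.1
feat = np.maximum(conv1d(fused[None, :], kernels, np.zeros(8)), 0)  # (8, 1152)
pooled = feat.mean(axis=1)                                          # (8,)
w, b = rng.standard_normal(8) * 0.1, 0.0
p_lie = 1 / (1 + np.exp(-(pooled @ w + b)))  # predicted probability of "lie"
print(fused.shape, feat.shape, float(p_lie))
```

In practice the convolutional weights would be learned from labeled truth/lie samples; the sketch only shows the shape of the forward pass over the concatenated multimodal features.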