Can large language models help predict results from a complex behavioural science study?

Abstract

We tested whether large language models (LLMs) can help predict results from a complex behavioural science experiment. In study 1, we investigated the performance of the widely used LLMs GPT-3.5 and GPT-4 in forecasting the empirical findings of a large-scale experimental study of emotions, gender, and social perceptions. We found that GPT-4, but not GPT-3.5, matched the performance of a cohort of 119 human experts, with correlations of 0.89 (GPT-4), 0.07 (GPT-3.5) and 0.87 (human experts) between aggregated forecasts and realized effect sizes. In study 2, providing participants from a university subject pool the opportunity to query a GPT-4 powered chatbot significantly increased the accuracy of their forecasts. Results indicate promise for artificial intelligence (AI) to help anticipate—at scale and minimal cost—which claims about human behaviour will find empirical support and which ones will not. Our discussion focuses on avenues for human–AI collaboration in science.
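The headline numbers in the abstract (r = 0.89, 0.07, and 0.87) are correlations between aggregated forecasts and the realized effect sizes. As a rough illustration of that metric (not the authors' analysis code), the sketch below uses NumPy with made-up forecast and effect-size values to show how such a correlation can be computed once forecasts have been averaged across forecasters.

```python
# Illustrative sketch only: quantifying forecast accuracy as the Pearson
# correlation between aggregated forecasts and realized effect sizes.
# All numbers below are hypothetical placeholders, not study data.

import numpy as np

# Hypothetical forecasts of effect sizes (rows = forecasters, columns = effects).
forecasts = np.array([
    [0.10, 0.35, 0.60, 0.05],
    [0.15, 0.30, 0.55, 0.10],
    [0.05, 0.40, 0.65, 0.00],
])

# Hypothetical realized effect sizes, one per experimental effect.
realized = np.array([0.12, 0.33, 0.58, 0.07])

# Aggregate forecasts by averaging across forecasters, then correlate with outcomes.
aggregated = forecasts.mean(axis=0)
r = np.corrcoef(aggregated, realized)[0, 1]
print(f"Correlation between aggregated forecasts and realized effect sizes: r = {r:.2f}")
```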

Publication
Royal Society Open Science, 11


Francisco Cruz
Doctoral Student

Francisco Cruz is a doctoral student in social psychology at the Faculty of Psychology, University of Lisbon, under the supervision of Prof. André Mata (University of Lisbon) and Prof. Tania Lombrozo (Princeton University). He is currently visiting Princeton University as a research collaborator. His project explores why people are sceptical of psychology as a science and how to increase trust in psychological science. His research interests include lay beliefs about science (i.e., what people believe science can or cannot explain, and why), motivated beliefs in science (i.e., the contexts in which people are more prone to accepting scientific explanations), representation of social groups (i.e., how people integrate information to judge homogeneity vs. heterogeneity across group members), epistemic trespassing (i.e., when people pass judgment on domains beyond those in which they are experts), intuitive mind-body dualism (i.e., a natural tendency to see the world as split into material and immaterial parts), and face perception (i.e., the features driving the advantage in recall for own- vs. other-race faces). He is a Student Affiliate at the Center for the Science of Moral Understanding, an Author at CogBites, and an Opinion Editor at Cruamente.