22–25 Jul 2025
EAM2025
Atlantic/Canary timezone

Predicting Item Response Theory Parameters from the Semantic Space of Computational Language Models

25 Jul 2025, 09:30
15m
Faculty of Social Sciences and Communication (The Pyramid), Room 9

Speakers

Diego Iglesias (Universidad Autónoma de Madrid); Francisco José Abad García (Universidad Autónoma de Madrid); Miguel A. Sorrel (Universidad Autónoma de Madrid); Ricardo Olmos (Universidad Autónoma de Madrid)

Description

Parallel to the development of new technologies, computational language models have emerged as automated tools for analyzing semantic relationships between linguistic units. Due to their success in performing human-like tasks, such as vocabulary tests and sentiment analysis, interest in the practical applications of these models has grown exponentially, resulting in the development of larger models with enhanced predictive capabilities.
In this study, we examine whether the high-dimensional semantic space underlying computational language models, such as ChatGPT, can be used to predict item parameters. In ChatGPT, linguistic units are represented as n-dimensional embedding vectors, which can be manipulated through mathematical operations.
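As an illustration of the kind of mathematical operation that embedding vectors support, the following minimal sketch computes cosine similarity between vectors. The three 4-dimensional vectors are made-up toy values, not real model embeddings, and the names `vec_a`, `vec_b`, and `vec_c` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for embedding vectors (assumed values for illustration only;
# real embeddings would have many more dimensions).
vec_a = [0.9, 0.1, 0.3, 0.0]
vec_b = [0.8, 0.2, 0.4, 0.1]    # points in nearly the same direction as vec_a
vec_c = [-0.1, 0.9, -0.5, 0.7]  # points in a very different direction

print("similarity(a, b):", round(cosine_similarity(vec_a, vec_b), 3))
print("similarity(a, c):", round(cosine_similarity(vec_a, vec_c), 3))
```

Vectors that point in similar directions score close to 1, which is how semantic relatedness between linguistic units is typically read off such a space.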
We extracted embeddings for a pool of 220 items from an English vocabulary test. The loadings of each item in ChatGPT’s 1536-dimensional space were used as independent variables to predict the item's item response theory (IRT) parameters. The predictive accuracy of various machine learning models was evaluated using cross-validation and compared with human-expert ratings. Despite the relatively small size of the training set, preliminary results are promising (R² = 0.40). We discuss the potential of using larger datasets for training the predictive model and the promising role of generative artificial intelligence in creating large item pools with desirable psychometric properties at minimal cost.
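The general workflow described above (item embeddings as predictors, an item parameter as the target, accuracy estimated by cross-validation) can be sketched as follows. This is only a rough illustration under stated assumptions: the abstract does not say which machine learning models were used, so ridge regression is swapped in as one plausible choice; the data are random synthetic stand-ins, not the study's embeddings or parameters; and the function names, fold count, and penalty `alpha` are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression in dual form and predict for X_test.

    The dual form solves an n_items x n_items system, which is convenient
    when there are far more embedding dimensions than items.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    gram = Xc @ Xc.T
    dual = np.linalg.solve(gram + alpha * np.eye(len(y_train)), y_train - y_mean)
    weights = Xc.T @ dual
    return (X_test - x_mean) @ weights + y_mean


def cross_validated_r2(X, y, n_folds=5, alpha=1.0, seed=0):
    """Return R^2 computed from out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        preds[test_idx] = ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], alpha
        )
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumption): 220 items, 1536 embedding dimensions,
# and a made-up "difficulty" parameter loosely related to the embeddings.
rng = np.random.default_rng(42)
n_items, n_dims = 220, 1536
embeddings = rng.normal(size=(n_items, n_dims))
true_weights = rng.normal(size=n_dims) / np.sqrt(n_dims)
difficulty = embeddings @ true_weights + rng.normal(scale=0.5, size=n_items)

r2 = cross_validated_r2(embeddings, difficulty, n_folds=5, alpha=10.0)
print("Cross-validated R^2 on synthetic data:", round(float(r2), 3))
```

Because R² is computed only from predictions on held-out items, it reflects how well the model generalizes to unseen items rather than how well it fits the training set, which matters when predictors greatly outnumber items.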

Primary authors

Diego Iglesias (Universidad Autónoma de Madrid); Francisco José Abad García (Universidad Autónoma de Madrid); Miguel A. Sorrel (Universidad Autónoma de Madrid); Ricardo Olmos (Universidad Autónoma de Madrid)
