Assessing the Relational Abilities of Large Language Models and Large Reasoning Models

Matthias Raemaekers; Martin Finn; Jan De Houwer

doi:10.3390/bs16010045


Department of Experimental Clinical and Health Psychology, Ghent University, 9000 Ghent, Belgium

* Author to whom correspondence should be addressed.

Behav. Sci. 2026, 16(1), 45; https://doi.org/10.3390/bs16010045

This article belongs to the Special Issue Advanced Studies in Human-Centred AI


Abstract

We assessed the relational abilities of two state-of-the-art large language models (LLMs) and two large reasoning models (LRMs) using a new battery of several thousand syllogistic problems, similar to those used in behavior-analytic tasks for relational abilities. To probe the models’ general (as opposed to task- or domain-specific) abilities, the problems involved multiple relations (sameness, difference, comparison, hierarchy, analogy, temporal and deictic), were specified between randomly selected nonwords, and varied in complexity (number of premises, inclusion of irrelevant premises) and format (whether a valid or an invalid conclusion was prompted). We also tested transformations of stimulus function. Our results show that the models generally performed well on this new task battery. The models did show some variability across relations and were, to a limited extent, affected by task variations. Model performance was, however, robust to randomization of premise order in a replication study. Our research provides a new framework for testing a core aspect of intellectual (i.e., relational) abilities in artificial systems; we discuss the implications of this framework and directions for future research.
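To make the task variations concrete, here is a minimal, hypothetical Python sketch of how a single comparison-relation item of this kind could be generated. The consonant-vowel nonword scheme, the "bigger than" wording, the function names, and the rule that irrelevant premises use nonwords disconnected from the premise chain are all assumptions for illustration. This is not the authors' generator or stimulus set.

```python
import random
from typing import Optional

# Hypothetical nonword scheme: uppercase consonant-vowel syllables (an assumption).
CONSONANTS = "bcdfghjklmnprstvz"
VOWELS = "aeiou"


def make_nonword(rng: random.Random, syllables: int = 2) -> str:
    """Build a pronounceable nonword such as 'BAKO' from CV syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables)).upper()


def make_comparison_problem(n_premises: int, n_irrelevant: int = 0,
                            valid: bool = True, shuffle: bool = False,
                            seed: Optional[int] = None) -> dict:
    """Hypothetical generator for one comparison ('bigger than') syllogism.

    n_premises   -- number of premises in the chain (complexity)
    n_irrelevant -- extra premises on unrelated nonwords (assumed design)
    valid        -- prompt a valid (answer Yes) or invalid (answer No) conclusion
    shuffle      -- randomize premise order
    """
    rng = random.Random(seed)
    # Draw distinct nonwords: a chain of n_premises + 1 terms, plus two per
    # irrelevant premise. A list keeps the order reproducible for a given seed.
    needed = n_premises + 1 + 2 * n_irrelevant
    words: list[str] = []
    while len(words) < needed:
        w = make_nonword(rng)
        if w not in words:
            words.append(w)
    chain, extra = words[:n_premises + 1], words[n_premises + 1:]

    # Relevant premises form a transitive chain: chain[0] > chain[1] > ...
    premises = [f"{a} is bigger than {b}." for a, b in zip(chain, chain[1:])]
    # Irrelevant premises relate nonwords that never appear in the chain,
    # so they cannot change the correct answer.
    for i in range(n_irrelevant):
        premises.append(f"{extra[2 * i]} is bigger than {extra[2 * i + 1]}.")
    if shuffle:
        rng.shuffle(premises)

    # By transitivity the first chain term is bigger than the last one.
    relation = "bigger" if valid else "smaller"
    question = f"Is {chain[0]} {relation} than {chain[-1]}?"
    return {"premises": premises, "question": question,
            "answer": "Yes" if valid else "No"}


if __name__ == "__main__":
    item = make_comparison_problem(3, n_irrelevant=1, valid=False,
                                   shuffle=True, seed=1)
    print("\n".join(item["premises"]))
    print(item["question"], "->", item["answer"])
```

Setting `shuffle=True` mirrors, in a simplified way, the premise-order randomization described for the replication study; the other relations (sameness, hierarchy, deictic, and so on) would need their own premise templates and answer rules, which this sketch does not attempt.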

Keywords:

large language models; reasoning models; relational reasoning; relational abilities index; transformation of function
