This study investigated how generative Artificial Intelligence (AI) systems, now increasingly integrated into public services, respond to different technical configurations, and how these configurations affect the perceived quality of their outputs. Drawing on an experimental evaluation of Govern-AI, a chatbot designed for professionals in the social, educational, and labor sectors, we analyzed the impact of the temperature parameter, which controls the degree of creativity and variability in the responses, on two key dimensions: accuracy and comprehensibility. The analysis was based on 8880 individual evaluations collected from five professional profiles. The findings revealed that (1) high-temperature responses were generally more comprehensible and better appreciated, yet less accurate in strategically sensitive contexts; (2) professional groups differed significantly in their assessments, with trade union representatives and regional policy staff expressing more critical views than the other groups; and (3) the type of question, whether operational or informational, significantly influenced perceived output quality. The study demonstrates that AI performance is far from neutral: it depends on technical settings, usage contexts, and the profiles of end users. Investigating these "behind-the-scenes" dynamics is essential for fostering informed governance of AI in public services, and for avoiding the risk of the technology functioning as an opaque black box within decision-making processes.
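The temperature parameter studied above is a standard sampling setting in generative language models. The abstract does not specify Govern-AI's underlying stack, so the following is only an illustrative sketch, with hypothetical logits, of the general mechanism: temperature rescales a model's raw scores before they are turned into a probability distribution over candidate tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into a probability distribution.

    Lower temperatures sharpen the distribution (more deterministic,
    more repeatable output); higher temperatures flatten it (more
    varied, 'creative' output). Illustrative only; not the study's code.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # near-deterministic sampling
high = softmax_with_temperature(logits, 1.5)  # flatter, more variable sampling

# The top-scoring token dominates at low temperature and loses
# probability mass at high temperature.
assert low[0] > high[0]
```

This is the trade-off the evaluation probes: at high temperature, output varies more between runs, which can read as fluent and engaging but raises the chance of drifting from the most accurate response.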