PAGURI: A User Experience Study of Creative Interaction with Text-to-Music Models
Abstract
1. Introduction
- Objectives: to investigate the creative interaction of musicians and music practitioners with a TTM system, focusing on prompt consistency, personalization, and system integration.
- Methodology: a mixed-method user study with 24 participants, combining Likert-scale ratings, open interviews, and thematic analysis.
- Contributions: empirical insights into user expectations and challenges with TTM models, an analysis of model personalization as a creative instrument, and recommendations for future human–AI co-creative system design.
2. Background
2.1. Text-to-Music Models
2.2. Personalization Techniques
2.3. Human–AI Interaction and Collaborative Creativity
2.4. Interfaces for Human–AI Interaction
3. Study Design and Method
3.1. Model and Personalization Technique
3.2. PAGURI Interface
3.3. Experiment Procedure
- Preliminary analysis: Participants were introduced to the study and asked to complete a brief questionnaire containing demographic questions (age, nationality, etc.). To assess their prior knowledge of TTM models, we then asked them to complete a second questionnaire on their musical background and experience with AI tools, using questions partially taken from the Goldsmiths Musical Sophistication Index (Gold-MSI) [59].
- Text-to-music interaction: Participants then interacted with the PAGURI interface. At each iteration, they could input a new prompt into the TTM model, choose whether or not to personalize the model with their selected audio, and generate a desired number of new audio samples. After each generation iteration, participants completed the Model Evaluation Survey, available in the Supplementary Material at https://ronfrancesca.github.io/PAGURI/ (accessed on 21 August 2025), rating the generated audio samples for consistency with the input prompt, audio quality, and alignment with their general expectations. To balance speed and quality, we limited generation to 5 audio samples per iteration, based on interaction times observed in the Fast mode, where the average time between fine-tuning and completing the intermediate questionnaire was 8 min.
- Final analysis: Upon completion of the experiment, participants completed a final questionnaire on their satisfaction with the overall interaction experience with the TTM model via PAGURI. We also collected open-ended comments and suggestions regarding possible applications and the inclusion of TTM models in artistic practice.
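The per-iteration loop described above (optional personalization, then capped generation) can be sketched in code. This is a hedged illustration only: the function names, the dict-based `model`, and the placeholder sample identifiers are hypothetical stand-ins, not the actual PAGURI or text-to-music API.

```python
# Hypothetical sketch of one PAGURI interaction iteration, as described in
# the procedure above. Not the real system API.

MAX_SAMPLES = 5  # per-iteration cap chosen to balance speed and quality


def fine_tune(model, audio_files):
    """Hypothetical personalization step (fine-tuning on user-selected audio)."""
    personalized = dict(model)
    personalized["personalized_on"] = list(audio_files)
    return personalized


def generate(model, prompt, n_samples):
    """Hypothetical generation step; returns placeholder sample identifiers."""
    n = min(n_samples, MAX_SAMPLES)  # enforce the 5-sample limit
    return [f"sample_{i}_for_{prompt[:15]}" for i in range(n)]


def interaction_iteration(model, prompt, personalize_with=None, n_samples=3):
    # Optionally personalize the model with the participant's selected audio,
    # then generate the requested number of samples (capped at MAX_SAMPLES).
    if personalize_with:
        model = fine_tune(model, personalize_with)
    samples = generate(model, prompt, n_samples)
    # ...participant then rates the samples via the Model Evaluation Survey...
    return model, samples


model = {"name": "base-ttm"}
model, samples = interaction_iteration(model, "A kalimba melody", n_samples=7)
print(len(samples))  # a request for 7 samples is capped at 5
```

The cap mirrors the study design choice: generation requests beyond 5 samples are truncated so that each iteration stays within the observed 8-minute interaction window.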
4. Results
4.1. Demographics Analysis of Participants
4.2. Participants’ Musical Knowledge and AI Tool Experience: Demographics and Analysis
4.3. Text-to-Music Interaction
Personalization of the TTM Model
4.4. Consistency, Expectations, and Quality
4.5. Example of Interaction with the Model
4.6. Correlations Among Questionnaire Responses
4.6.1. User Experience & Engagement (Music & AI)
4.6.2. AI Perception and Adoption
4.6.3. Expectations vs. Reality (Consistency and Quality)
4.7. Integration of TTM Models in the Creative Process
5. Discussion
User Study Limitations
6. Conclusions
Future Work
Supplementary Materials
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AI | Artificial Intelligence |
API | Application Programming Interface |
DAC | Discrete Audio Codec |
GUI | Graphical User Interface |
HCI | Human–Computer Interaction |
IPI | Instance Placeholder Identifier |
MSI | Musical Sophistication Index |
M.Sc. | Master of Science |
PAGURI | Prompt Audio Generation User Research Investigation |
SD | Standard Deviation |
TTI | Text-To-Image |
TTM | Text-To-Music |
UX | User Experience |
References
- Hiller, L., Jr.; Isaacson, L. Musical Composition with a High Speed Digital Computer. In Proceedings of the Audio Engineering Society Convention, New York, NY, USA, 8–12 October 1957. [Google Scholar]
- Mathews, M.V.; Miller, J.E.; Moore, F.R.; Pierce, J.R.; Risset, J.C. The Technology of Computer Music; The MIT Press: Cambridge, MA, USA, 1969. [Google Scholar]
- Briot, J.P.; Pachet, F. Deep learning for music generation: Challenges and directions. Neural Comput. Appl. 2020, 32, 981–993. [Google Scholar] [CrossRef]
- Kamath, P.; Morreale, F.; Bagaskara, P.L.; Wei, Y.; Nanayakkara, S. Sound Designer-Generative AI Interactions: Towards Designing Creative Support Tools for Professional Sound Designers. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024. [Google Scholar]
- Zang, Y.; Zhang, Y. The Interpretation Gap in Text-to-Music Generation Models. arXiv 2024, arXiv:2407.10328. [Google Scholar]
- Tailleur, M.; Lee, J.; Lagrange, M.; Choi, K.; Heller, L.M.; Imoto, K.; Okamoto, Y. Correlation of fréchet audio distance with human perception of environmental audio is embedding dependent. In Proceedings of the 32nd European Signal Processing Conference (EUSIPCO), Lyon, France, 26–30 August 2024. [Google Scholar]
- Gui, A.; Gamper, H.; Braun, S.; Emmanouilidou, D. Adapting frechet audio distance for generative music evaluation. In Proceedings of the ICASSP 2024–2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024. [Google Scholar]
- Nistal, J.; Pasini, M.; Aouameur, C.; Grachten, M.; Lattner, S. Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models. arXiv 2024, arXiv:2406.08384. [Google Scholar]
- Liu, H.; Yuan, Y.; Liu, X.; Mei, X.; Kong, Q.; Tian, Q.; Wang, Y.; Wang, W.; Wang, Y.; Plumbley, M.D. Audioldm 2: Learning holistic audio generation with self-supervised pretraining. IEEE/ACM Trans. Audio Speech Lang. Process. 2024, 32, 2871–2883. [Google Scholar] [CrossRef]
- Plitsis, M.; Kouzelis, T.; Paraskevopoulos, G.; Katsouros, V.; Panagakis, Y. Investigating personalization methods in text to music generation. In Proceedings of the ICASSP 2024–2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024. [Google Scholar]
- Sarmento, P.; Loth, J.; Barthet, M. Between the AI and Me: Analysing Listeners’ Perspectives on AI-and Human-Composed Progressive Metal Music. arXiv 2024, arXiv:2407.21615. [Google Scholar]
- Hashim, S.; Stewart, L.; Küssner, M.B.; Omigie, D. Music listening evokes story-like visual imagery with both idiosyncratic and shared content. PLoS ONE 2023, 18, e0293412. [Google Scholar] [CrossRef] [PubMed]
- Tchemeube, R.B.; Ens, J.; Plut, C.; Pasquier, P.; Safi, M.; Grabit, Y.; Rolland, J.B. Evaluating Human-AI Interaction via Usability, User Experience and Acceptance Measures for MMM-C: A Creative AI System for Music Composition. In Proceedings of the IJCAI, Macao, China, 19–25 August 2023. [Google Scholar]
- Afchar, D.; Meseguer-Brocal, G.; Hennequin, R. AI-Generated Music Detection and its Challenges. arXiv 2025, arXiv:2501.10111. [Google Scholar]
- Barnes, C.F.; Rizvi, S.A.; Nasrabadi, N.M. Advances in residual vector quantization: A review. IEEE Trans. Image Process. 1996, 5, 226–262. [Google Scholar] [CrossRef]
- Défossez, A.; Copet, J.; Synnaeve, G.; Adi, Y. High Fidelity Neural Audio Compression. arXiv 2023, arXiv:2210.13438. [Google Scholar]
- Zeghidour, N.; Luebs, A.; Omran, A.; Skoglund, J.; Tagliasacchi, M. Soundstream: An end-to-end neural audio codec. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 30, 495–507. [Google Scholar] [CrossRef]
- Kumar, R.; Seetharaman, P.; Luebs, A.; Kumar, I.; Kumar, K. High-Fidelity Audio Compression with Improved RVQGAN. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Borsos, Z.; Marinier, R.; Vincent, D.; Kharitonov, E.; Pietquin, O.; Sharifi, M.; Roblek, D.; Teboul, O.; Grangier, D.; Tagliasacchi, M.; et al. Audiolm: A language modeling approach to audio generation. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 2523–2533. [Google Scholar] [CrossRef]
- Chung, Y.A.; Zhang, Y.; Han, W.; Chiu, C.C.; Qin, J.; Pang, R.; Wu, Y. W2v-bert: Combining contrastive learning and masked language modeling for self-supervised speech pre-training. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, 13–17 December 2021. [Google Scholar]
- Agostinelli, A.; Denk, T.I.; Borsos, Z.; Engel, J.; Verzetti, M.; Caillon, A.; Huang, Q.; Jansen, A.; Roberts, A.; Tagliasacchi, M.; et al. Musiclm: Generating music from text. arXiv 2023, arXiv:2301.11325. [Google Scholar] [CrossRef]
- Huang, Q.; Jansen, A.; Lee, J.; Ganti, R.; Li, J.Y.; Ellis, D.P. Mulan: A joint embedding of music audio and natural language. arXiv 2022, arXiv:2208.12415. [Google Scholar] [CrossRef]
- Kreuk, F.; Synnaeve, G.; Polyak, A.; Singer, U.; Défossez, A.; Copet, J.; Parikh, D.; Taigman, Y.; Adi, Y. Audiogen: Textually guided audio generation. arXiv 2022, arXiv:2209.15352. [Google Scholar]
- Copet, J.; Kreuk, F.; Gat, I.; Remez, T.; Kant, D.; Synnaeve, G.; Adi, Y.; Défossez, A. Simple and controllable music generation. Adv. Neural Inf. Process. Syst. 2024, 36, 47704–47720. [Google Scholar]
- Ziv, A.; Gat, I.; Lan, G.L.; Remez, T.; Kreuk, F.; Défossez, A.; Copet, J.; Synnaeve, G.; Adi, Y. Masked Audio Generation using a Single Non-Autoregressive Transformer. arXiv 2024, arXiv:2401.04577. [Google Scholar] [CrossRef]
- Yang, D.; Yu, J.; Wang, H.; Wang, W.; Weng, C.; Zou, Y.; Yu, D. Diffsound: Discrete diffusion model for text-to-sound generation. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 1720–1733. [Google Scholar] [CrossRef]
- Huang, R.; Huang, J.; Yang, D.; Ren, Y.; Liu, L.; Li, M.; Ye, Z.; Liu, J.; Yin, X.; Zhao, Z. Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Liu, H.; Chen, Z.; Yuan, Y.; Mei, X.; Liu, X.; Mandic, D.; Wang, W.; Plumbley, M.D. AudioLDM: Text-to-Audio Generation with Latent Diffusion Models. arXiv 2023, arXiv:2301.12503. [Google Scholar]
- Huang, J.; Ren, Y.; Huang, R.; Yang, D.; Ye, Z.; Zhang, C.; Liu, J.; Yin, X.; Ma, Z.; Zhao, Z. Make-an-audio 2: Temporal-enhanced text-to-audio generation. arXiv 2023, arXiv:2305.18474. [Google Scholar]
- Melechovsky, J.; Guo, Z.; Ghosal, D.; Majumder, N.; Herremans, D.; Poria, S. Mustango: Toward Controllable Text-to-Music Generation. arXiv 2024, arXiv:2311.08355. [Google Scholar]
- Evans, Z.; Parker, J.D.; Carr, C.; Zukowski, Z.; Taylor, J.; Pons, J. Stable Audio Open. arXiv 2024, arXiv:2407.14358. [Google Scholar] [CrossRef]
- Suno AI. Suno.com-Make a Song About Anything. 2024. Available online: https://suno.com/ (accessed on 20 July 2024).
- Udio. Udio|AI Music Generator. 2024. Available online: https://www.udio.com/ (accessed on 20 July 2024).
- Gal, R.; Alaluf, Y.; Atzmon, Y.; Patashnik, O.; Bermano, A.H.; Chechik, G.; Cohen-or, D. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. In Proceedings of the The Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Ruiz, N.; Li, Y.; Jampani, V.; Pritch, Y.; Rubinstein, M.; Aberman, K. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
- Chu, H.; Kim, J.; Kim, S.; Lim, H.; Lee, H.; Jin, S.; Lee, J.; Kim, T.; Ko, S. An empirical study on how people perceive AI-generated music. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022. [Google Scholar]
- Newman, M.; Morris, L.; Lee, J.H. Human-AI Music Creation: Understanding the Perceptions and Experiences of Music Creators for Ethical and Productive Collaboration. In Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 5–9 November 2023. [Google Scholar]
- Morreale, F.; De Angeli, A. Collaborating with an autonomous agent to generate affective music. Comput. Entertain. 2016, 14, 1–21. [Google Scholar] [CrossRef]
- Louie, R.; Coenen, A.; Huang, C.Z.; Terry, M.; Cai, C.J. Novice-AI music co-creation via AI-steering tools for deep generative models. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020. [Google Scholar]
- Louie, R.; Engel, J.; Huang, C.Z.A. Expressive communication: Evaluating developments in generative models and steering interfaces for music creation. In Proceedings of the 27th International Conference on Intelligent User Interfaces, Helsinki, Finland, 22–25 March 2022. [Google Scholar]
- Zhou, Y.; Koyama, Y.; Goto, M.; Igarashi, T. Interactive exploration-exploitation balancing for generative melody composition. In Proceedings of the 26th International Conference on Intelligent User Interfaces, Virtual, 14–17 April 2021. [Google Scholar]
- Huang, C.Z.A.; Koops, H.V.; Newton-Rex, E.; Dinculescu, M.; Cai, C.J. AI song contest: Human-AI co-creation in songwriting. arXiv 2020, arXiv:2010.05388. [Google Scholar]
- Feng, Y.; Wang, X.; Wong, K.K.; Wang, S.; Lu, Y.; Zhu, M.; Wang, B.; Chen, W. Promptmagician: Interactive prompt engineering for text-to-image creation. IEEE Trans. Vis. Comput. Graph. 2023, 30, 295–305. [Google Scholar] [CrossRef]
- Brade, S.; Wang, B.; Sousa, M.; Oore, S.; Grossman, T. Promptify: Text-to-image generation through interactive prompt exploration with large language models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, San Francisco, CA, USA, 29 October–1 November 2023. [Google Scholar]
- Kirstain, Y.; Polyak, A.; Singer, U.; Matiana, S.; Penna, J.; Levy, O. Pick-a-pic: An open dataset of user preferences for text-to-image generation. Adv. Neural Inf. Process. Syst. 2024, 36, 36652–36663. [Google Scholar]
- Bougueng Tchemeube, R.; Ens, J.J.; Pasquier, P. Calliope: A co-creative interface for multi-track music generation. In Proceedings of the 14th Conference on Creativity and Cognition, Venice, Italy, 20–23 June 2022. [Google Scholar]
- Simon, I.; Morris, D.; Basu, S. MySong: Automatic accompaniment generation for vocal melodies. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, 5–10 April 2008. [Google Scholar]
- Huang, C.Z.A.; Hawthorne, C.; Roberts, A.; Dinculescu, M.; Wexler, J.; Hong, L.; Howcroft, J. The Bach doodle: Approachable music composition with machine learning at scale. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 4–8 November 2019. [Google Scholar]
- Rau, S.; Heyen, F.; Wagner, S.; Sedlmair, M. Visualization for AI-Assisted Composing. In Proceedings of the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India, 4–8 December 2022. [Google Scholar]
- Zhang, Y.; Xia, G.; Levy, M.; Dixon, S. COSMIC: A conversational interface for human-AI music co-creation. In Proceedings of the 21st International Conference on New Interfaces for Musical Expression (NIME), Shanghai, China, 14–18 June 2021. [Google Scholar]
- Zhou, Y.; Koyama, Y.; Goto, M.; Igarashi, T. Generative melody composition with human-in-the-loop bayesian optimization. In Proceedings of the 2020 Joint Conference on AI Music Creativity, Stockholm, Sweden, 20–23 October 2020. [Google Scholar]
- Yakura, H.; Goto, M. IteraTTA: An Interface for Exploring Both Text Prompts and Audio Priors in Generating Music with Text-to-Audio Models. In Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 5–9 November 2023. [Google Scholar]
- Rau, S.; Heyen, F.; Brachtel, B.; Sedlmair, M. MAICO: A Visualization Design Study on AI-Assisted Music Composition. IEEE Trans. Vis. Comput. Graph. 2025, 31, 4110–4125. [Google Scholar] [CrossRef]
- Chung, J.J.Y.; He, S.; Adar, E. The intersection of users, roles, interactions, and technologies in creativity support tools. In Proceedings of the 2021 ACM Designing Interactive Systems Conference, Virtual, 28 June–2 July 2021. [Google Scholar]
- Frich, J.; MacDonald Vermeulen, L.; Remy, C.; Biskjaer, M.M.; Dalsgaard, P. Mapping the landscape of creativity support tools in HCI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019. [Google Scholar]
- Weisz, J.D.; Muller, M.; He, J.; Houde, S. Toward general design principles for generative AI applications. arXiv 2023, arXiv:2301.05578. [Google Scholar] [CrossRef]
- Müllensiefen, D.; Gingras, B.; Musil, J.; Stewart, L. The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS ONE 2014, 9, e89642. [Google Scholar] [CrossRef] [PubMed]
- Maguire, M. Doing a thematic analysis: A practical, step-by-step guide for learning and teaching scholars. All Irel. J. High. Educ. 2017, 9. [Google Scholar]
- Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA J. Autom. Sin. 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
- Bird, C.; Ford, D.; Zimmermann, T.; Forsgren, N.; Kalliamvakou, E.; Lowdermilk, T.; Gazit, I. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue 2022, 20, 35–57. [Google Scholar] [CrossRef]
- Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021. [Google Scholar]
- Team, G.; Anil, R.; Borgeaud, S.; Wu, Y.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. [Google Scholar] [CrossRef]
- Forsgren, S.; Martiros, H. Riffusion-Stable Diffusion for Real-Time Music Generation. 2022. Available online: https://riffusion.com/about (accessed on 21 August 2025).
- Ludbrook, J.; Dudley, H. Why permutation tests are superior to t and F tests in biomedical research. Am. Stat. 1998, 52, 127–132. [Google Scholar] [CrossRef]
- Demerlé, N.; Esling, P.; Doras, G.; Genova, D. Combining audio control and style transfer using latent diffusion. arXiv 2024, arXiv:2408.00196. [Google Scholar] [CrossRef]
- Lin, L.; Xia, G.; Jiang, J.; Zhang, Y. Content-based controls for music large language modeling. arXiv 2023, arXiv:2310.17162. [Google Scholar]
- Bafghi, R.A.; Bagwell, C.; Ravichandran, A.; Shrivastava, A.; Raissi, M. Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation. arXiv 2025, arXiv:2501.15377. [Google Scholar]
- Feng, T.; Li, W.; Zhu, D.; Yuan, H.; Zheng, W.; Zhang, D.; Tang, J. ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think. arXiv 2025, arXiv:2501.01045. [Google Scholar] [CrossRef]
- Chen, Y.; Huang, L.; Gou, T. Applications and Advances of Artificial Intelligence in Music Generation: A Review. arXiv 2024, arXiv:2409.03715. [Google Scholar] [CrossRef]
- Vinay, A.; Lerch, A. Evaluating Generative Audio Systems and Their Metrics. In Proceedings of the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India, 4–8 December 2022. [Google Scholar]
- Kilgour, K.; Zuluaga, M.; Roblek, D.; Sharifi, M. Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms. In Proceedings of the INTERSPEECH, Graz, Austria, 15–19 September 2019. [Google Scholar]
- Ren, J.; Xu, H.; He, P.; Cui, Y.; Zeng, S.; Zhang, J.; Wen, H.; Ding, J.; Liu, H.; Chang, Y.; et al. Copyright Protection in Generative AI: A Technical Perspective. arXiv 2024, arXiv:2402.02333. [Google Scholar] [CrossRef]
- Franceschelli, G.; Musolesi, M. Copyright in generative deep learning. Data Policy 2022, 4, e17. [Google Scholar] [CrossRef]
- Zhang, Y.; Ikemiya, Y.; Xia, G.; Murata, N.; Martínez-Ramírez, M.A.; Liao, W.H.; Mitsufuji, Y.; Dixon, S. Musicmagus: Zero-shot text-to-music editing via diffusion models. arXiv 2024, arXiv:2402.06178. [Google Scholar]
- Wu, S.L.; Donahue, C.; Watanabe, S.; Bryan, N.J. Music controlnet: Multiple time-varying controls for music generation. IEEE/ACM Trans. Audio Speech Lang. Process. 2024, 32, 2692–2703. [Google Scholar] [CrossRef]
(a)

Iteration | Prompt |
---|---|
1 | A song about summer containing an electric guitar lead on top of a ukulele rhythm and a glockenspiel. The glockenspiel is off |
2 | Smooth jazz being played from the other room; the main line is played with a saxophone, backed by a Leslie organ |
3 | An abcdef kalimba melody in a 90 BPM midtempo music drop |
4 | An abcdef voice singing a Gregorian chant |
5 | A kalimba with a harsh bitcrusher applied to it |
6 | A abcdef kalimba with a bitcrusher applied to it |
(b)

Iteration | Prompt |
---|---|
1 | A sound of an electric guitar |
2 | A flute playing “My Heart Will Go On” by Celine Dion |
3 | A ballad in the style of sas pirate metal |
4 | A jazz music in the style of sas |
5 | A doom metal music in the style of sas pirate metal |
6 | A 90’s disco hit in the style of sas pirate metal |
7 | Vegeta singing a song in the style of sas pirate metal |
8 | “Epic sax guy” in the style of sas pirate metal |
9 | A dog barking in the style of sas pirate metal |
10 | A dog barking in the style of pirate metal |
11 | A rubber chicken singing a song in the style of sas pirate metal |
12 | A sas pirate metal song, but with the lyrics made by a door bell ring |
Question I | Question II | p-Value | Pearson’s R |
---|---|---|---|
Q6—I am able to identify what is special about a given musical piece | Q27—The audio generated by the personalized model is better than the audio generated by the base model | 0.0194 | 0.50 |
Q9—At the peak of my interest, I practiced _____ hours per day on my primary instrument | Q24—I enjoyed the interaction with the text-to-music generation system | 0.0166 | 0.50 |
Q12—I often read or search the internet for things related to AI tools | Q24—I enjoyed the interaction with the text-to-music generation system | 0.0292 | 0.46 |
" | Q25—The data generated by the model are consistent with respect to the desired audio file provided by the user | 0.0254 | 0.48 |
" | Q27—The audio generated by the personalized model is better than the audio generated by the base model | 0.0428 | 0.44 |
Question I | Question II | p-Value | Spearman’s ρ |
---|---|---|---|
Q13—AI can be a tool to support human activities | Q24—Enjoyment of the TTM system | 0.0019 | 0.61 |
" | Q25—Model output consistency with desired audio | 0.0109 | 0.49 |
" | Q29—TTM models support music creation | 0.0037 | 0.55 |
" | Q30—Willingness to use the system again | 0.0055 | 0.53 |
Question I | Question II | p-Value | Spearman’s ρ |
---|---|---|---|
Q21—The audio(s) generated is consistent with respect to my expectations | Q26—The waiting time of the fine-tuning of the model is proportionate with the quality of the generated audio | 0.0458 | 0.43 |
" | Q28—There is consistency between input prompt and audio(s) generated | 0.0006 | 0.66 |
Q22—The quality of the audio(s) generated is consistent with my expectations | Q24—I enjoyed the interaction with the text-to-music generation system | 0.0092 | 0.53 |
" | Q25—The data generated by the model are consistent with respect to the desired audio file provided by the user | 0.0048 | 0.58 |
" | Q26—The waiting time of the fine-tuning of the model is proportionate with the quality of the generated audio | 0.0150 | 0.52 |
" | Q28—There is consistency between input prompt and audio(s) generated | 0.0032 | 0.58 |
" | Q29—The use of text-to-music models can support musicians in musical creation endeavours | 0.0426 | 0.42 |
" | Q30—I would use this system again | 0.0070 | 0.55 |
Q23—The generated audio(s) is consistent with respect to the input prompt | Q28—There is consistency between input prompt and audio(s) generated | 0.0028 | 0.60 |
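Correlations of the kind reported in the tables above can be computed with SciPy. The sketch below uses invented Likert-scale responses purely for illustration; the variable names and values are not the study's actual data.

```python
# Hedged sketch: computing Pearson and Spearman correlations between two
# Likert-scale questionnaire items, as in the tables above. The response
# vectors are invented illustration data, not the study's responses.
from scipy.stats import pearsonr, spearmanr

q22 = [4, 5, 3, 4, 2, 5, 3, 4, 5, 2]  # e.g., hypothetical audio-quality ratings
q24 = [4, 4, 3, 5, 2, 5, 2, 4, 5, 3]  # e.g., hypothetical enjoyment ratings

r, p_r = pearsonr(q22, q24)        # linear correlation
rho, p_rho = spearmanr(q22, q24)   # rank correlation, often preferred for ordinal data

print(f"Pearson r = {r:.2f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.4f})")
```

Spearman's ρ relies only on ranks, which is why it is commonly chosen for ordinal Likert items, while Pearson's R assumes an approximately linear relationship between the two response scales.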
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation
Ronchini, F.; Comanducci, L.; Perego, G.; Antonacci, F. PAGURI: A User Experience Study of Creative Interaction with Text-to-Music Models. Electronics 2025, 14, 3379. https://doi.org/10.3390/electronics14173379