1. Introduction
Granular synthesis algorithms have been explored in real-time audio processing applications [1]. However, using low-cost computing platforms for real-time granular audio has not yet been thoroughly investigated. The Raspberry Pi offers an inexpensive system capable of signal processing tasks that previously required more expensive hardware [2], presenting opportunities to develop accessible real-time granular synthesis based on this platform. This study aims to address the gap by developing and evaluating a real-time granular synthesis system using the Raspberry Pi [3]. Performance is assessed on latency, audio quality, and computational load to determine capabilities for creative applications [4].
2. Materials and Methods
The real-time granular synthesis system is implemented in Pure Data (Pd) and allows loading audio files to apply granular processing algorithms in real-time using adjustable parameters [5]. The system employs a granular algorithm with user-adjustable parameters for grain density, duration, stereo width, and pitch shifting. The patch includes a graphical user interface (GUI) that provides playback controls, visualization, and user control of these parameters. Real-time digital signal processing (DSP) is performed using a Raspberry Pi. The hardware setup is centered on the Raspberry Pi for DSP, supported by an audio interface, output modules, and a development computer.
3. Methods
The granular algorithm incorporates user-adjustable parameters for density, duration, stereo width, and pitch shifting. Density determines the frequency of grain generation, while duration specifies grain length, ranging from short impulses to the full sample. Stereo width controls the panning of grains across the stereo field, and pitch shifting alters playback speed by up to an octave. Additional randomization sliders introduce variability into these parameters to enhance sonic diversity. The granular algorithm was implemented in Pd using the following approach:
Audio input: Samples are loaded into a Pd array object, enabling efficient access and manipulation.
Grain generation: A metro object triggers grain creation at intervals defined by the density parameter, as seen in Figure 1. For each grain, a random object selects the starting point within the sample. The tabread4~ object then reads the audio data, applying interpolation to ensure smooth playback. Finally, the vline~ object generates an amplitude envelope, such as a trapezoidal shape, to define the dynamics of the grain.
Grain manipulation: Pitch shifting is accomplished through resampling, with the tabread4~ object’s read speed modified by $2^{\text{pitch}/12}$ to achieve semitone shifts. Stereo panning is implemented using two line~ objects to create smooth amplitude curves across the left and right channels.
Grain mixing: All active grains are summed using *~ and +~ objects before producing the final output.
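The four steps above can be sketched outside Pd as a short, pure-Python illustration. This is not the authors' patch: the function names, the linear interpolation (a simpler stand-in for the four-point scheme of tabread4~), the equal-power pan law, the quarter-grain envelope ramps, and the expression of the trigger interval in samples are all assumptions made for the example.

```python
import math
import random

def pitch_ratio(semitones):
    """Read-speed ratio for a pitch shift in semitones: 2 ** (semitones / 12)."""
    return 2.0 ** (semitones / 12.0)

def read_interp(table, pos):
    """Linearly interpolated table read (assumed stand-in for tabread4~)."""
    i = int(pos)
    if i >= len(table) - 1:
        return table[-1]  # clamp reads past the end of the table
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac

def trapezoid(n, length, ramp):
    """Trapezoidal envelope value at sample n of a grain `length` samples long."""
    if ramp <= 0:
        return 1.0
    return max(0.0, min(1.0, n / ramp, (length - 1 - n) / ramp))

def granulate(table, out_len, interval, grain_len,
              semitones=0.0, width=1.0, seed=0):
    """Overlap-add grains into a stereo pair of sample lists (hypothetical helper)."""
    rng = random.Random(seed)
    left = [0.0] * out_len
    right = [0.0] * out_len
    ratio = pitch_ratio(semitones)
    ramp = max(1, grain_len // 4)       # assumed attack/release length
    span = grain_len * ratio            # source samples consumed by one grain
    max_start = max(0.0, len(table) - 1 - span)
    for onset in range(0, out_len, interval):       # metro-like periodic trigger
        start = rng.uniform(0.0, max_start)         # random start point in the sample
        pan = 0.5 + width * (rng.random() - 0.5)    # 0 = hard left, 1 = hard right
        gain_l = math.cos(pan * math.pi / 2)        # equal-power pan (assumption)
        gain_r = math.sin(pan * math.pi / 2)
        for n in range(min(grain_len, out_len - onset)):
            s = read_interp(table, start + n * ratio) * trapezoid(n, grain_len, ramp)
            left[onset + n] += s * gain_l           # grain mixing by summation
            right[onset + n] += s * gain_r
    return left, right
```

For instance, calling `granulate(table, 44100, 2205, 4410, semitones=7)` on a loaded sample list would trigger a 100 ms grain every 50 ms at 44.1 kHz, shifted up a fifth; a shift of 12 semitones gives a read-speed ratio of exactly 2.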
Input audio samples encompassed musical and non-musical sounds, including piano, guitar, and dialogue. Multiple output recordings were captured for each input due to the stochastic granular algorithm. Average CPU usage was measured during playback along with the corresponding synth parameters. Evaluation was conducted using the following quantitative performance metrics and qualitative audio characteristics:
Processing latency of less than 20 ms;
Perceptual audio quality ratings;
CPU utilization measurements;
Subjective assessments of output musicality.
Collected metrics were analyzed using R (version 4.3.2) to evaluate system performance. Hypothesis tests were conducted at a 0.05 significance level. For the latency objective, a one-sample t-test was utilized:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where $\bar{x}$ is the mean latency, $\mu_0$ is the 20 ms target, $s$ is the standard deviation, and $n$ is the number of measurements.
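As an illustration, the one-sample t statistic (sample mean minus target, divided by the standard error of the mean) can be computed in a few lines. The study reports using R for its analysis; the Python function `one_sample_t` and the latency values in the usage note are hypothetical and are not the measured data.

```python
import math
import statistics

def one_sample_t(samples, target):
    """Return (t, degrees of freedom) for a one-sample t-test against `target`."""
    n = len(samples)
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)            # sample standard deviation (n - 1 denominator)
    t = (mean - target) / (s / math.sqrt(n))
    return t, n - 1
```

With made-up latencies such as `one_sample_t([18.2, 19.1, 17.9, 18.8, 19.4], 20.0)`, the returned statistic would then be compared against the critical t-value for the returned degrees of freedom.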
The null hypothesis $H_0$ states that there is no significant difference between the target and measured latency; the alternative $H_1$ claims a significant difference. $H_0$ is rejected if the computed t-value exceeds the critical t-value. ANOVA tests were used to determine significant effects and differences:
Two-way ANOVA on latency versus window size and grain parameters;
One-way ANOVA on perceptual audio quality ratings;
Two-way ANOVA on CPU utilization versus parameters.
Correlation analysis was also utilized to evaluate computational efficiency associations. Subjective musicality assessments were also conducted on the audio outputs.
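To make the two remaining statistics concrete, the sketch below computes a one-way ANOVA F statistic (the form applied to the audio quality ratings) and a Pearson correlation coefficient. It is an illustration only: the function names are hypothetical, Pearson's coefficient is an assumed choice since the text does not name the correlation method, and the two-way ANOVA used for latency and CPU utilization would need additional factor and interaction terms that are not shown.

```python
import math
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In this framing, each group passed to `one_way_anova_f` would hold the ratings collected for one parameter setting, and `pearson_r` would take paired values such as grain density and CPU utilization.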
4. Results
To evaluate the real-time granular synthesis system, experiments were conducted using a variety of audio samples: musical sounds such as guitar, piano, and drums, and non-musical sounds such as Foley recordings and dialogue. For each input sample, multiple output recordings were captured using varying granular synthesis parameters, since the stochastic nature of the granular algorithm makes each rendering different. Latency, average CPU usage, and subjective audio quality ratings were measured for each output, with CPU usage and the corresponding synth parameters read from Pure Data’s built-in monitoring. To keep measurements consistent and fair, all tests used a constant buffer size of 256 samples and a sample rate of 44.1 kHz.
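As a rough point of reference for the 20 ms latency target, the duration of a single audio buffer follows directly from the stated settings. The calculation below is only a sketch: the symbols $T_{\text{buf}}$, $N$, and $f_s$ are introduced here for illustration, and the result covers one 256-sample buffer only, not any additional buffering in the audio interface or operating system.

```latex
T_{\text{buf}} = \frac{N}{f_s} = \frac{256}{44\,100\ \text{Hz}} \approx 5.8\ \text{ms},
\qquad
\frac{20\ \text{ms}}{T_{\text{buf}}} \approx 3.4
```

Under these assumptions, the 20 ms target corresponds to a little over three buffer durations.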
Table 1 shows a sample of the raw data collected during the experiments.
5. Discussion
5.1. Statistical Analysis
Quantitative analysis was conducted using audio quality, computational load, and latency:
Audio quality rating (Q): Subjective rating on a scale from 1–5
Computation load (U): Measured as a percentage
Latency (L): Measured in ms
A one-sample t-test was used to evaluate the null hypothesis of no significant difference between the measured latency and the 20 ms target at a 0.05 significance level.
5.2. Audio Latency Evaluation
The one-sample t-test was conducted to evaluate the null hypothesis that the latency is not significantly different from the 20 ms target. The calculated t-value of 1.28 with 49 degrees of freedom does not exceed the critical t-value of 1.677 at a 0.05 significance level, so the null hypothesis was not rejected, implying no significant difference between the measured latency and the target. The system achieved an average latency of 18.6 ms across different inputs and parameters.
5.3. Effects of Window Size and Grain Parameters
To assess the effects of window size and grain synthesis parameters on latency and CPU utilization, two-way ANOVA tests were performed.
Table 2 shows the ANOVA results for latency with factors of window size and four grain parameters: density, length, width, and pitch shifting.
The results indicate statistically significant effects of window size, density, and length on latency at a 0.05 significance level. In contrast, width and pitch shifting did not exhibit significant effects [6]. Post hoc tests using Tukey’s Honestly Significant Difference (HSD) revealed specific differences among levels.
Table 3 presents the ANOVA results for CPU utilization with the same factors. The analysis shows significant effects of all parameters except pitch shifting.
As expected, larger window sizes and higher grain densities resulted in increased computational load. The correlations between CPU utilization and window size and between CPU utilization and density were both strongly positive.
5.4. Perceptual Evaluation of Audio Quality
To evaluate the perceptual quality of the granular synthesis output, listening tests were conducted on a sample of 15 participants with varying levels of musical experience. A selection of input samples and parameter settings was used to generate audio examples, which were presented to participants in random order. Participants rated each example on a 5-point Likert scale from 1 (Poor) to 5 (Excellent) based on perceived audio quality in terms of artifacts, noise, and timbral fidelity compared to the original input.
A one-way ANOVA was performed on the audio quality ratings with granular synthesis parameters as the factor. The results, presented in Table 4, revealed a statistically significant effect of parameters on perceived audio quality. Tukey’s HSD test indicated that higher grain density and lower length settings tended to produce lower quality ratings, likely due to audible artifacts and loss of timbral characteristics [7]. However, no significant differences were found between moderate parameter settings and the original input.
5.5. Creative Applications and Subjective Assessments
To explore potential creative applications, the real-time granular synthesis system was used to process a variety of input sounds from commercial music samples as well as field recordings. Moderate parameter settings were used to balance audio quality and timbral transformations [8]. The resulting outputs were then reviewed by a panel of three experts in music production, sound design, and multimedia art installation.
The music production expert commented on the system’s ability to create unique timbral and rhythmic textures from conventional sources. While conventional granular synthesis methods were already utilized in production, the real-time control enabled by the Raspberry Pi system allowed for more dynamic and interactive manipulation during the creative process [9]. The sound designer noted the potential of the system for generating complex, evolving ambiance and drones for cinematic applications. The granular processing imparted an organic, constantly shifting quality that could enhance the depth and movement of environmental sounds in a scene. Finally, the multimedia artist highlighted the system’s capacity for real-time audio/visual correlation when coupled with generative visuals driven by the granular parameters and audio output. This opened up intriguing avenues for immersive, responsive installations that could engage audiences through synesthetic experiences.
6. Conclusions
Such subjective assessments indicated promising applications for the real-time granular synthesis system in diverse creative domains. The combination of affordability, portability, and sufficient audio quality made the Raspberry Pi an appealing platform for artists and designers seeking new ways to transform and interact with sound [10].
Author Contributions
Conceptualization, R.L.R.L. and M.V.C.C.; methodology, R.L.R.L.; software, R.L.R.L.; validation, R.L.R.L.; formal analysis, R.L.R.L.; investigation, R.L.R.L.; resources, R.L.R.L. and M.V.C.C.; data curation, R.L.R.L.; writing—original draft preparation, R.L.R.L.; writing—review and editing, R.L.R.L. and M.V.C.C.; visualization, R.L.R.L.; supervision, M.V.C.C.; project administration, M.V.C.C.; funding acquisition, R.L.R.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors on request.
Acknowledgments
The authors gratefully acknowledge Miller Puckette for creating Pure Data, and the broader open-source community for its continued development and maintenance.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| CPU | Central processing unit |
| DSP | Digital signal processing |
| ANOVA | Analysis of variance |
| Pd | Pure Data |
| GUI | Graphical user interface |
References
- Truax, B. Real-Time Granular Synthesis with a Digital Signal Processor. Comput. Music J. 1988, 12, 14–26.
- Caya, M.V.C.; Calites, J.V.G.; Sioson, G.C.D. Wireless Sensor and Actuator Network-based Power Management System Using Raspberry Pi and Image Processing for Smart Classroom. In Proceedings of the 2020 IEEE 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Manila, Philippines, 7–9 December 2020; IEEE: New York, NY, USA, 2021.
- Smith, P.G.R.; Hardman, V. Fine-Grained Scalable Sound Representations for Collaborative Composition and Performance. In Proceedings of the IEE Colloquium on Audio and Music Technology, The Challenge of Creative DSP, London, UK, 18 November 1998.
- Meier, F.; Fink, M.; Zölzer, U. The JamBerry—A Stand-Alone Device for Networked Music Performance Based on the Raspberry Pi. In Proceedings of the Linux Audio Conference 2014, Karlsruhe, Germany, 1 May 2014.
- Zicarelli, D. An Extensible Real-time Signal Processing Environment for Max. In Proceedings of the International Computer Music Conference, San Francisco, CA, USA, 1–10 October 1998.
- Puckette, M. Phase-locked Vocoder. In Proceedings of the 1995 Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 19–22 October 1995.
- Chowning, J. The Synthesis of Complex Audio Spectra by Means of Frequency Modulation. Comput. Music J. 1973, 21, 526–534.
- Rodet, X.; Depalle, P. Spectral Envelopes and Inverse FFT Synthesis. In Proceedings of the 93rd Audio Engineering Society Convention, San Francisco, CA, USA, 1–4 October 1992.
- Miller, S.P. Pure Data: Another Integrated Computer Music Environment. IPSJ SIG Notes 1996, 17, 37–41.
- Bell, J.; Wyatt, A. Common Ground, Music and Movement Directed by a Raspberry Pi. In Proceedings of the TENOR Conference, Marseille, France, 13–16 May 2020.