Since the 1960s, turbomachinery design has relied on similarity theory and empirical correlations obtained by regression of experimental data [1]. Consolidated design experience is exploited through normalized parameters, namely specific speed (Ns) and specific diameter (Ds), following established design rules [2,3,4,5]. In this way, the fan type (axial, mixed, or radial) that reaches a specific duty point with the best expected efficiency can be selected using Balje charts [4] and similar performance maps [1]. However, Ns and Ds, which are intended to represent the meridional flow geometry, hide the contribution that the individual design parameters make to the final performance of the fan. By definition, Ns and Ds depend on rotational speed, maximum diameter, flow rate, and specific work at the best-efficiency operating point. In reality, a larger set of parameters contributes to performance, such as blade aspect ratio, chord and twist distributions along the blade span, hub-to-tip ratio, solidity, blade number, and tip gap [6]; these are related to the three-dimensional design criterion or even to the manufacturing process of the fan. All of these parameters must be selected during the design process, often by means of further charts that were also derived from consolidated empirical manufacturer experience [7]. These charts present correlations between some of the design parameters, summarized as coefficients and correction factors that enrich the classic design space given by Euler work analysis. Starting from the work of Balje, Lieblein, and Howell [4,5,6], many scholars have derived correlations and corrections that account for different design parameters in order to improve the performance of turbomachinery [8]. Even today, most of these works are limited to linear regression approaches, and often to specific classes of fans [9].
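The way Ns and Ds hide part of the design can be illustrated with a minimal sketch. It assumes the common dimensionless convention Ns = ω·Q^(1/2)/Y^(3/4) and Ds = D·Y^(1/4)/Q^(1/2), with ω in rad/s, Q the volume flow rate, Y the specific work, and D the maximum diameter; the cited references may use other units or conventions. The `FanDesign` class and all numerical values are hypothetical and serve only as an illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FanDesign:
    """Hypothetical container for a fan design (illustrative values only)."""
    omega: float          # rotational speed [rad/s]
    diameter: float       # maximum (tip) diameter [m]
    flow_rate: float      # volume flow rate at best efficiency [m^3/s]
    specific_work: float  # specific work at best efficiency [J/kg]
    blade_count: int      # does not enter Ns or Ds
    hub_to_tip: float     # does not enter Ns or Ds


def specific_speed(fan: FanDesign) -> float:
    """Dimensionless specific speed (assumed convention, omega in rad/s)."""
    return fan.omega * math.sqrt(fan.flow_rate) / fan.specific_work ** 0.75


def specific_diameter(fan: FanDesign) -> float:
    """Dimensionless specific diameter (assumed convention)."""
    return fan.diameter * fan.specific_work ** 0.25 / math.sqrt(fan.flow_rate)


# Two hypothetical fans with the same duty point and size
# but different three-dimensional design choices.
fan_a = FanDesign(omega=150.0, diameter=0.8, flow_rate=4.0,
                  specific_work=400.0, blade_count=6, hub_to_tip=0.3)
fan_b = FanDesign(omega=150.0, diameter=0.8, flow_rate=4.0,
                  specific_work=400.0, blade_count=12, hub_to_tip=0.5)

ns_a, ds_a = specific_speed(fan_a), specific_diameter(fan_a)
ns_b, ds_b = specific_speed(fan_b), specific_diameter(fan_b)

print(f"fan A: Ns = {ns_a:.3f}, Ds = {ds_a:.3f}")
print(f"fan B: Ns = {ns_b:.3f}, Ds = {ds_b:.3f}")
```

Both fans map to the same point on an Ns-Ds chart even though their blade count and hub-to-tip ratio differ, so the chart alone cannot distinguish them.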
In recent years, social networks have overhauled not only social dynamics and media but also the approach to big data analysis. The formidable amount of data exchanged by users on large platforms must be classified and correlated before it can be monetized [10]. This has led to the revival of established statistical approaches and to the development of new methods for big data analysis [11,12]. A key property of big data analysis is that correlations and relationships inside a dataset can be unveiled independently of the nature of the data [13]. The same algorithm can therefore be used to classify photos on Instagram [14], customers of a bank [15], or galaxies in all-sky surveys [16,17]. This has opened new research perspectives in finance, astrophysics, molecular chemistry, turbulence modelling, and other fields where large datasets are available [18]. Industrial product research and development is another potential test bed for data-driven analysis.
This article presents preliminary work on the exploration of correlations between axial flow fan design parameters and performance, carried out on a database of about 4000 individuals. The idea is that the procedure can be applied to any dataset that is heterogeneous, incomplete, and populated with a significant number of samples, for example the database of a fan manufacturer. Once the population has enough samples, the source of the data is not important. This work refers to a specific class of turbomachinery: axial flow fans with a rotor-only arrangement.
The aim of this work is to explore the possibilities and limits of big data analytics applied to the design and optimization of industrial turbomachinery, through a combination of two multivariate statistical approaches: principal component analysis (PCA) and projection to latent structures (PLS).
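As a rough orientation for readers unfamiliar with the two techniques, the sketch below applies them to a small synthetic dataset using NumPy. It is not the procedure or the data used in this work: the three columns stand in for hypothetical design variables, the response stands in for a hypothetical performance figure, and only the first PLS component for a single response is computed.

```python
import numpy as np


def standardize(X):
    """Centre each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(X, n_components):
    """PCA via SVD of the standardized data matrix.

    Returns scores, loadings (one row per component) and the
    fraction of total variance explained by each kept component.
    """
    Z = standardize(X)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], explained[:n_components]


def pls1_first_component(X, y):
    """First PLS weight vector and score for a single response.

    The weight vector is the direction in X-space with maximum
    covariance with y, which is what distinguishes PLS from PCA.
    """
    Z = standardize(X)
    yc = y - y.mean()
    w = Z.T @ yc
    w = w / np.linalg.norm(w)
    t = Z @ w
    return w, t


# Synthetic data (assumption, for illustration only): two design
# variables driven by one hidden factor, plus one unrelated variable.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),   # hypothetical variable 1
    -latent + 0.1 * rng.normal(size=n),  # hypothetical variable 2
    rng.normal(size=n),                  # hypothetical variable 3
])
y = 2.0 * latent + 0.1 * rng.normal(size=n)  # hypothetical response

scores, loadings, explained = pca(X, n_components=2)
w, t = pls1_first_component(X, y)

print("explained variance ratio:", np.round(explained, 3))
print("first PLS weight vector:", np.round(w, 3))
```

PCA looks only at the structure of the design variables, whereas the PLS weight vector is steered by the response, so the unrelated third variable receives a small weight.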
In the next sections, the methods used for the analysis are illustrated. The process of dataset creation is then described and related to the considerations typical of axial flow fan design and manufacturing. Finally, the results of the analysis of this dataset are discussed and conclusions are drawn.