Time Series-to-Image Encoding for Classification Using Convolutional Neural Networks: A Novel and Robust Approach
Abstract
1. Introduction
- They require manual feature engineering.
- They are highly subjective.
- They do not generalize well in other scenarios.
- They require a high level of expertise.
- They are time-consuming during data preprocessing with more than 50% of the whole data processing.
- They are influenced by human factors.
2. Review of Related Work and Basic Methods
2.1. Definitions
- Univariate time series: only one descriptive variable/feature has been measured and observed over time.
- Multivariate time series: several descriptive variables/features have been measured and observed over time.
2.2. Imaging Time Series Techniques
2.2.1. Recurrence Plot (RP)
- Ones (recurrent state: ;
- Zeros (transient state: .
- m is the number of dimensions of the considered phase space.
- n is the length of the trajectory through the phase space/number of the future states resulting from a given initial state in phase space.
- is the time delay considered.

- is the binarized pairwise distance matrix of the trajectory of the recurrence of a state () between initial time point i and time point j over time in the m-dimensional phase space.
- is the Heaviside function.
- is the recurrence threshold.
- is the norm operator (the Euclidean norm is typically used).
2.2.2. Gramian Angular Field (GAF)
- , the two vectors are similar;
- , the two vectors are orthogonal;
- , the two vectors are opposite.
- A Min-Max Scaling process is applied to transfer the time series onto the range.
- A Polar Encoding of each sample () within the scaled time series is calculated as follows:where is the time stamp.
- The Gramian matrix can be defined by two types of fields that encode the time series signals into images:
- Gramian angular summation field (GASF) matrix:
- Gramian angular difference field (GADF) matrix:
- Generation of related GAF image.
2.2.3. Markov Transition Field (MTF)
- is the probability of the transition of the () form (i) state to the (j) state during one time step/unit of motion in the state space.
- Q is the number of states/size of the state space of the considered dynamical system.
- Discretize the time series to quantile bins.
- Build the Markov transition matrix.
- Compute transition probabilities.
- Compute the Markov transition field.
- Compute an aggregated MTF.
2.3. Convolutional Neural Networks as Deep Learning Systems Applied to Imaging Time Series Datasets
2.3.1. Convolutional Neural Networks (CNNs)
- Feature Extraction:
- Convolutional Layer: Basically, a convolution is defined as a mathematical operation in which two sets of information are merged to create a new set that contains more informative information about the task at hand. In this sense, the first information set is considered an image represented as a 2D matrix with one or three channels. The second information set is a convolution filter, usually a square matrix with dimensions of 2, 3, etc., that extracts specific features. These features can be low-level (such as contours, edges, angles, and colors) or high-level (such as shapes and objects). The extracted features are collected together to create a 2D “feature map.” The convolution process can also be applied to the feature map generated from the previous layer to create a new feature map with more complex and intricate features than the original.
- Nonlinear Activation Layer: It consists of nonlinear functions, such as the rectified linear unit (ReLU) and sigmoid functions. The correlation between the convolution process and the nonlinearity process (1) enables the network/model to learn/build complex representations and relationships with a “nonlinearity property” of the input/output data/information and (2) prevents the exponential growth of the computation required to run the neural network.
- Pooling Layer: In this step, the size of the feature map generated by the (Convolution + ReLU) layer is subsampled through the aggregation of features from local regions to learn invariant features and reduce computational complexity. In this process, the m-square matrix (usually or ) is shifted over the processed feature map using a predefined step called the stride. For each shift, a value (e.g., the average, maximum, or minimum) is computed and substituted in place of the originally processed values in the new feature map.
Sequences of these layers (convolutional layer, nonlinear activation function, and pooling), called hidden layers, can be stacked several times, and their final output yields the feature map. The deeper the feature extraction block of the CNN, the more complex the feature map, i.e., the feature map starts with low-level features and then becomes more complex as it passes through each hidden layer until it reaches high-level features at the end of the feature extraction block. - Flattening Layer: The final output of the last (Convolution + ReLU + Pooling) sequence is the last layer in the feature extraction stage and is formatted as a 2D matrix. However, the classification stage in the CNN model requires input to be formatted as a 1D matrix; thus, the “Flatten” process should be applied.
- Classification: The extracted and flattened features maps of the considered states will be processed by a Fully Connected (FC) network that consists of several feedforward layers.
- Probabilistic Distribution: To ensure that the considered classes follow a probabilistic distribution, a SoftMax Activation function is usually applied.
2.3.2. CNN Classification Systems Based on Time Series-to-Image Encoding
- Gross et al. [1] suggested the following concepts for building the classification system of a time series, as illustrated in Figure 10 [1]:
- Use the GAF, GASF, and GADF as time series imaging techniques to image the considered time series dataset;
- Employ benchmarking transfer learning strategies to improve the learning of new tasks through knowledge transfer from a related task that has already been learned (such as VGG16, VGG19, ResNet50V2, and Xception).
To evaluate the transfer learning strategies for benchmarking time series imaging, the following datasets were used, whose numerical specifications are listed in Table 1 [34]:- (a)
- Computers: These problems were taken from data recorded as part of a government-sponsored study called Powering the Nation. It aimed to collect behavioral data about how consumers use electricity within the home to help reduce the UK’s carbon footprint. The data contain readings from 251 households, sampled in two-minute intervals over a month. Each series is 720 data points long (24 h of readings taken every 2 min). Its classes are Desktop and Laptop.
- (b)
- DodgerLoopGame: Traffic data were collected with a loop sensor installed on a ramp for the 101 North freeways in Los Angeles. This location is close to the Dodgers Stadium; therefore, traffic is affected by the volume of stadium visitors. Class 1: Normal Day; Class 2: Game Day.
- (c)
- ECG200: Each series traces the electrical activity recorded during one heartbeat. The two classes are normal heartbeat and Myocardial Infarction.
- (d)
- AbnormalHeartbeat: Heartbeat recordings were gathered from both the iStethoscope Pro iPhone app and from clinical trials using the digital stethoscope DigiScope. The time series represent the change in amplitude over time during an examination of patients suffering from any four common arrhythmias.
- Buz et al. [20] developed another approach and application of time series-to-image transformation methods for classifying underwater objects (sonar signals), as shown in Figure 11. The sonar dataset developed by Gorman et al. [35] was used to evaluate Buz et al.’s method. This dataset contains the signals reflected from cylindrical mines and cylindrical rocks resembling these mines from different angles. The dataset contains 208 samples in time series format, where 111 samples belong to cylindrical mines and 97 samples to cylindrical rocks. The length of each time series is 60 numbers, ranging from 0.0 to 1.0. Each number represents the energy collected over a fixed period in a given frequency band [35].The core of the comparative analysis is the conversion of the sonar time series dataset into the related and individual 2D image dataset through the separate application of the GASF, MTF, and RP techniques. Then, the three related 2D image datasets are merged to generate a 3D image for each sample in the sonar dataset, as shown in Figure 12. The resulting 3D image dataset is used to train and evaluate the CNN architecture shown in Figure 11.
3. Novel Approach for Time Series-to-Image Encoding for Convolutional Neural Network-Based Classification
3.1. Grayscale Fingerprint Features Field Imaging Techniques
3.1.1. Motivation
- The transformation processes are characterized by a high degree of complexity.
- The resulting image structure exhibits diagonal symmetry, resulting in the presence of redundant and duplicated information.
- The resolution of an image is directly proportional to the square of the number of descriptive features of the time series data under consideration. Consequently, as the number of features increases, the image resolution improves. However, this enhancement in resolution necessitates a proportional increase in the time required to process the image, thereby achieving the desired objectives, such as classification.
- Only a transformation process based on two simple steps is applied to transfer the descriptive features/variables of each sample of the time series into the 2D image.
- The resulting image has a non-symmetric property, and its resolution is equal to the number of descriptive features/variables of the transformed sample.
3.1.2. Methodology
- is the ith descriptive feature.
- l is the number of descriptive features used/columns of the multivariate time series dataset.
- The descriptive features of each sample undergo Grayscale-based normalization, leading to new feature values between zero (black) and 255 (white):where
- is the normalized/scaled vector of the descriptive features of each sample considered.
- is the minimum value of the descriptive features of each sample considered.
- is the maximum value of the descriptive features of each sample considered.
- 255 is the scalar of grayscale processing.
- A G3FI image () is generated by reshaping (Reshape) the grayscale normalized vector of each considered sample into the () matrix as follows:where
- is the reshaping process of the grayscale normalized/scaled vector into the () matrix to generate a G3FI image.
- K is the number of rows in the G3FI image matrix ().
- L is the number of columns in the G3FI image matrix ().
3.2. Comparison of Images of State-of-the-Art and Novel G3FI Techniques
3.3. CNN Classification System Based on G3FI Image-Encoded Time Series
- ReLU is incorporated as the activation function in all applied layers.
- The batch normalization (BN) technique is implemented prior to the ReLU activation function of the convolutional layers. This is intended to avoid the phenomenon of vanishing gradients and to speed up model training.
- The training parameters are:
- –
- “Same” for padding
- –
- “He uniform variance scaling initializer” for the weight initializer
- –
- “Gradient descent” with a learning rate of 0.01 and a momentum of 0.9 for the optimizer
- –
- “Binary cross entropy” for the loss function
- –
- “Accuracy” for the evaluation metric
- –
- “Early stopping” with a monitor of “validation loss,” a mode of “min,” and a patience of 3
- –
- “Model checkpoint” with a monitor of “validation accuracy,” a mode of “max,”
- –
- 30 epochs, and batch size of 32.
4. Results and Discussion
- Comparative results for the Computer and DodgerLoopGame datasets;
- Superior results for the ECG200 and AbnormalHeartbeat datasets.
5. Conclusions and Future Work
- Investigating and analyzing the effect of the “K” and “L” parameters on the CNN model’s performance.
- Implementing a feature selection technique to identify the most relevant features before the imaging process.
- Application of the G3FI-CNN approach to multivariate time series datasets and evaluation using accuracy, recall, precision, and F1-score metrics.
- Identify the “K” and “L” parameters using machine learning regression tasks based on the dimensions of the multivariate time series dataset in question. This version implemented a mathematical calculation method to maximize the G3FI image square in the identification process.
- Application of other mainstream deep learning architectures besides the CNN structure as the classifier.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gross, J.; Baumgartl, R.; Hermann, B. Benchmarking Transfer Learning Strategies in Time-Series Imaging: Recommendations for Analyzing Raw Sensor Data. IEEE Access 2022, 10, 16977–16991. [Google Scholar] [CrossRef]
- Ferencz, K.; Domokos, J.; Kovács, L. Analysis of time series data for anomaly detection. In Proceedings of the 2022 IEEE 22nd International Symposium on Computational Intelligence and Informatics and 8th IEEE International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics (CINTI-MACRo), Budapest, Hungary, 21–22 November 2022. [Google Scholar] [CrossRef]
- Jantawong, P.; Hnoohom, N.; Jitpattanakul, A.; Mekruksavanich, S. Time Series Classification Using Deep Learning for HAR Based on Smart Wearable Sensors. In Proceedings of the 26th International Computer Science and Engineering Conference (ICSEC), Sakon Nakhon, Thailand, 21–23 December 2022. [Google Scholar] [CrossRef]
- Fu, T. A review on time series data mining. Eng. Appl. Artif. Intell. 2011, 24, 164–181. [Google Scholar] [CrossRef]
- Bottou, L. From machine learning to machine reasoning. Mach. Learn. 2011, 94, 133–149. [Google Scholar] [CrossRef]
- Artemios-Anargyros, S.; Evangelos, S.; Vassilios, A. Image-based time series forecasting: A deep convolutional neural network approach. Neural Netw. 2023, 157, 39–53. [Google Scholar] [CrossRef]
- Sarker, I.H. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Comput. Sci. 2022, 3, 158. [Google Scholar] [CrossRef]
- Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-spatial residual networks for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
- Silberzahn, R.; Uhlmann, E.L.; Martin, D.P.; Anselmi, P.; Aust, F.; Awtrey, E.; Bahník, S.; Bai, F.; Bannard, C.; Bonnier, E.; et al. Many analysts, one data set: Making transparent how variations in analytic choices affect results. Adv. Methods Pract. Psychol. Sci. 2018, 1, 337–356. [Google Scholar] [CrossRef]
- Rawat, T.; Khemchandani, V. Feature engineering (FE) tools and techniques for better classification performance. Int. J. Innov. Eng. Technol. (IJIET) 2017, 8, 169–179. [Google Scholar] [CrossRef]
- Schmidt, J.; Marques, M.R.G.; Botti, S.; Marques, M.A.L. Recent advances and applications of machine learning in solid-state materials science. Npj Comput. Mater. 2019, 5, 83. [Google Scholar] [CrossRef]
- Janssens, O.; Slavkovikj, V.; Vervisch, B.; Stockman, K.; Loccuer, M.; Verstockt, S.; Van de Walle, R.; Van Hoecke, S. Convolutional Neural Network Based Fault Detection for Rotating Machinery. J. Sound Vib. 2016, 377, 331–345. [Google Scholar] [CrossRef]
- Schmitz-Valckenberg, S.; Göbel, A.P.; Saur, S.; Steinberg, J.; Thiele, S.; Wojek, C.; Russmann, C.; Holz, F.G. Automated retinal image analysis for evaluation of focal hyperpigmentary changes in intermediate age-related macular degeneration. Transl. Vis. Sci. Technol. 2016, 5, 3. [Google Scholar] [CrossRef]
- Fawaz, H.I.; Forestier, G.; Weber, J.; Idoumghar, L.; Müller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 512, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Tang, Y.; Blincoe, K.; Kempa-Liehr, A.W. Enriching feature engineering for short text samples by language time series analysis. EPJ Data Sci. 2020, 9, 26. [Google Scholar] [CrossRef]
- Smyl, S. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. Int. J. Forecast. 2020, 36, 75–85. [Google Scholar] [CrossRef]
- Wang, Z.; Oates, T. Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks. In Proceedings of the Workshops at the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
- Hatami, N.; Gavet, Y.; Debayle, J. Classification of Time-Series Images Using Deep Convolutional Neural Networks. arXiv 2017, arXiv:1710.00886. [Google Scholar] [CrossRef]
- Buz, A.C.; Demirezen, M.U.; Yavanoglu, U. A Novel Approach and Application of Time Series to Image Transformation Methods on Classification of Underwater Objects. Gazi J. Eng. Sci. 2021, 7, 1–11. [Google Scholar] [CrossRef]
- Barra, S.; Carta, S.M.; Corriga, A.; Podda, A.S.; Recupero, D.R. Deep learning and time series-to-image encoding for financial forecasting. IEEE/CAA J. Autom. Sin. 2020, 7, 683–692. [Google Scholar] [CrossRef]
- Kiangala, K.; Wang, Z. An effective predictive maintenance framework for conveyor motors using dual time-series imaging and convolutional neural network in an industry 4.0 environment. IEEE Access 2020, 8, 121033–121049. [Google Scholar] [CrossRef]
- PYTS. A Python Package for Time Series Classification. Available online: https://pyts.readthedocs.io/en/stable/user_guide.html (accessed on 31 October 2023).
- Mitiche, I.; Morison, G.; Nesbitt, A.; Hughes-Narborough, M.; Stewart, B.G.; Boreham, P. Imaging Time Series for the Classification of EMI Discharge Sources. Sensors 2018, 18, 3098. [Google Scholar] [CrossRef]
- Eckmann, J.P.; Kamphorst, S.O.; Ruelle, D. Recurrence Plots of Dynamical System. Europhys. Lett. 1987, 4, 973–977. [Google Scholar] [CrossRef]
- Ramirez-Amaro, K.; Figueroa-Nazuno, J. Recurrence Plot Analysis and its Application to Teleconnection Patterns. In Proceedings of the 15th International Conference on Computing, Mexico City, Mexico, 21–24 November 2006. [Google Scholar] [CrossRef]
- Caraiani, P.; Haven, E. The Role of Recurrence Plots in Characterizing the Output-Unemployment Relationship: An Analysis. PLoS ONE 2013, 8, e56767. [Google Scholar] [CrossRef] [PubMed][Green Version]
- Wolfram MathWorld. Recurrence Plot. Available online: https://mathworld.wolfram.com/RecurrencePlot.html (accessed on 16 September 2025).
- Wang, Z.; Oates, T. Imaging Time-Series to Improve Classification and Imputation. arXiv 2015, arXiv:1506.00327. [Google Scholar] [CrossRef]
- Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980, 36, 193–202. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Demertzis, K.; Demertzis, S.; Iliadis, L. A Selective Survey Review of Computational Intelligence Applications in the Primary Subdomains of Civil Engineering Specializations. Appl. Sci. 2023, 13, 3380. [Google Scholar] [CrossRef]
- Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed.; O’Reilly Media: Sebastopol, CA, USA, 2019; pp. 445–480. [Google Scholar]
- UCR Time Series Classification Archive. The UCR Time Series Classification Archive. Available online: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (accessed on 31 October 2023).
- Gorman, R.P.; Sejnowski, T.J. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1988, 1, 75–89. [Google Scholar] [CrossRef]















| Dataset | Number of Samples | Length of Time Series | Classes |
|---|---|---|---|
| Computers | 500 | 720 | 1: Desktop; 2: Laptop |
| DodgerLoopGame | 158 | 288 | 1: Normal Day; 2: Game Day |
| ECG200 | 200 | 96 | 1: Normal; 2: Ischemia |
| AbnormalHeartbeat | 606 | 3053 | 1: Normal; 2: Arrhythmias |
| Dataset | Total Number |
|---|---|
| AbnormalHeartbeat | 5,866,658 |
| Computers | 1,705,122 |
| DodgerLoopGame | 853,154 |
| ECG200 | 525,474 |
| Sonar | 394,402 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Al Joumaa, H.; Al-Shrouf, L.; Jelali, M. Time Series-to-Image Encoding for Classification Using Convolutional Neural Networks: A Novel and Robust Approach. Mach. Learn. Knowl. Extr. 2025, 7, 155. https://doi.org/10.3390/make7040155
Al Joumaa H, Al-Shrouf L, Jelali M. Time Series-to-Image Encoding for Classification Using Convolutional Neural Networks: A Novel and Robust Approach. Machine Learning and Knowledge Extraction. 2025; 7(4):155. https://doi.org/10.3390/make7040155
Chicago/Turabian StyleAl Joumaa, Hammoud, Loui Al-Shrouf, and Mohieddine Jelali. 2025. "Time Series-to-Image Encoding for Classification Using Convolutional Neural Networks: A Novel and Robust Approach" Machine Learning and Knowledge Extraction 7, no. 4: 155. https://doi.org/10.3390/make7040155
APA StyleAl Joumaa, H., Al-Shrouf, L., & Jelali, M. (2025). Time Series-to-Image Encoding for Classification Using Convolutional Neural Networks: A Novel and Robust Approach. Machine Learning and Knowledge Extraction, 7(4), 155. https://doi.org/10.3390/make7040155

