Bitcoin Analysis and Forecasting through Fuzzy Transform

Sentiment analysis to characterize the properties of Bitcoin prices and their forecasting is developed here thanks to the capability of the Fuzzy Transform (F-transform for short) to capture stylized facts and mutual connections between time series of different natures. The recently proposed $L_p$-norm F-transform is a powerful and flexible methodology for data analysis, non-parametric smoothing, fitting and forecasting. Its capabilities are illustrated by empirical analyses concerning Bitcoin prices and Google Trends scores (six years of daily data): we apply the (inverse) F-transform to both time series and, using clustering techniques, we identify stylized facts for Bitcoin prices, based on (local) smoothing and fitting F-transform, and we study their time evolution in terms of a transition matrix. Finally, we examine the dependence of Bitcoin prices on Google Trends scores and we estimate short-term forecasting models; the Diebold–Mariano (DM) test statistic, applied to assess their significance, shows that sentiment analysis is useful in short-term forecasting of the Bitcoin cryptocurrency.


Introduction
The goal of the present paper is to analyze the connection between Bitcoin returns and the level of interest in the world wide web through two types of regression, quantile and expectile, implemented via the direct and inverse F-transform. The observed Bitcoin time series is modelled through fuzzy-valued functions, whose level-cuts can be interpreted in the setting of expectile and quantile fuzzy regressions; the latter were introduced in [1], [2] as non-parametric smoothing methodologies and are constructed by defining fuzzy-valued expectile ($L_2$-norm) and quantile ($L_1$-norm) extensions of the F-transform. We recall that the F-transform was introduced by [3] (see also [4], [5], [6], [7]). Quantile regression is also applied in [8] to show that Bitcoin reacts positively to uncertainty at both higher quantiles and shorter frequency movements of Bitcoin returns.
Following recent research on financial time series, where the properties of quantile and expectile modelling are discussed with respect to coherent and elicitable risk measures (see [9], [10]), expectile methods seem to compete favourably with quantiles; furthermore, some recent papers (e.g., [11]) suggest adopting $L_p$-norm based procedures with $p$ between 1 and 2 (e.g., $p \in \{1.25, 1.5, 1.75\}$) in order to adjust for tail behaviour according to robust Extreme Value Theory.
The literature on Bitcoin has grown enormously in recent years and some papers deserve citation. An exhaustive analysis of Bitcoin and its statistical properties is developed in [12] through a comparison with the dynamics of standard currencies. Yermack in [13] shows that Bitcoin does not satisfy the three main properties of a currency, namely medium of exchange, unit of account and store of value, concluding that it is not a currency but rather a speculative asset. However, the debate about the nature of Bitcoin is still open and many hints can be found in [14], [15] and [16]. A property shared by many financial instruments is the day-of-the-week effect; in [17] the same effect is documented for Bitcoin returns and volatility through OLS and GARCH models. The ability of global economic policy uncertainty to provide valid information for improving the prediction of returns and volatility in the Bitcoin market is detailed in [18].
Research on possible forecasting models to be used as decision support tools in investment strategies is more recent; in [19], monthly data are considered and it is shown that the internet-based index of economic-uncertainty-related queries has statistically stronger predictive ability for Bitcoin returns than the measure of uncertainty derived from newspapers. The nexus between Bitcoin prices and market sentiment is further studied in many papers: in [20] sentiment is shown to explain about 2.5% to 5% of the unusual level of price clustering in Bitcoin. In [21] the cross-correlations between Google Trends and the Bitcoin market are analyzed through the Multifractal Detrended Cross-correlation Analysis (MF-DCCA) method, and in [22], within the more general context of the Dow Jones Industrial Average, it is shown that Google searches are power-law correlated with Hurst exponents between 0.8 and 1.1 and that there is no universal relationship between online search queries and financial measures. In [23] investor sentiment regarding Bitcoin is introduced because it carries significant information for explaining changes in Bitcoin volatility in future periods; on this basis Bitcoin is shown to be an investment asset with high volatility and dependence on investor sentiment rather than a monetary asset.
In a more general framework, in [24], interactions between (mass) media reporting and financial market movements are measured, with particular focus on sentiment as a predictor of securities' prices.
A non-parametric forecasting model based on technical analysis is presented in [25], focusing on the presence of predictive local non-linear trends that reflect the speculative nature of crypto-currency trading. In [26] a computational intelligence technique using a hybrid Neuro-Fuzzy controller is introduced to forecast the direction of the daily change in the Bitcoin price; its performance is the best when compared with two other computational intelligence models.
Forecasting of Bitcoin risk measures is developed in [27] by comparing the predictability of one-step-ahead volatility and Value-at-Risk using several volatility models.
Many other authors approach general properties of cryptocurrencies. For example, in [28] there is evidence that Bitcoin is the most influential among digital coins, both as a transmitter toward digital currencies and as a receiver of spillovers from virtual and traditional instruments. An extended analysis is also presented in [29], where the four crypto-currencies Bitcoin, Ethereum, Ripple and Litecoin are predicted through a combination of eight models, revealing that a combination of stochastic volatility and a Student-t distribution gives the best results. The same topic of Bitcoin realized-volatility forecasting is studied in [30], where conventional regression models are substituted by least squares model-averaging methods and no investor sentiment is modelled.
In [31] and [32] a continuous time model for Bitcoin price dynamics is studied in order to detect bubbles; regarding the existence of a bubble, in [33] it is proved to hold from early 2013 to mid-2014 but, contrary to what was supposed, not in late 2017. Evidence of bubbly Bitcoin behavior, mainly in the 2017-2018 period, is shown in [34], where it is also proved that economic policy uncertainty and stock market volatility play the most important role in Bitcoin values.
The evidence for the Bitcoin bubble is confirmed in [35] through the empirical validation of three properties: volume of trading is mainly explained in terms of price dynamics, trading is based exclusively on past prices and the price of Bitcoin is an explosive process.
In [36] a thorough analysis is conducted: several alternative univariate and multivariate models for point and density forecasting of crypto-series are compared, finding statistically significant improvements in point forecasting when using combinations of univariate models, and in density forecasting when relying on the selection of multivariate models.
Various deep learning-based Bitcoin price prediction models are studied in [37] using Bitcoin blockchain information; both regression and classification problems are addressed, in the sense that the former predicts the future Bitcoin price while the latter predicts whether the future price will go up or down.
In the case of Bitcoin prices at high frequency, in [38] it is shown that a large degree of multi-fractality exists in all examined time intervals, which can be attributed to the high kurtosis and the fat distributional tails of the series returns; in [39] there is evidence of the leverage effect as the most powerful effect in volatility forecasting; volatility is also analysed in [40], showing that the long memory parameter is significant and quite stable for both unconditional and conditional volatilities at different time scales. Extending the study to high frequency data on several cryptocurrencies, in [41] the investigation of stylized facts is developed in terms of the Hurst exponent of the dependence between four different cryptocurrencies.
Also in [42] the multi-fractality of Bitcoin time series is investigated, confirming that both temporal correlation and the fat-tailed distribution are its main sources; in addition, in [43] a possible use of multi-fractal parameters in Technical Analysis is suggested.
The paper is organized into six sections: preliminary facts on the methodology of the Fuzzy Transform are presented in section two, while the empirical experiments concerning Bitcoin prices are detailed in sections three and four. Possible forecasting techniques are shown in section five, and the last section closes with some hints for future research paths.

Fuzzy-Transform Smoothing
We introduced in [44], and then enhanced in [1], two non-parametric smoothing methodologies called expectile and quantile Fuzzy-transform; the first one is based on the classical direct F-transform and is obtained by minimizing a least squares ($L_2$-norm) operator, while the second one is based on the $L_1$-type direct F-transform and is obtained by minimizing an $L_1$-norm operator.
Some preliminary notions compose the research framework. A fuzzy set is a mapping $u : \mathbb{R} \to [0,1]$ and a fuzzy interval is a fuzzy set on $\mathbb{R}$ such that $u$ is normal (i.e., $u(x_0) = 1$ for some $x_0$), fuzzy convex, upper semicontinuous and compactly supported; the space of real fuzzy intervals is denoted $\mathbb{R}_F$.
For a given real compact interval $[a,b]$, a generalized $r$-partition is defined by a triplet $(P, A, r)$ where $r \ge 1$ is an integer and $P = \{x_j = a + \frac{j-1}{n-1}(b-a);\ j = 1, 2, ..., n\}$, $n \ge 2$, is a uniform decomposition of $[a,b]$; for simplicity of notation, if $r > 1$ we extend $P$ by adding $r-1$ points $x_{1-j} = a - j\frac{b-a}{n-1}$, $j = 1, ..., r-1$, on the left of $a$ and $r-1$ points $x_{n+j} = b + j\frac{b-a}{n-1}$, $j = 1, ..., r-1$, on the right of $b$. The second term of the triplet is a family $A = \{A_{2-r}, ..., A_1, A_2, ..., A_n, ..., A_{n+r-1}\}$ of $n+2r-2$ continuous fuzzy sets on $\mathbb{R}$, called basic functions, that cover $[a,b]$ (for every $x \in [a,b]$ at least one $A_k(x)$ is positive) and are such that $A_k(x_k) = 1$, for $k = 2-r, ..., 1, 2, ..., n, ..., n+r-1$, and $A_k(x) = 0$ for all $x \notin\ ]x_{k-r}, x_{k+r}[$. If $r = 1$, the partition $(P, A, 1)$ will simply be denoted by $(P, A)$. Families of basic functions can be obtained in terms of increasing shape functions such as rational splines of the form
$$L(t; \beta_0, \beta_1) = \frac{t^2 + \beta_0\, t(1-t)}{1 + (\beta_0 + \beta_1 - 2)\, t(1-t)}, \quad t \in [0,1],$$
with real parameters $\beta_0 \ge 0$, $\beta_1 \ge 0$; the Hermite-type conditions $L(0) = 0$, $L(1) = 1$, $L'(0) = \beta_0$, $L'(1) = \beta_1$ are satisfied and $L'(t) \ge 0$ for all $t \in [0,1]$. By any pair of non-negative values $\beta_0, \beta_1$, a large number of shape functions can be generated; for example, if $\beta_1 = 2 - \beta_0$ we have the quadratic function $L(t) = (1-\beta_0)t^2 + \beta_0 t$, e.g., $L(t; 2, 0) = 2t - t^2$, $L(t; 0, 2) = t^2$, and $L(t; 1, 1) = t$ is linear. Each basic function $A_k$, $k = 2-r, ..., n+r-1$, increasing on $[x_{k-r}, x_k]$ and decreasing on $[x_k, x_{k+r}]$, is obtained by translating $t \mapsto L(t; \beta_0, \beta_1)$ and $t \mapsto L(1-t; \beta_0, \beta_1)$ from $[0,1]$ onto $[x_{k-r}, x_k]$ and $[x_k, x_{k+r}]$, respectively (each $A_k$ is finally extended to $\mathbb{R}$ by setting $A_k(x) = 0$ on the left of $x_{k-r}$ and on the right of $x_{k+r}$).
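As an illustrative aside, the following minimal Python sketch (not the authors' code; the paper's computations were performed in MATLAB, and the interval, node count and parameter values below are ours) generates basic functions of a generalized r-partition from the rational spline shapes just described.

```python
import numpy as np

def L(t, b0, b1):
    # Rational spline shape on [0, 1] with L(0) = 0, L(1) = 1, L'(0) = b0,
    # L'(1) = b1; when b0 + b1 = 2 it reduces to the quadratic
    # (1 - b0) t^2 + b0 t quoted in the text.
    return (t**2 + b0 * t * (1 - t)) / (1 + (b0 + b1 - 2) * t * (1 - t))

def basic_function(x, k, nodes, r, b0=1.0, b1=1.0):
    # A_k increases on [x_{k-r}, x_k] via L(t), decreases on [x_k, x_{k+r}]
    # via L(1-t), and is zero elsewhere; `nodes` is the extended node set.
    xl, xc, xr = nodes[k - r], nodes[k], nodes[k + r]
    y = np.zeros_like(x, dtype=float)
    up = (x >= xl) & (x <= xc)
    dn = (x > xc) & (x <= xr)
    y[up] = L((x[up] - xl) / (xc - xl), b0, b1)
    y[dn] = L(1.0 - (x[dn] - xc) / (xr - xc), b0, b1)
    return y

# Illustrative uniform r-partition of [0, 10] with n = 11 nodes and r = 2:
a, b, n, r = 0.0, 10.0, 11, 2
h = (b - a) / (n - 1)
nodes = a + h * np.arange(-(r - 1), n + r - 1)   # n + 2r - 2 extended nodes
x = np.linspace(a, b, 501)
A4 = basic_function(x, k=4, nodes=nodes, r=r)    # one interior basic function
```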

$L_2$-norm F-transform in expectile smoothing
We just recall the discrete version of the direct F-transform.

Definition 1. (from [3]) Given a set of $m$ values $Y = \{(t_i, f_i);\ i = 1, ..., m\}$, with $t_i \in [a,b]$, such that each interval $]x_{k-1}, x_{k+1}[$ contains at least one point $t_i$, the discrete direct $L_2$-type F-transform of $Y$ with respect to $(P, A)$ is the $n$-tuple of real numbers $(F_1, ..., F_n)$ where each component $F_k$ minimizes the function
$$\Phi_k(y) = \sum_{i=1}^{m} (f_i - y)^2 A_k(t_i), \quad k = 1, ..., n.$$
The associated inverse F-transform function is defined by
$$f_{(P,A)}(x) = \frac{\sum_{k=1}^{n} F_k A_k(x)}{\sum_{k=1}^{n} A_k(x)}, \quad x \in [a,b].$$
More generally, we consider a generalized $r$-partition $(P, A, r)$ and substitute the direct F-transform components $F_k$ with an $(n+2r-2)$-tuple of polynomials of order $q \ge 0$, say
$$\phi_k(t) = F_{k,0} + F_{k,1}(t - x_k) + \cdots + F_{k,q}(t - x_k)^q.$$
The $q+1$ coefficients $F_{k,j}$, $j = 0, 1, ..., q$, are obtained, for fixed $k$, by minimizing the function
$$\Phi_k(y_0, ..., y_q) = \sum_{i=1}^{m} \big(f_i - y_0 - y_1(t_i - x_k) - \cdots - y_q(t_i - x_k)^q\big)^2 A_k(t_i)$$
with respect to the parameters $y_0, ..., y_q$, under the assumption that, for each $k$, the data points $(t_i, f_i)$ with $t_i$ in the interval $]x_{k-r}, x_{k+r}[\ \cap\ [a,b]$ produce a unique optimal solution. The details are shown in [1].
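A compact numerical sketch of Definition 1 and its higher-order variant may help; this is our own illustration (triangular basic functions, simulated data), not the procedure of [1], and it solves the weighted least-squares problems via normal equations.

```python
import numpy as np

def direct_F2(t, f, nodes, A, q=0):
    # For each node x_k, fit the polynomial sum_j y_j (t - x_k)^j that
    # minimizes sum_i (f_i - poly(t_i))^2 * A_k(t_i); A[k] is a callable.
    coeffs = []
    for k, xk in enumerate(nodes):
        w = A[k](t)
        m = w > 0
        V = np.vander(t[m] - xk, q + 1, increasing=True)   # design matrix
        G = V.T * w[m]                                     # weighted normal eqs
        coeffs.append(np.linalg.solve(G @ V, G @ f[m]))
    return coeffs

def inverse_F2(x, nodes, A, coeffs):
    # Inverse F-transform: A_k-weighted combination of the local polynomials.
    num, den = np.zeros_like(x, float), np.zeros_like(x, float)
    for k, xk in enumerate(nodes):
        w = A[k](x)
        num += w * np.polyval(coeffs[k][::-1], x - xk)
        den += w
    return num / den

# Example: triangular (hat) basic functions on a uniform partition of [0, 1].
nodes = np.linspace(0.0, 1.0, 9)
h = nodes[1] - nodes[0]
A = [lambda x, c=c: np.maximum(0.0, 1 - np.abs(x - c) / h) for c in nodes]
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 200))
f = np.sin(6 * t) + 0.1 * rng.standard_normal(200)
fit = inverse_F2(t, nodes, A, direct_F2(t, f, nodes, A, q=1))
```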
In [1] (Proposition 1) it is shown that the minimizers of the asymmetric (expectile) versions of the above least-squares criteria, for asymmetry parameters $\omega \in [\frac{\alpha}{2}, 1 - \frac{\alpha}{2}]$, produce a family of compact intervals $U_{k,\alpha}$, $\alpha \in [0,1]$, which define the $\alpha$-cuts of a fuzzy number $u_k \in \mathbb{R}_F$. The $(n+2r-2)$-vector of fuzzy numbers $(F_{2-r}, ..., F_{n+r-1})$, where each fuzzy interval $F_k$ has $\alpha$-cuts $U_{k,\alpha}$ given by (4) in Proposition 1, is called the discrete direct expectile fuzzy-valued F-transform of $f$ with respect to $(P, A, r)$, based on the data-set $Y$. The corresponding inverse expectile fuzzy-valued iF-transform is the fuzzy-valued function defined by
$$f_{(P,A,r)}(x) = \frac{\sum_{k} F_k A_k(x)}{\sum_{k} A_k(x)}, \quad x \in [a,b].$$
The fuzzy-valued function $f_{(P,A,r)}(x)$ is well defined, as indeed each basic function $A_k$ has non-negative values for each $x \in [a,b]$. Denoting the $\alpha$-cuts $U_{k,\alpha}$ of $F_k$ by $[F_k^-(\alpha), F_k^+(\alpha)]$, the $\alpha$-cuts of the fuzzy-valued function $f_{(P,A,r)}(x)$, $x \in [a,b]$, are given by
$$\left[\frac{\sum_k F_k^-(\alpha) A_k(x)}{\sum_k A_k(x)},\ \frac{\sum_k F_k^+(\alpha) A_k(x)}{\sum_k A_k(x)}\right].$$
When $\alpha = 1$ we obtain the standard direct F-transform and the standard iF-transform function, corresponding to the core of the fuzzy-valued iF-transform.

$L_1$-norm F-transform in quantile smoothing
The $L_1$-norm direct and inverse F-transform are defined as follows.

Definition 3. Given a set of $m$ values $Y = \{(t_i, f_i);\ i = 1, ..., m\}$, with $t_i \in [a,b]$, such that each interval $]x_{k-1}, x_{k+1}[$ contains at least one point $t_i$ in its interior, the discrete direct $L_1$-type F-transform of $Y$ with respect to $(P, A)$ is the $n$-tuple of real numbers $(G_1, ..., G_n)$ where each component $G_k$ minimizes the function
$$\Psi_k(y) = \sum_{i=1}^{m} |f_i - y|\, A_k(t_i), \quad k = 1, ..., n,$$
i.e., $G_k$ is a weighted median of the values $f_i$ with weights $A_k(t_i)$.
Also in this case, we consider a generalized $r$-partition $(P, A, r)$ and substitute the direct F-transform components $G_k$ with an $(n+2r-2)$-tuple of polynomials of order $q \ge 0$, say
$$\psi_k(t) = G_{k,0} + G_{k,1}(t - x_k) + \cdots + G_{k,q}(t - x_k)^q.$$
The $q+1$ coefficients $G_{k,j}$, $j = 0, 1, ..., q$, are obtained, for fixed $k$, by minimizing the function
$$\Psi_k(y_0, ..., y_q) = \sum_{i=1}^{m} \big|f_i - y_0 - y_1(t_i - x_k) - \cdots - y_q(t_i - x_k)^q\big|\, A_k(t_i)$$
with respect to the parameters $y_0, ..., y_q$ (see details in [1]).
The corresponding $L_1$-type inverse F-transform function is given by
$$g_{(P,A,r)}(x) = \frac{\sum_{k} \psi_k(x) A_k(x)}{\sum_{k} A_k(x)}, \quad x \in [a,b];$$
for order zero ($q = 0$) it becomes
$$g_{(P,A,r)}(x) = \frac{\sum_{k} G_k A_k(x)}{\sum_{k} A_k(x)},$$
with $(G_{2-r}, ..., G_{n+r-1})$ the $(n+2r-2)$-tuple of the $L_1$-type direct F-transform. We recall from [1] that the quantile direct F-transform is defined in terms of the minimizers of the convex functions, for $k = 1, ..., n$ and $\omega \in\ ]0, 1]$,
$$\Psi_{k,\omega}(\eta) = \sum_{i=1}^{m} \rho_\omega(f_i - \eta)\, A_k(t_i), \quad \text{where } \rho_\omega(u) = u\,(\omega - \mathbf{1}_{\{u < 0\}})$$
is the usual quantile check function. As detailed in [1], the minimization of $\Psi_{k,\omega}(\eta)$ produces a family of compact intervals $V_{k,\alpha}$, for $\alpha \in [0,1]$ and $\omega \in [\frac{\alpha}{2}, 1 - \frac{\alpha}{2}]$, and we obtain the $\alpha$-cuts of a fuzzy number $v_k \in \mathbb{R}_F$. Given the data-set $Y = \{(t_i, f_i);\ i = 1, ..., m\}$ and a fuzzy $r$-partition $(P, A, r)$ of $[a,b]$, the $(n+2r-2)$-vector of fuzzy numbers $(G_{2-r}, ..., G_{n+r-1})$, where each fuzzy interval $G_k$ has $\alpha$-cuts $V_{k,\alpha}$, is called the discrete direct quantile fuzzy transform of $Y$ with respect to $(P, A, r)$.
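For concreteness, a minimal sketch (our illustration, with a hypothetical basic function $A_k$) of the $L_1$-type component $G_k$ and of its quantile extension: the minimizer of the check-function criterion is a weighted $\omega$-quantile, which for $\omega = 0.5$ reduces to the weighted median.

```python
import numpy as np

def quantile_component(t, f, Ak, omega=0.5):
    # Minimizer of sum_i rho_omega(f_i - eta) * A_k(t_i), where
    # rho_omega(u) = u * (omega - 1{u<0}); omega = 0.5 gives the weighted
    # median, i.e. the L1-type direct F-transform component G_k.
    w = Ak(t)
    m = w > 0
    order = np.argsort(f[m])
    fs, ws = f[m][order], w[m][order]
    cum = np.cumsum(ws)
    idx = np.searchsorted(cum, omega * cum[-1])   # weighted omega-quantile
    return fs[min(idx, len(fs) - 1)]

# Example with a hypothetical triangular basic function centred at 0.5:
Ak = lambda x: np.maximum(0.0, 1 - np.abs(x - 0.5) / 0.25)
rng = np.random.default_rng(1)
t = rng.uniform(0, 1, 500)
f = np.sin(6 * t) + 0.2 * rng.standard_normal(500)
G_core = quantile_component(t, f, Ak, 0.5)    # core (alpha = 1)
V_lo = quantile_component(t, f, Ak, 0.05)     # omega = alpha/2 with alpha = 0.1
V_hi = quantile_component(t, f, Ak, 0.95)     # omega = 1 - alpha/2
```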
The corresponding inverse quantile fuzzy transform of $f$ is the fuzzy-valued function defined by
$$g_{(P,A,r)}(x) = \frac{\sum_{k} G_k A_k(x)}{\sum_{k} A_k(x)}, \quad x \in [a,b].$$
Denoting the $\alpha$-cuts $V_{k,\alpha}$ of $G_k$ by $[G_k^-(\alpha), G_k^+(\alpha)]$, the $\alpha$-cuts of the corresponding fuzzy-valued function $g_{(P,A,r)}(x)$, $x \in [a,b]$, are given by
$$\left[\frac{\sum_k G_k^-(\alpha) A_k(x)}{\sum_k A_k(x)},\ \frac{\sum_k G_k^+(\alpha) A_k(x)}{\sum_k A_k(x)}\right].$$

General $L_p$-norm based discrete F-transform
The general $L_p$-norm based F-transform has been analysed in detail in [45] for the continuous case. Its interest in time series applications is motivated by recent literature on the tail behaviour of economic and financial time series (see, e.g., [11]) and on modelling risk measures ([9], [10]): $L_p$-norm estimation with $1 < p < 2$ has been suggested to balance robustness and fitting properties.

Proposition 2. Given the set of minimizers of the asymmetric $L_p$-norm criteria (the analogues of the expectile and quantile objectives with $|\cdot|^p$ in place of the squared or absolute deviations), the resulting family of compact intervals $W_{k,\alpha}$, $\alpha \in [0,1]$, defines the $\alpha$-cuts of a fuzzy number $w_k \in \mathbb{R}_F$.

The $(n+2r-2)$-vector of fuzzy numbers $(H_{2-r}, ..., H_{n+r-1})$, where each fuzzy interval $H_k$ has $\alpha$-cuts $W_{k,\alpha}$ given by (9) in Proposition 2, is called the discrete direct $L_p$-norm fuzzy-valued F-transform with respect to $(P, A, r)$, based on the data-set $Y$. The corresponding inverse $L_p$-norm fuzzy-valued iF-transform is the fuzzy-valued function defined, as in the expectile and quantile cases, by the $A_k$-weighted combination of the components $H_k$. Denoting the $\alpha$-cuts $W_{k,\alpha}$ of $H_k$ by $[H_k^-(\alpha), H_k^+(\alpha)]$, the $\alpha$-cuts of the fuzzy-valued function $f^{(p)}_{(P,A,r)}(x)$, $x \in [a,b]$, are obtained from the $W_{k,\alpha}$ exactly as before. When $\alpha = 1$ we obtain the $L_p$-norm direct F-transform and inverse iF-transform function, corresponding to the core of the fuzzy-valued $L_p$-norm iF-transform.
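A sketch of the order-zero $L_p$-norm component, again our own illustration: for $1 < p < 2$ the weighted criterion is strictly convex, so a scalar minimizer is well defined and can be found numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lp_component(t, f, Ak, p=1.5):
    # Order-0 direct L_p-norm F-transform component: the minimizer of
    # sum_i |f_i - y|^p * A_k(t_i); the optimum lies in [min f, max f].
    w = Ak(t)
    obj = lambda y: np.sum(w * np.abs(f - y) ** p)
    return minimize_scalar(obj, bounds=(f.min(), f.max()), method="bounded").x

# Example with a hypothetical triangular basic function centred at 0.5:
Ak = lambda x: np.maximum(0.0, 1 - np.abs(x - 0.5) / 0.25)
rng = np.random.default_rng(2)
t = rng.uniform(0, 1, 300)
f = np.cos(4 * t) + 0.2 * rng.standard_normal(300)
H_k = lp_component(t, f, Ak, p=1.5)  # between the median (p=1) and mean (p=2)
```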

Analysis of Bitcoin prices and Google trends by F-transform
To illustrate the strength of fuzzy-valued $L_p$-norm F-transform smoothing, we apply the proposed models to the time series of Bitcoin prices, which has received much attention from regulators and investors in the last decade and deserves more in-depth analysis to capture its intrinsic nature.
Bitcoin was released at the beginning of 2009 as one of the first digital currencies in the market; it remained under $0.20 for three years and began to increase during the first quarter of 2013. By the end of 2017, Bitcoin was valued at nearly $18,000 per "coin". In 2018 the price plummeted to about $4,000, and it grew again in 2019.
The second dataset we consider is Google Trends, the search index that measures what people are currently interested in and curious about. In particular, we consider the Google Trends score, a value of 100 out of 100 meaning that the search term (the word Bitcoin) is at its peak in the considered time period.
Here, we work on two daily time series, shown in Fig. 1, from April 2013 to June 2019: Bitcoin prices (from www.blockchain.info) and Google Trends GT100 (from https://trends.google.com). Remark that the F-transform (direct and inverse) is linear with respect to the data set, in particular it is homogeneous and scale invariant: if we normalize the two time series, the direct F-transform components (or the iF-transform function) are multiplied by the same factor. In this way, we can compare the F-transform results for the two series in terms of the obtained smoothing effect and by visualizing the scatter plots of each series against the obtained iF-transform reconstructions.
The degree of smoothness of a given time series $f_t$, $t = 1, 2, ..., M$, is measured in terms of its (average) absolute variation, given by
$$AV(f) = \frac{1}{M-1} \sum_{t=2}^{M} |f_t - f_{t-1}|;$$
on the other hand, it is well known that the inverse $L_p$-norm F-transform function, for a fixed $r$-partition $(P, A, r)$, allows us to compute the smoothed values $f^{(r)}_t$, $t = 1, ..., M$. The corresponding ratio of absolute variations,
$$L(f, f^{(r)}) = \frac{AV(f^{(r)})}{AV(f)},$$
represents the proportion of absolute variation which remains in the smoothed time series $f^{(r)}$ with respect to the original data $f$, while $1 - L(f, f^{(r)})$ is the amount of removed variation. We have computed the $L_p$-norm F-transforms for different values of $p \in \{1, 1.25, 1.5, 1.75, 2\}$ and orders $q \in \{0, 1, 3\}$. The excellent performance of the F-transform based smoothing is strongly confirmed for these two special time series, as summarized in Table 1 and Table 2.
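The two quantities above are straightforward to compute; a short sketch (ours) follows.

```python
import numpy as np

def absolute_variation(f):
    # AV(f): average of the absolute one-step changes |f_t - f_{t-1}|.
    return np.mean(np.abs(np.diff(f)))

def remaining_variation(f, f_smooth):
    # L(f, f^(r)): share of absolute variation kept by the smoothing;
    # 1 - L(f, f^(r)) is the amount of variation removed.
    return absolute_variation(f_smooth) / absolute_variation(f)
```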
All computations are performed with a 1-partition ($r = 1$) and a uniform decomposition $P$ of the observation period. The proportion of total variation retained, expressed by the ratios $L(f, f^{(r)})$, depends for both series more on the order $q$ than on the norm $L_p$ used; this is not surprising, because increasing the degree of the local polynomials will reduce the average fitting errors but increase their variation. In Table 2 the smoothing and fitting of the time series by $L_p$-norm F-transforms are compared in terms of three well known indices: the mean square error MSE, the mean absolute percentage error %MAE and the Kendall $\tau$ rank correlation. Here the series are not normalized (in particular, the value of the MSE index depends on the scale of the series). In all cases, the F-transform fitting for the Bitcoin series has significantly smaller errors than for GT100, as evidenced by the indices %MAE and $\tau$ (the Kendall $\tau$ is always significantly positive with p-value less than $10^{-8}$).
For the fitting F-transform functions obtained by the $L_p$-norm with $p = 1.5$ and $q = 0$, the scatter plots of the time series $(f_t, f^{(r)}_t)$ are pictured in Figure 2; Figure 3 plots $f_t$ and $f^{(r)}_t$ with respect to time. The research question we now want to investigate is the following: can higher GT100 scores for the word Bitcoin be expected to correspond to higher Bitcoin returns?
First of all, a scatter plot of the pairs $(GT100_t, BitCoin_t)$ is pictured in Figure 4 (the two time series are normalized in the range [0, 1000] and GT100 appears on the horizontal axis); the observations in our data-set cover the cited time period 2013-2019, are concentrated in the bottom-left part of the positive quadrant and contain only rare points with large values in both series. At first glance, no evident functional relationship emerges from the data; they simply show a tendency to be co-monotonic, but the points are very sparse. For a deeper analysis, we propose a model based on the F-transform, relating the pairs of data $(GT100_t, BitCoin_t)$: the F-transform is thus used to model Bitcoin as a function of GT100. The $L_1$-norm and $L_2$-norm based inverse iF-transforms of the data-set $(GT100_t, BitCoin_t)$, $t = 1, 2, ..., M$, are computed; taking into account the sparsity of the values $GT100_t$ (in particular above the threshold 40), we use a non-uniform 1-partition $(P, A)$ of the range [0, 100] of the observed $GT100_t$, namely the set of 25 nodes {0, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 45, 50, 60, 100}, as pictured in Figure 5. The two curves give the predominant relationship between BitCoin and GT100. It is evident that both iF-transforms of $BitCoin_t$ are not increasing on the whole range of GT100 (in particular, they decrease when GT100 is around 7-8, around 20 and around 40). On the other hand, observing the dispersion of points in Figure 4, neither iF-transform fits the complete data set well, and it appears that the points can be "clustered", e.g., for different levels of Bitcoin prices, so that better sub-fittings can be attempted. This can be better analysed by applying the quantile and the expectile F-transforms to our data set $(GT100_t, BitCoin_t)$, pictured in Figure 6, which shows the fuzzy-valued expectile F-transform of Bitcoin as a function of GT100 for five different $\alpha$-cuts corresponding to $\alpha \in \{0.01, 0.25, 0.5, 0.75, 1\}$ (i.e., the ten values $\omega \in \{\frac{\alpha}{2}, 1 - \frac{\alpha}{2}\}$ of the asymmetry expectile parameter $\omega$). We see that, corresponding to different values of $\alpha$, i.e., to sub-intervals in the range of Bitcoin prices, the relationship between our time series changes significantly.
This suggests that, possibly, the clustering of the data into subsets may significantly improve the quality of fitting.

F-transform of clustered subsets of observed data
Clearly, there are several procedures and criteria to cluster the observed data $(GT100_t, BitCoin_t)$; we use the well-known k-means method and the number of clusters is selected according to the silhouette measures available in MATLAB R2018b. We have performed three types of clustering: the first on the basis of the variable BitCoin, the second using the pair (BitCoin, GT100), and the third using the observations (BitCoin, GT100, ∆BitCoin, ∆GT100), where ∆ is the first-difference operator $\Delta f_t = f_t - f_{t-1}$. The clustering metric is the standard Euclidean distance.
Let nCl denote the number of clusters and let $j = 1, 2, ..., nCl$ label the clusters $S_j$. Each observation $(BitCoin_t, GT100_t)$ is assigned to cluster $S_{c(t)}$, i.e., observation $t$ is assigned to the cluster labelled $c(t) \in \{1, 2, ..., nCl\}$.
For a given clustering, identified by the clusters $S_1, S_2, ..., S_{nCl}$, the $L_2$-norm based F-transform is applied (independently) to each subset of data $\{(GT100_t, BitCoin_t);\ c(t) = j\}$, $j = 1, 2, ..., nCl$. Finally, for the observations of each cluster, the inverse iF-transform is computed; the fitted values obtained cluster by cluster are then recomposed to give the fitted values for the whole data set (a sketch of this procedure is given below).
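A sketch of this pre-clustering step in Python (the paper uses MATLAB's k-means and silhouette measures; the series below are simulated stand-ins):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(X, k_range):
    # Select the number of clusters by the average silhouette value.
    best = None
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        s = silhouette_score(X, km.labels_)
        if best is None or s > best[0]:
            best = (s, km)
    return best[1]

# Clustering A uses BitCoin only, B the pair (BitCoin, GT100), C adds the
# first differences; here the two series are simulated stand-ins.
rng = np.random.default_rng(3)
bitcoin = np.cumsum(rng.standard_normal(800))
gt100 = np.cumsum(rng.standard_normal(800))
km = best_kmeans(np.column_stack([bitcoin, gt100]), range(15, 25))
labels = km.labels_   # c(t): the cluster assigned to each observation t
# the L2-norm F-transform is then fitted independently on each subset
# {t : c(t) = j} and the fitted values are recomposed over the whole set.
```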
If all the data are collected in a unique cluster and the F-transform is applied to the whole data set, we obtain the fitted BitCoin series pictured in Figure 7: the green points give the observed $BitCoin_t$, the red points are the $GT100_t$ series and the fitting of Bitcoin is in blue. We see that this global fitting captures only the broad behaviour of the observed series. Significant improvements are obtained by adopting the three described pre-clusterings, denoted respectively by the labels A, B and C. The computations are performed on the second half of the time period, starting with observation $t_{1121}$; the first part is less interesting because, from observation $t_{350}$ to $t_{1220}$, both time series have small variations and relatively flat curves. Without pre-clustering, the $L_2$-norm F-transform reconstruction (of order 1) of BitCoin in terms of GT100 has Kendall correlation $\tau = 0.6049$ and Spearman correlation $\rho = 0.7830$. We will compare the $\tau$ and $\rho$ indices as a preliminary evaluation of the effect of pre-clustering on the fitting quality.
Clustering A. Clusters are based on the variable BitCoin; the number of clusters is nCl = 20. The $L_2$-norm F-transform reconstruction (of order 1) of BitCoin in terms of GT100 with pre-clustering A has much higher Kendall correlation $\tau = 0.9457$ and Spearman correlation $\rho = 0.9956$. In Figure 8 we plot the observed and fitted Bitcoin series for the second half of the observations, with evidence that clustering A allows a much better fitting. The 20 clusters are pictured (blue colours) in Figure 9 and expanded in Figure 10, where the sub-fittings are also visible.
Clustering B. Clusters are based on both variables (BitCoin, GT100); the number of clusters is nCl = 21. The $L_2$-norm F-transform reconstruction (of order 1) of BitCoin in terms of GT100 with pre-clustering B has high Kendall correlation $\tau = 0.9447$ and Spearman correlation $\rho = 0.9954$, similar to clustering A. In Figure 11 we plot the observed and fitted Bitcoin series for the second half of the observations, with evidence that clustering B allows a good fitting. The 21 clusters are pictured (blue colours) in Figure 12 and expanded in Figure 13, where the sub-fittings are also visible.
Clustering C. Clusters are based on the variables (BitCoin, GT100, ∆BitCoin, ∆GT100); the number of clusters is nCl = 24. The $L_2$-norm F-transform reconstruction (of order 1) of BitCoin in terms of GT100 with this pre-clustering has high Kendall correlation $\tau = 0.9183$ and Spearman correlation $\rho = 0.9905$, similar to but not better than pre-clusterings A and B. In Figure 14 we plot the observed and fitted Bitcoin series for the second half of the observations, with evidence that clustering C also allows a good fitting. The 24 clusters are pictured (blue colours) in Figure 15 and expanded in Figure 16, where the sub-fittings are also pictured.
The overall result is that pre-clustering of the data, even based on very simple clustering strategies and a relatively small number of clusters (from 20 to 24), significantly improves the fitting ability of the F-transform.
It is also interesting to see that the form of relationships between BitCoin and GT100 is very different for each cluster; this has important consequences on the analysis and modelling of Bitcoin time series as, in particular, it follows different paths in various sub-periods of time and in cases of rare values of the data (e.g., big values and/or big absolute changes).

Stylized facts identified by F-transform components
In this section we analyse the local F-transform components, in particular the form of the polynomials $\phi_k$ of the $L_2$-norm F-transform of orders $q = 1$ and $q = 2$ described in section 2.1, applied to the Bitcoin time series. We have selected the number $m$ of daily observations such that $m - 1$ is a multiple of 7 (Bitcoin and GT100 are observed every day of the year): in this way, the available data are $m = 2276$.
Consider an $r$-partition $(P, A, r)$ with nodes $x_k$, $k = 1, 2, ..., n$; denoting the time points of the observations simply by $t = 1, 2, ..., m$ (or $t_j = j$ for $j = 1, ..., m$), we consider two uniform partitions $(P, A, r)$:
$(P_a)$ - a dense partition with $n = m$ and $x_k = k$ (i.e., each observation is a node), where the bandwidth $r$ is chosen such that each open interval $I^{(r)}_k = ]x_{k-r}, x_{k+r}[$ contains enough observations to estimate the local polynomials (we use $r = 4$ and $r = 7$ below);
$(P_b)$ - a sparse partition with $n = 326$ and $x_k = 1 + 7(k-1)$ (i.e., there is a node every 7 observations), with bandwidth $r = 3$. In this case the (direct) F-transform components span 21 observed values on each side (left or right) of the nodes.

F-transform fitting with dense r-partition
In the dense partition case, we obtain the best $L_2$-norm F-transform components $\phi_k(t)$ associated with all observations (indeed, $x_k = k$ corresponds to all observed times $k = 1, ..., n = m$); in this way, we are able to estimate the local trend around every observation and we can follow the time evolution of the trends by plotting $\phi_k(t)$ around $x_k$ on the subintervals $I^{(r)}_k$ (see Figures 17 and 18). On the other hand, if we translate the polynomials $\phi_k(t)$ vertically, the polynomials $\tilde\phi_k(t) = \phi_k(t) - \varphi_{0,k}$ are such that $\tilde\phi_k(x_k) = 0$ for all $k$ and, if $q > 0$, we can cluster the $\tilde\phi_k$ by clustering the set of vectors $(\varphi_{1,k}, ..., \varphi_{q,k})$ of the estimated coefficients. If $q = 1$ we obtain a set of lines through the origin with different slopes (in terms of the single variable $\varphi_{1,k}$); if $q = 2$ we obtain a set of parabolic functions through the origin, in terms of the two variables $\varphi_{1,k}$ and $\varphi_{2,k}$. Using the k-means clustering method with the Euclidean distance and selecting the number of clusters by the silhouette values, we find that the best number of clusters is nCl = 9 when $q = 1$ and nCl = 15 when $q = 2$.
The interpretation of the nCl = 9 clusters (characterised by the variable $\varphi_{1,k}$) is interesting, because we have a central cluster of local trends with slope around 0 and eight other clusters characterized by slopes ranging from very negative values (cluster 1) through intermediate negative values (cluster 3) and intermediate positive values (cluster 7) up to very positive slopes (cluster 9).
An analogous interpretation applies to the case of nCl = 15 clusters (and $q = 2$), where again cluster 1 corresponds to the most negative slope, cluster 8 to an almost zero slope and cluster 15 to the most positive slope; clearly, the degrees of concavity and convexity (represented by the second variable $\varphi_{2,k}$) are also taken into account in this case. Figure 19 pictures, for each cluster $S^{(1)}_c$, $c = 1, 2, ..., 9$, the first order shifted polynomials $\tilde\phi_k(t) = \varphi_{1,k}(t - x_k)$ assigned to $S^{(1)}_c$ (red colours) and the centroid polynomial (blue colour). Similarly, Figure 20 pictures, for each cluster $S^{(2)}_c$, $c = 1, 2, ..., 15$, the second order shifted polynomials assigned to $S^{(2)}_c$ (red colours) and the centroid polynomial (blue colour) obtained by averaging the pairs of parameters $(\varphi_{1,k}, \varphi_{2,k})$.
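The clustering of the shifted local trends reduces, in practice, to clustering their coefficient vectors; a small sketch (ours, with simulated coefficients standing in for the estimated $\varphi$'s):

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows hold (phi_{1,k},) for q = 1 or (phi_{1,k}, phi_{2,k}) for q = 2,
# one row per node; simulated here in place of the estimated values.
rng = np.random.default_rng(4)
coefs = rng.standard_normal((2276, 2))
km = KMeans(n_clusters=15, n_init=10, random_state=0).fit(coefs)
centroids = km.cluster_centers_     # the stylized local trend parameters
stylized = centroids[km.labels_]    # each trend replaced by its centroid
```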
As we have said, the nCl centroid polynomials, identified by averaging the parameters of all elements assigned to each cluster $S_c$, can be considered as the stylized forms of the local trends. If we identify each estimated trend by the centroid of its cluster, we then have nCl stylized forms, one for each cluster, which represent the possible typical trends around the observed points of the time series. As a last step, we can produce a simple analysis of how well the stylized trends represent the effective observations. Let us denote by $f^q_{(P_a,A,r)}(t_j)$ the standard inverse iF-transform values at times $t_j$ obtained with the estimated direct F-transform (polynomial) components $\phi_k(t)$, i.e.,
$$f^q_{(P_a,A,r)}(t_j) = \frac{\sum_k \phi_k(t_j)\, A_k(t_j)}{\sum_k A_k(t_j)},$$
and denote by $f^q_{c,(P_a,A,r)}(t_j)$ the analogous expression obtained by substituting each local polynomial $\phi_k(t)$ with $\hat\phi_k(t) = \varphi_{0,k} + \hat\varphi_{1,k}(t - x_k) + \cdots + \hat\varphi_{q,k}(t - x_k)^q$ (here $q = 1$ or $q = 2$), where the parameters $\hat\varphi_{1,k}$, or the pairs $(\hat\varphi_{1,k}, \hat\varphi_{2,k})$, are the ones that identify the centroid of the cluster containing time $k$; i.e., if an observation belongs to cluster $S_c$, we substitute the computed local trend with the local trend of the corresponding centroid. Essentially, we identify the elements of each cluster with its centroid and we estimate its goodness in terms of the vicinity between the modified version $f^q_{c,(P_a,A,r)}(t_j)$ at times $t_j = 1, ..., M$ and the observed data $f_j = BitCoin_j$. In Figures 21 and 22 the data $f_j$ (green colours) and $f^q_{c,(P_a,A,r)}(t_j)$ (black colours) are plotted for all the data with $r = 4$ and $r = 7$, respectively.
Scatter plots of the same values for the cases $r = 4$ and $r = 7$ are pictured in Figures 23 and 24; remark in particular that the iF-transform values and the modified values have a very high correlation and the two values are very near to each other on the whole range of small and large observed prices. Finally, it is interesting to observe the time evolution of the different clusters (see Figure 25). Remark that in the first part of the time series the data (i.e., the local trends) persist in the quasi-zero-slope class (quasi-constant time series); after observation 1400, frequent changes in the local trends become evident, but the changes seem to be gradual, from one class to a near one, and only rarely do the local trends jump from one form to a very different one. This appears clearly from the transition matrix $P = [prob(i, j)]$, $i, j = 1, 2, ..., 9$, given below ($prob(i, j)$ is the probability that the local trend of cluster $i$ moves to cluster $j$): we see that the matrix $P$ is essentially tridiagonal.
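The transition matrix is the empirical one estimated from the sequence of cluster labels; a minimal sketch (ours) is the following.

```python
import numpy as np

def transition_matrix(labels, n_states):
    # P[i, j]: empirical probability that the local trend in cluster i
    # at one node is followed by cluster j at the next node.
    P = np.zeros((n_states, n_states))
    for i, j in zip(labels[:-1], labels[1:]):
        P[i, j] += 1
    rows = P.sum(axis=1, keepdims=True)
    return np.divide(P, rows, out=np.zeros_like(P), where=rows > 0)

# Example with a short, hypothetical label sequence (0-based labels):
P = transition_matrix(np.array([0, 0, 1, 1, 2, 1, 0, 0]), 3)
```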

F-transform fitting with sparse r-partition
The results obtained with the dense r-partition $P_a$ are confirmed when using the sparse r-partition $P_b$. The computations are performed only for the bandwidth $r = 3$, corresponding to the $n = 326$ nodes of $P_b$ and $42 = 2 \times 7 \times 3$ data points belonging to each interval $[x_{k-r}, x_{k+r}]$. The local trends are clustered into nCl = 9 groups when $q = 1$ and nCl = 15 groups when $q = 2$, pictured in Figures 28 and 29. It appears that, for $q = 1$, clusters 8 and 9 (and clusters 1, 13, 14, 15 when $q = 2$) contain very few elements and possibly, in this sparse case, the number of clusters should be reduced to 7 and 11. The substitution of the estimated local trends with the ones obtained from the centroids of each cluster produces the smooth reconstructions represented in Figures 30 and 31, respectively; the scatter plots of the data and their smoothing with the standard F-transform and the modified versions are plotted in Figures 32 and 33. We remark that, corresponding to the stronger smoothing effect obtained with a smaller number of nodes in the decomposition (now sparse), the quality of the fitting is reduced.

Forecasting Bitcoin Prices with GT100 index
An attempt at forecasting is based on local (polynomial, or other functional form) F-transform components used to model the Bitcoin prices, of the type
$$\vartheta_k(t) = \varphi_{0,k} + \varphi_{1,k}(t - x_k) + \cdots + \varphi_{q,k}(t - x_k)^q + \gamma_{1,k}\, g_1(t) + \cdots + \gamma_{s,k}\, g_s(t), \qquad (13)$$
where $\vartheta_k$, $k = ..., N-2, N-1, N$, is the $k$-th local trend function ($k$-th direct F-transform component) and $g_1(t), ..., g_s(t)$ are delayed versions of the Google Trends GT100(t) and/or BitCoin(t) series. Suppose we are interested in a forecasting model by which, having available observations of BitCoin(t) and GT100(t) at times $t = ..., T-2, T-1, T$ up to time $T$, we want to construct a forecast of BitCoin(T + l), $l$ steps ahead. To do this, we estimate the direct F-transform components (13) with appropriate $q$, $s$ and values $g_1(t-l), ..., g_s(t-l)$, obtained from observed values of BitCoin(t - l) and/or GT100(t - l) for $t = ..., T-2, T-1, T$; then, e.g., using the last estimated trend function $\vartheta_N(t)$, we approximate BitCoin(T + l) with $\vartheta_N(T + l)$. This is always possible if the fuzzy r-partition is such that $T + l \in\ ]x_N, x_{N+r}[$ with $x_N = T$. Alternatively, if $r > 1$, we can approximate BitCoin(T + l) by computing the inverse F-transform at time $T + l$ as the combination of the local trends $\vartheta_k(T + l)$ that have positive weights $A_k(T + l)$, i.e., the basic functions active at time $T + l$. Clearly, this construction can be good and reasonable only for short-term forecasting. In our experiments we have used the first approximation only, to forecast Bitcoin prices BitCoin(T + l) with $l \in \{1, 2, 3, 4, 5\}$. The reported results are given for the last 1200 values of the available time series. We have used $q \in \{0, 1, 2\}$, $r \in \{1, 2\}$ and two cases of functions $g_j(t)$: Model A: $s = 1$ and $g_1(t) = GT100(t)$; Model B: $s = 2$ and $g_1(t) = GT100(t)$, $g_2(t) = BitCoin(t - 1)$, i.e., adding an autoregressive term.
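A hedged sketch of a single forecasting step under this scheme (ours, not the authors' code): ordinary least squares stands in for the $L_{1.5}$ criterion of Step 2 below, the autoregressive regressor of Model B is delayed by $l$ so that its value at $T + l$ is already observed, and the series are simulated stand-ins.

```python
import numpy as np

def forecast_step(bitcoin, gt100, T, l, m_r=15, q=1, model="B"):
    # Fit the last local trend on the m_r observations ending at T, using
    # GT100 delayed by l steps (Model A) plus a lagged Bitcoin term
    # (Model B), then extrapolate to T + l.  Assumes T >= m_r + l - 1;
    # indices are 0-based.
    ts = np.arange(T - m_r + 1, T + 1)
    cols = [(ts - T) ** j for j in range(q + 1)]   # local polynomial terms
    cols.append(gt100[ts - l])                     # g1(t - l)
    if model == "B":
        cols.append(bitcoin[ts - l])               # autoregressive term
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, bitcoin[ts], rcond=None)
    x_new = [float(l) ** j for j in range(q + 1)] + [gt100[T]]
    if model == "B":
        x_new.append(bitcoin[T])                   # observed at time T
    return np.array(x_new) @ beta

# Simulated stand-ins for the two observed series:
rng = np.random.default_rng(5)
gt100 = np.abs(np.cumsum(rng.standard_normal(2276))) + 1.0
bitcoin = np.abs(np.cumsum(rng.standard_normal(2276))) + 1.0
f_hat = forecast_step(bitcoin, gt100, T=2000, l=2)   # 2-step-ahead forecast
```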
Our simple forecasting model is obtained by the following three steps.
Step 1. We start with one of the fitting models obtained in the analysis of the previous sections and we choose the pair of values $(r, m_r)$, where $r \ge 1$ is the bandwidth of the r-partition and the associated value $m_r$ is the number of time observations used to estimate the parameters of the local trend functions $\vartheta_k(t)$. We have found that $m_r$ observations on each subinterval of the partition, in the range $m_r = 11, ..., 25$, produce in general the best fitting results: two to three weeks of data are sufficient to obtain the forecast.
Step 2. The parameters in $\vartheta_k(t)$ are estimated using the $L_p$-norm based criterion; we assume $p = 1.5$ as a good intermediate value between the quantile ($p = 1$) and expectile ($p = 2$) estimators.
Step 3. Each $l$-step-ahead forecast value $f_{T+l}$, for the 1200 most recent available observations ending at time $T_{final} = 2242$, is obtained from $\vartheta_N(t)$, where $N = 2 + r$ is the number of intervals in the partition covering the $m_r$ data terminating at time $T$: the forecast $f_{T+l}$ is then estimated from the data $BitCoin_t$, $t = T - m_r + 1, ..., T$, and $GT100_t$, $t = T - m_r - l + 1, ..., T - l$, for $g_1(t - l)$ and, in Model B, $BitCoin_t$, $t = T - m_r - l + 1, ..., T - l$, for $g_2(t - l)$. In this way, we can compute $f_{T+l} = \vartheta_N(T + l)$, as the needed values $g_1(T + l) = GT100_T$ and $g_2(T + l) = BitCoin_T$ are available from the observations.

For the fitting (i.e., with $l = 0$) and forecasting models (with $l > 0$) we report the mean square error (MSE). We see from Table 3 that the fitting of the Bitcoin and GT100 time series becomes significantly better when increasing the order $q$; it is also interesting to remark that the GT100 fitting is much less precise than the Bitcoin fitting (with the same $p$ and $q$). On the other hand, polynomials of order $q > 1$ are not useful for extrapolation as they tend to oscillate strongly for lags $l > 1$. For these reasons, only the three pairs $(p, q)$ in Tables 4 and 5 are considered and we see that, for forecasting, $q = 1$ gives good results for small lags $l$ while $q = 0$ is better for higher lags.

The forecast Bitcoin time series for lags $l = 1, 2$ obtained with Model A are pictured in Figure 34; those for lags $l = 4, 5$ and Model B are plotted in Figure 35; Figure 36 plots the pairs $(f^{Ser}_t, f^{For}_t)$ of (observed, forecast) time series for lags $l = 1, 2, 4, 5$.

To conclude this section on forecasting the Bitcoin time series, we briefly explore the statistical significance of the proposed Models A and B. Following some ideas in [25], we compare our forecast estimates with the so-called random walk model, assumed as a benchmark for single-step forecasting, i.e., with lag $l = 1$. Defining the return series as $f^{Ret}_t = \log(f^{Ser}_t / f^{Ser}_{t-1})$, the random walk forecast $\hat r_T$ of the returns at time $T$ is defined in terms of a fixed time horizon of $s$ observations $f^{Ret}_t$, $t = T - s + 1, ..., T$, ending at time $T$, by
$$f^{Ret}_t = \hat r_T + \varepsilon_t, \quad \varepsilon_t \sim ID(0, \sigma_T), \quad t = T - s + 1, ..., T.$$
We then deduce the forecast of $f^{Ser}_{T+1}$ from the definition of the return at time $T + 1$ and obtain $f^{rw}_{T+1} = \exp(\hat r_T)\, f^{Ser}_T$. These calculations are performed for $T$ being each of the last 730 available observations (two years). As in [25], the Diebold–Mariano (DM) test statistic is applied to test the significance of the MSE measures for our forecasts (with Models A and B) in comparison with the random walk forecast. The results are reported in Table 6 and (in particular the small p-values) confirm that Bitcoin prices can be forecast using the F-transform of order $q = 1$, as indeed both (simple) short-term Models A and B outperform the random walk model.
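For the benchmark comparison, a minimal sketch (ours) of the random walk forecast and of a basic Diebold–Mariano test on squared-error loss differentials (the lag-0 variance used here is adequate for 1-step-ahead forecasts):

```python
import numpy as np
from scipy import stats

def random_walk_forecast(series, T, s=30):
    # Mean log-return over the last s observations gives the benchmark
    # f_rw(T+1) = exp(r_hat_T) * f(T).
    r = np.diff(np.log(series[T - s:T + 1]))
    return np.exp(r.mean()) * series[T]

def dm_test(e_model, e_bench):
    # DM statistic on d_t = e_model_t^2 - e_bench_t^2; a significantly
    # negative value means the model beats the benchmark in MSE.
    d = e_model**2 - e_bench**2
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    return dm, 2 * stats.norm.sf(abs(dm))   # statistic, two-sided p-value

# Example with simulated forecast errors over 730 days (two years):
rng = np.random.default_rng(6)
e_model = 0.9 * rng.standard_normal(730)   # stand-in model errors
e_rw = 1.1 * rng.standard_normal(730)      # stand-in random walk errors
dm, pval = dm_test(e_model, e_rw)
```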

Final comments and conclusions
In this paper we develop a sentiment analysis model to evaluate the relationship between Google Trends and Bitcoin prices. Thanks to the high flexibility of the smoothing techniques and forecasting models based on the Fuzzy transform, we show that great interest in the Bitcoin phenomenon can produce an increase in Bitcoin prices, while the converse effect is weak; we thus contribute to a research topic that has grown rapidly in recent years but still deserves further investigation. Our results confirm the general hypothesis that short-term local trends characterize the Bitcoin time series and allow its values to be forecast at least a few steps ahead, in the range 1 to 5. Possibly, improved forecasting models can be obtained by identifying specific forms of the local trends such as, e.g., parametric exponential functions, more suitable and stable for extrapolation than polynomials.