Several numerical experiments are performed in which the critical parameters l and l0 take the values 64, 120, and 128 and 30, 40, and 50, respectively. In general, these trials provide similar results; however, the best outcomes are achieved for [l/l0] = 2. Given the limited scope of this article, only the most representative results, obtained for l = 128 and l0 = 50, are reported. In this setting, Cl0 (the source collection) includes 7619 chunks with an approximate size of 8.5 MB, while the alternative class contains 819 pieces occupying about 1.0 MB. As such, the imbalance ratio (IR) is about 8.5.
The training data consists of about 17,700 units, with about 10,000 training, 3300 validation, and 4400 testing samples in each iteration. Following the procedure described earlier, we get training sets of almost identical size at each step; i.e., the minor class is multiplied six times. Experiments have shown that this is probably the minimum suitable value: for smaller values, the learning process repeatedly fails to converge or to exceed a validation accuracy of 0.75. Moreover, this characteristic significantly increases in the final stages of training.
exhibits the characteristics of the alternative collection.
Documents 6 and 7 dominate in this class.
Figure 1 presents a heat map of the experiments, drawn with the Parula color map in the “scaled rows” fashion. The document numbers are mapped on the horizontal axis, while the vertical axis represents the experiment number. As can be seen, brightness and color divide the texts into two groups: 1–6 and 7–10. The corresponding cluster procedure separates these sets with a silhouette value of 0.8836.
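The silhouette value quoted above measures how well separated the two groups are. The following is a minimal sketch of how such a value can be computed for a two-group partition of one-dimensional scores. The clustering procedure actually used is not specified in this section, and the list `doc_scores` is a hypothetical illustration, not the data behind Figure 1.

```python
from statistics import mean


def silhouette(values, labels):
    """Mean silhouette coefficient of 1-D points under a given labeling.

    Requires at least two distinct labels.
    """
    scores = []
    for i, (x, own) in enumerate(zip(values, labels)):
        # a: mean distance to the other members of the point's own cluster
        same = [abs(x - y)
                for j, (y, lab) in enumerate(zip(values, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(abs(x - y) for y, lab in zip(values, labels) if lab == other)
            for other in set(labels) if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical per-document mean scores for ten documents (NOT the paper's data).
doc_scores = [0.30, 0.35, 0.28, 0.33, 0.36, 0.31, 0.62, 0.58, 0.64, 0.60]
doc_labels = [0] * 6 + [1] * 4  # documents 1-6 versus documents 7-10
print(round(silhouette(doc_scores, doc_labels), 4))
```

Values close to 1 indicate compact, well-separated groups, which is how the reported 0.8836 should be read.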
The same situation appears in the error bar charts given in Figure 2. Recall that this graphical representation displays the mean together with its variability, with the bars specifying the uncertainty in a measurement; in our case, each bar embodies one standard deviation. The dotted lines represent the average cluster centroids (the clusters’ averages), y = 0.3259 and y = 0.6090, respectively. The central line corresponds to the line separating the clusters, y0 = 0.4674.
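The separation value y0 = 0.4674 used in this section agrees, to the reported precision, with the midpoint of the two centroids 0.3259 and 0.6090. The sketch below assumes that this midpoint rule is how the separating line is obtained. It also assumes that each error bar uses the sample standard deviation. The names `error_bar` and `assign_cluster` are hypothetical helpers introduced for illustration.

```python
from statistics import mean, stdev

CENTROID_LOW = 0.3259   # reported centroid of the first cluster
CENTROID_HIGH = 0.6090  # reported centroid of the second cluster

# Assumption: the separating line is the midpoint between the two centroids.
Y0 = (CENTROID_LOW + CENTROID_HIGH) / 2


def error_bar(scores):
    """Return (mean, one sample standard deviation) of a document's scores."""
    return mean(scores), stdev(scores)


def assign_cluster(mean_score, threshold=Y0):
    """Return 1 if the score lies above the separating line, otherwise 0."""
    return 1 if mean_score > threshold else 0


print(Y0)
```

A document whose mean score over the trials falls above `Y0` is then assigned to the second group.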
The partition mentioned above is likewise clearly visible here. Let us examine this result from the standpoint of the structure of the considered texts. First of all, note that the procedure unquestionably identifies the first six books, known to be authored by Al Ghazali. The procedure tags two books (numbers eight and nine) as Pseudo-Ghazali, perfectly matching the inherent labeling. The classification of the two remaining books is of the most significant interest and novelty:
Tahafut al-Falasifa (The Incoherence of the Philosophers, seventh in the list of tested manuscripts).
According to the common standpoint, this milestone opus was created by Al Ghazali, together with a student of the Asharite school of Islamic theology. The book criticizes some positions of Greek and other earlier Muslim theorists, mostly those of Ibn Sina (Avicenna) and Al-Farabi (Alpharabius). The manuscript is reputedly an exceptionally successful creation and a landmark in Islamic philosophy.
We explore this topic using additional text representations highlighted by our model. As mentioned before, the procedure divides texts into successive equal-length pieces of size l = 128. Each piece is split into batches of length l0 = 50, and each batch is tagged as 0 or 1 according to the predicted classification. In this way, each document D is embodied as a signal of length m whose possible values are the mean values of the pieces’ tags. An example of such a signal representation is given in Figure 3. The X-axis represents a piece’s sequential number, and the Y-axis shows the average scores of the pieces, which can be 0, 0.5, or 1 in the considered case. Here, m = 397; that is, the tested document is divided into 397 pieces of approximately the same size, 1.2 K. Thus, the numbering on the X-axis runs from 1 to 397.
Even here, it can be seen that the Pseudo-Ghazali style (scores above the cluster separation line y0 = 0.4674) covers a considerably larger part of the manuscript.
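The signal construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes the text is a sequence of tokens, that each piece contributes [l/l0] non-overlapping batches, that the leftover tokens of a piece are ignored, and that a trailing incomplete piece is dropped. The callable `tag_batch` is a hypothetical stand-in for the trained classifier.

```python
def piece_signal(tokens, tag_batch, l=128, l0=50):
    """Map a token sequence to one mean tag per piece of length l.

    tag_batch: hypothetical callable returning 0 or 1 for a batch of l0 tokens.
    """
    n_batches = l // l0  # integer part [l/l0]; equals 2 for l=128, l0=50
    signal = []
    # Step through complete pieces only (assumption: the remainder is dropped).
    for start in range(0, len(tokens) - l + 1, l):
        piece = tokens[start:start + l]
        tags = [tag_batch(piece[k * l0:(k + 1) * l0]) for k in range(n_batches)]
        signal.append(sum(tags) / n_batches)
    return signal


# Toy usage with a dummy tagger: a batch is tagged 1 if its first token is >= 50.
toy_tokens = list(range(256))
toy_signal = piece_signal(toy_tokens, lambda batch: 1 if batch[0] >= 50 else 0)
print(toy_signal)
```

With two batches per piece, each signal value can only be 0, 0.5, or 1, which matches the three levels on the Y-axis of Figure 3.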
The overall conclusion has to be based not on a single random sample but on the whole assembly of all 20 simulated samples. To this end, we average the curves (signals) obtained in these 20 iterations and consider the resulting sequence, as seen in Figure 4.
The values derived from the averaged series are marked in blue. The red line is the result of moving average smoothing with a lag of 7. This outline characterizes the style’s overall behavior, demonstrating that most segments tend to fit the “1” (Pseudo-Ghazali) style. The observation is also confirmed by the histograms generated for the original signal and its smoothed version (see Figure 5).
Both distributions have a negative skew, that is, a long left tail (the left asymmetry of a distribution around its mean). About 20% of the data are smaller than 0.5 in the left panel and about 8% in the red (smoothed) one. Thus, it is possible to conclude that the dominant part of the considered manuscript, Tahafut al-Falasifa, is not written in the inherent Al Ghazali style.
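The post-processing steps used here (averaging the signals over the iterations, smoothing with a moving average, and summarizing the distribution) can be sketched as below. The exact form of the moving average is not stated in the text, so a trailing window of `lag` points, shortened at the start of the series, is assumed. The skewness is the plain moment-based coefficient. All function names are hypothetical.

```python
from statistics import mean


def average_signals(signals):
    """Point-wise mean of equally long signals (one signal per iteration)."""
    return [mean(column) for column in zip(*signals)]


def moving_average(series, lag=7):
    """Trailing moving average; the window is shorter for the first points."""
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - lag + 1):i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def fraction_below(series, threshold=0.5):
    """Share of values strictly below the threshold."""
    return sum(v < threshold for v in series) / len(series)


def skewness(series):
    """Moment-based skewness; negative values indicate a long left tail."""
    mu = mean(series)
    m2 = mean((v - mu) ** 2 for v in series)
    m3 = mean((v - mu) ** 3 for v in series)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0
```

Applied to the averaged signal and to its smoothed version, `fraction_below` corresponds to the percentages of scores smaller than 0.5 quoted in the text, and a negative `skewness` corresponds to the left tail seen in the histograms.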
Mishakat al-Anwar (The Niche of Lights, number 10 in the tested manuscripts’ list)
The prominent official Al Ghazali internet resource (https://www.ghazali.org) dedicates a subsite (https://www.ghazali.org/site/on-mishkat.htm) to the authorship problem of Mishakat al-Anwar. Additionally, for several manuscript versions, the site presents the background information and the six crucial papers [5]. These articles apparently can be treated, with some limitations, as the core discussion material in the problem.
The ongoing debate surrounding Al Ghazali’s authorship of this manuscript in numerous scientific forums is much more wide-ranging than this website. It refers to documents not mentioned in the current article. In this long-time dispute, the participants present compelling arguments for and against the alleged authorship, based mainly on linguistic, religious, and philosophical outlooks. An analysis and review of these essential issues are not the present paper’s subjects because we focus on formal algorithmic methods designed to evaluate the manuscript’s authorship.
As in the previous case in Figure 3, we start from an example of a digital signal representation of pieces, given in Figure 6.
This document is significantly shorter (just 78 pieces) than the one mentioned above; consequently, the graph appears to be more sparse. However, the dominance of scores larger than 0.5 is clearly visible. A chart of the average score (blue line) over the trials demonstrates the same tendency in Figure 7. The red line, as previously, corresponds to moving average smoothing with a lag of 7.
The resultant histograms also exhibit a left-side tail, the left asymmetry of a distribution around its mean (Figure 8).
The proportions of scores lying below 0.5 are 36% and 20%. The general conclusion is that most of the text of Mishakat al-Anwar is not composed in the inherent Al Ghazali writing style.
As remarked earlier, we strive to propose a new perspective on the discussed problem. The suggested approach is fundamentally different from those commonly accepted. Nevertheless, one case study, in our opinion, deserves discussion.
As stated in a paper by Watt [18], “Most of the problems formulated by Gairdner are connected with the last section of the Mishkāt, the detailed interpretation of the Tradition about the Seventy (or Seventy Thousand) Veils (which for convenience I shall call the ‘Veils-section’).” Here, Watt refers to the article by Gairdner [26]. Watt continues, “If the above investigations have not overlooked some crucial point, there is no avoiding the conclusion that the Veils-section of Mishkat al-Anwar is a forgery”.
This statement agrees with the results obtained here, where the book’s smoothed profile (marked in red in Figure 7) is mostly located above the line y = 0.5 in the last part of the chart. As for most of the manuscript, we conclude that this part is not written in the inherent Al Ghazali style. On the one hand, this shows that the obtained results do not contradict the widely accepted opinions. On the other hand, our results generalize them, indicating that the considered book’s overall style differs from the inherent one ascribed to Al Ghazali.