1. Introduction and Preliminaries
Neutrosophic logic is a new area of study in which each proposition is estimated to have a percentage of truth in a subset T, a percentage of indeterminacy in a subset I, and a percentage of falsity in a subset F. We use a subset of truth (or indeterminacy, or falsity) instead of a single number because in many situations we cannot specify the proportions of truth and falsity exactly, only approximate them. For instance, a proposition may be between 25% and 55% true and between 65% and 78% false. Even worse, it may be between 33% and 48% or between 42% and 53% true (according to different observers), and 58% or between 66% and 73% false. The subsets need not be intervals: they can be any sets (open, closed, or half-open intervals, discrete or continuous sets, intersections or unions of such sets, etc.) appropriate to the given proposition. Zadeh initiated the effort to obtain meaning and mathematical results from situations of uncertainty (fuzziness) [1]. Fuzzy sets brought a new dimension to classical set theory. Atanassov introduced intuitionistic fuzzy sets, which include membership and non-membership degrees [2]. Neutrosophy was proposed by Smarandache as a computational approach to the concept of neutrality [3]. Neutrosophic sets consider membership, non-membership, and indeterminacy degrees. In an intuitionistic fuzzy set, the degree of uncertainty is determined by the other two degrees as 1 − (membership degree + non-membership degree), whereas in a neutrosophic set the degree of indeterminacy is evaluated independently of the membership and non-membership degrees. The truth (membership), falsity (non-membership), and indeterminacy degrees can then be interpreted according to the application at hand; their meaning depends entirely on the subject area (the universe of discourse). This is a key difference between neutrosophic sets and intuitionistic fuzzy sets, and in this sense neutrosophic sets offer a possible representation of, and solution to, problems in various fields. Two fundamental mathematical differences between intuitionistic fuzzy logic (IFL), which expresses relative truth, and neutrosophic logic (NL), which can also express absolute truth, are:
- (i)
NL can distinguish absolute truth (truth in all possible worlds, according to Leibniz) from relative truth (truth in at least one world), because NL(absolute truth) = $1^{+}$ while IFL(relative truth) = 1. This distinction has applications in philosophy (see Neutrosophy). The standard interval [0, 1] used in IFL is extended to the non-standard unit interval $]^{-}0, 1^{+}[$ in NL. Similar distinctions between absolute and relative falsehood, and between absolute and relative indeterminacy, are allowed in NL.
- (ii)
There is no restriction on T, I, F other than that they are subsets of $]^{-}0, 1^{+}[$; thus, $^{-}0 \le \inf T + \inf I + \inf F \le \sup T + \sup I + \sup F \le 3^{+}$ in NL. This permissiveness allows dialetheist, paraconsistent, and incomplete information to be described in NL, whereas these situations cannot be described in IFL, because there F (falsehood), T (truth), and I (indeterminacy) are restricted either to $t + i + f = 1$ or to $t^2 + f^2 \le 1$ if T, I, F are all reduced to the points t, i, f, respectively, or to $\sup T + \sup I + \sup F = 1$ if T, I, F are subsets of [0, 1].
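To make the contrast in (i) and (ii) concrete for point values, the following Python sketch checks a triple (t, i, f) against the IFL restriction t + i + f = 1 and against the standard-real part of the NL bounds, and computes the intuitionistic hesitation degree that is fixed by the other two. It is only an illustration under simplifying assumptions: the non-standard endpoints ⁻0 and 1⁺ are not modelled, subsets are reduced to single points, and the helper names are hypothetical.

```python
# Sketch: compare the point-valued IFL restriction with the standard-real
# part of the NL bounds (non-standard -0 and 1+ are deliberately ignored).
TOL = 1e-9  # tolerance for the floating-point equality t + i + f = 1


def in_unit_interval(*values: float) -> bool:
    """True if every value lies in the real interval [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in values)


def satisfies_ifl_sum(t: float, i: float, f: float) -> bool:
    """IFL-style restriction for point values: t + i + f = 1."""
    return in_unit_interval(t, i, f) and abs(t + i + f - 1.0) <= TOL


def satisfies_nl_bounds(t: float, i: float, f: float) -> bool:
    """NL (standard part): t, i, f independent in [0, 1], so 0 <= t + i + f <= 3."""
    return in_unit_interval(t, i, f)


def ifs_hesitation(mu: float, nu: float) -> float:
    """Intuitionistic fuzzy hesitation degree, determined by the others: 1 - (mu + nu)."""
    return 1.0 - (mu + nu)


# A paraconsistent statement: 90% true and 70% false at the same time.
print(satisfies_nl_bounds(0.9, 0.4, 0.7))  # True
print(satisfies_ifl_sum(0.9, 0.4, 0.7))    # False
```

The dialetheist triple above is admissible in NL but violates the IFL sum restriction, which is exactly the permissiveness described in (ii).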
Clustering is one of the most important problems in data analysis, and useful, efficient algorithms are needed for big data. The problem is even more challenging for neutrosophic data sets, which by nature involve uncertainty. Such sets appear in a number of decision-making problems [4,5,6,7,8], for which several distances and similarity measures have been used [9,10]. Clustering algorithms for big data sets rely on distances (metrics); the metrics used to analyze neutrosophic data sets include the Hamming and Euclidean distances, among others. In this paper, we examine the clustering of neutrosophic data sets via neutrosophic valued distances.
The term big data is a newer label for the enormous volume of data, both structured and unstructured, that floods many sectors on a day-to-day basis. Not all of this data is significant; what matters is obtaining the specific interpretation that is needed. Big data can be analyzed for insights that support more consistent decisions and strategic positioning. Doug Laney [11] popularized the definition of big data in terms of the three Vs; together with Veracity, these are: (1) Velocity: data is dynamic and is captured as streams in near real time; it arrives at exceptional speed and must be handled in a timely manner. (2) Variety: data comes in all formats, from structured, numeric data in traditional databases to unstructured material; variety refers to the many sources and types of structured and unstructured data, stored in sources such as spreadsheets and databases. (3) Volume: organizations gather data from a range of sources, including social media, business operations, and sensor or machine-to-machine data. (4) Veracity: this refers to biases, noise, and anomalies in the data, and corresponds to the question "Is the data being stored and extracted meaningful to the problem being examined?".
In this paper, we also focus on the K-sets clustering algorithm as a way of analyzing neutrosophic big data sets. K-sets clustering is a type of unsupervised learning, used when one wants to work with unlabeled data [12]. The goal of the algorithm is to find groups in the data, with the number of groups given by the variable K. The algorithm works iteratively, assigning each data point to one of the K groups based on the features provided, so that data points are clustered by feature similarity. Instead of defining groups before examining patterns, clustering helps to find and analyze groups that occur naturally. "Choosing K" is the question of how the number of groups should be determined. Each cluster centroid is a collection of feature values that describes the resulting group, and examining the centroid feature weights can be used to interpret qualitatively what kind of group each cluster represents. The algorithm finds the clusters and data set labels for a particular pre-chosen K. To find the number of clusters in the data, the user must run the K-means clustering algorithm for a range of K values and compare the results. In general, there is no method for determining an exact value of K, but an accurate estimate can be obtained with the following techniques. One metric commonly used to compare results across different values of K is the average distance between the data points and their cluster centroid. Since increasing the number of clusters always reduces the distance to the data points, increasing K will always decrease this metric, to the extreme of reaching zero when K equals the number of data points. Thus, this metric cannot be used on its own. Instead, the average distance to the centroid is plotted as a function of K, and the "elbow point", where the rate of decrease changes sharply, can be used to determine K approximately.
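A minimal sketch of the elbow heuristic just described, assuming plain K-means on one-dimensional real data rather than neutrosophic data, and using the within-cluster sum of squared distances (a common variant of the average distance mentioned in the text). The deterministic initialisation, the function names, and the sample values are illustrative assumptions.

```python
from statistics import fmean


def kmeans_1d(points, k, max_iter=100):
    """Lloyd's K-means on real numbers; returns the final groups of points."""
    data = sorted(points)
    # Deterministic start (an assumption): k centres spread evenly over the sorted data.
    if k == 1:
        centres = [data[len(data) // 2]]
    else:
        centres = [data[round(j * (len(data) - 1) / (k - 1))] for j in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            groups[nearest].append(x)
        # Move each centre to the mean of its group (keep it if the group is empty).
        new_centres = [fmean(g) if g else c for g, c in zip(groups, centres)]
        if new_centres == centres:
            break
        centres = new_centres
    return groups


def within_cluster_sse(points, k):
    """Sum of squared distances from each point to the mean of its group."""
    return sum((x - fmean(g)) ** 2 for g in kmeans_1d(points, k) for x in g)


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.0, 8.8]
curve = {k: within_cluster_sse(data, k) for k in range(1, len(data) + 1)}
for k, sse in curve.items():
    print(k, round(sse, 3))
```

On these values the curve drops sharply up to K = 3 and changes little afterwards, so the elbow suggests three groups; at K equal to the number of points the metric is zero, as noted above, which is why it cannot be minimised on its own.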
A number of other techniques are available for validating K, including cross-validation, information criteria, the information-theoretic jump method, and the G-means algorithm. In addition, monitoring the distribution of data points across groups provides information about how the algorithm splits the data for each K. K-sets algorithms are based on measuring distances between sets. A distance measures how far apart each pair of elements of a given set is. Distance functions are important concepts in mathematics and in many computational sciences. They are widely used, for example, to quantify the dissimilarity (or, equivalently, similarity) between two objects, sets, or sets of sets in some sense. However, because today's data sets are massive, complicated, and of different types, distance functions need to be defined in more general and detailed ways. For this purpose, we define a novel metric for similarity and distance, giving Neutrosophic Valued G-Metric Spaces (NVGMS). After giving the definition of NVGMS, we present relative weighted measures and, finally, the K-sets algorithm.
Readers unfamiliar with the topic may need a natural example to understand it well, and such an example would ideally draw on existing everyday data. However, no data of this type (that is, neutrosophic big data) is available in any source, so in Section 6 of the paper we instead show how such data could be obtained and clustered. If we encounter a sample of neutrosophic big data in the future, we will present the results with a visual example in a technical report. In this paper, we have developed a mathematically powerful method for concepts that are still in their infancy.
1.1. G-Metric Spaces
A metric space is a pair $(A, d)$, where $A$ is a non-empty set and $d$ is a metric, that is, a distance function defined on pairs of elements of $A$. Some metrics take values in other sets, such as complex-valued metrics [13,14]. Mustafa and Sims defined the G-metric by generalizing this definition [15]. In particular, fixed point theorems of analysis have been studied in G-metric spaces [16,17].
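Definition 1. Let $A$ be a non-empty set and let $d: A \times A \to \mathbb{R}$. The pair $(A, d)$ is called a metric space if the following conditions hold:
- (1)
$d(x, y) \ge 0$ (non-negativity),
- (2)
$d(x, y) = 0$ if and only if $x = y$ (identity),
- (3)
$d(x, y) = d(y, x)$ (symmetry),
- (4)
$d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality),
where $x, y, z \in A$.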
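The four conditions of Definition 1 can be checked mechanically on a finite sample of points. The brute-force sketch below (the function name is hypothetical) does this; note that it can only refute the axioms on the given sample, not prove them in general.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check conditions (1)-(4) of Definition 1 on a finite set of distinct points."""
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:                          # (1) non-negativity
            return False
        if (abs(dxy) <= tol) != (x == y):       # (2) identity: d(x, y) = 0 iff x = y
            return False
        if abs(dxy - d(y, x)) > tol:            # (3) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # (4) triangle inequality
            return False
    return True


pts = [0.0, 1.5, 4.0]
print(is_metric(pts, lambda a, b: abs(a - b)))    # True
print(is_metric(pts, lambda a, b: (a - b) ** 2))  # False
```

The squared difference passes conditions (1) to (3) but fails the triangle inequality (16 > 2.25 + 6.25 for the points 0, 1.5, 4), so it is not a metric.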
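Definition 2. [15] Let $A$ be a non-empty set. A function $G: A \times A \times A \to \mathbb{R}^{+}$ is called a G-distance if it satisfies the following properties:
- (1)
$G(x, y, z) = 0$ if and only if $x = y = z$,
- (2)
$0 < G(x, x, y)$ whenever $x \neq y$,
- (3)
$G(x, x, y) \le G(x, y, z)$ for any $x, y, z \in A$ with $z \neq y$,
- (4)
$G(x, y, z) = G(x, z, y) = G(y, z, x) = \cdots$ (symmetric in all three variables),
- (5)
$G(x, y, z) \le G(x, a, a) + G(a, y, z)$ for all $x, y, z, a \in A$ (rectangle inequality).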
The pair $(A, G)$ is called a G-metric space. Moreover, a G-metric is called symmetric if it has the following property:
$G(x, y, y) = G(x, x, y)$ for all $x, y \in A$.
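A similar brute-force check can be written for conditions (1) to (5) of Definition 2. As an assumed test case, the sketch builds G from an ordinary metric by taking the largest of the three pairwise distances, one common way of obtaining a G-metric; the function names and sample points are hypothetical.

```python
from itertools import permutations, product


def dist(a, b):
    """Ordinary metric on the real line."""
    return abs(a - b)


def g_max(d):
    """Build G(x, y, z) = max(d(x, y), d(y, z), d(x, z)) from a metric d."""
    return lambda x, y, z: max(d(x, y), d(y, z), d(x, z))


def is_g_metric(points, G, tol=1e-12):
    """Check conditions (1)-(5) of Definition 2 on a finite set of distinct points."""
    for x, y, z in product(points, repeat=3):
        g = G(x, y, z)
        if (abs(g) <= tol) != (x == y == z):                               # (1)
            return False
        if x != y and G(x, x, y) <= tol:                                   # (2)
            return False
        if z != y and G(x, x, y) > g + tol:                                # (3)
            return False
        if any(abs(g - G(*p)) > tol for p in permutations((x, y, z))):     # (4)
            return False
        if any(g > G(x, a, a) + G(a, y, z) + tol for a in points):         # (5)
            return False
    return True


pts = [0.0, 1.0, 2.5, 4.0]
print(is_g_metric(pts, g_max(dist)))                 # True
print(is_g_metric(pts, lambda x, y, z: dist(x, y)))  # False: ignores z
```

The second candidate fails condition (1) immediately, since it vanishes for $x = y \neq z$.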
Example 1. In three-dimensional Euclidean space, one can define a G-metric through the norm of the vector product of two vectors in $\mathbb{R}^3$. It clearly satisfies all the conditions of Definition 2, because the norm has the metric properties, and it is symmetric.

Example 2. Let $(A, d)$ be a metric space. Then a G-metric can be built from $d$. Since $d$ is a metric, it satisfies the triangle inequality; thus, $G$ is always positive definite.

Proposition 1. [17] Let $(A, G)$ be a G-metric space. Then a metric on $A$ can be defined from the G-metric by $d_G(x, y) = G(x, y, y) + G(y, x, x)$.

1.2. Neutrosophic Sets
Neutrosophy is a generalization of the philosophy of intuitionistic fuzzy logic. In neutrosophic logic, there is no restriction relating truth, indeterminacy, and falsity: for each element of a neutrosophic set, each of them takes a value in the real unit interval, and these values are independent of each other. Intuitionistic fuzzy logic is sometimes not sufficient for solving real-life problems, e.g., engineering problems, so considering neutrosophic elements is becoming mathematically important for modelling such problems. Since Smarandache gave this philosophical definition, studies have been conducted in many areas of mathematics and related sciences, especially computer science [18,19].
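Definition 3. Let $E$ be a universe of discourse. Then $A = \{\langle x, T_A(x), I_A(x), F_A(x) \rangle : x \in E\}$ is a neutrosophic set or single valued neutrosophic set (SVNS), where $T_A, I_A, F_A: E \to [0, 1]$ are the truth-membership function, the indeterminacy-membership function, and the falsity-membership function, respectively. Here, $0 \le T_A(x) + I_A(x) + F_A(x) \le 3$ for all $x \in E$.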
Definition 4. For an SVNS $A$ in $E$, the triple $\langle T_A, I_A, F_A \rangle$ is called a single valued neutrosophic number (SVNN).
Definition 5. Let $n = \langle T, I, F \rangle$ be an SVNN. Then the score function of $n$ can be given as follows:
Definition 6. Let $n = \langle T, I, F \rangle$ be an SVNN. Then the accuracy function of $n$ can be given as follows:
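Definition 7. Let $n_1$ and $n_2$ be two SVNNs. Then the ranking of the two SVNNs can be defined as follows:
- (I)
If the score of $n_1$ is smaller than the score of $n_2$, then $n_1 < n_2$;
- (II)
If the scores of $n_1$ and $n_2$ are equal and the accuracy of $n_1$ is smaller than that of $n_2$, then $n_1 < n_2$.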
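Because the score and accuracy formulas of Definitions 5 and 6 are not reproduced here, the sketch below assumes one pair that appears in the SVNN literature, s(n) = (2 + T − I − F)/3 and h(n) = T − F, and implements a two-step ranking in the spirit of Definition 7 (assumed: score first, accuracy as a tie-breaker). The class and function names are hypothetical; substitute the paper's own formulas if they differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVNN:
    """Single valued neutrosophic number <T, I, F> with independent components in [0, 1]."""
    t: float
    i: float
    f: float

    def __post_init__(self):
        if not all(0.0 <= v <= 1.0 for v in (self.t, self.i, self.f)):
            raise ValueError("T, I and F must each lie in [0, 1]")


def score(n: SVNN) -> float:
    # ASSUMPTION: one score function from the SVNN literature, (2 + T - I - F) / 3.
    return (2.0 + n.t - n.i - n.f) / 3.0


def accuracy(n: SVNN) -> float:
    # ASSUMPTION: accuracy taken as T - F.
    return n.t - n.f


def compare(n1: SVNN, n2: SVNN) -> int:
    """Rank by score first, then by accuracy: -1 if n1 < n2, 0 if tied, 1 if n1 > n2."""
    k1 = (score(n1), accuracy(n1))
    k2 = (score(n2), accuracy(n2))
    return (k1 > k2) - (k1 < k2)


a, b = SVNN(0.7, 0.2, 0.1), SVNN(0.5, 0.3, 0.4)
print(compare(a, b))  # 1: a has the larger score
```

Comparing the tuples (score, accuracy) lexicographically is exactly the "score first, then accuracy" rule, so sorting a list of SVNNs can reuse the same key.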
2. Neutrosophic Valued Metric Spaces
Distances are measured via operators defined on non-empty sets. In general, the operators used in metric spaces have zero (neutral) elements, which depend on the set and on its values.
2.1. Operators
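Definition 8. [20,21] Let $A$ be a non-empty SVNS and let $n_1 = \langle T_1, I_1, F_1 \rangle$ and $n_2 = \langle T_2, I_2, F_2 \rangle$ be two SVNNs. The addition, multiplication, multiplication by a scalar $\lambda > 0$, and exponentiation of SVNNs are defined, respectively, as follows:
$n_1 \oplus n_2 = \langle T_1 + T_2 - T_1 T_2, I_1 I_2, F_1 F_2 \rangle$,
$n_1 \otimes n_2 = \langle T_1 T_2, I_1 + I_2 - I_1 I_2, F_1 + F_2 - F_1 F_2 \rangle$,
$\lambda n_1 = \langle 1 - (1 - T_1)^{\lambda}, I_1^{\lambda}, F_1^{\lambda} \rangle$,
$n_1^{\lambda} = \langle T_1^{\lambda}, 1 - (1 - I_1)^{\lambda}, 1 - (1 - F_1)^{\lambda} \rangle$.
From this definition, we obtain the following theorems.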
Theorem 1. Let $n = \langle T, I, F \rangle$ be an SVNN. The neutral element of the addition operator on the set of SVNNs is $\langle 0, 1, 1 \rangle$.
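Proof. Let $n = \langle T, I, F \rangle$ and $e = \langle 0, 1, 1 \rangle$ be two SVNNs. Using Definition 8, we have
$n \oplus e = \langle T + 0 - T \cdot 0, I \cdot 1, F \cdot 1 \rangle = \langle T, I, F \rangle = n$.
(There is no need to show the left-hand side because the operator is commutative in every component.) □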
To compare neutrosophic values with respect to a neutral element, we compute the score and accuracy functions of the neutral element $\langle 0, 1, 1 \rangle$ from Definitions 5 and 6.
Theorem 2. Let $n = \langle T, I, F \rangle$ be an SVNN. The neutral element of the multiplication operator on the set of SVNNs is $\langle 1, 0, 0 \rangle$.
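Proof. Let $n = \langle T, I, F \rangle$ and $u = \langle 1, 0, 0 \rangle$ be two SVNNs. Using Definition 8, we have
$n \otimes u = \langle T \cdot 1, I + 0 - I \cdot 0, F + 0 - F \cdot 0 \rangle = \langle T, I, F \rangle = n$.
In addition, the score and accuracy functions of this neutral element can be computed from Definitions 5 and 6. □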
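The operational laws and the two neutral elements can be checked numerically. The sketch assumes the standard SVNN operations found in the literature (written out in the docstrings); if the rules in [20,21] differ, the neutral elements change accordingly. It verifies the identities on a grid of sample SVNNs and also that scalar multiplication by 2 agrees with $n \oplus n$; all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SVNN:
    t: float  # truth-membership
    i: float  # indeterminacy-membership
    f: float  # falsity-membership


def add(a, b):
    """n1 (+) n2 = <T1 + T2 - T1*T2, I1*I2, F1*F2>."""
    return SVNN(a.t + b.t - a.t * b.t, a.i * b.i, a.f * b.f)


def mul(a, b):
    """n1 (x) n2 = <T1*T2, I1 + I2 - I1*I2, F1 + F2 - F1*F2>."""
    return SVNN(a.t * b.t, a.i + b.i - a.i * b.i, a.f + b.f - a.f * b.f)


def scale(lam, a):
    """lam * n = <1 - (1 - T)^lam, I^lam, F^lam> for lam > 0."""
    return SVNN(1 - (1 - a.t) ** lam, a.i ** lam, a.f ** lam)


def power(a, lam):
    """n^lam = <T^lam, 1 - (1 - I)^lam, 1 - (1 - F)^lam> for lam > 0."""
    return SVNN(a.t ** lam, 1 - (1 - a.i) ** lam, 1 - (1 - a.f) ** lam)


ADD_NEUTRAL = SVNN(0.0, 1.0, 1.0)  # neutral element for addition (cf. Theorem 1)
MUL_NEUTRAL = SVNN(1.0, 0.0, 0.0)  # neutral element for multiplication (cf. Theorem 2)

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
samples = [SVNN(t, i, f) for t, i, f in product(grid, repeat=3)]

# Both operators are commutative, so checking one side suffices (as in the proofs).
print(all(add(n, ADD_NEUTRAL) == n for n in samples))  # True
print(all(mul(n, MUL_NEUTRAL) == n for n in samples))  # True

n1 = SVNN(0.5, 0.5, 0.25)
print(scale(2, n1) == add(n1, n1))  # True: 2 * n equals n (+) n
```

Note that $\langle 0, 0, 0 \rangle$ is not the additive neutral element under these rules: adding it to an SVNN sets the indeterminacy and falsity components to zero.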
2.2. Neutrosophic Valued Metric Spaces
In this section, we consider metric and generalized metric spaces in the neutrosophic sense.
Definition 9. The ranking in Definition 7 gives an order relation on the collection of SVNNs. Suppose that the mapping $d: X \times X \to A$, where $X$ and $A$ are SVNSs, satisfies:
- (I)
andfor all.
- (II)
$d(x, y) = d(y, x)$ for all $x, y \in X$.
Then $d$ is called a neutrosophic valued metric on $X$, and the pair $(X, d)$ is called a neutrosophic valued metric space. Here, the triangle inequality of metric spaces is omitted because it is not suitable for SVNSs: the addition of SVNNs is not ordinary addition.
Theorem 3. Let $(X, d)$ be a neutrosophic valued metric space. Then the following relationships hold among the truth, indeterminacy, and falsity values:
- (I)
and if.
- (II)
If
- (III)
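$d_T(x, y) = d_T(y, x)$, $d_I(x, y) = d_I(y, x)$, $d_F(x, y) = d_F(y, x)$; so each component distance function must be symmetric,
where $d_T$, $d_I$, and $d_F$ are the distances of the truth, indeterminacy, and falsity functions taken separately, respectively.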
Proof. - (I)
- (II)
□
Example 3. Let $X$ be a non-empty SVNS and let $n_1$, $n_2$ be two SVNNs. If we define the metric $d$ as: then
- (I)
$d$ satisfies the first condition.
- (II)
By the properties of the absolute value function, this condition is obvious.
So, $(X, d)$ is a neutrosophic valued metric space.
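The metric formula of Example 3 is not reproduced here. Since the argument relies on properties of the absolute value, the sketch assumes the componentwise form $d(x, y) = \langle |T_x - T_y|, |I_x - I_y|, |F_x - F_y| \rangle$ and checks non-negativity, zero distance exactly for identical data, and symmetry on hypothetical sample SVNNs; the function names are also hypothetical.

```python
from itertools import product


def nv_distance(x, y):
    """ASSUMED Example 3 metric: componentwise |difference| of <T, I, F>."""
    return tuple(abs(a - b) for a, b in zip(x, y))


def satisfies_example3_checks(data):
    """Non-negative components, zero distance iff same datum, and symmetry."""
    for x, y in product(data, repeat=2):
        d = nv_distance(x, y)
        if any(c < 0.0 for c in d):
            return False
        if (d == (0.0, 0.0, 0.0)) != (x == y):
            return False
        if d != nv_distance(y, x):
            return False
    return True


data = [(0.5, 0.25, 0.0), (0.75, 0.25, 0.5), (0.25, 0.0, 0.25)]  # hypothetical SVNNs
print(satisfies_example3_checks(data))  # True
print(nv_distance(data[0], data[1]))    # (0.25, 0.0, 0.5)
```

Because every component of the result again lies in [0, 1], the distance between two SVNNs is itself an SVNN, which is what makes the metric neutrosophic valued.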
3. Neutrosophic Valued G-Metric Spaces
Definition 10. Let $X$ and $A$ be non-empty SVNSs. A function $G: X \times X \times X \to A$ is called a neutrosophic valued G-metric if it satisfies the following properties:
- (1)
if and only if,
- (2)
whenever,
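- (3)
$G(x, x, y) \le G(x, y, z)$ for any $x, y, z \in X$ with $z \neq y$,
- (4)
$G(x, y, z) = G(x, z, y) = G(y, z, x) = \cdots$ (symmetric in all variables).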
The pair (X, G) is called a neutrosophic valued G-metric space.
Theorem 4. Let $(X, G)$ be a neutrosophic valued G-metric space. Then the following hold:
- (1)
- (2)
Assume, then
- (3)
- (4)
$G_T$, $G_I$, and $G_F$ are symmetric in all variables,
where $G_T$, $G_I$, and $G_F$ are the G-distance functions of the truth, indeterminacy, and falsity values of the elements of the set, respectively.
The proofs are similar to those for neutrosophic valued metric spaces.
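Theorem 4 splits G into truth, indeterminacy, and falsity parts. As an assumed illustration, the sketch builds each part from the largest pairwise absolute difference of that component and checks that the result vanishes exactly on identical arguments and is symmetric in all three arguments. The construction, function names, and sample data are hypothetical.

```python
from itertools import permutations, product


def g_component(a, b, c):
    """G on one real component: the largest pairwise absolute difference."""
    return max(abs(a - b), abs(b - c), abs(a - c))


def nv_g(x, y, z):
    """ASSUMED neutrosophic valued G: <G_T, G_I, G_F>, one g_component per component."""
    return tuple(g_component(a, b, c) for a, b, c in zip(x, y, z))


def componentwise_checks(data):
    """G vanishes exactly on (x, x, x) and is symmetric in its three arguments."""
    for x, y, z in product(data, repeat=3):
        g = nv_g(x, y, z)
        if (g == (0.0, 0.0, 0.0)) != (x == y == z):
            return False
        if any(nv_g(*p) != g for p in permutations((x, y, z))):
            return False
    return True


data = [(0.5, 0.25, 0.0), (0.75, 0.25, 0.5), (0.25, 0.0, 0.25)]  # hypothetical SVNNs
print(nv_g(*data))                 # (0.5, 0.25, 0.5)
print(componentwise_checks(data))  # True
```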
Example 4. Let $X$ be a non-empty SVNS and let the G-distance function be defined by: where $d$ is a neutrosophic valued metric. The pair $(X, G)$ is then a neutrosophic valued G-metric space, which follows from the properties of $d$; furthermore, $G$ is commutative in its arguments.

4. Relative Weighted Neutrosophic Valued Distances and Cohesion Measures
Relative distance measures are used for clustering data sets []. We define a relative weighted distance, which is a more sensitive measure for big data sets.
Let $n_1, n_2, \ldots, n_m$ be SVNNs. Then the neutrosophic weighted average operator of these SVNNs is defined as:
where $w_i$ is the weight of the $i$th datum. For a given neutrosophic data set and a neutrosophic valued metric $d$, we define a relative neutrosophic valued distance by choosing another reference neutrosophic datum and computing the relative neutrosophic valued distance as the average of the differences of distances over all the neutrosophic data in the set.
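The formula of the weighted average operator is not reproduced here. The sketch assumes the single valued neutrosophic weighted average commonly used in the literature, with the weights normalised to sum to 1 so that unit weights (as in the toy example of Section 6) can be passed directly; treat both choices, and the function name, as assumptions.

```python
import math


def weighted_average(numbers, weights):
    """ASSUMED SVN weighted average operator (a common form in the literature):
    <1 - prod (1 - T_i)^w_i, prod I_i^w_i, prod F_i^w_i>, weights normalised to sum 1."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    t = 1.0 - math.prod((1.0 - n[0]) ** wi for n, wi in zip(numbers, w))
    i = math.prod(n[1] ** wi for n, wi in zip(numbers, w))
    f = math.prod(n[2] ** wi for n, wi in zip(numbers, w))
    return (t, i, f)


data = [(0.2, 0.3, 0.4), (0.9, 0.1, 0.1)]  # hypothetical SVNNs as <T, I, F> tuples
print(weighted_average(data, [1, 1]))      # unit weights
```

With this form, averaging identical SVNNs returns the same SVNN, and putting all the weight on one datum returns that datum, which are the behaviours one would expect from a weighted average.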
Definition 11. The relative neutrosophic valued distance from a neutrosophic datum to another neutrosophic datum is defined as follows: Here, since the T, I, F values of SVNNs cannot be negative, we define the componentwise absolute difference of the two neutrosophic valued metrics as the distance between them. Since this distance between metrics is again neutrosophic valued, a related neutrosophic valued distance can be defined as: The difference operator is generally not a neutrosophic valued metric (or G-metric). Some abbreviations are used to save space.
where the abbreviated symbols denote the first, second, and third components of the SVNNs in the previous equation, respectively.

Definition 12. The relative weighted neutrosophic valued distance from a neutrosophic datum to another neutrosophic datum is defined as follows:
where.
Definition 13. The relative weighted neutrosophic valued distance (from a random neutrosophic datum) to a neutrosophic datum is defined as follows:

Definition 14. The relative weighted neutrosophic valued distance from a neutrosophic data set to another neutrosophic data set is defined as follows:

Definition 15. (Weighted cohesion measure between two neutrosophic data) Let $x$ and $y$ be neutrosophic data. The difference between the relative weighted neutrosophic valued distance (from a random neutrosophic datum) to $y$ and the relative weighted neutrosophic valued distance from $x$ to $y$ is called the weighted neutrosophic valued cohesion measure between $x$ and $y$. If the relative weighted neutrosophic valued distance from $x$ to $y$ is not larger (resp. is larger) than the relative weighted neutrosophic valued distance from a random neutrosophic datum to $y$, then $x$ and $y$ are said to be cohesive (resp. incohesive).
Definition 16. (Weighted cohesion measure between two neutrosophic data sets) Let $x$ and $y$ be elements of the neutrosophic data sets $U$ and $V$, respectively. Then the corresponding measure is called the weighted neutrosophic valued cohesion measure of the neutrosophic data sets $U$ and $V$.

Definition 17. (Cluster) A non-empty neutrosophic data set $W$ is called a cluster if it is cohesive, i.e., if its weighted cohesion measure with itself indicates cohesion.
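The definitions above generalize a real-valued relative distance and cohesion measure of the kind used in K-sets clustering. The sketch below implements that real-valued form, RD(x‖y) as the average over z of d(x, y) − d(x, z) and the cohesion as RD(random datum‖y) − RD(x‖y), applied to one SVNN component at a time. It is a simplification under stated assumptions, not Definitions 11 to 17 themselves: unit weights, no absolute-value modification, and hypothetical data and function names.

```python
def distance_matrix(data, component):
    """|difference| of one SVNN component (0 = T, 1 = I, 2 = F) for every pair."""
    return [[abs(x[component] - y[component]) for y in data] for x in data]


def cohesion_matrix(D):
    """Real-valued cohesion gamma(x, y) = RD(random || y) - RD(x || y), where
    RD(x || y) = mean_z (D[x][y] - D[x][z]) and RD(random || y) = mean_x RD(x || y)."""
    n = len(D)
    row_mean = [sum(D[x]) / n for x in range(n)]
    col_mean = [sum(D[z][y] for z in range(n)) / n for y in range(n)]
    grand_mean = sum(row_mean) / n
    return [[col_mean[y] + row_mean[x] - grand_mean - D[x][y] for y in range(n)]
            for x in range(n)]


def set_cohesion(gamma, S1, S2):
    """gamma(S1, S2): sum of gamma(x, y) over x in S1 and y in S2 (index lists)."""
    return sum(gamma[x][y] for x in S1 for y in S2)


def is_cluster(gamma, S):
    """A non-empty set is a cluster when it is cohesive with itself."""
    return len(S) > 0 and set_cohesion(gamma, S, S) >= 0


# Hypothetical SVNNs <T, I, F>: two low-truth data groups are far apart in T.
data = [(0.1, 0.2, 0.7), (0.15, 0.25, 0.6), (0.2, 0.2, 0.65),
        (0.8, 0.1, 0.1), (0.85, 0.15, 0.2)]
gamma_T = cohesion_matrix(distance_matrix(data, 0))
print(is_cluster(gamma_T, [0, 1, 2]))                # True
print(set_cohesion(gamma_T, [0, 1, 2], [3, 4]) < 0)  # True: the groups are incohesive
```

Every row of this cohesion matrix sums to zero, so the cohesion of a set with itself equals minus its cohesion with the rest of the data; this is why the two printed checks agree.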
6. Application and Example
We now describe data that would fit into this framework. A data set can be called a big data set if it is difficult to define, analyze, and visualize because of its size or complexity. We give a big neutrosophic data example in accordance with this definition and with a possible use of the G-metric; it is fictional, since there is no real example of neutrosophic big data yet. Image processing for big data analysis, one of the current topics, is a good candidate. Imagine a camera on a circuit board that is able to distinguish colors, cluster all the vehicles it can capture in the image, and record the resulting data. For a given color (for example, white vehicles), the camera assigns the following degrees:
- (I)
The vehicle is at a distance at which its color can be detected, and the truth value is determined from the detected portion of the vehicle.
- (II)
The rate at which the vehicle can be detected by the camera is assigned as the indeterminacy value (colors may appear mixed because of external factors such as daylight, so the color is determined on a different scale).
- (III)
The rate at which a large part of the vehicle is not seen, or at which the color is out of range, is assigned as the falsity value.
The camera data are then clustered via the G-metric. As a result, the daily numbers of passing vehicles of each color are determined. The data change continuously as long as the road is open and the camera is recording, with one neutrosophic datum for each vehicle, so a big neutrosophic data clustering problem arises.
Here, the weights we have defined for the metric can be set to 1 for the primary colors (red, yellow, blue). Secondary or mixed colors may be given a proportional weight depending on which primary color they are closest to.
A Numerical Toy Example
For a numerical example, take five neutrosophic data, each with weight equal to 1:
K = 3 disjoint sets can be chosen:
Then, assuming the metric $d$ is as in Example 3, we can compute the G-metrics of the data as in Equation (3):
So, according to the calculations above, belongs to set and the other data belong to . Here, we assigned the data to clusters according to the truth values of the G-metrics, which are mainly low: if the truth value of the G-distance is low, then the datum is closer to the set.