Clustering Neutrosophic Data Sets and Neutrosophic Valued Metric Spaces

Taş, Ferhat; Topal, Selçuk; Smarandache, Florentin

doi:10.3390/sym10100430


by Ferhat Taş ¹, Selçuk Topal ²,* and Florentin Smarandache ³

¹ Department of Mathematics, Faculty of Science, İstanbul University, İstanbul 34134, Turkey
² Department of Mathematics, Faculty of Science and Arts, Bitlis Eren University, Bitlis 13000, Turkey
³ Department of Mathematics, University of New Mexico, Gallup, NM 87301, USA
* Author to whom correspondence should be addressed.

Symmetry 2018, 10(10), 430; https://doi.org/10.3390/sym10100430

Submission received: 5 August 2018 / Revised: 11 September 2018 / Accepted: 21 September 2018 / Published: 24 September 2018

(This article belongs to the Special Issue Algebraic Structures of Neutrosophic Triplets, Neutrosophic Duplets, or Neutrosophic Multisets)


Abstract: In this paper, we define neutrosophic valued metric spaces and neutrosophic valued generalized (G-) metric spaces for the first time. We also present a mathematical model for clustering neutrosophic big data sets using the G-metric. Furthermore, a relative weighted neutrosophic-valued distance and a weighted cohesion measure are defined for neutrosophic big data sets. We offer a practical method for the analysis of neutrosophic big data, even though neutrosophic data are massive and detailed compared with other data types.

Keywords: G-metric; neutrosophic G-metric; neutrosophic sets; clustering; neutrosophic big data; neutrosophic logic

1. Introduction and Preliminaries

Neutrosophic logic is a new field of study in which each proposition is estimated to have a proportion (percentage) of truth in a subset T, a proportion of indeterminacy in a subset I, and a proportion of falsity in a subset F. We use a subset of truth (or indeterminacy, or falsity) instead of a single number because in many situations we cannot specify the proportions of truth and falsity exactly but can only approximate them. For instance, a proposition may be between 25% and 55% true and between 65% and 78% false; or, even worse, between 33% and 48% or between 42% and 53% true (according to several observers) and 58% or between 66% and 73% false. The subsets are not necessarily intervals but may be any sets (open, closed, or half-open/half-closed intervals; discrete or continuous sets; intersections or unions of the previous sets; etc.) in keeping with the given proposition. Zadeh initiated the effort to obtain meaning and mathematical results from situations of uncertainty (fuzzy sets) [1]. Fuzzy sets brought a new dimension to classical set theory. Atanassov introduced intuitionistic fuzzy sets, which include membership and non-membership degrees [2]. Neutrosophy was proposed by Smarandache as a computational approach to the concept of neutrality [3]. Neutrosophic sets consider membership, non-membership, and indeterminacy degrees. In intuitionistic fuzzy sets, the degree of uncertainty is determined by the degrees of membership and non-membership as 1 − (membership degree + non-membership degree), whereas in neutrosophic sets the degree of uncertainty is evaluated independently of the degrees of membership and non-membership. Here, the membership, non-membership, and uncertainty degrees, like the degrees of truth and falsity, can be interpreted according to the area of application; the interpretation depends entirely on the subject area (the universe of discourse). This reveals a difference between neutrosophic sets and intuitionistic fuzzy sets.
In this sense, the neutrosophic concept offers a possible representation of, and solution to, problems in various fields. Two detailed mathematical differences between relative truth in intuitionistic fuzzy logic (IFL) and absolute truth in neutrosophic logic (NL) are:

(i): NL can distinguish absolute truth (truth in all possible worlds, according to Leibniz) from relative truth (truth in at least one world) because NL (absolute truth) = 1⁺ while IFL (relative truth) = 1. This has applications in philosophy (see Neutrosophy). The standard interval [0, 1] used in IFL is extended to the unitary non-standard interval ]⁻0, 1⁺[ in NL. Analogous distinctions between absolute and relative falsehood, and between absolute and relative indeterminacy, are permitted in NL.
(ii): There is no limit on T, I, F other than they are subsets of ]⁻ 0, 1⁺ [, thus: ⁻0 ≤ inf T + inf I + inf F ≤ sup T + sup I + sup F ≤ 3⁺ in NL. This permissiveness allows dialetheist, paraconsistent, and incomplete information to be described in NL, while these situations cannot be described in IFL since F (falsehood), T (truth), I (indeterminacy) are restricted either to t + i + f = 1 or to t² + f² ≤ 1, if T, I, F are all reduced to the points t, i, f respectively, or to sup T + sup I + sup F = 1 if T, I, F are subsets of [0, 1] in IFL.

Clustering is one of the most significant problems in data analysis, and big data calls for useful and efficient algorithms. The problem is even more challenging for neutrosophic data sets, particularly those involving uncertainty. Such sets appear in decision-making problems [4,5,6,7,8], for which several distances and similarity measures are used [9,10]. Algorithms for clustering big data sets rely on distances (metrics). Some metrics, such as the Hamming and Euclidean metrics, are already used in algorithms that analyze neutrosophic data sets. In this paper, we examine the clustering of neutrosophic data sets via neutrosophic valued distances.

Big data is a recent label for data of very large size, both structured and unstructured, that floods many sectors on a continuing basis. Not all of the data are significant; what matters is obtaining the desired interpretation of specific data. Big data can be analyzed for foresight that enables more consistent decisions and better strategic positioning. Doug Laney [11] sought to popularize the definition of big data in terms of the three Vs, to which Veracity is added: (1) Velocity: This refers to dynamic data and the capture of data streams in near real time. Data stream in at exceptional speed and must be dealt with in a timely manner. (2) Variety: Data come in all types of formats, from structured, numeric data in traditional databases to unstructured materials. Variety denotes the various sources and types of structured and unstructured data. Stored data come from sources such as worksheets and databases. (3) Volume: Organizations gather data from a range of sources, including social media, business operations, and sensor or machine-to-machine data. (4) Veracity: This refers to the biases, noise, and anomalies in data. It corresponds to the question “Is the data that is being stored and extracted meaningful to the problem being examined?”.

In this paper, we also focus on the K-sets clustering algorithm, a process of analyzing data with the aim of evaluating neutrosophic big data sets. K-sets clustering is an unsupervised type of learning that is used when one wants to work with unlabeled data [12]. The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of the K groups based on its features. The data points are clustered according to feature similarity. Instead of defining groups before examining patterns, clustering helps to find and analyze naturally occurring groups. “Choosing K” addresses how the number of groups can be determined. Each cluster centroid is a collection of feature values that describes the resulting group. Analysis of the centroid feature weights can be used to interpret qualitatively what kind of group each cluster represents. The algorithm finds the clusters and data set labels for a particular pre-chosen K. To find the number of clusters in the data, the user must run the K-means clustering algorithm for a range of K values and compare the results. In general, there is no technique that determines the exact value of K, but a reasonable estimate can be obtained as follows. One of the metrics commonly used to compare results across different K values is the average distance between the data points and their cluster centroid. Because increasing the number of clusters always reduces the distance to the data points, increasing K always lowers this metric, which reaches the extreme value of zero when K equals the number of data points. Thus, this metric cannot be used as the sole criterion. Rather, the average distance to the centroid is plotted as a function of K, and the “elbow point”, where the rate of decrease changes sharply, can be used to determine K approximately.

A number of other techniques are available for validating K, including cross-validation, information criteria, the information-theoretic jump method, and the G-means algorithm. In addition, monitoring the distribution of data points across groups provides information about how the algorithm splits the data for each K. K-sets algorithms are based on measuring distances between sets. A distance is a measurement of how far apart each pair of elements of a given set is. Distance functions are important concepts in mathematics and many other computational sciences. They are widely used, for example, to quantify the dissimilarity (or, equivalently, the similarity) between two objects, sets, or sets of sets. However, because today's data sets are massive, complicated, and of different types, definitions of distance functions need to be more general and more detailed. For this purpose, we define a novel metric for similarity and distance, which gives Neutrosophic Valued G-Metric Spaces (NVGMS). After giving the definition of NVGMS, we present the relative weighted measure and, finally, the K-sets algorithm.

Readers who are unfamiliar with the topic of this paper may need a natural example to understand it well. Such an example requires existing real-world data. No data of this type (neutrosophic big data) are available in any source, but in Section 6 we give an example of how such data could be obtained and clustered. If we encounter a sample of neutrosophic big data in the future, we will present the results, with a visual sample, as a technical report. In this paper, we develop a mathematically powerful method for notions that are still in their infancy.

1.1. G-Metric Spaces

A metric space is a pair (A, d), where A is a non-empty set and d is a metric, that is, a distance defined on the elements of A. Some metrics take values other than real numbers, such as complex-valued metrics [13,14]. Mustafa and Sims defined the G-metric by generalizing this definition [15]. In particular, fixed point theorems in analysis have been studied in G-metric spaces [16,17].

Definition 1.

Let A be a non-empty set and let $d : A \times A \to R^{+} \cup \{0\}$ be a function. If the following conditions hold for all $x, y, z \in A$, then d is called a metric on A and the pair (A, d) is called a metric space:

(1): $d (x, y) \geq 0$ , (non-negativity)
(2): $d (x, y) = 0 \Leftrightarrow x = y$ , (identity)
(3): $d (x, y) = d (y, x)$ , (symmetry)
(4): $d (x, z) \leq d (x, y) + d (y, z)$ (triangle inequality).

Definition 2.

[15] Let A be a non-empty set. A function $G : A \times A \times A \to [0, + \infty)$ is called a G-distance if it satisfies the following properties:

(1): $G (x, y, z) = 0$ if and only if $x = y = z$ ,
(2): $G (x, x, y) \neq 0$ whenever $x \neq y$ ,
(3): $G (x, x, y) \leq G (x, y, z)$ for any $x, y, z \in A$ , with $z \neq y$ ,
(4): $G (x, y, z) = G (x, z, y) = \dots$ (symmetric for all elements),
(5): $G (x, y, z) \leq G (x, a, a) + G (a, y, z)$ for all $a, x, y, z \in A$ (rectangular inequality).

The pair (A, G) is called a G-metric space. Moreover, if the G-metric has the following property, then it is called symmetric:

G (x, x, y) = G (x, y, y), \forall x, y \in A

Example 1.

In 3-dimensional Euclidean metric space, one can take the G-metric space $(E^{3}, G)$ as follows:

G (x, y, z) = 2 (‖ x \times y ‖ + ‖ z \times y ‖ + ‖ x \times z ‖)

where $x, y, z \in E^{3}$ and $‖ \cdot \times \cdot ‖$ denotes the norm of the vector product of two vectors in $E^{3}$. It is obvious that G satisfies all conditions in Definition 2 because the norm has the metric properties, and it is symmetric.

Example 2.

Let (A, d) be a metric space. Then

G (x, y, z) = d (x, y) + d (y, z) - d (x, z)

is a G-metric, where $x, y, z \in A$. Since d is a metric, it satisfies the triangle inequality. Thus, G is always non-negative.

Proposition 1.

[17] Let (A, G) be a G-metric space then a metric on A can be defined from a G-metric:

d_{G} (x, y) = G (x, x, y) + G (x, y, y)
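As a minimal sketch of Proposition 1, the Python snippet below builds $d_{G}$ from a G-metric. The G used here is the perimeter function $G (x, y, z) = | x - y | + | y - z | + | x - z |$ on the real line, a standard textbook example chosen only for illustration and not taken from this paper; the names `perimeter_g` and `induced_metric` are hypothetical.

```python
def perimeter_g(x: float, y: float, z: float) -> float:
    """Assumed example G-metric on the reals: the perimeter of the triple."""
    return abs(x - y) + abs(y - z) + abs(x - z)


def induced_metric(g, x, y):
    """Metric of Proposition 1: d_G(x, y) = G(x, x, y) + G(x, y, y)."""
    return g(x, x, y) + g(x, y, y)


if __name__ == "__main__":
    print(induced_metric(perimeter_g, 0.0, 3.0))
```

For this particular choice of G, each summand equals $2 | x - y |$, so $d_{G} (x, y) = 4 | x - y |$ and the conditions of Definition 1 follow from those of the absolute value.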

1.2. Neutrosophic Sets

Neutrosophy is a generalization of the philosophy behind intuitionistic fuzzy logic. In neutrosophic logic, there is no restriction on truth, indeterminacy, and falsity: each takes a value in the unit real interval for every element of a neutrosophic set, and these values are independent of each other. Intuitionistic fuzzy logic is sometimes insufficient for solving real-life problems, e.g., engineering problems, so neutrosophic elements are becoming important for modelling such problems mathematically. Since Smarandache introduced this philosophical definition, studies have been conducted in many areas of mathematics and related sciences, especially computer science [18,19].

Definition 3.

Let E be a universe of discourse and $A \subseteq E$. Then

A = \{(x, T_{A} (x), I_{A} (x), F_{A} (x)) : x \in E\}

is a neutrosophic set or single valued neutrosophic set (SVNS), where $T_{A}, I_{A}, F_{A} : A \to ]^{-} 0, 1^{+} [$ are the truth-membership function, the indeterminacy-membership function and the falsity-membership function, respectively. Here, $^{-} 0 \leq T_{A} (x) + I_{A} (x) + F_{A} (x) \leq 3^{+}$.

Definition 4.

For the SVNS A in E, the triple $〈 T_{A}, I_{A}, F_{A} 〉$ is called the single valued neutrosophic number (SVNN).

Definition 5.

Let $n = 〈 T_{n}, I_{n}, F_{n} 〉$ be an SVNN. Then the score function of $n$ is given as follows:

s_{n} = \frac{1 + T_{n} - 2 I_{n} - F_{n}}{2}

(1)

where $s_{n} \in [- 1, 1]$.

Definition 6.

Let $n = 〈 T_{n}, I_{n}, F_{n} 〉$ be an SVNN. Then the accuracy function of $n$ is given as follows:

h_{n} = \frac{2 + T_{n} - I_{n} - F_{n}}{3}

(2)

where $h_{n} \in [0, 1]$.

Definition 7.

Let $n_{1}$ and $n_{2}$ be two SVNNs. Then, the ranking of two SVNNs can be defined as follows:

(I): If $s_{n_{1}} > s_{n_{2}}$, then $n_{1} > n_{2}$ ;
(II): If $s_{n_{1}} = s_{n_{2}}$ and $h_{n_{1}} \geq h_{n_{2}}$, then $n_{1} \geq n_{2}$ .
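The score, accuracy, and ranking of Definitions 5 to 7 can be sketched in a few lines of Python. The class name `SVNN` and the function names are illustrative choices, not notation from the paper, and scores are compared with exact floating-point equality, which is a simplification.

```python
from typing import NamedTuple


class SVNN(NamedTuple):
    """A single valued neutrosophic number <T, I, F>."""
    t: float  # truth
    i: float  # indeterminacy
    f: float  # falsity


def score(n: SVNN) -> float:
    """Score function of Definition 5, Equation (1)."""
    return (1 + n.t - 2 * n.i - n.f) / 2


def accuracy(n: SVNN) -> float:
    """Accuracy function of Definition 6, Equation (2)."""
    return (2 + n.t - n.i - n.f) / 3


def ranks_at_least(n1: SVNN, n2: SVNN) -> bool:
    """True when n1 >= n2 under the ranking of Definition 7."""
    s1, s2 = score(n1), score(n2)
    if s1 != s2:
        # Case (I): the scores differ, so the larger score wins.
        return s1 > s2
    # Case (II): equal scores, so the accuracy decides.
    return accuracy(n1) >= accuracy(n2)
```

With these definitions, `score(SVNN(0, 1, 1))` evaluates to -1 and `score(SVNN(1, 0, 0))` to 1, the two ends of the interval $[- 1, 1]$.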

2. Neutrosophic Valued Metric Spaces

The distance is measured via some operators which are defined in some non-empty sets. In general, operators in metric spaces have zero values, depending on the set and value.

2.1. Operators

Definition 8.

[20,21] Let $A$ be a non-empty SVNS and let $x = 〈 T_{x}, I_{x}, F_{x} 〉$, $y = 〈 T_{y}, I_{y}, F_{y} 〉$ be two SVNNs. Addition, multiplication, multiplication by a scalar $α \in ℝ^{+}$, and exponentiation of SVNNs are defined, respectively, as follows:

\begin{array}{l} x \oplus y = 〈 T_{x} + T_{y} - T_{x} T_{y}, I_{x} I_{y}, F_{x} F_{y} 〉 \\ x ⊙ y = 〈 T_{x} T_{y}, I_{x} + I_{y} - I_{x} I_{y}, F_{x} + F_{y} - F_{x} F_{y} 〉 \\ α x = 〈 1 - {(1 - T_{x})}^{α}, I_{x}^{α}, F_{x}^{α} 〉 \\ x^{α} = 〈 T_{x}^{α}, 1 - {(1 - I_{x})}^{α}, 1 - {(1 - F_{x})}^{α} 〉 \end{array}

From this definition, we have the following theorems as a result:

Theorem 1.

Let $x = 〈 T_{x}, I_{x}, F_{x} 〉$ be an SVNN. The neutral element of the additive operator of the set $A$ is $0_{A} = 〈 0, 1, 1 〉$.

Proof.

Let $x = 〈 T_{x}, I_{x}, F_{x} 〉$ and $0_{A} = 〈 T_{0}, I_{0}, F_{0} 〉$ be two SVNNs. Using Definition 8, we have

\begin{array}{l} x \oplus 0_{A} = 〈 T_{x} + T_{0} - T_{x} T_{0}, I_{x} I_{0}, F_{x} F_{0} 〉 = 〈 T_{x}, I_{x}, F_{x} 〉 \\ \Rightarrow 〈 T_{0}, I_{0}, F_{0} 〉 = 〈 0, 1, 1 〉 = 0_{A} \end{array}

(There is no need to show the left-hand side because the operator is commutative in every component.) □

To compare neutrosophic values with a neutral element, we calculate the score and accuracy functions of the neutral element $0_{A} = 〈 0, 1, 1 〉$, respectively:

s_{0} = \frac{1 + T_{0} - 2 I_{0} - F_{0}}{2} = - 1 \quad \text{and} \quad h_{0} = \frac{2 + T_{0} - I_{0} - F_{0}}{3} = 0

Theorem 2.

Let $x = 〈 T_{x}, I_{x}, F_{x} 〉$ be an SVNN. The neutral element of the multiplication operator of the set $A$ is $1_{A} = 〈 1, 0, 0 〉$.

Proof.

Let $x = 〈 T_{x}, I_{x}, F_{x} 〉$ and $1_{A} = 〈 T_{1}, I_{1}, F_{1} 〉$ be two SVNNs. Using Definition 8, we have

\begin{array}{l} x ⊙ 1_{A} = 〈 T_{x} T_{1}, I_{x} + I_{1} - I_{x} I_{1}, F_{x} + F_{1} - F_{x} F_{1} 〉 = 〈 T_{x}, I_{x}, F_{x} 〉 \\ \Rightarrow 〈 T_{1}, I_{1}, F_{1} 〉 = 〈 1, 0, 0 〉 = 1_{A} \end{array}

In addition, the score and accuracy functions of the neutral element $1_{A} = 〈 1, 0, 0 〉$ are $s_{1} = \frac{1 + T_{1} - 2 I_{1} - F_{1}}{2} = 1$ and $h_{1} = \frac{2 + T_{1} - I_{1} - F_{1}}{3} = 1$, respectively. □
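The four operations of Definition 8 and the neutral elements of Theorems 1 and 2 can be checked numerically with a short Python sketch. The names `oplus`, `otimes`, `scale`, `power`, `ZERO`, and `ONE` are illustrative, and the inputs are assumed to lie in the standard interval [0, 1].

```python
from typing import NamedTuple


class SVNN(NamedTuple):
    t: float  # truth
    i: float  # indeterminacy
    f: float  # falsity


def oplus(x: SVNN, y: SVNN) -> SVNN:
    """Addition x (+) y of Definition 8."""
    return SVNN(x.t + y.t - x.t * y.t, x.i * y.i, x.f * y.f)


def otimes(x: SVNN, y: SVNN) -> SVNN:
    """Multiplication x (.) y of Definition 8."""
    return SVNN(x.t * y.t, x.i + y.i - x.i * y.i, x.f + y.f - x.f * y.f)


def scale(alpha: float, x: SVNN) -> SVNN:
    """Multiplication by a positive scalar alpha."""
    return SVNN(1 - (1 - x.t) ** alpha, x.i ** alpha, x.f ** alpha)


def power(x: SVNN, alpha: float) -> SVNN:
    """Exponentiation of x by a positive scalar alpha."""
    return SVNN(x.t ** alpha, 1 - (1 - x.i) ** alpha, 1 - (1 - x.f) ** alpha)


ZERO = SVNN(0.0, 1.0, 1.0)  # neutral element of oplus (Theorem 1)
ONE = SVNN(1.0, 0.0, 0.0)   # neutral element of otimes (Theorem 2)

if __name__ == "__main__":
    x = SVNN(0.6, 0.4, 0.5)
    print(oplus(x, ZERO) == x, otimes(x, ONE) == x)
```

Both operators are commutative in every component, so checking the neutral element on one side is enough, as noted in the proof of Theorem 1.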

2.2. Neutrosophic Valued Metric Spaces

In this section, we consider the metric and generalized metric spaces in the neutrosophic meaning.

Definition 9.

The ordering in Definition 7 gives an order relation on the collection of SVNNs. Suppose that the mapping $d : X \times X \to A$, where $X$ and $A$ are SVNSs, satisfies:

(I): $0_{A} \leq d (x, y)$ and $d (x, y) = 0_{A} \Leftrightarrow s_{x} = s_{y} \text{ and } h_{x} = h_{y}$ for all $x, y \in X$ .
(II): $d (x, y) = d (y, x)$ for all $x, y \in X$ .

Then d is called a neutrosophic valued metric on $X$, and the pair $(X, d)$ is called a neutrosophic valued metric space. Here, the third condition of metric spaces (the triangle inequality) is not suitable for SVNSs because the addition is not ordinary addition.

Theorem 3.

Let $(X, d)$ be a neutrosophic valued metric space. Then, there are relationships among the truth, indeterminacy and falsity values:

(I): $0 < T (x, y) - 2 I (x, y) - F (x, y) + 3$ and, if $s_{0} = s_{d}$, then $0 < T (x, y) - I (x, y) - F (x, y) + 2$ .
(II): $d (x, y) = 0_{A} \Leftrightarrow T (x, y) = 0, I (x, y) = F (x, y) = 1.$
(III): $T (x, y) = T (y, x)$ , $I (x, y) = I (y, x)$ , $F (x, y) = F (y, x)$ so, each distance function must be symmetric.

where $T (., .)$, $I (., .)$ and $F (., .)$ are the distance functions on the truth, indeterminacy and falsity values, respectively.

Proof.

(I): $\begin{aligned} 0_{A} < d (x, y) & \Leftrightarrow 〈 0, 1, 1 〉 < 〈 T (x, y), I (x, y), F (x, y) 〉 \\ & \Leftrightarrow s_{0} < s_{d} \Leftrightarrow - 1 < \frac{1 + T (x, y) - 2 I (x, y) - F (x, y)}{2} \\ & \Leftrightarrow 0 < T (x, y) - 2 I (x, y) - F (x, y) + 3 \end{aligned}$
(III): $\begin{aligned} d (x, y) = d (y, x) & \Leftrightarrow 〈 T (x, y), I (x, y), F (x, y) 〉 = 〈 T (y, x), I (y, x), F (y, x) 〉 \\ & \Leftrightarrow T (x, y) = T (y, x), I (x, y) = I (y, x), F (x, y) = F (y, x) \end{aligned}$ □

Example 3.

Let $A$ be a non-empty SVNS and let $x = 〈 T_{x}, I_{x}, F_{x} 〉$, $y = 〈 T_{y}, I_{y}, F_{y} 〉$ be two SVNNs. If we define the metric $d : X \times X \to A$ as:

d (x, y) = 〈 T (x, y), I (x, y), F (x, y) 〉 = 〈 | T_{x} - T_{y} |, 1 - | I_{x} - I_{y} |, 1 - | F_{x} - F_{y} | 〉

then

(I): $\begin{array}{l} 0 < | T_{x} - T_{y} | - 2 (1 - | I_{x} - I_{y} |) - (1 - | F_{x} - F_{y} |) + 3 \\ \Rightarrow 0 < | T_{x} - T_{y} | + 2 | I_{x} - I_{y} | + | F_{x} - F_{y} | \end{array}$
so it satisfies the first condition.
(II): By the properties of the absolute value function, this condition is obvious.
So, $(X, d)$ is a neutrosophic-valued metric space.
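A small Python sketch of the distance in Example 3 follows. The function name `nv_distance` is illustrative, SVNNs are represented as plain `(T, I, F)` tuples, and the two sample points are $w_{1}$ and $w_{2}$ from the toy data set of Section 6.

```python
def nv_distance(x, y):
    """Neutrosophic valued distance of Example 3 for SVNNs given as (T, I, F)."""
    tx, ix, fx = x
    ty, iy, fy = y
    return (abs(tx - ty), 1 - abs(ix - iy), 1 - abs(fx - fy))


if __name__ == "__main__":
    w1 = (0.6, 0.6, 0.6)
    w2 = (0.8, 0.4, 0.5)
    # Rounded to two decimals to hide floating-point noise.
    print(tuple(round(v, 2) for v in nv_distance(w1, w2)))
    print(nv_distance(w1, w1))
```

The distance from a point to itself is the neutral element $0_{A} = 〈 0, 1, 1 〉$, and swapping the arguments leaves every component unchanged, which is condition (II) of Definition 9.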

3. Neutrosophic Valued G-Metric Spaces

Definition 10.

Let X and A be non-empty SVNSs. A function $G : X \times X \times X \to A$ is called a neutrosophic valued G-metric if it satisfies the following properties:

(1): $G (x, y, z) = 0_{A}$ if and only if $x = y = z$ ,
(2): $G (x, x, y) \neq 0_{A}$ whenever $x \neq y$ ,
(3): $G (x, x, y) \leq G (x, y, z)$ for any $x, y, z \in X$ , with $z \neq y$ ,
(4): $G (x, y, z) = G (x, z, y) = \dots$ (symmetric for all elements).

The pair (X, G) is called a neutrosophic valued G-metric space.

Theorem 4.

Let (X, G) be a neutrosophic valued G-metric space. Then it satisfies the following:

(1): $T (x, x, x) = 0, I (x, x, x) = F (x, x, x) = 1 .$
(2): Assume $x \neq y$ , then $T (x, y, z) \neq 0, I (x, y, z) \neq 1, F (x, y, z) \neq 1.$
(3): $0 < T (x, y, z) - T (x, x, y) + 2 (I (x, x, y) - I (x, y, z)) + F (x, x, y) - F (x, y, z)$
(4): $T (x, y, z)$, $I (x, y, z)$ and $F (x, y, z)$ are symmetric for all elements.

where $T (., ., .)$, $I (., ., .)$ and $F (., ., .)$ are the G-distance functions of the truth, indeterminacy and falsity values of the elements of the set, respectively.

The proofs are similar to those for neutrosophic valued metric spaces.

Example 4.

Let $X$ be a non-empty SVNS and let the G-distance function be defined by:

G (x, y, z) = \frac{1}{3} (d (x, y) \oplus d (x, z) \oplus d (y, z))

where $d (., .)$ is a neutrosophic valued metric. The pair (X, G) is obviously a neutrosophic valued G-metric space because of $d (., .)$. Further, it has commutative properties.
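The G-distance of Example 4 can be sketched in Python as follows, assuming that $d$ is the distance of Example 3 and that $\oplus$ and the scalar multiple $\frac{1}{3}$ are those of Definition 8. The function names are illustrative. The sketch uses the identity $T_{a} + T_{b} - T_{a} T_{b} = 1 - (1 - T_{a}) (1 - T_{b})$ so that the three-term sum and the scalar multiple collapse into one product.

```python
def nv_distance(x, y):
    """Distance of Example 3 for SVNNs given as (T, I, F) tuples (assumed here)."""
    return (abs(x[0] - y[0]), 1 - abs(x[1] - y[1]), 1 - abs(x[2] - y[2]))


def g_example4(x, y, z):
    """G(x, y, z) = (1/3)(d(x, y) (+) d(x, z) (+) d(y, z))."""
    not_t, i_prod, f_prod = 1.0, 1.0, 1.0
    for t, i, f in (nv_distance(x, y), nv_distance(x, z), nv_distance(y, z)):
        not_t *= 1 - t  # truth of a (+) b is 1 - (1 - Ta)(1 - Tb)
        i_prod *= i     # indeterminacy components multiply
        f_prod *= f     # falsity components multiply
    third = 1 / 3
    # Scalar multiple: <1 - (1 - T)^a, I^a, F^a> with a = 1/3.
    return (1 - not_t ** third, i_prod ** third, f_prod ** third)


if __name__ == "__main__":
    a, b, c = (0.6, 0.6, 0.6), (0.8, 0.4, 0.5), (0.5, 0.8, 0.7)
    print(g_example4(a, a, a))
    print(g_example4(a, b, c))
```

Because $\oplus$ is commutative and $d$ is symmetric, permuting the arguments changes the result only by floating-point rounding.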

4. Relative Weighted Neutrosophic Valued Distances and Cohesion Measures

The relative distance measure is a method used for clustering of data sets, []. We define the relative weighted distance, which is a more sensitive method for big data sets.

Let $x_{i} = 〈 T_{x_{i}}, I_{x_{i}}, F_{x_{i}} 〉 \in A$ (a non-empty SVNS), $i = 1, \dots, n$, be SVNNs. Then the neutrosophic weighted average operator of these SVNNs is defined as:

M_{a} (A) = \sum_{i = 1}^{n} χ_{i} x_{i} = 〈 1 - \prod_{i = 1}^{n} {(1 - T_{x_{i}})}^{χ_{i}}, \prod_{i = 1}^{n} {(I_{x_{i}})}^{χ_{i}}, \prod_{i = 1}^{n} {(F_{x_{i}})}^{χ_{i}} 〉

where $χ_{i}$ is the weight of the i-th data. For a given neutrosophic data set $W = \{w_{1}, w_{2}, w_{3}, \dots, w_{n}\}$ and a neutrosophic valued metric d, we define the relative neutrosophic valued distance by choosing another neutrosophic data as a reference and computing the average of the differences of distances over all the neutrosophic data $w_{i} \in W$.
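The neutrosophic weighted average operator $M_{a}$ defined above can be sketched in Python as follows. The function name `weighted_average` is illustrative, SVNNs are `(T, I, F)` tuples with components in [0, 1], and the equal weights $1 / n$ in the usage lines are an assumption made for the example.

```python
import math


def weighted_average(items, weights):
    """M_a = <1 - prod (1 - T_i)^w_i, prod I_i^w_i, prod F_i^w_i>."""
    pairs = list(zip(items, weights))
    not_t = math.prod((1 - t) ** w for (t, _, _), w in pairs)
    i_part = math.prod(i ** w for (_, i, _), w in pairs)
    f_part = math.prod(f ** w for (_, _, f), w in pairs)
    return (1 - not_t, i_part, f_part)


if __name__ == "__main__":
    data = [(0.6, 0.6, 0.6), (0.8, 0.4, 0.5), (0.5, 0.8, 0.7)]
    n = len(data)
    print(weighted_average(data, [1 / n] * n))
```

With equal weights that sum to 1, averaging n copies of the same SVNN returns that SVNN up to rounding, which is a quick sanity check on the formula.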

Definition 11.

The relative neutrosophic valued distance from a neutrosophic data $w_{i}$ to another neutrosophic data $w_{j}$ is defined as follows:

R D (w_{i} ‖ w_{j}) = \frac{1}{n} \sum_{w_{k} \in W} (d (w_{i}, w_{j}) \circ d (w_{i}, w_{k}))

Here, since the T, I, F values of SVNNs cannot be negative, we can define the expression $d (w_{i}, w_{j}) \circ d (w_{i}, w_{k})$ as the distance between these two neutrosophic-valued metrics. Furthermore, the distance between metrics is again neutrosophic-valued, so a related neutrosophic-valued distance can be defined as:

\begin{aligned} d (w_{i}, w_{j}) \circ d (w_{i}, w_{k}) & = 〈 T (w_{i}, w_{j}), I (w_{i}, w_{j}), F (w_{i}, w_{j}) 〉 \circ 〈 T (w_{i}, w_{k}), I (w_{i}, w_{k}), F (w_{i}, w_{k}) 〉 \\ & = 〈 1 - | T (w_{i}, w_{j}) - {(T (w_{i}, w_{k}) - 1)}^{2} |, 1 - | I (w_{i}, w_{j}) - {I (w_{i}, w_{k})}^{2} |, 1 - | F (w_{i}, w_{j}) - {F (w_{i}, w_{k})}^{2} | 〉 \end{aligned}

(3)
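Equation (3) translates directly into code. The sketch below uses the illustrative name `circ` for the operator $\circ$ and plain `(T, I, F)` tuples; the sample inputs are arbitrary values chosen for the example.

```python
def circ(a, b):
    """Difference operator of Equation (3) on two SVNNs given as (T, I, F)."""
    t1, i1, f1 = a
    t2, i2, f2 = b
    return (
        1 - abs(t1 - (t2 - 1) ** 2),  # truth component
        1 - abs(i1 - i2 ** 2),        # indeterminacy component
        1 - abs(f1 - f2 ** 2),        # falsity component
    )


if __name__ == "__main__":
    # Rounded to two decimals to hide floating-point noise.
    print(tuple(round(v, 2) for v in circ((0.28, 0.64, 0.81), (0.3, 0.6, 0.8))))
    print(circ((0, 1, 1), (0, 1, 1)))
```

Note that the operator is not symmetric in its arguments: the second argument enters squared, the first does not.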

The difference operator $\circ$ is, in general, not a neutrosophic-valued metric (or G-metric). We use some abbreviations to save space.

\begin{aligned} R D (w_{i} ‖ w_{j}) & = \frac{1}{n} \sum_{w_{k} \in W} (d (w_{i}, w_{j}) \circ d (w_{i}, w_{k})) \\ & = d (w_{i}, w_{j}) \circ \frac{1}{n} \sum_{w_{k} \in W} d (w_{i}, w_{k}) \\ & = 〈 T (w_{i}, w_{j}), I (w_{i}, w_{j}), F (w_{i}, w_{j}) 〉 \circ \frac{1}{n} (d (w_{i}, w_{1}) \oplus d (w_{i}, w_{2}) \oplus \dots \oplus d (w_{i}, w_{n})) \\ & = 〈 T (w_{i}, w_{j}), I (w_{i}, w_{j}), F (w_{i}, w_{j}) 〉 \circ \frac{1}{n} [〈 T (w_{i}, w_{1}), I (w_{i}, w_{1}), F (w_{i}, w_{1}) 〉 \oplus \dots \oplus 〈 T (w_{i}, w_{n}), I (w_{i}, w_{n}), F (w_{i}, w_{n}) 〉] \\ & = 〈 T (w_{i}, w_{j}), I (w_{i}, w_{j}), F (w_{i}, w_{j}) 〉 \circ \frac{1}{n} [〈 \sum_{k \in W} T (w_{i}, w_{k}) - \prod_{k \in W} T (w_{i}, w_{k}), \prod_{k \in W} I (w_{i}, w_{k}), \prod_{k \in W} F (w_{i}, w_{k}) 〉] \\ & = 〈 T (w_{i}, w_{j}), I (w_{i}, w_{j}), F (w_{i}, w_{j}) 〉 \circ 〈 1 - {[1 - \sum_{k \in W} T (w_{i}, w_{k}) + \prod_{k \in W} T (w_{i}, w_{k})]}^{1 / n}, \prod_{k \in W} {I (w_{i}, w_{k})}^{1 / n}, \prod_{k \in W} {F (w_{i}, w_{k})}^{1 / n} 〉 \\ & = 〈 T_{1}, I_{1}, F_{1} 〉 \circ 〈 T_{2}, I_{2}, F_{2} 〉 \\ & = 〈 1 - | T_{1} - {(T_{2} - 1)}^{2} |, 1 - | I_{1} - I_{2}^{2} |, 1 - | F_{1} - F_{2}^{2} | 〉 \end{aligned}

where $T_{1}$, $I_{1}$, $F_{1}$ and $T_{2}$, $I_{2}$, $F_{2}$ are the first, second, and third elements of the two SVNNs in the previous line of the equation, respectively.

Definition 12.

The relative weighted neutrosophic valued distance from a neutrosophic data $w_{i}$ to another neutrosophic data $w_{j}$ is defined as follows:

\begin{aligned} R D_{χ} (w_{i} ‖ w_{j}) & = \underset{i \neq j, j \neq k, i \neq k}{\sum_{w_{k} \in W}} χ_{w} (d (w_{i}, w_{j}) \circ d (w_{i}, w_{k})) \\ & = χ_{i j} d (w_{i}, w_{j}) \circ \underset{i \neq j, j \neq k, i \neq k}{\sum_{w_{k} \in W}} χ_{i k} d (w_{i}, w_{k}) \\ & = χ_{i j} 〈 T (w_{i}, w_{j}), I (w_{i}, w_{j}), F (w_{i}, w_{j}) 〉 \circ (χ_{i 1} 〈 T (w_{i}, w_{1}), I (w_{i}, w_{1}), F (w_{i}, w_{1}) 〉 \oplus \dots \oplus χ_{i n} 〈 T (w_{i}, w_{n}), I (w_{i}, w_{n}), F (w_{i}, w_{n}) 〉) \\ & = 〈 1 - {(1 - T (w_{i}, w_{j}))}^{χ_{i j}}, {I (w_{i}, w_{j})}^{χ_{i j}}, {F (w_{i}, w_{j})}^{χ_{i j}} 〉 \\ & \quad \circ (〈 1 - {(1 - T (w_{i}, w_{1}))}^{χ_{i 1}}, {I (w_{i}, w_{1})}^{χ_{i 1}}, {F (w_{i}, w_{1})}^{χ_{i 1}} 〉 \oplus \dots \oplus 〈 1 - {(1 - T (w_{i}, w_{n}))}^{χ_{i n}}, {I (w_{i}, w_{n})}^{χ_{i n}}, {F (w_{i}, w_{n})}^{χ_{i n}} 〉) \\ & = 〈 1 - {(1 - T (w_{i}, w_{j}))}^{χ_{i j}}, {I (w_{i}, w_{j})}^{χ_{i j}}, {F (w_{i}, w_{j})}^{χ_{i j}} 〉 \\ & \quad \circ 〈 \sum_{\underset{k \neq i, j}{k = 1}}^{n} {\tilde{T}}_{i k} - \prod_{\underset{k \neq i, j}{k = 1}}^{n} {\tilde{T}}_{i k}, \prod_{\underset{k \neq i, j}{k = 1}}^{n} {\tilde{I}}_{i k}, \prod_{\underset{k \neq i, j}{k = 1}}^{n} {\tilde{F}}_{i k} 〉 \\ & = 〈 T_{1}, I_{1}, F_{1} 〉 \circ 〈 T_{2}, I_{2}, F_{2} 〉 \\ & = 〈 1 - | T_{1} - {(T_{2} - 1)}^{2} |, 1 - | I_{1} - I_{2}^{2} |, 1 - | F_{1} - F_{2}^{2} | 〉 \end{aligned}

where ${\tilde{T}}_{i k} = 1 - {(1 - T (w_{i}, w_{k}))}^{χ_{i k}}$, ${\tilde{I}}_{i k} = {I (w_{i}, w_{k})}^{χ_{i k}}$, ${\tilde{F}}_{i k} = {F (w_{i}, w_{k})}^{χ_{i k}}$.

Definition 13.

The relative weighted neutrosophic valued distance (from a random neutrosophic data $w_{i}$) to a neutrosophic data $w_{j}$ is defined as follows:

\begin{aligned} R D_{χ} (w_{j}) & = \sum_{w_{i} \in W} χ_{i} R D_{χ} (w_{i} ‖ w_{j}) \\ & = \sum_{w_{i} \in W} χ_{i} [\sum_{w_{k} \in W} χ_{w} (d (w_{i}, w_{j}) \circ d (w_{i}, w_{k}))] \\ & = \sum_{w_{i} \in W} χ_{i} [\sum_{w_{k} \in W} χ_{w} (δ (d_{i j}, d_{i k}))] \end{aligned}

where $δ (d_{i j}, d_{i k})$ abbreviates $d (w_{i}, w_{j}) \circ d (w_{i}, w_{k})$.

Definition 14.

The relative weighted neutrosophic valued distance from a neutrosophic data set $W_{1}$ to another neutrosophic data set $W_{2}$ is defined as follows:

R D_{χ} (W_{1} ‖ W_{2}) = \sum_{x \in W_{1}} χ_{x} \sum_{y \in W_{2}} χ_{y} R D_{χ} (x ‖ y)

Definition 15.

(Weighted cohesion measure between two neutrosophic data) The difference between the relative weighted neutrosophic-valued distance to $w_{j}$ and the relative weighted neutrosophic-valued distance from $w_{i}$ to $w_{j}$, i.e.,

ρ_{χ} (w_{i}, w_{j}) = R D_{χ} (w_{j}) \circ R D_{χ} (w_{i} ‖ w_{j})

(4)

is called the weighted neutrosophic-valued cohesion measure between the two neutrosophic data $w_{i}$ and $w_{j}$. If $ρ_{χ} (w_{i}, w_{j}) \geq 0_{W}$ (resp. $ρ_{χ} (w_{i}, w_{j}) \leq 0_{W}$), then $w_{i}$ and $w_{j}$ are said to be cohesive (resp. incohesive). Thus, for cohesive data, the relative weighted neutrosophic distance from $w_{i}$ to $w_{j}$ is not larger than the relative weighted neutrosophic distance (from a random neutrosophic data) to $w_{j}$.

Definition 16.

(Weighted cohesion measure between two neutrosophic data sets) Let $w_{i}$ and $w_{j}$ be elements of the neutrosophic data sets U and V, respectively. Then the measure

ρ_{χ} (U, V) = \sum_{w_{i} \in U} χ_{u} \sum_{w_{j} \in V} χ_{v} ρ_{χ} (w_{i}, w_{j})

(5)

is called the weighted neutrosophic-valued cohesion measure of the neutrosophic data sets U and V.

Definition 17.

(Cluster) The non-empty neutrosophic data set W is called a cluster if it is cohesive, i.e., $ρ_{χ} (W, W) \geq 0_{W}$.

5. Clustering via Neutrosophic Valued G-Metric Spaces

In this section, we cluster neutrosophic big data using the weighted distance definitions of Section 4 and the G-metric definition.

Definition 18.

The neutrosophic valued weighted G-distance from a neutrosophic data $w$ to a neutrosophic big data set $U$ is defined as follows:

G (w, U) = \sum_{y \in U} χ_{u} \sum_{z \in U} χ_{u} ((d (w, y) \oplus d (w, z)) \circ d (y, z))

(6)
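A single summand of Equation (6), with all weights equal to 1, can be sketched in Python as below. The sketch assumes the distance of Example 3 for $d$ and applies $\oplus$ before $\circ$, which is the grouping that reproduces the values of $G (w_{1}, w_{2}, w_{3})$ and $G (w_{4}, w_{2}, w_{3})$ listed in the toy example of Section 6. All function names are illustrative.

```python
def nv_distance(x, y):
    """Distance of Example 3 for SVNNs given as (T, I, F) tuples."""
    return (abs(x[0] - y[0]), 1 - abs(x[1] - y[1]), 1 - abs(x[2] - y[2]))


def oplus(a, b):
    """Addition of Definition 8."""
    return (a[0] + b[0] - a[0] * b[0], a[1] * b[1], a[2] * b[2])


def circ(a, b):
    """Difference operator of Equation (3)."""
    return (
        1 - abs(a[0] - (b[0] - 1) ** 2),
        1 - abs(a[1] - b[1] ** 2),
        1 - abs(a[2] - b[2] ** 2),
    )


def g_triple(w, y, z):
    """One summand of Equation (6): (d(w, y) (+) d(w, z)) o d(y, z)."""
    return circ(oplus(nv_distance(w, y), nv_distance(w, z)), nv_distance(y, z))


if __name__ == "__main__":
    w1, w2, w3 = (0.6, 0.6, 0.6), (0.8, 0.4, 0.5), (0.5, 0.8, 0.7)
    # Rounded to two decimals to hide floating-point noise.
    print(tuple(round(v, 2) for v in g_triple(w1, w2, w3)))
```

How the summands are combined over all pairs of a set, and how weights other than 1 enter, is left to Equation (6) itself; the sketch covers only the pairwise building block.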

Algorithm (K-sets algorithm)

Input: A neutrosophic big data set $W = \{w_{1}, w_{2}, \dots, w_{n}\}$, a neutrosophic distance measure d(.,.), and the number of sets K.
Output: A partition of neutrosophic sets $\{U_{1}, U_{2}, \dots, U_{K}\}$.

1. Initially, choose arbitrarily K disjoint nonempty sets $U_{1}, U_{2}, \dots, U_{K}$ as a partition of $W$ .
2. for i from 1 to n do
begin
Compute $G (w_{i}, y_{k}, z_{k})$ for each set $U_{k}$ .
Find the set to which the point $w_{i}$ is closest in terms of G-distance.
Assign point $w_{i}$ to that set.
end
3. Repeat from 2 until there is no further change.
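The loop above can be sketched in Python under several assumptions that are not fixed by the text: $d$ is the distance of Example 3, all weights are 1, the G-distance from a point to a set is the $\oplus$-sum of $(d (w, y) \oplus d (w, z)) \circ d (y, z)$ over all pairs of other points in the set, "closest" means the lowest truth component, and a cap on the number of rounds guards against oscillation. The function names and the label-list representation of the partition are illustrative.

```python
from itertools import combinations


def nv_distance(x, y):
    return (abs(x[0] - y[0]), 1 - abs(x[1] - y[1]), 1 - abs(x[2] - y[2]))


def oplus(a, b):
    return (a[0] + b[0] - a[0] * b[0], a[1] * b[1], a[2] * b[2])


def circ(a, b):
    return (1 - abs(a[0] - (b[0] - 1) ** 2),
            1 - abs(a[1] - b[1] ** 2),
            1 - abs(a[2] - b[2] ** 2))


def g_triple(w, y, z):
    return circ(oplus(nv_distance(w, y), nv_distance(w, z)), nv_distance(y, z))


def g_to_set(w, members):
    """Assumed aggregation: oplus over G(w, y, z) for all pairs {y, z} in the set."""
    pool = [u for u in members if u != w]
    if len(pool) < 2:   # too few other points: fall back to the whole set
        pool = list(members)
    if len(pool) < 2:   # singleton set: pair its only point with itself
        pool = pool * 2
    total = None
    for y, z in combinations(pool, 2):
        g = g_triple(w, y, z)
        total = g if total is None else oplus(total, g)
    return total


def k_sets(points, labels, max_rounds=100):
    """Reassign each point to the set with the lowest truth value of G."""
    labels = list(labels)  # labels[i] is the index of the set holding points[i]
    for _ in range(max_rounds):
        changed = False
        for idx, w in enumerate(points):
            truth = {}
            for k in set(labels):  # only non-empty sets are considered
                members = [p for p, lab in zip(points, labels) if lab == k]
                truth[k] = g_to_set(w, members)[0]
            # On ties, prefer the set the point already belongs to.
            best = min(truth, key=lambda k: (truth[k], k != labels[idx]))
            if best != labels[idx]:
                labels[idx] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    data = [(0.6, 0.6, 0.6), (0.8, 0.4, 0.5), (0.5, 0.8, 0.7),
            (0.9, 0.5, 0.6), (0.1, 0.2, 0.7)]
    print(k_sets(data, [0, 1, 1, 0, 0]))
```

The number of sets K is implied by the initial labels. A set that loses all its points is simply dropped, so the sketch may return fewer than K non-empty sets.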

6. Application and Example

We now give an example of data that could have this form and fit the definitions above. A data set can be called a big data set if it is difficult to define, analyze, and visualize and/or is voluminous. We give an example of neutrosophic big data in accordance with this definition, together with a possible use of the G-metric; the example is fictional, since no real neutrosophic big data are available yet. A good candidate is a current topic, image processing for big data analysis. Imagine a camera on a circuit board that is able to distinguish colors, cluster all the vehicles it captures in the image, and record the data. For any given color (for example, a white vehicle), the camera assigns the following degrees:

(I): When the vehicle is at a distance at which its color can be detected, the truth value is determined for the detected portion of the vehicle.
(II): The rate at which the vehicle can be detected by the camera is assigned as the uncertainty value (mixed colors arise from external factors such as the effect of daylight, in which case the color is determined on a different scale).
(III): The rate at which a large part of the vehicle is not seen, or at which the color is out of range, is assigned as the falsity value.

Thus, the camera data are clustered via the G-metric. As a result, the daily numbers of passing vehicles are determined by color. The data change continuously as long as the road is open and the camera records. There is one neutrosophic data for each vehicle, so clustering of neutrosophic big data takes place.

Here, the weight functions we have defined for the metric can be given the value 1 for the primary colors (red, yellow, blue). For secondary or mixed colors, a proportional value may be given depending on which primary color is closest.

A Numerical Toy Example

As a numerical example, take 5 neutrosophic data whose weights are all equal to 1:

W = \{w_{1} 〈 0.6, 0.6, 0.6 〉, w_{2} 〈 0.8, 0.4, 0.5 〉, w_{3} 〈 0.5, 0.8, 0.7 〉, w_{4} 〈 0.9, 0.5, 0.6 〉, w_{5} 〈 0.1, 0.2, 0.7 〉\}

K = 2 disjoint sets can be chosen as $U_{1} = \{w_{1}, w_{4}, w_{5}\}$, $U_{2} = \{w_{2}, w_{3}\}$.

Then

d (w_{i}, w_{j}) = \begin{bmatrix}
〈 0, 1, 1 〉 & 〈 0.2, 0.8, 0.9 〉 & 〈 0.1, 0.8, 0.9 〉 & 〈 0.3, 0.9, 1.0 〉 & 〈 0.5, 0.6, 0.9 〉 \\
〈 0.2, 0.8, 0.9 〉 & 〈 0, 1, 1 〉 & 〈 0.3, 0.6, 0.8 〉 & 〈 0.1, 0.9, 0.9 〉 & 〈 0.7, 0.8, 0.8 〉 \\
〈 0.1, 0.8, 0.9 〉 & 〈 0.3, 0.6, 0.8 〉 & 〈 0, 1, 1 〉 & 〈 0.4, 0.7, 0.9 〉 & 〈 0.4, 0.4, 1.0 〉 \\
〈 0.3, 0.9, 1.0 〉 & 〈 0.1, 0.9, 0.9 〉 & 〈 0.4, 0.7, 0.9 〉 & 〈 0, 1, 1 〉 & 〈 0.2, 0.8, 0.9 〉 \\
〈 0.5, 0.6, 0.9 〉 & 〈 0.7, 0.8, 0.8 〉 & 〈 0.4, 0.4, 1.0 〉 & 〈 0.2, 0.8, 0.9 〉 & 〈 0, 1, 1 〉
\end{bmatrix}

where we assume d (w_{i}, w_{j}) as in Example 3. So, we can compute the G-metrics of the data as in Equation (3):

\begin{array}{l} G (w_{1}, U_{1}) = G (w_{1}, w_{4}, w_{5}) = 〈 0.99, 0.90, 0.91 〉 \\ G (w_{1}, U_{2}) = G (w_{1}, w_{2}, w_{3}) = 〈 0.79, 0.72, 0.83 〉 \\ G (w_{2}, U_{1}) = G (w_{2}, w_{1}, w_{4}) \oplus G (w_{2}, w_{1}, w_{5}) \oplus G (w_{2}, w_{4}, w_{5}) = 〈 0.9874, 0.6027, 0.6707 〉 \\ G (w_{2}, U_{2}) = G (w_{2}, w_{2}, w_{3}) = 〈 0, 1, 1 〉 \\ G (w_{3}, U_{1}) = G (w_{3}, w_{1}, w_{4}) \oplus G (w_{3}, w_{1}, w_{5}) \oplus G (w_{3}, w_{4}, w_{5}) = 〈 1, 0.4608, 0.6707 〉 \\ G (w_{3}, U_{2}) = G (w_{3}, w_{2}, w_{3}) = 〈 0, 1, 1 〉 \\ G (w_{4}, U_{1}) = G (w_{4}, w_{1}, w_{5}) = 〈 0.81, 0.64, 0.91 〉 \\ G (w_{4}, U_{2}) = G (w_{4}, w_{2}, w_{3}) = 〈 0.97, 0.73, 0.83 〉 \end{array}
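The entries in the first three rows of the distance matrix above are consistent with a component-wise rule d(x, y) = 〈 |T_x - T_y|, 1 - |I_x - I_y|, 1 - |F_x - F_y| 〉. This rule is inferred here from the tabulated values and should be read as an assumption; Example 3 remains the authoritative definition. A minimal Python sketch under that assumption, with illustrative function and variable names:

```python
# Data from the toy example: (truth, indeterminacy, falsity) triples.
W = {
    "w1": (0.6, 0.6, 0.6),
    "w2": (0.8, 0.4, 0.5),
    "w3": (0.5, 0.8, 0.7),
    "w4": (0.9, 0.5, 0.6),
    "w5": (0.1, 0.2, 0.7),
}


def distance(x, y, ndigits=2):
    """Assumed rule: <|Tx - Ty|, 1 - |Ix - Iy|, 1 - |Fx - Fy|>, rounded for display."""
    t = abs(x[0] - y[0])
    i = 1 - abs(x[1] - y[1])
    f = 1 - abs(x[2] - y[2])
    return tuple(round(v, ndigits) for v in (t, i, f))


# Distances from w1, w2 and w3 to every data point (first three matrix rows).
for a in ("w1", "w2", "w3"):
    print(a, [distance(W[a], W[b]) for b in W])
```

The rounding is only there to hide floating-point noise such as 0.20000000000000007; it does not change the values at the one-decimal precision used in the matrix.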

So, according to the calculations above, w_{4} belongs to set U_{1} and the other data belong to U_{2}. Here, we have assigned the data to the clusters according to which G-metric has the lower truth value: if the truth value of the G-distance is low, then the data is closer to the set.
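The assignment rule just described (choose the set whose G-distance has the lowest truth value) can be sketched in Python using the G-values tabulated above. The dictionary layout and the function name `assign` are illustrative only. No G-values are listed for w_{5}, so it is left out rather than guessed.

```python
# G-distances <truth, indeterminacy, falsity> copied from the computations above.
G = {
    "w1": {"U1": (0.99, 0.90, 0.91), "U2": (0.79, 0.72, 0.83)},
    "w2": {"U1": (0.9874, 0.6027, 0.6707), "U2": (0.0, 1.0, 1.0)},
    "w3": {"U1": (1.0, 0.4608, 0.6707), "U2": (0.0, 1.0, 1.0)},
    "w4": {"U1": (0.81, 0.64, 0.91), "U2": (0.97, 0.73, 0.83)},
}


def assign(g_values):
    """Return, for each data point, the set with the smallest truth component."""
    return {
        point: min(per_set, key=lambda name: per_set[name][0])
        for point, per_set in g_values.items()
    }


print(assign(G))
# {'w1': 'U2', 'w2': 'U2', 'w3': 'U2', 'w4': 'U1'}
```

This reproduces the assignment stated in the text for w_{1} through w_{4}: only w_{4} is placed in U_{1}.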

7. Conclusions

This paper has introduced many new notions and definitions for clustering neutrosophic big data and for a geometric similarity metric on such data. Neutrosophic data sets have density, for example indeterminacy density or neutrosophic density, and this adds further data and complexity. Neutrosophic data sets are therefore complex big data sets. Separation and clustering of these sets are evaluated according to weighted distances. In the last part of the paper, a K-sets algorithm has been given for neutrosophic big data sets. We hope that the results in this paper can be applied to other data types, such as interval neutrosophic big data sets, can be analyzed in other metric spaces, such as neutrosophic complex valued G-metric spaces, and can help to solve problems in other study areas.

Author Contributions

Conceptualization, F.T. and S.T.; Methodology, F.T.; Validation, F.T., S.T. and F.S.; Investigation, F.T. and S.T.; Resources, F.T., S.T. and F.S.; Writing-Original Draft Preparation, F.T. and S.T.; Writing-Review & Editing, F.T., S.T. and F.S.; Supervision, S.T. and F.S.; Funding Acquisition, F.S.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
2. Atanassov, K.T. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986, 20, 87–96.
3. Smarandache, F. Neutrosophy: Neutrosophic Probability, Set and Logic; American Research Press: Rehoboth, NM, USA, 1998.
4. Deli, I.; Subas, Y. Single valued neutrosophic numbers and their applications to multicriteria decision making problem. Neutrosophic Sets Syst. 2014, 2, 1–13.
5. Fan, C.; Fan, E.; Ye, J. The Cosine Measure of Single-Valued Neutrosophic Multisets for Multiple Attribute Decision-Making. Symmetry 2018, 10, 154.
6. Li, Y.; Liu, P.; Chen, Y. Some single valued neutrosophic number heronian mean operators and their application in multiple attribute group decision making. Informatica 2016, 27, 85–110.
7. Lu, Z.; Ye, J. Single-valued neutrosophic hybrid arithmetic and geometric aggregation operators and their decision-making method. Information 2017, 8, 84.
8. Ye, J.; Smarandache, F. Similarity measure of refined single-valued neutrosophic sets and its multicriteria decision making method. Neutrosophic Sets Syst. 2016, 12, 41–44.
9. Guo, Y.; Şengür, A.; Ye, J. A novel image thresholding algorithm based on neutrosophic similarity score. Measurement 2014, 58, 175–186.
10. Xia, P.; Zhang, L.; Li, F. Learning similarity with cosine similarity ensemble. Inf. Sci. 2015, 307, 39–52.
11. Laney, D. 3D Data Management: Controlling Data Volume, Velocity and Variety; Meta Group: Stamford, CT, USA, 2001.
12. Chang, C.; Liao, W.; Chen, Y.; Liou, L. A Mathematical Theory for Clustering in Metric Spaces. IEEE Trans. Netw. Sci. Eng. 2016, 3, 2–16.
13. Azam, A.; Fisher, B.; Khan, M. Common fixed point theorems in complex valued metric spaces. Numer. Funct. Anal. Optim. 2011, 32, 243–253.
14. El-Sayed Ahmed, A.; Omran, S.; Asad, A.J. Fixed Point Theorems in Quaternion-Valued Metric Spaces. Abstr. Appl. Anal. 2014, 2014, 9.
15. Mustafa, Z.; Sims, B. A new approach to generalized metric spaces. J. Nonlinear Convex Anal. 2006, 7, 289–297.
16. Abbas, M.; Nazir, T.; Radenović, S. Some periodic point results in generalized metric spaces. Appl. Math. Comput. 2010, 217, 4094–4099.
17. Khamsi, M.A. Generalized metric spaces: A survey. J. Fixed Point Theory Appl. 2015, 17, 455–475.
18. Smarandache, F. Neutrosophic set - A generalization of the intuitionistic fuzzy sets. Int. J. Pure Appl. Math. 2005, 24, 287–297.
19. Vasantha, W.B.; Smarandache, F. Some Neutrosophic Algebraic Structures and Neutrosophic N-Algebraic Structures; Hexis: Phoenix, AZ, USA, 2006.
20. Şahin, M.; Kargın, A. Neutrosophic triplet normed space. Open Phys. 2017, 15, 697–704.
21. Stanujkic, D.; Smarandache, F.; Zavadskas, E.K.; Karabasevic, D. An Approach to Measuring the Website Quality Based on Neutrosophic Sets; Infinite Study: Brussels, Belgium, 2018.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

