# Analysis of a Similarity Measure for Non-Overlapped Data

## Abstract

## 1. Introduction

#### 1.1. Background and Motivation

#### 1.2. Data Description

- Consider two data sets, $f(x_i)$ and $g(y_j)$, for $x_i, y_j \in X$, where $X$ denotes the universe of discourse. The data are overlapped when $f(x_i)$ and $g(y_j)$ both have values at the same support $x_i = y_j$, whether or not those values are equal. In that case, direct operations such as summation or subtraction are possible between the two values.
- Otherwise, when the two data sets have values only at different supports, they are classified as non-overlapped data. It is rather difficult to obtain operation results between two data sets on different supports. In this paper, we propose a similarity measure design for such non-overlapped data with the help of preprocessing.
- In general, data (especially big data) provide a large amount of information, and groups of data are located close to or far from each other geometrically. In this paper, the analysis of information from neighboring data is used to design the similarity measure for non-overlapped data.
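
The distinction between overlapped and non-overlapped data can be illustrated with a minimal sketch. It assumes each data set is stored as a mapping from a support index to its value and that a zero value means "no value at that support"; the names `support` and `is_overlapped` are hypothetical and do not come from the paper.

```python
from typing import Dict, Set

# Hypothetical representation: support index -> value (assumption, not from the paper).
Data = Dict[int, float]


def support(data: Data) -> Set[int]:
    """Return the support points where the data take a nonzero value."""
    return {x for x, value in data.items() if value != 0.0}


def is_overlapped(f: Data, g: Data) -> bool:
    """True if f and g share at least one support point (x_i = y_j)."""
    return bool(support(f) & support(g))


f = {1: 0.5, 2: 0.8, 3: 0.6}
g_overlapped = {3: 0.4, 4: 0.7}   # shares support point 3 with f
g_disjoint = {4: 0.7, 5: 0.5}     # shares no support point with f

print(is_overlapped(f, g_overlapped))  # True
print(is_overlapped(f, g_disjoint))    # False
```

Only in the overlapped case can the two values at a shared support point be added or subtracted directly.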

## 2. Preliminaries on Similarity Measure

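**Definition 1.** A real function $s: F(X)\times F(X)\to {\mathbb{R}}^{+}$ is called a similarity measure if it satisfies the following properties, where $F(X)$ and $P(X)$ denote the classes of fuzzy sets and crisp sets on $X$, respectively:

- (S1) $s(A,B)=s(B,A)$, for $\forall A,B\in F(X)$
- (S2) $s(D,{D}^{C})=0$, if and only if $D\in P(X)$
- (S3) $s(C,C)=\max_{A,B\in F}s(A,B)$, for $\forall C\in F(X)$
- (S4) for $A,B,C\in F(X)$, if $A\subset B\subset C$, then $s(A,B)\ge s(A,C)$ and $s(B,C)\ge s(A,C)$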

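**Definition 2.** A real function $d: F(X)\times F(X)\to {\mathbb{R}}^{+}$ is called a distance measure if it satisfies the following properties:

- (D1) $d(A,B)=d(B,A)$, for $\forall A,B\in F(X)$
- (D2) $d(A,A)=0$, for $\forall A\in F(X)$
- (D3) $d(D,{D}^{C})=\max_{A,B\in F}d(A,B)$, for $\forall D\in P(X)$
- (D4) for $A,B,C\in F(X)$, if $A\subset B\subset C$, then $d(A,B)\le d(A,C)$ and $d(B,C)\le d(A,C)$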
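
As a concrete instance of properties (D1) to (D4), the following sketch uses the normalized Hamming distance between membership vectors. This choice of distance is an assumption made for illustration; the names `hamming_distance` and `complement` and the example sets are hypothetical, not taken from the paper.

```python
from typing import Sequence, List


def hamming_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized Hamming distance between two membership vectors (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("membership vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def complement(a: Sequence[float]) -> List[float]:
    """Standard fuzzy complement: 1 - membership value."""
    return [1.0 - x for x in a]


# Nested fuzzy sets A ⊂ B ⊂ C (pointwise A <= B <= C) and a crisp set D.
A = [0.1, 0.2, 0.0]
B = [0.3, 0.5, 0.2]
C = [0.6, 0.9, 0.4]
D = [1.0, 0.0, 1.0]

# (D1) symmetry
print(hamming_distance(A, B) == hamming_distance(B, A))
# (D2) zero distance from a set to itself
print(hamming_distance(A, A) == 0.0)
# (D3) a crisp set and its complement attain the maximum value 1
print(hamming_distance(D, complement(D)) == 1.0)
# (D4) monotonicity for nested sets
print(hamming_distance(A, B) <= hamming_distance(A, C))
print(hamming_distance(B, C) <= hamming_distance(A, C))
```

Each check prints `True` for these example sets; this illustrates the properties on one example and is not a proof.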

**Theorem 1.**

**Proof.** See Appendix A.

**Theorem 2.**

**Proof.** See Appendix B.

- Figure 2a: $X=\{x(i)\}$: 0.5, 0.8, 0.6, 0.0, 0.5, 0.0, 0.0, 0.4, 0.0, 1.0, 0.0, 0.0 and $Y=\{y(i)\}$: 0.0, 0.0, 0.0, 0.4, 0.0, 0.6, 0.7, 0.0, 0.5, 0.0, 0.8, 0.6
- Figure 2b: $X=\{x(i)\}$: 0.5, 0.8, 0.6, 0.4, 0.5, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0 and $Y=\{y(i)\}$: 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.7, 0.0, 0.5, 1.0, 0.8, 0.6
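
The two data pairs listed above can be run through the similarity forms that the proofs in Appendices A and B expand. The sketch below rests on three assumptions: the form of Equation (1) is taken to be $s(A,B)=1-d(A,A\cap B)-d(B,A\cap B)$, the form of Equation (2) is taken to be $s(A,B)=2-d(A\cap B,[1]_X)-d(A\cup B,[0]_X)$ (both inferred from the appendix expansions), and $d$ is the normalized Hamming distance with min/max as intersection/union. The function names are hypothetical.

```python
from typing import Sequence


def hamming_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized Hamming distance (assumed choice of distance measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity_eq1(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed form of Equation (1): 1 - d(A, A∩B) - d(B, A∩B)."""
    inter = [min(x, y) for x, y in zip(a, b)]
    return 1.0 - hamming_distance(a, inter) - hamming_distance(b, inter)


def similarity_eq2(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed form of Equation (2): 2 - d(A∩B, [1]_X) - d(A∪B, [0]_X)."""
    n = len(a)
    inter = [min(x, y) for x, y in zip(a, b)]
    union = [max(x, y) for x, y in zip(a, b)]
    return 2.0 - hamming_distance(inter, [1.0] * n) - hamming_distance(union, [0.0] * n)


# Data values copied from the Figure 2a and Figure 2b lists.
FIG_2A_X = [0.5, 0.8, 0.6, 0.0, 0.5, 0.0, 0.0, 0.4, 0.0, 1.0, 0.0, 0.0]
FIG_2A_Y = [0.0, 0.0, 0.0, 0.4, 0.0, 0.6, 0.7, 0.0, 0.5, 0.0, 0.8, 0.6]
FIG_2B_X = [0.5, 0.8, 0.6, 0.4, 0.5, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0]
FIG_2B_Y = [0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.7, 0.0, 0.5, 1.0, 0.8, 0.6]

for name, x, y in [("Figure 2a", FIG_2A_X, FIG_2A_Y), ("Figure 2b", FIG_2B_X, FIG_2B_Y)]:
    print(name, round(similarity_eq1(x, y), 4), round(similarity_eq2(x, y), 4))
```

Under these assumptions both forms evaluate to about 0.3833 for Figure 2a and Figure 2b alike. Because the two data sets never share a support point, the intersection is zero everywhere and the result depends only on the sum of all values, not on how the values are arranged.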

## 3. Similarity Measure on Non-Overlapped Data

#### 3.1. Data Transformation and Application to Similarity Measure

#### 3.2. Similarity Measure Design Using Neighbor Information

**Theorem 3.**

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Proof of Theorem 1

- (S1): It is clear from Equation (1) itself; hence, $s(A,B)=s(B,A)$ is satisfied.
- (S2): Because $D\cap {D}^{C}={[0]}_{X}$ for $D\in P(X)$: $$\begin{aligned}s(D,{D}^{C})&=1-d(D,D\cap {D}^{C})-d({D}^{C},D\cap {D}^{C})\\&=1-d(D,{[0]}_{X})-d({D}^{C},{[0]}_{X})=0\end{aligned}$$
- (S3): It is also clear because: $$s(C,C)=1-d(C,C\cap C)-d(C,C\cap C)=1-d(C,C)-d(C,C)=1.$$
- (S4): From Equation (1), $s(A,B)\ge s(A,C)$ holds for $A\subset B\subset C$ because: $$d(A,A\cap C)=d(A,A\cap B)\text{ and }d(C,A\cap C)\ge d(B,A\cap B).$$ The inequality $s(B,C)\ge s(A,C)$ is obtained in the same way.
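
For reference, the expansions in (S2) and (S3) suggest the form of Equation (1) written in the first line below; this form is inferred from the proof and is not quoted from the theorem statement. The remaining lines spell out the (S4) step, assuming $A\subset B\subset C$ (so that $A\cap B=A\cap C=A$ and $B\cap C=B$) and a distance measure satisfying (D1) and (D2).

```latex
\begin{align*}
s(A,B) &= 1 - d(A, A\cap B) - d(B, A\cap B), \\
s(A,B) &= 1 - d(A,A) - d(B,A) = 1 - d(A,B), \\
s(A,C) &= 1 - d(A,A) - d(C,A) = 1 - d(A,C), \\
s(B,C) &= 1 - d(B,B) - d(C,B) = 1 - d(B,C).
\end{align*}
```

Property (D4) gives $d(A,B)\le d(A,C)$ and $d(B,C)\le d(A,C)$, so both inequalities of (S4) follow from the last three lines.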

## Appendix B. Proof of Theorem 2

- (S1) It is clear from Equation (2) itself; hence, $s(A,B)=s(B,A)$.
- (S2) Because: $$\begin{aligned}s(D,{D}^{C})&=2-d((D\cap {D}^{C}),{[1]}_{X})-d((D\cup {D}^{C}),{[0]}_{X})\\&=2-d({[0]}_{X},{[1]}_{X})-d({[1]}_{X},{[0]}_{X})=0,\end{aligned}$$ (S2) is satisfied.
- (S3) This property is satisfied because: $$\begin{aligned}s(C,C)&=2-d((C\cap C),{[1]}_{X})-d((C\cup C),{[0]}_{X})\\&=2-d(C,{[1]}_{X})-d(C,{[0]}_{X})=2-1=1.\end{aligned}$$
- (S4) From Equation (2), $s(A,B)\ge s(A,C)$ holds for $A\subset B\subset C$ because: $$d((A\cap B),{[1]}_{X})=d((A\cap C),{[1]}_{X})\text{ and }d((A\cup B),{[0]}_{X})\le d((A\cup C),{[0]}_{X}).$$ Similarly, $s(B,C)\ge s(A,C)$ holds because: $$d((A\cap C),{[1]}_{X})\ge d((B\cap C),{[1]}_{X})\text{ and }d((A\cup C),{[0]}_{X})=d((B\cup C),{[0]}_{X}).$$
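
For reference, the expansions in (S2) and (S3) suggest the form of Equation (2) written in the first line below; this form is inferred from the proof and is not quoted from the theorem statement. The second line shows why $d(C,{[1]}_{X})+d(C,{[0]}_{X})=1$ in (S3), assuming $d$ is the normalized Hamming distance over $n$ support points and $\mu_C$ denotes the membership function of $C$ (both symbols are introduced here for illustration).

```latex
\begin{align*}
s(A,B) &= 2 - d(A\cap B, [1]_X) - d(A\cup B, [0]_X), \\
d(C,[1]_X) + d(C,[0]_X)
  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(1-\mu_C(x_i)\bigr)
   + \frac{1}{n}\sum_{i=1}^{n}\mu_C(x_i) = 1.
\end{align*}
```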

## Appendix C. Derivation of Equations (12) and (13)


© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lee, S.; Cha, J.; Theera-Umpon, N.; Kim, K.S.
Analysis of a Similarity Measure for Non-Overlapped Data. *Symmetry* **2017**, *9*, 68.
https://doi.org/10.3390/sym9050068
