# Analysis of Clustering Evaluation Considering Features of Item Response Data Using Data Mining Technique for Setting Cut-Off Scores


## Abstract


## 1. Introduction

- We developed an entropy-based cluster validity index, which uses entropy to evaluate clustering results based on item difficulty, i.e., an item characteristic inherent in the item response data.
- We found that domain-specific information can be used to improve the quality of feature extraction, similarity computation, grouping, and cluster representation in clustering analysis. It is equally important to utilize domain-specific information when measuring a validity index, so that clustering results can be interpreted meaningfully in a specific domain.
- We demonstrate the usability and applicability of a standard-setting method based on a clustering algorithm, which has so far seen only limited use for educational purposes, across a range of ability tests.

## 2. Related Works

#### 2.1. Standard Setting

#### 2.2. Item Response Data Analysis Using Data Mining Techniques

## 3. Cluster Validity Index

#### 3.1. Sum of Squared Error

The sum of squared error (SSE) measures the compactness of a clustering:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x \in C_j} \mathrm{distance}(x, m_j)^2$$

where $C_j$ is the j-th cluster, $m_j$ is the centroid of cluster $C_j$ (the mean vector of all the data points in $C_j$), and $\mathrm{distance}(x, m_j)$ is the distance between data point $x$ and the centroid. The centroid is given by

$$m_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$

where $|C_j|$ is the number of data points in cluster $C_j$. The distance from a data point to a cluster mean is computed with the Euclidean distance:

$$\mathrm{distance}(x, m_j) = \left\| x - m_j \right\| = \sqrt{\sum_{r=1}^{d} \left(x_r - m_{jr}\right)^2}$$
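As a concrete illustration, the SSE of a given partition can be computed directly (a minimal NumPy sketch; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for j, m_j in enumerate(centroids):
        members = points[labels == j]           # data points assigned to cluster j
        total += np.sum((members - m_j) ** 2)   # squared distances to the centroid
    return float(total)
```

Lower SSE means tighter clusters, which is why k-means minimizes exactly this quantity.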

#### 3.2. Entropy

For each cluster $D_i$, the entropy is defined as

$$\mathrm{entropy}(D_i) = -\sum_{j=1}^{k} P_i(c_j) \log_2 P_i(c_j)$$

where $P_i(c_j)$ is the probability of class $c_j$ in data set $D_i$. The total entropy of the whole clustering (which considers all clusters) is:

$$\mathrm{entropy}_{total}(D) = \sum_{i=1}^{k} \frac{|D_i|}{|D|}\,\mathrm{entropy}(D_i)$$
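The per-cluster and total entropy can be sketched as follows (illustrative NumPy code; `cluster_entropy` and `total_entropy` are hypothetical helper names, with classes given as per-cluster counts such as correct/wrong):

```python
import numpy as np

def cluster_entropy(class_counts):
    """Entropy of a single cluster from its per-class counts (e.g., correct/wrong)."""
    p = np.asarray(class_counts, dtype=float)
    p = p[p > 0] / p.sum()                     # drop empty classes; 0*log2(0) := 0
    return -np.sum(p * np.log2(p))

def total_entropy(clusters):
    """Size-weighted entropy of the whole clustering; `clusters` holds count lists."""
    sizes = np.array([sum(c) for c in clusters], dtype=float)
    return sum((s / sizes.sum()) * cluster_entropy(c)
               for s, c in zip(sizes, clusters))
```

A perfectly pure cluster contributes zero entropy; a 50/50 split contributes one bit, so lower total entropy indicates clusters that are more homogeneous with respect to the class labels.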

#### 3.3. Relaxation Error

The relaxation error of a value $x_i$ in a cluster $C$ is

$$\mathrm{RE}(x_i) = \sum_{j=1}^{n} P(x_j)\,\frac{|x_i - x_j|}{\Delta}$$

where $x_i$ is the i-th distinct attribute value, $P(x_i)$ is the occurrence probability of $x_i$ in $C$, and $\Delta$ is the maximum difference between two values. From the standpoint of relaxation, $\mathrm{RE}(x_i)$ is the average difference from $x_i$ to $x_j$, $j = 1, \ldots, n$, and can be used to measure the quality of an approximation. Summing over all values $x_i$ in $C$, we have:

$$\mathrm{RE}(C) = \sum_{i=1}^{n} P(x_i)\,\mathrm{RE}(x_i)$$
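A minimal NumPy sketch of the relaxation error, assuming RE(x_i) is the probability-weighted average of the differences |x_i − x_j| normalized by the maximum difference Δ (the helper name is illustrative):

```python
import numpy as np

def relaxation_error(values, probs):
    """RE(C): probability-weighted average of normalized pairwise differences
    |x_i - x_j| / Delta, where Delta is the maximum pairwise difference."""
    x = np.asarray(values, dtype=float)
    p = np.asarray(probs, dtype=float)
    diff = np.abs(x[:, None] - x[None, :])   # |x_i - x_j| for every value pair
    delta = diff.max()
    if delta == 0.0:                         # all values identical: nothing is lost
        return 0.0
    return float(p @ (diff / delta) @ p)     # sum_i P(x_i) sum_j P(x_j) |x_i-x_j|/Delta
```

A cluster whose values are all identical has RE = 0, i.e., replacing the cluster by a single representative value loses no information.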

#### 3.4. Davies-Bouldin Index

The Davies-Bouldin index is defined as

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \ne i} \left\{ \frac{\mathrm{diam}(c_i) + \mathrm{diam}(c_j)}{d(c_i, c_j)} \right\}$$

where $d(c_i, c_j)$ denotes the distance between the centers of clusters $c_i$ and $c_j$. The diameter of a cluster is defined as:

$$\mathrm{diam}(c_i) = \left( \frac{1}{n_i} \sum_{x \in c_i} \left\| x - z_i \right\|^2 \right)^{1/2}$$

where $n_i$ is the number of points and $z_i$ is the center of cluster $c_i$. Small values of $DB$ correspond to compact, well-separated clusters.
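A direct implementation can look like this (a minimal sketch assuming Euclidean distances and precomputed centroids; names are illustrative):

```python
import numpy as np

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: average, over clusters, of the worst-case ratio
    (diam_i + diam_j) / d(z_i, z_j). Lower values indicate better clusterings."""
    k = len(centroids)
    # per-cluster dispersion: root-mean-square distance to the centroid
    diam = np.array([
        np.sqrt(np.mean(np.sum((points[labels == i] - centroids[i]) ** 2, axis=1)))
        for i in range(k)
    ])
    total = 0.0
    for i in range(k):
        total += max((diam[i] + diam[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(k) if j != i)
    return total / k
```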

#### 3.5. Calinski-Harabasz Index

#### 3.6. Silhouette Statistic

For each point $i$, the silhouette width is

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the average distance between point $i$ and all other points in its own cluster, and $b_i$ is the minimum of the average dissimilarities between $i$ and the points in each of the other clusters. The silhouette statistic $SI$ is the average of $s_i$ over all points. Finally, the partition with the highest $SI$ is taken to be optimal.
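A brute-force silhouette computation for small data sets (an illustrative sketch; production code would typically use an optimized library routine, and singleton clusters are not handled here):

```python
import numpy as np

def silhouette(points, labels):
    """Mean silhouette width: s_i = (b_i - a_i) / max(a_i, b_i), averaged over points."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i in range(len(points)):
        own = (labels == labels[i])
        own[i] = False                                  # exclude the point itself
        a_i = dist[i, own].mean()                       # avg distance within own cluster
        b_i = min(dist[i, labels == c].mean()           # nearest neighboring cluster
                  for c in np.unique(labels) if c != labels[i])
        scores.append((b_i - a_i) / max(a_i, b_i))
    return float(np.mean(scores))
```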

#### 3.7. Dunn Index

The Dunn index is defined as

$$D = \min_{i} \left\{ \min_{j \ne i} \left\{ \frac{d(c_i, c_j)}{\max_{l} \mathrm{diam}(c_l)} \right\} \right\}$$

where $d(c_i, c_j)$ is the dissimilarity between clusters $c_i$ and $c_j$, and $\mathrm{diam}(C)$ is the intra-cluster function (or diameter) of the cluster. If the Dunn index is large, it means that compact and well-separated clusters exist. Therefore, the maximum is observed for $k$ equal to the most probable number of clusters in the data set.
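A sketch of the index using single-linkage inter-cluster distance and the complete diameter, one common choice among the several variants of $d(\cdot,\cdot)$ and $\mathrm{diam}(\cdot)$:

```python
import numpy as np

def dunn_index(points, labels):
    """Dunn index: smallest inter-cluster distance divided by the largest
    cluster diameter. Larger values indicate compact, well-separated clusters."""
    clusters = [points[labels == c] for c in np.unique(labels)]

    def pairwise(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    max_diam = max(pairwise(c, c).max() for c in clusters)      # complete diameter
    min_link = min(pairwise(clusters[i], clusters[j]).min()     # single linkage
                   for i in range(len(clusters))
                   for j in range(i + 1, len(clusters)))
    return min_link / max_diam
```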

#### 3.8. SD Validity Index

where ${D}_{max} = \max(\|v_i - v_j\|)$, $\forall i,j\in \left\{1,2,\cdots ,{n}_{c}\right\}$, is the maximum distance between the centers of clusters and ${D}_{min} = \min(\|v_i - v_j\|)$, $\forall i,j\in \left\{1,2,\cdots ,{n}_{c}\right\}$, is the minimum distance between the centers of clusters.

#### 3.9. S_Dbw Validity Index

## 4. Research Methods

#### 4.1. Data Sampling

#### 4.2. Standard Setting

#### 4.2.1. Experts Group

- A group of experts was formed, including two experts in ICT literacy and eight elementary teachers in computer education and other subjects.
- As the first round of the cut-off scores setting procedure, the group of experts recorded their ratings for each item.
- The group of experts had a discussion based on their records made in the first round to collect their opinions for setting the cut-off scores.
- As the second round of the cut-off score setting procedure, the group of experts was presented with item parameters of the ICT literacy test, such as discrimination and difficulty, obtained from the preliminary test, and then set the second cut-off scores.
- The group of experts had the second discussion on the cut-off scores that were set in consideration of item parameters.
- The group of experts set the third cut-off scores based on the discussion results. When the experts agreed on the standards that had been set, the cut-off scores were finalized.

#### 4.2.2. Clustering Method

The number of all possible instances is ${}_{m-1}C_{k}$, where $m$ is the number of scores in the item response data and $k$ is the number of cut-off scores. In the experiment with data set #1, the number of scores is 24 (the test consists of 23 items) and the number of cut-off scores is two. Therefore, the number of all possible instances is ${}_{23}C_{2} = 253$.
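The count of candidate cut-off placements is a binomial coefficient, which can be checked directly (a minimal sketch; the function name is illustrative):

```python
from math import comb

def num_cutoff_instances(m, k):
    """Ways to place k cut-off scores among m possible score values:
    choose k boundaries from the m - 1 gaps between adjacent scores."""
    return comb(m - 1, k)
```

For data set #1, `num_cutoff_instances(24, 2)` gives 253, matching the count in the text.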

#### 4.2.3. Clustering Evaluation Methods

## 5. Results

#### 5.1. Entropy-Based Cluster Validity Index

#### 5.2. Measurement of Consistency

Cohen's kappa is defined as

$$\kappa = \frac{P_0 - P_e}{1 - P_e}$$

where $P_0$ is the relative observed agreement among raters and $P_e$ is the hypothetical probability of chance agreement, estimated from the observed data as the probability that each rater randomly assigns each category. If the raters are in complete agreement, then $\kappa = 1$. If there is no agreement among the raters other than what would be expected by chance (as defined by $P_e$), then $\kappa = 0$. Cohen’s kappa coefficients are used for measuring consistency between clusters [24].
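Cohen's kappa can be computed from two label vectors over the same items (a minimal NumPy sketch; category probabilities are estimated from the observed marginals):

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labeling the same items: (P0 - Pe) / (1 - Pe)."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    p0 = np.mean(r1 == r2)                           # observed agreement P0
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c)     # chance agreement Pe from marginals
             for c in np.union1d(r1, r2))
    return (p0 - pe) / (1 - pe)
```

In this study the "raters" are two assignments of students to achievement levels, e.g., the clustering-based cut-offs versus the experts' cut-offs.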

## 6. Discussion

## 7. Conclusions and Future Works

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Nitko, A.J. Educational Assessment of Students, 3rd ed.; Merrill/Prentice Hall: Upper Saddle River, NJ, USA, 2001.
- Bond, L.A. Norm- and criterion-referenced testing. Pract. Assess. Res. Eval. **1996**, 5, 120–125.
- Stiggins, R.J. Relevant classroom assessment training for teachers. Educ. Meas. Issues Pract. **1991**, 10, 7–12.
- Cizek, G.J.; Bunch, M.B.; Koons, H. Setting performance standards: Contemporary methods. Educ. Meas. Issues Pract. **2004**, 23, 31–50.
- Hwang, G.J. A test-sheet-generating algorithm for multiple assessment requirements. IEEE Trans. Educ. **2003**, 46, 329–337.
- Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comput. Surv. **1999**, 31, 264–323.
- Popham, W.J. As always, provocative. J. Educ. Meas. **1978**, 15, 297–300.
- Lin, J. The bookmark procedure for setting cut-scores and finalizing performance standards: Strengths and weaknesses. Alta. J. Educ. Res. **2006**, 52, 36–52.
- Hambleton, R.K.; Pitoniak, M.J. Setting Performance Standards. In Educational Measurement; Brennan, R.L., Ed.; Greenwood: Phoenix, AZ, USA, 2006.
- Morgan, D.; Perie, M. Setting Standards in Education: Choosing the Best Method for Your Assessment and Population; Educational Testing Service (ETS): Princeton, NJ, USA, 2004.
- Castro, F.; Vellido, A.; Nebot, A.; Mugica, F. Applying data mining techniques to e-Learning problems. Stud. Comput. Intell. **2007**, 62, 183–221.
- Norcini, J.J. Setting standards on educational tests. Med. Educ. **2003**, 37, 464–469.
- Buckendahl, C.; Ferdous, A.; Gerrow, J. Recommending cut scores with a subset of items: An empirical illustration. Pract. Assess. Res. Eval. **2010**, 15, 1–10.
- Halkidi, M.; Batistakis, Y.; Vazirgiannis, M. On clustering validation techniques. J. Intell. Inf. Syst. **2001**, 17, 107–145.
- Kim, H.C.; Kwak, E.Y. Information-Based Pruning for Interesting Association Rule Mining in the Item Response Dataset; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3681, pp. 372–378.
- Vawter, R. Entropy state of a multiple choice examination and the evaluation of understanding. Am. J. Phys. **1979**, 47, 320–324.
- Oyelade, O.J.; Oladipupo, O.O.; Obagbuwa, I.C. Application of k-means clustering algorithm for prediction of students’ academic performance. Int. J. Comput. Sci. Inf. Secur. **2010**, 7, 292–295.
- Ayesha, S.; Mustafa, T.; Sattar, A.R.; Khan, M.I. Data mining model for higher education system. Eur. J. Sci. Res. **2010**, 43, 24–29.
- Sacin, C.V.; Agapito, J.B.; Shafti, L.; Ortigosa, A. Recommendation in higher education using data mining techniques. In Proceedings of the 2nd International Conference on Educational Data Mining, Cordoba, Spain, 1–3 July 2009; pp. 191–199.
- Shyamala, K.; Rajagopalan, S.P. Data mining model for a better higher educational system. Inf. Technol. J. **2006**, 5, 560–564.
- Sembiring, S.; Zarlis, M.; Hartama, D.; Ramliana, S.; Wani, E. Prediction of student academic performance by an application of data mining techniques. In Proceedings of the 2011 International Conference on Management and Artificial Intelligence, Bali, Indonesia, 1–3 April 2011; pp. 110–114.
- Fisher, D.H. Knowledge acquisition via incremental conceptual clustering. Mach. Learn. **1987**, 2, 139–172.
- Bing, L. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications); Springer: Secaucus, NJ, USA, 2006.
- Dasgupta, S.; Ng, V. Which clustering do you want? Inducing your ideal clustering with minimal feedback. J. Artif. Intell. Res. **2010**, 39, 581–632.
- Crabtree, D.; Gao, X.; Andreae, P. Standardized evaluation method for web clustering results. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, Compiegne, France, 19–22 September 2005; pp. 280–283.
- Park, E.J.; Chung, H.; Jang, D.S. A grading method for student’s achievements based on the clustering technique. Fuzzy Log. Intell. Syst. **2002**, 12, 151–156.
- Chu, W.W.; Chen, Q. Neighborhood and associative query answering. J. Intell. Inf. Syst. **1992**, 1, 355–382.
- Hanson, S.J.; Bauer, M. Conceptual clustering, categorization, and polymorphy. Mach. Learn. **1989**, 3, 343–372.
- Cha, S.E.; Jun, S.J.; Kwon, D.Y.; Kim, H.S.; Kim, S.B.; Kim, J.M.; Kim, Y.A.; Han, S.G.; Seo, S.S.; Jun, W.C.; et al. Measuring achievement of ICT competency for students in Korea. Comput. Educ. **2011**, 56, 990–1002.
- Crocker, L.; Algina, J. Introduction to Classical & Modern Test Theory; Holt, Rinehart and Winston: Orlando, FL, USA, 1986.
- Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. **1960**, 20, 37–46.
- Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics **1977**, 33, 159–174.
- Baker, F.B.; Kim, S.-H. Item Response Theory: Parameter Estimation Techniques, 2nd ed.; Marcel Dekker: New York, NY, USA, 2004.
- DiEugenio, B.; Glass, M. The kappa statistic: A second look. Comput. Linguist. **2004**, 30, 95–101.
- Xhafa, F. Processing and analysing large log data files of a virtual campus. J. Converg. **2012**, 3, 1–8.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.

Step | Description
---|---
1 | Meet with officials to determine needs.
2 | Choose a standard setting method.
3 | Choose a standard setting panel.
4 | Write performance level descriptors.
5 | Train panelists on the method.
6 | Train the panelists on the content.
7 | Compile ratings from panelists.
8 | Conduct panel discussions.
9 | Consider the consequences or impact.
10 | Evaluate the process and standards.

Student | I_{1} | I_{2} | … | I_{n} | Score
---|---|---|---|---|---
S_{1} | 0 | 1 | … | 1 | TS_{1}
S_{2} | 1 | 1 | … | 1 | TS_{2}
… | … | … | … | … | …
S_{m} | 0 | 1 | … | 0 | TS_{m}

Dataset | | Below Basic | Basic | Proficient | Advanced
---|---|---|---|---|---
#1 | Population | 5994 (44.0%) | 7141 (52.5%) | n/a | 476 (3.5%)
 | Cut-off scores | score ≤ 51 | 51 < score < 66.4 | 66.4 ≤ score |
#2 | Population | 468 (3.7%) | 4554 (36.4%) | 5024 (40.2%) | 2451 (19.6%)
 | Cut-off scores | score ≤ 7 | 7 < score ≤ 20 | 20 < score ≤ 28 | 28 < score ≤ 36
#3 | Population | 722 (4.6%) | 9175 (58.1%) | 4907 (31.1%) | 991 (6.3%)
 | Cut-off scores | score ≤ 6 | 6 < score ≤ 20 | 20 < score ≤ 27 | 27 < score ≤ 36

Item No. 1 | | | | Item No. 23 | | |
---|---|---|---|---|---|---|---
Cluster | Correct | Wrong | Entropy | Cluster | Correct | Wrong | Entropy
1 | 483 | 3144 | 0.150 | 1 | 930 | 2667 | 0.217
2 | 1353 | 5009 | 0.349 | 2 | 2703 | 3660 | 0.459
3 | 1309 | 2342 | 0.253 | 3 | 2097 | 1554 | 0.263
Total | 3146 | 10465 | 0.752 | Total | 5730 | 7881 | 0.941

**Table 5.** Cohen’s kappa coefficients between the cut-off scores set by each cluster validity index and the cut-off scores set by the experts group.

Cluster Validity Index | Data Set #1 | Data Set #2 | Data Set #3
---|---|---|---
Sum of squared error | 0.125 | 0.699 | 0.252
Relaxation error | 0.207 | 0.678 | 0.556
Davies-Bouldin index | 0.328 | 0.421 | 0.474
Calinski-Harabasz index | 0.423 | 0.632 | 0.431
Silhouette statistic | 0.232 | 0.544 | 0.594
Dunn index | 0.229 | 0.495 | 0.633
SD validity index | 0.522 | 0.526 | 0.448
S_Dbw validity index | 0.604 | 0.573 | 0.525
Entropy | 0.624 | 0.712 | 0.673

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kim, B.; Kim, J.; Yi, G.
Analysis of Clustering Evaluation Considering Features of Item Response Data Using Data Mining Technique for Setting Cut-Off Scores. *Symmetry* **2017**, *9*, 62.
https://doi.org/10.3390/sym9050062
