# Combining Classification and User-Based Collaborative Filtering for Matching Footwear Size


## Abstract


## 1. Introduction

- We design several recommendation systems that jointly combine 3D foot measurements, extracted with a fast, portable and low-cost 3D foot digitizer, with user preferences extracted from past purchases and the self-reported usual size. To the best of our knowledge, this is the first time all this information is jointly taken into account for footwear size recommendation.
- For the first time, we use methods based on clustering and archetype analysis as user-based collaborative filtering [9].
- For the first time, we use these and another collaborative filtering technique as imputation methods prior to applying an ordinal classifier.
- For the first time, we propose an ensemble of an ordinal classifier and collaborative filtering.
- We compare the performance of the proposed methodologies with that of well-known methods: ordinal classifiers that can handle missing values natively, such as random forests, and ordinal classifiers, such as ordered logistic regression, applied after a well-known imputation method.
- We test all these approaches both in a simulation study and by applying them to a novel dataset of Spanish users.
- We have made the code of our procedure and synthetic datasets available for reproducing the results (see the Data Availability Statement).

## 2. Data

#### 2.1. Real Data

#### 2.2. Simulated Data

## 3. Related Work

## 4. Background

#### 4.1. Ordinal Classifiers

#### 4.1.1. Ordered Logistic Regression

We use the *polr* function from the R package **MASS** [16]. This method requires complete cases, i.e., no missing values. We refer to this method as POLR. A new case is assigned to the class with the highest predicted probability.
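
Prediction with a fitted proportional-odds (ordered logit) model can be sketched as follows. The coefficient, thresholds and input below are illustrative values, not estimates from the paper's data; the paper itself fits the model with *polr* in R, while this is a plain-NumPy sketch of the prediction step.

```python
import numpy as np

def polr_probs(x, beta, thetas):
    """Class probabilities under a proportional-odds model:
    P(Y <= j | x) = logistic(theta_j - x . beta); per-class probabilities
    are successive differences of these cumulative probabilities."""
    eta = x @ beta
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(thetas) - eta)))  # P(Y <= j), j < K
    cum = np.concatenate([cum, [1.0]])                       # P(Y <= K) = 1
    return np.diff(np.concatenate([[0.0], cum]))

# Illustrative (made-up) parameters for three sizes 41/42/43:
beta = np.array([0.05])          # effect of one anthropometric measurement
thetas = np.array([12.0, 13.5])  # K - 1 = 2 thresholds for K = 3 classes
x = np.array([260.0])            # e.g. foot length in mm
p = polr_probs(x, beta, thetas)
size = 41 + int(np.argmax(p))    # assign the class with highest probability
```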

Alternatively, missing values can be imputed beforehand with the R package **MICE** [17], which was found to be a satisfactory imputation method in the comparison carried out by Hao and Blair [18]. We refer to this method as POLR-MICE.
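
The chained-equations idea behind MICE can be sketched as follows. This is a deliberately minimal variant with plain linear regressions and no posterior draws or predictive mean matching (which the real MICE algorithm uses); the data are made up.

```python
import numpy as np

def chained_imputation(X, n_iter=10):
    """Minimal sketch of imputation by chained equations: initialize missing
    entries with column means, then repeatedly regress each incomplete column
    on all the others and refill its missing entries with the predictions."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])  # crude starting values
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])  # intercept + covariates
            coef, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            X[miss[:, j], j] = A[miss[:, j]] @ coef
    return X

# Toy example: column 2 is exactly twice column 1, with one entry missing.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]]
X_imp = chained_imputation(X)
```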

#### 4.1.2. Random Forests

We use the R package **randomForest** [20] with the default parameters, which implements the RF algorithm of Breiman [21]. We refer to this method as ClassRF.

We also use the *cforest* function from the R package **party** [23,24]. We refer to this method as CondRF.

#### 4.2. Collaborative Filtering

We use the R package **recommenderlab** [26] with the ‘UBCF’ method, a user-based CF. User-based CF assumes that customers with similar selections will choose items similarly: it predicts the selection of a customer by first finding a neighborhood of similar customers and then aggregating the selections of these customers to give a prediction. We refer to this method as UBCF.
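
The neighborhood-and-aggregate step described above can be sketched as follows. This is an illustrative Python sketch, not the recommenderlab implementation: the ratings matrix is hypothetical, and we use Pearson similarity over commonly rated items with similarity-weighted averaging.

```python
import numpy as np

def ubcf_predict(R, user, item, k=2):
    """User-based CF sketch: predict R[user, item] from the k most similar
    users (Pearson similarity on commonly rated items) who rated `item`."""
    mask = ~np.isnan(R)
    neighbours = []
    for u in range(R.shape[0]):
        if u == user or np.isnan(R[u, item]):
            continue
        common = mask[user] & mask[u]
        if common.sum() < 2:
            continue
        a = R[user, common] - R[user, common].mean()
        b = R[u, common] - R[u, common].mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0:
            continue
        sim = (a @ b) / denom
        if sim > 0:  # keep positively correlated neighbours only
            neighbours.append((sim, R[u, item]))
    if not neighbours:
        return float("nan")
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    return sum(s * r for s, r in top) / sum(s for s, r in top)

# Hypothetical size selections (rows: users, columns: shoe models; NaN = not bought).
R = np.array([
    [42., 41., np.nan],
    [42., 41., 42.],
    [41., 43., 43.],
    [43., 42., 43.],
])
pred = ubcf_predict(R, user=0, item=2)
```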

Dissimilarities between users are computed with the *daisy* function from the R package **cluster** [31], which uses the Partial Distance Strategy (PDS) for missing data [32]. If a pairwise dissimilarity cannot be estimated because two users have never selected the same shoe model, that dissimilarity is set to a high value, larger than the other dissimilarities (a value of 10 is used in the experiments). If there is a missing value in an archetype, we impute it with the most commonly selected size (usually 42 in the experiments).
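
The partial-distance idea with the high-value fallback can be sketched as follows. This is a simplified squared-difference dissimilarity, not exactly what *daisy* computes, and the vectors are made up; only the handling of missing coordinates mirrors the description above.

```python
import numpy as np

def partial_distance(x, y, fill=10.0):
    """Partial Distance Strategy sketch: average squared differences over
    coordinates observed in BOTH vectors; if the two users share no observed
    coordinate, return a large placeholder dissimilarity (10 in the paper's
    experiments)."""
    both = ~np.isnan(x) & ~np.isnan(y)
    if not both.any():
        return fill
    return float(np.mean((x[both] - y[both]) ** 2))

x = np.array([42., np.nan, 41.])
y = np.array([43., 42., np.nan])
z = np.array([np.nan, 41., np.nan])
d_xy = partial_distance(x, y)  # only coordinate 0 is shared: (42 - 43)^2 = 1
d_xz = partial_distance(x, z)  # no shared coordinate -> placeholder 10
```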

#### 4.3. Ensembles

## 5. Proposed Methodologies

- **POLR**: applied to the anthropometric measurements and the variable ‘pref’, the variables with complete cases, i.e., without missing values.
- **POLR-MICE**: applies POLR to the anthropometric measurements, the variable ‘pref’ and the variables with the preferred size for each model (except the variable used as output), after imputing the missing values with MICE.
- **CondRF**: applied to the anthropometric measurements, the variable ‘pref’ and the variables with the preferred size for each model (which contain missing values), except the variable used as output. This method handles missing values by using surrogate splits when predictors are missing [22].
- **ClassRF**: applied to the same variables as CondRF. To handle missing data, we first use the *rfImpute* function from the R package **randomForest**, which imputes missing values in predictor data using proximities from randomForest, and then apply the *randomForest* function.
- **UBCF**: applied to the variables with the preferred size for each model, which contain missing values.
- **k-POD**: applied to the variables with the preferred size for each model. We consider k = 3, since users select their usual size or one size up or one size down, as mentioned above. To give a recommendation, i.e., to predict a missing value of a given observation, we use the value of that variable in the centroid of the cluster to which k-POD assigns the observation.
- **AAcMDS**: applied to the variables with the preferred size for each model, with k = 3 as in k-POD. To predict a missing value of a given observation, we use the approximation of that observation given by the archetypes.
- **AAHP**: as AAcMDS, but with the AAHP variant; missing values are again predicted by the approximation given by the archetypes.
- **CO-POLR-UBCF**: combines the complete-case variables with the user preference information, using UBCF as an imputation method: UBCF is applied to the variables with the preferred size for each model to recommend a size for each missing value, and POLR is then applied to these imputed data together with the anthropometric measurements and the variable ‘pref’.
- **CO-POLR-k-POD**, **CO-POLR-AAcMDS** and **CO-POLR-AAHP**: as CO-POLR-UBCF, but with k-POD, AAcMDS and AAHP, respectively, as the imputation method.
- **EN-POLR-UBCF**: builds an ensemble of the two previous methods POLR and UBCF, as described in Section 4.3. EN needs the predicted class probabilities of each classifier; POLR returns them, but UBCF instead returns a real number as a recommendation. We therefore recast the recommendation in the role of probabilities as follows: if the recommendation lies between 41 and 42, the predicted probability for size 41 is 1 − (recommendation − 41), the probability for size 42 is 1 − (42 − recommendation), and the probability for size 43 is zero; if the recommendation lies between 42 and 43, the probability for size 42 is 1 − (recommendation − 42), the probability for size 43 is 1 − (43 − recommendation), and the probability for size 41 is zero.
- **EN-POLR-k-POD**, **EN-POLR-AAcMDS** and **EN-POLR-AAHP**: build the analogous ensembles of POLR with k-POD, AAcMDS and AAHP, respectively, as described in Section 4.3. Since none of these methods returns probabilities, we follow the same recasting strategy as in EN-POLR-UBCF to obtain probabilities and build the ensemble.
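
The rule for recasting a real-valued recommendation in [41, 43] as pseudo-probabilities can be written out as a short function (Python sketch of the rule described above; the function name is ours):

```python
def recommendation_to_probs(rec):
    """Recast a real-valued size recommendation in [41, 43] as
    pseudo-probabilities over the three sizes: each of the two neighbouring
    sizes gets 1 minus its distance to the recommendation; the remaining
    size gets probability zero."""
    p = {41: 0.0, 42: 0.0, 43: 0.0}
    if rec <= 42:
        p[41] = 1 - (rec - 41)
        p[42] = 1 - (42 - rec)
    else:
        p[42] = 1 - (rec - 42)
        p[43] = 1 - (43 - rec)
    return p

probs = recommendation_to_probs(41.7)  # approximately {41: 0.3, 42: 0.7, 43: 0}
```

By construction the two nonzero entries always sum to one, so the recast values can be combined with the POLR probabilities in the ensemble.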

We use the R package **klaR** [36].

#### Experimental Set-Up

## 6. Results and Discussion

#### 6.1. Synthetic Data Results

- **Models where anthropometry and preference are relevant give the best performance:** The best accuracies are achieved by the models most closely related to ‘pref’: M1 (89.1%) and M4 (88.9%) in Scenario 1 and M4 (88.8%) in Scenario 2. Conversely, the models generated without any relationship to ‘pref’ or to the anthropometric data (FL) give the worst results: M1 (60.8%), M2 (55.8%) and M3 (62.4%) in Scenario 2. This makes sense, since these models can only be predicted from the preferred sizes of the other models, and it is also why accuracies in Scenario 1 are higher than in Scenario 2.
- **Performance is more affected by high variability in size selection than by bias:** The best accuracy for M3 in Scenario 1 (82.1%) is higher than that for M2 (69.1%) in Scenario 1, and the same holds in Scenario 2. The data for M2 are generated with more variability and are therefore less predictable.
- **CO-methods and EN-methods are good alternatives to established tools:** In Scenario 1, the CO-methods and EN-methods return very competitive results with the different CF strategies. The same holds in Scenario 2, where CO-POLR-k-POD seems to be the best option. In both scenarios, imputing with the classic MICE yields worse results than imputing with CF strategies, i.e., our proposed CO-methods are better. The proposed CO-methods and EN-methods also outperform established tools such as CondRF and ClassRF.

#### 6.2. Real Data Results

- **Accuracies depend on the shoe model:** The best accuracies vary with the shoe model, ranging from 57% for model M6 to 88% for model M2, as also seen in Section 6.1. The average of the best accuracies over all the shoe models is 77%. In any case, these results are much higher than the 34.2% accuracy obtained by the traditional foot-length-based strategy, as previously discussed in Section 1.
- **The best result is obtained with different methods for each shoe model:** No single method is best for all shoe models. For some of them, the information on past purchases is sufficient. Only for model M3 is the best classification obtained with the anthropometric measurements and ‘pref’ (the user's own preference, not relative to the other users), although this also nearly holds for model M7: very good results are obtained with POLR and LDA for both M3 and M7. For the other shoe models, the size selections made by other users are useful, to the point that for four of them (M1, M4, M6 and M7) the best classification is obtained from the size selections alone, without taking the anthropometric measurements and ‘pref’ into account. In the remaining models (M2, M5 and M8), both measurements and selections are useful for obtaining accurate predictions. In short, the selections made by other users are clearly important.
- **EN-methods are very competitive:** In global terms, analyzing the mean accuracy over all shoe models, the best method is EN-POLR-AAcMDS (72.9%), followed by AAcMDS (71.9%), EN-POLR-AAHP (71.4%) and EN-POLR-UBCF (70.4%). The ensemble methodologies (EN-methods) thus provide excellent results, better than other ways of combining the different kinds of information, such as the use of RFs or classification by POLR after imputation (CO-methods). Furthermore, the ensembles improve on their individual classifiers in all cases when global results are analyzed: the mean accuracy of EN-POLR-UBCF (70.4%) is higher than that of UBCF (66.5%), and the same happens for EN-POLR-k-POD (66%) versus k-POD (63%), EN-POLR-AAcMDS (72.9%) versus AAcMDS (71.9%) and EN-POLR-AAHP (71.4%) versus AAHP (68.9%).
- **Collaborative filtering techniques with past purchases return very competitive results:** The two techniques that we propose for the first time as collaborative filtering techniques, AAcMDS (71.9%) and AAHP (68.9%), provide higher mean accuracy than the well-established UBCF (66.5%).
- **Suitability of CF tools in classification problems with uncertainties:** Our results may seem to disagree with Hao and Blair [18], who showed that user-based collaborative filtering was consistently inferior to logistic regression and random forests with different imputations for clinical prediction. However, there are relevant differences between the two studies. First, Hao and Blair [18] indicated that CF may not be desirable in datasets where classification is an acceptable alternative, which is not our case: global accuracies for RFs (51.6% and 63.5% for CondRF and ClassRF, respectively) are lower than those for CF in general (71.9%, 68.9%, 66.5% and 63% for AAcMDS, AAHP, UBCF and k-POD, respectively). Moreover, the problems differ. The responses and input variables of their clinical datasets are objective, whereas in our problem the size selection is quite subjective: each user has their own preferences, even among users with similar anthropometric measurements. Ours is therefore a difficult problem, with uncertainty in all parts of the problem, the outcome and the inputs. In fact, in other medical problems [38], a CF-based approach achieved higher predictive accuracy than popular classification techniques such as logistic regression and support vector machines.

#### 6.3. What Are the Advantages and Limitations of Our Proposal?

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Huang, S.; Wang, Z.; Jiang, Y. Guess your size: A hybrid model for footwear size recommendation. Adv. Eng. Inform. **2018**, 36, 64–75.
2. Lu, Z.; Stauffer, J. Fit Recommendation via Collaborative Inference. U.S. Patent 8,478,663, 2 July 2013.
3. Dumke, M.A.; Briare, M.B. Recommending a Shoe Size Based on Best Fitting Past Shoe Purchases. U.S. Patent Application No. 12/655,553, 30 June 2011.
4. Wilkinson, M.T.; Fresen, G.B.; End, N.B.; Wolodzko, E. Method and System for Recommending a Default Size of a Wearable Item Based on Internal Dimensions. U.S. Patent 9,366,530, 14 June 2016.
5. Pierola, A.; Epifanio, I.; Alemany, S. An ensemble of ordered logistic regression and random forest for child garment size matching. Comput. Ind. Eng. **2016**, 101, 455–465.
6. Gutiérrez, P.; Pérez-Ortiz, M.; Sánchez-Monedero, J.; Fernández-Navarro, F.; Hervás-Martínez, C. Ordinal Regression Methods: Survey and Experimental Study. IEEE Trans. Knowl. Data Eng. **2016**, 28, 127–146.
7. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning. Data Mining, Inference and Prediction, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2009.
8. Hand, D.J. Classifier Technology and the Illusion of Progress. Stat. Sci. **2006**, 21, 1–14.
9. Su, X.; Khoshgoftaar, T.M. A survey of collaborative filtering techniques. Adv. Artif. Intell. **2009**, 2009.
10. Ballester, A.; Piérola, A.; Parrilla, E.; Izquierdo, M.; Uriel, J.; Nácher, B.; Alemany, S. Fast, portable and low-cost 3D foot digitizers: Validity and reliability of measurements. In Proceedings of the 3DBODY.TECH 2017 8th International Conference and Exhibition on 3D Body Scanning and Processing Technologies, Montreal, QC, Canada, 11–12 October 2017; pp. 218–225.
11. Alcacer, A.; Epifanio, I.; Ibáñez, M.V.; Simó, A.; Ballester, A. A data-driven classification of 3D foot types by archetypal shapes based on landmarks. PLoS ONE **2020**, 15, e0228016.
12. Tran, B.; Tran, H. Systems and Methods for Footwear Fitting. U.S. Patent 9,460,557, 4 October 2016.
13. Wilkinson, M.T.; End, N.B.; Fresen, G.B.; Wolodzko, E. Method and System for Recommending a Size of a Wearable Item. U.S. Patent 10,311,498, 4 June 2019.
14. Marks, W.H. Footwear Recommendations From Foot Scan Data Describing Feet of a User. U.S. Patent 9,648,926, 16 May 2017.
15. Agresti, A. Categorical Data Analysis; Wiley: Hoboken, NJ, USA, 2002.
16. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002.
17. Van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. **2011**, 45, 1–67.
18. Hao, F.; Blair, R.H. A comparative study: Classification vs. user-based collaborative filtering for clinical prediction. BMC Med. Res. Methodol. **2016**, 16, 1–14.
19. Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
20. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News **2002**, 2, 18–22.
21. Breiman, L. Manual On Setting Up, Using, and Understanding Random Forests V4.0; Statistics Department, University of California: Berkeley, CA, USA, 2003.
22. Hothorn, T.; Hornik, K.; Zeileis, A. Unbiased Recursive Partitioning: A Conditional Inference Framework. J. Comput. Graph. Stat. **2006**, 15, 651–674.
23. Hothorn, T.; Buehlmann, P.; Dudoit, S.; Molinaro, A.; Laan, M.V.D. Survival Ensembles. Biostatistics **2006**, 7, 355–373.
24. Strobl, C.; Boulesteix, A.L.; Kneib, T.; Augustin, T.; Zeileis, A. Conditional Variable Importance for Random Forests. BMC Bioinform. **2008**, 9, 1–11.
25. Janitza, S.; Tutz, G.; Boulesteix, A.L. Random forest for ordinal responses: Prediction and variable selection. Comput. Stat. Data Anal. **2016**, 96, 57–73.
26. Hahsler, M. recommenderlab: Lab for Developing and Testing Recommender Algorithms. R Package Version 0.2-6. 2020. Available online: https://www.rdocumentation.org/packages/recommenderlab/versions/0.2-6 (accessed on 22 January 2021).
27. Chi, J.T.; Chi, E.C.; Baraniuk, R.G. k-POD: A Method for k-Means Clustering of Missing Data. Am. Stat. **2016**, 70, 91–99.
28. Epifanio, I.; Ibáñez, M.V.; Simó, A. Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles. Am. Stat. **2020**, 74, 169–183.
29. Cutler, A.; Breiman, L. Archetypal Analysis. Technometrics **1994**, 36, 338–347.
30. Epifanio, I. h-plots for displaying nonmetric dissimilarity matrices. Stat. Anal. Data Min. **2013**, 6, 136–143.
31. Maechler, M.; Rousseeuw, P.; Struyf, A.; Hubert, M.; Hornik, K. cluster: Cluster Analysis Basics and Extensions. R Package Version 2.1.1. 2021. Available online: https://cran.r-project.org/web/packages/cluster/index.html (accessed on 22 January 2021).
32. Dixon, J.K. Pattern Recognition with Partly Missing Data. IEEE Trans. Syst. Man Cybern. **1979**, 9, 617–621.
33. Dietterich, T.G. Ensemble Methods in Machine Learning. In Proceedings of the First International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; Springer: London, UK, 2000; pp. 1–15.
34. Wilks, D. Statistical Methods in the Atmospheric Sciences; Academic Press: Cambridge, MA, USA, 2006.
35. NCAR—Research Applications Laboratory. verification: Weather Forecast Verification Utilities. R Package Version 1.42. 2015. Available online: https://rdrr.io/cran/verification/ (accessed on 22 January 2021).
36. Weihs, C.; Ligges, U.; Luebke, K.; Raabe, N. klaR Analyzing German Business Cycles. In Data Analysis and Decision Support; Springer: Berlin/Heidelberg, Germany, 2005; pp. 335–343.
37. Vinué, G.; Epifanio, I. Robust archetypoids for anomaly detection in big functional data. Adv. Data Anal. Classif. **2020**, 1–26.
38. Hassan, S.; Syed, Z. From netflix to heart attacks: Collaborative filtering in medical datasets. In Proceedings of the ACM International Health Informatics Symposium, Arlington, VA, USA, 11–12 November 2010; pp. 128–134.
39. Cabero, I.; Epifanio, I.; Piérola, A.; Ballester, A. Archetype analysis: A new subspace outlier detection approach. Knowl.-Based Syst. **2021**, 217, 106830.
40. Mørup, M.; Hansen, L.K. Archetypal analysis for machine learning and data mining. Neurocomputing **2012**, 80, 54–63.
41. Chen, Y.; Mairal, J.; Harchaoui, Z. Fast and Robust Archetypal Analysis for Representation Learning. In Proceedings of the CVPR 2014—IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1478–1485.
42. Bauckhage, C.; Kersting, K.; Hoppe, F.; Thurau, C. Archetypal analysis as an autoencoder. In Proceedings of the Workshop New Challenges in Neural Computation, Aachen, Germany, 7–10 October 2015; pp. 8–15.
43. Mair, S.; Boubekki, A.; Brefeld, U. Frame-based data factorizations. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2305–2313.
44. Shahbazi, Z.; Hazra, D.; Park, S.; Byun, Y.C. Toward Improving the Prediction Accuracy of Product Recommendation System Using Extreme Gradient Boosting and Encoding Approaches. Symmetry **2020**, 12, 1566.
45. Zhang, Z.P.; Kudo, Y.; Murai, T.; Ren, Y.G. Enhancing Recommendation Accuracy of Item-Based Collaborative Filtering via Item-Variance Weighting. Appl. Sci. **2019**, 9, 1928.
46. Sun, M.; Min, T.; Zang, T.; Wang, Y. CDL4CDRP: A Collaborative Deep Learning Approach for Clinical Decision and Risk Prediction. Processes **2019**, 7, 265.

**Table 1.** The variables are sampled independently from the following distributions. Tria(a, c, b) stands for the triangular distribution with values in $[a,b]$ and mode at c; ceiling returns the smallest integer not less than its argument; and round rounds to the nearest integer.

Variables | Scenario 1 | Scenario 2 |
---|---|---|
FL | Tria(245, 259.6, 277) | Tria(245, 259.6, 277) |
pref | 42 − ceiling((FL + 10) × 3/20) | 42 − ceiling((FL + 10) × 3/20) |
M1 | pref + round(Tria(−1, 0, 1)) | round(Tria(40.5, 42, 43.5)) |
M2 | M1 + round(Tria(−2, 0, 2)) | round(M1 + Tria(−1.5, 0, 1.5)) |
M3 | M1 + round(Tria(−1, 1, 1.5)) | round(M1 + Tria(−1, 1, 1.5)) |
M4 | 42 + pref + round(Tria(−1, 0, 1)) | 42 + pref + round(Tria(−1, 0, 1)) |
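
The Scenario 1 sampling scheme of Table 1 can be reproduced as follows. The paper's experiments were run in R; this NumPy version is only an illustrative sketch, with an arbitrary sample size and seed.

```python
import numpy as np

rng = np.random.default_rng(0)

def tria(a, c, b, size):
    """Draw from the triangular distribution on [a, b] with mode c."""
    return rng.triangular(a, c, b, size)

n = 1000
# Scenario 1 of Table 1: foot length in mm, then 'pref' and the
# preferred sizes M1-M4 derived from it.
FL = tria(245, 259.6, 277, n)
pref = 42 - np.ceil((FL + 10) * 3 / 20)
M1 = pref + np.round(tria(-1, 0, 1, n))
M2 = M1 + np.round(tria(-2, 0, 2, n))
M3 = M1 + np.round(tria(-1, 1, 1.5, n))
M4 = 42 + pref + np.round(tria(-1, 0, 1, n))
```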

**Table 2.** Implementation of the methods (d.p. stands for default parameters).

Methods | Implementation |
---|---|
POLR | polr and extractAIC from MASS [16] with d.p. |
POLR-MICE | mice from MICE [17] |
ClassRF | randomForest from randomForest [20] with d.p. |
CondRF | cforest from party [23,24] with d.p. |
UBCF | Recommender from recommenderlab [26] with method = “UBCF” and d.p. |
k-POD | kpod from kpodclustr [27] with k = 3 and d.p. |
AAcMDS/AAHP | daisy from cluster [31] (missing dissimilarities are replaced by 10) with d.p.; stepArchetypesRawData_norm_frob from adamethods [37] with k = 3 (missing values in archetypes are replaced by 42) and d.p. |
CO-methods | the implementation used in the respective methods |
EN-methods | the implementation used in the respective methods, with r = 4 |

**Table 3.**Mean and standard deviation, in brackets, of accuracy (percentage) over 10 simulations of the classifiers for the different models of Scenario 1 and their average. The maximum value in each column appears in bold.

Models | M1 | M2 | M3 | M4 | Average |
---|---|---|---|---|---|
POLR | 88.3 (0.03) | 66 (0.06) | 80.6 (0.05) | **88.9 (0.02)** | 81.0 |
POLR-MICE | 83 (0.05) | 63.3 (0.06) | 79.1 (0.03) | 88.1 (0.03) | 78.4 |
CondRF | 81.4 (0.08) | 61.5 (0.1) | 79.1 (0.07) | 86.2 (0.03) | 77.0 |
ClassRF | 84.5 (0.04) | 59.3 (0.08) | 79.9 (0.04) | 85.5 (0.03) | 77.3 |
UBCF | 76.7 (0.08) | 63.4 (0.16) | 79 (0.06) | 72.2 (0.03) | 72.8 |
k-POD | 68.9 (0.18) | 60.6 (0.11) | 63.3 (0.2) | 71 (0.04) | 66.0 |
AAcMDS | 80.6 (0.04) | 42.2 (0.24) | 76.7 (0.05) | 76.6 (0.04) | 69.0 |
AAHP | 77.1 (0.08) | 45.2 (0.14) | 78.2 (0.06) | 74.2 (0.04) | 68.7 |
CO-POLR-UBCF | 87.6 (0.04) | 66.9 (0.08) | 80.9 (0.04) | **88.9 (0.02)** | 81.1 |
CO-POLR-k-POD | 86.2 (0.04) | 65.7 (0.05) | 80 (0.02) | 88.2 (0.03) | 80.0 |
CO-POLR-AAcMDS | 86.8 (0.04) | 66.9 (0.08) | **82.1 (0.05)** | 88.4 (0.03) | 81.0 |
CO-POLR-AAHP | 87.6 (0.03) | **69.1 (0.08)** | 81 (0.03) | 88.5 (0.03) | **81.6** |
EN-POLR-UBCF | **89.1 (0.02)** | 67.2 (0.08) | 81 (0.04) | **88.9 (0.02)** | 81.5 |
EN-POLR-k-POD | 88.3 (0.03) | 66 (0.06) | 81.5 (0.04) | **88.9 (0.02)** | 81.2 |
EN-POLR-AAcMDS | 88.6 (0.02) | 66.4 (0.07) | 79.3 (0.06) | 88.8 (0.03) | 80.8 |
EN-POLR-AAHP | 88.5 (0.02) | 66 (0.06) | 80.9 (0.04) | **88.9 (0.02)** | 81.1 |

**Table 4.**Mean and standard deviation, in brackets, of accuracy (percentage) over 10 simulations of the classifiers for the different models of Scenario 2 and their average. The maximum value in each column appears in bold.

Models | M1 | M2 | M3 | M4 | Average |
---|---|---|---|---|---|
POLR | 55.9 (0.05) | 37.6 (0.12) | 51.3 (0.05) | **88.8 (0.02)** | 58.4 |
POLR-MICE | 52.6 (0.09) | 48.2 (0.07) | 54.2 (0.06) | 88.1 (0.03) | 60.8 |
CondRF | 54.4 (0.07) | 42.7 (0.08) | 52.5 (0.08) | 86.7 (0.03) | 59.1 |
ClassRF | 51.8 (0.1) | 43.3 (0.05) | 51.6 (0.07) | 85.8 (0.03) | 58.1 |
UBCF | 53.8 (0.07) | 46.3 (0.1) | 55.8 (0.08) | 41.2 (0.05) | 49.3 |
k-POD | 56.1 (0.09) | 40.4 (0.1) | 54.2 (0.15) | 49.7 (0.12) | 50.1 |
AAcMDS | 58 (0.05) | 53.1 (0.07) | 50.6 (0.11) | 48.3 (0.09) | 52.5 |
AAHP | 58 (0.05) | 52.1 (0.08) | 49 (0.11) | 51.6 (0.07) | 52.7 |
CO-POLR-UBCF | 60.3 (0.09) | 51.9 (0.09) | 61.2 (0.08) | 88.6 (0.02) | 65.5 |
CO-POLR-k-POD | **60.8 (0.1)** | **55.8 (0.08)** | **62.4 (0.07)** | 88.7 (0.02) | **66.9** |
CO-POLR-AAcMDS | 57.7 (0.11) | 51.2 (0.08) | 58.7 (0.08) | 88.7 (0.02) | 64.1 |
CO-POLR-AAHP | 58.3 (0.11) | 52.2 (0.09) | 60.3 (0.07) | 88.4 (0.02) | 64.8 |
EN-POLR-UBCF | 54.5 (0.09) | 45.4 (0.09) | 57.5 (0.08) | **88.8 (0.02)** | 61.5 |
EN-POLR-k-POD | 58 (0.06) | 37.6 (0.12) | 58.7 (0.08) | **88.8 (0.02)** | 60.8 |
EN-POLR-AAcMDS | 57.7 (0.06) | 52.1 (0.08) | 50 (0.1) | **88.8 (0.02)** | 62.2 |
EN-POLR-AAHP | 58 (0.05) | 51.6 (0.07) | 50.3 (0.11) | **88.8 (0.02)** | 62.2 |

**Table 5.**Accuracy (percentage) of the classifiers for the different models of shoes and their average for the real dataset. The maximum value in each column appears in bold. Underlined numbers indicate that LDA had to be used instead of POLR.

Models | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | Average |
---|---|---|---|---|---|---|---|---|---|
POLR | 37 | 58 | **85** | 59 | 44 | 54 | 83 | 50 | 58.8 |
POLR-MICE | 37 | 65 | 63 | 55 | 68 | 50 | 70 | 54 | 57.8 |
CondRF | 48 | 65 | 78 | 50 | 0 | **57** | 57 | 58 | 51.6 |
ClassRF | 44 | **88** | 70 | 64 | 60 | 54 | 74 | 54 | 63.5 |
UBCF | 70 | 81 | 67 | **73** | 52 | 46 | 78 | 65 | 66.5 |
k-POD | 63 | 81 | 74 | 64 | 48 | 54 | 74 | 46 | 63 |
AAcMDS | **74** | 77 | 74 | 68 | 76 | **57** | **87** | 62 | 71.9 |
AAHP | **74** | 77 | 78 | 64 | 68 | 50 | 78 | 62 | 68.9 |
CO-POLR-UBCF | 41 | 81 | 52 | 55 | 80 | 46 | **87** | 65 | 63.4 |
CO-POLR-k-POD | 52 | 81 | 56 | 41 | **84** | 50 | 83 | 62 | 63.6 |
CO-POLR-AAcMDS | 59 | 85 | 48 | 59 | 68 | 36 | **87** | 54 | 62 |
CO-POLR-AAHP | 56 | 77 | 56 | 55 | 52 | 39 | **87** | 58 | 60 |
EN-POLR-UBCF | 70 | 77 | **85** | **73** | 52 | 54 | 83 | **69** | 70.4 |
EN-POLR-k-POD | 56 | 81 | **85** | 64 | 52 | **57** | 83 | 50 | 66 |
EN-POLR-AAcMDS | **74** | 81 | **85** | 68 | 76 | **57** | 83 | 58 | **72.8** |
EN-POLR-AAHP | **74** | 81 | **85** | 64 | 68 | 54 | 83 | 62 | 71.4 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Alcacer, A.; Epifanio, I.; Valero, J.; Ballester, A.
Combining Classification and User-Based Collaborative Filtering for Matching Footwear Size. *Mathematics* **2021**, *9*, 771.
https://doi.org/10.3390/math9070771
