3.2. Running Event Recommendation Ontologies
The data and information from running event sources and factors influencing the participation in running events described in
Table 1 and
Table 2 respectively were used to model the ontologies for running event recommendation. The ontologies were developed in Protégé [
41] following the seven-step ontology development method of Noy and McGuinness [
25]. We enriched the ontology structure proposed in our previous work [
24] to fulfill the needs of a recommendation system. This resulted in two types of ontologies: the
Running Event Ontology and the
User Profile Ontology, as shown in
Figure 2 and
Figure 3, respectively.
Figure 2 shows the structure and relationships of the
Running Event Ontology. It encompasses the information and factors related to running events. It is composed of 7 classes, 8 object properties representing relationships between classes, and 33 data properties representing class attributes. The core class is
RunningEvent, which represents general information about running events. This class has various data properties such as
StandardOfEvent,
TypeOfEvent,
StartDate, and
EndDate, which record the event’s standard, event type, start date, and end date, respectively. The
RaceType class defines different running race categories, including
FunRun,
MiniMarathon,
HalfMarathon, and
Marathon. This class uses data properties such as
RaceDay,
StartTime, and
CutOffTime to specify the date, start time, and cut-off time for each race type, respectively. Furthermore, the
RunningEvent class is linked to
RaceType via the
hasRaceType object property, indicating the specific race types associated with a running event. The
hasEventVenue object property connects
RunningEvent to
RunningEventVenue, signifying the venue where events are hosted. All classes and relationships of the
Running Event Ontology are available at [
42].
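As a rough illustration only, the fragment of the Running Event Ontology described above can be mirrored in plain Python dataclasses. This is not the OWL model itself. The field names transliterate the class and property names given in the text, race categories are flattened into a string rather than modelled as subclasses, the `venue_name` attribute is a hypothetical placeholder, and all sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunningEventVenue:
    # Hypothetical attribute; the venue's data properties are not listed in the text.
    venue_name: str


@dataclass
class RaceType:
    # Race category, e.g. "FunRun", "MiniMarathon", "HalfMarathon", "Marathon".
    category: str
    # Data properties named in the text: RaceDay, StartTime, CutOffTime.
    race_day: str
    start_time: str
    cut_off_time: str


@dataclass
class RunningEvent:
    # Data properties named in the text.
    standard_of_event: str
    type_of_event: str
    start_date: str
    end_date: str
    # Object properties hasRaceType and hasEventVenue, modelled as references.
    has_race_type: List[RaceType] = field(default_factory=list)
    has_event_venue: Optional[RunningEventVenue] = None


# Invented sample instance, for illustration only.
event = RunningEvent(
    standard_of_event="International",
    type_of_event="Charity",
    start_date="2024-01-14",
    end_date="2024-01-14",
    has_race_type=[RaceType("HalfMarathon", "2024-01-14", "04:30", "08:00")],
    has_event_venue=RunningEventVenue("Example Park"),
)
race_categories = [race.category for race in event.has_race_type]
```

In this analogue, an object property becomes a reference to another object and a data property becomes a typed field, which is the distinction the ontology draws between the two kinds of property.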
Figure 3 illustrates the structure of the
User Profile Ontology, which captures diverse information about users, including personal details, factors influencing running event selection, and participation history. This ontology comprises 1 main class, 7 object properties representing relationships between classes, and 11 data properties representing class attributes. The main class is
User, which represents information related to running competitors. It includes personal information through data properties such as
UserAge,
UserSex, and
Nationality. Additionally, it details factors of interest in running competitions, including
StandardOfEventInterest,
ActivityAreaInterest,
StartPeriodInterest, and
RewardInterest. The User class also defines various relationships. For instance, the
hasRaceTypeInterest object property links the
User class to
RaceType, indicating the specific race types a user is interested in. Similarly, the
hasRunningEventHistory object property connects the User class to
RunningEvent, representing the user’s past participation in running events. All classes and relationships within the
User Profile Ontology are accessible at [
42].
To ensure the quality and effectiveness of our formally represented knowledge, the overall structure and organisation of the developed ontologies were evaluated by five experts: three in the field of ontology and two in the field of running events. The experts assessed the ontologies from the perspective of their respective domains, focusing on questions concerning the coherence and consistency of classes, object properties, and data properties. They also reviewed the representation of relationships between classes and properties, as well as the ontologies’ modularity and reusability. A Likert scale [
43] was employed to establish agreement levels for each question: Strongly Agree (5), Agree (4), Neutral (3), Disagree (2), and Strongly Disagree (1).
Table 3 presents the evaluation results. Both the
Running Event Ontology and the
User Profile Ontology were judged to have appropriate structures, receiving average overall agreement levels of 3.6 and 3.68, respectively, both of which correspond to the “Agree” level. Following this evaluation, we incorporated the experts’ recommendations and suggestions to refine the ontology structures for use within the recommender system.
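A minimal sketch of how a mean Likert score can be mapped to an agreement level is given below. The equal-width cut-offs are an assumption (a commonly used convention); the bands actually applied in the evaluation are not stated here. The five ratings are invented.

```python
from statistics import mean

# Equal-width interpretation bands for a 5-point Likert mean.
# ASSUMPTION: these cut-offs are a common convention, not taken from the paper.
BANDS = [
    (4.21, "Strongly Agree"),
    (3.41, "Agree"),
    (2.61, "Neutral"),
    (1.81, "Disagree"),
    (1.00, "Strongly Disagree"),
]


def agreement_level(score: float) -> str:
    """Map a mean Likert score in [1, 5] to its agreement label."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise ValueError("Likert means must be at least 1")


# Invented ratings from five experts for a single question.
ratings = [4, 4, 3, 4, 3]
question_mean = mean(ratings)
level = agreement_level(question_mean)
```

Under these assumed bands, both reported averages (3.6 and 3.68) fall in the “Agree” interval.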
Furthermore, an information retrieval evaluation was conducted to validate the accuracy and precision of the developed ontologies. For this purpose, competency questions about the semantics of the ontologies were formulated [
26]. In total, 17 competency questions, derived from running competitors’ expectations, were used to assess the semantic content of the ontologies. An example of these competency questions is shown in
Table 4, with all questions and their analysis results available on the system website [
42]. To run the information retrieval evaluation against the ontologies, all formulated questions were converted into SPARQL queries. Precision and recall were computed following [
44] as

$$\text{Precision} = \frac{TP}{TP + FP}$$

and

$$\text{Recall} = \frac{TP}{TP + FN},$$

where TP is the number of retrieved items that are relevant to the question, FP is the number of retrieved items that are not relevant to the question, and FN is the number of relevant items that were not retrieved.
Out of 17 competency questions, only 3 yielded very low precision or recall, indicating areas for improvement. For instance, for question 2, the SPARQL query returned all running events containing the word “Marathon”, including “Mini Marathon” and “Half Marathon” events. This produced more FPs than TPs, resulting in a very low precision of 0.2. Similarly, for question 3, the SPARQL query failed to retrieve a running event that awards a medal prize, causing a low recall of 0.6. Despite these specific instances, the overall results demonstrate high performance: the average precision across all questions was 0.91, and the average recall was 0.98. These results indicate that the ontologies support accurate and effective retrieval of relevant information.
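The effect described for question 2 can be reproduced on toy data using the TP, FP, and FN counts defined above. The sketch below is illustrative only: the five events and their race-type labels are invented, and the exact-match filter is shown as one possible remedy, not as the fix adopted in this work.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query from retrieved and relevant item sets."""
    tp = len(retrieved & relevant)   # retrieved and relevant
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Invented race-type labels for five events (not data from the study).
race_types = {
    "E1": "Marathon",
    "E2": "Mini Marathon",
    "E3": "Half Marathon",
    "E4": "Mini Marathon",
    "E5": "Half Marathon",
}
relevant = {event for event, label in race_types.items() if label == "Marathon"}

# A substring filter also matches "Mini Marathon" and "Half Marathon".
loose = {event for event, label in race_types.items() if "Marathon" in label}
# An exact comparison keeps only full marathons.
strict = {event for event, label in race_types.items() if label == "Marathon"}

loose_scores = precision_recall(loose, relevant)
strict_scores = precision_recall(strict, relevant)
```

With this toy data, the substring filter retrieves all five events although only one is relevant, so precision drops while recall stays at 1.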
3.4. Recommendation Prototype Development
All functionalities for the recommendation system were implemented as web services, to provide standard information exchange mechanisms. Six services were implemented using Spring Boot 3.1.7 [
50] as shown in
Table 9. They cooperate with a Recommendation Engine to perform complex calculations. This engine utilises Apache Jena APIs, a Java library, to establish connections between the application and the ontologies with the recommendation rules. Jena Rule Engines were implemented to extract these rules, thereby facilitating the inference process from classes and instances encoded in Web Ontology Language (OWL) and RDF Schema (RDFS) formats.
To support the recommendation mechanism, we developed a web application whose user interface allows users to enter data and view the recommended information.
Figure 5 illustrates an example of this web application at the presentation layer.
Figure 5a displays the user profile creation screen, which comprises three sections: (1) personal information; (2) running event history, which collects the past running events and the types of running events in which users have participated; and (3) factors of interest, which covers 10 factors that influence users’ participation in a running event: location (district), race type, type of event, price, organization, activity area, standard of event, level of event, start period, and reward.
After the user profile is created, the Recommendation Engine utilises the Jena API’s inference rule function to apply semantic rules to the users’ instance. This process generates and returns a list of potential running events via the web services. The results are then displayed on the web application screen, as shown in
Figure 5b, presenting a list of recommended running events along with their confidence scores derived from the recommendation rules.
Figure 5c further enhances this by showing detailed information for a selected recommended event, highlighting the matching factors between the user’s preferences and the event’s attributes.
3.5. Prototype Validation and Evaluation
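To validate the accuracy and correctness of the developed prototype, realistic scenarios were constructed from 10 interviewed running competitors drawn from data source No. 1 in
Table 1. For each competitor, we created a profile based on their chosen factors and event participation history. We also designed a set of running event recommendations that reflected what these competitors expected. These expected event listings were then compared against the event recommendations generated by the prototype from the corresponding user profiles. For evaluation purposes, the Precision, Recall, and F-measure metrics were used. The computations were adapted from [
44] as

$$\text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN},$$

and

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$

Here, TP is the number of recommended events that match the expected events, FP is the number of recommended events that were not expected, and FN is the number of expected events that were not recommended. The factors and event participation history of the 10 competitors for the evaluation are shown in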
Table 10.
The evaluation was conducted by comparing the event recommendations generated by the prototype with the expected results anticipated by the running competitors.
Figure 6 presents the evaluation results of the prototype. The results show that the prototype’s recommendations are consistent with the anticipated results, with an average precision of 83%, an average recall of 95%, and an average F-measure of 86%. For R6, R8, and R10, the prototype’s output matched the expected events exactly (100% precision and recall), indicating high reliability. In some cases, however, the prototype recommended more running events than the competitor expected. For instance, for R2, only 7 of the 13 events recommended by the prototype matched the expected events, giving a low precision of 54%. This is because the recommendation rules include factors, derived from the association rules, that competitors may plausibly select in the future, so additional running events are offered as potentially interesting choices. Conversely, for R5, all 4 events recommended by the prototype matched, but the competitor expected 5 events, giving a lower recall of 80%. Similarly, for R7, the number of expected events exceeded the prototype’s recommendations by two. This limitation arises because the recommendation rules do not adequately cover less frequent factors, such as charity-type running events, which rarely appear among the most frequent association rules. To improve this, we need to enrich user preferences by asking users about their satisfaction with the diversity of the recommendations and the coverage of their interests.
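The figures quoted for R2 and R5 can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (7 of 13 matched for R2; 4 recommended, all matched, against 5 expected for R5). The function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Share of recommended events that were expected."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of expected events that were recommended."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# R2: 13 recommended events, 7 of which match the expected events.
r2_precision = precision(tp=7, fp=13 - 7)

# R5: 4 recommended events, all matching; 5 events were expected.
r5_precision = precision(tp=4, fp=0)
r5_recall = recall(tp=4, fn=5 - 4)
r5_f = f_measure(r5_precision, r5_recall)

print(f"R2 precision: {r2_precision:.0%}")
print(f"R5 recall: {r5_recall:.0%}, F-measure: {r5_f:.0%}")
```

R2’s recall is not computed because the number of events expected for R2 is not stated in the text.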
Furthermore, to ensure that the recommendation rules represented genuine behavioral patterns, we compared the prototype’s performance using the 83 initial rules against its performance using the 70 rules validated through permutation testing, as illustrated in
Figure 7. As shown in
Figure 7a, refining the rule set significantly improved F-measure scores for several competitors, most notably R3 (increasing from 50% to 75%), R4 (57% to 71%), and R5 (50% to 89%). Performance for the remaining competitors stayed consistently high, with R6, R8, and R10 achieving perfect 100% scores. Considering the aggregate metrics of
Figure 7b, the validation process substantially increased average precision from 72% to 83%. This indicates that the refined rules effectively filter irrelevant recommendations without reducing recall, which remained stable at 95%. Consequently, the overall F-measure, the harmonic mean of precision and recall, improved from 78% to 86%. This confirms that the refined rule set provides a more accurate and robust representation of competitor behaviour.
Finally, the prototype performance was evaluated.
Figure 8 shows the semantic reasoning execution time against input complexity (number of factor parameters) and output complexity (number of recommended events) across 10 distinct user profiles. To reduce measurement noise, we ran the experiment three times for each user profile and calculated the average execution time in seconds. The mean execution time for semantic inference through the Jena engine was 0.382 s (SD = 0.043 s), with a maximum reasoning time of 0.446 s. Furthermore, a correlation analysis indicates that this computation time is independent of the user’s input and output complexity. For example, R3 submitted only 4 parameters, and the system processed the request in 0.316 s. By comparison, R2 and R4 each submitted a more complex profile with 7 parameters, and the system processed their requests in 0.394 s and 0.344 s, respectively. Similarly, retrieving a small set of only 2 events for R6 took 0.383 s, which is comparable to the 0.394 s needed to retrieve 13 events for R2. These results confirm that the proposed framework responds quickly enough for real-time applications.
3.6. Lessons Learned and Discussion
3.6.1. Comparative Baseline Evaluation
To empirically validate the contribution of the proposed framework, a comparative evaluation was conducted against two baselines: a popularity-based method and a pure association rule-based method (Apriori). The popularity-based baseline achieved 100% recommendation coverage, as it consistently returned the highest-frequency events (e.g., standard morning marathons) for every query. However, it cannot provide the personalisation required for niche sport tourism.
On the other hand, the pure rule-based approach (Apriori only) demonstrated high precision for the majority of behaviours. For example, it effectively recommended events for typical competitors preferring “morning” start periods and “international” level events, as these features frequently co-occurred and surpassed the 0.20 minimum support threshold. However, this method shows a weakness regarding niche sport tourists. Due to frequency bias, minority rules were pruned during mining to prevent overfitting. This results in a significant drop in coverage for edge-case users.
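The frequency bias described above can be seen in a small sketch of the support and confidence calculations that underlie Apriori. Only the 0.20 minimum support threshold comes from the text. The ten competitor records and the factor labels are invented, and the sketch computes the measures directly without implementing the full Apriori search.

```python
def support(transactions, itemset) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimated probability of the consequent given the antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


MIN_SUPPORT = 0.20  # threshold stated in the text

# Ten invented competitor records (sets of factor values); not data from the study.
transactions = [{"morning", "international"} for _ in range(8)]
transactions.append({"evening", "national"})
transactions.append({"evening", "charity"})

common = support(transactions, {"morning", "international"})  # 8 of 10 records
niche = support(transactions, {"charity"})                    # 1 of 10 records
rule_conf = confidence(transactions, {"morning"}, {"international"})

kept = common >= MIN_SUPPORT    # the frequent pattern survives pruning
pruned = niche < MIN_SUPPORT    # the niche factor is discarded
```

Any rule involving the rare factor is removed before confidence is even considered, which is why edge-case users lose coverage under a pure Apriori approach.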
Our proposed framework outperformed both baselines by effectively utilising semantic reasoning to resolve the trade-off between personalisation and coverage. For standard profiles, it used the high-confidence Apriori rules to maintain baseline precision. In niche scenarios, the proposed framework invoked an ontology-based knowledge base. For instance, the semantic architecture links the Running Event class to the Race Type class (and its subclasses, e.g., Marathon and Fun Run) via the hasRaceType object property. Finally, the Jena rules executed structural and semantic matching between the user’s profile and the event characteristics, ensuring continuous recommendation coverage.
3.6.2. Comparison with Existing KB Approaches
This section highlights the benefits of our proposed approach over benchmark and other KB methods, evaluated across four criteria, including Method, ML Type, Rule Type, and Context-Awareness, as demonstrated in
Table 11. Firstly, knowledge-based ontologies help to reduce data redundancy and identify specific contextual needs of tourists. In addition, RBOs strengthen knowledge representation by integrating rule-based reasoning for inference and decision-making. This significantly enhances the accuracy and explainability of the RS through explicitly encoded decision logic. When applied to context-aware systems, RBOs provide granular context modeling, dynamic adaptation, and deep personalisation [
18,
20,
22]. Our RS evaluation has confirmed that the proposed approach delivers highly tailored recommendations that precisely align with user preferences. In contrast to the work of [
19], where recommendations were derived through ontology queries (SPARQL) without explicit rules, our approach avoids limitations in precisely detecting and comprehending user context due to a lack of comprehensive user preference analysis.
Furthermore, the adoption of ML offers a significant advantage by organizing and categorizing vast amounts of domain data (e.g., online shopping [
18] and sport events [
19]) and user data. The application of unsupervised clustering groups running competitors based on their event participation factors, which is similar to [
19]. Combined with the Apriori algorithm, this enables the discovery of recommendation rules that support more personalised and appropriate running event recommendations. We also found that these rules reveal hidden, meaningful relationships among factors influencing participation. This approach overcomes the limitations of deriving SWRL rules solely from expert knowledge, domain policies, or regulations (in [
22]), which often fail to uncover unforeseen patterns and correlations in user behavior or contextual dynamics. Furthermore, unlike supervised learning methods such as Neuro-Fuzzy classification (in [
18]), ZeroR and OneR (in [
51]), our unsupervised approach avoids the challenges of time-consuming and expertise-demanding sentiment data labelling and computationally expensive training on large datasets.
Finally, the prototype’s Jena rules, derived from association rules, enhance the semantic interpretation of statistically significant patterns. We also found that converting the association rules to Jena inference rules bridges the gap between data mining and the semantic web. Because the Jena engine operates on ontologies, it can infer new knowledge semantically. Since the class
User in
the User Profile Ontology contains interest factors such as race type, location interest, event price, and reward that correspond to the event properties of the
Running Event and
Race Type classes defined in the
Running Event Ontology, Jena rules can automatically tag a race as a match for a specific user, as demonstrated in
Table 8. Furthermore, Jena’s built-in RDFS reasoner automatically infers that any rule applying to a class also applies to its subclasses, for example from the class
Race Type to its subclass
Marathon. The Jena inference engine also facilitates real-time personalisation. As the engine is integrated into the application-logic layer, rules are triggered instantly whenever the user adds or updates their profile and preferences. This allows the recommendation list to refresh without a costly re-scan of the entire database. Consequently, the framework is both data-driven (via the Apriori algorithm) and context-aware (via Jena). Evaluations with real running competitor scenarios confirm that the recommended events and information are accurate and appropriate. This validates that the proposed framework effectively recommends running events aligning with sports tourists’ interests.
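To make the conversion step concrete, the following sketch renders one association rule as text in Jena’s bracketed rule syntax (`[name: premises -> conclusions]`). It is a hypothetical illustration, not the conversion used in the prototype. The `ex` prefix, the `?user` variable, the rule name, and the literal values are assumptions, and the real rules are those listed in Table 8. Only the property names `StartPeriodInterest` and `StandardOfEventInterest` are taken from the User Profile Ontology description.

```python
def to_jena_rule(name: str, antecedent: dict, consequent: dict, prefix: str = "ex") -> str:
    """Render an association rule as text in Jena's rule syntax.

    `antecedent` and `consequent` map property names to literal values.
    The triple patterns all share one hypothetical subject variable, ?user.
    """
    def pattern(prop: str, value: str) -> str:
        return f"(?user {prefix}:{prop} '{value}')"

    body = " ".join(pattern(prop, value) for prop, value in antecedent.items())
    head = " ".join(pattern(prop, value) for prop, value in consequent.items())
    return f"[{name}: {body} -> {head}]"


# Hypothetical rule: users interested in morning starts also favour
# international-standard events.
rule_text = to_jena_rule(
    "rule1",
    {"StartPeriodInterest": "Morning"},
    {"StandardOfEventInterest": "International"},
)
print(rule_text)
```

The generated text could then be loaded by a Jena rule reasoner on the Java side; that step is outside this sketch.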
3.6.3. Methodological Limitations and Robustness Analysis
However, our proposed framework has limitations. Firstly, the COVID-19 outbreak caused a considerable decrease in the number of running events between 2019 and 2021. As a result, little data about running events and competitors from those years could be collected and processed in the research experiment. Secondly, the prototype evaluation was based on running competitor scenarios. The prototype still needs to be evaluated by experts and target running competitors to make the recommendation rules more accurate.
Although the dataset was significantly affected by the COVID-19 reduction in events, the resulting association rules remain strong, consistent, and predictive. This is because they were generated across factor attributes rather than specific event IDs, under strict thresholds (minimum support = 0.2 and minimum confidence = 0.8). To assess the reliability of these findings despite the small sample size, a sensitivity analysis was conducted. We evaluated the stability of the extracted rules by raising and lowering the thresholds. The results demonstrated remarkable stability: increasing the minimum confidence to 0.85 did not reduce the number of generated rules (100% retention). This indicates that the conditional probability of these underlying associations is at least 85%. Furthermore, tightening the minimum support to 0.25 retained approximately 73.4% of the baseline rules. This confirms that these behavioural patterns are pervasive across the cohort rather than marginal outliers. Conversely, relaxing the constraints to a minimum support of 0.15 and a minimum confidence of 0.75 resulted in only a nominal 7.6% increase in total rules. This high degree of stability indicates that the extracted patterns capture genuine behavioral dependencies rather than random artifacts of threshold choice.
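The tightening half of such a sensitivity analysis can be sketched as a filter over the baseline rule set. The rule values below are invented and the helper name is illustrative. Only the thresholds (0.20/0.80 baseline, 0.85 confidence, 0.25 support) come from the text.

```python
def retained(rules, min_support: float, min_confidence: float) -> list:
    """Rules whose support and confidence both meet the given thresholds."""
    return [
        rule for rule in rules
        if rule["support"] >= min_support and rule["confidence"] >= min_confidence
    ]


# Invented (support, confidence) values; not the rules mined in the study.
rules = [
    {"support": 0.22, "confidence": 0.90},
    {"support": 0.24, "confidence": 0.88},
    {"support": 0.30, "confidence": 0.95},
    {"support": 0.41, "confidence": 0.86},
]

baseline = retained(rules, 0.20, 0.80)
stricter_conf = retained(rules, 0.20, 0.85)
stricter_supp = retained(rules, 0.25, 0.80)

conf_retention = len(stricter_conf) / len(baseline)
supp_retention = len(stricter_supp) / len(baseline)
```

Filtering is enough when thresholds are tightened, because every surviving rule is already in the baseline set. Relaxing the thresholds requires re-running the mining step, since rules below the baseline thresholds were never generated.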
However, if marathon events were to decline again over a sustained period, the resulting data sparsity could reduce the diversity and coverage of niche rules. In a sparse environment, reliance on a high minimum support threshold (0.2) creates a frequency bias. Rare but significant occurrences, such as charity-type marathons, fail to meet the required frequency. This directly impacts recall for niche cases (e.g., for R5), where the model essentially overfits to the most frequent patterns. Although these frequent patterns offer high confidence, they often have low Lift, providing redundant information rather than novel insights. This can be addressed by lowering the minimum support threshold and using additional metrics, such as Lift and Conviction, to identify the rules that express the strongest relationships, prioritising statistical significance over simple frequency.
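For reference, Lift and Conviction for a rule A → B follow the standard definitions lift = conf(A → B) / supp(B) and conviction = (1 − supp(B)) / (1 − conf(A → B)). The sketch below applies them to two invented rules to show why a frequent, high-confidence rule can still carry little information.

```python
def lift(conf_ab: float, supp_b: float) -> float:
    """Lift of A -> B: rule confidence divided by the baseline frequency of B."""
    return conf_ab / supp_b


def conviction(conf_ab: float, supp_b: float) -> float:
    """Conviction of A -> B; infinite when the rule never fails."""
    if conf_ab == 1.0:
        return float("inf")
    return (1.0 - supp_b) / (1.0 - conf_ab)


# Invented numbers for two rules; not values from the study.
# The consequent of the first rule is common regardless of the antecedent.
frequent_lift = lift(conf_ab=0.90, supp_b=0.90)
# The consequent of the second rule is rare overall but likely given the antecedent.
niche_lift = lift(conf_ab=0.80, supp_b=0.10)
niche_conviction = conviction(conf_ab=0.80, supp_b=0.10)
```

A lift close to 1 means the antecedent adds almost nothing beyond the base rate of the consequent, whereas the niche rule, despite its lower confidence, has a much higher lift.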
Lastly, in terms of system scalability, the computational cost of the proposed framework is primarily determined by two distinct stages: offline rule extraction and online semantic reasoning. Since the Apriori algorithm has a theoretical worst-case time complexity of $O(2^d)$,
as discussed in [
28,
52], where
d denotes the number of distinct items or attributes in the dataset, increasing the number of users or events in the future can become computationally expensive. However, this process is executed as an offline batch operation, which does not affect the real-time latency of the recommendation delivery. In contrast, the online stage relies on the Jena inference engine to provide personalised recommendations. Preliminary observations confirm that, for a standard user profile, reasoning times remain within acceptable thresholds for interactive real-time applications. However, further empirical benchmarking with high-concurrency loads remains a focus for future optimisation.
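The exponential worst case comes from the number of candidate itemsets: d distinct items have 2^d − 1 non-empty subsets. A short enumeration makes this concrete. The four factor names are hypothetical placeholders.

```python
from itertools import combinations


def candidate_itemsets(items):
    """Yield every non-empty subset of `items`, i.e. the worst-case candidate space."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


factors = ["race_type", "location", "price", "reward"]  # hypothetical item names
count = sum(1 for _ in candidate_itemsets(factors))     # equals 2**4 - 1
```

In practice, Apriori prunes any candidate with an infrequent subset, so the space actually explored is usually far smaller than this bound.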