Dataset of Search Results Organized as Learning Paths Recommended by Experts to Support Search as Learning

Proaño-Ríos, Verónica; González-Ibáñez, Roberto

doi:10.3390/data5040092

Open Access Data Descriptor


by Verónica Proaño-Ríos ¹,²,* and Roberto González-Ibáñez ¹

¹ Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Avenida Ecuador #3659, Santiago 9170124, Chile
² Departamento de Ciencias Exactas, Universidad de las Fuerzas Armadas—ESPE, Av. General Rumiñahui s/n, Sangolquí 171103, Ecuador
* Author to whom correspondence should be addressed.

Data 2020, 5(4), 92; https://doi.org/10.3390/data5040092

Submission received: 5 August 2020 / Revised: 16 September 2020 / Accepted: 18 September 2020 / Published: 27 September 2020

(This article belongs to the Special Issue Big Data and E-learning)


Abstract

In this article, we introduce a dataset of curated learning paths (LPs) to support search as learning. LPs were obtained through an online survey delivered to experts in different domains. The data were then analyzed and described in terms of a set of variables. The resulting dataset comprised 83 LPs, each containing three web pages, for an overall collection of 249 documents. The dataset is intended to provide information scientists, education researchers, and industry professionals who offer information services in educational contexts with a valuable resource to (i) investigate patterns in the order of LPs, (ii) improve ranking models and/or re-ranking methods, (iii) explain the structure of the recommended LPs, and (iv) investigate alternative approaches to displaying search results based on the features of LPs.

Dataset: The dataset is available on the Mendeley Data repository. DOI: http://dx.doi.org/10.17632/nvk6p4xp5c.1.

Dataset License: The license under which the dataset is made available is CC-BY 4.0.

Keywords:

data mining; expert recommendations; learning object (LO); learning path (LP); search as learning (SAL); search engine results; Spanish dataset

1. Summary

This article describes a novel dataset comprising 83 learning paths (LPs) curated by experts. Each LP in the dataset includes a list of three sequential web pages, the experts’ demographic information, the experts’ judgments and reasoning for including the selected sources in the given order, the LP extension (i.e., length), and the content type. Uses of the dataset include, but are not limited to, research and development in areas such as information science, education, information retrieval, linguistics, and industry. The dataset is available through the Mendeley Data repository.

The remaining sections of this article are structured as follows. First, Section 2 provides background information and rationale. Section 3 offers a detailed description of the files included in the dataset, definitions of variables, and data distribution. Then, Section 4 describes the methodology and instruments used to collect the data. Finally, Section 5 presents conclusions and applications for future research. Additionally, the Spanish version of the questionnaire is provided as Supplementary Materials, and the Python script used to extract general features from the dataset is presented in Appendix A.

2. Background and Rationale

The concept of LPs, defined as a finite and organized sequence of learning objects (LOs), can be linked to Vannevar Bush’s notion of trails [1]. When trails are situated in learning contexts, a fundamental problem known as curriculum sequencing arises: finding the optimal sequence of LOs that maximizes learning outcomes. This problem is considered NP-hard [2,3] (i.e., no algorithm is known that solves it in polynomial time).
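The following sketch makes the combinatorial side of curriculum sequencing concrete. It scores every ordering of a set of LOs and keeps the best one. With n LOs there are n! orderings: 6 for the three-LO paths in this dataset, and 3,628,800 for ten. Exhaustive search therefore quickly becomes impractical. The sketch illustrates only the size of the search space and is not a proof of NP-hardness. The `score` argument is a hypothetical placeholder for a learning-outcome model; neither the dataset nor the cited works provide one.

```python
import itertools
import math
from typing import Callable, Optional, Sequence, Tuple

Order = Tuple[str, ...]


def best_sequence(
    learning_objects: Sequence[str],
    score: Callable[[Order], float],
) -> Tuple[Optional[Order], float]:
    """Score every ordering of the LOs and return the highest-scoring one.

    `score` is a hypothetical learning-outcome model supplied by the caller.
    """
    best_order: Optional[Order] = None
    best_value = -math.inf
    # itertools.permutations yields all n! orderings, which is why
    # exhaustive search only works for very short paths.
    for order in itertools.permutations(learning_objects):
        value = score(order)
        if value > best_value:  # strict ">" keeps the first of tied orderings
            best_order, best_value = order, value
    return best_order, best_value


def number_of_orderings(n: int) -> int:
    """Size of the search space for a path of n distinct LOs."""
    return math.factorial(n)
```

Consider a toy score that rewards placing an introductory LO first. With it, `best_sequence(["b", "intro", "c"], lambda o: -o.index("intro"))` returns `(("intro", "b", "c"), 0)`.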

More than 50 years after Bush’s influential piece, the impact of technology on education has been profound. Different tools and resources have been developed, including learning management systems (LMS) (e.g., Moodle) and massive open online courses (MOOCs), among others. Although different in nature, both share a common characteristic: the content they provide is the same for all users, regardless of their prior knowledge or learning style. To address this problem, some studies have focused on personalization within these platforms [4,5]; however, in spite of these efforts, statistics indicate that 83% of students use search engines to meet their immediate information needs [6,7].

Current search technology is built on years of research and development in information retrieval and information science. Popular alternatives such as Google and Bing provide rapid access to a vast amount of content on the Web. Although search systems have advanced considerably, they still face major challenges in understanding searchers’ complex information needs [8]. Search technologies exploit a wide range of features in the retrieval and ranking process. Ranking is critical for organizing results by their relevance to users’ needs. Unfortunately, most approaches rely mainly on topical relevance [9,10]. They offer limited coverage of other manifestations of relevance, such as cognitive, affective, and situational relevance [11,12], which may be critical in learning scenarios.

Despite the active role of search technologies in educational settings, they were not designed to support complex learning processes [13,14]. This is evidenced by (i) a mismatch between the presentation style of search engine results and how people learn under high levels of uncertainty [15,16]; (ii) students’ attitudes and behaviors toward information search [17,18]; and (iii) students’ low levels of information literacy [19,20]. Thus, it is essential to investigate alternative solutions that enhance learning outcomes as part of online search (i.e., search as learning, SAL). This need becomes even more urgent as the current pandemic (COVID-19) evolves, since many people worldwide are turning to online resources found through search engines to support their learning. In this context, it is fundamental to find suitable approaches (such as LPs) that better support learning while searching for information on the Web.

Despite numerous efforts to build LPs, to the best of our knowledge, no research has examined how search results presented as expert-based LPs affect learning outcomes. We therefore saw the need to build a dataset of LPs. It could be used by information scientists, education researchers, and industry professionals who offer services related to information seeking and retrieval for educational purposes. The dataset will allow them to conduct studies with numerous applications in textual data extraction. For example, in research contexts, the dataset could be used to extract features to (i) identify patterns in the sorted LOs recommended by experts, (ii) improve ranking models and/or re-ranking algorithms to fulfill immediate learning needs, (iii) explain the order of documents within LPs, and (iv) investigate alternative approaches to displaying search results based on the features of LPs. In educational settings, the dataset could be used as a learning resource or to further investigate strategies for teaching and learning complex topics.

To build the dataset, we asked experts on specific topics and domains to provide a sequence of three web pages (mostly text-based). The sequence had to guide the learning process of incoming college students who know little to nothing about a selected topic. We used two criteria to classify an individual as an expert in a specific area: knowledge and length of experience. In this study, an expert was any person with a bachelor’s degree or higher and at least one year of experience in a specific topic. The survey was directed especially, but not exclusively, at professors and researchers. The dataset mainly contains LOs in Spanish, owing to the geographical location of the research group carrying out the present study and the native language of its environment. It is also worth noting that Spanish is the third most used language on the Internet [21].

3. Data Description

We invited several experts from Hispanic countries to participate in an online survey. We obtained answers from seven different countries in six domains (i.e., computer science, physics, finances, laws, biology, and industrial engineering), as shown in Figure 1.

3.1. Files

Two files are provided in the repository:

A comma-separated values (CSV) file with all the data in Spanish, named LP_dataset_spanish_version.csv.
A copy of the previous CSV file (LP_dataset_english_version.csv) with categorical data and variable names translated into English to facilitate analyses by English-speaking researchers.

This article describes the English version.
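A minimal sketch follows, using Python’s standard `csv` module. It reads the English CSV and rebuilds each row as an ordered LP. It assumes the column names listed in Table 1 (ID_LP, Id_LO_1 to Id_LO_3, URL_i, Query_i, Reason_i) and UTF-8 encoding. The function name `load_learning_paths` is hypothetical. Spaces in header names are stripped, so a name written as "Id_ LO_1" still matches.

```python
import csv
import io
from typing import Dict, List


def load_learning_paths(csv_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Map each ID_LP to its three LOs, in the order recommended by the expert."""
    reader = csv.DictReader(io.StringIO(csv_text))
    paths: Dict[str, List[Dict[str, str]]] = {}
    for row in reader:
        # Drop spaces from header names so "Id_ LO_1" and "Id_LO_1" both work;
        # a None key (extra unnamed fields in a row) is skipped.
        clean = {k.replace(" ", ""): v for k, v in row.items() if k is not None}
        paths[clean["ID_LP"]] = [
            {
                "id_lo": clean[f"Id_LO_{i}"],
                "url": clean[f"URL_{i}"],
                "query": clean[f"Query_{i}"],
                "reason": clean[f"Reason_{i}"],
            }
            for i in (1, 2, 3)  # list position = position in the LP
        ]
    return paths
```

Reading the file would then look like `with open("LP_dataset_english_version.csv", encoding="utf-8-sig", newline="") as f: paths = load_learning_paths(f.read())`. The `utf-8-sig` codec also accepts files without a byte-order mark. Using it is a defensive assumption, not a known property of the repository file.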

3.2. Features

Table 1 describes the features available in the dataset. The features were organized as follows:

The first fourteen features corresponded to demographic information provided by survey respondents.
The following twelve variables described the LP. For each of the three sorted LOs, they recorded its identifier, its URL, the query used to obtain it, and the expert’s reason for its position.
The last three characteristics were general features that we extracted from the recommended LPs to facilitate the classification process; they are described in Section 4.

3.3. Data Distribution

The dataset consisted of 249 LOs organized in 83 LPs recommended by experts from Argentina, Chile, Colombia, Ecuador, Mexico, Spain, and Venezuela. Table 2 summarizes demographic data by domain, level of education, and sex. As shown in Figure 2, 81.93% of the experts belonged to higher education institutions and 18.07% to research centers. In addition, 81.93% of the respondents were men and the remaining 18.07% were women; their age distribution is shown in Figure 3. Respondents were experts in six different domains: biology, computer science, finances, industrial engineering, laws, and physics. Of these, 63.85% had a doctoral degree, 19.28% a master’s degree, and the remaining 16.87% a bachelor’s degree. Figure 4 shows this distribution, identifying two groups: students and faculty. Finally, Figure 5 summarizes the collected data in relation to the last three extracted characteristics: 91.57% of the documents were in Spanish (the rest were in English), 89.16% were text-based, and 67.47% were short.
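Each share reported in this subsection is a category count divided by the 83 LPs. For instance, Table 2 lists 68 men (19 + 42 + 7), and 68/83 ≈ 81.93%. The sketch below recomputes such shares. It assumes one categorical value per LP, for example the Sex column read with `csv.DictReader`. The helper name `percentage_shares` is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def percentage_shares(values: Iterable[str], decimals: int = 2) -> Dict[str, float]:
    """Return each category's share of all values, as a rounded percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: round(100 * n / total, decimals) for category, n in counts.items()}
```

Applied to a column, this would read `percentage_shares(row["Sex"] for row in rows)`, where `rows` are the parsed CSV records. Rounding each share independently explains why reported percentages can sum to 99.99% or 100.01%.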

4. Methods

To study the aspects introduced in Section 1, we needed a dataset that met the guidelines detailed in Table 3. These guidelines were based on the literature on search as learning and information seeking introduced in Section 2. To create the dataset, we followed a method based on expert judgment. This method is widely used in fields such as education (e.g., [22,23]), research (e.g., [24,25]), and industry (e.g., [26,27,28,29]), among others.

Based on the guidelines shown in Table 3, we designed a semi-structured questionnaire, which was implemented using Google Forms. The questionnaire was targeted at experts in six different domains. To select the domains, we first identified the most searched domains on the Internet [31]. We then selected six domains: computer science, finances, laws, biology, industrial engineering, and physics.

To define specific subjects in each domain, we first identified two experts per domain; in total, we located 12 faculty members from different universities and countries. Once the experts were identified, we met with each one. Each expert was asked to suggest a topic of interest to society and to formulate a general question about it. Several interviews were held with each expert until the structure of the questions and the language used suited students with no prior knowledge of the subject. Once the questions were defined, we asked an expert in formulating questions for educational contexts to validate whether they were properly posed.

Once the validation process was completed, an online survey was designed and the study was presented to the Institutional Ethics Committee of the Universidad de Santiago de Chile. The research protocol for this project was approved on 16 April 2019 (Ethical Report No. 160.2019).

The questionnaire consisted of 23 items: 2 agreement questions, 11 closed-ended demographic questions, and 10 open-ended questions. The online survey was pilot-tested by 32 members of the InTeracTion (http://www.interaction-lab.info) research group. Data collection was carried out in three stages:

In the first stage, prestigious universities, research centers, and industries of Spanish-speaking countries in each of the six domains of interest were identified.
In the second stage, we created a list including faculty members, researchers, and professionals whose institutional email was available.
In the third stage, invitations to participate in the online survey were emailed to the experts. In addition, the experts were asked to share the survey with senior students (with at least a bachelor’s degree) who were proficient in the subject.

In the online survey, each expert was first required to fill in a demographic questionnaire. Second, experts selected a topic according to their field of expertise (Table 4). Third, we asked experts to provide a sorted sequence of three web pages (mostly text-based) to guide the learning process of students who know little or nothing about the selected topic. One restriction applied to this task: all three selected documents together had to be readable in at most 20 min. Finally, the experts were asked to describe their selection criteria.

We invited faculty members, researchers, and senior students (with at least a bachelor’s degree) from prestigious universities and research groups of Spanish-speaking countries to complete the online survey. The survey was available from 25 May 2019 to 31 January 2020.

Across the 10 Hispanic countries invited to participate in the online survey, invitations were sent to 3717 experts affiliated with a university, research group, and/or industry. Of these, 109 completed the survey, for a response rate of 2.93%.
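The response rate follows directly from the two counts in this paragraph:

```latex
\text{response rate} = \frac{109}{3717} \approx 0.0293 = 2.93\%
```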

The collected raw data were filtered to eliminate observations containing broken URLs, duplicated URLs within a single record, or inconsistent data. Twenty-six observations were discarded during this process, leaving the 83 LPs in the dataset (109 − 26 = 83). To verify that the dataset met the selection criteria, three variables were created using the Python script shown in Appendix A. The variables were the following:

LP document’s extension: This variable identifies whether an LP is short or long. For this purpose, we counted the number of words in each document of the LP. If the overall number of words was 4000 or less, the LP was classified as short; otherwise, it was considered long. The threshold combines the 20 min reading limit given to the experts with an average reading rate of 200 words per minute. This rate applies to reading for comprehension in the reader’s native language [32]: 200 words/min × 20 min = 4000 words.
Document language: This variable identifies whether the LP documents are in Spanish or English.
Document type: This variable identifies whether the content of the documents is mostly text-based or multimedia (i.e., audio and/or video).
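The script in Appendix A appears only as a figure. The following is therefore an independent sketch, not the authors’ code, of how the extension criterion and two related checks could be computed. It makes several assumptions:

- The documents’ text has already been downloaded.
- Words are runs of Unicode word characters.
- URLs that differ only by surrounding whitespace or a trailing slash count as duplicates.
- Language can be guessed by counting a few common Spanish and English function words.

The stopword lists and all function names are hypothetical. Detecting broken URLs and multimedia content is not covered.

```python
import re
from typing import Iterable, List, Sequence

WORDS_PER_MINUTE = 200  # average reading rate cited in the text [32]
MAX_MINUTES = 20        # reading-time limit given to the experts
SHORT_LIMIT = WORDS_PER_MINUTE * MAX_MINUTES  # 4000 words

# Hypothetical stopword lists for a rough language guess (illustration only).
SPANISH_HINTS = {"el", "la", "los", "las", "de", "que", "y", "en", "es", "por"}
ENGLISH_HINTS = {"the", "of", "and", "to", "in", "is", "that", "for", "it", "on"}


def tokens(text: str) -> List[str]:
    """Split text into word tokens; accented letters such as 'á' or 'ñ' count."""
    return re.findall(r"\w+", text)


def lp_extension(documents: Sequence[str]) -> str:
    """'short' if the LP's documents total at most SHORT_LIMIT words, else 'long'."""
    total = sum(len(tokens(doc)) for doc in documents)
    return "short" if total <= SHORT_LIMIT else "long"


def has_duplicate_urls(urls: Iterable[str]) -> bool:
    """True if a URL repeats within one record (ignoring whitespace and a trailing slash)."""
    normalized = [url.strip().rstrip("/") for url in urls]
    return len(normalized) != len(set(normalized))


def guess_language(text: str) -> str:
    """Rough Spanish/English guess by counting common function words."""
    words = [w.lower() for w in tokens(text)]
    spanish = sum(w in SPANISH_HINTS for w in words)
    english = sum(w in ENGLISH_HINTS for w in words)
    return "Spanish" if spanish >= english else "English"
```

The threshold is derived from the two constants rather than hard-coded, so the sketch stays consistent if a different reading rate or time limit is assumed.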

To make it easier to identify the LOs, these were linked to unique identifiers (ID_LO) according to the following template DNNO, composed of four digits:

D: The first digit indicates the domain: (1) computer science, (2) finances, (3) industry, (4) physics, (5) laws, and (6) biology.
NN: The two digits in the middle correspond to a sequential number for each domain. Note that this number does not indicate ranking or any other ordering criteria.
O: The last digit indicates whether the LO is at (1) the beginning, (2) the middle, or (3) the end of the LP.

For example: the document with ID 4032 belongs to the physics domain (4) and it is in the middle of the LP (2).
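The DNNO template can be decoded mechanically. The minimal sketch below assumes IDs are stored as four-digit integers or strings. The names `parse_lo_id` and `LOId` are hypothetical, and the domain labels follow the list above.

```python
from typing import NamedTuple, Union

DOMAINS = {
    1: "computer science",
    2: "finances",
    3: "industry",
    4: "physics",
    5: "laws",
    6: "biology",
}
POSITIONS = {1: "beginning", 2: "middle", 3: "end"}


class LOId(NamedTuple):
    domain: str
    sequence: int  # per-domain counter; not a ranking
    position: str


def parse_lo_id(lo_id: Union[int, str]) -> LOId:
    """Decode a four-digit DNNO identifier such as 4032."""
    digits = str(lo_id).strip()
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"expected four digits, got {lo_id!r}")
    d, nn, o = int(digits[0]), int(digits[1:3]), int(digits[3])
    if d not in DOMAINS or o not in POSITIONS:
        raise ValueError(f"unknown domain or position digit in {lo_id!r}")
    return LOId(DOMAINS[d], nn, POSITIONS[o])
```

For example, `parse_lo_id(4032)` yields physics, sequence number 3, and the middle position, matching the example in the text.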

Actual web documents were not included owing to potential copyright infringement. Access to the actual documents will be provided upon request if they are no longer available at the URLs in the dataset.

5. Conclusions

In this article, we introduced a dataset of curated LPs. We provided detailed descriptions of the dataset structure, definitions of variables, data distribution, and methodological approach. Our dataset constitutes a valuable resource for researchers and educators dealing with problems related to information search and learning.

The dataset responds to five current issues identified in the literature:
- the lack of curated search results linked to learning goals;
- the presentation style of search results in popular search engines;
- students’ common attitudes toward learning complex topics using Internet resources;
- the fact that most content on the Web is text-based;
- the lack of datasets in Spanish.

The proposed dataset has different types of applications. First, researchers in information science and education could investigate the effects of LPs on students’ learning of a given topic. Second, researchers could use the dataset to find patterns for improving ranking algorithms. Such patterns could also help explain the order of documents and inform novel approaches to displaying search results in learning contexts. The dataset could also be used to investigate teaching–learning strategies for complex topics.

Finally, to the best of our knowledge, this is the first open dataset containing curated learning paths in Spanish. While relatively small compared to datasets in other domains, the methodological approach provided in this article can be followed by other researchers to further extend the current dataset with other topics and languages.

Supplementary Materials

The following supplemental data are available online at https://www.mdpi.com/2306-5729/5/4/92/s1, Questionnaire S1: Spanish-version questions asked in the online survey.

Author Contributions

Conceptualization, V.P.-R. and R.G.-I.; methodology, V.P.-R. and R.G.-I.; software, V.P.-R.; validation, V.P.-R.; formal analysis, V.P.-R.; investigation, V.P.-R.; data curation, V.P.-R.; writing—original draft preparation, V.P.-R.; writing—review and editing, V.P.-R. and R.G.-I.; supervision, R.G.-I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Secretaría Nacional de Educación Superior, Ciencia, Tecnología e Innovación (SENESCYT) as a part of the Programa de Becas Convocatoria Abierta 2014—Primera Fase; and research grant FONDECYT Regular #1201610 funded by the National Agency for Research and Development (ANID).

Acknowledgments

The authors would like to thank all the experts for their time and valuable contribution to data collection.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The following Python script was used to extract general features from the dataset.

Figure A1. Python script used to extract general features from the dataset.

References

1. Bush, V. As we may think. Atl. Mon. 1945, 176, 101–108.
2. Acampora, G.; Gaeta, M.; Loia, V. Hierarchical optimization of personalized experiences for e-Learning systems through evolutionary models. Neural Comput. Applic. 2011, 20, 641–657.
3. Al-Muhaideb, S.; Menai, M.E.B. Evolutionary computation approaches to the Curriculum Sequencing problem. Nat. Comput. 2011, 10, 891–920.
4. Caputi, V.; Garrido, A. Student-oriented planning of e-learning contents for Moodle. J. Netw. Comput. Appl. 2015, 53, 115–127.
5. Dwivedi, P.; Kant, V.; Bharadwaj, K.K. Learning path recommendation based on modified variable length genetic algorithm. Educ. Inf. Technol. 2018, 23, 819–836.
6. Byrne, J.; Kardefelt-Winther, D.; Livingstone, S.; Stoilova, M. Global Kids Online Research Synthesis, 2015–2016; UNICEF Office of Research—Innocenti and London School of Economics and Political Science: London, UK, 2016; pp. 1–75.
7. Livingstone, S.; Kardefelt-Winther, D.; Saeed, M. Global Kids Online Comparative Report 2019; UNICEF Office of Research—Innocenti and London School of Economics and Political Science: London, UK, 2019; pp. 1–135.
8. Rieh, S.Y.; Collins-Thompson, K.; Hansen, P.; Lee, H.-J. Towards searching as a learning process: A review of current perspectives and future directions. J. Inf. Sci. 2016, 42, 19–34.
9. Saracevic, T. Relevance reconsidered. In Proceedings of the Second Conference on Conceptions of Library and Information Science (CoLIS 2), Copenhagen, Denmark, 13–16 October 1996; ACM: New York, NY, USA, 1996; pp. 201–218.
10. Saracevic, T. Why is relevance still the basic notion in information science. In Proceedings of the Re: Inventing Information Science in the Networked Society and Proceedings of the 14th International Symposium on Information Science (ISI 2015), Zadar, Croatia, 19–21 May 2015; pp. 26–35.
11. Nahl, D.; Tenopir, C. Affective and cognitive searching behavior of novice end-users of a full-text database. J. Am. Soc. Inf. Sci. 1996, 47, 276–286.
12. Nahl, D.; Bilal, D. Information and Emotion: The Emergent Affective Paradigm in Information Behavior Research and Theory; American Society for Information Science and Technology: Silver Spring, MD, USA; Information Today, Inc.: Medford, NJ, USA, 2007; ISBN 978-1-57387-310-9.
13. Farrell, R.G.; Liburd, S.D.; Thomas, J.C. Dynamic assembly of learning objects. In Proceedings of the WWW—ACM, New York, NY, USA, 17–22 May 2004; Association for Computing Machinery: New York, NY, USA, 2004; pp. 162–169.
14. Syed, R.; Collins-Thompson, K. Optimizing Search Results for Educational Goals: Incorporating Keyword Density as a Retrieval Objective. In Proceedings of the SAL@SIGIR, Pisa, Italy, 17–21 July 2016.
15. Hearst, M. Search User Interfaces; Cambridge University Press: New York, NY, USA, 2009; ISBN 978-0-521-11379-3.
16. Marchionini, G. Exploratory search: From finding to understanding. Commun. ACM 2006, 49, 41–46.
17. Large, A.; Nesset, V.; Beheshti, J. Children as information seekers: What researchers tell us. New Rev. Child. Lit. Librariansh. 2008, 14, 121–140.
18. Rieh, S.Y.; Kim, Y.-M.; Markey, K. Amount of invested mental effort (AIME) in online searching. Inf. Process. Manag. 2012, 48, 1136–1150.
19. Graham, L.; Metaxas, P.T. “Of course it’s true; I saw it on the Internet!”: Critical thinking in the Internet era. Commun. ACM 2003, 46, 70–75.
20. Johnston, B.; Webber, S. Information Literacy in Higher Education: A review and case study. Stud. High. Educ. 2003, 28, 335–352.
21. Fernández Vítores, D. El Español: Una Lengua Viva—Informe 2019; Instituto Cervantes: Madrid, Spain, 2019; pp. 1–96.
22. Escobar-Pérez, J.; Martínez, A. Validez de contenido y juicio de expertos: Una aproximación a su utilización. Av. En Med. 2008, 6, 27–36.
23. Fotheringham, D. The role of expert judgement and feedback in sustainable assessment: A discussion paper. Nurse Educ. Today 2011, 31, e47–e50.
24. Hyrkäs, K.; Appelqvist-Schmidlechner, K.; Oksa, L. Validating an instrument for clinical supervision using an expert panel. Int. J. Nurs. Stud. 2003, 40, 619–625.
25. Drew, A.; Perera, A. Expert Knowledge as a Basis for Landscape Ecological Predictive Models. In Predictive Species and Habitat Modeling in Landscape Ecology: Concepts and Applications; Springer: New York, NY, USA, 2011; pp. 229–248.
26. Tsyganok, V.V.; Kadenko, S.V.; Andriichuk, O.V. Significance of expert competence consideration in group decision making using AHP. Int. J. Prod. Res. 2012, 50, 4785–4792.
27. Hughes, R.T. Expert judgement as an estimating method. Inf. Softw. Technol. 1996, 38, 67–75.
28. Jørgensen, M. Forecasting of software development work effort: Evidence on expert judgement and formal models. Int. J. Forecast. 2007, 23, 449–462.
29. Burgman, M.; Fidler, F.; Mcbride, M.; Walshe, T.; Wintle, B. Eliciting Expert Judgments: Literature Review; Australian Centre for Excellence in Risk Analysis (ACERA): Melbourne, VIC, Australia, 2006.
30. Ritter, F.E.; Nerb, J.; Lehtinen, E.; O’Shea, T.M. In Order to Learn: How the Sequence of Topics Influences Learning; Oxford University Press: Oxford, UK, 2007; Volume 2, ISBN 978-0-19-803977-8.
31. White, R.W.; Dumais, S.T.; Teevan, J. Characterizing the influence of domain expertise on web search behavior. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, Barcelona, Spain, 9–12 February 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 132–141.
32. Grabe, W.; Stoller, F.L. Teaching and Researching Reading, 3rd ed.; Routledge: Abingdon, UK, 2019; ISBN 978-1-317-53642-0.

Figure 1. Hispanic countries where the online survey was applied.

Figure 2. Nationality and type of institution the experts belonged to.

Figure 3. Number of experts surveyed by age and sex.

Figure 4. Respondents’ education level and expertise domain.

Figure 5. Features of documents.

Table 1. Dataset content, including names, variable types, and descriptions.

Column Name	Type	Description	*
ID_LP	Identifier	Row unique identifier (ID) or key	C
Age	Categorical	Expert’s age range	S
Sex	Categorical	Woman or Man	S
Nationality	Categorical	Expert’s nationality	S
Native_language	Categorical	Native language	S
Education	Categorical	Highest degree obtained or in course: bachelor’s, master’s, or doctorate	S
Professional_degree	Categorical	Expert’s career or profession	S
Main_activity	Categorical	Main activity: student, lecturer (those that deal only with teaching duties), and faculty member (or researcher alone)	S
Current_year_study	Ordinal	If the expert is a student (e.g., doctoral program), current progress in terms of years within the program	S
Institution_type	Categorical	Higher level institution or research group	S
Time_spent	Categorical	Time spent on the Web according to the following scale: (0) never; (1) once a week; (2) two or three days a week; (3) at least five days a week, less than an hour a day; (4) at least five days a week, between one and three hours a day; (5) at least five days a week, more than three hours a day	S
Domain	Categorical	Expertise area: biology, computer science, finances, laws, physics, industrial engineering	S
Topic	Categorical	It can be one of the following six topics: bioethics of animal tissue cloning for human intake; artificial neural networks; investment projects; inheritance laws in Chile; quantum computing; industrial revolutions	S
Experience_time	Categorical	Years of experience in the selected topic according to the following ranges: <1 year; 2–3 years; 4–5 years; 6–9 years; >10 years	S
Id_LO_1	Ordinal	Id of the first LO in the LP	C
URL_1	Qualitative	URL of the first LO in the LP	S
Query_1	Qualitative	Query used by the expert to obtain LO_1	S
Reason_1	Qualitative	Reasons for recommending reading LO_1 in first place	S
Id_LO_2	Ordinal	Id of the second LO in the LP	C
URL_2	Qualitative	URL of the second LO in the LP	S
Query_2	Qualitative	Query used by the expert to obtain LO_2	S
Reason_2	Qualitative	Reasons for recommending reading LO_2 in second place	S
Id_LO_3	Ordinal	Id of the third LO in the LP	C
URL_3	Qualitative	URL of the third LO recommended in the LP	S
Query_3	Qualitative	Query used by the expert to obtain LO_3	S
Reason_3	Qualitative	Reasons for recommending reading LO_3 last	S
Comments	Qualitative	Comments and observations made by each expert	S
LP_docs_extension	Categorical	LP documents’ extension: short or long	C
Document_language	Categorical	Documents’ language: Spanish or English	C
Document_type	Categorical	Documents’ content: text or multimedia	C

* The last column indicates whether the value was obtained directly from the survey (S) or computed (C).

Table 2. Summary of data by domain, level of education, and sex.

Domain	Student n = 23		Lecturer n = 51		Faculty n = 9		TOTAL n = 83
Domain	Women n = 4	Men n = 19	Women n = 9	Men n = 42	Women n = 2	Men n = 7	TOTAL n = 83
Biology	0.00%	0.00%	0.00%	1.20%	1.20%	0.00%	2.40%
Computer	3.61%	16.88%	4.82%	30.14%	1.20%	2.41%	59.06%
Finances	0.00%	1.20%	2.41%	4.82%	0.00%	0.00%	8.43%
Industrial	1.20%	2.41%	2.41%	4.82%	0.00%	1.20%	12.04%
Laws	0.00%	1.20%	0.00%	2.41%	0.00%	0.00%	3.61%
Physics	0.00%	1.20%	1.20%	7.24%	0.00%	4.82%	14.46%
TOTAL	4.81%	22.89%	10.84%	50.63%	2.40%	8.43%	100.00%

Table 3. Guidelines considered for the creation of the dataset.

Current Scenario	Criteria
Lack of validation for search results.	Consider experts’ knowledge and criteria to select and organize web documents as LOs.
Endless search results and random reading order.	Organize search results as LPs—defined as a finite and organized sequence of documents (LOs)—considering that the order in which study material is presented can lead to different learning outcomes [30].
Students commonly invest little time and effort in finding information in web search contexts [18].	Short LPs intended to satisfy an immediate learning need, since students spend 14:21 min on average reading text documents in a search session [18].
Most web content is in text format.	LPs mostly based on text.
Most IR (information retrieval) research is based on information presented in English.	Spanish is the third most used language on the Internet [21], so attention must be paid to its users.

Table 4. Topics and subjects for each domain.

Domain	Topic	Subject
Biology	Bioethics of animal tissue cloning for human intake	What are the basic ethical principles to consider when cloning animal tissues for human intake?
Computer science	Artificial neural networks	What are the main differences between a simple artificial neural network and a deep artificial neural network?
Finances	Investment projects	What are the factors that must/should be considered when deciding whether to undertake a new business or to invest in properties?
Industry	Industrial revolutions	What are the main milestones for each industrial revolution?
Laws	Inheritance laws in Chile	Is it legal to disinherit a daughter or son? If so, in which cases?
Physics	Quantum computing	What are the main differences between quantum computers and classic computers?

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

