Data Descriptor
A Dataset of Vietnamese Junior High School Students' Reading Preferences and Habits

Books are an invaluable, colossal store of mankind's scholarship and are still commonly perceived as a more reliable source of knowledge, even in this age of digitized information. Extensive reading is often promoted as vital to cognitive development, especially for students in primary and secondary education. While it could now be considered common knowledge that reading is highly beneficial, reading habits vary among individuals who share the same culture and receive the same public education. This could be due to demographic variations and differences in socioeconomic status, or to other factors such as family background and education. Despite the ample literature on reading habits, there is still a lack of a holistic approach with empirical results concerning the reciprocal interactions between reading and the factors that affect it. This data article presents a dataset of 1676 responses to the survey "Studying reading habits and preferences" of junior high school students in Vietnam. Analysis of the results facilitates the evaluation of reading habits and the factors affecting them, and thus has implications for education measures and policy. The dataset is available with the paper.
Dataset: The data are submitted as a supplementary file.
Dataset License: CC-BY


Summary
The data article presents a dataset of 1676 responses from junior high school students from 8 schools in Northern Vietnam about their reading preferences and habits. The dataset was acquired from the first phase of the survey "Studying reading habits and preferences". The first phase ran from December 2017 to January 2018. An expansion of the dataset will include data from 16 schools in total. The dataset contains information about junior high school students' reading behaviors such as preferred type of book, source of supply for books, actions after finishing a good book, and so on. Moreover, information about family background, academic performance, and future professional orientation is also acquired.
The dataset provides a thorough understanding of the reading habits and preferences of the current generation of Vietnamese junior high school students in particular. Based on the findings, we hope to generalize the implications to students in developing countries. Data analyses could enable an understanding of individual differences in the perception of books and in choices of book genres. The dataset would provide empirical evidence for the effect of reading on academic performance and future professional orientation. The dataset allows researchers as well as policy makers to evaluate some factors that influence the reading preferences and wishes of students in order to make sound and scientific education policies.
In the next section, we explain in detail all the coded variables of the dataset. Then, in the Methods section, the design of the questionnaire, the survey process, the data analysis method, and examples of results are presented. Finally, the Conclusion discusses the limitations and potential of the dataset.

Data Description
The dataset contains 1676 responses acquired from the first phase of a survey about junior high school students' reading preferences and habits, parental influences, impacts of families' economic conditions, and the association between reading habits and academic performance and future professional decisions.
The questionnaire consists of multiple-choice questions and open-answer questions. For all multiple-choice questions, the respondent is required to choose a single answer out of the options provided. Questions and answers were encoded into variables (discrete and continuous) and items of variables in the dataset. The questions were divided into two groups: (1) Questions on personal and family background, and (2) Questions on reading preferences and habits.

Group (1) Questions
The first two questions in group (1) are 'Grade' and 'Sex'. In Vietnam, junior high school covers grades 6 to 9. The question on 'Sex' consists of two items: 'male' and 'female'. The following questions asked about the number of children in the family ('NumberofChi') and the subjects' birth order in the family ('RankinF').
The distribution of answers is presented in Table 1. The numbers of male and female students joining the survey are approximately equal. Grade 6 is the modal category, accounting for up to 28% of the surveyed students; however, the differences in share among grades are insignificant. The families mainly have 2 to 3 children (~82%), and most of the students surveyed are the first or second child (~78%).
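The shares and the modal category reported above can be derived from any coded column with a simple tally. The following is a minimal Python sketch (the paper itself works in R); the helper name `share_table` and the toy `grades` list are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def share_table(values):
    """Return ({category: (count, percent)}, modal_category) for one coded column."""
    counts = Counter(values)
    total = sum(counts.values())
    table = {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}
    mode = max(counts, key=counts.get)  # category with the highest count
    return table, mode

# Hypothetical toy column for illustration only, NOT values from the survey
grades = [6, 6, 6, 7, 7, 8, 8, 9, 9, 6]
table, mode = share_table(grades)
```

Applied to the 'Grade' column of the real file, the same tally would yield the per-grade shares summarized in Table 1.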
The questions on academic performance asked about the average score of the most recent 45-minute tests of Mathematics, Physics, (Chemistry), and Biology, and the average score of the midterm examinations of Mathematics and proposed natural science subjects. The responses were encoded into two continuous variables: 'APS45' and 'APSVNEN'. Statistics show that 'APS45' ranges from 2.3 to 10 and 'APSVNEN' from 3.5 to 9.8; note that exams in Vietnam are graded on a scale of 0 to 10, with 10 being the highest score. Figure 1 is a histogram of 'APS45' in which the horizontal axis presents levels of performance and the vertical axis is the number of students. Each column is the number of students at a specific score. Few students have a below-average score (≤ 5 points), so their columns are short (accounting for ~8%). For most students, the average score of the most recent 45-minute tests ranges from 6 to 8 points (~53.8%).
Next, we investigated the future professional orientation of students through the question: "In the future, which job would you like to do the most?" ('FutureJob'). A variety of professions was proposed by the students, such as doctor, policeman, scientist, teacher, and so on. Depending on particular research purposes, the professions can be categorized into different groups.
We also examined information about students' parents: age, academic level, and profession of the father ('AgeFat', 'EduFat', 'CareerFat'); and, similarly, age, academic level, and profession of the mother ('AgeMot', 'EduMot', 'CareerMot'). Academic level contains 4 items: Under high school ('UnderHi'), High school ('Hi'), Undergraduate ('Uni'), and Graduate school ('PostGrad'). The ages of the parents are coded as continuous variables. The highest age of a father is 70 years old, while for a mother it is 76 years old. 'EcoStt' shows the economic condition of the family as perceived by the students themselves, consisting of three categories: Poor ('poor'), Average ('med'), and Rich ('rich'). For further detail, we investigated whether or not the students were aware of the average monthly income of their household ('KnowledgeInc'); the answers are 'yes' or 'no'. 185 out of 1676 students were able to estimate this average monthly income, in which case their estimate is recorded in the variable 'EstIncome' (unit: million VND).
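Because most group (1) variables take values from a fixed set of items, a small codebook check can flag mistyped entries before analysis. The sketch below is in Python rather than the R used in the paper; the item labels are those listed in the text, but the assumption that the data file stores them with exactly these spellings, as well as the helper name `invalid_fields` and the sample record, are hypothetical.

```python
# Allowed items per categorical variable, as described in the text.
# Assumption: the data file stores the items with exactly these spellings.
CODEBOOK = {
    "Sex": {"male", "female"},
    "EduFat": {"UnderHi", "Hi", "Uni", "PostGrad"},
    "EduMot": {"UnderHi", "Hi", "Uni", "PostGrad"},
    "EcoStt": {"poor", "med", "rich"},
    "KnowledgeInc": {"yes", "no"},
}

def invalid_fields(record):
    """Return the sorted names of coded fields whose value is not in the codebook."""
    return sorted(
        name
        for name, allowed in CODEBOOK.items()
        if name in record and record[name] not in allowed
    )

# Hypothetical record for illustration; 'college' is not a valid 'EduMot' item
record = {"Sex": "female", "EduFat": "Uni", "EduMot": "college",
          "EcoStt": "med", "KnowledgeInc": "no"}
problems = invalid_fields(record)
```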

Group (2) Questions
Group (2) questions aim to examine the reading preferences and habits of students. After the question "Do you like reading?" ('Readbook'), with two choices 'yes' or 'no', students are invited to express their reading preferences by choosing from a list of reading topics ('Topic'), which includes: math-physics ('math.phy'), literature ('literality'), foreign languages ('language'), natural sciences, chemistry, and biology ('nat.chem.bio'), history and geography ('his.geo'), and information technology ('tech'). There are 79 students who did not choose any of the provided topics; their responses are encoded as 'notans'. Figure 2 shows the distribution according to preferred reading topics. Each column represents a topic and the number of students interested in that topic. Literature is the most popular topic (around 500 out of 1676 students). Moreover, each column is divided into two colors: blue shows the number of students who enjoy reading, while orange represents students who do not like to read. More than 90% of the students claim to enjoy reading, as illustrated by the domination of the blue area over the orange. On the other hand, most of the 79 students who did not give an answer on their preferred reading topic did not like reading.
Data 2019, 4, 49
In order to evaluate the reading habits of the students, we surveyed the students on their daily pastimes. The 'Hobby' variable represents the activities that students like to do the most during their free time: reading ('a'), watching TV or listening to music ('b'), doing housework or helping at the farm ('c'), observing nature ('d'), hanging out with friends and family ('e'), and other ('f'). Notably, while Figure 2 shows that a large proportion of students like reading, Figure 3 reveals that only 331 out of 1676 students (20% of the sample) consider reading their favorite hobby. Item 'b' is the mode of the 'Hobby' variable, meaning that students prefer watching TV or listening to music (up to 47%) to reading. As in Figure 2, the columns in Figure 3 are divided into two colors: blue for male and orange for female. It can be observed that more female students like reading, doing housework, or helping at the farm, while male students prefer watching TV, listening to music, and hanging out with friends.
Daily time spent reading science books ('TimeSci') and daily time spent reading literature and social sciences-related books ('TimeSoc') were divided into three items: 'less30' for reading time under 30 min, 'b3060' for reading time from 30 min to 60 min, and 'g60' for more than 60 min. All durations are personal estimates provided by the students themselves. Half of the students (50%) spend more than 30 min/day on science books, while only 36% of the students spend more than 30 min/day reading literature and books related to social sciences. Besides personal preferences and passions, the amount of time spent reading also depends on a person's literacy [1].
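The three reading-time items can be expressed as a simple recoding rule. The Python sketch below is illustrative only: the function name `reading_time_item` is hypothetical, and the handling of the exact boundaries (30 and 60 minutes both falling in 'b3060') is an assumption based on the wording "from 30 min to 60 min" and "more than 60 min".

```python
def reading_time_item(minutes):
    """Map a daily reading time in minutes to the 'TimeSci'/'TimeSoc' item code.

    Assumption: 30 and 60 minutes both belong to 'b3060'.
    """
    if minutes < 0:
        raise ValueError("reading time cannot be negative")
    if minutes < 30:
        return "less30"
    if minutes <= 60:
        return "b3060"
    return "g60"

# Hypothetical self-reported durations, for illustration only
codes = [reading_time_item(m) for m in (10, 30, 45, 60, 90)]
```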
Students were also asked to give a 'yes' or 'no' response to the questions of whether their parents buy them books ('Buybook') and whether their parents read them stories at home ('Readstory'). Scientists have found that parents reading to children is associated with the development of children's language, reading, and writing ability [2][3][4][5]. This dataset shows that while 1447 students were given books by their parents, only 424 students have ever been read or told a story by their parents.
We further examined the reading preferences of the students through the question: "Besides textbooks, if someone offers to gift you a book, what kind of book would you choose?". The nominal variable is 'Typebook' and the choices were Novel ('a'), Biography ('b'), Popular Science ('c'), Arts ('d'), Vocational instruction ('e'), and Other ('f'). Up to 41% of the students chose Novel, while only 7% chose Biography. The rest of the proposed choices are almost equally shared at around 10% each. Regarding the reason for their book choice ('Reason'), the following options were proposed: 'a' = personal preferences, 'b' = recommended by parents, 'c' = recommended by teachers/friends, 'd' = serendipity. Most of the students (up to 85%) choose books based on their personal preferences.
Table 1 provides more information drawn from the dataset. We asked students about their source of book supply ('Source'). Most students (approximately 61%) access books by borrowing from friends or libraries ('borrow'). Other common sources include buying books using their own or their parents' money ('buy') (about 37%), and receiving books as gifts or rewards ('gift') (about 25%).
Simply reading a book is one thing, but thoroughly understanding its message and implications is a different task, especially for younger audiences. Many studies have provided significant insights on critical reading and on how to read properly [6,7]. Each individual has a unique way to read critically, and the differences reflect the ways in which the brain processes and stores information [8]. We aim to explore how students understand a book and react to its content by asking about the first action the student takes when they come across a piece of content that piques their interest ('PrioAct'). The options proposed are: telling friends or family ('a'), taking notes ('b'), applying the content to daily life ('c'), and reflecting and relating to personal knowledge ('d'). In addition, the variable 'AftAct' shows their first action after finishing a good book: 'a', find more books on the same topics; 'b', find books about related topics; 'c', find books on new topics; 'd', read the book again. Most of the students would tell friends or parents when they come across something interesting, and would find more books on the same topics after finishing a good book.
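In order to understand students' wishes regarding books, we asked them to answer the following questions: (1) 'EncourAct': "Do you like participating in reading promotion activities?", with the answers 'yes' or 'no'; (2) 'MostlikedAct': "If yes, then which of the following activities would you rather take part in?": Book exhibition ('a'), Storytelling competition ('b'), Story-writing competition ('c'), Drawing book illustrations ('d'); (3) 'Bookcase': "How would you describe the common bookshelf in your classroom?": Diverse and interesting ('a'), Missing good titles ('b'), Lacking books ('c'), No bookshelf ('d'). The statistics show that around 90% of the students enjoy participating in reading promotion activities. However, 71 students did not answer question (2); of the total 1605 responses, book exhibition accounted for around 40%.
Finally, there are two open-answer questions: "Name two books you like the most" ('Read.like'); and "Name three books you really wanted to read but have never had the chance to because they are not available." ('Notread.like').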

Potential Research Questions
Reading is a healthy hobby that contributes positively to vocabulary learning aptitude, reading and writing skills, and cognitive development [9][10][11]. Specifically, a study examining 288 third grade students reported that, despite similarities in their self-evaluation of reading ability, young female students value reading more than their male counterparts [12]. Elementary students show more interest in reading than secondary or high school students [13]. Moreover, researchers have highlighted children's lack of access to science and philosophy books despite the role of such books in children's development of scientific perception [14].
Based on the dataset, possible research questions include the following:
• What socio-economic and socio-cultural factors could affect the reading habits of students?
• How do the habits, preferences, and wishes of students influence the efficacy of reading promotion campaigns and events?
• How does birth order associate with reading habits?
• How do reading habits associate with academic performance and professional orientation?
• Is there any difference in reading habits conditioned on biological sex? For example, do female students behave differently when coming across interesting content?
• How does students' perception of classrooms' bookshelves associate with reading habits?

Methods
To the best of our knowledge, there is presently no thorough dataset containing adequate information on the reading behavior of Vietnamese students. Therefore, we conducted a survey named "Studying the reading preferences and habits of students" at 16 junior high schools in Northern Vietnam. The logic for designing the study is similar to what is described in [15][16][17][18]. The survey was divided into two phases, and this dataset was acquired in the first phase. The first phase started in December 2017 and finished with 8 schools in January 2018. The list of high school students is provided in Table A1. In the first phase, we collected 1676 responses for this dataset.
The subjects were selected randomly and participated by giving their responses to the questionnaires. Examples of the questionnaires are provided in Supplementary Files S1-S4. Surveyed students were required to write down their school name before answering the questions. The questionnaires consist of 5 parts with 4 to 7 questions in each part. The participants are junior high school students, so we constructed short and easy-to-understand questions, and asked the students to answer as truthfully and accurately as possible. After collecting the questionnaires, two members of the research team input the data into MS Excel for validation. Hence, the dataset matches the answers in the questionnaires.
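The text does not specify how the two team members validated the typed data. One common approach, shown here purely as an assumption and not as the authors' documented procedure, is to compare two independently typed versions of the same questionnaires and list every disagreement. The Python helper `entry_mismatches` and the sample rows are hypothetical.

```python
def entry_mismatches(entry_a, entry_b):
    """List (row_index, field, value_a, value_b) wherever two typed versions differ."""
    if len(entry_a) != len(entry_b):
        raise ValueError("both versions must contain the same number of rows")
    mismatches = []
    for i, (row_a, row_b) in enumerate(zip(entry_a, entry_b)):
        # Check every field present in either version of the row
        for field in sorted(set(row_a) | set(row_b)):
            if row_a.get(field) != row_b.get(field):
                mismatches.append((i, field, row_a.get(field), row_b.get(field)))
    return mismatches

# Hypothetical rows typed by two people, for illustration only
typed_a = [{"Sex": "male", "Grade": "6"}, {"Sex": "female", "Grade": "7"}]
typed_b = [{"Sex": "male", "Grade": "6"}, {"Sex": "female", "Grade": "8"}]
diffs = entry_mismatches(typed_a, typed_b)
```

Each reported mismatch would then be resolved by checking the paper questionnaire.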
The project consists of 5 periods:

Materials and Methods
Raw data from the questionnaire were input as an MS Excel file (see Supplementary File S5). The data were then saved in CSV format (see Supplementary File S6) for processing in R. The analysis employed the baseline-category logit (BCL) model [19].
The majority of the dataset is discrete data, which take their values from categories built according to the design of the survey. Thus, the most suitable method of analysis for this kind of data is categorical regression, in which log-linear and logistic models can be equivalent [20]. However, the logistic regression model is more efficient in explaining the relationships, either independence or association, among variables. Moreover, logistic regression analysis also provides coefficients for estimating the probabilistic trends of each value of the dependent variable according to the conditions of the independent variables. Logistic regression analysis is more flexible for analyzing mixed sets of nominal/ordinal and interval variables [21]. For a more detailed discussion of the comparative advantages of the two methods of analysis, please see [19][20][21].
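The general equation for the logistic model, written in the standard baseline-category form of [19], is:

$$\ln\frac{\pi_j(x)}{\pi_J(x)} = \alpha_j + \beta_j^{\top} x, \qquad j = 1, \ldots, J-1,$$

in which $x$ denotes the independent variables, $\pi_j(x) = P(Y = j \mid x)$ is the corresponding probability, and category $J$ serves as the baseline. Therefore, $\pi_j = P(Y_{ij} = 1)$, with $Y$ being the dependent variable.
In the logit model, the probability of each item of the dependent variable was estimated as follows:

$$\pi_j(x) = \frac{\exp(\alpha_j + \beta_j^{\top} x)}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \beta_h^{\top} x)}, \qquad j = 1, \ldots, J-1.$$

The regression in the statistical software R was run on a frequency distribution table of the factors in CSV format, similar to Supplementary Files S7 and S8 in the dataset. This file was used so that the baseline item of each variable could be adjusted. The model can also be run on the original CSV file; however, the baseline item is then set by default and cannot be changed.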
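A frequency distribution table of the kind mentioned above can be built by counting responses for each combination of factor levels. The Python sketch below only illustrates that aggregation step (the paper's own processing was done in R); the exact layout of Supplementary Files S7 and S8 is not described in the text, so the output format, the helper name `frequency_table`, and the toy rows are all assumptions.

```python
from collections import Counter

def frequency_table(rows, factors, response):
    """Count each response level for every observed combination of factor levels."""
    counts = Counter((tuple(r[f] for f in factors), r[response]) for r in rows)
    levels = sorted({level for _, level in counts})
    cells = sorted({combo for combo, _ in counts})
    table = []
    for combo in cells:
        row = dict(zip(factors, combo))
        for level in levels:
            row[level] = counts[(combo, level)]  # Counter gives 0 for unseen pairs
        table.append(row)
    return table

# Hypothetical raw responses for illustration only, NOT taken from the dataset
rows = [
    {"Sex": "male", "Grade": "6", "Readbook": "yes"},
    {"Sex": "male", "Grade": "6", "Readbook": "no"},
    {"Sex": "female", "Grade": "6", "Readbook": "yes"},
    {"Sex": "female", "Grade": "6", "Readbook": "yes"},
]
table = frequency_table(rows, ["Sex", "Grade"], "Readbook")
```

Each output row holds one combination of factor levels together with the number of 'yes' and 'no' answers, which is the grouped form typically passed to a categorical regression routine.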

Examples of Data Analysis
The logistic regression method is employed with reading habits ('Readbook') as the dependent variable, and biological sex ('Sex') and school grade ('Grade') as independent variables. The regression coefficients are reported in Table 2. The results in Table 2 were obtained with code written in R. Based on the model, estimation of conditional probability produced Figure 4 (the detailed probabilities are given in Table A2).
Table 3 presents another example, estimating the impacts of socio-economic conditions and biological sex on students' reading habits. The dependent variable is students' reading habits ('Readbook'), and the independent variables are economic conditions ('EcoStt') and biological sex ('Sex').
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1; z-value in square brackets; baseline category for "Sex": "Female"; and for "EcoStt": "med". Residual deviance: 3.191 on 2 degrees of freedom.
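Conditional probabilities such as those plotted in Figure 4 follow from fitted coefficients through the baseline-category logit formula, in which each non-baseline category j has probability exp(a_j + b_j·x) / (1 + sum over h of exp(a_h + b_h·x)) and the baseline category takes the remainder. The Python sketch below illustrates that calculation only; the function name `bcl_probabilities` is hypothetical, and the coefficient values are made up for the example and are NOT the estimates in Table 2 or Table 3.

```python
import math

def bcl_probabilities(x, alphas, betas):
    """Return category probabilities under a baseline-category logit model.

    x      : list of predictor values (e.g. dummy codes)
    alphas : intercepts for the J-1 non-baseline categories
    betas  : one coefficient list per non-baseline category
    The last element of the result is the baseline category's probability.
    """
    scores = [a + sum(b_k * x_k for b_k, x_k in zip(b, x))
              for a, b in zip(alphas, betas)]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # baseline category
    return probs

# Hypothetical coefficients for a binary response ('yes' vs baseline 'no');
# NOT the fitted values reported in the paper
probs = bcl_probabilities(x=[1, 0], alphas=[2.0], betas=[[-0.5, 0.3]])
```

With a two-category response such as 'Readbook', the formula reduces to the ordinary logistic function of the linear predictor.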

Conclusions
The dataset provides a comprehensive view of the reading practices, habits, and preferences of students in Vietnam, a subject that to this day lacks systematic research despite being acknowledged as a crucial aspect of education. The inclusion of both self-reported personal preferences and objective data on school performance in the form of grades allows for multifaceted analyses. In addition to data on the activity of book reading, we have also collected demographic data and recorded the socio-economic background of the respondents, including estimated household income and parents' profession. We believe this dataset has one of the most complete sets of variables among datasets in the same field in Vietnam, which would be especially beneficial in future models that require extensive control variables.
In terms of content, our dataset promises fertile grounds for future research on reading practices in adolescents both as a response variable-to demographic factors, family background, or personal preferences, for example-and a predictor variable-to, among others, academic performance or

Figure 1. Average score of the most recent 45-minute tests of Mathematics, Physics, (Chemistry), and Biology.

Figure 2. Distribution of students according to their preferred reading topics and their reading hobby.

Figure 3. Distribution of students according to favorite hobbies.

Part 1: Personal information; Part 2: Family information; Part 3: Habit/Preference; Part 4: Book choosing habits; Part 5: Questions on classroom's bookshelf. Questions can be either multiple-choice with only one single answer allowed, or open-answer.

(1) Questionnaire design; (2) Surveying the students; (3) Validating and checking the quality of the answers; (4) Data design and input; (5) Mining the data and analyzing the results.

Figure 4. Probability to like reading according to biological sex and grade.

Table 1. Distribution of students according to demographic and reading behavior.