Introduction
Ninety-five percent of the world’s illiterate people live in developing countries (Verner, 2005). India’s literacy rate is 78 percent (NSO, 2020), but the quality of literacy is extremely low. Sixty percent of “literates” cannot read simple texts, much less a newspaper (Kothari & Bandyopadhyay, 2010). Children fall behind in reading from the early grades, and weak skills only erode later in life for lack of reading practice. The nation and practically every state confront weak reading at scale, and the overwhelming majority of non-literates and weak readers are rural and female (Chandra, 2019). The long-term goal of the Billion Readers (BIRD) initiative at the Indian Institute of Management, Ahmedabad (IIM-A) is to ensure daily and lifelong reading practice for a billion people in India. To achieve this goal, BIRD aims to scale up Same Language Subtitling (SLS) on all existing entertainment content on television and streaming platforms, expressly for reading literacy practice and skill improvement at population scale. This is a global first in some important respects.
To date, no country’s TV or streaming policy has implemented text on screen explicitly for mass reading literacy. Several majority English-speaking countries have implemented captions for media access among the Deaf and Hard of Hearing (DHH) (NCI, 2020). Since Price (1983) first researched the availability of captions for learning English as a Second Language (ESL), a large body of evidence has supported captions for second language learning (Burger (2022) maintains a comprehensive online bibliography).
The suggestion of leveraging captions for reading literacy among the DHH (Cooper, 1973) and the hearing (Koskinen, 1985) is nearly as old as captioning itself. However, the use of captions to strengthen reading has been far less researched than their use for language learning and has certainly not driven national broadcast policy on mainstream TV or streaming, anywhere in the world, for mass reading literacy. The ‘SLS’ project was first conceived in India in 1996 with the intent to move national media policy and impact reading literacy at scale (Kothari, 1998). Existing terms like ‘bimodal’ and ‘intralingual’ subtitling, while meaning exactly the same thing, did not suit the project’s need for a self-evident term that spoke widely to policy makers and viewers alike.
SLS is the idea of subtitling audio-visual (AV) content in the ‘same’ language as the audio. What you hear is what you read. SLS means Hindi subtitles on Hindi content, Tamil subtitles on Tamil content, and likewise on all existing and popularly watched Indian language content such as films, serials, cartoons, and songs. In India and globally, most entertainment content in English is available with SLS. That is not the case for content in most of the world’s other languages, especially those written in non-Roman scripts. Streaming platforms in India offer translation subtitles in a number of languages, but not in the ‘same’ audio language.
SLS is a deceptively simple social innovation with the power to transform a large population of struggling readers into functional and even fluent readers. Globally, the enormous value of SLS for reading literacy at population scale remains mostly untapped. After field testing in villages and poor urban communities, researchers at IIM-A piloted SLS on TV in Gujarat state in 1999 and found it to be an effective intervention for improving reading skills (Kothari et al., 2004). Since then, other studies have confirmed that frequent matching text-sound exposure improves reading (e.g., Kothari & Bandyopadhyay, 2014).
The theory of change driving BIRD is that a population that engages in reading every day, all through life, cannot remain weak at reading. Currently there is nothing that ensures daily reading practice for all Indians across the average lifespan of 70 years. If all the entertainment content on TV and streaming platforms carried SLS, millions of viewers would automatically try to associate the matching text and sound. Reading skills would remain in a constant state of reinforcement.
To our knowledge, eye-tracking studies of beginning or weak readers’ viewing of subtitles are rare, if they exist at all, anywhere in the world. Most eye-tracking studies on dynamic texts assume functionally reading subjects, and hardly any have been conducted in a developing country (Kruger et al., 2015). Negi and Mitra’s (2020) study in India is with good readers with a minimum 10th Grade education. Our present eye-tracking study in rural Rajasthan, with weak readers – those who can decode some letters or simple words but struggle to read the words in a 2nd Grade level text as single units, i.e., they lack the ability to unitize (Ehri, 2005) – may be a first.
An earlier eye-tracking study (Kruger & Steyn, 2014) found that viewers who saw videos with SLS and read them had better comprehension than those who saw the same videos but did not read the SLS. The participants were literate college students in South Africa who spoke English as a second language. The study was conducted in the context of English subtitles on academic lectures delivered in English. It demonstrated the educational potential of SLS in reading instruction and language learning. In rural India, the primary source of entertainment is native language films and television programs. Could SLS on entertainment content be an approach to improved reading skills?
Most analyses in previous studies focused on the change in reading performance but did not eye-track the reading engagement with SLS. This paper analyzes the processing of SLS on popular Hindi songs and dialogs with eye gaze metrics. In addition to fixation and scan path measures, we investigated regressions, saccade amplitudes, and revisits. These metrics helped us examine the reading behavior of participants and the proportion of engagement with SLS. We developed a visual analytic tool to investigate eye-tracking data. The tool provided information about eye gaze patterns that would otherwise have gone unnoticed. For example, the tool could differentiate, with respect to subtitle reading in videos with SLS, between the reading patterns of weak readers who can read with difficulty (functional weak readers) and very weak readers who cannot read functionally (poor readers).
We discuss the related work on SLS in Section 2 followed by an overall design of our user study in Section 3. The methodology is discussed in Section 4. Section 5 presents our analysis and results followed by concluding remarks in Section 6.
Related Work
The most sustained push for SLS on TV for reading literacy has been in India, through research, pilot TV implementations in 10 Indian languages, and evidence-based policy advocacy (Kothari & Bandyopadhyay, 2020). However, the positive impact of SLS use on reading literacy has also been affirmed in research studies from other countries (Koskinen et al., 1997; Linebarger et al., 2001, 2010; Parkhill & Davey, 2014). Taken together, these studies make a strong case that routine and regular exposure to SLS can have an impact at scale on a population’s reading development.
Previous studies have found that the presence of SLS on popular entertainment content causes automatic reading skill practice and improvement among viewers who can minimally recognize letters (Kothari et al., 2004; Kothari & Bandyopadhyay, 2014). Moreover, there is compelling evidence that SLS also promotes language learning and media access among the Deaf and Hard of Hearing (DHH) (Danan, 2004; Gernsbacher, 2015; Pavakanun, 1992; Vanderplank, 2016). While any of these benefits provides a strong rationale for SLS, the evidence for all three makes it a compelling proposition on TV and streaming platforms.
SLS on popular entertainment content is a critical aspect of BIRD because of two factors that powerfully drive reading skill acquisition: first, an inordinate amount of practice, and second, text embedded in a context that the reader will be passionate about for life.
Toste et al. (2020) review the powerful bi-directional relations between motivation and reading skill: each drives the other. For proficient reading, grapheme-phoneme associations need to fire frequently and sufficiently over a long enough period of time to achieve and sustain automaticity. As Frey and Fisher (2010) state, “When we experience something, neurons fire. Repeated firings lead to physical changes in the brain that, over time and with repetition, become more permanent.” They further point out that “The challenge, of course, with automaticity is to not allow repetition to turn into a rut.”
When reading acquisition occurs in a person’s first language, the auditory-to-language pathways are well established in the brain. In a weak reader, the letter-to-sound-to-meaning pathways are weak (van Atteveldt et al., 2009). Reading fluency and comprehension are achieved over a long period of sustained exposure to congruent letter-sound correspondence and decoding practice. As decoding approaches fluency and automaticity, cognitive resources are freed to focus on the key task of meaning-making. SLS on popular songs, nursery rhymes, and repeatedly watched cartoons has the added advantage of reading practice with predictable sound-to-text reinforcement, on content of high interest (Hill-Clarke & Robinson, 2004; Iwasaki et al., 2013). The visual and auditory pathways involved in reading are strengthened incidentally, subconsciously, and inevitably, as a by-product of entertainment.
There are few longitudinal studies on the impact of captions on reading skills (Koskinen, 1985). Linebarger et al.’s (2010) study indicated that children who viewed video with captions improved their reading faster than counterparts who viewed without captions, and the improvement was most pronounced among children at risk for poor reading outcomes. Similarly, in New Zealand, Parkhill and Johnson (2009) found that in their six-week ‘AVAILLL’ program for children aged 5-13 years, which uses popular, subtitled movies and accompanying novels to engage students in reading literacy, the greatest gains occurred for ‘low-progress’ readers. A positive impact was also observed for average and higher-level readers.
Kothari and Bandyopadhyay (2014) evaluated the impact of SLS after sustaining it for 5 years on a weekly hour-long program of Hindi film songs telecast nationally in prime time. Among school children who could not read a single letter in Hindi at the baseline (2002), 70% in the high-SLS viewing group became functional readers by the end line (2007), as compared to 34% in the low-SLS group. The benefits of SLS or Closed-Captioning are not limited to reading literacy. The range of benefits attributable to SLS includes reading, media access, and language acquisition (Gernsbacher, 2015).
A key finding of eye-tracking research on SLS and subtitling in general is that viewers engage with the on-screen text automatically (d’Ydewalle & de Bruycker, 2007; Schroyens et al., 1999a, 1999b). Viewers cannot ignore the subtitles in movies, although they may look away periodically. It is important to note, however, that all these studies were conducted with subjects who could read. Reading along with SLS is preferred because of the efficiency in following and understanding the movie. Evidence from such eye-tracking studies enables us to attribute possible learning outcomes to the subtitles. d’Ydewalle and de Bruycker (2007) reported that subtitle reading is inescapable and that viewers have little difficulty distributing visual attention.
Several studies have explored the reading behavior of participants and investigated to what extent they read subtitles (Kruger & Steyn, 2014; Pavakanun, 1992). These studies primarily limit the eye-tracking metrics to fixation count, fixation time, scan path, and average fixation duration in two areas of interest (AOI): the subtitle band and the image.
Negi and Mitra’s (2020) eye-tracking study of subtitled videos revealed that fixation duration can be a useful metric to trace the learning process. Other eye-tracking studies investigate the impact of audio on the reading of subtitles (Liao et al., 2022) and make recommendations for future cognitive research in the field of audiovisual translation (Kruger et al., 2015).
Reading along with SLS is inescapable among good readers (d’Ydewalle et al., 1991). This pioneering study found that American subjects watching an English movie with SLS and Dutch subjects watching a Dutch movie with SLS spent considerable time in the subtitle area. Reading SLS was inevitable and comparable for both groups, even though the Dutch subjects had much more experience with subtitles on TV. Reading SLS did not depend much on habit formation. The critical question for us is: would struggling readers, especially those from economically disadvantaged backgrounds, also engage automatically with SLS?
PlanetRead (2018) completed an eye-tracking study of government school children in Grades 2-5 in rural Rajasthan, India, by showing them animated stories with and without SLS. Almost all viewers – beginning, struggling, or good readers – automatically engaged with SLS and could not ignore it.
User Study
We undertook user studies with participants from remote villages in Rajasthan. We approached an NGO named Dusra Dashak that has been working in Abu Road, Rajasthan for over 15 years and has an established on-the-ground connection. The head of the NGO advised us on the villages where we could conduct our eye-tracking study. In the selected villages we first obtained the required permissions, with Dusra Dashak’s help and the implicit support of the village head. We went door-to-door mainly to identify individuals who had completed their schooling but were still weak readers. As a quick filter, we asked villagers to read a simple Grade 2 level paragraph. Those who struggled to read the text were considered weak readers, including poor readers, and were selected as study participants. Each participant individually watched video clips of Hindi songs and dialogs, with and without SLS, on a computer monitor. We collected ocular data while participants watched both versions of the video. We then conducted both statistical and visual data analysis, described in the next section, to address the following four questions.
- (1) What proportion of viewers’ attention is divided between two regions (video and SLS) while watching a video with SLS?
- (2) How can we quantify the amount of engagement with the subtitles?
- (3) How can we differentiate between functional weak readers and poor readers?
- (4) Is there any difference in engagement when the SLS are highlighted?
Table 1. Media descriptions of audio-visual clips.
Content | Movie Name | Song Name | Duration
---|---|---|---
Dialogue | Baahubali | Not applicable | 64 secs
Song | Besharam | Dil ka jo haal hai | 67 secs
Song | Abhimaan | Teri bindiya re | 63 secs
Song | Dhadkan | Tum dil ki dhadkan | 42 secs
Dialogue | Sholay | Not applicable | 100 secs
Song | Karz | Kamaal hai | 94 secs
Results
This section lists the indicators and parameters that can answer the questions posed earlier about the reading behavior that results from SLS. In the subsections below, we first list the parameters and then discuss the results of our statistical analysis to justify why the selected indicators can address our research questions. We also report a visual analysis of the eye-tracking data to support the results. For the analysis, we calculated the average values of the indicators or parameters for all 136 participants. We prepared a table of 12 columns corresponding to two versions (SLS and No SLS) of six unique videos and 136 rows corresponding to the 136 participants.
We further analyzed the data statistically with two independent variables, each with two levels: (a) Video: with and without SLS, and (b) AOI: AOI_SLS and AOI_Video. In particular, we undertook a Video(2) × AOI(2) repeated-measures ANOVA to identify statistically significant differences between the experimental conditions. We report the statistical results in the sections below along with other results.
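As an illustration of this 2 × 2 within-subjects layout, the sketch below computes the Video × AOI interaction contrast on hypothetical dwell times (all numbers invented). A full repeated-measures ANOVA would additionally partition subject variance, for example with statsmodels’ AnovaRM.

```python
# Sketch of the 2x2 within-subjects layout: each participant contributes
# one value (e.g., dwell time in seconds) per Video x AOI cell.
# All numbers below are hypothetical.
data = {
    "P1": {("SLS", "AOI_SLS"): 21.0, ("SLS", "AOI_Video"): 39.0,
           ("NoSLS", "AOI_SLS"): 4.0, ("NoSLS", "AOI_Video"): 55.0},
    "P2": {("SLS", "AOI_SLS"): 18.0, ("SLS", "AOI_Video"): 42.0,
           ("NoSLS", "AOI_SLS"): 6.0, ("NoSLS", "AOI_Video"): 52.0},
    "P3": {("SLS", "AOI_SLS"): 25.0, ("SLS", "AOI_Video"): 35.0,
           ("NoSLS", "AOI_SLS"): 5.0, ("NoSLS", "AOI_Video"): 54.0},
}

def cell_mean(video: str, aoi: str) -> float:
    """Mean of one Video x AOI cell across participants."""
    vals = [cells[(video, aoi)] for cells in data.values()]
    return sum(vals) / len(vals)

# Video x AOI interaction contrast: how much the SLS-vs-NoSLS difference
# itself differs between the two AOIs.
interaction = ((cell_mean("SLS", "AOI_SLS") - cell_mean("NoSLS", "AOI_SLS"))
               - (cell_mean("SLS", "AOI_Video") - cell_mean("NoSLS", "AOI_Video")))
print(round(interaction, 2))
```

A large interaction contrast is what the ANOVA's significant video × AOI effect quantifies: adding SLS raises dwell time in the subtitle AOI while lowering it in the video AOI.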
What Proportion of Viewers’ Attention Is Divided Between Two Regions (Video and SLS) While Watching a Video with SLS?
Indicators considered for this question are explained below:
- (1)
Change in the number of fixations and saccades in AOI_Video
Participants who try to read subtitles while watching a video with SLS are expected to devote more time to AOI_SLS. Participants’ fixations and saccades at AOI_Video would be fewer if they try to scan subtitles in AOI_SLS while watching a video with SLS. Considering both fixations and saccades ensures that participants were indeed trying to concentrate on the region rather than just moving their eyes.
Figures 6 and 7 show how fixation counts can change in both AOIs for the same video, without SLS and with SLS. The change in participants’ fixations and saccades at AOI_Video was calculated by subtracting the fixation numbers in the video without SLS from those in the video with SLS. We calculated the change for each pair of videos (with and without SLS) across all the participants and videos.
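The per-participant computation can be sketched as follows (fixation counts are hypothetical; the study computes this for all 136 participants and six video pairs):

```python
# Hypothetical per-participant fixation counts in AOI_Video for one
# video pair: the same clip without SLS and with SLS.
fixations_no_sls = {"P1": 180, "P2": 150, "P3": 200}
fixations_sls = {"P1": 120, "P2": 110, "P3": 140}

def pct_change(no_sls: dict, sls: dict) -> float:
    """Average percentage change in AOI_Video fixations after adding SLS."""
    changes = [(sls[p] - no_sls[p]) / no_sls[p] * 100 for p in no_sls]
    return sum(changes) / len(changes)

# A negative value means attention moved out of the video region.
print(round(pct_change(fixations_no_sls, fixations_sls), 1))
```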
Table 3 shows the average percentage of change in eye movements across all participants for the six pairs. In four of the six video pairs, the percentage change in eye movements is more than 25%. We also undertook a paired-samples t-test for each pair of videos to statistically investigate the change in eye movements in AOI_Video. The results listed in Table 4 indicate that the eye movements in AOI_Video are significantly different within each of the six video pairs.
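A paired-samples t-test of this kind can be run with SciPy; the fixation counts below are hypothetical stand-ins for one video pair:

```python
from scipy.stats import ttest_rel

# Hypothetical AOI_Video fixation counts for four participants watching
# the same clip without SLS and with SLS (paired observations).
no_sls = [180, 150, 200, 170]
with_sls = [120, 110, 140, 135]

# Paired-samples t-test, as used per video pair in the study.
t_stat, p_value = ttest_rel(with_sls, no_sls)
# A negative t statistic indicates fewer AOI_Video fixations with SLS.
print(t_stat < 0, p_value < 0.05)
```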
- (2)
Total time spent in AOI_Video
The time spent by participants in AOI_Video while watching a video without SLS is expected to be greater than with a video with SLS. Based on this assumption, the total time spent could help identify readers’ attention in both AOIs. We computed the differences in time spent in AOI_Video between each pair of videos (with and without subtitles), as this AOI is constant in both conditions. The difference expresses the divided attention of participants due to the inclusion of subtitles.
Table 5 shows the average time differences in AOI_Video across all participants for each pair of videos. After undertaking a repeated-measures ANOVA, we found a significant difference for the interaction effects of video × AOI: F(1, 135) = 287.42, η² = 0.68, p < 0.05. We also found that the video AOIs are significantly different for the time spent: F(1, 135) = 754.77, η² = 0.84, p < 0.05. We also undertook a pairwise comparison test, least significant difference (LSD), and found that AOI_Video in the video without subtitles is significantly different from the video with subtitles (p < 0.05).
- (3)
Saccades across AOIs or revisits at AOIs
Participants whose attention is divided between the two AOIs while watching videos with SLS would repeatedly visit the subtitle area. We cannot expect many eye movements between the two AOIs in the videos without SLS, as there were no subtitles; intuitively, however, the introduction of subtitles would increase the eye movements across the AOIs. Comparing saccades across AOIs allows us to investigate how often participants visited the subtitle region while watching videos with SLS. We compared the average number of saccades across AOIs for all participants on all SLS videos and their counterparts. The result conveys that saccadic eye movement across AOIs increases by approximately 60% for videos with SLS as compared to videos without SLS. We undertook a 2 × 2 repeated-measures ANOVA and found a significant difference for the interaction effect of video × AOI: F(1, 135) = 286.53, η² = 0.68, p < 0.05. We further found that the video and subtitle AOIs are significantly different for saccades across AOIs: F(1, 135) = 329.69, η² = 0.7, p < 0.05. After undertaking the LSD test, we found that the video with SLS is significantly different from the video without SLS for AOI_Video (p < 0.05).
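Counting cross-AOI saccades and revisits from an AOI-labeled fixation sequence can be sketched as below (the sequence and the revisit definition are our own illustrative assumptions):

```python
# Hypothetical sequence of fixations labeled by AOI, in time order.
# A change of label between consecutive fixations is a cross-AOI saccade;
# each re-entry into AOI_SLS after the first visit counts as a revisit.
gaze_aois = ["AOI_Video", "AOI_Video", "AOI_SLS", "AOI_SLS",
             "AOI_Video", "AOI_SLS", "AOI_Video", "AOI_Video", "AOI_SLS"]

def cross_aoi_saccades(seq):
    """Number of saccades that cross from one AOI to the other."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def sls_revisits(seq):
    """Number of entries into AOI_SLS after the first visit."""
    entries = sum(1 for a, b in zip(seq, seq[1:])
                  if a != "AOI_SLS" and b == "AOI_SLS")
    # If the sequence did not start in AOI_SLS, the first entry is the
    # first visit, not a revisit.
    return max(entries - (0 if seq[0] == "AOI_SLS" else 1), 0)

print(cross_aoi_saccades(gaze_aois), sls_revisits(gaze_aois))
```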
In What Ways Can We Quantify the Amount of Engagement with SLS?
The indicators used are explained below:
- (1)
Fixation rate and number of fixations
Fixation rate and the number of fixations in AOI_Video could be good indicators for assessing the amount of engagement. We calculated the average number of fixations and the average fixation rate at AOI_Video across all participants for all videos with and without SLS. The fixation rate is calculated as the number of fixations in an AOI divided by the total viewing time.
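A minimal sketch, assuming fixation rate is defined as fixations per second of viewing time within an AOI (the numbers are hypothetical):

```python
def fixation_rate(num_fixations: int, viewing_time_s: float) -> float:
    """Fixations per second within an AOI (assumed definition)."""
    return num_fixations / viewing_time_s

# e.g., 120 fixations in AOI_Video over a 64-second clip:
print(round(fixation_rate(120, 64.0), 3))
```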
Figure 8 compares the average fixation rate across all participants. We found from the repeated-measures ANOVA that the interaction effects of video × AOI are significantly different: F(1, 135) = 153.14, η² = 0.535, p < 0.05. We also found that videos with and without SLS are significantly different: F(1, 135) = 156.02, η² = 0.54, p < 0.05.
- (2)
Total durations in AOI_Video
Viewers would be expected to spend more time in AOI_SLS when they try to read the subtitles. Average and total fixation duration could thus be important variables for quantifying the amount of reader engagement. A higher duration might not always indicate that participants were reading the subtitles; however, it would at least indicate that they were trying to engage with that region while watching the video. We undertook a repeated-measures ANOVA for total durations in AOI_Video across all participants for all videos and found a significant difference for the interaction effects of video × AOI: F(1, 135) = 287.42, η² = 0.68, p < 0.05. We also found that the video AOIs are significantly different for total durations: F(1, 135) = 754.77, η² = 0.84, p < 0.05.
- (3)
Saccade amplitude at AOI_Video
Eye movements in the video region will be fewer if participants try to scan through the subtitles while watching the video. This leads to fewer saccades; hence shorter total saccade lengths are expected in the video area and longer total saccade lengths in the SLS area. We investigated the saccade amplitudes for both versions of each video and found that saccade amplitude decreases significantly in AOI_Video. Table 6 compares the change of saccade length in AOI_Video for each of the six video pairs. We found that the interaction effect of video × AOI is significantly different: F(1, 135) = 221.99, η² = 0.62, p < 0.05. The videos with and without SLS are significantly different for saccade length: F(1, 135) = 16.82, η² = 0.11, p < 0.05. After undertaking the LSD test, we found that both versions of video are significantly different with respect to AOI_Video (p < 0.05).
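Saccade amplitude can be sketched as the Euclidean distance between consecutive fixation centers (coordinates below are hypothetical):

```python
import math

# Hypothetical fixation centers (x, y) in pixels, in time order,
# within AOI_Video.
fixations = [(320, 240), (340, 250), (500, 260), (510, 400)]

def saccade_amplitudes(points):
    """Euclidean distances between consecutive fixation points."""
    return [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

amps = saccade_amplitudes(fixations)
print(round(sum(amps), 1))  # total saccade length within the AOI
```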
How Can We Differentiate Between Functional Weak Readers and Poor Readers?
- (1)
Number of regressions at AOI_SLS
A regression is a saccadic eye movement oriented backward relative to the normal reading direction on the screen (Booth & Weger, 2013). It is also termed a backward eye movement. The standard reading direction for Hindi is left to right on the screen, and we expect participants’ eye movements to follow this direction. Regression could be an indicator for differentiating between functional weak readers (FWR) and poor readers (PR). We expected poor readers to have a higher number of regressions than functional weak readers, as poor readers would read with greater difficulty than their counterparts. We compared the average number of regressions for FWR and PR only on videos with SLS and observed that the number of regressions for both groups is almost similar. The results of the comparison are listed in Table 7. As regressions in AOI_Video would not help us identify the reading behavior of participants, we analyzed regressions only at AOI_SLS. We also undertook a paired-samples t-test and did not find any significant difference between the two groups.
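Detecting regressions from fixation x-coordinates within the subtitle AOI can be sketched as follows (the coordinates and noise threshold are illustrative assumptions):

```python
# Hypothetical x-coordinates (pixels) of consecutive fixations inside
# AOI_SLS while one subtitle is on screen. Hindi is read left to right,
# so a leftward jump between consecutive fixations is a regression.
subtitle_fixations_x = [80, 140, 210, 150, 260, 330, 300, 380]

def count_regressions(xs, min_jump=20):
    """Count leftward saccades larger than a small noise threshold."""
    return sum(1 for x1, x2 in zip(xs, xs[1:]) if x2 < x1 - min_jump)

print(count_regressions(subtitle_fixations_x))
```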
- (2)
Fixations at left AOI_SLS and right AOI_SLS
We noticed from the visualization tool, as shown in Figures 9 and 10, that poor readers have relatively more fixations in the left part of the subtitle area than functional weak readers. We found that the average fixation rate at AOI_SLS_Left is relatively higher for poor readers than for functional weak readers in all videos. However, the difference is not large, as can be seen from Table 8. We also undertook a paired-samples t-test and did not find any significant difference between the two groups.
Is There Any Difference in Engagement When the SLS Are Highlighted?
Among the six videos with SLS, the subtitle words are not highlighted in two videos and are highlighted in four. We considered the number of fixations and fixation durations for this analysis.
- (1)
Number of fixations and durations
As stated earlier, the number of fixations and fixation durations could be good indicators of reading engagement. We used these same variables to compare our two design versions of SLS. We did not find any considerable difference between the two subtitle designs for either parameter.
Discussion
Our study is unique in several respects. First, a number of eye-tracking studies have established automatic reading behavior while watching video content, but in almost all of these studies the subjects were good readers. Our subjects were all weak readers, many of them poor readers, and we established that a majority of them also try to read along with SLS on video.
Second, eye-tracking studies in India generally, and of video consumption in particular, are rare. With a billion TV viewers and 356 million mobile video viewers in 2021 (INMOBI, 2021), and growing fast, eye gaze research on how viewers consume video content, with or without subtitles, or interact with screens, is a fertile area for study.
Third, to our knowledge, there is no eye gaze study conducted with subjects from rural India. Perhaps the reason is that it may seem difficult to transport and operate eye-tracking equipment in rural areas or bring rural subjects to urban centers. Our study has demonstrated that the former is indeed possible.
Impact measurement studies have found that regular SLS exposure among weak readers leads to reading skill improvement over time (Kothari et al., 2004; Kothari & Bandyopadhyay, 2014). Future studies could explore whether eye gaze measurement can also capture reading skill improvement resulting from SLS exposure over time. How would a subject’s reading of SLS on the same video differ between the baseline and the end line, assuming that the subject’s reading skills have improved? The visualization tool proposed in this paper was useful to support the results of the statistical analysis.
This paper presents a detailed eye-tracking study on processing SLS while watching Hindi film songs and dialogs. We analyzed five gaze-based metrics to investigate eye movements of users in the subtitle region of video clips. We also observed the proportion of viewers who try to read along while watching videos. We noticed that regressions among poor readers and functional weak readers are almost similar. We also found significant differences for the interaction effect between the experimental conditions. Poor readers tend to linger more in the left half of the subtitle area. From a policy perspective, a key finding is that both poor readers and functional weak readers shift their eye movements toward the subtitle region with the introduction of SLS in the video clips. We plan to undertake a similar eye-tracking study of SLS film videos to understand changes in eye-gaze patterns when an individual becomes a better reader. This could even be conducted with a subset of our study participants who have measurably improved their reading skills since this study was conducted. Our results could potentially be used to design an adaptive user interface for learning while watching videos with SLS.
Conclusions
In 2019, SLS became a part of India’s Accessibility Standards (India’s Accessibility Standards, 2019), mandating that half of the entertainment content on TV, in every language, state, and channel, carry SLS by 2025. Our finding that weak readers do try to read along with SLS on films supports leveraging the Accessibility Standards for both media access among the DHH and improving the reading skills of over half a billion weak readers in India, an overwhelming majority of them female. The Accessibility Standards, if implemented with the intent to benefit all, will then be designed to benefit all. But if they are framed as good only for the hearing impaired, the implementation risks not leveraging their full potential for the hearing. This perspective is important for policy makers in countries, like India, that are in the process of adopting SLS.
Countries that already have captioning on TV might consider re-designing captions with universal visual appeal. By not doing so, they lose out on a massive opportunity to contribute, at scale, to their populations’ reading literacy and language learning. The hearing also want to turn TV captions on for different reasons, as they now do on streaming platforms. Networks might consider making captions available on content in all languages and turning them on by default, especially on children’s programming, as Turn On The Subtitles (TOTS) is campaigning for in the UK. Finally, parents the world over need to know that SLS or captions can make a massive contribution to their child’s reading literacy and language skills. All they have to do is turn them on whenever possible.