Web traffic and unique web traffic were measured using Google Analytics (GA). Data are set out in Table 2.
Traffic in Y2 increased by 68,824 to 365,024, a 23% improvement on Y1. A 22% improvement in unique traffic was also observed (to 276,042). Y3 also yielded a 23% increase in traffic on Y2 (to 450,520), with unique traffic growing by 26% (to 346,851). The increases in traffic and unique traffic for Y4 were lower than in Y3, at 9% and 10% respectively.
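The year-on-year percentages above follow from the quoted counts. A minimal sketch of the arithmetic, assuming the Y1 total is the Y2 total minus the reported increase; the function and variable names are illustrative and not taken from the study:

```python
def pct_growth(previous: float, current: float) -> float:
    """Percentage change from one reporting year to the next."""
    return (current - previous) / previous * 100

# Counts quoted in the text above
y2_traffic = 365_024
y1_traffic = y2_traffic - 68_824  # assumption: Y1 implied by the reported increase
y3_traffic = 450_520
y2_unique, y3_unique = 276_042, 346_851

print(round(pct_growth(y1_traffic, y2_traffic)))  # 23
print(round(pct_growth(y2_traffic, y3_traffic)))  # 23
print(round(pct_growth(y2_unique, y3_unique)))    # 26
```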
These increases in traffic initially appear to be lower than those reported previously [8], which, for example, reported a Y2 traffic increase of 54%, from 150,408 to 428,407, considerably higher than the 23% improvement reported here. Similar disparities can be observed for Y3 data. However, it should be noted that the alternative segmentation of annual web impact data has altered the spread of traffic data across years, making direct comparisons with previous results problematic. Indeed, while [8] reported a plateauing of traffic (6%) and unique traffic (8%) in Y3, this article instead reports considerable percentage increases of 23% and 26% for Y3, with plateauing of traffic (9%) and unique traffic (11%) observed in Y4. This means that total percentage growth during the entire reporting period of the present study was greater, at 65% and 69% for traffic and unique traffic respectively. This actually exceeds previously reported results but highlights the difficulties which can arise from studying different ‘annual segments’ of data.
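Total growth over the period is the compound of the yearly increases, not their sum. A rough check under that assumption, using the rounded yearly percentages quoted at the start of this section (23%, 23% and 9% for traffic; 22%, 26% and 10% for unique traffic), so small rounding differences from the exact counts are to be expected; the function name is illustrative:

```python
def compound_growth(yearly_pcts):
    """Combine successive yearly percentage increases into one overall percentage."""
    factor = 1.0
    for pct in yearly_pcts:
        factor *= 1 + pct / 100
    return (factor - 1) * 100

traffic_total = compound_growth([23, 23, 9])
unique_total = compound_growth([22, 26, 10])
print(round(traffic_total), round(unique_total))  # 65 69
```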
Google was again found to be the single largest referral source during the reporting period, accounting for 56% of all repository traffic in Y4. Over the entire reporting period this referral traffic (including unique traffic) increased by circa 1500% (Table 2). The most significant referral source thereafter was found to be Google Scholar (GS), equivalent to 26% of all web traffic by Y4 and growing by 1920% during the entire reporting period (Table 2). Much of this massive percentage growth can be observed in Y2, owing to a low baseline in GS traffic during Y1, but significant increases were also observed in Y3 and Y4.
To verify the influence of outlying data points it is worthwhile briefly reviewing the extent of data variability using some common measures of central tendency and dispersion. Table 3 sets out these measures for the total traffic data detailed above in Table 2 (‘Current data—A’) alongside the same measures for data reported in previous work [8], labelled in Table 3 as ‘Prior data—B’. Data used for ‘Prior data—B’ are publicly available [30].
A higher mean and lower standard deviation for total traffic (A: mean = 400,221, SD = 86,594; B: mean = 386,908, SD = 95,203) and unique traffic (A: mean = 308,200, SD = 70,162; B: mean = 296,311, SD = 73,251) can initially be observed within ‘Current data—A’. When Google and GS are considered separately, however, we notice the opposite, with lower mean traffic and higher levels of variability around the mean, highlighting the low baselines in Y1 for both Google and GS.
By excluding Y1’s outlying data from these measures, as we have done in the bottom row of Table 3, we can note a higher mean, and less variability around the mean, for total (SD = 63,516) and unique traffic (SD = 54,458). Similarly, higher means and lower deviations for Strathprints traffic and unique traffic from Google Scholar can be observed. Interestingly, while higher means are observable for traffic and unique traffic from Google, a slightly higher standard deviation is found when compared to ‘Prior data—B’.
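The effect of excluding Y1 can be sketched for total traffic. The Y4 total is not quoted in the text, so the value below is an assumption, back-calculated from the reported mean of 400,221. With that value the sample (n − 1) standard deviation appears to reproduce the reported figures of 86,594 and 63,516:

```python
import statistics

# Annual total traffic, Y1-Y4. Y1-Y3 follow from the text; the Y4 value is an
# assumption, back-calculated from the reported mean (4 * 400,221 minus Y1-Y3).
traffic = [296_200, 365_024, 450_520, 489_140]

def summarise(values):
    """Return (mean, sample standard deviation), both rounded to integers."""
    return round(statistics.mean(values)), round(statistics.stdev(values))

print(summarise(traffic))      # all four years: (400221, 86594)
print(summarise(traffic[1:]))  # Y1 excluded:    (434895, 63516)
```

Dropping the low Y1 baseline raises the mean and shrinks the spread, which is the pattern described for the bottom row of Table 3.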
It is significant to note from Table 2 that referral traffic to Strathprints from GS grew more rapidly during the reporting period than traffic from the general population of other web traffic sources. Even if we were to consider the large growth observed in Y1–Y2 as anomalous and exclude it from the data as an outlier, a 74% and 70% increase in GS referral traffic and unique traffic respectively can still be observed between Y2 and Y4. This exceeds the growth rates in total (34%) and unique total traffic (39%) by some margin. Referral traffic from Google itself also grew rapidly, by 67% and 69% for traffic and unique traffic respectively. This is clearly lower than the figures for GS but nevertheless exceeds the growth rates observed in the wider pool of referral sources and may explain the higher standard deviation noted in ‘Current data—A*’. The especially steep increase in GS traffic and unique traffic can perhaps best be observed in the profile of the chart presented in Figure 2.
4.2. Repository Content Discovery and Usage
Improvements in impressions and clicks were observed in Y2 at 16% (to 4,537,744) and 23% (to 153,539) respectively when compared to the Y1 period. This upwards trend accelerated in subsequent reporting years. In Y3 a 69% (to 7,687,550) and 21% (to 185,232) increase in impressions and clicks respectively can be observed, followed by an 86% (to 14,290,059) and 61% (to 298,020) increase in Y4. This general upwards trend in impressions and clicks, including the aforementioned acceleration in Y3 and Y4, can be observed in Figure 3.
Data are contained in Table 4. The total percentage growth in impressions and clicks during the entire reporting period was 266% and 104% respectively. Figure 4 summarises the increase in clicks, impressions and COUNTER usage; sharper increases in impressions and clicks can be noted between Y2 and Y4.
Strathprints demonstrated a 62% growth in COUNTER-compliant usage during the full period examined (i.e., Y1–Y4). It is noteworthy that this growth was observed despite only a 23% growth in full-text deposits during the same period. Even where embargoed content is factored into total full-text deposits, growth remained lower (54%) than the overall increase in usage. As noted in previous work [8], usage appears to demonstrate a more nuanced pattern when it is examined on a year-by-year basis. Usage in Y1–Y2 is particularly notable since it deviates considerably from the results reported previously and indicates that in the first year of observation Strathprints actually demonstrated negative growth, albeit minor. Conversely, Y4 yielded a 43% increase in COUNTER usage with only a 20% increase in full-text deposits recorded. Similarly, Y3 yielded an 18% increase in usage but experienced negative growth in full-text deposits (−22%).
It might be assumed that patterns in usage follow an exponential growth model, based on the volume of content deposited over time. In other words, any increase in usage would be directly proportional to increases in the volume of content deposited. This may indeed be true in some examples, and further research is encouraged in this respect; however, in this particular study, a weak exponential relationship was observed via exponential regression, with poor curve fitting notable (Figure 5), indicating the limited influence content deposit growth has on overall usage. Fitting with other common models such as linear, power or logarithmic was similarly weak.
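One common way to perform an exponential regression is to fit a straight line to the logarithm of the dependent variable. The sketch below assumes that approach; it is not confirmed as the method used in this study. The deposit and usage values are hypothetical, chosen only to illustrate a poor fit, and the R² is computed on the log scale:

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y); return (a, b, r2).

    r2 is the coefficient of determination of the linearised (log-scale) model.
    """
    ln_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ln = sum(ln_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ln) for xi, li in zip(x, ln_y))
    b = sxy / sxx
    ln_a = mean_ln - b * mean_x
    ss_res = sum((li - (ln_a + b * xi)) ** 2 for xi, li in zip(x, ln_y))
    ss_tot = sum((li - mean_ln) ** 2 for li in ln_y)
    return math.exp(ln_a), b, 1 - ss_res / ss_tot

# Hypothetical values for illustration only, not taken from the study
deposits = [100, 120, 90, 150, 130, 110]
usage = [900, 700, 1000, 800, 1200, 750]

a, b, r2 = fit_exponential(deposits, usage)
print(f"a = {a:.1f}, b = {b:.4f}, R^2 = {r2:.2f}")
```

An R² close to zero on data like these would indicate that deposit volume explains little of the variation in usage, which is the kind of weak fit described above.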
It is apposite to highlight, from data in the previous section, that Google search referrals and GS traffic increased well in excess of the full-text deposit rate, at 266% and 104% respectively; ergo, the percentage of users being referred increased at a higher rate than the rate of full-text deposit during the reporting period. This is relevant because it suggests that the rapid growth in search referrals from Google and GS has been a key factor influencing the increase in COUNTER usage.
To determine whether a correlation between Google clicks and COUNTER usage was present, Pearson’s correlation coefficient was calculated for each year in the reporting period. A correlation was detected, ranging from a weak relationship in Y1 to a moderate positive correlation in Y2. Y1 and Y2 were followed by a strengthening of the relationship in Y3 and Y4. This strengthening of the positive correlation was confirmed via the test statistic for both Y3 and Y4, at a far higher level of statistical significance.
Computing the coefficient of determination (R²) allows for better appreciation of the proportion of variance observed in the dependent variable (i.e., COUNTER usage) that is predictable from the independent variable (i.e., Google clicks resulting from the changes implemented). In computing the coefficient of determination it was found that R² was significantly stronger in Y2 than in Y1, but at such a low level that only 42% of variance in usage could be attributed to clicks. Variance narrowed considerably for Y3, with a strong linear relationship between variables noted. This variance then narrowed again in Y4, whereupon 95% of usage could be attributed to Google clicks. The incremental narrowing in variation between Y1 and Y4 can easily be observed from Figure 5, in which data points in Y3, and particularly Y4, are grouped more closely to the regression line.
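For a simple linear regression with a single predictor, the coefficient of determination equals the square of Pearson's correlation coefficient. A minimal sketch of both calculations follows; the paired click and usage values are hypothetical and do not come from the study's dataset:

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical paired observations for one reporting year (illustration only)
clicks = [9_800, 11_200, 12_500, 14_100, 15_900, 18_300]
usage = [20_500, 22_900, 26_100, 28_000, 33_200, 36_800]

r = pearson_r(clicks, usage)
r_squared = r ** 2  # share of variance in usage predictable from clicks
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

Read this way, an R² of 0.95 means that 95% of the variance in usage is predictable from clicks, with data points lying close to the regression line.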
An area that evades sufficient understanding in the data analysed thus far is the extent to which specific repository optimizations can also influence discovery on web search platforms other than Google or GS. This is largely because these discovery platforms lack any commensurate analytics. Acknowledging that the majority of repository traffic appears to originate from Google and GS, it is nevertheless possible to summarise the most common web traffic referral sources over the reporting period, as measured by GA and using the existing dataset, to establish whether changes could be observed in other platforms. Such data may lack the specificity typical of analyses earlier in this section but nevertheless enable a degree of inference about whether the optimizations have had an influence beyond Google and GS.
and Figure 7 chart the top ten web traffic referral sources during the reporting period, with local sources excluded (e.g., local university website searches, native searches on Strathprints, etc.). From Figure 6 it is possible to observe significant traffic growth from Google and GS. This is to be expected based on analyses earlier in this section, but little change can be observed in the other sources, such as Bing or Baidu, which display limited or zero growth. To better appreciate any modest change in traffic from these other sources, Figure 8 charts the same data but with data on Google and GS excluded. From this it is clear that variation in traffic can be observed across reporting years but no single profile suggests any sustained or significant growth. This would tend to imply that the technical improvements and adjustments implemented in this study demonstrate a Google-specific effect only. Traffic from other sources remained at such low volumes as to have a negligible impact on the overall volume of traffic received by Strathprints.