1. Introduction
The study of information and library science can be enhanced by using basic qualitative and quantitative models. Such rigorous models may provide tools or metaphors that improve one’s ability to describe what is happening, predict what will happen in the future, and explain why information phenomena occur as they do. Relationships between variables in the library and information fields serve as the primary foundation for the study of these disciplines. Information theory is one of the most basic theories one can use to understand many aspects of information systems and related professional areas, such as the study of libraries and archives. Information theory describes various forms of relationships between variables in these disciplines, and understanding these relationships provides a basic level of knowledge for those in the information professions.
The field of library and information science can be studied in a holistic way, with qualitative and quantitative aspects. Most library and information science academic programs provide both qualitative studies and, often to a lesser extent, quantitative methods. As a discipline with a strong focus on the study of information, one can view library and information science as either a single discipline—the merger of the study of libraries and the study of information—or as two separate disciplines, one with an emphasis on libraries, and a separate one with a focus on information.
Many students have weak mathematical skills; a name, innumeracy [1], has been proposed for this phenomenon. Many innumerate students, as well as students in general, may better understand the information-based problems they need to solve as information professionals if they can more easily solve the mathematical problems associated with their work. Information professionals can use the http://InformationCalculator.org web calculator to solve information-related problems, which can enable a better appreciation for information relationships, causes, and effects, and lead to improved professional practice. Below, we examine some information solutions to practical problems encountered by information professionals. These solutions may be analyzed using the Information Calculator. Students were asked about their attitudes towards the Information Calculator, and these opinions are examined to further illuminate the strengths and weaknesses of calculating software similar to what they used for a homework assignment.
Calculators have been used for decades to help students better solve problems. Mechanical calculators have been used for applications such as addition for most of the last century. Field-specific mechanical calculators were developed and marketed. For example, many research methods courses taught students to compute statistical functions needed to conduct research using a mechanical statistical calculator. Such calculators were used to determine answers to information-theoretic problems during the first decades of information theory.
Electronic calculators were developed in the 1960s, and portable handheld electronic calculators became popular in the 1970s. As microprocessors grew in sophistication through the 1970s and the following decades, so did electronic calculators, with many discipline-specific models, such as business, statistical, and scientific calculators, becoming affordable to many students in the 1980s and 1990s.
As laptop computers grew in popularity over the last 20 years, students increasingly used them in college in lieu of calculators. Software made available to students taking particular classes often performed functions that were included in calculators a decade or two prior. While software addressing discipline-specific mathematical functions continued to expand for laptops, web pages that could perform calculations as they were displayed increased in popularity. Functions were programmed in languages such as JavaScript and executed when a web page was loaded or a button on the page was pressed. The calculations took place, and the software then displayed the answer on the screen.
The web page used here, http://InformationCalculator.org, was coded using SageMath, a free, open-source package that combines other open-source mathematical software packages [2]. This web page calls some of the SageMath functions that are attached to the page, and uses them to perform several information-theoretic operations.
2. Literature
Much of the work below addresses how an information-theoretic calculator was used in a largely graduate-level “organization of information” course that addressed relational databases. This work took place in a School of Information and Library Science that has a diverse set of students and topical areas, which cover professional practice as well as graduate-level issues of theory [3].
Library Science (LS) studies the nature of libraries, the organization and delivery of information, and the needs and preferences of possible information users. It studies how material in libraries may be collected, organized, and retrieved by users. Libraries are also seen as providing other functions, often serving their clientele with various programs and services, ranging from bibliographic instruction for college students to hosting storytelling sessions for toddlers.
The study of these issues encompasses a range of problems, from the qualitative to the quantitative. Web calculators can be used to simplify these quantitative problems, allowing those with a more scientific leaning to further expand their understanding of problems and conduct research, while those who are innumerate [
1] can use a calculator to obtain exact and correct answers to problems necessary for the day-to-day operation of libraries.
Information Science (IS) may be delimited differently by those with different interests, with some viewing IS as the study of technology applied to libraries, the study of information, or the study of computers and people. As with LS, practitioners in IS range from highly technical people to those who have problems with the rigorous aspects of mathematics or computer science, but are still interested in developing technological systems and applying them to various domains. An Information Calculator can prove very valuable in the study of IS, whether IS is viewed with a focus on information, or as a discipline focusing on technology that is based on computers operating in a predetermined manner.
Library and information science may be viewed as a single, holistic discipline, or as different disciplines that may be studied separately. Different methods and entities can be used across this area, whether viewed as one or several disciplines. Many of the terms in library and information science are provided and defined in the Dictionary for Library and Information Sciences [4]. Most of the concepts are more fully described in the Encyclopedia of Library and Information Sciences [5]. Library Science may be defined by the interests of the American Library Association, while Information Science may be defined by the interests of organizations such as the Association for Information Science and Technology, the I School consortium, or the IEEE Information Theory Society.
Within both library and information science, there is the need to organize information. This may consist of both providing labels for items—through assigning subject headings to items in library science—and assigning metadata to items in databases. These labels assist in the retrieval of desired information, with library material often being accessed through searches of subject headings in an online public access catalog. Databases may be searched for metadata or attribute labels, and the metadata may provide further information about various aspects of the data in the database, such as its origin [
6].
Information-carrying entities may be organized through different kinds of arrangements. Printed library books may be ordered on library shelves, for example, by an assigned Dewey Decimal Classification number, or by the Library of Congress classification system number. These orderings support browsing through allowing a potential user to go to a specific location in the library stacks and locate a desired book, or find books on a similar topic. In some instances, a first-time user may enter a library knowing what the call number is for materials on their desired topic, and they will proceed directly to the appropriate location in the library stacks to find useful materials.
Databases may have their data arranged in ways that facilitate their use. One desirable goal of the database design process is to avoid insertion and deletion anomalies. An insertion anomaly occurs when inserting new information requires adding much more information to the system than one would intuitively expect. For example, a college student enrolling in a database course might be expected to enter their student identification number and identifying information about the database course. One would not expect students to have to enter the instructor’s name for the course, the room number for the course, or possibly the class size for the course. A deletion anomaly occurs when removing information also deletes information outside the scope of the transaction. Removing a computer from a room on the second floor should not delete the fact that the room that used to contain the computer is on the second floor, or that certain internet connections in the walls are in a certain room and could be used for future computer connections.
By arranging various attributes into different relationships, one can avoid both insertion and deletion anomalies by placing the database into a normal form. There are various normal forms into which a database may be placed, such as first normal form, second normal form, third normal form, and so on, each meeting certain specifications [7]. Many arrangement issues revolve around relationships, such as dependencies, between a key that is often used to access a relation and the other attributes of an item. For example, student identification numbers often serve as a key in university databases, as do people’s full names in some other database systems.
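The computer-and-room deletion anomaly described above can be illustrated with a minimal sketch. The table layouts and the names used here (`pc-17`, room `214`, `room_floor`, `computer_room`) are hypothetical and chosen only for illustration; they do not come from the course materials.

```python
# Flat design (hypothetical): each row stores computer, room, and floor together.
flat = [
    {"computer": "pc-17", "room": "214", "floor": 2},
]

# Removing the only computer in room 214 also removes the row that
# recorded which floor the room is on: a deletion anomaly.
flat_after = [row for row in flat if row["computer"] != "pc-17"]
rooms_known_flat = {row["room"]: row["floor"] for row in flat_after}

# Decomposed design (hypothetical): one relation per kind of fact.
room_floor = {"214": 2}            # room -> floor
computer_room = {"pc-17": "214"}   # computer -> room

# Removing the computer now leaves the room-to-floor fact untouched.
del computer_room["pc-17"]

print(rooms_known_flat)  # {}
print(room_floor)        # {'214': 2}
```

In the decomposed design, each fact depends only on the key of its own relation, which is the general idea the normal forms formalize.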
Shannon proposed several probabilistic measures of information-related phenomena. Self-information, the information provided by an event, is measured as the logarithm of the inverse of the probability of the event occurring [8, 9]. More formally, $I_x = \log_2(1/p_x) = -\log_2 p_x$, where the logarithm is computed to base 2 to produce, in bits, the information value associated with the event with probability $p_x$. For example, the self-information associated with a coin landing “heads”, which has a probability of one half, is 1 bit. The measure of self-information can be applied to any situation where a single event occurs and the measure of the information in this event is desirable. The entropy (H) of a random variable, that is, of all the possible events, is computed as the probability-weighted average of the self-information of each possible value, $H = -\sum_x p_x \log_2 p_x$. Treating a coin as a random variable with two equally probable outcomes, landing “heads” or “tails”, the entropy is 1 bit. One may combine the entropy associated with two random variables by computing the joint entropy of the two random variables. Entropy and joint entropy are useful measures of the information in systems of variables, and most situations where there is a range of possible values can be measured with an entropy measure. For example, the average information for items in a library collection or in a database can be measured as its entropy value.
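The coin examples above can be checked with a short Python sketch. The function names are illustrative assumptions and are not taken from the Information Calculator or from SageMath.

```python
import math

def self_information(p):
    """Self-information, in bits, of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

def entropy(probs):
    """Entropy, in bits: the probability-weighted average self-information."""
    return sum(p * self_information(p) for p in probs if p > 0)

def joint_entropy(joint):
    """Joint entropy, in bits, from a dict {(x, y): probability}."""
    return entropy(joint.values())

fair_coin = [0.5, 0.5]
# Two independent fair coins: four equally likely outcome pairs.
two_coins = {(a, b): 0.25 for a in "HT" for b in "HT"}

print(self_information(0.5))     # 1.0
print(entropy(fair_coin))        # 1.0
print(joint_entropy(two_coins))  # 2.0
```

The joint entropy of two independent fair coins is the sum of their individual entropies, 2 bits.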
The conditional entropy measures the amount of information, or uncertainty, in a random variable, given a second random variable. For example, one might measure the uncertainty of a random variable at the output of a process, given the random variable describing the data that enters the process. This is a general model for cause-and-effect relationships. If a coin toss at the entry to the process produces heads or tails, and at the output the heads or tails message is received with no errors, the conditional entropy of this output, given the input, is zero, because in this errorless environment there is no uncertainty about the output once the input is known. If the process is so noisy that the binary output of heads or tails is completely independent of the input, the conditional entropy at the output is 1 bit for each message: the input provides no information about the output, so the remaining uncertainty is the same 1 bit that one would compute for the simple entropy of a coin toss. Conditional entropy may be measured in any situation where data enters a system, is processed in some way, and then leaves the system.
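The two extreme cases above, an errorless process and a completely noisy one, can be sketched as follows. The sketch assumes the standard identity H(Y|X) = H(X,Y) − H(X); the function names and the joint distributions are illustrative.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Entropy, in bits, of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(Y|X), in bits, from a dict {(x, y): probability}, as H(X,Y) - H(X)."""
    p_x = defaultdict(float)
    for (x, _), p in joint.items():
        p_x[x] += p  # marginal distribution of the input X
    return entropy(joint.values()) - entropy(p_x.values())

# Errorless process: the output always equals the input.
noiseless = {("H", "H"): 0.5, ("T", "T"): 0.5}
# Completely noisy process: the output is independent of the input.
noisy = {(x, y): 0.25 for x in "HT" for y in "HT"}

print(conditional_entropy(noiseless))  # 0.0
print(conditional_entropy(noisy))      # 1.0
```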
As an example of the practical use of conditional entropy, it may serve as a useful tool for identifying database keys. If one knows the key for a record, such as a personal ID number, it provides all the information needed about other fields in the record. For example, given a database of hair color and personal ID numbers, if one knows the reader’s personal ID number, one has all the information available about the hair color, or put differently, there is no uncertainty about the hair color. Thus, the conditional entropy of the hair color, given the personal ID number, is 0. On the other hand, knowing the hair color leaves a lot of uncertainty about the personal ID number, as the same hair color in a database could be associated with many different personal ID numbers.
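The key test described above can be sketched on a small, made-up table. The rows, the column names `person_id` and `hair`, and the helper functions are hypothetical; probabilities are estimated as relative frequencies in the table.

```python
import math
from collections import Counter

def entropy_of(values):
    """Empirical entropy, in bits, of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(rows, given, target):
    """Empirical H(target | given) = H(given, target) - H(given)."""
    pairs = [(r[given], r[target]) for r in rows]
    return entropy_of(pairs) - entropy_of([r[given] for r in rows])

# Hypothetical records: four people, three hair colors.
rows = [
    {"person_id": 1, "hair": "brown"},
    {"person_id": 2, "hair": "brown"},
    {"person_id": 3, "hair": "black"},
    {"person_id": 4, "hair": "red"},
]

print(conditional_entropy(rows, "person_id", "hair"))  # 0.0
print(conditional_entropy(rows, "hair", "person_id"))  # 0.5
```

A conditional entropy of zero for every other attribute, given a candidate attribute, is consistent with that candidate acting as a key for the data at hand; it does not by itself show that the attribute will remain a key as new records are added.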
The mutual information of two variables serves as a measure similar to a correlation; it can determine how much information one variable provides about the other. This can be used for measuring the degree of dependence or independence between the two variables. As a measure of the degree of association between two variables, the mutual information between the two variables can be commonly applied to determining how much information one variable provides about a second variable, such as how much information a human’s height provides about a human’s weight.
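As a minimal sketch, mutual information can be computed from a joint distribution with the standard identity I(X;Y) = H(X) + H(Y) − H(X,Y). The function names and the two coin-based distributions are illustrative assumptions.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Entropy, in bits, of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y), in bits, from a dict {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p  # marginal distribution of X
        p_y[y] += p  # marginal distribution of Y
    return entropy(p_x.values()) + entropy(p_y.values()) - entropy(joint.values())

# Fully dependent: the second variable always copies the first.
dependent = {("H", "H"): 0.5, ("T", "T"): 0.5}
# Independent: two separate fair coins.
independent = {(x, y): 0.25 for x in "HT" for y in "HT"}

print(mutual_information(dependent))    # 1.0
print(mutual_information(independent))  # 0.0
```

Independent variables share 0 bits, while a variable that fully determines another shares the other's entire entropy.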
Information theory may be useful in analyzing relationships between attributes [
10,
11]. For example, if one knows the unique identification number of the reader, the other attributes about the reader in a database will be dependent upon the identification number that serves as a key. Knowing the key, knowing that it is the reader, means that one knows other information that is attached to this key in the database, which possibly includes things such as hair color, height, weight, etc. The homework assignment used below in analyzing students’ reactions to the information calculator addresses some of these types of information-theoretic relationships in a database.
Using a calculator can provide more accurate answers than manual computations, and using the calculator is also associated with more enjoyment and less stress [
12]. However, there are negatives to the use of calculators by learners. Students who grow to rely on a calculator may not absorb the knowledge needed for a full understanding and appreciation of the task at hand [
13]. For students without a full understanding of the subject at hand, a calculator may be a crutch, or one of a number of tools brought to bear on a calculation. For example, a student without much appreciation for basic arithmetic identities might find it easiest to determine what 2018 minus 2018 is by using a calculator, while those with more appreciation of arithmetic laws will quickly solve this mentally. Students who do understand the subject may be able to solve some problems without a calculator (such as some of the problems below), but may also be able to use a calculator to produce numeric answers.
There is a wide range of calculators on the web. These calculators act as web pages that have the ability to accept various inputs and produce outputs. While some are general calculators, many are tailored for specific kinds of problems, such as calculators aimed at specific academic disciplinary problems, as well as supporting non-academic applications, such as industrial, cooking, and health issues. In addition, there are websites that assist in the development of web calculators. Web calculators often have an HTML outer “shell” and internal processing done with languages such as JavaScript. The web calculator described below further uses a set of pre-existing mathematical software packages, SageMath [2], which can be executed through a JavaScript interface.
Homework assignments may encourage students to practice solving problems and thus help retain learned knowledge and skills for longer terms [
14]. In courses with large numbers of graduate students, homework exercises may help students infer relationships, with these insights ideally remaining with the students well into their professional lives [
15].
4. Student Attitudes toward the Information Calculator
Students expressed a number of opinions about the use of calculators in general, and the information calculator specifically. These comments were obtained by asking students to answer several questions about using the calculator to aid them in completing the homework assignment.
One question asked was, “What is best reason to use the Information Calculator?” The responses were of several different kinds. A number of positive responses were obtained, suggesting that the calculator has different positive aspects, or that different students find different advantages in using http://InformationCalculator.org.
Some students commented that using the calculator led to understanding, with comments often using the word “understand”. One person noted that, for this database assignment, using the calculator led one “to understand the relationship between different attributes”, while another felt the calculator led the student “to check my understanding”, and another student felt that “in order to best understand” the problems, the calculator should be used. One student noted that “the calculator can also be helpful…to make sure we understand the concepts and its formulas”. Less directly, one student noted that students can “figure out how entropy works”.
Some students noted the ease of use. One stated that it is “easily convenient”, and another noted the “ease of calculation”, while another commented that it “is easy to use”, or “Convenien[t], saving time”. Another noted that “I can get the results easily”; another that “it makes computation easier”. Related to this is that the calculator is “simple”. Ease of use can also be interpreted as avoiding effort, as when one student noted that with the calculator there is “no memorizing the formula”.
Related to ease of use is the speed of calculations. One noted that the calculator “quickly calculates information”, while two other students called it “quick” and another noted that it “makes calculations … quickly”.
Accuracy is also important to calculations. One student described the calculator as “resolv[ing] uncertainty”, while other students were error-avoiders. One student noted that the calculator “helps reduce human error” and another commented on “less potential for errors in calculation”, with another one echoing that using the calculator would “reduce human error”.
Students were also asked why they should not use the information calculator.
One reason given was that using the calculator meant that students weren’t learning as much, because the information calculator did much of the work for them. One noted that “always calculat[ing] by ourselves can help us remember the concept all the time”; another noted that it “make[s] me dependent on the calculator rather than understand[ing] the theory and the formulas for myself”, and another “ha[s] less of an understanding of how they work.” One “can’t understand the formulas in the info calculator”. The Information Calculator “doesn’t show the mechanism behind calculation. We cannot see the process.” This concern about missing an educational opportunity was largely offered by non-native speakers of English who were enrolled in the class.
When asked why “understanding the formulas in the Information Calculator will be useful to me as a professional”, the primary reason students gave was increased “understanding”, as the question suggested.
One student noted, “This will help me as an archivist by understanding the amount of information in records and how they [are] related”, or it “can help me in my work in cataloging … [I] can try to find relationships”, while another suggested that, “Firstly I can fully understand the theory and apply it”. “[U]nderstanding these formulas will be useful in data preservation” and “understanding the applications always help me identify the situations where they would be useful”. Other answers pointed to how information, and data in general, would be better appreciated. The calculator “leads to the inner meaning of the information, which we must understand as an information researcher”; it was “help[ing] me to understand the appropriate times to use the calculator” and “further understanding of databases …”; one student remarked, “I can test database designs”; another said, “the more ways to talk about how information is processed and is transmitted, the better.” Several students commented on their eventual need to work with databases and how the information calculator would assist with this. This was to be expected, since the homework assignment explicitly addressed database design. One student expressed an interest in “how we can compact … information”.
A student noted that they “tend[ed] to shy away from math” and the calculator made life easier for one who wished to merely enter numbers without fully understanding the formulas underlying the calculator.
Several students suggested that using the calculator produced an increased level of understanding. Some felt that the calculator was easy to use or that it was simple. This might result in students not needing to memorize formulas, which some took as a positive aspect of the calculator, and others saw as a negative aspect. The calculator is also faster than human calculators, and almost always more accurate.