The study of information and library science can be enhanced by using basic qualitative and quantitative models. Such rigorous models may provide tools or metaphors with which one can better describe what is happening, predict what will happen in the future, and explain why information phenomena occur as they do. Relationships between variables in the library and information fields serve as the primary foundation for the study of these disciplines. Information theory is one of the most basic theories one can use to understand many aspects of information systems and related professional areas, such as the study of libraries and archives. Information theory provides measures for various forms of relationships between variables in these disciplines, and understanding these relationships provides a basic level of knowledge for those in the information professions.
The field of library and information science can be studied in a holistic way, with qualitative and quantitative aspects. Most library and information science academic programs provide both qualitative studies and, often to a lesser extent, quantitative methods. As a discipline with a strong focus on the study of information, one can view library and information science as either a single discipline—the merger of the study of libraries and the study of information—or as two separate disciplines, one with an emphasis on libraries, and a separate one with a focus on information.
Many students have weak mathematical skills; the name “innumeracy” […] has been proposed for this phenomenon. Many innumerate students, as well as students in general, may better understand certain information-based problems they will need to solve as information professionals if they can more easily solve the mathematical problems associated with their work. Information professionals can use the http://InformationCalculator.org web calculator to solve information-related problems, which can foster a better appreciation of information relationships, causes, and effects, and lead to improved professional practice. Below, we examine information-theoretic solutions to some practical problems encountered by information professionals. These solutions may be analyzed using the Information Calculator. Students were asked about their attitudes toward the Information Calculator, and their opinions are examined to further illuminate the strengths and weaknesses of calculating software similar to what they used for a homework assignment.
Calculators have been used for decades to help students better solve problems. Mechanical calculators have been used for applications such as addition for most of the last century. Field-specific mechanical calculators were developed and marketed. For example, many research methods courses taught students to compute statistical functions needed to conduct research using a mechanical statistical calculator. Such calculators were used to determine answers to information-theoretic problems during the first decades of information theory.
Electronic calculators were developed in the 1960s, and portable handheld electronic calculators became popular in the 1970s. As microprocessors grew more sophisticated through the 1970s and the following decades, so did electronic calculators, with many discipline-specific models, such as business, statistical, and scientific calculators, becoming inexpensive enough to be affordable to many students in the 1980s and 1990s.
The web page used here, http://InformationCalculator.org, was coded using SageMath, a free, open-source package that combines many other open-source mathematical software packages [2]. The web page calls SageMath functions attached to the page and uses them to perform several information-theoretic operations.
Much of the work below addresses how an information-theoretic calculator was used in a largely graduate-level “organization of information” course that addressed relational databases. This work took place in a School of Information and Library Science with a diverse set of students and topical areas, covering professional practice as well as graduate-level issues of theory [3].
Library Science (LS) studies the nature of libraries, the organization and delivery of information, and the needs and preferences of possible information users. It studies how material in libraries may be collected, organized, and retrieved by users. Libraries are also seen as providing other functions, often serving their clientele with various programs and services, ranging from bibliographic instruction for college students to hosting storytelling sessions for toddlers.
The study of these issues encompasses a range of problems, from the qualitative to the quantitative. Web calculators can simplify the quantitative problems, allowing those with a more scientific leaning to further expand their understanding and conduct research, while those who are innumerate [1] can use a calculator to obtain exact and correct answers to problems necessary for the day-to-day operation of libraries.
Information Science (IS) may be delimited differently by those with different interests, with some viewing IS as the study of technology applied to libraries, the study of information, or the study of computers and people. As with LS, practitioners in IS range from highly technical people to those who have problems with the rigorous aspects of mathematics or computer science, but are still interested in developing technological systems and applying them to various domains. An Information Calculator can prove very valuable in the study of IS, whether IS is viewed with a focus on information, or as a discipline focusing on technology that is based on computers operating in a predetermined manner.
Library and information science may be viewed as a single, holistic discipline, or as several disciplines that may be studied separately. Different methods and entities can be used across this area, whether it is viewed as one discipline or several. Many of the terms in library and information science are defined in the Dictionary for Library and Information Sciences […], and most of the concepts are more fully described in the Encyclopedia of Library and Information Sciences […]. Library Science may be defined by the interests of the American Library Association, while Information Science may be defined by the interests of organizations such as the Association for Information Science and Technology, the I School consortium, or the IEEE Information Theory Society.
Within both library and information science, there is a need to organize information. This may consist of providing labels for items, such as subject headings assigned to items in library science or metadata assigned to items in databases. These labels assist in the retrieval of desired information, with library material often accessed through subject heading searches in an online public access catalog. Databases may be searched for metadata or attribute labels, and the metadata may provide further information about various aspects of the data in the database, such as its origin [6].
Information-carrying entities may be organized through different kinds of arrangements. Printed library books may be ordered on library shelves, for example, by an assigned Dewey Decimal Classification number or by a Library of Congress Classification number. These orderings support browsing by allowing a potential user to go to a specific location in the library stacks and locate a desired book, or to find books on a similar topic. In some instances, a first-time user may enter a library already knowing the call number for materials on their desired topic and proceed directly to the appropriate location in the stacks to find useful materials.
Databases may have their data arranged in ways that facilitate their use. One goal of the database design process is to avoid insertion and deletion anomalies. An insertion anomaly occurs when adding new information requires entering much more information than one would intuitively expect. For example, a college student enrolling in a database course might be expected to enter their student identification number and identifying information about the course. One would not expect students to have to enter the instructor’s name, the room number, or the class size for the course. A deletion anomaly occurs when removing information also deletes information outside the scope of the transaction. Removing a computer from a room on the second floor should not delete the fact that the room that used to contain the computer is on the second floor, or that certain internet connections in its walls are in that room and could be used for future computer connections.
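The deletion anomaly in the computer-and-room example can be sketched in a few lines of Python. The rooms, floors, and computer names below are hypothetical, and plain lists and dictionaries stand in for database tables; this is an illustration of the idea, not a database implementation.

```python
# Hypothetical data: one combined table records each room, its floor,
# and the computer in that room.
combined = [
    {"room": "201", "floor": 2, "computer": "PC-7"},
    {"room": "305", "floor": 3, "computer": "PC-9"},
]

def remove_computer_combined(rows, computer):
    """Deleting the computer's row also deletes the room's floor (an anomaly)."""
    return [r for r in rows if r["computer"] != computer]

# Decomposed design: facts about rooms are stored apart from facts about computers.
rooms = {"201": 2, "305": 3}                # room -> floor
computers = {"PC-7": "201", "PC-9": "305"}  # computer -> room

def remove_computer_decomposed(computer_rooms, computer):
    """Deleting a computer touches only the computer -> room relation."""
    return {c: room for c, room in computer_rooms.items() if c != computer}

after_combined = remove_computer_combined(combined, "PC-7")
after_decomposed = remove_computer_decomposed(computers, "PC-7")

# In the combined table, the fact that room 201 is on floor 2 is gone.
print(any(r["room"] == "201" for r in after_combined))  # False
# In the decomposed design, the room and its floor survive the deletion.
print(rooms["201"], "PC-7" in after_decomposed)         # 2 False
```

Splitting the data this way is the kind of rearrangement that normal forms formalize: each fact is stored once, so removing one fact cannot silently remove another.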
By arranging attributes into different relations, one can avoid both insertion and deletion anomalies by placing the database into normal forms. There are various normal forms, such as first normal form, second normal form, third normal form, and so on, each meeting certain specifications [7]. Many arrangement issues revolve around relationships, such as dependencies, between a key that is often used to access a relation and the other attributes of an item. For example, student identification numbers often serve as keys in university databases, as do people’s full names in some other database systems.
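A dependency of this kind (a functional dependency, in database terms) can be checked directly: attribute A determines attribute B if no value of A is ever paired with two different values of B. The minimal sketch below uses hypothetical student records and a hypothetical helper name, `determines`; it shows why an ID number works as a key, and how a name can fail as a key once two people share it.

```python
def determines(rows, a, b):
    """Return True if attribute `a` functionally determines attribute `b`,
    i.e., every value of `a` is paired with exactly one value of `b`."""
    seen = {}
    for row in rows:
        key, value = row[a], row[b]
        if key in seen and seen[key] != value:
            return False  # one value of `a` maps to two values of `b`
        seen[key] = value
    return True

# Hypothetical student records, invented for illustration.
students = [
    {"student_id": 1001, "name": "Ana Silva", "major": "LIS"},
    {"student_id": 1002, "name": "Ben Ito", "major": "LIS"},
    {"student_id": 1003, "name": "Ana Silva", "major": "Archives"},
]

print(determines(students, "student_id", "major"))  # True: the ID acts as a key
print(determines(students, "name", "major"))        # False: two people share a name
```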
Shannon proposed several probabilistic measures of information-related phenomena. Self-information, the information provided by an event, is measured as the logarithm of the inverse of the probability of the event occurring [8]. More formally, $I(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x)$, where the logarithm is computed to base 2 so that the information associated with an event $x$ of probability $p(x)$ is expressed in bits. For example, the self-information associated with a coin landing “heads”, which has a probability of one half, is 1 bit. The measure of self-information can be applied to any situation where a single event occurs and a measure of the information in this event is desired. The entropy ($H$) of a random variable, which covers all of its possible events, is computed as the probability-weighted average of the self-information of each possible value: $H(X) = \sum_x p(x)\, I(x) = -\sum_x p(x) \log_2 p(x)$. A coin toss, treated as a random variable with two equally probable outcomes, “heads” or “tails”, has an entropy of 1 bit. One may combine the entropy associated with two random variables by computing their joint entropy. Entropy and joint entropy are useful measures of the information in systems of variables, and most situations with a range of possible values can be measured with an entropy measure. For example, the average information for items in a library collection or in a database can be measured as its entropy value.
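The following minimal Python sketch implements self-information, entropy, and joint entropy. It assumes, as a simplification, that probabilities are estimated as the relative frequencies of values in a list; the function names are hypothetical and are not the Information Calculator’s interface.

```python
from collections import Counter
from math import log2

def self_information(p):
    """Self-information, in bits, of an event with probability p (0 < p <= 1)."""
    return -log2(p)

def entropy(values):
    """Entropy, in bits, of the relative-frequency distribution of `values`:
    the probability-weighted average self-information."""
    n = len(values)
    return sum((c / n) * self_information(c / n) for c in Counter(values).values())

def joint_entropy(xs, ys):
    """Joint entropy H(X, Y), treating each (x, y) pair as one outcome."""
    return entropy(list(zip(xs, ys)))

print(self_information(0.5))          # 1.0: a fair coin landing heads
print(entropy(["heads", "tails"]))    # 1.0: a fair coin toss
print(joint_entropy(["heads", "tails", "heads", "tails"],
                    ["heads", "heads", "tails", "tails"]))  # 2.0: two fair coins
```

For two independent fair coins, the four equally likely pairs give a joint entropy of 2 bits, the sum of the two individual entropies.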
The conditional entropy measures the amount of information in a random variable, given a second random variable; formally, $H(Y \mid X) = H(X, Y) - H(X)$. For example, one might measure the uncertainty of a random variable $Y$ at the output of a process, given the random variable $X$ describing the data that enters the process. This is a general model for cause-and-effect relationships. If a coin toss at the entry to the process produces heads or tails, and at the output the heads or tails message is received with no errors, the conditional entropy of this output, given the input, is zero, because in this errorless environment there is no uncertainty about the output once the input is known. If the process is so noisy that the binary output of heads or tails is completely independent of the input, the conditional entropy at the output is 1 bit for each message: the input provides no information about the output, so the remaining uncertainty equals the 1-bit entropy of a simple coin toss. Conditional entropy may be measured in any situation where data enters a system, is processed in some way, and then leaves the system.
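The noiseless and completely noisy cases can be reproduced with a short sketch. It uses the identity H(Y | X) = H(X, Y) - H(X), estimates probabilities as relative frequencies, and stands in four hypothetical paired observations for each process; the data and helper names are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy in bits, using relative frequencies as probabilities."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(ys, xs):
    """H(Y | X) = H(X, Y) - H(X), in bits, from paired observations."""
    return entropy(list(zip(xs, ys))) - entropy(xs)

inputs = ["H", "T", "H", "T"]
noiseless = ["H", "T", "H", "T"]    # output always equals the input
independent = ["H", "H", "T", "T"]  # output unrelated to the input

print(conditional_entropy(noiseless, inputs))    # 0.0
print(conditional_entropy(independent, inputs))  # 1.0
```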
Conditional entropy can also serve as a practical tool for identifying database keys. If one knows the key for a record, such as a personal ID number, it provides all the information needed about the other fields in the record. For example, given a database of hair colors and personal ID numbers, knowing a reader’s personal ID number gives all the available information about the reader’s hair color; put differently, there is no uncertainty about the hair color. Thus, the conditional entropy of the hair color, given the personal ID number, is 0. On the other hand, knowing the hair color leaves considerable uncertainty about the personal ID number, as the same hair color could be associated with many different personal ID numbers in the database.
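The hair-color example can be checked numerically. This sketch uses four hypothetical records and estimates probabilities as relative frequencies; the data and helper names are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy in bits, using relative frequencies as probabilities."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(ys, xs):
    """H(Y | X) = H(X, Y) - H(X): uncertainty left in ys once xs is known."""
    return entropy(list(zip(xs, ys))) - entropy(xs)

# Hypothetical records: personal ID numbers and hair colors.
ids = [101, 102, 103, 104]
hair = ["brown", "black", "brown", "red"]

print(conditional_entropy(hair, ids))  # 0.0: the ID determines the hair color
print(conditional_entropy(ids, hair))  # 0.5: hair color leaves uncertainty
```

A zero value flags a candidate key (or, more generally, a dependency); a positive value shows that the conditioning attribute cannot serve as a key for the other.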
The mutual information of two variables, $I(X; Y) = H(X) + H(Y) - H(X, Y)$, serves as a measure similar to a correlation: it indicates how much information one variable provides about the other, and so measures the degree of dependence or independence between the two variables. As a measure of association, mutual information can be applied, for example, to determine how much information a person’s height provides about that person’s weight.
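A minimal sketch of mutual information, computed from the identity I(X; Y) = H(X) + H(Y) - H(X, Y) with relative-frequency probabilities. The coarse height and weight categories are hypothetical data, not measurements.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy in bits, using relative frequencies as probabilities."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# A variable shares all of its information with itself...
x = ["a", "a", "b", "b"]
print(mutual_information(x, x))                     # 1.0
# ...and none with a variable that is independent of it.
print(mutual_information(x, ["c", "d", "c", "d"]))  # 0.0

# Hypothetical, coarsely binned heights and weights for six people.
height = ["short", "short", "tall", "tall", "tall", "short"]
weight = ["light", "light", "heavy", "heavy", "light", "light"]
print(round(mutual_information(height, weight), 3))  # 0.459
```

Unlike a linear correlation coefficient, this measure works directly on categorical values such as these labels.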
Information theory may be useful in analyzing relationships between attributes [10]. For example, if one knows the unique identification number of a reader, the other attributes of the reader in a database depend upon that identification number, which serves as a key. Knowing the key, and thus knowing which reader is meant, means that one knows the other information attached to this key in the database, possibly including hair color, height, and weight. The homework assignment used below to analyze students’ reactions to the Information Calculator addresses some of these information-theoretic relationships in a database.
Using a calculator can provide more accurate answers than manual computation, and using a calculator is also associated with more enjoyment and less stress [12]. However, there are negatives to the use of calculators by learners. Students who grow to rely on a calculator may not absorb the knowledge needed for a full understanding and appreciation of the task at hand [13]. For students without a full understanding of the subject, a calculator may be a crutch, or one of a number of tools brought to bear on a calculation. For example, a student without much appreciation for basic arithmetic identities might find it easiest to determine what 2018 minus 2018 is by using a calculator, while those with more appreciation of arithmetic laws will quickly solve this mentally. Students who do understand the subject may be able to solve some problems without a calculator (such as some of the problems below), but may also use a calculator to produce numeric answers.
Homework assignments may encourage students to practice solving problems and thus help them retain learned knowledge and skills longer [14]. In courses with large numbers of graduate students, homework exercises may help students infer relationships, with these insights ideally remaining with the students well into their professional lives [15].
Our analysis of the use of the Information Calculator is based on students in two small sections of an Organization of Information course that is required for an American Library Association accredited Master’s degree. Each class section had one undergraduate enrolled; the rest of the students were in the Master’s program. One section provided 20 usable homework assignments and the other provided 14, for a total of 34 assignments. No statistical analysis was conducted: the number of assignments was relatively small, homework performance varied little, and tables detailing the characteristics of the students and their performance on specific assignment questions could have allowed students to identify one another. Almost all of the students answered all the calculator questions correctly; the differences between students were in their attitudes toward using the calculator, and those differences are the focus of the analysis below.
As students perform a homework assignment about information in database systems using an information calculator, they may develop further knowledge about information, databases, and their own abilities to understand and solve mathematical problems. The homework assignment used here asked questions that students could answer using the information calculator. Additionally, each homework assignment asked students subjective questions about their attitudes toward the calculator and the application of the results.
The focus here is on the data provided in Table 1 and on some relationships that exist between the attributes listed in its columns. The questions below were asked about the following table:
Note that in using the table, anyone who knows the ID Number can easily and accurately determine the associated First Name, Street Address, and Building Color with no uncertainty. Further, if one knows the Street Address, one can exactly determine the Building Color. One could therefore move the Street Address and Building Color pairs out of this table into a separate relation. Building Color can then be removed from Table 1, and the remaining three columns (ID Number, First Name, and Street Address) form a useful relation that may be combined (joined) with the Street Address and Building Color relation to reproduce the original Table 1.
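The decomposition just described can be sketched in Python. The rows below are hypothetical, not the assignment’s Table 1: they were built only to have the same four columns and the same dependencies (ID Number determines the other columns; Street Address determines Building Color). “Danielle”, “123 Library Lane”, and “456 Information Ave.” are borrowed from the assignment’s instructions, and the remaining values are invented.

```python
# Hypothetical rows with Table 1's columns:
# (ID Number, First Name, Street Address, Building Color).
# These values are invented for illustration; they are not the assignment's data.
table1 = [
    (1, "Danielle", "123 Library Lane", "red"),
    (2, "Marcus", "123 Library Lane", "red"),
    (3, "Danielle", "456 Information Ave.", "blue"),
    (4, "Priya", "789 Archive Rd.", "red"),
]

# Relation 1: ID Number, First Name, Street Address (Building Color removed).
people = [(id_num, name, address) for id_num, name, address, _ in table1]

# Relation 2: Street Address -> Building Color. A dict is safe here only because
# Street Address determines Building Color; otherwise later rows would overwrite.
address_color = {address: color for _, _, address, color in table1}

# Joining the two relations on Street Address rebuilds the original rows.
rejoined = [(id_num, name, address, address_color[address])
            for id_num, name, address in people]

print(rejoined == table1)  # True: no rows were lost or invented
print(len(address_color))  # 3: each address's color is now stored once
```

If an address were ever paired with two colors, the dictionary would silently keep only the last one and the rejoined rows would no longer match the original; that is the situation a nonzero conditional entropy of Building Color, given Street Address, would flag.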
The following questions were asked:
What is the entropy, in bits, for the ID Number, the First Name, and the Building Color?
What is the conditional entropy of the ID Number, given the First Name? Enter each name (and the text entries below) in single quotes, e.g., 'Danielle'. Thus, a list of Street Addresses would start like this: ['123 Library Lane', '123 Library Lane', '456 Information Ave.'…].
What is the conditional entropy of the ID Number, given the Street Address?
What is the conditional entropy of the Street Address, given the ID Number?
What is the conditional entropy of the Street Address, given the Building Color?
What is the conditional entropy of the Building Color, given the Street Address?
How might you use these results, and why might this work? Think about the relationship between “key” and “non-key” attributes.
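The quantities asked for above can also be computed programmatically. The sketch below applies the questions to hypothetical rows (invented values with the same column structure, not the assignment’s Table 1) and estimates probabilities as relative frequencies; whether the Information Calculator estimates them in exactly this way is an assumption, and the helper names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy in bits, using relative frequencies as probabilities."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(target, given):
    """H(target | given) = H(given, target) - H(given), in bits."""
    return entropy(list(zip(given, target))) - entropy(given)

# Hypothetical columns (invented values, not the assignment's Table 1).
ids = [1, 2, 3, 4]
names = ["Danielle", "Marcus", "Danielle", "Priya"]
addresses = ["123 Library Lane", "123 Library Lane",
             "456 Information Ave.", "789 Archive Rd."]
colors = ["red", "red", "blue", "red"]

questions = {
    "H(ID Number)": entropy(ids),
    "H(First Name)": entropy(names),
    "H(Building Color)": entropy(colors),
    "H(ID Number | First Name)": conditional_entropy(ids, names),
    "H(ID Number | Street Address)": conditional_entropy(ids, addresses),
    "H(Street Address | ID Number)": conditional_entropy(addresses, ids),
    "H(Street Address | Building Color)": conditional_entropy(addresses, colors),
    "H(Building Color | Street Address)": conditional_entropy(colors, addresses),
}
for label, bits in questions.items():
    print(f"{label}: {bits:.3f} bits")
```

On these invented rows, the two conditional entropies that come out as zero are H(Street Address | ID Number) and H(Building Color | Street Address), matching the dependencies a designer would use to identify the key and to split off the Street Address and Building Color relation. The nonzero H(ID Number | First Name) shows that First Name cannot serve as a key here.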
The information values for various relationships in the table were computed using the Information Calculator at http://InformationCalculator.org. The calculator performs a number of functions, including self-information, entropy, conditional entropy, and mutual information for data values.
4. Student Attitudes toward the Information Calculator
Students expressed a number of opinions about the use of calculators in general, and the information calculator specifically. These comments were obtained by asking students to answer several questions about using the calculator to aid them in completing the homework assignment.
One question asked was, “What is best reason to use the Information Calculator?” There were a number of different kinds of responses, many of them positive, suggesting that the calculator has several positive aspects, or that different students find different advantages in using http://InformationCalculator.org.
Some students commented that using the calculator led to understanding, often using the word “understand”. One person noted that, for this database assignment, using the calculator led one “to understand the relationship between different attributes”, while another felt the calculator led the student “to check my understanding”, and another felt that “in order to best understand” the problems, the calculator should be used. One student noted that “the calculator can also be helpful…to make sure we understand the concepts and its formulas”. Less directly, one student noted that students can “figure out how entropy works”.
Some students noted the ease of use. One stated that it is “easily convenient”, and another noted the “ease of calculation”, while another commented that it “is easy to use”, or “Convenien[t], saving time”. Another noted that “I can get the results easily”; another that “it makes computation easier”. Related to this is that the calculator is “simple”. Ease of use can also be interpreted as avoiding effort, as when one student noted that with the calculator there is “no memorizing the formula”.
Related to ease of use is the speed of calculations. One noted that the calculator “quickly calculates information”, while two other students called it “quick” and another noted that it “makes calculations … quickly”.
Accuracy is also important in calculations. One student described the calculator as “resolv[ing] uncertainty”, while others focused on avoiding errors. One student noted that the calculator “helps reduce human error” and another commented on “less potential for errors in calculation”, with a third echoing that using the calculator would “reduce human error”.
Students were also asked why they should not use the information calculator.
One reason given was that using the calculator meant that students weren’t learning as much, because the information calculator did much of the work for them. One noted that “always calculat[ing] by ourselves can help us remember the concept all the time”; another noted that it “make[s] me dependent on the calculator rather than understand[ing] the theory and the formulas for myself”, and another “ha[s] less of an understanding of how they work.” One “can’t understand the formulas in the info calculator”. The Information Calculator “doesn’t show the mechanism behind calculation. We cannot see the process.” This concern about missing an educational opportunity was largely offered by non-native speakers of English who were enrolled in the class.
When asked why “understanding the formulas in the Information Calculator will be useful to me as a professional”, the primary reason for this was for increased “understanding”, as the question suggested.
One student noted, “This will help me as an archivist by understanding the amount of information in records and how they [are] related”, or it “can help me in my work in cataloging … [I] can try to find relationships”, while another suggested that, “Firstly I can fully understand the theory and apply it”. “[U]nderstanding these formulas will be useful in data preservation” and “understanding the applications always help me identify the situations where they would be useful”. Other answers pointed to how information, and data in general, would be better appreciated. The calculator “leads to the inner meaning of the information, which we must understand as an information researcher”; it was “help[ing] me to understand the appropriate times to use the calculator” and “further understanding of databases …”; one student remarked, “I can test database designs”; another said, “the more ways to talk about how information is processed and is transmitted, the better.” Several students commented on their eventual need to work with databases and how the information calculator would assist with this. This was to be expected, since the homework assignment explicitly addressed database design. One student expressed an interest in “how we can compact … information”.
A student noted that they “tend[ed] to shy away from math” and the calculator made life easier for one who wished to merely enter numbers without fully understanding the formulas underlying the calculator.
Several students suggested that using the calculator produced an increased level of understanding. Some felt that the calculator was easy to use or simple. This might result in students not needing to memorize formulas, which some took as a positive aspect of the calculator and others saw as a negative aspect. The calculator is also faster than manual calculation, and almost always more accurate.
5. Analysis of Student Attitude Data
Several different attitudes existed among those who used the Information Calculator. The calculator may lead to increased understanding and greater accuracy, and it was easy to use, although using it may also result in students not learning the formulas. The comments provided by students suggested several positive opinions about the Information Calculator and a few negative ones. Several students mentioned an increase in, or the presence of, understanding; these were predominantly native speakers of English. They also noted that the calculator was convenient and easy to use. Non-native speakers of English were the dominant group among those who emphasized the ease of calculation. This may be because of these students’ extensive mathematical experience; they perhaps expected to learn information theory using these same mathematical techniques, rather than using a simple calculator. Native speakers of English also noted that the calculator helped reduce human error.
A significant negative to using the calculator is that students did not learn as much as they would have by studying and learning the theory directly. Students noted that they did not learn the process, and one student noted that using the calculator increased dependence on it.
For information professionals, the biggest advantage of using the calculator was that the resulting knowledge was generally useful. More specifically, this knowledge helped students understand the theory and “inner meaning” of information and its applications, and measure the amounts of information in professional situations.
Most students answered all of the homework questions correctly, which indicated that they were able to use the calculator correctly. Two students had several wrong answers. One self-identified as “an archivist”, and the other as someone who intended to work “in cataloging”. Both of these students had backgrounds in the humanities and may have been confused by the assignment. A third student, with a background in the harder sciences, produced one wrong answer, which may have been purely accidental.
The non-native speakers of English in the class constituted the majority of the Information Science students, and the native speakers of English were the majority of the Library Science students. The largest difference between the two groups was the non-native speakers’ interest in learning the formulas. Most of these Information Science students had undergraduate degrees with extensive mathematical training and probably considered learning mathematics a normal part of their education. These international students pointed out how using the calculator without learning the underlying formulas and principles could lead to a lack of understanding. Interestingly, native speakers of English did not see failure to learn and understand the underlying formulas as a significant problem. The emphasis some students placed on full understanding could lead to better managers and deeper understanding, as is found among leaders in the field and academic researchers.
The most important recommendation, given the results obtained above, is to use the Information Calculator, or other comparable software, to allow students to more easily study how information theory may be used to describe, predict, understand, and explain relationships found in the information world. Many students saw the Information Calculator as much easier than performing manual calculations, and it enabled those with limited time, knowledge, or motivation to constructively determine the nature of relationships between informative variables.
Because using the Information Calculator made calculations much easier, students could learn more sophisticated and more formally defined relationships in the information world. Classroom instructors may go much further in formal descriptions of the information world and related phenomena than they might have previously. Students can expect to better understand the science of information and perform better as information professionals.
While using the calculator may allow students to work at a more sophisticated level, efforts should be made to present the underlying formulas. Some students reported that using the calculator without a full presentation of the underlying formulas and theory resulted in a less sophisticated appreciation of the information world and the theories describing relationships within this domain. This can be remedied through presentations in a classroom or online, or by providing readings that address these topics for more advanced students. Given the variation in mathematical maturity among graduate students, readings might be the best approach to use.
A minor recommendation is to teach scientific notation for very large and very small numbers. When calculations involve operations such as division and logarithms, the Information Calculator may generate either zero or a number very close to zero. In place of zero, such a calculation might produce a number such as $1 \times 10^{-16}$, which software commonly displays in a form such as 1e-16. This is numerically almost equivalent to zero, but students unfamiliar with scientific notation may be confused by what may appear to be a large number.
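Near-zero values of this kind are a general property of floating-point arithmetic rather than something specific to any one calculator. The short Python sketch below shows such a residual in ordinary double-precision arithmetic and one hypothetical way to display it; the helper `display_bits` and its tolerance of 1e-12 are arbitrary choices for illustration, not features of the Information Calculator.

```python
# Exact arithmetic gives 0 here, but binary floating point leaves a tiny residue.
residual = 0.1 + 0.2 - 0.3
print(residual)           # 5.551115123125783e-17, i.e., about 5.55 x 10^-17
print(f"{residual:.3e}")  # 5.551e-17

def display_bits(value, tol=1e-12):
    """Hypothetical display helper: show values within `tol` of zero as 0.0."""
    return 0.0 if abs(value) < tol else value

print(display_bits(residual))  # 0.0
print(display_bits(1.5))       # 1.5
```

Reading the exponent is the key skill: e-17 means the decimal point moves 17 places to the left, so the value is essentially zero.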
7. Summary and Conclusions
Students studying the organization of information may benefit from learning the relationships between informative items as described by information theory. This body of theory allows one to describe retrospectively what has occurred, as well as to predict what will happen in the future. It further allows one to better understand these phenomena [16].
Using a calculator with built-in information-theoretic functions can enhance the education of students who desire to become information professionals. By using such a calculator, students were able to compute the information-theoretic relationships existing in database management systems. Using such methods, one can determine the relationships between the keys in a database and the non-key attributes, and encourage the development of databases without insertion and deletion anomalies.
Students reported a number of advantages to using the calculator. Some students claimed that using the calculator led to an increased understanding of the relationships existing in the problem area being examined. Some found the calculator easy to use and a time saver. The calculator was often more accurate than manual calculation.
A negative aspect of using the calculator is that students may not learn and use for themselves the formulas built into the calculator. This was noted by several non-native speakers of English in the class, who often had undergraduate educations in which they learned more mathematically based principles, along with the formulas, than many of the American students in Master’s programs in IS and LS.
Using a calculator such as the one described here can lead to a more sophisticated understanding of complex relationships between data, such as those found in automatic indexing or relational databases.