Word processors, spreadsheets and relational databases have quantitatively extended human cognitive capabilities. Most kinds of documents can now be filed, replicated and delivered at light speed to those who need the knowledge they contain. Large–scale arithmetic and mathematical calculations that would be completely beyond the capacity of an individual human to achieve can be carried out in an instant. However, these applications by themselves have not fundamentally changed the cognitive activities of people assembling knowledge into documents or managing them. Current business processes using word processing systems, spreadsheets and databases often still closely resemble those followed when scribes and clerks were pressing cuneiform script onto clay tablets – they just work faster with fewer people.
In the last decade, computer-based information management and delivery technologies have begun to break away from the paper paradigm. These newer technologies assist human cognitive abilities in new ways, and will lead to more radical changes in the way humans produce and work with knowledge than have occurred in conjunction with past technological revolutions. However, before considering the electronic tools, it is worth considering the evolution of the older information and knowledge management technologies and practices associated with printed literature.
As noted earlier, writing and books allowed knowledge to be recorded and preserved as objective knowledge in World 3 externally to the human brain. The development of libraries or collections of written materials follows on naturally from the development of writing. Recorded knowledge is most useful if people other than the original authors can find and retrieve what they need from the records when and where it is needed. To fulfill this function, repositories or libraries need to be much more than random collections of documents.
Originally, most records stored in libraries were religious and literary works (i.e., books). In the 17th Century, European philosophers such as Francis Bacon stressed the importance of documenting knowledge of the world (Bell, 2000). The first scholarly and scientific journals were established in the 1660's to provide standard means of communicating scientific discoveries and knowledge (Fjällbrant, 1996–1997; MacDonell, 1999). As the volume of knowledge held in books and other publications grew beyond the capacity of an individual to remember which document held which knowledge, the need for systems to manage and retrieve specific knowledge objects from the storehouse became paramount. As will be seen, the requirements for managing and retrieving content from books in general are quite different from those for retrieving specific kinds of knowledge from the scientific and scholarly literature.
To be useful, the records (or "books") in the knowledge storehouse (library) need to be systematically organized – both to manage them physically in space and conceptually to facilitate retrieving the knowledge they contain. Two approaches provide systematic clues to the contents of a library: indexes and catalogs. An index is an alphabetically ordered list of contents, e.g., by author name and title – which points to the book's physical location in the library. A catalog attempts to provide a systematically organized structure of knowledge that places each book or object at a particular place within that structure based on the principal subject(s) or content of the book. Most libraries maintain both an author/title index and a subject catalog. In most cases, the catalog organizes subjects according to some kind of hierarchical logic as discussed below. Before computerization, indexes and catalogs were physical files of index cards that had to be manually sorted and maintained.
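The distinction between an index and a catalog can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Book` record, the shelf marks and the subject paths are hypothetical and do not reproduce any real library's scheme. The index is an alphabetically sorted list of author/title entries pointing to a shelf, while the catalog is a nested structure that files each title at one place in a subject hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    author: str
    title: str
    subject_path: tuple  # position in the subject hierarchy (illustrative)
    shelf: str           # hypothetical shelf mark giving the physical location


BOOKS = [
    Book("Theophrastus", "Enquiry into Plants", ("Science", "Botany"), "B-12"),
    Book("Euclid", "Elements", ("Science", "Mathematics"), "B-03"),
    Book("Herodotus", "Histories", ("History",), "A-07"),
]


def build_index(books):
    """Author/title index: alphabetically ordered entries pointing to a shelf."""
    return sorted((b.author, b.title, b.shelf) for b in books)


def build_catalog(books):
    """Subject catalog: nested dicts mirror the subject hierarchy."""
    root = {}
    for b in books:
        node = root
        for subject in b.subject_path:
            node = node.setdefault(subject, {})
        node.setdefault("_books", []).append(b.title)
    return root


def titles_under(node):
    """Collect every title filed at or below a catalog node."""
    titles = list(node.get("_books", []))
    for key, child in node.items():
        if key != "_books":
            titles.extend(titles_under(child))
    return sorted(titles)


index = build_index(BOOKS)
catalog = build_catalog(BOOKS)
```

Looking up `index` answers "where is the book by this author?", whereas `titles_under(catalog["Science"])` answers "what does the library hold on this subject?". Each book occupies exactly one place in the catalog tree, which is the limitation the rest of this section returns to.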
The earliest Sumerian tablet archives, dating to the third dynasty of Ur (c. 2100 BC), apparently already used simple classification systems. The tablet–repository of the earliest temple at Nippur contained at least 20,000 tablets. These covered diverse topics, including linguistics, ideogram registers, grammatical exercises, lists of names of mountains and cities, of gods and temples and of minerals and plants, medical prescriptions and incantations, liturgical texts and hymns, etc. Based on physical evidence, the material was shelved to help determine three things: the identity of every tablet, its specific content and its extent135. I have found no references on the Web to tablets found in these libraries that describe the classification philosophy followed, but even these early repositories may have been conceived as "universal" libraries to catalog all recorded knowledge.
To quote from the introduction to the Second Anglo–German Seminar on Library History154:
The 'universal library' has long been a dream of literate societies. The idea that it might be possible to assemble a collection of texts in which all human knowledge was contained had obvious appeal for tyrants, democrats, and scholars alike.
The Library (ΒΙΒΛΙΟΘΗΚΗ) in Alexandria, Egypt, founded in 297 BC, is the earliest of the universal libraries we know much about137 (Jameson 1993; Brundige 1998; El-Abbadi 1998; Delia 1992). Some of its history may be myth or legend (Bede 2000), but there is no doubt that it demonstrated many of the knowledge management concepts that underpin modern libraries. Demetrios of Phaleron supposedly founded the Bibliotheka as a part of the Mouseion (a research academy or "museum" of the arts and sciences), established by Alexander the Great's satrap, Ptolemy. Ptolemy intended the library to hold a copy of every book in the world. At its apogee, the library probably held substantially more than 500,000 manuscripts – a large collection even by today's standards. Accessing the knowledge in its holdings required the development of cataloguing and indexing systems. According to Jameson (1993), Kallimachos of Cyrene (ca. 305–240 BC) invented the kind of hierarchically systematic catalog still used by modern libraries:
In the Pinakes, Kallimachos devised a system by which a large collection of books could be arranged. The “lists” of the Pinakes constitute an author–title catalog and a subject catalog, the first such scientific classifications in the history of western libraries. But the catalog, as it evolved from Kallimachos' work, was much more than an aid for the retrieval of books. By adding specific biographical information about authors and compiling lists of similar or related works, Kalimachos invented the bibliography....
Kallimachos had no known precedent for his work, but what he put together subsequently became a model for all librarians. For his accomplishment, he deserves to be called the ‘inventor’ of the catalog and the bibliography–the two indispensable tools of the scholar and librarian.
One of the oldest surviving universal libraries, and certainly the most important of the early libraries, is the Vatican Library, founded in the 1450's. It is now open to all via a virtual tour of its origins, history, impacts on scholarship and important holdings in the early history of the book. Unfortunately, the Library only began to be effectively catalogued in the 20th Century. At least through the 16th Century, books were chained to benches more or less organized by subject. A bench list identified the books chained to each bench, but these were sequential records on paper, which meant that titles were listed in the order they were added to the bench (Boyle 1994). Although its collections were eclectic, for all too much of its early history the Library was more concerned with protecting its repository of early knowledge from scholars than with making it available to them138.
The core technology for locating information in modern libraries is still the catalog. There are several major cataloguing systems in use today and all are based on systematically classifying the contents of the catalogued objects139:
The Dewey Decimal Classification system (DDC) is the first and still the most widely used general classification system developed in the modern era. It is designed to cover the whole of general knowledge within an organizational scheme that is continuously revised to keep pace with the growth of knowledge. The system was conceived by Melvil Dewey in 1873 and first published in 1876. It is kept up to date by a division of OCLC Online Computer Library Center, Inc., currently working out of the US Library of Congress130.
The second major universal classification system is the US Library of Congress Classification131 (LCC) system, which some think to be more suitable for the largest collections. The Library of Congress was founded in 1800, but was burned by the British when they attacked Washington in 1814. The Library was rebuilt and Thomas Jefferson sold his extensive personal collection to form the nucleus of the new Library. Jefferson's personal classification system, based on the philosophies of Bacon and d'Alembert, continued in use until it was replaced by the current system, developed in the period from 1887 to 1903132. The adoption of the LCC system by other libraries was facilitated when in 1901 the Library of Congress began selling copies of its own catalog cards for use by other libraries. Cole (1992) provides a history of the library.
Other commonly used "universal" cataloguing and classification systems include the Universal Decimal Classification (NISS 1997) and the Colon Classification. A core idea, fully developed in Ranganathan's Colon Classification system, is to describe several different "facets" of the item being indexed in an indexable form (Maple 1995).
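The idea of describing an item by several facets can be illustrated with a minimal Python sketch. It is not Colon Classification notation: the facet names (`subject`, `place`, `period`) and the item titles are hypothetical, chosen only to show how an item indexed under several facets can be retrieved through any combination of them, rather than from a single pigeonhole.

```python
from collections import defaultdict

# Hypothetical items, each described by several independent facets.
ITEMS = {
    "Travels in Arabia": {"subject": "geography", "place": "Arabia", "period": "19th century"},
    "Flora of Egypt": {"subject": "botany", "place": "Egypt", "period": "19th century"},
    "Plants of Arabia": {"subject": "botany", "place": "Arabia", "period": "20th century"},
}


def build_facet_index(items):
    """Map each (facet, value) pair to the set of titles carrying it."""
    index = defaultdict(set)
    for title, facets in items.items():
        for facet, value in facets.items():
            index[(facet, value)].add(title)
    return index


def find(index, **wanted):
    """Return the titles matching every requested facet value."""
    matches = [index.get((facet, value), set()) for facet, value in wanted.items()]
    return sorted(set.intersection(*matches)) if matches else []


facet_index = build_facet_index(ITEMS)
botany_of_arabia = find(facet_index, subject="botany", place="Arabia")
nineteenth_century = find(facet_index, period="19th century")
```

Because every facet is indexed separately, the same item can be reached by subject, by place or by period, which a single shelf position cannot offer.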
All these systems seek to provide pigeonholes for filing (and retrieving) the entire scope of human knowledge. Only if there is a comprehensible scheme defining where knowledge should be stored, can someone other than the person who filed it find the knowledge when needed. To do this reliably requires the entire administrative apparatus of a public or research library to accession, catalog, shelve and retrieve the printed knowledge. However, no matter how sophisticated the classification system, there has never been a completely satisfactory method for filing objects as large and complex as books by single codes able to represent their content. The situation is even worse for the contents of scientific and scholarly journals.
This was understood more than 2,200 years ago in the ancient library of Alexandria:
Problems arose for works without author or title, or copies of the same work with different titles, or different work with the same title. [Kallimachos] also had to deal with scrolls that were inscribed on both sides or collective scrolls which contained works by several authors. His famous maxim 'Megabiblion, megakakon' (Big book, big evil) expresses the frustration felt ever since by librarians when they have to catalog a multi–authored work. (Jamieson 1993).
Thomas Jefferson expressed similar frustrations in describing the system he used to catalog his own library, which he sold to the Library of Congress in 1815 and which formed the basis of the Library of Congress's catalog through the 1880's:
The arrangement according to subject is far preferable, altho' sometimes presenting difficulties also, for it is often doubtful to what particular subject a book should be ascribed. This is remarkably the case with books of travels, which often blend together the geography, natural history, civil history, agriculture, manufacturing, commerce, arts, occupations, manners, etc. of a country, so as to render it difficult to say to which they chiefly relate. Others again are polygraphical in their nature, as encyclopedias, magazines, etc.162
Weinberg (1996) discussed problems and limitations of complex classification systems.
The ultimate representation of the thought content of a document, which was espoused by Ranganathan [Colon Classification], represents optimization in classification and indexing. Human beings may not require this. Pointing the user to a manageable chunk of text or number of documents that can be scanned in a reasonable amount of time for the desired fact or information would constitute satisficing in the field of content analysis.... There is considerable evidence that users don't want intermediaries to retrieve only the single most specific document on a topic. Users want to select from a group of documents and make their own relevance judgments. (Weinberg, 1996:5).
...
Computers are great at counting words, but are not so successful at distinguishing the significant occurrences of words from the insignificant ones. Moreover, computers are terrible at recognizing concepts. (Weinberg, 1996:6).
The conclusion from thousands of years of trying to organize the knowledge held in libraries is that even the most sophisticated cataloging systems offer limited capabilities for identifying and retrieving knowledge held in books. Knowledge held in the primary professional literature – the scholarly and technical journals where newly developed knowledge is first published and disseminated – is even more difficult to catalog for effective retrieval.