The following article appears in the journal "Library Hi Tech," Volume 19, Number 1, ISSN 0737-8831 and is used with permission.
by John Cookson and Lloyd Rasmussen
John Cookson (firstname.lastname@example.org) and Lloyd Rasmussen (email@example.com) are both based at the National Library Service for the Blind and Physically Handicapped (NLS), The Library of Congress, Washington, DC, USA.
Keywords: Disabled people, Information technology, Blind people, Information services.
Abstract: The National Library Service (NLS) produces about 2,000 talking books and 50 magazines per year on specially formatted cassette tape for free distribution to a readership of about 764,000. Cassettes and special players are delivered by U.S. Postal Service from a network of 138 participating libraries. To control the cost of technical obsolescence and to meet patron and sponsor expectations, NLS will replace this analog system with a digital system over the next ten years.
The increasing prevalence, popularity, and economy of digital electronics motivate the National Library Service (NLS) to work rapidly toward a completely digital audio service. This means using digital methods throughout the production and distribution stream, from studio recording of digital original masters and digital collection management to digital product distribution and playback. The length and complexity of the transition are consequences of size. NLS annually circulates about 22 million audio books and magazines nationwide and supports the use of about 721,000 cassette playback units. Further information on NLS, including the agency's mission, organization, and history, can be found on the NLS web site; see, particularly, the area titled "About NLS," where "Basic facts about NLS" and other summary information can be found.
In response to the high-risk digital-transition challenge, NLS has published a summary master plan and has made significant progress toward its implementation. The plan can be found on the NLS web site under "Digital talking books: planning for the future." Of particular interest are the articles "Planning for the future" and "Twenty steps to next-generation NLS technology." In our synopsis of implementation presented below, we will refer to specific topics mentioned in the cited articles.
NLS's implementation activity has been focused on three major project areas:
our digital audio development project;
our standards-setting effort; and
local research and development.
Digital audio development project
Immediately following its inception in September 1998, this NLS committee identified three areas for pursuit of our digital future:
product simulation; and
Prioritized list of features for digital talking book playback devices. As the name indicates, this document describes characteristics of DTB playback systems. It allows for three different types of players: very simple hand-helds, mainly for linear leisure reading; more complex portables, mainly for students and professionals; and user-supplied computer-based players capable of supporting the most sophisticated features. Features are prioritized as essential, highly desirable, or useful for each type of player, and include functions such as variable-speed reading, book-marking, and the ability to immediately access items listed in a table of contents. The full feature set may be found in the Digital Talking Book Standards Committee's document titled "Playback device guidelines."
Document navigation features list. Although the prioritized feature set mentioned above contains general navigation requirements, the topic is complex enough to motivate a separate explanatory document. The Navigation Features List describes mechanisms for immediate random access to selected areas of a book, and other capabilities such as searching, highlighting, excerpting, and skipping user-selected elements.
Structuring guidelines for digital talking books. The Structuring Guidelines document tells DTB producers how to put XML tags into text so that the relationships between components are properly represented. It suggests where tags from the allowable set should be inserted into a document and indicates the proper syntax. For example, <p> marks the beginning of a paragraph and the end is marked by </p>. Consult the document titled "DAISY structure guidelines" for more information.
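As a rough illustration of what such structural markup makes possible, the sketch below parses a small tagged fragment with Python's standard library and pulls out its paragraphs. The element names `book`, `chapter`, and `title` are hypothetical placeholders, and `p` is assumed as the paragraph tag; the actual allowable tag set is the one defined by the DTB document type definition, not this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names other than <p> are placeholders,
# not tags taken from the DTB standard.
SAMPLE = """<book>
  <chapter>
    <title>Chapter One</title>
    <p>First paragraph of the chapter.</p>
    <p>Second paragraph of the chapter.</p>
  </chapter>
</book>"""


def paragraphs(xml_text):
    """Return the text of every <p> element, in document order."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    for text in paragraphs(SAMPLE):
        print(text)
```

Because the paragraph boundaries are explicit in the markup, a player could skip, repeat, or bookmark individual paragraphs without guessing at the structure of the text.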
Open eBook Standard. We refer to this standard because it represents an industry effort to achieve compatibility among various playback devices and content from various producers. We recognize that for the widest and most enduring support it is advisable to converge on standards that dominate the consumer market. Moreover, eBook participants have a keen and authoritative interest in resolving difficult issues such as digital rights management (DRM) methods and metadata requirements. One section of the eBook standard that is of particular interest is the package file. This file lists the components of a given product and indicates various relationships among them. For example, the spine area of the package file lists product files in a logical linear reading order. It appears that with minor modifications, the package file specification, which is embodied in an XML document type definition (DTD), would be suitable for use in the DTB standard. The modifications would expand the allowed file types to include various audio formats.
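The role of the spine can be sketched in a few lines of Python. The package below is a simplified assumption modeled loosely on the manifest-and-spine layout of an Open eBook package file; it omits namespaces and metadata, the `id` values are invented, and the file names merely echo the sample names used elsewhere in this article. It is not the DTB package file specification.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical package file: a manifest naming each component
# and a spine listing component ids in linear reading order.
PACKAGE = """<package>
  <manifest>
    <item id="intro" href="RevStdIntro.WAV" media-type="audio/x-wav"/>
    <item id="text" href="RevStd.XML" media-type="text/xml"/>
    <item id="hist" href="RevStdHist.MP3" media-type="audio/mpeg"/>
  </manifest>
  <spine>
    <itemref idref="intro"/>
    <itemref idref="text"/>
    <itemref idref="hist"/>
  </spine>
</package>"""


def reading_order(package_xml):
    """Resolve each spine entry to its file name via the manifest."""
    root = ET.fromstring(package_xml)
    hrefs = {item.get("id"): item.get("href") for item in root.find("manifest")}
    return [hrefs[ref.get("idref")] for ref in root.find("spine")]


if __name__ == "__main__":
    print(reading_order(PACKAGE))
```

Keeping the component list (manifest) separate from the reading order (spine) is what would let audio files be added as new file types without changing how a player walks through the book.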
Digital talking book (DTB); Document type definition (DTD). This technical paper defines what XML tags are to be used to indicate the structure of a particular document and the proper syntax for their use. As with all DTDs, it is typically used by parser software to verify that target documents are "well formed" and "valid," i.e. are properly marked up. A DTD is typically read only by a computer. The application of this DTD is the subject of the Structuring Guidelines. For a view of the DTD and its history, please see the document titled "Document type definition (DTD) for digital talking books."
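The distinction between "well formed" and "valid" can be made concrete with a short sketch. The function below checks only well-formedness, that is, whether tags are properly nested and closed. Python's standard-library `xml.etree.ElementTree` parser does not validate against a DTD, so checking validity would require a separate validating parser; the helper name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise.

    This checks syntax only (matching, properly nested tags). It does
    NOT check validity against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<p>A complete paragraph.</p>"))
    print(is_well_formed("<p>A paragraph with no closing tag."))
```

A document can pass this check and still be invalid, for example by using a tag that the DTD does not allow in that position.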
DTB Bookmark DTD. This technical paper defines the structure, syntax, and content of a bookmark file. Bookmark files are portable files to be composed and read by various playback devices. They are designed to allow a user to set a large number of bookmarks or highlight many sections and attach text or audio labels to them. To ensure compatibility, it is necessary to define a standard format. This DTD is nearing completion. In service it would be directly used only by player software, not by the patron.
DTB navigation control center for XML applications (NCX) DTD. This technical paper defines the structure, syntax, and content of a file called the Navigation Control Center that is used by a player to provide a direct access to various areas of the book being read. The NCX is typically built by software and accessed directly only by player software. This DTD is nearing completion.
DTB package file DTD. This technical paper defines the structure, syntax, and content of a file called the package file. The concept and most of the details are borrowed from the Open eBook Forum. The package file would be built by software with producer intervention and accessed directly only by playback software. This DTD is posted on the Open eBook site and suggested modifications will be the subject of discussions at the eBook Forum.
DTB file specification. This technical paper defines the types of ASCII (text) and binary (audio and image) files that are allowed in a DTB. Most of the text files are of the XML type and consist of the following:
Book text with tags added to indicate its structure, e.g. RevStd.XML.
Package file to identify the DTB, list contents, include metadata, e.g. RevStd.OPF.
SMIL file for fast access and synchronization of text with audio, e.g. RevStd.SMIL.
NCX file to enable fast access to book components, e.g. RevStd.NCX.
Bookmark file containing points of interest marked by the user, e.g. RevStd.BKM.
PCM files represent audio as numeric samples, as music CDs do, e.g. RevStdFwd.WAV.
ADPCM files are similar to PCM but more compact, e.g. RevStdIntro.WAV.
MPEG files are very compact but have adequate fidelity, e.g. RevStdHist.MP3.
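The file types listed in this specification can be summarized as a simple lookup, sketched below. The mapping is an illustrative assumption built from the sample file names in this article; the role descriptions are paraphrases, and the function name `classify` is hypothetical. Note that the extension alone cannot distinguish PCM from ADPCM, since both samples use .WAV.

```python
from pathlib import PurePath

# Extension-to-role table assembled from the sample names in the text.
ROLES = {
    ".xml": "book text with structural tags",
    ".opf": "package file",
    ".smil": "text/audio synchronization",
    ".ncx": "navigation control",
    ".bkm": "user bookmarks",
    ".css": "presentation style sheet",
    ".wav": "PCM or ADPCM audio",
    ".mp3": "MPEG audio",
}


def classify(filename):
    """Return the assumed DTB role for a file name, or 'unknown'."""
    return ROLES.get(PurePath(filename).suffix.lower(), "unknown")


if __name__ == "__main__":
    for name in ("RevStd.OPF", "RevStdFwd.WAV", "RevStdHist.MP3"):
        print(name, "->", classify(name))
```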
One of the essential components of technology-transition decision-making is relative cost. In the simplest terms, we must know the cost of providing service using the incumbent technology, we must be able to project this cost out to a ten-year horizon, and we must be able to estimate the cost of the digital alternative. To this end, we have engaged a commercial firm with expertise in cost modeling to build a software tool that will enable us to make the required estimates and support decision making.
When a technical advancement, such as dense solid-state memory, is introduced into the consumer entertainment market, a plethora of competitive products are introduced at premium prices. Market pressures and the quest for standards cause losers to disappear, while demand causes winners to fall in price, sometimes precipitously. On the other hand, older technology can experience price escalation and availability problems as demand declines. Our model is designed to capture and depict these phenomena so that we can identify more economical technology as well as the best time and level of investment to recommend. When subject to critical scrutiny, the model's inputs and computational procedures must be able to engender credibility and confidence. As of this writing, working in consonance with the contractor, NLS has drafted a data dictionary that defines the name, meaning, and source of each data element used in the model.
We have also reviewed some of the computational procedures that the model will use to compare various technologies that NLS might consider. We anticipate the review process to be completed by mid-March and delivery of a prototype by mid-May. The prototype will allow users to evaluate the input-output interface, check computational validity, and find bugs. Although it is very unlikely that NLS will implement a CD-ROM-based system, for purposes of test and verification, the design team is under contract to demonstrate a comparison between the use of cassette tape and CD-ROM. The model should be fully operational by mid-September, and we look forward to making it an evolving working tool that will be updated and maintained throughout the technology-transition process.
The last step in the technology-transition process is to gradually replace cassette players with portable digital playback devices. This final step in the transition will be built upon changes in digital production, storage, and distribution methods that are made far in advance of providing digital players. Our plans are to have a digitally mastered collection of about 10,000 titles available no later than 2008. To this end, NLS has published its intent to require all of its contract producers to phase in digital original mastering over the next four years. In support of this requirement that will involve the writing and review of technical specifications, NLS studio personnel are testing a digital mastering system in-house. This effort is intended to build expertise while providing the first titles for a digital collection, and will be followed by another in-house studio installation and the exploration of efficient methods for quality checking of digital material.
In parallel with the innovation in digital mastering, NLS is exploring the selective conversion of audio in the analog collection to a digital format. In one multi-state center we have installed a system that efficiently uses analog-to-digital conversion to make digital cassette copies of open-reel analog masters. Besides building expertise in efficient and high-quality digital conversion, the computer-based system permits archiving on CD-ROM and quality enhancement of older recordings. Testing and evaluation of these capabilities is presently in progress.
As digital original masters become available from the NLS recording studio and contract producers, we intend to experimentally store them on media and in formats designed to facilitate a particular function. There will be a shelf or archive copy on CD-ROM in the same format as the digital original master, most likely a WAV file of 16-bit PCM sampled at 44.1 kHz. There will also be a version stored on a large magnetic disk array (not connected to the Internet) that could be an exact copy of the archive copy or a version that has been formatted by a data-reduction algorithm such as MPEG Layer 3. The archive will teach us about problems in long-term storage, and the disk array will permit experiments in formatting and data reduction. A prototype system with about 90GB of storage is presently in place at NLS. This system cannot be connected to the Internet because the technology for the protection of intellectual property is in development. This kind of technology comes under the general heading of Digital Rights Management (DRM), and is the subject of intense discussion in the publishing and library communities. There are many complex issues on the table, and we are following events closely so that any system that we eventually adopt will be acceptable to rights owners.
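To give a sense of scale, the arithmetic behind these storage choices can be sketched as follows. The sample rate and bit depth come from the text; a single (mono) channel and a decimal definition of a gigabyte (10^9 bytes) are assumptions, and the 10-to-1 reduction ratio is the figure reported for the MPEG Layer 3 experiments described later in this article.

```python
def pcm_bytes_per_hour(sample_rate_hz=44_100, bits_per_sample=16, channels=1):
    """Bytes needed for one hour of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * 3600


def hours_of_audio(capacity_bytes, reduction_ratio=1.0, **pcm):
    """Hours that fit in capacity_bytes at a given data-reduction ratio.

    reduction_ratio=1.0 means uncompressed PCM; 10.0 means the stored
    audio is one tenth the size of the PCM original.
    """
    return capacity_bytes * reduction_ratio / pcm_bytes_per_hour(**pcm)


if __name__ == "__main__":
    capacity = 90 * 10**9  # 90 GB, assuming decimal gigabytes
    print(round(hours_of_audio(capacity)))                        # mono PCM
    print(round(hours_of_audio(capacity, channels=2)))            # stereo PCM
    print(round(hours_of_audio(capacity, reduction_ratio=10.0)))  # 10:1 reduced
```

Under these assumptions one hour of mono PCM occupies about 318 MB, so the prototype array would hold roughly 283 hours uncompressed and about ten times that after a 10-to-1 reduction.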
The production, management, and playback of digital products involves the development and maintenance of computer software. As this assertion suggests, required software can be divided into the three identified categories. Our approach to audio production software is to use off-the-shelf products that drive digital mastering equipment and produce WAV files. Using other production software to help build a digital talking book (DTB) from the audio files, plus whatever part of the corresponding text is available for integration, is an area that will receive more attention as the need for product simulation comes into focus. Programs that can help generate and validate XML, as well as programs to time-align audio with the corresponding text, will be evaluated.
For the management of digital products our initial approach will be to use conventional file management software found in popular computer operating systems. As the collection expands we will consider the use of software more attuned to our particular data set. In this case there may be an opportunity to use some of the software presently used to manage the audio material that is part of the Library of Congress's Digital Repository Project.
One of our most challenging and interesting software requirements is the need to simulate future playback equipment. This software will allow us to check the feasibility of putting various features into portable hardware, assist us in the development of user interfaces, and permit us to test the compatibility of DTBs. We have done some experiments locally with software components such as systems for interactive control of playback speed without pitch distortion, but have yet to build a fully functional system. We intend to survey the commercial software market to determine approximate cost and level of interest. Our request for information will appear in the Commerce Business Daily during the second quarter of this calendar year. We expect to be able to identify the required software expertise at an affordable price. An appreciation for the complexity of this undertaking is apparent from the DTB feature list.
Under the auspices of the National Information Standards Organization, an affiliate of the International Standards Organization, NLS is leading a group of experts in the development of a DTB standard. For a more detailed discussion of why and how the standards process was begun, please see "Talking books: toward a digital model", our paper delivered at CSUN 3/97.
Because the development of this standard plays such a central role in our digital strategy and because it has such pervasive and persistent implications, we treat it with a separate article in this special ITD issue. Please see "Digital talking book standards developed by NLS and partners under NISO auspices" (pp. 19-24).
In support of our progress toward a fully digital future, NLS has undertaken an eight-point in-house technical research and development program. The thrust of this program is threefold: evaluate potential DTB components, build expertise competent to develop technical specifications, and maintain a clear view of where consumer products are headed. Specific areas under study include audio data reduction algorithms (e.g. MPEG Layer 3), variable rate playback methods, text-to-speech programs, and user interfacing hardware. For a complete list of topics please see our planning document, "Digital talking book: technical activity planning."
Besides building and maintaining digital expertise, our R&D program has yielded some interesting results. In the area of variable rate playback, for example, we have tested an algorithm from Enounce Inc. called TimePlay, and found it to produce high-quality audio with real-time user control. We have conducted experiments in the use of MPEG Layer 3 and found that at a 10-to-1 data reduction ratio few listeners can distinguish the original from the processed audio. We have recently acquired and listened to Lernout & Hauspie's state-of-the-art speech synthesis software called RealSpeak, noting performance that is good and interesting, but not equivalent to rendition by a talented narrator. For more detail on progress in these and the other R&D areas, please see "Digital talking book technical activity: summary report."
Some of the areas that we are closely following in the consumer marketplace include the emergence of electronic books, both audio and text-based. Of particular interest are new reading models offered by Audible and Gemstar. Audible plans to target the auto commuter market with frequently updated material from newspapers and magazines, as well as novels and other leisure reading. They distribute encrypted spoken audio via the Internet to proprietary devices for playback on an auto sound system. Their significant investors include Microsoft, and there will be an effort to offer audio synchronized with text on Microsoft-supported platforms. Gemstar, the company with a dominant interest in TV Guide and VCR Plus, has recently acquired Softbook and RocketBook. Both of these enterprises market text-based reader devices that display Internet-delivered encrypted content that is accessible only on a specific reader. Gemstar plans to expand reader options, include audio capability, and mount a large advertising campaign. It hopes to get about 35 million consumer units into service within about five years. One possible sales strategy will be modeled on the cell phone, where the hardware is offered at well below manufacturing cost with the purchase of a minimum value of content. At NLS we have purchased both Softbook and RocketBook readers as well as an Audible player and sample material for evaluation. At this time the players are not suitable for use by blind and physically handicapped individuals, but if the market is opened to competing devices, there is reason to evaluate any promising product. Our R&D program will include these evaluations. A de facto industry standard for content may also be a competitive outcome that we will both monitor and influence through participation in Open eBook discussions.
Open eBook is a consortium of companies that have drafted a standard for electronic text so that it can be displayed by any device conforming to the standard. For more details please see the Open eBook web site.
Converting NLS from an analog cassette-based talking-book system to a digital enterprise with a readership of over 764,000 is a complex and challenging task. We are approaching the task in a step-by-step manner, as outlined in the plan cited above, and view it as an opportunity to enhance user satisfaction through the introduction of advanced technology.
Keywords: Disabled people, Information technology, Standards, Blind people, Books.
Abstract: The functionality, compatibility, and longevity planned for future digital talking books require clear, exact definitions of component format and content. NLS will achieve this by working with a diverse team of experts to establish an applicable standard. This article outlines the plan, describes progress, and indicates what further work is necessary to complete the standard.
Under the auspices of the National Information Standards Organization (NISO), a standards developer accredited by the American National Standards Institute, NLS is leading a committee of experts in the development of a digital talking book (DTB) standard. For a more detailed discussion of why and how the standards process was begun, please see our CSUN 97 paper, titled: "Talking books: toward a digital model".
The committee has taken a general approach to a DTB standard to accommodate a wide variety of books, users, producers, and playback devices. There is interest in compatibility with commercial electronic books as well as ease of interaction on an international basis. To this end there is common membership with groups of similar interest, specifically, the DAISY Consortium, the World Wide Web Consortium, and the Open eBook Forum. This perspective allows us to utilize rather than duplicate prior work and enhances the standard's prospects for longevity and support. The committee's membership may be found on the NISO Standards Committee site. A DTB is envisioned to be, in its fullest implementation, a group of digitally encoded files containing an audio portion recorded in human speech; the full text of the work in electronic form, marked with the tags of a descriptive markup language; and a linking file that synchronizes the text and audio portions.
As this document illustrates, such a structure will allow the DTB user a broad range of capabilities not possible with cassette talking books. The standard uses concepts and components found in other open web-based standards, specifically, Open eBook, Synchronized Multimedia Integration Language (SMIL), and Extensible Markup Language (XML). For more details on these please see:
The standard covers three different classes of players, from very simple to very sophisticated, and six different types of books, again from simple to complex. All text files use the ASCII character set. For an overview of the larger NLS digital audio development project, please see the companion article in this issue entitled "National library service for the blind and physically handicapped: digital plans and progress" (pp. 15-18).
The provisions of the NISO DTB standard are expressed in two different kinds of documents: normative and informative. Normative documents define the characteristics of a product required for standard compliance. Informative documents provide general information about the standard and recommend ways to achieve compliance. The informative documents presently consist of the following:
The normative documents that comprise the standard presently consist of the following:
There is one other type of text file besides XML: CSS files that tell the player how to present the material to the user, e.g. RevStd.CSS.
Binary files can be divided into two classes, audio and image. Audio files are as follows: