

Format: PDF (with DRM)

Preserving Digital Information


Cultural history enthusiasts have asserted the urgent need to protect digital information from imminent loss. Without action, much of what has been created in digital form is likely to become unusable. Although a decade has already elapsed since this challenge was clearly articulated, nobody has described a complete procedure for preventing such loss – until now.

Leading industry consultant Henry M. Gladney outlines a technical solution and justifies its correctness and optimality. His presentation focuses on long-term digital preservation principles as a basis for producing the software that will be needed. The method described will work for any kind of digital document, multimedia file, business record collection, or scientific information, and is believed to be optimal with respect to both the quality of the preserved information and end-user convenience. Additionally, Dr. Gladney explains the requirements of the related software, and sketches how to implement it.

Preserving Digital Information presents an up-to-date description of its field, together with a solution for all technical problems identified in the pertinent professional literature. It is for archivists, research librarians, and museum curators who need to understand digital technology in order to manage their institutions; software engineers and computer scientists whose work requires sound information about digital preservation; and attorneys, medical professionals, government officials, and business executives who depend on the long-term reliability of digital records.

Summary Table of Contents
Preface: For whom is this book intended? What is its topical scope? Summary of its organization. Suggestions on how to read it.
Part I: Why We Need Long-term Digital Preservation
1 State of the Art: Challenges created by technological obsolescence and media degradation. Preservation as a different topic than repository management. Preservation as specialized communication.
2 Economic Trends and Social Issues: Social changes caused by and causing the information revolution. Cost of information management. Stresses in the information science and library professions. Interdisciplinary barriers.
Part II: Information Object Structure
3 Introduction to Knowledge Theory: Starting points for talking about information and communication. Basic statements that are causing confusion and misunderstandings. Objective and subjective language that we use to talk about language, communication, information, and knowledge.
4 Preservation Lessons from Scientific Philosophy: Distinguishing essential from accidental message content, knowledge from information, trusted from trustworthy, and the pattern of what is communicated from any communication artifact.
5 Trust and Authenticity: How we use ‘authentic’ to describe all kinds of objects. Definition to guide objective tests of object authenticity. Object transformations. Handling dynamic information.
6 Describing Information Structure: Architecture for preservation-ready objects, including metadata structure and relationships to semantics. Names, references, and identifiers. Ternary relations for describing structure.
Part III: Distributed Content Management
7 Digital Object Formats: Standards for character sets, file formats, and identifiers as starting points for preservation.
8 Archiving Practices: Security technology. Record-keeping and repository best practices and certification.
9 Everyday Digital Content Management: Storage software layering. Digital repository architecture. Types of archival collection.
Part IV: Digital Object Architecture for the Long Term
10 Durable Bit-Strings and Catalogs: Media longevity. Not losing the last copy of any bit-string. Ingestion and catalog consistency.
11 Durable Evidence: Cryptographic certification to provide evidence that outlasts the witnesses who provided it.
12 Durable Representation: Encoding documents and programs for interpretation, display, and execution on computers whose architecture is not known when the information is fixed and archived.
Part V: Peroration
13 Assessment and the Future: Summary of principles basic to preservation with TDO methodology. Next steps toward reduction to practice. Assessment of the TDO preservation method against independent criteria.
14 Appendices: Glossary. URI syntax. Repository requirements analysis. Assessment of TDO methodology. UVC specification. Software wanted.
Bibliography
Detailed Table of Contents
Preface
Trustworthy Digital Objects
Structure of the Book
How to Read This Book
Part I: Why We Need Long-term Digital Preservation
1 State of the Art
1.1 What is Digital Information Preservation?
1.2 What Would a Preservation Solution Provide?
1.3 Why Do Digital Data Seem to Present Difficulties?
1.4 Characteristics of Preservation Solutions
1.5 Technical Objectives and Scope Limitations
1.6 Summary
2 Economic Trends and Social Issues
The Information Revolution
Economic and Technical Trends
Digital Storage Devices
Search Technology
Democratization of Information
Social Issues
Documents as Social Instruments
Ironic?
Future of the Research Libraries
Cultural Chasm around Information Science
Preservation Community and Technology Vendors
Why So Slow Toward Practical Preservation?
Selection Criteria: What is Worth Saving?
Cultural Works
Video History
Bureaucratic Records
Scientific Data
Summary
Part II: Information Object Structure
3 Introduction to Knowledge Theory
3.1 Conceptual Objects: Values and Patterns
3.2 Ostensive Definition and Names
3.3 Objective and Subjective: Not a Technological Issue
3.4 Facts and Values: How Can We Distinguish?
3.5 Representation Theory: Signs and Sentence Meanings
3.6 Documents and Libraries: Collections, Sets, and Classes
3.7 Syntax, Semantics, and Rules
3.8 Summary
4 Lessons from Scientific Philosophy
4.1 Intentional and Accidental Information
4.2 Distinctions Sought and Avoided
4.3 ‘Information’ and ‘Knowledge’: Tacit and Human Aspects
4.4 Trusted and Trustworthy
4.5 Relationships and Ontologies
4.6 What Copyright Protection Teaches
4.7 Summary
5 Trust and Authenticity
5.1 What Can We Trust?
5.2 What Do We Mean by ‘Authentic’?
5.3 Authenticity for Different Information Genres
5.3.1 Digital Objects
5.3.2 Transformed Digital Objects and Analog Signals
5.3.3 Material Artifacts
5.3.4 Natural Objects
5.3.5 Artistic Performances and Recipes
5.3.6 Literature and Literary Commentary
5.4 How Can We Preserve Dynamic Resources?
5.5 Summary
6 Describing Information Structure
6.1 Testable Archived Information
6.2 Syntax Specification with Formal Languages
6.2.1 String Syntax Definition with Regular Expressions
6.2.2 BNF for Program and File Format Specification
6.2.3 ASN.1 Standards Definition Language
6.2.4 Schema Definitions for XML
6.3 Monographs and Collections
Digital Object Schema
Relationships and Relations
Names and Identifiers, References, Pointers, and Links
Representing Value Sets
XML “Glue”
From Ontology to Architecture and Design
From the OAIS Reference Model to Architecture
Languages for Describing Structure
Semantic Interoperability
Metadata
Metadata Standards and Registries
Dublin Core Metadata
Metadata for Scholarly Works (METS)
Archiving and Preservation Metadata
Summary
Part III: Distributed Content Management
7 Digital Object Formats
Character Sets and Fonts
Extended ASCII
Unicode/UCS and UTF-8
File Formats
File Format Identification, Validation, and Registries
Text and Office Documents
Still Pictures: Images and Vector Graphics
Audio-Visual Recordings
Relational Databases
Describing Computer Programs
Multimedia Objects
Perpetually Unique Resource Identifiers
Equality of Digital Documents
Requirements for UUIDs
Identifier Syntax and Resolution
A Digital Resource Identifier
The “Info” URI
Summary
8 Archiving Practices
8.1 Security
8.1.1 PKCS Specification
8.1.2 Audit Trail, Business Controls, and Evidence
8.1.3 Authentication with Cryptographic Certificates
8.1.4 Trust Structures and Key Management
8.1.5 Time Stamp Evidence
8.1.6 Access Control and Digital Rights Management
8.2 Recordkeeping Standards
8.3 Archival Best Practices
8.4 Repository Audit and Certification
8.5 Summary
9 Everyday Digital Content Management
9.1 Software Layering
9.2 A Model of Storage Stack Development
9.3 Repository Architecture
9.3.1 Lowest Levels of the Storage Stack
9.3.2 Repository Catalog
9.3.3 A Document Storage Subsystem
9.3.4 Archival Storage Layer
9.3.5 Institutional Repository Services
9.4 Archival Collection Types
9.4.1 Collections of Academic and Cultural Works
9.4.2 Bureaucratic File Cabinets
9.4.3 Audio/Video Archives
9.4.4 Web Page Collections
9.4.5 Personal Repositories
9.5 Summary
Part IV: Digital Object Architecture for the Long Term
10 Durable Bit-Strings and Catalogs
10.1 Media Longevity
10.1.1 Magnetic Disks
10.1.2 Magnetic Tapes
10.1.3 Optical Media
10.2 Replication to Protect Bit-Strings
10.3 Repository Catalog/Collection Consistency
10.4 Collection Ingestion and Sharing
10.5 Summary
11 Durable Evidence
11.1 Structure of Each Trustworthy Digital Object
11.1.1 Record Versions: a Trust Model for Consumers
11.1.2 Protection Block Content and Structure
11.1.3 Document Packaging and Version Management
11.2 Infrastructure for Trustworthy Digital Objects
11.2.1 Certification by a Trustworthy Institution (TI)
11.2.2 Consumers’ Tests of Authenticity and Provenance
11.3 Other Ways to Make Documents Trustworthy
11.4 Summary
12 Durable Representation
12.1 Representation Alternatives
12.1.1 How Can We Keep Content Blobs Intelligible?
12.1.2 Alternatives to Durable Encoding
12.1.3 Encoding Based on Open Standards
12.1.4 How Durable Encoding is Different
12.2 Design of a Durable Encoding Environment
12.2.1 Preserving Complex Data Blobs as Payload Elements
12.2.2 Preserving Programs as Payload Elements
12.2.3 Universal Virtual Computer and Its Use
12.2.4 Pilot UVC Implementation and Testing
12.3 Summary
Part V: Peroration
13 Assessment and the Future
13.1 Preservation Based on Trustworthy Digital Objects
13.1.1 TDO Design Summary
13.1.2 Properties of TDO Collections
13.1.3 Explaining Digital Preservation
13.1.4 A Pilot Installation and Next Steps
13.2 Open Challenges of Metadata Creation
13.3 Applied Knowledge Theory
13.4 Assessment of the TDO Methodology
13.5 Summary and Conclusion
Appendices
A: Acronyms and Glossary
B: Uniform Resource Identifier Syntax
C: Repository Requirements
D: Assessment with Independent Criteria
E: Universal Virtual Computer Specification
E.1 Memory Model
E.2 Machine Status Registers
E.3 Machine Instruction Codes
E.4 Organization of an Archived Module
E.5 Application Example
F: Software Modules Wanted
Figures
Fig. 1: OAIS high-level functional structure
Fig. 2: Information interchange, repositories, and human challenges
Fig. 3: How much PC storage will $100 buy?
Fig. 4: Schema for information object classes and relationship classes
Fig. 5: Conveying meaning is difficult even without mediating machinery
Fig. 6: A meaning of the word ‘meaning’
Fig. 7: Semantics or ‘meaning’ of programs
Fig. 8: Depictions of an English cathedrals tour
Fig. 9: Relationships of meanings
Fig. 10: Bit-strings, data, information, and knowledge
Fig. 11: Information delivery suggesting transformations that might occur
Fig. 12: A digital object (DO) model
Fig. 13: Schema for documents and for collections
Fig. 14: A value set, as might occur in Fig. 12 metadata
Fig. 15: OAIS digital object model
Fig. 16: OAIS ingest process
Fig. 17: Kitchen process in a residence
Fig. 18: Network of autonomous services and clients
Fig. 19: Objects contained in an AAF file
Fig. 20: Identifier resolution, suggesting a recursive step
Fig. 21: MAC creation and use
Fig. 22: Cryptographic signature blocks
Fig. 23: Trust authentication networks
Fig. 24: Software layering for “industrial strength” content management
Fig. 25: Typical administrative structure for a server layer
Fig. 26: Repository architecture suggesting human roles
Fig. 27: Storage area network (SAN) configuration
Fig. 28: Replacing JSR 170 compliant repositories
Fig. 29: Preservation of electronic records context
Fig. 30: Workflow for cultural documents
Fig. 31: Workflow for bureaucratic documents
Fig. 32: MAC-sealed TDO constructed from a digital object collection
Fig. 33: Contents of a protection block (PB)
Fig. 34: Nesting TDO predecessors
Fig. 35: Audit trail element, a kind of digital documentary evidence
Fig. 36: Japanese censor seals: ancient practice to mimic in digital form
Fig. 37: A certificate forest
Fig. 38: Durable encoding for complex data
Fig. 39: Durable encoding for preserving a program
Fig. 40: Universal Virtual Computer architecture
Fig. 41: Exemplary register contents in UVC instructions
Fig. 42: UVC bit order semantics
Fig. 43: Valid UVC communication patterns
Tables
Table 1: Why should citizens pay attention?
Table 2: Generic threats to preserved information
Table 3: Information transformation steps in communication
Table 4: Metadata for a format conversion event
Table 5: Dublin Core metadata elements
Table 6: Closely related semantic concepts
Table 7: Samples illustrating Unicode, UTF-8, and glyphs
Table 8: Sample AES metadata
Table 9: Reference String Examples
Table 10: Different kinds of archival collection
Table 11: NAA content blob representations
Table 12: TDO conformance to InterPARES authenticity criteria
Table 13: Comments on a European technical research agenda