Reconsidering Information Management in the Electronic Environment

The Public Interest Declassification Board (PIDB) recommends that a policy be implemented for uniform government-wide metadata standards for classified electronic records (e-records).  The adoption of metadata standards will make declassification review of e-records more effective and efficient.  The current focus on analog records (paper and special media) has diverted attention from this looming and monumental problem: the need to find methods for reviewing petabytes of classified electronic records for declassification.

Background

The Government creates massive amounts of information or data in a variety of digital environments and formats.  It needs a metadata strategy designed to preserve and manage this digital information across domains and over time.

Metadata [1] is structured information that describes the format, content, context, and organization of the underlying information in a document or record.  Adequate metadata is essential for information management professionals to discover, identify, describe, manage, and preserve records over time and to support the use of those records.
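
To make this concrete, the sketch below is a purely hypothetical illustration (the field names are invented and do not reflect any existing Government schema) of the kinds of technical, descriptive, contextual, and classification elements a standard metadata record for a classified e-record might capture:

    # Hypothetical illustration only: possible elements of a metadata record
    # for a classified e-record.  All field names are invented.
    from dataclasses import dataclass, field
    from datetime import date
    from typing import List, Optional

    @dataclass
    class RecordMetadata:
        # Technical metadata: needed to read and render the record over time
        file_format: str                        # e.g. "application/pdf"
        byte_size: int = 0
        # Descriptive metadata: what the record is and what it is about
        title: str = ""
        subject_terms: List[str] = field(default_factory=list)
        # Contextual metadata: who created it, when, and under what authority
        originating_agency: str = ""
        created_on: Optional[date] = None
        # Classification metadata: supports access control and later review
        classification: str = "UNCLASSIFIED"
        declassify_on: Optional[date] = None

    example = RecordMetadata(
        file_format="application/pdf",
        byte_size=48213,
        title="Weekly situation report",
        subject_terms=["logistics", "readiness"],
        originating_agency="Department of Example",
        created_on=date(2011, 5, 2),
        classification="SECRET",
        declassify_on=date(2036, 5, 2),
    )
    print(example)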

Our Government depends on these records to inform history.  Records provide policymakers a memory of past decisions and shape future business decisions.  Since the 1980s, records have been created in multiple electronic formats, and the number of formats is growing. There is no national metadata strategy or standardized practice across agencies.  Without a thoughtful and planned strategy informed by existing national and international metadata standards, it will be impossible to administer this increasingly incoherent records and data environment.

Agencies currently maintain and use various metadata elements to suit their short-term needs without regard to recordkeeping or archival practices.  In order to improve access for policymakers and for the public, a comprehensive national standard is needed for the management of digital records.  Absent a national metadata strategy and standards, it will become ever more difficult to retrieve important information and conduct efficient declassification review of classified e-records.

Deficiencies in metadata will lead to an inability to locate and share critical information.  Without the necessary technical metadata, digital records may be unreadable or unusable, and without contextual metadata, [2] records may not be given accurate meaning.  Poor metadata may also compromise the authenticity and reliability of e-records. [3]

A New Recordkeeping Model

The Government currently uses the lifecycle model of records management, an approach for managing paper records which evolved after World War II.  Under this model, creators, users, record managers, and archivists are isolated actors.  Classified records are created, maintained by agency records managers, and transferred to the National Archives and Records Administration (NARA), where they come under the custodial care of archivists and are eventually reviewed for declassification.  This process creates an artificial distance between archivists and those who originally create and manage this information.  In the electronic records environment, this model hinders efficient processing and creates potential obstacles to declassification review and public access.

The continuum model approach to information management provides a consistent, coherent system of records management processes from the time of records creation through preservation and archiving.  Throughout the life of the e-record, recordkeeping actions are continuously captured and linked in metadata.  This approach integrates recordkeeping and archival functions and mitigates barriers between records creators, users, records managers, and archivists.
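
As a rough illustration of what continuously captured recordkeeping actions could look like in practice (a minimal sketch; the class and field names are invented, not an existing system), each action taken on an e-record is simply appended to its metadata, so that one linked history serves creators, records managers, archivists, and declassification reviewers alike:

    # Minimal sketch (invented names) of the continuum idea: every
    # recordkeeping action is appended to the record's own metadata.
    from datetime import datetime, timezone

    class ERecord:
        def __init__(self, title):
            self.title = title
            self.events = []  # append-only trail of recordkeeping actions

        def log(self, actor, action):
            self.events.append({
                "when": datetime.now(timezone.utc).isoformat(),
                "actor": actor,
                "action": action,
            })

    record = ERecord("Weekly situation report")
    record.log("author", "created; classified SECRET")
    record.log("agency records manager", "scheduled as a permanent record")
    record.log("NARA archivist", "accessioned; queued for declassification review")
    for event in record.events:
        print(event["when"], event["actor"], event["action"])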

Proposals

We propose that the Government, under the leadership of the Chief Information Officers (CIO) Council, adopt a metadata strategy that reflects the continuum model of information management.  The CIO Council should solicit the input of agencies, technologists, archivists, and commercial practitioners and develop comprehensive standards.  Specifically, these standards should:

  • Automate the creation and management of metadata.  Whenever possible, computer systems should generate metadata to minimize the burden on users (a brief sketch of what this could look like follows this list).
  • Keep metadata schema simple to limit costs and complexity.
  • Establish and implement clear processes and procedures throughout classifying offices to prevent poor-quality metadata and gaps in metadata.
  • Stress that the creation and management of metadata is a shared responsibility.
  • Be updated regularly to reflect changes in Government policy and operations regarding classified information and to ensure quality control and assurance.
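
As a brief sketch of the automation point above (hypothetical; the function and field names are invented and do not represent any existing agency system), a recordkeeping system could derive most metadata elements automatically at the moment a record is saved, leaving the author to supply only what the system cannot infer, such as the classification decision:

    # Hypothetical sketch of system-generated metadata at the moment a record
    # is saved.  Function and field names are invented, not an existing API.
    import getpass
    import hashlib
    from datetime import datetime, timezone
    from pathlib import Path

    def generate_metadata(path, classification):
        """Derive technical and contextual metadata automatically; the author
        supplies only the classification decision."""
        p = Path(path)
        data = p.read_bytes()
        return {
            "file_name": p.name,
            "byte_size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),  # stable content identifier
            "captured_utc": datetime.now(timezone.utc).isoformat(),
            "author": getpass.getuser(),                 # taken from the login session
            "classification": classification,            # the only user-supplied element
        }

    # Assumes a file named "situation_report.docx" exists in the working directory.
    print(generate_metadata("situation_report.docx", "SECRET"))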

Conclusion

Mandating standardized metadata tagging of records at creation offers several important benefits.  Metadata fields will document user actions, create audit trails, normalize declassification instructions, and reinforce access controls to ensure that classified information is appropriately safeguarded.  Metadata tags will reveal agency equities to improve the efficiency and accuracy of declassification review.  When paired with context accumulation tools, metadata will limit overclassification by allowing records managers and reviewers to monitor classification actions more effectively.  Lastly, adopting the continuum model will allow all parties—records creators and users, records managers, archivists, and researchers—more timely and efficient access to records and information, as appropriate.


[1] In an information technology context, metadata is data about data or database systems.

[2] Contextual metadata, very simply, surrounds data to provide context to the data; it is secondary and deeper; e.g., “provenance assertions are a form of contextual metadata . . .”  Available at: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#Provenance.2C_Metadata.2C_and_Trust

[3] Adrian Cunningham of the National Archives of Australia, citing an international workshop held in the Netherlands in 2000, described recordkeeping metadata as “[s]tructured or semi-structured information which enables the creation, management and use of records through time and across domains.  Recordkeeping metadata can identify, authenticate and contextualise records and the people, processes and systems that create, manage and use them.”  Recent Developments in Standards for Archival Description and Metadata, presented at the International Seminar on Archival Descriptive Standards, University of Toronto, March 2001.  Available at: http://enj.org/portal/biblioteca/funcional_y_apoyo/archivistica/42.pdf

10 thoughts on “Reconsidering Information Management in the Electronic Environment”

  1. This was a good summary of the issue, and I agree with the position as stated. I do want to add one important point: metadata is a solution to several different problems, which is to say, a metadata policy might solve some problems without solving all of the relevant ones. To get maximum benefit from a metadata strategy, various types of metadata should be included. I can’t promise that the following is exhaustive, but the list should at least include:
    – Structure/Format. Information to explain how the electronic data can be interpreted as a meaningful document. XML, which self-describes both bit-encoding and document structure, is a good example.
    – Topic. Metadata used to classify documents as to the subject discussed. One of the most commonly found metadata types, and very useful in Information Retrieval.
    – Qualities. These range from inherent properties, such as length and language, to attributed qualities, of which the security classification is a perfect example. The important distinction here is that the MD5 signature of a file is ALWAYS the same, whereas the security classification is a decision made by an actor at a point in time, based upon certain assumptions. Attributed qualities require more complexity than inherent properties, and that should be reflected in the way the metadata itself is structured.
    – Context. There are other types of metadata that participate in the production of meaning associated with an information artifact. You mentioned an important one above: provenance, which gives us information we might use to determine whether a document truly is what it purports to be. This category also includes information about how the information was produced; for example, standard measures often change in the way they are calculated, such as by changing certain assumptions or by using a 4-point scale instead of a 5-point scale. One should be aware of these changes, else false conclusions can be drawn.
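
    To make the inherent/attributed distinction concrete, here is a rough sketch (the field names are mine and purely illustrative):

        # Rough sketch: an inherent property is just a value, while an
        # attributed quality also records who asserted it, when, and on what basis.
        from dataclasses import dataclass
        from datetime import datetime

        @dataclass
        class AttributedQuality:
            value: str             # e.g. "SECRET"
            asserted_by: str       # the actor who made the decision
            asserted_on: datetime  # when the decision was made
            basis: str             # the assumptions behind the decision

        inherent = {"md5": "9e107d9d372bb6826bd81d3542a419d6"}  # same for identical bits
        attributed = AttributedQuality(
            value="SECRET",
            asserted_by="original classification authority",
            asserted_on=datetime(2011, 6, 1, 14, 30),
            basis="assumptions in force at the time of the decision",
        )
        print(inherent, attributed)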

    Best wishes with the project.

    Bob Savage
    (formerly with Stanford University Libraries, Media Preservation Unit)
    (formerly Director of Records and Institutional Research at Vanderbilt University, College of Arts and Science)

    1. The proper management of electronic records is an extremely important issue, but as the comment above indicates, the problem is not limited to classified records. A comprehensive metadata strategy is needed to cover all kinds of official records, classified and unclassified. The overall strategy should probably not be driven by classification and declassification requirements, though it should incorporate and reflect them.

      At any rate, I don’t think that this broad information management challenge, important as it is, is central to the problem of transforming national security classification.

  2. Dear Mr. Faga,

    I would suggest that, in addition to the specific points the standard should include, the final report to the President explicitly recommend specific activities that the Government and the CIO Council should perform to arrive at the most effective standards, in particular on the topic of “automate the creation and management of metadata”.

    A key challenge for a standardized metadata system to be effective is that its application be practical and scale over time. As information volumes continue to grow exponentially and IT environments become ever more organic and unstructured, finding ways of automating metadata application is critical. You refer to this issue when you say, “Whenever possible, computer systems should generate metadata to minimize the burden on users”.

    One of the activities that I would suggest recommending to the President is that, as standards are defined by the Government and the CIO Council, agencies be incentivized (through funding or other means) to explore innovation in this area, specifically in the form of piloting and testing new technologies. Lessons learned from different agencies will help in defining clearer guidance on this topic.

    Without such (or similar) concrete activities, I am afraid the standard language on automation will become overly directional and will tell agencies something they already know, as opposed to providing them with clear guidance they can use in their day-to-day operations.

    Today, we have a unique opportunity to include “teeth” in the recommendations to the president that will help move the needle in a substantial way. Moreover, such activities would be aligned with the President’s overall strategy to prioritize investments in innovation in the face of a major shift in government spending priorities.

    I would be happy to discuss this topic in more detail or to help in any way, shape, or form in exploring what those activities could be, how they could be structured, etc.

    Regards,
    Pablo Osinaga

  3. Metadata is an essential element of managing huge volumes of electronic data. In the paper world we’ve left behind, we are trying to find a solution for the review and declassification of a mere 409 million pages of historic records. Each petabyte of electronic data may include more than 50 billion pages of textual data. Most large federal agencies measure current on-line storage capacity in multiple petabytes. Reviewing this many pages with humans is untenable. Metadata will help us crack the first part of the problem – sorting out the non-record and short-term temporary record material and eliminating as much as 90% of the data we now have spinning on disc.

    Metadata also provides provenance, which is an essential element in context mapping as suggested by Mr. Jonas. Indeed, even the sensitivity of some information is based on context that can be partially derived from the provenance of the record.

    To actually begin to manage electronic records, metadata must be robust, pervasive, and include a document identification system that uniquely identifies each document, every copy of that document, and every version of that document. In today’s electronic age, document copies aren’t limited to the number of sheets of carbon paper that can be rolled into a typewriter; they are unlimited. Email systems are largely copy-based, in that sending a document to 100 or 1,000 of your closest colleagues means you have created 100 or 1,000 copies. If they do the same, the one unique document can have 100,000 copies. We can get great bang for the declassification buck if we can apply any decision made on one of those 100,000 copies to every one of the 100,000 copies.
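
    As a rough sketch of how that could work (illustrative only; the names are invented), if every copy of a document carries the same content-derived identifier, a declassification decision recorded against that identifier reaches every bit-identical copy at once; versions with changed content would, of course, need identifiers of their own:

        # Illustration only (invented names): identical copies share one
        # content-derived identifier, so a review decision made against that
        # identifier covers them all.
        import hashlib

        def document_id(content):
            """Identify a document by its content so identical copies share one ID."""
            return hashlib.sha256(content).hexdigest()

        # Ten "copies" of the same email body, held in different mailboxes.
        copies = [b"Weekly situation report, 2 May 2011 ..." for _ in range(10)]
        decisions = {}  # document identifier -> review decision

        # A reviewer declassifies one copy ...
        decisions[document_id(copies[0])] = "declassified in full"

        # ... and the decision automatically covers every other identical copy.
        for copy in copies:
            print(document_id(copy)[:12], decisions.get(document_id(copy), "pending review"))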

    We are rapidly reaching a point with regard to electronic records where we won’t know what we know, and the sheer volume of data collected and stored will defeat any – literally any – attempt to find anything. If we go to NARA today to find a report from a patrol in Viet Nam in 1972, the chances are pretty good that we can find the box retired by the specific unit. If we need to search hundreds of billions of pages of electronic records for the report from a patrol in Afghanistan 40 years hence, it’s unlikely that we’ll ever find it without metadata.

  4. Thanks to the PIDB for launching this blog and leading a public exchange on this important issue.

    On the records continuum and records lifecycle issues, I recommend reviewing the work of Sue McKemmish from Monash University (Australia) at http://tinyurl.com/44skzoq.

    This link provides a very full view of the theoretical differences between the lifecycle and continuum approaches.

    There is a lot to recommend the continuum approach, particularly better integration between the various users and purposes for records and information.

    What I am struck by in this post and exchange is the implication that metadata and metadata standards seemingly equal effective information or records management.

    There are a number of other policy considerations that need to be taken into account so that records (or information) can be managed effectively.

    As Steve says, this is a broad and important issue, but its complexity seems undersold here, and the implications reach far beyond declassification (and even records management). But I am encouraged by the discussion.

    Also, given the discussions going on elsewhere on the declassification transformation blog, I can’t help but observe that one can have metadata standards – and, more importantly, rich metadata capture – but if there is no consensus about how classification/declassification decisions in multiagency environments will be executed, we have not resolved the underlying problem.

    Having said that, there is great progress being made on this front, as described elsewhere on the blog, but if we had rich metadata today, I am not sure the community would be agreed on how to best use it to carry out decisions.

    If one were to adopt an integrated continuum approach (as opposed to a lifecycle approach), there would be a need to rethink the current archival/records management statutory, regulatory, and policy frameworks.

    While NARA has current policy and guidance on a number of electronic records management issues, they are derived from and are understood in the context of Federal records management statutes that mainly date from the early 1950s and reflect the separation of responsibilities that are called out as a problem in the original post.

    At least one other issue that should be considered with metadata is preservation cost. Assuming one does develop and enforce standards, maintaining robust and pervasive metadata (as Harry called it) is a serious challenge, in addition to preserving the digital objects themselves (whose types are also proliferating and becoming more and more complex). The challenge is both technically demanding and very expensive.

    Thanks again to the PIDB for launching this effort. (And I look forward to Harry’s next epic post!)

    1. Paul’s perspective is eye-opening. Although it was sort of in the background, I’m not sure we considered the fact that the metadata itself is part of the record and must also be preserved and migrated for the life of the record — which is a long, long time for permanent records.

      We’ve got a double-edged sword. Without metadata we will drown in electronic data. Metadata can be used to more effectively sort out the 5% - 10% of federal records that require permanent retention. That remaining 5% - 10% is still not insignificant, and metadata contributes to the problem at that point, perhaps more so than to the solution.

      It’s also conceivable that the metadata will be classified or will contribute to the classification as part of a compilation of information, both seen and unseen. How do we configure records management systems to isolate complex classified metadata and classified textual information and efficiently provide to the public the remaining unclassified text and metadata?

      This is perhaps another topic that a national laboratory focused on the electronic document and electronic data problem can investigate.

  5. I’m not familiar with metadata models per se, but the RIF (Record Information Form) created to tag the JFK Assassination Collection records made them searchable without full text. Source, date, recipient, topic, and key names were entered on the RIFs, though not always accurately. It strikes me that these metadata tags could be determined and set from the point of creation for each document and reflect not only those data points but also flag any part of the text that requires redaction, with a legal justification cited.

    I recommend, especially for documents over 10 years old, that the criteria for postponing release of information follow the JFK Records Act standards, not FOIA. Those included living agents, current sources and methods, or secrecy agreements with individuals or foreign governments, with postponement of release ending when those conditions changed or by a date certain from the time of creation. This could assist in declassification reviews as well. Documents over 25 years old should be released as soon as practicable without further review, and annually from here forward.

    The newer, digital documents allow full-text search. Vivisimo is one search software that allows term, context, and relational searches; see their online model of the 9/11 Commission Final Report. Older paper records should be digitized for collections of public interest, but NARA lacks staff and funding for this currently. I suggest reading the recommendations of the JFK Assassination Records Review Board in their Final Report.

    A standardized and automatic metadata trail for every government record at creation would also prevent concealment of records through the use of parallel or hidden records systems at agencies. Metadata should allow declassification reviews of categories of records, as was done by the Review Board for the JFK records. Perhaps the most important way the Records Act model overcame the flaws of FOIA, aside from the presumption of release, was taking the decisions away from the agencies and putting them in the hands of an independent and representative board. I hope the future will hold the universal adoption of this model for declassification, especially in expediting release of files relating to historical events of wide public interest or concern. Thanks for opening these decisions to public comment.
    John Judge
    Museum of Hidden History
    Coalition on Political Assassinations

  6. Information management in the electronic environment is hamstrung by laws and policies developed when we typed documents on typewriters (anyone remember what a typewriter looks like?) and made copies with carbon paper.

    Today’s government is run on email and cut-and-paste. Current US military actions around the globe are not managed by memos and letters typed and carbon copied. We use email. Classified and unclassified, hundreds of millions of email messages each month. Most agencies today simply delete them as they have a “print and file” policy for those times when rank-and-file employees decide that a document might be a record. We don’t actually print and file very much anywhere, ever. These records will just be lost to history.

    We also see a phenomenon where documents are sent to thousands of analysts via electronic dissemination means such as email and messaging systems. Those analysts cut and paste portions of many documents into new documents that are made up of many fragments of the originals. The provenance is often maintained in footnotes or end notes, but the resulting millions of documents are each amalgams of many originals. Our electronic world allows our young and computer-savvy employees to perform these cut-and-paste rituals at dizzying speed, producing an order of magnitude more documents than were originally created on any topic.

    We also have millions of web pages across the federal government, and many of these web pages are dynamically updated many times each day. If we follow the records laws correctly, each representation of data on a web page that informs a US Government judgment or policy is itself a record – a fleeting one at best, but a record. Yet few if any agencies capture all unique representations of data on web pages, and even if we did, we’d be overcome with the volume of data we’d have to keep.

    Web content is routinely updated and backed up, but rarely archived to preserve the content. IT folks will tell you it’s not practical to keep a copy of every version of every web page on every server government-wide. But that content is comparable to the memos, briefings, and announcements that pepper the files and provide rich historic content at the Archives. Unless preserved somehow, these records will be lost to history – forever.

    It is a national imperative that we begin updating records laws immediately to ensure that at least the minimum electronic content representing this era in our government is preserved. Otherwise, the 1950s laws that Mr. Wester describes will all but assure the destruction of valuable electronic records. We need to act now.

  7. I agree that the tagging of materials with finely-grained metadata can be of great value (and I’ll return to this elsewhere), but the realization here that adding metadata actually expands the data universe – the body of accumulated documents that require protection – simply underlines a point made elsewhere: that the classified document backlog and frontload can ONLY be resolved by dealing with CLASSES of information, not paper by paper. The metadata can assist in that endeavor.
