Using Technology to Improve Classification and Declassification

The Problem

Advances in the electronic environment have led to a pronounced increase in the amount of classified information being produced.  Staggering volume and scarcity of resources make the eventual human review of these records for declassification impossible.  Human review as it is done today is estimated at two full-time employees (FTEs) per gigabyte.  At one intelligence agency alone, the growth of classified records is approximately 1 petabyte (1 million gigabytes, ~49 million cubic feet of paper) every 18 months.  The Government cannot dedicate 2 million FTEs a year to review 1 petabyte, much less over 20 million FTEs a year to review the tens of petabytes of classified records being created across the Government.
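
The staffing figures above follow from simple multiplication, which the short sketch below reproduces. The 2 FTEs per gigabyte rate and the 1 petabyte = 1 million gigabytes conversion come from the text; the 10-petabyte figure is only an assumed lower bound standing in for "tens of petabytes," and the function name is illustrative.

```python
# Back-of-the-envelope check of the staffing estimate quoted in the text.
FTE_PER_GIGABYTE = 2                 # review rate estimated in the text
GIGABYTES_PER_PETABYTE = 1_000_000   # conversion used in the text


def ftes_to_review(petabytes):
    """Return the FTEs needed to review a volume at the quoted rate."""
    return petabytes * GIGABYTES_PER_PETABYTE * FTE_PER_GIGABYTE


one_agency = ftes_to_review(1)         # 2,000,000 FTEs for 1 petabyte
government_wide = ftes_to_review(10)   # 20,000,000 FTEs; 10 PB is an assumed lower bound

print(f"One agency: {one_agency:,} FTEs")
print(f"Government-wide (at least): {government_wide:,} FTEs")
```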

A Technological Solution for Both Declassification and Classification

Technology can be employed to address the challenges of mass declassification in a more accurate, cost-effective, and efficient manner.  Existing technologies such as information retrieval tools, natural language processing, optical character recognition software, predictive analytics, and cloud computing can serve as a foundation for future innovation, but the Public Interest Declassification Board (PIDB) believes the most integral and necessary component to a new system will be a robust context accumulation capability.

Context accumulation is a means by which computers predict classification and declassification dispositions.  Human inputs, either as a priori rules or as individual decisions based on classification and declassification guidance, direct the process.  The system ingests the decisions of human reviewers to classify or declassify (in full or in part) pieces, categories, or associations of information, using reviewers’ determinations as the basis for future, automated decisions.  The greater the body of knowledge (e.g., reviewer decisions, classification and declassification guides, previously released documents, open source material) ingested into the system, the better the predictions the computer would generate.

Based on these data points, the computer learns how to sort information into release and withholding bins.  In instances of conflicting context, the system would require human input by reviewers and subject matter experts.  The decisions of these individuals would train the system to better sort information.  As the system learns through more human input, its declassification review ability will evolve.  For those areas in which the system’s aptitude is sufficiently advanced, meticulous human review will no longer be necessary.  In time, reviewers will be able to focus exclusively on evaluating those pieces of information identified by the system as posing unique challenges.  Reviewers’ decisions on the rationale for the withholding or declassifying of this information will provide context to the system to address and sort all other appearances of that information.  As more of these review precedents are established, the volume of records reviewable by the system will continue to increase.  
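
To make the idea concrete, the sorting loop described above might be sketched roughly as follows. This is a toy illustration, not a description of any existing system: the class name, the `min_precedents` threshold, and the rule that any disagreement among reviewers counts as "conflicting context" are all assumptions made for the example.

```python
from collections import Counter, defaultdict

RELEASE, WITHHOLD, NEEDS_REVIEW = "release", "withhold", "needs_review"


class ContextAccumulator:
    """Toy precedent store that tallies reviewer decisions per piece of information."""

    def __init__(self, min_precedents=3):
        # min_precedents is a hypothetical threshold, not a figure from the text.
        self.min_precedents = min_precedents
        self.decisions = defaultdict(Counter)

    def record(self, item, decision):
        """Ingest one human reviewer decision (RELEASE or WITHHOLD)."""
        if decision not in (RELEASE, WITHHOLD):
            raise ValueError(f"unknown decision: {decision!r}")
        self.decisions[item][decision] += 1

    def predict(self, item):
        """Sort an item into a bin, or send it back to a human reviewer."""
        counts = self.decisions.get(item, Counter())
        if sum(counts.values()) < self.min_precedents:
            return NEEDS_REVIEW   # too little accumulated context
        if counts[RELEASE] and counts[WITHHOLD]:
            return NEEDS_REVIEW   # conflicting context: escalate to a human
        return RELEASE if counts[RELEASE] else WITHHOLD
```

In this sketch, an item with three consistent reviewer decisions is sorted automatically, while a single contrary decision sends it back to human review, mirroring the escalation step described above. A real system would need far richer notions of context than exact string matches.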

Because a context accumulation tool ingests both declassification and classification guidance, any system could serve concurrently as an automated classification tool.  Allowing computers to classify information minimizes the user burden.  Moreover, automating classification reduces the potential for over-classification by ensuring that classification determinations are made in the strictest accordance with current policy and only in appropriate circumstances.  The rationale for classification determinations would be digitally imprinted on documents, creating metadata which the system could later use to locate and declassify this information as policy guidance changes.
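
One way to picture the "digital imprint" described above is as a per-portion record of the guidance rule that justified classification. The sketch below assumes a hypothetical data model (`Portion`, `Document`, and rule identifiers such as `"RULE-7"` are invented for illustration) and shows how such metadata could be queried when a rule is retired.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Portion:
    text: str
    rule_id: Optional[str] = None   # guidance rule cited as rationale; None means unclassified


@dataclass
class Document:
    title: str
    portions: List[Portion] = field(default_factory=list)


def portions_to_declassify(documents, retired_rules):
    """Return (title, text) pairs whose classification rationale cites a retired rule."""
    return [
        (doc.title, portion.text)
        for doc in documents
        for portion in doc.portions
        if portion.rule_id in retired_rules
    ]


# Hypothetical example: RULE-7 is withdrawn by new guidance.
memo = Document("Memo A", [
    Portion("Routine logistics paragraph."),
    Portion("Paragraph citing an old program name.", rule_id="RULE-7"),
    Portion("Paragraph naming a source.", rule_id="RULE-2"),
])
flagged = portions_to_declassify([memo], retired_rules={"RULE-7"})
```

Because the rationale travels with each portion, a change in guidance becomes a lookup rather than a fresh page-by-page review, at least under the simplifying assumption that each portion cites exactly one rule.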

The accuracy and consistency afforded by this system will enhance information security and thus national security. 

Approaches to Context Accumulation

A context accumulation system can be implemented in two ways.  Policy guidance (“rules”) can be input at the onset and used as the basis for initial classification decisions, or decision-making criteria can be based entirely on the ingestion of documents whose classification status is then determined by individual reviewers based on existing policy guidance.  Assigning rules at the onset entails a potentially protracted battle over classification standards but ensures that the computer begins with standardized criteria.  Developing rules organically based on the ingestion of documents increases the likelihood of poor review owing to human error but mitigates intra- or interagency disputes over classification guidance and allows for quicker implementation.
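
The contrast between the two approaches can be illustrated with a small sketch. Both function names and the topic labels are hypothetical, and majority voting is only one assumed way a system might derive rules from reviewer decisions.

```python
from collections import Counter


def rules_from_guidance(guidance):
    """Approach 1: adopt agreed policy guidance as the starting rule set."""
    return dict(guidance)


def rules_from_reviews(reviews):
    """Approach 2: derive each rule from the majority of reviewer decisions."""
    tallies = {}
    for topic, decision in reviews:
        tallies.setdefault(topic, Counter())[decision] += 1
    return {topic: counts.most_common(1)[0][0] for topic, counts in tallies.items()}


# Hypothetical inputs for illustration only.
guidance = {"topic A": "withhold", "topic B": "release"}
reviews = [
    ("topic A", "withhold"),
    ("topic A", "withhold"),
    ("topic A", "release"),   # a dissenting (possibly mistaken) reviewer decision
    ("topic B", "release"),
]

seeded = rules_from_guidance(guidance)
learned = rules_from_reviews(reviews)
```

Here both routes arrive at the same rules, but the second depends on reviewers being right more often than wrong: had the dissenting decision on "topic A" been the majority, the learned rule would have flipped, which is the human-error risk noted above.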


Employing these technologies, and context accumulation tools in particular, would:

  • Improve consistency and accuracy of classification and declassification decisions.
  • Minimize instances of over-classification by automating routine classification and consistently aligning classification decisions with established guidance.
  • Facilitate the immediate implementation of changes in classification and declassification guidance.
  • Reduce the administrative burden of declassification reviewers, allowing them to focus exclusively on the rationale for declassification or classification.
  • Audit individual human inputs to measure reviewer performance and identify areas for improvement.
  • Identify to cleared users exactly what information is and is not classified or available through open sources.
  • Reinforce classification standards in real time as documents are created in the classified environment.
  • Replace document-level, pass/fail reviews with redaction-level reviews, significantly increasing the volume and quality of material declassified.

Technology and the National Declassification Center (NDC)

A research laboratory could be created within the NDC for launching and evaluating pilot projects that incorporate these technologies.  Digital records collections of historically significant records could be used to field test a new system (based on either of the two approaches outlined above).  The NDC provides an ideal interagency environment in which to share classification guidance and draw on the expertise of subject matter experts.  Successful projects could encourage future interagency cooperation and innovation in this area.

15 thoughts on “Using Technology to Improve Classification and Declassification”

  1. I do not oppose the exploration of technological tools to support classification and declassification, and I would generally favor pilot projects to test and evaluate innovative approaches to class/declass.

    I am personally skeptical that the “context accumulation” approach would work as smoothly or effectively as described to ease the declassification burden, if only because there are innumerable different “contexts” in the classified records of the past half-century that would have to be defined. Establishing a credible foundation for automated declassification is likely to be a challenging and resource-intensive project in itself.

    But in any case, before attempting to “solve” the massive declassification review problem through new technological means, the top priority should be to SIMPLIFY (or transform) the problem. That means that as far as possible an effort should be made to identify categories of records that are to be declassified WITHOUT ANY REVIEW at all, technological or human.

    Disturbingly, this has already been tried, without much success. Specifically, the last three presidents have all directed that non-exempted classified records should be automatically declassified at 25 years “whether or not the records have been reviewed.” But executive agencies have refused to implement this presidential instruction.

    Therefore, in order to simplify the declassification challenge and to make it amenable to human or technological processing, I think the President will have to personally declassify a large fraction of currently classified records on his own authority, without delegating the task to agency officials.

    In other words, an amendment to the current executive order 13526 could be issued to replace the current instruction (in section 3.3a) that non-exempt 25 year old records “shall be automatically declassified” — a directive that has been shockingly ignored — with a new statement that these records HAVE BEEN declassified by the president.

    Then, and only then, once the scope of the declassification challenge has been massively reduced by presidential fiat, new approaches to declassification review of a smaller subset of records may have a plausible chance of success.

    Without such a simplifying preliminary step, I suspect that any technological approach offers false hope and is likely to be futile. Automated declassification may supplement, but cannot substitute for, automatic declassification.

  2. Mr. Aftergood has an overly simplistic view of the problem. We literally have classified needles in a haystack of unclassified data that makes up the 400+ million page backlog. Despite claims to the contrary, the government has always shared information. Historic classified records contain the equities of many, many agencies. Most can be declassified – theoretically without review – but the few remaining sensitivities are interfiled with the no longer sensitive information in a way that makes finding them quite difficult. Because people and governments live for many decades, it’s not reasonable to say that all sensitive information loses its sensitivity in a mere 25 years. It’s finding the few critical secrets in the mountain of junk that identifies the problem.

    1. Granting that there may be “classified needles in a haystack of unclassified data,” as Mr. Cooper says, is it reasonable to treat the entire haystack as if it were nothing but classified needles?

      Or, before even beginning the search for the needles, should one attempt to shrink the size of the haystack by removing large portions of it?

      I think the latter approach is clearly preferable. It runs the risk that some old classified needles will be publicly exposed, a risk that should be acknowledged and accepted. The former approach offers not a risk but a certainty that vast quantities of “hay” will be improperly withheld from the public for years or decades to come.

      1. I don’t disagree. Reducing the size of the stack using some form of risk management makes sense. As long as the likelihood of including parts of the collection where sensitive information is most likely to occur is reduced by some analysis, the amount withheld unnecessarily should be minimized. The key is finding a way to analyze large collections to identify areas where the most sensitive information is likely to be found.

  3. The technological approach will not, as Mr. Jonas believes, solve today’s declassification problem. It can be leveraged to solve the future problem when today’s petabytes must be reviewed and declassified. While context accumulation is an interesting idea, it will be fraught with practical problems as the very information that defines sensitive documents should not be “accumulated” in one vast repository. Instead, we need to develop technology that will “read” documents for context and apply current declassification guidance to each document. This will allow sensitive information to remain restricted while releasing all remaining unclassified information. Additionally, the use of context technology at the front end will allow documents to be tagged (using XML metadata tagging) to identify every classified word/phrase and its appropriate level. Then during the entire document life cycle the tagging can be used to move documents to lower classification domains and eventually to declassify the document by knowing exactly which words and phrases contributed to the initial classification. More importantly, however, is the ability to use front end technology to identify only the classified words and phrases based on approved guidance rather than individual personal preferences of classifiers. We need to focus on getting out ahead of tomorrow’s problem rather than trying to solve yesterday’s problems.

  4. A few thoughts related to the post and thread.

    “The key is finding a way to analyze large collections to identify areas where the most sensitive information is likely to be found.”
    >> I agree.

    “Instead, we need to develop technology that will “read” documents for context and apply current declassification guidance to each document.”
    >> In my experience, looking only within a single document to construct its “context” is much too blunt a technique for this problem; the false positive and false negative rates will most likely be too high to trust. I think drawing on tertiary observations would improve the machine triage, e.g., has this code word always been declassified when reviewed? Or never? Or somewhere in between? Seeing how code words, people’s names, etc. have appeared and been treated is an example of additional context. Context, in my own words, means better understanding something by looking at the things around it. Another example would be noticing whether the document has ever been FOIA requested … and using this additional context to better prioritize documents that require human review.

    “ Or, before even beginning the search for the needles, should one attempt to shrink the size of the haystack by removing large portions of it?”
    >> To do this, I would use as much tertiary data (more context) as possible within policy and laws to improve the accuracy of such a machine triage process.

    “It runs the risk that some old classified needles will be publicly exposed, a risk that should be acknowledged and accepted. The former approach offers not a risk but a certainty that vast quantities of “hay” will be improperly withheld from the public for years or decades to come.”
    >> Yep. This seems true to me. But then again these are not my equities.

    “The key is finding a way to analyze large collections to identify areas where the most sensitive information is likely to be found.”
    >> I would assemble (accumulate) such things as previous declassification disposition decisions, code words, report authors, FOIA requests, open source and/or other relevant data points and use these to make more fine-grained decisions about treatment. I also still think it might be useful to consider some form of crowdsourcing … taking, for example, a massive set of documents selected for total disclosure and allowing cleared members of the community to freely query this corpus as one more (final) checkpoint.

    1. We have no current ‘ground truth’ regarding what has been declassified. Paper documents (the 400+M pages) are either completely classified or completely unclassified. Very, very few have any redacted versions and even if we do have some redacted versions the complexity of scanning both versions and then figuring out which words and phrases were removed to build a context aggregation could push us further downstream with regard to time than is possible.

      Further, the documents at NARA that have been both withheld and released are not consistent. If nothing else your approach would help demonstrate that no two reviewers have the same results and even with volume it may be that simple codewords or once-classified programs are released 50% of the time and withheld 50% of the time.

      Your notion of building what amounts to context driven guidance for declassification by using all the work we’ve ever done is brilliant, but I fear that we don’t actually have the kind of data points you would need and it may take a hybrid approach that uses technology for some limited success while we build the kind of accumulated context you envision.

      1. Good point. I too think the hybrid approach is essential. I did try to allude to that in my presentation (on chart 26, the bottom box). The point being, the more humans declassify (in a normalized way!), the more the system can assist. Kind of like the more bites off the front of the apple, the more chunks can fall off the back. To your comment below, validating this in the real world in a lab would be wise. That said, I believe tertiary data will be key; which tertiary data is better than another is worth sorting out.

  5. Building a national laboratory to explore all these options and more is very badly needed. I don’t believe adding that burden to the NDC is the right approach, however. Any such lab should have neutral ground, outside of the beltway, where academics, industry and government can come together to solve this problem. The national document content analysis laboratory should also be able to provide grants to students and universities to allow the best and brightest among us to take ideas – such as Mr. Jonas’ – and find ways to put the technology to use. NARA would have to have a key role in such a laboratory, but not have to add responsibility for a laboratory to its already impossible deadlines.

    The kind of analysis that Mr. Jonas recommends would also work well in a national laboratory setting. Looking at millions of pages of previously declassified or exempted documents and finding ways to establish patterns, identify anomalies, and even show how individual agencies have either succeeded or failed to maintain any consistency in review and release would add tremendous value to the process.

    I urge the PIDB to carefully consider recommending that funding be made available for such a lab.

  6. Building a laboratory “where academics, industry and government can come together to solve this problem” sounds very promising — until one realizes that something like this has already been tried, without success.

    From approximately 1995 to 1998, the Department of Energy funded a Declassification Productivity Research Center to promote advanced technology solutions to declassification. The Center focused on modeling of declassification, development of automated declassification processes including text analysis and interpretation, and integration of new technology into declassification. The Center was based at George Washington University’s Virginia campus. It produced a handful of papers and annual reports, and then its funding petered out, producing exactly zero effect on real-world declassification policy. Do we want to repeat this experience?

    Besides that, it is also worrisome that the PIDB proposal does not even mention the matter of cost. When cost constraints are not specified, spurious solutions are generated. (Of course, if infinite dollars were available, many kinds of solutions would be possible.) As an openness advocate, I favor the maximum possible declassification and disclosure. But as a taxpayer, there are limits to what I would be willing to pay for it. With rare exceptions for cases of urgent public interest, declassification of historical records should normally cost much less than $1 per page, in my opinion. More than $1 per page would be outrageous and unacceptable, as a rule.

    I think the PIDB could perform a real public service just by clarifying exactly where things stand and what the real options are. For example, the PIDB might conclude:

    The blunt truth is that technology does not provide transformative options for declassification that can be applied today at current cost levels. (Depending on the availability of resources to support research and development, speculative new technology options may eventually be proven and validated.)

    There is no way to transform declassification today without an adjustment of our tolerance for risk (i.e. the occasional inadvertent disclosure of classified information). If we are prepared to accept an elevated degree of risk, then the declassification challenge can be effectively surmounted at a reduced cost per page. Without accepting an elevated degree of risk (and without a massive infusion of new funds for declassification), the backlog of historical classified records will grow and will remain publicly inaccessible for the indefinite future.

    1. I had the pleasure of working with Dr. Scotty at GWU’s lab between 1997 & 1998. The problem then was that Dr. Scott was out ahead of computing power, so then-current machines couldn’t manage the kind of transactions per second that NLP, Expert Systems and Machine Learning algorithms need. Work today by Mr. Jonas, the University of Texas at Austin and George Mason University is a generation ahead of what Scotty was doing in the 90s. As discussed above, the overwhelming volume of electronic records requires sophisticated content analysis software that is run out of a toolbox of tools and is able to learn both from the historical data that is available and from human interaction with the software itself. One key principle is leveraging machine learning to improve the results of expert systems with each interaction. Making these pieces work together is not simple, but it’s within reach. GMU has done some amazing work on ontology generation, and that will be a key element in designing a system that will understand the myriad businesses of the federal government and focus on those aspects that either can be released or must be protected. In 1998 Dr. Scott was about all we had in addition to FIDUL (Federal Intelligent Document Understanding Laboratory). FIDUL, like Scotty, was doomed because we knew then what we needed, but computers couldn’t deliver. We absolutely need a somewhat independent lab, outside the beltway, where scientists from academia and industry can collaborate freely without us government bureaucrats defining the results by public optics or individual agency agendas. Just my 2¢

  7. It is not so much a “sophisticated content analysis software” problem as it is a problem of access to a reasonable sample of the business information. For this problem, an affordable and cost-effective “out of the box solution” that passes muster can be developed only through testing, verification and validation using user information or a close approximation from which projections are both reasonable and supportable. I believe that industry has the wherewithal to develop sophisticated content analysis software. There are applications that can be adapted, integrated, bundled, etc., but they need testing with real user information and focus. However, without government interaction and assistance with its information, investing industry funds in a down market with government funding cuts is not attractive. On the other hand, there are a number of industry information manipulation problems that, if solved, may assist the government in the future. The question is: can the Intelligence Community afford not to be the leader?

  8. The idea of using context accumulation technology for classification decisions (versus declassification decisions) is highly problematic, in my view. Especially in its initial years of operation, the technology is bound to yield a significant number of false positives and false negatives. Assuming that context accumulation technology will supplement rather than replace human effort, the false negatives (information that is subject to classification, but not flagged by the automated process) will be corrected by human input. Realistically, given how rarely authorized users challenge improper classification decisions despite their obligation to do so, the false positives (information that is erroneously classified) will not be corrected. The use of this technology will thus exacerbate the problem of overclassification. (The idea that context accumulation technology will reduce the potential for overclassification “by ensuring that classification determinations are made in the strictest accordance with current policy and only in appropriate circumstances” makes sense only if the technology will entirely replace human effort. I assume that is not the case, and that human classifiers will always retain the authority to classify documents not classified by the automated process.)

    More fundamentally, we should be urging classifiers to put more thought into their classification decisions, rather than simply classifying by rote (one of the main culprits behind overclassification). How can we simultaneously ask classifiers to be more thoughtful about their decisions while assigning these same decisions to a computer program? To say the least, this would be sending a mixed message.

    Relatedly, as Bill Leonard often points out, the classification EO permits the classification of certain information, but doesn’t *require* it. In most cases, there is room for judgment – i.e., for a case-specific determination that broader considerations of policy, strategy, or public interest weigh against classifying certain information even though it meets the relevant criteria. To my knowledge, context accumulation technology is not capable of making that type of judgment.

    I’m less concerned with applying this technology to declassification efforts. Unlike classification decisions, declassification in many cases *should* be automatic. The inevitable “false negatives” are less problematic, in this context, than “false positives” in the context of classification decisions, because they result in no change to the status quo – and because supplemental human effort in the declassification context would be directed toward catching such omissions.

  9. NARA has sponsored research related to identifying equities, content summarization, decision-support processes, etc. since at least the late 1990s. You can find links to some of this work here:

    Over the years NARA and its Research Partners have collected empirical evidence related to the effectiveness of NLP, machine learning, information extraction, decision-support systems, etc. that could be used to inform this discussion.

    I am especially intrigued by the idea of incorporating tools into the classification process. This could certainly make it easier to declassify records downstream.

    Another area for consideration would be the use of visualization tools to isolate the areas where the needles are most likely to be located in the haystack. NARA has sponsored research in this area as well. See for example:

    Great discussion so far.

  10. While I agree that the scale of the problem is so vast that machine-based decision mechanisms could be helpful, I align myself with the remarks of several respondents who argue that the basic necessity is to transform the problem. In addition, given the very limited resources available, diverting them into a laboratory or a project to identify useful decision criteria for an AI-driven expert system simply takes us away from the issue at hand. I will say more regarding a machine-based system in connection with another of these papers, but here let me say that focusing on a laboratory at this stage of play is only tinkering at the margins of the problem. I do not oppose some machine-based mechanism to assist declassification personnel, but let’s talk about one after the backlog has been cleared and the problem has resolved itself into setting a course for the future.
