Clarity from Complexity Part II: An Interview with Dr. Jon Kimminau on Big Data and Activity Based Intelligence

This is part two of a three-part interview with Dr. Jon Kimminau on the future of intelligence analysis. Part one can be read here.


Part II:

Dr. Jon Kimminau (JK): So if you look across these four data analysis activities, there are a number of conclusions that can be drawn, and I’ll start with some of the high points. [Note: the four activities include Big Data triage, forensic network analysis, activity forecasting, and collaborative analytics. More information can be found in Part I of this series]. The first one is that if you go and talk to some of the folks doing this today, look across the four activities, and ask those people where they spend their time, you find that about 60-70% of analysts’ time is spent in Big Data triage. That’s an important thing to think about, because it comes before any of the analysis everybody expects out of this. This is just: “how do we get the data and structure it in a way that I can then go and apply my tools to it?” So that says something about where we need to invest more, if our analysts have to spend up to 70% of their time just on the data triage.

The second conclusion is that people are way underestimating that third activity, the forecasting. We can’t get the tools that we need, or would like, to go beyond just what I’d call descriptive analytics – we want to start getting into predictive analytics and cognitive analytics, which are higher orders – and you can’t get there if you don’t do that third column, where you have folks modeling and working with the data to build the kinds of tools that will help you do that.

Over the Horizon (OTH): There are a lot of ongoing conversations on the role of Artificial Intelligence (AI) in this kind of predictive analytic work. Is that something that figures heavily into this third activity?

JK: I shy away from the AI label because it gives people the wrong idea. I like “machine learning,” which also gets grouped quite often with AI, and yes, machine learning is a big part of the modeling and, in some respects, also a big part of Big Data triage. We can use machine learning to help us capture that streaming data and put it into the form we need. It’s also quite useful for what we call unstructured data. So if you have large data sets, the simplest form would be: “I have these huge documents. How in the world do I get to the data that’s inside?” Machine learning is one of the approaches to breaking that down. So yes, it plays heavily into this.
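[Editor’s note: as a minimal illustration of the kind of machine-learning triage described here – pulling rough structure out of a pile of unstructured documents – the sketch below clusters raw text into themes with off-the-shelf tools. The documents, model choice, and cluster count are invented assumptions for illustration, not anything from the programs discussed in this interview.]

```python
# Illustrative sketch: triage a pile of unstructured documents by clustering
# them into rough themes so an analyst can see what a large collection contains.
# The documents and the number of clusters are placeholders, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "shipment of parts observed at the northern facility",
    "financial transfer routed through a shell company",
    "new construction visible near the coastal airfield",
    "second transfer to the same shell company this month",
]

# Turn free text into a numeric matrix (one row per document).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Group similar documents so related reports surface together.
k = 2  # assumed number of themes; tuned in practice
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

for doc, label in zip(documents, labels):
    print(label, doc)
```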

Another conclusion is about the fourth column, the collaborative analytics. If you look at where the investments are being made, there is way too little attention being paid to that last column. And if you don’t like the term collaborative analytics, you could also say “the user environment” – what we want our operators to see, how they participate in the analytics, and just the ability to do the kinds of things I named earlier, like Amazon does.

Wouldn’t we like to have analysts who can sit down and, when they request data or information about a particular area, also see “people asking for this also asked for this” and “Oh, did you know Mary, John, and Sam just asked this question, and they’re working in your region or your function?” All of those kinds of things would help, and we are investing so little in that area right now. I’ll even underline that a little more. If you look at the data itself – the data involved in data analytics, data as an asset – and look across these four activities, where the commercial world is telling us they get by far the best value is in that fourth activity, where the user interfaces with the data analytics. It’s not in the first, where they gather the data… where they’re getting the value is the user end of this, and so that should be a lesson for us: maybe our best value, if we can get involved in data analytics, is going to come from what the users can do and interact with and get out of that interaction.
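[Editor’s note: the “people asking for this also asked for this” capability is essentially an Amazon-style recommender. A minimal sketch follows, assuming simple co-occurrence counts over past query sessions; the sessions and query names are invented placeholders.]

```python
# Illustrative sketch of an "analysts who asked this also asked..." suggestion,
# built from co-occurrence counts over past query sessions (invented data).
from collections import Counter
from itertools import combinations

sessions = [
    {"port activity", "ship registry", "coastal imagery"},
    {"port activity", "ship registry"},
    {"port activity", "coastal imagery", "weather history"},
]

# Count how often two queries appear in the same analyst session.
co_occurrence = Counter()
for session in sessions:
    for a, b in combinations(sorted(session), 2):
        co_occurrence[(a, b)] += 1
        co_occurrence[(b, a)] += 1

def also_asked(query, top_n=3):
    """Return the queries most often seen alongside `query`."""
    related = Counter({b: n for (a, b), n in co_occurrence.items() if a == query})
    return [q for q, _ in related.most_common(top_n)]

print(also_asked("port activity"))  # queries most often paired with this one
```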

So you asked how Big Data analytics fits into the future vision and well it absolutely fits! We’ve got to somehow realize this.

Before moving on, I’d like to put one more point on this: none of this substitutes for what I would call classic analytics. Classic analytics is where we have a question, we try to find out what data we have on it, we piece it together like a puzzle, and we provide the answers. We still need classic analytics, but Big Data analytics is kind of on the other side; it’s inductive, it starts with: let’s gather the data, let’s explore it, and then we might find stuff relevant to things we’re working on. Another way to describe those two sides is that the classic approach is deductive and the Big Data analytics is inductive. We start with the data first; we don’t start with questions.

OTH: How is the US defense community progressing toward that vision?

JK: So I guess I’ll talk first about the Intelligence Community, because the Air Force is behind on this. From an Intelligence Community standpoint, the first part is to actually have the foundation to do all of this. The infrastructure is a program the Director of National Intelligence put together called ICITE. It stands for Intelligence Community (IC) and the ITE is Information Technology Enterprise. ICITE is basically the foundation to be able to do analytics. So think of it as providing the clouds, the services, the platform that you can put tools on, the interaction, the access… all of that comes with that environment. So there is progress in that sense, because ICITE has come well along, and Director Clapper isn’t going to leave until it’s at a point where we can’t go back.

But when I talked about this framework and that report the folks did for Activity Based Intelligence in the major issues study, I kind of pulled what they did in that study up a level. Because they proceeded to take their framework and show where current projects are today in terms of the framework. I’ve kind of raised it and asked: “so where are data analytics projects across the IC taking place across that framework?” And what you find is two things.

First, if you think of them as bubbles – each bubble or program is really concentrating on one aspect or working on one idea within data analytics – most of those bubbles, those programs in the IC right now, are concentrated on the left-hand side, which is Big Data triage and forensic analysis. So investment-wise, these bubbles are under-emphasizing the modeling that’s necessary for that activity forecasting set of activities, and then, like I mentioned before, there’s almost nothing going on in the collaborative analytics in terms of those projects. So that’s an investment look at where we are today.

A second part, beyond investment: along with ICITE and data analytics, the DNI began, about a year and a half ago, what he called mission campaigns. Mission campaigns are big ideas where they say “let’s see how ICITE and data analytics can help us tackle this big idea.” They had seven mission campaigns run from 2015 to 2016, and the conclusion they drew from the first round – and this relates to the bubbles – was that we aren’t sharing the infrastructure necessary to do these things. So every campaign, every project let’s say, has to kind of reinvent the entire framework for data analytics just for their project. Another way to say that is that every one of those projects has its own approach to the data triage and structuring rather than sharing one common one. So the idea is that, while we are doing data analytics, we have not reached the point where we can share it all. Obviously, it would be so much more efficient, and you might even be able to do more projects or initiatives, if you could somehow get to that common infrastructure, let’s call it for now. That’s the second conclusion.

A third conclusion I will mention real quick in terms of where we are today: if we think of data analytics as kind of new wine, we are trying right now to put new wine in old wineskins. The way they articulate this in the Intelligence Community is they say “our mission workflows need to change, and need to change radically, to accommodate data analytics.” We’re trying to do new business in old ways, and it doesn’t work, it doesn’t fit.


This is where the Intelligence Community is. The Air Force is definitely way behind. We are grappling with the most basic questions, and if you want to look beyond the Air Force to, say, the Department of Defense (DoD), it is also behind – the part of DoD that is the Military Services. The part of DoD that is the big agencies is a full participant in ICITE and where it’s going, but the Services are behind. Example: the DoD equivalent of ICITE is supposed to be DI2E, which stands for Defense Intelligence Information Enterprise. DI2E itself is going to be considered kind of a sub-bubble or bridging bubble to JIE, the Joint Information Environment. Each of those is trying to establish the same thing that ICITE is: that infrastructure for where we go. Well, I’ll just tell you, DI2E and JIE are probably at least 2 or 3 years behind where ICITE is, and ICITE, in crawl-walk-run terms, is just transitioning from crawling to walking. So we are behind.

OTH: So if it is supposed to be a common framework amongst everybody, why is the US building two different infrastructures, especially if one of these is years ahead of the other?

JK: That’s a great question. It’s because, for right now, the infrastructures are isolated to their information classification level. ICITE is at the Top Secret level, and the DoD is at the Secret and Unclassified levels. So JIE is going to be Secret and Unclassified, and that’s why it has to be on its own, and DI2E is kind of a bubble between them, because it’s struggling with: how do I not become separated from the Intelligence Community and yet serve our customers who are fighting on lower classification-level networks? That is a big question DI2E is struggling with, or is challenged with, let’s say. And that’s why the separate information environments – it’s really classification levels right now.

OTH: What are your thoughts on what was described in a recent Foreign Affairs article as the “Age of Transparency” and its implications for the future of defense and national security? This includes things like the Snowden leaks and Twitter analytics or commercial imagery that reveal events and movements as they happen. Given that Big Data works best with volume and variety of data and that Open Source data will be an increasingly large contributor, how do you see future intelligence collection being shaped by this?

JK: Here’s how I think about it: it almost gets a little philosophical. I think the issues that were raised in the Age of Transparency article are an almost separate paradigm shift from what we’re talking about in data analytics and Big Data, and here’s why. If you go back and look at intelligence for the last 50 years, the way we have approached it is that we draw on intel sources – we collect in an intel sense – and we make our analytic products and services from those intel sources, and maybe add what we now call open source to it where necessary, kind of the condiments that we would add to an intel product. What is coming true – what I believe is true today, and we just haven’t shifted our own paradigm enough – is that the sources of information you can get from Open Source should be the foundation, and the intel collection and sources should be the condiments, and that’s a massive reversal. I don’t remember if the author of the article used this, but I remember sharing it with him when we were discussing it.

There’s an anecdote going around that’s attributed to Robert Cardillo, who is currently the director of the National Geospatial-Intelligence Agency. He had a bunch of seniors around a table, and he said: let’s just consider this table the whole sea of information that’s available to us to understand what’s going on around us. Then he took a piece of paper, tore the corner off of it, threw that corner on the table, and said: this is where the intelligence community spends all of its time, this little piece of paper of what we collect. How do we change our approach so that we’re taking advantage of everything? I think that’s absolutely what we have to do, and for us in the intelligence business, that’s the key issue going on with the Age of Transparency.

We have to shift our paradigm and recognize that the foundation of the knowledge we need to do our products and services can come from Open Source, and that we use our intelligence collection to fill the gaps, to be the condiment, to be the extra thing we need that isn’t already publicly available in some fashion or form. So to me, that’s where Open Sources come in, and that’s why I consider it a separate paradigm shift from that of Big Data. They definitely overlap at some point, particularly when we talk about what we do, but there are two different things going on. But they both have their roots, don’t they, in the technology of today and how we are digitizing everything.

OTH: Are the volume and validity aspects of Open Source data relevant here? The idea that the amount of publicly available data is massive but often of imprecise or questionable validity seems to fit the Big Data model, which is focused more on truths that can be derived through data volume and less on data precision.

JK: There actually have been some folks doing studies on that. One study I know of said: hey, if we take one source – or one data set, let’s say – that’s very large, but we clean it, we take out all of the redundant or dirty data and just have a clean data set; and we have another one that has everything, all the dirty data, in one big set; and we try to apply our analytics to both of those data sets to compare them, what do we find? What some of these studies have been finding is that you’re getting better information out of the dirty data set. It has something to do with the volume you get, but also that some dirty data can point you to things you didn’t know were actually happening. So that’s why I downplay the idea – and there are other studies like that, I’m not relying on one study – that’s why I personally downplay the idea that open sources are full of deliberately misleading things and dirty data, because it appears the more data you have, the better equipped you are to actually find out things that are going on.
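[Editor’s note: the comparison described here can be sketched in a few lines – train the same model once on a scrubbed subset and once on the full noisy set, then compare accuracy on held-out data. The synthetic data, model, and “cleaning” rule below are illustrative assumptions standing in for the unnamed studies, not a reproduction of them.]

```python
# Illustrative sketch: does aggressive cleaning help, or does the full
# "dirty" data set score better on held-out data? Synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large, noisy collection (flip_y injects label noise).
X, y = make_classification(n_samples=5000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# "Clean" subset: keep only a fraction of the training rows, a crude stand-in
# for scrubbing out records judged redundant or dirty.
rng = np.random.default_rng(0)
keep = rng.random(len(y_train)) < 0.4
X_clean, y_clean = X_train[keep], y_train[keep]

for name, (Xt, yt) in {"clean subset": (X_clean, y_clean),
                       "full dirty set": (X_train, y_train)}.items():
    model = LogisticRegression(max_iter=1000).fit(Xt, yt)
    print(name, "held-out accuracy:", round(model.score(X_test, y_test), 3))
```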

And you brought something else up there too. I use analogs to talk about this – open sources versus intel data. I call one of the analogs the video compression model. So if you’re using one of those peer-to-peer video applications on your computer, and you open a little window and you’re looking at each other, a lot of people don’t realize that behind that are algorithms that don’t transmit the entire actual picture to you all the time. What some of those compression techniques do is get one big picture and then, after that, only transmit the things that change – the moving person or the mouth that’s moving – and the rest is still the static picture. Because if the rest isn’t changing, I don’t need to give you the repeat data. That’s one way to think about intelligence collection in the future: how do we set this up so that we sample areas to kind of lay a foundation, and then we’re only looking for what changes? It’s a different approach to how we do things. Open source might give us the static picture, and we might use intel collections for the things that are moving – that’s one way to think about it.
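[Editor’s note: the compression analogy – send a full keyframe once, then only the pixels that change – can be shown in a few lines. The frame sizes, the changed region, and the threshold below are arbitrary assumptions for the sketch.]

```python
# Illustrative sketch of "keyframe plus deltas": transmit one full baseline
# picture, then for each new frame send only the pixels that changed.
import numpy as np

rng = np.random.default_rng(0)
keyframe = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)  # full baseline

# Next frame: identical except for a small moving region (a mouth, a vehicle...).
new_frame = keyframe.copy()
new_frame[200:220, 300:340] = rng.integers(0, 256, size=(20, 40), dtype=np.uint8)

threshold = 10  # ignore changes smaller than this
changed = np.abs(new_frame.astype(int) - keyframe.astype(int)) > threshold
delta = {"positions": np.argwhere(changed), "values": new_frame[changed]}

full_cost, delta_cost = new_frame.size, len(delta["values"])
print(f"pixels to send: {delta_cost} of {full_cost} ({100 * delta_cost / full_cost:.2f}%)")
```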

Another way to think about it is something called monovision, and I’m familiar with it because I’m a near-sighted guy, but as I get older I also need some assistance with reading. I like contacts, but I have that difference in vision requirements, so what they’re doing for many people now is one contact that is more focused on far vision, and the other contact adjusted a little more to handle near vision. So it’s not bifocals; it’s one eye focusing on one thing and the other on the other challenge. I also like to use that for what we’re talking about here. If we think of it as: maybe open sources can provide me, let’s say, the far vision, and maybe I only need intel to do the near vision, but put together I have full vision – that’s another way for us to think about how we manage the balance between these two things.

Jon “Doc” Kimminau is the Air Force Analysis Mission Technical Advisor for the Deputy Chief of Staff, Intelligence, Surveillance and Reconnaissance. He is a Defense Intelligence Senior Leader (DISL) serving as the principal advisor on analytic tradecraft, substantive intelligence capabilities, acquisition of analysis technology, human capital and standards. Previously, he served nearly 30 years on active duty as an Air Force intelligence officer. Dr. Kimminau holds a Master’s in Public Policy from the Kennedy School of Government, Harvard University, a Master’s in Airpower Art and Science from the School of Advanced Airpower Studies (SAAS), and a PhD in Political Science from the Ohio State University.

This interview was conducted by Sean Atkins, Editor-in-Chief of Over the Horizon, on 14 December 2016.

Disclaimer: The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

