[Cross-posted at Digital Fourth]
The NSA has just vigorously denied that its new Utah Data Center, intended for storing and processing intelligence data, will be used to spy on US citizens. The center will have a capacity of at least one yottabyte, and will provide employment for 100-200 people. Even with the most generous assumptions [200 employees, all employed only on reviewing the data 24/7, only one yottabyte of data, ten years to collect the yottabyte, 5 GB per two-hour movie], each employee would be responsible on average for reviewing 500 million terabytes, or approximately 23 million years’ worth of Blu-ray quality movies, every year.
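For anyone who wants to check the per-employee arithmetic, here is a rough sketch in Python. It uses only the bracketed assumptions above, plus two of my own that the post does not spell out: decimal units (one yottabyte taken as 10^24 bytes, one terabyte as 10^12) and a 365.25-day year of round-the-clock viewing. The function name is made up for illustration.

```python
# Back-of-the-envelope check of the per-employee review load.
# Assumption (not stated in the post): decimal SI units throughout.
YOTTABYTE = 10**24
TERABYTE = 10**12
GIGABYTE = 10**9
HOURS_PER_YEAR = 24 * 365.25  # assumption: nonstop viewing, average year length


def per_employee_load(total_bytes=YOTTABYTE, collection_years=10, employees=200,
                      movie_bytes=5 * GIGABYTE, movie_hours=2):
    """Return (bytes per employee per year, years of nonstop movie viewing)."""
    bytes_per_employee_year = total_bytes // collection_years // employees
    movies = bytes_per_employee_year // movie_bytes
    viewing_years = movies * movie_hours / HOURS_PER_YEAR
    return bytes_per_employee_year, viewing_years


load_bytes, viewing_years = per_employee_load()
print(f"{load_bytes // TERABYTE:,} TB per employee per year")
print(f"about {viewing_years / 1e6:.1f} million years of movies per employee per year")
```

Under those assumptions the load comes to 5 × 10^20 bytes per employee per year, which is where the figure of roughly 23 million years of movies comes from. Switching to binary units nudges the numbers up a little, but not enough to change the picture.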
This astounding and continually increasing mismatch shows that we are well beyond the point where law enforcement can have a human review any meaningful share of the data in its possession that might relate to terrorist threats. Computer processing power doubles roughly every two years, but law enforcement employment is rising at a rate of about 7% every ten years, and nobody’s going to pay for it to double every two years instead. Purely machine-based review inevitably carries with it a far higher probability that important things will be missed, even if we were to suppose that the data were entirely accurate to begin with, which it certainly is not.
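To put rough numbers on how fast that gap widens, here is a small Python sketch. It assumes both rates quoted above hold steady indefinitely, and that the volume of data collected grows in step with processing power; both are simplifications of mine, and the function names are invented for illustration.

```python
# How quickly does the machine/human gap widen?
# Assumptions: constant rates, and data volume tracking processing power.

def doubling_growth(years, doubling_period=2.0):
    """Growth factor for something that doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def percent_growth(years, rate=0.07, period=10.0):
    """Growth factor for something that rises by `rate` every `period` years."""
    return (1 + rate) ** (years / period)


for years in (10, 20, 30):
    machines = doubling_growth(years)
    staff = percent_growth(years)
    print(f"after {years} years: machines x{machines:,.0f}, "
          f"staff x{staff:.2f}, gap x{machines / staff:,.0f}")
```

After a single decade, processing power is up 32-fold while headcount is up by a factor of 1.07, so each reviewer is facing roughly thirty times as much material as before.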
So none of us should be surprised that the FBI and the other agencies that have claimed the ability to prevent terrorist attacks failed to prevent the Boston Marathon bombings. With this ocean of data, no matter how much artificial intelligence you use, your chances of catching a threat ahead of time are very remote. Of course, after an attack, and especially once you have a suspect, you can go back and search the ocean for all the data points that you almost inevitably missed. Why was Tamerlan Tsarnaev, the elder of the Boston Marathon bombing suspects and one of around 750,000 people in the TIDE database, not stopped at the border? Why was facial recognition software not able to flag him as a match for a suspect? Why do the fusion centers, intended to synthesize data into actionable “suspicious activity reports”, flag things too late for them to be of any use? Why is the Air Force panicking a little at not having enough people to process the data provided by our drone fleet? Because there’s not just too much data, but far too much.
It’s in this context, then, that we should understand the calls for more surveillance after the Boston Marathon attacks for what they are. More cameras, more surveillance drones and more wiretapping, without many more humans to process the data, will make this problem worse, not better. These calls are being driven not by a realistic assessment that surveillance will help prevent the next attack, but by the internal incentives of the players in this market. Neither the drone manufacturers, nor law enforcement, nor elected officials have an interest in being the ones to call a halt. So instead they’re promoting automation (automated drones, automated surveillance, and email-scanning software), which inevitably has a much higher error rate than review by humans.
It’s much easier to claim you need more data than to solve the mismatch problem. The real solution is much harder politically. In truth, we don’t need a terrorism database with 750,000 names on it. There are not 750,000 people out there who pose any sort of realistic threat to America. If the “terrorism watch list” were limited by law to a thousand records, then law enforcement would have to focus only on the thousand most serious threats. Given the real and likely manpower of the federal government, and the rarity of actual terrorism, that’s more than enough. If law enforcement used the power of the Fourth Amendment, instead of trying to find ways around it, it could focus more on the highest-probability threats.
If law enforcement leveled with the public about their increasingly limited ability to thwart attacks ahead of time, instead of selling us a bill of goods to make us feel safe, they would catch less heat when they fail. They’re going to miss stuff. That’s inevitable under both a tight and a loose system. But a tight system has the added advantages that it protects more people’s liberties, and costs a lot less.