Just Exactly What Kind Of Data Mining Are We Talking About?


An email from a reader:

I'm a long-time reader who has read with interest your many posts on the Bush administration's various attempts to spy on Americans. You should be aware that various Bush officials are lying or misdirecting the public when they say that none of their domestic surveillance programs includes widespread data-mining.

CIA director Michael Hayden has said "it is not a driftnet" and John Negroponte has said the NSA was "absolutely not" monitoring domestic calls (link). Both are lying. Domestic surveillance isn't just about call logs, and although the calls themselves aren't sifted, recordings of those calls are. Despite the claims of NSA Director McConnell, the programs run by the NSA are neither surgical (link) nor "limited to 100 wiretaps" (link). Certain aspects of the NSA programs may be as they describe them, but they aren't describing the whole thing.

A company called Nexidia (http://www.nexidia.com/) developed the software used by NSA and offers a commercial version to call center companies. Nexidia admits in its in-person pitch to those call centers (but not on its website) that it provided the NSA software. On its website it simply says - during a promo video ( http://www.nexidia.com/ovation/intro/index.php) - that "Nexidia is also used in the government marketplace." Watch the video for yourself. Those words are at the end between 1:35 and 1:40.

Nexidia's software basically takes recorded audio and indexes it the way Google does text on a page. Type in your search string, wait a while for the database search to compile, and all the calls with that instance come up with the audio passage itself marked. You can isolate every instance of "ass hole" and "damn you" and "shut the hell up", for example. When used by the NSA on recorded phone calls, that constitutes data-mining and a "driftnet" approach, in my view. These officials need to be exposed as the liars they are.

Gen. Hayden and the rest are lying to the public. They talk as if these "alleged keyword searches" are some fantasy, pie-in-the-sky approach to data mining, when in fact they are publicly available technologies that corporations are already using. This is not about the people at Nexidia who have developed this technology nor the call centers using it for legal commercial purposes. It may even be true that being able to audit large swaths of audio recordings for particular keywords, when such auditing is done within the law, can be a powerful tool to help protect the people of the United States from threats like terrorism.

This is strictly about administration officials' treatment of this technology. It is not fanciful, it is not myth, and it is not some dreamland of computer technology. It is in use today and has been used by NSA. It may still be in use by NSA. When NSA threatens action against the nation's largest cellular telephone providers (http://www.cbsnews.com/stories/2006/05/12/politics/main1613877.shtml) unless they provide data, and technology such as Nexidia's software is in use by them, the public has a right to know about the ease with which their government can listen in on their personal communications.

Anything else we should know?

Update: Lambert has a great compilation on the subject.


Sean-Paul Kelley August 30, 2007 - 7:56pm
( categories: Liberties )

Add in this from Wired the other day:
The FBI has quietly built a sophisticated, point-and-click surveillance system that performs instant wiretaps on almost any communications device, according to nearly a thousand pages of restricted documents newly released under the Freedom of Information Act.

The surveillance system, called DCSNet, for Digital Collection System Network, connects FBI wiretapping rooms to switches controlled by traditional land-line operators, internet-telephony providers and cellular companies. It is far more intricately woven into the nation's telecom infrastructure than observers suspected.

It's a "comprehensive wiretap system that intercepts wire-line phones, cellular phones, SMS and push-to-talk systems," says Steven Bellovin, a Columbia University computer science professor and longtime surveillance expert.

Regards, C

Cernig August 30, 2007 - 8:44pm

...I'd say that given the computational power requirements, this probably isn't as big a "driftnet" as folks might think. Doubtless it has applications (if it works), but I'm pretty skeptical of an "indexed Echelon" capability.

"The spectacle of this great nation which does not know its own mind is as humiliating as it is dangerous." ~ Walter Lippmann

JustPlainDave August 30, 2007 - 9:52pm

The NSA has put so much computational power into its HQ that it's in danger of outrunning the local power supply, and is planning another massive facility down in Tx.

Dave, did you watch the vid or look around Nexidia's site?

Regards, C

Cernig August 31, 2007 - 11:59am

My understanding of the demands of that technology and my understanding of how NSA works lead me to believe that things are a lot more tiered than people seem to automatically presume. To the extent that there's a drift net on the domestic side (and I think there probably is) I think it's about using traffic analysis to flag communications of interest - this will produce a much, much, much smaller body of material (a significant number of orders of magnitude smaller, I would guess; I'd have to do some more informed thinking about the volume of traffic they're collecting vs. the scale of the targets to predict better, but I'd guess 6 or 8 or so) that then gets fed through the sort of indexing that Nexidia provides.

"The spectacle of this great nation which does not know its own mind is as humiliating as it is dangerous." ~ Walter Lippmann

JustPlainDave August 31, 2007 - 3:08pm

they identify phonemes, then put them in contexts that they're looking for and then drill down further from there.

This may mean that they are able to listen far more efficiently, thus ignoring irrelevant stuff which could invade your privacy.

So at a deeper level it may be less invasive; however, at the front end it seems quite invasive.

http://mauberly.blogspot.com/

mauberly August 31, 2007 - 5:10pm

What good does it do you to "flag communications of interest" if you can't cross reference them to anything historically?

If it were MY database and MY telecom network I'd pay AT&T and Global Crossing for facility space, install my splitters, build a backbone network to route domestic traffic to a central location with lots of inexpensive disks. Once in the door, I'd dump it to disk straight off the pipes in parallel. As my staging disk volume groups fill up I'd use background processes on something like IBM SP2s to read, write and index those bits by to/from, whether voice, internet or other. Once indexed, those bits, which might be voice, html, data, other can be used to do meaningful analysis of "communications of interest". Nothing would be "transcribed" real-time, unless required, but the equipment I'd install can do that too, and I have the backbone to watch real-time when needed.

The basic problem with "communications of interest" is that they expand from the point of origin. You are interesting. Who did you call in the past year? Who did they call? What email did they send? What web sites did they look at? To answer these questions I'd want to store it all, indexed by to/from in compressed digital format and would only uncompress the "interesting" things as I needed to look at them. The things that are interesting become interesting when they are uncompressed and "scored" for relevancy. They become even MORE interesting when they are cross referenced with other searches. Everyone would have a "score". Just like your credit rating.
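The "who did you call, and who did they call" expansion can be sketched as a breadth-first walk over records indexed by to/from. This is only an illustration of the idea, not a description of any real system: the record format, the function names build_contact_index and expand_contacts, and the sample parties are all made up for the example.

```python
from collections import defaultdict, deque

def build_contact_index(records):
    """Index (from, to) records by party so either side can be looked up."""
    index = defaultdict(set)
    for sender, receiver in records:
        index[sender].add(receiver)
        index[receiver].add(sender)
    return index

def expand_contacts(index, origin, max_hops):
    """Walk outward from one party, recording how many hops away each contact is."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        party = queue.popleft()
        if hops[party] == max_hops:
            continue  # do not expand beyond the requested distance
        for contact in index[party]:
            if contact not in hops:
                hops[contact] = hops[party] + 1
                queue.append(contact)
    return hops

# Hypothetical records: A called B, B called C, C called D, E called A.
records = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "A")]
index = build_contact_index(records)
reach = expand_contacts(index, "A", max_hops=2)
```

With two hops from "A" the walk already reaches B, E and C, which is the point being made above: the set of "interesting" parties grows quickly with each hop, so the stored history has to cover far more people than the original target.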

It is much more costly to pass very large amounts of digital information against large filter sets, than it is to simply dump to disk, index and coalesce later and then analyze. And determining what is "interesting" is historical, not predictive.

Two problems though. Once I have all of this on disk, it never goes away. Deleting all the stuff I don't need can never be done, because I'll NEVER know what I might need and it is VERY costly to sift through my thousands of terabytes of data to remove old stuff and free up a little disk space.

Second, and the most obvious, is that simply by having this stuff I can declare you, JustPlainDave, a terrorist, and make your life and everyone you know miserable. Happened in Greece. With Hayden in charge, and Rove sitting in meetings, what makes us think it can't happen here? What makes Rove so sure Hillary is a "fatally flawed candidate"? Perhaps Karl has been getting some transcripts from my little database...

Bonus questions:

1) For fun, who owns Hawaiian Telecom?
2) Which branches of our Armed Services are not represented on the Board of Directors of "public" company Global Crossing?
3) Who owns the facility where the trans-Pacific fiber loops terminate in Hawaii?
4) And lastly, why don't private companies care that their communications are completely compromised?

Time to put the tin foil hat to bed.
Cheers

dhomyak September 4, 2007 - 12:33am

1) The Carlyle Group
2) Nearly all the MIC is represented, just not necessarily by brass.
3) AT&T
4) That's the price they have to pay for a place at the trough.


Turn back to the Constitution - and

READ it.
Rick September 4, 2007 - 6:26am

A+

dhomyak September 4, 2007 - 1:35pm

Why would you presume that if traffic analysis is used to flag comms of interest that they become ahistorical and you can't reference them to anything? I'd guess that the highest weighting factors in identifying traffic come from past history on the numbers, geography, etc. and that that previous traffic is archived.

I've no doubt that they're collecting significant amounts of potential take for sifting later, but my sense is that's got to be heavily tiered by the traffic analysis as well - the mass of data otherwise is simply too large. How much storage does it take to collect an entire day of voice calls using a driftnet approach where you collect everything - now what about a week, month, year? I've seen back of the envelope calculations that lead me to be immensely skeptical - what are yours?
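For anyone who wants to run their own back of the envelope numbers, here is a minimal sketch. The only fixed fact in it is that uncompressed telephone audio (G.711) runs at 64 kbit/s; the daily call volume is a placeholder I made up so the script runs, not a measured figure, and the function name storage_bytes is just for illustration.

```python
G711_BITS_PER_SECOND = 64_000  # standard uncompressed telephone audio (G.711)

def storage_bytes(call_minutes_per_day, bits_per_second, days=1):
    """Bytes needed to keep every call for the given number of days."""
    seconds_of_audio = call_minutes_per_day * 60 * days
    return seconds_of_audio * bits_per_second // 8

# HYPOTHETICAL daily volume, a placeholder rather than a measured figure.
ASSUMED_CALL_MINUTES_PER_DAY = 1_000_000_000

for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]:
    total = storage_bytes(ASSUMED_CALL_MINUTES_PER_DAY, G711_BITS_PER_SECOND, days)
    print(f"one {label}: {total / 1e12:,.0f} TB")
```

Swap in your own estimate of call minutes and a compressed codec's bit rate to see how sensitive the total is to each assumption.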

"The spectacle of this great nation which does not know its own mind is as humiliating as it is dangerous." ~ Walter Lippmann

JustPlainDave September 4, 2007 - 6:45am

http://www.autonomy.com/content/home/

For an example of how their thinking goes on the subject.
The last Washington Insider trade magazines relating to the IT market were pointing to an increasing trend of government outsourcing of IT services. That means private companies would ultimately have access and really be in charge of these super secret agencies.

Lasthorseman August 31, 2007 - 3:50pm

This is the manner of large IT systems on which I've worked. Let's discuss "success factors" in intercepting calls which could be "bad". The objective here is to identify "really bad" calls in real time for immediate action, down to recording calls which might just be "bad".

Here are some categories of calls. An exhaustive list could be 50 to 100 categories.

1. Calls from or to "suspected numbers".
2. Calls that are between family members.
3. Calls originating from "immigrant" populations (neighborhoods).
4. Calls from large businesses to large businesses.

etc.

Based on the calling and called parties one could assign a "risk factor" to a call.

Then one could scan (and record) the "high risk" calls and "word spot", as discussed above.

Calls could then be assigned into risk categories, with the highest risk category demanding a real-time intercept and live person listening to the call, down to a recording that might get reviewed sometime.
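The tiering described above can be sketched roughly as follows. Everything here is assumed for illustration: the category names, the weights, the threshold and the tier labels are invented, and a real list would have the 50 to 100 categories mentioned earlier.

```python
# HYPOTHETICAL category weights, invented for this sketch.
CATEGORY_RISK = {
    "suspected_number": 0.9,
    "family": 0.1,
    "business_to_business": 0.05,
}
HIGH_RISK = 0.5  # assumed cut-off above which a call is recorded and word-spotted

def call_risk(categories):
    """Risk factor of a call: the highest weight among its categories."""
    return max((CATEGORY_RISK.get(c, 0.0) for c in categories), default=0.0)

def triage(categories, keyword_hits):
    """Assign a call to a handling tier from its risk factor and word-spot hits."""
    if call_risk(categories) < HIGH_RISK:
        return "ignore"
    if keyword_hits > 0:
        return "live_intercept"  # highest tier: a person listens in real time
    return "record_for_later_review"

print(triage(["suspected_number"], keyword_hits=2))
print(triage(["suspected_number"], keyword_hits=0))
print(triage(["family"], keyword_hits=3))
```

Tuning then amounts to adjusting the weights and the threshold until the number of false positives reaching the top tier is tolerable.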

The objective here is to eliminate as many false positives as possible, while refining the "risk triggers" in the system.

The reports from the FBI on many "false positives" lead me to believe that such a system is operational, and is undergoing "tuning for risk management".

To digress:

Of course, this completely ignores that one objective of the "terrorists" is to excite the US intelligence agencies & law enforcement, make them terrorize us and waste time, money & energy (that is, use our own agencies to terrorize us while trying to protect us). This requires little other than a few cell phones and some "suspects" chatting on the cell phone about their devilish plans, while having every intention of doing nothing but cause excitement. (The bomb threat effect).

It's interesting to calculate the cost effectiveness of a single phone call, vs the reaction of DHS & other homeland security departments.

10 phone calls spread over 5 days, 20 people discussing the evil plot for 10 minutes a day: a total cost of 1,000 minutes, or about 17 man-hours.

And a response for a week by 500 US security folks, for a cost of 10 man-years.

Not bad, somewhere between a 6,000:1 and a 10,000:1 "return on terror investment".
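The ratio is easy to recompute under different assumptions. The sketch below takes the inputs stated above (20 people, 10 minutes a day, 5 days, against a 10 man-year response) and varies only how many hours count as a man-year; the two hours-per-year values are assumptions, and the resulting ratio moves a lot with that choice.

```python
PEOPLE = 20
MINUTES_PER_DAY = 10
DAYS = 5
RESPONSE_MAN_YEARS = 10

caller_minutes = PEOPLE * MINUTES_PER_DAY * DAYS  # 1,000 minutes

def return_ratio(hours_per_man_year):
    """Response effort divided by caller effort, both measured in minutes."""
    response_minutes = RESPONSE_MAN_YEARS * hours_per_man_year * 60
    return response_minutes / caller_minutes

# ASSUMPTION: 2,000 hours is a working year, 8,760 hours is a calendar year.
working_year_ratio = return_ratio(2_000)
calendar_year_ratio = return_ratio(8_760)
print(f"working year:  {working_year_ratio:,.0f}:1")
print(f"calendar year: {calendar_year_ratio:,.0f}:1")
```

Counting a man-year as 2,000 working hours gives about 1,200:1, and counting it as a full calendar year gives about 5,300:1, so the exact figure depends on the definition, but the imbalance is three to four orders of magnitude either way.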

There are 200,000,000 man-years/year in the US. It'd only take 5,000 to 20,000 man-years/year of "terrorist activity" to completely paralyze the US economy. That's about 0.1% of the Palestinian population. And we'd be fighting them over here, while they are over there!

Throw in a few incidents in the US or Europe, and the pot would be merrily boiling.

Perhaps we need a better way to fight terrorists? An economic and political solution, not military & law enforcement?

My apologies for the digression.

Synoia September 2, 2007 - 7:26am

was the best part.

http://mauberly.blogspot.com/

mauberly September 2, 2007 - 1:01pm

Had an absolutely miserable (stupid, micromanaging) boss. He asked for the week's task lists every Monday and would assemble them into a wonderful Gantt chart to show his bosses. So every Monday, we just randomly re-arranged our task lists. That kept him busy until Thurs afternoon. And he played poker Thurs nights, so Friday wasn't a problem.

gone September 4, 2007 - 4:31pm
