To my Friends who think there is too much data for the NSA to deal with...

This topic is archived.
Home » Discuss » Archives » General Discussion (01/01/06 through 01/22/2007) Donate to DU
hootinholler Donating Member (1000+ posts) Sat Dec-31-05 03:26 PM
Original message
To my Friends who think there is too much data for the NSA to deal with...
The NSA was founded as an information aggregator to collect communications from foreign nations. Since its inception in 1952, it has been one of the big consumers of high-powered computer hardware. Over the years it has bought really big boxen from IBM and Cray, midrange boxes like VAXen and later Suns, and untold numbers of Intel-based processors. It is truly not an Enterprise-class shop; rather, it's an Empire-class shop. The investment in technology and the recruitment of the brightest brains in computer science is truly staggering.

I've seen a number of people here pooh-pooh the 'wiretap' issue because the NSA doesn't have the bandwidth or capacity to read all of the email, phone conversations, etc. The implication is: I'm not worried, 'cause there's too much stuff for them to deal with.

I could whine on about how it wasn't really a wiretap issue, which is why they didn't go the FISA route, but I'd rather point out that the argument answers the wrong question. The appropriate question is: what portion of all that stuff has to be dealt with to obtain useful results? In other words, it's a matter of efficiency.

I have considerable professional experience in the Information Retrieval field. What I lay out here is not what the NSA has but, rather, what I would set up, given the premise that the system must analyze data streams on the order of a terabyte per hour.
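To put that premise in perspective, here is a quick back-of-the-envelope conversion of "a terabyte per hour" into sustained rates. It assumes decimal units (1 TB = 10^12 bytes), which the post does not specify, and the function name is just for illustration.

```python
# Rough sizing for a stream of about one terabyte per hour.
# Assumption: decimal units (1 TB = 10**12 bytes).

TB = 10**12
SECONDS_PER_HOUR = 3600


def stream_rates(tb_per_hour: float = 1.0) -> dict:
    """Convert a terabytes-per-hour figure into a few more familiar rates."""
    bytes_per_second = tb_per_hour * TB / SECONDS_PER_HOUR
    return {
        "megabytes_per_second": bytes_per_second / 10**6,
        "gigabits_per_second": bytes_per_second * 8 / 10**9,
        "terabytes_per_day": tb_per_hour * 24,
    }


if __name__ == "__main__":
    for name, value in stream_rates().items():
        print(f"{name}: {value:.1f}")
```

That works out to a sustained intake of a few hundred megabytes per second, or about 24 TB per day.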

The first thing to be captured is the information about the communication, known in the industry as meta-data, or data about data. This in and of itself is a useful byproduct of the preparatory, or grooming, phase of the intake processing. Valuable as this data is on its own, a major reason it is needed is noise reduction in the communications stream being monitored; think spam filtering. Unlike those of us trying to filter spam individually, the system gets a view of the traffic through the meta-data that would allow rejection of broadcast-type messages by analyzing the traffic each source originates. This traffic would still be cataloged, but probably only archived, not indexed.
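As a rough illustration of source-side noise reduction, the sketch below flags a message as broadcast when the same sender pushes the same content to many distinct recipients. The record fields, the `body_hash` fingerprint, and the threshold are all made up for the example; a real system would use far richer signals.

```python
from collections import defaultdict
from typing import NamedTuple


class Meta(NamedTuple):
    """Hypothetical meta-data record for one captured message."""
    sender: str
    recipient: str
    body_hash: str  # assumed fingerprint of the message content


def triage(records: list, min_recipients: int = 3):
    """Split records into (to_index, archive_only) using sender fan-out."""
    # Count distinct recipients for each (sender, content) pair.
    fan_out = defaultdict(set)
    for r in records:
        fan_out[(r.sender, r.body_hash)].add(r.recipient)

    broadcast = {key for key, rcpts in fan_out.items()
                 if len(rcpts) >= min_recipients}

    to_index, archive_only = [], []
    for r in records:
        if (r.sender, r.body_hash) in broadcast:
            archive_only.append(r)  # still cataloged, just not indexed
        else:
            to_index.append(r)
    return to_index, archive_only
```

Note that nothing is thrown away here: the broadcast traffic simply skips the expensive indexing step, which is where the efficiency comes from.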

The remaining traffic would be prioritized into indexing queues. Those digital sources with high priorities would be indexed within minutes of capture. Indexing yields another set of meta-data that is further analyzed. This set of data is about the content of the messages.
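A minimal sketch of such a prioritized indexing queue, using Python's standard `heapq` module. The class name and the convention that a lower number means higher priority are assumptions for the example, not a description of any actual system.

```python
import heapq
import itertools


class IndexingQueue:
    """Priority queue of message ids; lower number = indexed sooner."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay first-in, first-out.
        self._counter = itertools.count()

    def push(self, priority: int, message_id: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), message_id))

    def pop(self) -> str:
        """Return the id of the highest-priority message still waiting."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With something like this, a message from a high-priority source jumps ahead of everything already queued at lower priority, which is how "indexed within minutes" can hold even when the backlog is large.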

This new data is then inserted into the index collections and passed through filters for automated routing. Those filters are basically saved searches that are re-run to catch new hits. These are not simple keyword searches, but are quite sophisticated.
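To make the "saved search as a filter" idea concrete, here is a deliberately simple sketch: each saved search is a boolean term query, and every newly indexed message is checked against all of them to decide where it gets routed. Real filters would be much more sophisticated than this; the class, field names, and queries are invented for illustration.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass(frozen=True)
class SavedSearch:
    """A standing query: every all_of term present, no none_of term present."""
    name: str
    all_of: frozenset
    none_of: frozenset = frozenset()

    def matches(self, terms: set) -> bool:
        return self.all_of <= terms and not (self.none_of & terms)


def route(text: str, searches: list) -> list:
    """Return the names of all saved searches that a new message hits."""
    terms = tokenize(text)
    return [s.name for s in searches if s.matches(terms)]
```

The point of the design is that the queries are stored once and the documents stream past them, the reverse of an ordinary search where the documents sit still and the queries come and go.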

Concepts expressed within the stream would be identified, and parties would be marked as subject matter experts as a concept repeats in their communications. Social and professional circles would be mapped with alacrity. A concept of interest thus becomes a person of interest, which broadens into a circle of interesting parties.
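The concept-to-person-to-circle progression can be sketched in a few lines. This toy version assumes concepts have already been extracted per message (the hard part), counts how often each sender repeats a given concept, and then takes everyone those senders communicate with as the surrounding circle. The function name, tuple layout, and threshold are assumptions for the example.

```python
from collections import Counter, defaultdict


def analyze(messages, concept: str, min_mentions: int = 2):
    """messages: iterable of (sender, recipient, concepts) tuples.

    Returns (persons, circle): senders who repeat the concept at least
    min_mentions times, and their direct contacts.
    """
    mentions = Counter()
    contacts = defaultdict(set)
    for sender, recipient, concepts in messages:
        # Record the communication link in both directions.
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
        if concept in concepts:
            mentions[sender] += 1

    persons = {p for p, n in mentions.items() if n >= min_mentions}
    # One hop out from each person of interest, excluding the persons themselves.
    circle = set().union(*(contacts[p] for p in persons)) - persons
    return persons, circle
```

Notice that the people in the circle never had to mention the concept at all; being in contact with someone who did is enough to land on the map.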

The software to do this can be had off the shelf, today, by anyone with enough money.

So, if you're not concerned because of the magnitude of the data set, I ask: "How efficient does this process need to be before you become concerned?" 1%? 10%? 50%?

Even this post is an answer to the wrong question because whatever they did do was done unconstitutionally. I think that even archiving spam violates the law.

-Hoot
