A lot of messianic stuff is written about what the future of journalism looks like, but new site Wikileaks has a pre-launch taster of the kind of materials and treatment that could underpin it.
It has a leak of what it claims is a list of US Military Equipment in Afghanistan. Wikileaks applies a lot of Computer Assisted Reporting techniques to analysing the data. Here is how they did it:
The analysis proceeded as follows:
- Understand the abbreviations, acronyms, numbers and other nomenclature in the leak (specifically NSN, LIN, UIC) using publicly available source information. The results have been documented in US Military Logistics and elsewhere.
- Discover various public NATO Stock Number catalogues. Confirm the validity of random samples of the leak using these databases and other deployment references.
- By hand create tallies for a few interesting items observed by inspection. Write up an initial draft of the high-level analysis.
- Learn Python. Using vim macros, perl and a couple of Python programs, put the material into more presentable form, i.e. Afghanistan OEF Property List and Afghanistan OEF Property List.html.
- Write additional code to split out the NATO Supply Group and NATO Supply Classification from the NATO Stock Number (NSN)
- Obtain a list of NATO Supply Group and NATO Supply Classification codes from public US Military logistics sources
- Learn Structured Query Language and install a database program.
- Pull the original leak, the group and classification code tables into a SQL database, in this case, sqlite, but any SQL database would have sufficed.
- Experiment with SQL. Merge NATO Supply Classifications into the main leak for extra context and generate Afghanistan OEF Property List-extended.html.
- Experiment with SQL and discover how to generate several different tallies for the leaked items: by NATO Supply Group, NATO Supply Classification and NATO Stock Number. Convert to HTML and place into the Appendix.
- Using SQL, generate a unique list of NSNs. Write a program to concurrently query the US Logistics web-query NSN search for pricing information and extract the price for every NSN on the list (except alphanumerical NSNs, which are not listed, probably due to being Management Control Numbers).
- Pull in the pricing information to the SQL database.
- Using SQL, generate a new tally by NSN, join this together with the pricing information for each NSN, sort by total price, convert to HTML and place it into the Appendix.
- Using SQL, calculate the total value of all equipment for which we have prices.
- By inspection, extract additional features of interest: Notable Units and items.
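To give a feel for the SQL steps in that list, here is a rough sketch using Python and its built-in sqlite3 module. It is an illustration only, not the code Wikileaks wrote: the table names, column names and the `split_nsn` helper are hypothetical, and the sample rows and prices are invented. It assumes the usual 13-digit NSN layout, in which the first four digits are the supply classification and the first two of those are the supply group.

```python
import sqlite3


def split_nsn(nsn):
    """Return (supply_group, supply_class) for a 13-digit NSN, else (None, None)."""
    digits = nsn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        # Alphanumeric numbers are skipped, mirroring the list above.
        return None, None
    return digits[:2], digits[:4]


# Invented sample data: (unit, nsn, quantity) and (nsn, unit_price).
SAMPLE_ITEMS = [
    ("UNIT-A", "1111-00-000-0001", 4),
    ("UNIT-B", "1111-00-000-0001", 6),
    ("UNIT-A", "2222-00-000-0002", 1),
    ("UNIT-B", "2222-00-X00-0003", 2),
]
SAMPLE_PRICES = [("1111000000001", 100.0), ("2222000000002", 2500.0)]


def build_db(items, prices):
    """Load the item list and price table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (unit TEXT, nsn TEXT, nsg TEXT, nsc TEXT, qty INTEGER)"
    )
    conn.execute("CREATE TABLE prices (nsn TEXT PRIMARY KEY, unit_price REAL)")
    for unit, nsn, qty in items:
        nsg, nsc = split_nsn(nsn)
        conn.execute(
            "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
            (unit, nsn.replace("-", ""), nsg, nsc, qty),
        )
    conn.executemany("INSERT INTO prices VALUES (?, ?)", prices)
    return conn


def tally_by_group(conn):
    """Total quantity per supply group, ignoring items with no parsed group."""
    return conn.execute(
        "SELECT nsg, SUM(qty) FROM items WHERE nsg IS NOT NULL "
        "GROUP BY nsg ORDER BY nsg"
    ).fetchall()


def tally_by_value(conn):
    """Per-NSN quantity and value for priced items, most valuable first."""
    return conn.execute(
        "SELECT i.nsn, SUM(i.qty) AS total_qty, "
        "SUM(i.qty * p.unit_price) AS total_value "
        "FROM items i JOIN prices p ON p.nsn = i.nsn "
        "GROUP BY i.nsn ORDER BY total_value DESC"
    ).fetchall()


def total_value(conn):
    """Total value of all items for which a price is known."""
    return conn.execute(
        "SELECT SUM(i.qty * p.unit_price) "
        "FROM items i JOIN prices p ON p.nsn = i.nsn"
    ).fetchone()[0]


if __name__ == "__main__":
    db = build_db(SAMPLE_ITEMS, SAMPLE_PRICES)
    print(tally_by_group(db))
    print(tally_by_value(db))
    print(total_value(db))
```

Because the join is an inner join, any item without a price simply drops out of the value tally and the total, which matches the "equipment for which we have prices" wording in the list.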
If you want to help, they have a list of tasks to move things on…
3 responses to “The new journalism?”
Terrific. New journalists will be doing valuable work that terrorists cannot do for themselves. Many in the States will not see this as new, but as already practiced by the NY Times. Seriously, I get your point, but you have to admit that the example is kinda creepy. (Steve Boriss, The Future of News)
I’d say of less interest to terrorists than the PLA or Russian military.
Still, in terms of equipment costs there’s a valuable debate to be had about just how well resourced units are. It would certainly be interesting to compare equipment levels with British forces in the same theatre, who I would imagine are having to operate with far less.
Adrian, I agree it might be interesting information, but I prefer my defense of Western civilization the old-fashioned way — by allowing our military forces to keep the secrets they feel they need to keep to do their jobs, with sufficient classified civilian oversight. Re: the PLA and Russia post-London poisoning, I guess it depends what the meaning of “terrorist” is.