
The Panama Papers Search Tool Began as an Academic Skunkworks Project

By Robert Hackett
April 11, 2016, 1:01 PM ET
Photo: A magnifying glass is seen in front of a screen displaying the Google search engine on June 2, 2014, in Berlin, Germany. (Photo by Michael Gottschalk/Photothek via Getty Images)

A version of this post titled “Panamania” originally appeared in the Cyber Saturday edition of Data Sheet, Coins2Day’s daily tech newsletter.

The Panama Papers—the biggest leak in data journalism history, as Edward Snowden christened it—is not so much a leak as a hemorrhage.

Mossack Fonseca, the Panamanian law firm that specializes in creating companies in offshore tax havens, lost 2.6 terabytes of data, equivalent to 11.5 million documents. Poring over that many documents required lots of reporters, lots of eyeballs, and lots of tech.

I caught up with Mar Cabra, head of the data and research unit at the International Consortium of Investigative Journalists, which coordinated the reporting effort, on Friday afternoon to discuss how the global investigation—more than 400 reporters in 80 countries—took place. “This would not have been possible without technology,” she said.

One aspect I found interesting was her team’s use of open-source information-retrieval software, in particular Apache Tika, Apache Solr, and Blacklight. These tools allowed reporters to dig into the cache and turn up their findings, which in many cases involved tying global leaders to tax-dodging accounts. Tika extracts text and metadata from documents; Solr indexes that content; and Blacklight provides the user interface, the packaging and presentation. Why this specific set of tools? “We chose Solr because project Blacklight existed,” Cabra said, mentioning that her team had adopted the search software by mid-2014 for earlier projects. “It’s an interface that’s intuitive and easy to use.”
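For the programmers among us, the division of labor among those three tools can be sketched in a few lines of Python. This is a toy illustration only: it does not call Tika, Solr, or Blacklight, and it is not the consortium's code. The function names (`extract_text`, `build_index`, `search`) and the sample file names are hypothetical, invented here to mirror the extract, index, and search roles described above.

```python
import re
from collections import defaultdict


def extract_text(raw: bytes) -> str:
    """Extraction stage (the role Tika plays): raw bytes in, plain text out."""
    return raw.decode("utf-8", errors="replace")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Indexing stage (the role Solr plays): map each term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Interface stage (the role Blacklight plays): return ids of documents matching every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Hypothetical sample documents, made up for this illustration.
raw_files = {
    "memo-001.txt": b"Shell company registered in Panama",
    "memo-002.txt": b"Invoice for legal services",
    "memo-003.txt": b"Panama office invoice",
}

docs = {name: extract_text(data) for name, data in raw_files.items()}
index = build_index(docs)

print(search(index, "panama"))
print(search(index, "panama invoice"))
```

A real deployment would have to handle many file formats, relevance ranking, and far larger volumes, which is the work the actual tools take on; the sketch shows only how the three stages hand off to one another.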


I spoke with Erik Hatcher, one of the original developers of Blacklight, on Friday as well. He said he wrote the precursor code (for the programmers among us, a Ruby on Rails application that layers on top of Solr’s Java code) while working in a research group at the University of Virginia. He created the tool to do analytics and search on a database of 19th-century literature and poetry. Then he adapted it to accommodate the entirety of the university’s library records.

Hatcher said he’s proud that the search software—today used everywhere from the Rock and Roll Hall of Fame to inside national security organizations—was used in the Panama Papers data dump. “Oftentimes these tools get the job done, but they’re not really exposed in and of themselves,” he said. “They’re just a means to an end—they don’t get as much press.”

“I’m happy in this case that these technologies are being showcased for the power they offer,” he added.

Cabra said that her team is now considering using a bit of rival search software—Elasticsearch—for an upcoming project. She said the group is interested in assembling a centralized cache of all the leaks the consortium has worked on so far. “We call it a knowledge center,” she told me. “It’s going to be a global repository of everything we have.”

Expect a one-stop shop for all your investigative journalism needs.
