
How Lockheed Martin cleans dirty healthcare data

By Jonathan Vanian
August 24, 2015, 12:30 PM ET

If any company knows a thing or two about sifting through mountains of data, defense contracting giant Lockheed Martin is surely near the top of the list.

Besides developing missiles and weapons systems, the multibillion-dollar company also helps customers with their technology infrastructure, including stitching together disparate databases. But that task is not as easy as it might seem, because data is often messy and disorganized.

Ravi Hubbly, a senior engineering manager for Lockheed Martin (LMT), knows just how tedious the job can be. Hubbly is part of Lockheed Martin’s health and life sciences group, which helps federal agencies and medical companies improve how they process the information they collect about patients, drug trials, and billing.

Five years ago, the U.S. Department of Health and Human Services unveiled its ambitious HealthData.gov project to make more government healthcare data available to companies and government agencies. The idea was that by making more data easy to download, organizations could develop new software and services to help make the health care industry more efficient.

The problem, however, is that a lot of that data can be really dirty, Hubbly explained.

Hubbly’s 20-person team works with healthcare clients to build systems that can sift through large amounts of healthcare data and identify fraud. By comparing data from the government’s health care initiative with internal corporate data, health care companies can potentially spot when they are being scammed.

“You need to see the full health lifecycle,” Hubbly said, explaining why it matters to compare multiple data sets, such as Medicare payment records and physician databases.

However, before health care companies can start crunching numbers to uncover crooked doctors who bill for bogus cancer treatments, all that healthcare data has to be cleaned up and made uniform, Hubbly explained. All too frequently, it is full of incorrect information and unfilled fields.
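
To make the “unfilled fields” problem concrete, here is a minimal Python sketch; it is not anything Lockheed Martin or its clients have described using. It assumes patient reports arrive as dictionaries, and the field names (patient_id, drug, dose_tablets, reaction) are hypothetical. It simply separates complete records from ones with missing or blank required fields so the latter can be reviewed before any analysis begins.

```python
# Hypothetical required fields for an adverse-event report; real schemas differ.
REQUIRED_FIELDS = ("patient_id", "drug", "dose_tablets", "reaction")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent, None, or blank strings."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


def split_complete(records: list) -> tuple:
    """Split records into (complete, incomplete) lists, preserving order."""
    complete, incomplete = [], []
    for record in records:
        if missing_fields(record):
            incomplete.append(record)
        else:
            complete.append(record)
    return complete, incomplete


reports = [
    {"patient_id": "p1", "drug": "A", "dose_tablets": 2, "reaction": "rash"},
    {"patient_id": "p2", "drug": "A", "dose_tablets": None, "reaction": "  "},
]
complete, incomplete = split_complete(reports)
print(len(complete), len(incomplete))  # prints "1 1"
print(missing_fields(reports[1]))  # prints "['dose_tablets', 'reaction']"
```

Whether a zero or a placeholder such as “N/A” should also count as missing is a judgment call this sketch leaves out.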

For example, when a drug is released to the market, patients and third parties can report adverse drug reactions to the FDA, Hubbly said. A patient or doctor could easily log a dose of 20 tablets of a drug instead of two, making the record inaccurate.
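
Software cannot know what the reporter meant, but it can flag entries that look implausible so a person can check them. One simple heuristic, assumed here purely for illustration, compares each reported dose with the median dose reported for the same drug and flags anything more than a chosen multiple of it. The field names and the default factor of 5 are hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_suspicious_doses(reports, factor=5.0):
    """Return reports whose dose exceeds `factor` times the median dose
    reported for the same drug. Flagged rows are candidates for review,
    not confirmed errors."""
    doses_by_drug = defaultdict(list)
    for report in reports:
        doses_by_drug[report["drug"]].append(report["dose_tablets"])

    # The median is used instead of the mean so one typo (20 instead of 2)
    # does not drag the baseline upward.
    baseline = {drug: median(doses) for drug, doses in doses_by_drug.items()}

    return [
        report
        for report in reports
        if report["dose_tablets"] > factor * baseline[report["drug"]]
    ]


reports = [{"drug": "A", "dose_tablets": 2} for _ in range(4)]
reports.append({"drug": "A", "dose_tablets": 20})
print(flag_suspicious_doses(reports))  # prints "[{'drug': 'A', 'dose_tablets': 20}]"
```

With only a handful of reports per drug the median is an unreliable baseline, so a production system would plausibly check against known dosing ranges instead; that is an assumption, not something the article describes.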

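And it’s not just poorly entered data that can interfere with analysis. Before analyzing multiple datasets, technicians must merge the databases in what’s known as a “join,” which matches records in one table with records in another that share a common field, such as a company name or an ID number.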
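
For readers unfamiliar with the term, a join pairs each record in one table with the records in another that share a key value. The sketch below is a plain-Python hash join, assuming hypothetical payment and physician tables keyed on a made-up provider_id field. It also returns the payments that found no matching physician: the kind of leftover rows an analyst would want to investigate, whether they point to fraud or just to dirty data.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key`.

    Returns (joined, unmatched): joined rows combine fields from both sides,
    and unmatched holds left-side rows with no partner on the right.
    """
    # Build a lookup from key value to all right-side rows with that value.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined, unmatched = [], []
    for row in left:
        partners = index.get(row[key], [])
        if not partners:
            unmatched.append(row)
        for partner in partners:
            joined.append({**partner, **row})  # left-side values win on clashes
    return joined, unmatched


payments = [{"provider_id": 1, "amount": 100}, {"provider_id": 3, "amount": 50}]
physicians = [{"provider_id": 1, "name": "Dr. A"}, {"provider_id": 2, "name": "Dr. B"}]
joined, unmatched = hash_join(payments, physicians, "provider_id")
print(joined)     # prints "[{'provider_id': 1, 'name': 'Dr. A', 'amount': 100}]"
print(unmatched)  # prints "[{'provider_id': 3, 'amount': 50}]"
```

Indexing the right-hand table in a dictionary keeps the join to a single pass over each table rather than comparing every pair of rows.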

A major problem that plagues companies when merging huge amounts of information is that it may be correct in one database but incorrect, or simply recorded differently, in another that they want to compare it with. One database could contain the name “Hewlett-Packard” to represent the enterprise technology giant, while another might use the abbreviation “HP.” Both are technically correct, but because the two entries don’t match, a join won’t recognize them as the same company, which complicates the analysis.
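
A common first step, sketched here as an assumption rather than as any vendor’s method, is to compare names through a normalized key: case-folded, with spaces and punctuation removed. The illustration shows both what that fixes and what it does not. Spelling variants of “Hewlett-Packard” collapse together, but “HP” still lands in its own group, because no amount of reformatting turns an abbreviation into the full name.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Case-fold and keep only ASCII letters and digits.

    Note: this also drops accented letters, which is acceptable for
    this illustration but not for real-world names.
    """
    return re.sub(r"[^0-9a-z]", "", name.casefold())


def group_variants(names):
    """Group raw spellings by their normalized key."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name)].add(name)
    return dict(groups)


groups = group_variants(["Hewlett-Packard", "HEWLETT PACKARD", "hewlett packard", "HP"])
for key in sorted(groups):
    print(key, sorted(groups[key]))
# prints:
# hewlettpackard ['HEWLETT PACKARD', 'Hewlett-Packard', 'hewlett packard']
# hp ['HP']
```

Closing the remaining gap between “HP” and “Hewlett-Packard” takes outside knowledge, which is where a human analyst comes in.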

In order to clean up healthcare data, and any sort of mixed-up data for that matter, Lockheed Martin uses the services of a startup called Trifacta to help sort through the information. The company is one of many new startups—including Tamr and Paxata—that have been raking in millions from investors in recent years amid a boom in data analysis.

While cleaning data to prepare it for analysis isn’t a new idea, the technologies now available have made the process faster and more efficient. For one thing, these data-cleaning tools work in conjunction with the open-source big data technology Hadoop, which acts as a giant digital repository that companies can dump their data into without any “limit to how much data can be processed,” Hubbly said.

Startups like Trifacta also build machine-learning algorithms into their technology that help it learn how best to modify the data the way a customer wants. In the case of merging two databases containing both “Hewlett-Packard” and “HP,” data analysts can tell the system to recognize those names as the same thing. The algorithms learn from the analysts’ actions, so the next time those names appear in the database, the system knows to group them together.
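
The article does not describe how Trifacta’s algorithms work, so the Python sketch below deliberately swaps in something much simpler than machine learning: it only remembers the matches an analyst has confirmed and applies them to later rows. It illustrates the feedback loop described above (an analyst says “HP” means “Hewlett-Packard” once, and the system handles it from then on) without any claim about the product’s internals. The class and its method names are hypothetical.

```python
class AliasMemory:
    """Remembers spellings an analyst has confirmed as the same entity."""

    def __init__(self):
        # Maps a lightly normalized variant to the analyst's chosen name.
        self._canonical = {}

    @staticmethod
    def _key(name: str) -> str:
        # Ignore case and surrounding whitespace when looking names up.
        return name.strip().casefold()

    def confirm(self, variant: str, canonical: str) -> None:
        """Record an analyst's decision that `variant` means `canonical`."""
        self._canonical[self._key(variant)] = canonical

    def resolve(self, name: str) -> str:
        """Return the confirmed canonical name, or `name` unchanged."""
        return self._canonical.get(self._key(name), name)

    def apply(self, rows, column):
        """Return copies of `rows` with `column` mapped to canonical names."""
        return [{**row, column: self.resolve(row[column])} for row in rows]


memory = AliasMemory()
memory.confirm("HP", "Hewlett-Packard")  # the analyst's one-time decision

rows = [{"vendor": "hp "}, {"vendor": "Hewlett-Packard"}, {"vendor": "Dell"}]
print(memory.apply(rows, "vendor"))
# prints "[{'vendor': 'Hewlett-Packard'}, {'vendor': 'Hewlett-Packard'}, {'vendor': 'Dell'}]"
```

A genuinely learning system would go further, for instance by suggesting likely matches it has never been told about; that part is intentionally left out of this sketch.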

In effect, the system automates the time-consuming task of manually sifting through the different databases.

Data that previously took three to four weeks to prepare for analysis can now be handled almost instantaneously with the new data-cleaning tools on the market, Hubbly said.

Using technology to clean data is not limited to the healthcare industry. Any business looking to analyze data should take steps to verify that it’s working with clean information. A telecommunications company that has been acquiring businesses, for example, will often be inundated with a mishmash of information that it must scrub before combining it with its master data.

But as Hubbly explained, merging data is no longer the nightmare it used to be, when company coders had to manually cobble together ways to automate the task. With the new tools on the market, coders need less time to help the business side clean data in its effort to improve the bottom line.


About the Author

Jonathan Vanian is a former Coins2Day reporter. He covered business technology, cybersecurity, artificial intelligence, data privacy, and other topics.
