By: Michael Rice, ICG Siemens Consultant
Whether it’s financial data and you’re trying to determine if your business is growing or shrinking, or it’s clinical data and you’re trying to diagnose a patient — not validating your data can have lasting consequences.
Data can be a volatile, unforgiving …predictor of how things are changing.
We rely on data for so many things. It’s hard to argue with the cold, hard facts of what data can surface, but it’s worse to run your business on incorrect data. Take for example the ongoing battle on climate change — it’s all predicated on the validity of the data. I’ll leave it at that.
Data is akin to statistics, since statistics are derived from data. A line popularized by Mark Twain, who attributed it to British Prime Minister Benjamin Disraeli, goes: “There are three kinds of lies: lies, damned lies, and statistics.” It’s not a stretch to replace “statistics” with “data” because of the trust that we place in data and what that data can do to change our opinions.
As a healthcare data architect, I commonly find mistakes that other professionals have made when building systems based on data. And what system isn’t? These mistakes take many forms, but the two most common are hiding information (missing data) and presenting false facts (incorrect data).
In healthcare, hidden information can take the form of patients missing from a census or charges vanishing from reports. False facts occur when data summarizations are wrong (e.g., double-counted charge revenues or incorrect reimbursement payment amounts).
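To make those two failure modes concrete, here is a minimal sketch of a reconciliation check in Python. Everything in it is hypothetical: the patient IDs, the charge IDs, the amounts, and the idea that a trusted source total is available to compare against. It only illustrates the kind of check involved, not how any particular system does it.

```python
from collections import Counter

# Hypothetical extracts, invented for illustration only.
source_patient_ids = {101, 102, 103, 104}   # patients in the system of record
report_patient_ids = {101, 102, 104}        # patients shown on the census report

report_charges = [
    ("C-1", 250.00),
    ("C-2", 90.00),
    ("C-2", 90.00),   # the same charge appears twice on the report
    ("C-3", 40.00),
]
source_charge_total = 380.00                # assumed trusted total from the source

# Missing data: patients in the source that never reached the report.
missing_patients = sorted(source_patient_ids - report_patient_ids)

# Incorrect data: charge IDs that appear more than once on the report.
charge_counts = Counter(charge_id for charge_id, _ in report_charges)
duplicate_charges = sorted(cid for cid, n in charge_counts.items() if n > 1)

# How far the report total drifts from the source total.
report_charge_total = sum(amount for _, amount in report_charges)
overstatement = report_charge_total - source_charge_total

print("Missing patients:", missing_patients)
print("Duplicated charges:", duplicate_charges)
print("Revenue overstated by:", overstatement)
```

Neither problem is visible from the report alone. Both only show up when the report is compared back to its source.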
The importance of validating your data cannot be overstated. Take, for instance, databases. On a technical level, it’s very easy for an inexperienced database programmer to make a mistake on a “Join” (that is, bringing different pieces of information together). In the database world there are different types of joins: Inner Joins, Outer Joins, Full Joins, Cross Joins, Self Joins, etc. They all do different things, and without fully understanding what they do, it’s very easy to surface incorrect data.
For example, using an Outer Join, it’s very easy to create duplicate records when one record matches several on the other side. Using an Inner Join, it’s even easier to lose records, because both data sources have to have matching records. But I digress; this isn’t about the intricacies of databases.
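For readers who want to see the join pitfalls in action, the following is a small sketch using Python’s built-in sqlite3 module. The patients and charges tables and all of their contents are made up for illustration, and comparing row counts is just one simple way to validate a join, not the only one.

```python
import sqlite3

# Hypothetical in-memory tables, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER, name TEXT);
    CREATE TABLE charges  (charge_id INTEGER, patient_id INTEGER, amount REAL);
    INSERT INTO patients VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Patient 1 has two charges, patient 2 has one, patient 3 has none.
    INSERT INTO charges VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

def count(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]

patient_count = count("SELECT COUNT(*) FROM patients")

# Inner join: patient 3 has no charges, so that patient silently disappears.
inner_patients = count("""
    SELECT COUNT(DISTINCT p.patient_id)
    FROM patients p
    INNER JOIN charges c ON c.patient_id = p.patient_id
""")

# Left outer join: every patient is kept, but patient 1 now appears twice,
# once per matching charge.
outer_rows = count("""
    SELECT COUNT(*)
    FROM patients p
    LEFT OUTER JOIN charges c ON c.patient_id = p.patient_id
""")

print("Patients in source:", patient_count)
print("Patients surviving the inner join:", inner_patients)
print("Rows returned by the outer join:", outer_rows)
```

Comparing each result against the known number of patients is the validation step: a smaller number means records were lost, and a larger number means records were duplicated.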
I was recently working on a client project that followed up on work others had done. Because we were fully validating our data, we quickly realized that some of the information being presented was incorrect. Had we continued without validation, the key stakeholders who were making decisions based on the accuracy of this data could have made some very bad calls through no fault of their own.
All I ask is one simple thing of you: PLEASE, PLEASE, PLEASE, make it a priority to validate your data.