By Sharat Shashi Nayar, Operations Lead
USD 3.1 trillion. That was IBM’s 2016 estimate of the cost of poor-quality data in the US alone.
How do we define good, clean data?
“Cleaning” refers to the removal of invalid data points from a dataset. The end goal of data cleaning is not just to rid the data of its unwanted elements, but also to give it a structure, so that it can be used in the near future in a relevant format. I say near future because data, just like all of us, has an expiry date. Hence it is of utmost importance to have a consistent, cyclical process of reviewing and cleaning up existing data to keep it up to date.
The reason bad data costs us so much is that every company, no matter how big or small, and its stakeholders (managers, data scientists, decision makers) rely on this information in their daily work. Dealing with flawed data takes a lot of time and effort, which they do not have in plenty. So they make ad hoc corrections wherever necessary to complete the task at hand. They neither take the time nor the precautions to eliminate the root causes of the fault. And by the time they do (if they ever do), it is too late: there is now far too much work, and they may not have the resources to allocate to it.
“What is the first thought that comes to your mind when you are hungry?” All 20 of the people I asked this question mentioned either a particular type of food (pizza, biryani, fried rice, etc.) or a cuisine (Andhra food, Chinese food). And the funny thing is, even with all the apps and start-ups out there, almost none did a good job of helping me with a relevant “seek-and-you-shall-find” process for food. Most of them just listed and rated restaurants in a particular order, giving little or no information about how good a particular dish at a restaurant was.
Enter dishq, upping the ante by offering recommendations personalised to an individual’s palate. Taste is personal, that’s no secret. But dissecting food preferences down to the individual level is complex. In addition to taste, food choices are contextual, emotional, and sometimes even irrational. We at dishq build technology that understands all of this and makes smart suggestions. And for the technology and its algorithmic suggestions to work well, the data has to be impeccable.
A blog on data is never complete without a chart or graph of some kind. And a (self-explanatory) chart, as some anonymous genius once said, is worth a thousand words.
Access -> Organise -> Analyse -> Improve. That is the simple but effective method we use to handle our data. After thorough research across a wide range of data sources, we handpick the relevant ones, validate and organise the data in a particular order, and capture the information most relevant to us in the format we deem necessary. This captured data is regularly revisited, analysed, reviewed and updated as and when required.
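To make the four steps a little more concrete, here is a minimal sketch of what such a cycle could look like in code. It is an illustration under stated assumptions, not dishq’s actual pipeline: the `DishRecord` shape, the step functions and the 180-day “expiry” window are all hypothetical. Invalid rows are dropped during organising, and records not reviewed within the window are queued for another look, echoing the idea that data has an expiry date.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical record shape for illustration only; not dishq's real schema.
@dataclass
class DishRecord:
    name: str
    cuisine: str
    last_reviewed: date

def access(raw_rows):
    """Access: pull raw rows (plain dicts here) from whatever source is used."""
    return list(raw_rows)

def organise(rows):
    """Organise: drop invalid rows, normalise text fields, sort consistently."""
    records = []
    for row in rows:
        name = (row.get("name") or "").strip()
        cuisine = (row.get("cuisine") or "").strip()
        reviewed = row.get("last_reviewed")
        if not name or not cuisine or not isinstance(reviewed, date):
            continue  # invalid data point: removed during cleaning
        records.append(DishRecord(name.title(), cuisine.title(), reviewed))
    return sorted(records, key=lambda r: (r.cuisine, r.name))

def analyse(records, today, max_age_days=180):
    """Analyse: find records older than the (assumed) expiry window."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r.last_reviewed < cutoff]

def improve(stale_records):
    """Improve: build a re-review queue; a person verifies and updates each one."""
    return [r.name for r in stale_records]

raw = [
    {"name": " margherita pizza ", "cuisine": "italian", "last_reviewed": date(2017, 1, 10)},
    {"name": "", "cuisine": "chinese", "last_reviewed": date(2017, 6, 1)},  # invalid: no name
    {"name": "chicken biryani", "cuisine": "andhra", "last_reviewed": date(2017, 11, 20)},
]
records = organise(access(raw))
queue = improve(analyse(records, today=date(2017, 12, 1)))
print(queue)  # prints ['Margherita Pizza']
```

The “Improve” step deliberately stops at a queue: in a cycle like the one described, re-validating a stale record is a human judgement, and the queue simply tells the team where to look next.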
Total Quality Management is an approach that many companies focused on operations or data use (though Sai and Kishan, our Co-Founders, call ours a “Pure Tech Company”) to improve and maintain the quality of their internal processes, always with a focus on end-consumer satisfaction. Its seven basic principles, when diligently implemented and made habitual, yield the most surprising advantages.
- Quality can, and MUST BE, managed
We at dishq have the most dedicated bunch of people working with us. Building such a team is easier said than done! Getting people to hit targets is easy; you just follow the “Carrot and Stick” approach. Or you throw away the carrot and go straight to the stick. But that does not produce devoted employees who are loyal to the cause of quality work.
Hence the D-I-S-H-Q principles: Dedication, Intelligence, Spirit, Humility, Quick. Aside from meeting daily/weekly/monthly targets, every employee at dishq is constantly evaluated against the D-I-S-H-Q values. We even go a step further and measure prospective candidates against the same values. Only when we feel that they score above a certain threshold on these values do we consider hiring them. Quality, we believe, does not start at the work level; it starts at the employee level.
- Processes, and not people, are the problem
You may own a Border Collie (the smartest dog breed), but if you do not know how to train it and keep it engaged, you will never realise the dog’s true potential. Likewise, even if you have the smartest, most talented people working for you, nothing can help your cause if you do not have a defined process in place to achieve your goal.
Hence we introduced the “Data Rule Book”, our Bible: a periodically updated reference guide that anyone can use to understand, and stay educated on, the processes the Data/Ops team is involved with.
- Don’t treat symptoms, look for the cure
When we start a process in Data/Ops, we do not just look at the immediate goal. We try to understand how each of our actions affects the rest of the system, and we stay proactive and pessimistic (in a good way): we look for the ways things could go wrong and find solutions before the problem occurs. This way, we are always one step ahead of the probable mess, and we can respond to a problem faster than if we had waited for it to happen and then reacted.
- Every employee is responsible for the quality
We hold regular sessions where the whole company spends time adding data into the system, with all its complexity. Every week, and at times for a few minutes every day, we randomly play with our app and data and flag errors, and we take turns correcting these mistakes. It is imperative that the different verticals of the company interact with each other and understand what the Tech and Data teams do and how that work impacts everyone else, especially since we are a data-driven tech company.
- Quality must be measurable
Even though it is difficult to DEFINE quality with definitive numbers, you always need to see how the process is being implemented and where the desired effect is being achieved. This helps us set future goals, set strict timelines for them and work diligently towards realising them. We have implemented a system in which every data point is analysed in relation to its family of data and attributed to its creator (the Data Executive who entered it), bringing numbers into play. So we measure how long it takes to enter a particular data point, the average time spent on research, and the validity of the data being entered, and thus its quality.
It can also be observed that much of the manual work of adding data follows a pattern in its thought process; if that pattern can be documented, the whole process can be automated, saving a lot of time.
For example, we learned that by automating two of our data points (out of 25+), we not only saved 8% of the time but also ensured zero mistakes on those two.
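The per-creator measurement described above can be sketched roughly as follows. This is a hedged illustration, not dishq’s system: the `EntryLog` fields, the `quality_by_creator` function and the sample names (“asha”, “ravi”) are hypothetical. Each logged data point records who entered it, how long entry and research took, and whether a later review found it valid; the report groups these by creator.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical log entry; field names are assumptions for illustration.
@dataclass
class EntryLog:
    creator: str             # the Data Executive who entered the data point
    entry_seconds: float     # time spent feeding in the data point
    research_seconds: float  # time spent researching it
    valid: bool              # outcome of a later review

def quality_by_creator(logs):
    """Group logs by creator and compute simple time and validity metrics."""
    grouped = defaultdict(list)
    for log in logs:
        grouped[log.creator].append(log)
    report = {}
    for creator, entries in grouped.items():
        report[creator] = {
            "entries": len(entries),
            "avg_entry_s": mean(e.entry_seconds for e in entries),
            "avg_research_s": mean(e.research_seconds for e in entries),
            # share of this creator's data points that passed review
            "validity_rate": sum(e.valid for e in entries) / len(entries),
        }
    return report

logs = [
    EntryLog("asha", 60.0, 120.0, True),
    EntryLog("asha", 90.0, 180.0, False),
    EntryLog("ravi", 45.0, 60.0, True),
]
report = quality_by_creator(logs)
print(report["asha"]["validity_rate"])  # prints 0.5
```

Keeping time and validity side by side matters: a creator who enters data quickly but with a low validity rate is not producing quality, and the same numbers also show which data points consume the most time and are therefore the best candidates for automation.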
- Quality improvements must be continuous
Quality is not something that can be implemented once and forgotten once the expected results are attained. Improvements must happen continually so that we keep pace with changing times.
Every week the whole company meets for what we call a Margherita Meeting (we are all about food, after all), where we review the numbers and discuss areas of improvement in the current process. We make sure everybody puts forward suggestions, and we encourage crazy, out-of-the-box ideas.
- Quality is a long-term investment
If you look back at the six points above, you will notice that I have said very little about data and have instead covered multiple verticals and horizontals: HR, recruitment, general ops, management, and so on. To ensure the quality of data, you cannot focus on data alone. Quality starts from scratch, well before you even start thinking about data. You do not clean just one room in your house and leave the rest as it is.
This is where the C of DMAIC comes into play. DMAIC stands for Define, Measure, Analyse, Improve and Control. As Wikipedia puts it, it is a data-driven improvement cycle used for improving, optimizing and stabilizing business processes and designs. Control is the phase that keeps an improved process from sliding back into old habits. A true investment in quality lasts a lifetime, and it reaps its benefits in the long term rather than immediately.
Clean, well-marinated data ensures that the algorithm is fed with supple information. This information, topped with a rich, creamy UX and garnished with some fresh, handpicked UI, makes for an unforgettable meal for the user, and keeps them coming back for more.
What has the past year been like? A lot of learning and re-learning.
What does the future hold for us? A plethora of opportunities.
“The price of light is less than the cost of darkness” – Arthur C. Nielsen