You probably already know that leading analyst firms have been quoting data lake failure rates of 85% for some time now.
You may not be aware that one of those same leading analyst firms is now also forecasting that, by 2020, 30% of data lakes will be built on standard relational DBMS (database management system) technology “at equal or lower cost than Hadoop” because – and I quote – “application performance is superior” and “most data going into data lakes is relational.”
Put those two things together and you start to understand why MapR has recently gone to the wall. And why Cloudera is under so much financial stress.
With many organisations having invested tens and even hundreds of millions of dollars in data lakes that deliver little or no business value, it’s way past time for some brutal self-assessment in the technology industry.
Many data lakes have failed because they were IT-led vanity projects, with no clear linkage to business objectives and operational processes. If the strategy for your failing data lake is to lift-and-shift it lock, stock and barrel from Hadoop to an object store, then you are about to flush more millions down the pan – to say nothing of the opportunity cost associated with several more wasted years. Unfortunately, I know from personal experience that this is exactly the plan in several large organisations that really ought to know better.
Failed data lakes often represent a toxic combination of both poor technology choices and an inadequate approach to data management and integration. If you think that data management begins and ends with ACID (Atomicity, Consistency, Isolation, Durability) compliance – as at least one of the cool kid vendors that e-mails me regularly seems to – then pick any technology platform you like, so long as you do it quickly. If you are going to fail anyway, you may as well fail fast.
Better yet, develop a data strategy that includes a layered data architecture, a minimum viable product approach to data integration (we call that “Light Integration”) and an agile, incremental approach to the more robust integration of the data that matter most. That gives you a fighting chance of optimising end-to-end business processes and delivering real business value.
Much of the complex, multi-structured data that today sits unloved and unqueried in Hadoop-based data lakes will ultimately reside in object storage. At Teradata, we recognise this – hence our focus on enabling robust access to object stores. But much of your structured and semi-structured interaction data belong in your existing data and analytics platform, where they can be seamlessly integrated with the transaction data you already manage there.
Don’t just take my word for it; ask the analysts.
Not every data lake is a data swamp – and like all technologies, the Hadoop stack has a sweet spot. But the tide of history is now running against data silos that masquerade as integrated data stores just because they are co-located on the same hardware cluster. And that same tide is running against a distributed file system and lowest-common-denominator SQL engine masquerading as a fully-fledged analytic DBMS.
If you are doubling down on your investment in Hadoop, you are swimming against that tide. And if you are betting on a fashionable-but-unproven technology to get you out of a data management hole, then you aren’t learning from recent history – you are condemning yourself to repeat it. But if you are ready to move on and look forward, talk to us about the industry’s leading integrated data and analytic platform, Teradata Vantage.