Big data and business intelligence trends 2017: Machine learning, data lakes and Hadoop vs Spark

  • Posted by: vadim.pavlovich

As we sit on the cusp of 2017, we are still talking about organisations finally “operationalising” their data: putting useful, actionable data into the hands of business users where and when they need it.

As the cost of data storage continues to fall and pre-packaged SaaS analytics solutions proliferate, the opportunity to put insights into the hands of employees has never been cheaper or easier.


Here are some of the trends we are seeing on the horizon for 2017 in big data, analytics and business intelligence (BI).

Embracing machine learning

Analyst firm Ovum says machine learning will be the “biggest disruptor for big data analytics in 2017.”

Tony Baer’s Big Data trends report states: “Machine learning, which has garnered its share of hype, will continue to grow; but in most cases, machine learning will be embedded in applications and services rather than custom-developed because few organisations outside the Global 2000 (or digital online businesses) will have data scientists on their staff.”

Vendors now sell pre-packaged machine learning tools that make it easier than ever for organisations to apply machine learning to their data sets, so we can expect businesses to continue taking advantage of predictive analytics, customer insight and personalisation, recommendation engines, and fraud and threat detection.
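To make the fraud-detection use case concrete, here is a deliberately simplified sketch of the kind of predictive scoring those packaged tools wrap behind an API. The profile logic and thresholds are invented for illustration; real services use far more sophisticated models.

```python
from statistics import mean

# Toy fraud-detection sketch: learn a profile of past legitimate
# transaction amounts, then flag new amounts that sit far outside it.
# Packaged ML services hide this kind of logic behind a single API call.

def fit(amounts):
    """Learn a simple profile: the mean and the largest deviation seen."""
    mu = mean(amounts)
    spread = max(abs(a - mu) for a in amounts)
    return mu, spread

def is_suspicious(amount, profile, factor=3.0):
    """Flag amounts further from the mean than factor x historical spread."""
    mu, spread = profile
    return abs(amount - mu) > factor * spread

history = [12.0, 9.5, 14.2, 11.8, 10.4]
profile = fit(history)
print(is_suspicious(13.0, profile))   # within the normal range -> False
print(is_suspicious(250.0, profile))  # far outside it -> True
```

The point Baer makes is that most organisations will consume this capability as an embedded service rather than building it in-house.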

Moving beyond Hadoop

The open source distributed storage and processing framework Apache Hadoop has been the talk of the BI industry for the past few years, but viable alternatives are emerging alongside it, in particular Apache Spark.

The in-memory data processing engine has been hyped for some years now but as Baer notes in his report, the ability to deploy Spark in the cloud is driving adoption. He states: “The availability of cloud-based Spark and related machine learning and IoT services will provide alternatives for enterprises considering Hadoop.”

Although closely related, Spark and Hadoop are different products, and Baer notes there are pros and cons to both: “The debate rages because, if you eliminate the overhead of a general purpose data-processing and storage engine (and in Hadoop’s case, YARN), Spark should run far more efficiently. The drawback, however, is that standalone Spark clusters lack the security or data governance features of Hadoop.”
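The trade-off Baer describes shows up directly in how a Spark job is submitted. The commands below are a sketch using standard `spark-submit` options; the cluster URL and script name are placeholders.

```shell
# Same application, two deployment modes.

# Standalone Spark cluster: no YARN overhead, but also none of
# YARN's security and resource-governance features.
spark-submit --master spark://spark-master:7077 my_job.py

# Spark running on Hadoop/YARN: inherits the cluster's security
# and resource management, at the cost of extra scheduling overhead.
spark-submit --master yarn --deploy-mode cluster my_job.py
```

In both cases the application code is the same; what changes is which layer owns security and resource allocation.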

Read next: Hadoop vs Spark: Which is right for your business? Pros and cons, vendors, customers and use cases

Data visualisation specialist Tableau added that late Hadoop adopters can take advantage of self-service data preparation tools to get in on the action in 2017. The vendor said: “Self-service data prep tools not only allow Hadoop data to be prepped at source but also present the data available as snapshots for faster and easier exploration. We’ve seen a host of innovation in this space from companies focused on end-user data prep for big data such as Alteryx, Trifacta, and Paxata.”

Usable data lakes

The past few years have seen a drive towards a single data source in the enterprise instead of multiple silos, making it easier to share insights across the organisation. Implementing a data lake – a central repository holding large volumes of raw, often unstructured data – isn’t new for 2017, but this could be the year data lakes become properly governed and operational.

Ramon Chen, CMO of data management specialist Reltio, said: “Many companies who took the data lake plunge in the early days have spent a significant amount of money not only buying into the promise of low cost storage and process, but a plethora of services in order to aggregate and make available significant pools of big data to be correlated and uncovered for better insights.

“With existing big data projects recognising the need for a reliable data foundation, and new projects being combined into a holistic data management strategy, data lakes may finally fulfill their promise in 2017.”

Read next: Why simplicity – not speed – is key to enterprise Hadoop strategies

Baer from Ovum sees more organisations replacing Excel spreadsheet processes as data lakes move into day-to-day use. He said: “The common points of pain for data lake adopters are related to the inventorying and securing of data. Data preparation is a logical first step for organisations that are seeking to eliminate reliance on standalone Excel spreadsheets. As this capability has become widely available in offerings, ranging from data integration providers to functionality that is part of analytic and data science tools, we expect significant uptake in 2017.”
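The advantage of scripted data preparation over a spreadsheet is repeatability: the cleaning rules are encoded once and applied identically on every refresh. Here is a minimal, hypothetical sketch using only the Python standard library; the column names and cleaning rules are invented for illustration.

```python
import csv
import io

# Hypothetical raw extract, as it might land from a source system:
# stray whitespace and a missing value that would normally be
# fixed by hand in Excel.
RAW = """region,revenue
North, 1200
South,
North,950
East,1100
"""

def prepare(raw_csv):
    """Trim whitespace, drop rows with missing revenue, coerce to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        revenue = (row["revenue"] or "").strip()
        if not revenue:
            continue  # drop incomplete rows rather than guessing a value
        rows.append({"region": row["region"].strip(),
                     "revenue": float(revenue)})
    return rows

print(prepare(RAW))
```

Commercial tools such as Alteryx or Trifacta provide this kind of pipeline through a visual interface, but the underlying idea is the same: preparation becomes a governed, rerunnable step rather than a one-off manual edit.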

Read next: Is Workday Planning the Excel killer?

The enterprise still needs data scientists

The shortage of data scientists in the enterprise may be easing as more smart graduates enter the job market, but demand isn’t going anywhere in 2017.

Read next: How to get a job as a data scientist: What qualifications and skills you need and what employers expect

According to Hired’s 2016 Mind The Gap report, data scientist salary offers rose by 29 percent in the past 18 months. The report also showed a 234 percent increase in interview requests for data engineers over the same period.

More self service BI

Aaron Auld, CEO at in-memory analytics specialist EXASOL, believes that self-service BI, where business users have direct access to analytics and insight, will continue to be a trend in the enterprise in 2017.

Read next: Ten of the best self-serve analytics and business intelligence tools for enterprises: What are the best alternative BI products?

He said: “Self-service tools are gaining ground in the enterprise and startups alike. As data analytics integrates itself further into the core of the business, there will be a shift towards the business diving into data analytics with databases, visualisation tools such as Tableau and data-prep tools such as Alteryx.”

Cloud-based analytics

Data visualisation specialist Tableau expects that more core data stores and analytics workflows will shift to the cloud in 2017: “With businesses moving their data to the cloud, the realisation that analytics should also live in the cloud will become mainstream.

“Next year, we predict that data gravity, in which all of the data that needs to be correlated for analysis moves to the location of the largest data set, will push businesses to deploy their analytics wherever their data lives. Cloud data warehouses such as Amazon Redshift will continue to be a popular data destination and cloud analytics will become more prevalent as a result.”

Streaming analytics

Streaming analytics is the practice of analysing data as it flows into the organisation, rather than in traditional batches. This is particularly useful for monitoring the health of key infrastructure or machinery, which is why streaming analytics should continue to gain traction in 2017 as more organisations look towards Internet of Things (IoT) deployments that demand it.

Ovum’s Baer notes that streaming analytics is decades-old, but open source technology has lowered barriers to entry. Now, with the proliferation of connected devices and IoT in the enterprise, especially in manufacturing and healthcare, streaming analytics could have its day in 2017.

Read next: Internet of things examples: 12 best uses of IoT in the enterprise

He said: “The reason for all this activity is the demand created by emerging IoT use cases; this is where realtime sense, analyse, and respond has spurred technology vendors to pick up where niche CEP (Complex Event Processing) left off.”
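The “sense, analyse, and respond” loop can be illustrated with a toy sliding-window monitor. This is a pure-Python sketch, not a production streaming engine; the readings and thresholds are invented.

```python
from collections import deque

# Toy streaming monitor: keep a sliding window of recent sensor
# readings and flag a spike as soon as the window average exceeds
# a limit, instead of waiting for a batch job to scan the data later.

def monitor(stream, window=5, limit=90.0):
    """Yield (reading, alert) pairs as each reading arrives."""
    recent = deque(maxlen=window)
    for reading in stream:
        recent.append(reading)
        avg = sum(recent) / len(recent)
        yield reading, avg > limit

readings = [82, 85, 84, 96, 99, 101, 88]
alerts = [r for r, alert in monitor(readings) if alert]
print(alerts)  # readings that arrived while the window average was high
```

Production systems such as Spark Streaming apply the same windowed pattern at scale, distributed across a cluster; the key property is that the response happens while the event is still unfolding.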

Conclusion: 2017 data trends

Big data remains a thorny issue, but the cloud is making it cheaper and simpler for enterprises to do more with their data without having to hire an army of data scientists.

With the major cloud providers like AWS and Microsoft releasing machine learning APIs, and Google open-sourcing its TensorFlow tool, 2017 should see what were previously considered advanced data processing techniques go mainstream.