Enhancing ETL by turning to real-time data streaming. 

The role of data in today’s organizations is indisputable. Data not only informs decisions across all areas of the business, it is also increasingly used to automate processes, as businesses ‘become software’. That said, the complexity of managing fragmented data is also rising. 

A recent IDC survey found that 79% of organizations are using more than 100 data sources, and 30% use more than 1,000 sources. Many chief data officers (CDOs) admit to spending more than a third of their time tackling day-to-day management of data, as opposed to using data to drive strategy and innovation. 

Because enterprises with a high level of data maturity generate 250% more business value, it has never been more important to execute data cleansing, enrichment and processing across all types of data: transactional, operational and analytical.

Increasing data maturity requires a level of data leadership, which goes hand in hand with digital leadership. Real-time data pipelines have become a necessary standard, with businesses expected to drive data maturity as a prerequisite to using AI and ML. In other words, data transformation is critical to digital transformation. This is why data leadership is so crucial: it enables internal teams to address the primary challenge of fragmentation and complexity, and ultimately to generate higher levels of business value. 

The challenge, however, for those who want to modernise and elevate their services is linking all the data together and making it accessible in real time. Traditionally, a lengthy three-step process has been used to consolidate data from multiple sources: Extract, Transform and Load (ETL). But this tends to work in batch and hasn’t always delivered the required results. Some solutions have switched the process to ELT (Extract, Load and Transform). We are even seeing reverse ETL. Now, with the rise of setting data in motion, we see the industry shifting towards streaming ETL with real-time stream processing. 

Setting data in motion

ETL (Extract, Transform and Load) is a three-step process used to consolidate data from multiple sources. At its core, ETL is a standard process where data is collected from various sources (extracted), converted into a desired format (transformed), then stored into its new destination (loaded). 
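To make the three steps concrete, here is a minimal batch ETL sketch using only the Python standard library. Everything in it is an illustrative assumption rather than anything from a specific product: the sample CSV data, the `orders` table, and the function names `extract`, `transform`, `load` and `run_batch_etl` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (illustrative data only).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,not_a_number
"""


def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names, parse amounts, skip unparseable rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, amount))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch_etl(text, conn):
    """Run the three steps in sequence over the whole batch."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")
run_batch_etl(RAW_CSV, conn)
```

Note that nothing reaches the destination until the entire batch has been extracted and transformed, which is the property that streaming approaches set out to change.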

ETL is not a new concept. In fact, it has been evolving since the 1970s and ’80s, when the process was sequential, data was more static, systems were monolithic, and reporting was needed on a weekly or monthly basis. 

As customer expectations and backend operations have moved towards a more real-time world, with many business processes set in software, we have seen batch-processed ETL move to streaming ETL. With streaming ETL, data is automatically extracted, transformed or acted upon, and then loaded to any destination almost as soon as it is created. This enables businesses to automate processes, removing people from the critical path, and to operate with scalability and security on an optimal infrastructure, which most likely includes the cloud.
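The record-at-a-time flow described here can be sketched with Python generators. This is a simplified illustration under stated assumptions, not a production design: in practice the source would typically be a consumer reading from a data streaming platform, while here it is a plain in-memory iterable. The names `event_source`, `transform_stream`, `ListSink` and `run_streaming_etl`, and the event fields, are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def event_source(events: Iterable[Dict]) -> Iterator[Dict]:
    """Extract: yield events one at a time as they arrive.

    Assumption: a real pipeline would read from a message broker here.
    """
    for event in events:
        yield event


def transform_stream(events: Iterable[Dict]) -> Iterator[Dict]:
    """Transform: enrich each event as it passes through, skipping bad ones."""
    for event in events:
        if event.get("amount") is None:
            continue  # skip events with no amount
        yield {**event, "amount_cents": round(event["amount"] * 100)}


class ListSink:
    """Load: a stand-in destination that simply collects records in memory."""

    def __init__(self) -> None:
        self.records: List[Dict] = []

    def write(self, record: Dict) -> None:
        self.records.append(record)


def run_streaming_etl(events: Iterable[Dict], sink: ListSink) -> int:
    """Push each event through extract, transform and load individually."""
    count = 0
    for record in transform_stream(event_source(events)):
        sink.write(record)  # each record is loaded as soon as it is ready
        count += 1
    return count


sink = ListSink()
processed = run_streaming_etl(
    [{"id": 1, "amount": 12.5}, {"id": 2, "amount": None}, {"id": 3, "amount": 3.0}],
    sink,
)
```

Because the generators are lazy, each event is written to the sink before the next one is read, so there is no need to wait for a complete batch to accumulate.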

Streaming ETL in practice

Real-time data is a key element for both new and high-performing legacy brands that rely on consistent flow and streams of data in order to respond to their customers’ continuously evolving expectations. Rather than letting data sit in a static database, the data itself can trigger an action or analysis in real-time. In many cases, this ‘setting data in motion’ can open up new value opportunities that were not possible with static data in more traditional databases, using request-response type architecture. 

Technology leaders such as Uber, eBay, Netflix and Yelp have already adopted a real-time data streaming approach and architected themselves around data-streaming platforms. 

Real-time stream processing has also been successfully implemented across a range of more traditional industries. In financial services, for example, banks continuously search for ways to become more relevant to today’s customers. Consumers can no longer imagine banking without real-time push notifications, which were initially brought to the market by challenger banks. Traditional banks are also expected to offer additional data-enabled intelligence, such as finance tracking and support with budget planning based on past buying patterns and life objectives. 

Or take retail. Businesses want to merge data from website interactions, mobile apps and in-store experiences, so they can offer real-time, contextualized and highly targeted offers. Moreover, with real-time data they can capture post-sale feedback and returns, or further upsell and cross-sell products and services. 

Ultimately, for a regular customer, it’s difficult to imagine what these services would look like if they didn’t leverage the power of real-time stream processing, but there are many more businesses that can tap into data to become digital-first. 

A data approach to digital transformation

While developing a digital transformation strategy that fully leverages the value of data isn’t easy, many businesses are realizing that it is a necessity. Getting this right means companies can use the power of the network effect to drive further data synergies: as more parts of the business consume various data sources, they will also produce more data, which in turn results in more data consumption. And so on. 

Traditionally, data was used to serve a product or solution. For instance, with a customer relationship management platform, the data’s main purpose was to serve that platform. However, with the ability to access real-time data, we’re seeing a shift in this relationship. Products or business solutions are now creating data, which can become a product in itself. Therefore, instead of data only serving the solution, the solution serves the data as well. 

Real-time stream processing is modernizing this old way of working with data. It gives people real-time access to information as events happen, with ever-increasing levels of contextual intelligence. A data streaming platform can also react to events and carry out the task directly, bypassing the human. 

Nowadays, data is at the heart of every modern business. Traditional organizations are augmenting their legacy architectures to satisfy real-time requirements and simplify operations at scale. To elevate how data is used and fully unlock its potential, companies need to create new synergies. Moving from ELT to streaming ETL will enable organizations to increase their data maturity and get ahead of the pack. 

Lyndon Hedderly

Principal Business Value Consultant at Confluent.
