In our latest contribution, Simon Crosby, CTO of SWIM.AI, asks whether mobile networks are ready for the forthcoming data deluge, looks at the impact it might have on event processing, and considers how best to handle all that streaming data in real time.
I bet you think this piece is about the rise of Netflix, YouTube, Disney+ etc. It isn’t. Sure, mobile and terrestrial networks are being swamped with video traffic consumed by both young and old alike. But I’m referring to the rise of streaming data from the edge – data from every consumer and industrial product imaginable.
By many estimates over 20 billion smart devices enter the market each year (that’s 2 million per hour), and they all have something to say. A lot to say, and all of the time. The stream of data heading into the mobile network from the edge to carriers and to Internet-facing enterprise apps and SaaS vendors is growing at an enormous rate. So in fact, it isn’t a stream, it’s a tsunami that won’t end.
For mobile operators, there are two opportunities that this data deluge presents. The first is to use device data and network status data to gain real-time insights into network performance, handset performance, user experience, network traffic issues and outages to deliver a more robust network and to improve customer satisfaction.
The second is to use their position at the edge to deliver “Edge Cloud” services that help tame the flood of data before it hits cloud service providers. In this scenario, operators will host edge cloud services on computers close to data sources. The opportunity has never been greater than with the introduction of 5G networking – where operators can offer enterprise customers secure, private slices of network capacity with access to real-time edge computing capabilities with low latency, enabling them to deliver smart cities, smart grids, and tailored enterprise-focused offerings. Vendors have spotted this opportunity too: Ericsson “Edge Gravity” is one example.
To succeed in both areas, operators need to become fluent in two things: cloud-native messaging and streaming services, such as AWS Kinesis, Azure Event Hubs and Service Bus, Google Pub/Sub, Apache Kafka, Pulsar and Spark; and open source platforms for real-time stream analysis.
The key to deriving insights from the edge may be in supporting pub/sub messaging. One can argue that pub/sub messaging is a new “dial tone” for both consumer and enterprise-focused service providers. Delivering a platform that helps companies securely scale messaging from edge devices is an important service offering. Just as importantly, operators must master cloud-native software architectures in order to deliver customer service and understand the state of their networks in real time.
Pub/sub messaging enables an unknown number of publishers to deliver asynchronous messages to subscribers – the applications that process them – without either side needing to know the identity of the other. In pub/sub, sources publish events for a topic to a broker, which stores them in the order in which they are received. An application subscribes to one or more topics and the broker forwards matching events to it.
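To make the pattern concrete, here is a minimal in-memory sketch of a broker. It is an illustration only, not how Kafka, Pulsar or any cloud service is implemented: the `Broker` class, its method names and the topic names are all hypothetical, and a real broker adds persistence, partitioning, delivery guarantees and security.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class Broker:
    """Toy broker (hypothetical): keeps events per topic in arrival order."""

    def __init__(self) -> None:
        self._log: DefaultDict[str, List[Any]] = defaultdict(list)
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscriber names a topic, never a publisher.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # A publisher names a topic, never a subscriber.
        self._log[topic].append(event)
        for handler in list(self._subscribers[topic]):
            handler(event)

    def history(self, topic: str) -> List[Any]:
        # Events for a topic, in the order the broker received them.
        return list(self._log[topic])


broker = Broker()
received: List[Any] = []
broker.subscribe("cell/load", received.append)

broker.publish("cell/load", {"cell": "A1", "load": 0.72})
broker.publish("handset/battery", {"id": "h-9", "pct": 41})  # nobody is listening
```

Only the `cell/load` event reaches the subscriber, yet the broker still records the `handset/battery` event, so an application that subscribes later could be written without the publisher changing at all.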
Apache Kafka and Pulsar, and the Cloud Native Computing Foundation’s NATS, are pub/sub frameworks that are rapidly becoming the de facto standard for messaging in the cloud era. Pub/sub is offered as a service by all major clouds, so one might question whether mobile operators should enter the fray. I think there are a couple of reasons for them to invest:
- First, mobile operators have points of presence that are closer to the network edge, and that can, therefore, offer infrastructure to process events and respond in real-time.
- Second, pub/sub messaging will be a key component of a future real-time operations platform within every mobile operator.
On the first point, response time is critical for use cases such as traffic prediction, routing and any interactive service. Using real-time messaging to “Edge Cloud” application micro-services can save hundreds of milliseconds of event processing time. For a real-time stream processing framework such as Apache Samza or SwimOS, getting hold of events fast is key to real-time analysis, learning and prediction to drive visualizations and automated responses.
On the second point, one can consider subscribing to events at a broker to be the streaming equivalent of the database-era “SELECT”. App dev teams can independently subscribe to and write apps for different event topics. All kinds of apps, from customer care to predicting outages in network equipment, become feasible when all events are reported in real time.
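The “SELECT” analogy can be sketched in a few lines. This is an assumption-laden illustration rather than any broker’s API: `select_events`, the `topic` field and the sample events are hypothetical, and a plain list stands in for a live stream.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Set

Event = Dict[str, Any]


def select_events(
    stream: Iterable[Event],
    topics: Set[str],
    where: Callable[[Event], bool] = lambda event: True,
) -> Iterator[Event]:
    """Yield events on the chosen topics that satisfy `where`, as they arrive."""
    for event in stream:
        if event["topic"] in topics and where(event):
            yield event


# A list stands in for the live stream in this sketch.
stream = [
    {"topic": "outage", "cell": "A1", "severity": 3},
    {"topic": "care", "customer": "c-17", "issue": "dropped call"},
    {"topic": "outage", "cell": "B4", "severity": 1},
]

# Roughly "SELECT * FROM outage WHERE severity >= 2", evaluated continuously.
severe = list(select_events(stream, {"outage"}, lambda e: e["severity"] >= 2))

# A second team subscribes to a different topic without touching the first app.
care_cases = list(select_events(stream, {"care"}))
```

The difference from a database query is that the result never completes: on a real stream the generator keeps yielding for as long as events keep arriving.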
Streaming data contains events that are updates to the state of applications or infrastructure. When choosing an application architecture to process it, the role of a data distribution system, like Kafka or Pulsar, is limited. Take into consideration:
- Data is often noisy – Many real-world systems are noisy and repetitive; large numbers of data sources add up to a huge amount of data. If raw data is delivered as a stream of pubs from the edge, the transport cost can be huge.
- State matters, not data – Streaming environments never stop producing data – typically real-world events – but analysis depends on the meaning of those events, or the state changes that the data represents. Even de-duplicating repetitive updates requires a stateful processing model. This means that the stream processor must manage the state of every object in the data stream.
So how do enterprises write apps that consume pub/sub events? Smart service providers will support application frameworks that take much of the pain out of application delivery. They will need these capabilities internally, so whether or not they offer them as service capabilities very much depends on their appetite for competition with the major cloud vendors.
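The two points above can be illustrated with a small sketch of stateful de-duplication: the processor remembers the last known state of every object and forwards an update only when that state changes. The function name, the `(object id, value)` event shape and the traffic-light readings are assumptions made for the example.

```python
from typing import Any, Dict, Hashable, Iterable, Iterator, Tuple

Update = Tuple[Hashable, Any]  # (object id, reported value); an assumed shape


def state_changes(updates: Iterable[Update]) -> Iterator[Update]:
    """Forward an update only when it changes the last known state of its object."""
    last_seen: Dict[Hashable, Any] = {}  # one entry per object in the stream
    for obj_id, value in updates:
        if obj_id not in last_seen or last_seen[obj_id] != value:
            last_seen[obj_id] = value
            yield obj_id, value


# Noisy, repetitive readings from two hypothetical traffic lights.
readings = [
    ("light-1", "red"), ("light-1", "red"), ("light-2", "green"),
    ("light-1", "green"), ("light-2", "green"), ("light-1", "green"),
]
changes = list(state_changes(readings))
```

Six raw updates collapse to three state changes here. The `last_seen` dictionary is the state the text refers to: without it there is no way to tell a repeat from a change, and it grows with the number of objects, not with the volume of data.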
Now For Stream Processing
“Stream processing” is the category of software platform that enables developers to quickly create, deploy, scale and manage applications that consume data from the edge.
Stream processors are application runtime platforms. The applications they host consume events from brokers, from the real world and even from database change logs, process them, and deliver real-time insights to users and to other applications.
Stream processing can involve both the simplest and the most complex kinds of analysis. At its simplest, streaming “transform and load” (STL) – the streaming equivalent of “extract, transform and load” from the store-then-analyze era – simply takes events, transforms and labels them, and delivers them to a cloud data lake such as Azure Data Lake Storage (ADLS). This is not necessarily even stateful.
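A stateless transform-and-load step might look like the following sketch. The event fields, the labeling rule and the 30-degree cut-off are invented for illustration, and an in-memory text buffer stands in for the data lake; no real storage API is shown.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def transform(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Stateless transform: normalize units and attach a label."""
    celsius = (raw["temp_f"] - 32) * 5 / 9
    return {
        "device": raw["device"],
        "temp_c": round(celsius, 1),
        "label": "hot" if celsius > 30 else "normal",  # assumed threshold
    }


def transform_and_load(events: Iterable[Dict[str, Any]], sink: TextIO) -> int:
    """Write each transformed event to the sink as one JSON line."""
    count = 0
    for raw in events:
        sink.write(json.dumps(transform(raw)) + "\n")
        count += 1
    return count


sink = io.StringIO()  # stand-in for a data lake writer
events = [{"device": "d1", "temp_f": 212.0}, {"device": "d2", "temp_f": 68.0}]
loaded = transform_and_load(events, sink)
records = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Each event is handled on its own, with nothing remembered between events, which is why this kind of processing needs no state.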
At the other end of the spectrum, stream processors drive complex analytical processes including real-time analysis, accumulation, learning and prediction. Some analytical frameworks use Apache Spark or Flink. On the other hand, leading stream processors offer a powerful set of analytical functions “in the box” but can also be used to drive more complex analysis using other frameworks like Spark.
Organizations need to look for solutions, from third parties or delivered through service providers, that support stream-centric unsupervised learning and prediction and so avoid the complexity of model training and deployment in the cloud. By using this technology, service providers and businesses across industries can derive important insights that help save time and expense and deliver far better customer service.
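As a very small illustration of learning directly on the stream, the sketch below keeps a running mean and variance (Welford’s online algorithm) and flags values that fall far from what has been seen so far. This is a simple stand-in technique chosen for the example, not the method used by any particular stream processor; the threshold, warm-up length and latency figures are assumptions.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Welford's online algorithm: mean and variance without storing the data."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_anomalies(
    values: Iterable[float], threshold: float = 3.0, warmup: int = 5
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_anomaly), judging each value against earlier ones only."""
    stats = RunningStats()
    for x in values:
        is_anomaly = (
            stats.n >= warmup
            and stats.std > 0
            and abs(x - stats.mean) > threshold * stats.std
        )
        yield x, is_anomaly
        stats.update(x)  # the model learns from every value, anomalous or not


latencies_ms = [10, 11, 9, 10, 12, 10, 11, 50, 10]
anomalies = [x for x, flagged in flag_anomalies(latencies_ms) if flagged]
```

There is no separate training phase: the statistics update with every event, so the same loop both learns and predicts. A production system would need to decide whether anomalous values should be folded back into the model, which this sketch does unconditionally.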