Real-time Data Streaming: What is it and How does it work?

Real-time data streaming has grown exponentially, and more than 80% of organizations report that real-time data streams are critical to building responsive business processes and improved customer experiences. Data streaming helps companies gain actionable business insights, migrate and sync data to the cloud, run effective online advertising campaigns, and create innovative next-gen applications and services. But to act on events and data as soon as they happen, you need a data infrastructure built for real-time streaming.

The need for real-time data

When a business runs in real-time, the need for real-time data becomes increasingly apparent. Use cases we see around security/threat management, customer activity tracking, and real-time financial data are all excellent examples of this.

Health care organizations are increasingly relying on real-time data when making decisions about patient care. IoT sensor analytics, cybersecurity, patient communication, insurance, research, and many other domains are impacted by real-time data. This data needs to be analyzed immediately, and is often transformed before reaching the target stores (i.e., real-time ETL). Real-time data streaming is therefore an integral part of modern data stacks.

Common Streaming ETL Use Cases

360-degree customer view

A common use case for streaming ETL (also called real-time ETL) is achieving a “360-degree customer view,” particularly one that enhances real-time interactions between businesses and customers. For example, a customer uses the business’s services (such as a cell phone or a streaming video service) and then searches its website for support. This interaction data is streamed to the ETL engine, where it is processed and transformed into an analyzable format. Raw interaction data alone may not reveal much about the customer; the insights emerge from ETL stream processing. For example, the interactions might suggest that the customer is comparison shopping and may be ready to churn. Should the customer call in for help, the call agent has immediate access to up-to-date information on what the customer was trying to do. The agent can not only provide effective assistance but can also offer additional up-sell/cross-sell products and services that benefit the customer.

Credit Card Fraud Detection

A credit card fraud detection application is another example of streaming ETL in action. When you swipe your credit card, the transaction data is sent to or extracted by the fraud detection application. The application then joins the transaction data in a transform step with additional data about you and your purchase history. This data is then analyzed by fraud detection algorithms to look for any suspicious activity. Relevant information includes the time of your most recent transaction, whether you’ve recently purchased from this store, and how your purchase compares to your normal spending habits.
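As a rough illustration of the pattern described above, the sketch below shows the enrich-and-score steps in plain Python. The field names, the in-memory purchase-history lookup, and the scoring rule are illustrative assumptions, not any vendor’s actual implementation.

```python
# Minimal sketch of a streaming "enrich then score" step for fraud detection.
# Field names, thresholds, and the in-memory history store are illustrative.
from datetime import datetime, timezone

# Hypothetical purchase-history lookup keyed by card number.
purchase_history = {
    "4111-xxxx": {"avg_amount": 42.50, "known_merchants": {"grocer_a", "cafe_b"}},
}

def enrich(txn: dict) -> dict:
    """Transform step: join the raw swipe with the cardholder's history."""
    history = purchase_history.get(txn["card"], {})
    return {**txn,
            "avg_amount": history.get("avg_amount", 0.0),
            "known_merchant": txn["merchant"] in history.get("known_merchants", set())}

def score(txn: dict) -> dict:
    """Analysis step: flag transactions far outside normal spending habits."""
    suspicious = (not txn["known_merchant"]) and txn["amount"] > 5 * max(txn["avg_amount"], 1.0)
    return {**txn, "suspicious": suspicious}

# A swipe event flows through the same transform -> analyze chain it would in a pipeline.
swipe = {"card": "4111-xxxx", "merchant": "unknown_site", "amount": 950.00,
         "ts": datetime.now(timezone.utc).isoformat()}
print(score(enrich(swipe)))
```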

Streaming Architecture and key components

Streaming ETL can filter, aggregate, and otherwise transform your data in-flight before it reaches the data warehouse. Numerous data sources are readily available to you, including log files, SQL databases, applications, message queues, CRMs, and more, all of which can provide valuable business and customer insights.

Stream processing engines use in-memory computation to reduce data latency and improve speed and performance. A stream processor can have multiple data pipelines active at a given point, each pipeline comprising multiple transformations chained together, with the output of each transformation serving as input to the next. There can be a wide variety of data producers, such as Change Data Capture (CDC), a technology that captures changes from data sources in real-time, as well as a wide variety of consumers, such as real-time analytics apps or dashboards.
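To make the chaining concrete, here is a minimal sketch of such a pipeline in plain Python, with a simulated CDC producer and two chained transformations. The event schema and stage names are illustrative assumptions.

```python
# Minimal sketch of a stream-processing pipeline in which the output of each
# transformation feeds the next. The CDC "producer" is simulated here; a real
# deployment would read change events from the source database.
def cdc_source():
    """Simulated producer emitting change events (illustrative schema)."""
    yield {"op": "insert", "table": "orders", "after": {"id": 1, "amount": "120.5", "region": "eu"}}
    yield {"op": "insert", "table": "orders", "after": {"id": 2, "amount": "19.9", "region": "us"}}

def parse_amount(events):
    """Transformation 1: cast the amount field to a numeric type."""
    for e in events:
        e["after"]["amount"] = float(e["after"]["amount"])
        yield e

def filter_region(events, region):
    """Transformation 2: keep only events for one region."""
    for e in events:
        if e["after"]["region"] == region:
            yield e

# Chain the stages; each stage's output is the next stage's input, and a
# consumer (dashboard, analytics app) reads from the end of the chain.
pipeline = filter_region(parse_amount(cdc_source()), region="eu")
for event in pipeline:
    print(event)
```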

The goal is to achieve a streaming latency of one second or less while handling more than 20,000 data changes per second from each data source.

Data transformation during stream processing

The aim of streaming ETL or stream processing is to provide low-latency access to streams of records and enable complex processing over them, such as aggregation, joining, and modeling.

Data transformation is a key component of ETL. The transformation includes such activities as:

  • Filtering only the data needed from the source
  • Translating codes
  • Calculating new values
  • Splitting fields into multiple fields
  • Joining fields from multiple sources
  • Aggregating data
  • Normalizing data, such as converting timestamps to a 24-hour DateTime format

When working with streaming data, it is often necessary to perform real-time data transformation in order to prepare the data for further processing. This can be a challenge due to the high volume and velocity of streaming data. The task can, however, be accomplished through the use of a number of techniques.
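As one illustration of these techniques, the sketch below applies several of the transformation activities listed above (translating codes, calculating a new value, splitting a field, and normalizing a DateTime) to a single streaming record. The field names and code table are assumptions made for the example.

```python
# Minimal sketch of several common transformation activities applied to one
# streaming record. Field names and the code table are illustrative.
from datetime import datetime

COUNTRY_CODES = {"US": "United States", "DE": "Germany"}  # translate codes

def transform(record: dict) -> dict:
    first, _, last = record["full_name"].partition(" ")        # split one field into two
    ordered_at = datetime.strptime(record["ordered_at"], "%m/%d/%Y %I:%M %p")
    return {
        "first_name": first,
        "last_name": last,
        "country": COUNTRY_CODES.get(record["country_code"], "Unknown"),
        "total": record["quantity"] * record["unit_price"],    # calculate a new value
        "ordered_at": ordered_at.strftime("%Y-%m-%d %H:%M"),   # normalize to 24-hour DateTime
    }

print(transform({"full_name": "Ada Lovelace", "country_code": "DE",
                 "quantity": 3, "unit_price": 9.99,
                 "ordered_at": "07/04/2023 02:30 PM"}))
```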

Data filtering refers to limiting which data is forwarded to the next stage of a stream processing pipeline. You may want to filter out sensitive data that should be handled carefully or that has a limited audience. Filtering is also commonly used to enforce data quality and schema matching. Finally, filtering can be seen as a special case of routing a raw stream into multiple streams for further analysis.
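A minimal sketch of filtering as routing, assuming an illustrative event shape and stream names:

```python
# One raw stream is split into two downstream streams, with sensitive records
# routed to a restricted stream. The event shape and stream names are illustrative.
def route(event: dict) -> tuple[str, dict]:
    if "ssn" in event:                      # sensitive data goes to a restricted stream
        return "restricted_stream", event
    return "analytics_stream", event

raw_stream = [
    {"user": "u1", "action": "login"},
    {"user": "u2", "action": "kyc_check", "ssn": "***-**-1234"},
]
for event in raw_stream:
    topic, payload = route(event)
    print(topic, payload)
```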

In some cases, a stream may still need to be restructured using projection or flattening operations after it has been transformed into structured records. These kinds of transformations are most commonly used to transform records from one schema into another.
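A minimal sketch of a flattening and projection step, assuming an illustrative nested record and target schema:

```python
# Flatten a nested record and project only the columns the target schema expects.
def flatten_and_project(record: dict) -> dict:
    return {
        "order_id": record["id"],
        "customer_email": record["customer"]["email"],      # flatten the nested object
        "city": record["customer"]["address"]["city"],
        # projection: everything else in the source record is dropped
    }

nested = {"id": 7, "status": "shipped",
          "customer": {"email": "a@example.com",
                       "address": {"city": "Berlin", "zip": "10115"}}}
print(flatten_and_project(nested))
```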

Conclusion

Streaming ETL has emerged as the most efficient, effective method of real-time data integration when transformations are required, and supports critical business use cases by integrating with business intelligence products, AI, machine learning, and intelligent process automation (IPA) workflows.

Learn more about streaming ETL by downloading our whitepaper here.

4 Questions You Should Ask About Change Data Capture (CDC) Tools

Change Data Capture (CDC) describes a set of techniques to capture changes from data sources in real-time. CDC is widely considered the lowest-overhead and most effective way to capture changes asynchronously without any change to the source – a critical requirement for enabling real-time analytics and decision making.

Equalum’s approach to data ingestion is based on the timely identification, capture, and delivery of the changes originally made to enterprise data sources, allowing organizations to quickly acquire analytics data and make actionable business decisions that ultimately drive innovation and save time and money.

CDC is a well-known methodology, and there are several legacy tools that leverage these techniques. These solutions are a natural starting point for data teams looking to empower their business with real-time insights.

But CDC technologies vary widely in their ability to meet critical business needs. Without proper vetting, buyers may find that legacy CDC tools fall short of delivering on essential business use cases – and don’t scale in an ROI-positive way as an enterprise’s needs evolve beyond a specific replication use case to enterprise-wide adoption. Picking the right CDC vendor and the right CDC tool can save you time and money and help you avoid costly headaches.

How can you ensure that you’re getting the best possible change data capture solution that will meet your current needs, and is also scalable enough to meet your needs down the road? Here are four questions you should ask when choosing a CDC vendor:

1. What can your CDC solution do beyond just data replication?

Most legacy Change Data Capture (CDC) tools are built primarily for data replication (reproducing data “as is” from a source to a target). However, businesses often rely on in-flight transformation to make the data useful for analysis. Transformations – which include aggregation, filtering, joins across multiple data sources, and other modifications to the data – are essential for businesses looking to extract insights from raw data. While legacy CDC tools may offer limited transformation as an add-on for specific replication use cases, savvy buyers should look for robust, full-scale ETL capabilities for streaming data – which help enterprises harness the full power of data in motion to power better, faster decision-making.

The Equalum difference:

Equalum provides in-flight data transformation capabilities that go well beyond replication, including source data modification, data computations, and correlation with other data sources. Our platform uses a zero-coding, UI-based approach with limited to no overhead on data sources.

2. What data sources and targets does your CDC tool support?

CDC tools typically provide out-of-the-box integrations to major legacy and widely adopted enterprise databases, but may offer very limited support for new databases and non-database sources and targets. Data-driven enterprises often need to harness the data in applications, files, APIs, message queues, and more. Data from a broader range of sources means a more complete view of your customers and how your business is performing. Data teams should ensure that a solution for in-flight data supports multiple sources and delivery to all your relevant targets.

The Equalum difference:

Equalum offers seamless integration with legacy databases, newer database technologies, and non-database sources. Equalum’s sources include Oracle, SQL Server, Postgres, SAP, and Salesforce. We deliver to a wide range of targets including Snowflake, Azure, and other data lakes and warehouses.

3. How much coding and expertise does your CDC solution require?

Some leading CDC tools require extensive coding that is time-consuming and inconvenient, and offer no UI whatsoever. Other tools include a limited UI, but the overall UI/UX is antiquated and doesn’t offer drag-and-drop or other features that make it easy to build data pipelines and monitor your data flows. Consider avoiding CDC tools that create extra work for your overburdened data teams and require hours of coding to set up and maintain.

The Equalum difference:

Equalum’s CDC tool includes a drag-and-drop UI with no coding required. This allows engineers to deploy our tool in minutes instead of days or weeks, and frees up data teams to focus on other projects.

4. How much is this going to cost me?

The pricing model for legacy CDC tools – which typically involves a per-server or CPU usage-based fee – is a good option for small, isolated scenarios of data replication. With this model, costs can add up quickly when a business looks to leverage the technology across a wider range of applications. Data analysts charged with driving widespread usage of real-time insights should look for solutions that offer a license model designed to enable businesses to scale for enterprise-wide adoption without costs increasing dramatically.

Ultimately, solutions that leverage change data capture technology vary widely in their cost, level of technological sophistication, and their ability to address meaningful challenges for data engineers and company-wide data infrastructure. The most successful enterprises are those that are looking for a scalable, end-to-end CDC solution that will meet the needs of business stakeholders and harness the power of data in motion to empower faster and better decision-making.

Equalum secures $14 million Series C for data integration solution

Meir Orbach / Calcalist

Equalum, which provides data integration and ingestion solutions, announced on Thursday that it has raised $14 million in Series C financing. The investors in this round include Planven, United Ventures, Innovation Endeavours, Saints Capital, and the company’s newest partner, SpringTide Ventures. Total fundraising now stands at $39 million.

“We are a platform that continuously and natively supports all use cases under a single unified platform without the need for custom coding,” Equalum CEO Guy Eilon told Calcalist. “The company’s technology provides in one no-code platform the tools provided in three or even four other platforms. Since each part of our platform has many competitors, we work mainly with large organizations in order to simplify their data integration.”

Guy Eilon, CEO of Equalum

Equalum was founded in 2015 by Nir Livneh, who served as CEO until last year. “I was brought to the company in order to address its go-to-market strategy and in nine months the company has doubled its revenue and number of clients,” said Eilon.

Equalum Launches CDC Connect Partner Program

Equalum is making news! With the launch of our new OEM program, CDC Connect, technology partners can now integrate our industry-leading change data capture (CDC) tool into their platform or workflow and benefit from our cutting-edge CDC capabilities. “Many data integration vendors lack strong CDC,” said Kevin Petrie, an analyst with Eckerson Research, “so they might benefit by building Equalum CDC Connect into their offerings.” Read more about it at Tech Target.

Equalum CDC Connect embeds change data capture and streaming

By Sean Michael Kerner

Change data capture platform provider Equalum is looking to change the way that organizations use and integrate its technology into data platforms with the CDC Connect program it unveiled on Tuesday.

It has been a busy year for Equalum. In April the vendor released version 3.0 of its Continuous Data Integration Platform, which includes change data capture (CDC); extract, transform, load (ETL); and real-time data streaming capabilities. In August, Equalum raised $14 million to expand its go-to-market and technology efforts.

Until now, Equalum has sold its technology, which is available for on-premises and cloud deployments, as a standalone service that organizations and vendors could use as part of an existing data workflow for business operations or data analytics. The CDC Connect program will enable vendors to integrate Equalum inside an existing data or cloud platform as part of another vendor’s larger data workflow offering.

Equalum competes against a number of vendors that offer CDC capabilities, including Fivetran, which acquired CDC vendor HVR in 2021; Arcion, which launched a cloud service in April 2022; Oracle and its GoldenGate software platform; and the open source Debezium technology.

Until now, Equalum has offered high-performance integration of data in bulk and in real time, building on Spark and Apache Kafka capabilities as well as CDC.

Now Equalum wants to address a new opportunity, offering its core CDC technology as a white-label component to other data integration vendors, said Kevin Petrie, an analyst with Eckerson Research. The approach offers a way for Equalum to increase its addressable market in a crowded landscape for data integration tools, he said.

“CDC helps replicate high volumes of transactional data from traditional databases, at low latency and ideally with low impact on production workloads,” Petrie said. “Many data integration vendors lack strong CDC, so they might benefit by building Equalum CDC Connect into their offerings.”

The original idea behind CDC technology was to provide a way for organizations to get data out of one database and into another as it changes. It’s an approach that can be enabled today by a variety of methods.

One of the most common ways for CDC to work is by connecting to a database log to capture changes, which can then be replicated out to an external target.

Equalum’s platform uses an approach called binary log parsing, a technology that is able to rapidly read and understand all the changes happening in a database.

Equalum uses event streaming technologies including Apache Kafka to stream data from a source database to a target in real time. Going a step further, Equalum integrates data transformation capabilities so data can be converted into the required format and structure for the target database.
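As a rough sketch of that pattern (not Equalum’s actual implementation), the example below uses the kafka-python client to consume change events from an assumed CDC topic, transform them, and hand them to a stand-in target loader. The topic name, event structure, and load step are all assumptions made for the example.

```python
# Minimal sketch: consume change events from a Kafka topic, convert them to the
# target's format, and hand them to a loader. Topic name, event layout, and the
# load step are illustrative assumptions.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "orders.changes",                               # hypothetical CDC topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

def to_target_row(change: dict) -> dict:
    """Transform a change event into the structure the target table expects."""
    row = dict(change["after"])                     # assumed after-image layout
    row["amount"] = float(row["amount"])            # cast types for the target schema
    return row

def load_to_target(row: dict) -> None:
    """Stand-in for writing to the target warehouse or data lake."""
    print("loading:", row)

for message in consumer:
    change = message.value
    if change.get("op") in ("insert", "update"):    # assumed operation field
        load_to_target(to_target_row(change))
```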

Equalum CEO Guy Eilon said the vendor’s goal is to enable organizations to get data from one location to another in a format that makes the data immediately usable for business operations or data analytics.

Why Equalum wants to embed its technology with CDC Connect

The challenge Equalum’s customers often face is managing multiple data technologies in order to enable a data application for business operations, machine learning or data analytics, Eilon said.

With CDC Connect, Equalum’s technology can be embedded into a larger data platform, so users will never actually see the name Equalum.

Eilon declined to immediately disclose the large data platform vendors that he said plan to use CDC Connect.

“The biggest challenge in the data integration world right now is definitely how to move data in real time from legacy and cloud systems towards a single place so users can get answers in real time,” Eilon said. “That is what Equalum is all about.”
