Real-Time Real Talk: Streaming Data 101


“Real-Time Real Talk” is an ongoing blog series that seeks to clarify the “what”, “why”, and “how” behind ad-tech innovations.

Today’s “Real-Time Real Talk” blog gives you a 101 overview of streaming data. We’ll cover the basics of what streaming data is – and isn’t. We’ll also walk you through some ways that buyers and sellers put streaming data insights to good use. Finally, you’ll learn about some of the cool data-streaming facts and features on AppNexus’ own platform.


Let’s start with the obvious. What exactly is streaming data?

The short answer is that streaming data (sometimes referred to as “event-based data”) is data that gets recorded in real time. But the thing is, there’s no “gold standard” for real time. It’s like trying to pin down the difference between an “instant” and a “given moment.”

The best way to define streaming data might be to say what it isn’t. You can contrast streaming data with batch data processing, where businesses wait a certain amount of time (hours, days, or weeks) for data inputs to accumulate before compressing them all into one condensed set. Unlike batch data processing, streaming data has very little latency (unless you want to get super-technical and argue that yes, there’s latency even within a microsecond).

To understand more about the shift from batch to streaming data, check out our post on Medium.


Let’s take things back to ad tech. How does streaming data benefit buyers?

Streaming data helps buy-side marketers in a number of different ways. But let’s get down to what we think are four major benefits for buyers:

  • Event-based campaign decision-making: you’re able to see your advertising campaign happening in real time, which gives you the ability to react to market fluctuations moment-by-moment, rather than hour-by-hour. If your budget isn’t going to the right sites or targeting the right audience segments, you can shut off that wasted ad spend, and shift it to other platforms, devices and formats where things are looking profitable. By gathering continual, event-based data on your brand audiences, you eventually arrive at statistical correlations that you can then reapply into a “closed feedback loop.” Armed with this level of insight you can customize subsequent campaigns to reflect different user experiences – and win over more customers.
  • Real-time visualization: When data is continually refreshed each moment, there’s the opportunity for dynamic insight, something that marketers are increasingly depending on to make smarter, faster decisions. With more accurate analysis comes a better likelihood of reacting to things that are happening in real time. For instance, imagine you’re running a multi-channel campaign on a global scale: you’ll likely want to see geographically where your impressions are serving; which site domains or devices are seeing the most engagement; and perhaps most importantly, how audiences are interacting with your campaign and in what ways. You’re able to do this dynamically with streaming data, instead of parsing through data retroactively to develop static, periodic views of activity in batch processing.
  • Rich analysis of real-time datasets: Real-time visualizations also help over the long run in terms of strategic analysis and better overall decision-making. And unlike batch data, real-time visualizations usually require a much smaller dataset sample to arrive at these kinds of decisions. Instead of waiting for an entire batch dataset to process, you can monitor the “rolling average” of a chosen data point as it streams and draw conclusions faster.
  • Pricing benefit: If you have constant insight into your campaigns, you can make better pricing decisions that can keep your campaign budget on track. And if you’ve already gained the relevant information you need to get from a stream through the right data analysis, you can make decisions over the stream a lot quicker – and more cheaply – than batch data processing would ever allow.
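
To make the “rolling average” idea from the list above a little more concrete, here’s a minimal Python sketch. It keeps only the most recent few values from a stream and reports an updated average after every new event, rather than waiting for a full batch. The function name, the window size, and the sample prices are all made up for illustration; this is not how any particular platform computes its metrics.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values after each new event."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()  # holds at most `window` values
    total = 0.0       # running sum of the values currently in `recent`
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Drop the oldest value so the window stays a fixed size.
            total -= recent.popleft()
        yield total / len(recent)


# Hypothetical stream of prices, one per event (illustrative numbers only).
prices = [2.0, 4.0, 6.0, 8.0]
averages = list(rolling_average(prices, window=2))
print(averages)  # [2.0, 3.0, 5.0, 7.0]
```

Because the function is a generator, it works just as well on an endless feed as on a short list: each new event produces a fresh average without re-reading anything that came before.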


Sounds like the buyers are sold. How about sellers? How can streaming data help them?

Sellers stand to gain in a vast number of ways from streaming data; here are a few examples you might find interesting:

  • Invalid traffic detection and viewability: streaming data gives sellers the ability to see whether actual people are viewing their inventory… or not. Sellers who like to keep a close eye on their inventory can now act quickly against any event that looks like a cause for concern.
  • Real-time visibility into inventory: basically, sellers can gain clarity into how much their inventory is worth to buyers during any given instant. Over time, this level of visibility lets them identify peaks and troughs that can help them adjust their pricing. By establishing correlations between minute data points, sellers can begin to anticipate when the right moments are to increase their prices to respond to demand, and when the right moments are to make their inventory more affordable to attract more buyers.
  • Long-term relationships with particular buyers: by watching the spending patterns of buyers over the long run, sellers can more easily identify particular buy-side partners they’d like to do business with in the future. Sellers can reach out to buyers in real time and propose private marketplace deals where both parties stand to gain the best of all possible brand exposure (for buyers) and inventory yield (for sellers).


Impressive stuff. So what can clients do to get the most out of streaming data?

First of all, clients and partners might need to change their mindset about how data gets processed. Unlike back in the latency-ridden days of batch processing, clients won’t have to be retroactive in how they respond to event-based data occurrences. They’ll now have the option to proactively analyze the data as it’s streaming by, and adjust the stream so that it can run analyses, reports, and visualizations from the get-go.

Clients also should take time to look for data scientists with engineering backgrounds. Candidates with engineering credentials can understand how to work with the stream tools and apply data science to them. From a tech perspective, good data science candidates ought to be knowledgeable in major processing technologies like Spark Streaming, Samza, Storm, Apache Flink, Google Dataflow, and Kafka.

To get the technical 101 behind streaming data, check out our recent tech talk on “Real-Time Big Data”.

From a data science standpoint, the ideal sorts of candidates need to be proactive in their analysis of real-time data. They should prepare themselves in advance to know what sorts of trends and correlations to be looking for as the data rolls in. After all, this isn’t batch processing we’re talking about; streaming data never sleeps. There’s no time to lose in analyzing event-based data points as they accumulate into actionable correlations. That being said, batch data reports are still useful in many ways for specific business needs, and data scientists should be versed in understanding how to convert streaming data into batch data for historical reference and long-term “data mining.”
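
As a rough illustration of what converting streaming data into batch data can look like, the Python sketch below groups individual events into hourly buckets that could later be stored for historical reference. The event shape (a Unix timestamp paired with an event type) and the function name are assumptions made for this example, not a description of any real feed format.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def hourly_batches(events):
    """Group (unix_timestamp, event_type) pairs into per-hour counts.

    Each key of the result is the start of an hour in epoch seconds.
    """
    batches = defaultdict(lambda: defaultdict(int))
    for timestamp, event_type in events:
        # Round the timestamp down to the start of its hour.
        hour_start = (timestamp // SECONDS_PER_HOUR) * SECONDS_PER_HOUR
        batches[hour_start][event_type] += 1
    # Convert nested defaultdicts to plain dicts for storage or printing.
    return {hour: dict(counts) for hour, counts in batches.items()}


# Hypothetical events (illustrative values only).
sample_events = [(0, "impression"), (1800, "click"), (3600, "impression")]
batched = hourly_batches(sample_events)
print(batched)  # {0: {'impression': 1, 'click': 1}, 3600: {'impression': 1}}
```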


Sounds like good advice for clients. Speaking of which, why should clients put AppNexus’ streaming data services to good use?

Well, first off, we’re one of the few companies in ad tech, if not the only one, that can seamlessly offer streaming data services to our clients and partners. Additionally, our stream is jam-packed with data points that cover impressions, clicks, and conversions.

But beyond these obvious (and significant) benefits, there are two major selling points we can also provide:

1) Speed: Here’s an important point any client should probably know about our platform: we stream data to you incredibly swiftly. That’s because 70% of our platform’s served impressions show up in your real-time data feed in less than 10 seconds, while 90% arrive in less than 30 seconds. By keeping a running tally of the data and analysis you care about, you can check in and see how things are going instantaneously, rather than having to wait for some large batch job to be finished. This speed allows you to process not just data – but also insights – much more quickly.

2) Expense: Unlike batch processing, stream processing doesn’t require you to gather all the raw data and run algorithms over it. Whereas in batch processing you have to compress all your information into separate files and sums and crunch a lot of unnecessary numbers before arriving at a set of data points, streaming data saves you resources since you don’t have to aggregate and write large amounts of data to disk, and can keep running tallies of the metrics you care about in memory. And since you’re able to calculate these sums as soon as they appear in the stream, you also save money, since you don’t encounter the disk round-trip normally present in batch-based processing.
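
Here’s a minimal Python sketch of the “running tally” idea: a small in-memory counter that updates as each event arrives, so a metric like click-through rate is available at any moment without writing raw events to disk first. The class name and the event format (a dictionary with a "type" field) are hypothetical and chosen only for illustration; a real feed will have its own schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RunningTally:
    """Keeps per-event-type counts in memory and updates them one event at a time."""

    counts: Counter = field(default_factory=Counter)

    def update(self, event: dict) -> None:
        # Assumes each event carries a "type" such as "impression" or "click".
        self.counts[event["type"]] += 1

    def click_through_rate(self) -> float:
        impressions = self.counts["impression"]
        # Avoid dividing by zero before any impressions have arrived.
        return self.counts["click"] / impressions if impressions else 0.0


# Hypothetical events (illustrative only).
events = [
    {"type": "impression"},
    {"type": "impression"},
    {"type": "click"},
    {"type": "impression"},
    {"type": "impression"},
    {"type": "conversion"},
]

tally = RunningTally()
for event in events:
    tally.update(event)

print(tally.click_through_rate())  # 0.25
```

The tally only ever stores a handful of numbers, no matter how many events pass through it, which is where the memory and disk savings described above come from.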

To learn more about AppNexus’ streaming data solutions, drop us a line or contact your AppNexus representative.


Filed under Real-Time Real Talk.
