At its core, the story of online ad fraud is a cat-and-mouse game between bad actors and those of us who work to thwart them. Every time brands, publishers, and vendors implement a new preventative measure, fraudsters develop a still more sophisticated method of stealing ad budgets intended for legitimate human audiences.
Though fraudsters use a variety of tactics to carry out their theft, the common denominator is that they profit by selling phony web traffic under false pretenses. In many cases, these schemes involve charging brands and/or publishers to access audiences that don’t actually exist. While the methods are always changing, popular ones today include:
- Browser or device hijacking programs: malware that takes over a user’s browser and navigates to specified websites without the user’s knowledge.
- Bot networks: server-based browsers that masquerade as human users. In recent years, this tactic for generating non-human traffic has become more widely used than the browser-hijacking malware described above.
- Ad-stuffing: a tactic in which the publisher fills a web page with ads that are invisible to the user.
While it’s impossible to know exactly how much money the bad guys have made off with, the Interactive Advertising Bureau (IAB) estimates that ad fraud cost our industry $8.2 billion in 2015. Meanwhile, advertising verification firm Adloox predicts that this number could be as high as $16.4 billion in 2017. These numbers are inherently imprecise, as there’s no single indicator that guarantees an impression is fraudulent — but these estimates make it clear that this is a huge problem for our industry.
Beyond the sheer magnitude of this theft, fraud presents an existential threat to the digital ecosystem because it undermines brands’ trust in the programmatic marketplace. According to a recent report from the Chief Marketing Officer Council and Dow Jones, 72% of programmatic advertisers are concerned about brand safety and control in the programmatic marketplace.
How a complex supply chain gives cover to fraudsters
With so much on the line, you may be wondering why no one has developed a solution to eliminate ad fraud and end the cat-and-mouse game once and for all. The answer lies in the complexity of the programmatic supply chain, a multi-faceted ecosystem that Procter & Gamble CMO Marc Pritchard famously described as “murky at best and fraudulent at worst.”
In every programmatic transaction, brands are separated from the end user by a multiplicity of agencies, technology vendors, and ad networks, all of whom provide vital information about the impression the brand is purchasing. With so many hops along the chain, it can be extremely difficult for anyone involved in the transaction to ensure that the rest of their business partners are acting ethically. All it takes is one deceitful actor to perpetrate a multimillion-dollar heist.
Just recently, Buzzfeed uncovered one such scheme, in which a digital media agency used device-hijacking software to drive millions of fake visitors to a network of low-quality websites. Along the way, the agency managed to steal from the ad budgets of Disney, Gillette, and over 100 other brands.
What AppNexus is doing to clean up the marketplace
Despite the obstacles in our path, AppNexus is deeply committed to rooting out ad fraud and building a more trustworthy digital marketplace.
Over the past three years, we’ve invested a substantial amount of time, money, and resources to stay one step ahead of the bad actors who threaten our clients. Today, we have 30 fraud detectors running at all times. By collecting all of the data associated with each impression, from the time the user opens a page to when the ad is served, we’re able to see when something is amiss. Plus, we go the extra mile to better understand how fraud works by analyzing the malware these bad actors use in a safe, “sandboxed” research environment. This allows us to see how the malware behaves in its attempts to mimic human behavior and to track which sites it visits. From that, we can infer that those sites are most likely buying non-human traffic.
This is just the beginning. As the world’s leading independent ad tech platform, we see hundreds of billions of impressions every day, each of which is logged and analyzed by our anti-fraud detectors. In order to stamp out the next generation of bad actors, we decided to use our unique vantage point to study where fraud is happening today and which new tactics are being used to steal advertiser budgets.
Establishing these instances and assigning blame isn’t easy, since there’s no single, definitive indicator that an impression is fraudulent. But with the latest machine learning techniques, our data scientists can find trends and patterns that point to a high likelihood of fraudulent activity. Some of those techniques include:
- Cluster analysis: a foundational data analysis technique that involves grouping data points together based on key similarities. Cluster analysis helps us find suspicious commonalities between impressions generated by shady traffic sources.
- Covisitation: a method of identifying overlaps in traffic between different websites, which helps us establish links between sites getting questionable traffic from the same sources. This study from researchers at NYU and Dstillery (formerly Media6Degrees) offers a more in-depth explanation of how covisitation can uncover instances of ad fraud.
- Honeypots: a method by which we draw bots into the open by creating a fake site and then sending traffic to it from suspicious vendors. We can then analyze that traffic for signs of non-human visitors.
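To make the covisitation idea a little more concrete, here is a minimal Python sketch. It is an illustration only, not our production detectors or the method from the study mentioned above. The log format (pairs of visitor ID and site), the function names, the overlap score, and the 0.8 threshold are all assumptions chosen for the example. The idea is simply that two unrelated sites sharing almost all of their visitors is unusual for human audiences and worth a closer look.

```python
from collections import defaultdict
from itertools import combinations


def covisitation_scores(visits, min_visitors=2):
    """Score every pair of sites by how much their audiences overlap.

    visits: iterable of (visitor_id, site) pairs. This log format is a
    hypothetical simplification for the sketch.
    Returns {(site_a, site_b): score}, where score is the number of shared
    visitors divided by the audience size of the smaller site.
    """
    visitors_by_site = defaultdict(set)
    for visitor_id, site in visits:
        visitors_by_site[site].add(visitor_id)

    scores = {}
    for a, b in combinations(sorted(visitors_by_site), 2):
        audience_a, audience_b = visitors_by_site[a], visitors_by_site[b]
        smaller = min(len(audience_a), len(audience_b))
        if smaller < min_visitors:
            continue  # too little data to say anything about this pair
        shared = len(audience_a & audience_b)
        if shared:
            scores[(a, b)] = shared / smaller
    return scores


def suspicious_pairs(visits, threshold=0.8):
    """Return site pairs whose overlap score meets an (arbitrary) threshold."""
    scores = covisitation_scores(visits)
    return sorted(pair for pair, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Made-up sample data: three visitors hit both "shady" sites.
    sample = [
        ("u1", "news.example"), ("u2", "news.example"), ("u3", "news.example"),
        ("b1", "shady-a.example"), ("b2", "shady-a.example"), ("b3", "shady-a.example"),
        ("b1", "shady-b.example"), ("b2", "shady-b.example"), ("b3", "shady-b.example"),
        ("u1", "shady-a.example"),
    ]
    print(suspicious_pairs(sample))
```

A high score on its own proves nothing, since sister sites and popular portals legitimately share audiences. In keeping with the point above that no single indicator is definitive, a score like this would be one signal among many.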
Using these methods, our data science team has found that fraudsters are using increasingly sophisticated tactics to fool fraud detection mechanisms. Perhaps most interestingly, they’ve also uncovered a meaningful overlap between viral content — including the subcategories of fake news and hate speech — and fraudulent activity.
This post is an excerpt from our latest report on inventory quality. Download the whole thing in just two clicks to learn more about ad fraud, how we’re fighting it, and the connections we’ve observed between non-human traffic and viral content.