Multi-Layered Machine Learning: A New Requirement for Sophisticated Bot Protection

Why DataDome’s Use of Machine Learning Is Unique

As AI has entered the mainstream, companies have been quick to tout AI claims, making it increasingly difficult for security professionals to determine which solutions actually use AI and which merely invoke the term to sound more sophisticated than they are.

There are a few key things that set DataDome apart when it comes to our use of AI and machine learning. DataDome’s system is designed to inspect every request, every time, for all endpoints. Our real-time ML detection models process a staggering 5 trillion signals per day, and what they learn is applied instantly across all protected endpoints. DataDome’s detection engine takes it a step further by employing a two-pronged approach: it uses the fingerprints of HTTP traffic to gather server-side signals, while also leveraging browser and device metrics for client-side behavioral signals.

Our use of machine learning, rather than reliance on manual rule creation, allows us to adapt and respond to emerging threats at machine speed. The platform’s performance and protection capabilities are enhanced by its operation at the edge, utilizing a network of 26 points of presence (PoPs). This setup enables DataDome to act on available threat data in real time. In addition, DataDome is an automated solution, freeing security professionals to focus on other initiatives.

Further reinforcing its detection capabilities, DataDome’s Threat Research team continuously identifies and analyzes bot patterns, ensuring that its detection capabilities evolve in tandem with new threats as they emerge.

Why is having a robust threat research team vital for successful bot management? Because AI cannot be better than the information it is fed. By continuously uncovering the latest attack trends and bot patterns, our researchers empower our AI with up-to-date knowledge. This ensures that our ML models evolve and strengthen, keeping pace with advancing bot threats and providing robust, tailored protection for our customers.

All this and we haven’t even gotten to one of the most important aspects of our detection engine—our multi-layered machine learning approach.

The Core Competency of DataDome: Multi-Layered Machine Learning

A sophisticated bot attack calls for an equally sophisticated security system. Our AI-powered detection engine uses multiple layers of ML models working in tandem to deliver highly accurate detection, deciding whether a request is malicious in under 2 milliseconds.

Why is a multi-layered approach so important? First, because some types of bots are sophisticated enough to require multiple layers to identify them. Second, because DataDome serves many specific use cases and customer needs: one layer may be most relevant for detecting ad fraud, another for fake account creation, and each layer may matter more to one customer than another. For example, not every company is dealing with hordes of scrapers from generative AI tools like ChatGPT gathering their data, but some customers need robust protection against them.

DataDome analyzes different types of signals, aggregated at different granularities (request, session, IP, fingerprint) and time windows, to detect more bots than other solutions. Our detection engine employs a combination of ML techniques, such as behavioral analysis, supervised learning, genetic algorithms, time series analysis, and anomaly detection. It also takes into account verified bots and custom rules. Our ML is also built for real-time inference and explainability, and it is designed to be high-performance, accurate, and scalable.
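To make the idea of aggregating the same traffic at several granularities concrete, here is a minimal sketch. The field names, sample requests, and counting logic are illustrative assumptions, not DataDome's actual pipeline:

```python
from collections import defaultdict

# Hypothetical sketch: count requests at three granularities at once.
# A pattern invisible at one level (each session looks benign) can stand
# out at another (one fingerprint reused across sessions).
requests = [
    {"ip": "203.0.113.7", "session": "s1", "fingerprint": "fp-a"},
    {"ip": "203.0.113.7", "session": "s2", "fingerprint": "fp-a"},
    {"ip": "198.51.100.9", "session": "s3", "fingerprint": "fp-b"},
]

counters = {g: defaultdict(int) for g in ("ip", "session", "fingerprint")}
for req in requests:
    for granularity, counter in counters.items():
        counter[req[granularity]] += 1

# fp-a appears twice even though each individual session appears only once.
```

In a real engine these aggregations would also be windowed over time, but the principle is the same: the same signals, viewed per request, per session, per IP, and per fingerprint, reveal different attack shapes.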

Below we explain why each layer of our multi-layered detection engine plays a critical role in stopping fraudulent traffic.

Every Layer of Detection Matters

Verified Bots & Custom Rules

DataDome scans for verified “good” bots and adheres to custom rules you’ve created for your instance. This allows for flexibility and ensures good bots aren’t blocked. Verified bots include search engine crawlers—like Googlebot and Bingbot—that help ensure a web page is indexed for search engines.
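Verifying a "good" bot typically means more than trusting the User-Agent header. For Googlebot, Google documents a reverse-then-forward DNS check. Below is a sketch of that public method (not DataDome's implementation); the pure hostname check is separated out so it can be tested without network access:

```python
import socket

# Pure check: Google crawler hostnames end in these domains.
def hostname_is_google(host: str) -> bool:
    return host.endswith((".googlebot.com", ".google.com"))

# Sketch of the reverse/forward DNS round-trip Google documents for
# verifying its crawlers. Results depend on live DNS in practice.
def is_verified_googlebot(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname_is_google(host):
            return False
        forward_ips = socket.gethostbyname_ex(host)[2]   # forward lookup
        return ip in forward_ips                         # must round-trip
    except OSError:
        return False
```

Anyone can claim to be Googlebot in a header; the round-trip check ensures the IP actually resolves back into Google's crawler infrastructure.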

Signature-Based Detection

Known bot signatures are cataloged, and all incoming traffic is scanned against these known threats, instantly blocking any that match. While bot signatures are constantly evolving, DataDome keeps an up-to-date repository to ensure known bots are blocked from the first request on any endpoint we protect.
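As a simplified illustration of signature matching (not DataDome's actual fingerprinting, which draws on much richer server-side signals such as TLS parameters), a stable hash over the ordered header names of a request can stand in for an HTTP signature:

```python
import hashlib

# Hypothetical sketch: hash the ordered header names of a request into a
# short signature. Header order itself is a meaningful signal, since many
# bot frameworks emit headers in a fixed, unusual order.
def http_signature(header_names):
    canonical = ",".join(name.lower() for name in header_names)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Illustrative catalog of previously identified bot signatures.
KNOWN_BOT_SIGNATURES = {http_signature(["Host", "User-Agent", "Accept"])}

def is_known_bot(header_names) -> bool:
    return http_signature(header_names) in KNOWN_BOT_SIGNATURES
```

A lookup against a precomputed signature set is a constant-time operation, which is why signature-based detection can block a known bot on its very first request.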

Behavioral Analysis

DataDome’s behavioral analysis delves deep into user interaction patterns, differentiating between genuine users and bots. DataDome monitors two types of behavioral patterns: interactions between the end user and the device (mouse movements, touchpoints, keystrokes, scrolling, etc.), and interactions with the site or application (navigation patterns). Both work together to get a sense of the user’s intentions and whether they are malicious.
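One illustrative (and intentionally simple) behavioral feature is the linearity of a mouse trace: human movement tends to curve, while naively scripted movement is often a perfect straight line. The feature choice and data below are assumptions for illustration, not DataDome's model:

```python
import math

# Ratio of straight-line distance to total path length for a mouse trace.
# Close to 1.0 means a near-perfect line (suspiciously robotic);
# human traces meander and score lower.
def path_linearity(points):
    if len(points) < 2:
        return 0.0
    def dist(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])
    total = sum(dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    direct = dist(points[0], points[-1])
    return direct / total if total else 0.0

scripted = [(0, 0), (50, 50), (100, 100)]          # perfectly linear
human = [(0, 0), (30, 70), (55, 40), (100, 100)]   # meandering
```

A production engine would combine many such features (timing, acceleration, scroll cadence) rather than rely on any single one.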

Supervised Learning

Complementing behavioral analysis, supervised ML models use labeled data to recognize and adapt to known (and unknown) bot patterns and their variants. The supervised models are generally applied to fingerprints and the context of a request rather than looking at user behavior.
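To show the shape of supervised classification over fingerprint-style features, here is a toy nearest-centroid classifier. The features (header count, plugin count), labels, and data are invented for illustration; real models are far more sophisticated:

```python
# Toy nearest-centroid classifier: learn one centroid per label from
# labeled fingerprint feature vectors, then assign new requests to the
# closest centroid. Features and data are illustrative assumptions.
def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(labeled):
    return {label: centroid(rows) for label, rows in labeled.items()}

def predict(model, x):
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: sqdist(model[label], x))

labeled = {
    "human": [[12, 5], [11, 4], [13, 6]],   # [header_count, plugin_count]
    "bot":   [[4, 0], [5, 0], [3, 1]],
}
model = train(labeled)
```

The key property supervised learning adds is generalization: a request whose fingerprint is a slight variant of a known bot still lands near the bot centroid, even though its exact signature has never been seen.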

DataDome collects these signals in such a way that we cannot track the actual activity of the user, nor do we capture any personal information. Privacy is a key value of DataDome.

Time Series Analysis

Time series analysis provides insights into traffic patterns over time, which is crucial for spotting new bot signatures. Once new signatures are identified, they can be fed into signature-based detection rather than relying on behavioral analysis alone.
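A minimal sketch of what time series analysis can surface: flag any time bucket whose request count deviates from the rolling mean by more than a few standard deviations. The window size, threshold, and traffic numbers are assumptions for illustration:

```python
import statistics

# Flag indices whose count exceeds the rolling mean of the previous
# `window` buckets by more than `k` standard deviations.
def spikes(counts, window=5, k=3.0):
    flagged = []
    for i in range(window, len(counts)):
        hist = counts[i - window:i]
        mean = statistics.mean(hist)
        sd = statistics.pstdev(hist) or 1.0   # avoid division by zero
        if (counts[i] - mean) / sd > k:
            flagged.append(i)
    return flagged

# Steady traffic around ~100 requests/bucket, then a sudden burst.
traffic = [100, 98, 103, 101, 99, 102, 100, 450, 101]
```

Once a spike like this is investigated and a signature extracted, subsequent traffic matching it can be blocked at the signature layer from the first request.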

Anomaly Detection

Anomaly detection is key in identifying unusual behaviors that deviate from established patterns, a vital tool for detecting malicious bot traffic. DataDome’s behavioral engine leverages Flink to analyze user activity in real time. The behavioral engine aggregates and analyzes traffic per IP, session, and fingerprint, which enables the engine to detect anomalous behavior at different levels, even if the attacker adapts its behavior.

To catch heavily distributed bots we also apply outlier detection at the entire website traffic level. This enables us to understand when the overall distribution of the traffic has changed and that something abnormal is happening. Once this is detected, we can trigger more specific ML models to understand which subset of the traffic is malicious. This approach is particularly useful against heavily distributed attacks, including credential stuffing attacks. To learn more, watch our presentation about this topic at Black Hat Asia 2023.
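One simple way to notice that the overall shape of traffic has changed (not DataDome's actual method, just a sketch) is to compare the current traffic distribution against a baseline with total variation distance. The country shares and the 0.2 threshold below are illustrative assumptions:

```python
# Total variation distance between two traffic distributions, e.g. the
# share of requests per country. A large value means the overall mix of
# traffic has shifted, even if no single IP or session looks abnormal.
def total_variation(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

baseline = {"US": 0.55, "DE": 0.25, "FR": 0.20}
current  = {"US": 0.30, "DE": 0.15, "FR": 0.10, "SG": 0.45}  # new source

shifted = total_variation(baseline, current) > 0.2
```

A site-wide shift like this is exactly the trigger described above: it does not say which requests are malicious, but it tells the engine to run more specific models to isolate the offending subset of traffic.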

Real-Time Inference & Explainability

Real-time inference ensures that threats are identified and dealt with instantaneously, crucial for maintaining uninterrupted online operations. Moreover, the explainability aspect of DataDome’s ML models provides clear insights into why certain traffic is flagged as malicious, aiding in transparency and the continuous improvement of defense strategies.

DataDome is Your Shield Against Bots and Online Fraud

DataDome’s integration of diverse ML, enriched with real-time processing, explainability, and advanced anomaly detection, places it at the forefront of bot detection and online fraud prevention. Its ability to learn, adapt, and accurately predict bot behavior offers businesses a shield against current threats and future challenges. Customers can rest assured knowing that as more threats and use cases appear, we will develop additional ML models to continue to lead the way in detection.

Want a look at what bots might be attacking your business? Our BotTester tool can identify basic bots using common attack vectors your business may be vulnerable to. To identify more advanced bots, try DataDome for free or book a live demo today.
