Performance Evaluation of Machine Learning Algorithms for Cyber Threat Analysis on an SDN Dataset

By Parthavi Parmar, eInfochips

Abstract— Cyber threat intelligence (CTI) is a component of a cybersecurity platform used to collect and analyse information about current and potential attacks that threaten the security of an organization or its assets. Cyber threat analysis is a part of CTI: it evaluates the online activities and capabilities of unknown entities such as intelligence agencies or criminals. Distributed Denial of Service (DDoS) is a type of cybercrime that renders an internet service, network service, or host machine unavailable to its intended users. The target of a DDoS attack is flooded with thousands or millions of superfluous requests, which overwhelm the machine and its supporting systems. DDoS attacks differ from standard Denial of Service (DoS) attacks because they originate from distributed or multiple sources or IP addresses. To make the target unavailable, attackers often saturate network bandwidth and exhaust system resources, thereby denying access to legitimate users. DDoS attacks are most common at the network, transport, presentation, and application layers of the seven-layer OSI model. Identifying DDoS attacks is a classification problem in machine learning. Detecting DDoS attacks with machine learning consists of two steps: feature extraction and model detection. In the feature extraction stage, DDoS attack traffic characteristics are extracted by comparing data packets that have been classified according to rules. In the model detection stage, the extracted features are used as input features for machine learning, and a random forest algorithm is used to train the attack detection model. The test results show that the proposed machine-learning-based method performs well at detecting currently widespread DDoS attacks.

Keywords— Cyber threat intelligence (CTI), Cyber threat analysis, Distributed Denial of Service (DDoS), Machine learning, SDN

I. INTRODUCTION

Cyber threat intelligence (CTI) is used to share information about cyberattacks so that organizations can better understand threats and protect their systems and networks. Cyber threat intelligence is what cyber threat information becomes once it has been collected, evaluated in the context of its source and reliability, and analyzed through rigorous and structured tradecraft techniques by people with substantive expertise and access to all-source information. Like all intelligence, cyber threat intelligence adds value to cyber threat information, helping the consumer identify threats and opportunities while reducing the consumer’s uncertainty. Producing accurate, timely, and relevant intelligence requires analysts to recognize similarities and differences in vast amounts of data and to detect deception.

Intelligence is developed through a circular process known as the intelligence cycle rather than through an end-to-end process. In this cycle, requirements are stated; data collection is planned, implemented, and evaluated; the results are processed to produce intelligence; and the resulting intelligence is disseminated and re-evaluated in light of the most recent data and user feedback. The cycle’s analysis phase is what distinguishes intelligence from the collection and dissemination of information. Intelligence analysis depends on a rigorous way of thinking and uses structured analytical techniques to ensure that biases, mindsets, and uncertainties are identified and managed. Instead of just reaching conclusions about difficult problems, intelligence analysts consider how they reach those conclusions. This extra step ensures that, to the greatest extent feasible, analysts’ mindsets and biases are accounted for and, where necessary, minimized or incorporated. A cyber threat intelligence platform is an ecosystem that supports decision-making through the gathering, analysis, dissemination, and monitoring of cyber threat information.

Cyber threat analysis is the process of evaluating the cyber activities and capabilities of unknown intelligence entities or criminals. Hackers target businesses, governments, institutions, or individuals that hold sensitive information. Threats posed by cyberattacks include denial of service (DoS), computer viruses, malware, phishing emails, and more. The attacks can be aimed at anyone who is online. Cyberattacks can cause power outages, breaches of security systems, malfunction of military equipment, disruption of computer and telephone networks, loss of confidential data, and potentially harm to people’s health.

Cyber threats are increasing daily, and advances in artificial intelligence and intelligent systems give attackers capabilities that can challenge even well-secured systems. For these reasons, organizational leaders must carry out a thorough and detailed cyber threat analysis to determine how exposed their business is to cyberattacks.

The primary purpose of cyber threat analysis is to produce findings that help initiate or support counterintelligence investigations, after which steps are taken to eliminate the threat to a particular organization, business, or government program. In cyber threat analysis, knowledge of the internal and external information vulnerabilities relevant to a specific organization is matched against real-world cyberattacks. This threat-oriented approach marks a desirable transition from a reactive security posture to a proactive one. A threat assessment should establish how protective controls can be used to promote integrity, availability, and confidentiality without compromising functionality and usability.

Distributed denial-of-service (DDoS) attacks are a subset of denial-of-service (DoS) attacks. A DDoS attack uses many Internet-connected devices, collectively called a botnet, to take down targeted websites by flooding them with fake traffic. Unlike other cyberattacks, DDoS attacks don’t try to penetrate your security perimeter. Instead, they seek to deny legitimate users access to your website and servers. DDoS attacks can also be used to take down security appliances or as a cover for other malicious operations. A successful denial-of-service attack affects the entire online user base. This makes it a popular tool for hackers, cybercriminals, and anyone looking to make a statement or fight for a cause. DDoS attacks can come as a single burst or be repeated, but in either case the effects on a website or company may linger for days, weeks, or even months while the victim struggles to recover. This makes DDoS highly damaging to any online organization. Among other things, DDoS attacks can result in lost sales, erode customer confidence, compel companies to pay compensation, and damage a company’s reputation over the long term.

II. PROBLEM STATEMENT

A distributed denial of service (DDoS) attack is a malicious attempt to disrupt the normal traffic of a targeted server, service, or network by overwhelming the target or its surrounding infrastructure with a flood of internet traffic. A DDoS attack achieves its effectiveness by using multiple compromised computer systems as sources of attack traffic. The exploited machines can include computers and other networked resources such as IoT devices. DDoS attacks are currently one of the most common network attacks. With the rapid development of computer and communication technology, the damage done by DDoS attacks is getting worse. Therefore, research on DDoS attack detection is becoming more and more important. However, due to the diversity of DDoS attack methods and the varying volume of attack traffic, no detection method with satisfactory accuracy is available yet. Identifying DDoS attacks is a classification problem in machine learning, and it is a major challenge because of the computational complexity involved. A Denial of Service (DoS) attack is an intentional attack from a single source, in which the attacker intends to make an application unavailable to its intended users. To achieve this, attackers often saturate network bandwidth and exhaust system resources, thereby blocking access for legitimate users. Unlike in DoS attacks, in DDoS attacks the attacker uses multiple sources to launch the attack. DDoS attacks are most common at the network, transport, presentation, and application layers of the seven-layer OSI model. While machine learning and artificial intelligence have proven very useful for major advances in various fields, they also have their downsides. Malicious actors can use this technology to carry out cyberattacks and threaten businesses, for example by identifying high-value targets within a large dataset.
The aim of this study is to examine DDoS attack detection models built with machine learning algorithms and to perform cyber threat analysis on DDoS attacks.

III. IMPLEMENTATION METHODOLOGY

A. Details of algorithms used

We apply machine learning techniques to detect DDoS flooding attacks. A detailed comparison is performed and evaluated on criteria such as accuracy and precision for the following algorithms: K-Nearest Neighbor, Decision Tree, Stochastic Gradient Descent (SGD), Logistic Regression, Naive Bayes, Deep Neural Network (DNN), Support Vector Machine, Multi-layer Perceptron, XGBoost, and Quadratic Discriminant Analysis.

The Naive Bayes classifier is a straightforward machine learning model that computes the probability of each class in the dataset and uses these probabilities to predict the class of new data. Bayes’ theorem lets us calculate the probability that A (the hypothesis) is true given that B (the evidence) has been observed. The classifier assumes that the predictors are independent: the presence of one feature does not affect any other.
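To make the independence assumption concrete, the following is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. It is not the implementation used in this study. The Gaussian likelihood, the function names, and the toy feature values (packets per second and mean packet size) are assumptions chosen for illustration and are not taken from the SDN dataset.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # Independence assumption: per-feature log likelihoods are summed.
        score = math.log(prior)
        for value, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy samples: [packets per second, mean packet size]
X = [[10, 500], [12, 520], [11, 480], [900, 60], [950, 64], [1000, 58]]
y = [0, 0, 0, 1, 1, 1]   # 0 = benign, 1 = malicious
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [980, 61]))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.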

A decision tree is a non-parametric supervised learning method. It is mainly used for regression and classification problems. The objective is to build a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be regarded as a piecewise constant approximation. Classification problems are handled with the DecisionTreeClassifier class, which is capable of multi-class classification on a dataset. The classifier takes two arrays as input: an array X, sparse or dense, of shape (n samples, n features) that holds the training samples, and an array Y of integer values, of shape (n samples), that holds the class labels for the training samples. Once the model is fitted, it is used to predict the class of samples. When several classes have exactly the same highest probability, the classifier predicts the class with the lowest index among them. Instead of outputting a specific class, the probability of each class can be computed as the fraction of training samples of that class in a leaf. The classifier supports both binary and multiclass classification.
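The "simple decision rules" a tree learns are threshold tests on individual features. As a hedged illustration, the sketch below searches one feature for the threshold that minimizes the weighted Gini impurity, a commonly used split criterion. The helper names and the toy packet-rate values are hypothetical, and a real tree would repeat this search recursively over all features.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical packet rates with benign (0) and malicious (1) labels
packet_rate = [10, 12, 15, 800, 900, 950]
label = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(packet_rate, label)
print(threshold, impurity)
```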

The deep neural network (DNN) is one of the most well-known current models; it can be viewed as a stack of neural networks, or a network with several layers. DNNs have been applied successfully to regression, classification, and time-series prediction problems. The structure consists of at least three layers of nodes, namely an input layer, a hidden layer, and an output layer, all connected; data flows in a single direction from the input nodes to the output nodes. A feed-forward DNN uses backpropagation as the training algorithm and an activation function (usually sigmoid) for classification. We train a deep neural network to distinguish normal traffic from DDoS attack traffic, using a carefully selected set of network statistics as the input signal.
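The one-directional data flow described above can be sketched as a forward pass through fully connected layers with sigmoid activations. This is only an illustration: the weights below are hand-picked, hypothetical values rather than trained parameters, the layer sizes are arbitrary, and backpropagation is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(W x + b) for each output neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Propagate x through the layers in one direction, input to output."""
    activation = x
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output unit
layers = [
    ([[4.0, -2.0], [-3.0, 1.0]], [0.0, 0.5]),   # hidden layer
    ([[5.0, -5.0]], [0.0]),                     # output layer
]
score = forward([1.0, 0.2], layers)[0]
print("malicious" if score > 0.5 else "benign", round(score, 3))
```

The output of the final sigmoid can be read as a score between 0 and 1, which is thresholded at 0.5 here to obtain a binary label.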

The Stochastic Gradient Descent (SGD) classifier fits regularized linear models such as SVM and logistic regression, using stochastic gradient descent as the optimization technique during training. The estimator computes the gradient of the loss one sample at a time and updates the model along the way with a decreasing learning rate (a strength schedule). The SGD classifier scales well to large problems because it supports minibatch learning. Simple linear classifiers do not work if the data cannot be held in RAM, whereas the SGD classifier continues to work. This model is sensitive to feature scaling and requires careful tuning of several hyperparameters, such as the number of iterations and the regularization parameter.
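As a rough sketch of per-sample updates with a decreasing learning rate, the following plain-Python example trains a logistic regression model with SGD. The schedule `eta0 / (1 + 0.01 * step)`, the function names, and the toy data are assumptions made for illustration; regularization is left out to keep the example short.

```python
import math
import random

def sgd_logistic(X, y, epochs=200, eta0=0.5, seed=0):
    """Fit logistic regression by updating the weights one sample at a time."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)                      # visit samples in random order
        for i in order:
            step += 1
            eta = eta0 / (1.0 + 0.01 * step)    # decreasing learning rate
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y[i]                    # gradient of the log loss w.r.t. z
            w = [wj - eta * error * xj for wj, xj in zip(w, X[i])]
            b -= eta * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Toy features, assumed to be standardized already because SGD is scale sensitive
X = [[-1.2, -0.8], [-1.0, -1.1], [-0.9, -1.0], [1.1, 0.9], [0.9, 1.2], [1.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = sgd_logistic(X, y)
print([predict(w, b, x) for x in X])
```

Because each update touches a single sample, the same loop could consume data in small batches read from disk, which is what makes the approach suitable when the data does not fit in memory.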

K-Nearest Neighbor (K-NN) is a simple supervised machine learning algorithm that measures the similarity between a new data point and the existing data and assigns the new case to the category it most resembles. In other words, when new data appears, the K-NN algorithm classifies it into the best-suited category based on its similarity to the stored data. The K-NN classifier can successfully detect intrusion attacks and achieve a low fall-out rate. It can distinguish between normal and abnormal system behaviour and is used to determine the network status at each stage of a DDoS attack.
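A minimal sketch of this idea, assuming Euclidean distance as the similarity measure and a majority vote among the k closest training samples, is shown below. The function name, the choice k = 3, and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label the query with the majority class among its k nearest neighbours."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already scaled two-feature samples
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
train_y = [0, 0, 0, 1, 1, 1]   # 0 = benign, 1 = malicious
print(knn_predict(train_X, train_y, [0.2, 0.2]))
print(knn_predict(train_X, train_y, [0.9, 0.9]))
```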

Support Vector Machine (SVM) is one of the most popular ML algorithms in many applications, such as pattern recognition, spam filtering, and intrusion detection. There are several SVM formulations for regression, classification, and distribution estimation. It is based on the linear classifier and the optimal separating hyperplane. Consider a training set D = {(x1, y1), (x2, y2), …, (xn, yn)}, where xi is the feature vector of a training sample and yi is the associated class label. Each yi takes the value +1 or −1 (yi ∈ {+1, −1}), indicating whether or not the vector belongs to the class. The two classes are said to be linearly separable when a linear function can separate them completely; otherwise, they are linearly inseparable. Since DDoS attack detection is equivalent to a binary classification problem, we can use the SVM algorithm: collect data to extract features for training, find the optimal separating hyperplane between legitimate traffic and DDoS attack traffic, and use test data to evaluate the model and obtain the classification results.
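To illustrate the separating hyperplane with labels in {+1, −1}, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the regularized hinge loss. This is one simple way to fit a linear SVM and is not necessarily the solver used in this study. The hyperparameters `lam`, `eta`, and `epochs`, along with the toy data, are assumed values.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by sub-gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # sample violates the margin, so the hinge loss is active
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - eta * g for wj, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical linearly separable samples: -1 = legitimate, +1 = attack traffic
X = [[-1.2, -0.8], [-1.0, -1.1], [-0.9, -1.0], [1.1, 0.9], [0.9, 1.2], [1.0, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([classify(w, b, x) for x in X])
```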

The Random Forest classifier uses an ensemble learning approach that builds multiple decision trees. Each individual tree in the random forest produces a class prediction, and the class with the most votes becomes the model’s prediction. The idea behind the classifier is that a large number of trees working together as a committee will outperform any of the individual constituent models. What matters is low correlation between the models. Uncorrelated models can produce ensemble predictions that are more accurate than any of the individual predictions. The main reason is that the trees protect each other from their individual errors. Although some trees may be wrong, if many other trees are right, the group of trees as a whole can move in the right direction. The classifier uses feature randomness and bagging when building each tree in order to create an uncorrelated forest of trees.
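The combination of bagging, feature randomness, and majority voting can be sketched with very small trees. In the hedged example below each "tree" is a one-level decision stump trained on a bootstrap sample and a single randomly chosen feature, which is a simplification of a real random forest. All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Pick the threshold on one feature with the fewest training errors."""
    values = sorted({row[feature] for row in X})
    # Midpoints between distinct values, plus +inf so a one-class sample is handled.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] + [float("inf")]
    best = None
    for threshold in thresholds:
        for left_label in (0, 1):
            errors = sum(
                (left_label if row[feature] <= threshold else 1 - left_label) != target
                for row, target in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    return feature, best[1], best[2]

def stump_predict(stump, x):
    feature, threshold, left_label = stump
    return left_label if x[feature] <= threshold else 1 - left_label

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) for each tree.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Feature randomness: each tree only sees one randomly chosen feature.
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def forest_predict(forest, x):
    """Majority vote over the predictions of all trees."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy samples: [packets per second, mean packet size]
X = [[10, 500], [12, 520], [11, 480], [900, 60], [950, 64], [1000, 58]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print([forest_predict(forest, x) for x in X])
```

An occasional bootstrap sample may contain only one class, in which case that stump always predicts the same label. The vote across many trees is what absorbs such individual mistakes.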

The XGBoost classifier is a tree-based ensemble learning algorithm that uses a gradient boosting framework. For prediction problems involving unstructured data such as images and text, artificial neural networks tend to outperform other frameworks or algorithms. However, tree-based algorithms are considered the best choice for small to medium tabular data. The algorithm improves on the GBM framework through algorithmic enhancements and system optimization. In other words, it is an optimized gradient boosting algorithm that uses parallel processing, tree pruning, and handling of missing values, and applies regularization to avoid bias and overfitting.

Quadratic Discriminant Analysis (QDA) is a generative model that assumes each class follows a Gaussian distribution. It is very similar to Linear Discriminant Analysis, except that it does not assume the covariance of all classes is equal; each class has its own covariance matrix. The class-specific prior is the proportion of data points that belong to the class. The class-specific covariance matrix is the covariance of the vectors that belong to the class. The class-specific mean vector is the average of the input variables that belong to the class.
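The three class-specific quantities can be turned into a classifier by scoring each class with its log prior plus its Gaussian log density. The sketch below does this for two features, using the closed-form inverse of a 2x2 covariance matrix. It is a simplified illustration with hypothetical names and toy data, restricted to two features so that no linear algebra library is needed.

```python
import math

def fit_qda(X, y):
    """Estimate a prior, mean vector, and 2x2 covariance matrix per class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        m0 = sum(r[0] for r in rows) / n
        m1 = sum(r[1] for r in rows) / n
        s00 = sum((r[0] - m0) ** 2 for r in rows) / n
        s11 = sum((r[1] - m1) ** 2 for r in rows) / n
        s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / n
        model[label] = (n / len(X), (m0, m1), (s00, s01, s11))
    return model

def qda_score(params, x):
    """Log prior plus Gaussian log density, dropping the shared constant."""
    prior, (m0, m1), (s00, s01, s11) = params
    det = s00 * s11 - s01 ** 2
    d0, d1 = x[0] - m0, x[1] - m1
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha

def qda_predict(model, x):
    return max(model, key=lambda label: qda_score(model[label], x))

# Hypothetical two-feature samples; points within a class must not be collinear
X = [[1, 1], [2, 1], [1, 2], [2, 2.5], [6, 5], [7, 5], [6, 6], [7, 7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_qda(X, y)
print(qda_predict(model, [1.5, 1.5]), qda_predict(model, [6.5, 6.0]))
```

Because every class keeps its own covariance matrix, the boundary between two classes is a quadratic curve rather than a straight line.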

B. Dataset Description

Software-Defined Networking (SDN) is an architecture that is dynamic, manageable, cost-effective, and adaptable, making it suitable for the high-bandwidth, dynamic nature of modern applications. This architecture decouples the network control and forwarding functions, which allows network control to be directly programmable and the underlying infrastructure to be abstracted from network applications and services.

IV. RESULTS

DDoS attack analysis and detection were performed using machine learning methods. In this work, an SDN-specific dataset is used. The dataset initially has 23 features and 104345 instances. The output feature is the last column of the dataset, i.e. the class label that classifies the traffic as benign or malicious. Malicious traffic is labeled 1 and benign traffic is labeled 0. Null values were detected in rx_kbps and tot_kbps, and the affected rows were dropped to improve the model. The data preparation steps comprised data cleaning, one-hot encoding, and standardization. After one-hot encoding, the data frame comprised 103839 instances and 57 features, and it was fed to the model. A Deep Neural Network was used as the proposed model. The proposed model was observed to outperform the baseline classifiers used. Its accuracy was 99.38 percent, 1.21 percentage points higher than the 98.17 percent accuracy of the next-best model, XGBoost.
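The three preparation steps (dropping rows with null values, one-hot encoding, and standardization) can be sketched in plain Python as follows. This is an illustration and not the pipeline used in this study: only the column names rx_kbps and tot_kbps come from the text above, while the `protocol` and `label` columns, the helper names, and the four sample rows are hypothetical.

```python
import statistics

def drop_missing(rows, columns):
    """Keep only rows where every listed column has a value."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

def standardize(rows, columns):
    """Rescale each listed column to zero mean and unit variance."""
    scaled = [dict(r) for r in rows]
    for c in columns:
        values = [r[c] for r in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0   # avoid division by zero
        for r in scaled:
            r[c] = (r[c] - mean) / std
    return scaled

# Hypothetical rows; the second one has null values and will be dropped
rows = [
    {"protocol": "UDP", "rx_kbps": 10.0, "tot_kbps": 12.0, "label": 1},
    {"protocol": "TCP", "rx_kbps": None, "tot_kbps": None, "label": 0},
    {"protocol": "ICMP", "rx_kbps": 2.0, "tot_kbps": 4.0, "label": 0},
    {"protocol": "TCP", "rx_kbps": 6.0, "tot_kbps": 8.0, "label": 0},
]
clean = drop_missing(rows, ["rx_kbps", "tot_kbps"])
clean = one_hot(clean, "protocol")
clean = standardize(clean, ["rx_kbps", "tot_kbps"])
print(len(clean), sorted(clean[0]))
```

One-hot encoding is what increases the number of columns, since each categorical column is replaced by one indicator column per category.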

V. CONCLUSIONS

DDoS attack analysis and detection were carried out using machine learning methods on an SDN-specific dataset. Numerous types of network attacks have emerged with the evolution of technology, particularly the Internet. One of the most serious security risks of the internet age is the DoS attack, and the Distributed Denial of Service (DDoS) attack is a specific form whose impact can be equally significant. The tricky aspect of this attack is that it reaches the victim with hardly any warning and can quickly exhaust the victim’s communication or computing resources. Our proposed model appears to perform better than the baseline classifiers that were used.

Acknowledgment

I express my deep gratitude to Ms. Richa Sharma for suggesting this as my final-year dissertation topic and for guiding me through it. I am very thankful for the necessary facilities and for the help provided in carrying out my dissertation work. I am greatly thankful to all my professors at Rashtriya Raksha University, Gandhinagar, for their kind support and guidance. I am also thankful to all the journals from which I was able to gather the information needed to complete my survey of the topic.

About Author:

Parthavi Parmar

Parthavi Parmar works as a Security Engineer at eInfochips, specializing in the IoT security domain, with about a year of experience in the security field. She has expertise in web application Vulnerability Assessment & Penetration Testing (VAPT), Threat Modelling, Vulnerability Management, and Incident Response.
