Detection of web attacks via HTTP requests using NLP techniques

Authors

M. Kavetskiy, V. Ruzhentsev

DOI:

https://doi.org/10.30837/rt.2024.3.218.05

Keywords:

HTTP request, NLP, BERT, transformer architecture, web attack detection, machine learning

Abstract

This work aims to improve web attack detection by analyzing HTTP traffic with natural language processing (NLP) techniques and transformer-based models, in particular BERT. The relevance of this research is underscored by the results of a model trained on an extended dataset of 195,000 records. The BERT-based model detects web attacks with high efficiency owing to its deep contextual understanding and its WordPiece tokenizer, which handles rare words better by splitting them into subword units. Unlike methods such as Doc2Vec, LSTM-CNN, or Isolation Forest, our model accounts for global relationships between words, which improves its accuracy.
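To illustrate why subword tokenization helps with rare strings in HTTP requests, the following is a minimal sketch of the greedy longest-match-first rule that WordPiece applies when splitting a word. The vocabulary `TOY_VOCAB` and the function name `wordpiece_tokenize` are hypothetical and chosen only for illustration. A real BERT tokenizer also normalizes text and splits on punctuation first, and the abstract does not describe the authors' actual preprocessing.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into subwords by greedy longest-match-first lookup.

    Pieces that do not start the word carry the "##" continuation prefix.
    If no piece matches at some position, the whole word maps to unk_token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


# Hypothetical toy vocabulary (an assumption, not taken from the paper).
TOY_VOCAB = {"union", "select", "##select", "script", "##s", "un", "##ion"}

print(wordpiece_tokenize("unionselect", TOY_VOCAB))
print(wordpiece_tokenize("scripts", TOY_VOCAB))
```

With this toy vocabulary, an unseen string such as `unionselect` is decomposed into known pieces rather than collapsing into a single unknown token, which is the property the abstract attributes to WordPiece.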

Previous studies have limitations: some do not use state-of-the-art architectures, which limits the accuracy they can achieve, while others use modern architectures but work with small datasets, which limits their ability to detect diverse attack types and maintain high detection quality. To address these challenges, the model was trained on an extended dataset, resulting in significantly better performance than leading analogs in web attack detection. The model's balanced accuracy of 0.9998 confirms its effectiveness and reliability, making it a potentially important tool for cybersecurity applications.
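For readers unfamiliar with the metric, balanced accuracy is conventionally defined as the mean of per-class recall, which keeps a dominant class (for example, benign requests) from masking missed attacks. The sketch below uses that standard definition. The abstract does not state how the authors computed the metric, and the labels in the example are made up for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Number of samples that truly belong to this class.
        total = sum(1 for t in y_true if t == cls)
        # Number of those samples that were predicted correctly.
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Made-up labels: 0 = benign request, 1 = attack (imbalanced on purpose).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one of the two attacks is missed

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this example plain accuracy is 0.9 even though half of the attacks are missed, while balanced accuracy drops to 0.75, which is why the metric suits imbalanced attack-detection data.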

References

Attention Is All You Need [Electronic resource]. Available at: https://arxiv.org/pdf/1706.03762.pdf

HTTP DATASET CSIC 2010 [Electronic resource]. Available at: https://www.isi.csic.es/dataset/

Saikat Das, Mohammad Ashrafuzzaman, Frederick T Sheldon, and Sajjan Shiva. Network intrusion detection using natural language processing and ensemble machine learning // 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pages 829–835. IEEE, 2020.

Ali Moradi Vartouni, Saeed Sedighian Kashi, and Mohammad Teshnehlab. An anomaly detection method to detect web attacks using stacked auto-encoder // 2018 6th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), pages 131–134. IEEE, 2018.

Jiaxin Liu, Xucheng Song, Yingjie Zhou, Xi Peng, Yanru Zhang, Pei Liu, and Dapeng Wu. Deep anomaly detection in packet payload. arXiv preprint arXiv:1912.02549, 2019.

Chaochao Luo, Zhiyuan Tan, Geyong Min, Jie Gan, Wei Shi, and Zhihong Tian. A novel web attack detection system for Internet of Things via ensemble classification // IEEE Transactions on Industrial Informatics, 2020.

Tianlong Liu, Yu Qi, Liang Shi, and Jianan Yan. Locate-Then-Detect: Real-time Web Attack Detection via Attention-based Deep Neural Networks.

mrm8488/bert-tiny-finetuned-sms-spam-detection [Electronic resource]. Available at: https://huggingface.co/mrm8488/bert-tiny-finetuned-sms-spam-detection

Best WAF solutions in 2023 - real-world comparison [Electronic resource]. Available at: https://www.openappsec.io/post/best-waf-solutions-in-2023-real-world-comparison

Published

2024-09-26

How to Cite

Kavetskiy, M., & Ruzhentsev, V. (2024). Detection of web attacks via HTTP requests using NLP techniques. Radiotekhnika, 3(218), 64–75. https://doi.org/10.30837/rt.2024.3.218.05

Issue

Section

Articles