Detection of web attacks via HTTP requests using NLP techniques
DOI:
https://doi.org/10.30837/rt.2024.3.218.05
Keywords:
HTTP request, NLP, BERT, transformer architecture, web attack detection, machine learning
Abstract
This work focuses on improving web attack detection by analyzing HTTP traffic with natural language processing (NLP) techniques and transformer-based models, particularly BERT. The relevance of this line of research is underscored by the results of the model, which was trained on an extended dataset of 195,000 records. The developed BERT-based model detects web attacks with high efficiency thanks to its deep contextual understanding and its WordPiece tokenizer, which handles rare words better. Unlike methods such as Doc2Vec, LSTM-CNN, or Isolation Forest, the model accounts for global word relationships, which improves its accuracy.
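To illustrate why subword tokenization helps with rare words, the following is a minimal sketch of the greedy longest-match-first splitting that WordPiece tokenizers apply to a single word. It is not the authors' implementation: the function name `wordpiece_tokenize` and the toy vocabulary `TOY_VOCAB` are hypothetical, and a real BERT vocabulary contains tens of thousands of learned pieces.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into subword pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for this illustration.
TOY_VOCAB = {"union", "##select", "pass", "##wd", "id"}

print(wordpiece_tokenize("unionselect", TOY_VOCAB))
print(wordpiece_tokenize("passwd", TOY_VOCAB))
```

Under this assumed vocabulary, a string that never appears as a whole word, such as `unionselect` in an injected query parameter, is still broken into known pieces instead of collapsing into a single unknown token.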
Previous studies have limitations. Some do not use state-of-the-art architectures, which limits the accuracy they can achieve. Others use modern architectures but work with small datasets, which limits their ability to detect varied attack types and to ensure high detection quality. To address these challenges, the model was trained on an extended dataset and performed significantly better than leading comparable approaches in web attack detection. Its balanced accuracy of 0.9998 confirms its effectiveness and reliability, making it a potentially important tool for cybersecurity applications.
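The abstract reports balanced accuracy without defining it. Under the standard definition, balanced accuracy is the mean of per-class recall, which keeps a majority class (typically normal traffic) from dominating the score. The sketch below assumes that definition; the function name and the toy labels are illustrative and are not taken from the paper's data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Illustrative labels only: 0 = normal request, 1 = attack.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a degenerate classifier that never flags an attack

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case, a classifier that misses every attack still scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, which is why the metric suits imbalanced attack-detection datasets.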