CYBER SECURITY

The realities of AI in cybersecurity: catastrophic forgetting

There is a lot of hype about the use of artificial intelligence (AI) in cybersecurity. The truth is that the role and potential of AI in security are still evolving and often require experimentation and evaluation. SophosAI1 is committed to openly sharing its data science research with the security community in order to make the use of AI more transparent and to influence how AI is positioned and discussed in cybersecurity. Details of other initiatives shared as part of this objective are available on the SophosAI blog2.
Catastrophic forgetting: What is it?

Malware detection is the cornerstone of IT security, and AI is the only approach capable of learning patterns from millions of new malware samples within a matter of days. But there's a catch. Should the model keep all malware samples forever, giving optimum detection but slower learning and updates? Or should it use selective fine-tuning, which lets the model better keep up with the rate of change of malware but runs the risk of forgetting older patterns (known as catastrophic forgetting)? Retraining the whole model takes about one week; a good fine-tuning update should take about one hour. SophosAI wanted to see if it was possible to have a fine-tuning model that could keep up with the evolving threat landscape, learn new patterns and still remember older ones, while minimising the impact on performance. Researcher Hillary Sanders evaluated a number of update options and has detailed her findings in the SophosAI blog3.

SECURITY FOCUS AFRICA APRIL 2021

The detection dilemma

Keeping detection capabilities up to date is a constant battle. With every step we take towards defending against a malicious attack, adversaries are already developing new ways to get around it, releasing updates with different code or techniques. The result is that hundreds of thousands of new malware samples appear every day. Detection is made even harder by the fact that the latest-and-greatest malware is rarely completely "new". Instead, it is more likely to be a combination of new, old, shared, borrowed or stolen code and
adopted and adapted behaviours. Further, old malware can re-emerge after years in the wilderness, co-opted into an adversary's latest arsenal to take defences by surprise. Detection models need to ensure that they can continue to detect older malware samples, not just the most recent ones.

Updating AI detection models

When it comes to updating AI detection models with new malware samples, vendors have a choice between two options.

• The first is to keep a copy of every sample that they might ever want to detect and retrain the model repeatedly on an ever-increasing volume of data. This results in better overall performance, but also slower updates and fewer releases.
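The trade-off can be seen in a deliberately tiny sketch. The toy "detector" below is a hypothetical nearest-centroid classifier with made-up feature clusters, not the SophosAI model: fitting it on the new batch alone (the fast, fine-tuning-style update) makes it forget the old malware family, while refitting on all data (the slow, full-retrain-style update) keeps detecting both.

```python
# Illustrative sketch only: a toy nearest-centroid "detector" showing how
# updating on new samples alone can forget old ones, while retraining on
# all data does not. (Hypothetical example, not the SophosAI model.)

class CentroidDetector:
    """Classifies a 2-D feature vector by its nearest class centroid."""

    def fit(self, samples):
        # samples: list of ((x1, x2), label) pairs
        sums = {}
        for (x1, x2), label in samples:
            s1, s2, n = sums.get(label, (0.0, 0.0, 0))
            sums[label] = (s1 + x1, s2 + x2, n + 1)
        self.centroids = {lb: (s1 / n, s2 / n)
                          for lb, (s1, s2, n) in sums.items()}
        return self

    def predict(self, x):
        # Return the label whose centroid is closest (squared distance).
        return min(self.centroids,
                   key=lambda lb: (x[0] - self.centroids[lb][0]) ** 2
                                + (x[1] - self.centroids[lb][1]) ** 2)

# An "old" malware family clusters in one region of feature space,
# a "new" family in another; benign samples sit in a third region.
old_batch = [((0, 10), "malware"), ((1, 9), "malware"),
             ((10, 10), "benign"), ((9, 9), "benign")]
new_batch = [((10, 0), "malware"), ((9, 1), "malware"),
             ((10, 10), "benign"), ((9, 9), "benign")]

# Option A: fit on the new batch only -- fast, but the old malware
# family is now classified as benign (catastrophic forgetting).
fine_tuned = CentroidDetector().fit(new_batch)
print(fine_tuned.predict((0, 10)))   # old malware -> "benign" (forgotten)
print(fine_tuned.predict((10, 0)))   # new malware -> "malware"

# Option B: retrain on old + new data -- slower, but both families
# are still detected.
retrained = CentroidDetector().fit(old_batch + new_batch)
print(retrained.predict((0, 10)))    # old malware -> "malware"
print(retrained.predict((10, 0)))    # new malware -> "malware"
```

Real detection models are of course neural networks over far richer features, but the failure mode is the same: parameters pulled entirely toward the newest data stop representing the regions of feature space the old samples occupied.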