WALKING THE ENORMOUS exhibition halls at the recent RSA security conference in San Francisco, you could have easily gotten the impression that digital defense was a solved problem. Amidst branded t-shirts and water bottles, each booth hawked software and hardware that promised impenetrable defenses and peace of mind. The breakthrough powering these new panaceas? Artificial intelligence that, the sales pitch invariably goes, can instantly spot any malware on a network, guide incident response, and detect intrusions before they start.
That rosy view of what AI can deliver isn’t entirely wrong. But what next-generation techniques actually do is more muddled and incremental than marketers would want to admit. Fortunately, researchers developing new defenses at companies and in academia largely agree on both the potential benefits and challenges. And it starts with getting some terminology straight.
“I actually don’t think a lot of these companies are using artificial intelligence. It’s really training machine learning,” says Marcin Kleczynski, CEO of the cybersecurity defense firm Malwarebytes, which promoted its own machine learning threat detection software at RSA. “It’s misleading in some ways to call it AI, and it confuses the hell out of customers.”
Rise of the Machines
The machine learning algorithms security companies deploy generally train on large data sets to “learn” what to watch out for on networks and how to react to different situations. Unlike an artificially intelligent system, most of the security applications out there can’t extrapolate new conclusions without new training data.
Machine learning is powerful in its own right, though, and approach is a natural fit for antivirus defense and malware scanning. For decades AV has been signature-based, meaning that security companies identify specific malicious programs, extract a sort of unique fingerprint for each of them, and then monitor customer devices to ensure that none of those signatures appear.
Machine learning-based malware scanning works in a somewhat similar manner—the algorithms train on vast catalogues of malicious programs to learn what to look for. But the ML approach has the added benefit of flexibility, because the scanning tool has learned to look for characteristics of malware rather than specific signatures. Where attackers could stymie traditional AV by making just slight alterations to their malicious tools that would throw off the signature, machine learning-based scanners, offered by pretty much all the big names in security at this point, are more versatile. They still need regular updates with new training data, but their more holistic view makes a hacker’s job harder.
“The nature of malware constantly evolves, so the people who write signatures for specific families of malware have a huge challenge,” says Phil Roth, a data scientist at the machine learning security firm Endgame, that has its own ML-driven malware scanner for Windows systems. With an ML-based approach, “the model you train definitely needs to reflect the newest things that are out there, but we can go on a little bit of a slower pace. Attackers often build on old frameworks or use code that already exists, because if you write malware from scratch it’s a lot of effort for an attack that might not have a large payoff. So you can learn from all the techniques that exist in your training set, and then recognize patterns when attackers come out with something that’s only slightly new.”
Similarly, machine learning has become indispensable in the fights against spam and phishing. Elie Bursztein, who leads the anti-abuse research team at Google, notes that Gmail has used machine learning techniques to filter emails since its launch 18 years ago. But as attack strategies have evolved and phishing schemes have become more pernicious, Gmail and other Google services have needed to adapt to hackers who specifically know how to game them. Whether attackers are setting up fake (but convincing-looking) Google Docs links or tainting a spam filter’s idea of which messages are malicious, Google and other large service providers have increasingly needed to lean on automation and machine learning to keep up.
As a result, Google has found applications for machine learning in almost all of its services, especially through an ML technique known as deep learning, which allows algorithms to do more independent adjustments and self-regulation as they train and evolve. “Before we were in a world where the more data you had the more problems you had,” Bursztein says. “Now with deep learning, the more data the better. We are preventing violent images, scanning comments, detecting phishing and malware in the Play Store. We use it to detect fraudulent payments, we use it for protecting our cloud, and detecting compromised computers. It’s everywhere.”
At its core, machine learning’s biggest strength in security is training to understand what is “baseline” or “normal” for a system, and then flagging anything unusual for human review. This concept applies to all sorts of ML-assisted threat detection, but researchers say that the machine learning-human interplay is the crucial strength of the techniques. In 2016, IBM estimated that an average organization deals with over 200,000 security events per day.
Machine learning’s most common role, then, is additive. It acts as a sentry, rather than a cure-all.
“It’s like there’s a machine learning assistant that has seen this before sitting next to the analyst,” says Koos Lodewijkx, vice president and chief technology officer of security operations and response at IBM Security. The team at IBM has increasingly leaned on its Watson computing platformfor these “knowledge consolidation” tasks and other areas of threat detection. “A lot of work that’s happening in a security operation center today is routine or repetitive, so what if we can automate some of that using machine learning or just make it easier for the analyst?” Lodewijkx says.
Read the full article on Wired.com