Thursday, November 28, 2013

Model the Good With the Bad

When security monitoring appliances were originally envisioned, it was believed that their purpose was to "visualize risk," as Davi Ottenheimer (@daviottenheimer) put it. However, for all of the fancy visualizations, the most useful part of a Security Information and Event Management (SIEM) system is the list of correlated events.

That has led us to realize the real benefit is in being able to prioritize potential malice on the network for investigation and to identify data correlated with that potential malice. We can see this in the rise of Splunk as a SIEM. Now many SIEM producers are working toward this approach.

The next logical step is to develop models of malicious activity to help identify attackers in the massive number of observations available. But as we go forward with this approach, we must not lose sight of the importance of modeling legitimate access as well.

If we only model malice, the question we have to ask about any given observation of the network is "Does this observation match the model of malice?"  As a binary question, this is very hard to answer.  It will be very rare that something exactly matches the model.  If we don't have an exact match, it becomes a fairly arbitrary question as to whether the partial match is malicious or not.

However, if you model legitimate use along with malicious use, the question changes. Now you can ask, "Does this observation match the model of malice more than the model of legitimate use?" This is a much easier question to answer, and it provides a real comparison. It also lets you track the hosts involved across observations and watch for movement between the two models. Over multiple observations you should be able to identify a host whose observations trend away from legitimate use and toward malicious use.
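To make the comparison concrete, here is a minimal sketch of one way to score an observation against both models and then watch a host's trend. Everything in it is an assumption for illustration: the event names, the hand-written probabilities in LEGITIMATE and MALICIOUS, and the choice of a log-likelihood ratio and a least-squares slope. Real models would be learned from your own data, and other scoring approaches would work just as well.

```python
import math

# Hypothetical event probabilities under each model. In practice these
# would be estimated from baseline and incident data, not hand-written.
LEGITIMATE = {"login_ok": 0.70, "login_fail": 0.10, "file_read": 0.18, "admin_tool": 0.02}
MALICIOUS = {"login_ok": 0.10, "login_fail": 0.40, "file_read": 0.20, "admin_tool": 0.30}


def log_likelihood(events, model, floor=1e-6):
    """Sum the log-probabilities of the events under one model.

    Events the model has never seen get a small floor probability
    instead of zero so the logarithm stays defined.
    """
    return sum(math.log(model.get(event, floor)) for event in events)


def malice_score(events):
    """Log-likelihood ratio of the malice model to the legitimate model.

    A score above zero means the observation fits the malice model
    better than the legitimate-use model; below zero means the reverse.
    """
    return log_likelihood(events, MALICIOUS) - log_likelihood(events, LEGITIMATE)


def trend(scores):
    """Least-squares slope of the scores over the observation index.

    A positive slope means the host's observations are moving away
    from legitimate use and toward malicious use.
    """
    n = len(scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Three successive (made-up) observations of a single host.
host_observations = [
    ["login_ok", "file_read"],
    ["login_ok", "login_fail"],
    ["login_fail", "admin_tool"],
]
host_scores = [malice_score(obs) for obs in host_observations]
host_trend = trend(host_scores)
```

Note that none of these observations has to match the malice model exactly. Each one only has to be compared against both models, and the host becomes interesting when its scores keep climbing.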

Ultimately, the availability of innumerable observations of our networks is opening up new options for detecting malice on them. Whether machine learning, subjective judgment, or some other approach is used to model activity, it will be more effective if we model not just the malicious activity but the legitimate use as well.