Tuesday, May 10, 2016

The role of Pen Testing / Vuln Hunting in Information Security


At a security conference, ask someone in attendance what they do.  More than likely they are a consultant doing penetration testing, vulnerability hunting, or both.  Penetration testing and vulnerability hunting are mainstays of security testing, often required by laws, regulations, or contracts.  They are ubiquitous in information security.

But we don't have a good model for how they fit into improving defense.  The prevailing wisdom is that disclosing vulnerabilities leads to their mitigation, which leads to more security.  However, there is a counter-argument that disclosing vulnerabilities helps the attackers more than the defenders.  Can we build a model that takes both views into account?  Let's see.

So what do you 'do' here?

So what do penetration testers and vulnerability hunters actually 'do'?  If we think of information security as a game (a very high-stakes game), we could say that penetration testers and vulnerability hunters reveal paths on the game board that attackers can take to reach their objectives.  That raises the question:

How does this benefit the defenders?

Let's take four scenarios:
  1. No one knows about the path: In this case no one benefits and no one loses, because no one knows. No change.
  2. Only the defender knows about the path: In this case, the defender either gains nothing or actually loses, as they expend resources to mitigate the path. Defender Cost.
  3. Both defender and attacker know about the path: In this case, the attacker benefits somewhat or not at all, depending on whether they successfully exploit the path.  The defender probably loses some (mitigates the path) or loses a lot (is exploited), though there is an off chance they lose nothing because the attacker's exploitation fails. Attacker potential Profit. Defender potential for more Cost.
  4. Only the attacker knows about the path: Here the attacker's chance to benefit goes up significantly, as the defender is unaware of the path.  The defender, on the other hand, doesn't even have the chance to mitigate the path and can only lose.  And after the exploit, they return to Scenario 3 and still lose as they mitigate the path. Attacker most Profit. Defender most Cost.


Based on the model above, penetration testers and vulnerability hunters can be most helpful in two ways: using their knowledge of paths to detect when attackers know them, and disclosing paths to defenders when the attackers already know of them.  This helps move from Scenario 4 to Scenario 3.  It's not ideal, but it's better than the status quo.
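
As a rough sketch, the four scenarios can be written down as a lookup table keyed by who knows about the path.  The numbers are only ordinal ranks that mirror the labels in the list (no change, cost, more cost, most cost), not measurements, and the names Outcome, SCENARIOS, disclose and defender_cost_change are made up for illustration.  The public flag is an assumption added to capture the counter-argument that disclosure can inform attackers too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    attacker_profit: int  # ordinal rank: 0 = none, 2 = most
    defender_cost: int    # ordinal rank: 0 = none, 3 = most


# Keyed by (defender_knows, attacker_knows).
SCENARIOS = {
    (False, False): Outcome(0, 0),  # Scenario 1: no change
    (True, False): Outcome(0, 1),   # Scenario 2: defender cost
    (True, True): Outcome(1, 2),    # Scenario 3: potential profit, more cost
    (False, True): Outcome(2, 3),   # Scenario 4: most profit, most cost
}


def disclose(defender_knows, attacker_knows, public=False):
    """Knowledge state after a tester discloses a path to the defender.

    A public disclosure is assumed to inform the attacker as well.
    """
    return (True, attacker_knows or public)


def defender_cost_change(defender_knows, attacker_knows, public=False):
    """Change in the defender's cost rank caused by the disclosure."""
    before = SCENARIOS[(defender_knows, attacker_knows)]
    after = SCENARIOS[disclose(defender_knows, attacker_knows, public)]
    return after.defender_cost - before.defender_cost


if __name__ == "__main__":
    for state in SCENARIOS:
        print(state, defender_cost_change(*state),
              defender_cost_change(*state, public=True))
```

Under these ranks, disclosure only lowers the defender's cost when the attacker already knows the path (Scenario 4 to Scenario 3).  Disclosing a path no one knew about raises it, and more so if the disclosure is public.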

If only it were so simple

This model is admittedly naive.  It's a starting point, not an end-all-be-all.  Some things to consider:
  • There is a time lag from knowledge of a path to its weaponization or mitigation.  The model should take that into account.
  • Attackers and defenders are not homogeneous.  This model doesn't consider what some attackers/defenders know and what others do not.  Nor does it model the spread of that knowledge through the population.
  • This model relies on the defender's knowledge of the attacker's knowledge, something that will always be imperfect.
  • Paths are made up of individual pieces.  This model doesn't account for the rearranging of pieces of the path, combined with other information in the attacker/defender's knowledge, to form new paths.
This model is not perfect, but hopefully it's a start in how to consider the role of penetration testing/vulnerability hunting in information security.

Alexi Hawk's Impossible Data Set

As the author of the only unsolved puzzle in the DBIR Cover Challenge this year, I figured I should provide a bit of a write-up.  I'll apologize to all of the cover challenge participants, as it's quite literally 10 lines of code to solve, only two of which are actually functional (vs loading packages and naming stuff).

The Idea

First, where the puzzle came from.  I wanted to have a data-y puzzle in the challenge, but I also wanted it to be challenging for data science-y people.  To that end, I suggested, and the team approved, a puzzle based on a dataset, but with a twist.  The solution would not be from analyzing the data statistically.  Even then, our estimate going in was that it was the hardest puzzle of the bunch and likely wouldn't be solved.

The Setup

To create the puzzle, I used GIMP to create a raster image with the key text.  I then opened the image in Python using the PIL package.  It lets you step through the individual pixels and read each one's RGB value.  I took all the pixels with RGB values less than 10 (i.e. black) and saved them as a csv of (x, y) coordinates.
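
A minimal sketch of that extraction step might look like the following.  It is not the original script: the function names and the x/y csv header are made up for illustration, and "RGB less than 10" is assumed to mean that every channel is below 10.  The PIL import sits inside load_rgb_rows so the filtering logic can be tried without Pillow installed.

```python
import csv


def dark_pixel_coords(rows, threshold=10):
    """Return (x, y) for every pixel whose R, G and B are all below threshold.

    rows is a list of image rows, each a list of (r, g, b) tuples.
    """
    coords = []
    for y, row in enumerate(rows):
        for x, (r, g, b) in enumerate(row):
            if r < threshold and g < threshold and b < threshold:
                coords.append((x, y))
    return coords


def load_rgb_rows(path):
    """Read an image with PIL (Pillow) into rows of (r, g, b) tuples."""
    from PIL import Image  # imported lazily; requires Pillow

    img = Image.open(path).convert("RGB")
    width, height = img.size
    return [[img.getpixel((x, y)) for x in range(width)]
            for y in range(height)]


def write_coords_csv(coords, path):
    """Save the coordinates as a two-column csv."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])  # hypothetical header names
        writer.writerows(coords)
```

With Pillow available, something like write_coords_csv(dark_pixel_coords(load_rgb_rows("key.png")), "points.csv") would chain the steps, where both file names are placeholders.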

From there I transferred it to R.  Since each point is a single pixel (i.e. neighboring points sit closer together than the size of a circle drawn at that location), I filtered down to 10% of the points.  Now, the first thing a good data scientist does is look at the data, so we can't have it be that obvious.  Instead, I added a third column with random values in the range of the first two columns.  Then I swapped the first and third columns.  If a scatter plot of the original data was like looking at the text straight on, a scatter plot of the first two columns now shows each pixel at its true vertical location but a completely random horizontal location.
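
The original obfuscation was done in R.  The sketch below restates the same three steps in Python, purely as an illustration.  The function name, the seed parameter, and the choice of a uniform distribution over the combined range of x and y are all assumptions, since the text only says the third column was random within the range of the first two.

```python
import random


def obfuscate(points, keep_fraction=0.10, seed=None):
    """Subsample (x, y) points, add a random column, swap columns 1 and 3.

    Returns rows of (noise, y, x): the true x ends up in the third column.
    """
    if not points:
        return []
    rng = random.Random(seed)

    # Step 1: keep roughly 10% of the points.
    keep = max(1, round(len(points) * keep_fraction))
    sampled = rng.sample(points, keep)

    # Step 2: random values drawn from the combined range of x and y
    # (assumed uniform).
    values = [v for point in points for v in point]
    lo, hi = min(values), max(values)

    rows = []
    for x, y in sampled:
        noise = rng.uniform(lo, hi)
        # Step 3: (x, y, noise) with first and third swapped.
        rows.append((noise, y, x))
    return rows
```

Plotting the first two columns of the result gives the true vertical position against noise, while the second and third columns still hold the readable image.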

As we discussed the puzzle, someone else had suggested doing something with polar coordinates.  So I did just that.  I converted the current Cartesian coordinates into spherical form.  (Hopefully all the hints about spheres and looking at the ranges now make sense: two columns, the angles in radians, range from about 0 to 1.6, and one, the vector length, ranges from 0 to about 500.)
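
For reference, here is a small Python sketch of one common Cartesian/spherical convention: azimuth theta in the x-y plane, elevation phi measured up from that plane, and radius r.  That this is the convention the puzzle used is an assumption.  Under it, any point with non-negative coordinates has both angles between 0 and pi/2 (about 1.57), which fits the ranges mentioned above.

```python
import math


def cart2sph(x, y, z):
    """Cartesian (x, y, z) to spherical (theta, phi, r).

    theta: azimuth in the x-y plane; phi: elevation above that plane.
    """
    theta = math.atan2(y, x)
    phi = math.atan2(z, math.hypot(x, y))
    r = math.sqrt(x * x + y * y + z * z)
    return theta, phi, r


def sph2cart(theta, phi, r):
    """Spherical (theta, phi, r) back to Cartesian (x, y, z)."""
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z
```

Applying sph2cart to the output of cart2sph recovers the original point up to floating-point error, which is all the solution needs.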

The Payout

So, the solution to the dataset (in R) is as follows:
# Load the packages used below
# (assumption: sph2cart comes from the pracma package; ggplot comes from ggplot2)
library(pracma)
library(ggplot2)

# Read in the file
alexi <- read.csv("http://cybercdc.global/static/alexi.csv")

# Convert each row from spherical coordinates to cartesian
back <- apply(alexi, MARGIN=1, sph2cart)
(At this point, if you look at the data, you'll notice two int rows and one numeric.  That wasn't intended and gives away the correct rows a bit.)
# The output of the above is 1 point per column.  Change it to rows using the 't' (transpose) command.
back <- t(back)
# Convert it back to a dataframe to make it easily plottable with ggplot
back <- as.data.frame(back)

# Give it column names to make it easy to refer to
names(back) <- c("V1", "V2", "V3")
# Scatter plot the correct two dimensions to view the data
ggplot(back) + aes(x=V3, y=V2) + geom_point()

You may have to squish the vertical dimension a bit to read the text, but you'll see it.


Incidentally, I actually tested the spherical points to make sure that they wouldn't reveal the clue when visualized directly.  The sampling had to be adjusted so that visualizing columns V1 and V3 didn't reveal the activation text.