Wednesday, May 3, 2017

Elasticsearch. Logstash. R. What?!

Motivation

At bsidesNash, Chris Sanders gave a great talk on threat hunting.  One of his recommendations was to try out an ELK (Elasticsearch, Logstash, Kibana) stack for searching for threats in log data.  ELK is an easy way to stand up a distributed, scalable stack capable of storing and searching full-text records.  Its easy ingestion (Logstash), schema-agnostic storage (Elasticsearch), and robust search and dashboards (Kibana) make it an attractive platform for threat hunters.

However, because of its ease, ELK tends to become a one-size-fits-all solution for many tasks.  I had asked Chris about using other tools for analysis, such as R by way of RStudio and dplyr, or Microsoft Power BI.  Chris hadn't tried them and, at the time, neither had I.  (My day job is mostly historic data analysis rather than operational monitoring.)

Opportunity

However, the DBIR Cover Challenge presented an opportunity.  For those who are unaware, each year there is a code (or codes) hidden on the DBIR cover.  That code leads to a puzzle challenge which has resulted in some nice rewards for the winners (iPad minis, auto-follow telescopes, Yeti coolers, quadcopters, 3D printers, and more).  The challenge has multiple puzzles, of which players must complete 8.  So that players can check their answers as they go, the site is a dynamic webapp hosted on Heroku.  Because it is dynamic, I can add my own log messages to the endpoint functions.

But I needed a place to store and search the logs.  Heroku provides some great plugins for this but, given the conversation with Chris, I figured I'd try to roll my own, starting with ELK.  The first hurdle was that, though there is a lot of hosted Elasticsearch and Kibana, there was much less hosted Logstash (the part I really needed).  Elastic Cloud didn't have it.  AWS had their own tools.  Finally I found logit.io, which works perfectly.  They provide a full ELK stack as a cloud service for around $20 at the low end, with a 14-day trial.  I signed up for the trial and was up and running in minutes.  They even have an easy one-line instruction for setting up a Heroku drain to send logs to a logit.io Logstash endpoint.  From there, the data is automatically stored in Elasticsearch and searchable through Kibana.

Going beyond ELK

The problem, I quickly found, was that Kibana didn't offer the robust manipulation I was used to in R.  While it could find entries and make basic dashboards, I simply couldn't cut the data the way I wanted once I'd found the subset I was interested in.  I tried passing the data to Power BI but, on first blush, the streaming API setup was too limited to ingest a Heroku drain using the basic setup tools.  Finally, I decided to keep the Logstash and Elasticsearch underpinnings but switch to R for analysis.  R allows simple pipelined analysis of data as well as robust charting.

Doin it with R

The first step was to install the packages I'd need:
install.packages("dplyr") # for simple piped data processing
install.packages("purrr") # for extracting fields from the list-heavy query results
install.packages("elastic") # for talking to the Elasticsearch store
install.packages("flexdashboard") # for creating a dashboard to monitor
install.packages("DT") # for displaying a HTML data table in the dashboard
install.packages("stringr") # simple string manipulation
install.packages(c("ggmap", "viridis", "rgeolocate", "leaflet")) # geocoding IPs and displaying them on a map
install.packages(c("devtools", "treemap")) # create treemaps
devtools::install_github("Timelyportfolio/d3treeR") # create treemaps
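One small addition not in the original listing: the pipeline code below uses dplyr verbs (mutate, group_by, tally) and the %>% pipe unqualified, so load dplyr once per session first:
library(dplyr) # provides %>%, mutate(), group_by(), and tally() used below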
After installing packages, the next step was to set up the Elasticsearch connection:
elastic::connect(es_host="<my ES endpoint>", es_port=443, es_path="", es_transport_schema = 'https', headers=list(apikey="<my api key>"))
I also manually visited: "https://<my ES endpoint>/_cat/indices?v&apikey=<my API key>&pretty=true" to see what indexes Logstash was creating.  It appears to create an index per day and keep four indexes in the default logit.io setup.  I stored them in a variable and then ran a query, in this case for the log line indicating a player had submitted a specific key:
indexes <- c("logstash-2017.04.28", "logstash-2017.04.29", "logstash-2017.04.30", "logstash-2017.05.01") # I should be able to get this from `elastic::cat_indices()`, but it did not apply my apikey correctly
query <- elastic::Search(index=indexes, q="logplex_message:submitted", size=10000)$hits$hits
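As an aside, instead of checking the index list in a browser, something like the following should pull the same listing from within R.  (A sketch only: it assumes httr is installed, and the endpoint and API key are placeholders just like above.)
resp <- httr::GET("https://<my ES endpoint>/_cat/indices",
                  query = list(v = "true", apikey = "<my api key>")) # same _cat/indices call, apikey passed as a query parameter
cat(httr::content(resp, as = "text")) # prints one line per logstash-YYYY.MM.DD index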
The next thing we need to do is extract only the fields we want from the query results.  The result is a list of hits, each itself a list of key:value pairs.  I used `purrr::map_chr` to pull out _just_ the logplex_message field and make it a column in a dataframe.  (The base-R equivalent is `lapply`, which applies a function to each item of a list and returns a list, followed by `unlist` to flatten the results into a vector.)
submissions <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$logplex_message))
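For reference, the base-R equivalent of that extraction (the same form the dashboard code later in this post uses) looks like this:
# lapply() pulls the field from each hit; unlist() flattens the list into a character vector
submissions <- data.frame(text = unlist(lapply(query, function(hit) hit$`_source`$logplex_message)),
                          stringsAsFactors = FALSE)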
In our puzzle challenge, we have 'trainers' who use 'keys' to indicate they've caught Breachemon.  I can use my normal R skills to separate the trainer name and key from the log message and count how many times each trainer has submitted each key:
submissions <- submissions %>%
    mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract 'trainer'
    mutate(key = gsub(".*submitted key (.*) to the bank.$", "\\1", text)) %>% # extract 'key'
    group_by(trainer, key) %>% # group each trainer-key pair
    tally() # short cut for `summarize(n=n())`.  For each trainer-key pair, create a column 'n' with the number of times that pair occurred
From there we can visualize the table with:
DT::datatable(submissions)
We could also visualize the total submissions per trainer:
submitters <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$logplex_message)) %>% # extract the log message and produce a dataframe
mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract the trainer
group_by(trainer) %>% # create a group per trainer
tally() # shortcut for `summarize(n=n())`. Count the events per group
d3treeR::d3tree2(treemap::treemap(submitters, "trainer", "n", aspRatio=5/3, draw = FALSE)) # produce a treemap of submissions per person

Dashboard Time

To wrap this all together, I decided to make a simple dashboard.  In the RStudio menu, choose File -> New File -> R Markdown..., then 'From Template', and then Template: 'Flex Dashboard'.  You'll get something like:
---
title: "Untitled"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
library(flexdashboard)
```
Column {data-width=650}
-----------------------------------------------------------------------
### Chart A
```{r}
```
Column {data-width=350}
-----------------------------------------------------------------------
### Chart B
```{r}
```
### Chart C
```{r}
```
Let's add our two charts:
---
title: "Breachemon"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
library(flexdashboard)
library(dplyr)
elastic::connect(es_host="<my ES endpoint>", es_port=443, es_path="", es_transport_schema = 'https', headers=list(apikey="<my api key>"))
indexes <- c("logstash-2017.04.28", "logstash-2017.04.29", "logstash-2017.04.30", "logstash-2017.05.01") # the indexes Logstash created, as noted above
query <- elastic::Search(index=indexes, q="logplex_message:submitted", size=10000)$hits$hits
```
Column {data-width=650}
-----------------------------------------------------------------------
### Submissions
```{r fig.keep='none'}
submitters <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$logplex_message)) %>% # extract the log message and produce a dataframe
mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract the trainer
group_by(trainer) %>% # create a group per trainer
tally() # shortcut for summarize(n=n()).  Count the events per group
d3treeR::d3tree2(treemap::treemap(submitters, "trainer", "n", aspRatio=5/3, draw = FALSE)) # produce a treemap of submissions per person
```
### Submitters
```{r}
data.frame(text = unlist(lapply(query, function(l) {l$`_source`$logplex_message}))) %>%
    mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract 'trainer'
    mutate(key = gsub(".*submitted key (.*) to the bank.$", "\\1", text)) %>% # extract 'key'
    group_by(trainer, key) %>% # group each trainer-key pair
    tally() %>% # shortcut for `summarize(n=n())`.  For each trainer-key pair, create a column 'n' with the number of times that pair occurred
    DT::datatable() # display the result as an interactive HTML table
```
Column {data-width=350}
-----------------------------------------------------------------------

### Map
```{r}
ips <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$msg_fwd))
geo <- rgeolocate::db_ip(as.character(unique(ips$text)), "<my free db-ip.com api key>") # geocode unique IPs, returns a list
geo <- do.call(rbind.data.frame, geo) # bind the list together as a dataframe
names(geo) <- c("IP", "Country", "State", "City") # set the dataframe column names
geo <- ips %>%
    group_by(text) %>%
    tally() %>% # count per IP
    rename(IP = text) %>%
    right_join(geo, by="IP") # join with geolocation
cities <- unique(as.character(geo$City)) # unique list of cities
cities <- cbind(ggmap::geocode(cities), cities) # geo code the cities
geo <- right_join(geo, cities, by=c("City" = "cities")) #join it back together
pal <- leaflet::colorFactor(viridis::viridis_pal(option = "C")(2), domain = geo$n) # create a color range
leaflet::leaflet(geo) %>% # make a map
  leaflet::addTiles() %>% # add some default shapes to it
  leaflet::addCircleMarkers(color = ~pal(n)) # add a circle with a color based on the count of submissions for each IP
```
Resulting in a dashboard with the submissions treemap, the submitters table, and the map of source IPs.

The last block pulls the msg_fwd field, which contains the source IP address, splits it (as some entries have multiple addresses), and stores it in a dataframe.  It then geolocates the IPs and binds in the cities.  After that it geocodes the cities' latitudes and longitudes and joins them back in.  Finally it places the geolocated and geocoded IPs as dots on a map.
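That splitting step isn't actually shown in the chunk above.  A minimal sketch, assuming msg_fwd holds a comma-separated list of addresses when a request passes through proxies, would be something like:
ips <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$msg_fwd)) %>%
    mutate(text = stringr::str_trim(stringr::str_split_fixed(text, ",", 2)[, 1])) # keep only the first (client) address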

Wrapup

That's not to say there aren't hang-ups.  You _are_ pulling the data from the remote cluster to your local machine, which is a relatively costly action.  (The queries I ran returned in a fraction of a second, but I can imagine querying a billion-record store and returning tens of thousands of hits would be slower.)  However, as Chris noted during his talk, not being selective in what you retrieve to search is one of the signs of a junior analyst.  Also, I have not automated retrieval of more than 10,000 records or the automatic tracking of indexes as they are created.  Finally, the dashboard must be refreshed manually.  There's a little button to do so in the RStudio browser, but I think it may make more sense to provide a Shiny button to update all or selected portions instead.  Unfortunately, most of this goes beyond the few hours I was willing to put into this proof of concept.
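For what it's worth, paging past the 10,000-hit ceiling should be possible with Elasticsearch's scroll API.  A rough sketch, assuming the version of the elastic package in use exposes time_scroll and scroll() roughly as shown here (check ?elastic::scroll for the exact signature):
res <- elastic::Search(index = indexes, q = "logplex_message:submitted",
                       size = 1000, time_scroll = "1m") # keep the scroll context alive between pages
hits <- res$hits$hits # first page of hits
while (length(res$hits$hits) > 0) {
  res <- elastic::scroll(res$`_scroll_id`) # fetch the next page
  hits <- c(hits, res$hits$hits)
}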

In the end, it was well worth the experimentation.  It required no hardware and brings the robust slicing and dicing of data that the R ecosystem provides to the easy and scalable storage of ELK.  Though the logit.io service doesn't allow direct configuration of most of the ELK stack, they seem responsive to requests.  I'm actually not sure that the Elasticsearch portion of ELK is really necessary.  If you are working with a limited number of well-defined data sources, a structured store such as Postgres, or a key:value store such as Hive/HBase, might make more sense.  R has nearly as large a repository of packages as Python does.  On my Mac Pro I can work with datasets in the tens of millions of records, performing all sorts of complex analysis, all in an easily documented and repeatable way.

In the future, I'd love to see the same thing done with MS PowerBI.  It's not a platform I know, but I think it would definitely be an interesting one to explore.  If anyone has any ideas on how to stream data to it, please let me know!

Tuesday, November 29, 2016

How to Handle Being Questioned

In my post, How to Converse Better in Infosec, I laid out some rules for better infosec discussions.  A key tenet of that blog post was asking questions.  But what if you are on the receiving end of those questions?

To the questioned:

When expressing a view, being questioned feels like a challenge.  For me, it feels as if the other person doesn't believe me and is trying to catch me in a lie.  Frankly, maybe I did embellish a bit.  Maybe I made a statement based on something I thought I remembered hearing but don't quite remember where I heard it.  Or maybe I feel the statement is so obvious, the only reason someone would question it is if the other person wanted to try and take me down a rung.

It's OK.  If, as speakers, we feel we are in the right, we can treat all questions as if the questioner doesn't know the answer and is seeking help learning, or there is some ambiguity in the questioner's mind and they are just trying to help clarify it.  (Remember, for topics we are knowledgeable on, it is hard to see the subject from the perspective of a less-informed person.)  Answer with the intent of being as genuinely helpful as possible.  Have fun!  This is our chance to help someone out!

And if we don't have the answer, we can be polite and say so.  "I honestly can't demonstrate it right now.  If you'll allow me the time, I'll collect the information for you and get back to you.  And, in the event I can't, I'll let you know."  Everyone is wrong at some point.  Big people can admit it and only weak people don't accept it from others.

And to the questioner:

Be aware that you may be unintentionally putting the questioned person in an emotionally defensive position.  They may have all the answers and be able to clearly explain it.  They may be right, but need time to collect the evidence to demonstrate it.  They may be flat out wrong but not prepared to say so.

Be a good participant in the social dynamic.  If the other person can't answer, is evasive, or is demonstrating some technique to avoid answering, give them an out.  Say, "It's OK, let's pick this up again later."  Or "If you find/remember the answer, please message it to me."  If the question is unimportant to you, you lose nothing by letting it go until the questioned person brings it up to you again.  And if it is truly relevant to you, you can look it up yourself.  If you feel you can't let it go, ask yourself if you're truly practicing the principle of charity.

In conclusion

Remember, a conversation involves multiple people. You're all in it together. Either everyone wins or everyone loses. So help everyone win.

Tuesday, November 22, 2016

What is most important in infosec?

"To crush your enemies -- See them driven before you, and to hear the lamentation of their women!" - Conan the Barbarian

Maybe not.

Vulnerabilities

Recently I asked if vulnerabilities were the most important aspect of infosec.  Most people said 'no', and the most common answer instead was risk.  Risk is likelihood and consequence (impact). (Or here for a more infosec'y reference.)  And as FAIR points out, likelihood is threat and vulnerability.  (Incidentally, this is a good time to point out that when we say 'vulnerability', we aren't always saying the same thing.)  While in reality, as @SpireSec points out, threat is probably more important, I suspect most orgs make it a constant 'TRUE', in which case 'likelihood' simply becomes 'vulnerability' in disguise.  I doubt many appreciate the economic relationship between vulnerability and threat.  As many people pointed out, the impact of the risk is also important.  Yet, as with 'threat', I suspect it is rarely factored into risk in more than a subjective manner.  There were other aspects of risk mentioned, such as vulnerable configurations, asset management, and user vulnerability.  And there were other opinions such as communication, education, and law.
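A rough sketch of that decomposition, in my own notation rather than FAIR's exact terms:

$$\text{Risk} = f(\text{Likelihood},\ \text{Impact}), \qquad \text{Likelihood} = g(\text{Threat},\ \text{Vulnerability})$$

$$\text{Threat} \equiv \text{TRUE and Impact held constant} \;\Rightarrow\; \text{Risk} \approx \text{Vulnerability}$$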

Risk

The first big take-away is that, while we agree conceptually that risk is complex and that all its parts are important, practically we reduce 'risk' down to 'vulnerability' by not dynamically managing 'threat' or 'impact'.  While most organizations may say they're managing risk, very likely they're really just managing vulnerabilities.  At best, when we say 'managing', we probably mean 'patching'.  At worst, it's buying and blindly trusting a tool of some kind.  Because, without understanding how those vulnerabilities fit into the greater attack-surface of our organization, all we can do is patch and buy.  Which leads to the second take-away...

Attack Surface

The second take-away is: "I think we need to change the discussion from vulns to attack surface."  Without understanding its attack surface, an organization can never move beyond swatting flies.  If an organization is a city and it wants to block attackers coming in, what we do today is like blocking one lane of every road in.  Sure, you shut down a lot of little roads, but the interstates still have three lanes open.  And what about the airport, buses, and beaches?

Our Challenges

Unfortunately, if we can't move from vulns to full risk, our chances of moving beyond simple risk to attack surface are slim.  At least in FAIR, we have the methodology to manage based on full risk, if not attack surface.  However, unlike vulnerability data, threat and impact data is not easy to collect.  It's not easy to combine and clean.  And it's not easy to analyze and act upon.  (All the things vulnerability data is.)  We don't even have national strategic initiatives for threat and impact, let alone attack surface, the way we do for vulnerabilities (for example, bug bounties and I Am The Cavalry).

In Conclusion

Yet we continue to spend our money and patch vulnerabilities with little understanding of the risk being addressed, let alone how that risk fits into our overall attack surface.  But for those willing to put in the work, the tools do exist.  And eventually we will make assessing attack surface as easy as a vulnerability assessment.  Until then, though, we will continue to waste our infosec resources, wandering blindly in the dark.

P.S.

The third and final take-away is that the whole discussion completely ignores operations (the DFIR type vs the installing-patches type).  In reality it may be a strategic decision, but the trade-offs between risk-based and operations-based security are better left for another blog.


Tuesday, October 18, 2016

Why Phishing Works

I've been asked many times why old attacks like phishing or use of stolen credentials still work.  It's a good, simple, question.  We are fully aware of these types of attacks and we have good ways of solving them.  Unfortunately, there's just as simple an answer:
"The reason attackers use the same methods of attack is we assume they won't work."
We conduct phishing training.  We install mail filters.  And when something gets through, we treat it as an anomaly, a trouble ticket.  Yet, per the 2016 DBIR, about 12% of recipients clicked the attachment or link in a phishing email.  Imagine if that failure rate applied to airplanes; say, if 12% of the bolts in an airplane failed every flight.  They wouldn't simply take the plane in for repairs when bolts failed.  They'd build the plane to fly even if the bolts failed.

This leads to a fundamental tenet of information security:

"Your security strategy CANNOT assume perfection.  Not in people. Not in processes. Not in tools.  Not in defended systems."

When you assume anything will work perfectly and treat failures as a trouble ticket, you cede an advantage to the attacker.  They are well aware that if they fire off 100 phishing emails, 10 will hit the mark.


What To Do

Do what engineers have been doing for generations, engineer resilience and graceful degradation into the system.  Assume phishing, credential theft, malware, and other common attacks WILL succeed and plan accordingly.  Build around an operational methodology.  Work under the assumption that phishing has succeeded in your organization, that credentials have been stolen, that malware is present, and that your job is to find the attacker before they find what they're looking for.

Attackers are just some other guy or gal, sitting in their version of a cube, somewhere else in the world.  They want their attacks to happen quickly and with as little additional effort as possible.  They take advantage of the fact that we treat their initial action succeeding as an anomaly.  If we assume that initial action will be partially successful and force them to exert additional effort and actively work to remain undetected, we decrease their efficiency and improve the economics of infosec in our favor.

Thursday, September 22, 2016

How to Converse Better in Infosec

In a previous blog, I spoke a bit about what to do when the data doesn't seem to agree with what we think.  But what if it's not data you disagree with, but another person?

We've grown up in a world where the only goal in a conversation is to simply be right. It is all around us and, unfortunately, drives how we converse with other professionals.  Whether it's a twitter thread or questions at the end of a conference talk, we tend to look to tear down others to build ourselves up.  The mantra "Defense has to be perfect, offense only has to succeed once" pushes us to expect it in our technical dialog even though no one and no thing is perfect.

Let's change that.  The next time you are on twitter, at a conference, or engaging in discussion with colleagues, try and follow the Principle of Charity.  I highly recommend you read the link, but the basic premise is:
Accept what the other says if it could be true.
Now, obviously it's more complex than that. It's more like "dato non concesso" which means "given, not conceded". You are accepting their statements where logic otherwise does not prevent you from doing so, not because you believe they are true, but simply because you believe they were given in good faith. It also means interpreting statements in the way most likely to be true.
If the other says something that sounds conditionally untrue, ask questions that would help clarify that it is true.
It doesn't mean you have to accept statements that can't be true. It doesn't mean you can't confirm your interpretation. And it doesn't mean you can't ask clarifying questions.  If the other's statement could be conditionally true, ask questions that help clarify that the conditions are those that make the statement true.
Do not ask questions or make statements to try and prove the other's assertion false.
It does, however, mean not nitpicking.  It does mean not taking statements out of context or requiring all edge cases be true.  If the other's position truly is false, you will simply fail at clarifying it as true.

And if we are going to do this, we should do one more thing:
Expect others to follow the same principles.
We should not, as a community, accept members not following this principle.  Conversations contradictory to the Principle of Charity bring our community down and they inhibit growth.  However, we will only root it out if we take a stand and speak out against it.  Whether at conferences, in blogs, in podcasts, on twitter, or anywhere else, it improves us none to tear down rather than build up.  I challenge you to adopt the Principle of Charity in your conversations, starting today, and make it a goal for the entire year!

Update: Also check out the follow-on blog: How to Handle Being Questioned!

Tuesday, August 30, 2016

Do You Trust Your Machine or Your Mind?

Data science is the new buzzword.  The promise of machine learning is to be able to predict anything and everything.  Yet it seems like the more data we have, the harder the truth is to find.  We hear about some data that doesn't sound right to us.  We ask questions and find out that there are assumptions and biases all over the data.  And even if the data is true, once it is analyzed, it becomes contaminated in some way.  With such things, how can we possibly trust it?  Instead, as Adam Savage put it, the best course of action seems to be: "I reject your reality and substitute my own."
https://twitter.com/n1suzie/status/490796035376427008

The reality of your mind is: "Your mind is crazy and tells you lies."  Your brain has to do the same thing a data analysis process does: assemble the data into a complete picture.  (An analogy would be assembling a pile of building blocks into a single creation like a castle or whale.)  It can do it, but the reality is it takes a lot of skill and a lot of thought.

Pieces for a mind to assemble into a single picture.


The downside to doing it in your brain is:

  • There is no documentation of how the picture was formed from the data
  • There is no record of what data your mind included and excluded as it assembled its picture
  • It is much harder to question the process your mind used in creating its picture
  • It is very hard to maintain consistency, so that the picture your mind creates today is the one it will create a year from now given the same data
Your mind is a black box.  As Andy Ellis put it, "Systems are becoming too complex for risk analysis to be performed by System 1" (gut instinct).  He termed it "The Approaching Complexity Apocalypse".

This doesn't mean data doesn't have its faults.  No data is the knowledge it represents.  All data requires analysis to produce the picture from the data.  All data has underlying assumptions and biases.  You should expect your data sources to:

  • Publish the methodologies they use to produce the pictures from the data
  • Document the provenance of the data
  • Disclose the known assumptions and biases, both of the data and of the methodology
Also, data science is not quite classic science.  Classically, science follows the scientific method: a hypothesis is established first, and then tests are created to collect data to disprove that hypothesis.  If the tests fail, the hypothesis is accepted.  In data science, we normally start with the data and use it to identify hypotheses that appear to be true.  XKCD highlighted the issue with this nicely:

https://xkcd.com/882/

There will always be unknown assumptions and biases in data, but if you use them as a reason to ignore the data, you put yourself at a disadvantage.  If you conduct 100 studies, none of which is statistically significant on its own, but all predicting the same thing, you have strong evidence that the thing is true.
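A toy illustration of that point, assuming (for simplicity) independent studies that each merely lean one direction or the other: if there were truly no effect, the chance of 100 studies all leaning the same way is vanishingly small.
p_all_same_direction <- 2 * 0.5 ^ 100 # either all lean "up" or all lean "down" purely by chance
p_all_same_direction # roughly 1.6e-30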

On the other hand, this does not mean you should accept all data-based conclusions that come your way.  As multiple speakers in the bSides Las Vegas Ground Truth track suggested, machines and minds should work together.  The mind can help identify potential biases and assumptions, as well as potential improvements in the machine's methodology.  The machine can produce reproducible results to inform the mind's decisions.

The worst thing you can do is identify biases, assumptions, and flaws in the machine and then use them to justify the validity of your mind.  If you were to do so, you would need to document the methodology of your mind and subject it to the same scrutiny for biases, assumptions, and flaws.  At which point, the methodology would then be in the machine.

And if you can't make your mind and the machine agree, my preference is to trust whichever system is most thoroughly documented, investigated, and validated.  And that tends to be the machine.

Tuesday, May 10, 2016

The role of Pen Testing / Vuln Hunting in Information Security

Intro

At a security conference, ask someone in attendance what they do.  More than likely they are a consultant, either doing penetration testing, vulnerability hunting or both.  Penetration testing and vulnerability hunting are mainstays of security testing, many times required by laws, regulations, or contracts.  They exist ubiquitously in information security.

But we don't have a good model for how they fit into improving defense.  The prevailing knowledge is that disclosing vulnerabilities leads to their mitigation which leads to more security.  However there is a counter-argument that disclosing vulnerabilities helps the attackers more than the defenders.  Can we build a model that takes both views into account?  Let's see.

So what do you 'do' here?

So what do penetration testers and vulnerability hunters actually 'do'?  If we think of information security as a game (a very high-stakes game), we could say that penetration testers and vulnerability hunters reveal paths on the game board that attackers can take to reach their objectives.  That raises the question:

How does this benefit the defenders?

Let's take four scenarios:
  1. No-one knows about the path:  In this case no-one benefits, no-one loses, because no-one knows. No change.
  2. Only the defender knows about the path: In this case, the defender either benefits none or actually loses as they expend resources to mitigate the path. Defender Cost.
  3. Both defender and attacker know about the path: In this case, the attacker either benefits some or none depending on whether they successfully exploit the path.  The defender probably loses some (mitigates the path) or loses a lot (is exploited) though there is the off chance they lose none due to the attacker's failed exploitation. Attacker potential Profit. Defender potential for more Cost.
  4. Only the attacker knows about the path: Here the attacker's chance to benefit goes up significantly as the defender is unaware of the path.  The defender, on the other hand, doesn't even have the chance to mitigate the path and can only lose.  And after exploit, they return to step 3 and still lose as they mitigate the path. Attacker most Profit. Defender most Cost.
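A toy encoding of those four scenarios as a table, qualitative payoffs only (nothing here is quantified, it simply restates the list above):
scenarios <- data.frame(
  who_knows = c("no-one", "defender only", "both", "attacker only"),
  attacker  = c("no change", "no change", "potential profit", "most profit"),
  defender  = c("no change", "cost to mitigate", "cost, possibly exploited", "most cost"),
  stringsAsFactors = FALSE
)
scenarios # print the payoff table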

Conclusion

Based on the model above, penetration testers and vulnerability hunters can be most helpful by using their knowledge of paths to detect when attackers know them and to disclose them to defenders in situations when the attackers already know of the path.  This helps move from Scenario 4 to Scenario 3.  It's not ideal, but it's better than the status quo.

If only it were so simple

This model is admittedly naive.  It's a starting point, not an end-all-be-all.  Some things to consider:
  • There is a time lag from knowledge of a path to its weaponization or mitigation.  The model should take that into account.
  • Attackers and defenders are not homogeneous.  This model doesn't consider what some attackers/defenders know and what others do not.  Nor does it model the spread of that knowledge through the population.
  • This model relies on defenders' knowledge of attackers' knowledge, something that will always be imperfect.
  • Paths are made up of individual pieces.  This model doesn't account for the rearranging of pieces of the path, combined with other information in the attacker/defender's knowledge, to form new paths.
This model is not perfect, but hopefully it's a start in how to consider the role of penetration testing/vulnerability hunting in information security.