Information Security Analytics Blog: 2018

Friday, September 7, 2018

Data Driven Security Strategy

I presented on building a data driven security strategy at RSA this year. You can find the video here and the slides here.

If there's one thing to take away it's this:

"Strategy is HOW YOU CHOOSE plans to meet your objectives, not the plans you choose. Those plans must be in the context of the rest of security and your organization. And a data driven security strategy is using MEASURES TO CHOOSE."

This is just a quick blog to share my Jupyter notebook analysis template. I analyze a lot of different datasets in a short period, so having the analysis consistent is very helpful. I'll walk through the sections quickly to share a bit about my process.

Title Section

In the title section, I have a block for any ideas to explore, specific things I intend to do, anything I need to request to be updated in the data, and any notes about the data. These are all bulleted text boxes.

This section is VERY helpful for working on multiple datasets. It's easy to forget what you were going to do or what you've done, and the summary up front helps get you back in place.

Preparation

Next is preparing the data. No data comes ready for analysis. Here I have blocks to read in the data, clean the created dataframe, and save it to an R data (Rda) object on disk. The next time I need it, I just load the Rda and skip the cleaning.
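The read, clean, save, reload cycle described here can be sketched as follows. This is a minimal illustration in Python using the standard library's pickle module rather than the Rda files the post uses; the `clean` step, the function names, and the file paths are all hypothetical.

```python
import csv
import pickle
from pathlib import Path


def clean(rows):
    """Hypothetical cleaning step: strip whitespace and drop empty rows."""
    cleaned = []
    for row in rows:
        row = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        if any(row.values()):
            cleaned.append(row)
    return cleaned


def load_data(raw_path, cache_path):
    """Load the cleaned data from the cache, or build the cache from raw."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Later sessions: skip reading and cleaning entirely.
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    # First session: read, clean, then save for next time.
    with open(raw_path, newline="") as handle:
        rows = clean(csv.DictReader(handle))
    with cache_path.open("wb") as handle:
        pickle.dump(rows, handle)
    return rows
```

One caveat with any cache like this: if the cleaning code changes, the cached file has to be deleted by hand so it gets rebuilt.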

Analysis

The analysis section is basically filled with mini experiments. Each chunk is one. As such, it's important that each have a bit of information in comments at the top of it:

A description of the hypothesis being tested or explored. Something like "looking at the distribution of the periodicity of events".
Once it's done, describe the results. Yes, the results should describe the results but you'll thank past you if you write down what you got from the analysis when you did it. Something like "it looks like the periodicity is bimodal with one mode representing X and another representing Y."
Add a comment with a UUID. Seriously. Every. Single. Block. If it's something interesting you're going to put it in a document or a blog or something. You want to be able to track it from beginning to end. (Ours track from the report, through several drafts of the report, through drafts of the sections, to a figures rmarkdown file that generates all the figures, to an exploratory report where we created the original analysis.) Seriously. If you like it then you shoulda put a UUID on it.
Now you can actually write the analysis code.
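As a rough illustration of the header described in the list above, the sketch below builds the three comment lines for a chunk. It is written in Python; the `chunk_header` function and the exact comment layout are assumptions for illustration, not the template's actual format.

```python
import uuid


def chunk_header(hypothesis, results="TODO: fill in once the chunk has run"):
    """Build the comment header for one analysis chunk."""
    chunk_id = uuid.uuid4()  # random ID, unique per chunk
    return "\n".join([
        f"# Hypothesis: {hypothesis}",
        f"# Results: {results}",
        f"# UUID: {chunk_id}",
    ])


header = chunk_header("looking at the distribution of the periodicity of events")
print(header)
```

Pasting the printed UUID next to any figure that leaves the notebook is what makes it possible to search back from a report to the chunk that produced it.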

Appendixes

This is where I put all of the extra stuff.

Testing

I always have a testing block. Throughout the analysis, you'll spend a lot of time testing stuff to make it work (or simply looking up things like the dimensions of your data and the column names). Putting those in a testing block keeps you from coming back later and wondering what the block in your analysis was there for.

Lookups

Sometimes you have big, ugly lookups. Putting them at the top clogs the Preparation section, so I tend to put them at the bottom. You'll remember you forgot to run them when your analysis fails.

Backup

Really a parking lot for anything you don't want in another section, but don't want to delete.

Ultimately, if I were doing full modeling, I'd probably want a template that follows the process outlined in Modern Dive. However, for someone just getting into analysis, hopefully this helps!

Sunday, August 19, 2018

Game Analysis of the 2018 Pros vs Joes CTF at BSidesLV

Introduction

Capture the Flag (CTF) contests are a staple of security conferences and BSides Las Vegas is no exception. However, the Pros vs Joes (PvJ) CTF I help support there is a bit unique. Not only is it a blue vs blue CTF with red aggressor and gray user teams, but the game dynamics are a fundamental development point for the CTF team. (There's a lot more to it, such as its educational goal or that we allow blue teams to attack each other on the second day. You can read more about it at http://prosversusjoes.net/.)

Game Dynamics

When we say 'game dynamics', we mean a couple of things. First we mean what's scored and how much. In our case that is currently four things:

hosts (score given to teams for maintaining service availability)
beacons (score deducted when the red team signals a host is compromised)
flags (score deducted when the red team breaches specific files)
tickets (score deducted when the gray team is not being appropriately supported)

At a more fundamental level though, we mean the scenario the CTF is meant to represent. As a blue team CTF, we try and simulate the real world. As such, starting last year, we began to transition our game model to simulate an economy. Score is not granted so much as transferred. For example, the gold team pays the gray team for accomplishing some task, then the gray team pays a portion of that score to the blue team for maintaining the services necessary to accomplish that task. Alternately, when the red team (or another blue team) installs a beacon, the score isn't lost, but instead transferred to the team that placed the beacon.
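The idea of score being transferred rather than granted can be sketched as a small ledger. This is an illustration in Python only; the `Ledger` class, the team names, and every point value are hypothetical and are not taken from the actual scoring engine.

```python
from collections import defaultdict


class Ledger:
    """Toy score ledger where points move between teams instead of appearing."""

    def __init__(self, opening_balances):
        self.balances = defaultdict(int, opening_balances)

    def transfer(self, payer, payee, amount):
        """Move points from payer to payee; nothing is created or destroyed."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.balances[payer] -= amount
        self.balances[payee] += amount

    def total(self):
        """Total points in the economy."""
        return sum(self.balances.values())


ledger = Ledger({"gold": 1000, "gray": 0, "blue-1": 0, "red": 0})
ledger.transfer("gold", "gray", 100)    # gold pays gray for a completed task
ledger.transfer("gray", "blue-1", 40)   # gray pays blue for keeping services up
ledger.transfer("blue-1", "red", 15)    # a beacon moves score to whoever placed it
print(dict(ledger.balances), ledger.total())
```

Because every change is a transfer, the total stays constant, which is the property that distinguishes an economy from a simple counter.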

Beginning with last year, we have started to then simulate the way we expect the game to run. This year we have also captured detailed scoring logs. This blog is about our analysis of the score from this year's game and how it helps us plan for the future.

Simulation

The first thing we do is create a game narrative and scoring profile for the game. The profile is the servers that will come online, go offline, and how much they will be scored per (5 minute) round. It is picked to produce specific outcomes such as inflation (to decrease point value early in the game when teams are just getting going and to allow dynamism throughout the game).

We then try and build distributions of how likely servers will be to go offline, how likely beacons will be and how long they will last, and how many flags will be found. This year we used previous years' simulations and logs as well as expert opinion to build the distributions. The distributions we used are below:

### Define distributions to sample from
## Based on previous games/simulations and expert opinion

# H&W outage distributions
doutage_count <- distr::Norm(mean=8, sd=8/3)
doutage_length <- distr::Norm(mean=1, sd=1/3)

# flag distributions
dflags <- distr::Norm(mean=2, sd=2/3) # model 0 to 4 flags lost with an average of 2

# beacon distributions
# fit a gamma distribution for beacon length: median of 0.75 hours, 70th percentile of 4 hours
gamma_shapes <- rriskDistributions::get.gamma.par(p=c(0.5, 0.7), q=c(0.75, 4))
dbeacons_length <- distr::Gammad(shape=gamma_shapes['shape'], scale=1/gamma_shapes['rate']) # in hours
dbeacon_count <- distr::Norm(mean=(4-3)/2+3, sd=(4-3)/3)

Based on this we ran Monte Carlo simulations to try and predict the outcome of the game.
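To give a feel for what such a simulation involves, here is a minimal Monte Carlo sketch. It is written in Python rather than the R used above and is not the model we actually ran: the point values are hypothetical, the gamma parameters are placeholders rather than the fitted values, and outage length is treated as hours, which is an assumption.

```python
import random
import statistics

# Hypothetical point values; the post does not give the real scoring profile.
HOST_POINTS_PER_GAME = 10_000
OUTAGE_COST_PER_HOUR = 100
BEACON_COST_PER_HOUR = 150
FLAG_COST = 500
# Placeholder gamma parameters, NOT the values fitted in the post.
BEACON_SHAPE, BEACON_SCALE = 0.5, 6.0


def draw_count(rng, mean, sd):
    """Draw a non-negative whole number from a normal distribution."""
    return max(0, round(rng.gauss(mean, sd)))


def simulate_game(rng):
    """Return one simulated final score for a single blue team."""
    outages = draw_count(rng, 8, 8 / 3)
    outage_hours = sum(max(0.0, rng.gauss(1, 1 / 3)) for _ in range(outages))
    flags = draw_count(rng, 2, 2 / 3)
    beacons = draw_count(rng, 3.5, 1 / 3)
    beacon_hours = sum(
        rng.gammavariate(BEACON_SHAPE, BEACON_SCALE) for _ in range(beacons)
    )
    return (HOST_POINTS_PER_GAME
            - OUTAGE_COST_PER_HOUR * outage_hours
            - BEACON_COST_PER_HOUR * beacon_hours
            - FLAG_COST * flags)


def monte_carlo(runs=10_000, seed=2018):
    """Simulate many games with a fixed seed so results are repeatable."""
    rng = random.Random(seed)
    return [simulate_game(rng) for _ in range(runs)]


scores = monte_carlo()
deciles = statistics.quantiles(scores, n=10)
print(f"median {statistics.median(scores):.0f}, "
      f"10th-90th percentile {deciles[0]:.0f} to {deciles[-1]:.0f}")
```

The output of interest is the spread of final scores rather than any single number, since the spread shows how much of the outcome the scoring profile leaves to play.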

First, we analyzed the expected overall score.

Next we wanted to look at the components of the score.

Finally we wanted to look at the distributions of potential final scores and the contributions from the individual scoring types

The Game

And then we run the game.

The short answer is, it's VERY different. We had technical issues that prevented starting the game on time. We were not able to complete some development, which prevented automatic platform deployment; some hosts were not available, and some user simulation was also not available. This is not a critique of the development team, who did a crazy-awesome job both rebuilding the infrastructure for this game in the months leading up to it and dynamically deploying hosts during the game. It's just reality. The scoring profile was built for everything we want. I am pleased with how much of it we got on game day.

The Scoreboard

The Final Scoreboard

You can find the final scoreboard and scores here. It gives you an idea of what the game looked like at the end of the game, but doesn't tell you a lot about how we got there. I'm personally more interested in the journey than the destination so that I can support improving the game narrative and scoring profile for the next game.

Scores Over Time

The first question is how did the scores progress over time? (You'll have to forgive the timestamps as they are still in UTC I believe.) What we hoped for was relatively slow scoring the first two hours of the game. This allows teams the opportunity to make up ground later. We also do not want teams to follow a smooth line or curve. A smooth line or curve would mean very little was happening. Sudden jumps up and down, peaks and valleys, mean the game is dynamic.

What we see is a relatively slow beginning game. This is due to beacons initially being scored below the scoring profile and one of three highly-scored puzzle servers being mistakenly scored lower from its start late in day 1 until it was corrected at the beginning of day 2.

We do see an amount of trading back and forth. ForkBomb (as an aside, I know they wanted the _actual_ fork bomb code for their name, but for this analysis text is easier) takes an early lead while Knights suffer some substantial losses (relative to the current score). Day two scores take off. The teams are relatively together through the first half of day 2; however, Arcanum takes off mid-day and doesn't look back.

The biggest difference is that when teams started to have several beacons, as part of their remediation they tended to suffer self-inflicted downtime. This caused a compound loss of score (the loss of the host scoring they would have had plus the cost of the beacons). We did not account for this duplication in our modeling, but plan to in the future.

Ultimately I take this to mean scoring worked as we wanted it to. The game was competitive throughout and the teams that performed were rewarded for it.

It does leave the question of what contributed to the score...

Individual Score Contributions

What we expect is relatively linearly increasing host contributions with a bit of an uptick late in the game and linearly decreasing beacon contributions. We also expect a few significant, discrete losses to flags.

What we find is roughly what we expected but not quite. The rate of host contribution on day two is more pronounced than expected for both Paisley and Arcanum, suggesting the second day services may have been scored slightly high.

Also, no flags were captured. However, we do have tickets which were used by the gold team to incentivize the blue teams to meet the needs of the gray team.

The biggest difference is in beacons. We see several interesting things. First, for a period on day two, Knights employed a novel (if ultimately overruled) method for preventing beacons. We see that in the level beacon score for an hour or two. We also see a shorter level score in beacons later on when the red team employed another novel (if ultimately overruled) method that was significant enough that it had to be rolled back. We also see how Arcanum benefited heavily from the day 2 rule allowing blue-on-blue aggression. Their beacon contribution actually goes UP (meaning they were gaining more score from beacons than they were losing) for a while. On the other side, Paisley suffers heavily from blue-on-blue aggression with significant beacon losses.

Ultimately this is good. We want players _playing_, especially on day 2. Next year we will try to better model the blue-on-blue action as well as find ways to incentivize flags and provide a more substantive and direct way for the gray team to motivate the blue team.

Before we move on, two final figures to look at. The first lets us see individual scoring events per team and over time. The second shows us the sum of beacon scores during each round. It gives an idea of the rate of change of score due to beacons and provides an interesting comparison between teams.

But there's more to consider, such as the contributions of individual hosts and beacons to score.

Hosts

The first thing we want to look at is how the individual servers influenced the scores. What we want to see is starting servers contributing relatively little by the late game, desktops contributing less, and puzzle servers contributing substantially once initiated. This is ultimately what we do see. (This was the analysis, done at the end of day 1, that allowed us to notice puzzle-3 scoring substantially lower than it should. We can see its uptick on day 2 as we correct its scoring.)

It's also useful to look at the score of each server relative to the other teams. Here it is much easier to notice the absence of the Drupal server (removed due to technical issues with it). We also notice some odd scoring for puzzle servers 13 and 15, however the contributions are minimal.

More interesting are the differences in scoring for servers such as Redis, Gitlab, and Puzzle-1. This suggests maybe these servers are harder to defend as they provided score differentiation. Also, we notice teams strategically disabling their domain controller. This suggests the domain controller should be worth more to disincentivize this approach.

Finally, for the purpose of modeling, we'd like to understand downtime. It looks like most servers are up 75% to near 100% of the time. We can also look at the distributions per team. We will use the distribution of these points to help inform our simulations for the next game we play. We are actually lucky to have a range of distributions per team to use for modeling.

Beacons

For the purpose of this analysis, we consider a beacon new if it misses two scoring rounds (is not scored for 10 minutes).
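That rule can be sketched as a small grouping function. This is an illustration in Python under stated assumptions: the input is taken to be the round numbers in which one beacon source was scored against one team, and the `split_beacons` name and the sample rounds are hypothetical.

```python
def split_beacons(scored_rounds, missed_rounds_for_new=2):
    """Group scored round numbers into distinct beacons.

    A new beacon starts whenever at least `missed_rounds_for_new`
    consecutive rounds went unscored. Returns (first_round, last_round)
    tuples, one per beacon.
    """
    beacons = []
    for round_number in sorted(set(scored_rounds)):
        missed = round_number - beacons[-1][1] - 1 if beacons else None
        if missed is None or missed >= missed_rounds_for_new:
            beacons.append([round_number, round_number])
        else:
            beacons[-1][1] = round_number
    return [tuple(beacon) for beacon in beacons]


rounds = [1, 2, 3, 5, 8, 9, 20]
print(split_beacons(rounds))
```

In the sample, round 4 is a single missed round so rounds 1 through 5 count as one beacon, while the two missed rounds before round 8 start a new one. With 5 minute rounds, the length of each beacon follows from the number of rounds it spans.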

First it's nice to look at the beacons over time. (Note that beacons are restarted between day 1 and day 2 during analysis. This doesn't affect scoring.) I like this visualization as it really helps show both the volume and the length of beacons and how they varied by team. You can also clearly see the breaks in beacons on day two that are discussed above.

The beacon data is especially helpful for building distributions for future games. First we want to know how many beacons each team had:

Day 1:

Arcanum - 17
ForkBomb - 24
Knights - 18
Paisley - 21

Day 2:

Arcanum - 13
ForkBomb - 17
Knights - 29
Paisley - 34

We also want to know how long the beacons last. The aggregate distribution isn't particularly useful. However, the distributions broken out by team are interesting. They show substantial differences between teams. Arcanum had few beacons, but they lasted a long time. Paisley had very few long beacons (possibly due to self-inflicted downtime). Rather than following a power law distribution, the beacons are actually relatively even with specific peaks. (This is very different from what we simulated.)

Conclusion

In conclusion, the take-away is certainly not how any given team did. As the movie "Any Given Sunday" implied, sometimes you win, sometimes you lose. What is truly interesting is both our ability to attempt to predict how the game will go as well as our ability to then review afterwards what actually happened in the game.

Hopefully if this blog communicates anything, it's that the scoreboard at the end simply doesn't tell the whole story and that there's still a lot to learn!

Future Work

This blog is about scoring from the 2018 BSides Las Vegas PvJ CTF, so it doesn't go into much detail about the game itself. There's a lot to learn on the PvJ website. We are also in the process of streamlining the game while making the game more dynamic. As mentioned above, the process started in 2017 and will continue for at least another year or two. Last year we added a store so teams can spend their score. We also started treating score as a currency rather than a counter.

This year we added additional servers coming on and off line at various times as well as began the process of updating the gray team's role by allowing them to play a puzzle challenge hosted on the blue team servers.

In the next few years we will refine score flow, update the gray team's ability to seek compensation from the blue team for poor performance, and add additional methods to maximize blue teams' flexibility in play while minimizing their requirements. Look forward to future posts as we get the details ironed out!

Sunday, July 22, 2018

A Year Not Drinking

With Blackhat, Defcon, and BSides Las Vegas coming up, it seems like an appropriate time for a quick blog on alcohol. In 2017, for my birthday I took a year off drinking. Now that my birthday is past, I figured I'd share a bit about it.

Why?

Honestly, I felt I was drinking too much. There was always an excuse to drink. It was a holiday. Friends were over. My wife and I wanted to go out. There was something interesting to taste. etc.

Also, it became an end-of-day thing. Have a beer to relax after work. Just adding that up alone becomes a number not to be proud of.

I also wanted to see if it changed how I felt. Would I feel more healthy? Would I feel smarter? Since alcohol is a depressant that can last a week+ in your brain, would I be in a better mood?

And I wanted to try and save some money.

It also helped that I read a book where the main character didn't drink. I think it provided subconscious acknowledgement that it could be done as well as giving some ideas as to how.

What it took

It was easy. Much easier than I expected. My goal wasn't to avoid alcohol like an allergy, but just not to have a full drink. It also helped to have a goal. "I'm not going to drink a full drink until at least X." I could easily tell people "I'm taking a year off drinking" and didn't get much pressure to drink after that.

To make it work, I had to have something else to drink though. (I drink a LOT of fluids. 2-4 liters of hot tea during the work day.) I don't like sweet drinks or fruity drinks. I also need variety and don't drink caffeine after like 7 at night, so that kinda limits my options. What I did find was:

Herbal Tea - TONS of variation here. Better during the winter when warm drinks are nice. I wish someone would make condensed herbal tea similar to what's available for iced tea.
Bitters and Tonic - This was my go-to. I have about 20 bitters of various flavors and a SodaStream (modded w/ a real CO2 tank) now. I can drink these forever and a day with a ton of variation.
Water with sliced fruit, then carbonated - It turned out this was great too. Cut a cucumber and a grapefruit into the water and let it sit a day. Then bottle it up and carbonate it.
LaCroix - Not sweet and great flavors.

Positive Impacts

First, I did feel like it was easier to solve complex challenges. The mental gymnastics just seemed a bit easier. Plus, it saved a BUNCH of money (minus stocking up the bitters). I'm sure the long-term effects of not poisoning myself regularly are good though I haven't quite termed long enough to find out.

Another interesting impact was social interactions were more productive. Instead of meeting over beer at the end of the day in a dark, loud place, I'd meet people in the morning or mid-day over tea. We tended to get a LOT more done.

Negative Impacts

On the other hand, there's a LOT less to do. A lot of the things that seem like fun (many times vague 'going out somewhere' concepts) just aren't exciting if you aren't drinking. Going downtown is now kinda 'bla'. Going out to bars is pretty much out of the question. (You could, but why?) So now when my wife and I try to find something to do on a free night, we actually have some trouble figuring it out. (That said, it may also be that because we have kids and so free nights are so rare we're not sure what to do with them.)

More stress. The reality was drinking was relieving stress. (Obviously not in a good way, but it was.) Life not drinking is much more stressful.

Also, I consumed a LOT more sugar. Probably linked to the last point about stress, frankly. Instead of drinking alcohol, eating sweets became a way of dealing with stress, which I'm pretty sure is also not healthy.

When I drink

A side effect of this is it became very clear _when_ I drink.

First was after work to relieve stress.
Second were social events, basically as something to do when meeting people.
Third were celebrations. These tended to be heavier drinking. The problem is that the world makes sure there is always something to celebrate.

Going Forward

So my plan going forward. I don't plan on not drinking at all but I do plan on drinking less.

I plan to pick the days to drink in celebration way ahead. Probably my birthday and my wife's birthday, but likely nothing else. I think it's very important to do this ahead of time so that I have an idea how often it's happening throughout the year. It's very easy to impulse-celebration-drink and if I don't think about the year ahead, looking back on the year it's easy to find out I drank way more than I would have if I'd planned ahead.

Socially, I think I'll only drink in rare cases. And when I do, only make it one drink. Last year, I wish I'd had a drink of scotch with my father and brothers at home at Christmas. On the other hand, I probably won't drink when meeting up with people in Vegas. Those will be tea or tonic and soda type things depending on the time of day.

I'm not going to swear off tastings, particularly when offered. But on the other side, I'm not going to take an entire drink just to taste it. It's silly not to try interesting things, but it can't be an excuse to drink more.

My plan is to completely stop drinking after work. It's just too much of a slippery slope. Instead I plan to get out to the gym more and meditate (I pray, but you do you) to relieve stress.

Conclusion

So as you prepare for Vegas, drink the amount you want. But don't feel it's something you have to do. Many people don't and everyone I've spent time around has been understanding. And recognize that drinking won't make you cooler/more of a hacker/give you a fuller experience.

Now to figure out what to do about the sweets.

Wednesday, June 20, 2018

Good Blackhat/Defcon/BSides Las Vegas Advice

Every year new people come to Las Vegas for the triumvirate of conferences, Blackhat, Defcon, and BSidesLV, better known as hacker summer camp. If you've never been, it can be an intimidating experience. To help those who might be interested in some suggestions, I've compiled the list below from my own experience (starting with Defcon 13).

Think about what you want to get out of it. BH and DC are BIG. You can easily spend the entire time just wandering. You'll learn a lot about the conferences, but not necessarily security. Plan half a day to walk around and just see things, but have a better plan after that. Pick a few talks to go to (and wait in line for). Pick a village to sit in all day (I'm partial to BSidesLV Ground Truth as I help run it). Schedule to meet people (something I do a lot).
Thursday is a down day. The schedule says there's stuff going on, but not a lot. DON'T plan to wander on Thursday. Nothing will be ready. Plan to do something on the schedule. Meet up with people. Volunteer. Visit the Grand Canyon. But don't just assume you'll have stuff to do.
Wear shorts. Most people will be in black t-shirts. You don't have to. A t-shirt, polo, or even short sleeve button-down is fine. Just don't do slacks and long sleeves. It's HOT.
Wear comfy shoes but don't stress over it. What's comfortable at home will be comfortable there. I wear a pair of dock shoes (Sperrys).
Don't rent a car unless you'll be driving out away from Las Vegas (to the Grand Canyon or such). Instead get a week ticket for the Deuce (double decker bus on the strip).
Don't worry about your electronics. I can't find documentation of a single breach related to a compromise at BH/DC. The BH/DC NOC operators have been doing it longer than those trying stuff and are generally safe. Still, patch all your stuff before going and try to use a VPN for all communication including mobile. (There will be lots of fake cell towers though the police have been cracking down on it a bit I think.)
I prefer to get a microwave and get some food, especially breakfast food, to eat in my hotel room. Food tends to be a huge portion of the cost of going and eating a bagel and some fruit and yogurt in your room for breakfast can help keep you grounded.
Speaking of being grounded, Las Vegas is a city of haves and have nots. You'll be living the good life, pampered by vendors, etc. Consider giving to those who don't have by volunteering at or donating to the Las Vegas Rescue Mission (https://vegasrescue.org/) or such.
Speaking of parties, go to one, but most are going to be either loud, over-crowded, and obnoxious or hard to get into and pretentious. (There are a very few that facilitate socializing like the BSides Las Vegas pool party.) Better though to go to bed early and try and have breakfast with new people each day. I generally follow Groucho Marx's rule for parties.
Go to some talks. Lots of people put a lot of work in to talk about lots of things. And not just the big showy talks. Those tend to be spectacle. Instead find lesser known people talking about their passion. And plan ahead to get in; talks have waiting lines that can be LONG, especially at Defcon.
And see a show or two. Go to the day-of discount booth and get tickets to some big show (Every casino has one) but also to the little lounge shows (Burlesque, Hypnotist, Comedy, etc). Ask the hotel what smaller shows they have and what others are around.
Don't bother gambling. Your time around many of the best security professionals in the world is limited. Don't waste it on throwing your money away. You can do that any time.
Don't plan to go back to your hotel room. Put everything you need for the day in a bag and go (water, snacks, clothes, batteries, etc). That includes electronics, extra power, water, and clothes if changing for the evening (whether an extra t-shirt to replace your sweaty one or your slacks for a nice evening out). It can take you an hour to get back to your hotel and back out again and you don't want to waste that.
Take one set of nice clothes (business casual, maybe a tie and jacket) in case you want to go somewhere nice one night. Make SURE to bring close-toed shoes. Some nice restaurants will refuse you in sandals. (Goes for women too.)
Bring extra power. The wireless environment is FLOODED. It will DRAIN all your devices. I can drain the battery in every device I bring 2-3 times a day. USB batteries are a MUST and if you don't need the wifi on on your device, just leave it off.
Read this blog: How to Converse Better in Infosec and this one: How to Handle Being Questioned on asking & receiving questions.
Bring a big, boxy suitcase so if you find cool stuff you can bring it back. (I've flown servers back before.)
Remember that blocks in Las Vegas are about a mile. Don't look at Google Maps and think "it's only one block".
If you see someone you recognize in infosec (a speaker you look up to, a company CEO, etc), walk up and say "Hi. I'm <your name>. I love your work. I'm curious about what you're interested in these days." If they excuse themselves, that's fine. They may be in between things. (I've heard of people taking an hour or more to get from the hotel lobby to their room because they meet so many people that know them along the way.) If they mumble something, that's ok. After talks particularly speakers are worn out mentally. If they tell you off, that's ok. Some people are jerks. But none of those things cost you anything and the potential for a good conversation is HUGE.
If you see someone you _don't_ recognize, say "Hi. I'm <your name>. What brings you here?" Again, they could not talk to you for any number of reasons, but I have met all sorts of super interesting people just being willing to meet with whoever is willing to meet with me.
Lots of people like badges. Some are super cool. I'll be honest, all my old badges, electronic or not, are hanging in my closet taking up room I need for other things. If you want a fancy Defcon badge, get a badge early as they tend to run out and then hand out paper. If I get a fancy badge and they run out, I tend to trade it to someone who's there for the first time who doesn't have one. I've got enough badges and your first Defcon badge is special.
The minimum rule is 1 shower, 2 meals, and 3 hours of sleep. Personally, I get a full night's sleep, I eat all my meals, and of course shower and use deodorant.

I'm sure there's much more I'm forgetting. I'll update it if I think of anything else.

Also, you can search Twitter for #gooddefconadvice (or #baddefconadvice) but take it with a grain of salt.

Wednesday, April 25, 2018

Presentation timing like a BOSS

Introduction

This year as I prepared for my RSA talk, Building a Data-Driven Security Strategy, I decided to do something slightly different. I modeled my timing practice after video game speedrunners. Ultimately it was a good experience that I plan to repeat. Here's the story.

What is a speedrun?

One thing I do to relax is watch video game speedruns. This is when people try and complete a video game as quickly as possible. (It’s so competitive that in some games improvements in records are measured in terms of frames, and some players spend months or even years, playing hundreds of thousands of attempts, to try and beat a record.)

One thing they all have in common is they use software to measure how long the attempt (known as a run) takes. Most break the runs down into sections so they can see how well they are doing at various parts of the game. To do this, they use timing software which measures their time per section and overall time. Additionally, each run is individually stored and their current run is compared to previous runs.

Speedrunning for presenting

This struck me as very similar to what we do for presentations, and so for my presentation, I decided to use a popular timer program, LiveSplit (specifically LiveSplit One), to measure how well I did for each practice run of my presentation. Basically, every time I practiced my presentation, I opened the timer program and at each section transition, I clicked it. While the practice run was going, the software would indicate (by color and number) if I was getting close to my comparison time (the average time for that section). Each individual run was then saved in a LiveSplit XML file (.lss). I’ve attached mine for anyone that wants to play with it here.
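The comparison the timer makes during a run can be sketched in a few lines. This is a Python illustration of the idea only, not how the timer program computes it internally; the section times are made-up numbers in seconds and `compare_run` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical practice history: seconds spent in each section per run.
history = {
    "introduction": [300, 280, 290],
    "define strategy": [620, 600, 580],
    "apply": [500, 470, 440],
}


def compare_run(history, current):
    """Return {section: seconds over (+) or under (-) the historical mean}."""
    return {
        section: current[section] - mean(times)
        for section, times in history.items()
        if section in current
    }


current_run = {"introduction": 275, "define strategy": 610, "apply": 450}
deltas = compare_run(history, current_run)
for section, delta in deltas.items():
    status = "behind" if delta > 0 else "ahead"
    print(f"{section}: {delta:+.0f}s ({status})")
```

A positive delta means the section is running longer than its average, which is the cue to tighten up before the next transition.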

Figure 1

The initial sections analysis (Figure 1) showed some somewhat dirty data. First, there probably shouldn’t be a run -1. Also, runs 4 and 5 look to not be complete. So we’ll limit our analysis to runs 7 to 20. For some reason, the introduction section in runs 9, 14, and 18 seems to be missing, so we’ll eliminate those times as well. It’s worth noting that incomplete runs are common in the speedrunning world and so some runs where no times are saved will be missing and other runs where the practice was cut short will exist as well. It’s also relevant that ‘apply’ and ‘conclusion’ were really mostly the same section and so I normally let ‘apply’s split run until the end of the presentation, making ‘conclusion’ rarely occur at all.
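The cleaning described above amounts to a range filter plus dropping missing values. Here is a minimal sketch in Python, assuming the section times have already been read into a dictionary keyed by run number; the `clean_runs` name, the data layout, and the sample values are hypothetical.

```python
def clean_runs(section_times, first_run=7, last_run=20):
    """Keep runs in [first_run, last_run] and drop missing section times."""
    cleaned = {}
    for run_id, sections in section_times.items():
        if not first_run <= run_id <= last_run:
            continue
        kept = {name: secs for name, secs in sections.items() if secs is not None}
        if kept:
            cleaned[run_id] = kept
    return cleaned


# Hypothetical sample: seconds per section, None where no time was saved.
sample = {
    -1: {"introduction": 10, "apply": None},
    4: {"introduction": 310, "apply": None},
    8: {"introduction": 295, "apply": 480},
    9: {"introduction": None, "apply": 470},
}
print(clean_runs(sample))
```

Dropping only the missing section, rather than the whole run, keeps the other sections of runs such as 9 available for the per-section figures.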

Figures 2 and 3 look much better. A few things start popping out. First, I did about 20 practice runs, though the first several were incomplete. Looking at Figure 2, we see that some sections like ‘introduction VMOS and Swot’, ‘apply’, and ‘data driven strategy’ decrease throughout the practice. On the other hand, ‘example strategies’ and ‘example walkthrough’ increased at the expense of ‘define strategy’. This was due to pulling some examples and extra conversation out of the former as feedback I got suggested I should spend more time on the latter. Ultimately it looks like a reduction of about 5 minutes from the first runs to the final presentation on stage (run 20).

The file also provides the overall time for each run. Figure 4 gives a quick look. We can compare it to Figure 3 and see it’s about what we expect: a slight decline from 45 to 40 minutes in runtime from run 7ish to run 20.

Figure 4

Figure 5

We can also look at actual practice days instead of run numbers. Figure 5 tells an interesting story. I did some rough tests of the talk back in December. This was when I first put the slides together in what would be their final form. Once I had that draft together, I didn’t run it through January and February (as I worked on my part of the DBIR). After my DBIR responsibilities started to slow and the RSA slide submission deadline started to come up, I picked it back up again. The talk was running a little slow at the beginning of March. However, through intermittent practice and refinement, I had it down where I wanted it (41-43 minutes) in late March and early April. I had to put off testing it again during the week before and week of the DBIR launch. After the DBIR launch I picked it up and practiced it every day while at RSA. It was running a little slow (2 runs over 43 minutes) at the conference, but the last run the morning of was right at 40 minutes, with the actual presentation coming in a little faster than I wanted at 39 minutes.

We can take the same look at dates, but by section. Figures 6 and 7 provide the story. It’s not much of a difference, but it does put into perspective the larger changes in the earlier runs as substantially earlier in the development process of the talk.

Conclusion

Ultimately I find this very helpful and suspect others will as well. I regularly get questions such as “how many times do you practice your talk?” or “how long does it take you to create one?” Granted it’s a sample size of 1, but it helps give an idea of how the presentation truly evolved. I can also see how the changes I made as I refined the presentation affected the final presentation. Hopefully a few others will give this a try and post their data to compare!

Oh, and for those adventurous types, you can see the basic analysis I did in my jupyter notebook here.

Monday, February 12, 2018

The Good, The Bad, and the Lucky - (Why improving security may not decrease your risk)

Introduction

The general belief is that improving security is good. Traditionally, we assume that for every increment ‘x’ you improve security, you get an incremental decrease ‘y’ in risk. (See the orange 'Traditional' line below.) I suspect that might not be the case. I made the argument in THIS blog that our current risk markers are unproven and likely incorrect. Now I’m suggesting that even if you were able to accurately measure risk, it might not matter, as what you do might not actually change anything. Instead, the relationship may be more like the blue 'Proposed' line in the figure below. Let me explain it and why it matters...

Threats

I think we can break attacks into two groups:

Already scaled, automated attacks.
Everything else (including attacks that could be automated or even are automated, but not scaled.)

Type-1 is mostly single-step attacks. Attackers invest in a single action and then immediately get the return on that investment. These could be ransomware, DoS, automated CMS exploitation, or phishing leading to stolen credentials and then to compromised bank accounts.

Type-2 includes most of what we traditionally think of as hacking: multi-step attacks that include getting a foothold, pivoting internally, and exfiltrating information. Not-petya attacks would fall in here, as would the types of hacks most pen testers simulate.

Security Sections

Section one in the above figure is driven by risk from type-1 attacks. If you are vulnerable to these, you are just waiting your turn to be breached. Sections two and three relate to type-2 attacks.

In section two, your defenses are good enough to stop type-1 attacks, but are likely not good enough to stop attackers willing and able to execute type-2 attacks. This is because, having an ability to execute a multi-step attack flexibly, the threat here has many different paths to choose from. If you either aren't studying all of your attack paths in context with each other, or are simply not able to handle everything thrown at you, the attacker gets in regardless of what security you do have. As such, the primary driver of risk is attacker selection (mostly unrelated to your security).

Once your security reaches section three, you start to have the path analysis and operational abilities to stave off attacks that can flexibly take different paths. As such, the more you improve, the more you see your risk go down (if you can measure it).
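One way to picture the 'Proposed' relationship is as a piecewise function of security. This is only a sketch: the section boundaries, the plateau height, and the straight-line shape inside sections one and three are made-up values for illustration, not measurements.

```python
def proposed_risk(security, s1_end=0.3, s2_end=0.7, plateau=0.4):
    """Illustrative risk for a security level in [0, 1].

    s1_end, s2_end, and plateau are hypothetical parameters.
    """
    if security < s1_end:
        # Section one: exposed to type-1 attacks; risk falls toward the plateau.
        return 1.0 - (1.0 - plateau) * (security / s1_end)
    if security < s2_end:
        # Section two: risk is driven by attacker selection, so it stays flat.
        return plateau
    # Section three: improvements reduce risk again, reaching 0 at security = 1.
    return plateau * (1.0 - security) / (1.0 - s2_end)


for level in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(level, round(proposed_risk(level), 3))
```

The flat middle segment is the point of the argument: between the two boundaries, moving along the security axis does not move the risk.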

Risk vs Security

The first takeaway is that if you are in section one, you are a sitting duck. Automated attacks will find you on the internet and compromise you. Imagine the attackers with a big to-do list of potential victims and some rate at which they can compromise them. You are on that list somewhere, just waiting your turn. You need to get out of section one.

The second takeaway is that if you are better than the first section, it doesn’t really matter what you do. Increasing your security doesn’t really do anything until you get to a pretty darn mature point. All the actors looking for a quick ROI are going to be focused on section one. There are so many victims in section one that to target section two they would literally have to stop attacking someone easier. Even as type-2 attacks become commoditized, there’s absolutely no incentive to expand until either all of section one victims are exploited or the type-2 attack becomes a higher Return on Investment (ROI) than an existing type-1 attack. Here, because the attacks are type-2 attacks, the biggest predictor of if you will be breached is if you are targeted.

That is, until you get to section three. In this section, security has started to improve to the point where even if you are targeted, your security plays a significant role in whether you are breached or not. These are the organizations that 'get it' when it comes to information security. The reality is most organizations probably are not able to get here, even if they try. The investment necessary in security operations, advanced risk modeling, and corporate culture is simply outside the reach of most organizations. Simply buying tools is not going to get you here. On the other hand, if you're going to try to get here, don't stop half-way. Otherwise you've wasted all investment since you left section one.

There is another scenario where someone not engaged in section one decides to go after the section two pool of victims with an automated attack. (Something like not-petya would work.) If this was common, it'd be a different story. However, there's no incentive for a large number of attackers to do this (as the cost is relatively fixed, and multiple attackers decreases the available victims for each). In this case, the automated attack ends up being global news because it's so wide-spread. As such, rules are created, triage is executed, and, in general, the attacker would have to continue significant investment to maintain the attack, decreasing the ROI. Given the easy ROI in section one, the sheer economics will likely prevent this kind of attack in section two.

Testing

Without testing, it's relatively hard to know which section you are in. Pen testing might tell you how well you do in sections two and three, but knowing you lose against pen testers doesn't even tell you if you are out of section one. Instead, you need security unit testing to replicate type-1 attacks and verify that your defenses mitigate the risk.

If you never beat the pen testers, you're not in section three. However, once you start to be able to handle them, it's important to measure your operations more granularly. Are you getting better in section three or slipping back towards section two? That means measuring how quickly operations catches threats and what percent of threats they catch. Again, automated simulation of type-2 attacks can help you capture these metrics.
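As a rough sketch of those two metrics, assume each simulated attack is logged as the number of minutes until operations detected it, or `None` if it was never caught. The numbers below are invented.

```python
from statistics import mean

# Hypothetical simulation results: minutes until detection, None if missed.
simulations = [12.0, 45.0, None, 8.0, None, 30.0]


def detection_metrics(results):
    """Return (fraction of attacks caught, mean minutes to detect)."""
    caught = [minutes for minutes in results if minutes is not None]
    rate = len(caught) / len(results)
    mean_time = mean(caught) if caught else None
    return rate, mean_time


rate, mean_time = detection_metrics(simulations)
print(f"caught {rate:.0%} of simulated attacks, mean time to detect {mean_time} minutes")
```

Tracking both numbers over time shows whether operations is improving within section three or slipping back toward section two.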

Conclusion

Most organizations should be asking themselves "Am I in section one and, if so, how do I get out?" Even if you aren't in section 1, commoditization of new attacks may put you there in the near future. (See phishing, botnets, credential stuffing, and ransomware as examples over the last several years.) You need to continue to invest in security to remain ahead of section one.

On the other hand, you may just have to accept being in section two. You can walk into an organization and, in a few minutes, know whether they 'get it' or not when it comes to security. Many organizations will simply never 'get it'. That's ok, it just means you're not going to make it to section three, so best not to waste investment on trying. Better to spend it to stay out of section one.

However, for the elite few organizations that do 'get it', section three takes work. You need to have staff that can close their eyes and see your attack surface with all of your risks in context. And you need a top-tier security operations team. Investment in projects that take three years to fund and another two to implement may keep you out of section one, but it's never going to get you into section three. To do that you need to adapt quickly to the adversary and meet them toe-to-toe when they arrive. That requires secops.

Friday, January 26, 2018

CFP Review Ratings

Introduction

We recently completed the BSides Nashville CFP. (Thank you all who submitted. Accepts and rejects will be out shortly.) We had 53 talks for roughly 15 slots, so it was a tough job. I sympathize with the conferences that have hundreds or thousands of submissions.

CFP Scoring

Our CFP tool provides the ability to rate talks from 1 to 5 on both content and applicability. However, I've never been happy with how it condenses this down to a single number across all ratings.

Our best guess is it simply averages all the values of both types together. Such ratings would look like this:

(We've removed the titles as this blog is not meant to reflect on any specific talk.)

This gives us _a_ number (the same way a physician friend of mine used to say ear-thermometers give you _a_ temperature) but is it a useful one?

First, let's use the median instead of the mean:

The nice thing about the median is it limits the effect of ratings that are way out of line. In many of our talks, one person dislikes it for some reason and gives it a substantially lower rating than everyone else. We see talks like 13 shoot up significantly. It also can cause drops such as talk 51.
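A small example shows the effect. The ratings are invented, not taken from any real talk: five reviewers like the talk and one rates it far lower.

```python
from statistics import mean, median

# Hypothetical ratings for one talk, with a single outlier.
ratings = [5, 5, 5, 4, 5, 1]

print("mean:  ", round(mean(ratings), 2))
print("median:", median(ratings))
```

The single 1 pulls the mean down by most of a point, while the median stays with the majority of reviewers.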

Scoring with a little R

But what really would be helpful is to see _all_ the ratings:

Here we can see all of the ratings broken out. It's more complex but it gives us a better idea of what is actually happening for any one talk. The green dot is the median for all ratings combined. The red dots are the talks' median value. And the grey dots are individual ratings.

We can look at 13 and see it scored basically all 5's except for one 4 in applicability, one 'average' rating, and one below-average rating, bringing the median up to 5-5. When we look at 51, we see how it had a few slightly-below-average ratings, several below-average ratings on content, and several below-average ratings on both content and applicability. We also get to compare to the mean of all talks (which is actually 4-4) rather than assuming 3-3 is average for a talk.

One I find particularly interesting is 29. It scored average on applicability, but its content score, which we would want to be consistently high, is spread from 1 to 4. Not a good sign. In the first figure, it scored a 3.2 (above average if we assume 3 is average since no average is shown). In the median figure, it is 3. But in this view we can see there are significant content concerns about this talk.

Conclusion

Ultimately, we used this chart to quickly identify the talks that were above or below the mean for both content and applicability. This let us focus our time on the talks that were near the middle and gave us additional information, in addition to the speaker's proposal and our comments, to make our decision on. If you'd like to look at the code for the figures, you can see the jupyter notebook HERE.

Future Work

In the future I could see boiling this down to a few basic scores: content_percentile, applicability_percentile, content_score range, and applicability_score range as a quick way to automate initial scoring. We could easily write heuristics indicating that we want content ratings to meet a certain threshold and be tightly grouped, as well as set a minimum threshold for applicability. This would let us more quickly zero in on the talks we want (and might help larger conferences as well).
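Such heuristics could be sketched roughly as follows. The ratings, the thresholds, and the choice to rank talks by their median rating are all assumptions for illustration.

```python
from statistics import median

# Hypothetical ratings: talk id -> (content ratings, applicability ratings).
talks = {
    13: ([5, 5, 5, 4], [5, 4, 5, 5]),
    29: ([1, 4, 3, 2], [3, 3, 3, 3]),
    51: ([3, 2, 3, 3], [2, 3, 2, 3]),
}


def percentile_rank(value, population):
    """Percentage of the population at or below value."""
    return 100.0 * sum(1 for v in population if v <= value) / len(population)


def score(talks):
    """Compute the four summary scores for every talk."""
    content = {t: median(c) for t, (c, a) in talks.items()}
    applic = {t: median(a) for t, (c, a) in talks.items()}
    return {
        t: {
            "content_percentile": percentile_rank(content[t], list(content.values())),
            "applicability_percentile": percentile_rank(applic[t], list(applic.values())),
            "content_range": max(c) - min(c),
            "applicability_range": max(a) - min(a),
        }
        for t, (c, a) in talks.items()
    }


def shortlist(rows, min_content_pct=60, max_content_range=1, min_applic_pct=50):
    """Talks with high, tightly grouped content and acceptable applicability."""
    return sorted(
        t for t, r in rows.items()
        if r["content_percentile"] >= min_content_pct
        and r["content_range"] <= max_content_range
        and r["applicability_percentile"] >= min_applic_pct
    )


rows = score(talks)
print(shortlist(rows))
```

Talks that fail only on the range check are the ones worth a closer manual look, since a wide spread means the reviewers disagreed.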

Sunday, January 7, 2018

Smaller Graphs for Easier Viewing

Introduction

As I suggested in my previous blog, visualizing graphs is hard. In the previous blog I took the approach of using a few visual tricks to display graph information in a roughly sequential manner. Another option is to convert the graph to a hierarchical display. This is easy if you have a tree, as hierarchical clustering or maximal entropy trees will do.

Maximal Entropy Graphs

However, our data is rarely hierarchical, and so I've attempted to extend maximal entropy trees to graphs. First, it starts with the assumption that some type of weight exists for the nodes. However, this can simply be uniform across all nodes. This weight is effectively the amount of information the node contains. As in the last blog, this could be scored relative to a specific node in the graph or about any other way. It then combines nodes along edges, attempting to minimize the amount of information contained in any aggregated node. It continues this approach until it gets to the desired number of nodes. However, it keeps a history of every change so that any node can be de-aggregated.

You can find the code in this GitHub Gist. You can then try it out in this Jupyter notebook.

Ultimately, it takes a graph like this:

and produces one that looks like this:

Each node still contains all the information about the nodes and edges it aggregates. This allows an application to dig down into a node as necessary.
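To make the idea concrete, here is a small greedy sketch of aggregating along edges. It is not the code from the gist: picking the edge with the smallest combined weight at each step is a simplification chosen for illustration, and the function and variable names are hypothetical.

```python
def aggregate(weights, edges, target):
    """Greedily merge adjacent nodes until `target` nodes remain.

    weights: {node name: information weight}; edges: iterable of (u, v) pairs.
    Returns (weights, edges, members, history).
    """
    weights = dict(weights)
    edges = {frozenset(e) for e in edges if e[0] != e[1]}
    members = {n: {n} for n in weights}  # original nodes inside each node
    history = []                         # one (merged, u, v) entry per merge
    while len(weights) > target and edges:
        # Pick the edge whose merged node would hold the least information.
        u, v = min(
            (sorted(e) for e in edges),
            key=lambda pair: (weights[pair[0]] + weights[pair[1]], pair),
        )
        merged = f"({u}+{v})"
        weights[merged] = weights.pop(u) + weights.pop(v)
        members[merged] = members.pop(u) | members.pop(v)
        # Re-point edges that touched u or v at the merged node.
        rewired = set()
        for e in edges:
            e2 = frozenset(merged if n in (u, v) else n for n in e)
            if len(e2) == 2:  # the collapsed edge itself is dropped
                rewired.add(e2)
        edges = rewired
        history.append((merged, u, v))
    return weights, edges, members, history


# A four-node path a-b-c-d, reduced to two aggregated nodes.
weights, edges, members, history = aggregate(
    {"a": 1, "b": 1, "c": 3, "d": 1},
    [("a", "b"), ("b", "c"), ("c", "d")],
    target=2,
)
print(weights)
print(history)
```

Replaying `history` in reverse is what would allow an aggregated node to be split back into the nodes it absorbed.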

Future Work

Obviously there's a lot to do. This is less a product in and of itself than a piece for making other graph tools more useful. As such, I should probably wrap the algorithm in a visualization application that would allow calculating per-node scores as well as diving in and out of the sub-graphs contained by each node.

Also, a method for generating aggregate summaries of the information in each node would be helpful. For example, if this is a maltego-type graph and a cluster of IPs and Hosts, it may make sense to name it an IP-Host node with a number of IP-Host edges included. Alternately, if a node aggregates a path from one point to another through several intermediaries, it may make sense to note the start and endpoint, shortest path length, and intermediary nodes. I suspect it will take multiple attempts to come up with a good name generation algorithm and that it may be context-specific.

Conclusion

In conclusion, this is another way of making otherwise illegible graphs readily consumable. Graphs are incredibly powerful in many contexts including information security. However methods such as this are necessary to unlock their potential.

Friday, January 5, 2018

Visualizing Graph Data in 3D

Introduction

One thing that's interested me for a while is how to visualize graphs. There are a lot of problems with it I'll go into below. Another is if there is a way to use 3D (and hopefully AR or VR) to improve visualization. My gut tells me 'yes', however there's a _lot_ of data telling me 'no'.

What's so hard about visualizing graphs?

Nice graphs look like this:

This one is nicely laid out and well labeled.

However, once you get over a few dozen nodes, say 60 or so, it gets a LOT harder to make them all look nice, even with manual layout. From there you go to large graphs:

In this case, you can't tell anything about the individual nodes and edges. Instead you need to be able to look at a graph laid out in a certain way algorithmically and understand it. (This one is actually a nice graph as the central cluster is highly interconnected but not too dense. I suspect most of the outlying clusters are hierarchical in nature leading to the heavily interconnected central cluster.)

However, what most people want is to look at a graph of arbitrary size and understand things about the individual nodes. When you do that you get something like this:

Labels overlapping labels, clusters of nodes where relationships are hidden. Almost completely unusable. There are some highly interconnected graph structures that look like this no matter how much you try to lay them out nicely.

It is, ultimately, extremely hard to get what people want from graph visualizations. You can get it in special cases and with manual work per graph, but there is no general solution.

What's so hard about 3D data visualization?

In theory, it seems like 3D should make visualization better. It adds an entire dimension of data! The reality is, however, we consume data in 2D. Even in the physical world, a stack of 3D bars would be mostly useless. The 3rd dimension tells us more about the shape of the first layer of objects in front of us. It does not tell us anything about the next layer. As such, visualizations like this are a 2D bar chart with clutter behind them:

Even when the data is not overlapping, placing the data in three dimensions is fundamentally difficult. In the following example, it's almost impossible to tell which points have what values on what axes:

Granted, there are ways to help improve this (mapping points to the bounding box planes, drawing lines directly down from the point to the x-y plane, etc.), but in general you would only do that if you _really_ needed that 3rd dimension (and didn't have a 4th). Otherwise you might as well use PCA or such to project into a 2D plot. Even a plot where the 3rd dimension provides some quick and easy insight can practically be projected to a 2D heatmap:

Do you really need to visualize a graph?

Many times when people want to visualize a graph, what they really want is to visualize the data in the graph in the context of the edges. Commonly, graph data can be subsetted with some type of graph traversal (e.g. [[select ip == xxx.xxx.xxx.xxx]] -> [[follow all domains]] <- [[return all ips]]) and the data returned in a tabular format. This is usually the best approach if you are past a few dozen nodes. Even if long, this data can easily be interpreted as many types of figures (bar, line, point charts, heatmaps, etc). Seeing that juxtaposition, graphs being visualized because they were graphs when the data desired was really tabular, heavily influenced how I approached the problem.
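As a sketch of that kind of traversal, the example below starts at one IP, follows it to its domains, and returns every IP hosting those domains as plain rows. The data and the function name are hypothetical, and the graph is stored as a simple dictionary rather than in a graph database.

```python
# Hypothetical graph: each IP points at the domains it hosts.
ip_to_domains = {
    "192.0.2.1": ["example.com", "example.org"],
    "192.0.2.2": ["example.org"],
    "192.0.2.3": ["example.net"],
}


def related_ips(start_ip, ip_to_domains):
    """select ip -> follow all domains <- return all ips, as table rows."""
    domains = set(ip_to_domains.get(start_ip, []))
    rows = []
    for ip, hosted in sorted(ip_to_domains.items()):
        for domain in sorted(domains.intersection(hosted)):
            rows.append({"domain": domain, "ip": ip})
    return rows


rows = related_ips("192.0.2.1", ip_to_domains)
for row in rows:
    print(row)
```

The result is a list of records that can go straight into a data frame and any ordinary chart.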

My Attempt

First, I'll preface this by saying this is probably a bad data visualization. I have few reasons to believe it's a good one. Also, it is extremely rough; nothing more than a proof of concept. Still, I think it may hold promise.

The visualization is a ring of tiles. Each tile can be considered to be a node. We'll assume each node has a key, a value, and a score. There's no reason there couldn't be more or less data per node, but the score is important. The score is how relevant a given node is to a specified node in the graph. This data is canned, but in an actual implementation, you might search for a node representing a domain or actor. Each other node in the graph would then be scored by relevance to that initial node. If you would like ideas on how to do this, consider my VERUM talk at bsidesLV 2015. For now we could say it was simply the shortest path distance.
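Scoring by shortest path distance can be sketched with a breadth-first search. The graph below is a made-up example, with key names chosen to echo the demo.

```python
from collections import deque


def distance_scores(adjacency, start):
    """Shortest path distance (in hops) from start to every reachable node."""
    scores = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in scores:
                scores[neighbor] = scores[node] + 1
                queue.append(neighbor)
    return scores


# Hypothetical undirected graph stored as adjacency lists.
graph = {
    "key1": ["key2", "key3"],
    "key2": ["key1", "key4"],
    "key3": ["key1"],
    "key4": ["key2"],
}

# Most relevant (closest) nodes first.
ordered = sorted(distance_scores(graph, "key1").items(), key=lambda kv: kv[1])
print(ordered)
```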

One problem with graphs is node text. It tends to not fit on a node (which is normally drawn as a circle). In this implementation, the text scrolls across the node rectangle allowing quick identification of the information and detailed consumption of the data by watching the node for a few seconds. All without overlapping the other nodes in an illegible way.

Another problem is simply having too many nodes on screen at one time. This is solved by only having a few nodes clearly visible at any given time (say the front 3x3 grid). This leads to the question of how to access the rest of the data. The answer is simply by spinning the cylinder. The farther you get from node one (or key1 in the example), the less relevant the data. In this way, the most relevant data is also presented first.

You might be asking how much data this can provide. A quick look says there are only 12 columns in the cylinder resulting in 36 nodes, less than even the 60 we discussed above. Here we use a little trick. As nodes cross the centerline on the back side, they are actually replaced. This is kind of like a dry-cleaning shop where you can see the front of the clothing rack, but it in fact extends way back into the store. In this case, the rack extends as long as we need it to, always populated in both directions.
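The replacement trick can be sketched as index arithmetic. This is not the demo's code. It assumes 12 physical columns of 3 tiles, a non-empty node list sorted by relevance, and that the list wraps around so the rack is populated in both directions.

```python
COLUMNS = 12  # physical columns around the cylinder
ROWS = 3      # tiles per column


def data_column(physical, front, columns=COLUMNS):
    """Data column shown in a physical column when data column `front` faces the camera."""
    half = columns // 2
    # Offset of this physical column from the front, in the range [-half, half).
    offset = (physical - front + half) % columns - half
    return front + offset


def visible_nodes(front, nodes, columns=COLUMNS, rows=ROWS):
    """Map each physical column to the nodes it currently displays."""
    total_columns = -(-len(nodes) // rows)  # ceiling division
    shown = {}
    for physical in range(columns):
        col = data_column(physical, front, columns) % total_columns  # wrap around
        shown[physical] = nodes[col * rows:(col + 1) * rows]
    return shown


nodes = [f"key{i}" for i in range(1, 100)]  # 99 hypothetical nodes, most relevant first
start_view = visible_nodes(0, nodes)
print(start_view[0])   # the column facing the camera
print(start_view[11])  # one column to the other side wraps to the end of the list
```

A physical column changes its content only when its offset jumps from one end of the window to the other, which happens at the back of the cylinder, out of sight.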

Demo

I highly recommend you try out the interactive demo above. It is not pretty. The data is a static json file, however that is just for simplicity.

Future Work

Obviously there are many things that can be done to improve it:

A search box can be added to the UI and a full back end API to populate the visualization from a graph.
Color can be added to identify something about the nodes such as their score relative to the search node or topic.
The spacing of the plane objects and camera can be adjusted.
Multiple cylinders could exist in a single space at the same time representing different searches.
The nodes could be interactive.
The visualizations could be located in VR or AR.
Nodes could be selected from a visualization and the sub-graph returned in a more manageable size (see the 60ish node limit above). These subgraphs could be stored as artifacts to come back to later.
The camera could be within the cylinder rather than outside of it.

Conclusion

I'll end the same way I began. I have no reason to believe this is good. It is, at least, an attempt to address issues in graph visualization. I look forward to improving on it in the future.

Building a SIEM Dashboard with R, Jupyter, and Logstash/Elastic Search

Motivation:

I am disappointed with the dashboards offered by today's SIEMs. SIEM dashboards offer limited data manipulation through immature, proprietary query languages and limited visualization options. Additionally, they tend to have proprietary data stores that limit expansion and evolution to what the vendor supports. Maybe I'm spoiled by working in R and RStudio for my analysis, but I think we can do better.

Plan:

This blog is mainly going to be technical steps rather than a narrative. It is also not the easiest solution. The easiest solution would be to already have the ELK stack, install nteract.io, R, the R libraries, and the R jupyter kernel on your favorite desktop, and connect. That said, I'm going to walk through the more detailed approach below. You can view the example notebook HERE. Make sure to scroll down to the bottom where the figures are, as it has a few long lists of fields.

Elastic search is becoming more common in security, (e.g. 1, e.g. 2). Combine that with the elastic package for R, and that should bring all of the great R tools to our operational data. Certainly we can create regular reports using Rmarkdown, but can we create a dashboard? Turns out with Jupyter you can! To test it out, I decided to stand up a Security Onion VM, install everything needed, and build a basic dashboard to demonstrate the concept.

Process:

Install security onion:

Security onion has an EXCELLENT install process. Simply follow that.

Install R:

Added 'deb https://mirrors.nics.utk.edu/cran/bin/linux/ubuntu trusty/' to packages list

sudo apt-get install r-base

sudo apt-get install r-base-dev

— based off r-project.org

Install R-studio (not really necessary but not a bad idea)

Downloaded r-studio package from R-studio and installed

sudo apt-get install libjpeg62

sudo dpkg -i package.deb

Install Jupyter:

(https://www.digitalocean.com/community/tutorials/how-to-set-up-a-jupyter-notebook-to-run-ipython-on-ubuntu-16-04)

sudo apt-get install python-pip

sudo pip install --upgrade pip (required to avoid errors)

sudo -H pip install jupyter

Install Jupyterlab: (probably not necessary)

sudo -H pip install jupyterlab

sudo jupyter serverextension enable --py jupyterlab --sys-prefix

Install Jupyter dashboard

(https://github.com/jupyter/dashboards)

sudo -H pip install jupyter_dashboards

sudo -H pip install --upgrade six

sudo jupyter dashboards quick-setup --sys-prefix

Install R packages & Jupyter R kernel:

sudo apt-get install libcurl4-openssl-dev

sudo apt-get install libxml2-dev

Start R

install.packages("devtools") # (to install other stuff)

install.packages("elastic") # talk to elastic search

install.packages("tidyverse") # makes R easier

install.packages("lubridate") # helps with working with dates

install.packages("ggthemes") # has good discrete color palettes

install.packages("viridis") # has great continuous colors

# https://github.com/IRkernel/IRkernel

devtools::install_github('IRkernel/IRkernel')

# or devtools::install_local('IRkernel-master.tar.gz')

IRkernel::installspec() # to register the kernel in the current R installation

quit() # leave. Answer ’n’ to the question “save workspace?”

Install nteract: (Not necessary)

(nteract.io)

Download the package

sudo apt-get install libappindicator1 libdbusmenu-gtk4 libindicator7

sudo dpkg -i nteract_0.2.0_amd64.deb

Set up the notebook:

Rather than type this all out, you can download an example notebook. In case you don't have an ES server populated with data, you can download this R data file which is a day of windows and linux server logs queried from ES from a blue vs red CTF.

I created the notebook using nteract.io so it is in a single order. However, if you open it on the jupyter server, you can use the dashboards plugin to place the cells where you want them in a dashboard.

Results:

A lot of time spent compiling.

No need to download R/jupyter stuff on security onion if elastic search is remotely reachable.

Elastic search is not intuitive to query. Allowing people an 'easy mode' to generate queries would be significantly helpful. The `ES()` function in the notebook is an attempt to do so.

It would be nice to be able to mix interactive and dashboard cells.

This brings MUCH more power for both analysis _and_ visualization to the dashboard.

This brings portability and maintainability. (ipynb files can be opened anywhere that has the R/jupyter environment and can access elastic search. They can also be forked, version controlled, etc.)
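To illustrate the 'easy mode' idea from the list above, here is a sketch of a helper that turns a few simple arguments into an Elasticsearch search body. It is not the notebook's `ES()` function. The name `es_query`, its parameters, and the `event_type` field are hypothetical, and the sketch is written in Python rather than R. It only builds the request body and does not contact a server.

```python
import json


def es_query(terms=None, start=None, end=None, time_field="@timestamp", size=100):
    """Build an Elasticsearch search body from exact-match terms and a time window."""
    filters = [
        {"term": {field: value}} for field, value in sorted((terms or {}).items())
    ]
    if start or end:
        bounds = {}
        if start:
            bounds["gte"] = start
        if end:
            bounds["lte"] = end
        filters.append({"range": {time_field: bounds}})
    return {"size": size, "query": {"bool": {"filter": filters}}}


# Hypothetical field name; the time bounds use Elasticsearch date math.
body = es_query({"event_type": "sshd"}, start="now-1d", end="now")
print(json.dumps(body, indent=2))
```

Hiding the nested `bool` and `filter` structure behind keyword arguments is the whole trick; anything the helper cannot express can still be written as a raw query.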

Future Work:

Need a way to have cells refresh every few minutes, likely a jupyter notebook plugin.

Interactive figures require interactive plotting tools such as Vega. This would also bring the potential ability to stream data directly to the notebook. It may even solve the ability to auto-refresh.

Conclusion:

In conclusion, you really don't want to roll-your-own-SIEM. That said, if you already have ES (or another data store R can talk to) in your SIEM and want less lock-in/more analysis flexibility, R + Jupyter may be a fun way to get that extra little oomph. And hopefully in the future we'll see SIEM vendors supporting general data science tools (such as R or Python) in their query bars and figure grammars (ggplot, vega, vegalite) in their dashboards.