Information Security Analytics Blog

Common Attack Graph Schema (CAGS) 3.2

2021-08-28T14:29:00.011-07:00

It's been a while since I've updated CAGS. This is an initial post and may be modified to better fit with CAGS 2 later.

Revision: Schema updated to 3.2. See the previous 3.X schema(s) at the end of this post.

3.2 Schema

All property names must be stored as lower case
The graph must be a directed multigraph. It must be a combination of a causal bipartite multigraph with 'context, 'objects' (previously conditions, a subtype of context), and 'actions' (previously events) representing the two types of nodes and a knowledge simple graph defined in OWL used to describe the objects and actions.
Action node properties. All other properties should be defined through the knowledge graph.

type: "action" (required)
id: A URI including the graph prefix identifying the node (required)
name: The action that occured. This may be from a schema such as a VERIS action or ATT&CK technique, or may be an arbitrary string describing the action or event that took place. (required)
start_time: The time the atomic the node represents began to exist. Time should be in ISO 8601 combined date and time format (e.g. 2014-11-01T10:34Z). If no time is available, minutes since unix epoch (1/1/1970 Midnight UTC) should be used as a sequence number. (required)
finish_time: The time the atomic the node represents ceased to exist. Time should be in ISO 8601 combined date and time format (e.g. 2014-11-01T10:34Z) (optional but encouraged)
logic_operator: a function (including the language the function is defined in) that takes the state of parent objects to the node as arguments (pre-conditions) and returns the effect(s) on child objects to the node (effects). (A characteristic borrowed from formal planning.) This may be ladder logic, first order logic, higher level languages such as python, machine learning model, etc. The values accepted per pre-condition and produced per effect must be in the same set as values used for the object node state property. In practice this will often be the identity function. (For example if a parent object's state is 'compromised', after the action the child object's state will be compromised. If missing, is assumed to be the identity operator transfering the set of all state from precursor objects to affected objects.
succeeded: float from 0 (failed) to 1 (succeeded) or distribution representing the probability that action succeeded in its effects. Any effects which may be separable should be defined through a separate action. (optional)
confidence: float from 0 to 1 or distribution representing the confidence that the action succeeded. (optional)

Context node properties. All other properties should be defined through the knowledge graph. These definitions may take the from of an existing schema such as VERIS assets, the CARS data model objects, or other ontologies of objects defined through a knowledge graph.

type: "context" or "object" (required)
id: A URI including the graph prefix identifying the node (required)

Object node properties. Object nodes are a sub-type of context in that they may be instanced and have a 'state' which changes as actions are applied. Only object nodes may be part of the causal graph.

state: A property that may be used as a transient string representing the state of the object during a point in time representing the current state of the system. The sum of all object states is the state of the system. This may be as simple as "compromised", from an ontology such as VERIS attributes, the Confidentiality, Integrity, Availability triad, Bayesian or DIMFUI (Degradation, Interruption, Modification, Fabrication, Unauthorized Use, and Interception), or it may even be an arbitrary string.

Edge Properties:

source: the id of the source node. Object nodes may only have sources of action nodes and action nodes may only have sources of object nodes. All nodes part of the knowledge graph may only have sources within the knowledge graph or an object node. (required)
destination: the id of the destination node. Object nodes may only have destinations of action nodes and action nodes may only have destinations of object nodes. All nodes part of the knowledge graph may only have sources within the knowledge graph or an object node. (required)
type: Edges between actions and objects (in either direction) have a type from the set of states acceptable for the object node state property and must agree with the pre-conditions and effects of the action node involved's logic operator. All other edges are defined by the OWL knowledge schema. (required)

The acceptable edge types are: "precursor_of" (edge from an object to an action), "effect_of" (edge from an action to an object), "describe" (edge from an object or context to an object, context, or action).

id: A URI representing the edge. (optional)

It is intended that sets of nodes and edges in the graph can be joined to create a subgraph represented by a single node. The node must still obey all previous schema requirements.

Strengths

This schema builds on the 2.0 and 3.0 schemas in a few fundamental ways:

The use of knowledge graphs to provide properties simplifies defining arbitrary sets of properties. This is incredibly important as different users will want to represent different properties at different levels of detail. In Figure 1, Object 3 is a process linked to it's higher level representations. However the dotted lines show how Objects 5-8 could be used if the goal was a higher level representation of the incident.

Figure 1 - Knowledge graph used to represent different levels of description.

The use of a logic operator allows for arbitrary logic in progressing through the graph without creating complex graph structures to try and define the logic. This effectively replaces the Bayesian Conditional Probability Tables in version 2.
The action-object bipartite graph provides the ability to represent complex relationships (as a bipartite graph can represent hypergraphs and simplicial complexes, or dendrites) while still maintaining the strengths of traditional graphs. It allow allows moving almost all properties to nodes or to the knowledge graph.
The use of properties defined without schemas (action node action, action node logic operator, object node knowledge graph, and object node state) allows the schema to be "specifically vague" (credit to Gage for the term). Enough to be clear but vague enough to support varying use cases.
The set of object states is the state of the system the graph describes. To determine the state of the graph at given time, all actions must be applied in order. This provides for state management without state explosion.

Limitations

The schema does not define how parent-child relationships are established (though it is logical that children must come after parents and that parents/children are limited by the objects an action requires tas pre-conditions and the objects it may affect.
The schema does not define how to identify duplicate objects within the graph (where a a single actual object is represented by two object nodes). When a schema is not used to help avoid duplication, I envision tools that tools will be available to help identify duplicates through their knowledge graph properties. OWL allows for the same object to exist as different notes in the same knowledge graph.
The schema does not readily distinguish between ground truth and records used to observe ground truth. Care must be taken to distinguish these two types of actions and the associated objects. For example, a record may be an object child of the action that generated it. Figure 2 provides an example. The characteristics of the record can be as simple or as detailed as desired though it's prudent to consider the ability of the graph to scale to represent instances of records.

Figure 2 - Representing logs of what happened

The schema does not explicitly define actor, however it may be a relationship established in the knowledge graph and is considered a best practice.

Example

The following image provides an example based on an incident from the VERIS Community Database (VCDB), specifically case a2ed36db-0c78-4162-b2cc-dbaa2ca73866. (Note that the example leaves out the majority of the properties for brevity.)

Figure 3 - Example incident

Representations

At its core, the schema is incredibly simple as can be seen below:

This OWL file can be found here. CAGS graphs conforming to this format should be stored as triples in JSON-LD format. If converting to a property graph, the graph should be stored in JSON Graph Format (JGF).

Use Cases

Aggregation of Events

Log data comes in as atomic events. Given any single event, timestamps only reveal that later events cannot be the parent and earlier events cannot be the child, but the timestamp does not explain _what_ the parent(s) or child/children of an event are.

The graph schema should assist in determining the parent(s) and child/children of an event, (for example by defining that an event occurred due to a file, a credential, or another system and, as such, that object(s) or actions ending in that object(s) must contain the parent.

Motif Communication

It is often helpful when communicating a plurality of actions to communicate the relationships between those actions. This really will touch on multiple use-cases, but is centered around motifs as bounded portion of a path or subgraph.

Attack Surface

A system can be documented using the graph schema to identify the interconnectivity between components and highlight potential paths of attack. (Note, while many of the prior use cases are based around events (or signal generated from the system, this is based on the _actual_ state of the system and actual actions rather than the events they generate.)

Attack Graph Generation

An attack surface generated using the graph schema can be used to plan potential attacks on the system. This can be used for automated attack simulation such as cauldera, planning manual penetration testing (such as bloodhound), etc. This likely results in an attack graph, (a plurality of actions to take).

Analysis

Event data should be able to be aggregated into paths and graphs. This data can then be aggregated across data sources (different tools, sites, organizations, etc) and then queried using graph queries to identify commonalities such as common motifs.

Incident Documentation

After an incident has occurred, the incident responders can document the relationship between the observed actions (or events generated by those actions) using the graph schema.

Detection

A defender wants to define a detection that contains multiple atomic events and how they are related (such as in grapl). To do this they need both a motif of the detection and the ability to aggregate events to see if they match the motif.

Simulation

A defender may wish to simulate attacks containing more than a single event. To do so they need a motif of events and their relationships and the ability to turn that into atomic actions to take/attempt to take.

Incident Response

After aggregating events, the data can be analyzed using graph tools, neural networks, or other tools to identify things like missing edges (actions the attackers might have taken but where no event exists to document it), nodes (objects that may be involved in the incident, but are currently not included in the investigation), or clustering (to identify assets currently part of the investigation but are unlikely to have been involved).

Defense Planning

Given analysis of an attack surface producing an attack graph, the attack graph can then be analyzed to determine thing such as what events will be generated if exercised, nodes and edges central to the attack that might serve as optimal mitigation points, etc.

Risk Analysis

Given an attack surface, analyze the graph to identify the overall 'risk' associated with it. The goal is to provide quantitative feedback on the likelihood and potentially impact of cyber threats given threat intelligence.

3.1 SCHEMA

The 3.1 schema is the same as the 3.2 schema except for the following changes:

The CAGS 3.1 'uuid' property has been replaced with an 'id' which uses URIs including graph namespaces instead of UUIDs
CAGS 3.2 adds allowed edge types
CAGS 3.2 adds 'context' nodes
Added representations
Described logic_operator as optional but with a default representation if missing
Renamed the 'action' property of actions to 'name'

3.0 SCHEMA

The attack flows are defined with nodes as objects and their individual actions as hyperedges. Nodes maintain their individuals state with respect to security while edges document how state is changed by the edge. Edges also contain the logic to adjudicate complex interactions between inputs. The attack flow (or graph) in its entirety represents the state of the system (or portion of the system) being described.

NODES TYPES:

Datum
Person
Storage
Compute
Memory
Network
Other
Unknown

Nodes have a ‘state’ property representing their current state with respect to the actor. They indicate the states (confidentiality/integrity/availability, Create/Read/Update/Delete, or object-specific).

EDGES:

leads_to

Edges are hyperedges (or, alternately, a bipartite representation of hyper-edges) with with a ‘logic’ property defining the process for translating the inputs into a success at the output. Another option is to model the edge as a dendrite to represent the input to output logic of the edge.

Edges have a ‘action’ property defining the details of the action. (These may be in ATT&CK, veris, or any an arbitrary language.)

Edges may have a timestamp property to indicate the order in which they occur. In practice this can be ‘played’ on the graph to update the node states over time.

Can you predict the future? No.

2021-02-03T06:27:00.003-08:00

Did you ever wonder why some people succeed and others don't? Why Jeff Bezos is rich? Why a company got breached? Is it because Jeff Bezos somehow learned what would happen in the future? Is it because the breached company ignored the obvious future? No. No-one can predict the future.

Let's take an example: Double Pendulums

Just predict where they'll swing. Really easy right? You can model the entire pendulum with two nodes and two edges. Simple.

Give it a try: https://www.myphysicslab.com/pendulum/double-pendulum-en.html. Hit the pause button in the upper-right, drag the pendulums to the top where they can drop. Put your finger on the screen where you think they'll be in 5 seconds, hit play, and count to 5. How did it go?

Hmmm. Let’s try it again. Maybe if you saw it happen first. Hit pause, drag them back up, put 1 finger where it starts, run to the count of 5, and put another finger (same hand) where it ends. Now drag the pendulum back up to the first finger, hit play again, and count to 5. Is the second pendulum anywhere near your second finger?

You can't predict the future

If you were right you were wildly lucky. Check out 7 pendulums who's only difference is approximately 1/3rd of an ounce. It's due to chaotic motion. Even in a system with just two nodes where we know all the variables, it gets unpredictable very quickly. Now imagine if your system is something like this:

In this image the color code is as follows:

the upper-left brown is the internet.
the five fuchsia nodes to the right are user systems
the upper green are the DMZ
the blue-green and dark grey are servers
orange are management systems
light pink is infrastructure
grey is a security system
light blue at the bottom is a protected enclave.

That's about two dozen systems. An _extremely_ small IT estate. And we have little idea what all the variables it may contain. Compare that to the two pendulum model. If we can't predict two pendulums what chance do we have with this?

Try to imagine predicting the business climate and how the world will change over the next 20 years. You need to make choices now that will govern your success then. Can you (or anyone) do that?

The answer is, of course, no. Lots of people are making many decisions and some will be right, and some will be wrong. However, for the most part it's not due to the individuals making them.

So what's a person to do?

Give up? Give in? Nah, don’t do that.

In spite of all the uncertainty and the multitude of variables involved, the reality is that most useful systems do not tend to devolve into chaos. If they did they wouldn't be useful. Instead, they normally remain in common, steady states. Except for moving from one steady state to another when something changes.

And that's what you should do. Bet on the average. The common state. The place where most things end up. Don't look at people who succeeded (or failed) spectacularly. It was spectacular because it wasn't common. They couldn't predict the future and neither can you. You can bet on the most common outcome though. (As Sir Francis Galton - or Dan Kahneman if you prefer - would call it, Regression to the Mean.) For security, this means filter email, filter web content, use two- factor authentication, and manage assets.

The other thing you can do is prepare to change along with the situation. This requires creative people who can devise innovative solutions when there is some new input, as opposed to rather following the usual processes. This is one of the reasons why quality security operations are essential. Something engineered and built over several years will never cope with a significant shift in information security unless it also shifts.

And in conclusion, don't beat yourself up over it

What happened in the past did not predictably lead to today, for you or anyone else. And not only does the past not predict the future, but the future doesn’t require the past. Inverse evolutionary techniques such as Inverse Generative Social Science demonstrate that things could have started completely differently, and we still could arrive right where we are today. The best you can do is invest in the average and be creative enough to handle the unanticipated.

Simulating Security Strategy

2021-02-01T08:27:00.005-08:00

You’ve probably imagined it, right? Lots of little attackers and defenders going at it in a simulated environment while you look on with glee. But instead of spending our cycles on details such as if the attack gets in, let's leave that for the virtual detonation chambers and focus on the bigger picture of attack and defense?

That is exactly what Complex Competition does. It simulates an organization as a topology and then allows an attacker and a defender to compete on it. Table 1 provides all the rules:

Gameboard is an undirected, connected, graph. Nodes may be controlled by one or both parties. One node is marked the goal.
The defender party starts with control of all nodes except one.
The attacker party starts with control of one node only.
Parties take turns. They may:

Pay A1/D1 cost to observe the control of a node.
Pay A2/D2 cost to establish control of a node.
Pay A3/D3 cost to remove control from a node (only succeeding if they control the node).
A4/D4 cost to discovery peers of a node.
Pass or Stop at no cost.

They may only act on nodes connected to nodes they control.
The attacker party goes first.
The target node(s) is assigned values V1-Vn. When the attacker gains control of the target node X, they receive value Vx and the defender loses value Vx.
The game is over when both parties stop playing. Once a party has stopped playing, they may not start again.

This allows us to test out a lot of things which include the below:

Does randomly attacking in a network pay?

Answer: No! (Unless the target of the attack is connected to the internet)

What does it cost to defend?

Answer: anywhere from three to five times the number of actions the attacker took.

What attacker strategies work best if there’s no defender?

Answer: Attacking deep into the network, or trying a quick attack and bailing.

What attacker strategies work best if there is a defender?

Answer: Now the quick attack is a clear front runner.

How does an infrastructure compromise change the attack?

Answer: When the infrastructure is compromised, the attacker doesn’t have to dig deep into the network. (Obvious, I know. But here we can show it quantitatively.)

Now the caveats

All that analysis must be taken with a grain of salt. It’s totally dependent on the costs of the actions (all 1), the value and locations of the targets, the topology, and the attacker strategy. None of which are meant to be particularly representative in these simulations. Also, this simulation is relatively basic, but hopefully it strikes a balance between usefulness and simplicity for this first iteration.

Still, there’s a lot of other questions we could try to answer:

When should the defender stop defending / how much should they spend on defense?
How else does the location of the attacker affect their cost to reach the target?
How does the target location affect the attacker's cost to reach it?
How do different topologies affect the attacker and defender costs?
How do different costs affect the attacker's chance of reaching the target?
What is the relationship between topology, attacker strategy, attacker action cost, and target value?

And eventually we could make it more complex:

Add more information to the nodes to help players choose actions
Probability of success per edge
Cost of action per node
Replace the undirected graph with a directed graph
Different value for the attacker and defender for achieving the goal.
Separating the impact cost to the defender from the goal and having them on separate nodes
Allow the defender to take more than one action per round
Set per edge success probabilities and costs
Create action probabilities
Allow the defender to pay to increase attacker action cost (potentially per edge).
Allow the defender to pay to decrease the action success probability (potentially per edge).
Allow the defender to pay to monitor nodes without having to inspect them

Primarily, though, we simply want to get this out there and give everyone a chance to try it out, and, more than anything, illustrate the clear need to simulate security strategy. (He said the thing!)

Be the CFP review you want to be reviewed by

2019-06-09T19:59:00.001-07:00

There are lots of infosec conferences which means lots of CFPs and lots of talks reviewed. I participate in several and figured I would share some of the lessons I've learned. A caveat: This is highly opinionated. It's my experience so probably doesn't apply to everyone. I mostly do small, specialized tracks and conferences so reviewing dozens of talks, not hundreds.

The CFP

Set yourself up for success. There are probably 5 things you need to ask for in addition to the speaker info. If you don't ask for them, you'll end up asking later:

A title
An abstract. Make it clear you'll be printing the abstract!
A bulleted outline. If you don't ask for it in the CFP, you'll end up asking those who don't supply it anyway.
What attendees will gain. This could be processes, tools, knowledge. But it's the 2nd most common question I have to ask after asking for an outline. It also helps distinguish between vendor pitches and useful talks. Vendors will often speak about how _they_ did something but not necessarily how attendees can do it.
An attachment field. This will let people share slides, longer outlines, detailed explanations of the talk, etc. It's important for people who want to answer your specific questions but feel they have more they need to share.

The rating

Set your raters up for success. You can ask your reviewers to answer lots of questions about talks, but the reality is only a few will be used. I'd recommend 3 (stolen from bsidesNash:

Content (0-5). How good is the content and the speaker's likely ability to give it.
Applicability (0-5). How applicable is the content to the conference/track/interests of attendees/etc.
Comments/notes to submitters.

Most other questions will likely be another way of asking all or a portion of either question one or question 2. For example, asking "Has this speaker done a good job in previous talks?" is really just a question to help predict the quality of the content.

1 and 2 could be combined into a single accept-reject range of 0-5. I like the two as neither I nor other raters I've worked with have had trouble answering both questions for all talks. Also, they are orthogonal with very little affect of one on the other.

I also recommend 0-5. Honestly, it can be 0 to anything. The goal is simply to have a range that normalizes to 0%-100% easily. 1-5 does not. Is 1-5 20%, 40%, 60%, 80% and 100%? is it 0%, 25%, 50%, 75%, 100%? It's unclear how it maps out. Terms are even worse. "really bad", "bad", "ok", "good", "really good"? Is that 0/25%/50%/75%/100%? If so, just use those numbers. 0-5 is easily 0/20/40/60/80/100%. You could also simply provide a slider from 0 to 1 to allow people to provide the granularity they want.

Every rater should leave some note that can be passed to the submitter. They may be passed directly, summarized, or aggregated, but you'll need those notes.

Each rater will probably also keep their own notes that do not get shared with the submitter. It's honestly never clear to raters which comment field will or won't be seen by the submitter in the online review system so you might as well have a single one that will be shared and tell raters to keep private comments offline. It also helps the raters think about how to communicate their feedback positively.

I'd also recommend making raters provide a rating before seeing the submitter. Even if they can go change their score after the fact, it helps remove implicit bias based on the submitter. It's ok if a rater rates something, sees the submitter and updates their opinion based on the additional information about the org, previous talks be the speaker, etc that they can clearly articulate. But you don't want the information about the submitter, their company, experience, other submissions, etc influencing the rating implicitly and you don't want submitter ethnicity, gender, sexual orientation, etc influencing it at all.

Pre-rating

There are two things you should do as soon after CFP submission closes as possible, even before rating the talks.

Identify talks that should be moved to another track/reviewer.

Identify talks where you need to ask the submitter a question to accurately review the talk.

These two things are impossible to accomplish late in the review process. The first only really applies if you have multiple tracks with multiple raters. But if you wait to move a submission, more than likely the receiving rater will already be done and won't be interested in another talk.

For questions, it often only takes minutes, hours or a day to get an answer back, but if the review team is all on the phone making selections, that answer will be too late. Even if it's to ask for an outline, a more detailed explanation of the submission, or what attendees can expect to learn, most submitters have an answer and can get it to you quickly.

Try to do a pass through the submissions before reviewing and identify any submissions that fall into either category. Addressing it up front will lead to better outcomes for everyone at review time.

The review

After the ratings are in, it's time to review them to pick the talks:

Start with some mathematical analysis of your talks. I do it with two scores in this blog, but it works just as easily with a single rating per talk. Being able to visually check a talk's scores is strikingly helpful. I've watched it save CFPs that were completely off track, take review meetings that were going no-where and turn them around, and half the time reviewing takes.
Start with the talks that everyone rated perfect or near perfect. If everyone agreed they're good, don't waste time rehashing it. Mark these "accept".
Then go to the bottom of the list and work your way up. Basically, if no-one is willing to fall on their sword for the talk, "reject" it or mark it on the bubble. (We tend to use "bubble up" or "bubble down". Up for talks you'd accept if you could. Down for talks you'd only take if you have to.)
At some point you're going to get to talks that people liked, but had some flaw. Raters will be saying "I liked this one, but..." That means you're now into the middle section of the talks. Go back to the top, after the talks you've already accepted, and work your way down marking "accept", "reject", "bubble up", or "bubble down". Be biased against accepting. It's easier to go to the bubble to add talks than to accept more talks than you can take and cut again
Identify backup speakers. How many is up to you, but I like 1 per track per day. (Add at least one extra if international speakers are accepted as many things can prevent them from speaking.) I also like to identify someone on staff that will 'just be there' who can be easily found and give a talk (rather than having an empty room) if anything goes wrong.

Also, we tend to give reviewers one veto each; usually a talk they absolutely want, that they can use to overwrite the prevailing opinion of the group.

The notification

Now the part no CFP organizer likes, notifying people (particularly the non-acceptances). This happens in a few stages:

Notify all of the accepts. You need all of them to confirm that they can still make it. Until they confirm, you don't have a talk. That said, this normally happens pretty quickly. Accepted people are exciting and generally respond fast.
Notify the bottom 3/4ths of the non-accepts. You can't notify all because you may have some accepts that can no longer make it and so some of the non-accepts may turn into accepts.
Once you have all the accepts complete, notify the backups and get their confirmation. (Note that if some of your accepts didn't confirm, you may need to move a backup to an accept and a bubble-up to a backup.)
Finally notify any non-accepts that have not been notified.

All non-accepts deserve some feedback on why they weren't accepted. It could be that the content wasn't the right fit, that the talk felt too complex or not complex enough. it could be that the reviewers felt attendees wouldn't take a lot away from the talk. It could be there were grammatical errors in the abstract. It could simply be there wasn't enough information for raters to be confident it would be a good talk. But all non-accepts deserve to hear from you.

And the rest of it

At this point, it turns into a speaker management job. Making sure they have everything they need, know where to be and what to do. That lasts until the speaker has completed their talk, but that's a subject for another post.

Data Driven Security Strategy

2018-09-07T08:20:00.002-07:00

I presented on building a data driven security strategy at RSA this year. You can find the video here and the slides here.

If there's one thing to take away it's this:

"Strategy is HOW YOU CHOOSE plans to meet your objectives, not the plans you choose. Those plans must be in the context of the rest of security and your organization. And a data driven security strategy is using MEASURES TO CHOOSE."

Data Analysis Template

2018-09-07T08:03:00.001-07:00

This is just a quick blog to share my jupyter notebook analysis template. I analyze a lot of different datasets in a short period, so having the analysis consistent is very helpful. I'll walk through the sections quickly to share a bit about my process.

Title Section

In the title section, I have a block for any ideas to explore, specific things I intend to do, anything I need to request to be updated in the data, and any notes about the data. These are all bulleted text boxes.

This section is VERY helpful for working on multiple datasets. it's easy to forget what you were going to do or what you've done and the summary up front helps get you back in place.

Preparation

next is preparing the data. No data comes ready for analysis. Here I have blocks to read in the data, clean the created dataframe, save it to an R data (Rda) object on disk, and then, the next time I need it, I just load the Rda and skip the cleaning.

Analysis

The analysis section is basically filled with mini experiments. each chuck is one. As such, it's important that each have a bit of information in comments at the top of it:

A description of the hypothesis being tested or explored. Something like "looking at the distribution of the periodicity of events".
Once it's done, describe the results. Yes, the results should describe the results but you'll thank past you if you write down what you got from the analysis when you did it. Something like "it looks like the periodicity is bimodal with one mode representing X and another representing Y."
Add a comment with a UUID. Seriously. Every. Single. Block. If it's something interesting you're going to put it in a document or a blog or something. You want to be able to track it from beginning to end. (Ours track from the report, through several drafts of the report, through drafts of the sections, to a figures rmarkdown file that generates all the figures, to an exploratory report where we created the original analysis.) Seriously. If you like it then you shoulda put a UUID on it.
Now you can actually write the analysis code

Appendixes

This is where I put all of the extra stuff.

Testing

I always have a testing block. Throughout the analysis, you'll spend a lot time testing stuff to make it work, (or simply looking up things like the dimensions of your data and the column names). Putting those in a testing block keeps you from coming back later and wondering what the block in your analysis was there for.

Lookups

Sometimes you have big, ugly, lookups. putting them at the top clogs the Preparation section, so I tend to put them at the bottom. You'll remember you forgot to run them when your analysis fails.

Backup

Really a parking lot for anything you don't want in another section, but don't want to delete.

Ultimately, if I were doing full modeling, I'd probably want a template that follows the process outlined in Modern Dive. However, for someone just getting into analysis, hopefully this helps!

Game Analysis of the 2018 Pros vs Joes CTF at BSidesLV

2018-08-19T11:21:00.002-07:00

Introduction

Capture the Flag (CTF) contests are a staple of security conferences and BSides Las Vegas is no exception. However the Pros vs Joes (PvJ) CTF I help support there is a bit unique. Not only is it a blue vs blue CTF with red aggressor and gray user teams, but the game dynamics are a fundamental development point for the CTF team. (There's a lot more to it such as it's educational goal or that we allow blue teams to attack each other on the second day. You can read more about it at http://prosversusjoes.net/.)

Game Dynamics

When we say 'game dynamics', we mean a couple of things. First we mean what's scored and how much. In our case that is currently four things:

hosts (score given to teams for maintaining service availability)
beacons (score deducted when the red team signals a host is compromised)
flags (score deducted when the red team breaches specific files)
tickets (score deducted when the gray team is not being appropriately supported)

At a more fundamental level though, we mean the scenario the CTF is meant to represent. As a blue team CTF, we try and simulate the real world. As such, starting last year, we began to transition our game model to simulate an economy. Score is not granted so much as transferred. For example, the gold team pays the gray team for accomplishing some task, then the gray team pays a portion of that score to the blue team for maintaining the services necessary to accomplish that task. Alternately, when the red team (or another blue team) installs a beacon, the score isn't lost, but instead transferred to the team that placed the beacon.

Beginning with last year, we have started to then simulate the way we expect the game to run. This year we have also captured detailed scoring logs. This blog is about our analysis of the score from this year's game and how it helps us plan for the future.

Simulation

The first thing we do is create a game narrative and scoring profile for the game. The profile is the servers that will come online, go offline, and how much they will be scored per (5 minute) round. It is picked to produce specific outcomes such as inflation (to decrease point value early in the game when teams are just getting going and to allow dynamism throughout the game).

We then try and build distributions of how likely servers will be to go offline, how likely beacons will be and how long they will last, and how many flags will be found. This year we used previous years simulations and logs as well as expert opinion to build the distributions. The distributions we used are below:

### Define distributions to sample from

## Based on previous games/simulations and expert opinion

# H&W outage distributions

doutage_count <- distr::Norm(mean=8, sd = 8/3)

doutage_length <- distr::Norm(mean=1, sd = 1/3)

# flag distributions

dflags <- distr::Norm(mean=2, sd= 2/3) # model 0 to 4 flags lost with an average of 2

# beacon distributions

gamma_shapes <- rriskDistributions::get.gamma.par(p=c(0.5, 0.7), c(0.75, 4)) # create a gamma distribution to draw number of tickets from

dbeacons_length <- distr::Gammad(shape=gamma_shapes['shape'], scale=1/gamma_shapes['rate']) # in hours

dbeacon_count <- distr::Norm((4-3)/2+3, (4-3)/3)

Based on this we ran Monte Carlo simulations to try and predict the outcome of the game.

First, we analyzed the expected overall score.

Next we wanted to look at the components of the score.

Finally we wanted to look at the distributions of potential final scores and the contributions from the individual scoring types

The Game

And then we run the game.

The short answer is, it's VERY different. We had technical issues that prevented starting the game on time. We were not able to complete some development that prevented automatic platform deployment, some hosts were not available, and some user simulation was also not available. This is not a critique of the development team who did a crazy-awesome job both rebuilding the infrastructure for this game in the months leading up to it as well as dynamically deploying hosts during the game. It's just reality. The scoring profile was built for everything we want. I am pleased with how much of it we got on game day.

The Scoreboard

The Final Scoreboard

You can find the final scoreboard and scores here. It gives you an idea of what the game looked like at the end of the game, but doesn't tell you a lot about how we got there. I'm personally more interested in the journey than the destination so that I can support improving the game narrative and scoring profile for the next game.

Scores Over Time

The first question is how did the scores progress over time? (You'll have to forgive the timestamps as they are still in UTC I believe.) What we hoped for was relatively slow scoring the first two hours of the game. This allows teams the opportunity to make up ground later. We also do not want teams to follow a smooth line or curve. A smooth line or curve would mean very little was happening. Sudden jumps up and down, peaks and valleys, mean the game is dynamic.

What we see is a relatively slow beginning game. This is due to beacons initially being scored below the scoring profile and one of three highly-scored puzzle servers being mistakenly scored lower from it's start late in day 1 until it was corrected at the beginning of day 2.

We do see an amount of trading back and forth. ForkBomb (as an aside, I know they wanted the _actual_ fork bomb code for their name, but for this analysis text is easier) takes an early lead while Knights suffer some substantial losses (relative to the current score). Day two scores take off. The teams are relatively together through the first half of day 2, however, Arcanum takes off mid-day and doesn't look back.

The biggest difference is that when teams started to have several beacons, as part of their remediation they tended to suffer self-inflicted downtime. This caused a compound loss of score (the loss of the host scoring they would have had plus the cost of the beacons). We did not account for this duplication in our modeling, but plan to in the future.

Ultimately I take this to mean scoring worked as we wanted it to. The game was competitive throughout and the teams that performed were rewarded for it.

It does leave the question of what contributed to the score...

Individual Score Contributions

What we expect is relatively linearly increasing host contributions with a bit of an uptick late in the game and linearly decreasing beacon contributions. We also expect a few significant, discrete losses to flags.

What we find is roughly what we expected but not quite. The rate of host contribution on day two is more profound than expected for both Paisley and Arcanum suggesting the second day services may have been scored slightly high.

Also, no flags were captured. However, we do have tickets which were used by the gold team to incentivize the blue teams to meet the needs of the gray team.

The biggest difference is in beacons. We see several interesting things. First, for a period on day two, Knights employed a novel (if ultimately overruled) method for preventing beacons. We see that in the level beacon score for an hour or two. We also see a shorter level score in beacons later on when the red team employed another novel (if ultimately overruled) method that was significant enough that had to be rolled back. We also see how Arcanum benefited heavily from the day 2 rule allowing blue-on-blue aggression. Their beacon contribution actually goes UP (meaning they were gaining more score from beacons than they were losing) for a while. On the other side, Paisley suffers heavily from blue-on-blue aggression with significant beacon losses.

Ultimately this is good. We want players _playing_, especially on day 2. Next year we will try to better model the blue-on-blue action as well as find ways to incentivize flags and provided a more substantive and direct way for the gray team to motivate the blue team.

Before we move on, two final figures to look at. The first lets us see individual scoring events per team and over time. The second shows us the sum of beacon scores during each round. It gives an idea of the rate of change of score due to beacons and provides an interesting comparison between teams.

But there's more to consider such as the contributions of individual hosts and Beacons to score.

Hosts

The first thing we want to look at is how the individual servers influenced the scores. What we want to see is starting servers contributing relatively little by the late game, desktops contributing less, and puzzle servers contributing substantially once initiated. This is ultimately what we do see. (This was the analysis, done at the end of day 1, that allowed us to notice puzzle-3 scoring substantially lower than it should. We can see it's uptick on day 2 as we correct it's scoring.)

It's also useful to look at the score of each server relative to the other teams. Here it is much easier to notice the absence of the Drupal server (removed due to technical issues with it). We also notice some odd scoring for puzzle servers 13 and 15, however the contributions are minimal.

More interesting are the differences in scoring for servers such as Redis, Gitlab, and Puzzle-1. This suggests maybe these servers are harder to defend as they provided score differentiation. Also, we notice teams strategically disabling their domain controller. This suggests the domain controller should be worth more to disinsentivize this approach.

Finally, for the purpose of modeling, we'd like to understand downtime. It looks like most servers are up 75% to near 100% of the time. We can also look at the distributions per team. We will use the distribution of these points to help inform our simulations for the next game we play. We are actually lucky to have a range of distributions per team to use for modeling.

Beacons

For the purpose of this analysis, we consider a beacon new if it misses two scoring rounds (is not scored for 10 minutes).

First it's nice to look at the beacons over time. (Note that beacons are restarted between day 1 and day 2 during analysis. This doesn't affect scoring.) I like this visualization as it really helps show both the volume and the length of beacons and how they varied by team. You can also clearly see the breaks in beacons on day two that are discussed above.

The beacon data is especially helpful for building distributions for future games. First we want to know how many beacons each team had:

Day 1:

Arcanum - 17
ForkBomb - 24
Knights - 18
Paisley - 21

Day 2:

Arcanum - 13
ForkBomb - 17
Knights - 29
Paisley - 34

We also want to know how long the beacons last. The aggregate distribution isn't particular useful. However the distributions broken out by teams are interesting. They show substantial differences between teams. Arcanum had few beacons, but they lasted a long time. Paisley had very few long beacons (possibly due to self-inflicted downtime). Rather than be a power law distribution, the beacons are actually relatively even with specific peaks. (This is very different from what we simulated.)

Conclusion

In conclusion, the take-away is certainly not how any given team did. As the movie "Any Given Sunday" implied, sometimes you win, sometimes you lose. What is truly interesting is both our ability to attempt to predict how the game will go as well as our ability to then review afterwards what actually happened in the game.

Hopefully if this blog communicates anything, it's that the scoreboard at the end simply doesn't tell the whole story and that there's still a lot to learn!

Future Work

This blog is about scoring from the 2018 BSides Las Vegas PvJ CTF so doesn't go into much detail about the game itself. There's a lot to learn on the PvJ website. we are also in the process of streamlining the game while making the game more dynamic. As mentioned above, the process started in 2017 and will continue for at least another year or two. Last year we added a store so teams can spend their score. We also started treating score as a currency rather than a counter.

This year we added additional servers coming on and off line at various times as well as began the process of updating the gray team's role by allowing them to play a puzzle challenge hosted on the blue team servers.

In the next few years we will refine score flow, update the gray team's ability to seek compensation from the gray team for poor performance, and additional methods to maximize blue team's flexibility in play while minimizing their requirements. Look forward to future posts as we get the details ironed out!

A Year Not Drinking

2018-07-22T16:43:00.001-07:00

With Blackhat, Defcon, and BSides Las Vegas coming up, it seems like an appropriate time for a quick blog on alcohol. In 2017, for my birthday I took a year off drinking. Now that my birthday is past, I figured I'd share a bit about it.

Why?

Honestly, I felt I was drinking too much. There was always an excuse to drink. It was a holiday. Friends were over. My wife and I wanted to go out. There was something interesting to taste. etc.

Also, it became an end-of-day thing. Have a beer to relax after work. Just adding that up alone becomes a number not to be proud of.

I also wanted to see if it changed how I felt. Would I feel more healthy? Would I feel smarter? Since alcohol is a depressant that can last a week+ in your brain, would I be in a better mood?

And I wanted to try and save some money.

It also helped that I read a book where the main character didn't drink. I think it provided subconscious acknowledgement that it could be done as well as giving some ideas as to how.

What it took

It was easy. much easier than I expected. My goal wasn't to avoid alcohol like an allergy, but just not to have a full drink. It also helped to have a goal. "I'm not going to drink a full drink until at least X." I could easily tell people "I'm taking a year off drinking" and didn't get much pressure to drink after that.

To make it work, I had to have something else to drink though. (I drink a LOT of fluids. 2-4 liters of hot tea during the work day.) I don't like sweet drinks or fruity drinks. I also need variety and don't drink caffeine after like 7 at night, so that kinda limits my options. What I did find was:

Herbal Tea - TONS of variation here. Better during the winter when warm drinks are nice. I wish someone would make condensed herbal tea similar to what's available for ice tea.
Bitters and Tonic - This was my go-to. I have about 20 bitters of various flavors and a soda stream (modded w/ a real CO2 tank) now. I can drink these for ever and a day with a ton of variation.
Water with sliced fruit, then carbonated - It turned out this was great too. Cut a cucumber and a grapefruit into the water and let it sit a day. Then bottle it up and carbonate it.
La-croix - Not sweet and great flavors

Positive Impacts

First, I did feel like it was easier to solve complex challenges. The mental gymnastics just seemed a bit easier. Plus, it saved a BUNCH of money (minus stocking up the bitters). I'm sure the long-term effects of not poisoning myself regularly are good though I haven't quite termed long enough to find out.

Another interesting impact was social interactions were more productive. Instead of meeting over beer at the end of the day in a dark, loud place, I'd meet people in the morning or mid-day over tea. We tended to get a LOT more done.

Negative Impacts

On the other hand, there's a LOT less to do. A lot of the things that seem like fun (many times vague 'going out somewhere' concepts) just aren't exciting if you aren't drinking. Going downtown is now kinda 'bla'. Going out to bars is pretty much out of the question. (You could, but why?) So now when my wife and I try to find something to do on a free night, we actually have some trouble figuring it out. (That said, it may also be that because we have kids and so free nights are so rare we're not sure what to do with them.)

More stress. The reality was drinking was relieving stress. (Obviously not in a good way, but it was.) Life not drinking is much more stressful.

Also, I consumed a LOT more sugar. Probably linked to the last point about stress frankly. Instead of drinking alcohol, easting sweets became a way of dealing with stress, which I'm pretty sure is also not healthy.

When I drink

A side affect of this is it became very clear _when_ I drink.

First was after work to relieve stress.
Second were social events, basically as something to do when meeting people.
Third were celebrations. These tended to be heavier drinking. The problem is that the world makes sure there is always something to celebrate.

Going Forward

So my plan going forward. I don't plan on not drinking at all but I do plan on drinking less.

I plan to pick the days to drink in celebration way ahead. Probably my birthday and my wife's birthday, but likely nothing else. I think it's very important to do this ahead of time so that I have an idea how often it's happening throughout the year. It's very easy to impulse-celebration-drink and if I don't think about the year ahead, looking back on the year it's easy to find out I drank way more than I would have if I'd planned ahead.

Socially, I think I'll only drink in rare cases. And when I do, only make it one drink. Last year, I wish I'd had a drink of scotch with my father and brothers at home at Christmas. On the other hand, I probably won't drink when meeting up with people in Vegas. Those will be tea or tonic and soda type things depending on the time of day.

I'm not going to swear off tastings, particularly when offered. But on the other side, I'm not going to take an entire drink just to taste it. It's silly not to try interesting things, but it can't be an excuse to drink more.

My plan is to completely stop drinking after work. It's just too much of a slippery slope. Instead i plan to get out to the gym more and meditate (I pray, but you do you) to relieve stress.

Conclusion

So as you prepare for Vegas, drink the amount you want. But don't feel it's something you have to do. Many people don't and everyone I've spent time around has been understanding. And recognize that drinking won't make you cooler/more of a hacker/give you a fuller experience.

Now to figure out what to do about the sweets.

Good Blackhat/Defcon/BSides Las Vegas Advice

2018-06-20T11:47:00.002-07:00

Every year new people come to Las Vegas for the triumvirate of conferences, Blackhat, Defcon, and BSidesLV, better known as hacker summer camp. If you've never been, it can be an intimidating experience. To help those who might be interested in some suggestions, I've compiled the list below from my own experience (starting with Defcon 13).

Think about what you want to get out of it. BH and DC are BIG. You can easily spend the entire time just wondering. You'll learn a lot about the conferences, but not necessarily security. Plan half a day to walk around and just see things, but have a better plan after that. Pick a few talks to go to (and wait in line for). Pick a village to sit in all day (I'm partial to BSidesLV Ground Truth as I help run it). Schedule to meet people (something I do a lot).
Thursday is a down day. The schedule says there's stuff going on, but not a lot. DON'T plan to wonder on Thursday. Nothing will be ready. Plan to do something on the schedule. Meet up with people. Volunteer. Visit the Grand Canyon. But don't just assume you'll have stuff to do.
Wear shorts. Most people will be in black t-shirts. You don't have to. a t-shirt, polo, or even short sleeve button-down is fine. Just don't do slacks and long sleeves. it's HOT.
Wear comfy shoes but don't stress over it. Whats comfortable at home will be comfortable there. I wear a pair of dock shoes (sparreys).
Don't rent a car unless you'll be driving out away from las vegas (to the grand canyon or such). Instead get a week ticket for the Deuce (double decker bus on the strip)
Don't worry about your electronics. I can't find documentation of a single breach related to a compromise at BH/DC. The BH/DC noc operators have been doing it longer than those trying stuff and are generally safe. Still, patch all your stuff before going and try to use a VPN for all communication including mobile. (There will be lots of fake cell towers though the police have been cracking down on it a bit I think.)
I prefer to get a microwave and get some food, especially breakfast food, to eat in my hotel room. Food tends to be a huge portion of the cost of going and eating a bagel and some fruit and yogurt in your room for breakfast can help keep you grounded.
Speaking of being grounded, Las Vegas is a city of haves and have nots. You'll be living the good life, pampered by vendors, etc. Consider giving to those who don't have by volunteering at or donating to the Las Vegas Rescue Mission (https://vegasrescue.org/) or such.
Speaking of parties, go to one, but most are going to be either loud, over-crowded, and obnoxious or hard to get into and pretentious. (There are a very few that facilitate socializing like the bsides las vegas pool party.) Better though to go to bed early and try and have breakfast with new people each day. I generally follow groucho marx's rule for parties.
Go to some talks. Lots of people put a lot of work in to talk about lots of things. And not just the big showie talks. Those tend to be spectacle. Instead find lesser known people talking about their passion. And plan to get in, talks have waiting lines that can be LONG. Especially at defcon.
And see a show or two. Go to the day-of discount booth and get tickets to some big show (Every casino has one) but also to the little lounge shows (Burlesque, Hypnotist, Comedy, etc). Ask the hotel what smaller shows they have and what others are around.
don't bother gambling. Your time around many of the best security professionals in the world is limited. Don't waste it on throwing your money away. You can do that any time.
Don't plan to go back to your hotel room. Put everything you need for the day in a bag and go (water, snacks, clothes, batteries, etc). That includes electronics, extra power, water, and clothes if changing for the evening, (whether an extra t-shirt to replace your sweaty one or your slacks for a nice evening out). It can take you an hour to get back to your hotel and back out again and you don't want to waste that.
Take one set of nice clothes (business casual, maybe a tie and jacket, in case you want to go somewhere nice one night. Make SURE to bring close-toed shoes. Some nice restaurants will refuse you in sandals. (goes for women too).
Bring extra power. The wireless environment is FLOODED. it will DRAIN all your devices. I can drain the battery in every device I bring 2-3 times a day. USB batteries are a MUST and if you don't need the wifi on on your device, just leave it off.
Read this blog: How to Converse Better in Infosec and this one: How to Handle Being Questioned on asking & receiving questions.
Bring a big, boxy suitcase so if you find cool stuff you can bring it back. (I've flown servers back before.)
Remember that blocks in Las Vegas are about a mile. Don't look at google maps and think "it's only one block".
If you see someone you recognize in infosec (a speaker you look up to, a company CEO, etc), walk up and say "Hi. I'm <your name>. I love your work. I'm curious about what you're interested in these days." If they excuse themselves, that's fine. They may be in between things. (I've heard of people taking an hour or more to get from the hotel lobby to their room because they meet so many people that know them along the way.) If they mumble something, that's ok. After talks particularly speakers are worn out mentally. If they tell you off, that's ok. Some people are jerks. But none of those things cost you anything and the potential for a good conversation is HUGE.
If you see someone you _don't_ recognize, say "Hi. I'm <your name>. What brings you here?" Again, they could not talk to you for any number of reasons, but I have met all sorts of super interesting people just being willing to meet with whoever is willing to meet with me.
Lots of people like badges. Some are super cool. I'll be honest, all my old badges, electronic or not, are hanging in my closet taking up room I need for other things. If you want a fancy defcon badge, get a badge early as they tend to run out and then hand out paper. If I get a fancy badge and they run out, I tend to trade it to someone whose there for the first time who doesn't have one. I've got enough badges and your first defcon badge is special.
The minimum rule is 1 shower, 2 meals, and 3 hours of sleep. Personally, I get a full nights sleep, I eat all my meals, and of course shower and use deodorant.

I'm sure there's much more I'm forgetting. I'll update it if I think of anything else.

Also, you can search twitter for #gooddefconadice (or #baddefconadvice) but take it with a grain of salt.

Presentation timing like a BOSS

2018-04-25T07:21:00.000-07:00

Introduction

This year as I prepared for my RSA talk, Building a Data-Driven Security Strategy, I decided to do something slightly different. I modeled my timing practice after video game speedrunners. Ultimately it was a good experience that I plan to repeat. Here's the story.

What is a speedrun?

One thing I do to relax is watch video game speedruns. This is when people try and complete a video game as quickly as possible. (It’s so competitive that on some improvements in records are measured in terms of frames and some players spend months or even years, playing hundreds of thousands of attempts, to try and beat a record.)

One thing they all have in common is they use software to measure how long the attempt (known as a run) takes. Most break the runs down into sections so they can see how well they are doing at various parts of the game. To do this, they use timing software which measure their time per section, and overall time. Additionally, each run is individually stored and their current run is compared to previous runs.

Speedrunning for presenting

This struck me as very similar to what we do for presentations, and so for my presentation, I decided to use a popular timer program, livesplit (specifically livesplit one) to measure how well I did for each practice run of my presentation. Basically, every time I practiced my presentation, I opened the timer program and at each section transition, I clicked it. While the practice run was going, the software would indicate (by color and number) if I was getting close to my comparison time (the average time for that section). Each individual run was then saved in a livesplit xml file (.lss). I’ve attached mine for anyone that wants to play with it here.

Figure 1

The initial sections analysis (Figure 1) showed some somewhat dirty data. First, there probably shouldn’t be a run -1. Also, runs 4 and 5 look to not be complete. So we’ll limit our analysis to runs 7 to 20. For some reason, The introduction section in runs 9, 14, and 18 seems to be missing, so we’ll eliminate those times as well. It’s worth noting that incomplete runs are common in the speedrunning world and so some runs where no times are saved will be missing and other runs where the practice was cut short will exist as well. It’s also relevant that ‘apply’ and ‘conclusion’ were really mostly the same section and so I normally let ‘apply’s split run until the end of the presentation, making ‘conclusion’ rarely occur at all.

Figure 2 and 3 look much better. A few things that start popping out. First, I did about 20 practice runs though the first several were incomplete. Looking at Figure 2, we see that some sections like ‘introduction VMOS and Swot’, ‘apply’, and ‘data driven strategy’ decrease throughout the practice. On the other hand, ‘example strategies’ and ‘example walkthrough’ increased at the expense of ‘define strategy’. This was due to pulling some example and extra conversation out of the former as feedback I got suggested I should spend more time on the latter. Ultimately it looks like a reduction of about 5 minutes from the first runs to the final presentation on stage (run 20).

The file also provides the overall time for each time. Figure 4 gives a quick look. We can compare it to Figure 3 and see it’s about what we expect. A slight decline from 45 to 40 minutes in runtime between run 7ish to run 20.

Figure 4

Figure 5

We can also look at actual practice days instead of run numbers. Figure 5 tells an interesting story. I did some rough tests of the talk back in December. This was when I first put the slides together in what would be their final form. Once I had that draft together, I didn’t run it through January and February (as I worked on my part the DBIR). After my DBIR responsibilities started to slow and the RSA slide submission deadline started to come up, I picked back up again. The talk was running a little slow at the beginning of march, however through intermittent practice and refinement I had it down where I wanted it (41-43 minutes) in late March and early April. I had to put off testing it again during the week before and week of the DBIR launch. After DBIR launch I picked it up and practiced it every day while at RSA. It was running a little slow (2 runs over 43 minutes) at the conference, but the last run the morning of was right at 40 minutes with the actual presentation coming in a little faster than I wanted at 39 minutes.

We can take the same look at dates, but by section. Figures 6 and 7 provide the story. It’s not much of a difference, but it does put into perspective the larger changes in the earlier runs as substantially earlier in the development process of the talk.

Conclusion

Ultimately I find this very helpful and suspect others will as well. I regularly get questions such as “how many times do you practice your talk?” or “how long does it take you to create one”. Granted it’s a sample size of 1, but it helps give an idea of how the presentation truly evolved. I can also see how the changes I made as I refined the presentation affected the final presentation. Hopefully a few others will give this a try and post their data to compare!

Oh, and for those adventurous types, you can see the basic analysis I did in my jupyter notebook here.

The Good, The Bad, and the Lucky - (Why improving security may not decrease your risk)

2018-02-12T10:31:00.004-08:00

Introduction

The general belief is that improving security is good. Traditionally, we assume every increment ‘x’ you improve security, you get a incremental decrease ‘y’ in risk. (See the orange 'Traditional' line below.) I suspect that might not be the case. I made the argument in THIS blog that our current risk markers are unproven and likely incorrect. Now I’m suggesting that even if you were able to accurately measure risk, it might not matter as what you do might not actually change anything. Instead, the relationship may be more like blue 'Proposed' line in the figure below. Let's me explain it and why it matters...

Threats

I think we can break attacks into two groups:

Already scaled, automated attacks.
Everything else (including attacks that could be automated or even are automated, but not scaled.)

Type-1 is mostly single-step attacks. Attackers invest in a single action and then immediately get the return on that investment. These could be ransomware, DoS, it shoautomated CMS exploitation, or phishing leading to stolen credentials, to compromised bank accounts.

Type-2 includes most of what we traditionally think of as hacking. Multi-step attacks including getting a foothold, pivot internally, and exfiltrate information. Not-petya attacks would fall in here as would the types of hacks most pen testers simulate.

Security Sections

Section one in the above figure is driven by risk from type-1 attacks. If you are vulnerable to these, you are just waiting your turn to be breached. Sections two and three relate to type-2 attacks.

In section two, your defenses, are good enough to stop type-1 attacks, but are likely not good enough to stop attackers willing and able to execute type-2 attacks. This is because, having an ability to execute a multi-step attack flexibly, the threat here has many different paths to choose from. If you either aren't studying all of your attack paths in context with each other, or are simply not able to handle everything thrown at you, the attacker gets in regardless of what security you do have. As such, the primary driver of risk is attacker selection (mostly unrelated to your security).

Once your security reaches section three, you start to have the path analysis and operational abilities to stave off attacks that can flexibly take different paths. As such, the more you improve, the more you see your risk go down (if you can measure it).

Risk vs Security

The first takeaway is that if you are in section one, you are a sitting duck. Automated attacks will find you on the internet and compromise you. Imagine the attackers with a big to-do list of potential victims and some rate at which they can compromise them. You are on that list somewhere, just waiting your turn. You need to get out of section one.

The second takeaway is that if you are better than the first section, it doesn’t really matter what you do. Increasing your security doesn’t really do anything until you get to a pretty darn mature point. All the actors looking for a quick ROI are going to be focused on section one. There are so many victims in section one that to target section two they would literally have to stop attacking someone easier. Even as type-2 attacks become commoditized, there’s absolutely no incentive to expand until either all of section one victims are exploited or the type-2 attack becomes a higher Return on Investment (ROI) than an existing type-1 attack. Here, because the attacks are type-2 attacks, the biggest predictor of if you will be breached is if you are targeted.

That is, until you get to section three. In this section, security has started to improve to the point where even if you are targeted, your security plays a significant role in if you are breached or not. These are the organizations that 'get it' when it comes to information security. The reality is most organizations probably are not able to get here, even if they try. The investment necessary in security operations, advanced risk modeling, and corporate culture are simply outside the reach of most organizations. Simply buying tools is not going to get you here. On the other hand, if you're going to try to get here, don't stop half-way. Otherwise you've wasted all investment since you left section one.

There is another scenario where someone not engaged in section one decides to go after the section two pool of victims with an automated attack. (Something like not-petya would work.) If this was common, it'd be a different story. However, there's no incentive for a large number of attackers to do this (as the cost is relatively fixed, and multiple attackers decreases the available victims for each). In this case, the automated attack ends up being global news because it's so wide-spread. As such, rules are created, triage is executed, and, in general, the attacker would have to continue significant investment to maintain the attack, decreasing the ROI. Given the easy ROI in section one, the sheer economics will likely prevent this kind of attack in section two.

Testing

Without testing, it's relatively hard to know in which section you are in. Pen testing might tell you how well you do in sections two and three, but knowing you lose against pen testers doesn't even tell you if you are out of section one. Instead, you need security unit testing to replicate type-1 attacks and verify that your defenses mitigate the risk.

If you never beat the pen testers, you're not in section three. However, once you start to be able to handle them, it's important to measure your operations more granularly. Are you getting better in section three or slipping back towards section two? That means measuring how quickly operations catches threats and what percent of threats they catch. Again, automated simulation of a type-2 attacks can help you capture these metrics.

Conclusion

Most organizations should be asking themselves "Am I in section one and, if so, how do I get out?" Even if you aren't in section 1, commoditization of new attacks may put you there in the near future. (See phishing, botnets, credential stuffing, and ransomware as examples over the last several years.) You need to continue to invest in security to remain ahead of section one.

On the other hand, you may just have to accept being in section two. You can walk into an organization and, in a few minutes know whether they 'get it' or not when it comes to security. Many organizations will simply never 'get it'. That's ok, it just means you're not going to make it to section three so best not to waste investment on trying. Better to spend it to stay out of section one.

However, for the elite few organizations that do 'get it', section three takes work. You need to have staff that can close their eyes and see your attack surface with all of your risks in context. And you need a top-tier security operations team. Investment in projects that take three years to fund and another two to implement may keep you out of section one, but it's never going to get you into section three. To do that you need to adapt quickly to the adversary and meet them toe-to-toe when they arrive. That requires secops.

CFP Review Ratings

2018-01-26T14:14:00.001-08:00

Introduction

We recently completed the bsides Nashville CFP. (Thank you all who submitted. Accepts and rejects will be out shortly.) We had 53 talks for roughly 15 slots so it was a tough job. I sympathize with the conferences that have in the 100's or 1,000's of submissions.

CFP Scoring

Our CFP tool provides the ability to rate talks from 1 to 5 on both content and applicability. However, I've never been happy with how it condenses this down to a single number across all ratings.

Our best guess is it simply averages all the values of both types together. Such ratings would look like this:

(We've removed the titles as this blog is not meant to reflect on any specific talk.)

This gives us _a_ number (the same way a physician friend of mine used to say ear-thermometers give you _a_ temperature) but is it a useful one?

First, let's use the mean instead of the median:

The nice thing about the median is it limits the effect of ratings that are way out of line. In many of our talks, one person dislikes it for some reason and gives it a substantially lower rating than everyone else. We see talks like 13 shoot up significantly. It also can cause drops such as talk 51.

Scoring with a little R

But what really would be helpful is to see _all_ the ratings:

Here we can see all of the ratings broken out. It's more complex but it gives us a better idea of what is actually happening for any one talk. The green dot is the median for all ratings combined. The red dots are the talks' median value. And the grey dots are individual ratings.

We can look at 13 and see it scored basically all 5's except for for one 4 in applicability, 1 'average' rating, and one below average rating bringing the median up to 5-5. When we look at 51, we see how it had a few slightly-below-average ratings and several below-average on content, and several below-average on both content and applicability. We also get to compare to the mean of all talks (which is actually 4-4) rather than assuming 3-3 is average for a talk.

One I find particularly interesting is 29. It scored average on applicability, but it's content score, which we would want to be consistently high, is spread from 1 to 4. Not a good sign. In the first figure, it scored a 3.2 (above average if we assume 3 is average since no average is shown). In the median figure, it is 3. But in this view we can see there are significant content concerns about this talk.

Conclusion

Ultimately, we used this chart to quickly identify the talks that were above or below the mean for both content and applicability. This let us focus our time on the talks that were near the middle and gave us additional information, in addition to the speaker's proposal and our comments, to make our decision on. If you'd like to look at the code for the figures, you can see the jupyter notebook HERE.

Future Work

In the future I could see boiling this down to a few basic scores: content_percentile, applicability_percentile, content_score range, and applicability_score range as a quick way to automate initial scoring. We could easily write heuristics indicating that we want content ratings to meet a certain threshold and be tightly grouped as well as set a minimum threshold for applicability. This would let us more quickly zero in on the talks we want, (and might help larger conferences as well).

Smaller Graphs for Easier Viewing

2018-01-07T17:04:00.000-08:00

Introduction

As I suggested in my previous blog, visualizing graphs is hard. In the previous blog I took the approach of using a few visual tricks to display graph information in roughly sequential manner. Another option is to convert the graph to a hierarchical display. This is easy if you have a tree as hierarchical clustering or maximal entropy trees will do.

Maximal Entropy Graphs

However, our data is rarely hierarchical and so I've attempted to extend maximal entropy trees to graphs. First, it starts with the assumption that some type of weight exists for the nodes. However, this can simply be uniform across all nodes. This weight is effectively the amount of information the node contains. As in the last blog, this could be scored relative to a specific node in the graph or about any other way. It then combines nodes along edges, attempting to minimize the amount of information contained in any aggregated node. It continues this approach until it gets to the desired number of nodes, however it keeps a history of every change so that any node can be de-aggregated.

You can find the code in this Github GIST. You can then try it out in This Jupyter Notebook.

Ultimately, it takes a graph like this:

and produces one that looks like this:

Each node still contains all the information about the nodes and edges it aggregates. This allows an application to dig down into a node as necessary.

Future Work

Obviously there's a lot to do. This is less a product in and of itself than a piece for making other graph tools more useful. As such, I should probably wrap the algorithm in a visualization application that would allow calculating per-node scores as well as diving in and out of the sub-graphs contained by each node.

Also, a method for generating aggregate summarizes of the information in each node would be helpful. For example, if this is a maltego-type graph and a cluster of IPs and Hosts, it may make sense to name it a IP-Host node with a number of IP-Host edges included. Alternately, if a node aggregates a path from one point to another through several intermediaries, it may make sense to note the start and endpoint, shortest path length, and intermediary nodes. I suspect it will take multiple attempts to come up with a good name generation algorithm and that it may be context-specific.

Conclusion

In conclusion, this is another way of making otherwise illegible graphs readily consumable. Graphs are incredibly powerful in many contexts including information security. However methods such as this are necessary to unlock their potential.

Visualizing Graph Data in 3D

2018-01-05T18:00:00.004-08:00

Introduction

One thing that's interested me for a while is how to visualize graphs. There are a lot of problems with it I'll go into below. Another is if there is a way to use 3D (and hopefully AR or VR) to improve visualization. My gut tells me 'yes', however there's a _lot_ of data telling me 'no'.

What's so hard about visualizing graphs?

Nice graphs look like this:

This one is nicely laid out and well labeled.

However, once you get over a few dozen nodes, say 60 or so, it gets a LOT harder to make them all look nice, even with manual layout. From there you go to large graphs:

In this case, you can't tell anything about the individual nodes and edges. Instead you need to be able to look at a graph laid out in a certain way algorithmically and understand it. (This one is actually a nice graph as the central cluster is highly interconnected but not too dense. I suspect most of the outlying clusters are hierarchical in nature leading to the heavily interconnected central cluster.)

However, what most people want is to look at a graph of arbitrary size and understand things about the individual nodes. When you do that you get something like this:

Labels overlapping labels, clusters of nodes where relationships are hidden. Almost completely unusable. There are some highly interconnected graph structures that look like this no matter how much you try to lay them out nicely.

It is, ultimately, extremely hard to get what people want from graph visualizations. You can get it in special cases and with manual work per graph, but there is no general solution.

What's so hard about 3D data visualization?

In theory, it seems like 3D should make visualization better. It adds an entire dimension of data! The reality is, however, we consume data in 2D. Even in the physical world, a stack of 3D bars would be mostly useless. The 3rd dimension tells us more about the shape of the first layer of objects in front of us. It does not tell us anything about the next layer. As such, visualizations like this are a 2D bar chart with clutter behind them:

Even when the data is not overlapping, placing the data in three dimensions is fundamentally difficult. In the following example, it's almost impossible to tell which points have what values on what axes:

Granted, there are ways to help improve this, (mapping points to the bounding box planes, drawing lines directly down from the point to the x-y plane, etc), but in generally you would only do that if you _really_ needed that 3rd dimension (and didn't have a 4th). Otherwise you might as well use PCA or such to project into a 2D plot. Even a plot where the 3rd dimension provides some quick and easy insight, can practically be projected to a 2D heatmap:

Do you really need to visualize a graph?

Many times when people want to visualize a graph, what they really want is to visualize the data in the graph in the context of the edges. Commonly, graph data can be subsetted with some type of graph transversal (e.g. [[select ip == xxx.xxx.xxx.xxx]] -> [[follow all domains]] <- [[return all ips]]) and the data returned in a tabular format. This is usually the best approach if you are past a few dozen nodes. Even if long, this data can easily be interpreted as many types of figures (bar, line, point charts, heatmaps, etc). Seeing the juxtaposition between visualizing graphs because they were graphs, but when the data desired was really tabular heavily influenced how I approached the problem.

My Attempt

First, I'll preface this by saying this is probably a bad data visualization. I have few reasons to believe it's a good one. Also, it is extremely rough; nothing more than a proof of concept. Still, I think it may hold promise.

The visualization is a ring of tiles. Each tile can be considered to be a node. We'll assume each node has a key, a value, and a score. There's no reason there couldn't be more or less data per node, but the score is important. The score is "how relevant a given node is to a specified node in the graph. This data is canned, but in an actual implementation, you might search for a node representing a domain or actor. Each other node in the graph would then be scored by relevance to that initial node. If you would like ideas on how to do this, consider my VERUM talk at bsidesLV 2015. For now we could say it was simply the shortest path distance.

One problem with graphs is node text. It tends to not fit on a node (which is normally drawn as a circle). In this implementation, the text scrolls across the node rectangle allowing quick identification of the information and detailed consumption of the data by watching the node for a few seconds. All without overlapping the other nodes in an illegible way.

Another problem is simply having two many nodes on screen at once time. This is solved by only having a few nodes clearly visible at any given time (say the front 3x3 grid). This leads to the question of how to access the rest of the data. The answer is simply by spinning the cylinder. The farther you get from node one (or key1 in the example), the less relevant the data. In this way, the most relevant data is also presented first.

You might be asking how much data this can provide. A quick look says there are only 12 columns in the cylinder resulting in 36 nodes, less than even the 60 we discussed above. Here we use a little trick. As nodes cross the centerline on the back side, they are actually replaced. This is kind of like a dry-cleaning shop where you can see the front of the clothing rack, but it in fact extends way back into the store. In this case, the rack extends as long as we need it to, always populated in both directions.

Demo

I highly recommend you try out the interactive demo above. it is not pretty. The data is a static json file, however that is just for simplicity.

Future Work

Obviously there's many things that can be done to improve it:

A search box can be added to the UI and a full back end API to populate the visualization from a graph.
Color can be added to identify something about the nodes such as their score relative to the search node or topic.
The spacing of the plane objects and camera can be adjusted.
Multiple cylinders could exist in a single space at the same time representing different searches.
The nodes could be interactive.
The visualizations could be located in VR or AR.
Nodes could be selected from a visualization and the sub-graph returned in a more manageable size (see the 60ish node limit above). These subgraphs could be stored as artifacts to come back to later.
The camera could be within the cylinder rather than outside of it.

Conclusion

I'll end the same way I began. I have no reason to believe this is good. It is, at least, an attempt to address issues in graph visualization. I look forward to improving on it in the future.

Building a SEIM Dashboard with R, Jupyter, and Logstash/Elastic Search

2018-01-05T16:41:00.004-08:00

Motivation:

I am disappointed with the dashboards offered by today's SEIMs. SEIM dashboards offer limited data manipulation through immature, proprietary query languages and limited visualization options. Additionally, they tend to have proprietary data stores that limit expansion and evolution to what the vendor supports. Maybe I'm spoiled by working in R and Rstudio for my analysis, but I think we can do better.

Plan:

This blog is mainly going to be technical steps vs a narrative. It is also not the easiest solution. The easiest solution would be to already have the ELK stack, install interact.io, R, the R libraries, and the R jupyter kernel on your favorite desktop, and connect. That said, I'm going to walk through the more detailed approach below. You can view the example notebook HERE. Make sure to scroll down to the bottom where the figures are as it has a few long lists of fields.

Elastic search is becoming more common in security, (e.g. 1, e.g. 2). Combine that with the elastic package for R, and that should bring all of the great R tools to our operational data. Certainly we can create regular reports using Rmarkdown, but can we create a dashboard? Turns out with Jupyter you can! To test it out, I decided to stand up a Security Onion VM, install everything needed, and build a basic dashboard to demonstrate the concept.

Process:

Install security onion:

Security onion has an EXCELLENT install process. Simply follow that.

Install R:

Added ‘deb https://mirrors.nics.utk.edu/cran/bin/linux/ubuntu trusty/‘ to packages list

sudo apt-get install r-base

sudo apt-get install r-base-dev

— based off r-project.org

Install R-studio (not really necessary but not a bad idea)

Downloaded r-studio package from R-studio and installed

Sudo apt-get install libjpeg62

sudo dpkg -I package.deb

Install Jupiter:

(https://www.digitalocean.com/community/tutorials/how-to-set-up-a-jupyter-notebook-to-run-ipython-on-ubuntu-16-04)

Sudo apt-get install python-pip

sudo pip install —upgrade pip (required to avoid errors)

sudo -H pip install jupyter

Install Jupyterlab: (probably not necessary)

Sudo -H pip install jupyterlab

Sudo jupyter serverextension enable --py jupyterlab --sys-prefix

Install Jupiter dashboard

(https://github.com/jupyter/dashboards)

sudo -H pip install jupyter_dashboards

sudo -H pip install --upgrade six

Sudo jupyter dashboards quick-setup --sys-prefix

Install R packages & Jupypter R kernel:

Sudo apt-get install libcurl4-openssl-dev

sudo apt-get install libxml2-dev

Start R

install.packages("devtools") # (to install other stuff)

install.packages(“elastic”) # talk to elastic search

install.packages(“tidyverse”) # makes R easier

install.packages("lubridate") # helps with working with dates

install.packages("ggthemes") # has good discrete color palettes

install.packages("viridis") # has great continuous colors

# https://github.com/IRkernel/IRkernel

devtools::install_github('IRkernel/IRkernel')

# or devtools::install_local('IRkernel-master.tar.gz')

IRkernel::installspec() # to register the kernel in the current R installation

quit() # leave. Answer ’n’ to the question “save workspace?”

Install nteract: (Not necessary)

(nteract.io)

Download the package

Sudo apt-get install libappindicator1 libdbusmenu-gtk4 libindicator7

sudo dpkg -i nteract_0.2.0_amd64.deb

Set up the notebook:

Rather than type this all out, you can download an example notebook. In case you don't have an ES server populated with data, you can download this R data file which is a day of windows and linux server logs queried from ES from a blue vs red CTF.

I created the notebook using nteract.io so it is in a single order. However, if you open it on the juypter server, you can use the dashboards plugin to place the cells where you want them in a dashboard.

Results:

A lot of time spent compiling.

No need to download R/jupyter stuff on security onion if elastic search is remotely reachable.

Elastic search is not intuitive to query. Allowing people an 'easy mode' to generate queries would be significantly helpful. the `ES()` function in the workblook is an attempt to do so.

It would be nice to be able to mix interactive and dashboard cells.

This brings MUCH more power for both analysis _and_ visualization to the dashboard.

This brings portability, maintainability (ipynb files can be opened anywhere that has the R/jupyter environment and can access elastic search. They can also be forked, version controlled, etc.)

Future Work:

Need a way to have cells refresh every few minutes, likely a jupyter notebook plugin.

Interactive figures require interactive plotting tools such as Vega. This would also bring the potential ability to stream data directly to the notebook. It may even solve the ability to auto-refresh.

Conclusion:

In conclusion, you really don't want to roll-your-own-SEIM. That said, if you already have ES (or another data store R can talk to) in your SEIM and want less lock-in/more analysis flexibility, R + Jupyter may be a fun way to get that extra little emph. And hopefully in the future we'll see SEIM vendors supporting general data science tools (such as R or Python) in their query bars and figure grammars (ggplot, vega, vegalite), in their dashboards.

Building a ggplot2 stat layer

2017-09-18T08:11:00.001-07:00

Introduction

This blog will be about my experience building a simple stat layer for ggplot2. (For those not familiar, ggplot2 is a plotting library for the R programming language that his highly flexible and extendable. Layers are things like bars or lines.)

Figure 1: Example exploratory analysis figure

When we write the DBIR, we have automated analysis reports that slice and the data and generate figures that we look at to identify interesting concepts to write about. (See Figure 1. And Jay, if you're reading this, I know, I haven't changed it much from what you originally did.) (Also, for those wondering, 'ignored' is anything that shouldn't be in the sample size. That includes 'unknown', which we treat as 'unmeasured' and possibly 'NA' for 'not applicable.)

In Figure 1, you'll notice light blue confidence bars (Wilson binomial tests at a 95% confidence interval.) If you look at 'Former employee', 'Activist', and 'Nation-state', you can see they overlap a bit. What I wanted to do was visualize Turkey groups (similar to multicomp::plot.cld) to make it easy for myself and the other analysts to tell if we could say something like "Former employee was more common than Activist" (implying the difference is statistically significant).

First step: find documentation

The first step was actually rather hard. I looked for good blogs of others creating ggplot stats layers, but didn't turn anything up. (I'm not implying they don't exist. I'm terrible at googling.). Thankfully someone on twitter pointed me to a vignette on extending ggplot in the ggplot package. It's probably the best resource but it really wasn't enough for me to decide how to attack the problem. I also picked up ggplot2: Elegant graphics for data analysis in the hopes of using it to understand some of the ggplot2 internals. It primarily dealt with scripting ggplot functionality rather than extending it so I settled on using the example in the vignette as a reference. With that, I made my first attempt:

### Attempt at making a ggplot stat for independence bars
#' Internal stat to support stat_ind
#' 
#' @rdname ggplot2-ggproto
#' @format NULL
#' @usage NULL
#' @export
StatInd <- ggplot2::ggproto("StatInd", ggplot2::Stat,
  required_aes=c("x", "y", "n"), 
  compute_group = function(data, scales) {
    band_spacing <- -max(data$y/data$n, na.rm=TRUE) / 5
    # band_spacing <- 0.1
    bands <- testIndependence(data$y, data$n, ind.p=0.05, ind.method="fisher")

    bands <- tibble::as.tibble(bands) # Necessary to preserve column names in the cbind below
    bands <- cbind("x"=data[!is.na(data$n), "x"], bands[, grep("^band", names(bands), value=T)]) %>%
      tidyr::gather("band", "value", -x) %>%
      dplyr::filter(value) %>%
      dplyr::select(-value)
    y_locs <- data.frame("band"=unique(bands$band), "y"=(1:dplyr::n_distinct(bands$band))/dplyr::n_distinct(bands$band) * band_spacing) # band spacing v2
    bands <- dplyr::left_join(bands, y_locs, by="band")
    bands[ , c("x", "y", "band")]
  }
)


#' ggplot layer to produce independence bars
#' 
#' @rdname stat_ind
#' @inheritParams ggplot2::stat_identity
#' @param na.rm Whether to remove NAs
#' @export
stat_ind <- function(mapping = NULL, data = NULL, geom = "line", 
                     position = "identity", na.rm = FALSE, show.legend = NA,
                     inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatInd, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

While it didn't work, it turns out, I was fairly close. I just didn't know it.

Since it didn't work, I looked into multiple non-layer approaches including those in the book, as well as simply drawing on top of the layer. Comparing the three options I had:

There approaches that didn't involve actually building a stat layer that probably would have worked, however they were all effectively 'hacks' for what a stack layer would have done.
The problem with simply drawing on the layer was that the labels for the bars would not be adjusted.
The problem with using a stat (other than it not currently working) was I was effectively drawing a discrete value (band #) onto a continuous axis (the bar height) on the other side of the axis line.

Ultimately I decided to stick with this though it's technically anti-ggplot to mix axes. (In the Tukey plot, the 'band' is technically a categorical variable, however we are plotting it on the long side of the bars, which is discrete.)

Fixing the stat layer

I decided to look at some of the geoms that exist within ggplot. I'd previously looked at stat_identity() to no avail. It turned out that was a mistake as stat_identity() basically does nothing. When I looked at stat_sum() I found what I needed. In retrospect, it was in the vignette, but I missed it. Since I was returning 'band' as a major feature, I needed to define `default_aes=ggplot2::aes(color=..band..)`. (At first I used `c(color=..band..)` though I quickly learned that doesn't work.)

StatInd <- ggplot2::ggproto("StatInd", ggplot2::Stat,
  default_aes=ggplot2::aes(color=..band..),
  required_aes=c("x", "y", "n"),
  compute_panel = function(data, scales) {
    band_spacing <- -max(data$y/data$n, na.rm=TRUE) / 5
    # band_spacing <- 0.1
    bands <- testIndependence(data$y, data$n, ind.p=0.05, ind.method="fisher")
    bands <- tibble::as.tibble(bands) # Necessary to preserve column names in the cbind below
    bands <- cbind("x"=data[!is.na(data$n), "x"], bands[, grep("^band", names(bands), value=T)]) %>%
      tidyr::gather("band", "value", -x) %>%
      dplyr::filter(value) %>%
      dplyr::select(-value)
    y_locs <- tibble::tibble("band"=unique(bands$band), "y"=(1:dplyr::n_distinct(bands$band))/dplyr::n_distinct(bands$band) * band_spacing) # band spacing v2
    bands <- dplyr::left_join(bands, y_locs, by="band")
    bands[ , c("x", "y", "band")]
  }
)

With that, I had a working stat.

Figure 2: It works!

Fixing the exploratory analysis reports

Unfortunately, our (DBIR) exploratory reports are a bit more complicated. When I added `stat_ind()` to our actual figure function (which adds multiple other ggplot2 pieces), I got:

This lead a ridiculous amount of hunting. I fairly quickly narrowed it down to this line in the analysis report:

gg <- gg + ggplot2::scale_y_continuous(expand = c(0, 0), limits=c(0, yexp), labels=scales::percent) # yexp = slight increase in width

Specifically, the `limits=c(0, yexp)` portion. Unfortunately, I got stuck there. Things I tried:

Changing the limits (to multiple different things, both values and NA)
Setting debug in the stat_ind() `compute_panel()` function
Setting debug on the scale_y_continuous() line and trying to step through it
Rereading the vignette
Rereading other stat_* functions
Adding the required aesthetics to the default aesthetics
Adding group to the default aesthetics

What finally worked with this:

Google the error message
Find it in the remove_missing() function in ggplot2
Find what calls remove_missing(): compute_layer()
Copy compute_layer() from the stat prototype into stat_ind(). Copy over internal ggplot2 functions that are needed by the default compute_layer() function. Hook it for debug.
Looking at the data coming into the compute_layer() function, I see the figure below with 'y' all 'NA'. Humm. That's odd...
Look at the data coming in without the ylimit set. This time 'y' data exists.
Go say 'hi' to my 2yo daughter. While carrying her around the house, realize that 'y' is no longer within the limits ....

Figure 3: Data parameter into compute_layer() with ylimits set

Figure 4: Data parameter into compute_layer() without ylimits set

So, what happened is, when `limits=c(0, yexp)` was applied, the 'y' data was replaced with 'NA' because the large integer values of 'y' was not within the 0-1 limits. `compute_layer()` was then called, which called `remove_missing()` which removes all the rows with `NA`. This caused the removed rows.

The reason this was happening is I'd accidentally overloaded the 'y' aesthetic. `y` meant something different to ggplot than it did to testIndependence(), (the internal function which calculated the bands). The solution was to replace 'y' with 's' as a required aesthetic. Now the function looks like this:

StatInd <- ggplot2::ggproto("StatInd", ggplot2::Stat,
  default_aes=ggplot2::aes(color=..band..), 
  required_aes=c("x", "s", "n"), 
  compute_panel = function(data, scales) {
    band_spacing <- -max(data$s/data$n, na.rm=TRUE) / 5
    # band_spacing <- 0.1
    bands <- testIndependence(data$s, data$n, ind.p=0.05, ind.method="fisher")
    bands <- tibble::as.tibble(bands) # Necessary to preserve column names in the cbind below
    bands <- cbind("x"=data[!is.na(data$n), "x"], bands[, grep("^band", names(bands), value=T)]) %>%
      tidyr::gather("band", "value", -x) %>%
      dplyr::filter(value) %>%
      dplyr::select(-value)
    y_locs <- tibble::tibble("band"=unique(bands$band), "y"=(1:dplyr::n_distinct(bands$band))/dplyr::n_distinct(bands$band) * band_spacing) # band spacing v2
    bands <- dplyr::left_join(bands, y_locs, by="band")
    bands[ , c("x", "y", "band")]
  } 
)

With that it finally started working!

Figure 5: Success!

Conclusion

Ultimately, I really stumbled through this. Given how many new ggplot2 geoms are always popping up, I expected this to be much more straight-forward. I expected to find multiple tutorials but really came up short. There was a lot of reading ggplot2 source code. There was a lot of trial and error. In the end though, the code isn't complicated. Much of the documentation however is within the code, or the code itself. Still, I'm hoping that the first is the roughest, and the next layer I create will come more smoothly.

The end of risk

2017-09-13T10:25:00.001-07:00

Introduction

I think risk is hurting infosec and may need to go.

First, a quick definition of risk:

Risk is the probable frequency and probable magnitude of future loss.

(Note: If this is not your definition of risk, the rest of the blog is probably going to make much less sense. Unfortunately why this is the definition of risk is outside the scope of this blog so will have to wait.)

In practicality, risk is a way to measure security by measuring the likelihood and impact of something bad happening that could have been prevented by information security.

Outcomes

That last line is a bit nebulous though right? Risk measures the opposite of what we're doing. So let's better define what we're doing. Let's call what we're doing an outcome: the end result of our structure and processes (for example, in healthcare, heart disease is a general term for negative cardiovascular outcomes). Next, let's define what we want:

(A statistically insignificant and heavily biased survey. Obviously. Why is this a good outcome? See the addendum.)

Measures, Markers, and Key Risk Indicators

Where we can directly measure this outcome, we call it a 'measure' (for example ejection fraction for heart disease) and life is a lot easier. For risk we have to use surrogate markers (for example cholesterol for heart disease), sometimes called a Key Risk Indicators in risk terms. Now when we say 'risk', we normally mean the indicators we use to predict risk. The most well respected methodology is FAIR though if you are currently using an excel spreadsheet with a list of questions for your risks, you can easily improve simply by switching to Binary Risk Assessment.

The problems with risk.

The first problem with risk is not with risk per se, but with the surrogate markers we measure to predict risk. In other fields such as medicine, before using a surrogate marker, there would be non-controversial studies linking the surrogate marker and the outcome. In security, I'm not aware of any study which shows that the surrogate markers we measure to determine risk, actually predict the outcome in a holistic way. In my opinion, there's a specific reason:

Because there are more legitimate targets than attackers, what determines an organization's outcome (at least on the attack side) is attacker choice.

You can think of it like shooting fish in a barrel. Your infosec actions may take you out of the barrel, but you'll never truly know if you're out of the barrel or your in the barrel and just weren't targeted. This, I think, is the major weakness in risk as a useful marker of outcomes.

This ignores multiple additional issues with risk such as interrelationships between risks, the impact of rational actors, and difficulty in capturing context, let alone problems in less mature risk processes that are solved in mature processes such as FAIR.

The second problem with risk is related to defense:

Risk does not explicitly help measure minimization of defense. It tells us nothing about how to decrease (or minimally increase) the use of resources in infosec defense.

I suspect we then implicitly apply an equation that Tim Clancy brought to my attention as foundational in the legal field: Burden < Cost of Injury × Probability of occurrence, (i.e. if Burden < Risk, pay the burden). It sounds good in theory, but is fraught with pitfalls. The most obvious pitfall is that it doesn't scale. Attacks are paths, except the paths are not in isolation. At any given step, they can choose to go left or right, in effect creating more paths than can be counted. As such, while one burden might be affordable, the sum of all burdens would bankrupt the organization.

What happens when markers aren't linked to outcomes?

As discussed in this thread, I think a major amount of infosec spending is socially driven. Either the purchaser is making a purchase to signals success ("We only use the newest next gen products!"), signal inclusion in a group ("I'm a good CISO"), or is purchasing due to herd mentality ("Everyone else buys this so it must be the best option to buy"). Certainly, as we've shown above, the spending is not related to the outcome. This begs the question of who sets the trends for the group or steers the herd. I like Marcus Carey's suggestion: The analysts and Value Added Resellers. Maybe this is why we see so much marketing money in infosec.

The other major driver is likely other surrogate markers. Infosec decision makers are starved for concrete truth, and so almost any number is clung to. The unfortunate fact is that, like risk, most of these numbers have no demonstrable connection to the outcome. Take hypothetical threat intelligence solution for example. This solution includes all IPv4 addresses as indicators. It has a 100% success rate in identifying threats. (It also has a near 100% false positive rate.) I suspect, with a few minor tweaks, it would be readily purchased even though it adds no value.

What can we do?

There are three questions we need to ask ourselves when evaluating metrics/measures/surrogate markers/KRI/however you refer to them. From here on out, I'll refer to 'them' as 'metrics'.

What is the outcome?
Why is this the right outcome? (Is it actionable?)
How do you know the metric is predicting this outcome?

For any metric you are considering using to base your security strategy on (the metric you use to make decisions of projects, purchases, etc with), you should be able to answer these three questions definitively. (In fact, this blog answers the first question and part of the second above in the first section.) I think there are at least three potential areas for future research that may yield acceptable metrics.

Operational metrics

I believe operational metrics have a lot of potential. They are easy to collect with a SEIM. The are actionable. They can directly predict the outcome above. ("The ideal outcome of infosec is minimizing infosec, attack and defense.") Our response process:

Prevent
Detect
Respond
Recover

should minimize infosec. With that in mind we can measure it:

Absolute count of detections. (Should go down with mitigations.)
Time to detect (Should go down with improved detection.) (Technically, absolute count of detections should go up with improved detection as well, but should also be correlated with an improved time to detect)
Percent detected in under time T. (Where time T is set such that, above time T, the attack likely succeeded.)
Percent responded to. (Depending on the classification of the incidents, this can tell you both how much time you are wasting responding to false positives and what portion of true attacks are are resolving.)
Time to respond. (Goal of responding in under T where T represents time necessary for the attack to succeed.)
Successful response. (How many attacks are you preventing from having an impact)

The above metrics are loosely based on those Sandia National Labs uses in physical security assessments. You can capture additional resource-oriented metrics:

Time spent on attacks by type. (This can help identify where your resources are being spent so you can prioritize projects to improve resource utilization)
Recovery resources used. (This can help assess the impact of failure in the Detect and Respond metrics.)
Metrics on escalated incidents. (Time spent at tier 1, type, etc. This may suggest projects to minimize tier 2 use and, therefore, overall resource utilization.)

Combine this with data from infosec projects and other measurable infosec resource costs and the impact of infosec on the organization (both attack and defense) can be measured.

Relative risk

Risk has a lot of good qualities. One way to get around it's pitfalls may be to not track risk in absolute terms (probability of impact size in FAIR's case), but in relative terms. Unfortunately, it removes the ability to give the business a single, "this is your risk" score except in terms relative to other organizations. But relative may be enough,. For implementing a security strategy where the goal is to pick the most beneficial course of action, relative risk may be enough to choose a course of action. The actions can even have defensive costs associated with them as well. The problem is that the defensive costs and the relative risk are not in the same units, making it hard to understand if purchasing a course of action is a net benefit.

Attacker cost

Finally, I think attacker cost may be a worthwhile area of research. However, I don't think it is an area that has been well explored. As such, a connection between maximizing attacker cost and "minimizing infosec" (from our outcome) has not been demonstrated. I suspect a qualified economist could easily show that as attacker costs go up, some attackers will be priced out of the market, and those that still could afford to attack, will choose less expensive sources to fulfill their needs. However a qualified economist, I am not. Second, I don't know that we have a validated way to measure attack 'cost'. It makes intuitive sense that we could estimate these costs. (We know how they are done and, as such, can estimate what it would cost us to accomplish the attack and any differences between us and attackers.) But before this is accepted, academic research in pricing attacks will be necessary.

Conclusion

So, from this blog, I want you to take a two things:

The fact that attackers pick targets means no-one really knows if their mitigations make them secure.
Three easy questions to ask when looking at metrics to guide your organization's security strategy

With a well-defined outcome, and good metric(s) to support it, you can truly build a data-driven security strategy. But that's a talk for another day.

Addendum

Why is this the right outcome? Good question. It captures multiple things at once. Breaches can be considered a cost associated with infosec and so are captured. However, it'd be naive to think that all costs attackers cause are associated with breaches (or even incidents). The generality of the definition allows it to be inclusive. It also captures the flip side: the goal of minimizing defenses. This is easy to miss, but critical to organizations. There is no benefit to infosec if the cost of stopping attacks is worse than the cost of the attacks. Ideally, stopping attacks would have zero cost of resources (though that is practically impossible). This outcome is also vague about the unit to be minimized allowing flexibility. (It doesn't say 'minimize cost' or 'minimize time'.) Ultimately it's up to the organization to measure this outcome. How they choose to do so will determine their success.

The Haves and the Have-Nots - Automation of Infosec

2017-08-24T06:24:00.000-07:00

Several years ago, I blogged about Balkanizing the Internet. More than ever it appears that a digital feudalism is emerging. A driver that I didn't necessarily consider is the automation of security.

Automation in Infosec

The future of security is speed and persuasiveness. Whoever accomplishes the OODA loop (or additive factors if you like) first has an incredible advantage. In information security, that means automation and machine learning making contextual decisions faster than humans ever could. It will be defense's algorithms against offense's. The second part is probably more interesting. Machine learning is output generated from input. In essence, humans are a much less predictable version of the same. As such, any actor or algorithm, offensive or defensive, that can figure out what input to the opposing side produces the outcome they want, and provide that input before losing will win. Because it needs to happen at speed, it's also likely to be algorithmic. We already train adversarial models to do this.

Infosec 1%'ers

The need for speed and persuasiveness driving automation and artificial intelligence in information security is it's own blog. I touch on it here because, in reality, it only describes the infosec 1%'ers. While a Google or Microsoft may be able to guard their interests with robust automation and machine learning, the local app developer, law office, or grocery store will not.

Which brings us to the recent malware. It should be a wake-up call to all information security professionals. It utilizes no new knowledge, but it provides a datapoint in the trend of automation. While the 1%, or even 50% defender might not be affected, the publicly known level of automation in infosec attack is easily ahead of a large portion of the internet and appears to be growing faster than defensive automation due to adherence to engineering practices for system management. Imagine malware automating the analysis process in bloodhound. Imagine an attack graph, knowledgeable about how to turn emails/credentials/vulnerabilities into attacks/malware, and malware/attacks into email/credentials, was built into a piece of malware, causing it to spread, unhindered as it creeps across the trust relationships that connect everyone on the planet. This could easily be implemented as a plugin for a tool such as armitage.

Balkanization

This is brings us back to the Balkanization of the Internet. In the near future, the only way to defend systems may be to cede control, regardless of the obligations, to the infosec 1%ers'. The only people protected will be those who allow automated systems to guard, modify, and manage their systems. Your choice may be to allow google to monitor all traffic on your internal network to allow their models to defend it, or quickly fall victim to roving automated threats. The internet will have devolved into roaming threats, only kept at bay by feudal lords able to oppose them.

PowerBI vs Tableau vs R

2017-08-10T09:39:00.002-07:00

Yesterday at the Nashville Analytics Summit I had the pleasure of demonstrating the strengths, weaknesses, similarities, and differences between Microsoft PowerBI, Tableau, and R.

The Setup

Last year when I spoke at the summit, I provided a rather in-depth review of of the DBIR data workflow. One thing I noticed is the talk was further along in the data science process from most attendees who were still working in Tableau or even trying to decide what tool to use for their organization. This year I decided to try and address that gap.

I recruited Kindall (a daily PowerBI user) and Ian (a daily Tableau user) to help me do a bake-off. Eric, our moderator, would give us all a dataset we'd never seen (and it turned out, in a domain we don't work in) and some questions to answer. We'd get them at 8:30 in the morning and then spend the day up until our talk at 4:15 analyzing the dataset and answering the questions. (I got the idea from the fuzzing vs reverse engineering panel at Defcon a few years ago.)

The dataset was about 100,000 rows and 50 or so columns (about half medications given) related to medical stays involving diabetes. The features were primarily factors of various sorts with a continuous feature for time in the hospital (the main variable of interest).

The Results

I'll skip most of the findings from the data as that wasn't really the point. Instead I'll focus on the tools. At a basic level, all three tools can create bar charts very quickly including color and alpha. Tableau and PowerBI were very similar so I'll start there.

Tableau and PowerBI Similarities

Both are dashboard based
Both are driven from the mouse, dragging and dropping features into the dashboard
Both have a set of visualization types pre-defined that can be used
Both allow interactivity out of the box with clicking one chart subsetting others

Tableau and PowerBI Differences:

PowerBI is a bit more web-based. It was easy to move from local to cloud and back.
PowerBI has more robust integration with other MS tools and will be familiar to excel users (though the formulas have some differences compared to excel as they are written in DAX).
PowerBI keeps a history of actions that allow you to go backwards and see how you got where you are.
To share a dashboard in PowerBI you simply share a link to it.
Finally, PowerBI is pretty easy to use for free until you need to share dashboards.
Tableau Is more desktop application based.
You can publish dashboards to a server if you have the enterprise version or you can install the Tableau viewer app (however that still requires the receiver install software). Also, sharing the actual workbook basically removes any security associated with your data.
Tableau dashboards can also be exported as PDFs but it is not the primary approach.
Tableau allows good organization of data within the GUI to help facilitate building the dashboard.
Tableau lacks the history though so there is no good way of telling how you did what you did.

Differences between R and Tableau/PowerBI

Most differences came between R and the other two tools

While PowerBI and Tableau are driven by the mouse and interact with a GUI, R is driven from the keyboard and interacts with a command-line.
In PowerBI or Tableau, initial investigation basically involves throwing features on the x and y axis and looking at the result. Both provide the ability to look at the data table behind the dashboard but it's not really part of the workflow. In R, you normally start at the data with something like `dplyr::glimpse()`, `summary()`, or `str()` which give you some summary statistics about the actual data.
In R you can build a dashboard similar to PowerBI or Tableau using the Shiny package, but it is _much_ harder. Rather than be drag-and-drop, it is very manual. To share the dashboard, the other person either needs Rstudio to run the app or you need a shiny server. (Shiny servers are free for a single concurrent user but cost money beyond that.)
R dashboards allow interaction, but it is again, more laborious.
R, however, you can actually do pretty much anything you want. As an example, we discussed plotting the residuals of a regression. In R it's a few lines. In Tableau and PowerBI there was no straight-forward method at all. The only options were to create a plot with a trend line (but no access to the underlying trend line model). We discussed building more robust models such as a decision tree for classification. Kindall found an option for it in PowerBI, but when she clicked it, it was basically just a link to R code. Finally, the concept of tidyr::gather() (which combines a set of columns into two columns, 1 for the column names, and one for the column values) was both unknown and very appealing to Ian but unavailable in Tableau.)
R can install packages. As far as we could tell, Tableau and PowerBI do not. That means someone can add Joy plots to R on a whim.
In R, making the initial image is harder. It's at least data plus an aesthetic plus a geom. To get it to match the basic figure in PowerBI and Tableau is a lot harder, potentially adding theme information, possibly additional geoms for labeling columns, etc. However, the amount of work to improve a figure in R scales linearly. After you have matching figures across all three tools, if you wanted to, say, put a plot of points in the background with a lower opacity, that's a single line similar to `geom_jitter(alpha=0.01) + `. Thats about the same amount of work as to make any other change. In Tableau or PowerBI, it would be hours of messing with things to make such simple additions or modifications (if it's possible at all). This is due to R's use of the Grammar of Graphics for figure generation.
Using the Grammar of Graphics, R can make incredible reports. PDFs can be consumer quality. (Figures for the DBIR are mostly created in R with only minor updates to most figures by the layout team.)

Take-Aways

The most important takeaway is that R is appropriate if you verbalize what you want to do, Tableau/PowerBI are appropriate if you can visualize the final outcome but don't know how to get there.

For example "I want to select subjects over 30, group them by gender, and calculate average age." That can quickly be translated to R/dplyr verbs and implemented. Regardless of how many things you want to do, if you can verbalize them, you can probably do them.
If you can visualize your final figure, you can drag and drop parts until you get to something close to what you want to do. It's trial and error, but it's quick and easy. On the other hand, it only works for fairly straight-forward outcomes.

PowerBI and Tableau are useful to quickly explore data. R is useful if you want to dig deeper.
Anything you can do in PowerBI and Tableau, you can do in R. It's just going to be a lot harder.
On the other hand, VERY quickly you hit things that R can do but Tableau or PowerBI cannot (at least directly). The solution is that PowerBI and Tableau both support running R code internally. This has it's own issues:

It requires a bit of setup.
If you learn the easy stuff in PowerBI or Tableau, but try to do the hard stuff in R, it'll be even harder because you don't know how to do the basics in R.
That said, once you've done the setup, you can probably just find how someone else has solved the problem in R and copy and paste it into your dashboard
Then, after the fact, you can go back through and teach yourself how the code actually did whatever hard thing you had it do.

From a data model perspective, R is like excel while PowerBI and Tableau are like a database. Let me demonstrate what I mean by example:

When we started analyzing, the first thing the other two did was add a unique key to the data. The reason is that without a key they aren't able to reference rows individually. They tend toward bar charts because their tools automatically aggregate data. They don't even think that they are summing/averaging the groups they are dragging in as it's done automatically
For myself using R, each row is inherently an observation. As such as I only group explicitly and first create visualizations that are scatter plots, density plots, etc given my categorical variables by a single continuous variable. On the other hand, Tableau and PowerBI make it very simple to link multiple tables and use all columns across all tables in a single figure. In R, if you want to combine two data frames, you have to manually join them.

Output: Tableau and PowerBI are designed primarily to produce dashboards. Everything else is tacked on. R is the opposite. It's designed to produce reports of various types. Dashboards and interactivity are tacked on. That said, there is a lot of work going on to make R more interactive through the production of javascript-based visualizations. I think are likely to see good dashboards in R with easy modification before we see easy high-quality report generation from PowerBI and Tableau.

Final Thoughts

This was a very good experience. It was not a competition but an opportunity to see and discuss how the tools differed. It was a lot of fun (though in some ways felt like a CTF, sitting in the vendor area doing the analysis. Being under some pressure as you don't want to embarrass your tool (and by extension its other users). I really wish I'd included Kibana or Splunk as I think they would have been different enough from PowerBI/Tableau or R to provide a unique perspective. Ultimately I'm hoping it's something that I or the conference can do again as it was a great learning opportunity!

Elasticsearch. Logstash. R. What?!

2017-05-03T13:12:00.004-07:00

Motivation

At bsidesNash, Chris Sanders gave a great talk on threat hunting. One of his recommendations was to try out an ELK (Elasticsearch, Logstash, Kibana) stack for searching for threats in log data. ELK is an easy way to stand up a distributed, scalable, stack capable of storing and searching full text records. The benefit is it's easy ingestion (Logstash), schema-agnostic storage ability, (Elasticsearch), and robust search and dashboards (Kibana) makes it easy platform for threat hunters.

However, because of it's ease, ELK tends to be a one-size-fits-all solution for many tasks. I had asked Chris about using other tools for analysis such as R by way of Rstudio and dplyr or Microsoft Power BI. Chris hadn't tried it and, at the time, neither had I. (My day job is mostly historic data analysis rather than operational monitoring.)

Opportunity

However, the DBIR Cover Challenge presented an opportunity. For those who are unaware, each year there is a code or codes hidden on the DBIR cover. That code then leads to a puzzle challenge which has resulted in some nice rewards for the winners; (iPad minis, auto-follow telescopes, Yeti coolers, quadcopters, 3D printers, and more). The challenge has multiple puzzles of which players must complete 8. So that they check their answers as they go, the site is a dynamic webapp hosted at Heroku. Because it is dynamic, I can add my own log messages into the endpoint functions.

But I needed a place to store and search the logs. Heroku provides some great plugins for this, but, given the conversation with Chris, I figured I'd try to roll my own, starting with ELK. The first hurdle was that, though there is a lot of hosted Elasticsearch and Kibana, there was much less hosted Logstash (the part I really needed). Elastic cloud didn't have it. AWS had their own tools. Finally I found logit.io which works perfectly. They provide a full ELK stack as a cloud service for around $20 at the low end with a 14 day trial. I signed up for the trail and was up-and-running in minutes. They even have an easy one-line instruction on how to set up a Heroku drain to send logs to a logit.io Logstash endpoint. From there, it is automatically stored in Elasticsearch and searchable through Kibana.

Going beyond ELK

The problem I quickly found out, was that Kibana didn't have the robust manipulation I was used to using R. While it could find entries and make basic dashboards, I simply couldn't cut the data like I wanted once I'd found the subset of data I was interested in. I tried passing the data to PowerBI, but on first blush, the streaming API setup was too limited to ingest a heroku drain using the basic setup tools. Finally, I decided to try and keep the Logstash and Elasticsearch underpinnings, but switch to R for analysis. R allows for simple pipeline analysis of data as well as robust charting.

Doin it with R

The first step was to install the packages I'd need:

install.packages("dplyr") # for simple piped data processing
install.packages("elastic") # for talking to the Elasticsearch store
install.packages(flexdashboard) # for creating a dashboard to monitor
install.packages("DT") # for displaying a HTML data table in the dashboard
install.packages("stringr") # simple string manipulation
install.packages("ggmaps", "viridis", "rgeolocate", "leaflet") # geocoding IPs and displaying them on a map
install.packages("devtools", "treemap") # create treemaps
devtools::install_github("Timelyportfolio/d3treeR") # create treemaps

After installing packages, the next step was to set up the Elasticsearch connection:

elastic::connect(es_host="<my ES endpoint>", es_port=443, es_path="", es_transport_schema = 'https', headers=list(apikey="<my api key>"))

I also manually visited: "https://<my ES endpoint>/_cat/indices?v&apikey=<my API key>&pretty=true" to see what indexes Logstash was creating. It appears to create an index per day and keep four indexes in the default logit.io setup. I stored them into a variable and then ran a query, in this case for the line log line indicating a player had submitted a specific key:

indexes <- c("logstash-2017.04.28", "logstash-2017.04.29", "logstash-2017.04.30", "logstash-2017.05.01") # I should be able to get this from `elastic::cat_indices()`, but it did not apply my apikey correctly
query <- elastic::Search(index=indexes, q="logplex_message:submitted", size=10000)$hits$hits

The following thing we need to do is remove only the fields we want from the query. The result is a list of query results, each itself a list of key:value pairs. I used the `lapply` function to extract _just_ the logplex_message field. (`lapply` takes a function and applies it to each item of a list in R.) `lapply` returns a list and so I `unlist` the results and make them a column in a dataframe:

submissions <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$logplex_message))

In our puzzle challenge, we have 'trainers' who use 'keys' to indicate they've caught Breachemon. I can use my normal R skills to separate the trainer name and key from the log message and count how many times each trainer has submitted each key:

submissions <- submissions %>%
mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract 'trainer'
mutate(key = gsub(".*submitted key (.*) to the bank.$", "\\1", text)) %>% # extract 'key'
group_by(trainer, key) %>% # group each trainer-key pair
tally() # short cut for `summarize(n=n())`. For each trainer-key pair, create a column 'n' with the number of times that pair occurred

From there we can visualize the table with:

DT::datatable(submissions)

We could also visualize the total submissions per trainer:

submitters <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$logplex_message)) %>% # extract the log message and produce a dataframe
mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract the trainer
group_by(trainer) %>% # create a group per trainer
tally() # shortcut for `summarize(n=n())`. Count the events per group
d3treeR::d3tree2(treemap::treemap(submitters, "trainer", "n", aspRatio=5/3, draw = FALSE)) # produce a treemap of submissions per person

Dashboard Time

To wrap this all together, I decided to make a simple dashboard. In the Rstudio menu, File->New File->R Markdown... In the menu, choose 'From Template' and then Template: 'Flex Dashboard'. You'll get something like:

---
title: "Untitled"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
---
```{r setup, include=FALSE}
library(flexdashboard)
```
Column {data-width=650}
-----------------------------------------------------------------------
### Chart A
```{r}
```
Column {data-width=350}
-----------------------------------------------------------------------
### Chart B
```{r}
```
### Chart C
```{r}
```

Lets add our two charts:

---
title: "Breachemon"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
---
```{r setup, include=FALSE}
library(flexdashboard)
library(dplyr)
elastic::connect(es_host="<my ES endpoint>", es_port=443, es_path="", es_transport_schema = 'https', headers=list(apikey="<my api key>"))
query <- elastic::Search(index=indexes, q="logplex_message:submitted", size=10000)$hits$hits
```
Column {data-width=650}
-----------------------------------------------------------------------
### Submissions
```{r fig.keep='none'}
submitters <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$logplex_message)) %>% # extract the log message and produce a dataframe
mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract the trainer
group_by(trainer) %>% # create a group per trainer
tally() # shortcut for summarize(n=n()). Count the events per group
d3treeR::d3tree2(treemap::treemap(submitters, "trainer", "n", aspRatio=5/3, draw = FALSE)) # produce a treemap of submissions per person
```
### Submitters
```{r}
data.frame(text = unlist(lapply(query, function(l) {l$`_source`$logplex_message}))) %>%
mutate(trainer = gsub("Trainer ([^[:space:]]*).*$", "\\1", text)) %>% # extract 'trainer'
mutate(key = gsub(".*submitted key (.*) to the bank.$", "\\1", text)) %>% # extract 'key'
group_by(trainer, key) %>% # group each trainer-key pair
tally() # short cut for `summarize(n=n())`. For each trainer-key pair, create a column 'n' with the number of times that pair occurred
DT::datatable()
```
Column {data-width=350}
-----------------------------------------------------------------------

### Map
```{r}
ips <- data.frame(text = purrr::map_chr(query, ~ .$`_source`$msg_fwd))
geo <- rgeolocate::db_ip(as.character(unique(ips$text)), "<my free db-ip.com api key>") # geocode unique IPs, returns a list
geo <- do.call(rbind.data.frame, geo) # bind the list together as a dataframe
names(geo) <- c("IP", "Country", "State", "City") # set the dataframe column names
geo <- ips %>%
group_by(text) %>%
tally() %>% # count per IP
rename(IP = text) %>%
right_join(geo, by="IP") # join with geolocation
cities <- unique(as.character(geo$City)) # unique list of cities
cities <- cbind(ggmap::geocode(cities), cities) # geo code the cities
geo <- right_join(geo, cities, by=c("City" = "cities")) #join it back together
pal <- leaflet::colorFactor(viridis::viridis_pal(option = "C")(2), domain = geo$n) # create a color range
leaflet::leaflet(geo) %>% # make a map
leaflet::addTiles() %>% # add some default shapes to it
leaflet::addCircleMarkers(color = ~pal(n)) # add a circle with a color based on the count of submissions for each IP
```

Resulting in:

The last block pulls the msg_fwd field which contains the source IP adddress, splits it (as some have multiple), and stores it in a dataframe. It then geolocates the IPs and binds the cities. After that it geocodes latitude and longitude and joins it. Finally it places the geolocated and coded IPs as dots on a map.

Wrapup

That's not to say there aren't hang-ups. You _are_ pulling the data from the remote cluster to your local machine which is a relatively costly action. (The queries I ran returned in a fraction of the second, but I can imagine querying a billion record store, returning tens of thousands of hits, would be slower.) However, as Chris noted during his talk, not being selective in what you retrieve to search is one of the signs of a junior analyst. Also, I have not automated retrieval of more than 10,000 records or the automatic tracking of indexes as they are created. Finally, the dashboard must be refreshed manually. There's a little button to do so in the Rstudio browser, however I think it may make more sense to provide a Shiny button to use to update all or selected portions instead. Unfortunately, most of this goes beyond the few hours I was willing to put into this. proof of concept.

In the end, it was well worth the experimentation. It required no hardware and brings the robust slicing and dicing of data that the R ecosystem provides to the easy and scalable storage of ELK. Though the logit.io service doesn't allow direct configurability of most of the ELK stack, they seem responsive to requests. I'm actually not sure that the ES portion of ELK is really necessary. If you are working with a limited number of well-defined data sources, a structured store such as Postgres, or a key:value store such as hive/hbase might make more sense. R has nearly the repository of packages that Python does. On my mac pro I can work with datasets in the 10's of millions of records, providing all sorts of complex analysis. All in an easily-documentable and repeatable way.

In the future, I'd love to see the same thing done with MS PowerBI. It's not a platform I know, but I think it would definitely be an interesting one to explore. If anyone has any ideas on how to stream data to it, please let me know!

How to Handle Being Questioned

2016-11-29T08:11:00.000-08:00

In my post, How to Converse Better in Infosec, I laid out some rules for better infosec discussions. A key tenent of that blog post was asking questions. But what if you are on the receiving end of that?

To the questioned:

When expressing a view, being questioned feels like a challenge. For me, it feels as if the other person doesn't believe me and is trying to catch me in a lie. Frankly, maybe I did embellish a bit. Maybe I made a statement based on something I thought I remembered hearing but don't quite remember where I heard it. Or maybe I feel the statement is so obvious, the only reason someone would question it is if the other person wanted to try and take me down a rung.

It's OK. If, as speakers, we feel we are in the right, we can treat all questions as if the questioner doesn't know the answer and is seeking help learning, or there is some ambiguity in the questioner's mind and they are just trying to help clarify it. (Remember, for topics we are knowledgeable on, it is hard to see the subject from the perspective of a less-informed person.) Answer with the intent of being as genuinely helpful as possible. Have fun! This is our chance to help someone out!

And if we don't have the answer, we can be polite and say so. "I honestly can't demonstrate it right now. If you'll allow me the time, I'll collect the information for you and get back to you. And, in the event I can't, I'll let you know." Everyone is wrong at some point. Big people can admit it and only weak people don't accept it from others.

And to the questioner:

Be aware that you may be unintentionally putting the questioned person in an emotionally defensive position. They may have all the answers and be able to clearly explain it. They may be right, but need time to collect the evidence to demonstrate it. They may be flat out wrong but not prepared to say so.

Be a good participant in the social dynamic. If the other person can't answer, is evasive, or is demonstrating some technique to avoid answering, give them an out. Say, "It's OK, let's pick this up again later." Or "If you find/remember the answer, please message it to me." If the question is unimportant to you, you lose nothing by letting it go until the questioned person brings it up to you again. And if it is truly relevant to you, you can look it up yourself. If you feel you can't let it go, ask yourself if you're truly practicing the principle of charity.

In conclusion

Remember, a conversation involves multiple people. You're all in it together. Either everyone wins or everyone loses. So help everyone win.

What is most important in infosec?

2016-11-22T08:24:00.000-08:00

"To crush your enemies -- See them driven before you, and to hear the lamentation of their women!" - Conan the Barbarian

Maybe not.

Vulnerabilities

Recently I asked if vulnerabilities were the most important aspect of infosec. Most people said 'no', and the most common answer instead was risk. Risk is likelihood and consequence (impact). (Or here for a more infosec'y reference.) And as FAIR points out, likelihood is threat and vulnerability. (Incidentally, this is a good time to point out, when we say 'vulnerability', we aren't always saying the same thing.) While in reality, as @SpireSec points out, threat is probably more important, I suspect most orgs make it a constant 'TRUE' in which case 'likelihood' simply becomes 'vulnerability' in disguise. I doubt many appreciate the economic relationship between vulnerability and threat. As many people pointed out, the impact of the risk is also important. Yet as with 'threat', I suspect it is rarely factored into risk in more than a subjective manner. There were other aspects of risk such as vulnerable configurations, asset management and user vulnerability. And there were other opinions such as communication, education and law.

Risk

The first big take-away is that, while we agree conceptually that risk is complex and that all its parts are important, practically we reduce 'risk' down to 'vulnerability' by not dynamically managing 'threat' or 'impact'. While most organizations may say they're managing risk, very likely they're really just managing vulnerabilities. At best, when we say 'managing', we probably mean 'patching'. At worst, it's buying and blindly trusting a tool of some kind. Because, without understanding how those vulnerabilities fit into the greater attack-surface of our organization, all we can do is patch and buy. Which leads to the second take-away...

Attack Surface

The second take-away "I think we need to change the discussion from vulns to attack surface." Without understanding its attack surface, an organization can never move beyond swatting flies. If an organization is a city and they want to block attackers coming in, what we do is like blocking one lane of every road in. Sure, you shut down a lot of little roads, but the interstates still have three lanes open. And what about the airport, busses, and beaches?

Our Challenges

Unfortunately, if we can't move from vulns to full risk, our chances of moving beyond simple risk to attack surface are slim. At least in FAIR, we have the methodology to manage based on full risk, if not attack surface. However, while vulnerabilities are the data is not easy to collect. It's not easy to combine and clean. And it's not easy to analyze and act upon. (All the things vulnerability data is.) We don't even have national strategic initiatives for threat and impact, let alone attack surface the way we do for vulnerabilities, (for example bug bounties, and I Am The Cavalry).

In Conclusion

Yet we continue to spend our money and patch vulnerabilities with little understanding of the risk it addressed, let alone how that risk fits into our overall attack surface. But for those willing to put in the work, the tools do exist. And eventually we will make assessing attack surface as easy as a vulnerability assessment. Until then though, we will continue to waste our our infosec resources, wandering blindly in the dark.

P.S.

The third and final take-away is that the whole discussion completely ignores operations, (the DFIR type vs the installing-patches type). In reality, it may be a strategic decision, but the trade-offs between risk and operations based security are better left for another ~~day~~ blog.

Why Phishing Works

2016-10-18T06:18:00.000-07:00

Why Phishing Works

I've been asked many times why old attacks like phishing or use of stolen credentials still work. It's a good, simple, question. We are fully aware of these types of attacks and we have good ways of solving them. Unfortunately, there's just as simple an answer:

"The reason attackers use the same methods of attack is we assume they won't work."

We conduct phishing training. We install mail filters. And when something gets through, we treat it as an anomaly. A trouble ticket. Yet, from the 2016 DBIR, about 12% of recipients clicked the attachment or link in a phishing email. Imagine if that happened in airplanes; for example, if 12% of bolts in an airplane failed every flight. They wouldn't simply take the plane in for repairs when bolts failed. They'd build the plane to fly even if the bolts failed.

This leads to a fundamental tenant of information security:

"Your security strategy CANNOT assume perfection. Not in people. Not in processes. Not in tools. Not in defended systems."

When you assume anything will work perfectly and treat failures as a trouble ticket, you cede an advantage to the attacker. They are well aware that if they fire off 100 phishing emails, 10 will hit the mark.

What To Do

Do what engineers have been doing for generations, engineer resilience and graceful degradation into the system. Assume phishing, credential theft, malware, and other common attacks WILL succeed and plan accordingly. Build around an operational methodology. Work under the assumption that phishing has succeeded in your organization, that credentials have been stolen, that malware is present, and that your job is to find the attacker before they find what they're looking for.

Attackers are just some other guy or gal, sitting in their version of a cube, somewhere else in the world. They want their attacks to happen quickly and with as little additional effort as possible. They take advantage of the fact that we treat their initial action succeeding as an anomaly. If we assume that initial action will be partially successful and force them to exert additional effort and actively work to remain undetected, we decrease their efficiency and improve the economics of infosec in our favor.

How to Converse Better in Infosec

2016-09-22T11:46:00.000-07:00

In a previous blog, I spoke a bit about what to do when the data doesn't seem to agree with what we think. But what if it's not data you disagree with, but another person?

We've grown up in a world where the only goal in a conversation is to simply be right. It is all around us and, unfortunately, drives how we converse with other professionals. Whether it's a twitter thread or questions at the end of a conference talk, we tend to look to tear down others to build ourselves up. The mantra "Defense has to be perfect, offense only has to succeed once" pushes us to expect it in our technical dialog even though no one and no thing is perfect.

Let's change that. The next time you are on twitter, at a conference, or engaging in discussion with colleagues, try and follow the Principle of Charity. I highly recommend you read the link, but the basic premise is:

Accept what the other says if it could be true.

Now, obviously it's more complex than that. It's more like "dato non concesso" which means "given, not conceded". You are accepting their statements where logic otherwise does not prevent you from doing so, not because you believe they are true, but simply because you believe they were given in good faith. It also means interpreting statements in the way most likely to be true.

If the other says something that sounds conditionally untrue, ask questions that would help clarify that it is true.

It doesn't mean you have to accept statements that can't be true. It doesn't mean you can't confirm your interpretation. And it doesn't mean you can't ask clarifying questions. If the other's statement could be conditionally true, ask questions that help clarify that the conditions are those that make the statement true.

Do not ask questions or make statements to try and prove the other's assertion false.

It does, however, mean not nitpicking. It does mean not taking statements out of context or requiring all edge cases be true. If the other's position truly is false, you will simply fail at clarifying it as true.

And if we do we should be doing this, we should do one more thing:

Expect others to follow the same principles.

We should not, as a community, accept members not following this principle. Conversations contradictory to the Principle of Charity bring our community down and they inhibit growth. However, we will only root it out if we take a stand and speak out against it. Whether at conferences, in blogs, in podcasts, on twitter, or anywhere else, it improves us none to tear down rather than build up. I challenge you to adopt the Principle of Charity in your conversations, starting today, and make it a goal for the entire year!

Update: Also check out the follow-on blog: How to Handle Being Questioned!

Do You Trust Your Machine or Your Mind?

2016-08-30T13:43:00.000-07:00

Data science is the new buzzword. The promise of machine learning is to be able to predict anything and everything. Yet, It seems like the more data we have, the harder the truth is to find. We hear about some data that doesn't sound right to us. We ask questions and find out that there are assumptions and biases all over the data. Even if the data was true, once it is analyzed, it becomes contaminated in some way. With such things, how can we possibly trust it? Instead, as Adam Savage put it, the best course of action seems: "I reject your reality and substitute my own."

https://twitter.com/n1suzie/status/490796035376427008

The reality of your mind is: "Your mind is crazy and tells you lies." Your brain has to do the same thing the machine does in assembling data into a complete picture that a data analysis process does. (An analogy would be assembling the building blocks to the right into a single creation like a castle or whale.) It can do it, but the reality is it takes a lot of skill and a lot of thought.

Pieces for a mind to assemble into a single picture.

The downside to doing it in your brain is:

There is no documentation of how the picture was formed from the data
There is no record of what data your mind included and excluded as it assembled its picture
It is much harder to question the process your mind used in creating it's picture
Is is very hard to maintain consistency so that that the picture your mind creates today is the one it will create a year from now given the same data

Your mind is a black box. As Andy Ellis put it, "Systems are becoming too complex for risk analysis to be performed by System 1." (gut instinct). He termed it "The Approaching Complexity Apocalypse".

This doesn't mean data doesn't have it's faults. No data is the knowledge it represents. All data requires analysis to produce the picture from the data. All data has underlying assumptions and biases. You should expect your data sources to:

Publish the methodologies they use to product the pictures from the data
The provenance of the data
The known assumptions and biases, both of the data and of the methodology

Also, data science is not quite classic science. Classically, science follows the scientific method. In classic science, a hypothesis is first established and then tests are created collect data to disprove that hypothesis. If the tests fail, they hypothesis is accepted. Normally in data science, we start with the data and use it to identify hypotheses that are true. XKCD highlighted the issue with this nicely:

https://xkcd.com/882/

There will always be unknown assumptions and biases in data, but if you use them to ignore the data you put yourself at a disadvantage. If you conduct 100 studies, none of which are statistically significant, but all predicting the same thing, you have strong evidence that the thing is true.

On the other hand, this does not mean you should accept all data-based conclusions that come your way. As multiple speakers in the bSides Las Vegas Ground Truth track suggested, machines and minds should work together. The mind can help identify potential biases and assumptions, as well as potential improvements in the machine's methodology. The machine can produce reproducible results to inform the mind's decisions.

The worst thing you can do is identify biases, assumptions, and flaws in the machine and then use them to justify the validity of your mind. If you were to do so, you would need to document the methodology of your mind and subject it to the same scrutiny for biases, assumptions, and flaws. At which point, the methodology would then be in the machine.

And if you can't make your mind and the machine agree, my preference is to trust whichever system is most thoroughly documented, investigated, and validated. And that tends to be the machine.