Saturday, August 28, 2021

Common Attack Graph Schema (CAGS) 3.2

 It's been a while since I've updated CAGS. This is an initial post and may be modified to better fit with CAGS 2 later.

Revision: Schema updated to 3.2.  See the previous 3.X schema(s) at the end of this post.

3.2 Schema

  1. All property names must be stored as lower case
  2. The graph must be a directed multigraph.  It must be a combination of a causal bipartite multigraph, with 'context'/'objects' (previously 'conditions'; objects are a subtype of context) and 'actions' (previously 'events') as the two types of nodes, and a simple knowledge graph defined in OWL used to describe the objects and actions.
  3. Action node properties.  All other properties should be defined through the knowledge graph.
    1. type: "action" (required)
    2. id: A URI including the graph prefix identifying the node (required)
    3. name: The action that occurred.  This may be from a schema such as a VERIS action or ATT&CK technique, or may be an arbitrary string describing the action or event that took place. (required)
    4. start_time: The time at which the atomic action the node represents began to exist.  Time should be in ISO 8601 combined date and time format (e.g. 2014-11-01T10:34Z).  If no time is available, minutes since the unix epoch (1/1/1970 Midnight UTC) should be used as a sequence number. (required)
    5. finish_time: The time at which the atomic action the node represents ceased to exist.  Time should be in ISO 8601 combined date and time format (e.g. 2014-11-01T10:34Z) (optional but encouraged)
    6. logic_operator: a function (including the language the function is defined in) that takes the states of the node's parent objects (pre-conditions) as arguments and returns the effect(s) on the node's child objects (effects), a characteristic borrowed from formal planning.  This may be ladder logic, first-order logic, a higher-level language such as Python, a machine learning model, etc.  The values accepted per pre-condition and produced per effect must be in the same set as the values used for the object node state property.  In practice this will often be the identity function: for example, if a parent object's state is 'compromised', after the action the child object's state will be 'compromised'.  If missing, the logic operator is assumed to be the identity operator, transferring the set of all states from precursor objects to affected objects.  (See the sketch following this list.) (optional)
    7. succeeded: float from 0 (failed) to 1 (succeeded) or a distribution representing the probability that the action succeeded in its effects. Any effects which may be separable should be defined through a separate action. (optional)
    8. confidence: float from 0 to 1 or distribution representing the confidence that the action succeeded. (optional)
  4. Context node properties.  All other properties should be defined through the knowledge graph.  These definitions may take the form of an existing schema such as VERIS assets, the CARS data model objects, or other ontologies of objects defined through a knowledge graph.
    1. type: "context" or "object" (required)
    2. id: A URI including the graph prefix identifying the node (required)
  5. Object node properties.  Object nodes are a sub-type of context in that they may be instanced and have a 'state' which changes as actions are applied.  Only object nodes may be part of the causal graph.
    1. state: A property that may be used as a transient string representing the state of the object at a point in time.  The sum of all object states is the state of the system.  This may be as simple as "compromised"; from an ontology such as VERIS attributes, the Confidentiality, Integrity, Availability triad, Bayesian, or DIMFUI (Degradation, Interruption, Modification, Fabrication, Unauthorized Use, and Interception); or it may even be an arbitrary string.
  6. Edge Properties:
    1. source: the id of the source node. Object nodes may only have action nodes as sources, and action nodes may only have object nodes as sources. Nodes that are part of the knowledge graph may only have sources within the knowledge graph or from an object node. (required)
    2. destination: the id of the destination node. Object nodes may only have action nodes as destinations, and action nodes may only have object nodes as destinations. Nodes that are part of the knowledge graph may only have sources within the knowledge graph or from an object node. (required)
    3. type: Edges between actions and objects (in either direction) have a type from the set of states acceptable for the object node state property and must agree with the pre-conditions and effects of the involved action node's logic operator.  All other edges are defined by the OWL knowledge schema. (required)
      1. The acceptable edge types are: "precursor_of" (edge from an object to an action), "effect_of" (edge from an action to an object), "describe" (edge from an object or context to an object, context, or action).
    4. id: A URI representing the edge. (optional)
  7. It is intended that sets of nodes and edges in the graph can be joined to create a subgraph represented by a single node.  The node must still obey all previous schema requirements.
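
To make the schema concrete, below is a minimal, hypothetical fragment expressed as Python dictionaries: one action with a precursor object and an effect object, using the named edge types from item 6.3.1.  The ids, names, and state values are illustrative only and are not prescribed by CAGS.

```python
# Hypothetical CAGS 3.2 fragment: one action with one precursor object and
# one effect object.  Ids, names, and states are illustrative only.

object_a = {
    "type": "object",
    "id": "https://example.com/graphs/incident1#workstation",
    "state": "compromised",   # from the same value set used by the logic operator
}

object_b = {
    "type": "object",
    "id": "https://example.com/graphs/incident1#fileserver",
    "state": None,            # not yet affected
}

action_1 = {
    "type": "action",
    "id": "https://example.com/graphs/incident1#lateral-movement",
    "name": "lateral movement via stolen credentials",  # could be a VERIS action or ATT&CK technique
    "start_time": "2021-08-28T10:34Z",
    # logic_operator omitted: the identity operator is assumed, so the
    # precursor object's state transfers to the effect object.
}

edges = [
    {"source": object_a["id"], "destination": action_1["id"], "type": "precursor_of"},
    {"source": action_1["id"], "destination": object_b["id"], "type": "effect_of"},
]
```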

Strengths

This schema builds on the 2.0 and 3.0 schemas in a few fundamental ways:
  • The use of knowledge graphs to provide properties simplifies defining arbitrary sets of properties.  This is incredibly important as different users will want to represent different properties at different levels of detail.  In Figure 1, Object 3 is a process linked to its higher level representations.  However, the dotted lines show how Objects 5-8 could be used if the goal was a higher level representation of the incident.
Figure 1 - Knowledge graph used to represent different levels of description.
  • The use of a logic operator allows for arbitrary logic in progressing through the graph without creating complex graph structures to try and define the logic.  This effectively replaces the Bayesian Conditional Probability Tables in version 2.
  • The action-object bipartite graph provides the ability to represent complex relationships (as a bipartite graph can represent hypergraphs and simplicial complexes, or dendrites) while still maintaining the strengths of traditional graphs.  It also allows moving almost all properties to nodes or to the knowledge graph.
  • The use of properties defined without schemas (action node name, action node logic operator, object node knowledge graph, and object node state) allows the schema to be "specifically vague" (credit to Gage for the term): specific enough to be clear but vague enough to support varying use cases.
  • The set of object states is the state of the system the graph describes.  To determine the state of the graph at a given time, all actions must be applied in order (sketched below). This provides for state management without state explosion.
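
As a rough illustration of that replay idea (not part of the schema itself), the sketch below applies actions in start_time order and assumes the default identity logic operator; a real implementation would evaluate each action's stored logic_operator instead.

```python
from typing import Dict, List

def replay_state(objects: Dict[str, dict], actions: List[dict], edges: List[dict]) -> Dict[str, object]:
    """Apply actions in start_time order, assuming the identity logic operator."""
    for action in sorted(actions, key=lambda a: a["start_time"]):
        precursor_states = {
            objects[e["source"]]["state"]
            for e in edges
            if e["destination"] == action["id"]
            and e["type"] == "precursor_of"
            and objects[e["source"]]["state"] is not None
        }
        if not precursor_states:
            continue
        # Identity operator: transfer the precursor state(s) to each affected object.
        new_state = (next(iter(precursor_states)) if len(precursor_states) == 1
                     else sorted(precursor_states))
        for e in edges:
            if e["source"] == action["id"] and e["type"] == "effect_of":
                objects[e["destination"]]["state"] = new_state
    # The sum of all object states is the state of the system.
    return {oid: node["state"] for oid, node in objects.items()}
```

Run against the fragment sketched after the schema list, this would leave the file server object in the 'compromised' state after the action is applied.
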
Limitations
  • The schema does not define how parent-child relationships are established (though it is logical that children must come after parents and that parents/children are limited by the objects an action requires as pre-conditions and the objects it may affect).
  • The schema does not define how to identify duplicate objects within the graph (where a single actual object is represented by two object nodes).  When a schema is not used to help avoid duplication, I envision that tools will be available to help identify duplicates through their knowledge graph properties.  OWL allows for the same object to exist as different nodes in the same knowledge graph.
  • The schema does not readily distinguish between ground truth and records used to observe ground truth.  Care must be taken to distinguish these two types of actions and the associated objects.  For example, a record may be an object child of the action that generated it.  Figure 2 provides an example.  The characteristics of the record can be as simple or as detailed as desired though it's prudent to consider the ability of the graph to scale to represent instances of records.
Figure 2 - Representing logs of what  happened


  • The schema does not explicitly define an actor; however, it may be established as a relationship in the knowledge graph, and doing so is considered a best practice.

Example

The following image provides an example based on an incident from the VERIS Community Database (VCDB), specifically case a2ed36db-0c78-4162-b2cc-dbaa2ca73866. (Note that the example leaves out the majority of the properties for brevity.)
Figure 3 - Example incident

Representations

At its core, the schema is incredibly simple as can be seen below:

This OWL file can be found here.  CAGS graphs conforming to this format should be stored as triples in JSON-LD format.  If converting to a property graph, the graph should be stored in JSON Graph Format (JGF).
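
As a sketch of the property-graph route, the fragment sketched after the schema list could be wrapped roughly as follows. The layout follows my reading of JGF (a top-level graph with nodes and edges, with CAGS's 'destination' mapped onto JGF's 'target'); the metadata keys are assumptions, not defined by either CAGS or JGF.

```python
import json

# Rough JGF-style wrapping of the earlier fragment; field layout and metadata
# keys are assumptions for illustration, not normative.
jgf_doc = {
    "graph": {
        "directed": True,
        "label": "CAGS 3.2 example",
        "nodes": [
            {"id": object_a["id"], "metadata": {"type": "object", "state": object_a["state"]}},
            {"id": object_b["id"], "metadata": {"type": "object", "state": object_b["state"]}},
            {"id": action_1["id"], "label": action_1["name"],
             "metadata": {"type": "action", "start_time": action_1["start_time"]}},
        ],
        "edges": [
            {"source": e["source"], "target": e["destination"], "relation": e["type"]}
            for e in edges
        ],
    }
}

print(json.dumps(jgf_doc, indent=2))
```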

Use Cases

Aggregation of Events  

Log data comes in as atomic events.  Given any single event, timestamps only reveal that later events cannot be the parent and earlier events cannot be the child, but the timestamp does not explain _what_ the parent(s) or child/children of an event are.  
 
The graph schema should assist in determining the parent(s) and child/children of an event (for example, by defining that an event occurred due to a file, a credential, or another system and, as such, that those object(s), or actions ending in those object(s), must contain the parent).
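
A minimal sketch of that filtering, assuming events are stored as CAGS action nodes with precursor_of/effect_of edges (any field name beyond what the schema defines is hypothetical):

```python
def candidate_parents(event: dict, events: list, edges: list) -> list:
    """Events that could be parents of `event`: they occur no later than it and
    end in (affect) an object the event requires as a pre-condition."""
    required = {e["source"] for e in edges
                if e["destination"] == event["id"] and e["type"] == "precursor_of"}
    candidates = []
    for other in events:
        # ISO 8601 strings compare lexically when the format is uniform.
        if other["id"] == event["id"] or other["start_time"] > event["start_time"]:
            continue  # later events cannot be the parent
        produced = {e["destination"] for e in edges
                    if e["source"] == other["id"] and e["type"] == "effect_of"}
        if required & produced:
            candidates.append(other)
    return candidates
```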


Motif Communication

It is often helpful, when communicating a plurality of actions, to also communicate the relationships between those actions.  This touches on multiple use cases, but is centered around motifs as bounded portions of a path or subgraph.


Attack Surface  

A system can be documented using the graph schema to identify the interconnectivity between components and highlight potential paths of attack.  (Note: while many of the prior use cases are based around events, or signals generated from the system, this one is based on the _actual_ state of the system and actual actions rather than the events they generate.)


Attack Graph Generation  

An attack surface generated using the graph schema can be used to plan potential attacks on the system.  This can be used for automated attack simulation (such as Caldera), planning manual penetration testing (such as BloodHound), etc.  This likely results in an attack graph (a plurality of actions to take).


Analysis  

Event data should be able to be aggregated into paths and graphs.  This data can then be aggregated across data sources (different tools, sites, organizations, etc) and then queried using graph queries to identify commonalities such as common motifs.


Incident Documentation  

After an incident has occurred,  the incident responders can document the relationship between the observed actions (or events generated by those actions) using the graph schema.  


Detection  

A defender wants to define a detection that contains multiple atomic events and how they are related (such as in grapl).  To do this they need both a motif of the detection and the ability to aggregate events to see if they match the motif.
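
One way to sketch this with networkx: express the detection as a small motif graph and look for matching subgraphs in the aggregated event graph.  The motif, node attributes, and matching policy below are made up for illustration.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Hypothetical motif: credential theft, then lateral movement, joined by a
# shared credential object.
motif = nx.DiGraph()
motif.add_node("a1", type="action", name="credential theft")
motif.add_node("o1", type="object")
motif.add_node("a2", type="action", name="lateral movement")
motif.add_edges_from([("a1", "o1"), ("o1", "a2")])

def node_match(candidate: dict, pattern: dict) -> bool:
    # Match on node type, and on name only when the motif specifies one.
    return (candidate.get("type") == pattern.get("type")
            and ("name" not in pattern or candidate.get("name") == pattern["name"]))

def find_matches(aggregated: nx.DiGraph, motif: nx.DiGraph):
    """Yield mappings of aggregated-graph nodes onto the motif."""
    matcher = isomorphism.DiGraphMatcher(aggregated, motif, node_match=node_match)
    yield from matcher.subgraph_isomorphisms_iter()
```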


Simulation  

A defender may wish to simulate attacks containing more than a single event.  To do so they need a motif of events and their relationships and the ability to turn that into atomic actions to take/attempt to take.


Incident Response  

After aggregating events, the data can be analyzed using graph tools, neural networks, or other tools to identify things like missing edges (actions the attackers might have taken but where no event exists to document it), nodes (objects that may be involved in the incident, but are currently not included in the investigation), or clustering (to identify assets currently part of the investigation but are unlikely to have been involved).  
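
As one very rough illustration of the "missing edge" idea, a generic link-prediction heuristic from networkx can be run over an undirected projection of the aggregated graph; this is a stand-in heuristic for illustration, not a method defined by the schema.

```python
import networkx as nx

def likely_missing_edges(aggregated: nx.DiGraph, top: int = 10):
    """Score non-adjacent node pairs with the Jaccard coefficient as a crude
    indicator of edges (actions) that may be missing from the investigation."""
    undirected = aggregated.to_undirected()
    scored = nx.jaccard_coefficient(undirected)  # yields (u, v, score) for non-edges
    return sorted(scored, key=lambda t: t[2], reverse=True)[:top]
```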


Defense Planning  

Given analysis of an attack surface producing an attack graph, the attack graph can then be analyzed to determine things such as what events will be generated if it is exercised, which nodes and edges are central to the attack and might serve as optimal mitigation points, etc.
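
For the "central to the attack" piece, a quick sketch using betweenness centrality in networkx; it assumes an attack graph has already been built as a directed graph and treats high-centrality nodes as rough mitigation candidates only.

```python
import networkx as nx

def mitigation_candidates(attack_graph: nx.DiGraph, k: int = 5) -> list:
    """Rank nodes by betweenness centrality as rough choke-point candidates.

    Sketch only: real planning would also weigh mitigation cost and feasibility.
    """
    centrality = nx.betweenness_centrality(attack_graph)
    return sorted(centrality, key=centrality.get, reverse=True)[:k]
```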


Risk Analysis  

Given an attack surface, analyze the graph to identify the overall 'risk' associated with it.  The goal is to provide quantitative feedback on the likelihood and potential impact of cyber threats given threat intelligence.

3.1 SCHEMA

The 3.1 schema is the same as the 3.2 schema except for the following changes:
  • The CAGS 3.1 'uuid' property has been replaced with an 'id' which uses URIs including graph namespaces instead of UUIDs
  • CAGS 3.2 adds allowed edge types
  • CAGS 3.2 adds 'context' nodes
  • Added representations
  • Described logic_operator as optional but with a default representation if missing
  • Renamed the 'action' property of actions to 'name'

3.0 SCHEMA

The attack flows are defined with nodes as objects and their individual actions as hyperedges. Nodes maintain their individual state with respect to security, while edges document how state is changed by the edge.  Edges also contain the logic to adjudicate complex interactions between inputs.  The attack flow (or graph) in its entirety represents the state of the system (or portion of the system) being described.

NODES TYPES:

  • Datum
  • Person
  • Storage
  • Compute
  • Memory
  • Network
  • Other
  • Unknown
Nodes have a ‘state’ property representing their current state with respect to the actor.  The state may be expressed in terms of confidentiality/integrity/availability, Create/Read/Update/Delete, or object-specific states.

EDGES:

  • leads_to
Edges are hyperedges (or, alternately, a bipartite representation of hyperedges) with a ‘logic’ property defining the process for translating the inputs into a success at the output.  Another option is to model the edge as a dendrite to represent the input-to-output logic of the edge.

Edges have an ‘action’ property defining the details of the action. (These may be in ATT&CK, VERIS, or an arbitrary language.)

Edges may have a timestamp property to indicate the order in which they occur.  In practice this can be ‘played’ on the graph to update the node states over time.

Wednesday, February 3, 2021

Can you predict the future? No.

Did you ever wonder why some people succeed and others don't? Why Jeff Bezos is rich? Why a company got breached?  Is it because Jeff Bezos somehow learned what would happen in the future?  Is it because the breached company ignored the obvious future?  No.  No-one can predict the future.  

Let's take an example: Double Pendulums

double pendulum system


Just predict where they'll swing.  Really easy right?  You can model the entire pendulum with two nodes and two edges. Simple.

two pendulum system represented by two nodes and two edges

Give it a try:  https://www.myphysicslab.com/pendulum/double-pendulum-en.html.  Hit the pause button in the upper-right, drag the pendulums to the top where they can drop.  Put your finger on the screen where you think they'll be in 5 seconds, hit play, and count to 5.  How did it go? 


Hmmm.  Let’s try it again.  Maybe if you saw it happen first.  Hit pause, drag them back up, put 1 finger where it starts, run to the count of 5, and put another finger (same hand) where it ends.  Now drag the pendulum back up to the first finger, hit play again, and count to 5.  Is the second pendulum anywhere near your second finger?


You can't predict the future

If you were right, you were wildly lucky.  Check out 7 pendulums whose only difference is approximately 1/3rd of an ounce.  It's due to chaotic motion.  Even in a system with just two nodes where we know all the variables, it gets unpredictable very quickly.  Now imagine if your system is something like this:



In this image the color code is as follows:

  • the upper-left brown is the internet.  
  • the five fuchsia nodes to the right are user systems
  • the upper green are the DMZ
  • the blue-green and dark grey are servers
  • orange are management systems
  • light pink is infrastructure
  • grey is a security system
  • light blue at the bottom is a protected enclave.  

That's about two dozen systems. An _extremely_ small IT estate.  And we have little idea of all the variables it may contain.  Compare that to the two-pendulum model.  If we can't predict two pendulums, what chance do we have with this?


Try to imagine predicting the business climate and how the world will change over the next 20 years.  You need to make choices now that will govern your success then.  Can you (or anyone) do that?


The answer is, of course, no.  Lots of people are making many decisions and some will be right, and some will be wrong. However, for the most part it's not due to the individuals making them.


So what's a person to do?

Give up? Give in? Nah, don’t do that.


In spite of all the uncertainty and the multitude of variables involved, the reality is that most useful systems do not tend to devolve into chaos.  If they did, they wouldn't be useful.  Instead, they normally remain in common, steady states, except for moving from one steady state to another when something changes.


And that's what you should do.  Bet on the average.  The common state.  The place where most things end up.  Don't look at people who succeeded (or failed) spectacularly.  It was spectacular because it wasn't common. They couldn't predict the future and neither can you.  You can bet on the most common outcome though. (As Sir Francis Galton - or Dan Kahneman if you prefer - would call it, Regression to the Mean.)  For security, this means filter email, filter web content, use two-factor authentication, and manage assets.


The other thing you can do is prepare to change along with the situation.  This requires creative people who can devise innovative solutions when there is some new input, rather than simply following the usual processes.  This is one of the reasons why quality security operations are essential. Something engineered and built over several years will never cope with a significant shift in information security unless it also shifts.


And in conclusion, don't beat yourself up over it

What happened in the past did not predictably lead to today, for you or anyone else.  And not only does the past not predict the future, but the future doesn’t require the past.  Inverse evolutionary techniques such as Inverse Generative Social Science demonstrate that things could have started completely differently, and we still could arrive right where we are today.  The best you can do is invest in the average and be creative enough to handle the unanticipated.

Monday, February 1, 2021

Simulating Security Strategy

You’ve probably imagined it, right? Lots of little attackers and defenders going at it in a simulated environment while you look on with glee. But instead of spending our cycles on details such as whether the attack gets in, let's leave that to the virtual detonation chambers and focus on the bigger picture of attack and defense.


That is exactly what Complex Competition does.  It simulates an organization as a topology and then allows an attacker and a defender to compete on it.  Table 1 provides all the rules (a minimal code sketch follows them):


  1. The game board is an undirected, connected graph. Nodes may be controlled by one or both parties.  One node is marked as the goal.

  2. The defender party starts with control of all nodes except one.

  3. The attacker party starts with control of one node only.

  4. Parties take turns. They may:

    1. Pay A1/D1 cost to observe the control of a node.  
    2. Pay A2/D2 cost to establish control of a node. 
    3. Pay A3/D3 cost to remove control from a node (only succeeding if they control the node).
    4. Pay A4/D4 cost to discover the peers of a node.
    5. Pass or Stop at no cost.
  5. They may only act on nodes connected to nodes they control. 

  6. The attacker party goes first.

  7. The target node(s) is assigned values V1-Vn.  When the attacker gains control of the target node X, they receive value Vx and the defender loses value Vx.

  8. The game is over when both parties stop playing.  Once a party has stopped playing, they may not start again.
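
A minimal sketch of how these rules might be encoded (class, method, and cost names are mine, not from the published simulation; only the establish and remove actions are shown):

```python
import networkx as nx

class ComplexCompetition:
    """Bare-bones encoding of the rules above; a sketch, not the real simulator."""

    COST = 1  # A1-A4 and D1-D4 all set to 1, as in the experiments described below

    def __init__(self, board: nx.Graph, attacker_start, goal, goal_value):
        self.board = board
        self.goal, self.goal_value = goal, goal_value
        # Rules 2 and 3: the defender starts with every node except the attacker's.
        self.control = {"attacker": {attacker_start},
                        "defender": set(board.nodes) - {attacker_start}}
        self.spend = {"attacker": 0, "defender": 0}
        self.score = {"attacker": 0, "defender": 0}

    def _adjacent_to_control(self, party, node) -> bool:
        # Rule 5: parties may only act on nodes connected to nodes they control.
        return any(node in self.board[held] for held in self.control[party])

    def establish(self, party, node) -> bool:
        if not self._adjacent_to_control(party, node):
            return False
        self.spend[party] += self.COST
        self.control[party].add(node)
        if party == "attacker" and node == self.goal:
            # Rule 7: the attacker gains the goal's value and the defender loses it.
            self.score["attacker"] += self.goal_value
            self.score["defender"] -= self.goal_value
        return True

    def remove(self, party, node) -> bool:
        self.spend[party] += self.COST
        if node not in self.control[party]:
            return False  # removal only succeeds if the acting party controls the node
        other = "defender" if party == "attacker" else "attacker"
        self.control[other].discard(node)
        return True
```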

This allows us to test out a number of things, including those below:


Does randomly attacking in a network pay? 


Answer: No! (Unless the target of the attack is connected to the internet)


What does it cost to defend?


Answer: anywhere from three to five times the number of actions the attacker took.


What attacker strategies work best if there’s no defender?

Answer: Attacking deep into the network, or trying a quick attack and bailing.


What attacker strategies work best if there is a defender?

Answer: Now the quick attack is a clear front runner.


How does an infrastructure compromise change the attack?

Answer: When the infrastructure is compromised, the attacker doesn’t have to dig deep into the network. (Obvious, I know. But here we can show it quantitatively.)


Now the caveats


All that analysis must be taken with a grain of salt.  It’s totally dependent on the costs of the actions (all 1), the value and locations of the targets, the topology, and the attacker strategy.  None of which are meant to be particularly representative in these simulations.  Also, this simulation is relatively basic, but hopefully it strikes a balance between usefulness and simplicity for this first iteration.


Still, there’s a lot of other questions we could try to answer:

  • When should the defender stop defending / how much should they spend on defense?
  • How else does the location of the attacker affect their cost to reach the target?
  • How does the target location affect the attacker's cost to reach it?
  • How do different topologies affect the attacker and defender costs?
  • How do different costs affect the attacker's chance of reaching the target?
  • What is the relationship between topology, attacker strategy, attacker action cost, and target value?

And eventually we could make it more complex:

  • Add more information to the nodes to help players choose actions
  • Probability of success per edge
  • Cost of action per node
  • Replace the undirected graph with a directed graph
  • Different value for the attacker and defender for achieving the goal.
  • Separating the impact cost to the defender from the goal and having them on separate nodes
  • Allow the defender to take more than one action per round
  • Set per edge success probabilities and costs
  • Create action probabilities
  • Allow the defender to pay to increase attacker action cost (potentially per edge).
  • Allow the defender to pay to decrease the action success probability (potentially per edge).
  • Allow the defender to pay to monitor nodes without having to inspect them

Primarily, though, we simply want to get this out there and give everyone a chance to try it out and, more than anything, illustrate the clear need to simulate security strategy. (He said the thing!)