Monday, July 29, 2013

Cyber Attack Graph Schema (CAGS) 1.0

While the concept of attack graphs has been discussed, one thing that is lacking is a standard definition for an attack graph.  This post aims to resolve that by presenting a new standard: the Cyber Attack Graph Schema (CAGS) 1.0.
1.    All property names must be lower case
2.    Nodes must have the following properties:
    1.    "class": May be "actor", "event", "condition", "attribute"
    2.    "cpt": must be a JSON string in the format defined at
    3.    "start": The time the node is created. Time should be in ISO 8601 combined date and time format (e.g. 2013-03-14T16:57Z)
    4.    "id": Assigned by database.
3.    Nodes must have property "label".
4.    The "label" property of nodes of "class" "event", "condition", or "actor" will contain a string holding a narrative describing the actor, event, or condition
5.    The "label" property of nodes of "class" "attribute" must contain a JSON formatted string with a single {"type":"value"} pair, where type is the type/name of the attribute and value is its value.
6.    Nodes of any class MAY have property "comments" providing additional narrative on the node
7.    Nodes of any class MAY have property "finish" providing a finish time for the node. Time should be in ISO 8601 combined date and time format (e.g. 2013-03-14T16:57Z)
8.    Edges must have the following properties:
    1.    "source": the id of the source node
    2.    "target": the id of the target node
    3.    "id": id assigned by the database
    4.    "relationship":
        1.    Value of "influence" if "source" property "class" is "attribute" and "target" property "class" is "event" or "condition".  Value of "leads to" if "source" property "class" is "event", "threat"
        2.    Value of "influence" if "condition" and "target" property "class" is "actor", "event", or "condition"
        3.    Value of "described by" if "source" property "class" is "event", "condition", or "actor" and "target" property "class" is "attribute"
        4.    Value of "described by" if both "source" and "target" property "class" are "attribute"
    5.    "directed": value of "True"
9.    Edges may have a property "confidence" with an integer value from 0 to 100 representing the percent confidence
10.    Edges must be directed
11.    Nodes and Edges may have additional properties; however, they will not be validated and may be ignored by the attack graph.
12.    Nodes and Edges missing values may still be accepted if the value can be filled in.
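
A minimal sketch of how the rules above might be checked in Python. The function names (validate_node, validate_edge), the constants, and the strict timestamp pattern are assumptions for illustration and are not part of CAGS. The "cpt" property is only checked for presence, the class-dependent "relationship" rules are not enforced (only the three relationship values are checked), and attribute labels are parsed as standard JSON with double quotes.

```python
import json
from datetime import datetime

NODE_CLASSES = {"actor", "event", "condition", "attribute"}
RELATIONSHIPS = {"influence", "leads to", "described by"}
# Assumption: only the exact form of the example (2013-03-14T16:57Z) is accepted.
TIME_FORMAT = "%Y-%m-%dT%H:%MZ"


def _is_time(value):
    """Return True if value parses in the assumed timestamp format."""
    try:
        datetime.strptime(value, TIME_FORMAT)
        return True
    except (TypeError, ValueError):
        return False


def validate_node(node):
    """Return a list of rule violations for a node dict (empty if none)."""
    errors = []
    if any(key != key.lower() for key in node):
        errors.append("property names must be lower case")
    for required in ("class", "cpt", "start", "id", "label"):
        if required not in node:
            errors.append(f"missing required property: {required}")
    if "class" in node and node["class"] not in NODE_CLASSES:
        errors.append(f"unknown class: {node['class']}")
    for time_key in ("start", "finish"):
        if time_key in node and not _is_time(node[time_key]):
            errors.append(f"{time_key} is not in the expected time format")
    if node.get("class") == "attribute" and "label" in node:
        try:
            label = json.loads(node["label"])
        except (TypeError, ValueError):
            label = None
        if not isinstance(label, dict) or len(label) != 1:
            errors.append("attribute label must be a JSON object with one pair")
    return errors


def validate_edge(edge):
    """Return a list of rule violations for an edge dict (empty if none)."""
    errors = []
    if any(key != key.lower() for key in edge):
        errors.append("property names must be lower case")
    for required in ("source", "target", "id", "relationship", "directed"):
        if required not in edge:
            errors.append(f"missing required property: {required}")
    if "relationship" in edge and edge["relationship"] not in RELATIONSHIPS:
        errors.append(f"unknown relationship: {edge['relationship']}")
    # Accepts either the boolean True or the string "True".
    if "directed" in edge and str(edge["directed"]) != "True":
        errors.append('directed must be "True"')
    if "confidence" in edge:
        conf = edge["confidence"]
        if isinstance(conf, bool) or not isinstance(conf, int) or not 0 <= conf <= 100:
            errors.append("confidence must be an integer from 0 to 100")
    return errors


example_node = {
    "class": "attribute",
    "cpt": "{}",  # placeholder; the cpt format is not checked here
    "start": "2013-03-14T16:57Z",
    "id": 1,
    "label": json.dumps({"ip": "192.0.2.1"}),
}
print(validate_node(example_node))  # []
```

Returning a list of violations rather than raising on the first one fits rule 12, since a caller can decide which missing values it is able to fill in before rejecting a node or edge.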


  1. Consider replacing spaces with underscores (e.g. "described by" becomes "described_by").

    Consider replacing "start" with "start_time", as "start" is ambiguous in some Cypher queries.

    Consider describing attributes as {"class":"attribute", "attribute":, :} rather than just {"class":"attribute", :} to improve ease of querying the graph.

  2. Consider requiring edges to have a start_time.

  3. Consider describing attributes as {"class":"attribute", "attribute type":, "type value":}. This would improve querying the graph directly for a value and for the type.

  4. 'label' is reserved in some graph databases. Consider using the class value in place of label and indexing all class values on all nodes.

  5. The cpt requirement will be removed in the next version.

  6. Graph IDs should be a URI of the form :?class=< node class>&=&= so class:attribute, attribute = ip, ip = at mybiz would be mybiz:?class=attribute&attribute=ip&ip=

  7. To allow efficient storage, it may be necessary to express {class:, :, :,} with explicit columns of {class:, key:, value:}. The advantage is that nodes can be indexed on class, key, and value. The limitation is that the a:b, b:c, c:d, d:etc, chain is limited in length.

  8. Consider making edge URIs derived from their source, relationship, destination triple.

    In the documentation, we may want to correlate source, relationship, destination with subject, predicate, object.

    1. Edge URIs should be as follows ":?source=&destination=&relationship=". (This is necessary as the source and destination are URIs in and of themselves.) Hash should be an MD5 hash of the source and destination URIs in the URL namespace.

      If there is a chain from the relationship such as =, those should then be added "&=...".

      Finally, if an origin exists, the origin should be added. "&origin=".

      For example:

  9. Consider allowing edges to have sub-relationships such as: .

    Consider allowing edges to have an origin to explain the enrichment they came from. e.g. .

  10. The URI should be stored as an attribute to the node or edge with a key of 'uri' and should be used as the node and edge id whenever possible.

  11. Need to consider how to handle the difference between "no relationship found" and "creation of relationship not attempted".

  12. Prefixes should not be required on URIs within a graph. The reasoning being that if the nodes/edges are within a graph, the prefix is implicit.

    There may be cases where we wish to suggest that knowledge about a node resides in another graph. While adding the prefix to the node would indicate that, it also allows two nodes with the same key:value to exist in the same graph. Moreover, a key:value node such as could be used to suggest an algorithm should query another graph for the information.

    This does not preclude having a prefix on a node in a graph (with the absence of a prefix implying that the location of the graph represents the prefix). However, such a prefix would require a means of translating a prefix to a fully qualified location, which does not currently exist in the schema.

    This does not preclude including a prefix (or the fully qualified URI) in a client subgraph to help distinguish between nodes from different locations. However, it will still suffer from the same issue of potential duplicate nodes. It is more advisable that prefixes only be kept for edges. The client may choose how to keep the mapping between prefix and full location.
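
To make the URI notes in items 6 and 8 more concrete, here is a hedged Python sketch. It follows the "mybiz:?class=attribute&attribute=ip&ip=" example for attribute nodes and reads "MD5 hash in URL namespace" as a version-3 UUID, which is one plausible interpretation. The function names, the percent-encoding of keys and values, and the plain concatenation of the source and destination URIs before hashing are all assumptions, not part of the schema. The sketch does not assemble a full edge URI.

```python
import uuid
from urllib.parse import quote


def attribute_node_uri(prefix, key, value):
    """Build a URI shaped like mybiz:?class=attribute&attribute=ip&ip=<value>.

    Assumption: keys and values are percent-encoded so that characters
    such as '&' and '=' cannot break the query string.
    """
    k = quote(str(key), safe="")
    v = quote(str(value), safe="")
    return f"{prefix}:?class=attribute&attribute={k}&{k}={v}"


def edge_hash(source_uri, destination_uri):
    """Return a version-3 (MD5-based) UUID in the URL namespace.

    Assumption: the two endpoint URIs are simply concatenated, source
    first, before hashing.
    """
    return uuid.uuid3(uuid.NAMESPACE_URL, source_uri + destination_uri)


src = attribute_node_uri("mybiz", "ip", "192.0.2.1")
dst = attribute_node_uri("mybiz", "domain", "example.com")
print(src)
print(edge_hash(src, dst))
```

Because the hash is derived only from the endpoint URIs, the same source and destination pair always yields the same value, while swapping the two yields a different one, which matches the requirement that edges be directed.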