Sunday, February 10, 2013

Defensive Construct Exchange Standard


It is time to provide a standard for the transmission of information between defensive tools.  I understand that this is not a unique endeavor; however, all attempts to this point have been limited by a single approach: a set of attributes a construct must or should have.

Why do we need something else?  First, this is not a new SIEM.  This is not ArcSight, it's not Splunk, it's not CIF or ELSA.  This is not an information structure.  It's not STIX, VERIS, or Mandiant IOCs.  If anything, it's similar to TAXII or IDMEF.  However, all of these approaches (and the many other existing approaches) share a primary flaw: they have structure.  The fundamental issue is that no matter what tool we use, it will collect different data.  We will have similar fields (URLs, IPs, etc.) from tool to tool, but each tool provides a slightly different construct with slightly different fields associated with it.  This prevents all but the most general indexing tools (such as Splunk or ELSA) from importing data without an importer designed specifically for that data (such as an ArcSight connector).

Also, basically all tools (other than Paterva's Maltego) take a database approach to storing data.  While this still allows searching data for specific patterns (such as an IP address), it is less efficient because linkages are implied only by the pattern appearing in a row alongside other data.  Passing data as records may hide linkages that could otherwise be uncovered.



How is this different?  In the Defensive Construct Exchange Standard (DCES), we see constructs as small graphs (in the graph theory sense).  All the fields in the construct are represented as nodes, and the nodes are linked with edges (rather than with a predefined construct or record format as in most other tools).  See the example below.  Because of this, sending parties and tools may provide any set or subset of fields, whether defined in STIX, CybOX, IDMEF, Unified2, or specific to their own needs.  Receiving parties may easily discard or replace portions of the construct that are unimportant to them while adding their own information to it.  I'll detail some of the uses of this approach below.


What's the standard?  Initially, the standard is as follows.  I recognize that this is a very early approach and that working with tool builders, vendors, and stakeholders will be necessary to fully realize it:
  1. All discrete pieces of information within a construct will be given an individual node (in the graph theory sense).  All nodes within the construct are a type of Attribute.  The actual attribute will be stored as a tuple within the node's "Metadata" attribute.
  2. All nodes will be linked to a node containing a construct ID generated by the construct originator.  (It is recommended that those linking constructs into their own graphs generate a local construct ID to avoid conflicts within their graph.)
  3. The construct ID will be a child node of all Attributes within the construct.
  4. The nodes and edges will be represented in JSON.  They will be transmitted in accordance with the JSON format outlined by the Gephi graph streaming project.  In practice, all constructs should be transmittable as a group of 'add node' and 'add edge' messages, with the recipient deciding how to actually handle the information.
  5. Attributes within the construct may have their own Attributes.  (E.g., a threat construct's location Attribute may have a 'confidence' Attribute.  Also, Attributes within the construct may have a child Attribute representing a classification such as "company proprietary", "PII", etc.)  (Note that this is less a rule of the standard than an explicit flexibility.)
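The rules above can be sketched as a small encoder.  This is a minimal, hypothetical Python sketch, not part of the proposal: the function name construct_events, the root node ID "construct", the one-event-per-line output, and the sample values are all assumptions.  "Metadata" follows the {type: value} shape used in the JSON example below rather than a literal tuple, since JSON has no tuple type.

```python
import json
from itertools import count


def construct_events(construct_id, attributes, links=(), root="construct"):
    """Encode one construct as a list of "add node" / "add edge" events.

    attributes: {node_id: (attr_type, value)} for each discrete piece of information.
    links: (source_id, target_id) pairs between attributes that describe each other.
    root: hypothetical node ID for the node holding the originator's construct ID.
    """
    # Rule 2: a node carrying the construct ID generated by the originator.
    events = [{"an": {root: {"label": "Construct", "Class": "Attribute",
                             "Metadata": {"ID": construct_id}}}}]
    # Rule 1: one node per discrete piece of information, all of Class "Attribute".
    for node_id, (attr_type, value) in attributes.items():
        events.append({"an": {node_id: {"label": attr_type, "Class": "Attribute",
                                        "Metadata": {attr_type: value}}}})
    edge_ids = count(1)
    # Link every attribute to the construct-ID node.  The direction
    # (construct -> attribute) mirrors the JSON example; rule 3 as
    # written would reverse it.
    for node_id in attributes:
        events.append({"ae": {str(next(edge_ids)): {
            "source": root, "target": node_id, "directed": True}}})
    # Rule 5: attributes may also describe other attributes.
    for source, target in links:
        events.append({"ae": {str(next(edge_ids)): {
            "source": source, "target": target, "directed": True}}})
    return events


events = construct_events(
    "X-0001",
    {"B": ("URL", "http://example.com/login"), "C": ("DOMAIN", "example.com")},
    links=[("C", "B")],
)
# Assumed transport: one JSON event per line.
for event in events:
    print(json.dumps(event))
```

The sender never needs a schema for the construct: any attribute type can be passed in, which is the flexibility the standard is after.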

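What a great idea, Gabe!  Can we see an example?  The following construct is used as an example in the STIX format, where it represents a link within a phishing email.  In our new format, the construct node (A) links to every attribute, and attributes that describe each other are linked as well: the WHOIS record to the domain and the domain to the URL, and, for each of the two DNS lookups, the record type to the DNS record and the DNS record to the query.
This would then be represented in JSON as: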
{"an":{"A":{"label":"Construct From X","Class":"Attribute","Metadata":{"ID":<value>}}}}
{"an":{"B":{"label":"URL","Class":"Attribute","Metadata":{"URL":<value>}}}}
{"an":{"C":{"label":"DOMAIN","Class":"Attribute","Metadata":{"DOMAIN":<value>}}}}
{"an":{"D":{"label":"WHOIS","Class":"Attribute","Metadata":{"WHOIS":<value>}}}}
{"an":{"E":{"label":"DNS Query","Class":"Attribute","Metadata":{"DNS Query":<value>}}}}
{"an":{"F":{"label":"DNS Record","Class":"Attribute","Metadata":{"DNS Record":<value>}}}}
{"an":{"G":{"label":"DNS Record Type","Class":"Attribute","Metadata":{"Record Type":<value>}}}}
{"an":{"H":{"label":"DNS Query","Class":"Attribute","Metadata":{"DNS Query":<value2>}}}}
{"an":{"I":{"label":"DNS Record","Class":"Attribute","Metadata":{"DNS Record":<value2>}}}}
{"an":{"J":{"label":"DNS Record Type","Class":"Attribute","Metadata":{"Record Type":<value2>}}}}
{"ae":{"1":{"source":"A","target":"B","directed":true}}}
{"ae":{"2":{"source":"A","target":"C","directed":true}}}
{"ae":{"3":{"source":"A","target":"D","directed":true}}}
{"ae":{"4":{"source":"D","target":"C","directed":true}}}
{"ae":{"5":{"source":"C","target":"B","directed":true}}}
{"ae":{"6":{"source":"A","target":"E","directed":true}}}
{"ae":{"7":{"source":"A","target":"F","directed":true}}}
{"ae":{"8":{"source":"A","target":"G","directed":true}}}
{"ae":{"9":{"source":"G","target":"F","directed":true}}}
{"ae":{"10":{"source":"F","target":"E","directed":true}}}
{"ae":{"11":{"source":"A","target":"H","directed":true}}}
{"ae":{"12":{"source":"A","target":"I","directed":true}}}
{"ae":{"13":{"source":"A","target":"J","directed":true}}}
{"ae":{"14":{"source":"J","target":"I","directed":true}}}
{"ae":{"15":{"source":"I","target":"H","directed":true}}}
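A receiver only has to understand the two event kinds ("an" and "ae"), whatever attributes a construct carries.  The following is a hypothetical Python receiver: the name apply_stream and the sample lines are assumptions, and the sample uses made-up strings in place of the <value> placeholders so that each line parses as JSON.  It assumes one JSON event per line, tolerates a trailing "\r", and checks edge endpoints only after the whole construct has been read, since a stream may send an edge before the nodes it connects.

```python
import json


def apply_stream(lines):
    """Apply Gephi-style "an" (add node) and "ae" (add edge) events to an in-memory graph.

    Returns (nodes, edges, dangling), where dangling lists IDs of edges whose
    source or target node never arrived.
    """
    nodes, edges = {}, {}
    for line in lines:
        line = line.strip()  # also removes a trailing "\r" delimiter
        if not line:
            continue
        event = json.loads(line)
        # A single object may carry several nodes and edges at once.
        for node_id, props in event.get("an", {}).items():
            nodes[node_id] = props
        for edge_id, props in event.get("ae", {}).items():
            edges[edge_id] = (props["source"], props["target"])
    # Endpoints are checked only after the whole construct is read, because
    # an edge may be sent before the nodes it connects.
    dangling = [edge_id for edge_id, (src, dst) in edges.items()
                if src not in nodes or dst not in nodes]
    return nodes, edges, dangling


sample = [
    '{"ae":{"1":{"source":"A","target":"B","directed":true}}}',
    '{"an":{"A":{"label":"Construct From X","Class":"Attribute","Metadata":{"ID":"X-0001"}}}}\r',
    '{"an":{"B":{"label":"URL","Class":"Attribute","Metadata":{"URL":"http://example.com/login"}}}}',
    '{"ae":{"2":{"source":"B","target":"Z","directed":true}}}',
]
nodes, edges, dangling = apply_stream(sample)
print(sorted(nodes), dangling)  # ['A', 'B'] ['2']
```

Edge "2" is reported because node "Z" was never sent; a real receiver would have to decide whether to wait for it, drop the edge, or reject the construct.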

How will this approach be used?  In the most basic sense, two tools or groups exchanging information can simply use it to exchange standard formats (such as an IDMEF message).  Alternatively, it could easily be stored in a database by tools such as Splunk or ELSA.  However, neither of these uses takes advantage of the format's strengths; they simply provide backward compatibility with previous approaches and workflows.

A better use would be to maintain threat and event data in a graph.  Graphs can be stored in memory, in a standard RDB, in a graph database, or in any number of other formats.  When a DCES construct arrives at the receiving tool, the tool will likely parse the information, drop information it is not interested in, and add information it finds useful (such as a local ID).  From there the construct can be stored and linked to the rest of the graph based on common information (such as a common IP, alert ID, or any other information present in both the construct and the graph).  This linkage may be permanent, or temporary to allow searching the graph for other related information.  This is similar to adding information to a graph in Maltego.
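The receiving side described above (drop, add a local ID, link on common information) might look like the following Python sketch.  Everything in it is hypothetical: the LocalGraph class, the LOCAL_CONSTRUCT_ID type, the choice to link constructs by reusing a single node for identical (type, value) pairs, and the sample data are illustrative assumptions, not part of DCES.

```python
from itertools import count

CONSTRUCT = "LOCAL_CONSTRUCT_ID"  # hypothetical attribute type for the receiver's own ID


class LocalGraph:
    """Hypothetical receiver-side graph store; not defined by DCES."""

    def __init__(self, ignored_types=()):
        self.ignored_types = set(ignored_types)  # attribute types this receiver drops
        self.nodes = {}     # local node ID -> (attr_type, value)
        self.edges = set()  # (source local ID, target local ID)
        self.index = {}     # (attr_type, value) -> local node ID
        self._node_ids = count(1)
        self._construct_ids = count(1)

    def _node_for(self, key):
        # Reuse an existing node with the same (type, value).  The shared node
        # is what links a new construct to everything already in the graph.
        if key not in self.index:
            local_id = f"n{next(self._node_ids)}"
            self.nodes[local_id] = key
            self.index[key] = local_id
        return self.index[key]

    def add_construct(self, attributes, links):
        """attributes: {sender node ID: (attr_type, value)}; links: [(src, dst)].

        Returns a mapping of sender node IDs to local node IDs.
        """
        # Add a local construct ID so sender-chosen IDs cannot conflict locally.
        root = self._node_for((CONSTRUCT, next(self._construct_ids)))
        mapping = {}
        for sender_id, key in attributes.items():
            if key[0] in self.ignored_types:
                continue  # drop information this receiver is not interested in
            mapping[sender_id] = self._node_for(key)
            self.edges.add((root, mapping[sender_id]))
        for src, dst in links:
            if src in mapping and dst in mapping:  # skip links to dropped nodes
                self.edges.add((mapping[src], mapping[dst]))
        return mapping

    def constructs_with(self, attr_type, value):
        """Local construct nodes that point at a given attribute node."""
        target = self.index.get((attr_type, value))
        return sorted(src for src, dst in self.edges
                      if dst == target and self.nodes[src][0] == CONSTRUCT)


graph = LocalGraph(ignored_types={"WHOIS"})
graph.add_construct(
    {"B": ("URL", "http://example.com/a"), "C": ("DOMAIN", "example.com"),
     "D": ("WHOIS", "hypothetical registrant text")},
    [("D", "C"), ("C", "B")],
)
graph.add_construct(
    {"X": ("DOMAIN", "example.com"), "Y": ("IP", "192.0.2.10")},
    [("X", "Y")],
)
print(graph.constructs_with("DOMAIN", "example.com"))  # ['n1', 'n4']
```

Because both constructs end up sharing the example.com domain node, a search starting from that domain finds both of them, a linkage a flat record store would only imply.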

Tool-wise, the clear benefit is that once a single DCES handler has been written, there is no need to adjust it for the different construct formats it might receive.  Therefore a tool or organization can share and receive a much larger and more diverse set of information.  From an operational standpoint, this allows more robust collection and definition of threat actor (and non-threat actor) information.  It also allows new approaches to determining the reputation of an event (e.g., whether it is a false positive or is linked to other suspicious behavior).

We're on the cusp of mounting an effective information security defense and putting all the information our threats reveal about themselves to use.  To do that, though, we must not just be able to tune big SIEMs to accept a specific set of information; we must be able to aggregate all information and understand its associations.  This format is a step in that direction.

9 comments:

  1. There is a concern that a receiving graph may not know when the construct has ended. I think to deal with that, data which should be treated as a single construct can be sent together: {"an":{"A":{},"B":{}},"ae":{"1":{},"2":{}}}, and so on.

  2. A good question is what should be returned when a construct is added. I think the whole construct should be returned. Some options are: (1) the entire construct as it is represented in the receiving data store; (2) a mapping of the sending node/edge IDs to the receiving ones; (3) the construct ID within the receiving data store.

  3. I think I'm going to change it slightly so that the construct ID points to the attributes (since the construct ID describes them), not the other way around.

  4. I'm considering using the WAMP spec (http://wamp.ws/spec) to wrap the DCES events. WAMP should provide publish and subscribe capabilities as well as allow extension to RPC capabilities.

  5. To ensure the message is interpreted correctly, I think the dictionary should include the following keys: "DCES_TYPE" and "DCES_VERSION". DCES_TYPE will predominantly be "GRAPH". The version will probably always be "1", but it never hurts to have it.

    Replies
    1. The DCES_TYPE may not be necessary, as the Cypher query will be an RPC call rather than a pub/sub call, and the only other type is the graph update.

  6. "Metadata" should probably not be a dictionary within the dictionary, as it will inherently be stored as a string in most cases. Consider making any metadata a property of the node/edge itself.

    Replies
    1. This doesn't work as it makes it impossible to parse nodes with additional properties. "metadata" will be a required property for attributes and will contain a tuple of ("type", "value"). This should strike a balance between querying the graph for attributes and allowing flexibility in node properties.

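Several proposals from the comments (sending a construct as one object, adding DCES_TYPE and DCES_VERSION keys, and storing metadata as a ("type", "value") pair) can be combined into a single message.  The Python sketch below illustrates that combination under stated assumptions and is not a settled format: the function names are hypothetical, the pair travels as a JSON array because JSON has no tuple type, and DCES_TYPE is kept even though one reply suggests it may be unnecessary.

```python
import json


def batched_message(nodes, edges, version="1"):
    """Pack a whole construct into one JSON-ready object.

    nodes: {node_id: (attr_type, value)}; edges: {edge_id: (source, target)}.
    """
    return {
        "DCES_TYPE": "GRAPH",
        "DCES_VERSION": version,
        # Sending every node and edge together tells the receiver where the construct ends.
        "an": {nid: {"Class": "Attribute", "metadata": [attr_type, value]}
               for nid, (attr_type, value) in nodes.items()},
        "ae": {eid: {"source": src, "target": dst, "directed": True}
               for eid, (src, dst) in edges.items()},
    }


def read_message(text):
    """Parse a batched message back into {node_id: (type, value)} and raw edges."""
    message = json.loads(text)
    if message.get("DCES_VERSION") != "1":
        raise ValueError("unsupported DCES version")
    # JSON has no tuple type, so the (type, value) pair arrives as a list.
    attributes = {nid: tuple(props["metadata"])
                  for nid, props in message.get("an", {}).items()}
    return attributes, message.get("ae", {})


wire = json.dumps(batched_message(
    {"A": ("ID", "X-0001"), "B": ("URL", "http://example.com/login")},
    {"1": ("A", "B")},
))
attributes, edges = read_message(wire)
print(attributes["B"])  # ('URL', 'http://example.com/login')
```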