Tuesday, January 6, 2015

Standardized Data Trees (UN M.49, ISO 3166-1, ISO 3166-2, country population/area, NAICS)

To help in aggregating, comparing, and validating data, I've created a pair of graphs that represent tree hierarchies of standard-formatted data.

The World Graph contains:

World Graph Visualized

The NAICS Graph contains:

The percentage attached to nodes in the NAICS graph and the aggregate population/geographic area in the World Graph allow two things:
  1. Provide an amount as a dimension for data coded in these systems (whether that amount is a percentage of NAICS codes, population, or geographic area).
  2. Provide a means of comparing the similarity of two records by finding the Lowest Common Ancestor (LCA) and retrieving the score from that node. The greater the score, the greater the distance between the nodes.
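
As a rough illustration of the LCA comparison, here is a minimal Python sketch. It assumes the tree is available as a child-to-parent mapping plus a per-node score; the `PARENT` and `SCORE` dictionaries, the function names, and all of the numbers are hypothetical placeholders, not the contents or format of the actual graphs.

```python
# Hypothetical toy slice of a world tree: child -> parent.
# The real graphs are larger and may be stored differently.
PARENT = {
    "World": None,
    "Americas": "World",
    "Asia": "World",
    "Northern America": "Americas",
    "South-Eastern Asia": "Asia",
    "US": "Northern America",
    "CA": "Northern America",
    "US-VA": "US",
    "US-MD": "US",
    "TH": "South-Eastern Asia",
}

# Made-up scores in arbitrary units (stand-ins for aggregate population or area).
SCORE = {
    "World": 100, "Americas": 40, "Asia": 60,
    "Northern America": 20, "South-Eastern Asia": 15,
    "US": 10, "CA": 3, "US-VA": 1, "US-MD": 1, "TH": 2,
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(a, b):
    """Return the deepest node that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walks upward, so the first hit is the lowest
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def distance_score(a, b):
    """Score of the LCA: the larger it is, the further apart a and b are."""
    return SCORE[lowest_common_ancestor(a, b)]
```

With this toy data, `distance_score("US-VA", "US-MD")` uses the score of `US`, while `distance_score("US-VA", "TH")` climbs all the way to `World`, so the second pair comes out as more distant. The two arguments do not need to sit at the same depth.
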
The hierarchies can be used for validating data as well as for comparing things (as above) that are not on the same level of the graph. For example, a US state could be compared to a country or to South-Eastern Asia.
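
For the validation side, one possible approach is to check that a code exists in the tree and that it sits under the parent a record claims for it. The sketch below assumes a child-to-parent mapping; the `PARENT` dictionary, the function names, and the record layout are hypothetical and only stand in for however the graphs are actually loaded.

```python
# Hypothetical child -> parent mapping for a small slice of the world tree.
PARENT = {
    "World": None,
    "Americas": "World",
    "Northern America": "Americas",
    "US": "Northern America",
    "CA": "Northern America",
    "US-VA": "US",
}


def is_known(code):
    """True if the code appears anywhere in the tree."""
    return code in PARENT


def is_within(code, ancestor):
    """True if ancestor is code itself or lies on the path from code to the root."""
    node = code
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def validate_record(country, subdivision):
    """Check that both codes exist and the subdivision belongs to the country."""
    if not (is_known(country) and is_known(subdivision)):
        return False
    return is_within(subdivision, country)
```

Under these assumptions, `validate_record("US", "US-VA")` passes, while `validate_record("CA", "US-VA")` and a record with an unknown code both fail.
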

None of this is groundbreaking, but hopefully some find the graphs of use.