import time
import logging

from graphit import Graph
from graphit.graph_helpers import graph_directionality

logging.basicConfig(level=logging.WARN)

Tutorial 1: Building simple graphs

This is the first tutorial in a series illustrating the basics of using graphs in graphit. We start by building a simple graph, adding nodes and edges, and then removing them again.

A Graph is a container with nodes and edges

A graph is a collection of nodes (vertices) connected by edges. In graphit, nodes and edges are contained in a Graph object, and a node can represent any piece of data as long as it is hashable: text, numbers, images, files, or even Python functions and other objects.
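To make the hashability requirement concrete, the following plain-Python sketch checks which objects can serve as dictionary keys. It does not use graphit; the helper name is_hashable is hypothetical and only illustrates Python's own rule.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Strings, numbers, tuples of hashables and functions are hashable;
# lists and dictionaries are not.
candidates = ['text', 42, 1.22, (1, 2), len, [1, 2], {'a': 1}]
results = [is_hashable(candidate) for candidate in candidates]
print(results)
```

Mutable containers such as lists and dictionaries fail the check, so they would have to be wrapped or converted (for example to a tuple) before being used as a key.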

The functionality for adding nodes and edges to a Graph is similar to that of most other graph packages, including the popular Python library NetworkX, as illustrated below.

1.1 Creating a graph

Create an empty Graph with no nodes and no edges.

g = Graph()

print(g)
print(g.nodes, g.edges)
<GraphBase object 4422259192: 0 nodes, 0 edges>
[] []

Technical note 1: Node and edge storage

graphit uses flexible storage drivers to store node and edge information. The default driver stores information in a Python dictionary, but it could equally be a driver that keeps the data in a high-performance data store. The storage driver API enforces key/value storage in which the node identifier (nid) is the primary key and the node attribute dictionary is the value. A graph natively supports the storage of multiple node attributes, and therefore most node-related functions expect a node value to behave like a Python dictionary.
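The key/value contract described in this note can be pictured with a small stand-alone class. This is a minimal sketch and not graphit's driver API: the class name DictStorage is hypothetical, and it only shows the idea of a mapping from node identifier to attribute dictionary.

```python
from collections.abc import MutableMapping


class DictStorage(MutableMapping):
    """Hypothetical key/value store: node identifier -> attribute dictionary."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, nid):
        return self._data[nid]

    def __setitem__(self, nid, attributes):
        # Values are copied into a plain dictionary of attributes.
        self._data[nid] = dict(attributes)

    def __delitem__(self, nid):
        del self._data[nid]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


storage = DictStorage()
storage[1] = {'key': 'node', 'weight': 2.5}
storage[1]['weight'] = 3.0  # attributes are updated in place
print(list(storage), storage[1])
```

Because the class implements the MutableMapping interface, code written against a dictionary would keep working if the backing store were swapped for something else, which is the point of a driver layer.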

1.2 Adding nodes

Use the graph add_node method to add a single node to the graph or add_nodes for multiple nodes at once. Both methods return the unique node identifier(s) for the node(s) just added to the graph. add_nodes accepts any iterable object as a source of nodes.

# Add a single node
nid = g.add_node('node')
print(nid, g)

# Adding multiple nodes at once
nids = g.add_nodes([1, 2, 'three'])
print(nids, g)

# using a string as iterable source of characters
g.add_nodes('graphit')
print(g)
1 <GraphBase object 4422259192: 1 nodes, 0 edges>
[2, 3, 4] <GraphBase object 4422259192: 4 nodes, 0 edges>
<GraphBase object 4422259192: 11 nodes, 0 edges>

Important note 1: Node identifiers

Every node added to the graph is stored along with a node identifier (nid) that can be used to retrieve the node at a later time. The Graph class supports two ways to define node IDs:

  1. An automatically incremented integer identifier enabled by default using the Graph.auto_nid attribute (set to True).

  2. Using the node key (the first argument to add_node) as identifier, enabled by setting Graph.auto_nid to False.

The benefit of the first option is that every node added is uniquely identified, even if the primary node key already exists. It does require the user to keep track of the unique IDs assigned to the nodes, or to look nodes up using attribute-based query methods.

The second option has neither requirement. Here the node key serves as the identifier (nid) and may be any hashable object except None. A unique auto-incremented ID (as with option 1) is still added to the node attribute storage, but it is not used for identification. The downside of option 2 is that the node key may not be unique, in which case add_node(s) logs a warning and does not add the new node.

# Using the node key as identifier
a = Graph(auto_nid=False)
nids = a.add_nodes(['node', 'node2'], node_attr=100)
print(nids)
print(a.nodes['node'])
print(a.nodes.keys())

# Adding the same node again will not work
nid = a.add_node('node', node_attr=100)
WARNING:graphit:Node with identifier "node" already assigned
['node', 'node2']
{'key': 'node', 'node_attr': 100, '_id': 1}
['node', 'node2']
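The trade-off between the two identifier strategies can also be shown without graphit. The sketch below is an illustration only: TinyNodeStore is a hypothetical class, and returning None for a rejected duplicate is a choice made for this sketch, not documented graphit behaviour.

```python
import itertools


class TinyNodeStore:
    """Hypothetical node store that mimics the two identifier strategies."""

    def __init__(self, auto_nid=True):
        self.auto_nid = auto_nid
        self.nodes = {}
        self._counter = itertools.count(1)

    def add_node(self, key, **attributes):
        if not self.auto_nid and key in self.nodes:
            # The key doubles as identifier, so a duplicate is rejected.
            return None
        _id = next(self._counter)
        nid = _id if self.auto_nid else key
        self.nodes[nid] = dict(attributes, key=key, _id=_id)
        return nid


# Strategy 1: auto-incremented identifiers, duplicate keys are fine.
auto = TinyNodeStore()
first, second = auto.add_node('node'), auto.add_node('node')

# Strategy 2: the key is the identifier, duplicates are refused.
keyed = TinyNodeStore(auto_nid=False)
keyed_first, keyed_second = keyed.add_node('node'), keyed.add_node('node')

print(first, second, keyed_first, keyed_second)
```

With automatic identifiers the same key can be stored twice under different IDs; with key-based identifiers the second insertion is dropped, mirroring the warning shown above.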

1.3 Adding data to nodes

Any additional keyword argument passed to the add_node or add_nodes method is added to the node storage as a key/value pair.

nid = g.add_node('node', node_attr=100, value=int(time.time()))
g.nodes[nid]
{'key': 'node', 'node_attr': 100, 'value': 1570207050, '_id': 12}

The node/edge storage exposes a dictionary-like API (see technical note 1). For the node with ID 12, it shows the added data as dictionary key/value pairs. The auto-incremented node ID is stored as '_id' and the node key as 'key'.

nids = g.add_nodes(['data', 1.22, True, len], node_attr=100, value=int(time.time()))
for nid in nids:
    print(g.nodes[nid])
{'key': 'data', 'node_attr': 100, 'value': 1570207050, '_id': 13}
{'key': 1.22, 'node_attr': 100, 'value': 1570207050, '_id': 14}
{'key': True, 'node_attr': 100, 'value': 1570207050, '_id': 15}
{'key': <built-in function len>, 'node_attr': 100, 'value': 1570207050, '_id': 16}

With add_nodes, additional keyword arguments are added to every node created from the iterable. If an item of the iterable is a tuple or list of length 2 with a dictionary at the second position, the key/value pairs of that dictionary are used as attributes of the new node, together with any additional keyword arguments passed to add_nodes. This functionality can be used to add the nodes of one graph as new nodes to another, as illustrated below.

# Build the first graph
b = Graph()
b.add_nodes('second', attr=True)

# Add nodes of graph b to g
nids = g.add_nodes(b.nodes.items())
for nid in nids:
    print(g.nodes[nid])
{'key': 's', 'attr': True, '_id': 17}
{'key': 'e', 'attr': True, '_id': 18}
{'key': 'c', 'attr': True, '_id': 19}
{'key': 'o', 'attr': True, '_id': 20}
{'key': 'n', 'attr': True, '_id': 21}
{'key': 'd', 'attr': True, '_id': 22}

Now graph g contains the nodes of graph b as new nodes. Alternatively, you could use the full graph b as a single node in g:

nid = g.add_node(b)
print(g.nodes[nid])
{'key': <GraphBase object 4420139048: 6 nodes, 0 edges>, '_id': 23}
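The rule for items of the form (key, dictionary) described earlier in this section can be sketched in plain Python. The function split_node_item is hypothetical, and letting per-item attributes take precedence over the shared keyword arguments is an assumption of this sketch; it also does not reproduce how graphit assigns identifiers.

```python
def split_node_item(item, **shared_attributes):
    """Return (key, attributes) for one item of a node iterable."""
    attributes = dict(shared_attributes)
    is_pair = isinstance(item, (tuple, list)) and len(item) == 2
    if is_pair and isinstance(item[1], dict):
        key, own_attributes = item
        # Assumption: per-item attributes override the shared ones.
        attributes.update(own_attributes)
        return key, attributes
    # Anything else is treated as a plain node key.
    return item, attributes


items = ['plain', ('tagged', {'color': 'red'}), (1, 2)]
parsed = [split_node_item(item, weight=1) for item in items]
for key, attributes in parsed:
    print(key, attributes)
```

Note that the tuple (1, 2) is kept as an ordinary node key because its second element is not a dictionary.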

Technical note 2: Unicode support

graphit commits to the use of Unicode strings as much as possible, both in Python 2.x and 3.x. All data keys and values that are strings are stored as Unicode. When using data collections such as lists, dictionaries or tuples as a value, the user is responsible for ensuring Unicode compliance. In Python 3.x all strings are Unicode by default.

1.4 Adding edges

Adding edges is in general analogous to adding nodes and uses the add_edge and add_edges methods. The difference is that both methods require two node identifiers that together form the edge. The identifiers are either two automatically generated unique IDs or custom ones (see important note 1).

Graph directionality

An edge between two nodes can be undirected (the edge points both ways) or directed (the edge points one way only). The global directionality of a graph is set using the graph.directed attribute, which is False by default, yielding an undirected graph. Global directionality can be overruled in add_edge(s), allowing for mixed graphs.

# Adding an edge in an undirected graph
eid = g.add_edge(1, 2, edge_attr='text')

print(eid)
print(g, g.directed)
print(g.edges())
print(graph_directionality(g))
(1, 2)
<GraphBase object 4422259192: 23 nodes, 2 edges> False
{(1, 2): {'edge_attr': 'text'}, (2, 1): {'edge_attr': 'text'}}
undirectional
# Add directed edge.
eid = g.add_edge(2, 3, directed=True)

print(g.edges())
print(graph_directionality(g))
{(1, 2): {'edge_attr': 'text'}, (2, 1): {'edge_attr': 'text'}, (2, 3): {}}
mixed
# Add multiple edges at once
eids = g.add_edges([(2, 4), (4, 5), (4, 6), (5, 7), (6, 8), (7, 9)], global_attr=True)
eids += g.add_edges([(8, 9, {'attr': 100}), (9, 10), (10, 11, {'text': False}), (10, 12), (10, 13)], global_attr=True)

print(g)
for eid in eids:
    print(g.edges[eid])

# Adding the same edge again will not work
g.add_edge(2, 4)
WARNING:graphit:Edge between nodes 2-4 exists. Use edge update to change attributes.
WARNING:graphit:Edge between nodes 4-2 exists. Use edge update to change attributes.
<GraphBase object 4422259192: 23 nodes, 25 edges>
{'global_attr': True}
{'global_attr': True}
{'global_attr': True}
{'global_attr': True}
{'global_attr': True}
{'global_attr': True}
{'attr': 100, 'global_attr': True}
{'global_attr': True}
{'text': False, 'global_attr': True}
{'global_attr': True}
{'global_attr': True}
(2, 4)
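The edge listings above suggest that an undirected edge is stored as a pair of directed entries. The following stand-alone sketch reproduces that idea with a plain dictionary. Both functions are hypothetical stand-ins rather than graphit code, and the label 'directional' for a fully directed graph is an assumption, since only 'undirectional' and 'mixed' appear in the output above.

```python
def add_edge(edges, nid1, nid2, directed=False, **attributes):
    """Store an edge; an undirected edge becomes two directed entries."""
    pairs = [(nid1, nid2)] if directed else [(nid1, nid2), (nid2, nid1)]
    for pair in pairs:
        # setdefault leaves an existing edge untouched.
        edges.setdefault(pair, dict(attributes))
    return (nid1, nid2)


def directionality(edges):
    """Classify an edge dictionary by checking for reverse entries."""
    paired = [(nid2, nid1) in edges for (nid1, nid2) in edges]
    if all(paired):
        return 'undirectional'
    if not any(paired):
        return 'directional'
    return 'mixed'


edges = {}
add_edge(edges, 1, 2, edge_attr='text')
after_undirected = directionality(edges)

add_edge(edges, 2, 3, directed=True)
after_directed = directionality(edges)

print(after_undirected, after_directed)
```

The classification only looks at whether every edge has a reverse counterpart, which is enough to tell the three cases apart in this simplified model.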

The add_edge(s) methods support a shortcut for fast graph creation: with the node_from_edge argument, the nodes are created from the identifiers used in the edges.

c = Graph()
c.add_edges([(1, 2), (2, 4), (4, 5), (4, 6), (5, 7), (6, 8), (7, 9)], node_from_edge=True)

print(c)
<GraphBase object 4422743816: 8 nodes, 14 edges>
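The node and edge counts reported above can be checked by hand. This plain-Python sketch derives the node set from the same edge list and counts each undirected edge in both directions; it illustrates the bookkeeping only and is not how graphit implements node_from_edge.

```python
edge_list = [(1, 2), (2, 4), (4, 5), (4, 6), (5, 7), (6, 8), (7, 9)]

# Collect every identifier that appears in an edge, keeping first-seen order.
nodes = list(dict.fromkeys(nid for edge in edge_list for nid in edge))

# An undirected edge is counted once for each direction.
edges = {pair for a, b in edge_list for pair in ((a, b), (b, a))}

print(len(nodes), len(edges))  # 8 14
```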

1.5 Removing nodes and edges

Unsurprisingly, removing nodes and edges is done with methods that mirror the ones for adding them: remove_node, remove_nodes, remove_edge and remove_edges.

Removing a node also removes the edges that depend on it and adjusts node adjacency. Removing an edge does not remove the associated nodes but does adjust adjacency. The graph.directed attribute that controls directionality when adding edges also affects their removal. When remove_edge(s) is called with directed=True on an undirected graph, only one edge of the pair is removed; otherwise both edges are removed.
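Before turning to graphit itself, the removal rules can be sketched with plain sets. The functions below are hypothetical stand-ins that share their names with the graphit methods but not their implementation; edges are modelled as directed (from, to) tuples.

```python
def remove_node(nodes, edges, nid):
    """Remove a node and every edge that starts or ends at it."""
    nodes.discard(nid)
    for edge in [e for e in edges if nid in e]:
        edges.discard(edge)


def remove_edge(edges, nid1, nid2, directed=False):
    """Remove one direction only when directed=True, otherwise both."""
    edges.discard((nid1, nid2))
    if not directed:
        edges.discard((nid2, nid1))


nodes = {1, 2, 3}
edges = {(1, 2), (2, 1), (2, 3), (3, 2)}

remove_edge(edges, 2, 3, directed=True)  # only (2, 3) goes
after_directed_removal = set(edges)

remove_node(nodes, edges, 1)             # drops (1, 2) and (2, 1) as well
after_node_removal = set(edges)

remove_edge(edges, 3, 2)                 # the nodes themselves stay in place
print(nodes, edges)
```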

# Removing a single node
g.remove_node(4)

# Node 4 no longer in adjacency
print(4 in g.adjacency)

# No more edges with node 4
print([edge for edge in g.edges if 4 in edge])
False
[]
print(g)

# Removing multiple nodes
g.remove_nodes([1,2,3,5])
print(g)
<GraphBase object 4422259192: 22 nodes, 19 edges>
<GraphBase object 4422259192: 18 nodes, 14 edges>
# Removing edges works in a similar way
g.remove_edge(6, 8)

print(g.getnodes([6, 8]))  # Nodes are still there
print(8 in g.adjacency[6]) # Adjacency is adjusted
<GraphBase object 4422640296: 2 nodes, 0 edges>
False
print(g)

# Removing multiple edges
g.remove_edges([(9, 7), (11, 10), (9, 8)])

print(g)
<GraphBase object 4422259192: 18 nodes, 12 edges>
<GraphBase object 4422259192: 18 nodes, 6 edges>

The clear method is a shortcut for removing all nodes and edges from a graph.

g.clear()
print(g)
WARNING:root:(10, 9) defines a reference ($ref) to non-existing (9, 10)
WARNING:root:(12, 10) defines a reference ($ref) to non-existing (10, 12)
WARNING:root:(13, 10) defines a reference ($ref) to non-existing (10, 13)
<GraphBase object 4422259192: 0 nodes, 0 edges>