graphit.graph_io package

graphit.graph_io.io_adl_format module

Reading and writing graphs as adjacency lists (.adl)

Adjacency lists are a simple textual representation of node identifiers and their linkage (adjacency) to one another.

The graph with edges a-b, a-c, d-e can be represented as the following adjacency list (anything following the # in a line is a comment):

    a b c # source target target
    d e
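As an illustration of the layout described above, the following minimal sketch parses adjacency-list text into (source, target) pairs. It is independent of graphit and is not how read_adl is implemented; the function name parse_adl is hypothetical.

```python
def parse_adl(text):
    """Return a list of (source, target) pairs from adjacency-list text."""
    edges = []
    for line in text.splitlines():
        # Drop everything after the comment marker, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        # The first field is the source, all remaining fields are its targets.
        source, targets = fields[0], fields[1:]
        edges.extend((source, target) for target in targets)
    return edges


adl_text = "a b c # source target target\nd e\n"
edges = parse_adl(adl_text)
print(edges)
```

Whether each pair is treated as a directed or an undirected edge is left to the caller, mirroring the fact that the format itself does not record directionality.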

graphit.graph_io.io_adl_format.read_adl(adl_file, graph=None)

Construct a graph from an adjacency list (ADL)

Note

The directionality of the graph is not defined explicitly in the adjacency list and thus depends on the graph.directional attribute, which is False (undirected) by default.

Parameters
  • adl_file (File, string, stream or URL) – ADL graph data.

  • graph (:graphit:Graph) – Graph object to import ADL data in

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_adl_format.write_adl(graph)

Export graph as adjacency list (ADL)

Note

This format does not store graph, node, or edge data.

Parameters

graph (:graphit:Graph) – Graph object to export

Returns

Adjacency list representation of the graph

Return type

:py:str

graphit.graph_io.io_cwl_format module

Functions for importing data structures in Common Workflow Language format.

The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.

CWL data structures are stored in JSON or YAML format. The graphit CWL parser supports syntax version 1.0.2 as described here:

Citation:

Peter Amstutz, Michael R. Crusoe, Nebojša Tijanić (editors), Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Hervé Ménager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, Luka Stojanovic (2016): Common Workflow Language, v1.0. Specification, Common Workflow Language working group. https://w3id.org/cwl/v1.0/ doi:10.6084/m9.figshare.3115156.v2

For more information on CWL consult:

https://www.commonwl.org

graphit.graph_io.io_cwl_format.read_cwl(cwl_file, graph=None, **kwargs)

Parse Common Workflow Language data structures to a graph

Additional keyword arguments (kwargs) are passed to read_pydata

Parameters
  • cwl_file (File, string, stream or URL) – CWL data to parse

  • graph (:graphit:Graph) – Graph object to import dictionary data in

Returns

GraphAxis object

Return type

:graphit:GraphAxis

graphit.graph_io.io_dot_format module

Functions for exporting and importing graphs to and from graph description language (DOT) format

graphit.graph_io.io_dot_format.write_dot(graph, graph_name=None)

DOT graphs are either directed (digraph) or undirected; mixed mode is not supported.

Nodes and edges are all exported separately; shorthand notations are not supported. Grouping and subgraphs are not supported. Graph attributes in graph.data, graph.edges and graph.nodes will be exported as DOT directives regardless of whether they are official Graphviz DOT graph directives as listed in the reference documentation.

DOT reserved rendering keywords that are part of the graph's global attributes in graph.data, or part of the node and edge attributes, are exported as part of the DOT graph.

Parameters
  • graph (:graphit:Graph) – Graph object to export

  • graph_name (:py:str) – name of the ‘graph’ or ‘digraph’. Uses the ‘title’ attribute in graph.data by default, else graph_name

Returns

DOT graph representation

Return type

:py:str
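To make the export rules for write_dot concrete, here is a minimal, graphit-independent sketch that writes every node and every edge as a separate DOT statement and picks either 'digraph' with '->' or 'graph' with '--'. The function to_dot and its arguments are hypothetical, and attribute values are not escaped.

```python
def to_dot(nodes, edges, directed=False, name="graph"):
    """Serialize plain dictionaries of nodes and edges to DOT text."""
    keyword = "digraph" if directed else "graph"
    connector = "->" if directed else "--"

    def attributes(data):
        # Render an attribute dictionary as ' [key="value", ...]'.
        if not data:
            return ""
        pairs = ", ".join(f'{key}="{value}"' for key, value in data.items())
        return f" [{pairs}]"

    lines = [f'{keyword} "{name}" {{']
    # One statement per node, no shorthand grouping.
    for nid, data in nodes.items():
        lines.append(f'  "{nid}"{attributes(data)};')
    # One statement per edge.
    for (source, target), data in edges.items():
        lines.append(f'  "{source}" {connector} "{target}"{attributes(data)};')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    nodes={"a": {"color": "red"}, "b": {}},
    edges={("a", "b"): {"label": "x"}},
    directed=True,
    name="demo",
)
print(dot_text)
```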

graphit.graph_io.io_dot_format.read_dot(dot, graph=None)

Read graph in DOT format

Parameters
  • dot (File, string, stream or URL) – DOT graph data.

  • graph (:graphit:Graph) – Graph object to import DOT data in

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_flattened_data_format module

Functions for importing and exporting flattened (dot separated) data structures

graphit.graph_io.io_flattened_data_format.read_flattened()
graphit.graph_io.io_flattened_data_format.write_flattened(graph, sep='.', default=None, allow_none=False, **kwargs)
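The idea of a flattened, separator-joined representation can be sketched without graphit as follows. The helpers flatten and unflatten are hypothetical and operate on plain nested dictionaries rather than on graph objects; only the sep default of '.' is taken from the write_flattened signature.

```python
def flatten(data, sep=".", prefix=""):
    """Collapse a nested dictionary into {'a.b.c': value} form."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key path.
            flat.update(flatten(value, sep=sep, prefix=path))
        else:
            flat[path] = value
    return flat


def unflatten(flat, sep="."):
    """Rebuild the nested dictionary from separator-joined keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        branch = nested
        for part in parents:
            branch = branch.setdefault(part, {})
        branch[leaf] = value
    return nested


nested = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
flat = flatten(nested)
print(flat)
```

Note that empty dictionaries and keys that themselves contain the separator do not survive this simple round trip.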

graphit.graph_io.io_gexf_format module

Reading and writing graphs in GEXF format.

GEXF (Graph Exchange XML Format) is a language for describing complex network structures, their associated data and dynamics.

Reference and specification:

graphit.graph_io.io_gexf_format.read_gexf(gexf_file, graph=None)

Read graphs in GEXF format

Uses the Python built-in etree cElementTree parser to parse the XML document and convert the elements into nodes. The XML element tag becomes the node key, XML text becomes the node value and XML attributes are added to the node as additional attributes.

Parameters
  • gexf_file (File, string, stream or URL) – XML data to parse

  • graph (:graphit:Graph) – Graph object to import dictionary data in

Returns

GraphAxis object

Return type

:graphit:GraphAxis
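The element-to-node mapping described for read_gexf can be sketched with the standard library alone. This is not the graphit implementation: the attribute names 'key' and 'value', the integer node IDs and the function element_to_nodes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def element_to_nodes(xml_text):
    """Convert an XML document into node dictionaries and parent-child edges."""
    nodes, edges = {}, []

    def walk(element, parent_id=None):
        nid = len(nodes) + 1
        # Tag becomes the key, text the value, XML attributes extra attributes.
        # An XML attribute named 'key' or 'value' would overwrite these entries.
        node = {"key": element.tag, "value": (element.text or "").strip() or None}
        node.update(element.attrib)
        nodes[nid] = node
        if parent_id is not None:
            edges.append((parent_id, nid))
        for child in element:
            walk(child, nid)

    walk(ET.fromstring(xml_text))
    return nodes, edges


xml_text = (
    '<graph mode="static">'
    '<node id="0" label="Hello">first</node>'
    '<node id="1" label="World"/>'
    '</graph>'
)
xml_nodes, xml_edges = element_to_nodes(xml_text)
print(xml_nodes)
```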

graphit.graph_io.io_gexf_format.write_gexf(graph, node_tools=<class 'graphit.graph_io.io_gexf_format.GEXFNodeTools'>, edge_tools=<class 'graphit.graph_io.io_gexf_format.GEXFEdgeTools'>)

Export a graph to the GEXF data format

Custom XML serializers may be introduced as a custom NodeTools class using the node_tools attribute. In addition, the graph ORM may be used to inject tailored serialize methods in specific nodes or edges.

Parameters
  • graph (:graphit:Graph) – Graph to export

  • node_tools (:graphit:NodeTools) – NodeTools class with node serialize method

  • edge_tools (:graphit:EdgeTools) – EdgeTools class with edge serialize method

Returns

Graph exported as a hierarchical XML node structure

Return type

:py:str

graphit.graph_io.io_gml_format module

Functions for exporting and importing graphs to and from graph modelling language (GML) format as described in the online documentation:

graphit.graph_io.io_gml_format.read_gml(gml, graph=None)

Read graph in GML format

Parameters
  • gml (File, string, stream or URL) – GML graph data.

  • graph (:graphit:Graph) – Graph object to import GML data in

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_gml_format.write_gml(graph, node_tools=None, edge_tools=None)

Export a graphit graph to GML format

Export graphit Graph data, nodes and edges in Graph Modelling Language (GML) format. The function replaces the graph NodeTools and EdgeTools with a custom version exposing a serialize method responsible for serializing the node/edge attributes in a GML format. The NodeTools class is also used to export Graph.data attributes.

Custom serializers may be introduced as custom NodeTools or EdgeTools classes using the node_tools and/or edge_tools attributes. In addition, the graph ORM may be used to inject tailored serialize methods in specific nodes or edges.

Parameters
  • graph (:graphit:Graph) – Graph object to export

  • node_tools (:graphit:NodeTools) – NodeTools class with node serialize method

  • edge_tools (:graphit:EdgeTools) – EdgeTools class with edge serialize method

Returns

GML graph representation

Return type

:py:str
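For orientation, the sketch below writes a small graph in GML syntax using plain dictionaries. It does not use the NodeTools or EdgeTools serializers mentioned above; to_gml and gml_value are hypothetical helpers, and integer node IDs are assumed.

```python
def gml_value(value):
    """Format a Python value as a GML value (numbers bare, strings quoted)."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return f'"{value}"'


def to_gml(nodes, edges, directed=False):
    """Serialize node and edge attribute dictionaries to GML text."""
    lines = ["graph [", f"  directed {1 if directed else 0}"]
    for nid, data in nodes.items():
        lines.append("  node [")
        lines.append(f"    id {nid}")
        lines.extend(f"    {key} {gml_value(value)}" for key, value in data.items())
        lines.append("  ]")
    for (source, target), data in edges.items():
        lines.append("  edge [")
        lines.append(f"    source {source}")
        lines.append(f"    target {target}")
        lines.extend(f"    {key} {gml_value(value)}" for key, value in data.items())
        lines.append("  ]")
    lines.append("]")
    return "\n".join(lines)


gml_text = to_gml(
    nodes={1: {"label": "a"}, 2: {"label": "b"}},
    edges={(1, 2): {"weight": 0.5}},
    directed=True,
)
print(gml_text)
```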

graphit.graph_io.io_helpers module

graphit.graph_io.io_helpers.initial_node(nodes)

Return node ID of node with smallest _ID identifier.

Parameters

nodes – graph ‘nodes’ object

Returns

node ID
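A minimal sketch of the selection rule, assuming the nodes object behaves like a dictionary that maps node IDs to attribute dictionaries carrying an integer '_id' entry. The function smallest_id_node is hypothetical and stands in for initial_node only conceptually.

```python
def smallest_id_node(nodes):
    """Return the node ID whose attribute dictionary has the smallest '_id'."""
    return min(nodes, key=lambda nid: nodes[nid]["_id"])


example_nodes = {"c": {"_id": 3}, "a": {"_id": 1}, "b": {"_id": 2}}
first_node = smallest_id_node(example_nodes)
print(first_node)
```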

graphit.graph_io.io_helpers.resolve_root_node(graph)

Resolve the node ID of the root node of the graph.

For Graph objects there is no strict concept of a root node and by default the ‘root’ attribute of the graph is not defined. Here, the root will resolve to the node nid with the smallest _id number, which usually is the first node added when the graph was created.

For GraphAxis objects, a root is essential for defining the graph hierarchy and thus the graph ‘root’ attribute should be defined. If it is not defined, it will also default to the node nid with the smallest _id number. If the user-defined or default root is in the (sub)graph, it is returned. If not, an attempt will be made to resolve it as follows:

  • If the graph is a single node, its node ID will be root.

  • If the graph has multiple nodes and the root is defined in the full_graph, return the node ID closest to the root

Parameters

graph – graph to resolve root node for

Returns

root node ID

graphit.graph_io.io_helpers.coarse_type(n)
graphit.graph_io.io_helpers.check_graphit_version(file_version)

Check if the graph version of the file is (backwards) compatible with the current graphit module version

Parameters

file_version (:py:str) – graphit version to check

graphit.graph_io.io_helpers.open_anything(source, mode='r')

Open input available from a file, a Python file like object, standard input, a URL or a string and return a uniform Python file like object with standard methods.

Parameters
  • source (mixed) – Input as file, Python file like object, standard input, URL or a string

  • mode (string) – file access mode, defaults to ‘r’

Returns

Python file like object

class graphit.graph_io.io_helpers.FormatDetect(set_locale='en_US.UTF-8', decimal_point=None, thousands_sep=None)

Bases: object

Type cast string or unicode objects to float, integer or boolean.

Uses localization to identify

TODO: comma separated strings fail if one comma

parse(value, target_type=None)

Parse an unknown value to a float, integer, boolean or else remain in unicode.

Parameters
  • value – value to parse

  • target_type – type to convert to as ‘integer’, ‘number’, ‘string’, ‘boolean’ or automatic ‘detect’

Returns

parsed value
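The automatic 'detect' behaviour can be approximated by trying progressively more general types. The sketch below is an assumption-laden simplification: it ignores localization (decimal point and thousands separator), and both the function detect and the accepted boolean words are hypothetical.

```python
def detect(value):
    """Cast a string to bool, int or float when possible, else return it unchanged."""
    text = value.strip()
    lowered = text.lower()
    # Assumed boolean spellings; the real class may accept others.
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return value


parsed = [detect(item) for item in ("42", "3.14", "True", "hello")]
print(parsed)
```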

to_boolean(value)
to_detect(value)
static to_integer(value)
static to_number(value)
static to_string(value)
class graphit.graph_io.io_helpers.StreamReader(stream)

Bases: object

StreamReader class

Extension of the Python file like object (io class) to read data as flexible streams. Enables a stream to be read by character, or block of characters crossing file lines.

Parameters

stream – textual data that can be parsed as file-like object.

next()

Iterator next method

Returns next character in the iterations as long as there are characters left in the file-like object

Raises

StopIteration, if no more characters

read_upto_block(blocks, sep=(' ', '\n'), keep=False)

Return characters from active position up to a certain block of characters or the first occurrence of one of multiple blocks. A block is defined as a sequence of characters bounded by separator characters sep usually spaces and newline characters.

Parameters
  • blocks (:py:str, :py:list, :py:tuple) – block(s) to search for.

  • sep (:py:tuple, :py:list) – block separation characters

  • keep (:py:bool) – keep the block to search for as part of the returned string

Returns

tuple of text segment and termination character

Return type

:py:tuple

read_upto_char(chars, keep=False)

Return characters from active position up to a certain character or the first occurrence of one of multiple characters.

Parameters
  • chars (:py:str, :py:list, :py:tuple) – character(s) to search for.

  • keep (:py:bool) – keep the character to search for as part of the returned string

Returns

tuple of text segment and termination character

Return type

:py:tuple
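The character-wise reading behaviour can be sketched on top of io.StringIO. This standalone function mirrors the documented return value (text segment plus termination character) but is not the StreamReader method itself; returning None as terminator at end of stream is an assumption.

```python
import io


def read_upto_char(stream, chars, keep=False):
    """Read from the current position up to the first character in 'chars'."""
    collected = []
    while True:
        char = stream.read(1)
        if char == "":
            # End of stream reached without finding a terminator.
            return "".join(collected), None
        if char in chars:
            if keep:
                collected.append(char)
            return "".join(collected), char
        collected.append(char)


stream = io.StringIO("node [id 1]\nedge")
first = read_upto_char(stream, "[")
second = read_upto_char(stream, ("]", "\n"), keep=True)
rest = read_upto_char(stream, "#")
print(first, second, rest)
```

Because the stream keeps its position between calls, consecutive reads continue where the previous one stopped, including across line breaks.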

readline()

Returns ‘readline’ method of the base file-like object

set_cursor(position)

Move the file reader cursor to a new position in the file

Parameters

position (:py:int) – position to move to

slice(start, stop, step=1)

Text slice method.

Returns a segment of text defined by a start and stop character position relative to the start of the text.

Parameters
  • start (:py:int) – start character position

  • stop (:py:int) – stop character position

Return type

:py:str

tell()

Return current position of file cursor

Return type

:py:int

graphit.graph_io.io_jgf_format module

Functions for reading and writing graph files in the graphit .jgf JSON format

This is a proprietary format in which the graph meta-data, the nodes, edges and their data dictionaries are stored in JSON format.

graphit.graph_io.io_jgf_format.read_jgf(jgf_format, graph=None)

Read JSON graph format (.jgf)

This is a proprietary format in which the graph meta-data, the nodes, edges and their data dictionaries are stored in JSON format.

Format description. Primary key/value pairs:

  • graph: Graph class meta-data. Serializes all class attributes of type int, float, bool, long, str or unicode.

  • nodes: Graph node identifiers (keys) and attributes (values)

  • edges: Graph enumerated edge identifiers

  • edge_attr: Graph edge attributes

Parameters
  • jgf_format (:py:str) – JSON encoded graph data to parse

  • graph (:graphit:Graph) – Graph object to import JGF data in

Returns

Graph object

Return type

Graph or GraphAxis object

graphit.graph_io.io_jgf_format.write_jgf(graph, indent=2, encoding='utf-8', **kwargs)

Write JSON graph format

This is a proprietary format in which the graph meta-data, the nodes, edges and their data dictionaries are stored in JSON format.

Format description. Primary key/value pairs:

  • graph: Graph class meta-data. Serializes all class attributes of type int, float, bool, long, str or unicode.

  • data: Graph meta-data dictionary

  • nodes: Graph node identifiers (keys) and attributes (values)

  • edges: Graph enumerated edge identifiers

  • edge_attr: Graph edge attributes

Parameters
  • graph (Graph or GraphAxis object) – graph object to serialize

  • indent (:py:int) – JSON indentation count

  • encoding (:py:str) – JSON string encoding

  • kwargs (:py:dict) – additional data to be stored as file meta data

Returns

JSON encoded graph dictionary

Return type

:py:str

graphit.graph_io.io_json_format module

Functions for importing and exporting JSON data into a graph data structure

graphit.graph_io.io_json_format.read_json(json_file, graph=None, **kwargs)

Parse (hierarchical) JSON data structure to a graph

Use the default Python json parser to parse the JSON file to a dictionary followed by io_dict_format.read_pydata to parse to a graph structure.

Additional keyword arguments (kwargs) are passed to read_pydata

Parameters
  • json_file (File, string, stream or URL) – json data to parse

  • graph (:graphit:Graph) – Graph object to import dictionary data in

Returns

GraphAxis object

Return type

:graphit:GraphAxis

graphit.graph_io.io_json_format.write_json(graph, default=None, include_root=False, allow_none=True, **kwrags)

Export a graph to a (nested) JSON structure

Convert graph representation of the dictionary tree into JSON using a nested or flattened representation of the dictionary hierarchy.

Dictionary keys and values are obtained from the node attributes using key_tag and value_tag. The key_tag is set to graph key_tag by default.

Additional keyword arguments (kwargs) are passed to json.dumps()

Parameters
  • graph (:graphit:GraphAxis) – Graph object to export

  • default (mixed) – value to use when node value was not found using value_tag.

  • include_root (:py:bool) – Include the root node in the hierarchy

  • root_nid – root node ID in graph hierarchy

  • allow_none (:py:bool) – allow None values in the output

Return type

:py:json

graphit.graph_io.io_jsonschema_format module

Functions for building and validating graphs based on a JSON schema definition. http://json-schema.org

graphit.graph_io.io_jsonschema_format.read_json_schema(schema, graph=None, exclude_args=None, resolve_ref=True)

Import hierarchical data structures defined in a JSON schema format

Parameters
  • schema (dict, file, string, stream or URL) – JSON Schema data format to import

  • graph (:graphit:Graph) – graph object to import JSON Schema data in

  • exclude_args (:py:list) – JSON schema arguments to exclude from import

  • resolve_ref (:py:bool) – Parse JSON schema ‘definitions’

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_jsonschema_format_drafts module

Classes representing JSON Schema draft versions as specified by http://json-schema.org.

class graphit.graph_io.io_jsonschema_format_drafts.StringType

Bases: graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07

set(key, value=None)

Set node attribute values.

Parameters
  • key – node attribute key

  • value – node attribute value

class graphit.graph_io.io_jsonschema_format_drafts.IntegerType

Bases: graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07

set(key, value=None)

Set node attribute values.

Parameters
  • key – node attribute key

  • value – node attribute value

class graphit.graph_io.io_jsonschema_format_drafts.BooleanType

Bases: graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07

set(key, value=None)

Set node attribute values.

Parameters
  • key – node attribute key

  • value – node attribute value

class graphit.graph_io.io_jsonschema_format_drafts.NumberType

Bases: graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07

set(key, value=None)

Set node attribute values.

Parameters
  • key – node attribute key

  • value – node attribute value

class graphit.graph_io.io_jsonschema_format_drafts.ArrayType

Bases: graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07

set(key, value=None)

Set node attribute values.

Parameters
  • key – node attribute key

  • value – node attribute value

graphit.graph_io.io_lgf_format module

Functions for reading and writing graphs defined in the LEMON Graph Format

Reference: http://lemon.cs.elte.hu/pub/doc/1.3/a00002.html

Citation: Balázs Dezső, Alpár Jüttner, Péter Kovács “LEMON – an Open Source C++ Graph Template Library” (2011) Electronic Notes in Theoretical Computer Science, 264(5), 23-45

graphit.graph_io.io_lgf_format.read_lgf(lgf, graph=None)

Read graph in LEMON Graph Format (LGF)

Parameters
  • lgf (File, string, stream or URL) – LGF graph data.

  • graph (:graphit:Graph) – Graph object to import LGF data in

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_lgf_format.write_lgf(graph)

Write graph in LEMON Graph Format (LGF)

Parameters

graph (:graphit:Graph) – Graph object to export

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_lgr_format module

Reading and writing graphs in LEDA format (.gw, .lgr).

The Library of Efficient Data types and Algorithms (LEDA) is a proprietary-licensed software library providing C++ implementations of a broad variety of algorithms for graph theory and computational geometry.

Specifications:

http://www.algorithmic-solutions.info/leda_guide/graphs/leda_native_graph_fileformat.html

Example:

    #header section
    LEDA.GRAPH
    string
    int
    -1
    #nodes section
    5
    |{v1}|
    |{v2}|
    |{v3}|
    |{v4}|
    |{v5}|

    #edges section
    7
    1 2 0 |{4}|
    1 3 0 |{3}|
    2 3 0 |{2}|
    3 4 0 |{3}|
    3 5 0 |{7}|
    4 5 0 |{6}|
    5 1 0 |{1}|

The LEDA graph format is a simple and fast format that is always divided into a header, a nodes and an edges section. The header always starts with LEDA.GRAPH, followed by the data types for node and edge data as string, int, float or boolean, or ‘void’ if no data is defined. The fourth line describes the directionality of the graph as directed (-1) or undirected (-2). The nodes section starts with the number of nodes followed by an ordered list of node labels (between |{}|) that are numbered sequentially starting from 1. The node labels are converted to the respective types as indicated in the header section. The edges section is similar to the nodes section but lists for each edge the source and target nodes by their sequential node numbers, a reversal number (not used) and the edge data label (between |{}|).
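The section layout described above translates into a short line-oriented parser. The following is a sketch under stated assumptions, not the read_lgr implementation: parse_leda is a hypothetical name, lines starting with '#' are treated as comments, and boolean labels are assumed to be spelled 'true' or 'false'.

```python
def parse_leda(text):
    """Parse LEDA graph text into (directed, nodes, edges)."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    if lines[0] != "LEDA.GRAPH":
        raise ValueError("not a LEDA graph file")

    def to_bool(label):
        return label.lower() == "true"  # assumed spelling of boolean labels

    casts = {"string": str, "int": int, "float": float,
             "bool": to_bool, "boolean": to_bool, "void": lambda label: None}
    node_cast, edge_cast = casts[lines[1]], casts[lines[2]]
    directed = lines[3] == "-1"

    def label(field):
        return field.strip()[2:-2]  # strip the enclosing |{ and }|

    n_nodes = int(lines[4])
    # Nodes are numbered sequentially starting from 1.
    nodes = {i + 1: node_cast(label(lines[5 + i])) for i in range(n_nodes)}
    start = 5 + n_nodes
    n_edges = int(lines[start])
    edges = {}
    for line in lines[start + 1:start + 1 + n_edges]:
        source, target, _reversal, data = line.split(None, 3)
        edges[(int(source), int(target))] = edge_cast(label(data))
    return directed, nodes, edges


leda_text = """#header section
LEDA.GRAPH
string
int
-1
#nodes section
5
|{v1}|
|{v2}|
|{v3}|
|{v4}|
|{v5}|

#edges section
7
1 2 0 |{4}|
1 3 0 |{3}|
2 3 0 |{2}|
3 4 0 |{3}|
3 5 0 |{7}|
4 5 0 |{6}|
5 1 0 |{1}|
"""
directed, leda_nodes, leda_edges = parse_leda(leda_text)
print(directed, leda_nodes, leda_edges)
```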

graphit.graph_io.io_lgr_format.read_lgr(lgr, graph=None, edge_label='label')

Read graph in LEDA format

Nodes are added to the graph using a unique ID or with the node data as label, depending on whether graph.data.auto_nid is True or False. Edge data is added to the edge attributes using edge_label as key. The data types for both nodes and edges are set according to the specifications in the LEDA header as either string, int, float or bool.

Parameters
  • lgr (File, string, stream or URL) – LEDA graph data.

  • graph (:graphit:Graph) – Graph object to import LEDA data in

  • edge_label (:py:str) – edge data label name

Returns

Graph object

Return type

:graphit:Graph

Raises

  • TypeError – if node/edge type conversion failed

  • GraphitException – in case of malformed LEDA file

graphit.graph_io.io_lgr_format.write_lgr(graph, node_key=None, edge_key=None, node_data_type='string', edge_data_type='void')

Export a graph to the LGR data format

The LEDA format allows for export of only one node or edge data type (as: |{data type}|). For nodes this is usually the node label and for edges any arbitrary data key,value pair. In both cases the data type is required to be of either: string, int, float or bool.

Nodes and edges are exported by iterating over them using iternodes and iteredges. Iteration uses the graphit Object Relations Mapper (ORM) allowing full control over the data export by overriding the get method globally in the ‘NodeTools’ or ‘EdgeTools’ classes or using custom classes registered with the ORM. Data returned by the get method will be serialized regardless of the return type.

The node and edge data types are registered globally in the LEDA file using node_data_type (‘string’ by default) and edge_data_type (‘void’, no data, by default).

Parameters
  • graph (:graphit:Graph) – Graph to export

  • node_key (:py:str) – key name of node data to export

  • edge_key (:py:str) – key name of edge data to export

  • node_data_type (:py:str) – primitive data type of exported node data

  • edge_data_type (:py:str) – primitive data type of exported edge data

Returns

Graph exported as LGR format

Return type

:py:str

Raises

GraphitException

graphit.graph_io.io_p2g_format module

Reading and writing graphs defined in P2G Graph Format (.p2g) used for representing metabolic pathways from the KEGG database.

A file that describes a uniquely labeled graph (with extension “.gr”) looks like the following:

    name
    3 4
    a
    1 2
    b

    c
    0 2

“name” is simply a description of what the graph corresponds to. The second line displays the number of nodes and number of edges, respectively. This sample graph contains three nodes labeled “a”, “b”, and “c”. The rest of the graph contains two lines for each node. The first line for a node contains the node label. After the declaration of the node label, the out-edges of that node in the graph are provided. For instance, “a” is linked to nodes 1 and 2, which are labeled “b” and “c”, while the node labeled “b” has no outgoing edges. Observe that node labeled “c” has an outgoing edge to itself. Indeed, self-loops are allowed. Node index starts from 0.
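The two-lines-per-node layout described above can be read with a few lines of Python. This is an illustrative sketch rather than the read_p2g implementation; parse_p2g is a hypothetical name and out-edge indices are assumed to be whitespace separated, as in the sample.

```python
def parse_p2g(text):
    """Parse P2G text into (name, labels, edges) with zero-based node indices."""
    lines = text.splitlines()
    name = lines[0].strip()
    n_nodes, n_edges = (int(field) for field in lines[1].split())
    labels, edges = [], []
    for index in range(n_nodes):
        label_row = 2 + 2 * index
        labels.append(lines[label_row].strip())
        # The line after the label lists out-edges; it may be empty or missing.
        targets = lines[label_row + 1] if label_row + 1 < len(lines) else ""
        edges.extend((index, int(target)) for target in targets.split())
    if len(edges) != n_edges:
        raise ValueError("edge count does not match the header")
    return name, labels, edges


p2g_text = "name\n3 4\na\n1 2\nb\n\nc\n0 2\n"
graph_name, p2g_labels, p2g_edges = parse_p2g(p2g_text)
print(graph_name, p2g_labels, p2g_edges)
```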

graphit.graph_io.io_p2g_format.read_p2g(p2g_file, graph=None)

Read graph in P2G format

Parameters
  • p2g_file (File, string, stream or URL) – P2G data to parse

  • graph (:graphit:Graph) – Graph object to import to or Graph by default

Returns

Graph instance

Return type

:graphit:Graph

graphit.graph_io.io_p2g_format.write_p2g(graph, graph_name_label='name')

Export a graphit graph to P2G format

Parameters
  • graph (:graphit:Graph) – Graph object to export

  • graph_name_label (:py:str) – graph.data attribute label for the graph name

Returns

P2G graph representation

Return type

:py:str

graphit.graph_io.io_pdb_format module

Reading and writing graphs in RCSB Protein DataBank format (.pdb).

The PDB molecular structure format is represented as GraphAxis graph using the Model-Segment-Residue-Atom (MSRA) hierarchical structure. The reader and writer support the official wwPDB guidelines for MODEL, ATOM, HETATM and CONECT records. Other records are not supported.

Reference and specification:
graphit.graph_io.io_pdb_format.read_pdb(pdb_file, graph=None, column_format={'atalt': (slice(16, 17, None), <class 'str'>), 'atname': (slice(12, 16, None), <class 'str'>), 'atnum': (slice(6, 12, None), <class 'int'>), 'b': (slice(60, 66, None), <class 'float'>), 'chain': (slice(21, 22, None), <class 'str'>), 'charge': (slice(78, 80, None), <class 'float'>), 'elem': (slice(76, 78, None), <class 'str'>), 'insert': (slice(26, 30, None), <class 'str'>), 'label': (slice(0, 6, None), <class 'str'>), 'occ': (slice(54, 60, None), <class 'float'>), 'resname': (slice(17, 21, None), <class 'str'>), 'resnum': (slice(22, 26, None), <class 'int'>), 'segid': (slice(72, 76, None), <class 'str'>), 'xcoor': (slice(30, 38, None), <class 'float'>), 'ycoor': (slice(38, 46, None), <class 'float'>), 'zcoor': (slice(46, 54, None), <class 'float'>)})

Parse RCSB Protein Data Bank (PDB) structure files to a graph

Builds a Model-Segment-Residue-Atom (MSRA) hierarchy of the structure in a GraphAxis graph. Primary structure data will be extracted from the columns in ATOM and HETATM lines. The data label, character positions and required type conversion are described by the column_format dictionary, which by default supports the wwPDB version 3.3 guidelines. CONECT records will be represented as edges between atoms. These edges can be identified by the ‘label=conect’ attribute.

Note

The GraphAxis ‘auto_nid’ functionality will be enabled for the import to uniquely represent structures possibly sharing similar atom numbers (MODELS).

Parameters
  • pdb_file (File, string, stream or URL) – PDB data to parse

  • graph (:graphit:Graph) – Graph object to import dictionary data in

  • column_format (:py:dict) – ATOM/HETATM line label based slice records

Returns

GraphAxis object

Return type

:graphit:GraphAxis
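The role of column_format is easiest to see on a single ATOM line. The sketch below copies the slices and types from the default column_format shown in the read_pdb signature and applies them to one fixed-width line. The helper parse_atom_line, the sample atom values and the choice to return None for blank columns are assumptions for illustration only.

```python
COLUMN_FORMAT = {
    "label": (slice(0, 6), str),
    "atnum": (slice(6, 12), int),
    "atname": (slice(12, 16), str),
    "atalt": (slice(16, 17), str),
    "resname": (slice(17, 21), str),
    "chain": (slice(21, 22), str),
    "resnum": (slice(22, 26), int),
    "insert": (slice(26, 30), str),
    "xcoor": (slice(30, 38), float),
    "ycoor": (slice(38, 46), float),
    "zcoor": (slice(46, 54), float),
    "occ": (slice(54, 60), float),
    "b": (slice(60, 66), float),
    "segid": (slice(72, 76), str),
    "elem": (slice(76, 78), str),
    "charge": (slice(78, 80), float),
}


def parse_atom_line(line, column_format=COLUMN_FORMAT):
    """Cut a fixed-width ATOM/HETATM line into typed fields."""
    record = {}
    for key, (columns, cast) in column_format.items():
        raw = line[columns].strip()
        # Blank columns become None instead of failing in int() or float().
        record[key] = cast(raw) if raw else None
    return record


# Build an 80-column sample line so that the column alignment is explicit.
atom_line = (
    f"{'ATOM':<6}{1:>5} {'N':^4}{'':1}{'MET':>3} {'A'}{1:>4}{'':1}   "
    f"{27.34:8.3f}{24.43:8.3f}{2.614:8.3f}{1.0:6.2f}{9.67:6.2f}      "
    f"{'':>4}{'N':>2}{'':>2}"
)
atom = parse_atom_line(atom_line)
print(atom)
```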

graphit.graph_io.io_pdb_format.write_pdb(graph, atom_format='{label:6}{atnum:>5} {atname:^4}{atalt:1}{resname:>3} {chain}{resnum:>4}{insert:1} {xcoor:8.3f}{ycoor:8.3f}{zcoor:8.3f}{occ:6.2f}{b:6.2f} {segid:>4}{elem:>2}{charge:>2}\n')

Export a Model-Segment-Residue-Atom (MSRA) graph structure as RCSB Protein Data Bank (PDB) structure file

PDB ATOM and HETATM lines are formatted using the atom_format string formatter using Python’s keyword based format() mini-language.

Parameters
  • graph (:graphit:Graph) – Graph to export

  • atom_format (:py:str) – String formatter for ATOM/HETATM lines

Returns

RCSB PDB string

Return type

:py:str

graphit.graph_io.io_pgf_format module

Reading and writing graphs defined in Proprietary Graph Format (.pgf), a format specific to the graphit module.

PGF stores graphit graph data as plain text Python dictionaries or as a serialized byte stream using the Python pickle module. Graphit graphs can contain any hashable Python object as a node (not just integers and strings). Storing a graph by “pickling” it is probably the best way of representing arbitrary hashable data types. Both storage options are feature rich but not portable as they are (so far) only supported by graphit.
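Why arbitrary hashable node IDs favour these two storage options can be shown with plain dictionaries. The layout of graph_data below is made up for the example and is not the actual PGF file layout; using repr with ast.literal_eval for the plain-text variant is likewise an assumption.

```python
import ast
import json
import pickle

# Hypothetical graph data with a tuple, a string and an integer as node IDs.
graph_data = {
    "nodes": {(1, 2): {"label": "tuple node"}, "a": {"label": "string node"}, 7: {}},
    "edges": {((1, 2), "a"): {"weight": 0.5}, ("a", 7): {}},
}

# Option 1: binary serialization with pickle.
from_bytes = pickle.loads(pickle.dumps(graph_data))

# Option 2: plain-text Python literal, read back without executing code.
from_text = ast.literal_eval(repr(graph_data))

# JSON, in contrast, cannot use tuples as dictionary keys.
try:
    json.dumps(graph_data["nodes"])
    json_ok = True
except TypeError:
    json_ok = False

print(from_bytes == graph_data, from_text == graph_data, json_ok)
```

Unpickling data from an untrusted source can execute arbitrary code, so the pickled variant is only suitable for files you control.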

graphit.graph_io.io_pgf_format.read_pgf(pgf_file, graph=None, pickle_graph=False)

Import graph from Graph Python Format file

PGF format is the module’s own file format consisting of serialized graph data, nodes and edges dictionaries. Import either as plain text serialized dictionary or pickled graph object. The format is feature rich with good performance but is not portable.

Parameters
  • pgf_file (File, string, stream or URL) – PGF data to parse

  • graph (:graphit:Graph) – Graph object to import to or Graph by default

  • pickle_graph (:py:bool) – PGF format is a pickled graph

Returns

Graph instance

Return type

:graphit:Graph

graphit.graph_io.io_pgf_format.write_pgf(graph, pickle_graph=False)

Export graph as Graph Python Format file

PGF format is the module’s own file format consisting of serialized graph data, nodes and edges dictionaries. Exports either as plain text serialized dictionary or pickled graph object. The format is feature rich with good performance but is not portable.

Parameters
  • graph (:graphit:Graph) – Graph object to export

  • pickle_graph (:py:bool) – serialize the Graph using Python pickle

Returns

Graph in PGF graph representation

Return type

:py:str

graphit.graph_io.io_pydata_format module

Functions for importing and exporting (nested) Python data structures into graph data structures.

graphit.graph_io.io_pydata_format.read_pydata(data, graph=None, parser_classes=None, level=0)

Parse (hierarchical) python data structures to a graph

Many data formats are first parsed to a python structure before they are converted to a graph using the read_pydata function. The function supports any object that is an instance of, or behaves as, a Python dictionary, list, tuple or set and converts these (nested) structures to graph nodes and edges for connectivity. Data is stored in nodes using the node and edge ‘key_tag’ and ‘value_tag’ attributes in the Graph class.

Data type and format information are also stored as part of the nodes to enable reconstruction of the Python data structure on export using the write_pydata function. Changing type and format on a node or edge allows for customized data export.

Parsing of data structures to nodes and edges is handled by parser classes that need to define the methods deserialize for reading and serialize for writing. In write_pydata these classes are registered with the ORM to fully customize the use of the serialize method. In the read_pydata function the ORM cannot be used because the nodes/edges themselves do not yet exist. Instead they are provided as a dictionary through the parser_classes argument. The dictionary defines the string representation of the Python data type as key and parser class as value.

Parser customization is important as Python data structures can be represented as a graph structure in different ways. This is certainly true for dictionaries, where key/value pairs can be stored as part of the node attributes, as separate nodes, or as a combination of the two. read_pydata has quick support for two scenarios using the level argument:

  • level 0: every dictionary key/value pair is represented as a node regardless of its position in the nested data structure

  • level 1: all keys at the same level in the hierarchy that have a primitive type value are stored as part of the node attributes.
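The difference between the two levels can be sketched on plain dictionaries. This is one possible reading of the description, not the read_pydata implementation: dict_to_graph, the 'key' and 'value' attribute names, the artificial 'root' node and the integer node IDs are all hypothetical.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def dict_to_graph(data, level=0):
    """Convert a nested dictionary into node dictionaries and parent-child edges."""
    nodes, edges = {}, []

    def add_node(attributes, parent):
        nid = len(nodes) + 1
        nodes[nid] = attributes
        if parent is not None:
            edges.append((parent, nid))
        return nid

    def walk(key, value, parent):
        if not isinstance(value, dict):
            # Leaf value: one node holding the key/value pair.
            add_node({"key": key, "value": value}, parent)
            return
        attributes, children = {"key": key}, value
        if level == 1:
            # Primitive values at this level are folded into the node itself.
            attributes.update(
                {k: v for k, v in value.items() if isinstance(v, PRIMITIVES)})
            children = {
                k: v for k, v in value.items() if not isinstance(v, PRIMITIVES)}
        nid = add_node(attributes, parent)
        for child_key, child_value in children.items():
            walk(child_key, child_value, nid)

    walk("root", data, None)
    return nodes, edges


config = {"name": "demo", "size": 3,
          "options": {"verbose": True, "paths": {"in": "a", "out": "b"}}}
nodes0, edges0 = dict_to_graph(config, level=0)
nodes1, edges1 = dict_to_graph(config, level=1)
print(len(nodes0), len(nodes1))
```

With level 0 the sample produces one node per key, whereas level 1 collapses it to three nodes whose attributes carry the primitive values.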

If the graph is empty, the first node added to the graph is assigned as root node. If the graph is not empty, new nodes and edges will be added to it as subgraph. Edge connections between the two will have to be made afterwards.

Parameters
  • data – Python (hierarchical) data structure

  • graph (:graphit:GraphAxis) – GraphAxis object to import dictionary data in

  • parser_classes (:py:dict) – parser class definition for different Python data types. Updates default classes for level 0 or 1

  • level (:py:int) – dictionary parsing mode

Returns

GraphAxis object

Return type

:graphit:GraphAxis

graphit.graph_io.io_pydata_format.write_pydata(graph, default=None, allow_none=True, export_all=False, include_root=False)

Export a graph to a (nested) dictionary

Convert graph representation of the dictionary tree into a dictionary using a nested representation of the dictionary hierarchy.

Dictionary keys and values are obtained from the node attributes using key_tag and value_tag. The key_tag is set to graph key_tag by default. Export using these primary key_tag/value_tag pairs is the default behaviour. If a node contains more data, these can be exported as part of a dictionary using the export_all argument.

Note

export_all is important when dictionary data structures were imported using level=1 in read_pydata. In this case, all key value pairs at the same dictionary level are contained in the same node.

Node values that are ‘None’ are exported by default unless allow_none equals False. If the key_tag exists but value_tag is absent use default as default.

Note

If a graph is composed of multiple, independent subgraphs, only the subgraph for which the root node is defined will be exported. To export all, iterate over the subgraphs and define the appropriate root for each of them.

Parameters
  • graph (:graphit:GraphAxis) – Graph object to export

  • default (mixed) – value to use when node value was not found using value_tag.

  • allow_none (:py:bool) – allow None values in the output

  • export_all (:py:bool) – Export the full node storage dictionary.

  • include_root (:py:bool) – Include the root node in the hierarchy

Return type

:py:dict

graphit.graph_io.io_tgf_format module

Functions for reading and writing graphs defined in Trivial Graph Format (.tgf), a simple text-based file format for describing graphs. It consists of a list of node definitions, which map node IDs to labels, followed by a list of edges, which specify node pairs and an optional edge label. Node IDs can be arbitrary identifiers, whereas labels for both nodes and edges are plain strings.

The graph may be interpreted as a directed or undirected graph. For directed graphs, to specify the concept of bi-directionality in an edge, one may either specify two edges (forward and back) or differentiate the edge by means of a label.

The TGF format only describes the nodes themselves and the edges connecting them. Node data (attributes) are not represented.

Example:

1 January
2 March
3 April
4 May
5 December
6 June
7 September
#
1 2
3 2
4 3
5 1 Happy New Year!
5 3 April Fools Day
6 3
6 1
7 5
7 6
7 1

Reference: https://en.wikipedia.org/wiki/Trivial_Graph_Format
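The layout described above (node lines, a ‘#’ separator, then edge lines) can be sketched with a few lines of standalone Python. This is a minimal illustration of the file format, not the graphit reader: the function name parse_tgf is hypothetical, and IDs are kept as strings rather than converted to numbers.

```python
def parse_tgf(text):
    """Parse TGF text into (nodes, edges).

    nodes: dict mapping node ID -> label (None if the node has no label)
    edges: list of (source, target, label) tuples (label None if absent)
    Hypothetical helper for illustration; IDs stay plain strings.
    """
    nodes, edges = {}, []
    in_edge_list = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            # The '#' line ends the node list and starts the edge list
            in_edge_list = True
            continue
        if in_edge_list:
            parts = line.split(None, 2)  # source, target, optional label
            if len(parts) < 2:
                raise ValueError('edge line needs two node IDs: %r' % line)
            label = parts[2] if len(parts) > 2 else None
            edges.append((parts[0], parts[1], label))
        else:
            parts = line.split(None, 1)  # node ID, optional label
            nodes[parts[0]] = parts[1] if len(parts) > 1 else None
    return nodes, edges


TGF_EXAMPLE = """\
1 January
2 March
3 April
4 May
5 December
6 June
7 September
#
1 2
3 2
4 3
5 1 Happy New Year!
5 3 April Fools Day
6 3
6 1
7 5
7 6
7 1
"""

nodes, edges = parse_tgf(TGF_EXAMPLE)
print(len(nodes), 'nodes and', len(edges), 'edges')
```

Everything after the ID (for nodes) or after the second ID (for edges) is treated as the label, which is why multi-word labels such as "Happy New Year!" survive intact.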

graphit.graph_io.io_tgf_format.read_tgf(tgf, graph=None)

Read graph in Trivial Graph Format

The TGF format dictates that nodes are listed in the file first, with each node on a new line. A ‘#’ character signals the end of the node list and the start of the edge list.

Node and edge IDs can be integers, floats or strings. They are parsed automatically to their most likely format. TGF supports simple node and edge labels: all characters that follow the node or edge IDs. They are parsed and stored in the Graph node and edge data stores using the graph’s default or custom ‘key_tag’.

TGF data is imported into a default Graph object if no custom Graph instance is provided. The graph behaviour and the data import process can be controlled by passing in a (custom) Graph class.

Note

TGF format always defines edges in a directed fashion. This is enforced even for custom graphs.

Parameters
  • tgf (File, string, stream or URL) – TGF graph data.

  • graph (:graphit:Graph) – Graph object to import TGF data in

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_tgf_format.write_tgf(graph)

Export a graph in Trivial Graph Format

TGF graph export uses the Graph iternodes and iteredges methods to retrieve nodes and edges and ‘get’ the data labels. The behaviour of this process is determined by the single node/edge mixin classes and the ORM mapper.

Parameters

graph (:graphit:Graph) – Graph object to export

Returns

TGF graph representation

Return type

:py:str

graphit.graph_io.io_web_format module

Functions for reading and writing hierarchical data structures defined by the Spider data modelling package as .web format files.

The .web format defines data blocks containing key, value pairs or Array data as hierarchically nested blocks enclosed in parentheses ‘(‘ and ‘),’ and indented for visual clarity.

Every data item in the format is written on a new line and is either a traditional key, value pair as: ‘key = value,’ or a single value. Single values are combined into an array like:

key = FloatArray (
  1.0,
  2.0,
)

Key, value pairs and array type data structures can be freely combined such as in:

c2segments = LabeledRangePairArray (
  LabeledRangePair (
    r = LabeledRangeArray (
      LabeledRange (
        start = 1,
        end = 12,
        chain = 'A',
      ),
      LabeledRange (
        start = 1,
        end = 12,
        chain = 'B',
      ),
    ),
  ),
),

The data inside a LabeledRange are key, value pairs but LabeledRange and also LabeledRangePair are array types. The latter two are stored as nodes in the graph and are automatically assigned a key as “itemX” where X is an incremented integer.

The type of any piece of data is loosely defined by a type identifier in front of every new data block that closely resembles a Python-style class name. The ‘FloatArray’ identifier in the expression above is an example. These identifiers are usually coupled to classes in charge of data exchange by an object relations mapper such as the one used in the graphit package.
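The block structure described above can be read with a simple stack-based, line-by-line parser. The following is a rough sketch under stated assumptions, not the graphit reader: the function name parse_web and the dictionary layout are hypothetical, every item is assumed to sit on its own line, values are kept as raw strings, and values that themselves contain ‘=’ are not handled.

```python
def parse_web(text):
    """Parse .web-style text into nested dictionaries (illustrative only).

    Blocks become {'key', 'type', 'items'}; leaves become {'key', 'value'}.
    Unnamed blocks and bare array values get key None (graphit itself is
    described as assigning "itemX" keys to such array members).
    """
    root = {'key': None, 'type': None, 'items': []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip().rstrip(',').strip()
        if not line:
            continue
        if line == ')':
            # Closing parenthesis ends the current block
            if len(stack) == 1:
                raise ValueError('unbalanced closing parenthesis')
            stack.pop()
            continue
        key, sep, rest = line.partition('=')
        if sep:
            key, rest = key.strip(), rest.strip()
        else:
            key, rest = None, line  # bare value or unnamed block
        if rest.endswith('('):
            # "Type (" opens a new block; the text before "(" is the type
            block = {'key': key, 'type': rest[:-1].strip(), 'items': []}
            stack[-1]['items'].append(block)
            stack.append(block)
        else:
            stack[-1]['items'].append({'key': key, 'value': rest})
    if len(stack) != 1:
        raise ValueError('unclosed block')
    return root['items']


WEB_EXAMPLE = """\
c2segments = LabeledRangePairArray (
  LabeledRangePair (
    r = LabeledRangeArray (
      LabeledRange (
        start = 1,
        end = 12,
        chain = 'A',
      ),
    ),
  ),
),
weights = FloatArray (
  1.0,
  2.0,
)
"""

blocks = parse_web(WEB_EXAMPLE)
print([(b['key'], b['type']) for b in blocks])
```

The type identifier is kept next to each block so that a mapper could later pick a class for it, in the spirit of the orm_data_tag mechanism of read_web.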

graphit.graph_io.io_web_format.read_web(web, graph=None, orm_data_tag='haddock_type', auto_parse_format=True)

Import hierarchical data structures defined in the Spider .web format

The data block type identifiers used in the .web format are stored in the nodes using the orm_data_tag attribute. These can be used by the Graph ORM mapper for custom data exchange in the graph.

Parameters
  • web (file, string, stream or URL) – Spider .web data format to import

  • graph (:graphit:Graph) – graph object to import .web data in

  • orm_data_tag (:py:str) – data key to use for .web data identifier

  • auto_parse_format (:py:bool) – automatically detect basic format types using JSON decoding

Returns

Graph object

Return type

:graphit:Graph

graphit.graph_io.io_web_format.write_web(graph, orm_data_tag='haddock_type', indent=2, root_nid=None)

Export a graph in Spider .web format

Empty data blocks or Python None values are not exported.

Web graph export uses the Graph iternodes and iteredges methods to retrieve nodes and edges and ‘get’ the data labels. The behaviour of this process is determined by the single node/edge mixin classes and the ORM mapper.

Parameters
  • graph (:graphit:Graph) – Graph object to export

  • orm_data_tag (:py:str) – data key to use for .web data identifier

  • indent (:py:int) – .web file white space indentation level

  • root_nid – Root node ID in graph hierarchy

Returns

Spider .web graph representation

Return type

:py:str

graphit.graph_io.io_xml_format module

Functions for exporting and importing XML documents in a graph structure.

graphit.graph_io.io_xml_format.read_xml(xml_file, graph=None)

Parse hierarchical XML data structure to a graph

Uses the Python built-in etree cElementTree parser to parse the XML document and convert the elements into nodes. The XML element tag becomes the node key, the XML text becomes the node value, and XML attributes are added to the node as additional attributes.

Parameters
  • xml_file (File, string, stream or URL) – XML data to parse

  • graph (:graphit:Graph) – Graph object to import XML data in

Returns

GraphAxis object

Return type

:graphit:GraphAxis
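The element-to-node mapping described for read_xml (tag to key, text to value, XML attributes to extra node attributes) can be illustrated with the standard library alone. This is a sketch and not graphit's code: element_to_node and the 'key', 'value' and 'children' names are assumptions, and xml.etree.ElementTree is used because the separate cElementTree module no longer exists in recent Python versions.

```python
import xml.etree.ElementTree as ET


def element_to_node(element):
    """Convert an XML element into a plain nested dictionary.

    Hypothetical helper: an XML attribute literally named 'key', 'value'
    or 'children' would be overwritten in this simplified layout.
    """
    node = dict(element.attrib)            # XML attributes -> node attributes
    node['key'] = element.tag              # element tag -> node key
    text = (element.text or '').strip()
    node['value'] = text or None           # element text -> node value
    node['children'] = [element_to_node(child) for child in element]
    return node


XML_EXAMPLE = (
    '<molecule name="water">'
    '<atom element="O">oxygen</atom>'
    '<atom element="H">hydrogen</atom>'
    '</molecule>'
)

tree = element_to_node(ET.fromstring(XML_EXAMPLE))
print(tree['key'], [child['value'] for child in tree['children']])
```

In a graph, each dictionary would correspond to one node and each entry in 'children' to an edge from parent to child.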

graphit.graph_io.io_xml_format.write_xml(graph, node_tools=<class 'graphit.graph_io.io_xml_format.XMLNodeTools'>)

Export a graph to an XML data format

Custom XML serializers may be introduced as a custom NodeTools class using the node_tools argument. In addition, the graph ORM may be used to inject tailored serialize methods in specific nodes or edges.

Parameters
  • graph (:graphit:Graph) – Graph to export

  • node_tools (:graphit:NodeTools) – NodeTools class with node serialize method

Returns

Graph exported as a hierarchical XML node structure

Return type

:py:str

graphit.graph_io.io_yaml_format module

Functions for importing and exporting graphs in YAML format. Relies on PyYAML.

graphit.graph_io.io_yaml_format.read_yaml(yaml_file, graph=None, link_subgraphs=True, **kwargs)

Parse (hierarchical) YAML data structure to a graph

YAML files are parsed to a Python data structure that is converted to graph using the read_pydata method. Additional keyword arguments (kwargs) are passed to read_pydata

A YAML file may contain multiple separate data structures. The subgraphs that result from importing these separate data structures are linked to the root of the first imported graph by default. Setting the link_subgraphs argument to False leaves the subgraphs unlinked.

Parameters
  • yaml_file (File, string, stream or URL) – yaml data to parse

  • graph (:graphit:Graph) – Graph object to import YAML data in

  • link_subgraphs (:py:bool) – link subgraphs from separate data structures in the YAML file to the root graph

Returns

GraphAxis object

Return type

:graphit:GraphAxis

graphit.graph_io.io_yaml_format.write_yaml(graph, default=None, include_root=False, allow_none=True)

Export a graph to a (nested) YAML structure

Convert graph representation of the dictionary tree into YAML using a nested or flattened representation of the dictionary hierarchy.

Dictionary keys and values are obtained from the node attributes using key_tag and value_tag. The key_tag is set to graph key_tag by default.

Parameters
  • graph (:graphit:GraphAxis) – Graph object to export

  • default (mixed) – value to use when node value was not found using value_tag.

  • include_root (:py:bool) – Include the root node in the hierarchy

  • allow_none (:py:bool) – allow None values in the output

Return type

:py:yaml

Module contents

Importing and exporting data structures as graphit graphs.

These include both data structures in a format dedicated to representing graphs and other (hierarchical) data that has a structure that could be represented as a graph.