graphit.graph_io package¶
graphit.graph_io.io_adl_format module¶
Reading and writing graphs as adjacency lists (.adl)
Adjacency lists are a simple textual representation of node identifiers and their linkage (adjacency) to one another.
The graph with edges a-b, a-c, d-e can be represented as the following adjacency list (anything following the # in a line is a comment):
a b c # source target target
d e
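As a rough illustration of the format (not graphit's own parser), the sketch below reads an adjacency list into a list of edges. The name `parse_adl` is hypothetical, and treating every line as `source target target ...` with `#` starting a comment is an assumption based on the description above.

```python
def parse_adl(text):
    """Parse adjacency-list text into a list of (source, target) edges.

    Hypothetical helper for illustration only; not part of graphit.
    Edges are returned in the order they appear in the text.
    """
    edges = []
    for line in text.splitlines():
        # Everything after '#' is a comment and is ignored
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        # First identifier is the source, the rest are its targets
        source, *targets = line.split()
        for target in targets:
            edges.append((source, target))
    return edges


adl = "a b c # source target target\nd e\n"
print(parse_adl(adl))
```

Whether an edge is read as directed or undirected is left to the caller here, mirroring the note that directionality is not part of the format.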
-
graphit.graph_io.io_adl_format.
read_adl
(adl_file, graph=None)¶ Construct a graph from an adjacency list (ADL)
Note
The directionality of the graph is not defined explicitly in the adjacency list and thus depends on the graph.directional attribute, which is False (undirected) by default.
- Parameters
adl_file (File, string, stream or URL) – ADL graph data.
graph (:graphit:Graph) – Graph object to import ADL data in
- Returns
Graph object
- Return type
:graphit:Graph
-
graphit.graph_io.io_adl_format.
write_adl
(graph)¶ Export graph as adjacency list (ADL)
Note
This format does not store graph, node, or edge data.
- Parameters
graph (:graphit:Graph) – Graph object to export
- Returns
Adjacency list representation of the graph
- Return type
:py:str
graphit.graph_io.io_cwl_format module¶
Functions for importing data structures in Common Workflow Language format.
The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.
CWL data structures are stored in JSON or YAML format. The lie_graph CWL parser supports syntax version 1.0.2 as described here:
- Citation:
Peter Amstutz, Michael R. Crusoe, Nebojša Tijanić (editors), Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Hervé Ménager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, Luka Stojanovic (2016): Common Workflow Language, v1.0. Specification, Common Workflow Language working group. https://w3id.org/cwl/v1.0/ doi:10.6084/m9.figshare.3115156.v2
- For more information on CWL consult:
-
graphit.graph_io.io_cwl_format.
read_cwl
(cwl_file, graph=None, **kwargs)¶ Parse Common Workflow Language data structures to a graph
Additional keyword arguments (kwargs) are passed to read_pydata
- Parameters
cwl_file (File, string, stream or URL) – CWL data to parse
graph (:graphit:Graph) – Graph object to import dictionary data in
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
graphit.graph_io.io_dot_format module¶
Functions for exporting and importing graphs to and from graph description language (DOT) format
-
graphit.graph_io.io_dot_format.
write_dot
(graph, graph_name=None)¶ DOT graphs are either directional (digraph) or undirectional; mixed mode is not supported.
Nodes and edges are all exported separately; shorthand notations are not supported. Grouping and subgraphs are not supported. Graph attributes in graph.data, graph.edges and graph.nodes will be exported as DOT directives regardless of whether they are official Graphviz DOT graph directives as listed in the reference documentation:
DOT reserved rendering keywords that are part of the graph's global attributes in graph.data, or part of the node and edge attributes, are exported as part of the DOT graph.
- Parameters
graph (:graphit:Graph) – Graph object to export
graph_name (:py:str) – name of the ‘graph’ or ‘digraph’. Uses the ‘title’ attribute in graph.data by default, else graph_name
- Returns
DOT graph representation
- Return type
:py:str
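The export style described above, with every node and every edge written as its own statement, can be sketched as follows. This is a minimal illustration and not graphit's `write_dot`: the function `to_dot` and its arguments are hypothetical, and attribute export is left out.

```python
def to_dot(nodes, edges, name="graph", directed=False):
    """Render nodes and edges as a DOT string, one statement per line.

    Hypothetical helper for illustration only; not part of graphit.
    """
    # A DOT graph is either a 'digraph' using '->' or a 'graph' using '--'
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = ['{0} "{1}" {{'.format(keyword, name)]
    # Nodes are listed separately, no shorthand notation
    for nid in nodes:
        lines.append('  "{0}";'.format(nid))
    # Each edge gets its own statement
    for source, target in edges:
        lines.append('  "{0}" {1} "{2}";'.format(source, arrow, target))
    lines.append("}")
    return "\n".join(lines)


print(to_dot(["a", "b", "c"], [("a", "b"), ("a", "c")], name="example", directed=True))
```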
-
graphit.graph_io.io_dot_format.
read_dot
(dot, graph=None)¶ Read graph in DOT format
- Parameters
dot (File, string, stream or URL) – DOT graph data.
graph (:graphit:Graph) – Graph object to import DOT data in
- Returns
Graph object
- Return type
:graphit:Graph
graphit.graph_io.io_flattened_data_format module¶
Functions for importing and exporting flattened (dot separated) data structures
-
graphit.graph_io.io_flattened_data_format.
read_flattened
()¶
-
graphit.graph_io.io_flattened_data_format.
write_flattened
(graph, sep='.', default=None, allow_none=False, **kwargs)¶
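The two functions above are undocumented here, so the following is only a sketch of what a flattened, dot-separated representation of nested data usually looks like. The helper `flatten` is hypothetical; whether `write_flattened` produces exactly this layout is an assumption.

```python
def flatten(data, sep=".", prefix=""):
    """Flatten a nested dictionary into {'a.b.c': value} form.

    Hypothetical helper for illustration only; not part of graphit.
    Empty nested dictionaries produce no entry in the result.
    """
    flat = {}
    for key, value in data.items():
        # Build the separator-joined path to this key
        path = "{0}{1}{2}".format(prefix, sep, key) if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=path))
        else:
            flat[path] = value
    return flat


print(flatten({"a": {"b": 1, "c": {"d": 2}}, "e": 3}))
```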
graphit.graph_io.io_gexf_format module¶
Reading and writing graphs in GEXF format.
GEXF (Graph Exchange XML Format) is a language for describing complex network structures, their associated data and dynamics.
- Reference and specification:
-
graphit.graph_io.io_gexf_format.
read_gexf
(gexf_file, graph=None)¶ Read graphs in GEXF format
Uses the Python built-in etree cElementTree parser to parse the XML document and convert the elements into nodes. The XML element tag becomes the node key, XML text becomes the node value and XML attributes are added to the node as additional attributes.
- Parameters
gexf_file (File, string, stream or URL) – XML data to parse
graph (:graphit:Graph) – Graph object to import dictionary data in
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
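The element-to-node mapping described for read_gexf can be sketched with the standard library ElementTree parser. This is not graphit's code: `element_to_records` is a hypothetical helper, and the dictionary keys "key" and "value" stand in for whatever key and value tags the graph actually uses.

```python
import xml.etree.ElementTree as ET


def element_to_records(xml_text):
    """Turn every XML element into a dict: tag -> 'key', text -> 'value'.

    Hypothetical helper for illustration only; not part of graphit.
    XML attributes are copied in last, so an attribute named 'key' or
    'value' would overwrite the tag or text entry.
    """
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the root and all descendants in document order
    for element in root.iter():
        text = (element.text or "").strip()
        record = {"key": element.tag, "value": text or None}
        record.update(element.attrib)
        records.append(record)
    return records


xml = '<gexf version="1.2"><graph mode="static"><node id="0" label="Hello"/></graph></gexf>'
for record in element_to_records(xml):
    print(record)
```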
-
graphit.graph_io.io_gexf_format.
write_gexf
(graph, node_tools=<class 'graphit.graph_io.io_gexf_format.GEXFNodeTools'>, edge_tools=<class 'graphit.graph_io.io_gexf_format.GEXFEdgeTools'>)¶ Export a graph to a GEXF data format
Custom XML serializers may be introduced as a custom NodeTools class using the node_tools attribute. In addition, the graph ORM may be used to inject tailored serialize methods in specific nodes or edges.
- Parameters
graph (:graphit:Graph) – Graph to export
node_tools (:graphit:NodeTools) – NodeTools class with node serialize method
edge_tools (:graphit:EdgeTools) – EdgeTools class with edge serialize method
- Returns
Graph exported as a hierarchical XML node structure
- Return type
:py:str
graphit.graph_io.io_gml_format module¶
Functions for exporting and importing graphs to and from graph modelling language (GML) format as described in the online documentation:
-
graphit.graph_io.io_gml_format.
read_gml
(gml, graph=None)¶ Read graph in GML format
- Parameters
gml (File, string, stream or URL) – GML graph data.
graph (:graphit:Graph) – Graph object to import GML data in
- Returns
Graph object
- Return type
:graphit:Graph
-
graphit.graph_io.io_gml_format.
write_gml
(graph, node_tools=None, edge_tools=None)¶ Export a graphit graph to GML format
Export graphit Graph data, nodes and edges in Graph Modelling Language (GML) format. The function replaces the graph NodeTools and EdgeTools with a custom version exposing a serialize method responsible for serializing the node/edge attributes in a GML format. The NodeTools class is also used to export Graph.data attributes.
Custom serializers may be introduced as custom NodeTools or EdgeTools classes using the node_tools and/or edge_tools attributes. In addition, the graph ORM may be used to inject tailored serialize methods in specific nodes or edges.
- Parameters
graph (:graphit:Graph) – Graph object to export
node_tools (:graphit:NodeTools) – NodeTools class with node serialize method
edge_tools (:graphit:EdgeTools) – EdgeTools class with edge serialize method
- Returns
GML graph representation
- Return type
:py:str
graphit.graph_io.io_helpers module¶
-
graphit.graph_io.io_helpers.
initial_node
(nodes)¶ Return node ID of node with smallest _id identifier.
- Parameters
nodes – graph ‘nodes’ object
- Returns
node ID
-
graphit.graph_io.io_helpers.
resolve_root_node
(graph)¶ Resolve the node ID of the root node of the graph.
For Graph objects there is no strict concept of a root node and by default the ‘root’ attribute of the graph is not defined. Here, the root will resolve to the node nid with the smallest _id number which usually is the first node added when the graph was created.
For GraphAxis object a root is essential for defining the graph hierarchy and thus, the graph ‘root’ attribute should be defined. If it is not defined it will also default to the node nid with the smallest _id number. If the user defined or default root is in the (sub)graph it is returned. If not, an attempt will be made to resolve it following:
If the graph is a single node, its node ID will be root.
If the graph has multiple nodes and the root is defined in the full_graph, return the node ID closest to the root
- Parameters
graph – graph to resolve root node for
- Returns
root node ID
-
graphit.graph_io.io_helpers.
coarse_type
(n)¶
-
graphit.graph_io.io_helpers.
check_graphit_version
(file_version)¶ Check if the graph version of the file is (backwards) compatible with the current graphit module version
- Parameters
file_version (:py:str) – graphit version to check
-
graphit.graph_io.io_helpers.
open_anything
(source, mode='r')¶ Open input available from a file, a Python file like object, standard input, a URL or a string and return a uniform Python file like object with standard methods.
- Parameters
source (mixed) – Input as file, Python file like object, standard input, URL or a string
mode (string) – file access mode, defaults to ‘r’
- Returns
Python file like object
-
class
graphit.graph_io.io_helpers.
FormatDetect
(set_locale='en_US.UTF-8', decimal_point=None, thousands_sep=None)¶ Bases:
object
Type cast string or unicode objects to float, integer or boolean.
Uses localization to identify
TODO: comma separated strings fail if one comma
-
parse
(value, target_type=None)¶ Parse an unknown value to a float, integer, boolean or else remain in unicode.
- Parameters
value – value to parse
target_type – type to convert to as ‘integer’, ‘number’, ‘string’, ‘boolean’ or automatic ‘detect’
- Returns
parsed value
-
to_boolean
(value)¶
-
to_detect
(value)¶
-
static
to_integer
(value)¶
-
static
to_number
(value)¶
-
static
to_string
(value)¶
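The idea behind automatic type detection can be sketched as a chain of attempted casts. This is a simplified illustration and not the FormatDetect implementation: the function `detect` is hypothetical and it ignores the locale handling (decimal point and thousands separator) that the class constructor exposes.

```python
def detect(value):
    """Cast a string to bool, int or float when possible, else return it as is.

    Hypothetical helper for illustration only; not part of graphit.
    """
    text = value.strip()
    # Booleans first, otherwise 'True' would simply stay a string
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    # Try the narrowest numeric type before the wider one
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return value


for raw in ("42", "3.14", "True", "abc"):
    print(repr(raw), "->", repr(detect(raw)))
```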
-
-
class
graphit.graph_io.io_helpers.
StreamReader
(stream)¶ Bases:
object
StreamReader class
Extension of the Python file like object (io class) to read data as flexible streams. Enables a stream to be read by character, or block of characters crossing file lines.
- Parameters
stream – textual data that can be parsed as file-like object.
-
next
()¶ Iterator next method
Returns next character in the iterations as long as there are characters left in the file-like object
- Raises
StopIteration, if no more characters
-
read_upto_block
(blocks, sep=(' ', '\n'), keep=False)¶ Return characters from active position up to a certain block of characters or the first occurrence of one of multiple blocks. A block is defined as a sequence of characters bounded by separator characters sep usually spaces and newline characters.
- Parameters
blocks (:py:str, :py:list, :py:tuple) – block(s) to search for.
sep (:py:tuple, :py:list) – block separation characters
keep (:py:bool) – keep the block to search for as part of the returned string
- Returns
tuple of text segment and termination character
- Return type
:py:tuple
-
read_upto_char
(chars, keep=False)¶ Return characters from active position up to a certain character or the first occurrence of one of multiple characters.
- Parameters
chars (:py:str, :py:list, :py:tuple) – character(s) to search for.
keep (:py:bool) – keep the character to search for as part of the returned string
- Returns
tuple of text segment and termination character
- Return type
:py:tuple
-
readline
()¶ Returns ‘readline’ method of the base file-like object
-
set_cursor
(position)¶ Move the file reader cursor to a new position in the file
- Parameters
position (:py:int) – position to move to
-
slice
(start, stop, step=1)¶ Text slice method.
Returns a segment of text defined by a start and stop character position relative to the start of the text.
- Parameters
start (:py:int) – start character position
stop (:py:int) – stop character position
- Return type
:py:str
-
tell
()¶ Return current position of file cursor
- Return type
:py:int
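The behaviour documented for read_upto_char (return a tuple of text segment and termination character) can be sketched on top of a plain `io.StringIO`. This is a standalone illustration, not the StreamReader class: the free function below is hypothetical, and returning None as terminator at end of stream is an assumption.

```python
import io


def read_upto_char(stream, chars, keep=False):
    """Read from the current position up to the first occurrence of `chars`.

    Hypothetical helper for illustration only; not part of graphit.
    Returns (text, terminator); terminator is None if the stream ends first.
    """
    collected = []
    while True:
        char = stream.read(1)
        if char == "":
            # End of stream reached without finding a terminator
            return "".join(collected), None
        if char in chars:
            if keep:
                collected.append(char)
            return "".join(collected), char
        collected.append(char)


stream = io.StringIO("key = value, next")
print(read_upto_char(stream, "="))
print(read_upto_char(stream, (",", ";"), keep=True))
print(read_upto_char(stream, "#"))
```

Because the stream keeps its own cursor, consecutive calls continue where the previous one stopped, which is what allows reads to cross line boundaries.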
graphit.graph_io.io_jgf_format module¶
Functions for reading and writing graph files in the graphit .jgf JSON format
This is a proprietary format in which the graph meta-data, the nodes, edges and their data dictionaries are stored in JSON format.
-
graphit.graph_io.io_jgf_format.
read_jgf
(jgf_format, graph=None)¶ Read JSON graph format (.jgf)
This is a proprietary format in which the graph meta-data, the nodes, edges and their data dictionaries are stored in JSON format.
Format description. Primary key/value pairs:
graph: Graph class meta-data. Serializes all class attributes of type int, float, bool, long, str or unicode.
nodes: Graph node identifiers (keys) and attributes (values)
edges: Graph enumerated edge identifiers
edge_attr: Graph edge attributes
- Parameters
jgf_format (:py:str) – JSON encoded graph data to parse
graph (:graphit:Graph) – Graph object to import JGF data in
- Returns
Graph object
- Return type
Graph or GraphAxis object
-
graphit.graph_io.io_jgf_format.
write_jgf
(graph, indent=2, encoding='utf-8', **kwargs)¶ Write JSON graph format
This is a proprietary format in which the graph meta-data, the nodes, edges and their data dictionaries are stored in JSON format.
Format description. Primary key/value pairs:
graph: Graph class meta-data. Serializes all class attributes of type int, float, bool, long, str or unicode.
data: Graph meta-data dictionary
nodes: Graph node identifiers (keys) and attributes (values)
edges: Graph enumerated edge identifiers
edge_attr: Graph edge attributes
- Parameters
graph (Graph or GraphAxis object) – graph object to serialize
indent (:py:int) – JSON indentation count
encoding (:py:str) – JSON string encoding
kwargs (:py:dict) – additional data to be stored as file meta data
- Returns
JSON encoded graph dictionary
- Return type
:py:str
graphit.graph_io.io_json_format module¶
Functions for importing and exporting JSON data into a graph data structure
-
graphit.graph_io.io_json_format.
read_json
(json_file, graph=None, **kwargs)¶ Parse (hierarchical) JSON data structure to a graph
Use the default Python json parser to parse the JSON file to a dictionary followed by io_dict_format.read_pydata to parse to a graph structure.
Additional keyword arguments (kwargs) are passed to read_pydata
- Parameters
json_file (File, string, stream or URL) – json data to parse
graph (:graphit:Graph) – Graph object to import dictionary data in
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
-
graphit.graph_io.io_json_format.
write_json
(graph, default=None, include_root=False, allow_none=True, **kwrags)¶ Export a graph to a (nested) JSON structure
Convert graph representation of the dictionary tree into JSON using a nested or flattened representation of the dictionary hierarchy.
Dictionary keys and values are obtained from the node attributes using key_tag and value_tag. The key_tag is set to graph key_tag by default.
Additional keyword arguments (kwargs) are passed to json.dumps()
- Parameters
graph (:graphit:GraphAxis) – Graph object to export
default (mixed) – value to use when node value was not found using value_tag.
include_root (:py:bool) – Include the root node in the hierarchy
root_nid – root node ID in graph hierarchy
allow_none (:py:bool) – allow None values in the output
- Return type
:py:json
graphit.graph_io.io_jsonschema_format module¶
Functions for building and validating graphs based on a JSON schema definition. http://json-schema.org
-
graphit.graph_io.io_jsonschema_format.
read_json_schema
(schema, graph=None, exclude_args=None, resolve_ref=True)¶ Import hierarchical data structures defined in a JSON schema format
- Parameters
schema (dict, file, string, stream or URL) – JSON Schema data format to import
graph (:graphit:Graph) – graph object to import JSON Schema data in
exclude_args (:py:list) – JSON schema arguments to exclude from import
resolve_ref (:py:bool) – Parse JSON schema ‘definitions’
- Returns
Graph object
- Return type
:graphit:Graph
graphit.graph_io.io_jsonschema_format_drafts module¶
Classes representing JSON Schema draft versions as specified by http://json-schema.org.
-
class
graphit.graph_io.io_jsonschema_format_drafts.
StringType
¶ Bases:
graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07
-
set
(key, value=None)¶ Set node attribute values.
- Parameters
key – node attribute key
value – node attribute value
-
-
class
graphit.graph_io.io_jsonschema_format_drafts.
IntegerType
¶ Bases:
graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07
-
set
(key, value=None)¶ Set node attribute values.
- Parameters
key – node attribute key
value – node attribute value
-
-
class
graphit.graph_io.io_jsonschema_format_drafts.
BooleanType
¶ Bases:
graphit.graph_io.io_jsonschema_format_drafts.JSONSchemaValidatorDraft07
-
set
(key, value=None)¶ Set node attribute values.
- Parameters
key – node attribute key
value – node attribute value
-
graphit.graph_io.io_lgf_format module¶
Functions for reading and writing graphs defined in the LEMON Graph Format
Reference: http://lemon.cs.elte.hu/pub/doc/1.3/a00002.html
- Citation: Balázs Dezső, Alpár Jüttner, Péter Kovács “LEMON – an Open Source
C++ Graph Template Library” (2011) Electronic Notes in Theoretical Computer Science, 264(5), 23-45
-
graphit.graph_io.io_lgf_format.
read_lgf
(lgf, graph=None)¶ Read graph in LEMON Graph Format (LGF)
- Parameters
lgf (File, string, stream or URL) – LGF graph data.
graph (:graphit:Graph) – Graph object to import LGF data in
- Returns
Graph object
- Return type
:graphit:Graph
-
graphit.graph_io.io_lgf_format.
write_lgf
(graph)¶ Write graph in LEMON Graph Format (LGF)
- Parameters
graph (:graphit:Graph) – Graph object to export
- Returns
Graph object
- Return type
:graphit:Graph
graphit.graph_io.io_lgr_format module¶
Reading and writing graphs in LEDA format (.gw, .lgr).
The Library of Efficient Data types and Algorithms (LEDA) is a proprietary licensed software library providing C++ implementations of a broad variety of algorithms for graph theory and computational geometry.
- Specifications:
http://www.algorithmic-solutions.info/leda_guide/graphs/leda_native_graph_fileformat.html
Example:
The LEDA graph format is a simple and fast format that is always separated into a header, a nodes and an edges section. The header always starts with LEDA.GRAPH followed by the data type for node and edge data as string, int, float or boolean, or ‘void’ if no data is defined. The fourth line describes the directionality of the graph as directed (-1) or undirected (-2). The nodes section starts with the number of nodes followed by an ordered list of node labels (between |{}|) that are sequentially numbered starting from 1. The node labels are converted to the respective types as indicated in the header section. The edges section is similar to the nodes section but lists for each edge the source and target nodes following the sequential numbering of the nodes, a reversal number (not used) and the edge data label (between |{}|).
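The layout described above can be followed line by line in a small parser. The sketch below is an illustration under the stated description, not graphit's `read_lgr`: `parse_leda` and `strip_label` are hypothetical helpers, the sample data is made up, and labels are kept as strings rather than converted to the header types.

```python
def strip_label(token):
    """Return the text between the LEDA label delimiters |{ and }|."""
    return token[2:-2]


def parse_leda(text):
    """Parse LEDA graph text into a plain dictionary.

    Hypothetical helper for illustration only; not part of graphit.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines[0] != "LEDA.GRAPH":
        raise ValueError("missing LEDA.GRAPH header")
    # Header: node data type, edge data type, directionality
    node_type, edge_type = lines[1], lines[2]
    directed = lines[3] == "-1"
    # Nodes section: a count followed by labels, numbered from 1
    node_count = int(lines[4])
    nodes = {i + 1: strip_label(lines[5 + i]) for i in range(node_count)}
    # Edges section: a count followed by 'source target reversal |{label}|'
    edge_start = 5 + node_count
    edge_count = int(lines[edge_start])
    edges = []
    for line in lines[edge_start + 1:edge_start + 1 + edge_count]:
        source, target, _reversal, label = line.split(None, 3)
        edges.append((int(source), int(target), strip_label(label)))
    return {"node_type": node_type, "edge_type": edge_type,
            "directed": directed, "nodes": nodes, "edges": edges}


sample = """LEDA.GRAPH
string
int
-1
3
|{a}|
|{b}|
|{c}|
2
1 2 0 |{4}|
2 3 0 |{7}|
"""
print(parse_leda(sample))
```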
-
graphit.graph_io.io_lgr_format.
read_lgr
(lgr, graph=None, edge_label='label')¶ Read graph in LEDA format
Nodes are added to the graph using a unique ID or with the node data as label, depending on whether graph.data.auto_nid is True or False. Edge data is added to the edge attributes using edge_label as key. The data types for both nodes and edges are set according to the specifications in the LEDA header as either string, int, float or bool.
- Parameters
lgr (File, string, stream or URL) – LEDA graph data.
graph (:graphit:Graph) – Graph object to import LEDA data in
edge_label (:py:str) – edge data label name
- Returns
Graph object
- Return type
:graphit:Graph
- Raises
TypeError if node/edge type conversion failed
GraphitException in case of malformed LEDA file
-
graphit.graph_io.io_lgr_format.
write_lgr
(graph, node_key=None, edge_key=None, node_data_type='string', edge_data_type='void')¶ Export a graph to an LGR data format
The LEDA format allows for export of only one node or edge data type (as: |{data type}|). For nodes this is usually the node label and for edges any arbitrary data key,value pair. In both cases the data type is required to be of either: string, int, float or bool.
Nodes and edges are exported by iterating over them using iternodes and iteredges. Iteration uses the graphit Object Relations Mapper (ORM) allowing full control over the data export by overriding the get method globally in the ‘NodeTools’ or ‘EdgeTools’ classes or using custom classes registered with the ORM. Data returned by the get method will be serialized regardless of the return type.
The node and edge data types are registered globally in the LEDA file using node_data_type (‘string’ by default) and edge_data_type (‘void’, no data, by default).
- Parameters
graph (:graphit:Graph) – Graph to export
node_key (:py:str) – key name of node data to export
edge_key (:py:str) – key name of edge data to export
node_data_type (:py:str) – primitive data type of exported node data
edge_data_type (:py:str) – primitive data type of exported edge data
- Returns
Graph exported as LGR format
- Return type
:py:str
- Raises
GraphitException
graphit.graph_io.io_p2g_format module¶
Reading and writing graphs defined in P2G Graph Format (.p2g) used for representing metabolic pathways from the KEGG database.
A file that describes a uniquely labeled graph (with extension “.gr”) looks like the following:
name
3 4
a
1 2
b

c
0 2
“name” is simply a description of what the graph corresponds to. The second line displays the number of nodes and number of edges, respectively. This sample graph contains three nodes labeled “a”, “b”, and “c”. The rest of the graph contains two lines for each node. The first line for a node contains the node label. After the declaration of the node label, the out-edges of that node in the graph are provided. For instance, “a” is linked to nodes 1 and 2, which are labeled “b” and “c”, while the node labeled “b” has no outgoing edges. Observe that node labeled “c” has an outgoing edge to itself. Indeed, self-loops are allowed. Node index starts from 0.
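The two-lines-per-node layout explained above can be read with a few lines of Python. This sketch is not graphit's `read_p2g`: `parse_p2g` is a hypothetical helper, and it assumes that a node without out-edges is followed by an empty line and that targets are whitespace separated.

```python
def parse_p2g(text):
    """Parse P2G text into (name, labels, edges) with 0-based node indices.

    Hypothetical helper for illustration only; not part of graphit.
    """
    lines = text.splitlines()
    name = lines[0].strip()
    # Second line: number of nodes and number of edges
    node_count, edge_count = (int(n) for n in lines[1].split())
    labels, edges = [], []
    for index in range(node_count):
        label_line = 2 + 2 * index
        edge_line = label_line + 1
        labels.append(lines[label_line].strip())
        # The out-edge line may be empty or missing at the end of the file
        targets = lines[edge_line].split() if edge_line < len(lines) else []
        edges.extend((index, int(target)) for target in targets)
    if len(edges) != edge_count:
        raise ValueError("edge count does not match the header")
    return name, labels, edges


sample = "name\n3 4\na\n1 2\nb\n\nc\n0 2\n"
print(parse_p2g(sample))
```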
-
graphit.graph_io.io_p2g_format.
read_p2g
(p2g_file, graph=None)¶ Read graph in P2G format
- Parameters
p2g_file (File, string, stream or URL) – P2G data to parse
graph (:graphit:Graph) – Graph object to import to or Graph by default
- Returns
Graph instance
- Return type
:graphit:Graph
-
graphit.graph_io.io_p2g_format.
write_p2g
(graph, graph_name_label='name')¶ Export a graphit graph to P2G format
- Parameters
graph (:graphit:Graph) – Graph object to export
graph_name_label (:py:str) – graph.data attribute label for the graph name
- Returns
P2G graph representation
- Return type
:py:str
graphit.graph_io.io_pdb_format module¶
Reading and writing graphs in RCSB Protein DataBank format (.pdb).
The PDB molecular structure format is represented as GraphAxis graph using the Model-Segment-Residue-Atom (MSRA) hierarchical structure. The reader and writer support the official wwPDB guidelines for MODEL, ATOM, HETATM and CONECT records. Other records are not supported.
- Reference and specification:
-
graphit.graph_io.io_pdb_format.
read_pdb
(pdb_file, graph=None, column_format={'atalt': (slice(16, 17, None), <class 'str'>), 'atname': (slice(12, 16, None), <class 'str'>), 'atnum': (slice(6, 12, None), <class 'int'>), 'b': (slice(60, 66, None), <class 'float'>), 'chain': (slice(21, 22, None), <class 'str'>), 'charge': (slice(78, 80, None), <class 'float'>), 'elem': (slice(76, 78, None), <class 'str'>), 'insert': (slice(26, 30, None), <class 'str'>), 'label': (slice(0, 6, None), <class 'str'>), 'occ': (slice(54, 60, None), <class 'float'>), 'resname': (slice(17, 21, None), <class 'str'>), 'resnum': (slice(22, 26, None), <class 'int'>), 'segid': (slice(72, 76, None), <class 'str'>), 'xcoor': (slice(30, 38, None), <class 'float'>), 'ycoor': (slice(38, 46, None), <class 'float'>), 'zcoor': (slice(46, 54, None), <class 'float'>)})¶ Parse RCSB Protein Data Bank (PDB) structure files to a graph
Builds a Model-Segment-Residue-Atom (MSRA) hierarchy of the structure in a GraphAxis graph. Primary structure data will be extracted from the columns in ATOM and HETATM lines. The data label, character positions and required type conversion are described by the column_format dictionary, which by default supports the wwPDB version 3.3 guidelines. CONECT records will be represented as edges between atoms. These edges can be identified by the ‘label=conect’ attribute.
Note
The GraphAxis ‘auto_nid’ functionality will be enabled for the import to uniquely represent structures possibly sharing similar atom numbers (MODELS).
- Parameters
pdb_file (File, string, stream or URL) – PDB data to parse
graph (:graphit:Graph) – Graph object to import dictionary data in
column_format (:py:dict) – ATOM/HETATM line label based slice records
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
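How a column_format dictionary of (slice, type) pairs turns one ATOM line into labelled values can be sketched as follows. The slices are copied from the read_pdb signature above; everything else is an assumption for illustration: `parse_atom_line` is a hypothetical helper, the example line is assembled in code so that column positions are exact, and blank numeric fields are returned as None.

```python
COLUMN_FORMAT = {
    'label': (slice(0, 6), str), 'atnum': (slice(6, 12), int),
    'atname': (slice(12, 16), str), 'atalt': (slice(16, 17), str),
    'resname': (slice(17, 21), str), 'chain': (slice(21, 22), str),
    'resnum': (slice(22, 26), int), 'insert': (slice(26, 30), str),
    'xcoor': (slice(30, 38), float), 'ycoor': (slice(38, 46), float),
    'zcoor': (slice(46, 54), float), 'occ': (slice(54, 60), float),
    'b': (slice(60, 66), float), 'segid': (slice(72, 76), str),
    'elem': (slice(76, 78), str), 'charge': (slice(78, 80), float),
}


def parse_atom_line(line, column_format=COLUMN_FORMAT):
    """Slice one ATOM/HETATM line into a dict of typed values.

    Hypothetical helper for illustration only; not part of graphit.
    """
    record = {}
    for name, (columns, cast) in column_format.items():
        field = line[columns].strip()
        if cast is str:
            record[name] = field
        else:
            # Assumption: an empty numeric column becomes None
            record[name] = cast(field) if field else None
    return record


# Build an 80-column ATOM line so every field sits in its documented columns
line = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}      {:<4}{:>2}{:2}"
).format("ATOM", 1, " N", "", "MET", "A", 1, "",
         27.340, 24.430, 2.614, 1.00, 9.67, "", "N", "")

print(parse_atom_line(line))
```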
-
graphit.graph_io.io_pdb_format.
write_pdb
(graph, atom_format='{label:6}{atnum:>5} {atname:^4}{atalt:1}{resname:>3} {chain}{resnum:>4}{insert:1} {xcoor:8.3f}{ycoor:8.3f}{zcoor:8.3f}{occ:6.2f}{b:6.2f} {segid:>4}{elem:>2}{charge:>2}\n')¶ Export a Model-Segment-Residue-Atom (MSRA) graph structure as RCSB Protein Data Bank (PDB) structure file
PDB ATOM and HETATM lines are formatted using the atom_format string formatter using Python’s keyword based format() mini-language.
- Parameters
graph (:graphit:graph) – Graph to export
atom_format (:py:str) – String formatter for ATOM/HETATM lines
- Returns
RCSB PDB string
- Return type
:py:str
graphit.graph_io.io_pgf_format module¶
Reading and writing graphs defined in Proprietary Graph Format (.pgf), a format specific to the graphit module.
PGF stores graphit graph data as plain text python dictionaries or as serialized byte stream using the Python pickle module. Graphit graphs can contain any hashable Python object as node (not just integers and strings). Storing a graph by “Pickling” it is probably the best way of representing arbitrary hashable data types. Both storage options are feature rich but not portable as they are (so far) only supported by graphit.
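Why pickling suits arbitrary hashable node identifiers can be shown with the standard library alone. The snippet below does not use graphit; the `nodes` dictionary is made-up sample data standing in for a graph's node store.

```python
import json
import pickle

# Node identifiers may be any hashable object, not just integers and strings
nodes = {
    ("chain", "A"): {"kind": "segment"},
    frozenset({1, 2}): {"kind": "pair"},
    7: {"kind": "atom"},
}

# Pickle round-trips the dictionary including its tuple and frozenset keys
restored = pickle.loads(pickle.dumps(nodes))
print(restored == nodes)

# JSON only accepts str, int, float, bool or None as dictionary keys
try:
    json.dumps(nodes)
    json_ok = True
except TypeError:
    json_ok = False
print(json_ok)
```

Pickled data should only be loaded from trusted sources, since unpickling can execute arbitrary code.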
-
graphit.graph_io.io_pgf_format.
read_pgf
(pgf_file, graph=None, pickle_graph=False)¶ Import graph from Graph Python Format file
PGF format is the module's own file format consisting of serialized graph data, nodes and edges dictionaries. Import either as plain text serialized dictionary or pickled graph object. The format is feature rich with good performance but is not portable.
- Parameters
pgf_file (File, string, stream or URL) – PGF data to parse
graph (:graphit:Graph) – Graph object to import to or Graph by default
pickle_graph (:py:bool) – PGF format is a pickled graph
- Returns
Graph instance
- Return type
:graphit:Graph
-
graphit.graph_io.io_pgf_format.
write_pgf
(graph, pickle_graph=False)¶ Export graph as Graph Python Format file
PGF format is the module's own file format consisting of serialized graph data, nodes and edges dictionaries. Exports either as plain text serialized dictionary or pickled graph object. The format is feature rich with good performance but is not portable.
- Parameters
graph (:graphit:Graph) – Graph object to export
pickle_graph (:py:bool) – serialize the Graph using Python pickle
- Returns
Graph in PGF graph representation
- Return type
:py:str
graphit.graph_io.io_pydata_format module¶
Functions for importing and exporting (nested) Python data structures into graph data structures.
-
graphit.graph_io.io_pydata_format.
read_pydata
(data, graph=None, parser_classes=None, level=0)¶ Parse (hierarchical) python data structures to a graph
Many data formats are first parsed to a python structure before they are converted to a graph using the read_pydata function. The function supports any object that is an instance of, or behaves as, a Python dictionary, list, tuple or set and converts these (nested) structures to graph nodes and edges for connectivity. Data is stored in nodes using the node and edge ‘key_tag’ and ‘value_tag’ attributes in the Graph class.
Data type and format information are also stored as part of the nodes to enable reconstruction of the Python data structure on export using the write_pydata function. Changing type and format on a node or edge allows for customized data export.
Parsing of data structures to nodes and edges is handled by parser classes that need to define the methods deserialize for reading and serialize for writing. In write_pydata these classes are registered with the ORM to fully customize the use of the serialize method. In the read_pydata function the ORM cannot be used because the nodes/edges themselves do not yet exist. Instead they are provided as a dictionary through the parser_classes argument. The dictionary defines the string representation of the Python data type as key and parser class as value.
Parser customization is important as Python data structures can be represented as a graph structure in different ways. This is certainly true for dictionaries where key/value pairs can be part of the node attributes, as separate nodes or as a combination of the two. read_pydata has quick support for two scenarios using the level argument:
level 0: every dictionary key/value pair is represented as a node regardless of its position in the nested data structure
level 1: all keys at the same level in the hierarchy that have a primitive type value are stored as part of the node attributes.
If the graph is empty, the first node added to the graph is assigned as root node. If the graph is not empty, new nodes and edges will be added to it as subgraph. Edge connections between the two will have to be made afterwards.
- Parameters
data – Python (hierarchical) data structure
graph (:graphit:GraphAxis) – GraphAxis object to import dictionary data in
parser_classes (:py:dict) – parser class definition for different Python data types. Updates default classes for level 0 or 1
level (:py:int) – dictionary parsing mode
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
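The difference between the two level settings can be sketched on plain dictionaries. This is a simplified model and not graphit's parser classes: `dict_to_graph` is a hypothetical helper, it only descends into dictionaries, and the attribute names "key" and "value" as well as the "root" label are assumptions standing in for the graph's key_tag and value_tag.

```python
def dict_to_graph(data, level=0):
    """Convert a nested dict into (nodes, edges) using integer node IDs.

    Hypothetical helper for illustration only; not part of graphit.
    level 0: every key/value pair becomes its own node.
    level 1: primitive values are stored as attributes on the parent node.
    """
    nodes, edges = {}, []

    def add_node(key, parent):
        nid = len(nodes) + 1
        nodes[nid] = {"key": key}
        if parent is not None:
            edges.append((parent, nid))
        return nid

    def walk(key, value, parent):
        nid = add_node(key, parent)
        if not isinstance(value, dict):
            nodes[nid]["value"] = value
            return
        for child_key, child_value in value.items():
            if level == 1 and not isinstance(child_value, dict):
                # Primitive value: keep it on the current node
                nodes[nid][child_key] = child_value
            else:
                walk(child_key, child_value, nid)

    walk("root", data, None)
    return nodes, edges


data = {"name": "x", "size": {"w": 1, "h": 2}}
print(dict_to_graph(data, level=0))
print(dict_to_graph(data, level=1))
```

With level 0 the sample yields five nodes, with level 1 only two, which is why exporting level 1 data needs the full node dictionary rather than a single key/value pair.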
-
graphit.graph_io.io_pydata_format.
write_pydata
(graph, default=None, allow_none=True, export_all=False, include_root=False)¶ Export a graph to a (nested) dictionary
Convert graph representation of the dictionary tree into a dictionary using a nested representation of the dictionary hierarchy.
Dictionary keys and values are obtained from the node attributes using key_tag and value_tag. The key_tag is set to graph key_tag by default. Export using these primary key_tag/value_tag pairs is the default behaviour. If a node contains more data these can be exported as part of a dictionary using the export_all argument.
Note
export_all is important when dictionary data structures were imported using level=1 in read_pydata. In this case, all key value pairs at the same dictionary level are contained in the same node.
Node values that are ‘None’ are exported by default unless allow_none equals False. If the key_tag exists but value_tag is absent use default as default.
Note
if a graph is composed out of multiple, independent subgraphs only the subgraph for which the root node is defined will be exported. To export all, iterate over the subgraphs and define the appropriate root for each of them.
- Parameters
graph (:graphit:GraphAxis) – Graph object to export
default (mixed) – value to use when node value was not found using value_tag.
allow_none (:py:bool) – allow None values in the output
export_all (:py:bool) – Export the full node storage dictionary.
include_root (:py:bool) – Include the root node in the hierarchy
- Return type
:py:dict
graphit.graph_io.io_tgf_format module¶
Functions for reading and writing graphs defined in Trivial Graph Format (.tgf) a simple text-based file format for describing graphs. It consists of a list of node definitions, which map node IDs to labels, followed by a list of edges, which specify node pairs and an optional edge label. Node IDs can be arbitrary identifiers, whereas labels for both nodes and edges are plain strings.
The graph may be interpreted as a directed or undirected graph. For directed graphs, to specify the concept of bi-directionality in an edge, one may either specify two edges (forward and back) or differentiate the edge by means of a label.
TGF format only describes the nodes themselves and the edges connecting them. Node data (attributes) are not represented.
- Example:
1 January
2 March
3 April
4 May
5 December
6 June
7 September
#
1 2
3 2
4 3
5 1 Happy New Year!
5 3 April Fools Day
6 3
6 1
7 5
7 6
7 1
Reference: https://en.wikipedia.org/wiki/Trivial_Graph_Format
-
graphit.graph_io.io_tgf_format.
read_tgf
(tgf, graph=None)¶ Read graph in Trivial Graph Format
The TGF format dictates that nodes are listed in the file first, with each node on a new line. A ‘#’ character signals the end of the node list and the start of the edge list.
Node and edge IDs can be integers, floats or strings. They are automatically parsed to their most likely type. TGF supports simple node and edge labels, defined as all characters that follow the node or edge IDs. Labels are parsed and stored in the Graph node and edge data stores using the graph's default or custom ‘key_tag’.
TGF data is imported into a default Graph object if no custom Graph instance is provided. The graph behaviour and the data import process can be influenced and controlled using a (custom) Graph class.
Note
TGF format always defines edges in a directed fashion. This is enforced even for custom graphs.
- Parameters
tgf (File, string, stream or URL) – TGF graph data.
graph (:graphit:Graph) – Graph object to import TGF data in
- Returns
Graph object
- Return type
:graphit:Graph
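The parsing rules described for read_tgf can be sketched in plain Python. This is a minimal illustration and not the graphit implementation: the helper names `parse_id` and `parse_tgf` are hypothetical, and the result is a pair of plain dictionaries rather than a Graph object.

```python
def parse_id(token):
    """Convert an ID token to int, then float, falling back to str."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_tgf(text):
    """Parse TGF text into (nodes, edges) dictionaries holding labels."""
    nodes, edges = {}, {}
    in_edges = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            # '#' ends the node list and starts the edge list.
            in_edges = True
            continue
        if in_edges:
            parts = line.split(None, 2)
            if len(parts) < 2:
                raise ValueError('edge line needs two node IDs: %r' % line)
            source, target = parse_id(parts[0]), parse_id(parts[1])
            # Everything after the two IDs is the optional edge label.
            edges[(source, target)] = parts[2] if len(parts) > 2 else None
        else:
            parts = line.split(None, 1)
            # Everything after the node ID is the optional node label.
            nodes[parse_id(parts[0])] = parts[1] if len(parts) > 1 else None
    return nodes, edges


tgf_text = """1 January
2 March
3 April
#
1 2
3 2 Spring
"""
nodes, edges = parse_tgf(tgf_text)
print(nodes)
print(edges)
```

Edges are stored here as (source, target) tuples, which matches the directed reading of the format noted above.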
-
graphit.graph_io.io_tgf_format.
write_tgf
(graph)¶ Export a graph in Trivial Graph Format
TGF graph export uses the Graph iternodes and iteredges methods to retrieve nodes and edges and ‘get’ the data labels. The behaviour of this process is determined by the single node/edge mixin classes and the ORM mapper.
- Parameters
graph (:graphit:Graph) – Graph object to export
- Returns
TGF graph representation
- Return type
:py:str
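As a rough sketch of what a TGF export produces, the following stand-alone function serializes plain dictionaries of nodes and edges. The name `format_tgf` and the dictionary layout are assumptions made for illustration; graphit's write_tgf works on Graph objects and retrieves labels through its own node and edge tools.

```python
def format_tgf(nodes, edges):
    """Serialize {node_id: label} and {(source, target): label} to TGF text."""
    lines = []
    for node_id, label in nodes.items():
        # A node line is the ID, optionally followed by its label.
        lines.append(str(node_id) if label is None else f"{node_id} {label}")
    # '#' separates the node list from the edge list.
    lines.append('#')
    for (source, target), label in edges.items():
        line = f"{source} {target}"
        if label is not None:
            line += f" {label}"
        lines.append(line)
    return "\n".join(lines) + "\n"


nodes = {1: 'January', 2: 'March', 3: None}
edges = {(1, 2): None, (3, 2): 'Spring'}
tgf_text = format_tgf(nodes, edges)
print(tgf_text)
```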
graphit.graph_io.io_web_format module¶
Functions for reading and writing hierarchical data structures defined by the Spider data modelling package as .web format files.
A .web format defines data blocks containing key, value pairs or Array data as hierarchically nested blocks enclosed in parentheses, opened with ‘(’ and closed with ‘),’, and indented for visual clarity.
Every data item in the format is written on a new line and is either a traditional key, value pair written as ‘key = value,’ or a single value; single values are combined into an array like:
- key = FloatArray (
1.0, 2.0,
)
Key, value pairs and array type data structures can be freely combined such as in:
c2segments = LabeledRangePairArray (
  LabeledRangePair (
    r = LabeledRangeArray (
      LabeledRange (
        start = 1,
        end = 12,
        chain = 'A',
      ),
      LabeledRange (
        start = 1,
        end = 12,
        chain = 'B',
      ),
    ),
  ),
),
The data inside a LabeledRange are key, value pairs but LabeledRange and also LabeledRangePair are array types. The latter two are stored as nodes in the graph and are automatically assigned a key as “itemX” where X is an incremented integer.
The type of any piece of data is loosely defined by a type identifier in front of every new data block that closely resembles a Python-style class name. The ‘FloatArray’ identifier in the expression above is an example. These identifiers are usually coupled to classes in charge of data exchange by an object-relational mapper such as the one used in the graphit package.
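To make the block layout concrete, here is a small sketch that renders nested Python data in a .web-like layout. It is not the graphit writer: the function name `to_web` and the convention of representing a typed block as a `(type_name, content)` tuple are assumptions made for this example, and details such as string quoting may differ from real .web files.

```python
def to_web(value, key=None, indent=2, level=0):
    """Return .web-style lines for a value; typed blocks are (type, content)."""
    pad = ' ' * (indent * level)
    prefix = f"{key} = " if key is not None else ""
    if isinstance(value, tuple):
        type_name, content = value
        lines = [f"{pad}{prefix}{type_name} ("]
        if isinstance(content, dict):
            # Dictionary content becomes 'key = value,' lines.
            for child_key, child in content.items():
                lines.extend(to_web(child, child_key, indent, level + 1))
        else:
            # List content becomes array items without a key.
            for item in content:
                lines.extend(to_web(item, None, indent, level + 1))
        lines.append(f"{pad}),")
        return lines
    # Plain values: repr() quotes strings and leaves numbers bare.
    return [f"{pad}{prefix}{value!r},"]


data = ('LabeledRangeArray', [
    ('LabeledRange', {'start': 1, 'end': 12, 'chain': 'A'}),
])
web_lines = to_web(data, key='r')
print("\n".join(web_lines))
```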
-
graphit.graph_io.io_web_format.
read_web
(web, graph=None, orm_data_tag='haddock_type', auto_parse_format=True)¶ Import hierarchical data structures defined in the Spider .web format
The data block type identifiers used in the .web format are stored in the nodes using the orm_data_tag attribute. These can be used by the Graph ORM mapper for custom data exchange in the graph.
- Parameters
web (file, string, stream or URL) – Spider .web data format to import
graph (:graphit:Graph) – graph object to import .web data in
orm_data_tag (:py:str) – data key to use for .web data identifier
auto_parse_format (:py:bool) – automatically detect basic format types using JSON decoding
- Returns
Graph object
- Return type
:graphit:Graph
-
graphit.graph_io.io_web_format.
write_web
(graph, orm_data_tag='haddock_type', indent=2, root_nid=None)¶ Export a graph in Spyder .web format
Empty data blocks or Python None values are not exported.
Web graph export uses the Graph iternodes and iteredges methods to retrieve nodes and edges and ‘get’ the data labels. The behaviour of this process is determined by the single node/edge mixin classes and the ORM mapper.
- Parameters
graph (:graphit:Graph) – Graph object to export
orm_data_tag (:py:str) – data key to use for .web data identifier
indent (:py:int) – .web file white space indentation level
root_nid – Root node ID in graph hierarchy
- Returns
Spyder .web graph representation
- Return type
:py:str
graphit.graph_io.io_xml_format module¶
Functions for exporting and importing XML documents in a graph structure.
-
graphit.graph_io.io_xml_format.
read_xml
(xml_file, graph=None)¶ Parse hierarchical XML data structure to a graph
Uses the Python built-in etree cElementTree parser to parse the XML document and convert the elements into nodes. The XML element tag becomes the node key, the XML text becomes the node value, and XML attributes are added to the node as additional attributes.
- Parameters
xml_file (File, string, stream or URL) – XML data to parse
graph (:graphit:Graph) – Graph object to import XML data in
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
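The element-to-node mapping described for read_xml can be sketched with the standard library alone. This is an illustration under stated assumptions, not the graphit code: `xml_to_nodes` is a hypothetical helper, nodes are plain dictionaries keyed by an incrementing integer, and the 'key' and 'value' attribute names are placeholders for the graph's key_tag and value_tag. It uses xml.etree.ElementTree, since the separate cElementTree module was removed in Python 3.9.

```python
import xml.etree.ElementTree as ET


def xml_to_nodes(xml_text, key_tag='key', value_tag='value'):
    """Convert an XML document into node dictionaries and parent-child edges."""
    nodes, edges = {}, []
    counter = 0

    def visit(element, parent_id):
        nonlocal counter
        counter += 1
        node_id = counter
        text = (element.text or '').strip()
        # Tag becomes the node key, text the node value (None when empty).
        node = {key_tag: element.tag, value_tag: text or None}
        # XML attributes are added as additional node attributes; in this
        # sketch an attribute named like key_tag/value_tag would overwrite them.
        node.update(element.attrib)
        nodes[node_id] = node
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in element:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    return nodes, edges


xml_text = '<project name="demo"><item id="1">first</item><item id="2"/></project>'
nodes, edges = xml_to_nodes(xml_text)
print(nodes)
print(edges)
```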
-
graphit.graph_io.io_xml_format.
write_xml
(graph, node_tools=<class 'graphit.graph_io.io_xml_format.XMLNodeTools'>)¶ Export a graph to an XML data format
Custom XML serializers may be introduced as a custom NodeTools class using the node_tools argument. In addition, the graph ORM may be used to inject tailored serialize methods in specific nodes or edges.
- Parameters
graph (:graphit:Graph) – Graph to export
node_tools (:graphit:NodeTools) – NodeTools class with node serialize method
- Returns
Graph exported as a hierarchical XML node structure
- Return type
:py:str
graphit.graph_io.io_yaml_format module¶
Functions for importing and exporting graphs in YAML format. Relies on PyYAML.
-
graphit.graph_io.io_yaml_format.
read_yaml
(yaml_file, graph=None, link_subgraphs=True, **kwargs)¶ Parse (hierarchical) YAML data structure to a graph
YAML files are parsed to a Python data structure that is converted to a graph using the read_pydata method. Additional keyword arguments (kwargs) are passed to read_pydata.
A YAML file may contain multiple separate data structures. The subgraphs that result from importing these separate data structures are linked to the root of the first imported graph by default. Setting the link_subgraphs argument to False leaves the subgraphs unlinked.
- Parameters
yaml_file (File, string, stream or URL) – yaml data to parse
graph (:graphit:Graph) – Graph object to import YAML data in
link_subgraphs (:py:bool) – link subgraphs from separate data structures in the YAML file to the root graph
- Returns
GraphAxis object
- Return type
:graphit:GraphAxis
-
graphit.graph_io.io_yaml_format.
write_yaml
(graph, default=None, include_root=False, allow_none=True)¶ Export a graph to a (nested) YAML structure
Convert the graph representation of the dictionary tree into YAML using a nested or flattened representation of the dictionary hierarchy.
Dictionary keys and values are obtained from the node attributes using key_tag and value_tag. The key_tag is set to graph key_tag by default.
- Parameters
graph (:graphit:GraphAxis) – Graph object to export
default (mixed) – value to use when node value was not found using value_tag.
include_root (:py:bool) – Include the root node in the hierarchy
allow_none (:py:bool) – allow None values in the output
- Return type
:py:yaml
Module contents¶
Importing and exporting data structures as graphit graphs.
These include both data structures in a format dedicated to representing graphs and other (hierarchical) data that has a structure that could be represented as a graph.