🚀 Quick Start
The PyNeuraLogic library is designed for learning on structured data. This quick start guide showcases one of its uses on graph structures. Nevertheless, the library is directly applicable to more complex structures, such as relational databases.
Tip
Check out one of the runnable 🔬 Examples in Google Colab!
Graph Representation
Graphs are structures describing entities (vertices) and relations (edges) between them. In this guide, we will look into how to encode graphs as inputs in different formats and how to learn on graphs.
Tensor Representation
In PyNeuraLogic, you can encode input graphs in various formats depending on your preferences. One such format is the tensor format that you might already know from other GNN-focused frameworks and libraries. The input graph is represented in a graph connectivity format, i.e., a tensor of shape [2, num_of_edges]. The features are encoded via a tensor of shape [num_of_nodes, num_of_features].
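To make the shapes concrete before turning to the library itself, here is a plain-Python sketch (independent of any library) of the two tensors for the small triangle graph used in the example below:

```python
# Connectivity of an undirected triangle graph over nodes 0, 1, 2.
# Each undirected edge is listed twice, once per direction.
edge_index = [
    [0, 1, 1, 2, 2, 0],  # source nodes
    [1, 0, 2, 1, 0, 2],  # target nodes
]

# One scalar feature per node.
x = [[0], [1], [-1]]

num_of_edges = len(edge_index[0])
num_of_nodes = len(x)

# edge_index has shape [2, num_of_edges]; x has shape [num_of_nodes, num_of_features].
print(len(edge_index), num_of_edges)  # -> 2 6
print(num_of_nodes, len(x[0]))        # -> 3 1
```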
Let’s consider the simple undirected graph shown above. We can simply encode the structure of the graph (edges) via the edge_index property and the nodes’ features via the x property of the Data class, which encapsulates a graph’s data. We can also assign a label to each node via the y property. The TensorDataset instance then holds a list of such graph tensor representations (Data instances) and can be fed into models.
from neuralogic.dataset import Data, TensorDataset
data = Data(
edge_index=[
[0, 1, 1, 2, 2, 0],
[1, 0, 2, 1, 0, 2],
],
x=[[0], [1], [-1]],
y=[[1], [0], [1]],
y_mask=[0, 1, 2],
)
dataset = TensorDataset(data=[data])
Logic Representation
The tensor representation works well for elementary use cases, but it can be quite limiting for more complex inputs. Not everything can be easily aligned and fitted into a few tensors, and working with tensors can quickly get cumbersome. That’s where the logic representation comes in with its high expressiveness.
The logic format is based on relational logic constructs to encode the input data, such as graphs. Those constructs are mainly so-called facts, which are represented in PyNeuraLogic as Relation.predicate_name(...terms)[value]. The Dataset class contains a set of fact lists representing input graphs. The encoding of the previously shown simple graph can look like the following:
from neuralogic.core import Relation
from neuralogic.dataset import Dataset
dataset = Dataset()
dataset.add_example([
Relation.edge(0, 1), Relation.edge(1, 2), Relation.edge(2, 0),
Relation.edge(1, 0), Relation.edge(2, 1), Relation.edge(0, 2),
Relation.node_feature(0)[0],
Relation.node_feature(1)[1],
Relation.node_feature(2)[-1],
])
As you can see, this encoding can be pretty lengthy, but at the same time, it gives us multiple benefits over the tensor representation. For example, nothing stops you from adding edge features, such as Relation.edge(0, 1)[1.0], or even introducing hypergraphs, such as Relation.edge(0, 1, 2) (read more about Hypergraph Neural Networks).
Note
We used the edge predicate name (Relation.edge) to represent the graph edges and the node_feature predicate name (Relation.node_feature) to represent the nodes’ features. This naming is arbitrary - edges and any other input data can have any predicate name. In this documentation, we will stick to the edge predicate name for representing edges and the feature predicate name for representing features.
To assign labels, we use queries. Labels can be assigned to basically anything - nodes, graphs, sub-graphs, etc. In this example, we will label nodes, just like in the case of the tensor representation.
dataset.add_queries([
Relation.predict(0)[1],
Relation.predict(1)[0],
Relation.predict(2)[1],
])
Note
The name Relation.predict refers to the output layer of our model, which we will define in the next section.
Model Definition
Models in PyNeuraLogic are not just particular computational graphs, as is common in classic deep learning, but can be viewed more generally as templates for (differentiable) computation.
The template structure is encoded in an instance of the Template class via relational rules or, for convenience, via pre-defined modules (which are also expanded into said rules; check out the 🦓 Module Zoo for a list of modules).
from neuralogic.core import Template
from neuralogic.nn.module import GCNConv
template = Template()
template.add_module(
GCNConv(in_channels=1, out_channels=5, output_name="h0", feature_name="node_feature", edge_name="edge")
)
template.add_module(
GCNConv(in_channels=5, out_channels=1, output_name="predict", feature_name="h0", edge_name="edge")
)
Here we defined two GCNConv layers via the pre-defined modules. We further discuss template definition via the rule format, which forms the core advantage of this framework, in a later section of the documentation.
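To build an intuition for what such a layer computes, a GCN layer aggregates, for each node, a degree-normalized sum of the transformed features of its neighbors (including itself, via self-loops). Below is a minimal, library-independent Python sketch of that propagation on the triangle graph from the examples; the input features and the single weight w are made-up illustrative values, not anything PyNeuraLogic would produce:

```python
import math

# Triangle graph from the examples above, with self-loops added
# (GCN adds self-loops so that a node's own features contribute).
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2),
         (0, 0), (1, 1), (2, 2)]
x = [1.0, 2.0, 3.0]  # made-up input features, one per node
w = 0.5              # a single made-up weight (in_channels=1 -> out_channels=1)

# Node degrees (counting self-loops); the graph is symmetric,
# so in- and out-degrees coincide.
degree = [sum(1 for src, _ in edges if src == v) for v in range(len(x))]

h = [0.0] * len(x)
for src, dst in edges:
    # Symmetric normalization 1 / sqrt(deg(src) * deg(dst)),
    # as in the standard GCN formulation.
    norm = 1.0 / math.sqrt(degree[src] * degree[dst])
    h[dst] += norm * w * x[src]

print([round(v, 6) for v in h])  # every entry ends up ~1.0 on this symmetric graph
```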
Evaluating Model
Now that we have our template defined, we have to build a model from the template to be able to run training and inference on it. We do that by calling the build method.
from neuralogic.core import Settings
from neuralogic.optim import SGD
settings = Settings(optimizer=SGD(lr=0.01), epochs=100)
model = template.build(settings)
The input dataset that we want to evaluate or train on also has to be built. With the built dataset and model in hand, performing the forward and backward propagation is straightforward.
built_dataset = model.build_dataset(dataset)
model.train() # or model.test() to change the mode
output = model(built_dataset)
Evaluators
For faster prototyping, we have prepared evaluators, which encapsulate helpers such as the training loop and evaluation. Evaluators can be customized via various settings wrapped in the Settings class.
from neuralogic.nn import get_evaluator
from neuralogic.core import Settings
from neuralogic.optim import SGD
settings = Settings(optimizer=SGD(lr=0.01), epochs=100)
evaluator = get_evaluator(template, settings)
built_dataset = evaluator.build_dataset(dataset)
evaluator.train(built_dataset, generator=False)