neuralogic.dataset

Available dataset formats


class Dataset(samples: List[Sample] | Sample | None = None)

Dataset encapsulating (learning) samples in the logic format, allowing users to take full advantage of the PyNeuraLogic library.

class FileDataset(examples_file: str | None = None, queries_file: str | None = None)

FileDataset represents samples stored in files in the NeuraLogic (logic) format.

Parameters:
  • examples_file (Optional[str]) – Path to the examples file. Default: None

  • queries_file (Optional[str]) – Path to the queries file. Default: None

class Data(x: Sequence, edge_index: Sequence, y: Sequence | float | int, edge_attr: Sequence | None = None, y_mask: Sequence | None = None)

The Data instance stores information about one specific graph instance.

Example

For example, the directed graph \(G = (V, E)\) with edges \(E = \{(0, 1), (1, 2), (2, 0)\}\), node features \(X = \{[0], [1], [0]\}\), and node labels \(Y = \{0, 1, 0\}\) would be represented as:

data = Data(
    x=[[0], [1], [0]],
    edge_index=[
        [0, 1, 2],
        [1, 2, 0],
    ],
    y=[0, 1, 0],
)

Parameters:
  • x (Sequence) – Sequence of node features.

  • edge_index (Sequence) – Edges in the graph connectivity format: a matrix [[...src], [...dst]] whose first row lists the source nodes and whose second row lists the matching target nodes.

  • y (Union[Sequence, float, int]) – Sequence of labels of all nodes or one graph label.

  • edge_attr (Optional[Sequence]) – Optional sequence of edge features. Default: None

  • y_mask (Optional[Sequence]) – Optional sequence of node ids to generate queries for. Default: None (all nodes)
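The edge_index layout can be easier to read when unpacked into explicit (source, target) pairs. The following is a minimal, library-independent sketch of that reading; the helper name to_edge_pairs is hypothetical and is not part of PyNeuraLogic.

```python
# Connectivity matrix from the example above:
# the first row holds source nodes, the second row the matching target nodes.
edge_index = [
    [0, 1, 2],
    [1, 2, 0],
]


def to_edge_pairs(edge_index):
    """Turn [[...src], [...dst]] into a list of (src, dst) tuples.

    Hypothetical helper for illustration only; not a PyNeuraLogic function.
    """
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError("source and target rows must have the same length")
    return list(zip(sources, targets))


edges = to_edge_pairs(edge_index)
print(edges)
```

Each column of the matrix corresponds to one directed edge, so the i-th edge goes from edge_index[0][i] to edge_index[1][i].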

static from_pyg(data) → List[Data]

Converts a PyTorch Geometric Data instance into a list of PyNeuraLogic Data instances. The conversion supports the train_mask, test_mask, and val_mask attributes: for each mask, the conversion yields a new Data instance.

Parameters:
  • data – The PyTorch Geometric Data instance.

Returns:
  The list of PyNeuraLogic Data instances.
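To illustrate the relation between boolean masks and the y_mask parameter (a sequence of node ids), the sketch below turns each mask into a list of selected node ids. This is only an assumption about the general idea, not a description of how from_pyg is implemented; the mask values and the helper mask_to_node_ids are hypothetical.

```python
def mask_to_node_ids(mask):
    """Return the indices of all truthy entries of a boolean mask.

    Hypothetical helper for illustration only; not a PyNeuraLogic function.
    """
    return [node_id for node_id, selected in enumerate(mask) if selected]


# Made-up masks over a three-node graph, one per supported attribute.
masks = {
    "train_mask": [True, False, False],
    "val_mask": [False, True, False],
    "test_mask": [False, False, True],
}

# One list of node ids per mask, mirroring "one new data instance per mask".
y_masks = {name: mask_to_node_ids(mask) for name, mask in masks.items()}
print(y_masks)
```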

class TensorDataset(data: List[Data], one_hot_encode_labels: bool = False, one_hot_decode_features: bool = False, one_hot_decode_edge_features: bool = False, number_of_classes: int = 1, feature_name: str = 'node_feature', edge_name: str = 'edge', output_name: str = 'predict')

The TensorDataset holds a list of Data instances - a list of graphs represented in a tensor format.

Parameters:
  • data (List[Data]) – List of data (graph) instances.

  • one_hot_encode_labels (bool) – Turn numerical labels into one-hot encoded vectors - e.g., label 2 would be turned into a vector [0, 0, 1, ..., 0] of length number_of_classes. Default: False

  • one_hot_decode_features (bool) – Turn one-hot encoded feature vectors into a scalar - e.g., feature vector [0, 0, 1] would be turned into a predicate <feature_name>_2. Default: False

  • one_hot_decode_edge_features (bool) – Turn one-hot encoded edge feature vectors into a scalar - e.g., edge feature vector [0, 0, 1] would be turned into a predicate <edge_name>_2. Default: False

  • number_of_classes (int) – Specifies the number of classes for converting numerical labels to one-hot encoded vectors. Default: 1

  • feature_name (str) – Specify the node feature predicate name used for converting into the logic format. Default: "node_feature"

  • edge_name (str) – Specify the edge predicate name used for converting into the logic format. Default: "edge"

  • output_name (str) – Specify the output predicate name used for converting into the logic format. Default: "predict"
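The one-hot options above can be sketched in plain Python. This is a minimal illustration of the two transformations as the parameter descriptions state them, not the library's implementation; the function names one_hot_encode and one_hot_decode are hypothetical.

```python
def one_hot_encode(label, number_of_classes):
    """Turn a numerical label into a one-hot vector of length number_of_classes."""
    if not 0 <= label < number_of_classes:
        raise ValueError("label must lie in [0, number_of_classes)")
    vector = [0] * number_of_classes
    vector[label] = 1
    return vector


def one_hot_decode(vector, name):
    """Turn a one-hot vector into a predicate name of the form <name>_<index>."""
    if vector.count(1) != 1 or any(value not in (0, 1) for value in vector):
        raise ValueError("expected a vector with exactly one 1 and 0 elsewhere")
    return f"{name}_{vector.index(1)}"


# Label 2 with four classes, as in the one_hot_encode_labels description.
encoded_label = one_hot_encode(2, 4)

# Feature vector [0, 0, 1] with the default names from the signature.
node_predicate = one_hot_decode([0, 0, 1], "node_feature")
edge_predicate = one_hot_decode([0, 0, 1], "edge")

print(encoded_label, node_predicate, edge_predicate)
```

Note that with the default number_of_classes of 1, only label 0 fits into the encoded vector in this sketch, so the value would typically be set to the actual class count when one_hot_encode_labels is enabled.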

class CSVFile(relation_name: str, csv_source: TextIO | Path, sep=',', value_column: str | int | None = None, default_value: float | int | None = None, value_mapper: Callable | None = None, term_columns: Sequence[str | int] | None = None, header: bool = False, skip_rows: int = 0, n_rows: int | None = None, replace_empty_column: str | float | int = 0)

class CSVDataset(csv_files: List[CSVFile] | CSVFile, csv_queries: CSVFile | None = None, mode: Mode = Mode.ONE_EXAMPLE)

class Mode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

class DBSource(relation_name: str, table_name: str, term_columns: List[str], value_column: str | None = None, default_value: float | int = 1.0, value_mapper: Callable | None = None, skip_rows: int = 0, n_rows: int | None = None, replace_empty_column: str | float | int = 0, sep=',')

class DBDataset(connection, db_sources: List[DBSource] | DBSource, queries_db_source: DBSource | None = None, mode: Mode = Mode.ONE_EXAMPLE)