classes package

Submodules

classes.abstract_importer module

class classes.abstract_importer.AbstractImporter(file_path: str = None, concatenated_samples: Union[pandas.core.frame.DataFrame, numpy.ndarray] = None, variables: pandas.core.frame.DataFrame = None, prior_net_structure: pandas.core.frame.DataFrame = None)

Bases: abc.ABC

Abstract class that exposes all the necessary methods to process the trajectories and the net structure.

Parameters
  • file_path (str) – the file path, or dataset name if you import already processed data

  • concatenated_samples (typing.Union[pandas.DataFrame, numpy.ndarray]) – Dataframe or numpy array containing the concatenation of all the processed trajectories

  • variables (pandas.DataFrame) – Dataframe containing the node labels and cardinalities

  • prior_net_structure (pandas.DataFrame) – Dataframe containing the structure of the network (edges)

_sorter

A list containing the variable labels in the SAME order as the columns in concatenated_samples

Warning

The parameters variables and prior_net_structure HAVE to be properly constructed as Pandas Dataframes with the following structure:

Header of _df_structure = [From_Node | To_Node]

Header of _df_variables = [Variable_Label | Variable_Cardinality]

See the tutorial on how to construct a correct concatenated_samples Dataframe/ndarray.
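A minimal sketch of two frames laid out as the Warning requires. The labels X, Y, Z and their cardinalities are made-up example values, and the frames are built directly with pandas rather than through any package helper.

```python
import pandas as pd

# Hypothetical three-node network: labels and cardinalities are example values.
variables = pd.DataFrame(
    {"Variable_Label": ["X", "Y", "Z"], "Variable_Cardinality": [3, 2, 2]}
)

# One row per directed edge of the prior structure (X -> Y, Y -> Z).
prior_net_structure = pd.DataFrame(
    {"From_Node": ["X", "Y"], "To_Node": ["Y", "Z"]}
)

# The headers match the layout stated in the Warning.
assert list(variables.columns) == ["Variable_Label", "Variable_Cardinality"]
assert list(prior_net_structure.columns) == ["From_Node", "To_Node"]
```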

Note

See JsonImporter for an example implementation.

build_list_of_samples_array(concatenated_sample: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → List

Builds a List containing the delta times numpy array and the complete transitions matrix.

Parameters

concatenated_sample (typing.Union[pandas.DataFrame, numpy.ndarray]) – the dataframe/array from which the time and transitions matrix have to be extracted and converted

Returns

the resulting list of numpy arrays

Return type

List
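The split described above can be sketched as follows, assuming (as the Warnings on this class suggest) that the time column comes first and every other column holds integer states. split_concatenated_samples is a hypothetical standalone function, not the package's code.

```python
from typing import List, Union

import numpy as np
import pandas as pd


def split_concatenated_samples(
    concatenated_sample: Union[pd.DataFrame, np.ndarray]
) -> List[np.ndarray]:
    """Split concatenated samples into [time deltas, transitions matrix].

    Assumes column 0 holds the time deltas and all other columns hold
    integer state values.
    """
    # Work on a plain ndarray whatever the input type.
    if isinstance(concatenated_sample, pd.DataFrame):
        arr = concatenated_sample.to_numpy()
    else:
        arr = np.asarray(concatenated_sample)
    times = arr[:, 0].astype(float)       # delta times vector
    transitions = arr[:, 1:].astype(int)  # states matrix
    return [times, transitions]
```

For a frame with columns Time, X and a hypothetical shifted column X_next, the first element is the Time column as floats and the second is the remaining two columns as an integer matrix.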

abstract build_sorter(sample_frame: pandas.core.frame.DataFrame) → List

Initializes the _sorter class member from a trajectory dataframe, extracting the header of the frame and keeping ONLY the symbolic labels of the variables, cutting out the time label.

Parameters

sample_frame (pandas.DataFrame) – The dataframe from which to extract the header

Returns

A list containing the processed header.

Return type

List
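For a trajectory frame whose time column is called Time (an assumed name; the JsonImporter receives the real one through its time_key argument), an implementation could be as small as this hypothetical sketch.

```python
from typing import List

import pandas as pd


def sorter_from_header(sample_frame: pd.DataFrame, time_key: str = "Time") -> List[str]:
    """Return the header of a trajectory frame without its time column."""
    # Column order is preserved, so the result lines up with the data columns.
    return [label for label in sample_frame.columns if label != time_key]
```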

clear_concatenated_frame() → None

Removes all values in the dataframe concatenated_samples.

compute_row_delta_in_all_samples_frames(df_samples_list: List) → None

Calls the method compute_row_delta_sigle_samples_frame on every dataframe present in the list df_samples_list and concatenates the results in the dataframe concatenated_samples.

Parameters

df_samples_list (List) – the list of dataframes to be processed and concatenated

Warning

The Dataframe sample_frame has to follow the column structure of this header:

Header of sample_frame = [Time | Variable values]

The class member self._sorter HAS to be properly INITIALIZED (see the class members definition doc).

Note

After this method is called, the class member concatenated_samples will contain all the processed and merged trajectories.

compute_row_delta_sigle_samples_frame(sample_frame: pandas.core.frame.DataFrame, columns_header: List, shifted_cols_header: List) → pandas.core.frame.DataFrame

Computes the difference between consecutive values in the time column. Copies all the values present in the remaining columns and shifts them up by one position.

Parameters
  • sample_frame (pandas.DataFrame) – the trajectory to be processed

  • columns_header (List) – the original header of sample_frame

  • shifted_cols_header (List) – a copy of columns_header with the column names changed

Returns

The processed dataframe

Return type

pandas.DataFrame

Warning

The Dataframe sample_frame has to follow the column structure of this header:

Header of sample_frame = [Time | Variable values]
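The transformation can be sketched with pandas as below. compute_row_delta is a hypothetical standalone function, assuming the time column is the first column and that the residence time of a row is the gap to the next timestamp; the last row, which has no successor, is dropped.

```python
from typing import List

import pandas as pd


def compute_row_delta(sample_frame: pd.DataFrame, columns_header: List[str],
                      shifted_cols_header: List[str]) -> pd.DataFrame:
    """Turn timestamps into time deltas and append next-state columns."""
    frame = sample_frame.copy()
    time_col = frame.columns[0]
    # diff() gives t[i] - t[i-1]; shift(-1) moves it up so row i holds t[i+1] - t[i].
    frame[time_col] = frame[time_col].diff().shift(-1)
    # Next-state columns: row i receives the values of row i+1, under new names.
    shifted = frame[columns_header].shift(-1)
    shifted.columns = shifted_cols_header
    frame = pd.concat([frame, shifted], axis=1)
    # Drop the last row (no successor) and restore the integer dtype lost to NaN.
    frame = frame.iloc[:-1].astype({col: int for col in shifted_cols_header})
    return frame
```

For Time = [0.0, 1.5, 4.0] and X = [0, 1, 0] with shifted header ["X_next"], the result has Time = [1.5, 2.5], X = [0, 1] and X_next = [1, 0].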

property concatenated_samples
abstract dataset_id() → object

If the original dataset contains multiple datasets, this method returns a unique id to identify the current dataset

property file_path
property sorter
property structure
property variables

classes.cache module

class classes.cache.Cache

Bases: object

This class acts as a cache of SetOfCims objects for a node.

_list_of_sets_of_parents

a list of Set objects, each holding the parents to which the SetOfCims cached at the SAME index is related

_actual_cache

a list of SetOfCims objects

clear() → None

Clears the contents of both _actual_cache and _list_of_sets_of_parents.

find(parents_comb: Set) → classes.set_of_cims.SetOfCims

Given the symbolic parents combination parents_comb, tries to find in the cache the SetOfCims related to that parents_comb.

Parameters

parents_comb (Set) – the parents related to that SetOfCims

Returns

A SetOfCims object if the parents_comb index is found in _list_of_sets_of_parents. None otherwise.

Return type

SetOfCims

put(parents_comb: Set, socim: classes.set_of_cims.SetOfCims) → None

Places the SetOfCims object socim in the cache, and the related symbolic index parents_comb in _list_of_sets_of_parents.

Parameters
  • parents_comb (Set) – the symbolic set index

  • socim (SetOfCims) – the related SetOfCims object
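A minimal sketch of the two-parallel-lists design described above. CacheSketch is a hypothetical name, and the cached values are typed as Any so the example runs without the package's SetOfCims class.

```python
from typing import Any, List, Optional, Set


class CacheSketch:
    """Cache keyed by parent sets, stored as two index-aligned lists."""

    def __init__(self) -> None:
        self._list_of_sets_of_parents: List[Set[str]] = []
        self._actual_cache: List[Any] = []

    def find(self, parents_comb: Set[str]) -> Optional[Any]:
        # list.index compares with ==, and set equality ignores element order.
        try:
            idx = self._list_of_sets_of_parents.index(parents_comb)
        except ValueError:
            return None
        return self._actual_cache[idx]

    def put(self, parents_comb: Set[str], socim: Any) -> None:
        # Append to both lists so entry i of one matches entry i of the other.
        self._list_of_sets_of_parents.append(parents_comb)
        self._actual_cache.append(socim)

    def clear(self) -> None:
        self._list_of_sets_of_parents.clear()
        self._actual_cache.clear()
```

Because Python sets are unhashable, the lookup is a linear scan with set equality; keying a dict by frozenset(parents_comb) would be a constant-time alternative.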

classes.conditional_intensity_matrix module

class classes.conditional_intensity_matrix.ConditionalIntensityMatrix(state_residence_times: numpy.array, state_transition_matrix: numpy.array)

Bases: object

Abstracts the Conditional Intensity Matrix of a node as the aggregation of the state residence times vector, the state transition matrix and the actual CIM.

Parameters
  • state_residence_times (numpy.array) – state residence times vector

  • state_transition_matrix (numpy.ndarray) – the transitions count matrix

_cim

the actual cim of the node

property cim
compute_cim_coefficients() → None

Computes the coefficients of the matrix _cim using the equality q_xx’ = M[x, x’] / T[x], where M is the state transition matrix and T is the state residence times vector. The class member _cim will contain the computed CIM.

property state_residence_times
property state_transition_matrix
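As a worked illustration of q_xx’ = M[x, x’] / T[x] from compute_cim_coefficients, the sketch below computes a CIM with numpy. compute_cim_sketch is a hypothetical helper; setting each diagonal entry to minus the sum of its row's off-diagonal entries (so every row sums to zero, as in any intensity matrix) is an assumption about how the diagonal is handled, not something stated above.

```python
import numpy as np


def compute_cim_sketch(state_residence_times: np.ndarray,
                       state_transition_matrix: np.ndarray) -> np.ndarray:
    """Estimate a CIM from residence times T and transition counts M.

    Off-diagonal: q[x, x'] = M[x, x'] / T[x].
    Diagonal (assumed): minus the row sum, so every row sums to zero.
    """
    M = state_transition_matrix.astype(float)       # astype returns a copy
    np.fill_diagonal(M, 0.0)                         # use only x != x' counts
    cim = M / state_residence_times[:, np.newaxis]  # row x divided by T[x]
    np.fill_diagonal(cim, -cim.sum(axis=1))
    return cim
```

For T = [2, 4] and M = [[0, 4], [2, 0]] this returns [[-2, 2], [0.5, -0.5]]. A state with zero residence time would cause a division by zero, so real data may need a guard.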

classes.json_importer module

class classes.json_importer.JsonImporter(file_path: str, samples_label: str, structure_label: str, variables_label: str, time_key: str, variables_key: str)

Bases: classes.abstract_importer.AbstractImporter

Implements the abstract methods of AbstractImporter and adds all the necessary methods to process and prepare data in JSON format.

Parameters
  • file_path (string) – the path of the file that contains the data to be imported

  • samples_label (string) – the reference key for the samples in the trajectories

  • structure_label (string) – the reference key for the structure of the network data

  • variables_label (string) – the reference key for the cardinalities of the nodes data

  • time_key (string) – the key used to identify the timestamps in each trajectory

  • variables_key (string) – the key used to identify the names of the variables in the net

_array_indx

the index of the outer JsonArray to extract the data from

_df_samples_list

a Dataframe list in which every dataframe contains a trajectory

_raw_data

The raw contents of the json file to import

build_sorter(sample_frame: pandas.core.frame.DataFrame) → List

Implements the abstract method build_sorter of the AbstractImporter for this dataset.

clear_data_frame_list() → None

Removes all values present in the dataframes in the list _df_samples_list.

dataset_id() → object

If the original dataset contains multiple datasets, this method returns a unique id to identify the current dataset

import_data(indx: int) → None

Implements the abstract method of AbstractImporter.

Parameters

indx (int) – the index of the outer JsonArray to extract the data from

import_sampled_cims(raw_data: List, indx: int, cims_key: str) → Dict

Imports the synthetic CIMs of the dataset into a dictionary, using the variable labels as keys for the set of CIMs of each node.

Parameters
  • raw_data (List) – List of Dicts

  • indx (int) – The index of the array from which the data have to be extracted

  • cims_key (string) – the key under which the CIM json objects are placed

Returns

a dictionary containing the sampled CIMS for all the variables in the net

Return type

Dictionary

import_structure(raw_data: List) → pandas.core.frame.DataFrame

Imports into a dataframe the data found in the list raw_data at the key _structure_label.

Parameters

raw_data (List) – List of Dicts

Returns

Dataframe containing the starting node and the ending node of every arc of the network

Return type

pandas.DataFrame

import_trajectories(raw_data: List) → List

Imports the trajectories from the list of dicts raw_data.

Parameters

raw_data (List) – List of Dicts

Returns

List of dataframes containing all the trajectories

Return type

List

import_variables(raw_data: List) → pandas.core.frame.DataFrame

Imports the data in raw_data at the key _variables_label.

Parameters

raw_data (List) – List of Dicts

Returns

Dataframe containing the symbolic labels of the variables and their cardinalities

Return type

pandas.DataFrame

normalize_trajectories(raw_data: List, indx: int, trajectories_key: str) → List

Extracts the trajectories in raw_data at the index indx and at the key trajectories_key.

Parameters
  • raw_data (List) – List of Dicts

  • indx (int) – The index of the array from which the data have to be extracted

  • trajectories_key (string) – the key of the trajectories objects

Returns

A list of dataframes containing the trajectories

Return type

List

one_level_normalizing(raw_data: List, indx: int, key: str) → pandas.core.frame.DataFrame

Extracts the one-level nested data in the list raw_data at the index indx at the key key.

Parameters
  • raw_data (List) – List of Dicts

  • indx (int) – The index of the array from which the data have to be extracted

  • key (string) – the key for the Dicts from which to extract data

Returns

A normalized dataframe

Return type

pandas.DataFrame
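To illustrate what one-level normalization means here, the sketch below applies pandas to an in-memory stand-in for the parsed JSON. The layout (an outer array of datasets, each holding trajectories as lists of flat records) and every key name used (samples, dyn.str, variables, From/To, Name/Value) are assumptions for the example; the package's real keys come from the constructor arguments. one_level_frame and trajectory_frames are hypothetical helpers.

```python
from typing import Any, Dict, List

import pandas as pd

# Hypothetical parsed JSON (e.g. the result of json.load): an outer array whose
# elements are datasets. All key names below are illustrative assumptions.
raw_data: List[Dict[str, Any]] = [
    {
        "samples": [
            [{"Time": 0.0, "X": 0, "Y": 1}, {"Time": 0.7, "X": 1, "Y": 1}],
            [{"Time": 0.0, "X": 1, "Y": 0}, {"Time": 1.1, "X": 1, "Y": 1}],
        ],
        "dyn.str": [{"From": "X", "To": "Y"}],
        "variables": [{"Name": "X", "Value": 2}, {"Name": "Y", "Value": 2}],
    }
]


def one_level_frame(raw_data: List[Dict[str, Any]], indx: int, key: str) -> pd.DataFrame:
    # A list of flat records becomes a dataframe with one row per record.
    return pd.DataFrame(raw_data[indx][key])


def trajectory_frames(raw_data: List[Dict[str, Any]], indx: int,
                      trajectories_key: str) -> List[pd.DataFrame]:
    # Each trajectory (itself a list of flat records) becomes its own dataframe.
    return [pd.DataFrame(traj) for traj in raw_data[indx][trajectories_key]]
```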

read_json_file() → List

Reads the JSON file at the path file_path.

Returns

The contents of the json file

Return type

List

classes.network_graph module

class classes.network_graph.NetworkGraph(graph_struct: classes.structure.Structure)

Bases: object

Abstracts the information contained in the Structure class in the form of a directed graph. Has the task of creating all the necessary filtering and indexing structures for parameter estimation.

Parameters

graph_struct (Structure) – the Structure object from which infos about the net will be extracted

_graph

directed graph

_aggregated_info_about_nodes_parents

a structure that contains all the necessary information about every parent of the node for which all the indexing and filtering structures will be constructed.

_time_scalar_indexing_structure

the indexing structure for state residence time estimation

_transition_scalar_indexing_structure

the indexing structure for transition computation

_time_filtering

the columns filtering structure used in the computation of the state residence times

_transition_filtering

the columns filtering structure used in the computation of the transition from one state to another

_p_combs_structure

all the possible combinations of parent states for the node of interest

add_edges(list_of_edges: List) → None

Adds the edges contained in the list list_of_edges to _graph.

Parameters

list_of_edges (List) – the list of tuples containing the edges

add_nodes(list_of_nodes: List) → None

Adds the nodes contained in the list list_of_nodes to _graph. Sets all the properties that identify a node (index, positional index, cardinality).

Parameters

list_of_nodes (List) – the nodes to add to _graph

static build_p_comb_structure_for_a_node(parents_values: List) → numpy.ndarray

Builds the combinatorial structure that contains the combinations of all the values contained in parents_values.

Parameters

parents_values (List) – the cardinalities of the nodes

Returns

A numpy matrix containing a grid of the combinations

Return type

numpy.ndarray
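A sketch of the combinatorial grid with itertools.product, assuming rows are ordered so that the first parent varies fastest; the package may order rows differently. p_comb_grid is a hypothetical name, and with no parents it returns a single empty combination.

```python
import itertools
from typing import List

import numpy as np


def p_comb_grid(parents_values: List[int]) -> np.ndarray:
    """Every combination of parent states, one combination per row."""
    if not parents_values:
        return np.empty((1, 0), dtype=int)  # one (empty) combination
    # Reverse before and after product() so the FIRST parent varies fastest.
    ranges = [range(card) for card in reversed(parents_values)]
    rows = [combo[::-1] for combo in itertools.product(*ranges)]
    return np.array(rows, dtype=int)
```

For parents_values = [2, 3] the grid has 6 rows, starting [0, 0], [1, 0], [0, 1] and ending [1, 2].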

static build_time_columns_filtering_for_a_node(node_indx: int, p_indxs: List) → numpy.ndarray

Builds the necessary structure to filter the desired columns indicated by node_indx and p_indxs in the dataset. This structure will be used in the computation of the state residence times.

Parameters
  • node_indx (int) – the index of the node

  • p_indxs (List) – the indexes of the node’s parents

Returns

The filtering structure for times estimation

Return type

numpy.ndarray

static build_time_scalar_indexing_structure_for_a_node(node_states: int, parents_vals: List) → numpy.ndarray

Builds an indexing structure for the computation of state residence times values.

Parameters
  • node_states (int) – the node cardinality

  • parents_vals (List) – the cardinalities of the node’s parents

Returns

The time indexing structure

Return type

numpy.ndarray

static build_transition_filtering_for_a_node(node_indx: int, p_indxs: List, nodes_number: int) → numpy.ndarray

Builds the necessary structure to filter the desired columns indicated by node_indx and p_indxs in the dataset. This structure will be used in the computation of the state transition values.

Parameters
  • node_indx (int) – the index of the node

  • p_indxs (List) – the indexes of the node’s parents

  • nodes_number (int) – the total number of nodes in the dataset

Returns

The filtering structure for transitions estimation

Return type

numpy.ndarray

static build_transition_scalar_indexing_structure_for_a_node(node_states_number: int, parents_vals: List) → numpy.ndarray

Builds an indexing structure for the computation of state transitions values.

Parameters
  • node_states_number (int) – the node cardinality

  • parents_vals (List) – the cardinalities of the node’s parents

Returns

The transition indexing structure

Return type

numpy.ndarray
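The four static builders above can be sketched as plain numpy helpers. All names are hypothetical, and two points are assumptions rather than documented facts: that a node's shifted (next-state) column sits at node_indx + nodes_number in the complete trajectory, and that a scalar indexing structure holds mixed-radix weights, so the dot product of a filtered row with the weights gives one flat cell index.

```python
from typing import List

import numpy as np


def time_columns_filter(node_indx: int, p_indxs: List[int]) -> np.ndarray:
    # Columns read for residence times: the node itself, then its parents.
    return np.array([node_indx, *p_indxs], dtype=int)


def transition_columns_filter(node_indx: int, p_indxs: List[int],
                              nodes_number: int) -> np.ndarray:
    # Assumed layout: the node's next-state (shifted) column sits at
    # node_indx + nodes_number; order is [next state, state, parents...].
    return np.array([nodes_number + node_indx, node_indx, *p_indxs], dtype=int)


def _mixed_radix_weights(radices: List[int]) -> np.ndarray:
    # Weights 1, r0, r0*r1, ...: digit i is scaled by the product of the
    # radices before it.
    return np.concatenate(([1], np.cumprod(radices)[:-1])).astype(int)


def time_scalar_indexing(node_states: int, parents_vals: List[int]) -> np.ndarray:
    # Weights for a filtered row [x, p1, p2, ...].
    return _mixed_radix_weights([node_states, *parents_vals])


def transition_scalar_indexing(node_states: int, parents_vals: List[int]) -> np.ndarray:
    # Weights for a filtered row [x_next, x, p1, ...].
    return _mixed_radix_weights([node_states, node_states, *parents_vals])
```

With these weights, a time row [x, p1, p2] for a node of cardinality k and parents of cardinalities c1, c2 maps to x + k*p1 + k*c1*p2, which enumerates every (state, parent combination) cell exactly once.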

clear_indexing_filtering_structures() → None

Initialize all the filtering/indexing structures.

property edges
fast_init(node_id: str) → None

Initializes all the necessary structures for parameters estimation of the node identified by the label node_id

Parameters

node_id (string) – the label of the node

get_node_indx(node_id) → int
get_ordered_by_indx_set_of_parents(node: str) → Tuple

Builds the aggregated structure that holds all the infos relative to the parent set of the node, namely (parents_labels, parents_indexes, parents_cardinalities).

Parameters

node (string) – the label of the node

Returns

a tuple containing all the parent set infos

Return type

Tuple

get_parents_by_id(node_id) → List

Returns a list of labels of the parents of the node node_id

Parameters

node_id (string) – the node label

Returns

a List of labels of the parents

Return type

List

get_positional_node_indx(node_id) → int
get_states_number(node_id) → int
property nodes
property nodes_indexes
property nodes_values
property p_combs
remove_node(node_id: str) → None

Remove the node node_id from all the class members. Initialize all the filtering/indexing structures.

property time_filtering
property time_scalar_indexing_strucure
property transition_filtering
property transition_scalar_indexing_structure

classes.parameters_estimator module

class classes.parameters_estimator.ParametersEstimator(trajectories: classes.trajectory.Trajectory, net_graph: classes.network_graph.NetworkGraph)

Bases: object

Has the task of computing the CIMs of a particular node given the trajectories and the net structure in the graph _net_graph.

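Parameters
  • trajectories (Trajectory) – the trajectories from which the CIMs are computed

  • net_graph (NetworkGraph) – the graph containing the net structure

_single_set_of_cims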

the set of cims object that will hold the cims of the node

compute_parameters_for_node(node_id: str) → classes.set_of_cims.SetOfCims

Compute the CIMS of the node identified by the label node_id.

Parameters

node_id (string) – the node label

Returns

A SetOfCims object filled with the computed CIMS

Return type

SetOfCims

static compute_state_res_time_for_node(times: numpy.ndarray, trajectory: numpy.ndarray, cols_filter: numpy.ndarray, scalar_indexes_struct: numpy.ndarray, T: numpy.ndarray) → None

Computes the state residence times for a node and fills the matrix T with the results.

Parameters
  • times (numpy.array) – the time deltas vector

  • trajectory (numpy.ndarray) – the trajectory

  • cols_filter (numpy.array) – the columns filtering structure

  • scalar_indexes_struct (numpy.array) – the indexing structure

  • T (numpy.ndarray) – the state residence times vectors
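One way to compute the residence times is a weighted np.bincount over flat cell indexes. This sketch assumes cols_filter selects the columns [x, p1, p2, ...] and scalar_indexes_struct holds the mixed-radix weights 1, k, k*c1, ... (k being the node cardinality), so T ends up indexed as T[parent combination, state]; state_res_times_sketch is a hypothetical name.

```python
import numpy as np


def state_res_times_sketch(times: np.ndarray, trajectory: np.ndarray,
                           cols_filter: np.ndarray,
                           scalar_indexes_struct: np.ndarray,
                           T: np.ndarray) -> None:
    """Fill T[comb, x] with the total time spent in state x under parent combination comb."""
    # Flat cell index of every row: x + k * comb.
    flat = trajectory[:, cols_filter] @ scalar_indexes_struct
    # Sum the time deltas falling into each cell, then fold into T's shape.
    T[:] = np.bincount(flat, weights=times, minlength=T.size).reshape(T.shape)
```

For a binary node with one binary parent, rows [[0, 0], [1, 0], [1, 1], [0, 1]] with times [1.0, 2.0, 0.5, 3.0] and weights [1, 2] give T = [[1.0, 2.0], [3.0, 0.5]].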

static compute_state_transitions_for_a_node(node_indx: int, trajectory: numpy.ndarray, cols_filter: numpy.ndarray, scalar_indexing: numpy.ndarray, M: numpy.ndarray) → None

Computes the state transitions for a node and fills the matrices M with the results.

Parameters
  • node_indx (int) – the index of the node

  • trajectory (numpy.ndarray) – the trajectory

  • cols_filter (numpy.array) – the columns filtering structure

  • scalar_indexing (numpy.array) – the indexing structure

  • M (numpy.ndarray) – the state transitions matrices
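The transition counts can be obtained in the same vectorized style. Assumptions for this hypothetical helper: cols_filter selects [x_next, x, p1, ...], scalar_indexing holds the weights 1, k, k*k, k*k*c1, ..., and only rows where the node's state actually changes are counted. The diagonal of each matrix is left at zero; how the package fills it is not stated here.

```python
import numpy as np


def state_transitions_sketch(trajectory: np.ndarray, cols_filter: np.ndarray,
                             scalar_indexing: np.ndarray, M: np.ndarray) -> None:
    """Fill M[comb, x, x_next] with the transition counts of one node."""
    sub = trajectory[:, cols_filter]
    # Only rows where this node changes state (x_next != x) are its transitions.
    changed = sub[sub[:, 0] != sub[:, 1]]
    # Flat index x_next + k*x + k*k*comb matches a C-order (comb, x, x_next) layout.
    flat = changed @ scalar_indexing
    M[:] = np.bincount(flat, minlength=M.size).reshape(M.shape)
```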

fast_init(node_id: str) → None

Initializes all the necessary structures for the parameters estimation for the node node_id.

Parameters

node_id (string) – the node label

classes.sample_path module

class classes.sample_path.SamplePath(importer: classes.abstract_importer.AbstractImporter)

Bases: object

Aggregates all the information about the trajectories, the real structure of the sampled net and the variables' cardinalities. Has the task of creating the Trajectory and Structure objects that will contain this data.

Parameters

importer (AbstractImporter) – the Importer object which contains the imported and processed data

_trajectories

the Trajectory object that will contain all the concatenated trajectories

_structure

the Structure Object that will contain all the structural infos about the net

_total_variables_count

the number of variables in the net

build_structure() → None

Builds the Structure object that aggregates all the information about the net.

build_trajectories() → None

Builds the Trajectory object that will contain all the trajectories, and clears all the unused dataframes in the _importer object.

property has_prior_net_structure
property structure
property total_variables_count
property trajectories

classes.set_of_cims module

class classes.set_of_cims.SetOfCims(node_id: str, parents_states_number: List, node_states_number: int, p_combs: numpy.ndarray)

Bases: object

Aggregates all the CIMS of the node identified by the label _node_id.

Parameters
  • node_id (string) – the node label

  • parents_states_number (List) – the cardinalities of the parents

  • node_states_number (int) – the cardinality of the node

  • p_combs (numpy.ndarray) – the p_comb structure bound to this node

_state_residence_time

matrix containing all the state residence time vectors for the node

_transition_matrices

matrix containing all the transition matrices for the node

_actual_cims

the cims of the node

property actual_cims
build_cims(state_res_times: numpy.ndarray, transition_matrices: numpy.ndarray) → None

Builds the ConditionalIntensityMatrix objects given the state residence times and transition matrices, and computes the CIM coefficients. The class member _actual_cims will contain the computed CIMs.

Parameters
  • state_res_times (numpy.ndarray) – the state residence times matrix

  • transition_matrices (numpy.ndarray) – the transition matrices

build_times_and_transitions_structures() → None

Initializes the state residence times matrix and the state transition matrices with the correct dimensions.

filter_cims_with_mask(mask_arr: numpy.ndarray, comb: List) → numpy.ndarray

Filters the CIMs contained in the array _actual_cims given the boolean mask mask_arr and the index comb.

Parameters
  • mask_arr (numpy.array) – the boolean mask that indicates which parents to consider

  • comb (numpy.array) – the state(s) of the filtered parents

Returns

Array of ConditionalIntensityMatrix objects

Return type

numpy.array
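A sketch of the filtering as a boolean row match, assuming p_combs[i] is the parent combination of the i-th CIM. filter_cims_sketch is a hypothetical name, and strings stand in for ConditionalIntensityMatrix objects so the example runs on its own.

```python
from typing import Sequence

import numpy as np


def filter_cims_sketch(actual_cims: np.ndarray, p_combs: np.ndarray,
                       mask_arr: np.ndarray, comb: Sequence[int]) -> np.ndarray:
    """Keep the CIMs whose parent combination equals comb on the masked parents."""
    # Compare only the masked parent columns, row by row.
    matches = np.all(p_combs[:, mask_arr] == np.asarray(comb), axis=1)
    return actual_cims[matches]
```

With p_combs = [[0, 0], [1, 0], [0, 1], [1, 1]], mask [True, False] and comb [1], the CIMs at rows 1 and 3 are returned.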

get_cims_number()
property p_combs

classes.structure module

class classes.structure.Structure(nodes_labels_list: List, nodes_indexes_arr: numpy.ndarray, nodes_vals_arr: numpy.ndarray, edges_list: List, total_variables_number: int)

Bases: object

Contains all the information about the network structure (node labels, node cardinalities, edges, indexes).

Parameters
  • nodes_labels_list (List) – the symbolic names of the variables

  • nodes_indexes_arr (numpy.ndarray) – the indexes of the nodes

  • nodes_vals_arr (numpy.ndarray) – the cardinalities of the nodes

  • edges_list (List) – the edges of the network

  • total_variables_number (int) – the total number of variables in the dataset

property edges
get_node_id(node_indx: int) → str

Given the node index node_indx, returns the node label.

Parameters

node_indx (int) – the node index

Returns

the node label

Return type

string

get_node_indx(node_id: str) → int

Given the node label node_id, returns the node index.

Parameters

node_id (string) – the node label

Returns

the node index

Return type

int

get_positional_node_indx(node_id: str) → int
get_states_number(node: str) → int

Given the node label node returns the cardinality of the node.

Parameters

node (string) – the node label

Returns

the node cardinality

Return type

int

property nodes_indexes
property nodes_labels
property nodes_values
remove_node(node_id: str) → None

Removes the node node_id from all the class members. The class member _total_variables_number is not modified, since it refers to the total number of variables in the dataset.

property total_variables_number

classes.structure_estimator module

class classes.structure_estimator.StructureEstimator(sample_path: classes.sample_path.SamplePath, exp_test_alfa: float, chi_test_alfa: float)

Bases: object

Has the task of estimating the network structure given the trajectories in sample_path.

Parameters
  • sample_path (SamplePath) – the _sample_path object containing the trajectories and the real structure

  • exp_test_alfa (float) – the significance level for the exponential hypothesis test

  • chi_test_alfa (float) – the significance level for the chi-square hypothesis test

_nodes

the nodes labels

_nodes_vals

the nodes cardinalities

_nodes_indxs

the nodes indexes

_complete_graph

the complete directed graph built using the nodes labels in _nodes

_cache

the Cache object

adjacency_matrix() → numpy.ndarray

Converts the estimated structure _complete_graph to a boolean adjacency matrix representation.

Returns

The adjacency matrix of the graph _complete_graph

Return type

numpy.ndarray

static build_complete_graph(node_ids: List) → networkx.classes.digraph.DiGraph

Builds a complete directed graph (no self loops) given the node labels in the list node_ids.

Parameters

node_ids (List) – the list of nodes labels

Returns

a complete Digraph Object

Return type

networkx.DiGraph
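The complete graph and the boolean adjacency matrix described for adjacency_matrix can be sketched without networkx: the edges of a complete digraph without self loops are exactly the ordered pairs of distinct nodes. The helper names are hypothetical; networkx.complete_graph(node_ids, create_using=networkx.DiGraph) builds the equivalent DiGraph object directly.

```python
import itertools
from typing import List, Tuple

import numpy as np


def complete_graph_edges(node_ids: List[str]) -> List[Tuple[str, str]]:
    # Every ordered pair of distinct nodes: n * (n - 1) edges, no self loops.
    return list(itertools.permutations(node_ids, 2))


def adjacency_matrix_from_edges(node_ids: List[str],
                                edges: List[Tuple[str, str]]) -> np.ndarray:
    # A[i, j] is True iff there is an edge node_ids[i] -> node_ids[j].
    pos = {label: i for i, label in enumerate(node_ids)}
    adj = np.zeros((len(node_ids), len(node_ids)), dtype=bool)
    for src, dst in edges:
        adj[pos[src], pos[dst]] = True
    return adj
```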

complete_test(test_parent: str, test_child: str, parent_set: List, child_states_numb: int, tot_vars_count: int) → bool

Performs a complete independence test on the directed graphs G1 = {test_child ∪ parent_set} and G2 = {G1 ∪ test_parent}, where test_parent is added as an additional parent of test_child. Generates all the necessary structures and data to perform the tests.

Parameters
  • test_parent (string) – the node label of the test parent

  • test_child (string) – the node label of the child

  • parent_set (List) – the common parent set

  • child_states_numb (int) – the cardinality of the test_child

  • tot_vars_count (int) – the total number of variables in the net

Returns

True iff test_child and test_parent are independent given the sep_set parent_set. False otherwise

Return type

bool

ctpc_algorithm() → None

Runs the CTPC algorithm over the entire net.

static generate_possible_sub_sets_of_size(u: List, size: int, parent_label: str) → Iterator

Generates all the possible subsets of the list u of size size that do not contain the node identified by parent_label.

Parameters
  • u (List) – the list of nodes

  • size (int) – the size of the subsets

  • parent_label (string) – the node to exclude in the subsets generation

Returns

an Iterator over the subsets, each given as a list

Return type

Iterator
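The subset generation maps directly onto itertools.combinations. A sketch, assuming u holds plain node labels; possible_sub_sets_of_size is a hypothetical name.

```python
import itertools
from typing import Iterator, List


def possible_sub_sets_of_size(u: List[str], size: int,
                              parent_label: str) -> Iterator[List[str]]:
    # Drop the tested parent, then lazily enumerate every subset of the given size.
    candidates = [node for node in u if node != parent_label]
    return (list(combo) for combo in itertools.combinations(candidates, size))
```

For u = ["A", "B", "C", "D"], size 2 and parent_label "B", the iterator yields ["A", "C"], ["A", "D"] and ["C", "D"]; size 0 yields a single empty list.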

independence_test(child_states_numb: int, cim1: classes.conditional_intensity_matrix.ConditionalIntensityMatrix, cim2: classes.conditional_intensity_matrix.ConditionalIntensityMatrix) → bool

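Computes the actual independence test using two CIMs. The exponential test is performed first; if its null hypothesis is not rejected, the chi-square test is performed as well.

Parameters
  • child_states_numb (int) – the cardinality of the test child

  • cim1 (ConditionalIntensityMatrix) – the first CIM to compare

  • cim2 (ConditionalIntensityMatrix) – the second CIM to compare

Returns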

True iff both tests do NOT reject the null hypothesis of independence. False otherwise.

Return type

bool

one_iteration_of_CTPC_algorithm(var_id: str) → None

Performs an iteration of the CTPC algorithm using the node var_id as test_child.

Parameters

var_id (string) – the node label of the test child

save_plot_estimated_structure_graph() → None

Plots the estimated structure in a graphical model style and saves the plot. Spurious edges are colored in red.

save_results() → None

Saves the estimated Structure to a .json file in the path the data were loaded from. The file has the same name as the input dataset, with the word results_ added to it.

spurious_edges() → List

Returns the spurious edges present in the estimated structure, if a prior net structure is present in _sample_path.structure.

Returns

A list containing the spurious edges

Return type

List
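Spurious edges amount to a set difference between the estimated edges and the prior ones. This hypothetical sketch works on plain (from, to) tuples rather than on the package's graph objects.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def spurious_edges_sketch(estimated: Iterable[Edge], prior: Iterable[Edge]) -> List[Edge]:
    # An edge is spurious if it was estimated but is absent from the prior structure.
    prior_set = set(prior)
    return [edge for edge in estimated if edge not in prior_set]
```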

classes.trajectory module

class classes.trajectory.Trajectory(list_of_columns: List, original_cols_number: int)

Bases: object

Abstracts the information about a complete set of trajectories, represented as a numpy array of doubles (the time deltas) and a numpy matrix of ints (the changes of state).

Parameters
  • list_of_columns (List) – the list containing the times array and values matrix

  • original_cols_number (int) – the total number of columns in the data

_actual_trajectory

the trajectory containing also the duplicated/shifted values

_times

the array containing the time deltas

property complete_trajectory
size()
property times
property trajectory

Module contents