classes package¶

Submodules¶

classes.abstract_importer module¶

class classes.abstract_importer.AbstractImporter(file_path: str = None, concatenated_samples: Union[pandas.core.frame.DataFrame, numpy.ndarray] = None, variables: pandas.core.frame.DataFrame = None, prior_net_structure: pandas.core.frame.DataFrame = None)¶

Bases: abc.ABC

Abstract class that exposes all the necessary methods to process the trajectories and the net structure.

Parameters

file_path (str) – the file path, or dataset name if you import already processed data
concatenated_samples (typing.Union[pandas.DataFrame, numpy.ndarray]) – Dataframe or numpy array containing the concatenation of all the processed trajectories
variables (pandas.DataFrame) – Dataframe containing the nodes labels and cardinalities
prior_net_structure (pandas.DataFrame) – Dataframe containing the structure of the network (edges)

_sorter

A list containing the variables labels in the SAME order as the columns in concatenated_samples

Warning

The parameters variables and prior_net_structure HAVE to be properly constructed as Pandas Dataframes with the following structure:
Header of _df_structure = [From_Node | To_Node]
Header of _df_variables = [Variable_Label | Variable_Cardinality]
See the tutorial on how to construct a correct concatenated_samples Dataframe/ndarray.
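As an illustration of the two headers described in this warning, the following is a minimal sketch that builds both dataframes with pandas. It takes the header names in the warning literally as column names, which is an assumption; the three-node network, its labels and its cardinalities are made up for the example.

```python
import pandas as pd

# Hypothetical three-node network; labels and cardinalities are invented.
variables = pd.DataFrame({
    "Variable_Label": ["X", "Y", "Z"],
    "Variable_Cardinality": [2, 3, 2],
})

# One row per directed edge: From_Node -> To_Node.
prior_net_structure = pd.DataFrame({
    "From_Node": ["X", "Y"],
    "To_Node": ["Y", "Z"],
})
```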

Note

See JsonImporter for an example implementation.

build_list_of_samples_array(concatenated_sample: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → List¶

Builds a List containing the delta times numpy array and the complete transitions matrix.

Parameters: concatenated_sample (typing.Union[pandas.Dataframe, numpy.ndarray]) – the dataframe/array from which the time, and transitions matrix have to be extracted and converted
Returns: the resulting list of numpy arrays
Return type: List

abstract build_sorter(sample_frame: pandas.core.frame.DataFrame) → List¶

Initializes the _sorter class member from a trajectory dataframe, extracting the header of the frame and keeping ONLY the variables symbolic labels, cutting out the time label in the header.

Parameters: sample_frame (pandas.DataFrame) – The dataframe from which to extract the header
Returns: A list containing the processed header.
Return type: List
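A concrete subclass might implement this along the following lines. This is only a sketch: the function name and the time_key argument (the label of the time column) are hypothetical and do not belong to the abstract signature above.

```python
import pandas as pd

def build_sorter_sketch(sample_frame, time_key="Time"):
    """Return the header of sample_frame without the time label."""
    # Keep the column order, since _sorter must match concatenated_samples.
    return [label for label in sample_frame.columns if label != time_key]

frame = pd.DataFrame({"Time": [0.0, 1.5], "X": [0, 1], "Y": [1, 1]})
sorter = build_sorter_sketch(frame)
```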

clear_concatenated_frame() → None¶: Removes all values in the dataframe concatenated_samples.

compute_row_delta_in_all_samples_frames(df_samples_list: List) → None¶

Calls the method compute_row_delta_sigle_samples_frame on every dataframe present in the list df_samples_list. Concatenates the result in the dataframe concatenated_samples.

Parameters: df_samples_list (List) – the list of dataframes to be processed and concatenated

Warning

The Dataframe sample_frame has to follow the column structure of this header: Header of sample_frame = [Time | Variable values] The class member self._sorter HAS to be properly INITIALIZED (See class members definition doc)

Note

After the call of this method the class member concatenated_samples will contain all processed and merged trajectories.

compute_row_delta_sigle_samples_frame(sample_frame: pandas.core.frame.DataFrame, columns_header: List, shifted_cols_header: List) → pandas.core.frame.DataFrame¶

Computes the difference between each value present in the time column. Copies and shifts by one position up all the values present in the remaining columns.

Parameters

sample_frame (pandas.Dataframe) – the trajectory to be processed
columns_header (List) – the original header of sample_frame
shifted_cols_header (List) – a copy of columns_header with changed names of the contents

Returns

The processed dataframe

Return type

pandas.Dataframe

Warning

the Dataframe sample_frame has to follow the column structure of this header: Header of sample_frame = [Time | Variable values]
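The processing described for this method can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the library's code: the time column is assumed to be the first one, the last row (which has no successor) is dropped, and the function name, the toy frame and the "_next" column names are all hypothetical.

```python
import pandas as pd

def row_delta_sketch(sample_frame, columns_header, shifted_cols_header):
    """Replace timestamps with time deltas and append next-state columns."""
    time_label = sample_frame.columns[0]  # assumption: time is the first column
    # Drop the last row: it has neither a time delta nor a next state.
    out = sample_frame.iloc[:-1].copy()
    # Delta between the next timestamp and the current one.
    out[time_label] = sample_frame[time_label].diff().shift(-1).iloc[:-1]
    # Copy the variable columns shifted one position up (the next state).
    shifted = sample_frame[columns_header].shift(-1).iloc[:-1].astype(int)
    shifted.columns = shifted_cols_header
    return pd.concat([out, shifted], axis=1)

frame = pd.DataFrame({"Time": [0.0, 1.5, 4.0], "X": [0, 1, 0], "Y": [1, 1, 0]})
processed = row_delta_sketch(frame, ["X", "Y"], ["X_next", "Y_next"])
```

Each row of processed then holds the time spent in the current state together with the state reached afterwards.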

property concatenated_samples¶

abstract dataset_id() → object¶: If the original dataset contains multiple datasets, this method returns a unique id to identify the current dataset

property file_path¶

property sorter¶

property structure¶

property variables¶

classes.cache module¶

class classes.cache.Cache¶

Bases: object

This class acts as a cache of SetOfCims objects for a node.

_list_of_sets_of_parents: a list of Sets objects of the parents to which the SetOfCims in cache at the SAME index is related
_actual_cache: a list of SetOfCims objects

clear() → None¶: Clear the contents both of _actual_cache and _list_of_sets_of_parents.

find(parents_comb: Set) → classes.set_of_cims.SetOfCims ¶

Given the symbolic parents combination parents_comb, tries to find in the cache the SetOfCims related to it.

Parameters: parents_comb (Set) – the parents related to that SetOfCims
Returns: A SetOfCims object if the parents_comb index is found in _list_of_sets_of_parents. None otherwise.
Return type: SetOfCims

put(parents_comb: Set, socim: classes.set_of_cims.SetOfCims) → None¶

Places the SetOfCims object in the cache and stores the related symbolic index parents_comb in _list_of_sets_of_parents.

Parameters

parents_comb (Set) – the symbolic set index
socim (SetOfCims) – the related SetOfCims object
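The two parallel lists described for this class suggest a lookup by position. The sketch below is a hypothetical stand-in (CacheSketch is not the library class) that follows the documented behaviour of find, put and clear; a plain string replaces the SetOfCims object.

```python
class CacheSketch:
    """Parallel-list cache: the entry at index i belongs to parent set i."""

    def __init__(self):
        self._list_of_sets_of_parents = []
        self._actual_cache = []

    def find(self, parents_comb):
        # Sets compare by content, so the element order does not matter.
        try:
            indx = self._list_of_sets_of_parents.index(parents_comb)
        except ValueError:
            return None
        return self._actual_cache[indx]

    def put(self, parents_comb, socim):
        self._list_of_sets_of_parents.append(parents_comb)
        self._actual_cache.append(socim)

    def clear(self):
        self._list_of_sets_of_parents.clear()
        self._actual_cache.clear()

cache = CacheSketch()
cache.put({"X", "Y"}, "cims for parents X and Y")
hit = cache.find({"Y", "X"})
miss = cache.find({"Z"})
```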

classes.conditional_intensity_matrix module¶

class classes.conditional_intensity_matrix.ConditionalIntensityMatrix(state_residence_times: numpy.array, state_transition_matrix: numpy.array)¶

Bases: object

Abstracts the Conditional Intensity Matrix of a node as an aggregation of the state residence times vector, the state transition matrix and the actual CIM matrix.

Parameters

state_residence_times (numpy.array) – state residence times vector
state_transition_matrix (numpy.ndarray) – the transitions count matrix

_cim

the actual cim of the node

property cim¶

compute_cim_coefficients() → None¶: Compute the coefficients of the matrix _cim by using the following equality q_xx’ = M[x, x’] / T[x]. The class member _cim will contain the computed cim
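The equality q_xx’ = M[x, x’] / T[x] gives the off-diagonal coefficients. The NumPy sketch below applies it to a toy two-state node. Setting each diagonal entry to the negative sum of the off-diagonal rates of its row is the usual intensity-matrix convention and is an assumption here, not something stated on this page; the function name is hypothetical and all residence times are assumed to be positive.

```python
import numpy as np

def cim_from_statistics(T, M):
    """Build an intensity matrix from residence times T and counts M."""
    T = np.asarray(T, dtype=float)
    M = np.asarray(M, dtype=float)
    # q[x, x'] = M[x, x'] / T[x]: divide every row by its residence time.
    cim = M / T[:, np.newaxis]
    # Assumed convention: the diagonal makes every row sum to zero.
    np.fill_diagonal(cim, 0.0)
    np.fill_diagonal(cim, -cim.sum(axis=1))
    return cim

T = [2.0, 4.0]          # time spent in state 0 and in state 1
M = [[0, 4], [2, 0]]    # transition counts between the two states
cim = cim_from_statistics(T, M)
```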

property state_residence_times¶

property state_transition_matrix¶

classes.json_importer module¶

class classes.json_importer.JsonImporter(file_path: str, samples_label: str, structure_label: str, variables_label: str, time_key: str, variables_key: str)¶

Bases: classes.abstract_importer.AbstractImporter

Implements the abstract methods of AbstractImporter and adds all the necessary methods to process and prepare data stored in JSON files.

Parameters

file_path (string) – the path of the file that contains the data to be imported
samples_label (string) – the reference key for the samples in the trajectories
structure_label (string) – the reference key for the structure of the network data
variables_label (string) – the reference key for the cardinalities of the nodes data
time_key (string) – the key used to identify the timestamps in each trajectory
variables_key (string) – the key used to identify the names of the variables in the net

_array_indx

the index of the outer JsonArray to extract the data from

_df_samples_list

a Dataframe list in which every dataframe contains a trajectory

_raw_data

The raw contents of the json file to import

build_sorter(sample_frame: pandas.core.frame.DataFrame) → List¶: Implements the abstract method build_sorter of the AbstractImporter for this dataset.

clear_data_frame_list() → None¶: Removes all values present in the dataframes in the list _df_samples_list.

dataset_id() → object¶: If the original dataset contains multiple datasets, this method returns a unique id to identify the current dataset

import_data(indx: int) → None¶

Implements the abstract method of AbstractImporter.

Parameters: indx (int) – the index of the outer JsonArray to extract the data from

import_sampled_cims(raw_data: List, indx: int, cims_key: str) → Dict¶

Imports the synthetic CIMS in the dataset in a dictionary, using variables labels as keys for the set of CIMS of a particular node.

Parameters

raw_data (List) – List of Dicts
indx (int) – The index of the array from which the data have to be extracted
cims_key (string) – the key where the json object cims are placed

Returns

a dictionary containing the sampled CIMS for all the variables in the net

Return type

Dictionary

import_structure(raw_data: List) → pandas.core.frame.DataFrame¶

Imports in a dataframe the data in the list raw_data at the key _structure_label

Parameters: raw_data (List) – List of Dicts
Returns: Dataframe containing the starting node and the ending node of every arc of the network
Return type: pandas.Dataframe

import_trajectories(raw_data: List) → List¶

Imports the trajectories from the list of dicts raw_data.

Parameters: raw_data (List) – List of Dicts
Returns: List of dataframes containing all the trajectories
Return type: List

import_variables(raw_data: List) → pandas.core.frame.DataFrame¶

Imports the data in raw_data at the key _variables_label.

Parameters: raw_data (List) – List of Dicts
Returns: Dataframe containing the variables symbolic labels and their cardinalities
Return type: pandas.Dataframe

normalize_trajectories(raw_data: List, indx: int, trajectories_key: str) → List¶

Extracts the trajectories in raw_data at the index indx at the key trajectories_key.

Parameters

raw_data (List) – List of Dicts
indx (int) – The index of the array from which the data have to be extracted
trajectories_key (string) – the key of the trajectories objects

Returns

A list of dataframes containing the trajectories

Return type

List

one_level_normalizing(raw_data: List, indx: int, key: str) → pandas.core.frame.DataFrame¶

Extracts the one-level nested data in the list raw_data at the index indx at the key key.

Parameters

raw_data (List) – List of Dicts
indx (int) – The index of the array from which the data have to be extracted
key (string) – the key for the Dicts from which to extract data

Returns

A normalized dataframe

Return type

pandas.Dataframe
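A one-level extraction of this kind can be sketched with pandas as shown below. The function name, the "variables" key and the record fields are hypothetical; the layout of the real JSON files is not described on this page.

```python
import pandas as pd

def one_level_normalizing_sketch(raw_data, indx, key):
    """Turn the list of flat records at raw_data[indx][key] into a dataframe."""
    return pd.DataFrame(raw_data[indx][key])

# Made-up input: a list of dicts, as in the raw_data parameter above.
raw_data = [
    {"variables": [{"Name": "X", "Value": 2}, {"Name": "Y", "Value": 3}]},
]
variables_frame = one_level_normalizing_sketch(raw_data, 0, "variables")
```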

read_json_file() → List¶

Reads the JSON file in the path self.filePath.

Returns: The contents of the json file
Return type: List

classes.network_graph module¶

class classes.network_graph.NetworkGraph(graph_struct: classes.structure.Structure)¶

Bases: object

Abstracts the infos contained in the Structure class in the form of a directed graph. Has the task of creating all the necessary filtering and indexing structures for parameters estimation

Parameters: graph_struct (Structure) – the Structure object from which infos about the net will be extracted
_graph: directed graph
_aggregated_info_about_nodes_parents: a structure that contains all the necessary infos about the parents of the node for which the indexing and filtering structures will be constructed.
_time_scalar_indexing_structure: the indexing structure for state res time estimation
_transition_scalar_indexing_structure: the indexing structure for transition computation
_time_filtering: the columns filtering structure used in the computation of the state res times
_transition_filtering: the columns filtering structure used in the computation of the transition from one state to another
_p_combs_structure: all the possible parents states combination for the node of interest

add_edges(list_of_edges: List) → None¶

Adds the edges contained in the list list_of_edges to _graph.

Parameters: list_of_edges (List) – the list of tuples containing the edges

add_nodes(list_of_nodes: List) → None¶

Adds the nodes contained in the list list_of_nodes to _graph. Sets all the properties that identify a node (index, positional index, cardinality).

Parameters: list_of_nodes (List) – the nodes to add to _graph

static build_p_comb_structure_for_a_node(parents_values: List) → numpy.ndarray¶

Builds the combinatorial structure that contains the combinations of all the values contained in parents_values.

Parameters: parents_values (List) – the cardinalities of the nodes
Returns: A numpy matrix containing a grid of the combinations
Return type: numpy.ndarray
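One way to obtain such a grid is a Cartesian product over the state ranges of the parents. The sketch below uses itertools; the function name is hypothetical, and the order of the rows is an assumption that may differ from the one produced by the library.

```python
import itertools
import numpy as np

def p_comb_sketch(parents_values):
    """Grid with one row per combination of parent states."""
    ranges = [range(cardinality) for cardinality in parents_values]
    combos = list(itertools.product(*ranges))
    # The reshape keeps a 2-D result even when there are no parents.
    return np.array(combos, dtype=int).reshape(len(combos), len(parents_values))

grid = p_comb_sketch([2, 3])   # two parents with 2 and 3 states
```

For two parents with 2 and 3 states the grid has 2 * 3 = 6 rows and one column per parent.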

static build_time_columns_filtering_for_a_node(node_indx: int, p_indxs: List) → numpy.ndarray¶

Builds the necessary structure to filter the desired columns indicated by node_indx and p_indxs in the dataset. This structure will be used in the computation of the state res times.

Parameters

node_indx (int) – the index of the node
p_indxs (List) – the indexes of the node’s parents

Returns

The filtering structure for times estimation

Return type

numpy.ndarray

static build_time_scalar_indexing_structure_for_a_node(node_states: int, parents_vals: List) → numpy.ndarray¶

Builds an indexing structure for the computation of state residence times values.

Parameters

node_states (int) – the node cardinality
parents_vals (List) – the cardinalities of the node’s parents

Returns

The time indexing structure

Return type

numpy.ndarray

static build_transition_filtering_for_a_node(node_indx: int, p_indxs: List, nodes_number: int) → numpy.ndarray¶

Builds the necessary structure to filter the desired columns indicated by node_indx and p_indxs in the dataset. This structure will be used in the computation of the state transitions values.

Parameters

node_indx (int) – the index of the node
p_indxs (List) – the indexes of the node’s parents
nodes_number (int) – the total number of nodes in the dataset

Returns

The filtering structure for transitions estimation

Return type

numpy.ndarray

static build_transition_scalar_indexing_structure_for_a_node(node_states_number: int, parents_vals: List) → numpy.ndarray¶

Builds an indexing structure for the computation of state transitions values.

Parameters

node_states_number (int) – the node cardinality
parents_vals (List) – the cardinalities of the node’s parents

Returns

The transition indexing structure

Return type

numpy.ndarray

clear_indexing_filtering_structures() → None¶: Initialize all the filtering/indexing structures.

property edges¶

fast_init(node_id: str) → None¶

Initializes all the necessary structures for parameters estimation of the node identified by the label node_id

Parameters: node_id (string) – the label of the node

get_node_indx(node_id) → int¶

get_ordered_by_indx_set_of_parents(node: str) → Tuple¶

Builds the aggregated structure that holds all the infos relative to the parent set of the node, namely (parents_labels, parents_indexes, parents_cardinalities).

Parameters: node (string) – the label of the node
Returns: a tuple containing all the parent set infos
Return type: Tuple

get_parents_by_id(node_id) → List¶

Returns a list of labels of the parents of the node node_id

Parameters: node_id (string) – the node label
Returns: a List of labels of the parents
Return type: List

get_positional_node_indx(node_id) → int¶

get_states_number(node_id) → int¶

property nodes¶

property nodes_indexes¶

property nodes_values¶

property p_combs¶

remove_node(node_id: str) → None¶: Remove the node node_id from all the class members. Initialize all the filtering/indexing structures.

property time_filtering¶

property time_scalar_indexing_strucure¶

property transition_filtering¶

property transition_scalar_indexing_structure¶

classes.parameters_estimator module¶

class classes.parameters_estimator.ParametersEstimator(trajectories: classes.trajectory.Trajectory, net_graph: classes.network_graph.NetworkGraph)¶

Bases: object

Has the task of computing the cims of a particular node given the trajectories and the net structure in the graph _net_graph.

Parameters

trajectories (Trajectory) – the trajectories
net_graph (NetworkGraph) – the net structure

_single_set_of_cims

the set of cims object that will hold the cims of the node

compute_parameters_for_node(node_id: str) → classes.set_of_cims.SetOfCims ¶

Compute the CIMS of the node identified by the label node_id.

Parameters: node_id (string) – the node label
Returns: A SetOfCims object filled with the computed CIMS
Return type: SetOfCims

static compute_state_res_time_for_node(times: numpy.ndarray, trajectory: numpy.ndarray, cols_filter: numpy.ndarray, scalar_indexes_struct: numpy.ndarray, T: numpy.ndarray) → None¶

Compute the state residence times for a node and fill the matrix T with the results

Parameters

node_indx (int) – the index of the node
times (numpy.array) – the times deltas vector
trajectory (numpy.ndarray) – the trajectory
cols_filter (numpy.array) – the columns filtering structure
scalar_indexes_struct (numpy.array) – the indexing structure
T (numpy.ndarray) – the state residence times vectors

static compute_state_transitions_for_a_node(node_indx: int, trajectory: numpy.ndarray, cols_filter: numpy.ndarray, scalar_indexing: numpy.ndarray, M: numpy.ndarray) → None¶

Compute the state transitions for a node and fill the matrices M with the results.

Parameters

node_indx (int) – the index of the node
trajectory (numpy.ndarray) – the trajectory
cols_filter (numpy.array) – the columns filtering structure
scalar_indexing (numpy.array) – the indexing structure
M (numpy.ndarray) – the state transitions matrices
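To give an idea of what the residence times T and the transition counts M contain, here is a simplified sketch for a node without parents. It does not use the filtering and indexing structures of the two static methods above and does not reproduce their signatures; the function name and the toy data are hypothetical. The inputs are the time deltas, the current state of the node in each row, and the state it moves to next.

```python
import numpy as np

def sufficient_statistics_sketch(times, states, next_states, cardinality):
    """Residence times T and transition counts M for a parentless node."""
    times = np.asarray(times, dtype=float)
    states = np.asarray(states, dtype=int)
    next_states = np.asarray(next_states, dtype=int)
    # T[x]: total time spent in state x.
    T = np.bincount(states, weights=times, minlength=cardinality)
    # M[x, x']: number of observed jumps from x to x' (rows where the node
    # does not change state are ignored).
    M = np.zeros((cardinality, cardinality), dtype=int)
    changed = states != next_states
    np.add.at(M, (states[changed], next_states[changed]), 1)
    return T, M

T, M = sufficient_statistics_sketch(
    times=[1.5, 2.5, 1.0, 2.0],
    states=[0, 1, 0, 1],
    next_states=[1, 0, 1, 1],
    cardinality=2,
)
```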

fast_init(node_id: str) → None¶

Initializes all the necessary structures for the parameters estimation for the node node_id.

Parameters: node_id (string) – the node label

classes.sample_path module¶

class classes.sample_path.SamplePath(importer: classes.abstract_importer.AbstractImporter)¶

Bases: object

Aggregates all the information about the trajectories, the real structure of the sampled net and the variables cardinalities. Has the task of creating the objects Trajectory and Structure that will contain the mentioned data.

Parameters: importer (AbstractImporter) – the Importer object which contains the imported and processed data
_trajectories: the Trajectory object that will contain all the concatenated trajectories
_structure: the Structure Object that will contain all the structural infos about the net
_total_variables_count: the number of variables in the net

build_structure() → None¶: Builds the Structure object that aggregates all the infos about the net.

build_trajectories() → None¶: Builds the Trajectory object that will contain all the trajectories. Clears all the unused dataframes in _importer Object

property has_prior_net_structure¶

property structure¶

property total_variables_count¶

property trajectories¶

classes.set_of_cims module¶

class classes.set_of_cims.SetOfCims(node_id: str, parents_states_number: List, node_states_number: int, p_combs: numpy.ndarray)¶

Bases: object

Aggregates all the CIMS of the node identified by the label _node_id.

Parameters

node_id – the node label
parents_states_number (List) – the cardinalities of the parents
node_states_number (int) – the cardinality of the node
p_combs (numpy.ndarray) – the p_comb structure bound to this node

_state_residence_time

matrix containing all the state residence time vectors for the node

_transition_matrices

matrix containing all the transition matrices for the node

_actual_cims

the cims of the node

property actual_cims¶

build_cims(state_res_times: numpy.ndarray, transition_matrices: numpy.ndarray) → None¶

Build the ConditionalIntensityMatrix objects given the state residence times and transitions matrices. Compute the cim coefficients. The class member _actual_cims will contain the computed cims.

Parameters

state_res_times (numpy.ndarray) – the state residence times matrix
transition_matrices (numpy.ndarray) – the transition matrices

build_times_and_transitions_structures() → None¶: Initializes at the correct dimensions the state residence times matrix and the state transition matrices.

filter_cims_with_mask(mask_arr: numpy.ndarray, comb: List) → numpy.ndarray¶

Filter the cims contained in the array _actual_cims given the boolean mask mask_arr and the index comb.

Parameters

mask_arr (numpy.array) – the boolean mask that indicates which parent to consider
comb (numpy.array) – the state/s of the filtered parents

Returns

Array of ConditionalIntensityMatrix objects

Return type

numpy.array

get_cims_number()¶

property p_combs¶

classes.structure module¶

class classes.structure.Structure(nodes_labels_list: List, nodes_indexes_arr: numpy.ndarray, nodes_vals_arr: numpy.ndarray, edges_list: List, total_variables_number: int)¶

Bases: object

Contains all the infos about the network structure (nodes labels, nodes cardinalities, edges, indexes).

Parameters

nodes_labels_list (List) – the symbolic names of the variables
nodes_indexes_arr (numpy.ndarray) – the indexes of the nodes
nodes_vals_arr (numpy.ndarray) – the cardinalities of the nodes
edges_list (List) – the edges of the network
total_variables_number (int) – the total number of variables in the dataset

property edges¶

get_node_id(node_indx: int) → str¶

Given the node index node_indx, returns the node label.

Parameters: node_indx (int) – the node index
Returns: the node label
Return type: string

get_node_indx(node_id: str) → int¶

Given the node label node_id, returns the node index.

Parameters: node_id (string) – the node label
Returns: the node index
Return type: int

get_positional_node_indx(node_id: str) → int¶

get_states_number(node: str) → int¶

Given the node label node returns the cardinality of the node.

Parameters: node (string) – the node label
Returns: the node cardinality
Return type: int

property nodes_indexes¶

property nodes_labels¶

property nodes_values¶

remove_node(node_id: str) → None¶: Remove the node node_id from all the class members. The class member _total_variables_number is left unchanged, since it refers to the total number of variables in the dataset.

property total_variables_number¶

classes.structure_estimator module¶

class classes.structure_estimator.StructureEstimator(sample_path: classes.sample_path.SamplePath, exp_test_alfa: float, chi_test_alfa: float)¶

Bases: object

Has the task of estimating the network structure given the trajectories in sample_path.

Parameters

sample_path (SamplePath) – the _sample_path object containing the trajectories and the real structure
exp_test_alfa (float) – the significance level for the exponential hypothesis test
chi_test_alfa (float) – the significance level for the chi hypothesis test

_nodes

the nodes labels

_nodes_vals

the nodes cardinalities

_nodes_indxs

the nodes indexes

_complete_graph

the complete directed graph built using the nodes labels in _nodes

_cache

the Cache object

adjacency_matrix() → numpy.ndarray¶

Converts the estimated structure _complete_graph to a boolean adjacency matrix representation.

Returns: The adjacency matrix of the graph _complete_graph
Return type: numpy.ndarray
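A boolean adjacency matrix can be built from a list of directed edges as sketched below. The function name is hypothetical, and the orientation (row = parent, column = child) is an assumption, since the page does not specify it.

```python
import numpy as np

def adjacency_matrix_sketch(nodes, edges):
    """Boolean matrix with True at [i, j] when the edge nodes[i] -> nodes[j] exists."""
    index = {label: i for i, label in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=bool)
    for parent, child in edges:
        adj[index[parent], index[child]] = True
    return adj

adj = adjacency_matrix_sketch(["X", "Y", "Z"], [("X", "Y"), ("Y", "Z")])
```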

static build_complete_graph(node_ids: List) → networkx.classes.digraph.DiGraph¶

Builds a complete directed graph (no self loops) given the nodes labels in the list node_ids.

Parameters: node_ids (List) – the list of nodes labels
Returns: a complete Digraph Object
Return type: networkx.DiGraph
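A complete directed graph without self loops contains one edge for every ordered pair of distinct nodes. The sketch below builds one with networkx; the function name is hypothetical and the body is an illustration, not necessarily how the library does it.

```python
import itertools
import networkx as nx

def build_complete_graph_sketch(node_ids):
    """Directed graph with an edge for every ordered pair of distinct nodes."""
    graph = nx.DiGraph()
    graph.add_nodes_from(node_ids)
    # permutations of length 2 never pair a node with itself.
    graph.add_edges_from(itertools.permutations(node_ids, 2))
    return graph

graph = build_complete_graph_sketch(["X", "Y", "Z"])
```

With n nodes the resulting graph has n * (n - 1) edges.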

complete_test(test_parent: str, test_child: str, parent_set: List, child_states_numb: int, tot_vars_count: int) → bool¶

Performs a complete independence test on the directed graphs G1 = {test_child U parent_set} and G2 = {G1 U test_parent} (test_parent added as an additional parent of test_child). Generates all the necessary structures and data to perform the tests.

Parameters

test_parent (string) – the node label of the test parent
test_child (string) – the node label of the child
parent_set (List) – the common parent set
child_states_numb (int) – the cardinality of the test_child
tot_vars_count (int) – the total number of variables in the net

Returns

True iff test_child and test_parent are independent given the sep_set parent_set. False otherwise

Return type

bool

ctpc_algorithm() → None¶: Compute the CTPC algorithm over the entire net.

static generate_possible_sub_sets_of_size(u: List, size: int, parent_label: str) → Iterator¶

Creates a list containing all possible subsets of size size of the list u that do not contain the node identified by parent_label.

Parameters

u (List) – the list of nodes
size (int) – the size of the subsets
parent_label (string) – the node to exclude in the subsets generation

Returns

an Iterator Object containing a list of lists

Return type

Iterator
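The subset generation described above can be sketched with itertools.combinations. The function name is hypothetical; the sketch returns a lazy iterator of lists, in line with the Iterator return type documented here.

```python
import itertools

def sub_sets_sketch(u, size, parent_label):
    """Iterate over the subsets of u of the given size that exclude parent_label."""
    candidates = [node for node in u if node != parent_label]
    return map(list, itertools.combinations(candidates, size))

subsets = list(sub_sets_sketch(["A", "B", "C", "D"], 2, "B"))
```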

independence_test(child_states_numb: int, cim1: classes.conditional_intensity_matrix.ConditionalIntensityMatrix, cim2: classes.conditional_intensity_matrix.ConditionalIntensityMatrix) → bool¶

Computes the actual independence test using two cims. The exponential test is performed first; if the null hypothesis is not rejected, the chi test is performed as well.

Parameters

child_states_numb (int) – the cardinality of the test child
cim1 (ConditionalIntensityMatrix) – a cim belonging to the graph without test parent
cim2 (ConditionalIntensityMatrix) – a cim belonging to the graph with test parent

Returns

True iff both tests do NOT reject the null hypothesis of independence. False otherwise.

Return type

bool

one_iteration_of_CTPC_algorithm(var_id: str) → None¶

Performs an iteration of the CTPC algorithm using the node var_id as test_child.

Parameters: var_id (string) – the node label of the test child

save_plot_estimated_structure_graph() → None¶: Plot the estimated structure in a graphical model style. Spurious edges are colored in red.

save_results() → None¶: Save the estimated Structure to a .json file in the path where the data are loaded from. The file is named as the input dataset but the results_ word is appended to the results file.

spurious_edges() → List¶

Return the spurious edges present in the estimated structure, if a prior net structure is present in _sample_path.structure.

Returns: A list containing the spurious edges
Return type: List
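Taking a spurious edge to be an estimated edge that is absent from the prior net structure (an interpretation, not a definition given on this page), the check can be sketched as follows. The function name and the edge lists are hypothetical.

```python
def spurious_edges_sketch(estimated_edges, prior_edges):
    """Edges of the estimated structure that the prior structure lacks."""
    prior = set(prior_edges)
    return [edge for edge in estimated_edges if edge not in prior]

estimated = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
prior = [("X", "Y"), ("Y", "Z")]
spurious = spurious_edges_sketch(estimated, prior)
```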

classes.trajectory module¶

class classes.trajectory.Trajectory(list_of_columns: List, original_cols_number: int)¶

Bases: object

Abstracts the infos about a complete set of trajectories, represented as a numpy array of doubles (the time deltas) and a numpy matrix of ints (the changes of states).

Parameters

list_of_columns (List) – the list containing the times array and values matrix
original_cols_number (int) – total number of cols in the data

_actual_trajectory

the trajectory containing also the duplicated/shifted values

_times

the array containing the time deltas

property complete_trajectory¶

size()¶

property times¶

property trajectory¶

Module contents¶