classes package¶
Submodules¶
classes.abstract_importer module¶
-
class
classes.abstract_importer.
AbstractImporter
(file_path: str = None, concatenated_samples: Union[pandas.core.frame.DataFrame, numpy.ndarray] = None, variables: pandas.core.frame.DataFrame = None, prior_net_structure: pandas.core.frame.DataFrame = None)¶ Bases:
abc.ABC
Abstract class that exposes all the necessary methods to process the trajectories and the net structure.
- Parameters
file_path (str) – the file path, or dataset name if you import already processed data
concatenated_samples (typing.Union[pandas.DataFrame, numpy.ndarray]) – Dataframe or numpy array containing the concatenation of all the processed trajectories
variables (pandas.DataFrame) – Dataframe containing the nodes labels and cardinalities
prior_net_structure (pandas.DataFrame) – Dataframe containing the structure of the network (edges)
- _sorter
A list containing the variables labels in the SAME order as the columns in
concatenated_samples
Warning
The parameters
variables
and
prior_net_structure
HAVE to be properly constructed as Pandas Dataframes with the following structure: Header of _df_structure = [From_Node | To_Node]; Header of _df_variables = [Variable_Label | Variable_Cardinality]. See the tutorial on how to construct a correct
concatenated_samples
Dataframe/ndarray.
Note
See
JsonImporter
for an example implementation.
-
build_list_of_samples_array
(concatenated_sample: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → List¶ Builds a List containing the delta times numpy array and the complete transitions matrix.
- Parameters
concatenated_sample (typing.Union[pandas.DataFrame, numpy.ndarray]) – the dataframe/array from which the time and transitions matrix have to be extracted and converted
- Returns
the resulting list of numpy arrays
- Return type
List
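As a rough illustration of the return value described above, the sketch below splits a processed frame into a float array of time deltas and an integer matrix of values. The helper name `split_times_and_values` and the assumption that the time deltas sit in the first column are illustrative only and are not taken from the library.

```python
import numpy as np
import pandas as pd


def split_times_and_values(concatenated_sample):
    """Hypothetical helper: returns [times, values] as two numpy arrays."""
    if isinstance(concatenated_sample, pd.DataFrame):
        arr = concatenated_sample.to_numpy()
    else:
        arr = np.asarray(concatenated_sample)
    # Assumption: column 0 holds the time deltas, the rest hold variable values.
    times = arr[:, 0].astype(float)
    values = arr[:, 1:].astype(int)
    return [times, values]


frame = pd.DataFrame({"Time": [1.5, 2.5], "X": [0, 1], "X_next": [1, 1]})
times, values = split_times_and_values(frame)
```

Keeping the two arrays separate lets the times stay floating point while the state values stay integers.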
-
abstract
build_sorter
(sample_frame: pandas.core.frame.DataFrame) → List¶ Initializes the
_sorter
class member from a trajectory dataframe, extracting the header of the frame and keeping ONLY the variables' symbolic labels, cutting out the time label in the header.
- Parameters
sample_frame (pandas.DataFrame) – The dataframe from which to extract the header
- Returns
A list containing the processed header.
- Return type
List
-
clear_concatenated_frame
() → None¶ Removes all values in the dataframe concatenated_samples.
-
compute_row_delta_in_all_samples_frames
(df_samples_list: List) → None¶ Calls the method
compute_row_delta_sigle_samples_frame
on every dataframe present in the list
df_samples_list
. Concatenates the results in the dataframe
concatenated_samples
.
- Parameters
df_samples_list (List) – the list of dataframes to be processed and concatenated
Warning
The Dataframe sample_frame has to follow the column structure of this header: Header of sample_frame = [Time | Variable values]. The class member self._sorter HAS to be properly INITIALIZED (see the class members definition doc).
Note
After the call of this method the class member
concatenated_samples
will contain all processed and merged trajectories.
-
compute_row_delta_sigle_samples_frame
(sample_frame: pandas.core.frame.DataFrame, columns_header: List, shifted_cols_header: List) → pandas.core.frame.DataFrame¶ Computes the difference between each value present in the time column. Copies all the values present in the remaining columns and shifts them up by one position.
- Parameters
sample_frame (pandas.DataFrame) – the trajectory to be processed
columns_header (List) – the original header of sample_frame
shifted_cols_header (List) – a copy of columns_header with changed names of the contents
- Returns
The processed dataframe
- Return type
pandas.DataFrame
Warning
the Dataframe
sample_frame
has to follow the column structure of this header: Header of sample_frame = [Time | Variable values]
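The processing described for this method can be sketched with pandas as follows. This is a minimal illustration, not the library code: the function name `row_delta_sketch`, the `time_col` argument, the direction of the time difference (next timestamp minus current) and the dropping of the last row are all assumptions.

```python
import pandas as pd


def row_delta_sketch(sample_frame, columns_header, shifted_cols_header, time_col="Time"):
    """Hypothetical helper, not the library implementation."""
    # Time delta per row: next timestamp minus current timestamp.
    deltas = sample_frame[time_col].diff().shift(-1)
    # Copy the variable columns and move every value up by one row.
    shifted = sample_frame[columns_header].shift(-1)
    shifted.columns = shifted_cols_header
    out = pd.concat([sample_frame[columns_header], shifted], axis=1)
    out.insert(0, time_col, deltas)
    # Assumption: the last row has no successor, so it is dropped.
    out = out.dropna().reset_index(drop=True)
    out[shifted_cols_header] = out[shifted_cols_header].astype(int)
    return out


frame = pd.DataFrame({"Time": [0.0, 1.5, 4.0], "X": [0, 1, 1], "Y": [2, 2, 0]})
result = row_delta_sketch(frame, ["X", "Y"], ["X_next", "Y_next"])
```

Each resulting row then pairs a time delta with the current variable values and the values of the following row.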
-
property
concatenated_samples
¶
-
abstract
dataset_id
() → object¶ If the original dataset contains multiple datasets, this method returns a unique id to identify the current dataset
-
property
file_path
¶
-
property
sorter
¶
-
property
structure
¶
-
property
variables
¶
classes.cache module¶
-
class
classes.cache.
Cache
¶ Bases:
object
This class acts as a cache of
SetOfCims
objects for a node.
- _list_of_sets_of_parents
a list of
Sets
objects of the parents to which the cim in cache at the SAME index is related
- _actual_cache
a list of SetOfCims objects
-
clear
() → None¶ Clears the contents of both
_actual_cache
and
_list_of_sets_of_parents
.
-
find
(parents_comb: Set) → classes.set_of_cims.SetOfCims¶ Tries to find in the cache, given the symbolic parents combination
parents_comb
, the
SetOfCims
related to that
parents_comb
.
- Parameters
parents_comb (Set) – the parents related to that
SetOfCims
- Returns
A
SetOfCims
object if the
parents_comb
index is found in
_list_of_sets_of_parents
. None otherwise.
- Return type
SetOfCims
-
put
(parents_comb: Set, socim: classes.set_of_cims.SetOfCims) → None¶ Places in the cache the
SetOfCims
object and the related symbolic index
parents_comb
in
_list_of_sets_of_parents
.
- Parameters
parents_comb (Set) – the symbolic set index
socim (SetOfCims) – the related SetOfCims object
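The two parallel lists described for this class can be illustrated with a small stand-in. This is only a sketch of the idea under stated assumptions: the class name `ParallelListCache` is hypothetical, plain strings stand in for SetOfCims objects, and the lookup strategy (a linear search by set equality) is assumed rather than taken from the library.

```python
class ParallelListCache:
    """Hypothetical stand-in for a cache keyed by sets of parent labels."""

    def __init__(self):
        self._list_of_sets_of_parents = []  # keys
        self._actual_cache = []             # values, stored at the SAME index

    def find(self, parents_comb):
        # list.index compares with ==, so sets match regardless of element order.
        try:
            idx = self._list_of_sets_of_parents.index(parents_comb)
        except ValueError:
            return None
        return self._actual_cache[idx]

    def put(self, parents_comb, socim):
        self._list_of_sets_of_parents.append(parents_comb)
        self._actual_cache.append(socim)

    def clear(self):
        self._list_of_sets_of_parents.clear()
        self._actual_cache.clear()


cache = ParallelListCache()
cache.put({"A", "B"}, "cims-for-A-B")
hit = cache.find({"B", "A"})
miss = cache.find({"C"})
```

Because the key and the value share an index, both lists must always be updated together.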
classes.conditional_intensity_matrix module¶
-
class
classes.conditional_intensity_matrix.
ConditionalIntensityMatrix
(state_residence_times: numpy.array, state_transition_matrix: numpy.array)¶ Bases:
object
Abstracts the Conditional Intensity Matrix of a node as an aggregation of the state residence times vector, the state transition matrix and the actual CIM matrix.
- Parameters
state_residence_times (numpy.array) – state residence times vector
state_transition_matrix (numpy.ndarray) – the transitions count matrix
- _cim
the actual cim of the node
-
property
cim
¶
-
compute_cim_coefficients
() → None¶ Computes the coefficients of the matrix _cim using the equality q_xx’ = M[x, x’] / T[x], where M is the state transition matrix and T is the state residence times vector. The class member
_cim
will contain the computed cim.
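A small numeric sketch of the equality above is given below. The function name `cim_from_stats` is hypothetical. The documentation only states the formula for q_xx’; setting each diagonal entry to the negative sum of its row's off-diagonal entries follows the usual convention for intensity matrices and is an assumption here.

```python
import numpy as np


def cim_from_stats(T, M):
    """Hypothetical helper: T is the residence times vector, M the transition counts."""
    T = np.asarray(T, dtype=float)
    M = np.asarray(M, dtype=float).copy()
    np.fill_diagonal(M, 0.0)
    # q_xx' = M[x, x'] / T[x] for every off-diagonal entry.
    cim = M / T[:, np.newaxis]
    # Assumption: diagonal entries make each row sum to zero.
    np.fill_diagonal(cim, -cim.sum(axis=1))
    return cim


cim = cim_from_stats([2.0, 4.0], [[0, 4], [2, 0]])
```

With 4 transitions out of state 0 over 2.0 time units, the rate q_01 is 2.0; with 2 transitions out of state 1 over 4.0 time units, q_10 is 0.5.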
-
property
state_residence_times
¶
-
property
state_transition_matrix
¶
classes.json_importer module¶
-
class
classes.json_importer.
JsonImporter
(file_path: str, samples_label: str, structure_label: str, variables_label: str, time_key: str, variables_key: str)¶ Bases:
classes.abstract_importer.AbstractImporter
Implements the abstract methods of AbstractImporter and adds all the necessary methods to process and prepare the data in JSON format.
- Parameters
file_path (string) – the path of the file that contains the data to be imported
samples_label (string) – the reference key for the samples in the trajectories
structure_label (string) – the reference key for the structure of the network data
variables_label (string) – the reference key for the cardinalities of the nodes data
time_key (string) – the key used to identify the timestamps in each trajectory
variables_key (string) – the key used to identify the names of the variables in the net
- _array_indx
the index of the outer JsonArray to extract the data from
- _df_samples_list
a Dataframe list in which every dataframe contains a trajectory
- _raw_data
The raw contents of the json file to import
-
build_sorter
(sample_frame: pandas.core.frame.DataFrame) → List¶ Implements the abstract method build_sorter of the
AbstractImporter
for this dataset.
-
clear_data_frame_list
() → None¶ Removes all values present in the dataframes in the list
_df_samples_list
.
-
dataset_id
() → object¶ If the original dataset contains multiple datasets, this method returns a unique id to identify the current dataset
-
import_data
(indx: int) → None¶ Implements the abstract method of
AbstractImporter
.
- Parameters
indx (int) – the index of the outer JsonArray to extract the data from
-
import_sampled_cims
(raw_data: List, indx: int, cims_key: str) → Dict¶ Imports the synthetic CIMS in the dataset in a dictionary, using variables labels as keys for the set of CIMS of a particular node.
- Parameters
raw_data (List) – List of Dicts
indx (int) – The index of the array from which the data have to be extracted
cims_key (string) – the key where the json object cims are placed
- Returns
a dictionary containing the sampled CIMS for all the variables in the net
- Return type
Dictionary
-
import_structure
(raw_data: List) → pandas.core.frame.DataFrame¶ Imports in a dataframe the data in the list raw_data at the key
_structure_label
- Parameters
raw_data (List) – List of Dicts
- Returns
Dataframe containing the starting node and ending node of every arc of the network
- Return type
pandas.DataFrame
-
import_trajectories
(raw_data: List) → List¶ Imports the trajectories from the list of dicts
raw_data
.
- Parameters
raw_data (List) – List of Dicts
- Returns
List of dataframes containing all the trajectories
- Return type
List
-
import_variables
(raw_data: List) → pandas.core.frame.DataFrame¶ Imports the data in
raw_data
at the key
_variables_label
.
- Parameters
raw_data (List) – List of Dicts
- Returns
Dataframe containing the variables' symbolic labels and their cardinalities
- Return type
pandas.DataFrame
-
normalize_trajectories
(raw_data: List, indx: int, trajectories_key: str) → List¶ Extracts the trajectories in
raw_data
at the index
indx
at the key
trajectories_key
.
- Parameters
raw_data (List) – List of Dicts
indx (int) – The index of the array from which the data have to be extracted
trajectories_key (string) – the key of the trajectories objects
- Returns
A list of dataframes containing the trajectories
- Return type
List
-
one_level_normalizing
(raw_data: List, indx: int, key: str) → pandas.core.frame.DataFrame¶ Extracts the one-level nested data in the list
raw_data
at the index
indx
at the key
key
.
- Parameters
raw_data (List) – List of Dicts
indx (int) – The index of the array from which the data have to be extracted
key (string) – the key for the Dicts from which to extract data
- Returns
A normalized dataframe
- Return type
pandas.DataFrame
-
read_json_file
() → List¶ Reads the JSON file in the path self.filePath.
- Returns
The contents of the json file
- Return type
List
classes.network_graph module¶
-
class
classes.network_graph.
NetworkGraph
(graph_struct: classes.structure.Structure)¶ Bases:
object
Abstracts the infos contained in the Structure class in the form of a directed graph. Has the task of creating all the necessary filtering and indexing structures for parameters estimation
- Parameters
graph_struct (Structure) – the
Structure
object from which infos about the net will be extracted
- _graph
directed graph
- _aggregated_info_about_nodes_parents
a structure that contains all the necessary infos about the parents of the node for which the indexing and filtering structures will be constructed.
- _time_scalar_indexing_structure
the indexing structure for state res time estimation
- _transition_scalar_indexing_structure
the indexing structure for transition computation
- _time_filtering
the columns filtering structure used in the computation of the state res times
- _transition_filtering
the columns filtering structure used in the computation of the transition from one state to another
- _p_combs_structure
all the possible parents states combination for the node of interest
-
add_edges
(list_of_edges: List) → None¶ Adds the edges contained in the list
list_of_edges
to the
_graph
.
- Parameters
list_of_edges (List) – the list of tuples containing the edges
-
add_nodes
(list_of_nodes: List) → None¶ Adds the nodes contained in the list
list_of_nodes
to the
_graph
. Sets all the properties that identify a node (index, positional index, cardinality).
- Parameters
list_of_nodes (List) – the nodes to add to
_graph
-
static
build_p_comb_structure_for_a_node
(parents_values: List) → numpy.ndarray¶ Builds the combinatorial structure that contains the combinations of all the values contained in
parents_values
.
- Parameters
parents_values (List) – the cardinalities of the nodes
- Returns
A numpy matrix containing a grid of the combinations
- Return type
numpy.ndarray
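To make the "grid of the combinations" concrete, the sketch below enumerates every joint state of a set of parents from their cardinalities. The helper name `parent_state_grid` and the row ordering (last parent varying fastest) are assumptions made for illustration; the ordering used by the library may differ.

```python
import itertools

import numpy as np


def parent_state_grid(parents_values):
    """Hypothetical helper: one row per joint state of the parents."""
    # A parent with cardinality c can take the states 0 .. c-1.
    ranges = [range(cardinality) for cardinality in parents_values]
    return np.array(list(itertools.product(*ranges)), dtype=int)


grid = parent_state_grid([2, 3])
```

For a binary parent and a ternary parent this yields 2 x 3 = 6 rows, one for each combination of parent states.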
-
static
build_time_columns_filtering_for_a_node
(node_indx: int, p_indxs: List) → numpy.ndarray¶ Builds the necessary structure to filter the desired columns indicated by
node_indx
and
p_indxs
in the dataset. This structure will be used in the computation of the state res times.
- Parameters
node_indx (int) – the index of the node
p_indxs (List) – the indexes of the node’s parents
- Returns
The filtering structure for times estimation
- Return type
numpy.ndarray
-
static
build_time_scalar_indexing_structure_for_a_node
(node_states: int, parents_vals: List) → numpy.ndarray¶ Builds an indexing structure for the computation of state residence times values.
- Parameters
node_states (int) – the node cardinality
parents_vals (List) – the cardinalities of the node’s parents
- Returns
The time indexing structure
- Return type
numpy.ndArray
-
static
build_transition_filtering_for_a_node
(node_indx: int, p_indxs: List, nodes_number: int) → numpy.ndarray¶ Builds the necessary structure to filter the desired columns indicated by
node_indx
and
p_indxs
in the dataset. This structure will be used in the computation of the state transitions values.
- Parameters
node_indx (int) – the index of the node
p_indxs (List) – the indexes of the node’s parents
nodes_number (int) – the total number of nodes in the dataset
- Returns
The filtering structure for transitions estimation
- Return type
numpy.ndarray
-
static
build_transition_scalar_indexing_structure_for_a_node
(node_states_number: int, parents_vals: List) → numpy.ndarray¶ Builds an indexing structure for the computation of state transitions values.
- Parameters
node_states_number (int) – the node cardinality
parents_vals (List) – the cardinalities of the node’s parents
- Returns
The transition indexing structure
- Return type
numpy.ndArray
-
clear_indexing_filtering_structures
() → None¶ Initialize all the filtering/indexing structures.
-
property
edges
¶
-
fast_init
(node_id: str) → None¶ Initializes all the necessary structures for parameters estimation of the node identified by the label node_id
- Parameters
node_id (string) – the label of the node
-
get_node_indx
(node_id) → int¶
-
get_ordered_by_indx_set_of_parents
(node: str) → Tuple¶ Builds the aggregated structure that holds all the infos relative to the parent set of the node, namely (parents_labels, parents_indexes, parents_cardinalities).
- Parameters
node (string) – the label of the node
- Returns
a tuple containing all the parent set infos
- Return type
Tuple
-
get_parents_by_id
(node_id) → List¶ Returns a list of labels of the parents of the node
node_id
- Parameters
node_id (string) – the node label
- Returns
a List of labels of the parents
- Return type
List
-
get_positional_node_indx
(node_id) → int¶
-
get_states_number
(node_id) → int¶
-
property
nodes
¶
-
property
nodes_indexes
¶
-
property
nodes_values
¶
-
property
p_combs
¶
-
remove_node
(node_id: str) → None¶ Remove the node
node_id
from all the class members. Initialize all the filtering/indexing structures.
-
property
time_filtering
¶
-
property
time_scalar_indexing_strucure
¶
-
property
transition_filtering
¶
-
property
transition_scalar_indexing_structure
¶
classes.parameters_estimator module¶
-
class
classes.parameters_estimator.
ParametersEstimator
(trajectories: classes.trajectory.Trajectory, net_graph: classes.network_graph.NetworkGraph)¶ Bases:
object
Has the task of computing the cims of a particular node given the trajectories and the net structure in the graph
_net_graph
.
- Parameters
trajectories (Trajectory) – the trajectories
net_graph (NetworkGraph) – the net structure
- _single_set_of_cims
the set of cims object that will hold the cims of the node
-
compute_parameters_for_node
(node_id: str) → classes.set_of_cims.SetOfCims¶ Computes the CIMS of the node identified by the label
node_id
.
- Parameters
node_id (string) – the node label
- Returns
A SetOfCims object filled with the computed CIMS
- Return type
SetOfCims
-
static
compute_state_res_time_for_node
(times: numpy.ndarray, trajectory: numpy.ndarray, cols_filter: numpy.ndarray, scalar_indexes_struct: numpy.ndarray, T: numpy.ndarray) → None¶ Computes the state residence times for a node and fills the matrix
T
with the results.
- Parameters
node_indx (int) – the index of the node
times (numpy.array) – the times deltas vector
trajectory (numpy.ndArray) – the trajectory
cols_filter (numpy.array) – the columns filtering structure
scalar_indexes_struct (numpy.array) – the indexing structure
T (numpy.ndArray) – the state residence times vectors
-
static
compute_state_transitions_for_a_node
(node_indx: int, trajectory: numpy.ndarray, cols_filter: numpy.ndarray, scalar_indexing: numpy.ndarray, M: numpy.ndarray) → None¶ Computes the state transitions for a node and fills the matrices
M
with the results.
- Parameters
node_indx (int) – the index of the node
trajectory (numpy.ndArray) – the trajectory
cols_filter (numpy.array) – the columns filtering structure
scalar_indexing (numpy.array) – the indexing structure
M (numpy.ndArray) – the state transitions matrices
-
fast_init
(node_id: str) → None¶ Initializes all the necessary structures for the parameters estimation for the node
node_id
.
- Parameters
node_id (string) – the node label
classes.sample_path module¶
-
class
classes.sample_path.
SamplePath
(importer: classes.abstract_importer.AbstractImporter)¶ Bases:
object
Aggregates all the information about the trajectories, the real structure of the sampled net and the variables' cardinalities. Has the task of creating the objects
Trajectory
and
Structure
that will contain the mentioned data.
- Parameters
importer (AbstractImporter) – the Importer object which contains the imported and processed data
- _trajectories
the
Trajectory
object that will contain all the concatenated trajectories
- _structure
the
Structure
object that will contain all the structural infos about the net
- _total_variables_count
the number of variables in the net
-
build_structure
() → None¶ Builds the
Structure
object that aggregates all the infos about the net.
-
build_trajectories
() → None¶ Builds the Trajectory object that will contain all the trajectories. Clears all the unused dataframes in the
_importer
object.
-
property
has_prior_net_structure
¶
-
property
structure
¶
-
property
total_variables_count
¶
-
property
trajectories
¶
classes.set_of_cims module¶
-
class
classes.set_of_cims.
SetOfCims
(node_id: str, parents_states_number: List, node_states_number: int, p_combs: numpy.ndarray)¶ Bases:
object
Aggregates all the CIMS of the node identified by the label _node_id.
- Parameters
node_id – the node label
parents_states_number (List) – the cardinalities of the parents
node_states_number (int) – the cardinality of the node
p_combs (numpy.ndArray) – the p_comb structure bound to this node
- _state_residence_time
matrix containing all the state residence time vectors for the node
- _transition_matrices
matrix containing all the transition matrices for the node
- _actual_cims
the cims of the node
-
property
actual_cims
¶
-
build_cims
(state_res_times: numpy.ndarray, transition_matrices: numpy.ndarray) → None¶ Build the
ConditionalIntensityMatrix
objects given the state residence times and transitions matrices. Computes the cim coefficients. The class member
_actual_cims
will contain the computed cims.
- Parameters
state_res_times (numpy.ndArray) – the state residence times matrix
transition_matrices (numpy.ndArray) – the transition matrices
-
build_times_and_transitions_structures
() → None¶ Initializes at the correct dimensions the state residence times matrix and the state transition matrices.
-
filter_cims_with_mask
(mask_arr: numpy.ndarray, comb: List) → numpy.ndarray¶ Filters the cims contained in the array
_actual_cims
given the boolean mask
mask_arr
and the index
comb
.
- Parameters
mask_arr (numpy.array) – the boolean mask that indicates which parent to consider
comb (numpy.array) – the state/s of the filtered parents
- Returns
Array of
ConditionalIntensityMatrix
objects
- Return type
numpy.array
-
get_cims_number
()¶
-
property
p_combs
¶
classes.structure module¶
-
class
classes.structure.
Structure
(nodes_labels_list: List, nodes_indexes_arr: numpy.ndarray, nodes_vals_arr: numpy.ndarray, edges_list: List, total_variables_number: int)¶ Bases:
object
Contains all the infos about the network structure (nodes labels, nodes cardinalities, edges, indexes)
- Parameters
nodes_labels_list (List) – the symbolic names of the variables
nodes_indexes_arr (numpy.ndArray) – the indexes of the nodes
nodes_vals_arr (numpy.ndarray) – the cardinalities of the nodes
edges_list (List) – the edges of the network
total_variables_number (int) – the total number of variables in the dataset
-
property
edges
¶
-
get_node_id
(node_indx: int) → str¶ Given the
node_indx
returns the node label.
- Parameters
node_indx (int) – the node index
- Returns
the node label
- Return type
string
-
get_node_indx
(node_id: str) → int¶ Given the
node_id
returns the node index.
- Parameters
node_id (string) – the node label
- Returns
the node index
- Return type
int
-
get_positional_node_indx
(node_id: str) → int¶
-
get_states_number
(node: str) → int¶ Given the node label
node
returns the cardinality of the node.
- Parameters
node (string) – the node label
- Returns
the node cardinality
- Return type
int
-
property
nodes_indexes
¶
-
property
nodes_labels
¶
-
property
nodes_values
¶
-
remove_node
(node_id: str) → None¶ Removes the node
node_id
from all the class members. The class member
_total_variables_number
is not updated, since it refers to the total number of variables in the dataset.
-
property
total_variables_number
¶
classes.structure_estimator module¶
-
class
classes.structure_estimator.
StructureEstimator
(sample_path: classes.sample_path.SamplePath, exp_test_alfa: float, chi_test_alfa: float)¶ Bases:
object
Has the task of estimating the network structure given the trajectories in
sample_path
.
- Parameters
sample_path (SamplePath) – the _sample_path object containing the trajectories and the real structure
exp_test_alfa (float) – the significance level for the exponential Hp test
chi_test_alfa (float) – the significance level for the chi Hp test
- _nodes
the nodes labels
- _nodes_vals
the nodes cardinalities
- _nodes_indxs
the nodes indexes
- _complete_graph
the complete directed graph built using the nodes labels in
_nodes
- _cache
the Cache object
-
adjacency_matrix
() → numpy.ndarray¶ Converts the estimated structure
_complete_graph
to a boolean adjacency matrix representation.
- Returns
The adjacency matrix of the graph
_complete_graph
- Return type
numpy.ndarray
-
static
build_complete_graph
(node_ids: List) → networkx.classes.digraph.DiGraph¶ Builds a complete directed graph (no self loops) given the nodes labels in the list
node_ids
.
- Parameters
node_ids (List) – the list of nodes labels
- Returns
a complete Digraph Object
- Return type
networkx.DiGraph
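One way to build such a graph with networkx is sketched below. The function name `complete_digraph` is hypothetical and this is not necessarily how the library builds it. It relies on `itertools.permutations` yielding every ordered pair of distinct labels, which excludes self loops by construction.

```python
import itertools

import networkx as nx


def complete_digraph(node_ids):
    """Hypothetical helper: every ordered pair of distinct nodes becomes an edge."""
    graph = nx.DiGraph()
    graph.add_nodes_from(node_ids)
    # permutations of length 2 never pair a node with itself.
    graph.add_edges_from(itertools.permutations(node_ids, 2))
    return graph


graph = complete_digraph(["X", "Y", "Z"])
```

A complete directed graph over n nodes has n * (n - 1) edges, so three nodes give six edges.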
-
complete_test
(test_parent: str, test_child: str, parent_set: List, child_states_numb: int, tot_vars_count: int) → bool¶ Performs a complete independence test on the directed graphs G1 = {test_child U parent_set} and G2 = {G1 U test_parent} (added as an additional parent of the test_child). Generates all the necessary structures and data to perform the tests.
- Parameters
test_parent (string) – the node label of the test parent
test_child (string) – the node label of the child
parent_set (List) – the common parent set
child_states_numb (int) – the cardinality of the
test_child
tot_vars_count (int) – the total number of variables in the net
- Returns
True iff test_child and test_parent are independent given the sep_set parent_set. False otherwise
- Return type
bool
-
ctpc_algorithm
() → None¶ Compute the CTPC algorithm over the entire net.
-
static
generate_possible_sub_sets_of_size
(u: List, size: int, parent_label: str) → Iterator¶ Creates a list containing all possible subsets of the list
u
of size
size
that do not contain the node identified by
parent_label
.
- Parameters
u (List) – the list of nodes
size (int) – the size of the subsets
parent_label (string) – the node to exclude in the subsets generation
- Returns
an Iterator Object containing a list of lists
- Return type
Iterator
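The subset generation described above maps naturally onto `itertools.combinations`. The sketch below is an illustration under that assumption; the function name `subsets_excluding` is hypothetical, and the library may produce the subsets in a different form or order.

```python
import itertools


def subsets_excluding(u, size, parent_label):
    """Hypothetical helper: lazily yields subsets of `u` without `parent_label`."""
    # Remove the excluded node first, then enumerate fixed-size combinations.
    candidates = [node for node in u if node != parent_label]
    return map(list, itertools.combinations(candidates, size))


pairs = list(subsets_excluding(["X", "Y", "Z"], 2, "Y"))
singles = list(subsets_excluding(["X", "Y", "Z"], 1, "Y"))
```

The result is an iterator, so it has to be consumed (for example with `list`) before the subsets can be inspected or reused.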
-
independence_test
(child_states_numb: int, cim1: classes.conditional_intensity_matrix.ConditionalIntensityMatrix, cim2: classes.conditional_intensity_matrix.ConditionalIntensityMatrix) → bool¶ Computes the actual independence test using two cims. The exponential test is performed first; if the null hypothesis is not rejected, the chi test is performed as well.
- Parameters
child_states_numb (int) – the cardinality of the test child
cim1 (ConditionalIntensityMatrix) – a cim belonging to the graph without test parent
cim2 (ConditionalIntensityMatrix) – a cim belonging to the graph with test parent
- Returns
True iff both tests do NOT reject the null hypothesis of independence. False otherwise.
- Return type
bool
-
one_iteration_of_CTPC_algorithm
(var_id: str) → None¶ Performs an iteration of the CTPC algorithm using the node
var_id
as
test_child
.
- Parameters
var_id (string) – the node label of the test child
-
save_plot_estimated_structure_graph
() → None¶ Plot the estimated structure in a graphical model style. Spurious edges are colored in red.
-
save_results
() → None¶ Save the estimated Structure to a .json file in the path where the data are loaded from. The file is named as the input dataset but the results_ word is appended to the results file.
-
spurious_edges
() → List¶ Returns the spurious edges present in the estimated structure, if a prior net structure is present in
_sample_path.structure
.
- Returns
A list containing the spurious edges
- Return type
List
classes.trajectory module¶
-
class
classes.trajectory.
Trajectory
(list_of_columns: List, original_cols_number: int)¶ Bases:
object
Abstracts the infos about a complete set of trajectories, represented as a numpy array of doubles (the time deltas) and a numpy matrix of ints (the changes of states).
- Parameters
list_of_columns (List) – the list containing the times array and values matrix
original_cols_number (int) – total number of cols in the data
- _actual_trajectory
the trajectory containing also the duplicated/shifted values
- _times
the array containing the time deltas
-
property
complete_trajectory
¶
-
size
()¶
-
property
times
¶
-
property
trajectory
¶