Title: | Mixture of Multilayer Integrator Stochastic Block Models |
---|---|
Description: | Our approach uses a mixture of multilayer stochastic block models to group co-membership matrices with similar information into components and to partition observations into different clusters. See De Santiago (2023, ISBN: 978-2-87587-088-9). |
Authors: | Kylliann De Santiago [aut, cre], Marie Szafranski [aut], Christophe Ambroise [aut] |
Maintainer: | Kylliann De Santiago <[email protected]> |
License: | GPL-3 |
Version: | 0.0.1.3 |
Built: | 2025-01-28 03:07:58 UTC |
Source: | https://github.com/cran/mimiSBM |
mimiSBM model for fixed K and Q
BayesianMixture_SBM_model( A, K, Q, beta_0 = rep(1/2, K), theta_0 = rep(1/2, Q), eta_0 = array(rep(1/2, K * K * Q), c(K, K, Q)), xi_0 = array(rep(1/2, K * K * Q), c(K, K, Q)), tol = 0.001, iter_max = 10, n_init = 1, alternate = TRUE, Verbose = TRUE, eps_conv = 1e-04, type_init = "SBM", nbCores = 2 )
A | an array of dim=c(N,N,V) |
K | number of clusters |
Q | number of components |
beta_0 | hyperparameters for beta |
theta_0 | hyperparameters for theta |
eta_0 | hyperparameters for eta |
xi_0 | hyperparameters for xi |
tol | convergence tolerance on the ELBO |
iter_max | maximum number of iterations of mimiSBM |
n_init | number of initializations of the mimiSBM algorithm |
alternate | boolean indicating whether an M-step is performed after each part of the E-step (after the u optimization and after the tau optimization). If not, u and tau are both optimized first and the M-step is performed afterwards. |
Verbose | boolean; print information on model fitting |
eps_conv | convergence parameter for tau |
type_init | type of initialization, one of type_init=c("SBM","Kmeans","random") |
nbCores | the number of cores used to parallelize the calculations |
See the vignette for more details.
model with estimation of coefficients.
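The package description mentions co-membership matrices, and every fitting function takes an array A of dim=c(N,N,V). The sketch below shows one way such an array could be assembled from V clustering vectors in base R. The helper name `comembership_array` is hypothetical and not part of mimiSBM, and whether your data should be encoded this way is an assumption to check against the vignette.

```r
# Hypothetical helper (not part of mimiSBM): build an N x N x V array whose
# v-th slice is the co-membership matrix of the v-th clustering vector.
comembership_array <- function(clusterings) {
  N <- length(clusterings[[1]])
  V <- length(clusterings)
  A <- array(0, dim = c(N, N, V))
  for (v in seq_len(V)) {
    z <- clusterings[[v]]
    # Entry (i, j) is 1 when observations i and j share a cluster in view v
    A[, , v] <- outer(z, z, "==") * 1
  }
  A
}

clusterings <- list(c(1, 1, 2, 2), c(1, 2, 2, 1))
A <- comembership_array(clusterings)
dim(A)
```

The diagonal is left at 1 here; the document's diag_nulle function describes setting it to 0 on each slice.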
Clustering Matrix : One hot encoding
CEM(Z)
Z | an N x K matrix whose rows give, for each observation, the probabilities of belonging to each cluster |
Z, an N x K matrix one-hot encoded by rows, where K is the number of clusters.
Z <- matrix(rnorm(12),3,4)
Z_cem <- CEM(Z)
print(Z_cem)
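To make the input/output contract concrete, here is a minimal base-R sketch of a row-wise hard assignment. It assumes that each row is assigned to its largest entry (ties go to the first column); the name `cem_sketch` is hypothetical and this is not the package's implementation.

```r
# Hypothetical sketch: one-hot encode each row at its largest entry.
cem_sketch <- function(Z) {
  idx <- max.col(Z, ties.method = "first")   # column of the row-wise maximum
  out <- matrix(0, nrow(Z), ncol(Z))
  out[cbind(seq_len(nrow(Z)), idx)] <- 1     # set one entry per row to 1
  out
}

Z <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
Z_hard <- cem_sketch(Z)
print(Z_hard)
```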
Set the diagonal coefficients to 0 on each slice along the 3rd dimension.
diag_nulle(A)
A | an array of dim=c(N,N,V) |
A with zeros on the diagonal of each slice along the 3rd dimension.
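The operation described above can be sketched in base R as follows. The function name `zero_diag_sketch` is hypothetical; it only illustrates the documented behaviour and is not the package's code.

```r
# Hypothetical sketch: zero the diagonal of every N x N slice of an array.
zero_diag_sketch <- function(A) {
  for (v in seq_len(dim(A)[3])) {
    slice <- A[, , v]
    diag(slice) <- 0      # remove self-connections in view v
    A[, , v] <- slice
  }
  A
}

A <- array(1, dim = c(3, 3, 2))
A0 <- zero_diag_sketch(A)
A0[, , 1]
```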
SBM on each layer
fit_SBM_per_layer(A, silent = FALSE, ncores = 2)
A | an array of dim=c(N,N,V) |
silent | boolean controlling verbose output |
ncores | the number of cores used to parallelize the calculations of the various SBMs |
a list containing the parameters of each SBM applied to each view
SBM on each layer - parallelized
fit_SBM_per_layer_parallel(A, nbCores = 2)
A | an array of dim=c(N,N,V) |
nbCores | the number of cores used to parallelize the calculations of the various SBMs |
a list containing the parameters of each SBM applied to each view
Initialization of mimiSBM parameters
initialisation_params_bayesian( A, K, Q, beta_0 = rep(1/2, K), theta_0 = rep(1/2, Q), eta_0 = array(rep(1/2, K * K * Q), c(K, K, Q)), xi_0 = array(rep(1/2, K * K * Q), c(K, K, Q)), type_init = "SBM", nbCores = 2 )
A | an array of dim=c(N,N,V) |
K | number of clusters |
Q | number of components |
beta_0 | hyperparameters for beta |
theta_0 | hyperparameters for theta |
eta_0 | hyperparameters for eta |
xi_0 | hyperparameters for xi |
type_init | type of initialization, one of type_init=c("SBM","Kmeans","random") |
nbCores | the number of cores used to parallelize the calculations of the various SBMs |
the updated list params
This function can be used to perturb a clustering vector in order to randomly associate certain individuals with another cluster.
lab_switching(Z, p_out = 0.1)
Z | a clustering vector |
p_out | a probability of perturbation for the clustering |
a perturbed clustering vector
Z <- sample(1:4,100,replace=TRUE)
p = 0.1
Z_pert <- lab_switching(Z,p)
table("Initial clustering" = Z,"Perturbed clustering" = Z_pert)
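For readers who want to see what such a perturbation might look like, here is a minimal base-R sketch. It assumes each individual is selected independently with probability p_out and then moved to a different cluster chosen uniformly among the labels present in Z. That rule, and the name `switch_labels_sketch`, are assumptions for illustration; the package may perturb labels differently.

```r
# Hypothetical sketch: move each label to another cluster with probability p_out.
switch_labels_sketch <- function(Z, p_out = 0.1) {
  labels <- sort(unique(Z))
  flip <- runif(length(Z)) < p_out            # individuals to perturb
  for (i in which(flip)) {
    others <- labels[labels != Z[i]]          # candidate new clusters
    if (length(others) > 0) {
      Z[i] <- others[sample.int(length(others), 1)]
    }
  }
  Z
}

set.seed(1)
Z <- sample(1:4, 100, replace = TRUE)
Z_pert <- switch_labels_sketch(Z, p_out = 0.1)
mean(Z != Z_pert)  # fraction of relabelled individuals
```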
log softmax of matrices (by row)
log_Softmax(log_X)
log_X | a matrix of log(X) |
log_X with the log-softmax function applied to each row
set.seed(42)
X <- matrix(rnorm(15,mean=5),5,3)
log_X <- log(X)
X_softmax <- log_Softmax(log_X)
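A row-wise log-softmax can be sketched in base R as below. The sketch subtracts the row maximum before exponentiating (the usual log-sum-exp stabilisation); that choice and the name `log_softmax_sketch` are assumptions, not a description of the package's internals.

```r
# Hypothetical sketch: numerically stable log-softmax applied to each row.
log_softmax_sketch <- function(log_X) {
  m <- apply(log_X, 1, max)            # row maxima
  shifted <- log_X - m                 # recycling subtracts m[i] from row i
  shifted - log(rowSums(exp(shifted))) # subtract the row-wise log-normaliser
}

X <- matrix(1:6, nrow = 2)
out <- log_softmax_sketch(log(X))
rowSums(exp(out))  # each row of exp(out) is a probability vector
```

Applied to log(X) for a positive matrix X, the result equals log(X / rowSums(X)), which gives a quick way to check the sketch.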
mimiSBM Evidence Lower BOund
Loss_BayesianMSBM(params)
params | a list of parameters of the model |
computation of the mimiSBM ELBO
Create probability-component list for clustering per view component.
Mat_lien_alpha(clusters, K_barre, K)
clusters | list of links between the final clustering and the clustering per view component |
K_barre | number of clusters in the final clustering |
K | vector of size Q indicating the number of clusters in each component |
alpha: probability-component list for clustering per view component.
Model that allows both clustering of individuals and grouping of views by component. This Bayesian model estimates the probability of individuals belonging to each cluster (clusters are shared across all views) and the component membership of each view. In addition, the connectivity tensor between classes, conditional on the components, is also estimated.
mimiSBM( A, Kset, Qset, beta_0 = 1/2, theta_0 = 1/2, eta_0 = 1/2, xi_0 = 1/2, criterion = "ILVB", tol = 0.001, iter_max = 10, n_init = 1, alternate = FALSE, Verbose = FALSE, eps_conv = 1e-04, type_init = "SBM" )
A | an array of dim=c(N,N,V) |
Kset | set of candidate numbers of clusters |
Qset | set of candidate numbers of components |
beta_0 | hyperparameters for beta |
theta_0 | hyperparameters for theta |
eta_0 | hyperparameters for eta |
xi_0 | hyperparameters for xi |
criterion | model selection criterion, one of criterion=c("ILVB","ICL_approx","ICL_variationnel","ICL_exact") |
tol | convergence tolerance on the ELBO |
iter_max | maximum number of iterations of mimiSBM |
n_init | number of initializations of the mimiSBM algorithm |
alternate | boolean indicating whether an M-step is performed after each part of the E-step (after the u optimization and after the tau optimization). If not, u and tau are both optimized first and the M-step is performed afterwards. |
Verbose | boolean; print information on model fitting |
eps_conv | convergence parameter for tau |
type_init | type of initialization, one of type_init=c("SBM","Kmeans","random") |
The best model according to the criterion, and its parameters.
set.seed(42)
K = c(2,3); pi_k = rep(1/4,4) ; rho = rep(1/2,2)
res <- rSMB_partition(N = 50,V = 5,K = K ,pi_k = pi_k ,rho = rho,p_switch = 0.1)
A = res$simulation$A ; Kset = 4 ; Qset = 2
model <- mimiSBM(A,Kset,Qset,n_init = 1, Verbose=FALSE)
Calculation of Log multinomial Beta value.
multinomial_lbeta_function(x)
x | a vector |
sum(lgamma(x[j])) - lgamma(sum(x))
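The formula above is the logarithm of the multivariate Beta function, which is the log normalising constant of a Dirichlet distribution. A direct base-R transcription is sketched below; the name `lbeta_multi_sketch` is hypothetical. For a vector of length two it should agree with base R's lbeta().

```r
# Hypothetical sketch: log multivariate Beta function of a positive vector x.
lbeta_multi_sketch <- function(x) {
  sum(lgamma(x)) - lgamma(sum(x))
}

val <- lbeta_multi_sketch(c(2, 3))
all.equal(val, lbeta(2, 3))  # two-element case matches the ordinary log-Beta
```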
One Hot Encoding with Error machine
one_hot_errormachine(Z, size = NULL)
Z | a vector of size N, where Z[i] indicates the cluster membership of observation i |
size | optional parameter giving the number of classes (avoids problems with empty classes) |
Z, an N x K matrix one-hot encoded by rows, where K is the number of clusters.
Z <- sample(1:4,10,replace=TRUE)
Z_OHE <- one_hot_errormachine(Z)
print(Z_OHE)
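The role of the size argument can be illustrated with a small base-R sketch. It assumes labels are integers in 1..K and that, when size is NULL, K is taken as max(Z); the name `one_hot_sketch` and that default are assumptions, not the package's code.

```r
# Hypothetical sketch: one-hot encode integer labels, optionally fixing K.
one_hot_sketch <- function(Z, size = NULL) {
  K <- if (is.null(size)) max(Z) else size
  out <- matrix(0, length(Z), K)
  out[cbind(seq_along(Z), Z)] <- 1   # row i gets a 1 in column Z[i]
  out
}

Z <- c(1, 3, 3, 1)
Z_ohe <- one_hot_sketch(Z, size = 4)
print(Z_ohe)
```

Fixing size = 4 keeps a column for the empty clusters 2 and 4, so matrices built from different label vectors stay conformable.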
Create a link between final clustering and clustering per view component.
partition_K_barre(K_barre, K)
K_barre | number of clusters in the final clustering |
K | vector of size Q indicating the number of clusters in each component, with K[q] <= K_barre for all q |
cluster: a list of links between the final clustering and the clustering per view component.
A function to plot each adjacency matrix defined by the third dimension of an array, and to plot the sum of all these matrices.
plot_adjacency(A)
A | an array with dim=c(N,N,V) |
None
Simulate data from the mimiSBM generative model.
rMSBM(N, V, alpha_klq, pi_k, rho, sorted = TRUE, p_switch = NULL)
N | number of individuals |
V | number of views |
alpha_klq | array of component-connection probabilities, of dim=c(K,K,Q) |
pi_k | vector of proportions of individuals across clusters |
rho | vector of proportions of views across components |
sorted | boolean for simulation reordering (cluster and component memberships) |
p_switch | probability of label-switching; if NULL, no perturbation between the true clustering and the connectivity of individuals |
list with the parameters of the simulation ($params), and the simulations ($simulation).
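The arguments above suggest a generative scheme that can be sketched in base R: draw a cluster for each individual from pi_k, a component for each view from rho, then draw each edge as a Bernoulli variable with the matching entry of alpha_klq. The sketch below assumes undirected graphs without self-loops and ignores sorted and p_switch; those choices, the function name `simulate_sketch` and the returned field names are assumptions, not the package's implementation.

```r
# Hypothetical sketch of a multilayer SBM mixture simulation.
simulate_sketch <- function(N, V, alpha_klq, pi_k, rho) {
  K <- length(pi_k)
  Q <- length(rho)
  Z <- sample.int(K, N, replace = TRUE, prob = pi_k)  # cluster of each individual
  W <- sample.int(Q, V, replace = TRUE, prob = rho)   # component of each view
  A <- array(0, dim = c(N, N, V))
  for (v in seq_len(V)) {
    alpha_q <- matrix(alpha_klq[, , W[v]], K, K)
    P <- alpha_q[Z, Z, drop = FALSE]                  # N x N edge probabilities
    E <- matrix(rbinom(N * N, 1, P), N, N)
    E[lower.tri(E)] <- t(E)[lower.tri(E)]             # assumed: undirected
    diag(E) <- 0                                      # assumed: no self-loops
    A[, , v] <- E
  }
  list(A = A, Z = Z, W = W)
}

set.seed(42)
alpha <- array(c(0.9, 0.1, 0.1, 0.9,
                 0.2, 0.8, 0.8, 0.2), dim = c(2, 2, 2))
sim <- simulate_sketch(N = 20, V = 4, alpha_klq = alpha,
                       pi_k = c(0.5, 0.5), rho = c(0.5, 0.5))
dim(sim$A)
```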
This simulation process assumes that we have partial information on the clustering within each view component, and that the final clustering of individuals depends on a combination of the clusterings on each of the views. In addition, we take into account possible label-switching: we consider that an individual belongs to the wrong class with a certain probability, which disturbs the adjacency matrices and makes the simulation more realistic and complex.
rSMB_partition(N, V, K, pi_k, rho, sorted = TRUE, p_switch = NULL)
N | number of observations |
V | number of views |
K | vector of size Q indicating the number of clusters in each component |
pi_k | vector of proportions of observations across clusters |
rho | vector of proportions of views across components |
sorted | boolean for simulation reordering (cluster and component memberships) |
p_switch | probability of label-switching; if NULL, no perturbation between the true clustering and the connectivity of individuals |
See the vignette for more information.
list with the parameters of the simulation ($params), and the simulations ($simulation).
Sort the clustering matrix
sort_Z(Z)
Z | an N x K matrix whose rows give, for each observation, the probabilities of belonging to each cluster |
a sorted matrix
Transposition of an array
transpo(A)
A | an array of dim=c(.,.,V) |
A_transposed, the array in which each slice along the third dimension is transposed
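Transposing every slice of a three-dimensional array amounts to swapping its first two dimensions. In base R this can be sketched with aperm(); this is an illustration of the operation, not necessarily how the package implements it.

```r
# Sketch: transpose each slice of a 3-d array by swapping dimensions 1 and 2.
A <- array(1:12, dim = c(2, 3, 2))
A_t <- aperm(A, c(2, 1, 3))

dim(A_t)                          # first two dimensions are swapped
all(A_t[, , 1] == t(A[, , 1]))    # slice-wise check against t()
```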
Upper triangular Matrix/Array
trig_sup(A, transp = FALSE, diag = TRUE)
A | an array or a square matrix |
transp | boolean indicating whether a transposition is needed |
diag | boolean; if TRUE, the diagonal is not used |
an array or a square matrix in which only the upper-triangular coefficients can be non-zero
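The masking described above can be sketched in base R with upper.tri(). The sketch handles only the array case and leaves out the transp option, whose exact effect is not documented here; the names `upper_tri_sketch` and `drop_diag` are hypothetical.

```r
# Hypothetical sketch: keep only upper-triangular entries of each slice.
upper_tri_sketch <- function(A, drop_diag = TRUE) {
  keep <- upper.tri(A[, , 1], diag = !drop_diag)  # logical mask, N x N
  for (v in seq_len(dim(A)[3])) {
    A[, , v] <- A[, , v] * keep                   # zero everything outside the mask
  }
  A
}

A <- array(1, dim = c(3, 3, 2))
U <- upper_tri_sketch(A)
U[, , 1]
```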
Update of bayesian parameter beta
update_beta_bayesian(params)
params | list of parameters of the model |
params with beta updated
Update of bayesian parameter eta
update_eta_bayesian(A, params)
A | an array of dim=c(N,N,V) |
params | list of parameters of the model |
params with eta updated
Update of bayesian parameter tau
update_tau_bayesian(A, params, eps_conv = 1e-04)
A | an array of dim=c(N,N,V) |
params | list of parameters of the model |
eps_conv | convergence parameter |
params with tau updated
Update of bayesian parameter theta
update_theta_bayesian(params)
params | list of parameters of the model |
params with theta updated
Update of bayesian parameter u
update_u_bayesian(A, params)
A | an array of dim=c(N,N,V) |
params | list of parameters of the model |
params with u updated
Update of bayesian parameter xi
update_xi_bayesian(A, params)
A | an array of dim=c(N,N,V) |
params | list of parameters of the model |
params with xi updated
Variational Bayes Expectation Maximization
VBEM_step(A, params, alternate = TRUE, eps_conv = 0.001)
A | an array of dim=c(N,N,V) |
params | list of parameters of the model |
alternate | boolean indicating whether an M-step is performed after each part of the E-step (after the u optimization and after the tau optimization). If not, u and tau are both optimized first and the M-step is performed afterwards. |
eps_conv | convergence parameter for tau |
params with updated parameters.