This is a step-by-step guideline for correlated data simulation. More information about the different variable types can be found in the Variable Types vignette. More information about the differences between correlation methods 1 and 2 can be found in the Comparison of Correlation Methods 1 and 2 vignette. Some functions have been modified from the SimMultiCorrData package (Fialkowski 2018).
Obtain the distributional parameters for the desired variables.
For continuous variables simulated from a theoretical distribution, the standardized cumulants can be obtained using SimMultiCorrData::calc_theory. If the goal is to mimic an empirical data set, these values can be found using SimMultiCorrData::calc_moments (using the method of moments) or SimMultiCorrData::calc_fisherk (using Fisher’s k-statistics). If the standardized cumulants are obtained from calc_theory, the user may need to use rounded values as inputs (e.g., skews = round(skews, 8)), for example to ensure that skew is exactly 0 for symmetric distributions. Because calc_theory relies on numerical integration, its results are approximations. Greater accuracy can be achieved by increasing the number of subdivisions (sub) used in the integration process.
For mixture variables, the parameters are specified at the component level by the inputs mix_skews, mix_skurts, mix_fifths, mix_sixths, and mix_Six. The mixing probabilities, means, and standard deviations of the component variables are given by mix_pis, mix_mus, and mix_sigmas.
The means and variances of non-mixture and mixture variables are specified by means and vars. These are at the variable level, i.e., they refer to the continuous non-mixture and mixture variables themselves. The function calc_mixmoments calculates the expected mean, standard deviation, and standardized cumulants for mixture variables based on the component distributions.
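The first two moments of a mixture follow directly from the component parameters, which can be sketched in base R. The helper `mix_mean_sd` below is hypothetical (it is not a package function) and covers only the mean and standard deviation, not the higher standardized cumulants that calc_mixmoments also reports.

```r
# Hypothetical helper (not part of SimCorrMix): mean and SD of a finite
# mixture from the mixing probabilities and the component means and SDs.
mix_mean_sd <- function(pis, mus, sigmas) {
  stopifnot(abs(sum(pis) - 1) < 1e-8)
  m <- sum(pis * mus)                        # E[Y]
  v <- sum(pis * (sigmas^2 + mus^2)) - m^2   # E[Y^2] - E[Y]^2
  c(mean = m, sd = sqrt(v))
}

# Mixture of N(-2, 1) and N(2, 1) with mixing probabilities 0.4 and 0.6
m1 <- mix_mean_sd(c(0.4, 0.6), c(-2, 2), c(1, 1))
print(m1)
```

For this mixture the mean is 0.4 and the standard deviation is 2.2.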
For some sets of cumulants, it is either not possible to find power method constants or the calculated constants do not generate valid power method PDFs. In these situations, adding a value to the sixth cumulant may provide solutions (see find_constants). If simulation results indicate that a continuous variable does not generate a valid PDF, the user can try find_constants with various sixth cumulant correction vectors to determine if a valid PDF can be found. These sixth cumulant corrections are specified in the simulation functions using Six or mix_Six.
Choice of Fleishman (1978)’s or Headrick (2002)’s Method: Using the fifth-order PMT (method = “Polynomial”) allows additional control over the fifth and sixth moments of the generated distribution, improving accuracy. In addition, the range of feasible standardized kurtosis (γ₂) values, given skew (γ₁) and standardized fifth (γ₃) and sixth (γ₄) cumulants, is larger than with the third-order method (method = “Fleishman”). For example, Fleishman’s method cannot be used to generate a non-normal distribution with a ratio of γ₁²/γ₂ > 9/14 (Headrick and Kowalchuk 2007). This eliminates the chi-squared family of distributions, which has a constant ratio of γ₁²/γ₂ = 2/3. The fifth-order method also generates more distributions with valid PDFs. However, if the fifth and sixth cumulants do not exist, the third-order PMT should be used.
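The 9/14 limit can be checked numerically for the chi-squared family. This minimal base R sketch uses the standard closed forms for a chi-squared distribution with df degrees of freedom (skew = sqrt(8/df), standardized kurtosis = 12/df); the variable names are illustrative only.

```r
# Chi-squared(df): skew = sqrt(8 / df), standardized (excess) kurtosis = 12 / df
df    <- c(1, 2, 4, 10, 50)
skew  <- sqrt(8 / df)
skurt <- 12 / df

# Ratio of squared skew to standardized kurtosis; constant across df
ratio <- skew^2 / skurt
print(cbind(df, skew, skurt, ratio))

# TRUE wherever the ratio exceeds the third-order limit of 9/14
exceeds_fleishman_limit <- ratio > 9 / 14
print(exceeds_fleishman_limit)
```

Every ratio equals 2/3, which is above 9/14, so no member of the family is reachable with the third-order method.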
Ordinal variables (r ≥ 2 categories): these are the cumulative marginal probabilities and support values (if desired). The probabilities should be combined into a list of length equal to the number of ordinal variables. The ith element is a vector of the cumulative probabilities defining the marginal distribution of the ith variable. If the variable can take r values, the vector will contain r − 1 probabilities (the rth is assumed to be 1). For binary variables, the user-supplied probability should be the probability of the 1st (lower) support value. This would ordinarily be considered the probability of failure (q), while the probability of the 2nd (upper) support value would be considered the probability of success (p = 1 − q). The support values should be combined into a separate list. The ith element is a vector containing the r ordered support values. If not provided, the default is for the ith element to be the vector 1, ..., r.
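As a small illustration of the input format just described, the sketch below converts category probabilities into the cumulative form and builds the two lists. The three-category probabilities and the `_example` names are made up for illustration.

```r
# Category probabilities for a 3-level ordinal variable (illustrative values)
probs3 <- c(0.2, 0.5, 0.3)
cum3   <- cumsum(probs3)[-length(probs3)]  # r - 1 values; the final 1 is dropped

# Binary variable: supply the probability of the lower support value (q)
q <- 0.3
p_success <- 1 - q                         # probability of the upper support value

# One list element per ordinal variable
marginal_example <- list(cum3, q)
support_example  <- list(1:3, c(0, 1))
print(marginal_example)
```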
Poisson variables: the lambda (mean > 0) values should be given as a vector (see stats::dpois). For zero-inflated Poisson variables, the probability of a structural zero is specified in p_zip (see VGAM::dzipois). The default is p_zip = 0 for all variables. For correlation method 2, the total cumulative probability truncation values are specified in pois_eps, with the default of 0.0001 for all variables. In every input, the parameters for regular Poisson variables should come first, followed by those for zero-inflated Poisson variables. The distribution functions are taken from the VGAM package (Yee 2018).
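To make the role of the structural-zero probability concrete, the following base R sketch computes the probability of a zero and the mean of a zero-inflated Poisson variable. The pmf `dzip_sketch` is a hand-rolled stand-in written for this illustration; it is not VGAM::dzipois, and the `_ex` names are assumptions.

```r
lam_ex   <- 0.5   # Poisson mean
p_zip_ex <- 0.1   # probability of a structural zero

# Hand-rolled zero-inflated Poisson pmf (illustration only)
dzip_sketch <- function(x, lambda, p) {
  ifelse(x == 0, p, 0) + (1 - p) * dpois(x, lambda)
}

p0_zip   <- dzip_sketch(0, lam_ex, p_zip_ex)  # structural plus sampling zeros
mean_zip <- (1 - p_zip_ex) * lam_ex           # mean shrinks by the factor (1 - p)
mass_zip <- sum(dzip_sketch(0:100, lam_ex, p_zip_ex))

print(c(P0 = p0_zip, mean = mean_zip, total_mass = mass_zip))
```

With lambda = 0.5 and a structural-zero probability of 0.1, the probability of a zero is about 0.6459 and the mean is 0.45.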
Negative Binomial variables: the sizes (target number of successes) and either the success probabilities or the means should be given as vectors (see stats::dnbinom). The variable represents the number of failures that occur in a sequence of Bernoulli trials before the target number of successes is achieved. For zero-inflated NB variables, the probability of a structural zero is specified in p_zinb (see VGAM::dzinegbin). The default is p_zinb = 0 for all variables. For correlation method 2, the total cumulative probability truncation values are specified in nb_eps, with the default of 0.0001 for all variables. In every input, the parameters for regular NB variables should come first, followed by those for zero-inflated NB variables. The distribution functions are taken from the VGAM package (Yee 2018).
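The two Negative Binomial parameterizations (success probability or mean) describe the same distribution, which the base R sketch below verifies with stats::dnbinom. It also computes the zero probability, mean, and variance of a zero-inflated NB variable from the standard zero-inflation formulas; the `_ex` names are illustrative and nothing here calls VGAM.

```r
size_ex   <- 2
prob_ex   <- 0.75
p_zinb_ex <- 0.2

# Mean implied by size and success probability
mu_ex <- size_ex * (1 - prob_ex) / prob_ex

# Both parameterizations give the same pmf
same_pmf <- all.equal(dnbinom(0:20, size = size_ex, prob = prob_ex),
                      dnbinom(0:20, size = size_ex, mu = mu_ex))

# Zero-inflated NB: P(0), mean, and variance
p0_zinb   <- p_zinb_ex + (1 - p_zinb_ex) * dnbinom(0, size = size_ex, prob = prob_ex)
mean_zinb <- (1 - p_zinb_ex) * mu_ex
var_zinb  <- (1 - p_zinb_ex) * mu_ex * (1 + mu_ex / size_ex + p_zinb_ex * mu_ex)

print(c(mu = mu_ex, P0 = p0_zinb, mean = mean_zinb, var = var_zinb))
```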
Check that all parameter inputs have the correct format using validpar. To decrease simulation time, the correlation validation and simulation functions do not perform these checks themselves.
If continuous variables are desired, verify that the standardized kurtoses are greater than the lower skurtosis bounds. These bounds can be calculated using SimMultiCorrData::calc_lower_skurt, given the skewness (for method = “Fleishman”) and standardized fifth and sixth cumulants (for method = “Polynomial”) for each variable. Different seeds should be examined to see if a lower boundary can be found. If a lower bound produces power method constants that yield an invalid PDF, the user may wish to provide a Skurt vector of kurtosis corrections. In this case, calc_lower_skurt will attempt to find the smallest value that produces a kurtosis which yields a valid power method PDF. In addition, if method = “Polynomial”, a sixth cumulant correction vector (Six) may be used to facilitate convergence of the root-solving algorithm. Since this step can take considerable computation time, the user may instead wish to perform this check after simulation if any of the variables have invalid power method PDFs.
Check if the target correlation matrix rho falls within the feasible correlation bounds, given the parameters for the desired distributions. The ordering of the variables in rho must be 1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixture variables, 4th regular Poisson, 5th zero-inflated Poisson, 6th regular NB, and 7th zero-inflated NB. These bounds can be calculated using either validcorr (correlation method 1) or validcorr2 (correlation method 2). Note that falling within these bounds does not guarantee that the target correlation can be achieved. However, the check can alert the user to pairwise correlations that obviously fall outside the bounds.
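The idea of a feasible correlation range for a pair of marginals can be illustrated with a general sorting approach: correlate two independently generated samples after sorting them in the same order (approximate upper bound) and in opposite orders (approximate lower bound). The base R sketch below applies this to a binary variable and a Logistic(0, 1) variable. It is an illustration of the concept under that assumption and is not claimed to reproduce the internals of validcorr or validcorr2.

```r
set.seed(276)
n_ex <- 10000

# Independent draws from the two marginals
x <- rbinom(n_ex, 1, 0.7)   # binary with P(0) = 0.3
y <- rlogis(n_ex, 0, 1)     # Logistic(0, 1)

# Sorting in the same order gives the largest attainable sample correlation;
# sorting in opposite orders gives the smallest.
upper <- cor(sort(x), sort(y))
lower <- cor(sort(x), sort(y, decreasing = TRUE))

target_ex <- 0.35
in_bounds <- target_ex > lower && target_ex < upper
print(c(lower = lower, upper = upper))
print(in_bounds)
```

Because one marginal is binary, the bounds are strictly inside (-1, 1), which is why a check of this kind is worthwhile before simulating.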
Generate the variables using either correlation method 1 (corrvar) or correlation method 2 (corrvar2). The user may want to try both to see which gives a better approximation to the variables and correlation matrix. The accuracy and simulation time will vary by situation. In addition, the error loop can minimize the correlation errors in most situations. See the Error Loop Algorithm vignette for details about the error loop.
Summarize the results numerically. The functions corrvar and corrvar2 do not provide variable or correlation summaries in order to decrease simulation time. These can be obtained using summary_var, which gives summaries by variable type, the final correlation matrix, and the maximum error between the final and target correlation matrices. Additional summary functions include: SimMultiCorrData::sim_cdf_prob (to calculate a cumulative probability up to a given continuous y value), SimMultiCorrData::power_norm_corr (to calculate the correlation between a continuous variable and the generating standard normal variable), and SimMultiCorrData::stats_pdf (to calculate the 100α% symmetric trimmed-mean, median, mode, and maximum height of a valid power method PDF).
Summarize the results graphically. Comparing the simulated data to the target distribution demonstrates simulation accuracy. The graphing functions provided in this package and the SimMultiCorrData package can be used to display simulated data values, PDFs, or CDFs. The target distributions (either by theoretical distribution name or given an empirical data set) can be added to the data value or PDF plots. Cumulative probabilities can be added to the CDF plots (for continuous variables).
The following examples demonstrate the use of the corrvar and corrvar2 functions to simulate the following correlated variables (n = 10,000):
Ordinal variable: O1 is binary with p = 0.3.
Continuous non-mixture variables: C1 and C2 have a Logistic(0, 1) distribution.
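Continuous mixture variables: M1 is a mixture of Normal(-2, 1) and Normal(2, 1) with mixing probabilities 0.4 and 0.6. M2 is a mixture of Logistic(0, 1), Chisq(4), and Beta(4, 1.5) with mixing probabilities 0.3, 0.2, and 0.5.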
Poisson variable: P1 with λ = 0.5 is a zero-inflated Poisson variable with the probability of a structural zero set at 0.1.
Negative Binomial variable: NB1 with size = 2 and μ = 2/3 is a zero-inflated NB variable with the probability of a structural zero set at 0.2.
Headrick (2002)’s fifth-order transformation (method = “Polynomial”) is used for the continuous variables.
The target pairwise correlation is set at 0.35 between O1, C1, C2, M11, M12, M21, M22, M23, P1, and NB1. The correlation between the components of the same mixture variable (i.e., M11 and M12) is set at 0. Therefore, the correlation is controlled at the component level for the mixture variables. However, the expected correlations for the mixture variables can be approximated (see the Expected Cumulants and Correlations for Continuous Mixture Variables vignette).
library("SimCorrMix")
library("printr")
options(scipen = 999)
seed <- 276
n <- 10000
# Continuous variables
L <- calc_theory("Logistic", c(0, 1))
C <- calc_theory("Chisq", 4)
B <- calc_theory("Beta", c(4, 1.5))
# Non-mixture variables
skews <- rep(L[3], 2)
skurts <- rep(L[4], 2)
fifths <- rep(L[5], 2)
sixths <- rep(L[6], 2)
Six <- list(1.75, 1.75)
# Mixture variables
mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5))
mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1]))
mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2]))
mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3]))
mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4]))
mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5]))
mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6]))
mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03))
Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]],
mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]])
Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]],
mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]])
means <- c(L[1], L[1], Nstcum[1], Mstcum[1])
vars <- c(L[2]^2, L[2]^2, Nstcum[2]^2, Mstcum[2]^2)
marginal <- list(0.3)
support <- list(c(0, 1))
lam <- 0.5
p_zip <- 0.1
size <- 2
prob <- 0.75
mu <- size * (1 - prob)/prob
p_zinb <- 0.2
k_cat <- length(marginal)
k_cont <- length(Six)
k_mix <- length(mix_pis)
k_comp <- sum(unlist(lapply(mix_pis, length)))
k_pois <- length(lam)
k_nb <- length(size)
k_total <- k_cat + k_cont + k_comp + k_pois + k_nb
Rey <- matrix(0.35, k_total, k_total)
diag(Rey) <- 1
rownames(Rey) <- colnames(Rey) <- c("O1", "C1", "C2", "M1_1", "M1_2", "M2_1",
"M2_2", "M2_3", "P1", "NB1")
Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0
Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <-
Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0
validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial",
means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus,
mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six,
marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey)
## [1] TRUE
Since calculating the lower skurtosis bounds takes considerable computation time, the user may wish to do so after simulation, and only if any of the simulated continuous variables have invalid PDFs. The calculation is demonstrated here for the Chisq(4) distribution using both the third-order and fifth-order PMTs for comparison.
Using Fleishman (1978)’s third-order method:
Lower_third <- calc_lower_skurt(method = "Fleishman", skews = C[3],
Skurt = seq(1.161, 1.17, 0.001), seed = 104)
knitr::kable(Lower_third$Min[, c("skew", "valid.pdf", "skurtosis")],
row.names = FALSE, caption = "Third-Order Lower Skurtosis Bound")
skew | valid.pdf | skurtosis |
---|---|---|
1.414214 | TRUE | 3.141272 |
The original lower skurtosis boundary (see Lower_third$Invalid.C) of 1.979272 has been increased to 3.141272, so the skurtosis correction is 1.162. The skurtosis of the Chisq(4) distribution (3) is lower than this boundary, so the third-order PMT should not be used to simulate this variable.
Using Headrick (2002)’s fifth-order method:
Lower_fifth <- calc_lower_skurt(method = "Polynomial", skews = C[3],
fifths = C[5], sixths = C[6], Skurt = seq(0.022, 0.03, 0.001), seed = 104)
knitr::kable(Lower_fifth$Min[, c("skew", "fifth", "sixth", "valid.pdf",
"skurtosis")], row.names = FALSE,
caption = "Fifth-Order Lower Skurtosis Bound")
skew | fifth | sixth | valid.pdf | skurtosis |
---|---|---|---|---|
1.414214 | 8.485281 | 30 | TRUE | 2.998959 |
The original lower skurtosis boundary (see Lower_fifth$Invalid.C) of 2.975959 has been increased to 2.998959, so the skurtosis correction is 0.023. The skurtosis of the Chisq(4) distribution (3) is approximately equal to this boundary. This does not cause a problem since the simulated variable has a valid power method PDF.
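The comparison in the two cases above is simple arithmetic, reproduced here in base R using the boundary values and corrections reported in the tables. The variable names are illustrative.

```r
# Standardized kurtosis of Chisq(4): 12 / df
chisq4_skurt <- 12 / 4

# Original boundary plus the smallest correction that gave a valid PDF
bound_third <- 1.979272 + 1.162   # third-order (Fleishman) method
bound_fifth <- 2.975959 + 0.023   # fifth-order (Polynomial) method

# Is the target skurtosis at or above each corrected lower bound?
ok_third <- chisq4_skurt >= bound_third
ok_fifth <- chisq4_skurt >= bound_fifth
print(c(third = bound_third, fifth = bound_fifth))
print(c(third = ok_third, fifth = ok_fifth))
```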
The remaining steps vary by simulation method:
valid1 <- validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial",
means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus,
mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six,
marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed)
## All correlations are in feasible range!
Sim1 <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb,
"Polynomial", means, vars, skews, skurts, fifths, sixths, Six,
mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob,
mu = NULL, p_zinb, Rey, seed, epsilon = 0.01)
## Total Simulation time: 0.016 minutes
Sum1 <- summary_var(Sim1$Y_cat, Sim1$Y_cont, Sim1$Y_comp, Sim1$Y_mix,
Sim1$Y_pois, Sim1$Y_nb, means, vars, skews, skurts, fifths, sixths,
mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
mix_sixths, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey)
Sim1_error <- abs(Rey - Sum1$rho_calc)
Summary of correlation errors:
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0 | 0.0006508 | 0.0017105 | 0.0069717 | 0.0042267 | 0.056645 |
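A six-number summary like the one above can be produced by applying summary() to the vectorized absolute-error matrix. The sketch below shows the pattern on stand-in data (three correlated normal variables generated with a Cholesky factor), since it does not assume that the simulation objects are available. The names `target_rho`, `sim_rho`, and `err` are illustrative.

```r
set.seed(276)

# Stand-in target matrix: all pairwise correlations equal to 0.35
target_rho <- matrix(0.35, 3, 3)
diag(target_rho) <- 1

# Correlated normal draws: X %*% chol(A) has covariance A
Z <- matrix(rnorm(10000 * 3), ncol = 3) %*% chol(target_rho)
sim_rho <- cor(Z)

# Elementwise absolute errors and their summary
err <- abs(target_rho - sim_rho)
err_summary <- summary(as.numeric(err))
max_err <- max(err)
print(err_summary)
```

The minimum is 0 because the diagonal entries of both matrices equal 1.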
Simulated correlation matrix for O1, C1, C2, M1, M2, P1, and NB1:
rho_mix <- Sum1$rho_mix
rownames(rho_mix) <- c("O1", "C1", "C2", "M1", "M2", "P1", "NB1")
colnames(rho_mix) <- rownames(rho_mix)
rho_mix
Variable | O1 | C1 | C2 | M1 | M2 | P1 | NB1 |
---|---|---|---|---|---|---|---|
O1 | 1.0000000 | 0.3506508 | 0.3514869 | 0.1444575 | 0.1779704 | 0.3022981 | 0.2933550 |
C1 | 0.3506508 | 1.0000000 | 0.3498480 | 0.1584599 | 0.1863418 | 0.3536539 | 0.3517377 |
C2 | 0.3514869 | 0.3498480 | 1.0000000 | 0.1520860 | 0.1917156 | 0.3542267 | 0.3559927 |
M1 | 0.1444575 | 0.1584599 | 0.1520860 | 1.0000000 | 0.0736568 | 0.1686911 | 0.1578840 |
M2 | 0.1779704 | 0.1863418 | 0.1917156 | 0.0736568 | 1.0000000 | 0.1824241 | 0.2059094 |
P1 | 0.3022981 | 0.3536539 | 0.3542267 | 0.1686911 | 0.1824241 | 1.0000000 | 0.3597090 |
NB1 | 0.2933550 | 0.3517377 | 0.3559927 | 0.1578840 | 0.2059094 | 0.3597090 | 1.0000000 |
We can approximate the expected correlations using the formulas in the Expected Cumulants and Correlations for Continuous Mixture Variables vignette and the rho_M1M2 and rho_M1Y functions:
p_M11M21 <- p_M11M22 <- p_M11M23 <- 0.35
p_M12M21 <- p_M12M22 <- p_M12M23 <- 0.35
p_M1M2 <- matrix(c(p_M11M21, p_M11M22, p_M11M23, p_M12M21, p_M12M22, p_M12M23),
2, 3, byrow = TRUE)
rhoM1M2 <- rho_M1M2(mix_pis, mix_mus, mix_sigmas, p_M1M2)
p_M11C1 <- p_M12C1 <- 0.35
p_M1C1 <- c(p_M11C1, p_M12C1)
rho_M1C1 <- rho_M1Y(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], p_M1C1)
p_M21C1 <- p_M22C1 <- p_M23C1 <- 0.35
p_M2C1 <- c(p_M21C1, p_M22C1, p_M23C1)
rho_M2C1 <- rho_M1Y(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], p_M2C1)
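The expected correlations can also be derived by hand from the component-level correlations. The base R sketch below does this, assuming that component membership is selected independently of the other variables. Under that assumption the covariance of a mixture with another variable is the probability-weighted sum of the component covariances. The helper functions are hypothetical and are not the package's rho_M1Y or rho_M1M2.

```r
# Hypothetical helpers (not the package functions)
mix_sd <- function(pis, mus, sigmas) {
  m <- sum(pis * mus)
  sqrt(sum(pis * (sigmas^2 + mus^2)) - m^2)
}
# Correlation between a mixture and another variable Y;
# p_comp_y[i] is the correlation between component i and Y
exp_rho_mix_y <- function(pis, mus, sigmas, p_comp_y) {
  sum(pis * sigmas * p_comp_y) / mix_sd(pis, mus, sigmas)
}
# Correlation between two mixtures;
# p_comp[i, j] is the correlation between component i of M1 and component j of M2
exp_rho_mix_mix <- function(pis1, mus1, sig1, pis2, mus2, sig2, p_comp) {
  num <- sum(outer(pis1 * sig1, pis2 * sig2) * p_comp)
  num / (mix_sd(pis1, mus1, sig1) * mix_sd(pis2, mus2, sig2))
}

# M1: Normal(-2, 1) and Normal(2, 1)
pis1 <- c(0.4, 0.6); mus1 <- c(-2, 2); sig1 <- c(1, 1)
# M2: Logistic(0, 1), Chisq(4), Beta(4, 1.5), using closed-form means and SDs
pis2 <- c(0.3, 0.2, 0.5)
mus2 <- c(0, 4, 4 / 5.5)
sig2 <- c(pi / sqrt(3), sqrt(8), sqrt(4 * 1.5 / (5.5^2 * 6.5)))

r_M1C1 <- exp_rho_mix_y(pis1, mus1, sig1, rep(0.35, 2))
r_M2C1 <- exp_rho_mix_y(pis2, mus2, sig2, rep(0.35, 3))
r_M1M2 <- exp_rho_mix_mix(pis1, mus1, sig1, pis2, mus2, sig2,
                          matrix(0.35, 2, 3))
print(round(c(M1C1 = r_M1C1, M2C1 = r_M2C1, M1M2 = r_M1M2), 4))
```

These values explain why the simulated correlations involving M1 and M2 are well below the component-level target of 0.35.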
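Do all continuous variables have valid PDFs? The second line of output lists the sixth cumulant correction used for each variable (NA where none was supplied).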
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
## [1] 1.75 1.75 NA NA 1.75 NA 0.03
Non-mixture continuous variables and components of mixture variables:
target_sum <- Sum1$target_sum
cont_sum <- Sum1$cont_sum
rownames(target_sum) <- rownames(cont_sum) <- c("C1", "C2", "M1_1", "M1_2",
"M2_1", "M2_2", "M2_3")
knitr::kable(target_sum, digits = 5, row.names = TRUE,
caption = "Summary of Target Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
C1 | 1 | 0.00000 | 3.28987 | 0.00000 | 1.2000 | 0.00000 | 6.85714 |
C2 | 2 | 0.00000 | 3.28987 | 0.00000 | 1.2000 | 0.00000 | 6.85714 |
M1_1 | 3 | -2.00000 | 1.00000 | 0.00000 | 0.0000 | 0.00000 | 0.00000 |
M1_2 | 4 | 2.00000 | 1.00000 | 0.00000 | 0.0000 | 0.00000 | 0.00000 |
M2_1 | 5 | 0.00000 | 3.28987 | 0.00000 | 1.2000 | 0.00000 | 6.85714 |
M2_2 | 6 | 4.00000 | 8.00000 | 1.41421 | 3.0000 | 8.48528 | 30.00000 |
M2_3 | 7 | 0.72727 | 0.03051 | -0.69388 | -0.0686 | 1.82825 | -3.37911 |
knitr::kable(cont_sum[, -c(2, 5:7)], digits = 5, row.names = TRUE,
caption = "Summary of Simulated Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
C1 | 1 | -0.00236 | 1.81353 | -0.07568 | 1.15596 | -0.82258 | 5.31958 |
C2 | 2 | -0.00292 | 1.81711 | -0.12261 | 1.43216 | -2.29544 | 11.90657 |
M1_1 | 3 | -2.00000 | 0.99995 | -0.01327 | 0.00604 | -0.07277 | 0.00823 |
M1_2 | 4 | 2.00000 | 0.99995 | -0.00596 | 0.02387 | -0.07693 | 0.06703 |
M2_1 | 5 | -0.00010 | 1.81918 | -0.05528 | 2.04284 | -7.37085 | 86.32503 |
M2_2 | 6 | 4.00129 | 2.83990 | 1.44906 | 3.30692 | 10.22705 | 36.08930 |
M2_3 | 7 | 0.72736 | 0.17502 | -0.69956 | -0.07740 | 1.84831 | -3.22273 |
Mixture continuous variables:
target_mix <- Sum1$target_mix
mix_sum <- Sum1$mix_sum
rownames(target_mix) <- rownames(mix_sum) <- c("M1", "M2")
knitr::kable(target_mix, digits = 5, row.names = TRUE,
caption = "Summary of Target Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
M1 | 1 | 0.40000 | 2.20000 | -0.28850 | -1.15402 | 1.79302 | 6.17327 |
M2 | 2 | 1.16364 | 2.17086 | 2.01328 | 8.78954 | 36.48103 | 192.72198 |
knitr::kable(mix_sum[, -c(2, 5:7)], digits = 5, row.names = TRUE,
caption = "Summary of Simulated Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
M1 | 1 | 0.40000 | 2.19989 | -0.30464 | -1.13676 | 1.88209 | 5.93124 |
M2 | 2 | 1.16364 | 2.17075 | 1.89196 | 7.56122 | 25.47225 | 98.90376 |
Nplot <- plot_simpdf_theory(sim_y = Sim1$Y_mix[, 1], ylower = -10,
yupper = 10, title = "PDF of Mixture of N(-2, 1) and N(2, 1) Distributions",
fx = function(x) mix_pis[[1]][1] * dnorm(x, mix_mus[[1]][1],
mix_sigmas[[1]][1]) + mix_pis[[1]][2] * dnorm(x, mix_mus[[1]][2],
mix_sigmas[[1]][2]), lower = -Inf, upper = Inf, sim_size = 0.5,
target_size = 0.5)
Nplot
Mplot <- plot_simpdf_theory(sim_y = Sim1$Y_mix[, 2],
title = paste("PDF of Mixture of Logistic(0, 1), Chisq(4),",
"\nand Beta(4, 1.5) Distributions", sep = ""),
fx = function(x) mix_pis[[2]][1] * dlogis(x, 0, 1) + mix_pis[[2]][2] *
dchisq(x, 4) + mix_pis[[2]][3] * dbeta(x, 4, 1.5),
lower = -Inf, upper = Inf, sim_size = 0.5, target_size = 0.5)
Mplot
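A numerical companion to the first plot is sketched below in base R: it checks that the target mixture density integrates to 1 and compares an empirical cumulative probability with its theoretical value. The sample is drawn directly from the mixture as stand-in data rather than taken from the simulation output, and all names are illustrative.

```r
# Target density of the Normal(-2, 1) and Normal(2, 1) mixture (weights 0.4, 0.6)
mix_pdf <- function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1)
total_mass <- integrate(mix_pdf, -Inf, Inf)$value

# Stand-in sample: pick a component for each draw, then sample from it
set.seed(276)
comp  <- sample(1:2, 10000, replace = TRUE, prob = c(0.4, 0.6))
y_mix <- rnorm(10000, mean = c(-2, 2)[comp], sd = 1)

# Empirical and theoretical P(Y <= 0)
emp_cdf0  <- mean(y_mix <= 0)
theo_cdf0 <- 0.4 * pnorm(0, -2, 1) + 0.6 * pnorm(0, 2, 1)
print(c(total_mass = total_mass, empirical = emp_cdf0, theoretical = theo_cdf0))
```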
Row | Distribution | P0 | Exp_P0 | Mean | Exp_Mean | Var | Exp_Var | Skew | Skurtosis |
---|---|---|---|---|---|---|---|---|---|
mean | 1 | 0.644 | 0.6458776 | 0.4509 | 0.45 | 0.4699892 | 0.5277778 | 1.538379 | 2.241735 |
Pplot <- plot_simpdf_theory(sim_y = Sim1$Y_pois[, 1],
title = "PMF of Zero-Inflated Poisson Distribution", Dist = "Poisson",
params = c(lam, p_zip), cont_var = FALSE, col_width = 0.25)
Pplot
Row | Distribution | P0 | Exp_P0 | Prob | Mean | Exp_Mean | Var | Exp_Var | Skew | Skurtosis |
---|---|---|---|---|---|---|---|---|---|---|
mean | 1 | 0.6566 | 0.65 | 0.75 | 0.5285 | 0.5333333 | 0.7771877 | 0.7822222 | 2.023852 | 4.9968 |
For this example, mu is used to describe NB1 instead of prob for demonstration purposes.
pois_eps <- 0.0001
nb_eps <- 0.0001
valid2 <- validcorr2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial",
means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus,
mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal,
lam, p_zip, size, prob = NULL, mu, p_zinb, pois_eps, nb_eps, Rey, seed)
## All correlations are in feasible range!
Sim2 <- corrvar2(n, k_cat, k_cont, k_mix, k_pois, k_nb,
"Polynomial", means, vars, skews, skurts, fifths, sixths, Six,
mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob = NULL, mu,
p_zinb, pois_eps, nb_eps, Rey, seed, epsilon = 0.01)
## Total Simulation time: 0.002 minutes
Sum2 <- summary_var(Sim2$Y_cat, Sim2$Y_cont, Sim2$Y_comp, Sim2$Y_mix,
Sim2$Y_pois, Sim2$Y_nb, means, vars, skews, skurts, fifths, sixths,
mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
mix_sixths, marginal, lam, p_zip, size, prob = NULL, mu, p_zinb, Rey)
Sim2_error <- abs(Rey - Sum2$rho_calc)
Summary of correlation errors:
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0 | 0.0005899 | 0.0023243 | 0.0056261 | 0.0058526 | 0.0415392 |
Simulated correlation matrix for O1, C1, C2, M1, M2, P1, and NB1:
rho_mix <- Sum2$rho_mix
rownames(rho_mix) <- c("O1", "C1", "C2", "M1", "M2", "P1", "NB1")
colnames(rho_mix) <- rownames(rho_mix)
rho_mix
Variable | O1 | C1 | C2 | M1 | M2 | P1 | NB1 |
---|---|---|---|---|---|---|---|
O1 | 1.0000000 | 0.3529766 | 0.3441474 | 0.1655733 | 0.1837398 | 0.3465248 | 0.3418101 |
C1 | 0.3529766 | 1.0000000 | 0.3479055 | 0.1648210 | 0.2019607 | 0.3455603 | 0.3530611 |
C2 | 0.3441474 | 0.3479055 | 1.0000000 | 0.1616851 | 0.1889727 | 0.3451090 | 0.3540683 |
M1 | 0.1655733 | 0.1648210 | 0.1616851 | 1.0000000 | 0.0898465 | 0.1582795 | 0.1636599 |
M2 | 0.1837398 | 0.2019607 | 0.1889727 | 0.0898465 | 1.0000000 | 0.2015298 | 0.2026512 |
P1 | 0.3465248 | 0.3455603 | 0.3451090 | 0.1582795 | 0.2015298 | 1.0000000 | 0.3484443 |
NB1 | 0.3418101 | 0.3530611 | 0.3540683 | 0.1636599 | 0.2026512 | 0.3484443 | 1.0000000 |
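Do all continuous variables have valid PDFs? The second line of output lists the sixth cumulant correction used for each variable (NA where none was supplied).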
## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"
## [1] 1.75 1.75 NA NA 1.75 NA 0.03
Non-mixture continuous variables and components of mixture variables:
target_sum <- Sum2$target_sum
cont_sum <- Sum2$cont_sum
rownames(target_sum) <- rownames(cont_sum) <- c("C1", "C2", "M1_1", "M1_2",
"M2_1", "M2_2", "M2_3")
knitr::kable(target_sum, digits = 5, row.names = TRUE,
caption = "Summary of Target Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
C1 | 1 | 0.00000 | 3.28987 | 0.00000 | 1.2000 | 0.00000 | 6.85714 |
C2 | 2 | 0.00000 | 3.28987 | 0.00000 | 1.2000 | 0.00000 | 6.85714 |
M1_1 | 3 | -2.00000 | 1.00000 | 0.00000 | 0.0000 | 0.00000 | 0.00000 |
M1_2 | 4 | 2.00000 | 1.00000 | 0.00000 | 0.0000 | 0.00000 | 0.00000 |
M2_1 | 5 | 0.00000 | 3.28987 | 0.00000 | 1.2000 | 0.00000 | 6.85714 |
M2_2 | 6 | 4.00000 | 8.00000 | 1.41421 | 3.0000 | 8.48528 | 30.00000 |
M2_3 | 7 | 0.72727 | 0.03051 | -0.69388 | -0.0686 | 1.82825 | -3.37911 |
knitr::kable(cont_sum[, -c(2, 5:7)], digits = 5, row.names = TRUE,
caption = "Summary of Simulated Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
C1 | 1 | -0.00105 | 1.81604 | -0.05534 | 1.32078 | -1.17866 | 10.75831 |
C2 | 2 | 0.00306 | 1.81411 | 0.05950 | 1.22812 | -0.24923 | 7.56250 |
M1_1 | 3 | -2.00000 | 0.99995 | -0.01369 | 0.03369 | 0.15084 | 0.17987 |
M1_2 | 4 | 2.00000 | 0.99995 | 0.00329 | -0.00220 | 0.00322 | 0.04448 |
M2_1 | 5 | -0.00184 | 1.81394 | -0.05008 | 1.15472 | -0.12668 | 4.51961 |
M2_2 | 6 | 3.99969 | 2.83353 | 1.44988 | 3.20046 | 9.24133 | 30.55547 |
M2_3 | 7 | 0.72723 | 0.17470 | -0.67999 | -0.07132 | 1.83467 | -3.15223 |
Mixture continuous variables:
target_mix <- Sum2$target_mix
mix_sum <- Sum2$mix_sum
rownames(target_mix) <- rownames(mix_sum) <- c("M1", "M2")
knitr::kable(target_mix, digits = 5, row.names = TRUE,
caption = "Summary of Target Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
M1 | 1 | 0.40000 | 2.20000 | -0.28850 | -1.15402 | 1.79302 | 6.17327 |
M2 | 2 | 1.16364 | 2.17086 | 2.01328 | 8.78954 | 36.48103 | 192.72198 |
knitr::kable(mix_sum[, -c(2, 5:7)], digits = 5, row.names = TRUE,
caption = "Summary of Simulated Distributions")
Variable | Distribution | Mean | SD | Skew | Skurtosis | Fifth | Sixth |
---|---|---|---|---|---|---|---|
M1 | 1 | 0.40000 | 2.19989 | -0.30415 | -1.13163 | 1.86277 | 5.88804 |
M2 | 2 | 1.16364 | 2.17075 | 2.11372 | 9.86318 | 46.71707 | 273.36693 |
Row | Distribution | P0 | Exp_P0 | Mean | Exp_Mean | Var | Exp_Var | Skew | Skurtosis |
---|---|---|---|---|---|---|---|---|---|
mean | 1 | 0.6423 | 0.6458776 | 0.4535 | 0.45 | 0.4704377 | 0.5277778 | 1.518603 | 2.198245 |
Row | Distribution | P0 | Exp_P0 | Prob | Mean | Exp_Mean | Var | Exp_Var | Skew | Skurtosis |
---|---|---|---|---|---|---|---|---|---|---|
mean | 1 | 0.646 | 0.65 | 0.75 | 0.5349 | 0.5333333 | 0.787782 | 0.7822222 | 2.214351 | 6.927319 |