Title: | Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions |
---|---|
Description: | Generate continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count (regular or zero-inflated, Poisson or Negative Binomial) variables with a specified correlation matrix, or one continuous variable with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets (i.e., plasmodes, as in Vaughan et al., 2009 <DOI:10.1016/j.csda.2008.02.032>). The methods extend those found in the 'SimMultiCorrData' R package. Standard normal variables with an imposed intermediate correlation matrix are transformed to generate the desired distributions. Continuous variables are simulated using either Fleishman (1978)'s third order <DOI:10.1007/BF02293811> or Headrick (2002)'s fifth order <DOI:10.1016/S0167-9473(02)00072-5> polynomial transformation method (the power method transformation, PMT). Non-mixture distributions require the user to specify mean, variance, skewness, standardized kurtosis, and standardized fifth and sixth cumulants. Mixture distributions require these inputs for the component distributions plus the mixing probabilities. Simulation occurs at the component level for continuous mixture distributions. The target correlation matrix is specified in terms of correlations with components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. However, the package provides functions to approximate expected correlations with continuous mixture variables given target correlations with the components. Binary and ordinal variables are simulated using a modification of ordsample() in package 'GenOrd'. Count variables are simulated using the inverse CDF method. There are two simulation pathways which calculate intermediate correlations involving count variables differently. Correlation Method 1 adapts Yahav and Shmueli's 2012 method <DOI:10.1002/asmb.901> and performs best with large count variable means and positive correlations or small means and negative correlations. Correlation Method 2 adapts Barbiero and Ferrari's 2015 modification of the 'GenOrd' package <DOI:10.1002/asmb.2072> and performs best under the opposite scenarios. The optional error loop may be used to improve the accuracy of the final correlation matrix. The package also contains functions to calculate the standardized cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables. |
Authors: | Allison Cynthia Fialkowski |
Maintainer: | Allison Cynthia Fialkowski <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2024-11-03 04:21:01 UTC |
Source: | https://github.com/afialkowski/simcorrmix |
This function uses the method of moments to calculate the expected mean, standard deviation, skewness,
standardized kurtosis, and standardized fifth and sixth cumulants for a continuous mixture variable based on the distributions
of its components. The result can be used as input to find_constants
or for comparison to a
simulated mixture variable from contmixvar1
, corrvar
, or
corrvar2
. See the Expected Cumulants and Correlations for Continuous Mixture Variables vignette
for equations of the cumulants.
calc_mixmoments(mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL, mix_skews = NULL, mix_skurts = NULL, mix_fifths = NULL, mix_sixths = NULL)
calc_mixmoments(mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL, mix_skews = NULL, mix_skurts = NULL, mix_fifths = NULL, mix_sixths = NULL)
mix_pis |
a vector of mixing probabilities that sum to 1 for the component distributions |
mix_mus |
a vector of means for the component distributions |
mix_sigmas |
a vector of standard deviations for the component distributions |
mix_skews |
a vector of skew values for the component distributions |
mix_skurts |
a vector of standardized kurtoses for the component distributions |
mix_fifths |
a vector of standardized fifth cumulants for the component distributions; keep NULL if using |
mix_sixths |
a vector of standardized sixth cumulants for the component distributions; keep NULL if using |
A vector of the mean, standard deviation, skewness, standardized kurtosis, and standardized fifth and sixth cumulants
Please see references for SimCorrMix
.
# Mixture of Normal(-2, 1) and Normal(2, 1) calc_mixmoments(mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0))
# Mixture of Normal(-2, 1) and Normal(2, 1) calc_mixmoments(mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0))
This function simulates one continuous mixture variable. Mixture distributions describe random variables that
are drawn from more than one component distribution. For a random variable from a finite continuous mixture
distribution with
components, the probability density function (PDF) can be described by:
The are mixing parameters which determine the weight of each component distribution
in the overall
probability distribution. As long as each component has a valid PDF, the overall distribution
has a valid PDF.
The main assumption is statistical independence between the process of randomly selecting the component distribution and the
distributions themselves. Each component
is generated using either Fleishman's third-order (
method
= "Fleishman",
doi:10.1007/BF02293811) or Headrick's fifth-order (method
= "Polynomial",
doi:10.1016/S0167-9473(02)00072-5) power method transformation (PMT). It works by matching standardized
cumulants – the first four (mean, variance, skew, and standardized kurtosis) for Fleishman's method, or the first six (mean,
variance, skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's method. The transformation is
expressed as follows:
where and
both equal
for Fleishman's method. The real constants are calculated by
find_constants
. These components are then transformed to the desired mixture variable using a
random multinomial variable generated based on the mixing probabilities. There are no parameter input checks in order to decrease
simulation time. All inputs should be checked prior to simulation with validpar
. Summaries for the
simulation results can be obtained with summary_var
.
Mixture distributions provide a useful way for describing heterogeneity in a population, especially when an outcome is a
composite response from multiple sources. The vignette Variable Types provides more information about simulation of mixture
variables and the required parameters. The vignette Expected Cumulants and Correlations for Continuous Mixture Variables
gives the equations for the expected cumulants of a mixture variable. In addition, Headrick & Kowalchuk (2007,
doi:10.1080/10629360600605065) outlined a general method for comparing a simulated distribution to a given theoretical
distribution
. These steps can be found in the Continuous Mixture Distributions vignette.
contmixvar1(n = 10000, method = c("Fleishman", "Polynomial"), means = 0, vars = 1, mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL, mix_skews = NULL, mix_skurts = NULL, mix_fifths = NULL, mix_sixths = NULL, mix_Six = list(), seed = 1234, cstart = list(), quiet = FALSE)
contmixvar1(n = 10000, method = c("Fleishman", "Polynomial"), means = 0, vars = 1, mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL, mix_skews = NULL, mix_skurts = NULL, mix_fifths = NULL, mix_sixths = NULL, mix_Six = list(), seed = 1234, cstart = list(), quiet = FALSE)
n |
the sample size (i.e. the length of the simulated variable; default = 10000) |
method |
the method used to generate the component variables. "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation. |
means |
mean for the mixture variable (default = 0) |
vars |
variance for the mixture variable (default = 1) |
mix_pis |
a vector of mixing probabilities that sum to 1 for the component distributions |
mix_mus |
a vector of means for the component distributions |
mix_sigmas |
a vector of standard deviations for the component distributions |
mix_skews |
a vector of skew values for the component distributions |
mix_skurts |
a vector of standardized kurtoses for the component distributions |
mix_fifths |
a vector of standardized fifth cumulants for the component distributions; keep NULL if using |
mix_sixths |
a vector of standardized sixth cumulants for the component distributions; keep NULL if using |
mix_Six |
a list of vectors of sixth cumulant correction values for the component distributions of |
seed |
the seed value for random number generation (default = 1234) |
cstart |
a list of length equal to the total number of mixture components containing initial values for root-solving
algorithm used in |
quiet |
if FALSE prints total simulation time |
A list with the following components:
constants
a data.frame of the constants
Y_comp
a data.frame of the components of the mixture variable
Y_mix
a data.frame of the generated mixture variable
sixth_correction
the sixth cumulant correction values for Y_comp
valid.pdf
"TRUE" if constants generate a valid PDF, else "FALSE"
Time
the total simulation time in minutes
1) A check is performed to see if any distributions are repeated within the parameter inputs, i.e. if the mixture variable contains 2 components with the same standardized cumulants. These are noted so that the constants are only calculated once.
2) The constants are calculated for each component variable using find_constants
. If no
solutions are found that generate a valid power method PDF, the function will return constants that produce an invalid PDF
(or a stop error if no solutions can be found). Possible solutions include: 1) changing the seed, or 2) using a mix_Six
list with vectors of sixth cumulant correction values (if method
= "Polynomial"). Errors regarding constant
calculation are the most probable cause of function failure.
3) A matrix X_cont
of dim n x length(mix_pis)
of standard normal variables is generated and singular-value decomposition is done to
remove any correlation. The constants
are applied to X_cont
to create the component variables Y
with the desired distributions.
4) A random multinomial variable M = rmultinom(n, size = 1, prob = mix_pis)
is generated using stats::rmultinom
.
The continuous mixture variable Y_mix
is created from the component variables Y
based on this multinomial variable.
That is, if M[i, k_i] = 1
, then Y_mix[i] = Y[i, k_i]
. A location-scale transformation is done on Y_mix
to give it mean means
and variance vars
.
1) The most likely cause for function errors is that no solutions to fleish
or
poly
converged when using find_constants
. If this happens,
the simulation will stop. It may help to first use find_constants
for each component variable to
determine if a sixth cumulant correction value is needed. The solutions can be used as starting values (see cstart
below).
If the standardized cumulants are obtained from calc_theory
, the user may need to use rounded values as inputs (i.e.
skews = round(skews, 8)
). For example, in order to ensure that skew is exactly 0 for symmetric distributions.
2) The kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated
with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
calc_lower_skurt
to determine the boundary for a given set of cumulants.
See references for SimCorrMix
.
find_constants
, validpar
, summary_var
# Mixture of Normal(-2, 1) and Normal(2, 1) Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) ## Not run: # Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20) Stcum1 <- calc_theory("Beta", c(6, 3)) Stcum2 <- calc_theory("Beta", c(4, 1.5)) Stcum3 <- calc_theory("Beta", c(10, 20)) mix_pis <- c(0.5, 0.2, 0.3) mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1]) mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2]) mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3]) mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4]) mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5]) mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6]) mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03), seq(0.01, 10, 0.01)) Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six) Bsum <- summary_var(Y_comp = Bmix$Y_comp, Y_mix = Bmix$Y_mix, means = means, vars = vars, mix_pis = mix_pis, mix_mus = mix_mus, mix_sigmas = mix_sigmas, mix_skews = mix_skews, mix_skurts = mix_skurts, mix_fifths = mix_fifths, mix_sixths = mix_sixths) ## End(Not run)
# Mixture of Normal(-2, 1) and Normal(2, 1) Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) ## Not run: # Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20) Stcum1 <- calc_theory("Beta", c(6, 3)) Stcum2 <- calc_theory("Beta", c(4, 1.5)) Stcum3 <- calc_theory("Beta", c(10, 20)) mix_pis <- c(0.5, 0.2, 0.3) mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1]) mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2]) mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3]) mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4]) mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5]) mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6]) mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03), seq(0.01, 10, 0.01)) Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six) Bsum <- summary_var(Y_comp = Bmix$Y_comp, Y_mix = Bmix$Y_mix, means = means, vars = vars, mix_pis = mix_pis, mix_mus = mix_mus, mix_sigmas = mix_sigmas, mix_skews = mix_skews, mix_skurts = mix_skurts, mix_fifths = mix_fifths, mix_sixths = mix_sixths) ## End(Not run)
This function attempts to correct the final pairwise correlations of simulated variables to be within epsilon
of the target correlations. It updates the intermediate normal correlation iteratively in a loop until either the maximum error
is less than epsilon or the number of iterations exceeds maxit
. This function would not ordinarily be called directly by
the user. The function is a modification of Barbiero & Ferrari's ordcont
function in
GenOrd-package
. The ordcont
function has been modified in the following ways:
1) It works for continuous, ordinal (r >= 2 categories), and count (regular or zero-inflated, Poisson or Negative Binomial) variables.
2) The initial correlation check has been removed because the intermediate correlation matrix
Sigma
from corrvar
or corrvar2
has already been
checked for positive-definiteness and used to generate variables.
3) Eigenvalue decomposition is done on Sigma
to impose the correct intermediate correlations on the normal variables.
If Sigma
is not positive-definite, the negative eigenvalues are replaced with 0.
4) The final positive-definite check has been removed.
5) The intermediate correlation update function was changed to accommodate more situations.
6) Allowing specifications for the sample size and the seed for reproducibility.
The vignette Variable Types describes the algorithm used in the error loop.
corr_error(n = 10000, k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, constants = NULL, marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, mu = NULL, p_zinb = 0, seed = 1234, epsilon = 0.001, maxit = 1000, rho0 = NULL, Sigma = NULL, rho_calc = NULL)
corr_error(n = 10000, k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, constants = NULL, marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, mu = NULL, p_zinb = 0, seed = 1234, epsilon = 0.001, maxit = 1000, rho0 = NULL, Sigma = NULL, rho_calc = NULL)
n |
the sample size |
k_cat |
the number of ordinal (r >= 2 categories) variables |
k_cont |
the number of continuous variables (these may be regular continuous variables or components of continuous mixture variables) |
k_pois |
the number of Poisson (regular or zero-inflated) variables |
k_nb |
the number of Negative Binomial (regular or zero-inflated) variables |
method |
the method used to generate the continuous variables. "Fleishman" uses a third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation. |
means |
a vector of means for the continuous variables |
vars |
a vector of variances for the continuous variables |
constants |
a matrix with |
marginal |
a list of length equal |
support |
a list of length equal |
lam |
a vector of lambda (mean > 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the zero-inflated
Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
mu |
a vector of mean parameters for the NB variables; order the same as in |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
seed |
the seed value for random number generation |
epsilon |
the maximum acceptable error between the final and target pairwise correlation; smaller epsilons take more time |
maxit |
the maximum number of iterations to use to find the intermediate correlation; the
correction loop stops when either the iteration number passes |
rho0 |
the target correlation matrix |
Sigma |
the intermediate correlation matrix previously used in |
rho_calc |
the final correlation matrix calculated in |
A list with the following components:
Sigma
the intermediate MVN correlation matrix resulting from the error loop
rho_calc
the calculated final correlation matrix generated from Sigma
Y_cat
the ordinal variables
Y
the continuous (mean 0, variance 1) variables
Y_cont
the continuous variables with desired mean and variance
Y_pois
the Poisson variables
Y_nb
the Negative Binomial variables
niter
a matrix containing the number of iterations required for each variable pair
Please see references for SimCorrMix
.
This function simulates k_cat
ordinal ( categories),
k_cont
continuous non-mixture,
k_mix
continuous mixture, k_pois
Poisson (regular and zero-inflated), and/or k_nb
Negative Binomial
(regular and zero-inflated) variables with a specified correlation matrix rho
. The variables are generated from
multivariate normal variables with intermediate correlation matrix Sigma
, calculated by intercorr
,
and then transformed. The intermediate correlations involving count variables are determined using correlation method 1.
The ordering of the variables in rho
must be 1st ordinal, 2nd continuous non-mixture,
3rd components of the continuous mixture, 4th regular Poisson, 5th zero-inflated Poisson, 6th regular NB, and 7th zero-inflated NB.
Note that it is possible for k_cat
, k_cont
, k_mix
, k_pois
, and/or k_nb
to be 0.
Simulation occurs at the component-level for continuous mixture distributions. The target correlation matrix is specified in terms of
correlations with components of continuous mixture variables. There are no parameter input checks
in order to decrease simulation time. All inputs should be checked prior to simulation with validpar
and validcorr
. Summaries for the simulation results can be obtained with summary_var
.
All continuous variables are simulated using either Fleishman's third-order (method
= "Fleishman", doi:10.1007/BF02293811) or Headrick's fifth-order
(method
= "Polynomial", doi:10.1016/S0167-9473(02)00072-5) power method transformation. It works by matching standardized
cumulants – the first four (mean, variance, skew, and standardized kurtosis) for Fleishman's method, or the first six (mean,
variance, skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's method. The transformation is
expressed as follows:
where and
both equal
for Fleishman's method. The real constants are calculated by
find_constants
. Continuous mixture variables are generated componentwise and then transformed to
the desired mixture variables based on random multinomial variables generated from the mixing probabilities. Ordinal variables
( categories) are generated by discretizing the standard normal
variables at quantiles. These quantiles are determined by evaluating the inverse standard normal CDF at the cumulative
probabilities defined by each variable's marginal distribution. Count variables are generated using the inverse CDF method. The
CDF of a standard normal variable has a uniform distribution. The appropriate quantile function (F_Y)^(-1) is applied to
this uniform variable with the designated parameters to generate the count variable: Y = (F_Y)^(-1)(Phi(Z)). The Negative
Binomial variable represents the number of failures which occur in a sequence of Bernoulli trials before the target number of
successes is achieved. Zero-inflated Poisson or NB variables are obtained by setting the probability of a structural zero to be
greater than 0. The optional error loop attempts to correct the final pairwise correlations to be within a user-specified
precision value (
epsilon
) of the target correlations.
The vignette Variable Types discusses how each of the different variables are generated and describes the required parameters.
The vignette Overall Workflow for Generation of Correlated Data provides a detailed example discussing the step-by-step simulation process and comparing correlation methods 1 and 2.
corrvar(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL, seed = 1234, errorloop = FALSE, epsilon = 0.001, maxit = 1000, use.nearPD = TRUE, nrand = 1e+05, Sigma = NULL, cstart = list(), quiet = FALSE)
corrvar(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL, seed = 1234, errorloop = FALSE, epsilon = 0.001, maxit = 1000, use.nearPD = TRUE, nrand = 1e+05, Sigma = NULL, cstart = list(), quiet = FALSE)
n |
the sample size (i.e. the length of each simulated variable; default = 10000) |
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables (default = 0) |
k_mix |
the number of continuous mixture variables (default = 0) |
k_pois |
the number of regular Poisson and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular Negative Binomial and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the |
means |
a vector of means for the |
vars |
a vector of variances for the |
skews |
a vector of skewness values for the |
skurts |
a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0)
for the |
fifths |
a vector of standardized fifth cumulants for the |
sixths |
a vector of standardized sixth cumulants for the |
Six |
a list of vectors of sixth cumulant correction values for the |
mix_pis |
a list of length |
mix_mus |
a list of length |
mix_sigmas |
a list of length |
mix_skews |
a list of length |
mix_skurts |
a list of length |
mix_fifths |
a list of length |
mix_sixths |
a list of length |
mix_Six |
a list of length |
marginal |
a list of length equal to |
support |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
seed |
the seed value for random number generation (default = 1234) |
errorloop |
if TRUE, uses |
epsilon |
the maximum acceptable error between the final and target pairwise correlations (default = 0.001)
in the calculation of ordinal intermediate correlations with |
maxit |
the maximum number of iterations to use (default = 1000) in the calculation of ordinal
intermediate correlations with |
use.nearPD |
TRUE to convert the overall intermediate correlation matrix to the nearest positive definite matrix with |
nrand |
the number of random numbers to generate in calculating intermediate correlations with
|
Sigma |
an intermediate correlation matrix to use if the user wants to provide one, else it is calculated within by
|
cstart |
a list of length equal to |
quiet |
if FALSE prints simulation messages, if TRUE suppresses message printing |
A list whose components vary based on the type of simulated variables.
If ordinal variables are produced: Y_cat
the ordinal variables,
If continuous variables are produced:
constants
a data.frame of the constants,
Y_cont
the continuous non-mixture variables,
Y_comp
the components of the continuous mixture variables,
Y_mix
the continuous mixture variables,
sixth_correction
a list of sixth cumulant correction values,
valid.pdf
a vector where the i-th element is "TRUE" if the constants for the i-th continuous variable generate a valid PDF, else "FALSE"
If Poisson variables are produced: Y_pois
the regular and zero-inflated Poisson variables,
If Negative Binomial variables are produced: Y_nb
the regular and zero-inflated Negative Binomial variables,
Additionally, the following elements:
Sigma
the intermediate correlation matrix (after the error loop),
Error_Time
the time in minutes required to use the error loop,
Time
the total simulation time in minutes,
niter
a matrix of the number of iterations used for each variable in the error loop,
The intermediate correlations used in method 1 are more simulation based than those in method 2, which means that accuracy increases with sample size and the number of repetitions. In addition, specifying the seed allows for reproducibility. In addition, method 1 differs from method 2 in the following ways:
1) The intermediate correlation for count variables is based on the method of Yahav & Shmueli (2012, doi:10.1002/asmb.901), which uses a simulation based, logarithmic transformation of the target correlation. This method becomes less accurate as the variable mean gets closer to zero.
2) The ordinal - count variable correlations are based on an extension of the method of Amatya & Demirtas (2015, doi:10.1080/00949655.2014.953534), in which the correlation correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between the count variable and the normal variable used to generate it and a simulated upper bound on the correlation between an ordinal variable and the normal variable used to generate it (see Demirtas & Hedeker, 2011, doi:10.1198/tast.2011.10090).
3) The continuous - count variable correlations are based on an extension of the methods of Amatya & Demirtas (2015) and Demirtas et al. (2012, doi:10.1002/sim.5362), in which the correlation correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between the count variable and the normal variable used to generate it and the power method correlation between the continuous variable and the normal variable used to generate it (see Headrick & Kowalchuk, 2007, doi:10.1080/10629360600605065). The intermediate correlations are the ratio of the target correlations to the correction factor.
Please see the Comparison of Correlation Methods 1 and 2 vignette for more information and a step-by-step overview of the simulation process.
Using the fifth-order approximation allows additional control over the fifth and sixth moments of the generated distribution, improving
accuracy. In addition, the range of feasible standardized kurtosis () values, given skew (
) and standardized fifth (
) and sixth
(
) cumulants, is larger than with Fleishman's method (see
calc_lower_skurt
).
For example, the Fleishman method can not be used to generate a non-normal distribution with a ratio of
(see Headrick & Kowalchuk, 2007). This eliminates the Chi-squared family of distributions, which has
a constant ratio of
. The fifth-order method also generates more distributions with valid PDF's.
However, if the fifth and sixth cumulants are unknown or do not exist, the Fleishman approximation should be used.
1) The most likely cause for function errors is that no solutions to fleish
or
poly
converged when using find_constants
. If this happens,
the simulation will stop. It may help to first use find_constants
for each continuous variable to
determine if a sixth cumulant correction value is needed. The solutions can be used as starting values (see cstart
below).
If the standardized cumulants are obtained from calc_theory
, the user may need to use rounded values as inputs (i.e.
skews = round(skews, 8)
). For example, in order to ensure that skew is exactly 0 for symmetric distributions.
2) The kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated
with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
calc_lower_skurt
to determine the boundary for a given set of cumulants.
3) The feasibility of the final correlation matrix rho
, given the
distribution parameters, should be checked first using validcorr
. This function either checks
if a given rho
is plausible or returns the lower and upper final correlation limits. It should be noted that even if a target
correlation matrix is within the "plausible range," it still may not be possible to achieve the desired matrix. This happens most
frequently when generating ordinal variables or using negative correlations. The error loop frequently fixes these problems.
Please see references for SimCorrMix
.
find_constants
, validpar
, validcorr
,
intercorr
, corr_error
, summary_var
Sim1 <- corrvar(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), support = list(0:2), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) # simulate without the error loop Sim2 <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, epsilon = 0.01) names(Sim2) # simulate with the error loop Sim2_EL <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, errorloop = TRUE, epsilon = 0.01) names(Sim2_EL) ## End(Not run)
Sim1 <- corrvar(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), support = list(0:2), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) # simulate without the error loop Sim2 <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, epsilon = 0.01) names(Sim2) # simulate with the error loop Sim2_EL <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, errorloop = TRUE, epsilon = 0.01) names(Sim2_EL) ## End(Not run)
This function simulates k_cat
ordinal ( categories),
k_cont
continuous non-mixture,
k_mix
continuous mixture, k_pois
Poisson (regular and zero-inflated), and/or k_nb
Negative Binomial
(regular and zero-inflated) variables with a specified correlation matrix rho
. The variables are generated from
multivariate normal variables with intermediate correlation matrix Sigma
, calculated by intercorr2
,
and then transformed. The intermediate correlations involving count variables are determined using correlation method 2.
The ordering of the variables in rho
must be 1st ordinal, 2nd continuous non-mixture,
3rd components of the continuous mixture, 4th regular Poisson, 5th zero-inflated Poisson, 6th regular NB, and 7th zero-inflated NB.
Note that it is possible for k_cat
, k_cont
, k_mix
, k_pois
, and/or k_nb
to be 0.
Simulation occurs at the component-level for continuous mixture distributions. The target correlation matrix is specified in terms of
correlations with components of continuous mixture variables. There are no parameter input checks
in order to decrease simulation time. All inputs should be checked prior to simulation with validpar
and validcorr2
. Summaries for the simulation results can be obtained with summary_var
.
All continuous variables are simulated using either Fleishman's third-order (method
= "Fleishman", doi:10.1007/BF02293811) or Headrick's fifth-order
(method
= "Polynomial", doi:10.1016/S0167-9473(02)00072-5) power method transformation. It works by matching standardized
cumulants – the first four (mean, variance, skew, and standardized kurtosis) for Fleishman's method, or the first six (mean,
variance, skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's method. The transformation is
expressed as follows:
where and
both equal
for Fleishman's method. The real constants are calculated by
find_constants
. Continuous mixture variables are generated componentwise and then transformed to
the desired mixture variables based on random multinomial variables generated from the mixing probabilities. Ordinal variables
( categories) are generated by discretizing the standard normal
variables at quantiles. These quantiles are determined by evaluating the inverse standard normal CDF at the cumulative
probabilities defined by each variable's marginal distribution. Count variables are generated using the inverse CDF method. The
CDF of a standard normal variable has a uniform distribution. The appropriate quantile function (F_Y)^(-1) is applied to
this uniform variable with the designated parameters to generate the count variable: Y = (F_Y)^(-1)(Phi(Z)). The Negative
Binomial variable represents the number of failures which occur in a sequence of Bernoulli trials before the target number of
successes is achieved. Zero-inflated Poisson or NB variables are obtained by setting the probability of a structural zero to be
greater than 0. The optional error loop attempts to correct the final pairwise correlations to be within a user-specified
precision value (
epsilon
) of the target correlations.
The vignette Variable Types discusses how each of the different variables are generated and describes the required parameters.
The vignette Overall Workflow for Generation of Correlated Data provides a detailed example discussing the step-by-step simulation process and comparing correlation methods 1 and 2.
corrvar2(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, seed = 1234, errorloop = FALSE, epsilon = 0.001, maxit = 1000, use.nearPD = TRUE, Sigma = NULL, cstart = list(), quiet = FALSE)
corrvar2(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, seed = 1234, errorloop = FALSE, epsilon = 0.001, maxit = 1000, use.nearPD = TRUE, Sigma = NULL, cstart = list(), quiet = FALSE)
n |
the sample size (i.e. the length of each simulated variable; default = 10000) |
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables (default = 0) |
k_mix |
the number of continuous mixture variables (default = 0) |
k_pois |
the number of regular Poisson and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular Negative Binomial and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the |
means |
a vector of means for the |
vars |
a vector of variances for the |
skews |
a vector of skewness values for the |
skurts |
a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0)
for the |
fifths |
a vector of standardized fifth cumulants for the |
sixths |
a vector of standardized sixth cumulants for the |
Six |
a list of vectors of sixth cumulant correction values for the |
mix_pis |
a list of length |
mix_mus |
a list of length |
mix_sigmas |
a list of length |
mix_skews |
a list of length |
mix_skurts |
a list of length |
mix_fifths |
a list of length |
mix_sixths |
a list of length |
mix_Six |
a list of length |
marginal |
a list of length equal to |
support |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
pois_eps |
a vector of length |
nb_eps |
a vector of length |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
seed |
the seed value for random number generation (default = 1234) |
errorloop |
if TRUE, uses |
epsilon |
the maximum acceptable error between the final and target pairwise correlations (default = 0.001)
in the calculation of ordinal intermediate correlations with |
maxit |
the maximum number of iterations to use (default = 1000) in the calculation of ordinal
intermediate correlations with |
use.nearPD |
TRUE to convert the overall intermediate correlation matrix to the nearest positive definite matrix with |
Sigma |
an intermediate correlation matrix to use if the user wants to provide one, else it is calculated within by
|
cstart |
a list of length equal to |
quiet |
if FALSE prints simulation messages, if TRUE suppresses message printing |
A list whose components vary based on the type of simulated variables.
If ordinal variables are produced: Y_cat
the ordinal variables,
If continuous variables are produced:
constants
a data.frame of the constants,
Y_cont
the continuous non-mixture variables,
Y_comp
the components of the continuous mixture variables,
Y_mix
the continuous mixture variables,
sixth_correction
a list of sixth cumulant correction values,
valid.pdf
a vector where the i-th element is "TRUE" if the constants for the i-th continuous variable generate a valid PDF, else "FALSE"
If Poisson variables are produced: Y_pois
the regular and zero-inflated Poisson variables,
If Negative Binomial variables are produced: Y_nb
the regular and zero-inflated Negative Binomial variables,
Additionally, the following elements:
Sigma
the intermediate correlation matrix (after the error loop),
Error_Time
the time in minutes required to use the error loop,
Time
the total simulation time in minutes,
niter
a matrix of the number of iterations used for each variable in the error loop,
The intermediate correlations used in method 2 are less simulation based than those in method 1, and no seed is needed. Their calculations involve greater utilization of correction loops which make iterative adjustments until a maximum error has been reached (if possible). In addition, method 2 differs from method 1 in the following ways:
1) The intermediate correlations involving count variables are based on the methods of Barbiero & Ferrari (2012,
doi:10.1080/00273171.2012.692630, 2015, doi:10.1002/asmb.2072).
The Poisson or Negative Binomial support is made finite by removing a small user-specified value (i.e. 1e-06) from the total
cumulative probability. This truncation factor may differ for each count variable. The count variables are subsequently
treated as ordinal and intermediate correlations are calculated using the correction loop of
ord_norm
.
2) The continuous - count variable correlations are based on an extension of the method of Demirtas et al. (2012, doi:10.1002/sim.5362), and the count variables are treated as ordinal. The correction factor is the product of the power method correlation between the continuous variable and the normal variable used to generate it (see Headrick & Kowalchuk, 2007, doi:10.1080/10629360600605065) and the point-polyserial correlation between the ordinalized count variable and the normal variable used to generate it (see Olsson et al., 1982, doi:10.1007/BF02294164). The intermediate correlations are the ratio of the target correlations to the correction factor.
Please see the Comparison of Correlation Methods 1 and 2 vignette for more information and a step-by-step overview of the simulation process.
Using the fifth-order approximation allows additional control over the fifth and sixth moments of the generated distribution, improving
accuracy. In addition, the range of feasible standardized kurtosis () values, given skew (
) and standardized fifth (
) and sixth
(
) cumulants, is larger than with Fleishman's method (see
calc_lower_skurt
).
For example, the Fleishman method can not be used to generate a non-normal distribution with a ratio of
(see Headrick & Kowalchuk, 2007). This eliminates the Chi-squared family of distributions, which has
a constant ratio of
. The fifth-order method also generates more distributions with valid PDF's.
However, if the fifth and sixth cumulants are unknown or do not exist, the Fleishman approximation should be used.
1) The most likely cause for function errors is that no solutions to fleish
or
poly
converged when using find_constants
. If this happens,
the simulation will stop. It may help to first use find_constants
for each continuous variable to
determine if a sixth cumulant correction value is needed. The solutions can be used as starting values (see cstart
below).
If the standardized cumulants are obtained from calc_theory
, the user may need to use rounded values as inputs (i.e.
skews = round(skews, 8)
). For example, in order to ensure that skew is exactly 0 for symmetric distributions.
2) The kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated
with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
calc_lower_skurt
to determine the boundary for a given set of cumulants.
3) The feasibility of the final correlation matrix rho
, given the
distribution parameters, should be checked first using validcorr2
. This function either checks
if a given rho
is plausible or returns the lower and upper final correlation limits. It should be noted that even if a target
correlation matrix is within the "plausible range," it still may not be possible to achieve the desired matrix. This happens most
frequently when generating ordinal variables or using negative correlations. The error loop frequently fixes these problems.
Barbiero A & Ferrari PA (2015). Simulation of correlated Poisson variables. Applied Stochastic Models in Business and Industry, 31:669-80. doi:10.1002/asmb.2072.
Barbiero A & Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given
Correlation Matrix and Marginal Distributions. R package version 1.4.0.
https://CRAN.R-project.org/package=GenOrd
Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions. Computers & Mathematics with Applications, 15(10):819-28.
Demirtas H (2006). A method for multivariate ordinal data generation given marginal distributions and correlations. Journal of Statistical
Computation and Simulation, 76(11):1017-1025.
doi:10.1080/10629360600569246.
Demirtas H (2014). Joint Generation of Binary and Nonnormal Continuous Data. Biometrics & Biostatistics, S12.
Demirtas H, Hedeker D, & Mermelstein RJ (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27):3337-3346. doi:10.1002/sim.5362.
Everitt BS (1996). An Introduction to Finite Mixture Distributions. Statistical Methods in Medical Research, 5(2):107-127. doi:10.1177/096228029600500202.
Ferrari PA & Barbiero A (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4): 566-589. doi:10.1080/00273171.2012.692630.
Fialkowski AC (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. https://CRAN.R-project.org/package=SimMultiCorrData.
Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43:521-532. doi:10.1007/BF02293811.
Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi:10.1016/S0167-9473(02)00072-5. (ScienceDirect)
Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1):65-71. doi:10.22237/jmasm/1083370080.
Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi:10.1080/10629360600605065.
Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64:25-35. doi:10.1007/BF02294317.
Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using
Mathematica. Journal of Statistical Software, 19(3):1 - 17.
doi:10.18637/jss.v019.i03.
Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22:329-343.
Ismail N & Zamani H (2013). Estimation of Claim Count Data Using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson Regression Models. Casualty Actuarial Society E-Forum 41(20):1-28.
Lambert D (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 34(1):1-14.
Olsson U, Drasgow F, & Dorans NJ (1982). The Polyserial Correlation Coefficient. Psychometrika, 47(3):337-47. doi:10.1007/BF02294164.
Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.
Schork NJ, Allison DB, & Thiel B (1996). Mixture Distributions in Human Genetics Research. Statistical Methods in Medical Research, 5:155-178. doi:10.1177/096228029600500204.
Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48:465-471. doi:10.1007/BF02293687.
Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.
Zhang X, Mallick H, & Yi N (2016). Zero-Inflated Negative Binomial Regression for Differential Abundance Testing in Microbiome Studies. Journal of Bioinformatics and Genomics 2(2):1-9. doi:10.18454/jbg.2016.2.2.1.
find_constants
, validpar
, validcorr2
,
intercorr2
, corr_error
, summary_var
Sim1 <- corrvar2(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), support = list(0:2), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 pois_eps <- 0.0001 size <- 2 prob <- 0.75 p_zinb <- 0.2 nb_eps <- 0.0001 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed) # simulate without the error loop Sim2 <- corrvar2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed, epsilon = 0.01) names(Sim2) # simulate with the error loop Sim2_EL <- corrvar2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed, errorloop = TRUE, epsilon = 0.01) names(Sim2_EL) ## End(Not run)
Sim1 <- corrvar2(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), support = list(0:2), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 pois_eps <- 0.0001 size <- 2 prob <- 0.75 p_zinb <- 0.2 nb_eps <- 0.0001 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed) # simulate without the error loop Sim2 <- corrvar2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed, epsilon = 0.01) names(Sim2) # simulate with the error loop Sim2_EL <- corrvar2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed, errorloop = TRUE, epsilon = 0.01) names(Sim2_EL) ## End(Not run)
This function calculates a k x k
intermediate matrix of correlations, where k = k_cat + k_cont +
k_pois + k_nb
, to be used in simulating variables with corrvar
. The k_cont
includes regular continuous variables
and components of continuous mixture variables. The ordering of the variables must be
ordinal, continuous non-mixture, components of continuous mixture variables, regular Poisson, zero-inflated Poisson, regular Negative
Binomial (NB), and zero-inflated NB (note that it is possible for k_cat
, k_cont
, k_pois
, and/or k_nb
to be 0).
There are no parameter input checks in order to decrease simulation time. All inputs should be checked prior to simulation with
validpar
. There is a message given if the calculated
intermediate correlation matrix Sigma
is not positive-definite because it may not be possible to find a MVN correlation
matrix that will produce the desired marginal distributions. This function is called by the simulation function
corrvar
, and would only be used separately if the user wants to first find the intermediate correlation matrix.
This matrix Sigma
can be used as an input to corrvar
.
Please see the Comparison of Correlation Methods 1 and 2 vignette for information about calculations by variable pair type and the differences between
this function and intercorr2
.
intercorr(k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), constants = NULL, marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL, seed = 1234, epsilon = 0.001, maxit = 1000, nrand = 1e+05, quiet = FALSE)
intercorr(k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), constants = NULL, marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL, seed = 1234, epsilon = 0.001, maxit = 1000, nrand = 1e+05, quiet = FALSE)
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables and components of continuous mixture variables (default = 0) |
k_pois |
the number of regular and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the |
constants |
a matrix with |
marginal |
a list of length equal to |
support |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
seed |
the seed value for random number generation (default = 1234) |
epsilon |
the maximum acceptable error between the pairwise correlations (default = 0.001)
in the calculation of ordinal intermediate correlations with |
maxit |
the maximum number of iterations to use (default = 1000) in the calculation of ordinal
intermediate correlations with |
nrand |
the number of random numbers to generate in calculating intermediate correlations (default = 10000) |
quiet |
if FALSE prints simulation messages, if TRUE suppresses message printing |
the intermediate MVN correlation matrix
Please see references for SimCorrMix
.
Sigma1 <- intercorr(k_cat = 1, k_cont = 1, method = "Polynomial", constants = matrix(c(0, 1, 0, 0, 0, 0), 1, 6), marginal = list(0.3), support = list(c(0, 1)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 1 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable seed <- 1234 # Mixture of N(-2, 1) and N(2, 1) constants <- rbind(c(0, 1, 0, 0, 0, 0), c(0, 1, 0, 0, 0, 0)) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 2 Rey <- matrix(0.35, 5, 5) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Sigma2 <- intercorr(k_cat, k_cont, k_pois, k_nb, "Polynomial", constants, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) ## End(Not run)
Sigma1 <- intercorr(k_cat = 1, k_cont = 1, method = "Polynomial", constants = matrix(c(0, 1, 0, 0, 0, 0), 1, 6), marginal = list(0.3), support = list(c(0, 1)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 1 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable seed <- 1234 # Mixture of N(-2, 1) and N(2, 1) constants <- rbind(c(0, 1, 0, 0, 0, 0), c(0, 1, 0, 0, 0, 0)) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 2 Rey <- matrix(0.35, 5, 5) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Sigma2 <- intercorr(k_cat, k_cont, k_pois, k_nb, "Polynomial", constants, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) ## End(Not run)
This function calculates the k_cat x k_nb
intermediate matrix of correlations for the k_cat
ordinal ( categories) and
k_nb
Negative Binomial variables required to produce the target correlations in rho_cat_nb
. It extends the method of Amatya & Demirtas (2015, doi:10.1080/00949655.2014.953534)
to ordinal - Negative Binomial pairs and allows for regular or zero-inflated NB variables. Here, the intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable
discretized to produce an ordinal variable Y1, and Z2 is the standard normal variable used to generate a Negative Binomial
variable via the inverse CDF method) is calculated by dividing the target correlation by a correction factor. The
correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between a Negative Binomial variable
and the normal variable used to generate it and a simulated GSC upper bound on the correlation between an ordinal variable and the normal variable used to generate it (see Demirtas & Hedeker, 2011,
doi:10.1198/tast.2011.10090). The function is used in intercorr
and corrvar
.
This function would not ordinarily be called by the user.
intercorr_cat_nb(rho_cat_nb = NULL, marginal = list(), size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
intercorr_cat_nb(rho_cat_nb = NULL, marginal = list(), size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
rho_cat_nb |
a |
marginal |
a list of length equal to |
size |
a vector of size parameters for the Negative Binomial variables (see |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
a k_cat x k_nb
matrix whose rows represent the k_cat
ordinal variables and columns represent the
k_nb
Negative Binomial variables
Please see references for intercorr_cat_pois
.
This function calculates a k_cat x k_pois
intermediate matrix of correlations for the k_cat
ordinal ( categories) and
k_pois
Poisson variables required to produce the target correlations in rho_cat_pois
. It extends the method of Amatya & Demirtas (2015, doi:10.1080/00949655.2014.953534)
to ordinal - Poisson pairs and allows for regular or zero-inflated Poisson variables.
Here, the intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable discretized to produce an
ordinal variable Y1, and Z2 is the standard normal variable used to generate a Poisson variable via the inverse CDF method) is
calculated by dividing the target correlation by a correction factor. The correction factor is the product of the
upper Frechet-Hoeffding bound on the correlation between a Poisson variable and the normal variable used to generate it
and a simulated GSC upper bound on the correlation between an ordinal variable and the normal variable used to generate it (see
Demirtas & Hedeker, 2011, doi:10.1198/tast.2011.10090). The function is used in intercorr
and
corrvar
. This function would not ordinarily be called by the user.
intercorr_cat_pois(rho_cat_pois = NULL, marginal = list(), lam = NULL, p_zip = 0, nrand = 1e+05, seed = 1234)
intercorr_cat_pois(rho_cat_pois = NULL, marginal = list(), lam = NULL, p_zip = 0, nrand = 1e+05, seed = 1234)
rho_cat_pois |
a |
marginal |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
a k_cat x k_pois
matrix whose rows represent the k_cat
ordinal variables and columns represent the k_pois
Poisson variables
Amatya A & Demirtas H (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. Journal of Statistical Computation and Simulation, 85(15):3129-39. doi:10.1080/00949655.2014.953534.
Demirtas H & Hedeker D (2011). A practical way for computing approximate lower and upper correlation bounds. American Statistician, 65(2):104-109. doi:10.1198/tast.2011.10090.
Frechet M (1951). Sur les tableaux de correlation dont les marges sont donnees. Ann. l'Univ. Lyon SectA, 14:53-77.
Hoeffding W. Scale-invariant correlation theory. In: Fisher NI, Sen PK, editors. The collected works of Wassily Hoeffding. New York: Springer-Verlag; 1994. p. 57-107.
Yahav I & Shmueli G (2012). On Generating Multivariate Poisson Data in Management Science Applications. Applied Stochastic Models in Business and Industry, 28(1):91-102. doi:10.1002/asmb.901.
Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.
This function finds the intermediate correlation for standard normal random variables
which are used in Fleishman's third-order (doi:10.1007/BF02293811) or Headrick's fifth-order
(doi:10.1016/S0167-9473(02)00072-5) polynomial transformation method (PMT) using nleqslv
. It is used in
intercorr
and intercorr2
and would not ordinarily be called by the user. The
correlations are found pairwise so that eigen-value or principal components decomposition should be done on the resulting Sigma
matrix. The Comparison of Correlation Methods 1 and 2 vignette contains the equations which were derived by Headrick and Sawilowsky
(doi:10.1007/BF02294317) or Headrick (doi:10.1016/S0167-9473(02)00072-5).
intercorr_cont(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont = NULL)
intercorr_cont(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont = NULL)
method |
the method used to generate the continuous variables. "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation. |
constants |
a matrix with each row a vector of constants c0, c1, c2, c3 (if |
rho_cont |
a matrix of target correlations among continuous variables, does not have to be symmetric |
the intermediate matrix of correlations with the same dimensions as rho_cont
Please see additional references for SimCorrMix
.
Fialkowski AC (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. https://CRAN.R-project.org/package=SimMultiCorrData.
Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi:10.1016/S0167-9473(02)00072-5. (ScienceDirect)
Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi:10.1080/10629360600605065.
Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64:25-35. doi:10.1007/BF02294317.
intercorr
, intercorr2
, nleqslv
This function calculates a k_cont x k_nb
intermediate matrix of correlations for the k_cont
continuous and
k_nb
Negative Binomial variables. It extends the method of Amatya & Demirtas (2015, doi:10.1080/00949655.2014.953534) to
continuous variables generated using Headrick's fifth-order polynomial transformation and regular or zero-inflated NB variables.
Here, the intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable transformed using Headrick's fifth-order
or Fleishman's third-order method to produce a continuous variable Y1, and Z2 is the standard normal variable used to generate a
Negative Binomial variable via the inverse CDF method) is calculated by dividing the target correlation by a correction factor.
The correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between a Negative Binomial variable and
the normal variable used to generate it and the power method correlation (described in Headrick & Kowalchuk, 2007,
doi:10.1080/10629360600605065) between Y1 and Z1. The function is used in intercorr
and
corrvar
. This function would not ordinarily be called by the user.
intercorr_cont_nb(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_nb = NULL, size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
intercorr_cont_nb(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_nb = NULL, size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
method |
the method used to generate the |
constants |
a matrix with |
rho_cont_nb |
a |
size |
a vector of size parameters for the Negative Binomial variables (see |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
a k_cont x k_nb
matrix whose rows represent the k_cont
continuous variables and columns represent the
k_nb
Negative Binomial variables
Please see references for intercorr_cont_pois
.
find_constants
,
intercorr
, corrvar
This function calculates a k_cont x k_nb
intermediate matrix of correlations for the k_cont
continuous and
k_nb
Negative Binomial variables. It extends the methods of Demirtas et al. (2012, doi:10.1002/sim.5362) and
Barbiero & Ferrari (2015, doi:10.1002/asmb.2072) by:
1) including non-normal continuous and regular or zero-inflated Negative Binomial variables
2) allowing the continuous variables to be generated via Fleishman's third-order or Headrick's fifth-order transformation, and
3) since the count variables are treated as ordinal, using the point-polyserial and polyserial correlations to calculate the
intermediate correlations (similar to findintercorr_cont_cat
in SimMultiCorrData
).
Here, the intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable transformed using Headrick's fifth-order
or Fleishman's third-order method to produce a continuous variable Y1, and Z2 is the standard normal variable used to generate a
Negative Binomial variable via the inverse CDF method) is calculated by dividing the target correlation by a correction factor. The
correction factor is the product of the point-polyserial correlation between Y2 and Z2 (described in Olsson et al., 1982,
doi:10.1007/BF02294164) and the power method correlation (described in Headrick & Kowalchuk, 2007, doi:10.1080/10629360600605065)
between Y1 and Z1. After the maximum support value has been found using maxcount_support
, the point-polyserial correlation is given by:
where
Here, is the j-th support
value and
is
. The power method correlation is given by:
where if
method
= "Fleishman". The function is used in
intercorr2
and corrvar2
. This function would not ordinarily be called by the user.
intercorr_cont_nb2(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_nb = NULL, nb_marg = list(), nb_support = list())
intercorr_cont_nb2(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_nb = NULL, nb_marg = list(), nb_support = list())
method |
the method used to generate the |
constants |
a matrix with |
rho_cont_nb |
a |
nb_marg |
a list of length equal to |
nb_support |
a list of length equal to |
a k_cont x k_nb
matrix whose rows represent the k_cont
continuous variables and columns represent the
k_nb
Negative Binomial variables
Please see references in intercorr_cont_pois2
.
find_constants
, power_norm_corr
,
intercorr2
, corrvar2
This function calculates a k_cont x k_pois
intermediate matrix of correlations for the k_cont
continuous and
k_pois
Poisson variables. It extends the method of Amatya & Demirtas (2015, doi:10.1080/00949655.2014.953534) to continuous
variables generated using Headrick's fifth-order polynomial transformation and zero-inflated Poisson variables. Here, the
intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable transformed using Headrick's fifth-order or
Fleishman's third-order method to produce a continuous variable Y1, and Z2 is the standard normal variable used to generate a
Poisson variable via the inverse CDF method) is calculated by dividing the target correlation by a correction factor. The
correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between a Poisson variable and the
normal variable used to generate it and the power method correlation (described in Headrick & Kowalchuk, 2007,
doi:10.1080/10629360600605065) between Y1 and Z1. The function is used in intercorr
and
corrvar
. This function would not ordinarily be called by the user.
intercorr_cont_pois(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_pois = NULL, lam = NULL, p_zip = 0, nrand = 1e+05, seed = 1234)
intercorr_cont_pois(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_pois = NULL, lam = NULL, p_zip = 0, nrand = 1e+05, seed = 1234)
method |
the method used to generate the |
constants |
a matrix with |
rho_cont_pois |
a |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
a k_cont x k_pois
matrix whose rows represent the k_cont
continuous variables and columns represent the
k_pois
Poisson variables
Amatya A & Demirtas H (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. Journal of Statistical Computation and Simulation, 85(15):3129-39. doi:10.1080/00949655.2014.953534.
Demirtas H & Hedeker D (2011). A practical way for computing approximate lower and upper correlation bounds. American Statistician, 65(2):104-109. doi:10.1198/tast.2011.10090.
Frechet M (1951). Sur les tableaux de correlation dont les marges sont donnees. Ann. l'Univ. Lyon SectA, 14:53-77.
Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi:10.1080/10629360600605065.
Hoeffding W. Scale-invariant correlation theory. In: Fisher NI, Sen PK, editors. The collected works of Wassily Hoeffding. New York: Springer-Verlag; 1994. p. 57-107.
Yahav I & Shmueli G (2012). On Generating Multivariate Poisson Data in Management Science Applications. Applied Stochastic Models in Business and Industry, 28(1):91-102. doi:10.1002/asmb.901.
Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.
power_norm_corr
, find_constants
,
intercorr
, corrvar
This function calculates a k_cont x k_pois
intermediate matrix of correlations for the k_cont
continuous and
k_pois
Poisson variables. It extends the methods of Demirtas et al. (2012, doi:10.1002/sim.5362) and
Barbiero & Ferrari (2015, doi:10.1002/asmb.2072) by:
1) including non-normal continuous and regular or zero-inflated Poisson variables
2) allowing the continuous variables to be generated via Fleishman's third-order or Headrick's fifth-order transformation, and
3) since the count variables are treated as ordinal, using the point-polyserial and polyserial correlations to calculate the
intermediate correlations (similar to findintercorr_cont_cat
) in SimMultiCorrData
).
Here, the intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable transformed using Headrick's fifth-order
or Fleishman's third-order method to produce a continuous variable Y1, and Z2 is the standard normal variable used to generate a
Poisson variable via the inverse CDF method) is calculated by dividing the target correlation by a correction factor. The
correction factor is the product of the point-polyserial correlation between Y2 and Z2 (described in Olsson et al., 1982,
doi:10.1007/BF02294164) and the power method correlation (described in Headrick & Kowalchuk, 2007, doi:10.1080/10629360600605065)
between Y1 and Z1. After the maximum support value has been found using maxcount_support
, the point-polyserial correlation is given by:
where
Here, is the j-th support
value and
is
. The power method correlation is given by:
where if
method
= "Fleishman". The function is used in
intercorr2
and corrvar2
. This function would not ordinarily be called by the user.
intercorr_cont_pois2(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_pois = NULL, pois_marg = list(), pois_support = list())
intercorr_cont_pois2(method = c("Fleishman", "Polynomial"), constants = NULL, rho_cont_pois = NULL, pois_marg = list(), pois_support = list())
method |
the method used to generate the |
constants |
a matrix with |
rho_cont_pois |
a |
pois_marg |
a list of length equal to |
pois_support |
a list of length equal to |
a k_cont x k_pois
matrix whose rows represent the k_cont
continuous variables and columns represent the
k_pois
Poisson variables
Please see additional references in intercorr_cont_pois
.
Barbiero A & Ferrari PA (2015). Simulation of correlated Poisson variables. Applied Stochastic Models in Business and Industry, 31:669-80. doi:10.1002/asmb.2072.
find_constants
, power_norm_corr
,
intercorr2
, corrvar2
This function calculates a k_nb x k_nb
intermediate matrix of correlations for the Negative Binomial variables by
extending the method of Yahav & Shmueli (2012, doi:10.1002/asmb.901). The intermediate correlation between Z1 and Z2 (the
standard normal variables used to generate the Negative Binomial variables Y1 and Y2 via the inverse CDF method) is
calculated using a logarithmic transformation of the target correlation. First, the upper and lower Frechet-Hoeffding bounds
(mincor, maxcor) on are simulated. Then the intermediate correlation is found as follows:
where ,
, and
.
The function adapts code from Amatya & Demirtas' (2016) package
PoisNor-package
by:
1) allowing specifications for the number of random variates and the seed for reproducibility
2) providing the following checks: if Sigma_(Z1, Z2)
> 1, Sigma_(Z1, Z2)
is set to 1; if Sigma_(Z1, Z2)
< -1,
Sigma_(Z1, Z2)
is set to -1
3) simulating regular and zero-inflated Negative Binomial variables.
The function is used in intercorr
and corrvar
and would not ordinarily be called by the user.
intercorr_nb(rho_nb = NULL, size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
intercorr_nb(rho_nb = NULL, size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
rho_nb |
a |
size |
a vector of size parameters for the Negative Binomial variables (see |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
the k_nb x k_nb
intermediate correlation matrix for the Negative Binomial variables
Please see references for intercorr_pois
.
intercorr_pois
, intercorr_pois_nb
,
intercorr
, corrvar
This function calculates a k_pois x k_pois
intermediate matrix of correlations for the
Poisson variables using the method of Yahav & Shmueli (2012, doi:10.1002/asmb.901). The intermediate correlation between Z1 and Z2
(the standard normal variables used to generate the Poisson variables Y1 and Y2 via the inverse CDF method) is
calculated using a logarithmic transformation of the target correlation. First, the upper and lower Frechet-Hoeffding bounds
(mincor, maxcor) on are simulated. Then the intermediate correlation is found as follows:
where ,
, and
.
The function adapts code from Amatya & Demirtas' (2016) package
PoisNor-package
by:
1) allowing specifications for the number of random variates and the seed for reproducibility
2) providing the following checks: if Sigma_(Z1, Z2)
> 1, Sigma_(Z1, Z2)
is set to 1; if Sigma_(Z1, Z2)
< -1,
Sigma_(Z1, Z2)
is set to -1
3) simulating regular and zero-inflated Poisson variables.
The function is used in intercorr
and corrvar
and would not ordinarily be called by the user.
intercorr_pois(rho_pois = NULL, lam = NULL, p_zip = 0, nrand = 1e+05, seed = 1234)
intercorr_pois(rho_pois = NULL, lam = NULL, p_zip = 0, nrand = 1e+05, seed = 1234)
rho_pois |
a |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
the k_pois x k_pois
intermediate correlation matrix for the Poisson variables
Amatya A & Demirtas H (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. Journal of Statistical Computation and Simulation, 85(15):3129-39. doi:10.1080/00949655.2014.953534.
Demirtas H & Hedeker D (2011). A practical way for computing approximate lower and upper correlation bounds. American Statistician, 65(2):104-109.
Frechet M (1951). Sur les tableaux de correlation dont les marges sont donnees. Ann. l'Univ. Lyon SectA, 14:53-77.
Hoeffding W. Scale-invariant correlation theory. In: Fisher NI, Sen PK, editors. The collected works of Wassily Hoeffding. New York: Springer-Verlag; 1994. p. 57-107.
Yahav I & Shmueli G (2012). On Generating Multivariate Poisson Data in Management Science Applications. Applied Stochastic Models in Business and Industry, 28(1):91-102. doi:10.1002/asmb.901.
Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.
intercorr_nb
, intercorr_pois_nb
,
intercorr
, corrvar
This function calculates a k_pois x k_nb
intermediate matrix of correlations for the
Poisson and Negative Binomial variables by extending the method of Yahav & Shmueli (2012, doi:10.1002/asmb.901). The intermediate
correlation between Z1 and Z2 (the standard normal variables used to generate the Poisson and Negative Binomial variables Y1 and Y2
via the inverse CDF method) is calculated using a logarithmic transformation of the target correlation. First, the upper and lower
Frechet-Hoeffding bounds (mincor, maxcor) on are simulated. Then the intermediate correlation is found as follows:
where ,
, and
.
The function adapts code from Amatya & Demirtas' (2016) package
PoisNor-package
by:
1) allowing specifications for the number of random variates and the seed for reproducibility
2) providing the following checks: if Sigma_(Z1, Z2)
> 1, Sigma_(Z1, Z2)
is set to 1; if Sigma_(Z1, Z2)
< -1,
Sigma_(Z1, Z2)
is set to -1
3) simulating regular and zero-inflated Poisson and Negative Binomial variables.
The function is used in intercorr
and corrvar
and would not ordinarily be called by the user.
intercorr_pois_nb(rho_pois_nb = NULL, lam = NULL, p_zip = 0, size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
intercorr_pois_nb(rho_pois_nb = NULL, lam = NULL, p_zip = 0, size = NULL, mu = NULL, p_zinb = 0, nrand = 1e+05, seed = 1234)
rho_pois_nb |
a |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
seed |
the seed used in random number generation (default = 1234) |
the k_pois x k_nb
intermediate correlation matrix whose rows represent the k_pois
Poisson variables and
columns represent the k_nb
Negative Binomial variables
Please see references for intercorr_pois
.
intercorr_pois
, intercorr_nb
,
intercorr
, corrvar
This function calculates a k x k
intermediate matrix of correlations, where k = k_cat + k_cont +
k_pois + k_nb
, to be used in simulating variables with corrvar2
. The k_cont
includes regular continuous variables
and components of continuous mixture variables. The ordering of the variables must be
ordinal, continuous non-mixture, components of continuous mixture variables, regular Poisson, zero-inflated Poisson, regular Negative
Binomial (NB), and zero-inflated NB (note that it is possible for k_cat
, k_cont
, k_pois
, and/or k_nb
to be 0).
There are no parameter input checks in order to decrease simulation time. All inputs should be checked prior to simulation with
validpar
. There is a message given if the calculated
intermediate correlation matrix Sigma
is not positive-definite because it may not be possible to find a MVN correlation
matrix that will produce the desired marginal distributions. This function is called by the simulation function
corrvar2
, and would only be used separately if the user wants to first find the intermediate correlation matrix.
This matrix Sigma
can be used as an input to corrvar2
.
Please see the Comparison of Correlation Methods 1 and 2 vignette for information about calculations by variable pair type and the differences between
this function and intercorr
.
intercorr2(k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), constants = NULL, marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, epsilon = 0.001, maxit = 1000, quiet = FALSE)
intercorr2(k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), constants = NULL, marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, epsilon = 0.001, maxit = 1000, quiet = FALSE)
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables and components of continuous mixture variables (default = 0) |
k_pois |
the number of regular and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the |
constants |
a matrix with |
marginal |
a list of length equal to |
support |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
pois_eps |
a vector of length |
nb_eps |
a vector of length |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
epsilon |
the maximum acceptable error between the pairwise correlations (default = 0.001)
in the calculation of ordinal intermediate correlations with |
maxit |
the maximum number of iterations to use (default = 1000) in the calculation of ordinal
intermediate correlations with |
quiet |
if FALSE prints simulation messages, if TRUE suppresses message printing |
the intermediate MVN correlation matrix
Please see references for SimCorrMix
.
Sigma1 <- intercorr2(k_cat = 1, k_cont = 1, method = "Polynomial", constants = matrix(c(0, 1, 0, 0, 0, 0), 1, 6), marginal = list(0.3), support = list(c(0, 1)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 1 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable # The defaults of pois_eps <- nb_eps <- 0.0001 are used. # Mixture of N(-2, 1) and N(2, 1) constants <- rbind(c(0, 1, 0, 0, 0, 0), c(0, 1, 0, 0, 0, 0)) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 2 Rey <- matrix(0.35, 5, 5) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Sigma2 <- intercorr2(k_cat, k_cont, k_pois, k_nb, "Polynomial", constants, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) ## End(Not run)
Sigma1 <- intercorr2(k_cat = 1, k_cont = 1, method = "Polynomial", constants = matrix(c(0, 1, 0, 0, 0, 0), 1, 6), marginal = list(0.3), support = list(c(0, 1)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 1 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable # The defaults of pois_eps <- nb_eps <- 0.0001 are used. # Mixture of N(-2, 1) and N(2, 1) constants <- rbind(c(0, 1, 0, 0, 0, 0), c(0, 1, 0, 0, 0, 0)) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 2 Rey <- matrix(0.35, 5, 5) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Sigma2 <- intercorr2(k_cat, k_cont, k_pois, k_nb, "Polynomial", constants, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) ## End(Not run)
This function calculates the maximum support value for count variables by extending the method of Barbiero &
Ferrari (2015, doi:10.1002/asmb.2072) to include regular and zero-inflated Poisson and Negative Binomial variables. In order for
count variables to be treated as ordinal in the calculation of the intermediate MVN correlation matrix, their infinite support must
be truncated (made finite). This is done by setting the total cumulative probability equal to 1 - a small user-specified value
(pois_eps
or nb_eps
). The maximum support value equals the inverse CDF applied to this result. The truncation values
may differ for each variable. The function is used in intercorr2
and corrvar2
and
would not ordinarily be called by the user.
maxcount_support(k_pois = 0, k_nb = 0, lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = NULL, nb_eps = NULL)
maxcount_support(k_pois = 0, k_nb = 0, lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = NULL, nb_eps = NULL)
k_pois |
the number of Poisson variables |
k_nb |
the number of Negative Binomial variables |
lam |
a vector of lambda (mean > 0) constants for the regular and zero-inflated Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
pois_eps |
a vector of length |
nb_eps |
a vector of length |
a data.frame with k_pois + k_nb
rows; the column names are:
Distribution
Poisson or Negative Binomial
Number
the variable index
Max
the maximum support value
Barbiero A & Ferrari PA (2015). Simulation of correlated Poisson variables. Applied Stochastic Models in Business and Industry, 31:669-80. doi:10.1002/asmb.2072.
This function calculates the correlation of ordinal variables (or variables treated as "ordinal"), with given marginal
distributions, obtained from discretizing standard normal variables with a specified correlation matrix. The function modifies
Barbiero & Ferrari's contord
function in GenOrd-package
. It uses
pmvnorm
function from the mvtnorm package to calculate multivariate normal cumulative probabilities
defined by the normal quantiles obtained at marginal
and the supplied correlation matrix Sigma
. This function is used
within ord_norm
and would not ordinarily be called by the user.
norm_ord(marginal = list(), Sigma = NULL, support = list(), Spearman = FALSE)
norm_ord(marginal = list(), Sigma = NULL, support = list(), Spearman = FALSE)
marginal |
a list of length equal to the number of variables; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1) |
Sigma |
the correlation matrix of the multivariate standard normal variable |
support |
a list of length equal to the number of variables; the i-th element is a vector of containing the r ordered support values; if not provided (i.e. support = list()), the default is for the i-th element to be the vector 1, ..., r |
Spearman |
if TRUE, Spearman's correlations are used (and support is not required); if FALSE (default) Pearson's correlations are used |
the correlation matrix of the ordinal variables
Please see references in ord_norm
.
Alan Genz, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi, Friedrich Leisch, Fabian Scheipl, Torsten Hothorn (2018). mvtnorm: Multivariate Normal and t Distributions. R package version 1.0-8. https://CRAN.R-project.org/package=mvtnorm.
Alan Genz, Frank Bretz (2009), Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195., Springer-Verlag, Heidelberg. ISBN 978-3-642-01688-2.
This function calculates the intermediate MVN correlation needed to generate a variable described by
a discrete marginal distribution and associated finite support. This includes ordinal ( categories) variables
or variables that are treated as ordinal (i.e. count variables in the Barbiero & Ferrari, 2015 method used in
corrvar2
, doi:10.1002/asmb.2072). The function is a modification of Barbiero & Ferrari's
ordcont
function in GenOrd-package
.
It works by setting the intermediate MVN correlation equal to the target correlation and updating each intermediate pairwise
correlation until the final pairwise correlation is within epsilon
of the target correlation or the maximum number of
iterations has been reached. This function uses norm_ord
to calculate the ordinal correlation obtained
from discretizing the normal variables generated from the intermediate correlation matrix. The ordcont
has been modified in the following ways:
1) the initial correlation check has been removed because this is done within the simulation functions
2) the final positive-definite check has been removed
3) the intermediate correlation update function was changed to accommodate more situations
This function would not ordinarily be called by the user. Note that this will return a matrix that is NOT positive-definite
because this is corrected for in the simulation functions corrvar
and corrvar2
using the method of Higham (2002) and the nearPD
function.
ord_norm(marginal = list(), rho = NULL, support = list(), epsilon = 0.001, maxit = 1000, Spearman = FALSE)
ord_norm(marginal = list(), rho = NULL, support = list(), epsilon = 0.001, maxit = 1000, Spearman = FALSE)
marginal |
a list of length equal to the number of variables; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1) |
rho |
the target correlation matrix |
support |
a list of length equal to the number of variables; the i-th element is a vector of containing the r ordered support values; if not provided (i.e. support = list()), the default is for the i-th element to be the vector 1, ..., r |
epsilon |
the maximum acceptable error between the final and target pairwise correlations (default = 0.001); smaller values take more time |
maxit |
the maximum number of iterations to use (default = 1000) to find the intermediate correlation; the
correction loop stops when either the iteration number passes |
Spearman |
if TRUE, Spearman's correlations are used (and support is not required); if FALSE (default) Pearson's correlations are used |
A list with the following components:
SigmaC
the intermediate MVN correlation matrix
rho0
the calculated final correlation matrix generated from SigmaC
rho
the target final correlation matrix
niter
a matrix containing the number of iterations required for each variable pair
maxerr
the maximum final error between the final and target correlation matrices
Barbiero A, Ferrari PA (2015). Simulation of correlated Poisson variables. Applied Stochastic Models in Business and Industry, 31:669-80. doi:10.1002/asmb.2072.
Barbiero A, Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given
Correlation Matrix and Marginal Distributions. R package version 1.4.0.
https://CRAN.R-project.org/package=GenOrd
Ferrari PA, Barbiero A (2012). Simulating ordinal data, Multivariate Behavioral Research, 47(4):566-589. doi:10.1080/00273171.2012.692630.
corrvar
, corrvar2
, norm_ord
,
intercorr
, intercorr2
This plots the PDF of simulated continuous or count (regular or zero-inflated, Poisson or Negative Binomial) data and
overlays the target PDF (if overlay
= TRUE), which is specified by distribution name (plus up to 4 parameters) or PDF
function fx
(plus support bounds). If a continuous target distribution is provided (cont_var = TRUE
), the simulated
data is scaled and then transformed (i.e.
) so that it has the same mean (
) and
variance (
) as the target distribution. The PDF's of continuous variables are shown as lines (using
geom_density
and ggplot2::geom_line
). It works for valid or invalid power method PDF's.
The PMF's of count variables are shown as vertical bar graphs (using ggplot2::geom_col
). The function returns a
ggplot2-package
object so the user can save it or modify it as necessary. The graph parameters
(i.e. title
, sim_color
, sim_lty
, sim_size
, target_color
, target_lty
, target_size
,
legend.position
, legend.justification
, legend.text.size
, title.text.size
,
axis.text.size
, and axis.title.size
) are inputs to the ggplot2-package
functions so information about
valid inputs can be obtained from that package's documentation.
plot_simpdf_theory(sim_y, title = "Simulated Probability Density Function", ylower = NULL, yupper = NULL, sim_color = "dark blue", sim_lty = 1, sim_size = 1, col_width = 0.5, overlay = TRUE, cont_var = TRUE, target_color = "dark green", target_lty = 2, target_size = 1, Dist = c("Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq", "Dagum", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", "Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala", "Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull", "Poisson", "Negative_Binomial"), params = NULL, fx = NULL, lower = NULL, upper = NULL, legend.position = c(0.975, 0.9), legend.justification = c(1, 1), legend.text.size = 10, title.text.size = 15, axis.text.size = 10, axis.title.size = 13)
plot_simpdf_theory(sim_y, title = "Simulated Probability Density Function", ylower = NULL, yupper = NULL, sim_color = "dark blue", sim_lty = 1, sim_size = 1, col_width = 0.5, overlay = TRUE, cont_var = TRUE, target_color = "dark green", target_lty = 2, target_size = 1, Dist = c("Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq", "Dagum", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", "Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala", "Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull", "Poisson", "Negative_Binomial"), params = NULL, fx = NULL, lower = NULL, upper = NULL, legend.position = c(0.975, 0.9), legend.justification = c(1, 1), legend.text.size = 10, title.text.size = 15, axis.text.size = 10, axis.title.size = 13)
sim_y |
a vector of simulated data |
title |
the title for the graph (default = "Simulated Probability Density Function") |
ylower |
the lower y value to use in the plot (default = NULL, uses minimum simulated y value) on the x-axis |
yupper |
the upper y value (default = NULL, uses maximum simulated y value) on the x-axis |
sim_color |
the line color for the simulated PDF (or column fill color in the case of
|
sim_lty |
the line type for the simulated PDF (default = 1, solid line) |
sim_size |
the line width for the simulated PDF |
col_width |
width of column for simulated/target PMF of count variables (default = 0.5) |
overlay |
if TRUE (default), the target distribution is also plotted given either a distribution name (and parameters) or PDF function fx (with bounds = ylower, yupper) |
cont_var |
TRUE (default) for continuous variables, FALSE for count variables |
target_color |
the line color for the target PDF (or column fill color in the case of
|
target_lty |
the line type for the target PDF (default = 2, dashed line) |
target_size |
the line width for the target PDF |
Dist |
name of the distribution. The possible values are: "Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq",
"Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz",
"Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", |
params |
a vector of parameters (up to 4) for the desired distribution (keep NULL if |
fx |
a PDF input as a function of x only, i.e. |
lower |
the lower support bound for |
upper |
the upper support bound for |
legend.position |
the position of the legend |
legend.justification |
the justification of the legend |
legend.text.size |
the size of the legend labels |
title.text.size |
the size of the plot title |
axis.text.size |
the size of the axes text (tick labels) |
axis.title.size |
the size of the axes titles |
A ggplot2-package
object.
Please see the references for plot_simtheory
.
# Using normal mixture variable from contmixvar1 example Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) plot_simpdf_theory(Nmix$Y_mix[, 1], title = "Mixture of Normal Distributions", fx = function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1), lower = -5, upper = 5) ## Not run: # Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20) Stcum1 <- calc_theory("Beta", c(6, 3)) Stcum2 <- calc_theory("Beta", c(4, 1.5)) Stcum3 <- calc_theory("Beta", c(10, 20)) mix_pis <- c(0.5, 0.2, 0.3) mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1]) mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2]) mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3]) mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4]) mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5]) mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6]) mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03), seq(0.01, 10, 0.01)) Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six) plot_simpdf_theory(Bmix$Y_mix[, 1], title = "Mixture of Beta Distributions", fx = function(x) mix_pis[1] * dbeta(x, 6, 3) + mix_pis[2] * dbeta(x, 4, 1.5) + mix_pis[3] * dbeta(x, 10, 20), lower = 0, upper = 1) ## End(Not run)
# Using normal mixture variable from contmixvar1 example Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) plot_simpdf_theory(Nmix$Y_mix[, 1], title = "Mixture of Normal Distributions", fx = function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1), lower = -5, upper = 5) ## Not run: # Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20) Stcum1 <- calc_theory("Beta", c(6, 3)) Stcum2 <- calc_theory("Beta", c(4, 1.5)) Stcum3 <- calc_theory("Beta", c(10, 20)) mix_pis <- c(0.5, 0.2, 0.3) mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1]) mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2]) mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3]) mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4]) mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5]) mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6]) mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03), seq(0.01, 10, 0.01)) Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six) plot_simpdf_theory(Bmix$Y_mix[, 1], title = "Mixture of Beta Distributions", fx = function(x) mix_pis[1] * dbeta(x, 6, 3) + mix_pis[2] * dbeta(x, 4, 1.5) + mix_pis[3] * dbeta(x, 10, 20), lower = 0, upper = 1) ## End(Not run)
This plots simulated continuous or count (regular or zero-inflated, Poisson or Negative Binomial) data and overlays data
(if overlay
= TRUE) generated from the target distribution. The target is specified by name (plus up to 4 parameters) or
PDF function fx
(plus support bounds). Due to the integration involved in finding the CDF from the PDF supplied by
fx
, only continuous fx
may be supplied. Both are plotted as histograms (using geom_histogram
).
If a continuous target distribution is specified (cont_var = TRUE
), the simulated data is
scaled and then transformed (i.e.
) so that it has the same mean (
) and variance
(
) as the target distribution. It works for valid or invalid power method PDF's. It returns a
ggplot2-package
object so the user can save it or
modify it as necessary. The graph parameters (i.e. title
, sim_color
, target_color
,
legend.position
, legend.justification
, legend.text.size
, title.text.size
,
axis.text.size
, and axis.title.size
) are inputs to the ggplot2-package
functions so information about
valid inputs can be obtained from that package's documentation.
plot_simtheory(sim_y, title = "Simulated Data Values", ylower = NULL, yupper = NULL, sim_color = "dark blue", overlay = TRUE, cont_var = TRUE, target_color = "dark green", binwidth = NULL, nbins = 100, Dist = c("Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq", "Dagum", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", "Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala", "Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull", "Poisson", "Negative_Binomial"), params = NULL, fx = NULL, lower = NULL, upper = NULL, seed = 1234, sub = 1000, legend.position = c(0.975, 0.9), legend.justification = c(1, 1), legend.text.size = 10, title.text.size = 15, axis.text.size = 10, axis.title.size = 13)
plot_simtheory(sim_y, title = "Simulated Data Values", ylower = NULL, yupper = NULL, sim_color = "dark blue", overlay = TRUE, cont_var = TRUE, target_color = "dark green", binwidth = NULL, nbins = 100, Dist = c("Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq", "Dagum", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", "Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala", "Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull", "Poisson", "Negative_Binomial"), params = NULL, fx = NULL, lower = NULL, upper = NULL, seed = 1234, sub = 1000, legend.position = c(0.975, 0.9), legend.justification = c(1, 1), legend.text.size = 10, title.text.size = 15, axis.text.size = 10, axis.title.size = 13)
sim_y |
a vector of simulated data |
title |
the title for the graph (default = "Simulated Data Values") |
ylower |
the lower y value to use in the plot (default = NULL, uses minimum simulated y value) on the y-axis |
yupper |
the upper y value (default = NULL, uses maximum simulated y value) on the y-axis |
sim_color |
the histogram fill color for the simulated variable (default = "dark blue") |
overlay |
if TRUE (default), the target distribution is also plotted given either a distribution name (and parameters) or PDF function fx (with support bounds = lower, upper) |
cont_var |
TRUE (default) for continuous variables, FALSE for count variables |
target_color |
the histogram fill color for the target distribution (default = "dark green") |
binwidth |
the width of bins to use when creating the histograms (default = NULL) |
nbins |
the number of bins to use when creating the histograms (default = 100); overridden by |
Dist |
name of the distribution. The possible values are: "Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq",
"Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz",
"Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", |
params |
a vector of parameters (up to 4) for the desired distribution (keep NULL if |
fx |
a PDF input as a function of x only, i.e. |
lower |
the lower support bound for a supplied |
upper |
the upper support bound for a supplied |
seed |
the seed value for random number generation (default = 1234) |
sub |
the number of subdivisions to use in the integration to calculate the CDF from |
legend.position |
the position of the legend |
legend.justification |
the justification of the legend |
legend.text.size |
the size of the legend labels |
title.text.size |
the size of the plot title |
axis.text.size |
the size of the axes text (tick labels) |
axis.title.size |
the size of the axes titles |
A ggplot2-package
object.
Carnell R (2017). triangle: Provides the Standard Distribution Functions for the Triangle Distribution. R package version 0.11. https://CRAN.R-project.org/package=triangle.
Fialkowski AC (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. https://CRAN.R-project.org/package=SimMultiCorrData.
Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using
Mathematica. Journal of Statistical Software, 19(3):1-17.
doi:10.18637/jss.v019.i03.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.
Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.
calc_theory
,
ggplot
, geom_histogram
# Using normal mixture variable from contmixvar1 example Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) plot_simtheory(Nmix$Y_mix[, 1], title = "Mixture of Normal Distributions", fx = function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1), lower = -5, upper = 5) ## Not run: # Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20) Stcum1 <- calc_theory("Beta", c(6, 3)) Stcum2 <- calc_theory("Beta", c(4, 1.5)) Stcum3 <- calc_theory("Beta", c(10, 20)) mix_pis <- c(0.5, 0.2, 0.3) mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1]) mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2]) mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3]) mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4]) mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5]) mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6]) mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03), seq(0.01, 10, 0.01)) Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six) plot_simtheory(Bmix$Y_mix[, 1], title = "Mixture of Beta Distributions", fx = function(x) mix_pis[1] * dbeta(x, 6, 3) + mix_pis[2] * dbeta(x, 4, 1.5) + mix_pis[3] * dbeta(x, 10, 20), lower = 0, upper = 1) ## End(Not run)
# Using normal mixture variable from contmixvar1 example Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) plot_simtheory(Nmix$Y_mix[, 1], title = "Mixture of Normal Distributions", fx = function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1), lower = -5, upper = 5) ## Not run: # Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20) Stcum1 <- calc_theory("Beta", c(6, 3)) Stcum2 <- calc_theory("Beta", c(4, 1.5)) Stcum3 <- calc_theory("Beta", c(10, 20)) mix_pis <- c(0.5, 0.2, 0.3) mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1]) mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2]) mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3]) mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4]) mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5]) mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6]) mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03), seq(0.01, 10, 0.01)) Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six) plot_simtheory(Bmix$Y_mix[, 1], title = "Mixture of Beta Distributions", fx = function(x) mix_pis[1] * dbeta(x, 6, 3) + mix_pis[2] * dbeta(x, 4, 1.5) + mix_pis[3] * dbeta(x, 10, 20), lower = 0, upper = 1) ## End(Not run)
This function approximates the expected correlation between two continuous mixture variables and
based on
their mixing proportions, component means, component standard deviations, and correlations between components across variables.
The equations can be found in the Expected Cumulants and Correlations for Continuous Mixture Variables vignette. This
function can be used to see what combination of component correlations gives a desired correlation between
and
.
rho_M1M2(mix_pis = list(), mix_mus = list(), mix_sigmas = list(), p_M1M2 = NULL)
rho_M1M2(mix_pis = list(), mix_mus = list(), mix_sigmas = list(), p_M1M2 = NULL)
mix_pis |
a list of length 2 with 1st component a vector of mixing probabilities that sum to 1 for component distributions of
|
mix_mus |
a list of length 2 with 1st component a vector of means for component distributions of |
mix_sigmas |
a list of length 2 with 1st component a vector of standard deviations for component distributions of |
p_M1M2 |
a matrix of correlations with rows corresponding to |
the expected correlation between M1 and M2
Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions. Computers & Mathematics with Applications, 15(10):819-28.
Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.
# M1 is mixture of N(-2, 1) and N(2, 1); # M2 is mixture of Logistic(0, 1), Chisq(4), and Beta(4, 1.5) # pairwise correlation between components across M1 and M2 set to 0.35 L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) rho_M1M2(mix_pis = list(c(0.4, 0.6), c(0.3, 0.2, 0.5)), mix_mus = list(c(-2, 2), c(L[1], C[1], B[1])), mix_sigmas = list(c(1, 1), c(L[2], C[2], B[2])), p_M1M2 = matrix(0.35, 2, 3))
# M1 is mixture of N(-2, 1) and N(2, 1); # M2 is mixture of Logistic(0, 1), Chisq(4), and Beta(4, 1.5) # pairwise correlation between components across M1 and M2 set to 0.35 L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) rho_M1M2(mix_pis = list(c(0.4, 0.6), c(0.3, 0.2, 0.5)), mix_mus = list(c(-2, 2), c(L[1], C[1], B[1])), mix_sigmas = list(c(1, 1), c(L[2], C[2], B[2])), p_M1M2 = matrix(0.35, 2, 3))
This function approximates the expected correlation between a continuous mixture variables and another random
variable
based on the mixing proportions, component means, and component standard deviations of
and correlations
between components of
and
. The equations can be found in the Expected Cumulants and Correlations for
Continuous Mixture Variables vignette. This function can be used to see what combination of correlations between components
of
and
gives a desired correlation between
and
.
rho_M1Y(mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL, p_M1Y = NULL)
rho_M1Y(mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL, p_M1Y = NULL)
mix_pis |
a vector of mixing probabilities that sum to 1 for component distributions of |
mix_mus |
a vector of means for component distributions of |
mix_sigmas |
a vector of standard deviations for component distributions of |
p_M1Y |
a vector of correlations between the components of |
the expected correlation between M1 and Y
Please see references for rho_M1M2
.
# M1 is mixture of N(-2, 1) and N(2, 1); pairwise correlation set to 0.35 rho_M1Y(mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), p_M1Y = c(0.35, 0.35))
# M1 is mixture of N(-2, 1) and N(2, 1); pairwise correlation set to 0.35 rho_M1Y(mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), p_M1Y = c(0.35, 0.35))
SimCorrMix generates continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count
(Poisson or Negative Binomial, regular or zero-inflated) variables with a specified correlation matrix, or one continuous variable
with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets
(i.e. plasmodes, as in Vaughan et al., 2009, doi:10.1016/j.csda.2008.02.032). The methods extend those found in the
SimMultiCorrData package. Standard normal variables with an imposed intermediate correlation matrix are transformed to
generate the desired distributions. Continuous variables are simulated using either Fleishman's third-order
(doi:10.1007/BF02293811) or Headrick's fifth-order (doi:10.1016/S0167-9473(02)00072-5) power method transformation (PMT).
Non-mixture distributions require the user to specify mean, variance, skewness, standardized kurtosis, and standardized fifth and
sixth cumulants. Mixture distributions require these inputs for the component distributions plus the mixing probabilities. Simulation
occurs at the component-level for continuous mixture distributions. The target correlation matrix is specified in terms of
correlations with components of continuous mixture variables. These components are transformed into
the desired mixture variables using random multinomial variables based on the mixing probabilities. However, the package provides functions to approximate expected
correlations with continuous mixture variables given target correlations with the components. Binary and ordinal variables are simulated using a modification of
GenOrd-package
's ordsample
function. Count variables are simulated using the inverse
CDF method. There are two simulation pathways which calculate intermediate correlations involving count variables differently.
Correlation Method 1 adapts Yahav and Shmueli's 2012 method (doi:10.1002/asmb.901) and performs best with large count variable means and
positive correlations or small means and negative correlations. Correlation Method 2 adapts Barbiero and
Ferrari's 2015 modification of GenOrd-package
(doi:10.1002/asmb.2072) and performs best under the opposite scenarios.
The optional error loop may be used to improve the accuracy of the final correlation matrix. The package also provides functions to calculate the standardized
cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables.
There are several vignettes which accompany this package to help the user understand the simulation and analysis methods.
1) Comparison of Correlation Methods 1 and 2 describes the two simulation pathways that can be followed for generation of correlated data.
2) Continuous Mixture Distributions demonstrates how to simulate one continuous mixture variable using
contmixvar1
and gives a step-by-step guideline for comparing a simulated distribution to the target
distribution.
3) Expected Cumulants and Correlations for Continuous Mixture Variables derives the equations used by the function
calc_mixmoments
to find the mean, standard deviation, skew, standardized kurtosis, and standardized fifth
and sixth cumulants for a continuous mixture variable. The vignette also explains how the functions
rho_M1M2
and rho_M1Y
approximate the expected correlations with continuous mixture
variables based on the target correlations with the components.
4) Overall Workflow for Generation of Correlated Data gives a step-by-step guideline to follow with an example containing continuous non-mixture and mixture, ordinal, zero-inflated Poisson, and zero-inflated Negative Binomial variables. It executes both correlated data simulation functions with and without the error loop.
5) Variable Types describes the different types of variables that can be simulated in SimCorrMix, details the algorithm involved in the optional error loop that helps to minimize correlation errors, and explains how the feasible correlation boundaries are calculated for each of the two simulation pathways.
This package contains 3 simulation functions:
contmixvar1
, corrvar
, and corrvar2
4 data description (summary) function:
calc_mixmoments
, summary_var
, rho_M1M2
, rho_M1Y
2 graphing functions:
plot_simpdf_theory
, plot_simtheory
3 support functions:
validpar
, validcorr
, validcorr2
and 16 auxiliary functions (should not normally be called by the user, but are called by other functions):
corr_error
, intercorr
, intercorr2
,
intercorr_cat_nb
, intercorr_cat_pois
, intercorr_cont_nb
, intercorr_cont_nb2
,
intercorr_cont_pois
, intercorr_cont_pois2
, intercorr_cont
, intercorr_nb
, intercorr_pois
,
intercorr_pois_nb
, maxcount_support
,
ord_norm
, norm_ord
Amatya A & Demirtas H (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. Journal of Statistical Computation and Simulation, 85(15):3129-39. doi:10.1080/00949655.2014.953534.
Barbiero A & Ferrari PA (2015). Simulation of correlated Poisson variables. Applied Stochastic Models in Business and Industry, 31:669-80. doi:10.1002/asmb.2072.
Barbiero A & Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given
Correlation Matrix and Marginal Distributions. R package version 1.4.0.
https://CRAN.R-project.org/package=GenOrd
Carnell R (2017). triangle: Provides the Standard Distribution Functions for the Triangle Distribution. R package version 0.11. https://CRAN.R-project.org/package=triangle.
Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions. Computers & Mathematics with Applications, 15(10):819-28.
Demirtas H (2006). A method for multivariate ordinal data generation given marginal distributions and correlations. Journal of Statistical
Computation and Simulation, 76(11):1017-1025.
doi:10.1080/10629360600569246.
Demirtas H (2014). Joint Generation of Binary and Nonnormal Continuous Data. Biometrics & Biostatistics, S12.
Demirtas H & Hedeker D (2011). A practical way for computing approximate lower and upper correlation bounds. American Statistician, 65(2):104-109. doi:10.1198/tast.2011.10090.
Demirtas H, Hedeker D, & Mermelstein RJ (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27):3337-3346. doi:10.1002/sim.5362.
Emrich LJ & Piedmonte MR (1991). A Method for Generating High-Dimensional Multivariate Binary Variables. The American Statistician, 45(4): 302-4. doi:10.1080/00031305.1991.10475828.
Everitt BS (1996). An Introduction to Finite Mixture Distributions. Statistical Methods in Medical Research, 5(2):107-127. doi:10.1177/096228029600500202.
Ferrari PA & Barbiero A (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4): 566-589. doi:10.1080/00273171.2012.692630.
Fialkowski AC (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. https://CRAN.R-project.org/package=SimMultiCorrData.
Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43:521-532. doi:10.1007/BF02293811.
Frechet M (1951). Sur les tableaux de correlation dont les marges sont donnees. Ann. l'Univ. Lyon SectA, 14:53-77.
Hasselman B (2018). nleqslv: Solve Systems of Nonlinear Equations. R package version 3.3.2. https://CRAN.R-project.org/package=nleqslv
Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi:10.1016/S0167-9473(02)00072-5. (ScienceDirect)
Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi:10.1080/10629360600605065.
Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64:25-35. doi:10.1007/BF02294317.
Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using
Mathematica. Journal of Statistical Software, 19(3):1 - 17.
doi:10.18637/jss.v019.i03.
Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22:329-343.
Hoeffding W. Scale-invariant correlation theory. In: Fisher NI, Sen PK, editors. The collected works of Wassily Hoeffding. New York: Springer-Verlag; 1994. p. 57-107.
Ismail N & Zamani H (2013). Estimation of Claim Count Data Using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson Regression Models. Casualty Actuarial Society E-Forum 41(20):1-28.
Kendall M & Stuart A (1977). The Advanced Theory of Statistics, 4th Edition. Macmillan, New York.
Lambert D (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 34(1):1-14.
Olsson U, Drasgow F, & Dorans NJ (1982). The Polyserial Correlation Coefficient. Psychometrika, 47(3):337-47. doi:10.1007/BF02294164.
Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.
Schork NJ, Allison DB, & Thiel B (1996). Mixture Distributions in Human Genetics Research. Statistical Methods in Medical Research, 5:155-178. doi:10.1177/096228029600500204.
Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48:465-471. doi:10.1007/BF02293687.
Vaughan LK, Divers J, Padilla M, Redden DT, Tiwari HK, Pomp D, Allison DB (2009). The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies. Comput Stat Data Anal, 53(5):1755-66. doi:10.1016/j.csda.2008.02.032.
Yahav I & Shmueli G (2012). On Generating Multivariate Poisson Data in Management Science Applications. Applied Stochastic Models in Business and Industry, 28(1):91-102. doi:10.1002/asmb.901.
Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.
Zhang X, Mallick H, & Yi N (2016). Zero-Inflated Negative Binomial Regression for Differential Abundance Testing in Microbiome Studies. Journal of Bioinformatics and Genomics 2(2):1-9. doi:10.18454/jbg.2016.2.2.1.
Useful link: https://github.com/AFialkowski/SimMultiCorrData, https://github.com/AFialkowski/SimCorrMix
This function summarizes the results of contmixvar1
, corrvar
, or
corrvar2
. The inputs are either the simulated variables or inputs for those functions. See their
documentation for more information. If summarizing result from contmixvar1
, mixture parameters may be
entered as vectors instead of lists.
summary_var(Y_cat = NULL, Y_cont = NULL, Y_comp = NULL, Y_mix = NULL, Y_pois = NULL, Y_nb = NULL, means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL)
summary_var(Y_cat = NULL, Y_cont = NULL, Y_comp = NULL, Y_mix = NULL, Y_pois = NULL, Y_nb = NULL, means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL)
Y_cat |
a matrix of ordinal variables |
Y_cont |
a matrix of continuous non-mixture variables |
Y_comp |
a matrix of components of continuous mixture variables |
Y_mix |
a matrix of continuous mixture variables |
Y_pois |
a matrix of Poisson variables |
Y_nb |
a matrix of Negative Binomial variables |
means |
a vector of means for the |
vars |
a vector of variances for the |
skews |
a vector of skewness values for the |
skurts |
a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0)
for the |
fifths |
a vector of standardized fifth cumulants for the |
sixths |
a vector of standardized sixth cumulants for the |
mix_pis |
a list of length |
mix_mus |
a list of length |
mix_sigmas |
a list of length |
mix_skews |
a list of length |
mix_skurts |
a list of length |
mix_fifths |
a list of length |
mix_sixths |
a list of length |
marginal |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
A list whose components vary based on the type of simulated variables.
If ordinal variables are produced:
ord_sum
a list, where the i-th element contains a data.frame with target and simulated cumulative probabilities for ordinal variable Y_i
If continuous variables are produced:
cont_sum
a data.frame summarizing Y_cont
and Y_comp
,
target_sum
a data.frame with the target distributions for Y_cont
and Y_comp
,
mix_sum
a data.frame summarizing Y_mix
,
target_mix
a data.frame with the target distributions for Y_mix
,
If Poisson variables are produced:
pois_sum
a data.frame summarizing Y_pois
If Negative Binomial variables are produced:
nb_sum
a data.frame summarizing Y_nb
Additionally, the following elements:
rho_calc
the final correlation matrix for Y_cat
, Y_cont
, Y_comp
, Y_pois
, and Y_nb
rho_mix
the final correlation matrix for Y_cat
, Y_cont
, Y_mix
, Y_pois
, and Y_nb
maxerr
the maximum final correlation error of rho_calc
from the target rho
.
See references for SimCorrMix
.
contmixvar1
, corrvar
, corrvar2
# Using normal mixture variable from contmixvar1 example Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) Nsum <- summary_var(Y_comp = Nmix$Y_comp, Y_mix = Nmix$Y_mix, means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) # simulate without the error loop Sim1 <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, epsilon = 0.01) Summ1 <- summary_var(Sim1$Y_cat, Y_cont = NULL, Sim1$Y_comp, Sim1$Y_mix, Sim1$Y_pois, Sim1$Y_nb, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey) Sim1_error <- abs(Rey - Summ1$rho_calc) summary(as.numeric(Sim1_error)) ## End(Not run)
# Using normal mixture variable from contmixvar1 example Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) Nsum <- summary_var(Y_comp = Nmix$Y_comp, Y_mix = Nmix$Y_mix, means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0), mix_sixths = c(0, 0)) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) # simulate without the error loop Sim1 <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, epsilon = 0.01) Summ1 <- summary_var(Sim1$Y_cat, Y_cont = NULL, Sim1$Y_comp, Sim1$Y_mix, Sim1$Y_pois, Sim1$Y_nb, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey) Sim1_error <- abs(Rey - Summ1$rho_calc) summary(as.numeric(Sim1_error)) ## End(Not run)
This function calculates the lower and upper correlation bounds for the given distributions and
checks if a given target correlation matrix rho
is within the bounds. It should be used before simulation with
corrvar
. However, even if all pairwise correlations fall within the bounds, it is still possible
that the desired correlation matrix is not feasible. This is particularly true when ordinal variables ( categories) are
generated or negative correlations are desired. Therefore, this function should be used as a general check to eliminate pairwise correlations that are obviously
not reproducible. It will help prevent errors when executing the simulation. The ordering of the variables in
rho
must be 1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixture, 4th regular Poisson, 5th zero-inflated
Poisson, 6th regular NB, and 7th zero-inflated NB. Note that it is possible for k_cat
, k_cont
, k_mix
,
k_pois
, and/or k_nb
to be 0. The target correlations are specified with respect to the components of the continuous
mixture variables. There are no parameter input checks in order to decrease simulation time. All inputs should be checked prior to simulation with
validpar
.
Please see the Comparison of Correlation Methods 1 and 2 vignette for the differences between the two correlation methods, and the Variable Types vignette for a detailed explanation of how the correlation boundaries are calculated.
validcorr(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL, seed = 1234, use.nearPD = TRUE, quiet = FALSE)
validcorr(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, rho = NULL, seed = 1234, use.nearPD = TRUE, quiet = FALSE)
n |
the sample size (i.e. the length of each simulated variable; default = 10000) |
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables (default = 0) |
k_mix |
the number of continuous mixture variables (default = 0) |
k_pois |
the number of regular Poisson and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular Negative Binomial and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the k_cont non-mixture and k_mix mixture continuous variables. "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation. |
means |
a vector of means for the k_cont non-mixture and k_mix mixture continuous variables
(i.e. |
vars |
a vector of variances for the k_cont non-mixture and k_mix mixture continuous variables
(i.e. |
skews |
a vector of skewness values for the |
skurts |
a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0)
for the |
fifths |
a vector of standardized fifth cumulants for the |
sixths |
a vector of standardized sixth cumulants for the |
Six |
a list of vectors of sixth cumulant correction values for the |
mix_pis |
a list of length |
mix_mus |
a list of length |
mix_sigmas |
a list of length |
mix_skews |
a list of length |
mix_skurts |
a list of length |
mix_fifths |
a list of length |
mix_sixths |
a list of length |
mix_Six |
a list of length |
marginal |
a list of length equal to |
lam |
a vector of lambda (> 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
seed |
the seed value for random number generation (default = 1234) |
use.nearPD |
TRUE to convert |
quiet |
if FALSE prints messages, if TRUE suppresses message printing |
A list with components:
rho
the target correlation matrix, which will differ from the supplied matrix (if provided) if it was converted to
the nearest positive-definite matrix
L_rho
the lower correlation bound
U_rho
the upper correlation bound
If continuous variables are desired, additional components are:
constants
the calculated constants
sixth_correction
a vector of the sixth cumulant correction values
valid.pdf
a vector with i-th component equal to "TRUE" if variable Y_i has a valid power method PDF, else "FALSE"
If a target correlation matrix rho
is provided, each pairwise correlation is checked to see if it is within the lower and upper
bounds. If the correlation is outside the bounds, the indices of the variable pair are given.
valid.rho
TRUE if all entries of rho
are within the bounds, else FALSE
1) The most likely cause for function errors is that no solutions to fleish
or
poly
converged when using find_constants
. If this happens,
the function will stop. It may help to first use find_constants
for each continuous variable to
determine if a sixth cumulant correction value is needed. If the standardized cumulants are obtained from calc_theory
,
the user may need to use rounded values as inputs (i.e. skews = round(skews, 8)
). For example, in order to ensure that skew
is exactly 0 for symmetric distributions.
2) The kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated
with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
calc_lower_skurt
to determine the boundary for a given set of cumulants.
Please see references for SimCorrMix
.
find_constants
, corrvar
, validpar
validcorr(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, rho = Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) ## End(Not run)
validcorr(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, rho = Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed) ## End(Not run)
This function calculates the lower and upper correlation bounds for the given distributions and
checks if a given target correlation matrix rho
is within the bounds. It should be used before simulation with
corrvar2
. However, even if all pairwise correlations fall within the bounds, it is still possible
that the desired correlation matrix is not feasible. This is particularly true when ordinal variables ( categories) are
generated or negative correlations are desired. Therefore, this function should be used as a general check to eliminate pairwise correlations that are obviously
not reproducible. It will help prevent errors when executing the simulation. The ordering of the variables in
rho
must be 1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixture, 4th regular Poisson, 5th zero-inflated
Poisson, 6th regular NB, and 7th zero-inflated NB. Note that it is possible for k_cat
, k_cont
, k_mix
,
k_pois
, and/or k_nb
to be 0. The target correlations are specified with respect to the components of the continuous
mixture variables. There are no parameter input checks in order to decrease simulation time. All inputs should be checked prior to simulation with
validpar
.
Please see the Comparison of Correlation Methods 1 and 2 vignette for the differences between the two correlation methods, and the Variable Types vignette for a detailed explanation of how the correlation boundaries are calculated.
validcorr2(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, seed = 1234, use.nearPD = TRUE, quiet = FALSE)
validcorr2(n = 10000, k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, seed = 1234, use.nearPD = TRUE, quiet = FALSE)
n |
the sample size (i.e. the length of each simulated variable; default = 10000) |
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables (default = 0) |
k_mix |
the number of continuous mixture variables (default = 0) |
k_pois |
the number of regular Poisson and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular Negative Binomial and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the k_cont non-mixture and k_mix mixture continuous variables. "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation. |
means |
a vector of means for the k_cont non-mixture and k_mix mixture continuous variables
(i.e. |
vars |
a vector of variances for the k_cont non-mixture and k_mix mixture continuous variables
(i.e. |
skews |
a vector of skewness values for the |
skurts |
a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0)
for the |
fifths |
a vector of standardized fifth cumulants for the |
sixths |
a vector of standardized sixth cumulants for the |
Six |
a list of vectors of sixth cumulant correction values for the |
mix_pis |
a list of length |
mix_mus |
a list of length |
mix_sigmas |
a list of length |
mix_skews |
a list of length |
mix_skurts |
a list of length |
mix_fifths |
a list of length |
mix_sixths |
a list of length |
mix_Six |
a list of length |
marginal |
a list of length equal to |
lam |
a vector of lambda (> 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
pois_eps |
a vector of length |
nb_eps |
a vector of length |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
seed |
the seed value for random number generation (default = 1234) |
use.nearPD |
TRUE to convert |
quiet |
if FALSE prints messages, if TRUE suppresses message printing |
A list with components:
rho
the target correlation matrix, which will differ from the supplied matrix (if provided) if it was converted to
the nearest positive-definite matrix
L_rho
the lower correlation bound
U_rho
the upper correlation bound
If continuous variables are desired, additional components are:
constants
the calculated constants
sixth_correction
a vector of the sixth cumulant correction values
valid.pdf
a vector with i-th component equal to "TRUE" if variable Y_i has a valid power method PDF, else "FALSE"
If a target correlation matrix rho
is provided, each pairwise correlation is checked to see if it is within the lower and upper
bounds. If the correlation is outside the bounds, the indices of the variable pair are given.
valid.rho
TRUE if all entries of rho
are within the bounds, else FALSE
1) The most likely cause for function errors is that no solutions to fleish
or
poly
converged when using find_constants
. If this happens,
the function will stop. It may help to first use find_constants
for each continuous variable to
determine if a sixth cumulant correction value is needed. If the standardized cumulants are obtained from calc_theory
,
the user may need to use rounded values as inputs (i.e. skews = round(skews, 8)
). For example, in order to ensure that skew
is exactly 0 for symmetric distributions.
2) The kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated
with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
calc_lower_skurt
to determine the boundary for a given set of cumulants.
Please see references for SimCorrMix
.
find_constants
, corrvar2
, validpar
validcorr2(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 pois_eps <- 0.0001 size <- 2 prob <- 0.75 p_zinb <- 0.2 nb_eps <- 0.0001 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed) ## End(Not run)
validcorr2(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable n <- 10000 seed <- 1234 # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 pois_eps <- 0.0001 size <- 2 prob <- 0.75 p_zinb <- 0.2 nb_eps <- 0.0001 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # check parameter inputs validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey) # check to make sure Rey is within the feasible correlation boundaries validcorr2(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, pois_eps, nb_eps, Rey, seed) ## End(Not run)
This function checks the parameter inputs to the simulation functions contmixvar1
,
corrvar
, and corrvar2
and to the correlation validation functions
validcorr
and validcorr2
. It should be used prior to execution of these
functions to ensure all inputs are of the correct format. Those functions do not contain parameter checks in order to decrease
simulation time. This would be important if the user is running several simulation repetitions so that the inputs only have to
be checked once. Note that the inputs do not include all of the inputs to the simulation functions. See the appropriate function
documentation for more details about parameter inputs.
validpar(k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, Sigma = NULL, cstart = list(), quiet = FALSE)
validpar(k_cat = 0, k_cont = 0, k_mix = 0, k_pois = 0, k_nb = 0, method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL, skews = NULL, skurts = NULL, fifths = NULL, sixths = NULL, Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL, mu = NULL, p_zinb = 0, pois_eps = 1e-04, nb_eps = 1e-04, rho = NULL, Sigma = NULL, cstart = list(), quiet = FALSE)
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_cont |
the number of continuous non-mixture variables (default = 0) |
k_mix |
the number of continuous mixture variables (default = 0) |
k_pois |
the number of regular Poisson and zero-inflated Poisson variables (default = 0) |
k_nb |
the number of regular Negative Binomial and zero-inflated Negative Binomial variables (default = 0) |
method |
the method used to generate the |
means |
a vector of means for the |
vars |
a vector of variances for the |
skews |
a vector of skewness values for the |
skurts |
a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0)
for the |
fifths |
a vector of standardized fifth cumulants for the |
sixths |
a vector of standardized sixth cumulants for the |
Six |
a list of vectors of sixth cumulant correction values for the |
mix_pis |
a vector if using |
mix_mus |
a vector if using |
mix_sigmas |
a vector if using |
mix_skews |
a vector if using |
mix_skurts |
a vector if using |
mix_fifths |
a vector if using |
mix_sixths |
a vector if using |
mix_Six |
if using |
marginal |
a list of length equal to |
support |
a list of length equal to |
lam |
a vector of lambda (mean > 0) constants for the Poisson variables (see |
p_zip |
a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the
zero-inflated Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters for the NB variables; order the same as in |
mu |
a vector of mean parameters for the NB variables (*Note: either |
p_zinb |
a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables
(see |
pois_eps |
a vector of length |
nb_eps |
a vector of length |
rho |
the target correlation matrix which must be ordered
1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson,
6th regular NB, 7th zero-inflated NB; note that |
Sigma |
an intermediate correlation matrix to use if the user wants to provide one, else it is calculated within by
|
cstart |
a list of length equal to |
quiet |
if FALSE prints messages, if TRUE suppresses message printing |
TRUE if all inputs are correct, else it will stop with a correction message
contmixvar1
, corrvar
, corrvar2
,
validcorr
, validcorr2
validpar(k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # use before contmixvar1 with 1st mixture variable: # change mix_pis to not sum to 1 check1 <- validpar(k_mix = 1, method = "Polynomial", means = Nstcum[1], vars = Nstcum[2]^2, mix_pis = C(0.4, 0.5), mix_mus = mix_mus[[1]], mix_sigmas = mix_sigmas[[1]], mix_skews = mix_skews[[1]], mix_skurts = mix_skurts[[1]], mix_fifths = mix_fifths[[1]], mix_sixths = mix_sixths[[1]]) # use before validcorr: should return TRUE check2 <- validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) ## End(Not run)
validpar(k_cat = 1, k_cont = 1, method = "Polynomial", means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0, marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2), quiet = TRUE) ## Not run: # 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and # 1 zero-inflated NB variable # Mixture variables: Normal mixture with 2 components; # mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5) # Find cumulants of components of 2nd mixture variable L <- calc_theory("Logistic", c(0, 1)) C <- calc_theory("Chisq", 4) B <- calc_theory("Beta", c(4, 1.5)) skews <- skurts <- fifths <- sixths <- NULL Six <- list() mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5)) mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1])) mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2])) mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3])) mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4])) mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5])) mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6])) mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03)) Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]]) Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]]) means <- c(Nstcum[1], Mstcum[1]) vars <- c(Nstcum[2]^2, Mstcum[2]^2) marginal <- list(0.3) support <- list(c(0, 1)) lam <- 0.5 p_zip <- 0.1 size <- 2 prob <- 0.75 p_zinb <- 0.2 k_cat <- k_pois <- k_nb <- 1 k_cont <- 0 k_mix <- 2 Rey <- matrix(0.39, 8, 8) diag(Rey) <- 1 rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2", "M2_3", "P1", "NB1") # set correlation between components of the same mixture variable to 0 Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0 Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0 Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0 # use before contmixvar1 with 1st mixture variable: # change mix_pis to not sum to 1 check1 <- validpar(k_mix = 1, method = "Polynomial", means = Nstcum[1], vars = Nstcum[2]^2, mix_pis = C(0.4, 0.5), mix_mus = mix_mus[[1]], mix_sigmas = mix_sigmas[[1]], mix_skews = mix_skews[[1]], mix_skurts = mix_skurts[[1]], mix_fifths = mix_fifths[[1]], mix_sixths = mix_sixths[[1]]) # use before validcorr: should return TRUE check2 <- validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey) ## End(Not run)