Package: SimCorrMix 0.1.1

SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions

Generate continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count (regular or zero-inflated, Poisson or Negative Binomial) variables with a specified correlation matrix, or one continuous variable with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets (i.e., plasmodes, as in Vaughan et al., 2009 <doi:10.1016/j.csda.2008.02.032>). The methods extend those found in the 'SimMultiCorrData' R package.

Standard normal variables with an imposed intermediate correlation matrix are transformed to generate the desired distributions. Continuous variables are simulated using either Fleishman's (1978) third-order <doi:10.1007/BF02293811> or Headrick's (2002) fifth-order <doi:10.1016/S0167-9473(02)00072-5> polynomial transformation method (the power method transformation, PMT). Non-mixture distributions require the user to specify mean, variance, skewness, standardized kurtosis, and standardized fifth and sixth cumulants. Mixture distributions require these inputs for the component distributions plus the mixing probabilities.

Simulation occurs at the component level for continuous mixture distributions. The target correlation matrix is specified in terms of correlations with components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. The package also provides functions to approximate the expected correlations with the continuous mixture variables themselves, given target correlations with the components.

Binary and ordinal variables are simulated using a modification of ordsample() in package 'GenOrd'. Count variables are simulated using the inverse CDF method. There are two simulation pathways, which calculate intermediate correlations involving count variables differently. Correlation Method 1 adapts Yahav and Shmueli's 2012 method <doi:10.1002/asmb.901> and performs best with large count variable means and positive correlations or small means and negative correlations. Correlation Method 2 adapts Barbiero and Ferrari's 2015 modification of the 'GenOrd' package <doi:10.1002/asmb.2072> and performs best under the opposite scenarios. The optional error loop may be used to improve the accuracy of the final correlation matrix.

The package also contains functions to calculate the standardized cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables.
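
The component-level idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the intermediate correlation `rho_z`, the Poisson mean, the two normal components, and the mixing probabilities are all made-up values, and no SimCorrMix function is called. It builds correlated standard normals, turns one into a count variable with the inverse CDF method, and forms a mixture variable by a random multinomial draw over its components.

```r
# Illustrative base-R sketch; NOT the package's implementation.
set.seed(1234)
n <- 10000
rho_z <- 0.5            # assumed intermediate (normal-scale) correlation
mix_pis <- c(0.4, 0.6)  # assumed mixing probabilities

# Standard normals: z1 drives the count variable; z2 and z3 drive the two
# mixture components, each with correlation rho_z to z1.
z1 <- rnorm(n)
z2 <- rho_z * z1 + sqrt(1 - rho_z^2) * rnorm(n)
z3 <- rho_z * z1 + sqrt(1 - rho_z^2) * rnorm(n)

# Count variable by the inverse CDF method: Poisson with assumed mean 5.
count <- qpois(pnorm(z1), lambda = 5)

# Mixture components (normal here, so no polynomial transformation is needed).
comp1 <- -2 + z2  # N(-2, 1)
comp2 <-  2 + z3  # N(2, 1)

# A random multinomial draw selects one component per observation.
pick <- sample(1:2, n, replace = TRUE, prob = mix_pis)
mix <- ifelse(pick == 1, comp1, comp2)

# Correlations with each component versus with the resulting mixture variable.
cor_components <- c(cor(count, comp1), cor(count, comp2))
cor_mixture <- cor(count, mix)
print(round(c(cor_components, cor_mixture), 3))
```

In this sketch the correlation with the mixture variable comes out weaker than the correlation with either component, which illustrates why approximating the expected mixture-level correlation from component-level targets is useful.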

Authors: Allison Cynthia Fialkowski

SimCorrMix_0.1.1.tar.gz
SimCorrMix_0.1.1.zip (r-4.7) | SimCorrMix_0.1.1.zip (r-4.6) | SimCorrMix_0.1.1.zip (r-4.5)
SimCorrMix_0.1.1.tgz (r-4.6-any) | SimCorrMix_0.1.1.tgz (r-4.5-any)
SimCorrMix_0.1.1.tar.gz (r-4.7-any) | SimCorrMix_0.1.1.tar.gz (r-4.6-any)
SimCorrMix_0.1.1.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
SimCorrMix/json (API)

# Install 'SimCorrMix' in R:
install.packages('SimCorrMix', repos = c('https://afialkowski.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/afialkowski/simcorrmix/issues

5.35 score | 5 stars | 18 scripts | 223 downloads | 28 exports | 38 dependencies

Last updated from: 32f2aa8211. Checks: 7 NOTE, 2 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | NOTE | 176
source / vignettes | OK | 281
linux-release-x86_64 | NOTE | 176
macos-release-arm64 | NOTE | 181
macos-oldrel-arm64 | NOTE | 155
windows-devel | NOTE | 126
windows-release | NOTE | 150
windows-oldrel | NOTE | 134
wasm-release | OK | 125

Exports: calc_mixmoments, contmixvar1, corr_error, corrvar, corrvar2, intercorr, intercorr_cat_nb, intercorr_cat_pois, intercorr_cont, intercorr_cont_nb, intercorr_cont_nb2, intercorr_cont_pois, intercorr_cont_pois2, intercorr_nb, intercorr_pois, intercorr_pois_nb, intercorr2, maxcount_support, norm_ord, ord_norm, plot_simpdf_theory, plot_simtheory, rho_M1M2, rho_M1Y, summary_var, validcorr, validcorr2, validpar

Dependencies: BB, bbmle, bdsmatrix, boot, cli, cpp11, cubature, farver, GenOrd, ggplot2, glue, GPArotation, gtable, isoband, labeling, lattice, lifecycle, MASS, Matrix, mnormt, mvtnorm, nleqslv, nlme, numDeriv, psych, quadprog, R6, RColorBrewer, Rcpp, rlang, S7, scales, SimMultiCorrData, triangle, vctrs, VGAM, viridisLite, withr

Comparison of Correlation Methods 1 and 2
Methods Used in Both Pathways: | Ordinal Variables: | Continuous Variables: | Continuous-Ordinal Pairs: | Overview of Correlation Method 1: | Simulation Process: | Overview of Correlation Method 2: | References

Last update: 2018-02-23
Started: 2017-11-30

Continuous Mixture Distributions
Example: Mixture of 2 Normal Distributions | Step 1: Obtain the standardized cumulants | Step 2: Simulate the variable | Step 3: Determine if the constants generate a valid PDF | Step 4: Select a critical value | Step 5: Calculate the cumulative probability for the simulated variable up to $1 - \alpha$ | Step 6: Plot graphs | References

Last update: 2018-02-23
Started: 2017-11-30

Overall Workflow for Generation of Correlated Data
Examples | Step 1: Obtain the distributional parameters | Step 2: Check the parameter inputs | Step 3: Calculate the lower skurtosis bounds for the continuous variables | Simulation using Correlation Method 1: | Step 4: Verify the target correlation matrix falls within the feasible correlation bounds | Step 5: Generate the variables | Step 6: Summarize the results numerically and Step 7: Summarize the results graphically | Simulation using Correlation Method 2: | References

Last update: 2018-02-23
Started: 2017-11-30

Variable Types
Error Loop | Correlation Bounds | Some general methods for determining correlation boundaries: | The Generate, Sort, and Correlate (GSC) Algorithm: | The Frechet-Hoeffding Correlation Bounds: | Methods Used in Both Pathways: | Correlation Method 1: | Correlation Method 2: | References

Last update: 2018-02-23
Started: 2017-11-30
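
The Generate, Sort, and Correlate (GSC) idea named in the outline above can be sketched in a few lines of base R. This is a hedged illustration of the general technique as it is commonly described, not code from the vignette: the two marginal distributions (Poisson with mean 2 and exponential with rate 1) and the sample size are arbitrary assumptions. Independent samples are generated, sorted in the same direction to approximate the upper bound, and in opposite directions to approximate the lower bound.

```r
# Illustrative GSC sketch in base R; the marginals are arbitrary assumptions.
set.seed(1)
n <- 10000

# Generate: independent draws from each marginal distribution.
x <- rpois(n, lambda = 2)
y <- rexp(n, rate = 1)

# Sort and correlate: same ordering approximates the upper bound,
# opposite ordering approximates the lower bound.
upper <- cor(sort(x), sort(y))
lower <- cor(sort(x), sort(y, decreasing = TRUE))

print(round(c(lower = lower, upper = upper), 3))
```

A target correlation for this pair that falls outside the printed interval would not be attainable for these marginals, which is the kind of check the validcorr and validcorr2 entries in the help index are described as performing.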

Expected Cumulants and Correlations for Continuous Mixture Variables
Expected Cumulants of Continuous Mixture Variables | Extension to more than two component distributions: | Approximate Correlations for Continuous Mixture Variables: | Correlation between continuous mixture variables M1 and M2: | Correlation between continuous mixture variable M1 or M2 and other random variable Y: | References

Last update: 2018-01-12
Started: 2017-11-30
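
As a small worked example of expected cumulants for a mixture, the following base-R sketch computes the mean, variance, and skewness of a two-component normal mixture from its mixing probabilities and component parameters. The parameter values are assumptions chosen for illustration, the formulas hold for normal (symmetric) components only, and the code does not call or reproduce calc_mixmoments.

```r
# Illustrative method-of-moments sketch for a mixture of normal components.
mix_pis    <- c(0.4, 0.6)  # assumed mixing probabilities
mix_mus    <- c(-2, 2)     # assumed component means
mix_sigmas <- c(1, 1)      # assumed component standard deviations

# Mixture mean: probability-weighted average of component means (0.4 here).
mix_mean <- sum(mix_pis * mix_mus)

# Deviation of each component mean from the mixture mean.
d <- mix_mus - mix_mean

# Mixture variance: within-component plus between-component variance (4.84 here).
mix_var <- sum(mix_pis * (mix_sigmas^2 + d^2))

# Third central moment; normal components contribute no skewness of their own.
mu3 <- sum(mix_pis * (d^3 + 3 * d * mix_sigmas^2))
mix_skew <- mu3 / mix_var^1.5

print(c(mean = mix_mean, var = mix_var, skew = mix_skew))
```

The mixture is skewed even though each component is symmetric, because the component means sit at unequal distances from the overall mean.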

Readme and manuals

Help Manual

Help page | Topics
Find Standardized Cumulants of a Continuous Mixture Distribution by Method of Moments | calc_mixmoments
Generation of One Continuous Variable with a Mixture Distribution Using the Power Method Transformation | contmixvar1
Error Loop to Correct Final Correlation of Simulated Variables | corr_error
Generation of Correlated Ordinal, Continuous (mixture and non-mixture), and/or Count (Poisson and Negative Binomial, regular and zero-inflated) Variables: Correlation Method 1 | corrvar
Generation of Correlated Ordinal, Continuous (mixture and non-mixture), and/or Count (Poisson and Negative Binomial, regular and zero-inflated) Variables: Correlation Method 2 | corrvar2
Calculate Intermediate MVN Correlation for Ordinal, Continuous, Poisson, or Negative Binomial Variables: Correlation Method 1 | intercorr
Calculate Intermediate MVN Correlation for Ordinal - Negative Binomial Variables: Correlation Method 1 | intercorr_cat_nb
Calculate Intermediate MVN Correlation for Ordinal - Poisson Variables: Correlation Method 1 | intercorr_cat_pois
Calculate Intermediate MVN Correlation for Continuous Variables Generated by Polynomial Transformation Method | intercorr_cont
Calculate Intermediate MVN Correlation for Continuous - Negative Binomial Variables: Correlation Method 1 | intercorr_cont_nb
Calculate Intermediate MVN Correlation for Continuous - Negative Binomial Variables: Correlation Method 2 | intercorr_cont_nb2
Calculate Intermediate MVN Correlation for Continuous - Poisson Variables: Correlation Method 1 | intercorr_cont_pois
Calculate Intermediate MVN Correlation for Continuous - Poisson Variables: Correlation Method 2 | intercorr_cont_pois2
Calculate Intermediate MVN Correlation for Negative Binomial Variables: Correlation Method 1 | intercorr_nb
Calculate Intermediate MVN Correlation for Poisson Variables: Correlation Method 1 | intercorr_pois
Calculate Intermediate MVN Correlation for Poisson - Negative Binomial Variables: Correlation Method 1 | intercorr_pois_nb
Calculate Intermediate MVN Correlation for Ordinal, Continuous, Poisson, or Negative Binomial Variables: Correlation Method 2 | intercorr2
Calculate Maximum Support Value for Count Variables: Correlation Method 2 | maxcount_support
Calculate Correlations of Ordinal Variables Obtained from Discretizing Normal Variables | norm_ord
Calculate Intermediate MVN Correlation to Generate Variables Treated as Ordinal | ord_norm
Plot Simulated Probability Density Function and Target PDF by Distribution Name or Function for Continuous or Count Variables | plot_simpdf_theory
Plot Simulated Data and Target Distribution Data by Name or Function for Continuous or Count Variables | plot_simtheory
Approximate Correlation between Two Continuous Mixture Variables M1 and M2 | rho_M1M2
Approximate Correlation between Continuous Mixture Variable M1 and Random Variable Y | rho_M1Y
Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions | SimCorrMix-package, SimCorrMix
Summary of Simulated Variables | summary_var
Determine Correlation Bounds for Ordinal, Continuous, Poisson, and/or Negative Binomial Variables: Correlation Method 1 | validcorr
Determine Correlation Bounds for Ordinal, Continuous, Poisson, and/or Negative Binomial Variables: Correlation Method 2 | validcorr2
Parameter Check for Simulation or Correlation Validation Functions | validpar