--- title: "Continuous Mixture Distributions" author: "Allison C Fialkowski" date: "`r Sys.Date()`" output: bookdown::html_document2: fig_caption: yes bibliography: Bibliography.bib vignette: > %\VignetteIndexEntry{Continuous Mixture Distributions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning=FALSE, fig.width = 6, fig.height = 4, fig.align = "center") ``` ```{r, include=FALSE} library("bookdown") ``` The following example demonstrates the use of the `contmixvar1` function, which simulates one continuous mixture distribution $Y$. @Head2002's fifth-order transformation is used to generate the components of $Y$ with `n = 10000`. @HeadKow outlined a general method for comparing a simulated distribution $Y$ to a given theoretical distribution $Y^*$. These have been modified below to apply to continuous mixture variables and will be shown for **Example 1**. Note that they could easily be modified further for comparison to an empirical vector of data. More details about continuous mixture variables can be found in the [Expected Cumulants and Correlations for Continuous Mixture Variables](corr_mixture.html) vignette. Some code has been modified from the **SimMultiCorrData** package [@SMCD]. 1. **Obtain the standardized cumulants** (skewness, kurtosis, fifth, and sixth) for the components of $Y^*$. This can be done using `SimMultiCorrData::calc_theory` along with either the component distribution names (plus up to 4 parameters for each) or the PDF's with support bounds. In the case of an empirical vector of data, use `SimMultiCorrData::calc_moments` or `SimMultiCorrData::calc_fisherk` [@SMCD]. If you desire $Y$ to have the mean and variance of $Y^*$, use `calc_mixmoments` with the component parameters as inputs to find $E[Y^*]$ and $SD[Y^*]$. 1. **Obtain the constants** for the components of $Y$. This can be done using `SimMultiCorrData::find_constants` on each component or by simulating the mixture variable with `contmixvar1`. 1. Determine whether these constants produce a **valid power method PDF**. The results of `find_constants` or `contmixvar1` indicate whether the constants yield invalid or valid PDF's for the component variables. The constants may also be checked using `SimMultiCorrData::pdf_check`. If the constants generate an invalid PDF, the user should check if the skurtosis falls above the lower bound (using `SimMultiCorrData::calc_lower_skurt`). If yes, sixth cumulant correction values should be used in `SimMultiCorrData::find_constants` or `contmixvar1` to find the smallest corrections that produce valid PDF constants. If all of the component distributions have valid PDF's, the mixture variable has a valid PDF. 1. **Select a critical value** from $Y^*$, i.e. $y^*$ such that $Pr(Y^* \ge y^*) = \alpha$ for the desired significance level $\alpha$. 1. **Calculate** the cumulative probability for the simulated mixture variable $Y$ up to $y^*$ and compare to $1 - \alpha$. 1. **Plot a parametric graph** of $Y^*$ and $Y$. This can be done with the simulated vector of data $Y$ using `plot_simpdf_theory` (`overlay` = TRUE) and specifying the PDF `fx` for the mixture variable. If comparing to an empirical vector of data, use `SimMultiCorrData::plot_sim_pdf_ext`. # Example: Mixture of 2 Normal Distributions {-} The component distributions are *Normal(-2, 1)* and *Normal(2, 1)*. The mixing proportions are $0.4$ and $0.6$. Use @HeadKow's steps to compare the density of the simulated variable to the theoretical density: ## Step 1: Obtain the standardized cumulants {-} The values of $\gamma_1,\ \gamma_2,\ \gamma_3,$ and $\gamma_4$ are all $0$ for normal variables. The mean and standard deviation of the mixture variable are found with `calc_mixmoments`. ```{r} library("SimCorrMix") library("printr") options(scipen = 999) n <- 10000 mix_pis <- c(0.4, 0.6) mix_mus <- c(-2, 2) mix_sigmas <- c(1, 1) mix_skews <- rep(0, 2) mix_skurts <- rep(0, 2) mix_fifths <- rep(0, 2) mix_sixths <- rep(0, 2) Nstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) ``` ## Step 2: Simulate the variable {-} Note that `calc_mixmoments` returns the standard deviation, not the variance. The simulation functions require variance as the input. First, the parameter inputs are checked with `validpar`. ```{r} validpar(k_mix = 1, method = "Polynomial", means = Nstcum[1], vars = Nstcum[2]^2, mix_pis = mix_pis, mix_mus = mix_mus, mix_sigmas = mix_sigmas, mix_skews = mix_skews, mix_skurts = mix_skurts, mix_fifths = mix_fifths, mix_sixths = mix_sixths) Nmix2 <- contmixvar1(n, "Polynomial", Nstcum[1], Nstcum[2]^2, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths) ``` Look at a summary of the target distribution and compare to a summary of the simulated distribution. ```{r} SumN <- summary_var(Y_comp = Nmix2$Y_comp, Y_mix = Nmix2$Y_mix, means = Nstcum[1], vars = Nstcum[2]^2, mix_pis = mix_pis, mix_mus = mix_mus, mix_sigmas = mix_sigmas, mix_skews = mix_skews, mix_skurts = mix_skurts, mix_fifths = mix_fifths, mix_sixths = mix_sixths) knitr::kable(SumN$target_mix, digits = 5, row.names = FALSE, caption = "Summary of Target Distribution") knitr::kable(SumN$mix_sum, digits = 5, row.names = FALSE, caption = "Summary of Simulated Distribution") ``` ## Step 3: Determine if the constants generate a valid PDF {-} ```{r} Nmix2$constants Nmix2$valid.pdf ``` ## Step 4: Select a critical value {-} Let $\alpha = 0.05$. Since there are no quantile functions for mixture distributions, determine where the cumulative probability equals $1 - \alpha = 0.95$. The boundaries for `uniroot` were determined through trial and error. ```{r} fx <- function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1) cfx <- function(x, alpha, FUN = fx) { integrate(function(x, FUN = fx) FUN(x), -Inf, x, subdivisions = 1000, stop.on.error = FALSE)$value - (1 - alpha) } y_star <- uniroot(cfx, c(3.3, 3.4), tol = 0.001, alpha = 0.05)$root y_star ``` ## Step 5: Calculate the cumulative probability for the simulated variable up to $1 - \alpha$ {-} We will use the function `SimMultiCorrData::sim_cdf_prob` to determine the cumulative probability for $Y$ up to `y_star`. This function is based on Martin Maechler's `ecdf` function [@Stats]. ```{r} sim_cdf_prob(sim_y = Nmix2$Y_mix[, 1], delta = y_star)$cumulative_prob ``` This is approximately equal to the $1 - \alpha$ value of $0.95$, indicating the method provides a **good approximation to the actual distribution.** ## Step 6: Plot graphs {-} ```{r} plot_simpdf_theory(sim_y = Nmix2$Y_mix[, 1], ylower = -10, yupper = 10, title = "PDF of Mixture of Normal Distributions", fx = fx, lower = -Inf, upper = Inf) ``` We can also plot the empirical cdf and show the cumulative probability up to y_star. ```{r} plot_sim_cdf(sim_y = Nmix2$Y_mix[, 1], calc_cprob = TRUE, delta = y_star) ``` # References {-}