--- title: "Using the Sixth Cumulant Correction to Find Valid Power Method Pdfs" author: "Allison C Fialkowski" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: Bibliography.bib vignette: > %\VignetteIndexEntry{Using the Sixth Cumulant Correction to Find Valid Power Method Pdfs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE} #knitr::opts_chunk$set(collapse = TRUE, comment = "#>") knitr::opts_chunk$set(fig.width = 6, fig.height = 4.5) ``` Sometimes a target (continuous) distribution's standardized cumulants yield polynomial transformation constants that generate an invalid power method pdf. This is an important consideration when utilizing a simulated distribution to perform power analysis or hypothesis testing. In the case of Headrick's fifth-order approximation [-@Head2002], @HeadKow proposed that a correction can be used to increase the sixth cumulant's value in order to produce a valid pdf. This is discussed in the help pages for `find_constants` and the simulation functions (`nonnormvar1`, `rcorrvar`, and `rcorrvar2`). These functions allow the user to input a `Six` vector of correction values, and then choose the smallest value that generates a valid pdf. When Headrick generated his Table 1 (2002, p.691 - 692) containing the fifth-order polynomial transformation constants for some common symmetrical and asymmetrical theoretical densities, he did not discuss whether the constants generate valid power method pdfs. His 2007 paper with Kowalchuk explained how to verify if a given set of constants generates a valid pdf (for both Fleishman's third-order [-@Fleish] and Headrick's fifth-order method). These checks are implemented in `pdf_check`, which is called by `find_constants`. ## Example The following example illustrates the use of the sixth cumulant correction vector when finding the constants for each of Headrick's Table 1 distributions (given in `Headrick.dist`), and compares distributions generated with the corrections to those generated without the corrections. (Note that the `printr` @Printr package is invoked to display the tables.) ### Step 1) Find the theoretical standardized cumulants for each distribution. The parameters are stored in `H_params`. The order of the columns corresponds to the order of the columns in `Headrick.dist`. The columns of `H_params` are named as inputs for `calc_theory`. Please see the appropriate R help page concerning the parameterization of each distribution. We are rounding the calculated standardized cumulants to *10* decimal places to ensure that the symmetric distributions have $\Large \gamma_{1} = 0$ and $\Large \gamma_{3} = 0$. ```{r, warning = FALSE, message = FALSE} library("SimMultiCorrData") library("printr") H_stcum <- matrix(1, nrow = 4, ncol = ncol(Headrick.dist)) for (i in 1:ncol(H_params)) { if (is.na(H_params[2, i])) { params <- H_params[1, i] } else { params <- as.numeric(H_params[, i]) } H_stcum[, i] <- round(calc_theory(Dist = colnames(H_params)[i], params = params)[3:6], 10) } colnames(H_stcum) <- colnames(Headrick.dist) rownames(H_stcum) <- c("skew", "skurtosis", "fifth", "sixth") round(H_stcum[, 1:6], 5) round(H_stcum[, 7:12], 5) round(H_stcum[, 13:18], 5) round(H_stcum[, 19:22], 5) ``` Note that the standardized cumulants match those found by Headrick, except for the Gamma($\Large \alpha = 10,\ \beta = 10$) distribution. Either there is a mistake in Headrick's table, or he is using a different parameterization. ### Step 2) Use the cumulants to find the constants for each distribution. The sixth cumulant corrections will be chosen based on previous analysis in order to decrease computation time. If the user does not know what a sixth cumulant correction needs to be, a wide range with a small increment in values may be specified, i.e. `Six = seq(0.1, 10, 0.1)`. In situations where the user has a better idea of the necessary correction, a smaller vector should be chosen. In the case of the *Triangular* distribution, the standardized kurtosis has been changed to the lower kurtosis boundary (found using `calc_lower_skurt`). ```{r, warning = FALSE, message = FALSE} Six <- list(NULL, seq(1.7, 1.8, 0.01), seq(0.5, 2, 0.5), seq(25.1, 25.2, 0.01), seq(0.1, 0.3, 0.01), NULL, NULL, seq(0.5, 2, 0.5), NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, seq(0.01, 0.05, 0.01), seq(0.15, 0.2, 0.01), seq(0.5, 2, 0.5), NULL, seq(0.5, 2, 0.5), seq(0.5, 2, 0.5)) H_consol <- list() start.time <- Sys.time() for (i in 1:ncol(H_stcum)) { skurtsH <- ifelse(colnames(H_stcum)[i] == "Triangular", -0.5856216, H_stcum[2, i]) H_consol[[i]] <- find_constants(method = "Polynomial", skews = H_stcum[1, i], skurts = skurtsH, fifths = H_stcum[3, i], sixths = H_stcum[4, i], Six = Six[[i]]) } stop.time <- Sys.time() Time <- round(difftime(stop.time, start.time, units = "min"), 3) cat("Total computation time:", Time, "minutes \n") H_cons <- matrix(1, nrow = 7, ncol = ncol(Headrick.dist)) valid <- numeric(ncol(Headrick.dist)) for (i in 1:ncol(H_stcum)) { H_cons[1:6, i] <- H_consol[[i]]$constants H_cons[7, i] <- ifelse(is.null(H_consol[[i]]$SixCorr1), NA, H_consol[[i]]$SixCorr1) valid[i] <- H_consol[[i]]$valid } colnames(H_cons) <- colnames(Headrick.dist) rownames(H_cons) <- c("c0", "c1", "c2", "c3", "c4", "c5", "sixcorr") ``` ### Step 3) Look at results to see which distributions still have invalid power method pdfs ```{r} colnames(H_cons)[valid == FALSE] ``` Therefore, the Uniform($\Large 0,\ 1$), Chisq($\Large df = 1$), Weibull($\Large \alpha = 6,\ \beta = 10$), Rayleigh($\Large \alpha = 0.5,\ \mu = \sqrt{0.5 * \pi}$), and Pareto($\Large \theta = 10,\ \alpha = 1$) distributions still have invalid power method pdf constants. ### Step 4) Look at constants and sixth cumulant corrections ```{r} round(H_cons[, 1:6], 6) round(H_cons[, 7:12], 6) round(H_cons[, 13:18], 6) round(H_cons[, 19:22], 6) ``` 1. The Gaussian, t($\Large df = 10$), Chisq($\Large df = 2, ..., 32$), Beta($\Large \alpha = 4,\ \beta = 4$), Beta($\Large \alpha = 4,\ \beta = 2$), and Gamma($\Large \alpha = 10,\ \beta = 10$) distributions had valid power method pdfs with the original cumulants. 1. The Logistic($\Large 0, 1$), Triangular($\Large 0, 1$), t($\Large df = 7$), Beta($\Large \alpha = 4,\ \beta = 1.5$), and Beta($\Large \alpha = 4,\ \beta = 1.25$) required relatively small sixth cumulant corrections. 1. The Laplace($\Large 0, 1$) required the largest correction at 25.14. ### Step 5) Simulate distributions We will choose the **Logistic($\Large 0, 1$)** and **Laplace($\Large 0, 1$) distributions** for illustration. First, simulate *without the sixth cumulant correction*. ```{r, warning = FALSE} seed <- 1234 Rey <- matrix(c(1, 0.4, 0.4, 1), 2, 2) # Make sure Rey is within feasible correlation bounds valid <- valid_corr(k_cont = 2, method = "Polynomial", means = rep(0, 2), vars = rep(1, 2), skews = H_stcum[1, c("Logistic", "Laplace")], skurts = H_stcum[2, c("Logistic", "Laplace")], fifths = H_stcum[3, c("Logistic", "Laplace")], sixths = H_stcum[4, c("Logistic", "Laplace")], rho = Rey, seed = seed) A <- rcorrvar(n = 10000, k_cont = 2, method = "Polynomial", means = rep(0, 2), vars = rep(1, 2), skews = H_stcum[1, c("Logistic", "Laplace")], skurts = H_stcum[2, c("Logistic", "Laplace")], fifths = H_stcum[3, c("Logistic", "Laplace")], sixths = H_stcum[4, c("Logistic", "Laplace")], rho = Rey, seed = seed) ``` Look at the maximum correlation error: ```{r} cat(paste("The maximum correlation error is ", round(A$maxerr, 5), ".", sep = "")) ``` Look at the interquartile-range of correlation errors: ```{r} Acorr_error = round(A$correlations - Rey, 6) cat(paste("The IQR of correlation errors is [", round(quantile(as.numeric(Acorr_error), 0.25), 5), ", ", round(quantile(as.numeric(Acorr_error), 0.75), 5), "].", sep = "")) ``` Second, simulate *with the sixth cumulant correction*. ```{r, warning = FALSE} Six <- list(H_cons["sixcorr", "Logistic"], H_cons["sixcorr", "Laplace"]) # Make sure Rey is within feasible correlation bounds valid2 <- valid_corr(k_cont = 2, method = "Polynomial", means = rep(0, 2), vars = rep(1, 2), skews = H_stcum[1, c("Logistic", "Laplace")], skurts = H_stcum[2, c("Logistic", "Laplace")], fifths = H_stcum[3, c("Logistic", "Laplace")], sixths = H_stcum[4, c("Logistic", "Laplace")], Six = Six, rho = Rey, seed = seed) B <- rcorrvar(n = 10000, k_cont = 2, method = "Polynomial", means = rep(0, 2), vars = rep(1, 2), skews = H_stcum[1, c("Logistic", "Laplace")], skurts = H_stcum[2, c("Logistic", "Laplace")], fifths = H_stcum[3, c("Logistic", "Laplace")], sixths = H_stcum[4, c("Logistic", "Laplace")], Six = Six, rho = Rey, seed = seed) ``` Look at the maximum correlation error: ```{r} cat(paste("The maximum correlation error is ", round(B$maxerr, 5), ".", sep = "")) ``` Look at the interquartile-range of correlation errors: ```{r} Bcorr_error = round(B$correlations - Rey, 6) cat(paste("The IQR of correlation errors is [", round(quantile(as.numeric(Bcorr_error), 0.25), 5), ", ", round(quantile(as.numeric(Bcorr_error), 0.75), 5), "].", sep = "")) ``` In both cases, the correlation errors are small, indicating that the error loop does not need to be used. Now *compare the results numerically*. Target distributions: ```{r} as.matrix(round(A$summary_targetcont, 5), nrow = 2, ncol = 7, byrow = TRUE) ``` Without the sixth cumulant correction: ```{r} as.matrix(round(A$summary_continuous[, c("Distribution", "mean", "sd", "skew", "skurtosis", "fifth", "sixth")], 5), nrow = 2, ncol = 7, byrow = TRUE) A$valid.pdf ``` With the correction: ```{R} as.matrix(round(B$summary_continuous[, c("Distribution", "mean", "sd", "skew", "skurtosis", "fifth", "sixth")], 5), nrow = 2, ncol = 7, byrow = TRUE) B$valid.pdf ``` The distributions simulated with the sixth cumulant corrections are closer to the target distributions. *Compare the results graphically*. Logistic Distribution: ```{r, warning = FALSE, message = FALSE} plot_sim_pdf_theory(sim_y = A$continuous_variables[, 1], title = "Logistic Pdf", Dist = "Logistic", params = H_params$Logistic) plot_sim_pdf_theory(sim_y = B$continuous_variables[, 1], title = "Corrected Logistic Pdf", Dist = "Logistic", params = H_params$Logistic) ``` Laplace Distribution: ```{r, warning = FALSE, message = FALSE} plot_sim_pdf_theory(sim_y = A$continuous_variables[, 2], title = "Laplace Pdf", Dist = "Laplace", params = H_params$Laplace) plot_sim_pdf_theory(sim_y = B$continuous_variables[, 2], title = "Corrected Laplace Pdf", Dist = "Laplace", params = H_params$Laplace) ``` ## References