Homework Assignment

Below, you will find some code that runs a bunch of simulations to evaluate the accuracy of Maximum Likelihood Estimates (MLEs). The simulations assume the “true” data are mean with \( \mu=100 \) and \( \sigma^2=200 \). Your homework assignment is to do the following:

Replicate my code exactly. Save the plots. What can we conclude about the bias as the sample size gets large?
Change the values of the true mean and the true variance. Replicate the simulations. Do your conclusions from (1) change?
Change the distribution from Normal to the Bernoulli distribution. To sample from the Bernoulli in R, you need to use command \( rbinomial(n,size=1,p) \), where \( n \) is for sample size and \( p \) is the “true'' probability that you set. We know that the true mean of the Bernoulli is \( p \) and the true variance is \( p(1-p) \). Replicate the code, but remember that the parameters "mu true” and “var true” have no meaning here. The only thing that matters is \( p_{true} \) in setting the simulations. Do the same conclusions hold from part (1)?
Evaluate the asymptotic normality from part (3). Remember that in R, you can type \( plot(density(x)) \) to plot a density estimate of \( x \). A code example below will help.
Pick one more distribution of your own and replicate the analysis.

Please turn in your code, your answers typed up, and the resulting graphs.

BONUS: Change the x-labels, ylabels, and titles of the graphs to make them look pretty.

Simulation Code


## first, I need to introduce a function for the sample standard deviation
sample_variance <- function(x) {
    n <- length(x)  ##this is the sample size
    svar <- sum((x - mean(x))^2)/n  ##definition of sample variance
    return(svar)
}

## Now, set up a set up the parameters for our simulation
sample_sizes <- seq(1, 1000, by = 1)  ##sequence of integers from 1 to 1000 by 1
mu_true <- 100  ##the true mean
var_true <- 200  ##the true variance
nsamples <- length(sample_sizes)  ##this is the number of samples
nsim <- 1000  ##how many simulations will we do?

## We must set up two matrices to hold our results The matrix will have
## one row for every sample and one column for every simulation
mean_estimates <- matrix(NA, nrow = nsamples, ncol = nsim)
variance_estimates <- matrix(NA, nrow = nsamples, ncol = nsim)

## Start the loop over each simulation and again over each sample size
for (i in 1:nsim) {
    for (j in 1:nsamples) {

        ## the 'true' y is normal with mean mu_true and variance var_true
        y <- rnorm(n = sample_sizes[j], mean = mu_true, sd = sqrt(var_true))

        ## what are the MLEs? mu_MLE is the sample mean and var_MLE is the sample
        ## variance
        mu_est <- mean(y)
        var_est <- sample_variance(y)

        ## store the results
        mean_estimates[j, i] <- mu_est
        variance_estimates[j, i] <- var_est
    }
}

## Simulations done! Now, we want to compute the average estimates by
## sample size The apply() function lets us average across rows or columns
## of a matrix
means <- apply(mean_estimates, 1, mean)
variances <- apply(variance_estimates, 1, mean)

You can also embed plots, for example:

## Now, lets plot the estimates by sample size
par(mfrow = c(1, 2))  ##this makes two plots on the same screen

plot(sample_sizes, means, type = "b")
plot(sample_sizes, variances, type = "b")

plot of chunk unnamed-chunk-2

## Bias?
mean_bias <- mu_true - means
variance_bias <- var_true - variances




par(mfrow = c(1, 2))  ##this makes two plots on the same screen
plot(sample_sizes, mean_bias, type = "b")
plot(sample_sizes, variance_bias, type = "b")

plot of chunk unnamed-chunk-3

## Normality?

par(mfrow = c(1, 2))  ##this makes two plots on the same screen
plot(density(mean_estimates[1, ]))  ##this plots the distribution of the means for the smallest sample
plot(density(mean_estimates[nsamples, ]))  ##this plots the distribution for the largest one