Exam questions for FIT2086

Oct 22, 2018

Lecture 1

Define the following terms:

A population

A sample

A model

What is inference?

Lecture 2

What is variance in the context of a random variable?

Lecture 3

Briefly explain how maximum likelihood works and give a reason why it may be so effective.

Suppose we have an unbiased estimator \(\hat{\theta}\) of a parameter \(\theta\) with variance \(2.00\). What is the MSE of \(\hat{\theta}\)?
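The bias-variance decomposition of the MSE makes the arithmetic explicit. Because the estimator is unbiased, the bias term is zero and the MSE equals the variance:

\[ \text{MSE}(\hat{\theta}) = \mathbb{E}\left[(\hat{\theta} - \theta)^2\right] = \text{Var}(\hat{\theta}) + \text{bias}(\hat{\theta})^2 = 2.00 + 0^2 = 2.00 \]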

What is the sampling distribution of the mean of a random variable \(Y\) with finite mean and variance?

What does it mean for an estimator to be consistent?

What is the weak law of large numbers?
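A small simulation can illustrate the weak law of large numbers (and the consistency of the sample mean). This is a minimal sketch, assuming Uniform(0, 1) draws with true mean 0.5; the function name `running_sample_means` is hypothetical. As \(n\) grows, the sample mean tends to settle near 0.5.

```python
import random

def running_sample_means(n, seed=0):
    """Sample means of the first 1..n Uniform(0, 1) draws (true mean 0.5)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += rng.random()
        means.append(total / i)
    return means

means = running_sample_means(100_000)
for n in (10, 1_000, 100_000):
    # the error |mean - 0.5| tends to shrink as n grows (not necessarily monotonically)
    print(n, abs(means[n - 1] - 0.5))
```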

The pmf for a Poisson distribution is given by

\[ p(y \, | \, \lambda) = \frac{\lambda^y \, \text{exp}(- \lambda)}{y!} \]

Derive the negative log-likelihood, and hence the maximum likelihood estimator of \(\lambda\), for \(Y \sim \text{Poi}(\lambda)\).
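One way to carry out the derivation, assuming an i.i.d. sample \(y_1, \ldots, y_n\) from the pmf above: take the negative log of the product of pmfs, differentiate with respect to \(\lambda\) and set the result to zero.

\[
\begin{aligned}
L(\lambda) = -\log p(y_1, \ldots, y_n \, | \, \lambda) &= -\sum_{i=1}^n \left( y_i \log \lambda - \lambda - \log y_i! \right) \\
&= n\lambda - \log \lambda \sum_{i=1}^n y_i + \sum_{i=1}^n \log y_i!
\end{aligned}
\]

\[ \frac{dL}{d\lambda} = n - \frac{1}{\lambda} \sum_{i=1}^n y_i = 0 \quad \Rightarrow \quad \hat{\lambda} = \frac{1}{n} \sum_{i=1}^n y_i = \bar{y} \]

The second derivative \(\sum_{i} y_i / \lambda^2\) is positive whenever at least one \(y_i > 0\), so this stationary point is a minimum of the negative log-likelihood.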

Lecture 4

What is the central limit theorem?
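The central limit theorem can be seen in a quick simulation. This sketch assumes Exponential(1) data (mean 1, variance 1), which is clearly skewed; the function name `simulate_means` is hypothetical. The means of samples of size \(n\) should be approximately normal with mean 1 and standard deviation \(1/\sqrt{n}\).

```python
import math
import random
import statistics

def simulate_means(n, reps, seed=1):
    """Sample means of n Exponential(1) draws, repeated reps times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 50
means = simulate_means(n, reps=5000)
# CLT: the sample means are approximately N(1, 1/n) even though the data are skewed
print(statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(n))
```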

Suppose I draw a sample from a distribution and calculate a 95% confidence interval (CI) using that sample. What is the probability that the true parameter of the distribution is contained in my CI?

Give the formula for the \(100(1-\alpha)\%\) CI for \(\mu\) of a normal distribution with known variance \(\sigma^2\). What changes if the variance is unknown?
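A minimal sketch of the known-variance interval \(\hat{\mu} \pm z_{1-\alpha/2} \, \sigma / \sqrt{n}\), using only the standard library; `z_interval` is a hypothetical name. When \(\sigma\) is unknown, the usual change is to replace \(\sigma\) with the sample standard deviation and \(z_{1-\alpha/2}\) with a \(t_{n-1}\) quantile; the standard library has no \(t\) quantile function, so only the known-variance case is shown.

```python
import math
from statistics import NormalDist, fmean

def z_interval(ys, sigma, alpha=0.05):
    """100(1 - alpha)% CI for mu when the variance sigma**2 is known."""
    n = len(ys)
    mu_hat = fmean(ys)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return mu_hat - half_width, mu_hat + half_width

print(z_interval([9.0, 10.0, 11.0, 10.0], sigma=2.0))
```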

Appealing to the central limit theorem, give the formula for a CI on the difference in means for two normal distributions where none of the means or variances are known. Why is this an approximation of the true CI? Is there something it does not take into account?

Lecture 5

What is a null hypothesis?

What are a test statistic and a p-value?

Give the formula for the p-value for each of the following hypotheses:

\(H_0 : \mu = \mu_0\) vs \(H_A : \mu \neq \mu_0\)

\(H_0 : \mu \leq \mu_0\) vs \(H_A : \mu > \mu_0\)

\(H_0 : \mu \geq \mu_0\) vs \(H_A : \mu < \mu_0\)
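For the three hypotheses above, a sketch assuming a z-test with known \(\sigma\), so the test statistic is \(z = (\bar{y} - \mu_0) / (\sigma / \sqrt{n})\) and \(\Phi\) is the standard normal CDF; the function name `z_test_p_values` is hypothetical. With unknown variance the same shapes apply with a \(t_{n-1}\) CDF in place of \(\Phi\).

```python
import math
from statistics import NormalDist

def z_test_p_values(y_bar, mu0, sigma, n):
    """p-values for the two-sided, upper and lower tests, assuming known sigma."""
    z = (y_bar - mu0) / (sigma / math.sqrt(n))
    Phi = NormalDist().cdf
    return {
        "two_sided": 2 * (1 - Phi(abs(z))),  # H_A: mu != mu0
        "upper": 1 - Phi(z),                 # H_A: mu > mu0
        "lower": Phi(z),                     # H_A: mu < mu0
    }

print(z_test_p_values(y_bar=10.5, mu0=10.0, sigma=2.0, n=64))
```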

Lecture 6

Why might we want to use the sum of squared errors as a measure of fit in our model?

Explain the link between the least squares estimator and the maximum likelihood estimator of \(\beta_0\) and \(\beta\).

Suppose we have collected a sample of data and fit a multiple linear regression model to the data using 3 predictors for the target. We calculated the RSS to be 300 units. What are the ML and unbiased estimates of the variance of the residuals?
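The two estimates differ only in the divisor: the ML estimate divides the RSS by \(n\), while the unbiased estimate divides by the residual degrees of freedom \(n - k - 1\) (assuming an intercept plus \(k\) predictors). The question does not state \(n\), so this sketch leaves it as a parameter; the value \(n = 104\) below is purely hypothetical.

```python
def residual_variance_estimates(rss, n, k):
    """ML and unbiased estimates of the residual variance for a linear model
    with an intercept and k predictors fitted to n observations."""
    ml = rss / n                  # maximum likelihood: divide by n
    unbiased = rss / (n - k - 1)  # unbiased: divide by residual degrees of freedom
    return ml, unbiased

# n = 104 is a hypothetical sample size; the question does not give n
print(residual_variance_estimates(rss=300.0, n=104, k=3))
```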

Why is it preferable to use complexity penalties, rather than goodness of fit alone, when selecting a model?

Suppose you have collected a sample of 72 measurements on 1531 individuals in Melbourne and are looking to predict their age from that data. Your friend Tim suggests you use the all subsets approach for model selection while your other friend Meg suggests you use a forward selection method. Who do you listen to and why?
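Counting model fits makes the trade-off concrete. All subsets with \(p\) predictors means \(2^p\) candidate models, while greedy forward selection fits at most the empty model plus \(p + (p-1) + \cdots + 1\) candidates. The function names below are hypothetical and the count is a worst case (forward selection may stop early).

```python
def all_subsets_models(p):
    """Number of candidate models all-subsets selection must fit."""
    return 2 ** p

def forward_selection_fits(p):
    """Worst-case model fits for forward selection:
    the empty model, then p, p - 1, ..., 1 candidate additions."""
    return 1 + sum(range(1, p + 1))

print(all_subsets_models(72), forward_selection_fits(72))
```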

Lecture 7

What is the key assumption of Naive Bayes? How does it help reduce the complexity of fitting the model?

Suppose we are approaching a classification problem with \(30\) binary categorical predictors. The target variable has 10 classes.

Why can’t we use the simplest model, which directly estimates the joint distribution?

How many probabilities would we need to estimate for Naive Bayes?
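A sketch of the parameter counting, assuming we count free probabilities (a set of probabilities that must sum to one loses one degree of freedom); the function names are hypothetical. A full joint table over 30 binary predictors and 10 classes has \(10 \times 2^{30}\) cells, whereas Naive Bayes needs the class priors plus one \(P(x_j = 1 \, | \, \text{class})\) per predictor per class.

```python
def joint_table_params(n_predictors, n_classes):
    """Free probabilities in a full joint table over binary predictors and the class."""
    return n_classes * 2 ** n_predictors - 1

def naive_bayes_params(n_predictors, n_classes):
    """Free probabilities for naive Bayes with binary predictors:
    class priors (minus one, since they sum to 1) plus one
    P(x_j = 1 | class) per predictor per class."""
    return (n_classes - 1) + n_classes * n_predictors

print(joint_table_params(30, 10), naive_bayes_params(30, 10))
```

If the class priors are counted without the sum-to-one reduction, the Naive Bayes figure is one larger.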

What “shortcut” does logistic regression take?

What are sensitivity and specificity?
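Sensitivity is the proportion of actual positives predicted positive, \(TP / (TP + FN)\), and specificity is the proportion of actual negatives predicted negative, \(TN / (TN + FP)\). A minimal sketch, assuming labels coded 1 (positive) and 0 (negative); the function name and toy labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    for binary labels coded 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0])
print(sens, spec)
```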

What is AUC?
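AUC is the area under the ROC curve (sensitivity plotted against 1 - specificity as the classification threshold varies). An equivalent way to compute it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, counting ties as one half. This pairwise sketch is simple rather than efficient; the function name and scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive is scored higher
    than a random negative (ties count as one half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```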

What is logarithmic loss? Why might it be preferred to classification accuracy?
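A sketch contrasting the two measures, assuming binary labels and predicted probabilities \(P(y = 1)\); the function names and toy predictions are hypothetical. Both sets of predictions classify every point correctly, so accuracy cannot separate them, but log loss rewards the predictions that place more probability on the true class.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of points whose thresholded prediction matches the label."""
    return sum((p > threshold) == (y == 1) for y, p in zip(y_true, p_pred)) / len(y_true)

y = [1, 0]
confident, hesitant = [0.9, 0.1], [0.6, 0.4]
print(accuracy(y, confident), accuracy(y, hesitant))  # both 1.0
print(log_loss(y, confident), log_loss(y, hesitant))  # about 0.105 vs 0.511
```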

Lecture 8

What is the Bonferroni approach? Why is it used?
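The Bonferroni approach tests each of \(m\) hypotheses at level \(\alpha / m\), which keeps the probability of at least one false rejection (the family-wise error rate) at most \(\alpha\). A minimal sketch; the function name and p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0_i when p_i < alpha / m; controls the family-wise error rate at alpha."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

print(bonferroni_reject([0.001, 0.02, 0.04]))  # [True, False, False]
```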

Henry has collected a sample of medical imaging data with \(p = 10{,}000\) and \(n = 524\). Jasmine suggests he use the RIC (risk inflation criterion). What might Jasmine’s rationale be?

Briefly explain the forward and backward selection algorithms for predictor selection.
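Both algorithms are greedy searches over predictor subsets. The sketch below assumes a hypothetical `score` callable that maps a set of predictor names to a criterion where lower is better (for example an information criterion computed from a fitted model); the toy criterion used in the usage line is purely illustrative. Forward selection starts empty and adds the single best predictor while that improves the score; backward selection starts full and removes the single best predictor while that improves the score.

```python
def forward_selection(predictors, score):
    """Greedy forward selection; `score(frozenset)` returns a criterion, lower is better."""
    selected = frozenset()
    best = score(selected)
    remaining = set(predictors)
    while remaining:
        # try adding each remaining predictor and keep the best addition
        trials = {p: score(selected | {p}) for p in remaining}
        p_best = min(trials, key=trials.get)
        if trials[p_best] >= best:
            break  # no single addition improves the criterion
        selected, best = selected | {p_best}, trials[p_best]
        remaining.discard(p_best)
    return selected, best

def backward_selection(predictors, score):
    """Greedy backward elimination from the full model."""
    selected = frozenset(predictors)
    best = score(selected)
    while selected:
        # try removing each current predictor and keep the best removal
        trials = {p: score(selected - {p}) for p in selected}
        p_best = min(trials, key=trials.get)
        if trials[p_best] >= best:
            break  # no single removal improves the criterion
        selected, best = selected - {p_best}, trials[p_best]
    return selected, best

def toy_score(s):
    """Hypothetical criterion: distance from the 'true' predictor set {a, c}."""
    return len(s ^ {"a", "c"})

print(forward_selection(["a", "b", "c"], toy_score))
print(backward_selection(["a", "b", "c"], toy_score))
```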

Briefly explain what it means for a model to have statistical instability.

What is the difference between linear, ridge and LASSO regression?
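Written as optimisation problems, the three methods share the same sum of squared errors and differ only in the penalty on the coefficients, with \(\lambda \geq 0\) a tuning parameter and the intercept \(\beta_0\) usually left unpenalised:

\[
\begin{aligned}
\text{Least squares:} &\quad \min_{\beta_0, \beta} \sum_{i=1}^n (y_i - \beta_0 - \mathbf{x}_i^T \beta)^2 \\
\text{Ridge:} &\quad \min_{\beta_0, \beta} \sum_{i=1}^n (y_i - \beta_0 - \mathbf{x}_i^T \beta)^2 + \lambda \sum_{j=1}^p \beta_j^2 \\
\text{LASSO:} &\quad \min_{\beta_0, \beta} \sum_{i=1}^n (y_i - \beta_0 - \mathbf{x}_i^T \beta)^2 + \lambda \sum_{j=1}^p |\beta_j|
\end{aligned}
\]

Ridge shrinks coefficients towards zero without setting them exactly to zero, whereas the absolute-value penalty in LASSO can set some \(\beta_j\) exactly to zero and so also performs predictor selection.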

Lecture 9

Briefly outline \(k\)-fold cross-validation (CV) and leave-one-out (LOO) CV.
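A sketch of \(k\)-fold CV: shuffle the indices, split them into \(k\) folds, hold each fold out once as the test set, fit on the rest and average the held-out scores. Setting \(k = n\) gives LOO CV. The `fit_and_score` callable is a hypothetical stand-in for fitting and scoring any model; the usage example scores a "predict the training mean" model by squared error on made-up data.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal folds after shuffling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k, fit_and_score, seed=0):
    """Average held-out score; fit_and_score(train_idx, test_idx) is hypothetical."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test in enumerate(folds):
        # training set = every fold except the held-out one
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train, test))
    return sum(scores) / k

ys = [2.0, 4.0, 6.0, 8.0, 10.0]

def mean_model_sq_error(train, test):
    """Fit: the mean of the training targets. Score: mean squared error on test."""
    mu = sum(ys[j] for j in train) / len(train)
    return sum((ys[j] - mu) ** 2 for j in test) / len(test)

print(cross_validate(len(ys), k=len(ys), fit_and_score=mean_model_sq_error))  # LOO CV: 12.5
```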

Lecture 10

What is unsupervised learning? How does it differ from supervised learning?

Very briefly outline the steps in the \(k\)-means algorithm.
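A minimal sketch of the standard (Lloyd's) \(k\)-means iteration on 2-D points, assuming squared Euclidean distance; the function name and the optional `init` parameter are hypothetical conveniences (by default the initial centroids are random data points). The docstring lists the steps.

```python
import random

def k_means(points, k, init=None, max_iter=100, seed=0):
    """Lloyd's algorithm for k-means on 2-D points (tuples).
    1. Initialise k centroids (random data points unless `init` is given).
    2. Assign each point to its nearest centroid.
    3. Move each centroid to the mean of its assigned points.
    4. Repeat 2-3 until the assignments stop changing."""
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(k_means(pts, 2, init=[(0.0, 0.0), (0.0, 1.0)]))
```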

In what way is mixture modelling an extension of \(k\)-means?

What is the matrix completion problem? Give two examples of when it may be used.

Lecture 11

What is a pseudo-random number generator?
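A linear congruential generator (LCG) is one of the simplest pseudo-random number generators: a deterministic recurrence whose output looks random but is fully reproducible from the seed. This sketch uses the widely quoted constants \(a = 1664525\), \(c = 1013904223\), \(m = 2^{32}\); it is for illustration only, and practical libraries use stronger generators (Python's `random` module uses the Mersenne Twister).

```python
class LCG:
    """Linear congruential generator x_{t+1} = (a * x_t + c) mod m with
    a = 1664525, c = 1013904223, m = 2**32 (illustrative, not cryptographic)."""

    def __init__(self, seed):
        self.state = seed % 2**32

    def random(self):
        """Advance the state and return a pseudo-random number in [0, 1)."""
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state / 2**32

gen = LCG(seed=42)
print([round(gen.random(), 4) for _ in range(3)])
```

The same seed always reproduces the same sequence, which is what makes simulations repeatable.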

What is bootstrapping?
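Bootstrapping approximates the sampling distribution of a statistic by recomputing it on many resamples of the observed data drawn with replacement. A minimal sketch for the sample mean, giving a bootstrap standard error and a simple percentile interval; the function names and the data values are hypothetical.

```python
import random
import statistics

def bootstrap(data, stat, n_boot=2000, seed=0):
    """Recompute `stat` on n_boot resamples of `data` drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]

def percentile_interval(values, alpha=0.05):
    """Simple percentile bootstrap interval."""
    v = sorted(values)
    lo = v[int((alpha / 2) * (len(v) - 1))]
    hi = v[int((1 - alpha / 2) * (len(v) - 1))]
    return lo, hi

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]  # hypothetical sample
boot_means = bootstrap(data, statistics.fmean)
print("bootstrap SE of the mean:", statistics.stdev(boot_means))
print("95% percentile interval:", percentile_interval(boot_means))
```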

What is bagging?

What is a permutation test?
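A permutation test builds the null distribution of a test statistic by repeatedly shuffling group labels, which are exchangeable under the null hypothesis of no difference. This sketch tests a difference in means with a random (Monte Carlo) subset of permutations rather than all of them; the function name and example data are hypothetical.

```python
import random
import statistics

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        # shuffle the pooled data and re-split into groups of the original sizes
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(x)]) - statistics.fmean(pooled[len(x):]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # +1 counts the observed labelling itself

print(permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```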