# Statistics in Research

## Introduction

Welcome to the Statistics in Research website. Here you can freely access video discussions and accompanying lessons that explore data analysis procedures across a wide range of research applications.

The data associated with each application may be used as a standalone resource for exploration. The lessons then provide a structured and comprehensive introduction to data analysis (New Zealand high school up to undergraduate university level) and the use of the statistical software R (About R).

## Videos

Austina Clark, Mathematics and Statistics, University of Otago: A designed experiment, stratified sampling; confidence intervals and tests on means; simple linear regression; two-factor factorial experiment; multiple linear regression (length: 9:00).

Mairi Stewart, AgResearch, Ruakura: A designed experiment with sequential data; confidence intervals and tests on means; bootstrap confidence intervals; analysis of variance; repeated measures (length: 15:09).

Adam N. H. Smith, NIWA: Observational data with 59 cases; confidence intervals and tests on means; simple linear regression and correlation; principal components analysis (length: 16:31).

Sam Lucas, Physiology Department, University of Otago: Designed study with repeated measures; confidence intervals and tests on matched data; bootstrap confidence intervals; repeated measures and mixed model analysis (length: 14:47).

Neil Binnie, Auckland University of Technology: Census data with up to 9218 cases; data cleaning and exploration; confidence intervals and tests for differences between means (length: 18:00).

Ian Westbrooke, New Zealand Department of Conservation: Matched data samples; confidence intervals using the t-distribution and bootstrap confidence intervals (length: 25:07).

Barry Peake, Department of Chemistry, University of Otago: Survey sample with 111 cases; comparison of samples; simple linear regression; log-transformed data; principal components and other multivariate techniques (length: 12:45).

Motohide Miyahara, Department of Physical Education, University of Otago: Questionnaire with 70 Likert questions and 100 selected participants; confidence intervals for differences between means of different groups (length: 11:41).

Nigel Dickson, Preventative and Social Medicine, University of Otago: Observational cohort data with 1890 cases; cross tabulations and confidence intervals for proportions; confounder control by stratification; relative risk; logistic regression (length: 12:30).

John Williams, Marketing Department, University of Otago: Survey sample with 1950 cases; design of a survey; cross tabulations; confidence intervals for proportions; logistic and multinomial regression; post-stratification (length: 37:21).

Julie Legler, St Olaf College: Demographic data for 176 countries; methods for missing data estimates; use of the mean and the bootstrap; estimates using stratification; estimates using multiple regression (length: 18:45).

Various presenters, K9 Medical Detection Group: Measuring diagnostic test accuracy; sensitivity; specificity; positive and negative predictive values (length: 15:36/37:06).

David Fletcher, Department of Mathematics and Statistics, University of Otago: A designed study; simple linear regression; confidence interval for difference between proportions; bootstrap intervals; logistic model for binomial data and over-dispersion (length: 20:25).

David O’Hare, Department of Psychology, University of Otago: A survey with 1144 respondents; chi-squared tests; logistic regression (length: 15:47).

Ian Jamieson, Department of Zoology, University of Otago: Designed study with 28 cases; Wilcoxon non-parametric test; bootstrap confidence interval (length: 26:00).

Elaine Ferguson, Department of Human Nutrition, University of Otago: Cross-sectional survey with 323 cases; chi-squared tests; confidence intervals and tests for means, proportions and differences; log transformations; multiple linear regression (length: 22:11).

Pauline Gulliver, Injury Prevention Research Unit, University of Otago: Observational cohort data with 1837 cases; confidence intervals and tests for differences between proportions; chi-squared test; logistic regression (length: 9:55).

Juergen Gnoth, Marketing Department, University of Otago: Survey sample with 3664 cases and over 100 variables; all multivariate techniques used (length: 17:43).

Barbara Clendon, Statistics New Zealand: Moving average; multiplicative model; seasonal adjustment; trend; irregular or residuals; emphasis on result interpretation (length: 16:34).

Paul and Kent, Environment Team, Statistics New Zealand: Discussion of a new department of Statistics New Zealand to research National Resource Accounts; two short start-up time series available (length: 8:40).

## How to use this series

### Website structure

If you are unfamiliar with the R language, you should first work through Getting started with R (Chapter 2). This contains instructions for using R, including downloading and installing the required software, and a walk-through of the building blocks of its functionality. Subsequent lessons examine particular techniques in detail via applications.

The videos have been grouped according to the type of data being analysed (Continuous, Count, and Time Series), and further ordered according to the complexity of techniques that are being taught.

Early lessons step through the relevant techniques in detail, and the code can be copied directly into your R Markdown document, so you can get a good understanding of the procedures.

### Lesson structure

Data is provided to download in Excel format, along with an overview of the variables in the data set so that teachers can assess its relevance.

Videos are embedded in the HTML document. They focus on the application and statistical methodologies for real data collected or produced by an active researcher. The results discussed are not always replicable, typically because of differences in the data sets used (for availability and privacy reasons); however, the techniques addressed are the same. The videos do not rely on any statistical package, and they give an appreciation for the research and lesson plan.

Objectives state the main statistical techniques and concepts that will be stepped through in the subsequent lesson, specifying which techniques are presented for the first time or are an opportunity to refine skills.

Tasks aim to teach statistical skills in an accessible and interesting way. Although they are presented for direct implementation in R, the Excel format of data allows the translation of tasks to any software of choice.

The first task of every lesson installs any necessary R packages and loads the data into R. Subsequent tasks then explore a variety of relevant techniques.
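As a rough illustration of what such a first task might look like, the sketch below installs a package only if it is missing and then loads an Excel file. It is a minimal sketch under stated assumptions, not the code used in any particular lesson: the choice of the `readxl` package, the CRAN mirror URL, and the file name `lesson_data.xlsx` are all assumptions made for this example.

```r
# Packages this hypothetical lesson relies on (assumption: readxl is used
# to read the Excel workbook supplied with the lesson).
required_packages <- c("readxl")

# Identify which of the required packages are not yet installed.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
to_install <- required_packages[!is_installed]

# Install only the missing packages (this needs an internet connection).
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Hypothetical file name: replace it with the Excel file downloaded
# for the lesson you are working through.
data_path <- "lesson_data.xlsx"

if (file.exists(data_path)) {
  lesson_data <- readxl::read_excel(data_path)  # reads the first sheet by default
  str(lesson_data)                              # check variable names and types
} else {
  message("File not found: ", data_path,
          " (current working directory: ", getwd(), ")")
}
```

Checking that the file exists before reading it gives a clearer message than the error from a failed read, which is often caused by the working directory not being the folder that holds the downloaded data.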

R code is provided to help build confidence in R, in addition to practice calculations where code is written from scratch using R Markdown. The content is structured so that earlier lessons provide more support with R and later lessons encourage self-directed learning.

Each task has a tab structure to guide users.

The first tab is the Task, which presents instructions for a particular analysis technique.

The second tab is the Code, which provides directly executable and reproducible R code relevant to the task. Depending on the task, this may be all the code that is needed, but often some slight modifications are required to answer every component of the task.

The Solution tab provides the complete R code to answer the task, as well as showing the expected R output. This can be used to check your work.

Some tasks also contain an optional Extension which does not have solution code provided. This may involve utilising the task technique in a different context, examining additional concepts, or some external research.

### Interacting with the lessons

To replicate code (both in the later sections of this Introduction and in the lessons themselves), you can use the “Copy to clipboard” feature. Hover over the top right corner of a code chunk, then click the button that appears to copy the contents of the chunk to your clipboard. You can then paste and execute this code within chunks in your R Markdown script.

Note that some of the code chunks included in these lessons utilise a horizontal slider to improve formatting. This operates in the same way as the vertical sliders used to scroll down pages. Scrolling may be necessary to view the entirety of a code chunk.

As the lessons progress, some code is hidden and additional techniques are taught. This will allow you to test your learning by trying to perform some of the calculations independently, before checking your answers by revealing the code.

Code is shown and hidden by clicking the “> Code” button above each chunk.

If you want to show or hide all of the code in a lesson, you can utilise the “</> Code” button in the top left corner (across from the main title).

Important Information is indicated using green call-out boxes.

Important Information

This may include hints, things to be aware of, or steps that allow you to follow the lesson code more easily.

Previous Lessons are linked using orange call-out boxes. These are automatically collapsed, but can be revealed with a click.

Previous Lesson

Previous lessons where similar techniques are shown on different data provide a starting point for answering the task.

Previous Tasks within the same lesson are indicated using blue call-outs. They are automatically collapsed, but can be revealed with a click.