Te Tari Pāngarau me te Tatauranga
Department of Mathematics & Statistics

Archived seminars in Statistics

Seminars 1 to 50

Modeling high-dimensional intermittent hypoxia

Abdus Sattar

Department of Population and Quantitative Health Sciences, Case Western Reserve University, USA

Date: Tuesday 19 February 2019

Many remarkable advances have been made in nonparametric and semiparametric methods for high-dimensional longitudinal data. However, these methods lack a way of addressing missing data. Motivated by an oxygenation study of retinopathy of prematurity (ROP), we developed a penalized spline mixed effects model for a high-dimensional nonlinear longitudinal continuous response variable using a Bayesian approach. The ROP study is complicated by the fact that there are non-ignorable missing response values. To address the non-ignorable missing data in the Bayesian penalized spline model, we applied a selection model. Properties of the estimators are studied using Markov chain Monte Carlo (MCMC) simulation. In the simulation study, data were generated with three different percentages of non-ignorable missing values and three different sample sizes, and parameters were estimated under various scenarios. In terms of bias and percent bias, the proposed approach outperformed the semiparametric mixed effects model fitted to non-ignorably missing data under the missing at random (MAR) assumption in all scenarios. We performed a sensitivity analysis of the hyper-prior distribution choices for the variance parameters of the spline coefficients in the proposed joint model. The results indicated that a half-t hyper-prior with three different degrees of freedom did not influence the posterior distribution, whereas an inverse-gamma hyper-prior did. We applied our novel method to the sample entropy data in the ROP study, handling both the nonlinearity and the non-ignorable missing response variable, and also analyzed the sample entropy data under the missing at random assumption.
Using functional data analysis to exploit high-resolution “Omics” data

Marzia Cremona

Penn State University

Date: Wednesday 30 January 2019

Recent progress in sequencing technology has revolutionized the study of genomic and epigenomic processes, by allowing fast, accurate and cheap whole-genome DNA sequencing, as well as other high-throughput measurements. Functional data analysis (FDA) can be broadly and effectively employed to exploit the massive, high-dimensional and complex “Omics” data generated by these technologies. This approach involves considering “Omics” data at high resolution, representing them as “curves” of measurements over the DNA sequence. I will demonstrate the effectiveness of FDA in this setting with two applications. In the first one, I will present a novel method, called probabilistic K-mean with local alignment, to locally cluster misaligned curves and to address the problem of discovering functional motifs, i.e. typical “shapes” that may recur several times along and across a set of curves, capturing important local characteristics of these curves. I will demonstrate the performance of the method on simulated data, and I will apply it to discover functional motifs in “Omics” signals related to mutagenesis and genome dynamics. In the second one, I will show how a recently developed functional hypothesis test, IWTomics, and multiple functional logistic regression can be employed to characterize the genomic landscape surrounding transposable elements, and to detect local changes in the speed of DNA polymerization due to the presence of non-canonical 3D structures.
Bayesian Latent Class Analysis for Diagnostic Test Evaluation

Geoff Jones

Massey University

Date: Thursday 25 October 2018

Evaluating the performance of diagnostic tests for infection or disease is of crucial importance, both for the treatment of individuals and for the monitoring of populations. In many situations there is no “gold standard” test that can be relied upon to give 100% accuracy, and the use of a particular test will typically lead to false positive or false negative outcomes. The performance characteristics of an imperfect test are summarized by its sensitivity, i.e. the probability of a correct diagnosis for a diseased individual, and its specificity, i.e. the probability of a correct diagnosis when disease-free. When these parameters are known, valid statistical inference can be made for the disease status of tested individuals and the prevalence of disease in a monitored population. In the absence of a “gold standard”, true disease status is unobservable, so the sensitivity and specificity cannot be reliably determined without additional information. In some circumstances, information from a number of imperfect tests allows estimation of the prevalence, sensitivities and specificities even in the absence of gold standard data. Latent class analysis in a Bayesian framework gives a flexible and comprehensive way of doing this which has become common in the epidemiology literature. This talk will give an introduction and review of the basic ideas, and highlight some of the current research in this area.
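As a toy illustration of how known sensitivity and specificity feed into inference for an individual's disease status, Bayes' rule gives the positive predictive value directly (the numbers below are invented for illustration, not taken from the talk):

```python
# Positive predictive value from sensitivity, specificity and prevalence
# via Bayes' rule.  All numbers are illustrative.

def ppv(sensitivity, specificity, prevalence):
    """P(diseased | positive test) for a single imperfect test."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A test with 95% sensitivity and 98% specificity applied to a
# low-prevalence (1%) population still yields many false positives:
print(round(ppv(0.95, 0.98, 0.01), 3))   # → 0.324
```

The same calculation run at 50% prevalence gives a PPV near the test's specificity, which is why a single imperfect test behaves so differently in screening versus clinical settings.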
Applied Data Science: A Small Business Perspective

Benoit Auvray

Iris Data Science & Department of Mathematics and Statistics

Date: Thursday 27 September 2018

Iris Data Science is a small Dunedin company established in 2013 providing data science solutions (predictive analytics) for clients in a range of areas, particularly in the agricultural and health sectors. In this talk, we will briefly describe deep learning, a machine learning tool we use extensively at Iris Data Science, and give a few examples of our work for some of our clients. We will also discuss the term “data scientist” and share our experiences running a small business using data science, statistics and machine learning as part of our core service offering. Finally, we will outline some of the practical aspects of developing a predictive tool for commercial use, from data collection and storage to timely and convenient delivery of the predictive model outputs to a client.
Bayesian Hierarchical Modelling

Matt Schofield

Department of Mathematics and Statistics

Date: Thursday 20 September 2018

Bayesian hierarchical modelling is an increasingly popular approach for data analysis. This talk is intended to introduce Bayesian hierarchical models with the aid of examples from genetics, anatomy and ecology. We will discuss various advantages to using such models, including improved estimation and a better description of the underlying scientific process. If time permits, we will also consider situations where hierarchical models may lead to misleading conclusions and a healthy dose of skepticism is required.
A missing value approach for breeding value estimation

Alastair Lamont

Department of Mathematics and Statistics

Date: Thursday 13 September 2018

For a particular trait, an individual’s breeding value is the genetic value it has for its progeny. Accurate breeding value estimation is a critical component of selective breeding, necessary to identify which animals will have the best offspring. As technology has improved, genetic data is often available, and can be utilised for improved breeding value estimation. While it is cost efficient to genotype some animals, it is unfeasible to genotype every individual in most populations of interest, due to either cost or logistical issues. This missing data creates challenges in the estimation of breeding values. Most modern approaches tend to impute or average over the missing data in some fashion, rather than fully incorporating it into the model. I will discuss how statistical models that account for inheritance can be specified and fitted, in work done jointly with Matthew Schofield and Richard Barker. Including inheritance allows missing genotype data to be natively included within the model, while directly working with scientific theory.
A 2D hidden Markov model with extra zeros for spatiotemporal recurrence patterns of tremors

Ting Wang

Department of Mathematics and Statistics

Date: Thursday 6 September 2018

Tectonic tremor activity was observed to accompany slow slip events in some regions. Slow slip events share a similar occurrence style to that of megathrust earthquakes and have been reported to have occurred within the source region of some large megathrust earthquakes. Finding the relationship among the three types of seismic activities may therefore aid forecasts of large destructive earthquakes. Before examining their relationship, it is essential to understand quantitatively the spatiotemporal migration patterns of tremors.

We developed a 2D hidden Markov model to automatically analyse and forecast the spatiotemporal behaviour of tremor activity in the regions Kii and Shikoku, southwest Japan. This new automated procedure classifies the tremor source regions into distinct segments in 2D space and infers a clear hierarchical structure of tremor activity, where each region consists of several subsystems and each subsystem contains several segments. The segments can be quantitatively categorized into three different types according to their occurrence patterns: episodic, weak concentration, and background. Moreover, a significant increase in the proportion of tremor occurrence was detected in a segment in southwest Shikoku before the 2003 and 2010 long-term slow slip events in the Bungo channel. This highlights the possible correlation between tectonic tremor and slow slip events.
Developing forage cultivars for the grazing industries of New Zealand

Zulfi Jahufer

AgResearch and Massey University

Date: Thursday 23 August 2018

Grass and legume based swards play a key role in forage dry matter production for the grazing industries of New Zealand. The genetic merit of this feed base is a primary driver in the profitability, production and environmental footprint of our pastoral systems. A significant challenge to sustainability of this dynamic ecosystem will be climate change. Elevation of ambient temperature and increases in the occurrence of moisture stress events will be a major constraint to forage plant vegetative persistence and seasonal dry matter production. Successful animal breeding has resulted in developing breeds that have higher feed requirements, resulting in increased grazing pressure on swards. The forage science group at AgResearch is actively focused on developing high merit forage grass, legume and herb cultivars. The aim is to optimise plant breeding systems and maximise rates of genetic gain applying conventional plant breeding methods, high throughput phenotyping and new molecular research tools.

Dr Zulfi Jahufer is a senior research scientist in quantitative genetics and forage plant breeding. He also conducts the Massey University course in plant breeding. His seminar will focus on the development of novel forage grass and legume cultivars; he will also introduce the new plant breeding software tool DeltaGen.
Lattice polytope samplers for statistical inverse problems

Martin Hazelton

Massey University

Date: Thursday 16 August 2018

Statistical inverse problems occur when we wish to learn about some random process that is observed only indirectly. Inference in such situations typically involves sampling possible values for the latent variables of interest conditional on the indirect observations. This talk is concerned with linear inverse problems for count data, for which the latent variables are constrained to lie on the integer lattice within a convex polytope (a bounded multidimensional polyhedron). An illustrative example arises in transport engineering where we observe vehicle counts entering or leaving each zone of the network, then want to sample possible inter-zonal patterns of traffic flow consistent with those entry/exit counts. Other areas of application include inference for contingency tables, and capture-recapture modelling in ecology.

In principle such sampling can be conducted using Markov chain Monte Carlo methods, through a random walk on the lattice polytope. However, it is challenging to design algorithms for doing so that are both computationally efficient and have guaranteed theoretical properties. In this talk I will describe some current work that seeks to combine methods from algebraic statistics with geometric insights in order to develop and study new polytope samplers that address these issues.
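The random-walk idea can be illustrated on the simplest possible lattice polytope: 2x2 contingency tables with fixed row and column sums, where a single Markov basis move connects every table (a toy sketch of the generic approach, not one of the samplers developed in the talk):

```python
import random

# Toy lattice-polytope sampler: uniform sampling of 2x2 contingency
# tables with fixed row and column sums.  The single Markov basis move
# +/-[[1,-1],[-1,1]] connects all tables with the given margins;
# out-of-bounds proposals are rejected (Metropolis with uniform target).

def sample_tables(table, steps, rng):
    a, b, c, d = table  # cells [[a, b], [c, d]]
    visited = []
    for _ in range(steps):
        e = rng.choice([1, -1])
        # propose adding e * [[1,-1],[-1,1]]; reject if any cell < 0
        if min(a + e, b - e, c - e, d + e) >= 0:
            a, b, c, d = a + e, b - e, c - e, d + e
        visited.append((a, b, c, d))
    return visited

rng = random.Random(1)
chain = sample_tables((2, 1, 1, 2), 10000, rng)
# every visited table keeps the margins (row sums 3,3; column sums 3,3)
assert all(a + b == 3 and c + d == 3 and a + c == 3 for a, b, c, d in chain)
```

Real applications (traffic networks, capture-recapture) involve far higher-dimensional polytopes, where finding moves that keep the chain both connected and efficient is precisely the hard problem the talk addresses.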
A faster algorithm for updating the likelihood of a phylogeny

David Bryant

Department of Mathematics and Statistics

Date: Thursday 9 August 2018

Note day and time: a joint Mathematics and Statistics seminar taking place in the usual slot for Statistics seminars.

Both Bayesian and Maximum Likelihood approaches to phylogenetic inference depend critically on a dynamic programming algorithm developed by Joe Felsenstein over 35 years ago. The algorithm computes the probability of sequence data conditional on a given tree. It is executed for every site, every set of parameters, every tree, and is the bottleneck of phylogenetic inference. This computation comes at a cost: Herve Philippe estimated that his research-associated computing (most of which would have been running Felsenstein's algorithm) resulted in an emission of over 29 tons of CO₂ in just one year. In the talk I will introduce the problem and describe an updating algorithm for likelihood calculation which runs in worst case O(log n) time instead of O(n) time, where n is the number of leaves/species. This is joint work with Celine Scornavacca.
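For readers unfamiliar with the algorithm, a minimal single-site sketch of Felsenstein's pruning recursion under the Jukes-Cantor model might look as follows (illustrative only; production implementations vectorise over sites and reuse partial likelihoods, which is exactly what the updating algorithm in the talk exploits):

```python
import numpy as np

# Felsenstein's pruning algorithm for one site, under the Jukes-Cantor
# model, on the tiny rooted tree ((A,B),C).  Bases are coded 0..3.

def jc_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t."""
    p_same = 0.25 + 0.75 * np.exp(-4.0 * t / 3.0)
    p_diff = 0.25 - 0.25 * np.exp(-4.0 * t / 3.0)
    return np.where(np.eye(4, dtype=bool), p_same, p_diff)

def leaf_partial(state):
    """Partial likelihood vector at a leaf with an observed base."""
    v = np.zeros(4)
    v[state] = 1.0
    return v

def combine(children):
    """Pruning step: product over children of P(t) @ child partial."""
    out = np.ones(4)
    for partial, t in children:
        out *= jc_matrix(t) @ partial
    return out

# site pattern: base 0 at leaves A and B, base 1 at leaf C; branches 0.1
ab = combine([(leaf_partial(0), 0.1), (leaf_partial(0), 0.1)])
root = combine([(ab, 0.1), (leaf_partial(1), 0.1)])
site_likelihood = 0.25 * root.sum()   # uniform root base frequencies
print(site_likelihood)
```

The full likelihood multiplies such per-site values across the alignment; naively, changing one branch length forces the whole recursion to be redone, which is the cost the talk's O(log n) updating scheme avoids.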
Sequential Inference with the Finite Volume Method

Richard Norton

Department of Mathematics and Statistics

Date: Thursday 2 August 2018

Filtering or sequential inference aims to determine the time-dependent probability distribution of the state of a dynamical system from noisy measurements at discrete times. At measurement times the distribution is updated via Bayes' rule, and between measurements the distribution evolves according to the dynamical system. The operator that maps the density function forward in time between measurements is called the Frobenius-Perron operator. I will show how to compute the action of the Frobenius-Perron operator with the finite volume method, a method more commonly used in fluid dynamics to solve PDEs.
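As a toy illustration of pushing a density forward between measurements, here is a first-order upwind finite volume scheme for the simple deterministic dynamics dx/dt = c, whose exact Frobenius-Perron operator is just a translation of the density (an assumption-laden sketch, not the talk's method):

```python
import numpy as np

# Pushing a probability density forward in time with a finite volume
# scheme, for the deterministic dynamics dx/dt = c on a periodic domain.
# The exact Frobenius-Perron operator here translates the density; the
# first-order upwind scheme approximates that (with numerical diffusion).

def push_forward(p, c, dx, dt, nsteps):
    """First-order upwind finite-volume update of cell averages p."""
    nu = c * dt / dx                       # CFL number, needs 0 <= nu <= 1
    for _ in range(nsteps):
        p = p - nu * (p - np.roll(p, 1))   # periodic boundary
    return p

dx, dt, c = 0.01, 0.005, 1.0
x = np.arange(0.0, 1.0, dx) + dx / 2       # cell centres
p = np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
p /= p.sum() * dx                          # normalise to a density
q = push_forward(p, c, dx, dt, 40)         # evolve for time 0.2
print(q.sum() * dx)                        # total probability is conserved
```

Because the scheme is conservative, total probability is preserved exactly, and the density's mean advances by c times the elapsed time, mimicking the exact operator.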
Adaptive sequential MCMC for combined state and parameter estimation

Zhanglong Cao

Department of Mathematics and Statistics, University of Otago

Date: Thursday 19 July 2018

Most algorithms for combined state and parameter estimation in state space models either estimate the states and parameters by incorporating the parameters in the state space, or marginalize out the parameters through sufficient statistics. In the case of a linear state space model and starting with a joint distribution over states, observations and parameters, we implement an MCMC sampler with two phases. In the learning phase, a self-tuning sampler is used to learn the parameter mean and covariance structure. In the estimation phase, the parameter mean and covariance structure informs the proposal mechanism and is also used in a delayed-acceptance algorithm, which greatly improves sampling efficiency. Information on the resulting state of the system is given by a Gaussian mixture. In on-line mode, the algorithm is adaptive and uses a sliding window approach by cutting off historical data to accelerate sampling speed and to maintain appropriate acceptance rates. We apply the algorithm to joint state and parameter estimation in the case of irregularly sampled GPS time series data.
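The delayed-acceptance idea itself can be sketched generically: a cheap surrogate density screens proposals so that only promising ones pay for the expensive likelihood, with a second-stage correction that leaves the target distribution unchanged (a generic sketch under invented densities, not the authors' implementation):

```python
import math, random

# A generic delayed-acceptance Metropolis step: a cheap surrogate
# density screens proposals, and only survivors incur the expensive
# log-density.  The two-stage acceptance leaves the target invariant.

def delayed_acceptance_step(x, expensive_logpdf, cheap_logpdf, rng, scale=1.0):
    y = x + rng.gauss(0.0, scale)          # symmetric random-walk proposal
    # stage 1: accept/reject with the cheap surrogate
    a1 = min(1.0, math.exp(cheap_logpdf(y) - cheap_logpdf(x)))
    if rng.random() >= a1:
        return x                           # rejected cheaply
    # stage 2: correct with the expensive target (surrogate ratio flips)
    a2 = min(1.0, math.exp(expensive_logpdf(y) - expensive_logpdf(x)
                           + cheap_logpdf(x) - cheap_logpdf(y)))
    return y if rng.random() < a2 else x

# toy example -- target: standard normal; surrogate: a wider normal
target = lambda x: -0.5 * x * x
surrogate = lambda x: -0.5 * (x / 1.5) ** 2

rng = random.Random(0)
x, chain = 0.0, []
for _ in range(20000):
    x = delayed_acceptance_step(x, target, surrogate, rng)
    chain.append(x)
mean = sum(chain) / len(chain)
var = sum(c * c for c in chain) / len(chain) - mean * mean
```

The efficiency gain comes entirely from stage 1: proposals the surrogate dislikes are discarded before the expensive evaluation, which is why a learned parameter mean and covariance makes a good surrogate.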
Modelling multilevel spatial behaviour in binary-mark muscle fibre configurations

Tilman Davies

Department of Mathematics and Statistics, University of Otago

Date: Thursday 12 July 2018

The functional properties of skeletal muscles depend on the spatial arrangements of fast and slow muscle fibre types. Qualitative assessment of muscle configurations suggest that muscle disease and normal ageing are associated with visible changes in the spatial pattern, though a lack of statistical modelling hinders our ability to formally assess such trends. We design a nested Gaussian CAR model to quantify spatial features of dichotomously-marked muscle fibre networks, and implement it within a Bayesian framework. Our model is applied to data from a human skeletal muscle, and results reveal spatial variation at multiple levels across the muscle. The model provides the foundation for future research in describing the extent of change to normal muscle fibre type parameters under experimental or pathological conditions. Joint work with Matt Schofield (Maths & Stats); Jon Cornwall (School of Medicine); and Phil Sheard (Physiology).
Where does your food really come from?

Georgia Anderson


Date: Thursday 31 May 2018

Oritain is a scientific traceability company that verifies the origin of food, fibre and pharmaceutical products by analysing the presence of trace elements and isotopes in the product. Born in the research labs at the Chemistry Department in the University of Otago, Oritain has grown to become a multinational company with offices in Dunedin, London, and Sydney, and with clients from around the globe.

Oritain measures a product's origin using 'chemical fingerprints' derived from the compositions of plants and animals. These elemental and isotopic compositions vary naturally throughout the environment. Multivariate statistical methods such as principal component analysis and linear discriminant analysis are used to extract information and determine this fingerprint from the trace element and isotopic data.

This talk will present the science used at Oritain and explore how statistics is used in a commercial environment.
Project presentations

Honours and PGDip students

Department of Mathematics and Statistics

Date: Friday 25 May 2018

Qing Ruan : Bootstrap selection in kernel density estimation with edge correction
Willie Huang : Autoregressive hidden Markov model - an application to tremor data

Tom Blennerhassett : Modelling groundwater flow using Finite Elements in FEniCS
Peixiong Kang : Numerical solution of the geodesic equation in cosmological spacetimes with acausal regions
Lydia Turley : Modelling character evolution using the Ornstein Uhlenbeck process
Ben Wilks : Analytic continuation of the scattering function in water waves
Shonaugh Wright : Hilbert spaces and orthogonality
Jay Bhana : Visualising black holes using MATLAB
Modelling the evolution of sex-specific dominance in response to sexually antagonistic selection

Hamish Spencer

Department of Zoology

Date: Thursday 24 May 2018

Arguments about the evolutionary modification of genetic dominance have a long history in genetics, dating back over 100 years. Mathematical investigations have shown that modifiers of the level of dominance at the locus of interest can only spread at a reasonable rate if heterozygotes at that locus are common. One hitherto neglected scenario is that of sexually antagonistic selection, which is ubiquitous in sexual species and can also generate the stable high frequencies of heterozygotes that would be expected to facilitate the spread of such modifiers. I will present a recursion-equation model that shows that sexually specific dominance modification is a potential outcome of sexually antagonistic selection.

The model predicts that loci with higher levels of sexual conflict should exhibit greater differentiation between males and females in levels of dominance and that the strength of antagonistic selection experienced by one sex should be proportional to the level of dominance modification. These predictions match the recent discovery of a gene in Atlantic salmon, in which sex-dependent dominance leads to earlier maturation of males than females, a difference that is strongly favoured by selection. Finally, I suggest that empiricists should be alert to the possibility of there being numerous cases of sex-specific dominance.
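A much-simplified flavour of such recursion-equation models: one locus under sexually antagonistic selection with sex-specific dominance coefficients, iterated to a polymorphic equilibrium (my own toy sketch with invented parameters; the talk's model additionally tracks a dominance-modifier locus):

```python
# One-locus recursion for sexually antagonistic selection.  Allele A is
# favoured in males and disfavoured in females; sex-specific dominance
# coefficients hm, hf set heterozygote fitness.  Illustrative only.

def next_freqs(pm, pf, sm, sf, hm, hf):
    """One generation: random mating, then sex-specific selection."""
    p = 0.5 * (pm + pf)                    # allele frequency in zygotes
    aa, ab, bb = p * p, 2 * p * (1 - p), (1 - p) * (1 - p)
    wm = (1.0, 1.0 - hm * sm, 1.0 - sm)    # male genotype fitnesses
    wf = (1.0 - sf, 1.0 - hf * sf, 1.0)    # female genotype fitnesses
    wbar_m = aa * wm[0] + ab * wm[1] + bb * wm[2]
    wbar_f = aa * wf[0] + ab * wf[1] + bb * wf[2]
    pm_next = (aa * wm[0] + 0.5 * ab * wm[1]) / wbar_m
    pf_next = (aa * wf[0] + 0.5 * ab * wf[1]) / wbar_f
    return pm_next, pf_next

pm = pf = 0.5
for _ in range(2000):
    pm, pf = next_freqs(pm, pf, sm=0.2, sf=0.2, hm=0.25, hf=0.25)
print(round(pm, 3), round(pf, 3))   # a stable polymorphic equilibrium
```

With each sex's deleterious allele partially recessive (h < 0.5), heterozygotes are common at equilibrium, which is the condition the talk identifies as facilitating the spread of dominance modifiers.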
Designing randomised trials to estimate the benefits and harms of patient choice of treatment

Robin Turner

Biostatistics Unit, Dunedin School of Medicine

Date: Thursday 17 May 2018

With the increased use of shared decision making, it is increasingly important to provide evidence on the impact that patient treatment choice has on outcomes. Two-stage randomised trials, incorporating participant choice, offer the opportunity to determine the effects of choice, which is not estimable in standard trials. In order to answer important questions about the effect of choice, these trials need to be adequately powered. This talk will cover the design issues for this type of trial and the situations in which this design may be most beneficial.
Using administrative data to improve public health: examples of research using the Integrated Data Infrastructure (IDI) and other data sources

Gabrielle Davie and Rebecca Lilley

Department of Preventive and Social Medicine

Date: Thursday 10 May 2018

Electronically available administrative data are increasingly being used by researchers. Distinct routinely collected administrative datasets are often combined using linkage techniques to enhance the utility of separate data sources for research purposes. Recently in New Zealand, administrative data from a range of government agencies, Statistics NZ surveys, and non-government organisations have been linked at the person-level generating the Integrated Data Infrastructure (IDI). Statistics NZ manage the IDI making it available for ‘approved research projects that are in the public interest’. This presentation will describe our recent experiences with using the IDI for public health research and discuss some learnings applicable to researchers considering using the IDI. We will also present research of ours utilising novel applications of other administrative data (non-IDI) to inform public health.
Spatially explicit capture-recapture for open populations

Murray Efford

Department of Mathematics and Statistics

Date: Thursday 3 May 2018

In this century, capture–recapture methods for animal populations have developed on two tracks. Estimation of abundance has focussed on robust spatially explicit models for data from closed populations, where turnover during sampling may be ignored. Estimation of turnover (survival, recruitment and population trend) has relied on non-spatial models for data from open populations, where mortality etc. may occur between samples. Multiple benefits flow from combining the two approaches, but this has so far been attempted only in one-off applications using Bayesian models, which are slow to fit. I outline a maximum likelihood approach that combines the strengths of Schwarz and Arnason (1996 Biometrics 52:860) and Borchers and Efford (2008 Biometrics 64:377). The methods are now available in the R package openCR that will be demonstrated with data on Louisiana black bears identified from DNA collected at hair snags. Naive spatial implementations of non-spatial methods can perform poorly, but in simulations the present methods appear robust.
Modelling spatial-temporal processes with applications to hydrology and wildfires

Valerie Isham, NZMS 2018 Forder Lecturer

University College London

Date: Tuesday 24 April 2018

Mechanistic stochastic models aim to represent an underlying physical process (albeit in highly idealised form, and using stochastic components to reflect uncertainty) via analytically tractable models, in which interpretable parameters relate directly to physical phenomena. Such models can be used to gain understanding of the process dynamics and thereby to develop control strategies.

In this talk, I will review some stochastic point process-based models constructed in continuous time and continuous space using spatial-temporal examples from hydrology such as rainfall (where flood control is a particular application) and soil moisture. By working with continuous spaces, consistent properties can be obtained analytically at any spatial and temporal resolutions, as required for fitting and applications. I will start by covering basic model components and properties, and then go on to discuss model construction, fitting and validation, including ways to incorporate nonstationarity and climate change scenarios. I will also describe some thoughts about using similar models for wildfires.
Epidemic modelling: successes and challenges

Valerie Isham, NZMS 2018 Forder Lecturer

University College London

Date: Monday 23 April 2018

Note time and venue of this public lecture.
Epidemic models are developed as a means of gaining understanding about the dynamics of the spread of infection (human and animal pathogens, computer viruses etc.) and of rumours and other information. This understanding can then inform control measures to limit spread, or in some cases enhance it (e.g., viral marketing). In this talk, I will give an introduction to simple generic epidemic models and their properties, the role of stochasticity and the effects of population structure (metapopulations and networks) on transmission dynamics, illustrating some past successes and outlining some future challenges.
Estimating dated phylogenetic trees with applications in epidemiology, immunology, and macroevolution

Alexandra Gavryushkina

Department of Biochemistry

Date: Monday 23 April 2018

Note day, time and venue for this seminar.
Newly available data require developing new approaches to reconstructing dated phylogenetic trees. In this talk, I will present new methods that employ birth-death-sampling models to reconstruct dated phylogenetic trees in a Bayesian framework. These methods have been successfully applied in epidemiology and macroevolution. Dated phylogenetic histories can be informative about the past events, for example, we can learn from a reconstructed transmission tree which individuals were likely to infect other individuals. By reconstructing dated phylogenetic trees, we can also learn about the tree generating process parameters. For example, we can estimate and predict how fast epidemics spread or how fast new species arise or go extinct. In immunology, dating HIV antibody lineages can be important for vaccine design.
Confidence distributions

David Fletcher

Department of Mathematics and Statistics

Date: Thursday 19 April 2018

In frequentist statistics, it is common to summarise inference about a parameter using a point estimate and confidence interval. A useful alternative is a confidence distribution, first suggested by David Cox sixty years ago. This provides a visual summary of the set of confidence intervals obtained when we allow the confidence level to vary, and can be thought of as the frequentist analogue of a Bayesian posterior distribution. I will discuss the potential benefits of using confidence distributions and their link with Fisher's controversial concept of a fiducial distribution. I will also outline current work with Peter Dillingham and Jimmy Zeng on the calculation of a model-averaged confidence distribution.
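For a normal mean with known standard deviation, the confidence distribution has a simple closed form whose quantiles recover the usual confidence limits (a minimal sketch with invented numbers):

```python
import math

# Confidence distribution for a normal mean with known sigma.
# C(theta) gives, for each theta, the confidence level attached to the
# one-sided interval (-inf, theta]; its quantiles reproduce the usual
# two-sided confidence limits.  Numbers are illustrative.

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_dist(theta, xbar, sigma, n):
    """C(theta) = Phi((theta - xbar) / se), the confidence CDF."""
    se = sigma / math.sqrt(n)
    return normal_cdf((theta - xbar) / se)

xbar, sigma, n = 10.0, 2.0, 25             # standard error 0.4
# the 2.5% and 97.5% quantiles of C recover the 95% CI xbar +/- 1.96*se
lo, hi = xbar - 1.96 * 0.4, xbar + 1.96 * 0.4
print(round(confidence_dist(lo, xbar, sigma, n), 3),
      round(confidence_dist(hi, xbar, sigma, n), 3))
```

Plotting C over a grid of theta values gives the visual summary described above: every horizontal slice of the curve pair yields a confidence interval at that level.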
A statistics-related seminar in Preventive and Social Medicine: Meta-analysis and its implications for public health policy decisions

Andrew Anglemyer

Naval Postgraduate School, California

Date: Wednesday 4 April 2018

When recommending policies, clinical guidelines, and treatment decisions, policy makers and practitioners alike can benefit greatly from clear evidence obtained from available empirical data. Methods for synthesizing these data that have been developed for use in clinical environments may prove to be a powerful tool in evidence-based decision making in other fields, as well. In this discussion, I will overview examples of how meta-analysis techniques have provided guidance in public health policy decisions (e.g., HIV treatment guidelines), methods for synthesizing data, and possible limitations of these approaches. Additionally, I will apply meta-analysis techniques to a uniquely Kiwi question to illustrate possible ways to provide guidance in health decisions.

Dr. Andrew Anglemyer is an epidemiologist who specializes in infectious diseases and study design methodology at Naval Postgraduate School (and previously at University of California, San Francisco). Since 2009 he has been a member of the World Health Organization’s HIV Treatment Guidelines development committee and was the statistics and methods editor for the HIV/AIDS Cochrane Review Group at UC San Francisco until 2014. Dr. Anglemyer has co-authored dozens of public health and clinical peer-reviewed papers with a wide range of topics including HIV prevention and treatment in high-risk populations, firearms-related injury, paediatric encephalitis and hyponatremia. He received an MPH in Epidemiology/Biostatistics and a PhD in Epidemiology from University of California, Berkeley.
A statistics-related seminar in Public Health - Mapping for public health: Effective use of spatial analysis to communicate epidemiological information

Jason Gilliland

Western University, Canada

Date: Thursday 29 March 2018

In this seminar I will present some background and lessons on the use of mapping and spatial analytical methods for public health. With practical examples from my own research, I will cover some important considerations for public health researchers wanting to bring GIS-based analyses into their own projects. The presentation will focus on key methodological issues related to using spatial data which are often overlooked by epidemiologists and other health researchers. Discussion will revolve around opportunities for using qualitative data in Health GIS projects and some other future directions and challenges.
Professor Jason Gilliland is Director of the Urban Development Program and Professor in the Dept of Geography, Dept of Paediatrics, School of Health Studies and Dept of Epidemiology & Biostatistics at Western University in Canada. He is also a Scientist with the Children's Health Research Institute and Lawson Health Research Institute, two of Canada's leading hospital-based research institutes. His research is primarily focused on identifying environmental influences on children’s health issues such as poor nutrition, physical inactivity, obesity, and injury. He is also Director of the Human Environments Analysis Lab, an innovative research and training environment which specializes in community-based research and identifying interventions to inform public policy and neighbourhood design to promote the health and quality of life of children and youth.
Genetic linkage map construction in the next generation sequencing era: do old frameworks work with new challenges?

Phil Wilcox

Department of Mathematics and Statistics

Date: Thursday 29 March 2018

The low cost and high throughput of new DNA sequencing technologies have led to a data ‘revolution’ in genomics: two-to-three orders of magnitude more data can be generated for the same cost compared to previous technologies. This has facilitated genome-wide investigations in non-model species at scales not previously possible. However, these new technologies also present new challenges, particularly with genetic linkage mapping, where errors due to sequencing and heterozygote undercalling upwardly bias estimates of linkage map lengths and create difficulties in reliably ordering clustered loci. In this talk I will describe the application of an exome capture based genotyping panel to genetic linkage map construction in Pinus radiata D.Don. I will show that previously applied approaches first proposed in the mid-1990s still provide a suitable analytical framework for constructing robust linkage maps even in this modern data rich era.
Case-control logistic regression is more complicated than you think

Thomas Lumley

University of Auckland

Date: Thursday 22 March 2018

It is a truth universally acknowledged that logistic regression gives consistent and fully efficient estimates of the regression parameter under case-control sampling, so we can often ignore the distinction between retrospective and prospective sampling. I will talk about two issues that are more complicated than this. First, the behaviour of pseudo-$r^2$ statistics under case-control sampling: most of these are not consistently estimated. Second, the question of when and why unweighted logistic regression is much more efficient than survey-weighted logistic regression: the traditional answers of 'always' and 'because of variation in weights' are wrong.
Visual trumpery: How charts lie - and how they make us smarter

Ihaka Lecture #3: Alberto Cairo

University of Miami

Date: Wednesday 21 March 2018

With facts and truth increasingly under assault, many interest groups have enlisted charts — graphs, maps, diagrams, etc. — to support all manner of spin. Because digital images are inherently shareable and can quickly amplify messages, sifting through the visual information and misinformation is an important skill for any citizen.

The use of graphs, charts, maps and infographics to explore data and communicate science to the public has become more and more popular. However, this rise in popularity has not been accompanied by an increasing awareness of the rules that should guide the design of these visualisations.

This talk teaches principles that help ordinary citizens become more critical and better-informed readers of charts.

Alberto Cairo is the Knight Chair in Visual Journalism at the University of Miami. He’s also the director of the visualisation programme at UM’s Center for Computational Science. Cairo has been a director of infographics and multimedia at news publications in Spain (El Mundo, 2000-2005) and Brazil (Editora Globo, 2010-2012), and a professor at the University of North Carolina-Chapel Hill. Besides teaching at UM, he works as a freelancer and consultant for companies such as Google and Microsoft. He’s the author of the books The Functional Art: An Introduction to Information Graphics and Visualization (2012) and The Truthful Art: Data, Charts, and Maps for Communication (2016).

The lectures are live-streamed from 6.30pm NZDST onwards on 7, 14 and 21 March 2018.

Join the local group in the Mathematics and Statistics Department for this live-stream viewing and discussion
Making colour accessible

Ihaka Lecture #2: Paul Murrell

University of Auckland

Date: Wednesday 14 March 2018

The 'BrailleR' package for R generates text descriptions of R plots.

When combined with screen reader software, this provides information for blind and visually-impaired R users about the contents of an R plot. A minor difficulty that arises in the generation of these text descriptions involves the information about colours within a plot. As far as R is concerned, colours are described as six-digit hexadecimal strings, e.g. "#123456", but that is not very helpful for a human audience. It would be more useful to report colour names like "red" or "blue".
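To make a start on that molehill, a naive way to name a hex colour is nearest-neighbour matching in RGB space. The sketch below uses a tiny made-up palette and plain Euclidean distance; BrailleR's actual colour lists, and the perceptually uniform colour spaces the talk explores, are considerably more sophisticated:

```python
# A tiny illustrative palette; real named-colour lists are much richer.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

def hex_to_rgb(hex_str):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = hex_str.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def nearest_name(hex_str):
    """Return the palette name closest in (naive) Euclidean RGB distance."""
    rgb = hex_to_rgb(hex_str)
    return min(
        PALETTE,
        key=lambda name: sum((p - q) ** 2 for p, q in zip(PALETTE[name], rgb)),
    )

print(nearest_name("#fe0000"))  # nearly pure red -> "red"
```

Euclidean distance in RGB is known to match human perception poorly, which is exactly why the talk's journey through colour spaces is needed.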

This talk will make a mountain out of that molehill and embark on a daring Statistical Graphics journey featuring colour spaces, high-performance computing, Te Reo, and XKCD. The only disappointment will be the ending.

Paul Murrell is an Associate Professor in the Department of Statistics at The University of Auckland. He is a member of the core development team for R, with primary responsibility for the graphics system.

The lectures are live-streamed from 6.30pm NZDST onwards on 7, 14 and 21 March 2018.

Join the local group in the Mathematics and Statistics Department for this live-stream viewing and discussion

Ihaka Lectures: A thousand words: visualising statistical data

Live-streamed, 1st of 3 lectures

Date: Wednesday 7 March 2018

Following on from the success of last year's inaugural series, the theme of the 2018 Ihaka lectures is A thousand words: visualising statistical data.

The lectures are live-streamed from 6.30pm NZDST onwards on 7, 14 and 21 March 2018.

Scalable Gaussian process models for analyzing large spatial and spatio-temporal datasets

Alan E Gelfand

Duke University

Date: Tuesday 28 November 2017

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This talk considers computationally feasible Gaussian process-based approaches to address this problem. We consider two approaches to approximate an intended Gaussian process; both are Gaussian processes in their own right. One, the Predictive Gaussian Process (PGP), is based upon the idea of dimension reduction. The other, the Nearest Neighbor Gaussian Process (NNGP), is based upon sparsity ideas.

The predictive process is simple to understand, routine to implement, with straightforward bias correction. It enjoys several attractive properties within the class of dimension reduction approaches and works well for datasets of order 10³ or 10⁴. It suffers several limitations including spanning only a finite dimensional subspace, over-smoothing, and underestimation of uncertainty.

So, we focus primarily on the nearest neighbor Gaussian process which draws upon earlier ideas of Vecchia and of Stein. It is a bit more complicated to grasp and implement but it is highly scalable, having been applied to datasets as large as 10⁶. It is a well-defined spatial process providing legitimate finite dimensional Gaussian densities with sparse precision matrices. Scalability is achieved by using local information from few nearest neighbors, i.e., by using the neighbor sets in a conditional specification of the model. This is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We show a multivariate spatial illustration as well as a space-time example. We also consider automating the selection of the neighbor set size.
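To make the neighbour-set idea concrete, here is a pure-Python sketch of a Vecchia-type approximation: the joint Gaussian density is replaced by a product of conditionals, each conditioning on at most m previously ordered nearest neighbours. The exponential covariance and 1-D locations are illustrative assumptions, not the models of the talk:

```python
import math

locs = [0.0, 1.0, 2.0, 3.0, 4.0]   # made-up 1-D locations, in order

def cov(s, t, sigma2=1.0, phi=1.0):
    """Exponential covariance (an illustrative choice)."""
    return sigma2 * math.exp(-phi * abs(s - t))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A small, well conditioned)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def vecchia_loglik(y, m=2):
    """Sum of log N(y_i | kriging mean and variance given <= m neighbours)."""
    ll = 0.0
    for i in range(len(y)):
        # m nearest *previously ordered* locations
        nbrs = sorted(range(i), key=lambda j: abs(locs[j] - locs[i]))[:m]
        if nbrs:
            C = [[cov(locs[a], locs[b]) for b in nbrs] for a in nbrs]
            c = [cov(locs[a], locs[i]) for a in nbrs]
            w = solve(C, c)                      # kriging weights
            mu = sum(wj * y[j] for wj, j in zip(w, nbrs))
            var = cov(locs[i], locs[i]) - sum(wj * cj for wj, cj in zip(w, c))
        else:
            mu, var = 0.0, cov(locs[i], locs[i])
        ll += -0.5 * (math.log(2 * math.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

print(vecchia_loglik([0.3, -0.1, 0.4, 0.0, -0.2]))
```

For this 1-D exponential (Markov) covariance, conditioning on the single nearest previous neighbour is already exact, which is a convenient sanity check; in higher dimensions larger neighbour sets trade accuracy against the sparsity that gives the NNGP its scalability.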

For either specification, we embed the PGP as a dimension reduction prior and the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed. However, the future likely lies with the NNGP since it can accommodate spatial scales that preclude dimension-reducing methods.
Why does the stochastic gradient method work?

Matthew Parry

Department of Mathematics and Statistics

Date: Tuesday 24 October 2017

The stochastic gradient (SG) method seems particularly suited to the numerical optimization problems that arise in large-scale machine learning applications. In a recent paper, Bottou et al. give a comprehensive theory of the SG algorithm and make some suggestions as to how it can be further improved. In this talk, I will briefly give the background to the optimization problems of interest and contrast the batch and stochastic approaches to optimization. I will then give the mathematical basis for the success of the SG method. If time allows, I will discuss how the SG method can also be applied to sampling algorithms.
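A toy contrast of the batch and stochastic approaches on a one-parameter least-squares problem (an illustrative sketch under made-up data, not the analysis of Bottou et al.):

```python
import random

random.seed(1)
data = [(x, 2.0 * x) for x in range(1, 21)]   # y = 2x exactly, so w* = 2

def batch_step(w, lr):
    # gradient of the mean squared error over the full sample
    g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * g

def sgd_step(w, lr):
    # gradient of the loss at a single randomly drawn observation
    x, y = random.choice(data)
    return w - lr * 2 * (w * x - y) * x

w_batch = w_sgd = 0.0
for t in range(200):
    w_batch = batch_step(w_batch, 0.001)
    w_sgd = sgd_step(w_sgd, 0.001 / (1 + 0.05 * t))   # decaying step size

print(round(w_batch, 2), round(w_sgd, 2))   # both should approach 2.0
```

The stochastic step touches one observation per iteration, which is the source of its per-iteration cheapness at large n; the decaying step size is the standard device for controlling the noise it introduces.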
The changing face of undergraduate mathematics education: a US perspective

Rachel Weir

Allegheny College, Pennsylvania

Date: Monday 16 October 2017

Note day and time of this seminar.
A common theme in the United States in recent years has been a call to increase the number of graduates in STEM (science, technology, engineering, and mathematics) fields and to enhance the scientific literacy of students in other disciplines. For example, in the 2012 report Engage to Excel, the Obama administration announced a goal of "producing, over the next decade, 1 million more college graduates in STEM fields than expected under current assumptions." Achieving these types of goals will require us to harness the potential of all students, forcing us to identify and acknowledge the barriers encountered by students from traditionally underrepresented groups. Over the past few years, I have been working to understand these barriers to success, particularly in mathematics. In this talk, I will share what I have learned so far and how it has influenced my teaching.
What is n?

David Fletcher

Department of Mathematics and Statistics

Date: Thursday 12 October 2017

In some settings, the definition of "sample size" will depend on the purpose of the analysis. I will consider several examples that illustrate this issue, and point out some of the problems that can arise if we are not clear about what we mean by "n".
Project presentations

Honours and PGDip students

Department of Mathematics and Statistics

Date: Friday 6 October 2017

Jodie Buckby: Model checking for hidden Markov models
Jie Kang: Model averaging for renewal process
Yu Yang: Robustness of temperature reconstruction for the past 500 years

Sam Bremer: An effective model for particle distribution in waterways
Joshua Mills: Hyperbolic equations and finite difference schemes
Gems of Ramanujan and their lasting impact on mathematics

Ken Ono

Emory University; 2017 NZMS/AMS Maclaurin Lecturer

Date: Thursday 5 October 2017

Note venue of this public lecture.
Ramanujan’s work has had a truly transformative effect on modern mathematics, and continues to do so as we understand further lines from his letters and notebooks. This lecture will present some of Ramanujan’s findings that are most accessible to the general public, and discuss how they fundamentally changed modern mathematics and influenced the lecturer’s own work. The speaker is an Associate Producer of the film The Man Who Knew Infinity (starring Dev Patel and Jeremy Irons) about Ramanujan. He will share several clips from the film in the lecture.

Biography: Ken Ono is the Asa Griggs Candler Professor of Mathematics at Emory University. He is considered to be an expert in the theory of integer partitions and modular forms. He has been invited to speak to audiences all over North America, Asia and Europe. His contributions include several monographs and over 150 research and popular articles in number theory, combinatorics and algebra. He received his Ph.D. from UCLA and has received many awards for his research in number theory, including a Guggenheim Fellowship, a Packard Fellowship and a Sloan Fellowship. He was awarded a Presidential Early Career Award for Science and Engineering (PECASE) by Bill Clinton in 2000 and he was named the National Science Foundation’s Distinguished Teaching Scholar in 2005. In addition to being a thesis advisor and postdoctoral mentor, he has also mentored dozens of undergraduates and high school students. He serves as Editor-in-Chief for several journals and is an editor of The Ramanujan Journal. He is also a member of the US National Committee for Mathematics at the National Academy of Science.
A statistics-related seminar in Physics: Where do your food and clothes come from? Oritain finds the answer in chemistry and statistics

Katie Jones and Olya Shatova

Oritain Dunedin

Date: Monday 2 October 2017

A statistics-related seminar in the Physics Department.
Note day, time and venue.
Oritain Global Ltd is a scientific traceability company that verifies the origin of food, fibre, and pharmaceutical products by combining trace element and isotope chemistry with statistics. Born in the research labs of the Chemistry Department at the University of Otago, Oritain has grown to become a multinational company with offices in Dunedin, London, and Sydney, and with clients from around the globe. Dr Katie Jones and Dr Olya Shatova are Otago alumni working as scientists at Oritain Dunedin. They will provide an overview of the science behind Oritain and discuss their transition from academic research to commercialized science.
Quantitative genetics in forest tree breeding

Mike and Sue Carson

Carson Associates Ltd

Date: Thursday 28 September 2017

Forest tree breeding, utilising quantitative genetic (QG) methods, is employed across a broad range of plant species for improvement of a wide diversity of products, or ‘breeding objectives’. Examples of breeding objectives range from the traditional sawn timber and pulpwood products desired largely from pines and eucalypts, to antibiotic factors in honey obtained from NZ manuka, and including plant oil products from oil palms. The standard population breeding approach recognises a hierarchy of populations (the ‘breeding triangle’) with a broad and diverse gene resource population at the base, and a highly-improved but less diverse deployment population at the peak. With the constraint that the deployment population must contain a ‘safe’ amount of genetic diversity, the main goal for any tree improvement program is to use selection and recombination to maximise deployment population gains in the target traits. The key QG tools used in tree improvement programs for trial data analysis, estimation of breeding values, index ranking and selection, and mating and control of pedigree are common to most other plant and livestock breeding programs. However, the perennial nature of most tree crops requires tree breeders to place a greater emphasis on the use of well-designed, long-term field trials, in combination with efficient and secure databases like Gemview. Recent advances using factor analytic models are providing useful tools for examining and interpreting genotype and site effects and their interaction on breeding values. Genomic selection is expected to enhance, rather than replace, conventional field screening methods for at least the medium term.
Genomic data analysis: bioinformatics, statistics or data science?

Mik Black

Department of Biochemistry

Date: Thursday 21 September 2017

Analysis of large-scale genomic data has become a core component of modern genetics, with public data repositories providing enormous opportunities for both exploratory and confirmatory studies. To take advantage of these opportunities, however, potential data analysts need to possess a range of skills, including those drawn from the disciplines of bioinformatics, data science and statistics, as well as domain-specific knowledge about their biological area of interest. While traditional biology-based teaching programmes provide an excellent foundation in the latter skill set, relatively little time is spent equipping students with the skills required for genomic data analysis, despite high demand for graduates with this knowledge. In this talk I will work through a fairly typical analysis of publicly accessible genomic data, highlighting the various bioinformatics, statistical and data science concepts and techniques being utilized. I will also discuss current efforts being undertaken at the University of Otago to provide training in these areas, both inside and outside the classroom.
Thinking statistically when constructing genetic maps

Timothy Bilton

Department of Mathematics and Statistics

Date: Thursday 14 September 2017

A genetic linkage map shows the relative position of and genetic distance between genetic markers, positions of the genome which exhibit variation, and underpins the study of species' genomes in a number of scientific applications. Genetic maps are constructed by tracking the transmission of genetic information from individuals to their offspring, which is frequently modelled using a hidden Markov model (HMM) since only the expression and not the transmission of genetic information is observed. However, data generated using the latest sequencing technology often results in partially observed information, which, if unaccounted for, typically results in inflated estimates. Most approaches to circumvent this issue involve a combination of filtering and correcting individual data points using ad-hoc methods. Instead, we develop a new methodology that models the partially observed information by incorporating an additional layer of latent variables into the HMM. Results show that our methodology is able to produce accurate genetic map estimates, even in situations where a large proportion of the data is only partially observed.
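For orientation, the machinery underlying such models is the HMM forward recursion for the likelihood. The sketch below shows only this standard layer, with made-up two-state parameters; the talk's additional latent layer for partially observed genotypes is not implemented here:

```python
import math

def forward_loglik(obs, init, trans, emit):
    """log P(obs) under an HMM, via the scaled forward recursion."""
    alpha = [init[s] * emit[s][obs[0]] for s in range(len(init))]
    ll = 0.0
    for o in obs[1:]:
        norm = sum(alpha)
        ll += math.log(norm)
        alpha = [a / norm for a in alpha]        # rescale to avoid underflow
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(len(init))) * emit[t][o]
            for t in range(len(init))
        ]
    return ll + math.log(sum(alpha))

init = [0.5, 0.5]                   # two hidden parental-origin states
trans = [[0.9, 0.1], [0.1, 0.9]]    # transitions model recombination
emit = [[0.8, 0.2], [0.2, 0.8]]     # observed allele given hidden state
print(forward_loglik([0, 0, 1, 1], init, trans, emit))
```

The rescaling at each step is what keeps the recursion numerically stable over the thousands of markers typical of sequencing data.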
Network tomography for integer valued traffic

Martin Hazelton

Massey University

Date: Thursday 7 September 2017

Volume network tomography is concerned with inference about traffic flow characteristics based on traffic measurements at fixed locations on the network. The quintessential example is estimation of the traffic volume between any pair of origin and destination nodes using traffic counts obtained from a subset of the links of the network. The data provide only indirect information about the target variables, generating a challenging type of statistical linear inverse problem.

In this talk I will discuss network tomography for a rather general class of traffic models. I will describe some recent progress on model identifiability. I will then discuss the development of effective MCMC samplers for simulation-based inference, based on insight provided by an examination of the geometry of the space of feasible route flows.
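At toy scale, the feasible set that such samplers explore can simply be enumerated. The sketch below uses a made-up three-link, four-route network; real networks require the MCMC machinery discussed in the talk:

```python
from itertools import product

# Route flows x are non-negative integers, link counts y = A x are
# observed, and we enumerate the feasible set {x >= 0 : A x = y}.
# Rows = monitored links, columns = routes (an invented example).
A = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
y = [3, 4, 5]

def feasible_flows(A, y, cap=10):
    """All non-negative integer route-flow vectors x with A x = y."""
    n_routes = len(A[0])
    sols = []
    for x in product(range(cap + 1), repeat=n_routes):
        if all(sum(a * xi for a, xi in zip(row, x)) == yk
               for row, yk in zip(A, y)):
            sols.append(x)
    return sols

for x in feasible_flows(A, y):
    print(x)
```

Even this tiny network has four feasible flow vectors, illustrating why the data identify route flows only up to a polytope and why sampler geometry matters.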
TensorFlow: a short intro

Lech Szymanski

Department of Computer Science

Date: Thursday 31 August 2017

TensorFlow is an open source software library for numerical computation. Its underlying paradigm of computation uses data flow graphs, which allow for automatic differentiation and effortless deployment that parallelises across CPUs or GPUs. I have been working in TensorFlow for about a year now, using it to build and train deep learning models for image classification. In this talk I will give a brief introduction to TensorFlow as well as share some of my experiences of working with it. I will try to make this talk not about deep learning with TensorFlow, but rather about TensorFlow itself, which I happen to use for deep learning.
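TensorFlow's own API is not reproduced here; instead, a tiny pure-Python reverse-mode autodiff sketch illustrates the data-flow-graph idea the talk describes: each operation records its inputs and local gradients, and gradients then flow backwards through the recorded graph. This is conceptual only; real TensorFlow adds graph optimisation, tensors, and device placement.

```python
class Node:
    """A value in a recorded computation graph."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent_node, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

def backward(out):
    """Propagate d(out)/d(node) backwards through the recorded graph.
    A topological ordering is needed in general; this simple traversal
    is correct for the graph built below, where each intermediate node
    is consumed exactly once."""
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        for parent, local in node.parents:
            parent.grad += node.grad * local
            stack.append(parent)

x = Node(3.0)
y = Node(4.0)
z = x * y + x * x        # z = xy + x^2 = 21; dz/dx = y + 2x = 10
backward(z)
print(x.grad, y.grad)    # 10.0 3.0
```

The "effortless" differentiation in TensorFlow comes from exactly this bookkeeping being done for you over a graph of tensor operations.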
Theory and application of latent variable models for multivariate binomial data

John Holmes

Department of Mathematics and Statistics

Date: Thursday 24 August 2017

A large body of work has been devoted to developing latent variable models for exponential family distributed multivariate data exhibiting interdependencies. For the binomial case, however, extensions of these models beyond the analysis of binary data are almost entirely missing. Focusing on principal component/factor analysis representations, we will show that under the canonical logit link, latent variable models can be fitted in closed form, via Gibbs sampling, to multivariate binomial data of arbitrary trial size, by applying Pólya-gamma augmentation to the binomial likelihood. In this talk, the properties of binomial latent variable models under Pólya-gamma data augmentation will be discussed from both a theoretical perspective and through application to a range of simulated and real demographic datasets.
Māori student success: Findings from the Graduate Longitudinal Study New Zealand

Moana Theodore

Department of Psychology

Date: Thursday 17 August 2017

Māori university graduates are role models for educational success and important for the social and economic wellbeing of Māori whānau (extended family), communities and society in general. Describing their experiences can help to build an evidence base to inform practice, decision-making and policy. I will describe findings for Māori graduates from all eight New Zealand universities who are participants in the Graduate Longitudinal Study New Zealand. Data were collected when the Māori participants were in their final year of study in 2011 (n=626) and two years post-graduation in 2014 (n=455). First, I will focus on what Māori graduates describe as helping or hindering the completion of their qualifications, including external (e.g. family), institutional (e.g. academic support) and student/personal (e.g. persistence) factors. Second, I will describe Māori graduate outcomes at 2 years post-graduation. In particular, I will describe the private benefits of higher education, such as labour market outcomes (e.g. employment and income), as well as the social benefits such as civic participation and volunteerism. Overall, our findings suggest that boosting higher education success for Māori may reduce ethnic inequalities in New Zealand labour market outcomes and may impart substantial social benefits as a result of Māori graduates’ contribution to society.
Bayes factors, priors and mixtures

Matthew Schofield

Department of Mathematics and Statistics

Date: Thursday 10 August 2017

It is well known that Bayes factors are sensitive to the prior distribution chosen on the parameters. This has led to comments such as “Diffuse prior distributions ... must be used with care” (Robert 2014) and “We do not see Bayesian methods as generally useful for giving the posterior probability that a model is true, or the probability for preferring model A over model B” (Gelman and Shalizi 2013). We consider the calculation of Bayes factors for nested models. We show this is equivalent to a model with a mixture prior distribution, where the weights on the resulting posterior are related to the Bayes factor. These results allow us to directly compare Bayes factors to shrinkage priors, such as the Laplace prior used in the Bayesian lasso. We use these results as the basis for offering practical suggestions for estimation and selection in nested models.
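A minimal numerical illustration of a Bayes factor for nested models, with H0: mu = 0 against H1: mu ~ N(0, tau²) for N(mu, 1) data. The toy setup and grid integration are assumptions for illustration, not the mixture-prior construction of the talk:

```python
import math

def log_norm_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_marginal_h1(y, tau, grid=20001, lim=10.0):
    """log p(y | H1) by brute-force Riemann sum over the prior on mu."""
    step = 2 * lim / (grid - 1)
    total = 0.0
    for k in range(grid):
        mu = -lim + k * step
        logp = log_norm_pdf(mu, 0.0, tau ** 2)
        logp += sum(log_norm_pdf(yi, mu, 1.0) for yi in y)
        total += math.exp(logp) * step
    return math.log(total)

y = [0.5, 1.2, -0.3, 0.8]
tau = 1.0
# log Bayes factor in favour of H1 over H0 (mu fixed at 0)
log_bf10 = log_marginal_h1(y, tau) - sum(log_norm_pdf(yi, 0.0, 1.0) for yi in y)
print(round(log_bf10, 3))
```

Rerunning with a much larger tau drives the Bayes factor towards H0 regardless of the data, which is the diffuse-prior sensitivity the opening quotations warn about.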
Development and implementation of culturally informed guidelines for medical genomics research involving Māori communities

Phil Wilcox

Department of Mathematics and Statistics

Date: Thursday 3 August 2017

Medical genomic research is usually conducted within a ‘mainstream’ cultural context. Māori communities have been underrepresented in such research despite being impacted by heritable diseases and other conditions that could potentially be unravelled via modern genomic technologies. Reasons for low participation of Māori communities include negative experiences of genomics and genetics researchers, such as the notorious ‘Warrior Gene’ saga, and an unease with technologies that are often implemented by non-Māori researchers in a manner inconsistent with Māori values. In my talk I will describe recently developed guidelines for ethically appropriate genomics research with Māori communities; how these guidelines were informed by my iwi, Ngāti Rakaipaaka, who had previously been involved in a medical genomics investigation; and current efforts to complete that research via a partnership with Te Tari Pāngarau me Tātauranga ki Te Whare Wānaka o Otakou (Department of Mathematics and Statistics at the University of Otago).
Who takes Statistics? A look at student composition, 2000-2016

Peter Dillingham

Department of Mathematics and Statistics

Date: Thursday 27 July 2017

In this blended seminar and discussion, we will examine how student data can help inform curriculum development and review, focussing on the Statistics programme as an example. Currently, the Statistics academic staff are reviewing our programme to ensure that we continue to provide a high quality and modern curriculum that meets the needs of students. An important component of this process is to understand who our students are and what they are interested in, from first-year service teaching through to students majoring in statistics. As academics, we often have a reasonable answer to these questions, but we can be more specific by poring over student data. While not glamorous, this sort of data can help confirm those things we think we know, identify opportunities or risks, and help answer specific questions where we know that we don’t know the answer.
A missing value approach for breeding value estimation

Alastair Lamont

Department of Mathematics and Statistics

Date: Thursday 20 July 2017

A key goal in quantitative genetics is the identification and selective breeding of individuals with high economic value. For a particular trait, an individual’s breeding value is the genetic worth it has for its progeny. While methods for estimating breeding values have existed since the middle of last century, the march of technology now allows the genotypes of individuals to be directly measured. This additional information allows for improved breeding value estimation, supplementing observed measurements and known pedigree information. However, while it can be cost efficient to genotype some animals, it is infeasible to genotype every individual in most populations of interest, due to either cost or logistical issues. As such, any approach must be able to accommodate missing data, while also managing computational efficiency, as the dimensionality of data can be immense. Most modern approaches tend to impute or average over the missing data in some fashion, rather than fully incorporating it into the model. These approximations lead to a loss in estimation accuracy. Similar models are used within human genetics, but for different purposes. With different data and different goals from those of quantitative genetics, these approaches natively include missing data within the model. We are developing an approach which utilises a human genetics framework, but adapted so as to estimate breeding values.
Assessing and dealing with imputation inaccuracy in genomic predictions

Michael Lee

Department of Mathematics and Statistics

Date: Thursday 13 July 2017

Genomic predictions rely on having genotypes from high density SNP Chips from many individuals. Many national animal evaluations to predict breeding values may include millions of animals, of which an increasing proportion have genotype information. Imputation can be used to make genomic predictions more cost effective. For example, in the NZ Sheep industry genomic predictions can be done by genotyping animals with a SNP Chip of lower density (e.g. 5-15K) and imputing the genotypes for a given animal to a density of about 50K, where the imputation process needs a reference panel of 50K genotypes. The imputed genotypes are used in genomic predictions and the accuracy of imputation is a function of the quality of the reference panel. A study to assess the imputation accuracy of a wide range of animals was undertaken. The goal was to quantify the levels of inaccuracy and to determine a best strategy to deal with this inaccuracy in the context of single step genomic best linear unbiased prediction (ssGBLUP).
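One simple accuracy summary such a study might report is genotype concordance on a masked validation set; the coding and data below are made up for illustration:

```python
# Genotypes coded 0/1/2 (copies of the reference allele); concordance is
# the proportion of loci where the imputed call matches the true call.
def concordance(true_g, imputed_g):
    """Proportion of loci where the imputed genotype matches the true one."""
    matches = sum(t == i for t, i in zip(true_g, imputed_g))
    return matches / len(true_g)

true_g    = [0, 1, 2, 1, 0, 2, 1, 1]   # masked true genotypes at 8 loci
imputed_g = [0, 1, 2, 2, 0, 2, 1, 0]   # imputed calls for the same loci
print(concordance(true_g, imputed_g))  # 6 of 8 loci agree -> 0.75
```

Concordance is only one option; correlation between true and imputed allele dosages is another common summary, and the choice matters when allele frequencies are skewed.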
Twists and trends in exercise science

Jim Cotter

School of Physical Education, Sport and Exercise Sciences

Date: Thursday 1 June 2017

From my perspective, exercise science is entering an age of enlightenment, but misuse of statistics remains a serious limitation to its contributions and progress for human health, performance, and basic knowledge. This seminar will summarise our recent and current work in hydration, heat stress and patterns of dosing/prescribing exercise, and the implications for human health and performance. These contexts will be used to discuss methodological issues including research design, analysis and interpretation.