Te Tari Pāngarau me te Tatauranga
Department of Mathematics & Statistics

Archived seminars in Statistics

Seminars 151 to 200
Inference for an emerging epidemic in the presence of control

Matthew Parry

Department of Mathematics and Statistics

Date: Thursday 27 August 2015

Fast-moving and destructive emerging epidemics are seldom left to run their course because of the imperative to control further spread. Contemporaneous control measures, however, greatly complicate the characterization of the disease transmission process and the extraction of the epidemiological parameters of interest. The spread of Huanglongbing on orchard scales is used as a case study for modelling an emerging epidemic in the presence of control. We show that even with missing and censored data, and with seasonal and host age dependencies, it is possible to infer the parameters of a fully spatiotemporal stochastic model of disease spread. The value of the fitted model is that it provides an engine for simulation studies of the costs and benefits of proposed disease control strategies.
The adaptive distances method in evolutionary models

Shlomo Moran, The Bernard Elkin Chair in Computer Science

Technion, Israel

Date: Friday 14 August 2015

Note day and time of this special seminar
Evolutionary distances between pairs of aligned DNA sequences (representing species or genes) are very useful for reconstructing phylogenetic (evolutionary) trees. Therefore, obtaining good estimates of these distances is critical for accurate tree reconstruction.
Traditionally, evolutionary distances are identified with evolutionary time - the expected number of mutations (substitutions) per site, which is estimated from the aligned sequences and the assumed evolutionary model.
Common evolutionary models allow a few types of mutations, which occur at different rates. Due to the finite length of the aligned sequences, it is easy to estimate the number of slow mutations when the sequences are (time-wise) far apart, while fast mutations are easier to count when the sequences are close (for instance, estimates of the expected number of fast mutations become notoriously inaccurate at large evolutionary distances, a phenomenon known as "long-edge saturation"). The adaptive distances method copes with this problem by assigning to each pair of aligned sequences the evolutionary distance that is easiest to estimate for that specific pair: e.g., when the sequences are far apart, it uses an evolutionary distance that increases the weight of the slow-mutation count and decreases the weight of the fast-mutation count.
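The saturation effect can be made concrete with the classical Kimura two-parameter distance, a standard textbook estimator (not the adaptive method of the talk) that counts "fast" transitions and "slow" transversions separately; the estimate diverges as the observed transition fraction approaches its limit:

```python
import math

def k2p_distance(P, Q):
    """Kimura two-parameter distance: expected substitutions per site,
    from the observed fraction of transitions P ('fast' changes) and
    transversions Q ('slow' changes). The first term saturates as
    1 - 2P - Q approaches zero, i.e. for distant sequence pairs."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

d_close = k2p_distance(0.10, 0.05)   # close pair: well-behaved estimate
d_far = k2p_distance(0.40, 0.15)     # distant pair: estimate blowing up
```

For close sequences the estimate is small and stable; as P grows toward its saturation point the same formula becomes extremely sensitive to sampling noise, which is the instability the adaptive distances method is designed to avoid.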
I will present a general framework which specifies the models which enable the use of adaptive distances, and present theoretical and experimental results which demonstrate the advantages of this method. Time permitting, some extensions and applications of this method will be presented.

The talk is self-contained, and does not assume any prior knowledge of evolutionary models or phylogenetic reconstruction.
Ergodic properties of a Markov chain used for inferring genotype from corrupted DNA

Paula Bran

Department of Mathematics and Statistics

Date: Thursday 13 August 2015

Markov models are widely used in areas such as biology, physics and the social sciences. This talk concerns a model that estimates the population of certain species using non-invasive DNA samples; the convergence properties of the algorithm used are the main focus. The ergodicity theorem establishes the conditions a Markov chain must satisfy for a steady-state distribution to exist and be unique. The model will be briefly explained and these convergence properties then discussed in the context of the model.
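As a toy illustration of the ergodicity theorem (not the genotype model itself), repeated transitions of an irreducible, aperiodic chain converge to its unique steady-state distribution regardless of the starting distribution:

```python
import numpy as np

# A small irreducible, aperiodic transition matrix: by the ergodicity
# theorem it has a unique stationary distribution pi with pi = pi P.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

pi = np.array([1.0, 0.0])      # arbitrary starting distribution
for _ in range(200):           # repeated transitions: pi_t = pi_0 P^t
    pi = pi @ P

# pi has converged to the stationary distribution (0.8, 0.2),
# which indeed satisfies pi = pi P.
```

Starting instead from (0, 1) gives the same limit, which is the uniqueness half of the theorem.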
Predictability of random effect contrasts in animal breeding models

John Holmes

Department of Mathematics and Statistics

Date: Thursday 30 July 2015

Best linear unbiased prediction (BLUP) is the default method used in animal breeding to assess genetic merit; the estimated random effect of a given animal is referred to as its breeding value. For results obtained from BLUP to be of practical use, the reliability of contrasts between estimated breeding values needs to be evaluated. This talk will give an overview of existing methods and possible new methods to assess this in a computationally efficient way.
Dental epidemiology in the age of digital dentistry

Jonathan Broadbent

Oral Rehabilitation

Date: Thursday 23 July 2015

Dental problems are highly prevalent yet preventable. The 21st century has brought the digital age of dentistry, with great innovations in dental preventive and interventive treatments. We now have digital records, digital radiographs, digital impressions, computer-aided design and milling of restorations. We can reverse or arrest some tooth decay, regenerate damaged gums, and implant replacement teeth. Despite these advances, the most basic dental problems remain highly prevalent. In fact, many of these innovations have led to an increase in inequitable provision of dental care. This presentation will explore opportunities for investigating dental diseases and the provision of dental care using both routinely-collected treatment data and prospectively-collected longitudinal data.
Proofs and evolution

David Bryant

Department of Mathematics and Statistics

Date: Tuesday 21 July 2015

INAUGURAL PROFESSORIAL LECTURE
Note time and venue
Some recent developments in sheep breeding infrastructure in NZ

Michael Lee

Department of Mathematics and Statistics

Date: Thursday 16 July 2015

Infrastructure to support the genetic improvement of seed-stock is important: it provides a means to lift the profitability of a whole industry. This increased profitability contributes to our national wealth and is cumulative. The sheep industry in NZ has made significant gains in increasing the genetic merit of the national flock. More recently, technologies such as genomics have provided opportunities to increase the rate of genetic gain for the national flock. This seminar will give an overview of the current infrastructure and some ongoing activities to help increase the profitability of the industry through genetic improvement.
Utilising volatile aroma analysis for understanding flavour perception and challenges for data analysis

Graham Eyres

Department of Food Science

Date: Thursday 9 July 2015

Flavour perception is one of the most important characteristics of any food product, strongly influencing consumer liking of products and their success in the marketplace. A key determinant of flavour perception is the composition of the volatile aroma compounds. This presentation will discuss the use of analytical techniques for characterising aroma compounds in food products as a tool to understand flavour perception. The aroma-active compounds that contribute to flavour perception can be identified by gas chromatography – mass spectrometry (GC/MS) in combination with simultaneous olfactory detection using a human assessor. This analytical data can be correlated to quantitative sensory evaluation data to understand the drivers of flavour quality. The dynamic release of aroma compounds from food during consumption, measured using proton transfer reaction mass spectrometry (PTR-MS), can be used to understand the impact of food composition and structure on temporal flavour perception. This seminar will present examples of research undertaken in the Department of Food Science, discuss the challenges in data analysis and opportunities for collaboration with Maths and Statistics.
Coordination variability and the walk to run transition in humans

Peter Lamb

School of Physical Education, Sport and Exercise Sciences

Date: Thursday 2 July 2015

Human gait has two main forms, walking and running; the transition between these forms is sudden and occurs within a narrow speed range. Existing models of human gait fail to predict or explain several key characteristics of the transition between gait modes. Treating the transition as a complex, adaptive behaviour, in which the whole-body coordination pattern emerges from many inter-related factors, may be the starting point for a general model of human locomotion. Other researchers have taken this approach in studying the gait transition, but have reported conflicting results. One reason for this may be their use of low-dimensional discrete variables, which may not be robust to variation in experimental protocols, participants or other factors. This talk will demonstrate the application of self-organising maps as a tool for exploring and visualising the high-dimensional time-series data that represent human locomotion.
A statistics-related seminar in the Physics Department: Kernel based methods for fitting scattered data with applications

Assoc Prof Rick Beatson

University of Canterbury

Date: Friday 12 June 2015

Kernel based, or radial basis function, methods are an attractive approach to scattered data fitting in many contexts. Early applications were to surface reconstruction from scan data, including the manufacture of titanium prostheses for the repair of damaged skulls and the large-scale custom manufacture of artificial limbs. Later applications include the fitting of divergence-free and curl-free flows to vector field data, and the visualisation of geophysical data such as ore grade in mining exploration software. Kernel based methods are well worth considering for any data fitting problem where the data is scattered rather than gridded. In this talk I will introduce kernel based methods. One fundamental idea is that of a positive definite kernel. I will discuss this concept and some of its consequences. Another fundamental idea is that of fast evaluation, some approaches to which involve the multipole expansions of mathematical physics. Illustrative animations and movies will be scattered throughout the talk.
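A minimal sketch of the basic idea, using a Gaussian kernel on synthetic 1-D data (the kernel width 0.1 and the sine test function are arbitrary choices for illustration): positive definiteness of the kernel guarantees the interpolation system has a unique solution.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 12)              # data sites
f = np.sin(2 * np.pi * x)                  # values to be fitted

# Kernel (Gram) matrix of a Gaussian radial basis function; positive
# definite, so the linear system K w = f is uniquely solvable.
K = np.exp(-((x[:, None] - x[None, :]) / 0.1) ** 2)
w = np.linalg.solve(K, f)                  # interpolation weights

def s(t):
    """Evaluate the fitted kernel expansion s(t) = sum_j w_j k(t, x_j)."""
    return np.exp(-((t - x) / 0.1) ** 2) @ w
```

The fit interpolates the data exactly at the sites; naive evaluation as above costs O(n) per point, which is the cost that fast multipole-style evaluation schemes reduce.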
A statistical method for estimating eruption volumes for Mt Taranaki events

Rebecca Green

Department of Mathematics and Statistics

Date: Thursday 28 May 2015

While temporal forecasting of eruption episodes has been widely studied, forecasts of eruption size are less common. Models that incorporate eruption size depend on the availability of reasonable estimates of eruptive volumes; however there are very few volcanoes with sufficient volume data available.

In this talk I present a Bayesian statistical approach to estimating eruption volumes for a series of eruptions from Mt Taranaki (New Zealand). Most studies focus on large widespread eruptions, using isopach maps constructed from tephra (ash) thicknesses observed at exposed locations. In contrast, I take a novel approach, incorporating raw thickness measurements from additional unexposed lake and swamp records. This facilitates investigation into the dispersal pattern and volume of much smaller events.

Given the general scarcity of data and the physical phenomena governing tephra attenuation, a high-dimensional complex model is required. Point thickness observations are modeled as a function of the distance and angular direction of each location. Larger, well-estimated events are used as leverage for understanding the smaller, poorly known events, and uncertainty in thickness measurements can be accounted for. In addition to eruptive volumes, the model also estimates the wind and site-specific effects on the tephra deposits. Although the model has some undesirable properties, due to difficulties in separating model parameters, it does provide an implementable framework for estimating volumes from very sparse data.
Project presentations

Honours and PGDip students

Department of Mathematics and Statistics

Date: Friday 22 May 2015

STATISTICS
Yunan Wang: Binary segmentation for change-point detection in GPS time series
Patrick Brown: Investigating dynamic time series models to predict future tourism demand in New Zealand
Alastair Lamont: Hierarchical modelling approaches to estimate genetic breeding values
Lyco Wen: Effects of gene by environment interaction on hyperuricemia and related gout risk

15-MINUTE BREAK 2.20-2.35

MATHEMATICS
Callum Nicholson: Wavelets and direct limits
Pareoranga Luiten-Apirana: Morita equivalence of Leavitt path algebras
Tom McCone: Primitive ideals in graph algebras
Marketing statistics practice

Damien Mather

Department of Marketing

Date: Thursday 21 May 2015

The presentation will cover:
• What statistics methods Otago’s marketing department teaches at the undergraduate level
– for support of commercial marketing decision making
– for support of postgraduate research skills

• What statistics methods Otago’s marketing department teaches and/or mentors at the postgraduate level
– Specialised vocational focus:
for commercial marketing research industry sector
for emerging business data scientist roles
– Core postgraduate research methods, frameworks

• What statistics methods are used for staff/postgrad research: moderated by what editors and reviewers understand?

Bio:
Damien has a multidisciplinary undergraduate background in biological and physical sciences and a postgraduate focus on predictive models for marketing applications. He has worked in senior roles in engineering, marketing and commercial marketing research sectors in addition to his academic career in marketing.
Analysing language data over time: current techniques and quandaries

Hunter Hatfield

Department of English and Linguistics

Date: Thursday 7 May 2015

NOTE TIME OF THIS SEMINAR; NOT THE USUAL
Linguistic theory has traditionally analysed virtually all language knowledge as structural and hierarchical, be it knowledge of meaning, grammar, words, or sounds. At the same time, actual language use is profoundly temporal. Language practice occurs at many timescales: from speech perception over milliseconds to conversation over minutes to speech community development over months to language learning over years to language change over decades or centuries. While linguistic theory typically asks the researcher to find structure, analysing language data in practice often involves looking at time series. In this talk, I will look at some of the statistical techniques currently employed to look at language data across time. The focus will be on my own experimental research about the processing of syntax over a few seconds, but we will look at other techniques for interpreting data across larger time scales.
Model-based PCA

Richard Barker

Department of Mathematics and Statistics

Date: Thursday 30 April 2015

Principal components analysis (PCA) is one of the first multivariate techniques that students learn. The idea is that we take a $d$-dimensional multivariate observation $y_1,\ldots,y_d$ and find linear combinations of these which we can use as "indices" $z_1,\ldots,z_d$, known as principal components. The indices are uncorrelated and ordered by importance. With highly correlated data, most of the variation in $y$ might be captured by the first few principal components, in which case principal components analysis allows us to describe $y$ in terms of a smaller number of uncorrelated principal components.
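The construction can be sketched directly on synthetic data: the principal components are the eigenvectors of the sample covariance matrix, and the resulting indices are uncorrelated and ordered by the variance they capture.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 3-dimensional observations (illustration only)
y = rng.normal(size=(500, 3)) @ np.array([[1.0, 0.8, 0.2],
                                          [0.0, 0.6, 0.3],
                                          [0.0, 0.0, 0.1]])
yc = y - y.mean(axis=0)                  # centre the observations

# Principal components: eigenvectors of the sample covariance matrix,
# ordered by eigenvalue (the variance each index captures).
vals, vecs = np.linalg.eigh(np.cov(yc.T))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

z = yc @ vecs                            # the indices z_1, ..., z_d
# cov(z) is diagonal: the indices are uncorrelated, and most of the
# variance is concentrated in the first component.
```

Note that nothing here invokes a probability model; the model-based view of the talk arises when the same decomposition is derived from a constrained factor-analysis likelihood.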

A notable feature of PCA is that it is not based on any particular model which limits its usefulness. However, Tipping and Bishop (1999) showed a connection between PCA and factor analysis. Factor analysis models dependence structure in $y$ by assuming a normal distribution for $y$ with mean expressed in terms of latent factors that are normally distributed with mean 0 and unit variances. Tipping and Bishop showed that under a simple restriction on the covariance matrix we obtain a model-based formulation for PCA. In this seminar I describe these ideas in more detail and their implications for statistical modelling.

Tipping, M.E., and Bishop, C.M. 1999. Probabilistic principal component analysis. J. R. Statist. Soc. B 61: 611-622.
Multidimensional scaling

Monika Balvočiūtė

Department of Mathematics and Statistics

Date: Thursday 23 April 2015

Multidimensional scaling (MDS) is a data reduction and visualisation technique used to display high-dimensional objects in a low-dimensional space. Objects are mapped to points in the low-dimensional space (usually 2D or 3D) so as to preserve, as much as possible, the distances between objects. In this talk we discuss the general problem of MDS and introduce a new agglomerative approach for solving it. The algorithm is extremely fast. Our initial results suggest that it out-performs existing algorithms for large-scale MDS (hundreds of thousands of points).
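As background to the general problem, here is a sketch of classical (Torgerson) scaling, the textbook baseline rather than the agglomerative algorithm presented in the talk: double-centre the squared distances and embed using the top eigenvectors.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n objects in k dimensions from
    an n-by-n matrix of pairwise distances D, preserving distances as
    well as a rank-k spectral decomposition allows."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]         # top-k eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

For exact Euclidean input distances of intrinsic dimension k, this recovers a configuration whose pairwise distances match the input; the spectral decomposition is also what makes the classical method expensive at the very large scales the talk targets.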
Model averaging

David Fletcher

Department of Mathematics and Statistics

Date: Thursday 16 April 2015

Parameter estimation has traditionally been based on a single model, with this model often being selected as the best from a set of candidate models. The process by which we select this best model is often ignored, leading to point estimates being biased and their precision being overestimated. Model averaging is an approach to estimation that makes some allowance for such model uncertainty. I will give a general overview of this area, discuss the links between methods developed in statistics, econometrics and machine learning, and point out the connections between Bayesian and frequentist model averaging.
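One widely used frequentist scheme of the kind surveyed in the talk is Akaike-weight model averaging, in which each candidate model's estimate is weighted by its AIC support (the AIC and estimate values below are hypothetical):

```python
import math

def akaike_weights(aics):
    """Akaike weights: w_i proportional to exp(-0.5 * (AIC_i - AIC_min)),
    normalised to sum to one."""
    best = min(aics)
    raw = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]

aics = [100.0, 101.2, 107.5]      # hypothetical AICs of three candidate models
estimates = [2.3, 2.9, 1.8]       # each model's estimate of the same parameter

w = akaike_weights(aics)
averaged = sum(wi * ei for wi, ei in zip(w, estimates))
```

The averaged estimate is a convex combination of the single-model estimates, so no model is discarded outright; the harder problem, which motivates much of the literature, is producing an interval for the averaged estimate that honestly reflects model uncertainty.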
Trends in the teaching of statistics and mathematics in the United States

Christine Franklin

Visiting Fulbright scholar University of Auckland, University of Georgia

Date: Thursday 26 March 2015

NOTE day, time and venue
The United States is realizing the need to achieve a level of quantitative literacy for its graduates to prepare them to thrive in the modern world. Given the prevalence of statistics in the media and workplace, individuals who aspire to a wide range of positions and careers require a certain level of statistical literacy. Because of the emphasis on data and statistical understanding, it is crucial for us as educators to consider how we can prepare a statistically literate population. Students must acquire an adequate level of statistical literacy through their education, beginning in the first grade.
The Common Core State Standards for mathematics (that include statistics) in grades Kindergarten – 12 have been adopted by most states and the District of Columbia. These national standards for the teaching of statistics and probability range from counting the number in each category to determining statistical significance through the use of simulation and randomization tests. Soon, and for the first time, most of our entering college students will have been taught some statistics and probability, so our introductory college and university statistics courses will have to change. In addition, we must rethink the preparation of future K–12 teachers to teach this curriculum. Change in teacher preparation must thus be implemented in order to respond to the call from society for an increase in statistical understanding.
This presentation will provide a brief history of statistics at K-12 in the United States, an overview of the statistics and probability content of these standards, resources that support the K-12 standards in statistics, consider the effect in our introductory university statistics courses, and describe the knowledge and preparation needed by the future and current K–12 teachers who will be teaching using these standards. A new American Statistical Association strategic initiative, the Statistical Education of Teachers, will be outlined and the desired assessment of statistics at K-12 on the high stakes national tests will be explored.

This is a joint seminar by the Department of Mathematics and Statistics and Otago Mathematics Association (OMA)
Random split-times for flexibly modeling non-proportional hazards covariate effects

Dan Gillen

University of California, Irvine

Date: Thursday 26 March 2015

In this talk we develop and apply flexible Bayesian survival analysis methods to investigate the risk of lymphoma associated with kidney transplantation among patients with end stage renal disease. Of key interest is the potentially time-varying effect of a time-dependent exposure: transplant status. Bayesian modeling of the baseline hazard and the effect of transplant requires consideration of two time scales; time since study start and time since transplantation, respectively. Previous related work has not dealt with the separation of multiple time scales. Using a hierarchical model for the hazard function, both time scales are incorporated via conditionally independent stochastic processes; smoothing of each process is specified via intrinsic conditional Gaussian autoregressions. Features of the corresponding posterior distribution are evaluated from draws obtained via a Metropolis-Hastings-Green algorithm.
Occupancy modeling of breeding amphibians in the Greater Yellowstone area using covariates

William Gould

New Mexico State University

Date: Thursday 19 March 2015

A long-term amphibian monitoring program has been developed as a means for monitoring ecosystem health in Yellowstone and Grand Teton National Parks. Annual surveys for breeding occupancy were conducted over years 2006-2012. We used multi-season occupancy estimation to assess changes in the occurrence of tiger salamanders, boreal chorus frogs and Columbia-spotted frogs at two scales: small watershed (catchment) and individual site. Catchments were randomly selected and all wetlands within catchments were surveyed for breeding occurrence. In general, populations are stable or increasing over the 7-yr period, which is contrary to trends for amphibians worldwide. Interesting patterns of use will be described using a combination of habitat and climate covariates, but the inclusion of covariates also brings with it some complications that will be discussed.
Analytical platforms for detecting signatures of natural selection in genome-wide datasets

Phillip Wilcox

Scion (New Zealand Forest Research Institute Ltd); Department of Biochemistry

Date: Thursday 12 March 2015

Advances in genomics technologies have enabled the generation of genome-wide data on multiple individuals, creating new challenges for data management and analysis. Of particular interest to geneticists are ‘signatures’ of natural and artificial selection within genome-wide data generated on populations. Recently, University of Otago researchers funded via the Virtual Institute of Statistical Genetics (VISG, www.visg.co.nz) developed an analytical pipeline to identify genomic regions that exhibit such signatures. In this seminar I will provide an overview of VISG, describe aspects of the analytical pipeline, and present results from applying it to a gene (PPARGC1A) previously hypothesised to be subject to natural selection in Polynesians.
Bayesian assessment of muscle fibre-type spatial interaction: Part II

Tilman Davies

Department of Mathematics and Statistics

Date: Thursday 5 March 2015

In November 2014, the unveiling of the CLUSTOMETER rocked the statistical world. This visual tool provides an estimate of the degree to which like-type muscle fibres tend to attract or repel one another when inspecting cross-sections of mammalian muscle fascicles. Driving it are estimates derived from a flexible Bayesian modeling framework, based on a thresholded hidden conditional autoregressive Gaussian field. Already, however, there are modifications that can be made to the model design and MCMC schemes, which can lead to more reliable posteriors and thus improve CLUSTOMETER readings. This talk discusses important changes made to the current implementation. Firstly, the prior distribution assigned to the correlation parameter is redefined, providing ‘fairer’ representation of the interaction parameter of interest. In addition, adaptive sampling is implemented in the Metropolis-Hastings algorithms, obviating the need to manually tune acceptance rates.
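The adaptive-sampling idea can be sketched with a generic diminishing-adaptation random-walk Metropolis sampler (a standard scheme, not necessarily the one used in the talk; the standard normal target and the 0.44 target acceptance rate are illustrative assumptions):

```python
import math
import random

random.seed(1)

def log_target(x):
    """Log-density of the target, here a standard normal for illustration."""
    return -0.5 * x * x

# Adaptive random-walk Metropolis: after each proposal the log proposal
# scale is nudged toward a target acceptance rate, with step sizes that
# shrink over time (diminishing adaptation), so no manual tuning is needed.
x, log_scale = 0.0, 0.0
target_acc, accepts, n = 0.44, 0, 20000
for t in range(1, n + 1):
    prop = x + random.gauss(0.0, math.exp(log_scale))
    accepted = math.log(random.random()) < log_target(prop) - log_target(x)
    if accepted:
        x, accepts = prop, accepts + 1
    log_scale += t ** -0.6 * ((1.0 if accepted else 0.0) - target_acc)

acc_rate = accepts / n   # settles near the target acceptance rate
```

The decaying step size t**-0.6 preserves ergodicity of the adapted chain while letting the scale find itself, which is precisely what removes the manual pilot-tuning step.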
How to access your supercomputer

Peter Maxwell

NeSI, University of Otago

Date: Thursday 26 February 2015

NeSI is New Zealand's computing research infrastructure, providing high performance computing and support services. The University of Otago is a NeSI investor, and so Otago researchers can access most NeSI resources at no charge. I will describe the structure of NeSI, the services and hardware resources it makes available, how to access those resources, and the ways in which statisticians and others are using them.
Emission Tomography and Bayesian inverse problems

Peter Green

University of Technology, Australia; University of Bristol, UK

Date: Thursday 27 November 2014

Note date, time and venue of this joint Physics and Statistics seminar

Inverse problems are almost ubiquitous in applied science and technology, so have long been a focus for intensive study, aimed at both methodological development and theoretical analysis. Formulating inverse problems as questions of Bayesian inference has great appeal in terms of coherence, interpretability and integration of uncertainty: the Bayesian formulation comes close to the way that many scientists intuitively regard the inferential task, and in principle allows the free use of subject knowledge in probabilistic model building.

The Bayesian approach to reconstruction in single-photon emission computed tomography will be briefly discussed, with several empirical illustrations. Theoretical results about consistency of the posterior distribution of the reconstruction will then be presented, along with a version of the Bernstein-von Mises theorem that provides an effective approximation to the posterior distribution in such ill-posed partly-non-regular generalised linear inverse problems with constraints. Technical details of proofs will be down-played in favour of visualisation and interpretation.

This talk is based on joint work with Natalia Bochkina (University of Edinburgh).
A stats-related seminar in Preventive & Social Medicine: Assessing policy counterfactuals with a simulation-based inquiry system

Professor Peter Davis

University of Auckland

Date: Monday 20 October 2014

Note day, time and venue of this Preventive and Social Medicine seminar

Our research group has developed a simulation model that uses real-world data from existing longitudinal studies to mimic a representative sample of biographical trajectories in the early life-course. Because these data populate a simulation model that is calibrated to the real world, we have in effect created an inquiry system that can be interrogated by posing realistic counterfactual arguments of either a policy or theoretical nature.

In this presentation Professor Davis will first outline the construction of the inquiry system and then illustrate its application by assessing a social determinants of health (SDH) model of health and social outcomes, weighing the relative impact of structural-level versus personal factors in the SDH model. The findings are broadly similar across three domains (health, education, and social behaviour) and show that structural factors can make more of a difference than personal behaviour, and can do so in such a way that the benefits of “intervention” flow disproportionately to the most disadvantaged.
A stats-related seminar in Physics: Fast solutions to LARGE Bayesian linear models of 3-D confocal microscope images of biofilms

Dr Albert Parker

Montana State University

Date: Monday 20 October 2014

Note day, time and venue of this Physics seminar

Well established iterative techniques from numerical linear algebra can be used to quickly and inexpensively solve Bayesian linear inverse problems modeled by multivariate Gaussians. In this talk, the method of Chebyshev polynomials is applied to optimally speed up the geometric convergence rate of Gibbs samplers from these large models. The improved convergence rate is compared to the convergence rate of the common approach via Cholesky factorization. Results quantify the uncertainty of biofilm volumes, before and after microbicide treatments, estimated from 3-D movies from a confocal scanning laser microscope.
The effect of non-normality on the t-test and an application of statistics to detect counterfeit medicines

Austina Clark

Mathematics and Statistics

Date: Thursday 9 October 2014

There are two parts to this talk. In the first part, we carry out a simulation study applying the t-test to four non-normal distributions. The Jarque-Bera test is then used to generate sets of optimal parameters for normality. These parameters and the Jarque-Bera test statistic are examined together, using various sample sizes and confidence levels for the four distributions. A conjecture regarding the Jarque-Bera test statistic is suggested.
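For reference, the Jarque-Bera statistic itself is a simple function of sample skewness and kurtosis; a minimal sketch:

```python
import math

def jarque_bera(x):
    """Jarque-Bera normality statistic: (n/6) * (S^2 + (K - 3)^2 / 4),
    where S is the sample skewness and K the sample kurtosis. Under
    normality K is near 3 and S near 0, so the statistic is small."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    return n / 6 * (S ** 2 + (K - 3) ** 2 / 4)
```

A symmetric sample gives a small value driven only by its kurtosis, while a strongly skewed sample inflates the statistic through the S² term.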

In the second part of the talk, we illustrate how statistics are used together with some pre-processing procedures to detect counterfeit medicines based on the active pharmaceutical ingredients.
A statistics-related seminar in Physics: Statistical mechanics of flocking

Andy Martin

University of Melbourne

Date: Monday 6 October 2014

Note date, time and venue of this Physics seminar
Flocking refers to a system where coherent large-scale behaviour is exhibited by self-propelled particles without a predetermined leader; the term self-propelled particles refers to systems where the particles maintain an average constant speed. This sort of behaviour is displayed across a variety of biological systems, which vary widely in size, both in the size of the group and the physical extent of the system. For example, the spiral patterns and clustering motion of bacteria have been attributed to flocking behaviour. At a larger scale, flocking is observed in the motion of schools of fish and flights of birds; both in the way the groups move as a whole, as well as their response to predators.

In this talk I will discuss the current models underpinning our understanding of flocking behaviour. In particular, I will initially focus on minimal agent based models which demonstrate flocking behaviour through the introduction of an aligning force between the individual agents. I will then move on to introduce a different approach to studying flocking behaviour. This approach will be based on treating the flock as a gas with “novel” aligning interactions. From this perspective I will show how it is possible, adapting undergraduate statistical mechanics techniques, to analytically determine the thermodynamic properties of a flock. This approach will then be compared with numerical simulations of an equivalent agent based model. I will then speculate on future directions for flocking research.
The statistical uses of entropy

Matt Parry

Mathematics and Statistics

Date: Thursday 2 October 2014

Entropy is a fundamental concept in information theory and in the physical sciences. Thanks to Fisher and Jaynes, entropy also has a long and (reasonably) respectable history in statistics. In this talk, I will focus on two particular applications: first, the role entropy plays in scoring rules and statistical estimation, and second, the maximum entropy principle. As a particular example of the latter, I will discuss the MaxEnt approach for species habitat modelling. I will also show how to win the game of "Twenty Questions".
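The quantity at the heart of all of this is Shannon entropy; a minimal sketch, with the connection to "Twenty Questions" noted in the comments:

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# The uniform distribution maximises entropy over a fixed support:
# entropy([0.5, 0.5]) = 1 bit, while any skewed coin has less.
# Each yes/no answer in "Twenty Questions" supplies at most one bit,
# so an optimal questioner (always halving the remaining candidates)
# can distinguish at most 2**20, about a million, objects.
```

The maximum entropy principle then says: among all distributions consistent with known constraints, choose the one with greatest entropy, i.e. the least committal one.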
Statistical analysis of geophysical processes: abundance vs scarcity of data

Ting Wang

Department of Mathematics and Statistics

Date: Thursday 18 September 2014

Statistical methods have been widely applied to model geophysical processes. These include both processes with abundant data for a systematic analysis and processes with few observations, which often suffer from missing data problems. I will present a study for each case and discuss the challenges that we are facing in this field of research.

The first case study presents a systematic investigation aiming to extract anomalous signals in GPS measurements of ground deformation observed at 66 stations in Japan and test whether the filtered GPS signals can be used to improve earthquake forecasts. The second case study presents modelling of missing data in marked point processes with applications to volcanic hazard estimation using incomplete eruption records.
140626122507
Statistical considerations in developing trace metal signatures to establish the geographical origins of food

John Harraway

Department of Mathematics and Statistics

Date: Thursday 11 September 2014

A trace metal signature of a food sample can provide a way of identifying the geographic origin of the sample with potential commercial implications. This presentation reports our statistical analysis of trace metal data generated from samples of ginseng collected from farms in New Zealand, China, Canada and Wisconsin (USA). Ginseng takes up trace elements from surrounding soil and the pattern of the different concentrations produces a unique chemical signature for a given growing area. Wisconsin ginseng reputedly has a number of unique health properties. A widespread fraudulent commercial practice is to market ginseng grown in these other areas as coming from Wisconsin. This concerns the Ginseng Board of Wisconsin, which funded this research to develop a procedure that would uniquely identify their product.

Five samples of ginseng roots from each of 10 Wisconsin farms, 8 Canadian sources, 7 Chinese sources and 2 New Zealand farms were analysed. The concentrations of 40 trace metals were recorded. The origins of the variation present in the concentrations were investigated using principal component analysis and multidimensional scaling. Discriminant function analysis identified clear region differences, with 96% of the samples correctly classified using a jackknife procedure. The Fisher classification functions were used to predict the probabilities that a new sample should be allocated to one of the regions. The results of re-sampling to test the prediction success will be reported, along with an appreciation from the Wisconsin Ginseng Board. If time permits, four other local applications of trace element analysis will be reported.
140808152639
Radiocarbon dating and New Zealand archaeology: the problem of chronology in a short archaeological sequence

Ian Barber

Department of Anthropology and Archaeology

Date: Wednesday 3 September 2014

Note - different day to usual for statistics seminar

The discipline of archaeology is dependent on the science of radiocarbon dating to build its chronologies. However, radiocarbon is neither a straightforward nor a consistently reliable dating method. Variables that might impact radiocarbon measurement science include sample suitability, calibration curves and margins of error. In this seminar I review these variables as they affect radiocarbon chronologies of colonisation and change within the relatively short New Zealand archaeological sequence.
140725102528
Mark-recapture, misidentification and algebraic statistics

Matt Schofield

Department of Mathematics and Statistics

Date: Thursday 21 August 2014

Here we explore mark-recapture problems where an observed vector of counts, y, is considered as a linear function of a vector of latent counts, x, such that y = Ax. This setup is relatively general and has been used to fit mark-recapture models where individuals can be misidentified upon capture, as well as mark-recapture models where data are available from multiple sources that cannot be linked together. Current Bayesian approaches to model fitting use a Metropolis-Hastings algorithm to sample from the full conditional distribution of x, where new proposals are generated by sequentially adding elements from a basis of the null space (kernel) of the matrix A. We use this algorithm for three examples involving mark-recapture data and show that such an approach may not produce an irreducible Markov chain. To find a solution, we turn to algebraic statistics. We will give a brief introduction of algebraic statistics, before presenting results that allow us to specify an algorithm that does produce an irreducible Markov chain. We demonstrate these results for the examples introduced previously.
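The null-space proposal scheme can be sketched with a toy example; the configuration matrix A below is invented for illustration and this is not the authors' code. New states are proposed by adding a random sign times a kernel vector of A, so every accepted state reproduces the observed counts y:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: observed counts y are a linear function y = A x of latent
# non-negative integer counts x. The matrix A here is hypothetical.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
z = np.array([1, -1, 1])          # integer basis of the kernel: A @ z = 0
assert not (A @ z).any()

def propose(x):
    """One null-space move x -> x + c*z, staying put if any latent
    count would become negative."""
    c = rng.choice([-1, 1])
    x_new = x + c * z
    return x_new if (x_new >= 0).all() else x

x = np.array([2, 3, 1])
y = A @ x
for _ in range(100):
    x = propose(x)                # every state keeps A @ x equal to y
```

In this toy the single kernel vector happens to connect every non-negative solution of Ax = y, but for the models discussed in the talk such lattice-basis moves can leave the chain reducible, which is exactly where the Markov bases of algebraic statistics come in.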
140808152545
Statistical applications in volcanology

Rebecca Green

Massey University

Date: Tuesday 12 August 2014

Note - different day, time and venue to usual for statistics seminar
The estimation of hazard arising from volcanic eruptions is a research topic of great interest to New Zealand, given the number and location of active and dormant volcanoes. Probabilistic temporal models are required to handle the stochastic nature of observed records. Such models are usually assembled using point process techniques such as renewal theory, and most are purely temporal in the sense that they only consider the distribution of event or inter-event times as predictors of further volcanic activity. In this presentation I will illustrate, through the use of a high-resolution eruption record from Mt. Taranaki (New Zealand), that by incorporating geochemical data using a proportional hazards type approach, the performance of current renewal-type models can be improved upon.

Probabilistic forecasting of course relies on the accuracy and completeness of historical eruption records, which poses the question of how to establish a detailed record of past volcanic events. Multiple sites are needed to build the most accurate composite tephra record, but correctly merging them by recognizing events in common and site-specific gaps remains complex. I present an automated procedure for finding the most feasible set of event matches by employing stochastic local optimization techniques.

After demonstrating the matching algorithm through application to stratigraphic records from Mt Taranaki, methods of estimating the eruptive volume of events are investigated. Utilizing isopach maps and individual point observations, a model is formulated, in a Bayesian framework, for the thicknesses of tephra deposits as a function of the distance and angular direction of each location. In addition to eruptive volume, I estimate wind and site-specific effects on deposit thickness. These findings lead on to methods of incorporating eruptive volumes in hazard estimation, a future research area.
140807153714
Structural vector auto-regressions in economics: an example

Alfred Haug

Department of Economics

Date: Thursday 7 August 2014

The seminar will explain a commonly used technique in empirical macroeconomic analysis, using econometric time-series methods. It is based on “letting the data talk” by studying the dynamic interactions of a set of variables, a vector of variables, regressed on their own histories (auto-regressions). Data issues such as unit roots and cointegration will be discussed briefly before focusing on an application. The example used in the seminar is a so-called structural vector-autoregression with recent monetary and fiscal data for Poland. Fiscal foresight, in the form of implementation lags, is accounted for with respect to both discretionary government spending and tax changes. The importance of combining monetary and fiscal transmission mechanisms is demonstrated. However, ignoring fiscal foresight has no statistically significant effects. Government spending multipliers take on values from 0.16 to 1.61, depending on how they are calculated. The tax multiplier is not very large.
140331164635
Estimating Hector’s dolphin abundance: practical challenges and statistical solutions

Darryl MacKenzie

Proteus Wildlife Research Consultants

Date: Thursday 31 July 2014

Joint work with Deanna Clement, Cawthron Institute
Hector’s dolphin (Cephalorhynchus hectori hectori) are currently listed as ‘endangered’ by the IUCN and ‘nationally vulnerable’ by the Department of Conservation, with an estimated population size of 7,300 in 2004. The Ministry of Primary Industries is in the process of obtaining updated estimates of Hector’s dolphin abundance and distribution around the South Island, with aerial line-transect surveys being conducted along the east coast during the summer and winter of 2013. In this talk I shall discuss the design and analysis of the surveys, identifying some of the practical challenges faced, such as allocation of limited survey effort, partially overlapping observation zones and unobservable animals, and outlining the statistical solutions used to address those issues, including the extension of mark-recapture distance-sampling methods. The results suggest that Hector’s dolphins are substantially more numerous along the east coast than previously believed.
140710094346
Meta-analysis of variation: ecological and evolutionary applications and beyond

Shinichi Nakagawa

Department of Zoology

Date: Thursday 29 May 2014

Meta-analysis has become a standard way of summarizing empirical studies in many fields, including ecology and evolution. At least in ecology and evolution, meta-analyses comparing two groups (usually experimental and control groups) have almost exclusively focused on comparing the means, using standardized metrics such as Cohen’s / Hedges’ d or the response ratio. However, an experimental treatment may not only affect the mean but also the variance. Investigating differences in the variance between two groups may be insightful, especially when a treatment influences the variance in addition to or instead of the mean. We propose the effect size statistic lnCVR (the natural logarithm of the ratio between the coefficients of variation, CV, of the two groups), which enables us to meta-analytically compare differences between the variability of two groups. We illustrate the use of lnCVR with examples from ecology and evolution. Further, as an alternative approach to the use of lnCVR, we propose the combined use of lnS (the log standard deviation) and lnX (the log mean) in a hierarchical model. The use of lnS with lnX overcomes potential limitations of lnCVR and provides a more flexible, albeit more complex, way to examine variation beyond two-group comparisons.
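The basic lnCVR statistic is simple to compute from group summaries; a minimal sketch with invented numbers, ignoring the small-sample bias corrections normally applied in meta-analysis:

```python
import math

def ln_cvr(mean1, sd1, mean2, sd2):
    """lnCVR: log ratio of coefficients of variation between two groups,
    ln(CV1 / CV2) = ln(sd1 / mean1) - ln(sd2 / mean2)."""
    return math.log(sd1 / mean1) - math.log(sd2 / mean2)

# Hypothetical treatment vs control summaries: CVs of 0.30 and 0.20.
print(round(ln_cvr(10.0, 3.0, 12.0, 2.4), 4))  # → 0.4055, i.e. ln(1.5)
```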
140326124835
Honeybees, mites and viruses

Graham Wood

Department of Biochemistry, University of Otago and University of Warwick

Date: Tuesday 27 May 2014

Joint mathematics and statistics seminar

A recombinant virus appears to be behind the loss of honeybee colonies to varroa mite infestation. Given genetic information about the viral recombinants in the honeybee from next generation sequencing, mathematical and statistical tools have been developed to determine both the recombinants present and their relative proportions. The method involves setting the problem geometrically and the use of appropriately constrained quadratic programming.

This seminar will present the background to the problem, together with the mathematical and statistical ideas that underpin the recombinant discovery. Output from software which runs the method, termed “MosaicSolver”, will be shown. This work is part of the “Insect Pollinators Initiative” currently underway in the UK.
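The flavour of the constrained quadratic programming step can be conveyed with a toy version; everything below (the candidate signature matrix M, the observed frequencies f, and the projected-gradient solver) is an invented illustration, not MosaicSolver itself. Proportions of candidate recombinants are found by minimising a least-squares criterion over the probability simplex:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fit_proportions(M, f, steps=2000, lr=0.1):
    """Minimise ||M p - f||^2 subject to p >= 0 and sum(p) = 1,
    by projected gradient descent."""
    p = np.full(M.shape[1], 1.0 / M.shape[1])
    for _ in range(steps):
        grad = 2.0 * M.T @ (M @ p - f)
        p = project_simplex(p - lr * grad)
    return p

# Hypothetical signatures of three candidate recombinants (columns)
# over four genomic sites, and the observed site frequencies f.
M = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
true_p = np.array([0.5, 0.3, 0.2])
f = M @ true_p
p_hat = fit_proportions(M, f)   # recovers the mixing proportions
```

A dedicated quadratic programming solver would be used in practice; the projected-gradient loop is chosen here only to keep the sketch self-contained.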
140331164532
Project presentations


Date: Tuesday 27 May 2014

Chuen Yen Hong: Confidence intervals for the mean effect size in random-effects meta-analysis

Ben Atkins: The Trojan female technique: a revolutionary approach for effective pest control

Cain Edie-Michell: Ideal structure of Steinberg algebras

Chris Palmer: Actions of discrete groups on the Cantor set

Silong Liao: Glaucoma treatment: survival analysis

Baylee Smith: Spatial summaries of phase four Bronze Age burials at Ban Non Wat, Thailand
140523154928
Measuring lack-of-fit of a Bayesian model

David Fletcher

Department of Mathematics and Statistics

Date: Thursday 22 May 2014

When using the Bayesian framework to fit a model, posterior predictive checking is often used to assess lack-of-fit. In particular, the posterior predictive p-value can help decide whether the model needs to be improved. We propose a posterior predictive calculation that quantifies the amount of lack-of-fit. It can be used to allow for overdispersion in a manner that is similar to use of quasi-likelihood in the frequentist framework. This is joint work with Peter Dillingham (University of New England, Armidale).
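For context, a standard posterior predictive p-value (the check being built on here) can be sketched as follows; a hypothetical normal example with the sample variance as discrepancy, not the quantification proposed in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data (hypothetical) and posterior draws for the mean of a
# Normal(mu, 1) model with a flat prior: mu | y ~ Normal(ybar, 1/n).
y = rng.normal(0.0, 1.0, size=50)
n, ybar = len(y), y.mean()
mu_draws = rng.normal(ybar, 1.0 / np.sqrt(n), size=4000)

# Discrepancy: the sample variance. For each posterior draw, simulate a
# replicated dataset and compare its discrepancy with the observed one.
T_obs = y.var(ddof=1)
T_rep = np.array([rng.normal(mu, 1.0, size=n).var(ddof=1) for mu in mu_draws])

# Posterior predictive p-value: values near 0 or 1 flag lack-of-fit.
p_ppc = (T_rep >= T_obs).mean()
```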
140317145245
The naive usage of inverse statistical inference in plant ecology

Steven Higgins

Department of Botany

Date: Thursday 15 May 2014

140326124628
A statistics-related seminar in Geology: Long-term forecasting of volcanic explosivity

Mark Bebbington

Massey University

Date: Friday 9 May 2014

Note day, time and venue
It has been thirty years since the terms time-predictable (repose length increases with previous eruption size) and size-predictable (eruption size increases with repose length) entered the volcanological lexicon. While much evidence of, and models for, the former have emerged, the latter is still largely unsubstantiated. Statistical tests for size-predictability from individual volcanoes suffer from insufficient power and the inherent non-normality and non-linearity in the relationship. Aggregating data from several volcanoes is difficult due to the different temporal and size scales involved. Here we consider characterizing the VEI distribution by a parameter, which is itself influenced by the length of the previous repose, the state of the conduit (open or closed) and possibly other factors. Dependency between the parameters for different eruptions at the same volcano is introduced using a multilevel (hierarchical) Bayesian formulation. Using data from Indonesia, largely since AD 1800, we find that there is a significant probability (> 0.999) that the VEI of the next eruption from closed conduit volcanoes increases with increasing repose length. For example, a further 10-year wait for the next eruption from Kelut increases the probability of a VEI > 2 by approximately 11%. On the other hand, open conduit volcanoes show no evidence of an increase in VEI with repose length. The results are insensitive to the details of the VEI distribution, prior distributions or number of levels in the Bayesian structure.
140416090727
Spatial cue-mixture models for estimating bird song rate and population density

Murray Efford

Department of Mathematics and Statistics

Date: Thursday 8 May 2014

This is joint work with Deanna Dawson, USGS Patuxent Wildlife Research Center

Existing methods for the analysis of sound recordings to estimate bird population density rely either on distinguishing individuals or on estimating the per capita song rate. What should we do when individuals cannot be distinguished and there is no credible estimate of song rate? We suggest a solution for recordings replicated in space that uses a hierarchical model in which song rate is a latent variable. Our data are the breeding-season songs of three warbler species in 10-minute recordings at each of 272 forested points in Maryland, USA. We also operated a 4-microphone array for 10 minutes at 66 of these points, allowing us to estimate distance-related sound attenuation and spatial detection probability for songs of these species. This gave plausible estimates of density, and the 2-phase design is easily scaled to survey birds across landscapes. However, evidence will be presented that casts doubt on the robustness of the approach.
140319130408
Continuous time capture-recapture

Richard Barker

Department of Mathematics and Statistics

Date: Thursday 1 May 2014

Motivated by field sampling of DNA fragments, we describe a general model for capture-recapture modeling of samples drawn one at a time in continuous time. Our model is based on Poisson sampling where the sampling time may be unobserved. We show that previously described models correspond to partial likelihoods from our Poisson model and their use may be justified through arguments concerning S- and Bayes-ancillarity of discarded information. We demonstrate a further link to continuous-time capture-recapture models and explain observations that have been made about this class of models in terms of partial ancillarity.
140319130938
Just can't say no: A showcase of consultant-style collaborations

Tilman Davies

Department of Mathematics and Statistics

Date: Thursday 10 April 2014

The practice of statistics unquestionably spans many disciplines. As we all are aware, there is an ongoing demand throughout most fields for sound data analysis. This became particularly obvious when, two months into a job, my phone started ringing with requests from complete strangers. In this talk, I put aside my own research to share some of the collaborative ‘consultant-style’ experiences I have had to date as an infant academic. This includes both glamorous (novel problems requiring interesting methodology) and less than glamorous (pie charts in Excel) pursuits, all of which serve to answer various research questions in the applied sciences.
140317120135
Optimization of vaccine allocation in a foot-and-mouth disease outbreak in the USA

Will Probert

Penn State University

Date: Tuesday 25 February 2014

140221113038
Point processes in statistical risk analysis

Winfried Stute

University of Giessen, Germany

Date: Monday 9 December 2013

NOTE: Different day and time to our usual
Point processes (in time) describe important changes in the status of a biological, economic or technical “unit”. The dynamics of such a process may be very complicated, and in several applications (market research, medicine) they may be subject to manipulations. In this talk we discuss the role of hazard processes in modelling such processes and consider some issues of their statistical analysis.
131203151912
Quantifying climate niches

Ralf Ohlemüller

Department of Geography

Date: Thursday 7 November 2013

Any location is characterised by a multivariate set of environmental conditions and these conditions are one of the filters determining which species occur at that location. Species are adapted to certain environmental conditions (their niche) and this niche may or may not be modified by the species’ interactions with other species. With changing environmental conditions, species either need to adapt to the new conditions or move to areas where suitable conditions remain. Insights into the spatial distribution and dynamics of past, current and future climate niche conditions inform our understanding of a species’ ecology and evolution. In this seminar I will illustrate recent approaches for quantifying the spatial distribution of climate niche conditions for a range of environments and spatial and temporal scales.
130712100640
Living with heterogeneity in capture–recapture data: lessons from grizzly bears

Murray Efford

Department of Mathematics and Statistics

Date: Thursday 24 October 2013

Conventional capture–recapture models use a scalar parameter p to adjust for incomplete detection in samples from biological populations. Variation in p between individuals ("heterogeneity") causes bias in estimates of population size and keeps statisticians employed. In spatially explicit capture–recapture (SECR), p is replaced by a function of distance that usually has two parameters – one non-spatial and one representing the scale of movement (home-range size). Individual animals routinely differ in their scale of movement, and I show by simulation that heterogeneity in this parameter causes bias in density estimates. However, this is not the end of the story. Both the non-spatial and spatial detection parameters are likely to vary between individuals, and under a simple home-range model the variation is reciprocal. Simulations show the bias in density estimates then approaches zero. This seems to be the reality for a grizzly bear (Ursus arctos) DNA dataset from British Columbia: sex differences in the two parameters of detection were almost perfectly compensatory, so there was no difference in effective sampling area between sexes and no bias in density estimates from a null model. The finding has implications for model selection and suggests the SECR model should be parameterised in terms of effective sampling area. However, this brings problems of its own, and I propose a simple alternative.
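The compensation described here can be made concrete under the usual halfnormal SECR detection function g(d) = g0·exp(−d²/(2σ²)); a sketch with invented sex-specific parameter values, using the single-detector integral of g over the plane as an idealisation of effective sampling area:

```python
import math

def effective_area(g0, sigma):
    """Integral of the halfnormal detection function
    g(d) = g0 * exp(-d^2 / (2 sigma^2)) over the plane: 2*pi*g0*sigma^2.
    An idealised stand-in for the SECR effective sampling area."""
    return 2.0 * math.pi * g0 * sigma ** 2

# Hypothetical sex-specific parameters: females detectable at shorter
# range but with higher probability, males the reverse.
a_female = effective_area(g0=0.4, sigma=500.0)    # distances in metres
a_male   = effective_area(g0=0.1, sigma=1000.0)
# Because g0 * sigma^2 is equal for the two sexes, the effective
# sampling areas coincide, mirroring the compensatory variation that
# left the grizzly bear density estimates unbiased under a null model.
```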
130912134701
Project presentations

Honours / PGdip students


Date: Friday 11 October 2013

Statistics
Claire Flynn: Adaptive kernel density methods for estimating relative risk in geographical epidemiology
Yuki Fujita: Analysis of bird count data

Mathematics
Jack Cowie: Groupoids, partial actions and Baumslag-Solitar groups
Rotem Edwy: The integer Heisenberg group acts self-similarly
Calum Rickard: Unbounded extension of the Hille-Phillips functional calculus
Emily Irwin: HNN extensions, normal forms and Baumslag-Solitar groups

Note day and time of this event
120904102920