# 11 Epidemiology

This lesson investigates whether childhood circumcision reduces the risk of acquiring genital herpes in men, providing a brief introduction to the field of epidemiology. It is presented by Dr Nigel Dickson (University of Otago Medical School).

## Data

There are 2 files associated with this presentation. The first contains the data you will need to complete the lesson tasks, and the second contains descriptions of the variables included in the data file.

## Video

## Objectives

## Tasks

### 0. Read and Format data

#### 0a. Read in the data

First check you have installed the package `readxl`

(see Section 2.6) and set the working directory (see Section 2.1), using instructions in Getting started with R.

Load the data into R.

## Code

```
#loads readxl package
library(readxl)
#loads the data file and names it herpes
<-read_xls("Herpes data.xls")
herpes
#view beginning of data frame
head(herpes)
```

## Code

```
#loads readxl package
library(readxl)
```

`Warning: package 'readxl' was built under R version 4.2.2`

## Code

```
#loads the data file and names it herpes
<-read_xls("Herpes data.xls")
herpes
#view beginning of data frame
head(herpes)
```

```
# A tibble: 6 × 9
ID CIRCUMCISED HSV2POS UNDER_10_PART NUM_PART SOCIO…¹ SAMESEX SCHOO…² STI5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 15 2 0 2 0
2 2 1 0 1 6 3 0 1 0
3 3 0 0 1 1 3 0 3 1
4 4 1 0 1 2 3 0 1 0
5 5 0 0 0 15 3 0 2 0
6 6 0 0 0 17 3 0 2 0
# … with abbreviated variable names ¹SOCIOECONOMIC, ²SCHOOLING
```

The data used for this study has been modified for teaching purposes by supposing a larger cohort of men than were originally recorded. 1890 men were followed from birth (1972), with the aim to investigate whether early childhood circumcision (by age 3) reduces the risk of herpes (the original cohort was approximately 1100 men).

The variables recorded are **ID**, **CIRCUMCISED**, **HSV2POS** (positive test for herpes), **UNDER_10_PART** (Under 10 sexual partners), **NUM_PART** (total number of sexual partners), **SOCIOECONOMIC** (socio-economic status), **SAMESEX** (same sex contact), **SCHOOLING** and **STI5**(sexually transmitted infection in the last 5 years). It should be noted that there are many variables recorded in this data set, however it is not necessary to use them all at one time. It is helpful to collect as much information as possible when conducting a study so that different hypotheses can be tested later on.

#### 0b. Format the data

A lot of these variables are actually factors, the numbers represent categories or groups. R has classified them as numeric variables, so we will recode them as factors.

If you have already completed the earlier lessons focusing on continuous data, you will have seen some examples of numeric and character vectors that are actually factor variables. Try to identify and convert these yourself in the `herpes`

data before revealing the code.

## Code

```
$CIRCUMCISED<-as.factor(herpes$CIRCUMCISED)
herpes$UNDER_10_PART<-as.factor(herpes$UNDER_10_PART)
herpes$SOCIOECONOMIC<-as.factor(herpes$SOCIOECONOMIC) herpes
```

The variables **SAMESEX**, **SCHOOLING** and **STI5** should also be recoded as factors.

## Code

```
$CIRCUMCISED<-as.factor(herpes$CIRCUMCISED)
herpes$UNDER_10_PART<-as.factor(herpes$UNDER_10_PART)
herpes$SOCIOECONOMIC<-as.factor(herpes$SOCIOECONOMIC)
herpes$CIRCUMCISED<-as.factor(herpes$CIRCUMCISED)
herpes$UNDER_10_PART<-as.factor(herpes$UNDER_10_PART)
herpes$SOCIOECONOMIC<-as.factor(herpes$SOCIOECONOMIC)
herpes$SAMESEX<-as.factor(herpes$SAMESEX)
herpes$SCHOOLING<-as.factor(herpes$SCHOOLING)
herpes$STI5<-as.factor(herpes$STI5) herpes
```

### 1. Cross-Tabulation (2 variables)

The first step of the analysis is to get a cross-tabulation of some of the data we are looking at. Create a table of **CIRCUMCISED** and **HSV2POS** data.

Calculate the percentage of men that tested HSV2-positive (positive for herpes) for both the circumcised group and the non-circumcised group. What do you see?

## Code

```
#create basic table of counts across the categories of 2 variables
#dnn= c() title vector has 2 entries
#specified in the same order the variables were provided to the table function
<-table(herpes$CIRCUMCISED,herpes$HSV2POS,dnn=c("Circumcised","HSV2POS"))
CHVtable
#adds row and column sums
addmargins(CHVtable)
```

## Code

```
#create basic table of counts across the categories of 2 variables
#dnn= c() title vector has 2 entries
#specified in the same order the variables were provided to the table function
<-table(herpes$CIRCUMCISED,herpes$HSV2POS,dnn=c("Circumcised","HSV2POS"))
CHVtable
#adds row and column sums
addmargins(CHVtable)
```

```
HSV2POS
Circumcised 0 1 Sum
0 1022 98 1120
1 722 48 770
Sum 1744 146 1890
```

## Code

`98/1120)*100 (`

`[1] 8.75`

## Code

`48/770)*100 (`

`[1] 6.233766`

8.75% of men in the non-circumcised group tested positive for herpes, slightly more than the 6.23% of men in the circumcised group who tested positive.

### 2. Confidence Intervals, Hypothesis Tests (proportions, difference in proportions)

#### 2a. Confidence Intervals, Hypothesis Tests (proportions)

Construct 95% confidence intervals for the prevalence (proportions) at age 26 of herpes **HVS2POS** for both the non-circumcised and circumcised groups.

What are your conclusions from the two confidence intervals and hypothesis tests?

It is most efficient to calculate the confidence intervals in R using the `prop.test()`

function. This is similar to the `t.test()`

function we have previously used for continuous data, the only difference is it is used for proportions.

## Code

```
#first argument is the number of successes (in this case having herpes and not being circumcised)
#second argument is number of trials
#note the use of length to sum the counts
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0")),
n=length(which(herpes$CIRCUMCISED=="0")))
```

From the output above we have a 95% confidence interval for the true proportion of non-circumcised men who have herpes at age 26. Along with this interval, `prop.test()`

provides a best estimate of the proportion which you can multiply by 100 to check your earlier percentage estimate.

We also have the result of a hypothesis test, where the null hypothesis is that the true proportion of non-circumcise men with herpes at age 26 is equal to 0.5.

It is most efficient to calculate the confidence intervals in R using the `prop.test()`

function. This is similar to the `t.test()`

function used for continuous data, the only difference is it is used for proportions.

## Code

```
#first argument is the number of successes (in this case having herpes and not being circumcised)
#second argument is number of trials
#note the use of length to sum the counts
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0")),
n=length(which(herpes$CIRCUMCISED=="0")))
```

```
1-sample proportions test with continuity correction
data: length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "0")) out of length(which(herpes$CIRCUMCISED == "0")), null probability 0.5
X-squared = 760.65, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.07192269 0.10597295
sample estimates:
p
0.0875
```

We can be 95% confident that the true proportion of non-circumcised men who have herpes at age 26 lies between 0.0719 and 0.1060, with a best estimate of 0.0875 (8.75%).

The corresponding hypothesis test has a p-value of < 2.2e-16, giving us significant evidence to reject the null hypothesis in favour of the alternative and conclude that the true proportion of non-circumcised men with herpes at age 26 is not equal to 0.5 (i.e. chance).

Construct a confidence interval and hypothesis test for the proportion of men with herpes in the circumcised group.

#### 2b. Confidence Interval and Hypothesis Test (difference in proportions)

Construct a 95% confidence interval for the difference in **HSV2POS** proportions according to the **CIRCUMCISED** variable.

What can you conclude from the confidence interval and hypothesis test for the difference between the two proportions? Are any of the results that you have found so far significant?

This follows a similar, but slightly more involved procedure as the confidence intervals for proportions.

## Code

```
#First argument is a vector of successes for each of the groups to be compared
#second argument is a vector of the number of trials for each group
prop.test(c(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0")),length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"))),
n=c(length(which(herpes$CIRCUMCISED=="0")),length(which(herpes$CIRCUMCISED=="1"))))
```

## Code

```
#First argument is a vector of successes for each of the groups to be compared
#second argument is a vector of the number of trials for each group
prop.test(c(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0")),length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"))),
n=c(length(which(herpes$CIRCUMCISED=="0")),length(which(herpes$CIRCUMCISED=="1"))))
```

```
2-sample test for equality of proportions with continuity correction
data: c(length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "0")), length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "1"))) out of c(length(which(herpes$CIRCUMCISED == "0")), length(which(herpes$CIRCUMCISED == "1")))
X-squared = 3.7077, df = 1, p-value = 0.05416
alternative hypothesis: two.sided
95 percent confidence interval:
0.0002870711 0.0500376042
sample estimates:
prop 1 prop 2
0.08750000 0.06233766
```

We can be 95% confident that the true difference in proportions of non-circumcised and circumcised men with herpes at age 26 lies between 0.0003 and 0.0500, with a best estimate of 0.0252 (2.52%). This confidence interval is entirely (just) above 0, suggesting that the proportion of non-circumcised men with herpes is higher than the proportion of circumcised men.

The corresponding hypothesis test has a p-value of 0.0542, this gives us borderline significant evidence to reject the null hypothesis in favour of the alternative but is still greater than 0.05. From the hypothesis test we would therefore conclude that the true difference in proportions of non-circumcised and circumcised men with herpes at age 26 is not significantly different to 0.

### 3. Relative Risk

Relative risk (RR) is the chance of an event (in this case the development of herpes (**HSV2POS**)) occurring relative to exposure. I.e. it is a ratio of the probability of an event occurring in the exposed group relative to that in the non-exposed group (here ‘exposure’ is circumcision (**CIRCUMCISED=1**)).

To calculate the relative risk for herpes in the circumcised group relative to the non-circumcised group we use the table we produced in Task 1. We use the formula:

\[ \frac{p_{\text{circumcised}}}{p_{\text{non-circumcised}}}\]

where \(p_{\text{circumcised}}\) is the probability of testing HSV2-positive in the circumcised group and \(p_{\text{non-circumcised}}\) is the probability of testing HSV2-positive in the non-circumcised group.

Calculate the above relative risk and state your conclusion in relation to the effect of **CIRCUMCISED** on the likelihood of contracting herpes **HSV2POS**.

## Code

```
<-(98/1120)
pncirc<-(48/770)
pcirc
<-pcirc/pncirc
rr rr
```

`[1] 0.7124304`

The risk of testing positive for herpes at age 26 in the circumcised group is 0.7124 times the risk of testing positive for herpes in the non-circumcised group. As this value is less than 1 (1 is equal risk) this suggests there is a reduced herpes risk for men who are circumcised, but it is not a dramatic decrease.

To make a definitive conclusion about the relative risk of herpes in circumcised and non-circumcised men a confidence interval would need to be calculated. This accounts for the uncertainty that is inherent in our above estimate due to using a sample rather than the entire population of men aged 26. If the confidence interval included 1 there would be no significant difference in the relative herpes risk, if it was entirely less than 1 we would conclude the risk is lower for circumcised men, if it was entirely greater than 1 (not possible in this case as our estimate, which makes up the mid-point of the interval, is less than 1) we would conclude the risk is higher for circumcised men.

### 4. Experimental Design

Discuss the importance of a designed (experimental) study as opposed to an observational study.

The discussion should cover adjustment for confounders either by stratification (as is described in the accompanying video) or by constructing a designed (clinical) trial to control confounders.

See video Section 11.1

An experimental study is able to establish a causal relationship between the predictors and response of interest. Observational studies can show a correlation, but it is not possible to conclude causality as this may instead be due to confounding variables.

Confounding variables have an association with both the predictor and response values so can obscure their true relationship.

Adjustment for potential confounders can be carried out by performing separate analysis of the effect of predictor variables on the response for different levels of the confounding variable. This allows us to see the relationship between the predictor and response within similar confounding values, and takes into account the confounding variable to test for a true effect.

In a designed clinical trial participants are randomly assigned to receive either the treatment of interest or the control, so the impact of potential confounders is reduced as they can no longer influence whether a participant receives the treatment or not. As circumcision is a surgical procedure with relatively permanent effects it is not ethical to carry out an experimental study when this is the treatment of interest.

### 5. Cross-Tabulation (3 variables)

#### 5a. Cross-Tabulation

Stratify the groups by the number of sexual partners. We need a categorical variable for stratification so use **UNDER_10_PART** (under 10 sexual partners, yes/no).

Add the stratifying binary variable **UNDER_10_PART** to the contingency table from Task 1. Try this yourself and then reveal the code.

## Code

```
<-table(herpes$CIRCUMCISED,herpes$HSV2POS,herpes$UNDER_10_PART,
CHV10tablednn=c("Circumcised","HSV2POS","Under 10 Partners"))
addmargins(CHV10table)
```

## Code

```
<-table(herpes$CIRCUMCISED,herpes$HSV2POS,herpes$UNDER_10_PART,
CHV10tablednn=c("Circumcised","HSV2POS","Under 10 Partners"))
addmargins(CHV10table)
```

```
, , Under 10 Partners = 0
HSV2POS
Circumcised 0 1 Sum
0 552 74 626
1 186 24 210
Sum 738 98 836
, , Under 10 Partners = 1
HSV2POS
Circumcised 0 1 Sum
0 470 24 494
1 536 24 560
Sum 1006 48 1054
, , Under 10 Partners = Sum
HSV2POS
Circumcised 0 1 Sum
0 1022 98 1120
1 722 48 770
Sum 1744 146 1890
```

#### 5b. Calculate Proportions

Calculate the proportion of men that tested positive for HSV2 for each of the four combinations by hand.

Not Circumcised, ≥ 10 Partners:

## Code

`74/626`

Not Circumcised, ≥ 10 Partners:

## Code

`74/626`

`[1] 0.1182109`

Not Circumcised, < 10 Partners:

## Code

`24/494`

`[1] 0.048583`

Circumcised, ≥ 10 Partners:

## Code

`24/210`

`[1] 0.1142857`

Circumcised, < 10 Partners:

## Code

`24/560`

`[1] 0.04285714`

There appears to be a higher proportion of men with more than 10 partners testing positive for herpes compared to those with less than 10, across both the circumcised and non-circumcised groups.

### 6. Practice: Confidence Intervals, Hypothesis Tests (proportions, difference in proportions)

#### 6a. Confidence Intervals, Hypothesis Tests (proportions)

Calculate the confidence intervals for the proportions in Task 5.

Discuss any significant results that you see from the confidence intervals and hypothesis tests for the four proportions.

The calculation of the proportion,confidence interval, and hypothesis test for men with HSV2 that are not circumcised and have had greater than or equal to 10 sexual partners is shown below.

## Code

```
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")),
n=length(which(herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")))
```

The null hypothesis is that the proportion of non-circumcised men with 10 or more partners who have herpes is equal to 0.5, and the alternative hypothesis that this proportion is not equal to 0.5.

## Code

```
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")),
n=length(which(herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")))
```

```
1-sample proportions test with continuity correction
data: length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "0")) out of length(which(herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "0")), null probability 0.5
X-squared = 363.46, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.09449736 0.14672750
sample estimates:
p
0.1182109
```

## Code

```
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="0")),
n=length(which(herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="0")))
```

```
1-sample proportions test with continuity correction
data: length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "0")) out of length(which(herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "0")), null probability 0.5
X-squared = 123.43, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.07605235 0.16712714
sample estimates:
p
0.1142857
```

## Code

```
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="1")),
n=length(which(herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="1")))
```

```
1-sample proportions test with continuity correction
data: length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "1")) out of length(which(herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "1")), null probability 0.5
X-squared = 400.86, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.03204103 0.07245579
sample estimates:
p
0.048583
```

## Code

```
prop.test(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="1")),
n=length(which(herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="1")))
```

```
1-sample proportions test with continuity correction
data: length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "1")) out of length(which(herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "1")), null probability 0.5
X-squared = 466.29, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.02824317 0.06402501
sample estimates:
p
0.04285714
```

All proportions are significantly different from chance (0.5).

#### 6b. Confidence Intervals, Hypothesis Tests (difference in proportions)

Calculate the confidence interval for the difference between proportions.

What conclusions can you draw from your confidence intervals? Discuss any significant (or non-significant) differences that you see and compare these with the results found in Task 2.

What effect has the inclusion of ‘number of sexual partners’ had on the conclusions that were made?

The procedure for the difference between proportions for the circumcised group with \(\geq 10\) sexual partners and the non-circumcised group with \(\geq 10\) sexual partners is shown.

## Code

```
#for a difference in proportions we supply vectors to the arguments for number of successes and number of trials
prop.test(c(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")),length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="0"))),
n=c(length(which(herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")),length(which(herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="0"))))
```

The null hypothesis is that for men with more than 10 partners the difference in the proportion of circumcised and non-circumcised men with herpes is equal to 0, and the alternative hypothesis that this proportion is not equal to 0. The standard error formula for this test uses the pooled variance.

Repeat this procedure for the difference between proportions for the circumcised group with <10 sexual partners and the non-circumcised group with <10 sexual partners.

## Code

```
#for a difference in proportions we supply vectors to the arguments for number of successes and number of trials
prop.test(c(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")),length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="0"))),
n=c(length(which(herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="0")),length(which(herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="0"))))
```

```
2-sample test for equality of proportions with continuity correction
data: c(length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "0")), length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "0"))) out of c(length(which(herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "0")), length(which(herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "0")))
X-squared = 0.00084447, df = 1, p-value = 0.9768
alternative hypothesis: two.sided
95 percent confidence interval:
-0.04916763 0.05701792
sample estimates:
prop 1 prop 2
0.1182109 0.1142857
```

## Code

```
#for a difference in proportions we supply vectors to the arguments for number of successes and number of trials
prop.test(c(length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="1")),length(which(herpes$HSV2POS=="1"&herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="1"))),
n=c(length(which(herpes$CIRCUMCISED=="0"&herpes$UNDER_10_PART=="1")),length(which(herpes$CIRCUMCISED=="1"&herpes$UNDER_10_PART=="1"))))
```

```
2-sample test for equality of proportions with continuity correction
data: c(length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "1")), length(which(herpes$HSV2POS == "1" & herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "1"))) out of c(length(which(herpes$CIRCUMCISED == "0" & herpes$UNDER_10_PART == "1")), length(which(herpes$CIRCUMCISED == "1" & herpes$UNDER_10_PART == "1")))
X-squared = 0.088153, df = 1, p-value = 0.7665
alternative hypothesis: two.sided
95 percent confidence interval:
-0.02149373 0.03294544
sample estimates:
prop 1 prop 2
0.04858300 0.04285714
```

There are no longer significant differences in the proportion of non-circumcised and circumcised men with herpes at age 26 after stratifying by number of partners. This shows number of partners was a confounding variable (affecting both likelihood of circumcision and likelihood of herpes) that influenced our earlier conclusion.

### 7. Logistic Regression

Perform a logistic regression with **HSV2POS** (herpes positive or negative) as the response and **CIRCUMCISED** (yes or no) as the predictor.

Write down the model for the regression of circumcision on herpes. State any conclusions you have based on the model estimates and discuss these with reference to the Chi-squared value.

Use the `glm()`

function to fit a generalised linear model in R. By specifying the family as binomial (binary response) with a logit link, we perform a logistic regression.

## Code

```
#fit model using formula response~predictor
<-glm(HSV2POS~CIRCUMCISED,data=herpes,family=binomial(link="logit"))
CHVmodel
summary(CHVmodel)
#perform ANOVA test for significance of predictor
anova(CHVmodel,test="Chisq")
```

## Code

```
#fit model using formula response~predictor
<-glm(HSV2POS~CIRCUMCISED,data=herpes,family=binomial(link="logit"))
CHVmodel
summary(CHVmodel)
```

```
Call:
glm(formula = HSV2POS ~ CIRCUMCISED, family = binomial(link = "logit"),
data = herpes)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.4279 -0.4279 -0.4279 -0.3588 2.3559
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3445 0.1057 -22.171 <2e-16 ***
CIRCUMCISED1 -0.3663 0.1828 -2.004 0.0451 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1028.2 on 1889 degrees of freedom
Residual deviance: 1024.0 on 1888 degrees of freedom
AIC: 1028
Number of Fisher Scoring iterations: 5
```

## Code

```
#perform ANOVA test for significance of predictor
anova(CHVmodel,test="Chisq")
```

```
Analysis of Deviance Table
Model: binomial, link: logit
Response: HSV2POS
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 1889 1028.2
CIRCUMCISED 1 4.1474 1888 1024.0 0.0417 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The regression equation for the fitted model may be written as follows

\[ \log \frac{p}{(1-p)} = -2.3445 - 0.3663_{CIRCUMCISED} \] where \(p\) is the probability for males of testing positive for herpes at age 26.

For males the odds of testing positive for herpes if circumcised is decreased by a multiplicative factor of \(exp(- 0.3663) = 0.6933\) compared to non-circumcised men. With a p-value of 0.0451 this is a (just) significant effect.

The analysis of deviance chi-squared value for the **CIRCUMCISED** variable is 0.0417, indicating a borderline significant increase in deviance if it was removed from the model. Therefore, it has some explanatory power and may indicate a significant effect. However, based on our earlier findings it may be useful to take into account some potentially confounding variables in our model and re-assess our conclusions.

Perform a logistic regression with **HSV2POS** as the response and **CIRCUMCISED** + **UNDER_10_PART** as the predictors.

Write down the second model and state any new conclusions that arise. Discuss the parameter estimates and chi-squared value with reference to the results you found in 7.

Do these results agree with those found earlier in the lesson?