BIAS & CONFOUNDING

Bias & Confounding Problem Bank Answers

Question 1
Systematic error occurs if there is a difference between the bulls-eye of the target you think you are shooting at and the bulls-eye that you are actually shooting at. Random error occurs if your shot at the bulls-eye you are actually aiming at does not hit the bulls-eye.

Question 2
OR is biased away from the null because cell A is erroneously inflated and cell C is deflated.

OR is biased towards the null because non-differential misclassification of the exposure will bias results towards the null.

OR is biased towards the null because non-differential misclassification of the exposure will bias results towards the null.

OR is biased away from the null because B is deflated.

OR is biased away from the null because cell A is inflated (differential misclassification).

OR is biased away from the null because cell A is inflated (differential misclassification).

Question 3
If the bias is towards the null, then the true (i.e., population) odds ratio has to be at least as large as the estimate (3.5) obtained from your study. This means that the true odds ratio is quite strong (i.e., larger than 3.5) which supports the conclusion of an exposure effect.

If the bias is away from the null, then the true (i.e., population) odds ratio will be less than the estimate (3.5) obtained from your study. This means that the true odds ratio may even be close to the null value (i.e., no association), but alternatively the true odds ratio may still be large (though less than 3.5). Despite the fact that it is still possible that the drug is effective (since the true OR may still be > 1), a generally accepted principle in clinical trials is that any drug already marketed that is found from further study to have an efficacy estimate biased away from the null will typically be taken off the market until a corrected value of its effect can be determined from additional trials or analyses. If the question had been modified as follows, “Assuming that the bias is away from the null, can you also conclude that the drug is not effective?”, your answer would be that the drug may still be effective, since the true OR may be > 1.

Question 4
If persons who had high blood pressure (a known risk factor for CHD) prior to the start of follow-up were excluded from the comparison (no allergy) group, but NOT from the FA (allergy) group, then there should be concern about selection bias that would be AWAY FROM THE NULL.

Question 5
Recall bias of the exposure information Yes

Differential misclassification of the exposure Yes

Non-differential misclassification of the exposure No

Selection bias Yes (because it is a prospective cohort study with nearly 100% follow-up)

Question 6
Interviewer bias: mask interviewers to the study hypothesis and to the disease or exposure status of the study subjects, and carefully design the interview instrument

Recall bias: Mask study subjects to the study hypothesis, use diseased controls if conducting a case-control study, and carefully design an interview instrument

Ensure that selection of cases and controls is independent of exposure (in a case-control study) and that selection of exposed and unexposed groups is independent of outcome (in a retrospective cohort study) and obtain high follow-up and participation rates (all types of studies).

Use the most accurate source of information, and use sensitive and specific criteria to define the exposure and disease.

Question 7
Both retrospective cohort and case-control studies

Question 8
A.
Case-control would probably be the most appropriate and efficient design. TSS is a rare disease and tampon use is a common exposure.

B.
Please note that there are multiple ways to approach a question like this one. The objective here is not to elicit one right answer, but rather to get you thinking about selection bias in the context of a real example.

Control selection may be difficult. Cases would most likely present to emergency rooms and would be admitted to hospitals. Hospitalized controls may not represent the exposure distribution in the source population. In general, populations found in hospitals are less healthy, have more comorbidities, and may have a condition related to the exposure (tampon use). For example, if controls were selected from patients admitted for bone fractures, and young women with fractures are more likely to be athletes and therefore have amenorrhea (thus using fewer tampons), the measure of effect would be biased. Or, if women were selected from hospitalized women with OB/GYN issues they may be less likely to use tampons. Neighborhood and friend controls could be too closely matched on the exposure of interest (it is likely that females who use tampons have friends who use tampons); community controls may refuse to participate (non-response bias).

C.
Some specific types of information bias we might expect in a study such as this are:

Recall Bias: Women with TSS might be more likely to recall tampon use and the brand or absorbency of tampons used because they have an interest in knowing the cause of their disease.

Interviewer Bias: If the interviewer is not blinded to women’s TSS status, then the interviewer may be more likely to search for exposure information on the putative cause (tampon use). Unlike diagnostic suspicion, which influenced the probability about getting into the study, this is about the probability of identifying someone as exposed or unexposed after knowing their outcome status.

D.
Use objective, accurate sources of information (e.g., ask women to bring one of their unused tampons from home so the brand and absorbency can be recorded)

Use standardized data collection forms, methods, etc.

Use recall cues & techniques (e.g., calendars, photos of brand logos)

Increase privacy to facilitate truthful answering of sensitive questions on questionnaires. This might include audio- or computer-assisted interviewing methods

Obtain high levels of participation. When this is not possible, collect all possible information on non-responders to determine whether and how participants and non-participants are different

Maintain high adherence rates / low levels of loss-to-follow-up (keep high quality and updated contact information on participants)

Mask those collecting data and those providing data

Question 9
a.
If we had complete follow-up on all 1,347 women, the exposed group would contribute 1,347 * 8 = 10,776 woman-years, which is the total possible follow-up. Since we actually have only 4,222 woman-years of follow-up, we know that 10,776 - 4,222 = 6,554 woman-years were lost to follow-up. You could also calculate this as (311*8) + (1036*8) - 4222 = 6554 if you want to consider those completely lost and those partially lost separately.

b.
If symptoms of disease increase follow-up visits, then the incidence density ratio would be overestimated (away from the null). Those exposed women lost to follow-up were likely to have been disease-free and because their person-time is missing the incidence density would be artificially increased among the exposed.

c.
We observed an ID of 36/4222 = .0085. In the most extreme case, where there was no disease among those lost to follow-up, the incidence density would be ID = 36/10776 = .0033. We can compare the observed IDR to the IDR assuming no disease among those lost to follow-up by examining the ratio of the observed ID in the exposed to the ID in the exposed assuming no disease among those lost to follow-up. Comparing IDs is equivalent to comparing IDRs, since the ID in the unexposed is in the denominator of each of these measures. The ID in the unexposed cancels out, because we assume it is unchanged in both scenarios; it has no direct impact on the calculations.

In the case of the data observed in this study: IDRobserved = IDobserved_exposed / IDunexposed

In the case where we assume no disease among those lost to follow-up: IDRassumed = IDassumed_exposed / IDunexposed

So we can compare: IDRobserved / IDRassumed

= (IDobserved_exposed / IDunexposed) / (IDassumed_exposed / IDunexposed)

= IDobserved_exposed / IDassumed_exposed

Comparing this to the ID we observed, we have .0085/.0033 = 2.55 (using the unrounded IDs). The ratio indicates that the IDR could be overestimated by as much as 155% (a ratio of 2.55 implies an estimate 155% higher).

We can also use the Kleinbaum formula to estimate the impact of bias: (ϑbiased - ϑtrue)/ϑtrue = (0.0085 - 0.0033)/0.0033 = 1.58 (1.55 with the unrounded IDs), so again our estimate, based on the above loss to follow-up, is considerably overestimated.
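The arithmetic in parts a through c can be checked with a short script. This is only a sketch using the numbers quoted in the answer; the variable names are illustrative and are not part of the original problem.

```python
# Loss-to-follow-up sensitivity check for the exposed group.
# All inputs are the figures quoted in the answer above.
n_women = 1347        # exposed women enrolled
years_planned = 8     # intended follow-up per woman, in years
observed_wy = 4222    # woman-years actually observed
cases = 36            # cases observed among the exposed

possible_wy = n_women * years_planned   # total possible woman-years
lost_wy = possible_wy - observed_wy     # woman-years lost to follow-up

id_observed = cases / observed_wy       # observed incidence density
id_assumed = cases / possible_wy        # assumes no disease among those lost

ratio = id_observed / id_assumed                          # observed vs. assumed ID
relative_bias = (id_observed - id_assumed) / id_assumed   # equals ratio - 1

print(possible_wy, lost_wy)
print(round(id_observed, 4), round(id_assumed, 4))
print(round(ratio, 2), round(relative_bias, 2))
```

Because the script keeps the unrounded incidence densities, the relative bias comes out as the ratio minus one; rounding the IDs to four decimals before dividing shifts both figures slightly.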

Question 10

Question 11

Question 12
Recall bias occurs when the level of accuracy differs between the compared groups. It occurs in a case-control study when cases remember or report their exposures differently (more or less accurately) from controls. It occurs in a cohort study when individuals who are exposed remember or report subsequent illnesses differently than those who are unexposed.

The healthy worker effect occurs in occupational studies when disease and death rates in a working population are compared with those among the general population. The rates of disease and death among workers are typically lower than those in the general population because there is a higher proportion of ill people in the general population.

Control selection bias is a type of selection bias that occurs in case-control studies when the controls do not accurately represent the exposure distribution in the source population that produced the cases. It occurs when different criteria are used to select cases and controls, and these criteria are related to the exposure.

Question 13
Nondifferential misclassification of the exposure. Some women who filled the health maintenance organization’s prescriptions for antihistamines may not have used them, and other women may have obtained antihistamines from outside sources. This type of misclassification is as likely to occur among cases as among controls, so this is nondifferential.

Recall and interviewer bias. Subjects were not asked to recall their exposures, and interviews were not used to obtain the exposure data.

Question 14

Question 15

Question 16

Question 17

Question 18

Question 19

Question 20
A.
This bias is called recall bias (a form of information bias). To be considered recall bias, assessment of past exposure must affect cases and controls differently – i.e., differential.

B.
In this example, it will bias the OR toward the null because some cases are reported as not exposed but were actually exposed prior to developing the disease. This is misclassification error (a form of information bias) and in this example it is differential by disease status (individuals from cell A move to cell C).

C.
In this case, the controls are also likely to have high levels of biomass exposure. This would bias the OR towards the null. The study includes more exposed (potential) controls than unexposed (potential) controls relative to the target population. This is a form of selection bias, and it would be differential by disease status.

D.
Since both cases and controls have similar problems in recall, the misclassification will be non-differential by disease status and on average it would bias the OR towards the null. This is NOT a form of recall bias, which would require that recall be differential between cases and controls.

Contrast this with 4.A (above).

E.
If the interviewers knew the disease status, they could probe for better exposure data from the TB patients and this could bias the OR away from the null, assuming that this caused TB patients to report more exposure than they would have otherwise. This bias is called interviewer bias (a form of information bias) and it would be differential by disease status.

F.
Assuming a cumulative case-control design, if exposure leads to rapid death among TB patients, then the cases who end up being recruited into the study may have contracted TB due to other exposures, since they are still alive and well enough to take part in a study. This would result in a form of selection bias, as the exposure distribution among the cases does not reflect that in the target population – those exposed to biomass smoke are less likely to have been selected. The OR would likely be biased towards the null because the measured effect would likely be smaller than the true effect. People would be missing from the A cell. Remember, selection bias can be towards or away from the null, depending on what’s happening!

G.
More refined measurement of the exposure status would increase the validity of those measurements (and therefore the OR), but it would not necessarily have any effect on the precision of the OR.

H.
We cannot be sure if our OR is biased toward or away from the null. Non-differential misclassification when there are more than two categories of the exposure or disease that is misclassified does not necessarily result in bias towards the null. We have three levels of biomass smoke exposure.

Question 21

Question 22
In nondifferential misclassification, inaccuracies that occur on one axis (exposure or disease) are independent of the other axis. For example, if there is an error in exposure misclassification, it occurs with equal likelihood among diseased and nondiseased individuals. In differential misclassification, inaccuracies that occur on one axis (exposure or disease) are dependent on the other axis. For example, if there is an error in exposure misclassification, it occurs more often in the case group than the control group. Nondifferential misclassification of dichotomous variables (i.e., variables with two categories) biases the results toward the null. Differential misclassification can bias the results either toward or away from the null.
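A small numeric illustration may help here. The sketch below uses entirely hypothetical case-control counts and hypothetical sensitivity and specificity values (none of them come from the problem bank) to show expected cell counts when exposure is misclassified equally in cases and controls, and when it is misclassified differently in the two groups.

```python
# Hypothetical illustration of exposure misclassification in a case-control study.
# All counts, sensitivities, and specificities below are made up for the example.

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected (classified exposed, classified unexposed) counts."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed

def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

cases = (100, 100)      # truly exposed, truly unexposed cases
controls = (50, 150)    # truly exposed, truly unexposed controls
true_or = odds_ratio(*cases, *controls)

# Nondifferential: the same error rates apply to cases and controls.
nondiff_or = odds_ratio(*misclassify(*cases, 0.8, 0.9),
                        *misclassify(*controls, 0.8, 0.9))

# Differential: cases report exposure more completely than controls.
diff_or = odds_ratio(*misclassify(*cases, 0.95, 1.0),
                     *misclassify(*controls, 0.70, 1.0))

print(round(true_or, 2), round(nondiff_or, 2), round(diff_or, 2))
```

With these particular inputs the nondifferential estimate falls between 1 and the true OR, while the differential estimate overshoots it; other differential patterns could just as easily push the estimate toward the null.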

Question 23

Question 24
A.
RR = (40/100) / (20/100) = 2.0

B.

	Injured	Not injured	Total
Community A	[(0.8*40)+(0.05*60)] = 35	[(0.2*40)+(0.95*60)] = 65	100
Community B	[(0.8*20)+(0.05*80)] = 20	[(0.2*20)+(0.95*80)] = 80	100

RR = (35/100) / (20/100) = 1.75
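The table for part B can be reproduced programmatically. The sketch below assumes a sensitivity of 0.80 and a specificity of 0.95 for the injury classification, values inferred from the multipliers in the table rather than stated in the answer; the function name is illustrative.

```python
# Apply imperfect outcome classification to the true counts in each community.
# Sensitivity 0.80 and specificity 0.95 are inferred from the part B table.

def observed_counts(true_injured, true_not_injured,
                    sensitivity=0.80, specificity=0.95):
    """Return expected (classified injured, classified not injured)."""
    obs_injured = sensitivity * true_injured + (1 - specificity) * true_not_injured
    obs_not_injured = (1 - sensitivity) * true_injured + specificity * true_not_injured
    return obs_injured, obs_not_injured

a_inj, a_not = observed_counts(40, 60)   # Community A: 40 truly injured of 100
b_inj, b_not = observed_counts(20, 80)   # Community B: 20 truly injured of 100

true_rr = (40 / 100) / (20 / 100)
observed_rr = (a_inj / (a_inj + a_not)) / (b_inj / (b_inj + b_not))

print(round(a_inj), round(a_not), round(b_inj), round(b_not))
print(round(true_rr, 2), round(observed_rr, 2))
```

The same error rates are applied to both communities, which is what makes the misclassification non-differential.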

C.

This is non-differential misclassification, which biases our estimates of association towards the null value (which is 1). This problem presents an example of how misclassification of disease could affect the observed RR. When we are conducting studies, we never know the “truth” about how many of our participants have been misclassified. Knowing whether misclassification is likely to be differential or non-differential comes from knowledge of the measurement tools and intuition about the direction of bias (e.g., over or under reporting). If we believe that there is no reason for diagnosis of head injury to differ by Community, then we assume that misclassification is likely the same between Community A and B.

D.
True RR = 2.0 versus the observed RR when misclassification occurs = 1.75

E.

In any study, when the probability of selection is associated with both the exposure and the outcome, selection bias can occur. In case-control studies, participants are selected on disease status. Selection bias is likely when either cases or controls are not selected independently of their exposure status. For instance, this can happen if exposed cases are more likely to participate than those who are unexposed. In cohort studies, participants can be selected based on exposure status such as in the case of workplace exposure studies of exposed and unexposed workers. In order to avoid selection bias, exposed and unexposed participants need to be selected independent of their outcome status. A major concern with cohort studies is loss to follow up. If participants who are lost to follow up are different from those who remain in the study due to a factor associated with both exposure and outcome, selection bias can occur.

Question 25

Question 26
Confounding is a mixing of effects between an exposure, an outcome, and a third extraneous variable that is termed the confounder. Confounding distorts the crude relationship between an exposure and outcome because of the relationships between the confounder and the exposure and between the confounder and the disease.

Residual confounding means that an association remains confounded even after some confounders have been controlled. It arises from lack of information on all confounding variables, classifying confounders in overly broad categories, or mismeasuring confounders.

Positive confounding means that the crude association exaggerates the true association, and negative confounding means that the crude association underestimates the true association.

Question 27
A confounder is associated with the exposure in the source population that produced the cases and is an independent cause or predictor of the outcome under study. The latter means that it is associated with the disease among both exposed and unexposed individuals. In addition, a confounder cannot be an intermediate step in the causal pathway between the exposure and disease.

Question 28
Yes

Yes

No

No

Question 29

Question 30
A confounder must be associated with the disease

A confounder must be associated with the exposure

A confounder must not be on the causal pathway between exposure and disease

Question 31
a.
Crude OR: (167*1599) / (2051*264) = 0.493.

Low SES: (56*1107)/(131*1095)=0.432.

High SES: (111*492)/(133*956)=0.430.

b.
MH OR = [((56*1107)/2389) + ((111*492)/1680)] / [((131*1095)/2389) + ((133*956)/1680)] = 0.431. Women who died of breast cancer had 0.431 times the odds of having had a mammogram compared with those who did not die, adjusting for SES.
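The crude, stratum-specific, and Mantel-Haenszel odds ratios can be recomputed as a check. This is a sketch, not the course's own worked solution: the cell layout (a, b, c, d) is inferred from the products in part a, and each stratum total is computed from its four cells. Computed that way the high-SES total is 1,692 rather than the 1,680 written in the formula; the result still rounds to 0.431.

```python
# Mantel-Haenszel odds ratio across SES strata.
# Cells are (a, b, c, d) with OR = (a*d)/(b*c); the layout is inferred
# from the products shown in part a, not taken from the original table.
strata = {
    "low SES": (56, 131, 1095, 1107),
    "high SES": (111, 133, 956, 492),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Sum of a*d/n over strata divided by sum of b*c/n over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Crude table: add the strata together cell by cell.
crude_cells = [sum(cells) for cells in zip(*strata.values())]
crude_or = odds_ratio(*crude_cells)
mh_or = mantel_haenszel_or(strata.values())

for name, cells in strata.items():
    print(name, round(odds_ratio(*cells), 3))
print("crude", round(crude_or, 3), "MH", round(mh_or, 3))
```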

c.
The null hypothesis is that the stratum-specific ORs are not different from the MH OR (or that the stratum-specific ORs are not different from one another).

AND

The results of the above questions show that the adjusted OR appears to be meaningfully different from the crude OR, which suggests confounding is present. Furthermore, the results of part C show that the stratum-specific ORs are not statistically different from one another, which means that effect modification is not present. Based on all of this, we’d conclude that SES is a confounder.

OR

The results of parts B-C show that the adjusted OR does not appear to be meaningfully different from the crude OR, which suggests confounding is not present. Furthermore, the results of part D show that the stratum-specific ORs are not statistically different from one another, which means that effect modification is not present. Based on all of this, we’d conclude that SES is neither a confounder nor an effect modifier.

d.
Selection bias. In this case, we’re missing people who belong in our A cell (those who were exposed to mammograms and died became “cases”). This would mean that our OR is lower (farther from the null) than it should be.

e.
The A cell (exposed and diseased) in your 2x2 table would be artificially inflated (1 point), resulting in a larger OR (which, in this case, would represent a bias toward the null) (1 point).

f.
Restriction: could restrict the study just to women with (or without) a family history of breast cancer.

Matching: could match cases and controls based on their family history.

Randomization: could randomly assign some women to get mammograms and others never to receive a mammogram. This doesn’t seem very ethical or feasible, but it is a commonly used approach to avoid confounding.

Question 32
Epidemiologists usually compare the crude/confounded measure of association with the adjusted measure of association. If there is an appreciable difference between the two (i.e. at least a 10% difference), confounding is considered present.
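The change-in-estimate rule can be sketched as a one-line calculation. The example below is only an illustration: it borrows the crude and adjusted odds ratios from Question 35 (2.98 and 2.05), and it assumes the adjusted estimate is used as the denominator, a convention that varies between textbooks. The function name is hypothetical.

```python
# Change-in-estimate check for confounding.
# Assumption: percent change is measured relative to the adjusted estimate.

def percent_change(crude, adjusted):
    """Absolute percent difference between crude and adjusted estimates."""
    return abs(crude - adjusted) / adjusted * 100

crude_or = 2.98      # crude OR quoted in Question 35
adjusted_or = 2.05   # adjusted OR quoted in Question 35

change = percent_change(crude_or, adjusted_or)
confounding_suspected = change >= 10   # the 10% rule of thumb

print(round(change, 1), confounding_suspected)
```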

Question 33

Question 34

Question 35
a.
OR(smokers) = (40x150)/(65x45) = 2.05 = OR(non-smokers)

COR = (190x190)/(110x110) = 2.98

b.
Smoking status is a confounder in these data (i.e., there is data-based confounding) because the crude odds ratio of 2.98 is meaningfully different from the adjusted odds ratio of 2.05.

c.
There is no evidence of interaction because stratum-specific OR’s are equal.

d.
i.
OR(SC| No Ulcer) = (45x40)/(150x65) = .185

Smoking is related to Coffee Consumption among Non-Ulcer subjects

ii.
OR(SU| Low C) = (65x40)/(150x45) = .385

Smoking is related to Ulcer status among Low Coffee subjects

e.
Smoking is a confounder in this study because both OR(SC|No Ulcer) and OR(SU|Low C) are very different from the null value of 1.

f.
Yes, same conclusion

Question 36
Randomization

Matching

Restriction

Question 37
Randomization is the act of assigning or ordering using a random process. It means that everyone in the study has an equal chance of being assigned to one of the groups (such as treatment vs comparison). The main advantage of randomization is that it controls for both known and unknown confounders, if the sample size is sufficiently large. Its main disadvantage is that it can be used only in experimental studies.

Matching is the process of making the distribution of confounders identical in the compared groups while selecting the study subjects. It is good for controlling for confounding by complex nominal variables such as neighborhood or sibship and for controlling confounding in small studies. Its main disadvantages include the difficulty and expense of finding appropriate matches.

Restriction means that the investigator limits admission into a study to individuals who fall within a specific category or categories of a confounder. Its main advantages are simplicity and relatively low expense, and its disadvantages include difficulty in identifying a sufficient number of subjects (this depends on the number and characteristics of the restrictions) and limiting the generalizability of the study.

Question 38
Restriction

Matching

Stratified analysis

Question 39

Question 40

Question 41

Question 42

Question 43
a.
cOR = (27x650)/(30x100) = 5.85

b.
OR(MD=1) = 1.67, OR(MD=0) = 1.60

c.
There is little evidence of interaction, so this is not a reason for controlling for MD.

d.
The variable MD should be controlled for confounding since the crude OR of 5.85 is meaningfully different from the OR’s that result when MD is controlled.

e.

OR(MD,E) = 13.6 and OR(MD,E | D=0) = 12. These results indicate that students taking introductory statistics are more likely to have a mental disorder than students not taking introductory statistics. Yes, these results support the conclusion that MD is a confounder, but one also needs to demonstrate that OR(D,MD | E=0) is meaningfully different from 1, which it is (= 12).

Question 44
A.
Risk = 238/400 = 0.595 x 100 = 59.5%

B.
Risk = 14/415 = 0.034 x 100 = 3.4%

C.
RR = 0.595/0.034 = 17.5

D.
Risk of asthma among youth exposed to high air pollution is 17.5 times the risk of asthma among youth exposed to low air pollution in Los Angeles during this study period.

E.
Smokers: RR = (174/309) / (11/362) = 18.5

Non-smokers: RR = (64/91) / (3/53) = 12.3 (12.4 if the risks are not rounded before dividing)

F.
Among Los Angeles youth who smoke, the risk of asthma among those who are exposed to high levels of pollution is 18.5 times the risk of developing asthma among youth with low levels of exposure to pollution in this study population.

G.
Among Los Angeles youth who do not smoke, the risk of asthma among those who are exposed to high levels of pollution is 12.3 times the risk of developing asthma among youth with low levels of exposure to pollution in this study population.

H.
The chi-squared test of homogeneity (χ2) assesses whether the differences in the stratum-specific risk ratios are due only to random variation.

I.
The unadjusted RR for asthma among people with high air pollution is high (RR=17.5). After stratifying the data for smoking, we find that air pollution is still strongly associated with the development of asthma. Stratification suggests that the risk of asthma among smokers (RR=18.5) is greater than the risk of asthma among non-smokers (RR=12.3), suggesting that there is effect modification by smoking.
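The crude and stratum-specific risk ratios can be recomputed from the counts quoted in parts A, B, and E. This is a sketch with illustrative names; because it divides unrounded risks, the crude RR prints as about 17.6 and the non-smoker RR as about 12.4, slightly different from the figures above, which were obtained from rounded risks.

```python
# Crude and smoking-stratified risk ratios for asthma by air pollution level.
# Each tuple: (cases exposed, n exposed, cases unexposed, n unexposed),
# where "exposed" means high air pollution. Counts are those quoted above.

def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

crude = (238, 400, 14, 415)
strata = {
    "smokers": (174, 309, 11, 362),
    "non-smokers": (64, 91, 3, 53),
}

crude_rr = risk_ratio(*crude)
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

print("crude", round(crude_rr, 1))
for name, rr in stratum_rr.items():
    print(name, round(rr, 1))
```

The stratum counts add up to the crude counts (174 + 64 = 238, and so on), which is a quick way to confirm that the strata were transcribed correctly.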

Question 45
A.
OR = ad/bc = (689*714)/(360*311) = 4.39

B.
Individuals who developed pancreatic cancer had 4.39 times the odds of drinking soda compared to individuals who did not develop pancreatic cancer.

Smokers: OR = (660*46)/(264*110) = 1.05

Non-smokers: OR = (29*668)/(96*201) = 1.00

Among both smokers and non-smokers, the OR for pancreatic cancer comparing soda drinkers with non-soda drinkers is at or close to 1. In other words, we no longer observe an association between soda drinking and pancreatic cancer after stratifying by smoking.

C.

Yes, smoking does confound the effect of soda drinking on pancreatic cancer. We see that the crude estimate (4.39) of the relationship of soda consumption and pancreatic cancer is different from the stratum-specific estimates (1.05 and 1.00) after we adjust for smoking status. We indicate a bidirectional arrow between smoking and soda drinking because it’s not clear whether soda drinkers are more likely to smoke or vice versa. Note, because there is a bidirectional arrow, this is not a DAG.

D.

	All participants
	Cancer	No Cancer
Smokers	770	310
Nonsmokers	230	764

OR = (770*764)/(310*230) = 8.25

E.

	Soda drinkers			Soda non-drinkers
	Cancer	No Cancer		Cancer	No Cancer
Smokers	660	264	Smokers	110	46
Nonsmokers	29	96	Nonsmokers	201	668

Soda drinkers: OR = (660*96)/(264*29) = 8.28

Soda non-drinkers: OR = (110*668)/(46*201) = 7.95

F.
No, soda consumption does not confound the association between smoking and pancreatic cancer. After stratifying the data by soda drinking status, we see that the stratified estimates are similar to each other and the crude estimate.
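Both directions of this question (smoking as a confounder of soda, and soda as a possible confounder of smoking) can be checked from the same eight cell counts. The sketch below is one way to organize that; the dictionary layout and function name are illustrative choices, and the counts are the ones in the part E tables.

```python
# Cell counts keyed by (soda drinker, smoker, cancer), from the part E tables.
counts = {
    (True, True, True): 660, (True, True, False): 264,
    (True, False, True): 29, (True, False, False): 96,
    (False, True, True): 110, (False, True, False): 46,
    (False, False, True): 201, (False, False, False): 668,
}

def odds_ratio(exposure_index, stratum_index=None, stratum_value=None):
    """OR for cancer by the factor at exposure_index (0 = soda, 1 = smoking),
    optionally restricted to one level of the other factor."""
    cell = {(e, d): 0 for e in (True, False) for d in (True, False)}
    for key, n in counts.items():
        if stratum_index is not None and key[stratum_index] != stratum_value:
            continue
        cell[(key[exposure_index], key[2])] += n
    return (cell[(True, True)] * cell[(False, False)]) / (
        cell[(True, False)] * cell[(False, True)])

soda_crude = odds_ratio(0)
soda_in_smokers = odds_ratio(0, 1, True)
soda_in_nonsmokers = odds_ratio(0, 1, False)

smoking_crude = odds_ratio(1)
smoking_in_soda = odds_ratio(1, 0, True)
smoking_in_nonsoda = odds_ratio(1, 0, False)

print(round(soda_crude, 2), round(soda_in_smokers, 2), round(soda_in_nonsmokers, 2))
print(round(smoking_crude, 2), round(smoking_in_soda, 2), round(smoking_in_nonsoda, 2))
```

The first printed row shows the soda estimate collapsing toward 1 once smoking is held fixed; the second shows the smoking estimate staying near its crude value within both soda strata.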

Question 46

Question 47
a.
ORhigh=(150*150)/(75*100)=3.0

ORlow=(50*50)/(25*300)=0.33

Among people with high ferritin levels, men have 3 times the odds of having CHD compared to women, over the course of a year.

Among people with low ferritin levels, men have 0.33 times the odds of CHD compared to women (i.e., men have one third the odds of women of having CHD), over the course of a year.

b.
ORpooled = (200*200)/(400*100) = 1.0

Men and women have equal odds of having CHD, over the course of a year. The unadjusted (for ferritin) OR indicates that gender does not affect odds of CHD.

c.

Men have 1.59 times the odds of having CHD compared to women, over the course of a year after adjusting for ferritin levels.
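The adjusted estimate of 1.59 is consistent with a Mantel-Haenszel odds ratio computed from the cells used in parts a and b. The calculation behind the original answer is not shown, so the sketch below is an assumption about the method; the (a, b, c, d) layout is inferred from the products in part a.

```python
# Mantel-Haenszel OR for gender and CHD across ferritin strata.
# Cells are (a, b, c, d) with OR = (a*d)/(b*c); layout inferred from part a.
strata = {
    "high ferritin": (150, 75, 100, 150),
    "low ferritin": (50, 25, 300, 50),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Pooled (crude) table: add the strata together cell by cell.
pooled_or = odds_ratio(*[sum(cells) for cells in zip(*strata.values())])
mh_or = mantel_haenszel_or(strata.values())

for name, cells in strata.items():
    print(name, round(odds_ratio(*cells), 2))
print("pooled", round(pooled_or, 2), "MH", round(mh_or, 2))
```

The summary estimate sits between two stratum-specific ORs that point in opposite directions, which is why a single adjusted figure can hide the pattern.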

d.
Statistical test of homogeneity:
Ho: The measure of association is the same across strata.

The p-value is less than 0.2, therefore, we reject the null hypothesis and conclude that the stratified estimates differ significantly from the MH estimate. Clinically, the stratified estimates are also meaningfully different from the MH estimate (and from each other). Ferritin is an effect modifier; in other words, the relationship between gender and CHD varies by ferritin level.

Note: There will be some rounding error depending on how many decimals you include in your ORs.

e.
Generally, when there is evidence of effect modification it is more useful to present stratified estimates. However, if you are only interested in the net association, you could present an adjusted measure. This would hide the nuance of the relationship between gender and risk of CHD across levels of ferritin but would give you an idea of the net association between CHD and gender in the total population. In this case, since one measure of association is protective and the other harmful, you would likely still present stratified estimates.

f.
Among individuals with high ferritin levels, male gender appears to be a strong risk factor for coronary heart disease (OR: 3.0). Among individuals with low ferritin levels, male gender appears to be a strong protective factor for coronary heart disease (OR: 0.33).

(SPECULATION FOLLOWS:) We hypothesize that differences in iron metabolism between men and women affect the stability of plaque formation in the lumen of the coronary arteries. Further study is called for to determine if intervention to reduce ferritin levels in men will reduce the incidence of coronary artery disease in men. Please fund our grant.

Question 48
a.

1. The factor causes (or is a surrogate for) the outcome in the source population: We believe injection drug use increases risk of HCV.

2. Factor must be associated with the exposure in the source population: We believe people who smoke crack cocaine are more likely to also inject drugs.

3. Factor must not be caused by exposure or disease: We believe injection drug use is not caused by HCV or smoking crack cocaine.

In sum, injection drug use is a potential confounder in this study.

b.
IDD among people who inject drugs: 90/1200-47/800 = 0.0163

IDD among people who do not inject drugs: 12/500-9/1200 = 0.0165

Among people who used injection drugs, there was an excess of 0.0163 HCV cases per person-month (16.3 per 1,000 person-months) among people who used crack pipes compared to people who did not use crack pipes, over the course of the study.

Among people who did not use injection drugs, there was an excess of 0.0165 HCV cases per person-month (16.5 cases per 1,000 person-months) among people who used crack pipes compared to people who did not use crack pipes, over the course of the study.

c.
102/1700-56/2000=0.032

There was an excess of 0.032 HCV cases per person-month (32 per 1,000 person-months) among people who used crack pipes compared to people who did not use crack pipes, over the course of the study.

d.

ANSWER

There was an excess of 0.0164 HCV cases per person-month (16.4 cases per 1,000 person-months) among people who used crack pipes compared to people who did not use crack pipes, after adjusting for injection drug use, over the study period.
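The calculation behind the adjusted figure is not shown in this answer. One approach that reproduces 0.0164 is a Mantel-Haenszel weighted incidence density difference, with each stratum weighted by (exposed person-time * unexposed person-time) / total person-time. The sketch below assumes that weighting; it may not be the method the original answer used.

```python
# Mantel-Haenszel weighted incidence density difference (IDD).
# Assumption: weights are PT_exposed * PT_unexposed / PT_total per stratum.
# Each tuple: (cases exposed, PT exposed, cases unexposed, PT unexposed),
# in person-months, using the figures from parts b and c.
strata = {
    "inject drugs": (90, 1200, 47, 800),
    "do not inject drugs": (12, 500, 9, 1200),
}

def rate_difference(a, t1, c, t0):
    return a / t1 - c / t0

def mh_rate_difference(tables):
    num = den = 0.0
    for a, t1, c, t0 in tables:
        weight = t1 * t0 / (t1 + t0)
        num += weight * rate_difference(a, t1, c, t0)
        den += weight
    return num / den

# Crude table: add the strata together element by element.
crude_idd = rate_difference(*[sum(x) for x in zip(*strata.values())])
adjusted_idd = mh_rate_difference(strata.values())

for name, cells in strata.items():
    print(name, round(rate_difference(*cells), 4))
print("crude", round(crude_idd, 4), "adjusted", round(adjusted_idd, 4))
```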

Question 49
a.
IDRhigh_air_pollution = 601.6/58.4 = 10.3

IDRlow_air_pollution = 127.3/11.3 = 11.3

The rate ratios are approximately constant across categories of air pollution exposure.

b.
RDhigh_air_pollution = 601.6 - 58.4=543.2

RDlow_air_pollution = 127.3 - 11.3=116

The rate differences are not constant across categories of air pollution exposure.

c.
Statistical interaction is scale- (or model-) dependent. In this case, there appears to be no effect measure modification on the multiplicative scale because the rate ratios are approximately constant across the categories of air pollution exposure. However, on the additive scale there is considerable effect measure modification because the rate differences vary by air pollution exposure.
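The scale dependence can be seen by computing both measures side by side. The sketch below uses the four rates quoted in parts a and b; the rate units are not stated in this answer, and reading the within-stratum contrast as radon exposed versus unexposed is an inference from part d.

```python
# Rate ratios and rate differences within each air pollution stratum.
# Each tuple: (rate in the exposed, rate in the unexposed), as quoted above.
rates = {
    "high air pollution": (601.6, 58.4),
    "low air pollution": (127.3, 11.3),
}

results = {}
for level, (rate_exposed, rate_unexposed) in rates.items():
    results[level] = {
        "ratio": rate_exposed / rate_unexposed,        # multiplicative scale
        "difference": rate_exposed - rate_unexposed,   # additive scale
    }

for level, measures in results.items():
    print(level, round(measures["ratio"], 1), round(measures["difference"], 1))
```

The ratios differ by roughly 10% across strata, while the differences vary by more than a factor of four, which is the contrast described above.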

d.
Interaction on the additive scale can be indicative of underlying causal interaction (see Jewell reading pp. 150-152). You might think about this using causal pies. If there is a pie where both air pollution and radon exposure are necessary to cause disease, the IDD will capture the portion of the population with this pie (people for whom disease is only present when both exposures are present).
