Causal inference in environmental epidemiology

Sanghyuk Bae; Hwan-Cheol Kim; Byeongjin Ye; Won-Jun Choi; Young-Seoub Hong; Mina Ha

doi:10.5620/eht.e2017015

Abstract

Inferring causality is necessary to achieve the goal of epidemiology, which is to elucidate the cause of disease. Causal inference is conducted in three steps: evaluation of validity of the study, inference of general causality, and inference of individual causality. To evaluate validity of the study, we propose a checklist that focuses on biases and generalizability. For general causal inference, we recommend utilizing Hill’s 9 viewpoints. Lastly, individual causality can be inferred based on the general causality and evidence of exposure. Additional considerations may be needed for social or legal purposes; however, these additional considerations should be based on the scientific truth elucidated by the causal inference described in the present article.

Keywords: Causality, Epidemiology, Environmental exposure, Validity

INTRODUCTION

One of the goals of epidemiology is to elucidate the causes of disease [1], and scientific methods are used to accomplish this goal. Hume [2] and other philosophers questioned the logical validity of inductive reasoning (inferring a relationship between cause and effect from repeatedly observed phenomena). For instance, observing that the sun rises every morning does not guarantee that it will rise tomorrow, and the proposition that “every swan is white,” based on observations of many white swans, can be disproved by a single observation of a black swan. Consequently, comparison with a control and falsification of the null hypothesis became the scientific method of causal inference.

In the modern counterfactual framework, which evolved as a logical framework for inferring a relationship between cause and effect, a cause is defined as “a condition that, if present, makes a difference in (the probability of) the outcome” [3], and this difference in probability is calculated by comparing the outcome when the condition is present with the outcome when it is absent. In epidemiology, the condition that is present is designated the exposure, and the condition that is absent is designated the alternative exposure. However, it is practically impossible to observe and compare both conditions in the same subject. For instance, to determine the causal relationship between air pollution and health, we would have to compare a subject’s health at different levels of exposure to air pollution while everything else stays the same (ceteris paribus). We cannot, however, observe the same person both exposed (exposure) and not exposed (alternative exposure) to air pollution under ceteris paribus conditions, because no state exists in which the level of air pollution is both high and low at the same time.

Thus, in a practical study design, the probability of the outcome under exposure is compared with the probability of the outcome under the alternative exposure in a surrogate for the theoretical counterfactual subject; this surrogate is called the control. Although we assume ceteris paribus conditions between a study subject and its control, a comparison between a single subject and a single control cannot be free of inherent individual differences; therefore, the sums of individual effects, or population effects, are compared between subjects and controls [4]. Even an experimental study with random assignment, which yields comparable controls by distributing individual characteristics evenly between groups on average, does not enable individual comparison, because individual differences remain after randomization.
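
Why only population effects can be compared can be illustrated with a small simulation. This is a minimal sketch under hypothetical values that do not come from the article: a 30% prevalence of a trait labelled frail, baseline risks of 0.20 and 0.05, and an exposure that adds 0.10 to the risk. Because it is a simulation, both potential outcomes are known for every person, which is never the case in a real study.

```python
import random


def simulate_randomized_cohort(n=100_000, seed=1):
    """Simulate a hypothetical cohort in which both potential outcomes are known.

    Returns the true average causal effect (risk difference) and the estimate
    obtained by comparing risks between randomly assigned groups.
    """
    rng = random.Random(seed)
    sum_y1 = sum_y0 = 0
    cases = {True: 0, False: 0}   # observed cases by exposure group
    sizes = {True: 0, False: 0}   # group sizes by exposure group
    for _ in range(n):
        frail = rng.random() < 0.3            # hypothetical inherent characteristic
        baseline = 0.20 if frail else 0.05    # assumed risk without exposure
        u = rng.random()                      # individual susceptibility draw
        y0 = u < baseline                     # outcome under the alternative exposure
        y1 = u < baseline + 0.10              # outcome under exposure (assumed +0.10)
        sum_y1 += y1
        sum_y0 += y0
        exposed = rng.random() < 0.5          # random assignment, independent of frailty
        sizes[exposed] += 1
        cases[exposed] += y1 if exposed else y0   # only one outcome is ever observed
    true_effect = (sum_y1 - sum_y0) / n
    estimate = cases[True] / sizes[True] - cases[False] / sizes[False]
    return true_effect, estimate


true_effect, estimate = simulate_randomized_cohort()
print(f"true average causal effect (unobservable in practice): {true_effect:.3f}")
print(f"risk difference between randomized groups:             {estimate:.3f}")
```

In this sketch the individual effect (y1 minus y0) is 1 for roughly one person in ten and 0 for everyone else. The randomized comparison recovers the average of these effects, but not which individuals were affected.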

The evidence from epidemiological studies should also be interpreted for its implications for individuals. For instance, while exposure to humidifier disinfectant has been shown to be associated with interstitial lung disease at the population level [5], causality in each individual case must also be inferred for diagnosis, treatment, and compensation. Unfortunately, the results of epidemiological studies are derived from population effects, and inferring causality at the individual level from epidemiological evidence requires additional considerations. The aim of the present article is to review those considerations and to provide guidance for causal inference based on epidemiological evidence in populations and individuals. Evaluation of evidence to infer causality consists of three steps: evaluation of the validity of the study, inference of causality in the population (general causality), and inference of individual causality.

VALIDITY OF THE STUDY

Before evaluating whether an observed association is causal, the precision and accuracy of the estimated association should be evaluated. These are determined by the validity of the study, which has two components: internal and external.

Epidemiologic studies estimate the association between exposure and outcome among study participants sampled from a source population. Internal validity means that the observed association holds true in the source population. Most conditions that threaten internal validity can be classified into three categories: confounding, selection bias, and information bias [6], all of which arise from noncomparability between study subjects and their controls. Confounding occurs when a difference in the probability of an outcome is due to differences in inherent characteristics between the compared groups in the source population, rather than solely to the exposure of interest. These inherent characteristics often include age, sex, ethnicity, and lifestyle. Selection bias also produces a difference in the probability of an outcome due to factors other than the exposure of interest. Unlike confounding, however, selection bias is not due to inherent differences in characteristics, but to differences between groups in the probability of being selected as study participants from the source population. A typical example of selection bias is the “healthy worker effect”: currently employed workers often have better cardiovascular health than the general population (the comparison group) because healthy workers selectively remain in the workplace (or unhealthy workers selectively leave it) [7]. Lastly, information bias arises when the information gathered differs between subjects and controls even though no such difference exists in the source population. For instance, if the methods used to measure exposure or outcome differed between groups and this produced differences between the groups, the observed association would be biased.
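
Confounding can be made concrete with hypothetical counts. In the sketch below (all counts invented for illustration), exposure has no effect within either age stratum, but most exposed people are old and older people have a higher baseline risk, so the crude risk ratio is well above 1. Stratifying by age and pooling with the Mantel-Haenszel risk ratio, one standard adjustment method, removes the distortion. The stratum names and counts are assumptions, not data from the article.

```python
# Hypothetical 2x2 counts per age stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
STRATA = {
    "young": (2, 100, 18, 900),
    "old": (90, 900, 10, 100),
}


def risk_ratio(a, n1, b, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (b / n0)


def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata into one table (ignores age)."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    b = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return risk_ratio(a, n1, b, n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata.values())
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata.values())
    return num / den


for name, counts in STRATA.items():
    print(f"{name}: RR = {risk_ratio(*counts):.2f}")
print(f"crude RR = {crude_risk_ratio(STRATA):.2f}")
print(f"Mantel-Haenszel RR = {mantel_haenszel_risk_ratio(STRATA):.2f}")
```

With these counts both stratum-specific ratios and the Mantel-Haenszel ratio are 1.00, while the crude ratio is about 3.29, entirely because age is distributed differently between the exposed and unexposed groups.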

External validity concerns whether the results of the study can be applied to the target population, i.e., generalizability. If the factors that modify the association between exposure and outcome do not differ between the study population and the target population, the results of the study can be generalized.
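
The role of effect modifiers in generalizability can be sketched with a simple standardization. Assume, hypothetically, that a binary modifier changes the risk difference (0.10 when present, 0.02 when absent) and that these stratum-specific effects are the same in the study and target populations. The overall effect in each population is then the average of the stratum-specific effects weighted by that population's modifier prevalence; the prevalences used here (0.5 in the study, 0.2 in the target) are illustrative only.

```python
def standardized_risk_difference(rd_with_modifier, rd_without_modifier, modifier_prevalence):
    """Overall risk difference as a prevalence-weighted average of stratum-specific ones."""
    if not 0.0 <= modifier_prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return (modifier_prevalence * rd_with_modifier
            + (1.0 - modifier_prevalence) * rd_without_modifier)


# Hypothetical stratum-specific effects, assumed identical in both populations.
RD_WITH, RD_WITHOUT = 0.10, 0.02

study_rd = standardized_risk_difference(RD_WITH, RD_WITHOUT, 0.5)
target_rd = standardized_risk_difference(RD_WITH, RD_WITHOUT, 0.2)
print(f"study population RD:  {study_rd:.3f}")
print(f"target population RD: {target_rd:.3f}")
```

The same stratum-specific effects yield an overall risk difference of 0.060 in the study population but 0.036 in the target population, which is why a difference in the distribution of modifiers limits direct generalization.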

In Table 1, we propose a checklist for assessing the validity of a study, based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) guidelines [8]. Items 1 and 2 evaluate the possibility of random and systematic error. Non-differential random error decreases the precision of the study, whereas information bias may arise when systematic error or differential random error occurs. Items 3 and 4 evaluate potential confounding and whether the methods used to control potential confounders are appropriate. Items 5 and 6 evaluate the possibility of selection bias. Lastly, item 7 evaluates generalizability (Table 1).
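
Items 1 and 2 can be explored with expected (not sampled) proportions. The sketch below assumes a hypothetical case-control study in which the true exposure prevalence is 30% in both cases and controls, so the true odds ratio is 1. When exposure is classified with the same sensitivity and specificity in both groups, no spurious association appears; when cases recall exposure more completely than controls (sensitivities of 0.9 versus 0.7, both assumed), an odds ratio above 1 appears from measurement differences alone.

```python
def observed_exposed_proportion(true_prevalence, sensitivity, specificity):
    """Expected share classified as exposed: true positives plus false positives."""
    return true_prevalence * sensitivity + (1 - true_prevalence) * (1 - specificity)


def odds_ratio(p_cases, p_controls):
    """Exposure odds ratio from exposed proportions among cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


TRUE_PREVALENCE = 0.30  # hypothetical; identical in cases and controls
true_or = odds_ratio(TRUE_PREVALENCE, TRUE_PREVALENCE)

# Non-differential: the same (assumed) accuracy in both groups.
p_same = observed_exposed_proportion(TRUE_PREVALENCE, 0.8, 0.95)
nondifferential_or = odds_ratio(p_same, p_same)

# Differential: cases recall exposure better than controls (assumed recall bias).
p_cases = observed_exposed_proportion(TRUE_PREVALENCE, 0.9, 0.95)
p_controls = observed_exposed_proportion(TRUE_PREVALENCE, 0.7, 0.95)
differential_or = odds_ratio(p_cases, p_controls)

print(f"true OR = {true_or:.2f}")
print(f"non-differential error: OR = {nondifferential_or:.2f}")
print(f"differential error:     OR = {differential_or:.2f}")
```

Under these assumptions the non-differential scenario leaves the odds ratio at 1.00, while the differential scenario produces an odds ratio of about 1.35 with no true association.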

INFERENCE OF GENERAL CAUSALITY

Once the validity of the study is established, the extent to which the evidence supports causality should be considered. Hill [9] provided his well-known 9 viewpoints (Table 2) for this task; they describe the aspects that need to be considered before claiming causality. Although many have cited these viewpoints as “criteria” and misused them as such, Hill himself called them viewpoints and stated explicitly that none of them is essential for inferring causality [9]. Temporality, however, is logically necessary to claim causality, because the cause must precede the effect [10].

Hill described these aspects comprehensively, but some concepts need elaboration before they can be applied to modern epidemiology, especially with regard to environmental exposures. Some environmental exposures, such as asbestos [11] and humidifier disinfectant [12], show associations with specific diseases. However, most exposures are associated with multiple diseases, and most diseases with multiple exposures. Thus, a specific association is certainly strong evidence for causality, but specificity is rarely observed. The biological gradient viewpoint assumes a linear association between exposure and outcome; however, many environmental exposures exhibit non-linear associations with disease, and this should be considered when analyzing and evaluating the evidence. Finally, the 8th viewpoint (experiment) is sometimes misunderstood as referring to laboratory experiments, such as toxicological or animal studies. Evidence from laboratory studies, however, should be considered under biological plausibility. What Hill meant was a study design that incorporates an experimental measure, usually an intervention that modifies exposure (for example, observing whether disease frequency falls after an exposure is reduced or removed). Many studies in environmental epidemiology have applied such designs, and their results provide more robust evidence for causality.

In his speech, Hill mentioned two more aspects of causal inference. The first concerns tests of significance: statistical tests can only indicate whether chance could plausibly explain the association; they do not provide evidence for causality. The second is that uncertainty about a causal relationship is unavoidable and should not be a reason to postpone action [9,13].

A common challenge in epidemiological investigations of environmental exposures is a small sample size and the resulting lack of statistical power. This may happen, for instance, in an investigation of a small area affected by a particular environmental exposure. In such cases, a lack of statistical significance in the estimated association between the health outcome and the exposure of interest should not be regarded as evidence of non-causality. Instead, the strength of the association and the consistency of the pattern of results across different groups or conditions can be weighed in causal inference.
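
The point about small samples can be illustrated with a risk ratio and its 95% confidence interval, computed here with the common log-scale (Wald) approximation. The counts are hypothetical: the same RR of 3 is observed in a small area (6/200 vs 2/200 cases) and in a population ten times larger (60/2000 vs 20/2000). Only the width of the interval changes. The approximation is rough for counts this small, so this is a sketch of the reasoning, not a recommended analysis.

```python
import math


def risk_ratio_ci(a, n1, b, n0, z=1.96):
    """Risk ratio with an approximate 95% CI on the log scale.

    a, n1: cases and total among the exposed; b, n0: cases and total among the unexposed.
    """
    rr = (a / n1) / (b / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)  # SE of log(RR)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


small = risk_ratio_ci(6, 200, 2, 200)       # hypothetical small-area study
large = risk_ratio_ci(60, 2000, 20, 2000)   # hypothetical ten-fold larger study
for label, (rr, lo, hi) in (("small area", small), ("ten-fold larger", large)):
    print(f"{label}: RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

In the small-area data the interval includes 1, so the estimate is not statistically significant, yet the strength of association is identical to that in the larger population, where the interval excludes 1.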

INFERENCE OF INDIVIDUAL CAUSALITY

In this step, it should be investigated whether an individual who developed a disease that is causally associated with a certain environmental factor was actually exposed to that factor. Inference of individual causality is accomplished by deductive reasoning based on the general causality and individual evidence of exposure [14]. Evidence of exposure may be provided by any of the following conditions, which are not mutually exclusive: 1) the disease is specific to the exposure, 2) a biomarker of the factor of interest is detected, or 3) the patient has a valid history of exposure.

In the sufficient-component cause model, disease occurs when a sufficient cause is complete. A sufficient cause consists of multiple component causes, and a disease may have several sufficient causes, each made up of a different set of component causes [6]. When general causality has been inferred and the patient’s exposure to the relevant environmental factor can be assumed, the exposure was probably one of the component causes. Because it is impossible to determine which sufficient cause produced the patient’s disease, the presence of other component causes should not be interpreted as evidence that the component cause of interest did not contribute to the occurrence of the disease.
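
The sufficient-component cause reasoning can be sketched with sets. The component labels below (E for the environmental exposure of interest; A, B, U1, and U2 for other, partly unknown components) and the two sufficient causes are hypothetical. In this sketch a disease occurs when every component of at least one sufficient cause is present.

```python
from typing import FrozenSet, List, Set

# Hypothetical sufficient causes for one disease, each a set of component causes.
# "E" is the environmental exposure of interest; the other labels are illustrative.
SUFFICIENT_CAUSES: List[FrozenSet[str]] = [
    frozenset({"E", "A", "U1"}),
    frozenset({"B", "U2"}),
]


def completed_causes(components: Set[str]) -> List[FrozenSet[str]]:
    """Return the sufficient causes whose every component is present."""
    return [sc for sc in SUFFICIENT_CAUSES if sc <= components]


def disease_occurs(components: Set[str]) -> bool:
    """Disease occurs if at least one sufficient cause is complete."""
    return bool(completed_causes(components))


patient = {"E", "A", "U1", "B"}        # also has component B, but not U2
print(disease_occurs(patient))          # the {E, A, U1} sufficient cause is complete
print(disease_occurs(patient - {"E"}))  # without E, no sufficient cause is complete
```

For this hypothetical patient, removing E leaves no complete sufficient cause, even though another component cause (B) is present. In practice the unknown components (here U1 and U2) cannot be observed, which is why the presence of other component causes does not rule out a contribution of the exposure.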

In some previous tort cases, a relative risk (RR) or odds ratio ≥ 2 was considered a benchmark for causality [15]. This is based on the notion that the attributable fraction (AF) equals the probability that the exposure caused the disease, or probability of causation (PC). Because the AF among the exposed is (RR - 1)/RR, RR ≥ 2 corresponds to AF ≥ 50%, and under this notion this was interpreted as meeting the standard of “more likely than not.” However, equating AF with PC depends on “restrictive assumptions” that are “unwarranted in typical cases,” and AF usually underestimates PC [16]. In light of this, RR ≥ 2 or AF ≥ 50% does not correspond to “more likely than not.” Rather, AF, which can be derived from RR, should be interpreted as the lower bound of PC [16,17].
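
The arithmetic behind the RR ≥ 2 benchmark can be sketched in a few lines. This assumes the standard excess-fraction formula for the exposed, AF = (RR - 1)/RR; following the text, the result is read as a lower bound on PC rather than as PC itself. Flooring the value at 0 for RR below 1 is an assumption of this sketch.

```python
def attributable_fraction_exposed(rr: float) -> float:
    """Excess fraction among the exposed, (RR - 1) / RR, floored at 0 for RR < 1."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return max(0.0, (rr - 1.0) / rr)


for rr in (1.5, 2.0, 3.0):
    af = attributable_fraction_exposed(rr)
    # Interpreted, as in the text, as a lower bound on the probability of causation.
    print(f"RR = {rr}: AF = {af:.0%} (lower bound on PC)")
```

RR = 2 gives AF = 50%, the threshold used in the tort cases; an RR of 1.5 gives about 33%, which under the interpretation above still leaves PC undetermined above that bound.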

CONCLUSION

In summary, when the evidence supports causality and the studies that produced the evidence are valid, general causality can be inferred. Individual causality can then be inferred based on the general causality and evidence of individual exposure. Inferring and interpreting causality may have different meanings depending on the purpose. The process presented in this article is intended to infer causality as a scientific truth, and additional considerations may be needed for social and legal purposes [15]. The decision to act depends on whether the evidence is sufficient to constitute a case for action, and what counts as sufficient evidence may differ from case to case. Under the precautionary principle, relatively slight evidence may be enough to justify action, especially in environmental health issues, where the cost of inaction probably far exceeds that of wrong action.

ACKNOWLEDGEMENTS

We would like to thank Kuck Hyeun Woo, Jaechul Song, HoJang Kwon, Byoung-Gwon Kim, Hong Jae Chae, Man Joong Jeon, Tae-Won Jang, Young Ill Lee, Se-Yeong Kim, Hae-Kwan Jeong, Seung-Sik Hwang for their invaluable comments and advice on the study.

The study was performed as a part of the project (A Study on the Advancement of Environmental Epidemiological Survey), which was financially supported by the Korean Ministry of Environment, 2016.

Conflict of interest

The authors have no conflicts of interest associated with the material presented in this paper.

REFERENCES

1. Kaufman JS, Kaufman S, Poole C. Causal inference from randomized trials in social epidemiology. Soc Sci Med 2003;57(12):2397-2409.

2. Hume D. A treatise of human nature. Oxford: Clarendon Press; 1739.

3. Kundi M. Causality and the interpretation of epidemiologic evidence. Environ Health Perspect 2006;114(7):969-974.

4. Flanders WD, Klein M. A general, multivariate definition of causal effects in epidemiology. Epidemiology 2015;26(4):481-489.

5. Kim HJ, Lee MS, Hong SB, Huh JW, Do KH, Jang SJ, et al. A cluster of lung injury cases associated with home humidifier use: an epidemiological investigation. Thorax 2014;69(8):703-708.

6. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 5-10.

7. McMichael AJ. Standardized mortality ratios and the “healthy worker effect”: scratching beneath the surface. J Occup Med 1976;18(3):165-168.

8. Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 2011;64(4):401-406.

9. Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;58:295-300.

10. Greenland S. Evolution of epidemiologic ideas: annotated readings on concepts and methods. Chestnut Hill: Epidemiology Resources Inc; 1987. p. 14.

11. Churg A, Wright JL, Vedal S. Fiber burden and patterns of asbestos-related disease in chrysotile miners and millers. Am Rev Respir Dis 1993;148(1):25-31.

12. Ha M, Lee SY, Hwang SS, Park H, Sheen S, Cheong HK, et al. Evaluation report on the causal association between humidifier disinfectants and lung injury. Epidemiol Health 2016;38:e2016037.

13. Phillips CV, Goodman KJ. The missed lessons of Sir Austin Bradford Hill. Epidemiol Perspect Innov 2004;1(1):3.

14. Park DW, Paek D. Protocol for work-relevance investigation in work-related disease. Ulsan: Korea Occupational Safety & Health Agency; 2013. p. 17 (Korean).

15. Cole P. Causality in epidemiology, health policy, and law. Environ Law Report News Anal 1997;27:10279-10285.

16. Greenland S, Robins JM. Epidemiology, justice, and the probability of causation. Jurimetrics 2000;40(3):321-340.

17. Broadbent A. Philosophy of epidemiology. Basingstoke: Palgrave Macmillan; 2013. p. 162-181.

Table 1.

Checklist for evaluating validity of a study

Item	Considerations
1	Is there any possibility of error in measurements of exposure and outcome?
2	Is there any difference in methods of measurement between groups?
3	Are the methods used to control confounders (e.g., adjustment, stratification, restriction) appropriate?
4	Is there any uncontrolled potential confounder?
5	Does the probability of being selected as a study participant differ between groups?
6	Are the characteristics of study participants comparable to those of the source population?
7	Is the result generalizable to the target population?

Table 2.

Hill’s viewpoints on causal association [9]

No.	Viewpoints	Meanings
1	Strength of association	The stronger the observed association, the more probable it is that the association is causal
2	Consistency	Consistent findings across studies support causality
3	Specificity	When a specific exposure is associated with a specific disease, this supports causality
4	Temporality	Cause must precede the effect
5	Biological gradient	Causality is more probable when a higher level of exposure is associated with a higher level of the outcome
6	Plausibility	When the association is biologically plausible, it is more probable that the association is causal
7	Coherence	The observed association is in accordance with previous knowledge
8	Experiment	When the association agrees with results from an experimental study, the association is more likely to be causal
9	Analogy	An established causal relationship involving a similar exposure or outcome may be used to explain the causality of the observed association