Nonparametric Estimation
The lifetimes or failure times of people, objects, and other living things are usually studied within a particular group, referred to as the population of interest. For instance, populations of interest may be students in the fourth grade, incarcerated people pending trial, or pieces of equipment with a finite lifetime. Using random variables and probability, statisticians and researchers model the lifetime of members of such a population with a random variable, say T, that follows a certain survival distribution. However, these survival distributions are usually unknown. Consequently, they have to be estimated using either parametric or nonparametric estimation models. This paper seeks to summarize, describe, and critique an estimator belonging to the family of nonparametric estimators, the Kaplan-Meier Estimator (KME), when applied to censored data.
Nonparametric estimation is a statistical strategy that allows the functional form of a model to be fitted without guidance or constraints from theory. Nonparametric estimation procedures therefore do not rely on a fixed, finite set of distributional parameters (Fleming & Harrington, 2014). There are a number of nonparametric estimators. One of the most popular and effective is the KME (Staub & Gekenidis, 2013). Like many other estimators, the KME is used to approximate the survival function in survival analysis. Survival functions are generally expressed as (Hong, 2011):
$$S(t) = P(T > t) = 1 - F(t),$$
where T is the random survival time and F is its distribution function. The function represents the likelihood that a subject (a member of the population of interest) will survive past time t.
Censoring of Data
Also known as reliability analysis or lifetime data analysis, survival analysis is applied in varied areas to analyze data on the time between the occurrence of two or more events (Gillen, 2016). In reliability analysis, determining the survival time depends on knowing two significant time points in the life of the subject: the time of origin, such as birth, and the time of failure, such as death. The period between origin and failure is called the time of uncertainty, and the subject is said to be at risk during it (Leung, Elashoff, & Afifi, 2012). Censoring arises in survival analysis when some information about the survival time of members of the group of interest is unavailable or unknown.
Suppose we want to determine whether boys and girls relapse into alcoholism for different reasons. We have a sample of 30 girls and 46 boys undergoing treatment at an alcohol treatment clinic run by WAADA. Various social, demographic, and psychological factors are tracked and evaluated for five months to ascertain whether girls and boys relapse for different reasons. In the course of the five months, 18 subjects (5 girls and 13 boys) are lost to follow-up, while 30 subjects (10 girls and 20 boys) relapse. For the subjects lost to follow-up, the survival period is known only to extend beyond the last time they were observed; its exact value is unknown (Leung, Elashoff, & Afifi, 2012).
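The bookkeeping in this example can be sketched in a few lines of Python. The counts come from the paragraph above; the variable names are illustrative, and treating every subject who neither relapsed nor dropped out as censored at the end of the five months is an assumption that the example does not state explicitly.

```python
# Counts taken from the example above; names are illustrative only.
enrolled = {"girls": 30, "boys": 46}
lost_to_follow_up = {"girls": 5, "boys": 13}
relapsed = {"girls": 10, "boys": 20}

summary = {}
for group in enrolled:
    # Assumption: subjects who neither relapsed nor dropped out are still
    # relapse-free when the study ends, so they are censored at five months.
    still_followed = enrolled[group] - lost_to_follow_up[group] - relapsed[group]
    summary[group] = {
        "events": relapsed[group],
        "censored": lost_to_follow_up[group] + still_followed,
    }

for group, counts in summary.items():
    print(group, counts)
```

Under that assumption, only the relapses count as observed events; every other subject contributes a censored survival time.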
Kaplan–Meier Estimator
The KME examines the probability that a subject survives a very short interval of time between the time of origin and the time of failure, that is, while the subject is at risk. Generally, the KME can be defined as the limit of the life table estimator as the length of the intervals approaches zero.
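The KME of the survival function, $\hat{S}(t)$, is a consistent estimator of $S(t)$. The KME can also be used to estimate the distribution function $F(x) = 1 - S(x)$: the KME estimator of $F(x)$ is $\hat{F}(x) = 1 - \hat{S}(x)$.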
Long-term Survivors
When the conditions are appropriate, the KME estimator of F(x), $\hat{F}(x)$, retains its consistency properties even when F(x) is an improper distribution function. Consequently, the KME estimator of F(x) can be used to estimate F(x), including when long-term survivors exist (Lauter & Liero, 2011). For instance, let $t_{(n)}$ denote the largest observed time in a sample of size $n$.
If the largest observation is not censored, the estimated survival function drops to zero at that time. This implies that
$$\hat{F}(t_{(n)}) = 1.$$
However, if the largest observation is censored, then
$$\hat{F}(t_{(n)}) < 1.$$
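The effect of censoring the largest observation can be checked numerically. The sketch below is a minimal Python illustration, assuming the estimator of F is taken as one minus the product-limit survival estimate; the function name and both samples are hypothetical and were made up for this illustration.

```python
def km_distribution_at_end(observations):
    """Return the Kaplan-Meier estimate of F at the largest observed time.

    observations: list of (time, event) pairs, where event is True for a
    failure and False for a censored observation.
    """
    survival = 1.0
    event_times = sorted({t for t, event in observations if event})
    for t in event_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, event in observations if time == t and event)
        survival *= 1 - deaths / at_risk
    return 1 - survival


# Hypothetical samples that differ only in the status of the largest time.
largest_uncensored = [(2, True), (3, False), (5, True), (8, True)]
largest_censored = [(2, True), (3, False), (5, True), (8, False)]

print(km_distribution_at_end(largest_uncensored))  # estimate reaches 1
print(km_distribution_at_end(largest_censored))    # estimate stays below 1
```

In the second sample the estimate levels off below one, which is the pattern that suggests long-term survivors may be present.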
Therefore, the Kaplan-Meier survival estimate is defined as the probability of surviving a given period, computed by dividing the observation time into many small intervals. However, when estimating a survival distribution using the KME, a number of assumptions must be made. We assume that:
- The event takes place at the recorded time.
- At any given time, the censored patients have the same survival prospects as those who are still being followed.
- The survival probabilities are the same for subjects enrolled early and late in the study.
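In general, the KME is defined as
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{r_i}\right),$$
where the $t_i$ are the distinct event times, $d_i$ is the number of events at $t_i$, and $r_i$ is the number of subjects at risk just before $t_i$.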
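A minimal Python sketch of the product-limit calculation follows. It assumes the usual form in which the running estimate is multiplied by 1 - d/r at each event time (d events among r subjects at risk); the function name `kaplan_meier` and the small sample are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (event time, d, r, survival estimate) tuples.

    times:  observed times, one per subject
    events: True if the subject failed at that time, False if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    rows = []
    survival = 1.0
    for t in sorted(deaths):
        r = sum(1 for u in times if u >= t)  # subjects still at risk at t
        d = deaths[t]
        survival *= 1 - d / r                # one factor of the product
        rows.append((t, d, r, survival))
    return rows


# Hypothetical sample: five subjects, two of them censored.
sample_times = [3, 5, 5, 8, 10]
sample_events = [True, False, True, True, False]

for row in kaplan_meier(sample_times, sample_events):
    print(row)
```

Censored subjects never trigger a factor of their own; they only stop being counted in r once their censoring time has passed.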
Data Analysis
Suppose we have data on patients undergoing standard ARV treatment. The following data show the number of days of survival among patients who took part in the clinical trial, where * marks participants who were still alive after the stated number of days in the trial (censored observations) (Goel, Khanna, & Kishore, 2010).
Raw Data (survival time in days; * marks a censored observation)
| 6 | 39 | 115* | 263 | 346* |
| 12 | 43 | 139* | 270 | 365* |
| 21 | 43 | 181* | 295* | |
| 27 | 46* | 211* | 311 | |
| | | 217* | | |
| 32 | 89 | 261 | 335* | |
(Goel, Khanna, & Kishore, 2010)
The first step is to tabulate the data so that they are easier to sort and rearrange.
| Time of event, days (t) | Number of patients who died (d) | Number alive at the start of the day (r) | Estimated probability of death (d/r) | Estimated probability of survival (1-(d/r)) | Estimated probability of surviving to the end of time t (L) |
|---|---|---|---|---|---|
| 6 | 1 | 23 | | | |
| 12 | 1 | 22 | | | |
| 21 | 1 | 21 | | | |
| 27 | 1 | 20 | | | |
| 32 | 1 | 19 | | | |
| 39 | 1 | 18 | | | |
| 43 | 2 | 17 | | | |
| 89 | 1 | 14 | | | |
| 261 | 1 | 8 | | | |
| 263 | 1 | 7 | | | |
| 270 | 1 | 6 | | | |
| 311 | 1 | 4 | | | |
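The d and r columns can be derived directly from the raw data. The sketch below is one possible way to do the tabulation in Python; the variable names are illustrative, and the rule used for r (every patient whose recorded time is at least the event day is counted as alive at the start of that day) is the reading of the table assumed here.

```python
from collections import Counter

# Raw survival times in days; a trailing * marks a censored patient.
raw = ("6 12 21 27 32 39 43 43 46* 89 115* 139* 181* 211* 217* "
       "261 263 270 295* 311 335* 346* 365*").split()

# Split each entry into (days, died).
observations = [(int(x.rstrip("*")), not x.endswith("*")) for x in raw]

deaths_per_day = Counter(day for day, died in observations if died)

table = []
for day in sorted(deaths_per_day):
    # r: patients whose recorded time is at least `day`.
    r = sum(1 for time, _ in observations if time >= day)
    table.append((day, deaths_per_day[day], r))

for row in table:
    print(row)  # (t, d, r)
```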
Using the above functions and the following Excel formulae, we compute the respective probabilities:
| Row | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1-2 | | | | Estimated probability | Estimated probability | Estimated probability |
| 3 | Time of event (t) | Number of patients who died (d) | Number alive at the start of the day (r) | Death (d/r) | Survival (1-(d/r)) | Probability of surviving to the end of time t (L) |
| 4 | 6 | 1 | 23 | =B4/C4 | =1-D4 | =E4 |
| 5 | 12 | 1 | 22 | =B5/C5 | =1-D5 | =F4*E5 |
| 6 | 21 | 1 | 21 | =B6/C6 | =1-D6 | =F5*E6 |
| 7 | 27 | 1 | 20 | =B7/C7 | =1-D7 | =F6*E7 |
| 8 | 32 | 1 | 19 | =B8/C8 | =1-D8 | =F7*E8 |
| 9 | 39 | 1 | 18 | =B9/C9 | =1-D9 | =F8*E9 |
| 10 | 43 | 2 | 17 | =B10/C10 | =1-D10 | =F9*E10 |
| 11 | 89 | 1 | 14 | =B11/C11 | =1-D11 | =F10*E11 |
| 12 | 261 | 1 | 8 | =B12/C12 | =1-D12 | =F11*E12 |
| 13 | 263 | 1 | 7 | =B13/C13 | =1-D13 | =F12*E13 |
| 14 | 270 | 1 | 6 | =B14/C14 | =1-D14 | =F13*E14 |
| 15 | 311 | 1 | 4 | =B15/C15 | =1-D15 | =F14*E15 |
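For readers working outside a spreadsheet, the same three columns can be computed with a short script. This is a sketch that mirrors the worksheet formulae one for one; the list names `column_d`, `column_e`, and `column_f` are illustrative labels chosen to match the worksheet columns.

```python
# Columns B and C of the worksheet: deaths (d) and number at risk (r).
deaths = [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
at_risk = [23, 22, 21, 20, 19, 18, 17, 14, 8, 7, 6, 4]

column_d = [d / r for d, r in zip(deaths, at_risk)]  # =B/C
column_e = [1 - p for p in column_d]                 # =1-D

# Column F: the first row is =E4, each later row is =F(previous)*E.
column_f = []
running = 1.0
for survival in column_e:
    running *= survival
    column_f.append(running)

for d_val, e_val, f_val in zip(column_d, column_e, column_f):
    print(f"{d_val:.4f}  {e_val:.4f}  {f_val:.4f}")
```

Starting the running product at 1.0 makes the first row equal to its own survival probability, which is what the `=E4` cell does in the worksheet.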
| Time of event, days (t) | Number of patients who died (d) | Number alive at the start of the day (r) | Estimated probability of death (d/r) | Estimated probability of survival (1-(d/r)) | Estimated probability of surviving to the end of time t (L) |
|---|---|---|---|---|---|
| 6 | 1 | 23 | 0.0435 | 0.9565 | 0.9565 |
| 12 | 1 | 22 | 0.0455 | 0.9545 | 0.9130 |
| 21 | 1 | 21 | 0.0476 | 0.9524 | 0.8696 |
| 27 | 1 | 20 | 0.0500 | 0.9500 | 0.8261 |
| 32 | 1 | 19 | 0.0526 | 0.9474 | 0.7826 |
| 39 | 1 | 18 | 0.0556 | 0.9444 | 0.7391 |
| 43 | 2 | 17 | 0.1176 | 0.8824 | 0.6522 |
| 89 | 1 | 14 | 0.0714 | 0.9286 | 0.6056 |
| 261 | 1 | 8 | 0.1250 | 0.8750 | 0.5299 |
| 263 | 1 | 7 | 0.1429 | 0.8571 | 0.4542 |
| 270 | 1 | 6 | 0.1667 | 0.8333 | 0.3785 |
| 311 | 1 | 4 | 0.2500 | 0.7500 | 0.2839 |
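Because the estimate only changes at event times, the table can be read as a step function: the estimated probability of surviving beyond any given day is the value in the last row whose event time does not exceed that day. The helper below is a small sketch of that lookup; the function name `survival_beyond` is hypothetical, and the values are copied from the table above.

```python
from bisect import bisect_right

# Event times and cumulative survival probabilities from the table above.
event_times = [6, 12, 21, 27, 32, 39, 43, 89, 261, 263, 270, 311]
cumulative_survival = [0.9565, 0.9130, 0.8696, 0.8261, 0.7826, 0.7391,
                       0.6522, 0.6056, 0.5299, 0.4542, 0.3785, 0.2839]


def survival_beyond(day):
    """Estimated probability of surviving past `day` (step function)."""
    index = bisect_right(event_times, day)  # number of event times <= day
    return 1.0 if index == 0 else cumulative_survival[index - 1]


print(survival_beyond(5))    # before the first death
print(survival_beyond(100))  # between the deaths at days 89 and 261
print(survival_beyond(311))  # at the last recorded death
```

The lookup should not be read beyond day 365, the largest observation in the data, since nothing was observed after that point.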
Following up subjects in a study is never a perfect process. During the study, some subjects become unavailable, which complicates the analysis. The KME resolves this complication and makes survival analysis possible even when subjects become unavailable (Vassarstats, n.d.). The KME recognizes that any attempt by the researcher to salvage information about the unavailable subjects will involve some form of ‘fudge.’ Consequently, the method accounts for unavailable subjects by counting them among the survivors for the period in which they were last observed and excluding them from the subjects at risk in the following periods. Kaplan and Meier note that “these conventions may be paraphrased by saying that deaths recorded as of [time] t are treated as if they occurred slightly before t, and losses recorded as of [time] t are treated as occurring slightly after t. In this way, the fudging is kept conceptual, systematic, and automatic…” (Kotz & Johnson, 2012). This is the greatest strength of the KME.
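The convention in the quotation matters only when a death and a loss are recorded at the same time. The sketch below illustrates the difference on a hypothetical four-subject sample; the function `risk_set_size` and its `losses_after_deaths` switch are illustrative names, not part of any library.

```python
def risk_set_size(observations, t, losses_after_deaths=True):
    """Count the subjects at risk at time t.

    With losses_after_deaths=True (the Kaplan-Meier convention quoted
    above), a subject censored exactly at t still counts as at risk at t.
    """
    size = 0
    for time, died in observations:
        if time > t:
            size += 1
        elif time == t and (died or losses_after_deaths):
            size += 1
    return size


# Hypothetical sample with a death and a loss recorded on the same day (5).
sample = [(2, True), (5, True), (5, False), (9, True)]

km_size = risk_set_size(sample, 5)                              # 3 at risk
alt_size = risk_set_size(sample, 5, losses_after_deaths=False)  # 2 at risk

# Survival factor for day 5 under each convention (one death on that day).
print(1 - 1 / km_size, 1 - 1 / alt_size)
```

Treating the loss as occurring slightly after the death keeps the censored subject in the denominator for that day, so the estimated survival factor is larger than it would be under the opposite convention.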
However, the KME has some disadvantages. It is mainly descriptive: the information derived from the estimator reveals only basic information about the distribution. Additionally, because the KME assumes time-independent variables, it cannot account for time-dependent variables. Lastly, the KME does not control for covariates (Rich et al., 2014).
Conclusion
The Kaplan-Meier estimator is a very important tool in survival analysis. The KME can not only estimate the survival distribution over very short intervals but also estimate time-specific survival probabilities for the population of interest. The estimator also provides a way of accounting for unavailable subjects.
References
Fleming, T., & Harrington, D. (2014). Nonparametric estimation of the survival distribution in censored data. Communications in Statistics-Theory and Methods, 2469-2486.
Gillen, D. (2016, January 7). Estimating the Survival Distribution. Retrieved from the University of California, Irvine: https://www.ics.uci.edu/~dgillen/STAT255/Handouts/lecture2.pdf
Goel, M., Khanna, P., & Kishore, J. (2010). Understanding survival analysis: Kaplan-Meier estimate. International Journal of Ayurveda Research, 274–278. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3059453/
Hong, H. (2011). Basic Nonparametric Estimation. Retrieved from Stanford University: https://web.stanford.edu/~doubleh/eco273/newslides4.pdf
Kotz, S., & Johnson, N. (2012). Breakthroughs in Statistics: Methodology and Distribution. New York: Springer.
Lauter, H., & Liero, H. (2011, August). Nonparametric Estimation and Testing in Survival Models. Retrieved from Universität Potsdam: https://publishup.uni-potsdam.de/opus4-ubp/frontdoor/deliver/index/docId/4874/file/Preprint_2004_05.pdf
Leung, K.-M., Elashoff, R., & Afifi, A. (2012). Censoring issues in survival analysis. Annual Review of Public Health, 83-104.
Rich, J., Neely, G., Paniello, R., Voelker, C., Nussenbaum, B., & Wang, E. (2014). A practical guide to understanding Kaplan-Meier curves. HHS Public Access, 331–336.
Staub, L., & Gekenidis, A. (2013, February 3). Kaplan Meier survival curves and the log-rank test. Retrieved from Slide Share: https://www.slideshare.net/zhe1/kaplan-meier-survival-curves-and-the-logrank-test?from_action=save
Vassarstats. (n.d.). Kaplan-Meier Survival Probability Estimates. Retrieved from vassarstats: https://vassarstats.net/survival.html