Skip to main content

Nonparametric Estimation of the Time to the Discovery of a New Specie

Species inventories that list of all species present in a given area are an important tool for both the study of bio-diversity and conservation biology. These lists are typically obtained from fields studies in which biologists record all the species they can observed over a finite time period. Because of the possible presence of rare, and thus hard to observe species, completeness of such lists can never be guaranteed, regardless of the amount of time and energy spent in compiling them. A natural question hence is: How much longer do I need to search until I observe a species I have not yet seen.

For conservation biologists, these predictions provide a measure of the quality of an inventory by quantifying the diminishing return of continued sampling to increase the completeness of the list. For software engineer, they serve as a measure of quality of the software under development by quantifying the time that given software may run before encountering an unforeseen new bug. Bounds on the time to the next discovery are also important in the context of sequential inventorying, for which they help determine reasonable stopping rules.

In my talk, I will present a methodology for obtaining lower bounds for the time of the next discovery without specification of the distribution of the capture probabilities. For this, I will need to look at the statistical properties of estimates of estimates for the expected value of the last observation of a sequence of random variables with increasing means.