Abstracts of Papers Presented at the 40th Anniversary Celebration of FSU's Statistics Department,
April 21-22, 2000
Friday, April 21, 2000
SESSION I
The First and the Fortieth
Ralph A. Bradley, University of Georgia.
For the 40th, a welcome is appropriate. For the 1st, few present will know the setting, and some remarks may be interesting. It was the post-Sputnik era, with great American concern that we had fallen behind the Russians in science and technical know-how. There was a spate of new Federal fellowship programs and training grants in areas deemed to have inadequate numbers and quality of personnel, and science development grants for both university and program development. It is said that timing is everything, and we indicate how the new department was able to take full advantage of the opportunities.
Investment Strategies: Maximize Your Portfolio of Expertise
Ron Hobbs, Twin Action Properties, Inc.
A Least Squares Solution: Don't be a square! Add to your statistical expertise with a well-rounded background of social, business, technical, and communications skills. Move Arts into the Arts and Sciences!
Toward 2001: A Statistics Odyssey
Richard L. Scheaffer, University of Florida.
In its voyage over the last 40 years, the FSU Statistics Department
has become one of the most highly respected research programs in the world,
with many additional contributions to the teaching and practice of statistics.
During that same period, statistics as a discipline has matured to become
a respected member of the mathematical sciences community at all levels.
Significant events along this voyage will be highlighted.
Opportunities abound, but dangers still lurk along the pathways
and it still requires great effort and vigilance on the part of the statistics
community to change opportunities into programs that will endure well into
the next century. There is every possibility, though, that our statistical
odysseys will come to successful and happy conclusions.
On the Asymptotic Properties of Least Trimmed
Squares Estimators
Constance L. Wood and Arnold J. Stromberg, University of Kentucky.
Rousseeuw (1983, 1984) proposed the high breakdown linear regression estimators Least Trimmed Squares (LTS) and Least Median of Squares (LMS). These estimators perform well with up to 50% contamination. Here we consider the strong consistency of such estimators and derive the asymptotic distribution of the LTS under mild restrictions on the regression variables, the distribution function of the errors, and the mode of the density of the distribution of the errors. Assuming no contamination, the asymptotic relative efficiency of the LTS estimator with 50% trimming, relative to the Least Squares estimator, is shown to be only 7% (35%).
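As a rough illustration of the LTS objective (not the authors' theory or the exact algorithm), the sketch below minimizes the sum of the h smallest squared residuals by a simple random-subset search; the function name and search strategy are assumptions for illustration only.

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=500, seed=0):
    """Approximate Least Trimmed Squares: minimize the sum of the h smallest
    squared residuals. Random elemental-subset search, for illustration only."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = h or (n + p + 1) // 2          # roughly 50% trimming
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)        # elemental subset
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r2 = np.sort((y - X @ beta) ** 2)[:h]             # h smallest squared residuals
        if r2.sum() < best_obj:
            best_obj, best_beta = r2.sum(), beta
    return best_beta, best_obj
```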
Confidence Limits for the Onset and Duration of Treatment
Effect
Dennis Boos, North Carolina State University.
Studies of biological variables such as those based on blood chemistry
often have measurements taken over time at closely spaced intervals for
groups of individuals. Natural scientific questions may then relate to
the first time that the underlying population curve crosses a threshold
(onset) and to how long it stays above the threshold (duration). In this
paper we give general confidence regions for these population quantities.
The regions are based on the intersection-union principle and may be applied
to totally nonparametric, semiparametric, or fully parametric models where
level-α tests exist pointwise at each time point.
A key advantage of the approach is that no modeling of the correlation
over time is required.
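As a rough illustration of the pointwise-testing idea (not the paper's exact construction), one can run a one-sided level-α test at each time point and read off the times at which the population curve is confirmed to be above the threshold; by the intersection-union argument, no multiplicity adjustment is needed for the resulting onset and duration statements. A minimal sketch, assuming a subjects-by-times data matrix:

```python
import numpy as np
from scipy import stats

def above_threshold_window(Y, times, threshold, alpha=0.05):
    """Y: (n_subjects, n_times) measurements; returns a conservative upper
    bound for onset and lower bound for duration based on pointwise
    level-alpha one-sided t-tests (illustration of the intersection-union idea)."""
    confirmed = []
    for j, t in enumerate(times):
        stat, p_two = stats.ttest_1samp(Y[:, j], threshold)
        p_one = p_two / 2 if stat > 0 else 1 - p_two / 2   # one-sided p-value
        if p_one < alpha:
            confirmed.append(t)
    if not confirmed:
        return None, 0.0
    onset_upper = min(confirmed)                  # curve is above threshold by this time
    duration_lower = max(confirmed) - min(confirmed)   # assumes a contiguous window
    return onset_upper, duration_lower
```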
SESSION II
Revisiting Some Lifetesting Problems Developed at
FSU
Ibrahim A. Ahmad, University of Central Florida.
Proschan and Pyke (1967) and Hollander and Proschan (1972, 1975) developed the founding testing procedures for the three most widely used classes of life distributions: the increasing failure rate (IFR), the new better than used (NBU), and the new better than used in expectation (NBUE) classes. These procedures, while sound and interesting, are not easy to teach to undergraduate or beginning graduate statistics students. In the current talk, we take a new look at these fundamental problems and offer extremely simple testing procedures for these cases that are easy to teach in introductory classes and, at the same time, are better (in the sense of asymptotic relative efficiency) than the earlier ones. We also look at the power of all of the tests, and the new procedures show better power as well.
Advances in Aggregate Exposure Assessment
Bob Sielken, JSC Sielken.
There have been several recent advances in the methods used for
exposure assessment that aggregate doses over multiple routes (ingestion,
inhalation, dermal) and multiple pathways (food, water, residential, and
institutional). Probabilistic techniques allow the characterization of
the distribution of aggregate exposures among the individuals in a population.
A profile of route-specific doses over the days in a year is constructed
for each randomly sampled individual from the population. A variety of
databases are used to determine the age, gender, location, type of residence,
activity patterns, use of agrochemicals, etc. of each sampled individual.
The focus on an individual promotes temporal, spatial, and demographic
consistency. Acute, short term, intermediate term, and chronic doses are
determined from each individual's 365-day age-specific dose profile. Groups
of individuals are combined to characterize subpopulations of interest.
Some outstanding issues are discussed. The emerging Cumulative and Aggregate Risk Evaluation System (CARES) tool is introduced.
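A toy sketch of the person-oriented simulation described above; the route names, distributions, and dose summaries are hypothetical placeholders, not CARES or any regulatory model.

```python
import numpy as np

rng = np.random.default_rng(1)
ROUTES = ["ingestion", "inhalation", "dermal"]   # hypothetical route set

def simulate_individual(days=365):
    """Build a 365-day profile of route-specific daily doses for one randomly
    sampled individual, then aggregate across routes."""
    profile = {r: rng.lognormal(mean=-2.0, sigma=1.0, size=days) for r in ROUTES}
    daily_total = sum(profile.values())            # aggregate dose per day
    return {
        "acute": daily_total.max(),                # worst single day
        "chronic": daily_total.mean(),             # average over the year
    }

population = [simulate_individual() for _ in range(10_000)]
chronic = np.array([p["chronic"] for p in population])
print("95th percentile of chronic aggregate dose:", np.quantile(chronic, 0.95))
```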
Intensity-Based Goodness-of-Fit Tests and
Generalized Residuals in Failure-Time Models
Edsel A. Peña, Bowling Green State University.
The use of intensity processes or hazard functions is a natural
and convenient way of specifying failure-time models arising in engineering
and reliability life-testing studies, medical or clinical trials, actuarial
settings, and in economic situations. It is typical in such studies, where
the primary outcome variable is the time to occurrence of an event, to
have data which contains truncated and/or censored observations. Two problems
of interest in these situations are to develop goodness-of-fit procedures,
and to develop methods for validating the assumed model. In this talk I
will describe a general approach to constructing a class of goodness-of-fit tests for these intensity- or hazard-based models in the presence of incomplete data. The resulting procedures possess optimality properties and include as special cases generalizations of tests used with complete data -- in particular, a generalization of Pearson's chi-square test is obtained. The test statistics depend on generalized residuals, and the theoretical development of the procedures reveals certain properties of these residuals. The results shed light on the validity of existing, ad hoc model validation methods which rely on these generalized residuals.
Reliability Estimation based on Ranked Set Sampling
Emad El-Neweihi, University of Illinois at Chicago.
In this paper we consider the problem of estimating the reliability
of an exponential component based on a Ranked Set Sample (RSS) of size
n. Given the first r observations of that sample, r = 1,...,n, we construct
an unbiased estimator for this reliability and show that these n unbiased
estimators are the only ones in a certain class of estimators. The variances
of some of these estimators are compared. By viewing the observations of
the RSS of size n as the lifetimes of n independent k-out-of-n systems,
k = 1,...,n, we are able to utilize known properties of these systems in
conjunction with the powerful tools of majorization and Schur functions
to derive our results.
Censoring on the Cause of Failure
Frank M. Guess, University of Tennessee.
When knowledge is partial or missing on the cause of failure,
we have censoring on cause. This is related to, but very different from,
the standard time or response censoring. Since many faculty and students,
present and past, at Florida State University have done much work in this
area, I review some of that research. These problems of censoring on cause
have great applications in biomedical, business, and engineering settings.
Scholars, here and elsewhere, have developed both classical and Bayesian
approaches. I discuss some of my work from IBM research grants, plus stress
the needs in many businesses for further aggressive growth of techniques.
Saturday, April 22, 2000
SESSION III-A
Baseball's All-Time Best Hitters: How Statistics
Can Level the Playing Field.
Michael J. Schell, University of North Carolina.
Tony Gwynn, a current player for the San Diego Padres, is identified
as the "best hitter" for average in American baseball history, after adjustments
are made for longevity, league batting average, league talent, and home
ballpark. The league talent adjustment, inspired by a remark from Dr. Debabrata
Basu, plays a critical role in moving Gwynn past Ty Cobb, who has long
been heralded as baseball's best hitter for average. Due to the four adjustments,
whose appropriateness is defended, baseball stars like Willie Mays, Hank
Aaron, Pete Rose and Mickey Mantle are able to claim their rightful spots
among baseball's best 100 hitters.
The Tale of a Government Executive
Ruth Ann Killion, Bureau of the Census.
Following my studies at Florida State, I have enjoyed a long career in State and Federal Government, with a side trip to the nonprofit
world of church work. And I would not trade the experiences of each of
these jobs. I attribute my success in the world of applied statistical
work, and church and government policy, to several things. One is an ability
to quickly comprehend the "big picture"; another is hard work. In this
talk, I am going to come out of hiding and express the value of my theoretical statistical education at Florida State University. "What!?!", you say.
"You probably spend most of every day in meetings!" It is true that my
meetings start at 8:00am when I walk in the door and frequently go beyond
6:00pm. Then I have to start my work. Virtually every day I depend on the
things I learned at Florida State and use them in making sound decisions.
It has (thankfully) been a long time since I proved a theorem, but it is
amazing how dropping a name here or there can change the tenor of a discussion.
I hope you will enjoy this tale of how having the strong background I do
makes a difference in the world of executives.
The Role of the Mathematician/Statistician in AIDS Research.
Hulin Wu, Frontier Science & Technology.
In 1996, the prominent AIDS researcher, Dr. David Ho, was selected
as Man of the Year by Time magazine. One of his major contributions in
AIDS research is the development of HIV viral dynamic models with the assistance
of biomathematicians and statisticians. First I will go over the work of
Ho and his associates. Then I will introduce our recent work in this area.
Basically we have developed nonlinear mixed-effect models (NLME) and inference
tools to incorporate the idea of Ho's viral dynamics in data analysis for
AIDS clinical trials. I will present two challenging problems in the setup
of NLME. One is how to identify significant covariates for viral dynamic parameters in the presence of missing data. Multiple imputation methods,
implemented using Markov chain Monte Carlo (MCMC) techniques, are used
to deal with this problem. Another problem is how to compare viral dynamic
parameters between two treatment groups under the setting of NLME. We propose
using empirical Bayes estimates as derived variables for further inferences.
This is justified using the concept of exchangeability. Finally I will
summarize the importance of the contributions from mathematicians/statisticians
in this area and point out some future research directions.
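For context, the viral dynamic models referred to are often summarized by a biexponential decay of plasma viral load after treatment begins; the sketch below fits such a curve to simulated data for a single patient with hypothetical parameter values, and is not the NLME machinery of the talk.

```python
import numpy as np
from scipy.optimize import curve_fit

def viral_load(t, p1, d1, p2, d2):
    """Biexponential viral decay: fast and slow compartments."""
    return p1 * np.exp(-d1 * t) + p2 * np.exp(-d2 * t)

# simulated data for one patient (hypothetical true values)
rng = np.random.default_rng(0)
t = np.linspace(0, 28, 15)                       # days on treatment
true = dict(p1=5e4, d1=0.5, p2=5e2, d2=0.04)
y = viral_load(t, **true) * rng.lognormal(0, 0.2, t.size)   # multiplicative noise

est, _ = curve_fit(viral_load, t, y, p0=[1e4, 0.3, 1e2, 0.01], maxfev=10000)
print(dict(zip(["p1", "d1", "p2", "d2"], est)))
```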
Provably Efficient Changepoint Estimation of Dependence Parameters for Stationary LRD Processes.
T. V. Kurien, Niksun Inc.
Long range dependent processes are used in the modeling of telecommunications
traffic since they are better able to capture the bursty behavior of traffic
in real-life telecom networks. The degree of dependence is modulated by
the so-called Hurst parameter. A significant change in the Hurst parameter
can lead to longer traffic bursts that can cause loss or congestion in the network, due to buffer overflows at intermediate switches or routers. Consequently, detecting a change in this parameter is useful for active or Quality-of-Service-based re-routing in such networks. A computational scheme that uses data compression to estimate the entropy of the process is presented. A data structure called a splay tree (Tarjan et al., 1986) is used to provide a provably efficient implementation of the compressor, and hence the computational efficiency of the changepoint scheme follows.
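A crude illustration of the compression idea only: here zlib stands in for the splay-tree compressor of the talk, and a sliding comparison of compression ratios stands in for the provably efficient changepoint scheme; the quantization, window sizes, and scoring rule are assumptions.

```python
import zlib
import numpy as np

def compression_ratio(bits):
    """Bytes out / bytes in for a quantized trace; a rough proxy for entropy rate."""
    raw = bytes(bits)
    return len(zlib.compress(raw, 9)) / len(raw)

def scan_for_change(series, window=2000, step=500):
    """Compare compression ratios of adjacent windows; a large jump in the
    ratio suggests a change in the dependence structure."""
    series = (np.asarray(series) > np.median(series)).astype(np.uint8)  # 0/1 quantization
    scores = []
    for start in range(0, len(series) - 2 * window, step):
        left = compression_ratio(series[start:start + window])
        right = compression_ratio(series[start + window:start + 2 * window])
        scores.append((start + window, abs(left - right)))
    return max(scores, key=lambda s: s[1])   # (candidate changepoint, score)
```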
Computing in Statistics Departments: Issues and Challenges.
Balasubramanian Narasimhan, Stanford University.
I will discuss the issues and challenges facing Statistics Departments.
There are a number of technological advancements that are affecting the
way we use computers and software. These technologies are playing a role
in the development of new statistical computing environments. Part of the
talk will address operating system issues such as Windows, Unix, and Linux
in particular. Another part will deal with technologies like Java, XML,
and CORBA and what they portend for computing in general.
SESSION III-B
U-Statistics and Imperfect Ranking in Ranked Set Sampling.
Brett Presnell and Lora L. Bohn, University of Florida.
Ranked set sampling has attracted considerable attention as an
efficient sampling design, particularly for environmental and ecological
studies. A number of authors have noted a gain in efficiency over ordinary
random sampling when specific estimators and tests of hypotheses are applied
to ranked set sample data. We generalize such results by deriving the asymptotic
distribution for random sample U-statistics when applied to ranked set
sample data. Our results show that the ranked set sample procedure is asymptotically
at least as efficient as the random sample procedure, regardless of the
accuracy of judgement ranking. Some errors in the ranked set sampling literature
are also revealed, and counterexamples provided. Finally, application of
majorization theory to these results shows when perfect ranking can be
expected to yield greater efficiency than imperfect ranking.
Empirical Likelihood Regression Analysis
for Right Censored Data.
Gang Li, University of California at Los Angeles.
Empirical likelihood is a nonparametric technique for making inference.
It was first introduced by Owen in i.i.d. settings and was further extended
to various applications including linear models with complete data. In
this paper we develop empirical likelihood methods for linear regression
analysis with right censored data. An adjusted empirical likelihood is
constructed for the regression coefficients using "synthetic data". We
show that the adjusted likelihood has a central chi squared limiting distribution.
This enables one to make inference using standard chi-square tables. In
addition, we develop empirical likelihood inference for any linear combination
of the regression coefficients. We also discuss how to incorporate auxiliary
information. An appealing feature of the empirical likelihood approach
is that it produces confidence regions whose shape and orientation are
determined entirely by the data. In our simulations, it also showed more accurate coverage properties than the standard normal approximation method.
An illustration is given using the Stanford Heart Transplant data.
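To fix ideas, here is a small sketch of Owen's original empirical likelihood ratio for a univariate mean with complete data; the censored-data adjustment and regression setting of the talk are not shown.

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for H0: E[X] = mu (Owen's construction).
    Weights w_i = 1 / (n (1 + lam*(x_i - mu))), with lam chosen so that the
    weighted mean equals mu."""
    x = np.asarray(x, dtype=float)
    n, z = len(x), x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                       # mu outside the convex hull of the data
    # lam must keep every weight positive: 1 + lam*z_i > 0
    lo = -1.0 / z.max() + 1e-10
    hi = -1.0 / z.min() - 1e-10
    g = lambda lam: np.sum(z / (1.0 + lam * z))   # estimating equation for lam
    lam = brentq(g, lo, hi)
    w = 1.0 / (n * (1.0 + lam * z))
    return -2.0 * np.sum(np.log(n * w))     # ~ chi-square(1) under H0

x = np.random.default_rng(2).exponential(size=50)
print(el_log_ratio(x, mu=1.0))
```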
Nonparametric regression estimation for current status
data.
Shanti Gomatam, University of South Florida.
We study the problem of nonparametric estimation of the conditional distribution function when we have current status data on the outcome variable and a single continuous-valued covariate. The estimate proposed is based on Groeneboom's (1991, 1992) work on the nonparametric maximum likelihood estimate (NPMLE) for current status data. In order to estimate the conditional distribution function F(Y|X) we extract observations in a covariate neighborhood around the target value of X, weight the neighborhood points using a function of their covariate distances from the target point, and use the weighted version of the NPMLE for these neighborhood observations as our estimate. We show that it is possible to characterize this local nonparametric maximum likelihood estimate (LNPMLE) of the conditional distribution function as the solution to an isotonic regression problem. We also derive heuristically an asymptotic distribution for the LNPMLE of the distribution function under some conditions, and an expression for the optimal bandwidth.
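For the unweighted case, the NPMLE of the distribution function from current status data reduces to an isotonic regression of the event indicators on the ordered monitoring times; a minimal sketch using the pool-adjacent-violators algorithm is shown below (all weights equal to one here; the kernel-weighted local version of the talk is the natural extension).

```python
import numpy as np

def pava(y, w):
    """Pool-adjacent-violators: weighted isotonic (non-decreasing) fit."""
    y, w = list(map(float, y)), list(map(float, w))
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:      # merge violating blocks
            wv = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wv
            wts[-2] = wv; sizes[-2] += sizes[-1]
            del vals[-1], wts[-1], sizes[-1]
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)
    return np.array(out)

def current_status_npmle(C, delta):
    """C: monitoring times; delta: 1 if the event had occurred by C.
    Returns the ordered times and the NPMLE of F at those times."""
    order = np.argsort(C)
    F_hat = pava(np.asarray(delta)[order], np.ones(len(C)))
    return np.asarray(C)[order], F_hat
```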
Model Checking Based on Biased Samples.
Yanqing Sun, Ram C. Tiwari, and Suffering CI,
University of North Carolina at Charlotte.
A biased sample from a distribution F refers to a random sample from a distribution F_w, where F_w is the distribution resulting from biased sampling of F according to some known biasing or weight function w. An important example of biased sampling is length-biased sampling, which occurs in many sampling problems in medical studies, etiological studies, and industrial life testing. In this paper, we study the goodness-of-fit problem of a parametric model based on biased samples. The test statistic proposed is the supremum of the weighted martingale residual processes from k biased samples. Because the limiting distribution of this test statistic under the assumed model is very complicated, a Monte Carlo method, along with its theoretical justification, is proposed to approximate the critical values of the proposed test. The consistency of the test is investigated.
Simulations are conducted to assess the performance of the test.
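As a concrete special case of the setup above, under length-biased sampling (w(x) = x) the underlying distribution F can be recovered from the biased sample by weighting each observation by 1/x; the sketch below uses this classical weighted estimator for illustration and is not the paper's goodness-of-fit test.

```python
import numpy as np

def debias_length_biased(x, grid):
    """Estimate the underlying CDF F from a length-biased sample
    (weight function w(x) = x) by weighting each point by 1/x."""
    x = np.asarray(x, dtype=float)
    inv = 1.0 / x
    return np.array([inv[x <= t].sum() for t in grid]) / inv.sum()

# sanity check: length-biased sampling from Exp(1) has a Gamma(2, 1) density x*exp(-x)
rng = np.random.default_rng(3)
sample = rng.gamma(shape=2.0, scale=1.0, size=5000)       # the biased sample
grid = np.linspace(0.1, 4, 8)
print(np.round(debias_length_biased(sample, grid), 3))    # ~ 1 - exp(-grid)
```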
Statistical Challenges in Comparing Chemotherapy and
Bone Marrow Transplantation as a Treatment for Leukemia.
John Klein and Mei-Jie Zhang, Medical College of Wisconsin.
Comparison of survival for patients treated with either post-remission chemotherapy or allogeneic bone marrow transplantation (BMT) for leukemias is considered. Two designs for the comparison are examined. The first is a genetically randomized clinical trial. For this type of trial, comparisons can be made either by an intent-to-treat analysis or by a time-dependent covariate model. The second design compares data from a multicenter chemotherapy trial with data from a large transplant registry. Here the analysis is complicated by the fact that the registry observes only patients who are transplanted, so adjustments need to be made for patients who die or relapse while waiting for transplant. Corrections suggested for this source of bias are a matching technique, inclusion of a time-dependent covariate, and a left-truncated Cox model. We examine these techniques through a Monte Carlo study and compare how much information is lost by using registry data as compared to a genetically randomized trial.
SESSION IV-A
Modeling Reliability Growth and Repairable Systems.
Larry H. Crow, Advanced Technology Systems.
Most complex systems, for example automobiles, aircraft, communication systems, etc., are repaired and not replaced when they fail. This presentation will discuss models for evaluating the reliability of repairable systems during two key life cycle phases: reliability growth development testing and customer field use. These models, developed by Crow (1973, 1974), are based on the nonhomogeneous Poisson process and are widely used in government
and industry worldwide. Background on the development of these models will
be presented in addition to statistical methods for data analysis of reliability
growth and field data. Numerical examples illustrating these procedures
will be discussed.
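A brief sketch of fitting the power-law nonhomogeneous Poisson process (often called the Crow-AMSAA model) to failure times from a single system observed on [0, T]; the closed-form maximum likelihood estimates for the time-truncated case are shown only as an illustration, with made-up failure times.

```python
import numpy as np

def crow_amsaa_mle(failure_times, T):
    """MLEs for the power-law NHPP intensity u(t) = lam * beta * t**(beta - 1),
    for a time-truncated test on [0, T] with failures at failure_times."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    n = len(t)
    beta_hat = n / np.sum(np.log(T / t))
    lam_hat = n / T ** beta_hat
    return lam_hat, beta_hat

# example: simulated reliability-growth data (beta < 1 indicates improving reliability)
times = [4.3, 10.1, 19.8, 43.0, 71.2, 119.5, 280.7, 411.0]
lam, beta = crow_amsaa_mle(times, T=500.0)
mtbf_now = 1.0 / (lam * beta * 500.0 ** (beta - 1))   # instantaneous MTBF at T
print(f"lambda = {lam:.4f}, beta = {beta:.3f}, current MTBF = {mtbf_now:.1f}")
```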
A Brief Survey of Limit Theorems for Negatively
Dependent Random Variables
Robert L. Taylor, University of Georgia.
Independence has been a common assumption for most of the basic
results in probability and statistics as illustrated by the classical laws
of large numbers and central limit theorems. Increasing attention has developed
in recent years in obtaining more realistic and applicable models where
alternatives to the assumption of independence are considered. Negative
dependence is one alternative to the usual assumption of independence.
This talk will be a brief survey of recent laws of large numbers and central limit theorems for negatively dependent random variables. The classical results for independent random variables will be used as a template for comparisons of results for negatively dependent random variables, and examples will
be provided to illustrate the differences in the results as well as the
sharpness of the results.
An Exponential Order Statistics Representation
and Its Implications.
James Lynch, University of South Carolina.
A threshold representation is given for exponential order statistics X_{1:n} < X_{2:n} < ... < X_{n:n}. This representation is used to get bounds relating the asymptotic distribution of X_{k:n} to its actual distribution and to obtain the asymptotic behavior of the extremal process when r_n = o(n^{2/3}). In addition, the asymptotic behavior of the mixing parameter in the threshold representation is also obtained.
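For background only (this is the classical Rényi representation, not the threshold representation of the talk), exponential order statistics satisfy X_{k:n} = sum_{j=1}^{k} E_j / (n - j + 1) with i.i.d. standard exponentials E_j; a quick simulation check:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 10, 200_000

# direct simulation of the order statistics
direct = np.sort(rng.exponential(size=(reps, n)), axis=1)

# Renyi representation: X_{k:n} = sum_{j <= k} E_j / (n - j + 1)
E = rng.exponential(size=(reps, n))
renyi = np.cumsum(E / (n - np.arange(n)), axis=1)

print(np.round(direct.mean(axis=0), 3))   # E[X_{k:n}] = sum_{j <= k} 1/(n - j + 1)
print(np.round(renyi.mean(axis=0), 3))    # should match the line above
```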
Multivariate Probit and Logit Models for Binary and
Ordinal
Response Data with Covariates.
Harry Joe, University of British Columbia.
Multivariate models for multivariate binary or ordinal response (with
covariates) will be discussed; these generalize the logit and probit models
for univariate binary and ordinal response. These models are based on a
multivariate normal or multivariate logistic distribution. Specific models
in the statistical literature will be presented, together with discussions
of the computations for the models, and examples where I have used the
models. Some open research problems will be mentioned.
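For instance, in a bivariate probit model the joint cell probabilities are orthant probabilities of a bivariate normal distribution; below is a minimal sketch of the likelihood contribution for one observation, with hypothetical parameter values and not any specific model from the talk.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_probit_prob(y, x, beta1, beta2, rho):
    """P(Y1 = y1, Y2 = y2 | x) under a bivariate probit model:
    Y_j = 1 iff x'beta_j + e_j > 0, with (e_1, e_2) ~ N(0, [[1, rho], [rho, 1]])."""
    eta = np.array([x @ beta1, x @ beta2])
    s = np.where(np.asarray(y) == 1, 1.0, -1.0)        # sign-flip trick for the four cells
    cov = np.array([[1.0, s[0] * s[1] * rho], [s[0] * s[1] * rho, 1.0]])
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(s * eta)

x = np.array([1.0, 0.5])                               # intercept + one covariate
beta1, beta2, rho = np.array([0.2, 0.4]), np.array([-0.1, 0.8]), 0.3
cells = {(a, b): bivariate_probit_prob((a, b), x, beta1, beta2, rho)
         for a in (0, 1) for b in (0, 1)}
print(cells, "sum =", round(sum(cells.values()), 6))   # the four cells sum to 1
```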
Ranked Set Sampling From the Location-Scale
Families of Distributions.
Ram Tiwari, University of North Carolina at Charlotte.
In situations where the units drawn from a population are difficult or expensive to quantify but are easily ranked without requiring their
actual measurements, the ranked set sampling procedure provides an unbiased
estimator of the population mean that is more efficient than the one based
on a simple random sample. Essentially, the procedure involves randomly
drawing a set of n units from the underlying population, where n
is often a small number, ranking the units visually or according to the
characteristic of interest and then quantifying the unit ranked the lowest.
A second set of n units is drawn, ranked and the unit ranked the
second lowest is quantified. This process is continued until a complete
cycle is accomplished wherein, at the nth stage, a set of n
units is drawn, ranked and the unit ranked the largest is quantified. The
sample thus obtained is called a (balanced) ranked set sample. The entire
cycle sequence is often repeated several times. The estimator of the population
mean based on the mean of the quantified units in the ranked set sample
is unbiased and has smaller variance than the mean of a simple random sample
of the same size. The work based on ranked set sampling has primarily been in the nonparametric setup; however, the sampling procedure has also provided improved estimators of the population mean when the population is partially known. In this paper, we specifically consider the location-scale families of distributions and derive unbiased estimators of the location and scale parameters based on r independent replications of a ranked set sample of size n.
The large sample properties of these estimators as the number of replications,
r,
tends to infinity are obtained. The asymptotic relative efficiency is studied.
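A small simulation sketch of the balanced ranked set sampling cycle described above, with ranking done on the true values (i.e., perfect ranking; judgment ranking would replace that step), comparing the RSS mean with a simple random sample mean of the same total size:

```python
import numpy as np

rng = np.random.default_rng(5)

def ranked_set_sample(draw, n, r):
    """One balanced RSS of size n*r: in each of r cycles, for i = 1..n draw n
    units, rank them, and quantify only the i-th smallest."""
    return np.array([np.sort(draw(n))[i]
                     for _ in range(r) for i in range(n)])

def compare(draw, n=5, r=4, reps=20_000):
    rss_means = [ranked_set_sample(draw, n, r).mean() for _ in range(reps)]
    srs_means = [draw(n * r).mean() for _ in range(reps)]
    return np.var(rss_means), np.var(srs_means)

v_rss, v_srs = compare(lambda size: rng.exponential(size=size))
print(f"var(RSS mean) = {v_rss:.5f}  var(SRS mean) = {v_srs:.5f}")  # RSS is smaller
```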
SESSION IV-B
Statistics, Statisticians and Cancer Research.
Bill Blot, International Epidemiology Institute.
A brief overview of the critical role of statistics in research to identify
the causes of cancer and quantify risks associated with various environmental
and host factors will be presented. Cancer is now the leading cause of
death among Americans below age 80. Since chemotherapeutic and other treatments
are only marginally successful for a number of cancers, the key to reducing
the burden of this disease is prevention. Statistics and epidemiology are
the basic sciences essential for the population studies needed to discover
causative factors so that measures aimed at prevention can be developed.
Examples will be provided of the importance of statistics in the design, analysis
and interpretation of studies into the causes of major cancers in the United
States and around the world.
A Factorization of a Positive Definite Matrix with
Applications.
N. Rao Chaganty, Old Dominion University.
There are several decompositions of a positive definite matrix A in the literature. But not so well known is the unique factorization A = R L R, where R is a correlation matrix and L is a diagonal matrix. In this talk I will present an algorithm and discuss probability and statistical applications of the factorization.
On Bivariate Sign Tests.
Ron Randles, University of Florida.
This talk will review approaches to constructing bivariate sign tests.
It will compare the procedures in the literature, the assumptions each makes about the underlying distribution, and their distribution-free properties.
Nonparametric mixture models for incorporation of
heterogeneity into multiple capture studies.
Jim Norris, Wake Forest University.
For multiple capture studies, we utilize nonparametric mixture models to allow flexible, random differences in the animals' "capture rates". First, we extend Lindsay and Roeder's nonparametric mixture MLE results to the case of an unknown sample size, and utilize the resulting full-probability-model MLE for many purposes, including goodness-of-fit tests, tests of heterogeneity, and variability estimates. Second, we emphasize the applicability of our results for examining diversity. Lastly, we examine the potential of the nonparametric mixture likelihood for Bayesian analysis.