Research using VAM typically occurs in school systems with a large number of students (e.g., New York City, Los Angeles, Chicago) or in statewide assessments combined across school districts (e.g., Tennessee). VAM performance in school systems with small numbers of students is unknown.

One common issue with estimation based on small samples is lack of precision. An area of statistics that has developed methodology for small sample sizes is small area estimation. One approach in this area is indirect estimation, which links similar subjects together, allowing the small groups to “borrow strength” from each other.
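The “borrowing strength” idea can be illustrated with a simple precision-weighted shrinkage estimator. This is a generic empirical-Bayes sketch, not the dissertation's multi-stage model; the function name and inputs are hypothetical:

```python
import numpy as np

def shrinkage_estimates(means, variances, grand_mean, tau2):
    """Shrink each small area's mean toward the overall mean.

    The weight on an area's own mean grows with the between-area
    variance tau2 and shrinks as that area's sampling variance grows,
    so imprecise small areas borrow more strength from the ensemble.
    """
    w = tau2 / (tau2 + variances)
    return w * means + (1.0 - w) * grand_mean
```

An area measured without error keeps its own mean, while a very noisy area is pulled almost entirely to the overall mean.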

This dissertation introduces a multi-stage model that incorporates small area estimation techniques into the traditional TVAAS. The performance of both the multi-stage and TVAAS models is studied through data simulated for small school systems. The precision of predicted teacher value-added scores is assessed for both modeling methods.

Adviser: Walter W. Stroup, Erin E. Blankenship

Two models based on breed-specific haplotype clusters were developed to account for differences across multiple breeds. The first model utilizes the breed composition of the individual, while the second utilizes the breed composition of the sire and dam. Haplotype clusters were modeled as hidden states in a hidden Markov model in which the genomic effects are associated with loci located on the unobserved clusters. Similar to the Bayes C model, we can model the genomic effects at the loci using a prior consisting of a mixture of a multivariate normal distribution and a point mass at zero.
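A Bayes C-style mixture prior can be sketched by drawing each locus effect from a point mass at zero with some probability and from a normal slab otherwise. This is an illustrative univariate draw assuming a known mixing proportion `pi` and slab variance `sigma2`, not the fitted multi-breed model:

```python
import numpy as np

def sample_effects(n_loci, pi, sigma2, rng):
    """Draw SNP effects from a Bayes C-style spike-and-slab prior:
    with probability pi an effect is exactly zero (the point mass),
    otherwise it is drawn from a normal with variance sigma2."""
    is_zero = rng.random(n_loci) < pi
    effects = rng.normal(0.0, np.sqrt(sigma2), n_loci)
    effects[is_zero] = 0.0
    return effects
```

With `pi` near one, most loci have exactly zero effect, which is what lets the model concentrate signal at a few QTL.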

The performance of the first model will be evaluated in a composite beef cattle population, representing various fractions of several breeds, using five weight traits, seven carcass traits, and two other traits related to calving on 6,552 cattle genotyped for 99,827 mapped SNPs. The performance of the second model will be evaluated in a two-way cross population, which was a cross between two independent lines, using age of puberty records on 1,654 swine genotyped for 48,408 mapped SNPs. Both models will also be evaluated in a simulated composite population of two lines of 12,500 individuals and 61,255 mapped SNPs.

Overall, the breed-specific haplotype models led to larger and more clearly observed estimated QTL. However, the prediction accuracy of the haplotype models was typically lower than that of the traditional Bayesian GWAS models. Therefore, while our ability to locate QTL was increased, the traditional models remain the preferred choice for prediction, as they have higher accuracy when estimating an animal’s genetic merit.

This dissertation consists of simulated investigations into the frequentist and ethical properties of a new RAR biased coin design. Chapter 2 proposes a new adaptive design for phase III clinical trials, a modification of the 2001 Bandyopadhyay and Biswas biased coin design. Simulations show how the new design continues to ethically expose patients to the better treatment while simultaneously mitigating the power loss inherent in the original design. Chapters 2 and 3 expand the applicability of the new design to scenarios where treatment variances or covariate-treatment impacts are unequal. In Chapter 4, simulations demonstrate that the new response-adaptive biased coin design can be more ethical than equal allocation, even when patient outcomes are not immediately available. Each chapter illustrates the utility and benefits of the new design through a real-world application of an HIV treatment adherence intervention. Asymptotic results are applied to a special case of the BBS design, and small-sample implications are compared with simulated outcomes in Chapter 5.
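The original Bandyopadhyay-Biswas rule allocates the next patient to arm A with probability Φ((μ̂_A − μ̂_B)/T), where Φ is the standard normal CDF and T > 0 tunes how aggressively allocation skews toward the apparently better arm. A minimal sketch of that allocation probability (parameter names are illustrative, and the dissertation's modified design is not shown):

```python
from math import erf, sqrt

def bb_alloc_prob(mean_a, mean_b, T):
    """Bandyopadhyay-Biswas-style allocation probability:
    assign the next patient to arm A with probability
    Phi((mean_a - mean_b)/T), computed here via the error function."""
    z = (mean_a - mean_b) / T
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

When the estimated means are equal the rule reduces to a fair coin, and a large observed advantage for A pushes the allocation probability toward one.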

Adviser: Kent M. Eskridge

Advisor: David B. Marx

Advisor: Anne M. Parkhurst

Adviser: David Marx

Adviser: Walter W. Stroup

Statistical methods are the main data analysis technique used for developing quantitative predictions in the life sciences, but these methods are rarely applied to long-term datasets because the methods are underdeveloped in most cases. This underdevelopment of statistical methods and applications was the motivation for my research. In Chapter 1, I develop a time series analysis method for populations that accounts for errors in detection. In Chapter 2, I develop and apply a variety of methods to predict an extinction threshold using long-term monitoring data from a population of bobwhite quail (*Colinus virginianus*). In Chapter 3, I link the unified framework of missing data developed in the statistical literature to species distribution modeling, which is a common method used to analyze historical location reports of a species. In Chapter 4, I introduce an example using location records of one of the rarest avian species in the world—the whooping crane (*Grus americana*). The whooping crane location records were imprecisely recorded, and in Chapter 4, I extend regression calibration methods to correct for the location error. In Chapter 5, I explore when a commonly used statistical estimation method will fail for analyses using historical location records; I then test several alternative estimation methods. Finally, in Chapter 6, I present an application by predicting the spatial and temporal distribution of whooping cranes using historical location records. This application was developed to determine what habitat is used by whooping cranes during migration and what habitat may require special protection to ensure survival of the species.

Advisors: Erin E. Blankenship and Richard A.J. Tyre

A simulation study was conducted using a closed network made up of ten nodes and three different edge density values (low, moderate, and high) to randomly generate the edges (connections) between nodes. A Poisson AR(1) process was used to generate the number of communications between nodes at each time period. Changes were then randomly assigned in time periods 26 and 52, and the aR² values were calculated between adjacent time periods. A separate simulation was conducted for each combination of edge density (3 levels), AR(1) correlation parameter (3 levels), number of edges perturbed (3 levels), perturbation factor (3 levels), time period of perturbation (2 levels), and configuration dimension (2 levels). The results suggest that under these conditions the method as proposed has reasonable power for detecting “abnormal” changes in the number of communications.

Adviser: David B. Marx

Advisor: Christopher R. Bilder

For the first problem, we examine group testing regression models when identification of the positive and negative statuses for individuals is performed. The identification aspect leads to additional tests, known as “retests,” beyond those performed for initial groups of individuals. We show how regression models can be fit in this setting while also incorporating the extra information from these retests. Through Monte Carlo simulations, we present evidence that significant gains in efficiency occur by incorporating retesting information. Furthermore, we demonstrate that some group testing protocols can actually lead to more efficient estimates than individual testing when diagnostic tests are imperfect. Finally, we show that halving and matrix testing protocols are the most efficient to use in application.
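To see why grouped protocols can use far fewer tests than individual testing, consider the simplest two-stage (Dorfman) scheme with a perfect assay: test each group once, then retest every member of each positive group. This toy Monte Carlo is only illustrative; it is not the halving or matrix protocols, nor the regression models, studied here:

```python
import numpy as np

def dorfman_tests_per_person(p, group_size, n_groups, rng):
    """Monte Carlo tests-per-individual under two-stage Dorfman
    group testing with a perfect assay: one test per group, plus one
    test per member of each group that tests positive."""
    statuses = rng.random((n_groups, group_size)) < p  # true statuses
    group_positive = statuses.any(axis=1)
    total_tests = n_groups + group_positive.sum() * group_size
    return total_tests / (n_groups * group_size)
```

At low prevalence the ratio falls well below 1 (the individual-testing cost), which is the efficiency the abstract refers to.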

For the second problem, we consider situations when individuals are tested in groups for multiple diseases simultaneously. This problem is important because assays frequently screen for more than one disease at a time. When these assays are used in a group testing setting, the individual positive/negative statuses consist of unobserved, correlated random variables. To estimate models in this setting, we develop an expectation-solution based algorithm that provides consistent parameter estimates and natural large-sample inference procedures.

Advisor: Christopher R. Bilder

Advisor: Anne M. Parkhurst

To investigate the differences between weighted kriging and ordinary kriging, a simulation study was conducted. Validation statistics were used to evaluate and compare the prediction procedures, and it was found that weighted kriging yields more desirable results than traditional kriging methods. As a follow-up, the prediction procedures were compared using real data from a groundwater quality study.
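For reference, ordinary kriging computes prediction weights by solving a linear system built from the covariance function, with a Lagrange multiplier enforcing that the weights sum to one (unbiasedness). A minimal sketch of that baseline system (the weighted kriging modification is not shown):

```python
import numpy as np

def ordinary_kriging_weights(cov_obs, cov_pred):
    """Solve the ordinary kriging equations.

    cov_obs  : (n, n) covariance matrix among observed sites
    cov_pred : (n,) covariances between observed sites and the
               prediction site
    Returns the n kriging weights; the appended Lagrange multiplier
    enforcing sum-to-one is solved for and then dropped."""
    n = len(cov_pred)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov_obs
    A[n, n] = 0.0
    b = np.append(cov_pred, 1.0)
    solution = np.linalg.solve(A, b)
    return solution[:n]
```

For two symmetrically placed observations the weights come out equal, and in every case they sum to one by construction.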

Bayesian Maximum Entropy (BME) is then introduced as an alternative method to utilize soft data in prediction. Numerical implementation of this approach is possible with the Spatiotemporal Epistemic Knowledge Synthesis-Graphical User Interface (SEKS-GUI). Using this interface, two simulation studies were conducted to investigate the differences between BME and weighted kriging. In the first study, probabilistic soft data in the form of the Gaussian distribution were used. However, since proponents of the BME approach claim that it performs extremely well when the soft data are skewed, the second study used nonsymmetrical soft data generated using a triangular distribution. In both studies, the weighted kriging validation statistics were more desirable than those from BME.

Advisor: David B. Marx

We then present a stochastic model named Multi-Order Markov Model under Hidden States (MMMHS) for representing heterogeneous sequences. MMMHS is similar to the conventional Hidden Markov Model (HMM) and Double Chain Markov Model (DCMM) in terms of using hidden states to describe the non-homogeneity of a sequence, but it provides a more flexible dependency structure by changing the order of Markov dependency under different hidden states. We extend the forward-backward procedure to MMMHS and provide the complete model estimation procedure based on the Expectation-Maximization (EM) algorithm. The method is then illustrated with applications on several real data sets, and the results are compared with those of traditional methods.
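The forward recursion that MMMHS extends can be sketched for an ordinary first-order HMM (symbols: initial distribution `pi`, transition matrix `A`, emission matrix `B`; the multi-order extension itself is not shown):

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """Forward pass of a first-order HMM.

    alpha[t, i] is the joint probability of the first t+1 observed
    symbols and hidden state i at time t; summing the final row
    gives the likelihood of the whole sequence."""
    T, n = len(obs), len(pi)
    alpha = np.zeros((T, n))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()
```

MMMHS generalizes this recursion so that the dependence on past observations can reach back further than one step, with the reach governed by the current hidden state.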

The first paper examines spatial clustering when only one numeric response has been recorded for each observation. The geographic or spatial location is incorporated into the likelihood of the multivariate normal distribution through the variance-covariance matrix. The variance-covariance matrix is computed using any appropriate spatial covariance function, although the spherical covariance function was used for this research. The second paper extends the clustering algorithm to the multivariate case, i.e., when more than one response has been recorded on each observation. Again, the spatial location is incorporated through the variance-covariance matrix of the multivariate normal distribution. However, the actual construction of the variance-covariance matrix must take into account the cross-covariance between the variates. Oliver’s (2003) approach for modeling the cross-covariance is incorporated into the clustering algorithm.
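The spherical covariance function used here has a simple closed form: it decays from the sill at distance zero to exactly zero at the range and beyond. A minimal sketch (the parameter names `sill` and `range_param` are illustrative):

```python
import numpy as np

def spherical_cov(h, sill, range_param):
    """Spherical covariance at lag distance h:
    sill * (1 - 1.5*(h/a) + 0.5*(h/a)**3) for h <= a (the range),
    and exactly 0 for h beyond the range."""
    h = np.asarray(h, dtype=float)
    r = np.clip(h / range_param, 0.0, 1.0)  # clip handles h > range
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)
```

Its finite range means pairs of observations farther apart than the range contribute zero covariance, which keeps the fitted variance-covariance matrix sparse in effect.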

Since not all recorded variables of interest are numeric, the third paper investigates incorporating categorical (non-numeric) responses into the spatial clustering algorithm. This paper looks first at the case where only categorical responses are recorded on the observations. After this has been implemented, the final step is to spatially cluster observations which contain both numeric and categorical responses. The algorithm must account for the spatial pattern of the data, the actual numeric responses and the categorical responses, and an appropriate weighting of the spatial component is determined. The final clustering algorithm clusters both numeric and categorical data while incorporating the geographic location of the observations.
