U.S. Department of Health and Human Services



JOURNAL OF BIOPHARMACEUTICAL STATISTICS 2019, VOL. 29, NO. 5, 860–873 https://doi.org/10.1080/10543406.2019.1657134


U.S. government works are not subject to copyright.


Background During the past two decades, the number and complexity of clinical trials have risen dramatically, increasing the difficulty of choosing sites for inspection. Because FDA's resources are limited, sites should be chosen with care.

Purpose To determine whether data mining techniques and/or unsupervised statistical monitoring can assist with the process of identifying potential clinical sites for inspection.

Methods Five summary-level clinical site datasets from four new drug applications (NDA) and one biologics license application (BLA), where the FDA had performed or had planned site inspections, were used. The number of sites inspected and the results of the inspections were blinded to the researchers. Five supervised learning models from the previous two years (2016–2017) of an on-going research project were used to predict site inspection results, i.e., No Action Indicated (NAI), Voluntary Action Indicated (VAI), or Official Action Indicated (OAI). The Statistical Monitoring Applied to Research Trials (SMART™) unsupervised statistical monitoring software, developed by CluePoints (Mont-Saint-Guibert, Belgium), was utilized to identify atypical centers (via a p-value approach) within a study. Finally, the Clinical Investigator Site Selection Tool (CISST), developed by the Center for Drug Evaluation and Research (CDER), was used to calculate the total risk of each site, thereby providing a framework for site selection. The agreement between the predictions of these methods was compared. The overall accuracy and sensitivity of the methods were graphically compared.
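As an illustration of the between-method agreement comparison described above, a minimal sketch follows. The site labels and method names are hypothetical, not the study data; agreement is computed here as the simple fraction of sites on which two methods assign the same outcome (NAI, VAI, or OAI).

```python
def percent_agreement(preds_a, preds_b):
    """Fraction of sites for which two methods assign the same outcome label."""
    assert len(preds_a) == len(preds_b), "methods must rate the same sites"
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)

# Hypothetical per-site predictions from two methods for six sites.
method_a = ["NAI", "VAI", "OAI", "NAI", "NAI", "VAI"]
method_b = ["NAI", "VAI", "NAI", "NAI", "OAI", "VAI"]

print(round(percent_agreement(method_a, method_b), 3))  # → 0.667
```

In practice one would compute such agreement per outcome class and per study, then aggregate, as the paper does when reporting mean agreement across NAI, VAI, and OAI.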

Results Spearman's rank order correlation was used to examine the agreement between the SMART™ analysis (CluePoints' software) and the CISST analysis. The average aggregated correlation between the p-values (SMART™) and total risk scores (CISST) for all five studies was 0.21, and ranged from −0.41 to 0.50. The Random Forest models for 2016 and 2017 showed the highest aggregated mean agreement (65.1%) among outcomes (NAI, VAI, OAI) for the three available studies. While there does not appear to be a single most accurate approach, the performance of methods under certain circumstances is discussed later in this paper.
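The Spearman correlation reported above is the Pearson correlation of the rank-transformed values. A self-contained sketch using hypothetical per-site SMART p-values and CISST risk scores (not the study data) is shown below; a negative rho would indicate agreement in direction, since atypical sites get low p-values but high risk scores.

```python
def ranks(xs):
    """Average 1-based ranks, with ties assigned the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-site SMART p-values and CISST total risk scores.
p_values = [0.01, 0.20, 0.35, 0.50, 0.80]
risk_scores = [9.0, 7.5, 8.0, 3.0, 2.0]
print(round(spearman(p_values, risk_scores), 2))  # → -0.9
```

With real data, `scipy.stats.spearmanr` computes the same statistic along with a p-value.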

Limitations Classifier models based on data mining techniques require historical data (i.e., training data) to develop the model. There is a possibility that sites in the five summary-level datasets were included in the training datasets for the models from the previous years' research, which could result in spurious confirmation of predictive ability. Additionally, the CISST was utilized in three of the five site selection processes, possibly biasing the data.

Conclusion The agreement between methods was lower than expected and no single method emerged as the most accurate.