Graduate Studies
First Advisor
Bertrand Clarke
Degree Name
Doctor of Philosophy (Ph.D.)
Committee Members
Souparno Ghosh, Vinod Variyam, Xueheng Shi
Department
Statistics
Date of this Version
8-2025
Document Type
Dissertation
Citation
A dissertation presented to the Graduate College of the University of Nebraska in partial fulfillment of requirements for the degree of Doctor of Philosophy
Major: Statistics
Under the supervision of Professor Bertrand Clarke
Lincoln, Nebraska, August 2025
Abstract
We present two new approaches for point prediction with streaming data based on (a) the Count-Min sketch and (b) Gaussian process priors with random bias. The methods are intended for the most general case, in which no true model can be usefully formulated for the data stream. In statistical contexts, this is often called the M-open problem class. For the Count-Min sketch method, we show that the predicted distribution function F̂ converges to F under the assumption that the data consist of i.i.d. samples from a fixed distribution function F. To implement the Gaussian process prior methods, we use representative subsets based on streaming K-means to keep the dimension of the variance matrices bounded.
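As a rough illustration of the data structure named above, the following is a minimal sketch of a Count-Min sketch used to estimate a distribution function from a binned stream. It is not the dissertation's predictor: the class `CountMinSketch`, the helpers `to_bin` and `estimate_cdf`, the bin count, and the table dimensions are all hypothetical choices made for this example.

```python
import hashlib
import random


class CountMinSketch:
    """Approximate frequency counter with depth x width integer cells."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of items seen so far

    def _index(self, row, key):
        # One salted hash per row; blake2b accepts a salt of up to 16 bytes.
        digest = hashlib.blake2b(
            repr(key).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += 1
        self.total += 1

    def count(self, key):
        # Collisions only inflate cells, so the row minimum never undercounts.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


def to_bin(x, lo, hi, n_bins):
    """Map a value in [lo, hi) to a bin index (hypothetical discretization)."""
    k = int((x - lo) / (hi - lo) * n_bins)
    return max(0, min(n_bins - 1, k))


def estimate_cdf(sketch, x, lo, hi, n_bins):
    """Estimate F(x) as the sketched mass of the bins lying entirely below x."""
    if sketch.total == 0:
        return 0.0
    n_below = int((x - lo) / (hi - lo) * n_bins)
    n_below = max(0, min(n_bins, n_below))
    mass = sum(sketch.count(j) for j in range(n_below))
    # Overcounting can push the ratio above 1, so clip it.
    return min(1.0, mass / sketch.total)


# Simulated i.i.d. stream from Uniform(0, 1), processed in a single pass.
rng = random.Random(0)
sketch = CountMinSketch()
true_counts = [0] * 20
for _ in range(5000):
    j = to_bin(rng.random(), 0.0, 1.0, 20)
    sketch.add(j)
    true_counts[j] += 1

f_hat_half = estimate_cdf(sketch, 0.5, 0.0, 1.0, 20)
```

Because each cell can only be inflated by hash collisions, the estimate never falls below the binned empirical distribution function. For a uniform stream, `f_hat_half` should therefore sit close to 0.5 when the table is wide relative to the number of bins.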
We form four versions of our hash-function-based predictor (HBP) and compare them to six other predictors based on existing methods: four are fully Bayesian and the other two are derived from the Shtarkov solution. For the comparisons we use real data, with absolute cumulative error as the criterion for predictive success. Preliminary experiments suggest that the one-pass median version of our method performs best among all the methods for data whose complexity is not based on spread. For spread-complex data, Gaussian processes are often best. In some cases where our method is best, some of the Shtarkov methods are comparable. We argue that in these cases our method is preferable to the Shtarkov methods because of its simplicity.
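The evaluation criterion can be sketched as follows, assuming that absolute cumulative error means the running sum of absolute differences between each observation and the prediction issued before that observation was seen. The `RunningMedian` predictor is a simple stand-in chosen for illustration, not the one-pass median HBP described in the abstract. Its name, its default first prediction of 0.0, and its unbounded storage of past values are assumptions of this example.

```python
import bisect


class RunningMedian:
    """Stand-in predictor: forecasts the median of all values seen so far."""

    def __init__(self, default=0.0):
        self._sorted = []        # past observations, kept in sorted order
        self._default = default  # prediction issued before any data arrive

    def predict(self):
        n = len(self._sorted)
        if n == 0:
            return self._default
        mid = n // 2
        if n % 2 == 1:
            return self._sorted[mid]
        return 0.5 * (self._sorted[mid - 1] + self._sorted[mid])

    def update(self, y):
        bisect.insort(self._sorted, y)


def cumulative_absolute_error(stream, predictor):
    """Predict each value before seeing it, then reveal it and update."""
    total = 0.0
    for y in stream:
        total += abs(y - predictor.predict())
        predictor.update(y)
    return total
```

For the stream `[1.0, 3.0, 2.0]` the predictions are 0.0, 1.0, and 2.0, so the errors are 1, 2, and 0. Any object exposing `predict()` and `update(y)` can be scored with the same function, which makes it possible to compare several predictors on one stream.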
Advisor: Bertrand Clarke
Recommended Citation
Chanda, Aleena, "Online Prediction of Streaming Data" (2025). Dissertations and Doctoral Documents from University of Nebraska-Lincoln, 2023–. 325.
https://digitalcommons.unl.edu/dissunl/325
Included in
Applied Statistics Commons, Categorical Data Analysis Commons, Databases and Information Systems Commons, Data Science Commons
Comments
Copyright 2025, the author. Used by permission