Use of Vapnik-Chervonenkis Dimension in Model Selection

Merlin T Mpoudeu, University of Nebraska - Lincoln

Abstract

In this dissertation, I derive a new method to estimate the Vapnik-Chervonenkis Dimension (VCD) for the class of linear functions. This method is inspired by the technique developed by Vapnik et al. (1994). My contribution rests on the approximation of the expected maximum difference between two empirical losses (EMDBTEL). Specifically, I use a cross-validated form of the error to compute the EMDBTEL, and I tighten the bound on the EMDBTEL by minimizing a constant in its upper bound. I also derive two bounds for the true unknown risk using the additive (ERM1) and the multiplicative (ERM2) Chernoff bounds. These bounds depend on the estimated VCD and the empirical risk. They can be used to perform model selection and to assert, with high probability, that the chosen model will perform better, without making strong assumptions about the data generating process (DG). I measure the accuracy of my technique on simulated datasets and on three real datasets. Under reasonable conditions, the model selection provided by VCD was always as good as, if not better than, that of the other methods. To understand the behavior of my method, I introduce the concept of ‘consistency at the true model’. I compare my technique to established model selection techniques: two forms of empirical risk minimization (ERM), the Bayes Information Criterion (BIC), Smoothly Clipped Absolute Deviation (SCAD, see Fan and Li (2001)), and Adaptive LASSO (ALASSO, see Zou (2006a)). On simulations, I conclude that when the design points are well chosen and the sample size is sufficient, my method performs as well as BIC and better than ERM. On real datasets with a dependence structure, my method performs better than BIC. In general, sparsity methods such as SCAD and ALASSO gave models that were so oversimplified as to be invalid. Thus, under reasonable conditions on the design points and the sample size, my method never gave models that were obviously deficient compared to those of other methods.
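
To illustrate how a risk bound that depends on a VC dimension and an empirical risk can drive model selection, here is a minimal Python sketch. It does not reproduce the ERM1 or ERM2 bounds derived in the dissertation, which are not stated in this abstract. As a stand-in it uses the classical additive VC bound for a loss bounded in [0, 1]. The function names, the candidate models, and the values of `h` are hypothetical. In the dissertation the VC dimension is estimated from data rather than supplied by hand.

```python
import math


def vc_additive_bound(emp_risk, h, n, eta=0.05):
    """Classical additive VC bound (assumption: loss bounded in [0, 1]).

    With probability at least 1 - eta, the true risk is at most
    emp_risk + sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n),
    where h is the VC dimension and n is the sample size.
    """
    if not 0 < h <= n:
        raise ValueError("need 0 < h <= n")
    if not 0 < eta < 1:
        raise ValueError("need 0 < eta < 1")
    penalty = math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)
    return emp_risk + penalty


def select_model(candidates, n, eta=0.05):
    """Return the name of the candidate with the smallest risk bound.

    candidates: iterable of (name, empirical_risk, vc_dimension) tuples.
    """
    best = min(candidates, key=lambda c: vc_additive_bound(c[1], c[2], n, eta))
    return best[0]


# Hypothetical candidates: larger models fit better but pay a larger penalty.
candidates = [
    ("small", 0.30, 3),
    ("medium", 0.10, 6),
    ("large", 0.09, 21),
]
chosen = select_model(candidates, n=1000)
```

With these made-up numbers the medium model is selected: the large model lowers the empirical risk only slightly, and the extra complexity penalty outweighs that gain.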

Subject Area

Statistics

Recommended Citation

Mpoudeu, Merlin T, "Use of Vapnik-Chervonenkis Dimension in Model Selection" (2017). ETD collection for University of Nebraska-Lincoln. AAI10615871.
https://digitalcommons.unl.edu/dissertations/AAI10615871
