The Generalized Pareto-Negative Binomial Model in Finding Relationships between Text Data and Quantitative Data

Nur Firyal Roslan, University of Nebraska - Lincoln


The Generalized Pareto-Negative Binomial (GP-NB) model was introduced to find connections between text data and quantitative data. The model assumes that word counts are Poisson distributed, that quantitative financial variables are Gamma distributed, and that both are conditional on Gamma-distributed latent variables. These assumptions yield a Negative Multinomial distribution for the word counts and a Generalized Pareto distribution for the quantitative financial variables. Model parameters were estimated using Maximum Likelihood Estimation (MLE) and Quasi-Likelihood (QL) Estimation. Using simulated data, we compared the bias and standard errors of the estimation methods and found that no single method was best. We evaluated the model on three tasks: predicting financial ratios, allocating portfolios, and interpreting text data. In predicting financial ratios, the conditional MLE outperformed both the random walk and the Multivariate Generalized Linear Model - Quasi-Likelihood (MGLM-QL) method in terms of Mean Square Prediction Error (MSPE) and Theil's coefficient. In portfolio allocation by mean-variance analysis, the Generalized Pareto (GP) distribution combined with Twitter data under the MGLM-QL model reduced downside risk, measured by the conditional Value-at-Risk (CVaR). Lastly, we used U.S. state governors' speeches and state economic data from Bikienga (2018) to evaluate the model's interpretative ability. The model explained considerably more variation than a CCA-PCA approach, and governors who spoke more about services for people tended to be associated with increased public welfare budgets.
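
The Poisson-Gamma step in the abstract can be illustrated with a small simulation. The sketch below is not the dissertation's code: it covers a single word count only, and the parameter names (alpha, beta, theta) and their values are assumptions chosen for illustration. It draws a Gamma latent variable, draws a Poisson count conditional on it, and compares the sample moments with the Negative Binomial moments that the mixture implies.

```python
import math
import random
from statistics import fmean, pvariance


def sample_poisson(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (adequate for small mu)."""
    threshold = math.exp(-mu)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_word_counts(n, alpha, beta, theta, seed=0):
    """Simulate n counts with latent lam ~ Gamma(shape=alpha, rate=beta) and count | lam ~ Poisson(lam * theta).

    alpha, beta, and theta are hypothetical names for this illustration only.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # random.gammavariate takes shape and scale, so scale = 1 / rate.
        lam = rng.gammavariate(alpha, 1.0 / beta)
        counts.append(sample_poisson(rng, lam * theta))
    return counts


# Illustrative (assumed) parameter values.
ALPHA, BETA, THETA = 2.0, 1.0, 1.5
counts = simulate_word_counts(20000, ALPHA, BETA, THETA)

# Moments of the Negative Binomial implied by the Poisson-Gamma mixture.
nb_mean = ALPHA * THETA / BETA
nb_var = nb_mean + ALPHA * THETA ** 2 / BETA ** 2

print("sample mean:", fmean(counts), "theoretical mean:", nb_mean)
print("sample variance:", pvariance(counts), "theoretical variance:", nb_var)
```

The sample variance exceeds the sample mean, which is the overdispersion that distinguishes the Negative Binomial from a plain Poisson model for word counts. With several word counts sharing one latent variable, the same argument gives the Negative Multinomial distribution named in the abstract.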

Recommended Citation

Roslan, Nur Firyal, "The Generalized Pareto-Negative Binomial Model in Finding Relationships between Text Data and Quantitative Data" (2019). ETD collection for University of Nebraska - Lincoln. AAI22584432.