Representing Relationships between Variables from Discrete, Continuous and Mixed Data with Graphical Models

Haluk Dogan, University of Nebraska - Lincoln


The world is complex, uncertain, and hard to understand. Our innate capacity for describing phenomena with simple stories and interpreting them through narratives often helps science and technology overcome these difficulties. Probability is a well-established scientific tool that we use every day to make inferences and draw conclusions. Probabilistic graphical models (PGMs) are powerful frameworks that help us formalize phenomena so that reasoning, inference, and learning rest on formal mathematical ground. They are highly flexible and extensible. Graphical models are a marriage of graph theory and probability theory: graph theory provides a powerful, compact, and intuitive representation, and probability theory allows us to incorporate uncertainty into our models. In this dissertation, we focus on PGMs and use both undirected models, specifically Gaussian graphical models (GGMs), and directed models, specifically Bayesian networks (BNs). We used these models to discover hidden structural patterns in groups. Our motivation is to explain disease progression in breast cancer through structural patterns that differ across cancer stages. We conducted an ontology analysis to explain the roles of these patterns in terms of well-studied, known processes. In doing so, we imposed sparsity assumptions and, to highlight the most important differences, assumed high similarity between groups. We found that detrimental changes take place in the later stages of cancer, and we listed novel candidate changes that are effective in the development of cancer. In the digital age, there is a wealth of text data. Human comprehension of this data, however, is limited by time and by the data's complexity. Topic modeling can help model this data to draw associations and discover patterns. In our last project, we used latent Dirichlet allocation (LDA) to find emerging patterns in electronic health records (EHRs). Our model recovered keywords known to be associated with particular diseases, such as heart disease. We analyzed a large volume of EHRs and learned lower-dimensional representations of diseases. In this low-dimensional space, diseases with similar symptoms were found to be close to each other.

Subject Area

Computer science|Information Technology|Science education|Applied Mathematics|Oncology|Information science|Organization Theory

Recommended Citation

Dogan, Haluk, "Representing Relationships between Variables from Discrete, Continuous and Mixed Data with Graphical Models" (2021). ETD collection for University of Nebraska - Lincoln. AAI28713045.