School of Computing: Conference and Workshop Papers

Document Type

Article

Citation

G. Nagy, D. W. Embley, S. Seth, End-to-End Conversion of HTML Tables for Populating a Relational Database, Proc. DAS 2014, Tours, France, 2014.

Abstract

Automating the conversion of human-readable HTML tables into machine-readable relational tables will enable end-user query processing of the millions of data tables found on the web. Theoretically sound and experimentally successful methods for index-based segmentation, extraction of category hierarchies, and construction of a canonical table suitable for direct input to a relational database are demonstrated on 200 heterogeneous web tables. The methods are scalable: the program generates 198 Access-compatible CSV files in ~0.1 s per table (two of the 200 tables could not be indexed).
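
As a rough illustration of what a canonical table looks like, the following minimal Python sketch flattens a small header-indexed grid into one record per data cell and writes it as CSV. It is not the authors' program: the function names (`to_canonical`, `to_csv`), the sample grid, and its values are hypothetical, and the header depths are passed in by hand, whereas the paper's segmentation step determines them automatically.

```python
import csv
import io


def to_canonical(grid, n_header_rows, n_header_cols):
    """Flatten a header-indexed grid into one record per data cell.

    Assumption: the number of header rows and header columns is already
    known; spanning header cells have been expanded into every column
    they cover.
    """
    records = []
    for r in range(n_header_rows, len(grid)):
        # Row header path, e.g. ["Norway"]
        row_path = grid[r][:n_header_cols]
        for c in range(n_header_cols, len(grid[r])):
            # Column header path, e.g. ["2010", "Male"]
            col_path = [grid[h][c] for h in range(n_header_rows)]
            records.append(list(row_path) + col_path + [grid[r][c]])
    return records


def to_csv(records, field_names):
    """Serialize canonical records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(records)
    return buf.getvalue()


# Made-up example table: two header rows (year, sex), one header column.
grid = [
    ["",       "2010", "2010",   "2011", "2011"],
    ["",       "Male", "Female", "Male", "Female"],
    ["Norway", "10",   "11",     "12",   "13"],
    ["Sweden", "20",   "21",     "22",   "23"],
]

records = to_canonical(grid, n_header_rows=2, n_header_cols=1)
csv_text = to_csv(records, ["Country", "Year", "Sex", "Value"])
print(csv_text)
```

Each output row pairs one data value with its full row and column category path, which is the shape a relational database can import directly as a single table.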
