Semi-automated exploration and extraction of data in scientific tables

Wednesday, September 26, 2018 - 5:00pm to 6:30pm
Columbia University
New York, NY 10027
United States

Columbia Data Science Institute Industry Innovation Seminars

Ron Daniel, Jessica Cox, Corey Harper

Most experimental results reported in scientific articles, or recorded in databases and article supplements, are presented in tables. Unfortunately, the amazing recent progress in natural language understanding is of little help if we want to automatically understand those tables. Tables are, after all, not your grandmother's natural language. Despite this, we believe significant progress can be made toward combining tables of related information into larger sets that can be analyzed, visualized, understood, and used as the basis for decisions. Elsevier Labs is prototyping tools that guide people in exploring tables from many articles and in extracting and merging the data they contain. This talk will show examples of what has been accomplished by merging such data manually. With those as examples of the desired outcomes, we will describe our experiments in reproducing such results, the workflow in which those experiments operate, and our most recent results.

