Patent applications in the United States are classified by technology in a large database, but search results often leave out related technologies. In a project for an investment banking firm in New York City, the team used a topic modeling technique called Latent Dirichlet Allocation (LDA) to analyze the text of all utility patents filed in 2014 and infer their underlying themes. They found that their technique complemented the U.S. Patent and Trademark Office’s (USPTO) classification system, giving a fuller picture of overlapping technologies. The team used Python, Kibana, Elasticsearch and Shiny to explore, validate and visualize their data and results.
Students: Gabrielle Agrocostea, Francisco Arceo, Abdus Khan, Justin Law and Tony Paek.
The team’s algorithm assigned patents to multiple categories, complementing the USPTO's classification system. It also surfaced some novel categories, including "computer systems" (Topic 1 above), which the team found had underlying ties to patents related to medicine/cancer (Topic 4) and to hardware such as frames, rails and brackets (Topic 10).
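LDA can place a single patent in several categories because it models every document as a mixture of topics rather than as a member of exactly one class. The sketch below shows this idea using a minimal collapsed Gibbs sampler for LDA, written in plain Python. It is not the team's code. The toy token lists, the `lda_gibbs` and `assign_categories` helpers, the hyperparameters (`alpha`, `beta`, `iterations`) and the 0.2 membership threshold are all hypothetical choices made for illustration. A real pipeline over a full year of patent text would more likely use an optimized library implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    Returns (theta, vocab, topic_word), where theta[d] is document d's
    topic mixture and topic_word[k][v] counts how often word v is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables used by the collapsed sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V*beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic mixtures; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta, vocab, topic_word


def assign_categories(theta, threshold=0.2):
    """List every topic whose weight clears the (hypothetical) threshold."""
    return [[t for t, p in enumerate(row) if p >= threshold] for row in theta]


# Hypothetical toy "patents" given as token lists (not real USPTO data).
docs = [
    "processor memory network server software processor".split(),
    "tumor cancer cell therapy drug cancer".split(),
    "bracket rail frame mount bolt frame".split(),
    "server software tumor imaging cancer data".split(),
]

theta, vocab, topic_word = lda_gibbs(docs, num_topics=3)
categories = assign_categories(theta)
for d, cats in enumerate(categories):
    print(d, [round(p, 2) for p in theta[d]], cats)
```

Because a patent receives a full distribution over topics, a mixed document such as the fourth toy example can clear the threshold for more than one topic. A hard classifier would force it into a single class.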