Event

February 26-29, 2012 Bethesda, Maryland

Genomic Sciences Program (GSP) 2012

Contractors-Grantees Meeting X
Clustering of patents by organism and technology classification. Preliminary experiments using the EPO Green technology patent collection from Fairview Research (n=380,000 patents) reveal the potential power of Semiotic Fingerprinting. A set of patents containing prokaryotic names (n=3,900) was produced using the N4L:: PatentScribe, which also extracts vectors of patent metadata (i.e., inventor, assignee, patent classification, patent authority, citations). The resulting similarity matrix was clustered, visualized as a heatmap, and output as an ordered list of patent IDs.
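The process in this caption can be illustrated with a minimal Python sketch: turn metadata into features, build a similarity matrix, cluster it, and emit an ordered list of patent IDs. This is not Semiotic Fingerprinting or the N4L:: PatentScribe implementation, whose internals are not described here. The sketch makes three assumptions. Each patent is a set of hypothetical metadata tokens (e.g. `assignee:A`, `cpc:C12N`). Jaccard similarity stands in for the real measure. Average-linkage agglomerative clustering is the chosen method. The dendrogram's leaf order supplies both the ordered ID list and the row/column order of the heatmap.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two metadata token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(features: list[set[str]]) -> list[list[float]]:
    """Symmetric pairwise similarity matrix over all patents."""
    n = len(features)
    return [[jaccard(features[i], features[j]) for j in range(n)] for i in range(n)]


def average_linkage_order(sim: list[list[float]]) -> list[int]:
    """Average-linkage agglomerative clustering on a similarity matrix.

    Returns the dendrogram leaf order: each merge concatenates its two
    clusters, so items that were merged early end up adjacent.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best_pair, best_score = None, -1.0
        for a, b in combinations(range(len(clusters)), 2):
            # Mean similarity over all cross-cluster pairs (average linkage).
            pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
            score = sum(pairs) / len(pairs)
            if score > best_score:
                best_pair, best_score = (a, b), score
        a, b = best_pair
        merged = clusters[a] + clusters[b]
        # a < b, so deleting b first keeps index a valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return clusters[0] if clusters else []


# Hypothetical toy input: P1/P3 share an assignee, class, and organism; so do P2/P4.
patents = {
    "P1": {"assignee:A", "cpc:C12N", "org:Escherichia"},
    "P2": {"assignee:B", "cpc:Y02E", "org:Clostridium"},
    "P3": {"assignee:A", "cpc:C12N", "org:Escherichia", "cpc:C12P"},
    "P4": {"assignee:B", "cpc:Y02E", "org:Clostridium", "cpc:C10L"},
}
ids = list(patents)
sim = similarity_matrix([patents[i] for i in ids])
order = average_linkage_order(sim)
ordered_ids = [ids[k] for k in order]
# Matrix permuted into cluster order: the values a heatmap would display.
heatmap = [[sim[i][j] for j in order] for i in order]
print(ordered_ids)  # ['P1', 'P3', 'P2', 'P4']
```

The naive merge loop is roughly cubic in the number of patents, so it only suits toy examples. A set the size of the 3,900 prokaryote-containing patents would call for an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage`, applied to distances (for example, 1 minus similarity).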

Charles Parker and George Garrity will be presenting poster 228 (“The NamesforLife Semantic Index of Phenotypic and Genotypic Data”, Abstracts Book, pages 183-184) during the Monday evening mixer (5:30pm-8:00pm) in the Grand Ballroom. We will be highlighting our team’s recent research on text mining and automated vocabulary extraction.

The long-term objective of this STTR project is to develop a semantic index of bacterial and archaeal phenotypes that can be used to augment annotation efforts and to provide a basis for predictive modeling of microbial phenotypes. The index is based on published descriptions of taxonomic type and non-type strains that have been the subject of ongoing genome sequencing efforts, because this choice provides a mechanism by which hypotheses can be tested and reproducibility verified. This project is tightly coupled with ongoing DOE projects (the Genomic Encyclopedia of Bacteria and Archaea, the Microbial Earth Project, and the Community Sequencing Project) and with two key publications, Standards in Genomic Sciences and the International Journal of Systematic and Evolutionary Microbiology. The first step toward this goal, and the primary objective of this Phase I project, is the development of a draft vocabulary.

Parker et al., “The NamesforLife Semantic Index of Phenotypic and Genotypic Data”

Download Poster (2MB PDF)

Posted February 24, 2012.
