About GraphSpider and MPL
A syntactic pattern matching framework.
Plain text is easy to search with regular expressions, and various tools support more complex searches over linguistically annotated text, with constraints on features such as part-of-speech tags or constituent tree structures.
MPL (Meta-Pattern Language) is a new pattern-description language that allows queries to be written over syntactic dependency graphs produced by the Stanford NLP Group’s Java toolkit, and GraphSpider is a tool to search parsed text using MPL queries.
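To give a rough sense of what a query over a dependency graph involves, here is a minimal Java sketch. It does not use MPL syntax or the GraphSpider API, neither of which is shown on this page. The `Edge` record, the `findSubjectVerbObject` method, and the example sentence are all hypothetical, introduced only for illustration. The relation names `nsubj` and `dobj` follow the Stanford typed dependencies scheme. The sketch finds every verb that governs both a subject and a direct object.

```java
import java.util.ArrayList;
import java.util.List;

public class DependencyMatchSketch {
    // Hypothetical representation of one typed dependency: relation(governor, dependent).
    // Words stand in for graph nodes here; a real graph would use token indices
    // so that repeated words stay distinct.
    record Edge(String relation, String governor, String dependent) {}

    /** Returns [subject, verb, object] triples for verbs with both an nsubj and a dobj dependent. */
    static List<String[]> findSubjectVerbObject(List<Edge> graph) {
        List<String[]> matches = new ArrayList<>();
        for (Edge subj : graph) {
            if (!subj.relation().equals("nsubj")) continue;
            for (Edge obj : graph) {
                // Both edges must share the same governor (the verb).
                if (obj.relation().equals("dobj") && obj.governor().equals(subj.governor())) {
                    matches.add(new String[] {subj.dependent(), subj.governor(), obj.dependent()});
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hand-written dependencies for "The protein activates the gene."
        List<Edge> graph = List.of(
            new Edge("det", "protein", "The"),
            new Edge("nsubj", "activates", "protein"),
            new Edge("det", "gene", "the"),
            new Edge("dobj", "activates", "gene"));
        for (String[] m : findSubjectVerbObject(graph)) {
            System.out.println(String.join(" ", m)); // prints "protein activates gene"
        }
    }
}
```

A pattern language such as MPL lets constraints of this kind be written declaratively as a query rather than hand-coded as nested loops.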
Although MPL is general-purpose and not restricted to any particular task or text domain, it was developed in the context of genomic information extraction, where GraphSpider has been shown to achieve very high precision when used with carefully engineered query patterns. It also has potential applications in interactive text mining and corpus analytics.
GraphSpider can be used as either a scriptable command-line tool or a Java library, and is freely available as a research and development tool for the natural language processing community.
- Read parsed text in treebank or dependency graph format
- Search for sentences or words matching user-written MPL queries
- Integration with the Stanford toolkit
- Built-in noun-phrase chunking
- Plugin system for arbitrary post-processing of results