Chinese Parser

To derive a high-precision, broad-coverage Chinese grammar, we study grammar extraction, generalization, and specialization. The CKIP parser is based on a PCFG (probabilistic context-free grammar) and uses word-to-word association probabilities to refine disambiguation. A semantic-role assignment capability is also incorporated into the system. The complete system also includes the word segmenter and POS tagger built by CKIP, which serve as preprocessing steps.

System Implementation

Grammar extraction
Grammar rules and their associated probabilities are extracted from the Sinica Treebank. To derive a high-precision grammar with broad coverage, we need effective ways to both generalize and specialize the grammar rules.
Parser design
A major challenge is selecting the best structure among the many ambiguous candidate structures. In addition to rule probabilities, we use word-to-word associations to evaluate structure preference.
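
The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CKIP implementation: the toy trees, the phrase labels (S, NP, VP), the head rule, the association table, and the way the two scores are combined (a weighted sum of log values) are all hypothetical. Rule probabilities are estimated by relative frequency, and candidate structures are ranked by rule probability plus word-to-word association.

```python
import math
from collections import Counter

# Trees are nested tuples: (label, child, ...); a leaf is (pos, word).
# Invented toy data for illustration; these are not Sinica Treebank trees.
TOY_TREEBANK = [
    ("S", ("NP", ("Nh", "我們")),
          ("VP", ("D", "都"), ("VK", "喜歡"), ("NP", ("Na", "蝴蝶")))),
    ("S", ("NP", ("Nh", "他們")),
          ("VP", ("VK", "喜歡"), ("NP", ("Na", "貓")))),
]

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def rules(tree):
    """Yield (lhs, rhs) for every internal node of a tree."""
    if is_leaf(tree):
        return
    yield tree[0], tuple(child[0] for child in tree[1:])
    for child in tree[1:]:
        yield from rules(child)

def estimate_rule_probs(treebank):
    """Relative frequency: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def head(tree):
    """Toy head rule (an assumption): first V* child is the head, else the last child."""
    if is_leaf(tree):
        return tree[1]
    for child in tree[1:]:
        if child[0].startswith("V"):
            return head(child)
    return head(tree[-1])

def head_pairs(tree):
    """Yield (head word, dependent word) pairs; compares words, a simplification."""
    if is_leaf(tree):
        return
    h = head(tree)
    for child in tree[1:]:
        d = head(child)
        if d != h:
            yield h, d
        yield from head_pairs(child)

def score(tree, rule_probs, assoc, weight=1.0, floor=1e-6):
    """Sum of log rule probabilities plus weighted log association scores.
    Unseen rules and word pairs fall back to `floor` (hypothetical smoothing)."""
    s = sum(math.log(rule_probs.get(r, floor)) for r in rules(tree))
    s += weight * sum(math.log(assoc.get(p, floor)) for p in head_pairs(tree))
    return s

def best_parse(candidates, rule_probs, assoc):
    return max(candidates, key=lambda t: score(t, rule_probs, assoc))

# Two candidate structures for the same sentence; the second nests the VP.
CANDIDATES = [
    TOY_TREEBANK[0],
    ("S", ("NP", ("Nh", "我們")),
          ("VP", ("D", "都"), ("VP", ("VK", "喜歡"), ("NP", ("Na", "蝴蝶"))))),
]
# Hypothetical association strengths between head and dependent words.
TOY_ASSOC = {("喜歡", "我們"): 0.1, ("喜歡", "都"): 0.05, ("喜歡", "蝴蝶"): 0.2}

rule_probs = estimate_rule_probs(TOY_TREEBANK)
chosen = best_parse(CANDIDATES, rule_probs, TOY_ASSOC)
```

Generalization and specialization would correspond to changing what `rules` emits, for example coarsening or refining the labels on the right-hand side, before the counts are taken.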

Research Results

A Chinese parser covering word segmentation, POS tagging, parsing, and role assignment has been completed.

Example:

我們都喜歡蝴蝶
We all like butterfly
我們(Nh) 都(D) 喜歡(VK) 蝴蝶(Na)
We(Nh) all(D) like(VK) butterfly(Na)

Different ways of generalizing and specializing rules significantly influence parser performance. With proper generalization and specialization, the labeling F-score improves from 81.45% to 86.14% (Hsieh et al., 2004).
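
For readers unfamiliar with the metric, the sketch below shows one common way a labeling F-score is computed: as the harmonic mean of precision and recall over labeled constituent spans. It assumes the standard labeled-bracket definition, which the text above does not spell out, so the exact scoring behind the reported figures may differ. The tree format (nested tuples) and the phrase labels are hypothetical.

```python
from collections import Counter

def labeled_spans(tree, start=0):
    """Return (Counter of (label, start, end) for internal nodes, end position).
    A tree is (label, child, ...); a leaf is (pos, word) and covers one word."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return Counter(), start + 1
    spans, pos = Counter(), start
    for child in tree[1:]:
        child_spans, pos = labeled_spans(child, pos)
        spans += child_spans
    spans[(tree[0], start, pos)] += 1
    return spans, pos

def labeled_f1(gold, predicted):
    """Harmonic mean of labeled-span precision and recall."""
    g, _ = labeled_spans(gold)
    p, _ = labeled_spans(predicted)
    matched = sum((g & p).values())  # spans present in both trees
    if matched == 0:
        return 0.0
    precision = matched / sum(p.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)

# Invented example trees for the sentence 我們都喜歡蝴蝶.
gold = ("S", ("NP", ("Nh", "我們")),
             ("VP", ("D", "都"), ("VK", "喜歡"), ("NP", ("Na", "蝴蝶"))))
predicted = ("S", ("NP", ("Nh", "我們")),
                  ("VP", ("D", "都"),
                         ("VP", ("VK", "喜歡"), ("NP", ("Na", "蝴蝶")))))

f1 = labeled_f1(gold, predicted)
```

Here the predicted tree contains one extra VP span, so recall is perfect while precision drops, and the F-score falls below 1.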

Online Demos

CKIP CoreNLP

CKIP CoreNLP provides a set of human language technology tools: word segmentation, sentence parsing, named-entity recognition, and coreference detection.

Demo
Chinese Parser

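Enter a Chinese sentence directly, and the system automatically performs word segmentation and tagging, sentence parsing, and role assignment, then displays the results, which include the input text, the segmentation and tagging output, and the parse.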

Demo

Resources

Publications

References

Researchers and Developers

謝佑明、楊敦淇