To derive a high-precision, broad-coverage Chinese grammar, we study grammar extraction, generalization, and specialization. The CKIP parser is based on a PCFG (probabilistic context-free grammar) and is refined with word-to-word association probabilities for disambiguation. A semantic-role assignment capability is also incorporated into the system. The complete parser includes the preprocessing systems built by CKIP: the word segmenter and the POS tagger.

  • Grammar extraction: Grammar rules and their associated probabilities are extracted from the Sinica Treebank. To derive a high-precision grammar with broad coverage, we need to find effective ways to generalize, as well as specialize, grammar rules.
  • Parser design: A major challenge is how to select the best structure effectively from the many ambiguous candidates. In addition to rule probabilities, we take word-to-word associations into account when evaluating structure preference.
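The two steps above can be illustrated with a minimal sketch. It is not the CKIP implementation. It assumes a toy treebank of nested tuples, relative-frequency estimation of rule probabilities, and a log-linear combination of rule probability and word-to-word association. The function names, the `weight` and `floor` parameters, the tree labels, and the association values are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed tree format: (label, child, child, ...); a leaf is (pos_tag, word).
def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def rule_of(node):
    """Return the rule at an internal node as (lhs, (child labels...))."""
    return (node[0], tuple(child[0] for child in node[1:]))

def extract_rules(tree, counts):
    """Count one rule per internal node, recursively."""
    if is_leaf(tree):
        return
    counts[rule_of(tree)] += 1
    for child in tree[1:]:
        extract_rules(child, counts)

def estimate_probabilities(treebank):
    """Relative frequency: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter()
    for tree in treebank:
        extract_rules(tree, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

def tree_log_prob(tree, rule_probs):
    """Sum of log rule probabilities over all internal nodes."""
    if is_leaf(tree):
        return 0.0
    return math.log(rule_probs[rule_of(tree)]) + sum(
        tree_log_prob(child, rule_probs) for child in tree[1:]
    )

def score(tree, rule_probs, word_pairs, assoc, weight=1.0, floor=1e-6):
    """Hypothetical structure score: rule log-probability plus a weighted
    log association for each (head, dependent) word pair in the structure.
    Unseen pairs fall back to a small floor value."""
    assoc_term = sum(math.log(assoc.get(pair, floor)) for pair in word_pairs)
    return tree_log_prob(tree, rule_probs) + weight * assoc_term

# Toy data for illustration only; labels and numbers are made up.
toy_treebank = [
    ("S", ("NP", ("Nh", "we")), ("D", "all"), ("VK", "like"),
     ("NP", ("Na", "butterfly"))),
    ("S", ("NP", ("Nh", "we")), ("VK", "like"),
     ("NP", ("Na", "butterfly"))),
]
rule_probs = estimate_probabilities(toy_treebank)
assoc = {("like", "butterfly"): 0.2}

seen = score(toy_treebank[0], rule_probs, [("like", "butterfly")], assoc)
unseen = score(toy_treebank[0], rule_probs, [("like", "all")], assoc)
print(seen > unseen)  # a structure with a known association is preferred
```

In this sketch, generalizing a rule would correspond to merging labels before counting, and specializing to splitting them (for example, by annotating a label with extra features). Either change alters the rule inventory and therefore the estimated probabilities.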

A Chinese parser, including word segmentation/POS tagging/parsing/role assignment, has been completed.

We(Nh) all(D) like(VK) butterfly(Na).

  • Different ways of generalizing and specializing rules significantly influence parser performance. With proper generalization and specialization, the labeling F-score improves from 81.45% to 86.14% (Hsieh et al., 2004).
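For readers unfamiliar with the metric, the following minimal sketch shows one common way to compute a labeled bracketing F-score. It assumes each constituent is a (label, start, end) triple and that the score is the harmonic mean of labeled precision and recall. The exact evaluation procedure used in the cited work may differ, and the function name and toy brackets are hypothetical.

```python
from collections import Counter

def labeled_f_score(gold, predicted):
    """F-score over labeled brackets given as (label, start, end) triples."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Multiset intersection: a bracket matches only if label and span agree.
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)

# Toy brackets for illustration only.
gold = [("S", 0, 4), ("NP", 0, 1), ("NP", 3, 4)]
pred = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4)]
f = labeled_f_score(gold, pred)
print(f"{f:.4f}")
```

Here two of the three predicted brackets match the gold brackets, so precision, recall, and the F-score are all 2/3.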

A demo version of the Chinese Parser Server is available to the public at


Duen-Chi Yang, Yu-Ming Hsieh, and Keh-Jiann Chen, 2008, Resolving Ambiguities of Chinese Conjunctive Structures by Divide-and-conquer Approaches, IJCNLP 2008.

Yu-Ming Hsieh, Duen-Chi Yang, and Keh-Jiann Chen, 2007, Improve Parsing Performance by Self-Learning, Computational Linguistics and Chinese Language Processing, Vol. 12, No. 2, June 2007, pp. 195-216.

Yu-Ming Hsieh, Duen-Chi Yang, and Keh-Jiann Chen, 2006, Improve Parsing Performance by Self-Learning, Proceedings of ROCLING XVIII, pp. 63-76.

Yu-Ming Hsieh, Duen-Chi Yang, and Keh-Jiann Chen, 2005, Linguistically-Motivated Grammar Extraction, Generalization and Adaptation, The Second International Joint Conference on Natural Language Processing (IJCNLP-05).

Yu-Ming Hsieh, Duen-Chi Yang, and Keh-Jiann Chen, 2004, Grammar Extraction, Generalization or Specialization, Proceedings of ROCLING XVI, pp. 141-150.

Keh-Jiann Chen and Yu-Ming Hsieh, 2004, Chinese Treebanks and Grammar Extraction, Proceedings of IJCNLP-04, pp. 560-565.

Jia-Ming You and Keh-Jiann Chen, 2004, Automatic Semantic Role Assignment for a Tree Structure, Proceedings of SIGHAN workshop.


