Chinese Parser
To derive a high-precision, broad-coverage Chinese grammar, we study grammar extraction, generalization, and specialization. The CKIP parser is based on a PCFG (probabilistic context-free grammar), and its disambiguation is refined with word-to-word association probabilities. A semantic-role assignment capability is also incorporated into the system. The complete system also includes the word segmenter and POS tagger built by CKIP, which run as preprocessing steps.
System Implementation
- Grammar extraction
- Grammar rules and their associated probabilities are extracted from Sinica Treebank. To derive a high-precision grammar with broad coverage, we need effective ways to both generalize and specialize the grammar rules.
- Parser design
- A major challenge is selecting the best structure effectively from the many ambiguous candidate structures. In addition to rule probabilities, we take word-to-word associations into account when evaluating structure preference.
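The two steps above can be illustrated with a minimal sketch. It assumes relative-frequency estimation of rule probabilities and a simple additive combination of rule log-probability with a word-association term; neither is confirmed as the CKIP implementation. The toy trees, the phrase labels, the `ASSOC` values, and the `weight` and `floor` parameters are all hypothetical and are not taken from Sinica Treebank.

```python
import math
from collections import Counter

# Toy trees as nested tuples: (label, child, ...); a leaf is (pos, word).
# Illustrative only: these are NOT Sinica Treebank trees or labels.
TREEBANK = [
    ("S", ("NP", ("Nh", "我們")),
          ("VP", ("D", "都"), ("VK", "喜歡"), ("NP", ("Na", "蝴蝶")))),
    ("S", ("NP", ("Nh", "他們")),
          ("VP", ("VK", "喜歡"), ("NP", ("Na", "花")))),
]

# Hypothetical word-to-word association scores for (head, dependent) pairs.
ASSOC = {("喜歡", "蝴蝶"): 0.2}


def is_leaf(node):
    """A leaf pairs a POS tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def rules(tree):
    """Yield (lhs, rhs) for every internal node of a tree."""
    if is_leaf(tree):
        return
    lhs, children = tree[0], tree[1:]
    yield lhs, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}


def tree_log_prob(tree, pcfg):
    """Sum of log rule probabilities over the tree."""
    return sum(math.log(pcfg[r]) for r in rules(tree))


def combined_score(tree, word_pairs, pcfg, assoc, weight=1.0, floor=1e-6):
    """Rule log-probability plus a weighted word-association term.

    Unseen pairs fall back to `floor` (an assumed smoothing constant).
    """
    assoc_term = sum(math.log(assoc.get(p, floor)) for p in word_pairs)
    return tree_log_prob(tree, pcfg) + weight * assoc_term


pcfg = estimate_pcfg(TREEBANK)
seen = combined_score(TREEBANK[0], [("喜歡", "蝴蝶")], pcfg, ASSOC)
unseen = combined_score(TREEBANK[0], [("都", "蝴蝶")], pcfg, ASSOC)
```

With this toy data, the candidate whose word pair has a known association (`seen`) scores higher than the one relying on the floor value (`unseen`), which is the kind of preference the association term is meant to express.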
Research Results
A Chinese parser covering word segmentation, POS tagging, parsing, and semantic-role assignment has been completed.
Example:
我們都喜歡蝴蝶
We all like butterflies.
我們(Nh) 都(D) 喜歡(VK) 蝴蝶(Na)
We(Nh) all(D) like(VK) butterfly(Na)
Different generalizations and specializations of the rules significantly influence parser performance. With proper generalization and specialization, we can improve the labeling F-score from 81.45% to 86.14% (Hsieh et al., 2004).
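For readers unfamiliar with the metric, a labeling F-score can be computed as the harmonic mean of labeled precision and recall over constituents. The sketch below assumes PARSEVAL-style matching of `(label, start, end)` spans; the source does not state the exact evaluation protocol, and the spans shown are made up for illustration.

```python
from collections import Counter


def labeled_f_score(gold, predicted):
    """Return (precision, recall, F1) over labeled constituent spans.

    gold, predicted: iterables of (label, start, end) tuples.
    """
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Multiset intersection counts spans that match in label and extent.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / sum(pred_counts.values()) if pred_counts else 0.0
    recall = matched / sum(gold_counts.values()) if gold_counts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical spans for a four-word sentence; the VP span is misplaced.
gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 3, 4)]
pred = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)]
p, r, f = labeled_f_score(gold, pred)
```

Here three of the four predicted spans match the gold spans, so precision, recall, and F-score are all 0.75.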
Publications
- Yu-Ming Hsieh, Wei-Yun Ma. “N-best Rescoring for Parsing Based on Dependency-Based Word Embeddings”. IJCLCLP, Vol. 21, No. 2, pp. 19–34, Dec 2016.
- CKIP Group (詞庫小組). “Semantic Roles in Sentence Structure Trees” (句結構樹中的語意角色). No. 13-01, Jan 2013.
- Yu-Ming Hsieh, Ming-Hong Bai, Jason S. Chang, Keh-Jiann Chen. “Improving PCFG Chinese Parsing with Context-Dependent Probability Re-estimation”. CIPS-SIGHAN, Dec 2012.
- Duen-Chi Yang, Yu-Ming Hsieh, Keh-Jiann Chen. “Resolving Ambiguities of Chinese Conjunctive Structures by Divide-and-Conquer Approaches”. IJCNLP, Jan 2008.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Improve Parsing Performance by Self-Learning”. IJCLCLP, Vol. 12, No. 2, pp. 192–216, Jun 2007.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Improve Parsing Performance by Self-Learning”. ROCLING, Sep 2006.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Linguistically-Motivated Grammar Extraction, Generalization and Adaptation”. IJCNLP, Oct 2005.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Grammar Extraction, Generalization and Specialization”. ROCLING, Sep 2004.
- Keh-Jiann Chen, Yu-Ming Hsieh. “Chinese Treebanks and Grammar Extraction”. IJCNLP, Mar 2004.
- Jia-Ming You, Yu-Ming Hsieh. “Automatic Semantic Role Assignment for a Tree Structure”. SIGHAN, Jul 2004.
Researchers and Developers
Yu-Ming Hsieh (謝佑明), Duen-Chi Yang (楊敦淇)