To derive a high-precision, broad-coverage Chinese grammar, we study grammar extraction, generalization, and specialization. The CKIP parser is based on a PCFG (probabilistic context-free grammar) and is refined with word-to-word association probabilities for disambiguation. A semantic-role assignment capability is also incorporated into the system. The complete system also includes the word segmenter and POS tagger built by CKIP, which serve as preprocessing stages for the parser.
- Grammar extraction
- Grammar rules and their associated probabilities are extracted from the Sinica Treebank. To derive a high-precision grammar with broad coverage, we need effective ways to generalize, as well as specialize, grammar rules.
- Parser design
- A major challenge is how to effectively select the best structure from among numerous ambiguous structures. In addition to rule probabilities, we take word-to-word associations into account when evaluating structure preference.
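The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the CKIP implementation: the toy trees, the `NP`/`S` labels, the association table `ASSOC`, the `weight` and `unseen` parameters, and all function names are assumptions made up for the example. Rule probabilities are estimated by relative frequency (count of a rule divided by the count of its left-hand side), and a candidate structure is scored by its rule log-probability plus a word-pair association term.

```python
import math
from collections import Counter

# A tree is (label, [children]) for a phrase or (pos_tag, word) for a leaf.
# Toy data for illustration only; NOT taken from the Sinica Treebank.
TOY_TREEBANK = [
    ("S", [("NP", [("Nh", "我們")]), ("D", "都"), ("VK", "喜歡"), ("NP", [("Na", "蝴蝶")])]),
    ("S", [("NP", [("Nh", "我們")]), ("VK", "喜歡"), ("NP", [("Na", "蝴蝶")])]),
]


def is_leaf(node):
    """A leaf pairs a POS tag with a word string."""
    return isinstance(node[1], str)


def rules(tree):
    """Yield (lhs, rhs) for every phrase node; rhs is a tuple of child labels."""
    if is_leaf(tree):
        return
    label, children = tree
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def tree_log_prob(tree, pcfg):
    """Sum of log rule probabilities over all phrase nodes of the tree."""
    return sum(math.log(pcfg[r]) for r in rules(tree))


# Hypothetical word-to-word association scores on a log scale (made-up values).
ASSOC = {("喜歡", "蝴蝶"): -1.0}


def combined_score(tree, word_pairs, pcfg, assoc, weight=1.0, unseen=-10.0):
    """Rule log-probability plus a weighted association term.

    `word_pairs` are the related word pairs of the candidate structure,
    supplied by the caller; `unseen` is an assumed penalty for unknown pairs.
    """
    assoc_score = sum(assoc.get(pair, unseen) for pair in word_pairs)
    return tree_log_prob(tree, pcfg) + weight * assoc_score


PCFG = estimate_pcfg(TOY_TREEBANK)
SCORE = combined_score(TOY_TREEBANK[0], [("喜歡", "蝴蝶")], PCFG, ASSOC)
```

Given several candidate structures for one sentence, the preferred one under this sketch would be the candidate with the highest `combined_score`. How the association term is actually estimated and weighted in the CKIP parser is not specified here.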
A complete Chinese parser, covering word segmentation, POS tagging, parsing, and semantic-role assignment, has been implemented.
Example: "We all like butterflies"
我們(Nh) 都(D) 喜歡(VK) 蝴蝶(Na)
We(Nh) all(D) like(VK) butterfly(Na)
Different generalizations and specializations of rules significantly influence the performance of the parser. With proper generalization and specialization, we can improve the labeling F-score from 81.45% to 86.14% (Hsieh et al., 2004).
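For readers unfamiliar with the metric, a labeled F-score can be computed as sketched below. This assumes the common labeled-bracket definition, where a predicted constituent counts as correct only if its label and span both match a gold constituent; the exact evaluation protocol behind the reported figures is described in the cited paper, not here. The function name and the toy spans are hypothetical.

```python
from collections import Counter


def labeled_f_score(gold, predicted):
    """F-score over (label, start, end) constituents.

    Precision = matched / predicted, recall = matched / gold,
    F = harmonic mean of the two. Duplicates are handled as a multiset.
    """
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy spans over a four-word sentence (illustrative values only).
gold = [("S", 0, 4), ("NP", 0, 1), ("NP", 3, 4)]
pred = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4)]
score = labeled_f_score(gold, pred)
```

Here two of the three predicted constituents match the gold ones, so precision and recall are both 2/3.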
- Yu-Ming Hsieh, Wei-Yun Ma. “N-best Rescoring for Parsing Based on Dependency-Based Word Embeddings”. IJCLCLP, Vol. 21, No. 2, pp. 19–34, Dec 2016.
- CKIP Group (詞庫小組). “句結構樹中的語意角色” (Semantic Roles in Sentence Structure Trees). No. 13-01, Jan 2013.
- Yu-Ming Hsieh, Ming-Hong Bai, Jason S. Chang, Keh-Jiann Chen. “Improving PCFG Chinese Parsing with Context-Dependent Probability Re-estimation”. CIPS-SIGHAN, Dec 2012.
- Duen-Chi Yang, Yu-Ming Hsieh, Keh-Jiann Chen. “Resolving Ambiguities of Chinese Conjunctive Structures by Divide-and-Conquer Approaches”. IJCNLP, Jan 2008.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Improve Parsing Performance by Self-Learning”. IJCLCLP, Vol. 12, No. 2, pp. 192–216, Jun 2007.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Improve Parsing Performance by Self-Learning”. ROCLING, Sep 2006.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Linguistically-Motivated Grammar Extraction, Generalization and Adaptation”. IJCNLP, Oct 2005.
- Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen. “Grammar Extraction, Generalization and Specialization”. ROCLING, Sep 2004.
- Keh-Jiann Chen, Yu-Ming Hsieh. “Chinese Treebanks and Grammar Extraction”. IJCNLP, Mar 2004.
- Jia-Ming You, Yu-Ming Hsieh. “Automatic Semantic Role Assignment for a Tree Structure”. SIGHAN, Jul 2004.