The goal of Sinica Treebank is to provide a syntactically structure-tagged corpus for Chinese natural language processing. Grammatical information extracted from the treebank can be used to improve parser performance and to learn more about Chinese syntax.
Sinica Treebank was built by CKIP in 1997 with texts taken from the Sinica Corpus. The texts are parsed automatically according to Information-based Case Grammar (ICG) and then checked manually. The current version, Sinica Treebank v3.0, contains 61,087 trees (361,834 words). 1,000 tree structures are publicly available for researchers to download. A search interface on the website is also provided for users interested in Chinese syntax and semantic relations.
The structural framework of Sinica Treebank follows the Head-Driven Principle: a sentence or phrase is composed of a core Head and its arguments or adjuncts. The Head determines the phrasal category and its relations with the other constituents. For example, the Head of a sentence (S) or verb phrase (VP) is a verb (V). See “中文句結構樹資料庫 (Sinica Treebank) 的構建” (Chen et al. 1999) for details of the supplementary principles, symbol conventions, semantic roles, and phrasal structures.
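The Head-Driven Principle can be pictured with a small data structure. The following is a minimal sketch, not the treebank's actual file format or tooling: the `Node` class, the helper functions, the role labels (`agent`, `goal`), and the toy sentence are all hypothetical names chosen for illustration. It only shows the idea that every phrase has one child marked as Head, with the remaining children acting as arguments or adjuncts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One constituent: a phrase (has children) or a word (has a token)."""
    role: str                    # relation to the parent, e.g. "Head", "agent"
    category: str                # phrasal or lexical category, e.g. "S", "NP", "V"
    token: Optional[str] = None  # surface word for leaf nodes
    children: List["Node"] = field(default_factory=list)


def head_of(phrase: Node) -> Optional[Node]:
    """Return the child marked as Head, or None if there is none."""
    for child in phrase.children:
        if child.role == "Head":
            return child
    return None


def lexical_head(node: Node) -> Node:
    """Follow Head links down to the word that anchors the phrase."""
    while node.children:
        head = head_of(node)
        if head is None:
            raise ValueError(f"phrase {node.category} has no Head child")
        node = head
    return node


# Toy sentence "他 打 球" ("he plays ball"); labels are illustrative only.
sentence = Node("root", "S", children=[
    Node("agent", "NP", children=[Node("Head", "N", "他")]),
    Node("Head", "V", "打"),
    Node("goal", "NP", children=[Node("Head", "N", "球")]),
])

print(lexical_head(sentence).category)  # V
print([c.role for c in sentence.children if c.role != "Head"])  # ['agent', 'goal']
```

In this sketch the Head of S is the verb, matching the example in the text, while the two NPs are attached to it through their role labels.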
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “A Probe into Ambiguities of Determinative-Measure Compounds”. IJCLCLP, Vol. 11, No. 3, pp. 245–280, Sep 2006.
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “Feature Representations and Logical Compatibility Between Temporal Adverbs and Aspects”. IJCLCLP, Vol. 10, No. 4, pp. 445–458, Dec 2005.
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “A Probe into Ambiguities of Determinative-Measure Compounds”. ROCLING, Sep 2005.
- Keh-Jiann Chen, Yu-Ming Hsieh. “Chinese Treebanks and Grammar Extraction”. IJCNLP, Mar 2004.
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “Feature Representations and Logical Compatibility Between Temporal Adverbs and Aspects”. CLSW, Jun 2004.
- Su-Chu Lin, Shu-Ling Huang, Keh-Jiann Chen. “Taxonomy of Fine-Grain Semantic Roles for Nominal Modifiers”. CLSW, Jun 2004.
- Jia-Ming You, Yu-Ming Hsieh. “Automatic Semantic Role Assignment for a Tree Structure”. SIGHAN, Jul 2004.
- Keh-Jiann Chen, Chi-Ching Luo, Ming-Chung Chang, Feng-Yi Chen, Chao-Jan Chen, Chu-Ren Huang, Zhao-Ming Gao. “Sinica Treebank: Design Criteria, Representational Issues and Implementation”. In “Treebanks: Building and Using Parsed Corpora”, Ch. 13, pp. 231–248, 2003.
- Chu-Ren Huang, Feng-Yi Chen, Keh-Jiann Chen, Zhao-Ming Gao, Kuang-Yu Chen. “Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface”. SIGHAN, Oct 2000.
- Feng-Yi Chen, Pi-Fang Tsai, Keh-Jiann Chen, Chu-Ren Huang. “中文句結構樹資料庫 (Sinica Treebank) 的構建” [The Construction of Sinica Treebank, in Chinese]. IJCLCLP, Vol. 4, No. 2, pp. 87–104, Aug 1999.
- Keh-Jiann Chen et al. “The CKIP Chinese Treebank: Guidelines for Annotation”. ATALA Workshop: Treebanks, Jun 1999.
- Keh-Jiann Chen, Chu-Ren Huang, Li-Ping Chang, Hui-Li Hsu. “Sinica Corpus: Design Methodology for Balanced Corpora”. PACLIC, Dec 1996.
- Keh-Jiann Chen. “A Model for Robust Chinese Parser”. IJCLCLP, Vol. 1, No. 1, pp. 183–204, Aug 1996.
- Keh-Jiann Chen, Chu-Ren Huang. “Features Constraints in Chinese Language Parsing”. ICCPOL, 1994.
- CKIP Group (詞庫小組). “中文詞類分析” [Analysis of Chinese Parts of Speech, in Chinese]. No. 93-05, May 1993.
- Keh-Jiann Chen. “Design Concepts for Chinese Parsers”. International Conference on Chinese Information Processing, 1992.
- 林甫雯. “ICG中的論旨角色” [Thematic Roles in ICG, in Chinese]. No. 92-01, Jan 1992.
- Wen-Jen Wei, Keh-Jiann Chen. “The Grammar Representation of Conjunctions — A Representation Based on ICG”. ROCLING, Aug 1991.