Chinese Sentence Structure Trees
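The Sinica Treebank (中文句結構樹資料庫) has been under development since 1997 (ROC year 86) by the CKIP group (詞庫小組) at Academia Sinica. Sentences are drawn from the Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus), parsed automatically into structure trees using the representation of Information-based Case Grammar (ICG) as the basic framework, and then manually corrected and verified. The treebank is currently at version 3.0 and comprises 6 files, 61,087 Chinese trees, and 361,834 words. It is available for online search and for data transfer, as a reference for scholars studying Chinese syntax and semantic relations. In addition, 1,000 sentence structure trees are available for download.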
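The main purpose of building the Sinica Treebank is to give Chinese natural language processing research a corpus annotated with sentence structure. Grammatical knowledge can be extracted from the treebank, and extracting and understanding that knowledge in turn helps improve the parsing system.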
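The syntactic structure of Chinese sentences is represented according to the Head-Driven Principle. When a sentence is parsed, the type of each phrase is determined by its head, and the syntactic and semantic information recorded on the head and on the other constituents is used to express the syntactic structure and the semantic role relations among the words of the sentence. We also propose three auxiliary principles: the "small is beautiful" principle for word classes (詞類小而美原則), the left-to-right association principle (由左至右聯併原則), and the flat-structure principle (扁平原則). For details of the representation principles and auxiliary principles, the notation, the semantic roles, and the phrase structures of the treebank, see “中文句結構樹資料庫 (Sinica Treebank) 的構建” (Chen et al. 1999).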
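As a rough illustration of the head-driven idea, the sketch below builds a small tree in which every constituent carries a semantic role and each phrase takes its label from its head. The class names, the coarse `N`/`V`/`P` tags, the `HEAD_TO_PHRASE` mapping, and the bracket rendering are all assumptions made for this example. They are not the treebank's actual tag set or file format, which are documented in Chen et al. 1999.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical mapping from a head's coarse word class to a phrase type.
HEAD_TO_PHRASE = {"V": "S", "N": "NP", "P": "PP"}


@dataclass
class Word:
    role: str  # semantic role inside the parent phrase, e.g. "Head", "agent"
    pos: str   # simplified word-class tag (assumed for this sketch)
    form: str  # the word itself


@dataclass
class Phrase:
    role: str
    children: List[Union["Word", "Phrase"]] = field(default_factory=list)

    def head(self) -> Word:
        """Return the lexical head: the single child whose role is 'Head'."""
        heads = [c for c in self.children if c.role == "Head"]
        if len(heads) != 1:
            raise ValueError("a phrase needs exactly one Head child")
        h = heads[0]
        return h.head() if isinstance(h, Phrase) else h

    def label(self) -> str:
        """The phrase type is decided by the head's word class."""
        return HEAD_TO_PHRASE[self.head().pos[0]]


def render(node: Union[Word, Phrase], top: bool = True) -> str:
    """Render a tree in an illustrative role:label(...) bracket notation."""
    if isinstance(node, Word):
        return f"{node.role}:{node.pos}:{node.form}"
    inner = "|".join(render(c, top=False) for c in node.children)
    body = f"{node.label()}({inner})"
    return body if top else f"{node.role}:{body}"


# "他打球" (he plays ball): a flat clause with a verbal head and two NP arguments.
sentence = Phrase("", [
    Phrase("agent", [Word("Head", "N", "他")]),
    Word("Head", "V", "打"),
    Phrase("goal", [Word("Head", "N", "球")]),
])

print(render(sentence))
```

Because the label is computed from the head rather than stored, swapping the head word changes the phrase type without touching the other constituents, which is the behaviour the Head-Driven Principle describes.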
Research Results
Online System Demo
Software and Resource Downloads
Publications
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “A Probe into Ambiguities of Determinative-Measure Compounds”. IJCLCLP, Vol. 11, No. 3, pp. 245–280, Sep 2006.
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “Feature Representations and Logical Compatibility Between Temporal Adverbs and Aspects”. IJCLCLP, Vol. 10, No. 4, pp. 445–458, Dec 2005.
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “A Probe into Ambiguities of Determinative-Measure Compounds”. ROCLING, Sep 2005.
- Keh-Jiann Chen, Yu-Ming Hsieh. “Chinese Treebanks and Grammar Extraction”. IJCNLP, Mar 2004.
- Shih-Min Li, Su-Chu Lin, Keh-Jiann Chen. “Feature Representations and Logical Compatibility Between Temporal Adverbs and Aspects”. CLSW, Jun 2004.
- Su-Chu Lin, Shu-Ling Huang, Keh-Jiann Chen. “Taxonomy of Fine-Grain Semantic Roles for Nominal Modifiers”. CLSW, Jun 2004.
- Jia-Ming You, Yu-Ming Hsieh. “Automatic Semantic Role Assignment for a Tree Structure”. SIGHAN, Jul 2004.
- Keh-Jiann Chen, Chi-Ching Luo, Ming-Chung Chang, Feng-Yi Chen, Chao-Jan Chen, Chu-Ren Huang, Zhao-Ming Gao. “Sinica Treebank: Design Criteria, Representational Issues and Implementation”. In “Treebanks: Building and Using Parsed Corpora”, Ch. 13, pp. 231–248, 2003.
- Chu-Ren Huang, Feng-Yi Chen, Keh-Jiann Chen, Zhao-Ming Gao, Kuang-Yu Chen. “Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface”. SIGHAN, Oct 2000.
- Feng-Yi Chen, Pi-Fang Tsai, Keh-Jiann Chen, Chu-Ren Huang. “中文句結構樹資料庫 (Sinica Treebank) 的構建”. IJCLCLP, Vol. 4, No. 2, pp. 87–104, Aug 1999.
- Keh-Jiann Chen et al. “The CKIP Chinese Treebank: Guidelines for Annotation”. ATALA Workshop: Treebanks, Jun 1999.
- Keh-Jiann Chen, Chu-Ren Huang, Li-Ping Chang, Hui-Li Hsu. “Sinica Corpus: Design Methodology for Balanced Corpora”. PACLIC, Dec 1996.
- Keh-Jiann Chen. “A Model for Robust Chinese Parser”. IJCLCLP, Vol. 1, No. 1, pp. 183–204, Aug 1996.
- Keh-Jiann Chen, Chu-Ren Huang. “Features Constraints in Chinese Language Parsing”. ICCPOL, 1994.
- CKIP (詞庫小組). “中文詞類分析” [Analysis of Chinese Word Classes]. No. 93-05, May 1993.
- Keh-Jiann Chen. “Design Concepts for Chinese Parsers”. International Conference on Chinese Information Processing, 1992.
- 林甫雯. “ICG中的論旨角色” [Thematic Roles in ICG]. No. 92-01, Jan 1992.
- Wen-Jen Wei, Keh-Jiann Chen. “The Grammar Representation of Conjunctions — A Representation Based on ICG”. ROCLING, Aug 1991.
Contributors
Su-Chu Lin (林素朱)