
The CKIP (Chinese Knowledge and Information Processing) group is a research team formed in 1986 by the Institute of Information Science and the Institute of Linguistics of Academia Sinica. Its purpose is to establish a fundamental research environment for Chinese natural language processing. The preliminary goal of the project was to construct research infrastructure with reusable resources that could be shared by domestic and international research institutes. The resources completed so far include Chinese electronic dictionaries, Mandarin Chinese corpora, and processing technologies for Chinese texts. With these environments and technologies now well established, we are focusing on knowledge-based information processing. This area of research is motivated by the flood of information on the WWW, for which effective and autonomous information processing tools are still lacking. To achieve high-level intelligent information processing, we are currently addressing many of the most challenging research problems in knowledge acquisition, knowledge representation, and knowledge utilization.


Constructing ontologies and commonsense knowledge databases is very time-consuming. Over the past twenty-some years, we have developed an infrastructure for Chinese language processing that includes part-of-speech tagged corpora, a treebank, Chinese lexical databases, a Chinese grammar, word identification systems, and sentence parsers. In the future, we will use this infrastructure to extract linguistic and domain knowledge from various corpora and texts on the Web to enhance current knowledge databases. The targeted databases include general-domain ontologies, special-domain ontologies, and lexical, syntactic, and semantic knowledge databases. These databases will be interconnected to form a ConceptNet for language processing and logical inference.


For knowledge representation, we study the logical foundations of ontology and fine-grained semantic representation. We also study near-synonyms to identify the fine-grained differences between them. This work helps us better understand meaning representation and meaning composition. We will remodel the current ontology structures of WordNet, HowNet, and FrameNet to achieve a better and more unified representation, called Extended-HowNet. In addition, we will study modal logics, integrate modal logic systems into a unified framework, and develop automated inference and theorem-proving methods based on that framework.


We will focus on conceptual processing of Chinese documents. Our knowledge-based language processing systems will use the statistical, linguistic, and commonsense knowledge provided by our evolving ConceptNet to parse the conceptual structures of sentences and interpret their meanings. By incorporating knowledge bases, these systems form a learning system: the language processing systems become more powerful as the knowledge bases are enhanced, and the knowledge bases in turn evolve through the knowledge that the language processing systems extract automatically.


Wei-Yun Ma - Assistant Research Fellow, Institute of Information Science