A statistical language model is a probability distribution over sequences of words. It provides context to distinguish between words and phrases that sound similar. For example, in American English, the phrases "recognize speech" and "wreck a nice beach" sound similar, but mean different things.
Language models are used in many natural language processing applications, such as speech recognition, machine translation, part-of-speech tagging, syntax analysis, handwriting recognition and information retrieval.
Data sparsity is a major problem in building language models. Most possible word sequences are not observed in training. Ambiguities are easier to resolve when evidence from the language model is integrated with a pronunciation model and an acoustic model.
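To make the data-sparsity point concrete, here is a minimal sketch of a statistical language model: a bigram model over a toy corpus with add-one (Laplace) smoothing. The corpus, marker tokens, and function names are illustrative choices, not part of any particular library.

```python
from collections import Counter

# Toy corpus with sentence-boundary markers; a real language model is
# trained on vastly more text, which is why data sparsity matters.
corpus = [
    "<s> recognize speech </s>",
    "<s> recognize speech quickly </s>",
    "<s> wreck a nice beach </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """Add-one (Laplace) smoothed estimate of P(word | prev).

    Smoothing reserves a little probability mass for word pairs never
    seen in training -- a simple remedy for data sparsity."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sequence_prob(words):
    """Probability of a whole word sequence under the bigram model."""
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob
```

Under this model, sequences observed in training receive higher probability than acoustically similar but unseen ones, which is exactly the evidence a speech recognizer combines with its acoustic model.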
The Transformer is a deep learning model introduced in 2017, used primarily in the field of natural language processing (NLP).
Like recurrent neural networks (RNNs), Transformers are designed to handle sequential data, such as natural language, for tasks such as translation and text summarization. However, unlike RNNs, Transformers do not require that the sequential data be processed in order. For example, if the input data is a natural language sentence, the Transformer does not need to process the beginning of it before the end. Due to this feature, the Transformer allows for much more parallelization than RNNs and therefore reduced training times.
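The parallelization property comes from the Transformer's attention mechanism: every position attends to every other position in a single batch of matrix multiplications, rather than one step at a time. A minimal NumPy sketch of scaled dot-product attention (not the full Transformer) illustrates this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions at once.

    The (seq, seq) score matrix is computed in one matrix multiply, so no
    position waits on an earlier one -- unlike an RNN's sequential steps."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted mix of values
```

Each output row is a convex combination of the value vectors, and the whole computation is a handful of dense matrix operations, which GPUs parallelize well.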
Transformers have rapidly become the model of choice for NLP problems, replacing older recurrent neural network models such as the long short-term memory (LSTM). Since the Transformer model facilitates more parallelization during training, it has enabled training on larger datasets than was possible before it was introduced. This has led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which have been trained on huge general-language datasets, such as the Wikipedia corpus, and can be fine-tuned for specific language tasks.
We release a series of Traditional Chinese Transformers models, including ALBERT, BERT, and GPT-2. In addition to the language models, we also trained corresponding models for various natural language tasks, including word segmentation, part-of-speech tagging, and named-entity recognition.
Please refer to https://github.com/ckiplab/ckip-transformers for details and the download links.
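As a rough sketch (see the repository above for the official usage), the released models can be loaded through the Hugging Face `transformers` library. The model identifiers below follow the ckiplab naming scheme but are assumptions here and should be checked against the repository.

```python
# Hypothetical model identifiers following the ckiplab naming scheme;
# verify the exact names in the ckip-transformers repository.
MODELS = {
    "albert": "ckiplab/albert-base-chinese",
    "bert": "ckiplab/bert-base-chinese",
    "gpt2": "ckiplab/gpt2-base-chinese",
}

def load_model(kind):
    """Download a tokenizer and model from the Hugging Face hub.

    Imports are deferred so this module can be inspected without the
    `transformers` package installed or a network connection."""
    from transformers import AutoModel, BertTokenizerFast
    # The released models share the standard Chinese BERT vocabulary,
    # so the tokenizer is loaded from `bert-base-chinese` (an assumption).
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
    model = AutoModel.from_pretrained(MODELS[kind])
    return tokenizer, model
```

A call such as `load_model("bert")` would then return a tokenizer/model pair ready for fine-tuning on a downstream task.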