Microsoft's LightRNN: Memory- and Computation-Efficient Recurrent Neural Networks (Sohu Technology)

Academic | Major Microsoft paper: LightRNN, a memory- and computation-efficient recurrent neural network

Selected from arXiv. Compiled by Heart of the Machine; contributors: Li Zenan, Wu Pan, Jiang Siyuan.

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is very large, the RNN model becomes very large too (possibly exceeding the memory capacity of a GPU), and training becomes very inefficient. In this work, we propose a novel approach to address this challenge. The key idea is to use a two-component (2C) shared embedding for word representations. We allocate every word in the vocabulary to a table, in which each row is associated with one vector and each column with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Because all words in the same row share the row vector and all words in the same column share the column vector, we need only 2√|V| vectors to represent a vocabulary of |V| words, far fewer than the |V| vectors required by existing methods. Based on the two-component shared embedding, we design a new RNN algorithm and evaluate it on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up training without sacrificing accuracy (it achieves perplexity similar to or better than that of state-of-the-art language models). Notably, on the One-Billion-Word benchmark dataset, our algorithm achieves perplexity comparable to that of previous language models while reducing the model size by a factor of 40 to 100 and speeding up training by a factor of 2.
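The two-component shared embedding described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the helper names (`build_table`, `init_vectors`, `embed`), the toy vocabulary, and the vector dimension are all hypothetical, and words are placed in the table simply in index order, which is an assumption made here for illustration.

```python
import math
import random

def build_table(vocab):
    """Assign each word to a (row, col) cell of a square table.

    Assumption: words are placed in index order, purely for illustration.
    """
    # Smallest side length such that side * side >= len(vocab).
    side = math.isqrt(len(vocab) - 1) + 1
    positions = {word: divmod(i, side) for i, word in enumerate(vocab)}
    return side, positions

def init_vectors(side, dim, seed=0):
    """Create one vector per row and one vector per column."""
    rng = random.Random(seed)
    def make():
        return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(side)]
    return make(), make()  # (row vectors, column vectors)

def embed(word, positions, row_vecs, col_vecs):
    """Represent a word by the pair (row vector, column vector) of its cell."""
    r, c = positions[word]
    return row_vecs[r], col_vecs[c]

# Hypothetical toy vocabulary of 10,000 words.
vocab = [f"word{i}" for i in range(10000)]
side, positions = build_table(vocab)
row_vecs, col_vecs = init_vectors(side, dim=8)

# Number of shared vectors: 2 * sqrt(|V|) instead of |V|.
shared_count = len(row_vecs) + len(col_vecs)
```

In this sketch, a vocabulary of 10,000 words fits in a 100 × 100 table, so 200 shared vectors stand in for the 10,000 separate vectors a conventional embedding would need. Words in the same row reuse the same row vector object, and words in the same column reuse the same column vector object.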
We named our proposed algorithm LightRNN to reflect its small model size and fast training speed.

Figure: perplexity comparison when training on ACLW-French.

Introduction

Recently, recurrent neural networks (RNNs) have been used for a variety of natural language processing (NLP) tasks, such as language modeling, machine translation, sentiment analysis, and question answering. One popular RNN architecture is the long short-term memory (LSTM) network, which uses memory cells and gating functions to model long-term dependencies and to address the vanishing gradient problem. Thanks to these elements, LSTM networks have achieved state-of-the-art performance in many current NLP tasks, even though they learn almost from scratch. Although RNNs are becoming more and more popular, they also have a limitation: when applied to n
