Summary

Background

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data. However, while LSTMs provide exceptional results in practice, the source of their performance and their limitations remain rather poorly understood.

2018.1.6.003.jpeg

Using character-level language models as an interpretable testbed, this paper aims to bridge this gap by providing an analysis of LSTM representations, predictions and error types.
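To make the testbed concrete, here is a minimal sketch of the character-level prediction task: every position in a text yields a (context, next character) training pair. The sketch uses a simple count-based model with a fixed context length as a stand-in, not the LSTM studied in the paper; the function names `make_pairs`, `fit_counts` and `predict` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def make_pairs(text, order=1):
    """Yield (context, next_char) pairs; the context is the previous `order` characters."""
    for i in range(order, len(text)):
        yield text[i - order:i], text[i]


def fit_counts(text, order=1):
    """Count how often each character follows each context (a stand-in for training)."""
    counts = defaultdict(Counter)
    for context, nxt in make_pairs(text, order):
        counts[context][nxt] += 1
    return counts


def predict(counts, context):
    """Return the most frequent next character for a context, or None if the context is unseen."""
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]


# Each position in the text becomes one prediction problem.
pairs = list(make_pairs("hello", order=2))

# Fit on a toy string and ask what usually follows "a".
model = fit_counts("abababac", order=1)
guess = predict(model, "a")
```

A model like this can only look back `order` characters, which is the kind of finite horizon the paper contrasts with what a recurrent network can learn.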

2018.1.6.004.jpeg 2018.1.6.005.jpeg 2018.1.6.006.jpeg

Test Result

2018.1.6.009.jpeg

Gates Activation

2018.1.6.011.jpeg 2018.1.6.012.jpeg

Gates Saturation

2018.1.6.014.jpeg
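One way to quantify gate saturation is to measure, for each gate unit, the fraction of time steps on which its sigmoid activation sits near 0 (left-saturated) or near 1 (right-saturated). The sketch below assumes the activations for one unit have already been recorded as a list of values in [0, 1]; the function name `saturation_fractions` is hypothetical, and the thresholds 0.1 and 0.9 are assumptions exposed as parameters, not values stated in this note.

```python
def saturation_fractions(activations, low=0.1, high=0.9):
    """Return (left, right): the fractions of time steps on which a gate is saturated.

    activations: sigmoid gate values in [0, 1] for a single unit over time.
    low, high:   assumed saturation thresholds (adjust as needed).
    """
    n = len(activations)
    if n == 0:
        raise ValueError("activations must not be empty")
    left = sum(1 for a in activations if a < low) / n    # gate almost always closed
    right = sum(1 for a in activations if a > high) / n  # gate almost always open
    return left, right


# Toy recording of one gate unit over four time steps.
left, right = saturation_fractions([0.05, 0.95, 0.5, 0.99])
```

Plotting each unit as a point with coordinates (left, right) gives a scatter view of the kind such an analysis typically uses: units near the right-saturated corner of a forget gate, for example, would be ones that hold their memory over long spans.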

Long Term Interaction

2018.1.6.016.jpeg

Error Statistics

2018.1.6.018.jpeg 2018.1.6.019.jpeg 2018.1.6.020.jpeg 2018.1.6.021.jpeg 2018.1.6.022.jpeg

Conclusion

This paper has presented a comprehensive analysis of Recurrent Neural Networks and their representations, predictions and error types. In particular, qualitative visualization experiments, cell activation statistics and in-depth comparisons to finite horizon n-gram models demonstrate that these networks learn powerful, long-range interactions. The authors have also conducted a detailed error analysis that illuminates the limitations of recurrent networks and suggests specific areas for further study. Specifically, n-gram errors can be significantly reduced by scaling up the models, and rare words could be addressed with bigger datasets. However, further architectural innovations may be needed to eliminate the remaining errors.

Visualizing And Understanding Recurrent Networks

VisRNN