Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng
Nowadays, machine learning becomes much more popular, and many companies deploy related programs in the production environment. These programs are mainly distributed across multiple machines in even heterogeneous environments. How can we handle large-scale operations and heterogeneous runtime environments? How can we achieve high scalability?
TensorFlow is a machine learning system designed by Google, and it trains models in a single dataflow graph that can be distributed across many machines in a cluster, and within a machine across multiple computational devices (e.g., multicore CPUs, general purpose GPUs, and custom-designed ASICs). In this architecture, TensorFlow combines the high-level programming models of dataflow systems and the low-level efficiency of parameter servers so that it can use the unified dataflow graph to represent both the computation in an algorithm and the state on which the algorithm operates.
- The success of machine learning is due to the invention of the novel and sophisticated machine learning models, the availability of large datasets, and the development of software platforms. Moreover, more and more powerful computational resources for training models are also the key factors.
- Based on the design of TensorFlow, it can utilize hundreds of powerful (GPU-enabled) servers for fast training. Besides, it also supports large-scale inference and runs trained models on various platforms, ranging from large distributed clusters in a datacenter, even down to running locally on mobile devices. Furthermore, TensorFlow is flexible enough to experiment with new models as well as to embrace new system0level optimizations.
- Unlike traditional dataflow systems, TensorFlow uses a unified dataflow graph to represent both the computation and the state. Vertices of the graph represent computations that own or update mutable state, while edges carry tensors between nodes. By doing so, TensorFlow gives developers greater freedom to experiment with different parallelization schemes.
- Though TensorFlow's flexible dataflow representation enables power users to achieve excellent performance, there does not exist determined default policies that are suitable for all users.
- TensorFlow faces the limitations of a static dataflow graph, especially for algorithms like deep reinforcement learning. If TensorFlow provides dynamic structure, it will still need to solve the problem of providing a system that transparently and efficiently uses distributed resources.
- For the evaluation of the training step time, TensorFlow outperforms the best on only one model. Neon has a better overall performance than TensorFlow.
- For image classification performance, the authors measure the scalability of training Inception-v3 using multiple replicas. The authors compare the performance training Inception using asynchronous SGD on TensorFlow and MXNet as well as investigate the effect of coordination on training performance. The results indicate that TensorFlow achieves promising scalability.
TensorFlow does not provide the best performance, but it simplifies the data model by using a unified dataflow graph model. This paper states that this flexible dataflow representation enables power users to achieve excellent performance. However, TensorFlow itself cannot outperform other libraries. The simplified representation does provide great flexibility, but how to balance the engineering usability and the overall performance is still an open question.