MLPerf Training v0.6 results
7/10/19: MLPerf releases Training results showing industry progress
Mountain View, CA - July 10, 2019 - Today the MLPerf effort released results for MLPerf Training v0.6, the second round of results from its machine learning training performance benchmark suite. MLPerf is a consortium of over 40 companies and researchers from leading universities, and the MLPerf benchmark suites are rapidly becoming the industry standard for measuring machine learning performance. The MLPerf Training benchmark suite measures the time it takes to train one of six machine learning models to a standard quality target in tasks including image classification, object detection, translation, and playing Go. To see the results, go to mlperf.org/training-results-0-6.
The first version of MLPerf Training was v0.5; this release, v0.6, improves on the first round in several ways. According to the MLPerf Training Special Topics Chairperson Paulius Micikevicius, “these changes demonstrate MLPerf’s commitment to its benchmarks’ representing the current industry and research state.” The improvements include:
- Raises quality targets for image classification (ResNet) to 75.9% top-1 accuracy, lightweight object detection (SSD) to 23% mAP, and recurrent translation (GNMT) to 24 SacreBLEU. These changes better align the quality targets with the state of the art for these models and datasets.
- Allows use of the LARS optimizer for ResNet, which supports training with larger batch sizes and thereby enables additional scaling.
- Experimentally allows a slightly larger set of hyperparameters to be tuned, enabling faster training and some additional scaling.
- Changes timing to start the first time the application accesses the training dataset, thereby excluding startup overhead. This change was made because the large-scale systems measured are typically used with much larger datasets than those in MLPerf, and therefore normally amortize the startup overhead over much longer training times.
- Improves the MiniGo benchmark in two ways. First, it now uses a standard C++ engine for the non-ML compute, which is substantially faster than the prior Python engine. Second, it now assesses quality by comparing against a known-good checkpoint, which is more reliable than the previous approach of using a very small set of game data.
- Suspends the Recommendation benchmark while a larger dataset and model are being created.
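The timing change described above can be illustrated with a small wrapper around the training data. This is a minimal sketch, not MLPerf's actual logging or timing code: `TimedDataset` and its `elapsed` method are hypothetical names, and the sketch assumes the "first access" is the first indexed read of the dataset. Anything done before that read (building the model, compiling kernels, and so on) falls outside the timed region.

```python
import time
from typing import Callable, Generic, Optional, Sequence, TypeVar

T = TypeVar("T")


class TimedDataset(Generic[T]):
    """Hypothetical wrapper: the benchmark clock starts on the first read of
    the training data, so setup done before that point is not counted."""

    def __init__(self, samples: Sequence[T],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._samples = samples
        self._clock = clock
        self.start_time: Optional[float] = None

    def __len__(self) -> int:
        # Querying the size does not count as accessing the data.
        return len(self._samples)

    def __getitem__(self, index: int) -> T:
        if self.start_time is None:
            # First access to the training dataset: start the timer here.
            self.start_time = self._clock()
        return self._samples[index]

    def elapsed(self) -> float:
        """Time since the first data access (in the clock's units),
        or 0.0 if no data has been read yet."""
        if self.start_time is None:
            return 0.0
        return self._clock() - self.start_time


if __name__ == "__main__":
    data = TimedDataset(list(range(1000)))
    # Model construction and other startup work would happen here, untimed.
    for i in range(len(data)):
        _ = data[i]
    print(f"timed region: {data.elapsed():.6f} s")
```

The clock is injectable so the behavior can be checked deterministically with a fake clock. On a large system, the untimed startup phase can be substantial, which is why excluding it tends to shorten the reported times more on larger systems.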
Submissions showed substantial technological progress over v0.5. Many benchmarks featured submissions at larger scales than in v0.5. Results on the same systems show substantial performance improvements over v0.5, even after the impact of the rule changes is factored out. (The higher quality targets lead to longer times on ResNet, SSD, and GNMT. The change to overhead timing leads to shorter times, especially on larger systems. The improved engine and different quality target make MiniGo times substantially different.) “The rapid improvement in MLPerf results shows how effective benchmarking can be in accelerating innovation,” said Victor Bittorf, MLPerf Submitters Working Group Chairperson.
MLPerf Training v0.6 showed increased support for the benchmark and greater interest from submitters. MLPerf Training v0.6 received sixty-three entries, up more than 30%. Submissions came from five submitters, up from three in the previous round. Submissions included the first entry in the Open Division, which allows the model to be further optimized or a different model to be used, as a way of showcasing additional performance innovations made through software changes (the v0.6 Open Division submission used the same model). The MLPerf effort now has over 40 supporting companies, and recently released a complementary inference benchmark suite.
“We are creating a common yardstick for training and inference performance. We invite everyone to become involved by going to mlperf.org or emailing email@example.com,” said Peter Mattson, MLPerf General Chair.
MLPerf Inference launched
6/24/19: New Machine Learning Inference Benchmarks Assess Performance Across a Wide Range of AI Applications
Mountain View, CA - June 24, 2019 - Today a consortium involving more than 40 leading companies and university researchers introduced MLPerf Inference v0.5, the first industry-standard machine learning benchmark suite for measuring system performance and power efficiency. The benchmark suite covers models applicable to a wide range of applications including autonomous driving and natural language processing, on a variety of form factors, including smartphones, PCs, edge servers, and cloud computing platforms in the data center. MLPerf Inference v0.5 uses a combination of carefully selected models and datasets to ensure that the results are relevant to real-world applications. It will stimulate innovation within the academic and research communities and push the state of the art forward.
By measuring inference, this benchmark suite will give valuable information on how quickly a trained neural network can process new data to provide useful insights. Previously, MLPerf released the companion Training v0.5 benchmark suite leading to 29 different results measuring the performance of cutting-edge systems for training deep neural networks.
MLPerf Inference v0.5 consists of five benchmarks, focused on three common ML tasks:
- Image Classification - predicting a “label” for a given image from the ImageNet dataset, such as identifying items in a photo.
- Object Detection - picking out an object using a bounding box within an image from the MS-COCO dataset, commonly used in robotics, automation, and automotive applications.
- Machine Translation - translating sentences between English and German using the WMT English-German benchmark, similar to auto-translate features in widely used chat and email applications.
MLPerf provides benchmark reference implementations that define the problem, model, and quality target, and provide instructions to run the code. The reference implementations are available in ONNX, PyTorch, and TensorFlow frameworks. The MLPerf inference benchmark working group follows an “agile” benchmarking methodology: launching early, involving a broad and open community, and iterating rapidly. The mlperf.org website provides a complete specification with guidelines on the reference code and will track future results.
The inference benchmarks were created thanks to the contributions and leadership of our members over the last 11 months, including representatives from: Arm, Cadence, Centaur Technology, Dividiti, Facebook, General Motors, Google, Habana Labs, Harvard University, Intel, MediaTek, Microsoft, Myrtle, NVIDIA, Real World Insights, University of Illinois at Urbana-Champaign, University of Toronto, and Xilinx.
The General Chair Peter Mattson and Inference Working Group Co-Chairs Christine Cheng, David Kanter, Vijay Janapa Reddi, and Carole-Jean Wu make the following statement:
“The new MLPerf inference benchmarks will accelerate the development of hardware and software to unlock the full potential of ML applications. They will also stimulate innovation within the academic and research communities. By creating common and relevant metrics to assess new machine learning software frameworks, hardware accelerators, and cloud and edge computing platforms in real-life situations, these benchmarks will establish a level playing field that even the smallest companies can use.”
Now that the new benchmark suite has been released, organizations can submit results that demonstrate the benefits of their ML systems on these benchmarks. Interested organizations should contact firstname.lastname@example.org.
MLPerf Training v0.5 results
12/12/18: MLPerf Results Compare Top ML Hardware, Aim to Spur Innovation
Today, the researchers and engineers behind the MLPerf benchmark suite released their first round of results. The results measure the speed of major machine learning (ML) hardware platforms, including Google TPUs, Intel CPUs, and NVIDIA GPUs. The results also offer insight into the speed of ML software frameworks such as TensorFlow, PyTorch, and MXNet. The MLPerf results are intended to help decision makers assess existing offerings and focus future development. To see the results, go to mlperf.org/training-results-0-5.
Historically, technological competition with a clear metric has resulted in rapid progress. Examples include the space race that led to people walking on the moon within two decades, the SPEC benchmark that helped drive CPU performance improvements of 1.6X per year for the 15 years after its introduction, and the DARPA Grand Challenge that helped make self-driving cars a reality. MLPerf aims to bring this same rapid progress to ML system performance. Given that large-scale ML experiments still take days or weeks, improving ML system performance is critical to unlocking the potential of ML.
MLPerf was launched in May 2018 by a small group of researchers and engineers, and it has since grown rapidly. MLPerf is now supported by over thirty major companies and startups including hardware vendors such as Intel and NVIDIA (NASDAQ: NVDA), and internet leaders like Baidu (NASDAQ: BIDU) and Google (NASDAQ: GOOGL). MLPerf is also supported by researchers from seven different universities. Today, Facebook (NASDAQ: FB) and Microsoft (NASDAQ: MSFT) are announcing their support for MLPerf.
Benchmarks like MLPerf are important to the entire industry:
- “We are glad to see MLPerf grow from just a concept to a major consortium supported by a wide variety of companies and academic institutions. The results released today will set a new precedent for the industry to improve upon to drive advances in AI,” reports Haifeng Wang, Senior Vice President of Baidu who oversees the AI Group.
- “Open standards such as MLPerf and Open Neural Network Exchange (ONNX) are key to driving innovation and collaboration in machine learning across the industry,” said Bill Jia, VP, AI Infrastructure at Facebook. “We look forward to participating in MLPerf with its charter to standardize benchmarks.”
- “MLPerf can help people choose the right ML infrastructure for their applications. As machine learning continues to become more and more central to their business, enterprises are turning to the cloud for the high performance and low cost of training of ML models,” – Urs Hölzle, Senior Vice President of Technical Infrastructure, Google.
- “We believe that an open ecosystem enables AI developers to deliver innovation faster. In addition to existing efforts through ONNX, Microsoft is excited to participate in MLPerf to support an open and standard set of performance benchmarks to drive transparency and innovation in the industry.” – Eric Boyd, CVP of AI Platform, Microsoft
- “MLPerf demonstrates the importance of innovating in scale-up computing as well as at all levels of the computing stack — from hardware architecture to software and optimizations across multiple frameworks.” – Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA
Today’s published results are for the MLPerf training benchmark suite. The training benchmark suite consists of seven benchmarks including image classification, object detection, translation, recommendation, and reinforcement learning. The metric is the time required to train a model to a target level of quality. MLPerf timing results are then normalized against unoptimized reference implementations running on a single NVIDIA Pascal P100 GPU. Future MLPerf benchmarks will include inference as well.
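The time-to-train metric and its normalization can be sketched in a few lines. This is a minimal illustration, not MLPerf's reference code, and it rests on two assumptions: that quality is checked at periodic evaluations during training, and that "normalized" means the reference run's time divided by the submission's time, so larger numbers are better. The function names, the evaluation log, the target, and the reference time below are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def time_to_target(evals: Iterable[Tuple[float, float]],
                   target: float) -> Optional[float]:
    """Return the elapsed time of the first evaluation whose quality meets
    the target, or None if the run never reaches it."""
    for elapsed, quality in evals:
        if quality >= target:
            return elapsed
    return None


def normalized_speedup(reference_time: float, submission_time: float) -> float:
    """Assumed normalization: reference time divided by submission time,
    so 10.0 means ten times faster than the reference run."""
    if reference_time <= 0 or submission_time <= 0:
        raise ValueError("times must be positive")
    return reference_time / submission_time


if __name__ == "__main__":
    # Made-up evaluation log: (minutes elapsed, accuracy at that evaluation).
    log = [(30.0, 0.70), (60.0, 0.749), (90.0, 0.761)]
    t = time_to_target(log, target=0.75)  # illustrative target, not an MLPerf one
    assert t is not None
    print(t)                              # 90.0
    print(normalized_speedup(900.0, t))   # 10.0
```

Measuring time to a fixed quality target, rather than time per epoch or throughput, means that optimizations which speed up each step but hurt convergence do not improve the score.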
MLPerf categorizes results based on both a division and a given product or platform’s availability. There are two divisions: Closed and Open. Submissions to the Closed division, intended for apples-to-apples comparisons of ML hardware and ML frameworks, must use the same model (e.g. ResNet-50 for image classification) and optimizer. In the Open division, participants can submit any model. Within each division, submissions are classified by availability: in the Cloud, On-premise, Preview, or Research. Preview systems will be available by the next submission round. Research systems either include experimental hardware or software, or are at a scale not yet publicly available.
MLPerf is an agile and open benchmark. This is an “alpha” release of the benchmark, and the MLPerf community intends to rapidly iterate. MLPerf welcomes feedback and invites everyone to get involved in the community. To learn more about MLPerf, go to mlperf.org or email email@example.com.
MLPerf Training launched
5/2/18: Industry and Academic Leaders Launch New Machine Learning Benchmarks to Propel Innovation
Today, a group of researchers and engineers released MLPerf, a benchmark for measuring the speed of machine learning software and hardware. MLPerf measures speed based on the time it takes to train deep neural networks to perform tasks including recognizing objects, translating languages, and playing the ancient game of Go. The effort is supported by a broad coalition of experts from tech companies and startups including AMD (NASDAQ: AMD), Baidu (NASDAQ: BIDU), Google (NASDAQ: GOOGL), Intel (NASDAQ: INTC), SambaNova, and Wave Computing, and by researchers from educational institutions including Harvard University, Stanford University, University of California Berkeley, University of Minnesota, and University of Toronto.
The promise of AI has sparked an explosion of work in machine learning. As this sector expands, systems need to evolve rapidly to meet its demands. According to ML pioneer Andrew Ng, “AI is transforming multiple industries, but for it to reach its full potential, we still need faster hardware and software.” With researchers pushing the bounds of computers’ capabilities and system designers beginning to hone machines for machine learning, there is a need for a new generation of benchmarks.
MLPerf aims to accelerate improvements in ML system performance just as the SPEC benchmark helped accelerate improvements in general purpose computing. SPEC was introduced in 1988 by a consortium of computing companies, and CPU performance improved 1.6X per year for the next 15 years. MLPerf combines best practices from previous benchmarks including: SPEC’s use of a suite of programs, SORT’s use of one division to enable comparisons and another division to foster innovative ideas, DeepBench’s coverage of software deployed in production, and DAWNBench’s time-to-accuracy metric.
Benchmarks like SPEC and MLPerf catalyze technological improvement by aligning research and development efforts and guiding investment decisions:
- “Good benchmarks enable researchers to compare different ideas quickly, which makes it easier to innovate,” summarizes researcher David Patterson, author of Computer Architecture: A Quantitative Approach.
- According to Gregory Stoner, CTO of Machine Learning, Radeon Technologies Group, AMD: “AMD is at the forefront of building high-performance solutions, and benchmarks such as MLPerf are vital for providing a solid foundation for hardware and system software idea exploration, thereby giving our customers a more robust solution to measure Machine Learning system performance and underscoring the power of the AMD portfolio.”
- “MLPerf is a critical benchmark that showcases how our dataflow processor technology is optimized for ML workload performance,” remarks Chris Nicol, CTO of the startup Wave Computing.
- “AI powers an array of products and services at Baidu. A benchmark like MLPerf allows us to compare platforms and make better datacenter investment decisions,” reports Haifeng Wang, Vice President of Baidu who oversees the AI Group.
Because ML is such a fast moving field, the team is developing MLPerf as an “agile” benchmark: launching early, involving a broad community, and iterating rapidly. The mlperf.org website provides a complete specification with reference code, and will track future results. MLPerf invites hardware vendors and software framework providers to submit results before the July 31st deadline.