Microsoft is open-sourcing the recipe it used to pre-train the language representation model BERT (Bidirectional Encoder Representations from Transformers). According to Microsoft, BERT led to significant advances in the ability to capture the intricacies of language, and it improved the functionality of natural language applications such as text classification, extraction, and question answering.
Because BERT is so broadly applicable, data scientists often use a pre-trained version of it rather than building their own. But according to Microsoft, this is only ideal if the data in the target domain is similar to the data the model was trained on; a pre-trained model loses accuracy when it is moved to a new problem space.
In addition, in order to advance language representation, users have to change the model architecture, training data, cost function, tasks, and optimization routines.
To address those issues, the company decided to open-source a recipe that can be used to train BERT models on Azure Machine Learning. The Azure Machine Learning service is a cloud-based environment for preparing data and for training, testing, deploying, managing, and tracking machine learning models.
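For a sense of what submitting such a training job looks like, below is a minimal sketch using the Azure Machine Learning Python SDK's PyTorch estimator. The workspace configuration, cluster name, entry script, and node counts are illustrative placeholders, not details from Microsoft's released recipe.

```python
# Minimal sketch: submitting a distributed pre-training job with the
# Azure Machine Learning Python SDK. Cluster name, script, and counts
# below are hypothetical placeholders.
from azureml.core import Workspace, Experiment
from azureml.train.dnn import PyTorch, Mpi

ws = Workspace.from_config()                          # loads config.json for an existing workspace
compute_target = ws.compute_targets["gpu-cluster"]    # hypothetical multi-node GPU cluster

estimator = PyTorch(
    source_directory="./pretrain",                    # folder containing the training code
    entry_script="train.py",                          # hypothetical entry point
    compute_target=compute_target,
    node_count=4,                                     # spread training across several GPU nodes
    distributed_training=Mpi(process_count_per_node=4),
    use_gpu=True,
)

run = Experiment(workspace=ws, name="bert-pretraining").submit(estimator)
run.wait_for_completion(show_output=True)
```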
By building their own versions of BERT, organizations will be able to scale the model to fit their needs.
“To get the training to converge to the same quality as the original BERT release on GPUs was non-trivial,” said Saurabh Tiwary, applied science manager at Bing. “To pre-train BERT we need massive computation and memory, which means we had to distribute the computation across multiple GPUs. However, doing that in a cost effective and efficient way with predictable behaviors in terms of convergence and quality of the final resulting model was quite challenging. We’re releasing the work that we did to simplify the distributed training process so others can benefit from our efforts.”
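The multi-GPU distribution Tiwary describes is commonly implemented with data-parallel training. Below is a generic sketch using PyTorch's DistributedDataParallel; the tiny linear layer and random batches stand in for a BERT encoder and real pre-training data and are not taken from Microsoft's released code.

```python
# Generic sketch of multi-GPU data-parallel training; placeholders only.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; the launcher (e.g. torchrun or an MPI launcher)
    # is assumed to set RANK, WORLD_SIZE, MASTER_ADDR/PORT, and LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(768, 768).cuda(local_rank)  # stand-in for a BERT encoder
    model = DDP(model, device_ids=[local_rank])          # gradients are all-reduced across GPUs
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for _ in range(10):                                   # stand-in training loop
        batch = torch.randn(32, 768).cuda(local_rank)     # stand-in for real pre-training data
        loss = model(batch).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```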