About a year ago, Google announced the launch of Vertex AI, a managed AI platform designed to help enterprises accelerate the deployment of AI models. To mark the service’s anniversary and the launch of Google’s Applied ML Summit, Google this morning announced new features heading to Vertex, including a dedicated server for AI system training and “example-based” explanations.
“We launched Vertex AI a year ago with the goal of enabling a new generation of AI that lets data scientists and engineers do satisfying and creative work,” Henry Tappen, product manager of the Google Cloud group, told TechCrunch via email. “The new Vertex AI capabilities we are launching today will continue to accelerate the deployment of machine learning models in organizations and democratize AI so that more people can deploy models in production, continuously monitor and drive business impact through AI.”
As Google has consistently pitched it, the advantage of Vertex is that it brings together Google Cloud services for AI under a unified user interface and API. Customers such as Ford, Seagate, Wayfair, Cash App, Cruise and Lowe’s use the service to build, train and deploy machine learning models in a single environment, Google says, taking the models from experimentation to production.
Vertex competes with managed AI platforms from cloud providers such as Amazon Web Services and Azure. Technically, it falls under the category of platforms known as MLOps, a set of best practices for businesses to run AI. Deloitte predicts the MLOps market will be worth $4 billion in 2025, a nearly 12-fold increase since 2019.
Gartner projects the emergence of managed services like Vertex will drive cloud market growth of 18.4% in 2021, with cloud expected to account for 14.2% of total global IT spending. “As enterprises increase their investments in mobility, collaboration, and other remote working technologies and infrastructure, the growth of public cloud [will] be sustained through 2024,” Gartner wrote in a November 2020 study.
Among Vertex’s new features is AI Training Reduction Server, a technology that Google says optimizes the bandwidth and latency of multi-system distributed training on Nvidia GPUs. In machine learning, “distributed training” refers to spreading the work of training a system across multiple machines, GPUs, CPUs, or custom chips, thereby reducing the time and resources needed to complete training.
“This dramatically reduces the training time required for large language workloads, like BERT, and further enables cost parity between different approaches,” said Andrew Moore, vice president and general manager of cloud AI at Google, in a post published today on the Google Cloud blog. “In many critical business scenarios, a shortened training cycle allows data scientists to train a model with higher predictive performance within the confines of a deployment window.”
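To make the idea of distributed training concrete, here is a minimal sketch of the data-parallel pattern, in which each worker computes a gradient on its own shard of data and the gradients are averaged before every update. It is an illustration only, not Google’s Reduction Server: the function names, the toy model y = w * x and the data are all assumptions made for the example, and the “workers” run one after another in a single process.

```python
# Toy data-parallel training loop (illustrative assumption, not Vertex code).
# Each "worker" holds one shard, computes a local gradient, and the
# gradients are averaged before the shared weight is updated.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the network reduction step: average one value per worker."""
    return sum(values) / len(values)

def train(shards, steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        # In a real system these gradients are computed in parallel.
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * all_reduce_mean(grads)
    return w

# Hypothetical data generated from y = 3x, split evenly across 4 workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = train(shards)
```

In a real cluster, the averaging step is where machines exchange data over the network, which is why the bandwidth and latency of that exchange can dominate training time for large models.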
In preview, Vertex also offers Tabular Workflows, which aim to bring greater customization to the model building process. As Moore explained, Tabular Workflows allows the user to choose which parts of the workflow they want Google’s “AutoML” technology to handle versus which parts they want to design themselves. AutoML, or automated machine learning (which is not unique to Google Cloud or Vertex), encompasses any technology that automates some aspect of AI development, and it can touch on stages from starting with a raw dataset to producing a ready-to-deploy machine learning model. AutoML can save time, but it can’t always beat a human touch, especially when precision is required.
“Tabular Workflows elements can also be integrated into your existing Vertex AI pipelines,” Moore said. “We’ve added new managed algorithms, including advanced research models like TabNet, new algorithms for feature selection, model distillation and … more.”
As for development pipelines, Vertex is also gaining an integration (in preview) with Serverless Spark, the serverless version of the Apache-maintained open source analytics engine for data processing. Vertex users can now launch a serverless Spark session to interactively develop code.
Elsewhere, customers can analyze data features in the Neo4j platform and then deploy models using Vertex through a new partnership with Neo4j. And thanks to a collaboration between Google and Labelbox, it’s now easier to access Labelbox’s data labeling services for images, text, audio and video data from the Vertex dashboard. Labels are necessary for most AI models to learn to make predictions; models train to identify the relationships between labels, also called annotations, and sample data (for example, the caption “frog” and a photo of a frog).
In the event that data is mislabeled, Moore offers example-based explanations as a solution. Available in preview, the new Vertex feature uses example-based explanations to help diagnose and address problems with data. Of course, no explainable AI technique can catch every error; computational linguist Vagrant Gautam warns against placing too much trust in the tools and techniques used to explain AI.
“Google has documentation on limitations and a more detailed white paper on explainable AI, but none of that is mentioned anywhere [in today’s Vertex AI announcement],” they told TechCrunch via email. “The announcement emphasizes that ‘proficiency in skills shouldn’t be the criteria for participation’ and that the new features they provide can ‘evolve AI for non-software experts.’ What worries me is that non-experts have more confidence in AI and the explainability of AI than they should, and now various Google customers can build and deploy models more quickly without stopping to wonder if this is a problem that needs a machine learning solution in the first place, and calling their models explainable (and therefore trustworthy and good) without knowing the full extent of the limitations around that for their particular cases.”
Nonetheless, Moore suggests that example-based explanations can be a useful tool when used in tandem with other model auditing practices.
“Data scientists shouldn’t need to be infrastructure engineers or operations engineers to keep models accurate, explainable, scalable, disaster-resistant, and secure, in an ever-changing environment,” added Moore. “Our customers demand tools to easily manage and maintain machine learning models.”
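For readers curious how example-based explanations might surface mislabeled data, the sketch below shows one common approach: look up the training examples most similar to a given item and check whether their labels agree with it. This assumes a nearest-neighbor style of explanation and is not Vertex’s API; the function names, the two-dimensional feature vectors and the “frog”/“toad” labels are all hypothetical.

```python
import math

def nearest_examples(query, training_set, k=3):
    """Return the k training examples closest to `query` by Euclidean distance."""
    return sorted(training_set, key=lambda ex: math.dist(query, ex[0]))[:k]

def label_looks_suspect(example, training_set, k=3):
    """Flag an example when most of its nearest neighbors carry a different label."""
    features, label = example
    others = [ex for ex in training_set if ex is not example]
    neighbors = nearest_examples(features, others, k)
    disagree = sum(1 for _, lbl in neighbors if lbl != label)
    return disagree > k / 2

# Hypothetical (features, label) pairs; one "toad" sits among the frogs.
training_set = [
    ((0.0, 0.1), "frog"),
    ((0.1, 0.0), "frog"),
    ((0.2, 0.1), "frog"),
    ((0.1, 0.2), "toad"),  # deliberately mislabeled for the example
    ((5.0, 5.1), "toad"),
    ((5.1, 5.0), "toad"),
    ((4.9, 5.2), "toad"),
]
suspects = [ex for ex in training_set if label_looks_suspect(ex, training_set)]
```

A check like this only flags candidates for a human to review. As Gautam’s caution suggests, agreement among neighbors does not prove a label is right, and disagreement does not prove it is wrong.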