Hugging Face's integration with Google Cloud Vertex AI lets you deploy and manage Hugging Face models on Vertex AI.
Models from the Hugging Face Hub can be uploaded to Vertex AI, and the platform provides a user-friendly interface for model deployment and management.
By integrating Hugging Face with Vertex AI, you can take advantage of Vertex AI's scalable and secure infrastructure to deploy your models at scale.
The integration also provides model versioning, model serving, and model monitoring, making it easier to manage your models and get them into production.
Model Management
To upload a model from the Hugging Face Hub, you'll first need to decide which model to use; in this case, facebook/bart-large-mnli, a zero-shot classification model, is a good choice.
To pull the model from the Hugging Face Hub, you can clone its repository with git, which requires git-lfs to be installed in advance to handle the large weight files.
Once the model files are uploaded to Google Cloud Storage (GCS), you can register the model in Vertex AI. A sketch of these steps follows.
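As a minimal sketch, the same download and upload can also be done in Python with the huggingface_hub library (in place of git) and the google-cloud-storage client; the bucket name below is a hypothetical placeholder:

```python
from pathlib import Path

from google.cloud import storage
from huggingface_hub import snapshot_download

# Download the model repository from the Hugging Face Hub
# (the Python equivalent of cloning with git and git-lfs).
local_dir = snapshot_download(repo_id="facebook/bart-large-mnli")

# Upload every file to GCS; "my-models-bucket" is a placeholder bucket name.
client = storage.Client()
bucket = client.bucket("my-models-bucket")
for path in Path(local_dir).rglob("*"):
    if path.is_file():
        blob = bucket.blob(f"bart-large-mnli/{path.relative_to(local_dir)}")
        blob.upload_from_filename(str(path))
```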
Model Registry
The Model Registry is a crucial component of Vertex AI, allowing you to manage and organize your machine learning models in a centralized location.
To register a model on Vertex AI, you can use the google-cloud-aiplatform Python SDK. This involves initializing the Vertex AI session and then uploading the model configuration, not the model weights: in the Hugging Face DLC for TEI, the weights are downloaded automatically from the Hugging Face Hub on container startup via the MODEL_ID environment variable.
Here are the parameters you can specify when registering a model on Vertex AI (a code sketch follows the list):
- display_name: the name that will be shown in the Vertex AI Model Registry
- serving_container_image_uri: the location of the Hugging Face DLC for TEI that will be used for serving the model
- serving_container_environment_variables: the environment variables that will be used during the container runtime
- (optional) serving_container_ports: the port where the Vertex AI endpoint will be exposed, by default 8080
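A minimal registration sketch with the SDK; the project, region, model, and container image URI are illustrative placeholders (substitute the actual Hugging Face DLC for TEI URI for your region):

```python
from google.cloud import aiplatform

# Initialize the Vertex AI session; project and location are placeholders.
aiplatform.init(project="my-project", location="us-central1")

# Register the model configuration only; the TEI container downloads the
# weights from the Hugging Face Hub at startup using MODEL_ID.
model = aiplatform.Model.upload(
    display_name="bge-base-en-v1.5",
    # Placeholder: use the real Hugging Face DLC for TEI image URI here.
    serving_container_image_uri="us-docker.pkg.dev/<path-to-tei-dlc>",
    serving_container_environment_variables={"MODEL_ID": "BAAI/bge-base-en-v1.5"},
    serving_container_ports=[8080],
)
```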
By registering your models in the Model Registry, you can easily manage and deploy them on Vertex AI, streamlining your machine learning workflow and improving collaboration among team members.
Use Case Overview
Model management is a crucial part of any machine learning project, and the Hugging Face libraries are a convenient way to fine-tune a model, offering a wide range of pre-trained models and datasets.
For example, you can use the Hugging Face transformers library to fine-tune a BERT model on the IMDB dataset, downloaded through the Hugging Face datasets library.
Fine-tuning BERT on IMDB lets you predict whether a movie review is positive or negative, a classic text classification task that is useful for anyone analyzing review sentiment, and it can produce strong results. A sketch of this workflow follows.
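A minimal fine-tuning sketch with transformers and datasets; the hyperparameters and output directory are illustrative, and small subsets are used to keep it fast:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load the IMDB dataset from the Hugging Face Hub.
dataset = load_dataset("imdb")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# Tokenize and take small subsets so the example runs quickly.
train = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
test = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

# Two labels: negative (0) and positive (1).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=1),
    train_dataset=train,
    eval_dataset=test,
)
trainer.train()
```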
Integration Setup
To set up the integration between Google Vertex AI and Hugging Face, you can create custom workflows using n8n's nodes. These nodes come with global operations and settings, as well as app-specific parameters that can be configured.
You can use the HTTP Request node to query data from any app or service with a REST API, making it easy to make custom API calls. This node also supports predefined or generic credential types.
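As a sketch of the kind of custom API call the HTTP Request node makes, here is the equivalent direct REST call to a Vertex AI endpoint in Python; the region and endpoint ID are placeholders:

```python
import google.auth
import google.auth.transport.requests
import requests

# Obtain an access token via Application Default Credentials, much as the
# node would with a configured Google credential.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Placeholders: the region and ENDPOINT_ID depend on your deployment.
url = (
    "https://us-central1-aiplatform.googleapis.com/v1/"
    f"projects/{project}/locations/us-central1/endpoints/ENDPOINT_ID:predict"
)
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={"instances": [{"inputs": "A movie review to classify"}]},
)
print(response.json())
```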
Google Vertex AI is a unified machine learning platform that enables developers to build, deploy, and manage models efficiently. It provides a wide range of tools and services, such as AutoML and datasets, to accelerate the deployment of AI solutions.
To connect Google Vertex AI and Hugging Face, you need to establish a link between the two platforms to route data through the workflow. This connection will allow data to flow from the output of one node to the input of another.
You can have single or multiple connections for each node, giving you flexibility in how you set up your workflow.
Workflow Configuration
To configure a workflow for Hugging Face and Google Vertex AI, you'll need to set up the data flow between the two platforms, choosing the configuration that fits your needs.
Data can flow from Google Vertex AI to Hugging Face or vice versa, giving you the flexibility to work with your data in the way that makes the most sense for your project.
To test your workflow, save and run it to see if everything works as expected; reviewing past executions helps you isolate any mistakes.
Once the workflow is tested, activate it to make it live and ready for use.
API and Methods
To get started with Hugging Face on Vertex AI, you'll need to understand the API and methods involved. Hugging Face supports generic authentication, described below.
You'll also need to enable the Container Registry API, which lets you store the container used for your custom training job; the steps are covered in the Enable the API section.
Supported Methods
Generic authentication is a supported method with Hugging Face. For the full list of supported methods, see the Hugging Face integrations page.
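As a minimal sketch, generic authentication with Hugging Face amounts to sending a bearer token in the Authorization header; the hosted Inference API URL and model are used here for illustration, and the token is read from an environment variable:

```python
import os

import requests

# A Hugging Face access token, created at https://huggingface.co/settings/tokens.
token = os.environ["HF_TOKEN"]

# Query the hosted Inference API for the zero-shot model used earlier.
response = requests.post(
    "https://api-inference.huggingface.co/models/facebook/bart-large-mnli",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "inputs": "I loved this movie!",
        "parameters": {"candidate_labels": ["positive", "negative"]},
    },
)
print(response.json())
```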
Enable the API
Enabling the Container Registry API is a crucial step in setting up your custom training job: it gives you access to the features you need to create a container.
To enable it, navigate to the Container Registry in the Google Cloud console and select Enable if it isn't already enabled. You'll use this registry to hold the container for your custom training job.
Model Deployment and Usage
Model deployment is a crucial step in making your Hugging Face model available on Vertex AI. You can deploy the aiplatform.Model object returned by the upload method by calling its deploy method, which creates a Vertex AI Endpoint (in this example, served by a FastAPI application inside the container).
To deploy an endpoint, you'll need to specify a machine type, such as n1-standard-4 from the N1 series, which provides 4 vCPUs and can be paired with a GPU accelerator. Deployment can take around 15-20 minutes.
After deployment, your model is registered in the Vertex AI Model Registry and ready to generate images from a Vertex AI Endpoint. You can also access Hugging Face models on Vertex AI by searching for them in the partners section of Model Garden and clicking on Hugging Face.
To deploy a Hugging Face model this way, you'll need to provide a Hugging Face access token; a deployment sketch follows the argument summary below.
The recommended deployment recipe provided by Vertex AI Model Garden includes the associated machine type, but you also need to specify the model inference runtime used to generate inferences; this is where the Hugging Face Deep Learning Container comes into play.
Here's a summary of the key arguments provided to the deploy method:
- machine_type: the Compute Engine instance type backing the endpoint, for example n1-standard-4
- accelerator_type: the GPU accelerator to attach, for example NVIDIA_TESLA_T4
- accelerator_count: the number of accelerators attached to each replica
Note that machine_type and accelerator_type are tied together, so you'll need to select an instance that supports the accelerator you're using.
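A minimal deployment and prediction sketch, continuing from the aiplatform.Model registered earlier; the machine and accelerator choices are illustrative:

```python
# Deploy the registered model to a new Vertex AI Endpoint.
# machine_type and accelerator_type must be a compatible pair.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)

# Deployment can take around 15-20 minutes; once it finishes,
# send a prediction request to the endpoint.
prediction = endpoint.predict(instances=[{"inputs": "I loved this movie!"}])
print(prediction.predictions)
```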