Databricks and Hugging Face Simplify Open-Source AI Development

Posted Nov 13, 2024

Databricks and Hugging Face have partnered to make open-source AI development more accessible. This collaboration brings together the power of Databricks' unified data engineering and analytics platform with Hugging Face's Transformers library.

Databricks provides a scalable and secure environment for data engineering and analytics, while Hugging Face's Transformers library offers a wide range of pre-trained models and a simple interface for building and deploying AI models.

With this partnership, developers can now leverage the strengths of both platforms to build and deploy AI models more efficiently.

Getting Started

Databricks is a fast, easy, and collaborative platform for data and AI teams. It's built on top of Apache Spark, which makes it a great choice for big data processing.

To get started with Databricks and Hugging Face, you'll need to create a Databricks account and log in to the Databricks workspace. This will give you access to the Databricks notebook interface, where you can write and run Python code.

Databricks is compatible with popular libraries like Hugging Face Transformers, which makes it easy to integrate with pre-trained models. You can use the Transformers library to load and use pre-trained models in your Databricks notebooks.

Hugging Face Transformers is a library of pre-trained models for natural language processing and computer vision tasks. It's a great resource for anyone looking to get started with AI and machine learning.

To get started with Hugging Face Transformers in Databricks, you'll need to install the library using pip. This will give you access to the pre-trained models and the ability to use them in your Databricks notebooks.
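
In a Databricks notebook, the install is a single magic command at the top of a cell (on ML runtimes the library may already be pre-installed):

```
%pip install transformers
```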

Using Hugging Face with Databricks

Using Hugging Face with Databricks is a game-changer for text processing at scale. You can use Pandas UDFs to distribute model computation on a Spark cluster, allowing you to perform computation on worker CPUs or GPUs.

Databricks recommends encapsulating a Hugging Face pipeline in a Pandas UDF to distribute inference on Spark. This makes it easy to use GPUs when available and allows batching of items sent to the GPU for better throughput.

The Hugging Face pipelines for translation return a list of Python dict objects, each with a single key translation_text and a value containing the translated text. You can extract the translation from the results to return a Pandas series with just the translated text.

To use the UDF to translate a text column, you can call the UDF in a select statement. This is a simple and efficient way to process text at scale on Databricks.
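
Putting those pieces together, here's a minimal sketch of the pattern. The t5-base model, the batch size, and the df DataFrame with its texts column are illustrative assumptions, not fixed choices:

```python
import pandas as pd
import torch
from pyspark.sql.functions import pandas_udf
from transformers import pipeline

# Run on the first GPU if one is available, otherwise on CPU.
device = 0 if torch.cuda.is_available() else -1
translation_pipeline = pipeline(
    task="translation_en_to_fr", model="t5-base", device=device)

@pandas_udf("string")
def translate_udf(texts: pd.Series) -> pd.Series:
    translations = translation_pipeline(texts.to_list(), batch_size=8)
    # Each result is a dict with a single "translation_text" key;
    # keep only the translated string so the UDF returns a string column.
    return pd.Series([t["translation_text"] for t in translations])

# Call the UDF in a select statement to translate a text column.
display(df.select(df.texts, translate_udf(df.texts).alias("translation")))
```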

With the latest Hugging Face release, you can load a Spark dataframe into a Hugging Face dataset using the "from_spark" function. This makes it much simpler to accomplish the same task, saving time and cost.

Using Spark to load and transform data for training or fine-tuning a model, then mapping it into a Hugging Face dataset, combines cost savings and speed from Spark and optimizations like memory-mapping and smart caching from Hugging Face datasets. This can cut down processing time by more than 40% in some cases.
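
A minimal sketch, assuming df is an existing Spark DataFrame:

```python
from datasets import Dataset

# Requires datasets >= 2.12, the release that introduced from_spark.
dataset = Dataset.from_spark(df)
```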

Model Development

You can store a pre-trained model as an MLflow model, making it easier to deploy for batch or real-time inference. This allows model versioning through the Model Registry and simplifies model loading code for your inference workloads.

The first step is to create a custom model for your pipeline, which encapsulates loading the model, initializing the GPU usage, and inference function. The code closely parallels the code for creating and using a pandas_udf.

Hugging Face Transformers pipelines make it easy to save the model to a local file on the driver, which can then be passed to the log_model function through MLflow's pyfunc interface.
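
As a sketch of that pattern for a hypothetical summarization pipeline (the class name, task, and paths are illustrative, and summarizer is assumed to be an existing Transformers pipeline):

```python
import mlflow
import torch
from transformers import pipeline

class SummarizationPipelineModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Initialize GPU usage and load the pipeline from the logged artifact.
        device = 0 if torch.cuda.is_available() else -1
        self.pipeline = pipeline(
            "summarization", context.artifacts["pipeline"], device=device)

    def predict(self, context, model_input):
        # Expect a single text column; batch the rows through the pipeline.
        texts = model_input.iloc[:, 0].to_list()
        summaries = self.pipeline(texts, truncation=True, batch_size=8)
        return [s["summary_text"] for s in summaries]

# Save the pipeline to a local path on the driver, then log it to MLflow.
summarizer.save_pretrained("./pipeline")
mlflow.pyfunc.log_model(
    artifact_path="summarization_model",
    artifacts={"pipeline": "./pipeline"},
    python_model=SummarizationPipelineModel())
```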

Preparing Data for Download

To start working with your training data, you need to format it into a table that meets the expectations of the Trainer.

The table should have two columns: a text column and a column of labels. This is a standard setup for text classification tasks.

You can use a DataFrame to store your data; if your labels are strings, collect the distinct labels to build a label-to-id mapping, then use a pandas_udf to create an integer id column.

The model expects tokenized input, so you'll need to use the AutoTokenizer loaded from the base model to apply the tokenizer consistently to both the training and testing data.

Specifying a DBFS cache directory will allow you to efficiently download the dataset and reuse it in the future.
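
A sketch of that preparation, assuming train_df is a Spark DataFrame with text and string label columns, and with distilbert-base-uncased standing in as the base model:

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from datasets import Dataset
from transformers import AutoTokenizer

# Collect the distinct string labels and build label <-> id mappings.
labels = [row.label for row in train_df.select("label").distinct().collect()]
id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in enumerate(labels)}

@pandas_udf("integer")
def replace_labels_with_ids(labels: pd.Series) -> pd.Series:
    return labels.apply(lambda x: label2id[x])

train_df = train_df.select(
    replace_labels_with_ids(train_df.label).alias("labels"), train_df.text)

# Load into a Hugging Face dataset, caching under DBFS for reuse.
train_dataset = Dataset.from_spark(train_df, cache_dir="/dbfs/cache/train")

# Apply the base model's tokenizer consistently to the dataset.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], padding=False, truncation=True)

train_tokenized = (train_dataset
    .map(tokenize, batched=True)
    .remove_columns(["text"]))
```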

Integrating Spark Dataframes for Model Development

Traditionally, users had to write data out as Parquet files and then reload them through Hugging Face datasets. This round trip forgoes the efficiency and parallelism inherent to Spark, making the workflow cumbersome and time-consuming.

Spark dataframes were previously not supported by Hugging Face datasets, despite the platform's extensive range of supported input types. This limitation forced users to rely on inefficient methods, such as writing data to disk and then reloading it.

However, with the latest Hugging Face release, users can now use Spark to efficiently load and transform data for training or fine-tuning a model. This is achieved through the new "from_spark" function in Datasets, which allows users to directly integrate their Spark dataframes into Hugging Face datasets.

Using Spark to load and transform data can drastically reduce data processing time and costs. For example, a 16GB dataset that took 22 minutes to process using the traditional method can now be processed in just 12 minutes.
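
To make the difference concrete, here is a sketch of both paths, with df and the DBFS path as stand-ins:

```python
from datasets import Dataset, load_dataset

# Old approach: round-trip the data through Parquet files on storage.
df.write.mode("overwrite").parquet("/dbfs/tmp/train")
dataset = load_dataset("parquet", data_files="/dbfs/tmp/train/*.parquet")

# New approach: hand the Spark DataFrame to datasets directly.
dataset = Dataset.from_spark(df)
```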

Here are some key benefits of using Spark dataframes for model development:

  • Efficient data loading and transformation
  • Reduced data processing time and costs
  • Improved performance and scalability

By leveraging Spark dataframes and the "from_spark" function, users can streamline their model development process and focus on more complex tasks, such as fine-tuning and optimizing their models.

Batch Size

Batch size is a crucial factor in model development. Databricks recommends trying various batch sizes for the pipeline on your cluster to find the best performance.

A batch size of 1 may not use the resources available to the workers efficiently. Choose a batch size that is large enough to drive full GPU utilization without causing "CUDA out of memory" errors.
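
In the translation UDF shown earlier, this knob is the batch_size argument of the pipeline call; the value below is just a starting point to tune:

```python
# Larger batches improve GPU throughput, but too large a value triggers
# "CUDA out of memory" errors; tune per model and GPU.
translations = translation_pipeline(texts.to_list(), batch_size=16)
```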

Monitor GPU performance by viewing the live cluster metrics for a cluster, and choosing a metric such as gpu0-util for GPU processor utilization or gpu0_mem_util for GPU memory utilization. This will help you identify the optimal batch size for your model and hardware.

If you hit "CUDA out of memory" errors during batch size tuning, detach and reattach the notebook to release the memory held by the model and data on the GPU.

Performance Optimization

Performance optimization is crucial when working with Databricks and Hugging Face. To use each GPU effectively, you can adjust the batch size sent to the GPU by the Transformers pipeline.

Changing the batch size can significantly impact performance. For example, if you're using a GPU cluster, you can try batch sizes that are a multiple of the number of GPUs on your workers.

Making sure your DataFrame is well-partitioned can also help utilize the entire cluster. A good rule of thumb is to repartition your Spark DataFrame to use a multiple of the number of GPUs or cores across the workers.

Caching the Hugging Face model can save model load time or ingress costs. This is especially useful if you're working with large models or datasets.
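
A sketch of both adjustments, with the worker count and cache path as assumptions:

```python
import os

# Repartition to a multiple of the total worker GPUs/cores
# (here, a hypothetical 4 workers with 1 GPU each).
df = df.repartition(4)

# Point the Transformers download cache at DBFS so the model is
# fetched once and reused across restarts (example path).
os.environ["TRANSFORMERS_CACHE"] = "/dbfs/hf_cache/"
```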

To monitor GPU performance, you can view live metrics for a cluster, such as "Per-GPU utilization" or "Per-GPU memory utilization (%)". This can help you identify areas for improvement and optimize your batch size accordingly.

Your goal with tuning the batch size is to set it large enough to drive full GPU utilization without causing "CUDA out of memory" errors.

Fine-Tuning and Inference

Fine-tuning your models on a single machine is a breeze with the Hugging Face Transformers Trainer, which makes it easy to set up and perform model training on moderately sized datasets. You can fine-tune a pre-trained model on your own data to create a custom text classifier, such as a spam filter.

To fine-tune a model, create a single machine cluster with GPU support, prepare and download your dataset to the driver, perform model training using Trainer, and log the resulting model to MLflow. This process is straightforward and efficient, allowing you to fine-tune your models without leaving Databricks.

For larger datasets, Databricks supports distributed multi-machine multi-GPU deep learning, giving you the flexibility to scale your model training as needed.

Fine-Tuning Transformers on a Single Machine

You can fine-tune pre-trained models on a single machine using the 🤗 Transformers Trainer utility. This is a great option for moderately sized datasets that can fit on a single machine with GPU support.

The Trainer utility makes it easy to set up and perform model training, so you don't need to leave Databricks to fine-tune your models.

For larger datasets, Databricks supports distributed multi-machine multi-GPU deep learning, but this is not necessary for moderately sized datasets.

To get started, create a single machine cluster with GPU support, which is a straightforward process.

Once your cluster is set up, you can prepare your dataset and download it to the driver, the machine that runs your notebook and coordinates the cluster.

After that, you can use the Trainer utility to perform model training, and then log the resulting model to MLflow for tracking and versioning.
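
A sketch of those steps, reusing the tokenizer, label mappings, and tokenized datasets from the data-preparation section (the base model and hyperparameters are illustrative):

```python
from transformers import (AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

# Load the base model with a classification head sized to the labels.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(label2id), label2id=label2id, id2label=id2label)

training_args = TrainingArguments(
    output_dir="/tmp/finetuning-output",  # checkpoints land here
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=test_tokenized,  # prepared the same way as train_tokenized
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```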

Transformers Inference and MLflow Logging

Hugging Face Transformers pipelines are a quick way to get started with tasks like text summarization. Combining pipeline inference with MLflow logging gives you an end-to-end example.

To get started, you can load any logged or registered model into a Spark UDF using MLflow. This provides an easy interface to look up a model URI from the Model Registry or from a logged experiment run's UI.
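
For instance, assuming a registered model named summarizer and a DataFrame df with a text column (both placeholders):

```python
import mlflow

# model_uri can point at the Model Registry ("models:/<name>/<version>")
# or at a logged run ("runs:/<run_id>/<artifact_path>").
summarize_udf = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/summarizer/1", result_type="string")

display(df.select(df.text, summarize_udf(df.text).alias("summary")))
```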

Hugging Face interfaces nicely with MLflow, automatically logging metrics during model training using the MLflowCallback. However, you must log the trained model yourself.

You can wrap training in an MLflow run, construct a Transformers pipeline from the tokenizer and the trained model, and write it to local disk. Finally, log the model to MLflow with mlflow.transformers.log_model.
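
A sketch of that flow, carrying over trainer and tokenizer from the fine-tuning section (paths and names are illustrative):

```python
import mlflow
from transformers import pipeline

with mlflow.start_run() as run:
    trainer.train()
    trainer.save_model("/tmp/finetuned-model")  # write model to local disk

    # Construct a pipeline from the tokenizer and the trained model,
    # then log it through MLflow's transformers flavor (MLflow 2.3+).
    classifier = pipeline(
        "text-classification", model="/tmp/finetuned-model", tokenizer=tokenizer)
    mlflow.transformers.log_model(
        transformers_model=classifier, artifact_path="classification_model")
```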

Loading the model for inference is the same as loading the MLflow wrapped pre-trained model.

Carrie Chambers

Senior Writer

Carrie Chambers is a seasoned blogger with years of experience in writing about a variety of topics. She is passionate about sharing her knowledge and insights with others, and her writing style is engaging, informative and thought-provoking. Carrie's blog covers a wide range of subjects, from travel and lifestyle to health and wellness.
