NVIDIA FLARE Enhances Federated XGBoost for Efficient Machine Learning

On Jun 29, 2024

According to the NVIDIA Technical Blog, NVIDIA has introduced significant enhancements to Federated XGBoost with its Federated Learning Application Runtime Environment (FLARE). This integration aims to make federated learning more practical and productive for machine learning tasks such as regression, classification, and ranking.

Key Features of Federated XGBoost

XGBoost, a machine learning algorithm known for its scalability and effectiveness, is widely used for data science tasks. XGBoost 1.7.0 introduced Federated XGBoost, which allows multiple institutions to train XGBoost models collaboratively without sharing data. XGBoost 2.0.0 extended this capability to vertical federated learning, in which participants hold different features for the same samples.
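One way collaborative training can avoid sharing data is for each site to send aggregated statistics rather than rows. The following is a minimal Python sketch of that idea, assuming each site bins its own samples by feature value and shares only per-bin gradient sums. The function names, bin edges, and toy data are hypothetical, and this is not XGBoost's or NVIDIA FLARE's actual implementation.

```python
from bisect import bisect_right

# Hypothetical bin edges agreed on by all sites for one feature.
BIN_EDGES = [0.0, 1.0, 2.0, 3.0]


def local_histogram(feature_values, gradients, edges=BIN_EDGES):
    """Sum gradients per feature bin using only this site's rows."""
    hist = [0.0] * (len(edges) + 1)
    for value, grad in zip(feature_values, gradients):
        hist[bisect_right(edges, value)] += grad
    return hist


def aggregate(histograms):
    """Element-wise sum of per-site histograms (what a server would combine)."""
    return [sum(bins) for bins in zip(*histograms)]


# Toy data: two sites, each holding different rows of the same feature.
site_a = local_histogram([0.5, 1.5, 2.5], [0.25, -0.5, 1.0])
site_b = local_histogram([0.2, 2.8, 3.5], [0.5, 0.25, -1.0])
global_hist = aggregate([site_a, site_b])
print(global_hist)
```

Only `site_a` and `site_b`, the per-bin sums, cross the site boundary in this sketch; the individual feature values and gradients stay local.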

Since 2023, NVIDIA FLARE has offered built-in integration with these Federated XGBoost features: horizontal histogram-based and tree-based XGBoost, as well as vertical XGBoost. It also supports Private Set Intersection (PSI) for sample alignment, so federated learning can be run without extensive coding.
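Sample alignment matters in the vertical setting because each site holds different features, and training can only use samples that every site knows about. The sketch below shows only the outcome that alignment produces: the list of shared sample IDs. It uses hashed IDs and an ordinary set intersection, which is an assumption made for illustration. A real PSI protocol relies on cryptographic techniques so that non-matching IDs are not revealed, and this toy version offers no such guarantee. All names and IDs are hypothetical.

```python
import hashlib


def hashed_ids(sample_ids):
    """Map SHA-256 digest -> original ID for one site's samples."""
    return {hashlib.sha256(s.encode("utf-8")).hexdigest(): s for s in sample_ids}


def align(site_a_ids, site_b_ids):
    """Return IDs present at both sites, in a deterministic order."""
    a, b = hashed_ids(site_a_ids), hashed_ids(site_b_ids)
    common = a.keys() & b.keys()
    return sorted(a[digest] for digest in common)


# Toy data: each site holds different features for partly overlapping samples.
site_a_ids = ["p001", "p002", "p003", "p005"]
site_b_ids = ["p002", "p003", "p004", "p005"]
shared = align(site_a_ids, site_b_ids)
print(shared)
```

After alignment, each site would keep only the rows in `shared`, in the same order, so that features from different sites line up sample by sample.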

Running Multiple Experiments Concurrently

One of the standout features of NVIDIA FLARE is its ability to run multiple XGBoost training experiments concurrently. Data scientists can test different hyperparameters or feature combinations at the same time, reducing overall training time. NVIDIA FLARE handles communication multiplexing, so there is no need to open new ports for each job.

*Figure 1. Two concurrent XGBoost jobs with a unique set of features. Each job has two clients shown as two visible curves*
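As a rough illustration of preparing several experiments to run side by side, the sketch below expands a small hyperparameter search space into one configuration per job. The `job_name` field, the `build_job_configs` helper, and the chosen values are hypothetical; `max_depth` and `eta` are standard XGBoost parameter names. Submitting each configuration as a FLARE job is left out because it depends on the deployment.

```python
from itertools import product

# Hypothetical search space; the parameter names follow common XGBoost usage.
search_space = {
    "max_depth": [4, 8],
    "eta": [0.1, 0.3],
}


def build_job_configs(space):
    """Expand a search space into one config dict per experiment."""
    names = list(space)
    return [
        {"job_name": f"xgb_job_{i}", "params": dict(zip(names, values))}
        for i, values in enumerate(product(*(space[n] for n in names)))
    ]


jobs = build_job_configs(search_space)
for job in jobs:
    print(job["job_name"], job["params"])
```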

Fault-Tolerant XGBoost Training

In cross-region or cross-border training scenarios, network reliability can be a significant issue. NVIDIA FLARE addresses this with its fault-tolerant features, which automatically handle message retries during network interruptions. This ensures resilience and maintains data integrity throughout the training process.

*Figure 2. XGBoost communication is routed through the NVIDIA FLARE Communicator layer*
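The general idea of retrying a message after a network interruption can be sketched as follows. This is a generic retry loop with exponential backoff, written under the assumption that a failed send raises `ConnectionError`; it is not NVIDIA FLARE's actual retry logic, and `send_with_retry` and `flaky_send` are hypothetical names.

```python
import time


def send_with_retry(send, message, max_attempts=5, base_delay=0.01):
    """Call send(message), retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky transport: fails twice, then succeeds.
calls = {"count": 0}


def flaky_send(message):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network interruption")
    return f"ack:{message}"


result = send_with_retry(flaky_send, "histogram-update")
print(result, "after", calls["count"], "attempts")
```

Because the retry happens below the training code, the caller sees a single successful send and does not need to restart the training round.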

Federated Experiment Tracking

Monitoring training and evaluation metrics is crucial, especially in distributed settings like federated learning. NVIDIA FLARE integrates with various experiment tracking systems, including MLflow, Weights & Biases, and TensorBoard, to provide comprehensive monitoring capabilities. Users can choose between decentralized and centralized tracking configurations based on their needs.

*Figure 3. Metrics streaming to the FL server or clients and delivered to different experiment tracking systems*

Adding tracking to an experiment is straightforward and requires minimal code changes. For instance, integrating MLflow tracking involves just three lines of code:

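from nvflare.client.tracking import MLflowWriter

# Create a writer that streams metrics through NVIDIA FLARE to MLflow
mlflow = MLflowWriter()
# running_loss and global_step come from the surrounding training loop
mlflow.log_metric("loss", running_loss / 2000, global_step)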
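The snippet above relies on `running_loss` and `global_step` from a training loop that the article does not show. The sketch below is one plausible shape for that loop, assuming the loss is averaged over every 2000 mini-batches. `StubWriter` is a hypothetical stand-in with the same `log_metric(key, value, step)` call shape, used so the example runs without NVIDIA FLARE installed, and the constant batch loss is a placeholder.

```python
class StubWriter:
    """Stand-in with a log_metric(key, value, step) method; records calls locally."""

    def __init__(self):
        self.records = []

    def log_metric(self, key, value, step):
        self.records.append((key, value, step))


writer = StubWriter()
LOG_EVERY = 2000  # average the loss over this many mini-batches
running_loss = 0.0

for i in range(4000):  # hypothetical loop over 4000 mini-batches
    batch_loss = 0.5  # placeholder for the loss of one mini-batch
    running_loss += batch_loss
    if i % LOG_EVERY == LOG_EVERY - 1:
        global_step = i + 1  # number of mini-batches seen so far
        writer.log_metric("loss", running_loss / LOG_EVERY, global_step)
        running_loss = 0.0  # start a fresh average

print(writer.records)
```

In a real job, the stub would be replaced by the `MLflowWriter` instance from the snippet above, with the rest of the loop unchanged.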

Summary

NVIDIA FLARE 2.4.x offers robust support for Federated XGBoost, making federated learning more efficient and reliable. For more detailed information, refer to the NVIDIA FLARE 2.4 branch on GitHub and the NVIDIA FLARE 2.4 documentation.
