Modelserve: Golem Network’s New AI Inference Service

Joerg Hiller
Jul 15, 2024 15:19

Golem Network introduces Modelserve, a scalable and cost-effective AI model inference service designed for developers and startups.

Golem Network has unveiled Modelserve, a new service aimed at providing scalable and affordable AI model inference, according to a recent announcement by the Golem Project. The service lets developers deploy AI models and run inference through scalable endpoints, with the goal of making AI applications more efficient and cost-effective to operate.

What Is Modelserve?

Modelserve, developed by Golem Factory in collaboration with an external team, integrates into the Golem Network ecosystem. It aims to support the open-source AI community and to attract developers of AI applications, whose workloads create demand for the network's GPU providers. Models are deployed and served through scalable endpoints so that AI applications can run efficiently and cost-effectively.

Why Is Golem Network Introducing Modelserve?

Modelserve is intended to meet the growing demand for computing power in the AI industry. Consumer-grade GPUs offer enough compute and memory to run AI models such as diffusion models, automatic speech recognition, and small to medium-sized language models, and according to the announcement this approach is more cost-effective than traditional methods. The decentralized architecture of the Golem Network serves as a marketplace that matches supply and demand for these resources, giving users access to computing power well suited to AI applications.

The addition of Modelserve to the Golem ecosystem plays a key role in bringing AI use cases to the network, driving demand for providers and contributing to broader adoption of the Golem Network.

Target Audience

Modelserve is designed for a diverse range of users, including service and product developers, startups, and companies operating in both Web 2.0 and Web 3.0 environments. These users typically:

Utilize small and medium-sized open-source models or create their own models from scratch
Require scalable AI model inference capabilities
Seek an environment to test and experiment with AI models

Technical Implementation

Modelserve comprises three key components:

Website: Allows users to create and manage endpoints
Backend: Manages the GPU resources that handle inference requests, and includes a load balancer and auto-scaling capabilities. It sources GPU resources from Golem's open, decentralized marketplace and from other platforms offering GPU instances
API: Enables the running of AI model inferences and management of endpoints
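
To make the backend's two named features more concrete, here is a minimal Python sketch of what a load balancer and an auto-scaler might compute. It is not Modelserve's implementation, which the announcement does not describe: the function names, the least-in-flight-requests balancing rule, and the default worker limits are all assumptions chosen for illustration.

```python
import math


def desired_workers(pending_requests: int, requests_per_worker: int,
                    min_workers: int = 0, max_workers: int = 8) -> int:
    """Hypothetical auto-scaling rule: size the GPU pool to the backlog.

    The limits (0 and 8) are illustrative defaults, not Modelserve values.
    """
    if requests_per_worker <= 0:
        raise ValueError("requests_per_worker must be positive")
    # Round up so a partial batch of requests still gets a worker.
    needed = math.ceil(pending_requests / requests_per_worker)
    # Clamp the result to the configured pool limits.
    return max(min_workers, min(max_workers, needed))


def pick_worker(in_flight: dict[str, int]) -> str:
    """Hypothetical load-balancing rule: route to the least busy worker.

    `in_flight` maps a worker id to its number of requests in progress.
    """
    if not in_flight:
        raise ValueError("no workers available")
    return min(in_flight, key=in_flight.get)
```

Under these assumptions, a backlog of nine requests with four requests per worker calls for three workers, and a new request is routed to whichever worker currently has the fewest requests in progress.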

Users pay for the service in USD, while settlements with Golem GPU providers are made in GLM, the native token of the Golem Network.

Benefits for Users

Maintenance-Free AI Infrastructure (AI IaaS): Users do not need to manage model deployment, inference, or GPU clusters as Modelserve handles these tasks
Affordable Autoscaling: The system automatically scales GPU resources to meet application demands without requiring user intervention
Cost-Effective Pricing: Users are charged based on the actual processing time of their requests, avoiding the costs associated with hourly GPU rentals or maintaining their own clusters
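
The difference between billing by processing time and renting a GPU by the hour can be illustrated with a short Python sketch. The prices and workload below are made-up placeholders, not Modelserve's actual rates, and the function names are hypothetical.

```python
def usage_based_cost(num_requests: int, seconds_per_request: float,
                     price_per_second: float) -> float:
    """Cost when only the actual processing time of requests is billed."""
    return num_requests * seconds_per_request * price_per_second


def hourly_rental_cost(hours_reserved: float, price_per_hour: float) -> float:
    """Cost when a GPU is rented by the hour, whether busy or idle."""
    return hours_reserved * price_per_hour


if __name__ == "__main__":
    # Placeholder workload: 1,000 requests per day, 2 seconds of GPU time each.
    usage = usage_based_cost(1000, 2.0, price_per_second=0.0005)
    # Placeholder alternative: one GPU reserved for the full 24 hours.
    rental = hourly_rental_cost(24, price_per_hour=0.50)
    print(f"usage-based: {usage:.2f} USD, hourly rental: {rental:.2f} USD")
```

With a light or bursty workload, most of the reserved hours in the rental case are idle time that is still paid for, which is the cost the usage-based model is meant to avoid. At sustained high utilization the comparison depends entirely on the actual rates.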

Synergy with Other AI/GPU Projects

Modelserve integrates with GamerHash AI, a GPU and AI provider; this integration is currently at the proof-of-concept stage. Additionally, the first version of Golem-Workers has been created as part of Modelserve, and Golem-Workers will be developed as a separate project in the future.