Deploying a model with EzDeploy¶
Requirements¶
- You need access to Model Serving in your region.
- Your region needs to support GPUs (T4, A10, A100, or H100).
- You need access to compute (serverless or interactive) to run the deployment script from a notebook.
- You need access to a Unity Catalog schema to register the model.
What is EzDeploy?¶
EzDeploy takes a prebuilt configuration and deploys it to Databricks Model Serving. It is meant for production use cases and supports either vLLM or SGLang as the serving engine.
Deployment Steps¶
1. Install the library¶
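In a Databricks notebook you can install the library and restart the Python process so the new package is picked up (this assumes the package is published on PyPI as mlflow-extensions):
%pip install mlflow-extensions
dbutils.library.restartPython()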
2. Identify the model to deploy¶
In this scenario, we will deploy a Nous Hermes 3 (Llama 3.1 8B) model to Model Serving.
from mlflow_extensions.databricks.deploy.ez_deploy import EzDeploy
from mlflow_extensions.databricks.prebuilt import prebuilt

deployer = EzDeploy(
    # The prebuilt model configuration to deploy
    config=prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_64K,
    # The model name to register in Unity Catalog (catalog.schema.model)
    registered_model_name="main.default.nous_research_hermes_3_1",
)

# Download the model artifacts
deployer.download()

# Register the model in Unity Catalog
deployer.register()

# Deploy the model to Model Serving under the following endpoint name
endpoint_name = "my-endpoint-name"
deployer.deploy(endpoint_name)
3. Monitor the deployment¶
You will receive a URL for the model serving endpoint. Monitor that URL to track the status of the deployment.
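If you prefer to check the status programmatically, here is a minimal sketch using the MLflow deployments client (this client is an assumption for illustration, not part of EzDeploy), run from the same notebook after deployer.deploy():
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Fetch the serving endpoint created above; the returned details include its state
print(client.get_endpoint(endpoint=endpoint_name))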
Querying using OpenAI SDK¶
The models are deployed as a pyfunc, so they do not accept the native JSON payloads of standard clients; requests must fit the pyfunc spec. To let you keep using the OpenAI SDK, LangChain, and similar clients, we offer a compatibility interface for those clients.
from mlflow_extensions.serving.compat.openai import OpenAI
from mlflow.utils.databricks_utils import get_databricks_host_creds

workspace_host = spark.conf.get("spark.databricks.workspaceUrl")
endpoint_url = f"https://{workspace_host}/serving-endpoints/{endpoint_name}/invocations"
token = get_databricks_host_creds().token

# this OpenAI client is imported from mlflow_extensions.serving.compat.openai
client = OpenAI(
    base_url=endpoint_url,
    api_key=token,
)

response = client.chat.completions.create(
    # models are served under their own name and also under the alias "default"
    model="default",
    messages=[
        {
            "role": "user",
            "content": "Hi how are you?",
        }
    ],
)
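Since the compat client mirrors the OpenAI SDK, the response follows the usual shape and you can read the assistant's reply the standard way:
# Print the assistant's reply (standard OpenAI SDK response shape)
print(response.choices[0].message.content)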
Querying using Langchain SDK¶
You can also query the endpoint with ChatOpenAI from the LangChain SDK.
from mlflow_extensions.serving.compat.langchain import ChatOpenAI
# if you want to use completions instead of chat
# from mlflow_extensions.serving.compat.langchain import OpenAI

# this ChatOpenAI is imported from mlflow_extensions.serving.compat.langchain
model = ChatOpenAI(
    model="default",  # "default" is the alias for the model
    base_url="https://<workspace-host>/serving-endpoints/<endpoint-name>",
    api_key="<dapi...>",
)

model.invoke("hello world")