Deploy A Custom Text Encoder Model
This guide walks you through deploying the BERT model on Hyperpod AI and running inference through a simple API.
🚀 RECOMMENDED: Try our ready-made Colab notebook — run everything in your browser with zero hassle!
🤖 What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a groundbreaking natural language processing model developed by Google AI. It understands language context by analyzing text in both directions, enabling it to excel at a wide range of tasks like sentiment analysis, question answering, and document classification.
To use BERT, we first tokenize text—converting words and sentences into numerical representations that the model can process. BERT then generates rich contextual embeddings, which can be leveraged for downstream NLP tasks or further fine-tuned for specific applications.
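As a quick illustration, here is what tokenization looks like with the Hugging Face transformers library (a minimal sketch; the exact token IDs shown in the comments are illustrative and depend on the tokenizer's vocabulary):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Text is split into (sub)word tokens...
print(tokenizer.tokenize("Hello world!"))  # ['hello', 'world', '!']

# ...and mapped to integer IDs, wrapped in special [CLS]/[SEP] markers
print(tokenizer.encode("Hello world!"))    # e.g. [101, 7592, 2088, 999, 102]
```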
🚀 Step 1: Downloading the model
In this tutorial we will use the BERT model. Download the sample model using this link, along with the sample data for later use.
🔐 Step 2: Sign Up on Hyperpod AI
Head to app.hyperpodai.com
- Create a free account (includes 10 free hours)
- Click on “Upload your Model”
🛠️ Step 3: Set Up Your Project
Fill in the project form as follows:
- Project Name: Any name, for your own reference.
- Upload Your Model: Upload the ONNX model file you downloaded.
- Model Type: ONNX
- Input Batch Size: -1
  Batch size controls how many samples are processed at once. In this setup, positive values indicate a fixed batch size, while negative values allow dynamic batching up to the specified maximum.
- Input Shape: -512
  The model expects input sequences of up to 512 tokens (its maximum context length); the negative value marks the sequence dimension as dynamic.
- Output Shape: -512,768
  The model outputs an embedding of shape [-512, 768]: one 768-dimensional vector per token, with a dynamic sequence dimension of up to 512 tokens.
- Input Type: INT64
- Output Type: FP32
- Sample Data:
  - Allows the platform to validate your model's inputs and outputs and optimize the deployment.
  - Upload the sample_data you downloaded earlier (e.g., tokenized input arrays in .npy format).
  - Trying to generate your own sample data? See the guide to creating sample inputs, or the sketch after this list.
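To double-check the shapes and types above, or to generate your own sample data, here is a minimal sketch using the onnx and transformers packages. The file names bert_model.onnx and sample_data.npy are placeholders; substitute the files you actually downloaded or want to create.

```python
import numpy as np
import onnx
from transformers import BertTokenizer

# Print each graph input/output with its shape and element type, so you can
# confirm the values entered in the project form.
model = onnx.load("bert_model.onnx")  # placeholder path
for kind, tensors in [("input", model.graph.input), ("output", model.graph.output)]:
    for t in tensors:
        ttype = t.type.tensor_type
        dims = [d.dim_value if d.dim_value else d.dim_param for d in ttype.shape.dim]
        print(kind, t.name, dims, onnx.TensorProto.DataType.Name(ttype.elem_type))

# Build sample data: token IDs for one sentence, saved as a .npy file
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
ids = tokenizer("A sample sentence for validation.", return_tensors="np")["input_ids"]
np.save("sample_data.npy", ids.astype(np.int64))  # INT64, shape (1, seq_len)
```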
✅ Your setup should look like this:
- Click Create Project. This may take a few minutes as Hyperpod sets up and optimizes the deployment for you automatically.
🔑 Step 4: Set up your API
- After the deployment is done, it should look something like this:
- Go to Deployment.
- You can choose between Test mode and Production mode. For this guide, select Test mode; it's the most cost-effective option. Read more about the difference between test and production mode.
  ⚠️ Warning: Be sure to manually turn it off by clicking the Off button when you're not using it, to avoid unnecessary charges.
- The deployment process may take around 10 minutes. Once it's ready, copy the Endpoint URL and Project ID shown on this page.
- Navigate to the API Keys section.
- Set a name (for your own reference) and choose a validity period, then click Create Key.
- After the key is generated, copy and store it somewhere safe. You won't be able to view it again from the platform.
⚠️ Warning: Never store your API key in public locations (like GitHub repositories). If exposed, malicious users could access your API and incur charges on your behalf.
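A simple way to keep the key out of your source code is to load it from an environment variable. Here is a small sketch; the variable name HYPERPOD_API_KEY is just an example, not something the platform requires:

```python
import os

# Set the variable in your shell before running the script, e.g.:
#   export HYPERPOD_API_KEY="your-api-key"
api_key = os.environ["HYPERPOD_API_KEY"]
```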
✅ Final Step: Call the API
To generate inputs, you'll first need to tokenize your text using the Hugging Face BertTokenizer:
```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text = "Hello, this is a sample sentence for BERT!"

# Pad/truncate to a fixed length of 32 tokens and return PyTorch tensors
inputs = tokenizer(
    text,
    padding='max_length',
    truncation=True,
    max_length=32,
    return_tensors="pt"
)

input_arr = inputs['input_ids']
print(input_arr.dtype)          # torch.int64, matching the INT64 input type
print(input_arr.numpy().shape)  # (1, 32): one sequence of 32 token IDs
```
Now that you have the tokenized text array, send it to the inference endpoint:
```python
import httpx

# HTTP/2 client with a generous timeout for larger payloads
client = httpx.Client(http2=True, timeout=httpx.Timeout(30.0))

endpoint_url = 'your-endpoint-url'
api_key = 'your-api-key'
project_id = 'your-project-id'

payload = {
    "project_id": project_id,
    "grpc_data": {
        "array": input_arr.numpy().tolist(),  # token IDs from the previous step
        "dtype": "int64"
    }
}

response = client.post(
    endpoint_url,
    headers={'content-type': 'application/json', 'x-api-key': api_key},
    json=payload
)
print(response.text)

# Token embeddings for the first (and only) input sequence
embedding = response.json()["array"][0]
```
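The response contains one 768-dimensional vector per token. If you want a single vector for the whole sentence, a common approach is to average over the token dimension (a sketch, assuming the embedding variable from the snippet above):

```python
import numpy as np

# Nested list from the API back to an array of shape (seq_len, 768)
token_embeddings = np.array(embedding, dtype=np.float32)

# Mean-pool over tokens to get one sentence-level vector of shape (768,).
# For better results, mask out padding tokens with the tokenizer's
# attention mask before averaging.
sentence_vector = token_embeddings.mean(axis=0)
print(sentence_vector.shape)  # (768,)
```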