Deployment Modes
There are three deployment modes: off, test, and production. You can control these in the deployment tab on the project page. You can switch freely between these modes; however, changes might take a few minutes to complete.
Your project needs to be in a ready state before the deployment tab is available. Hang on tight while our system verifies and prepares your project for deployment!
This section describes the purpose and use cases of each deployment mode. For information about connecting, please refer to the inference API documentation.
Off
This is the default mode for all projects. No infrastructure is set up, so no inference requests can be made. You will not be billed while your project is in this mode.
Test
You can test and verify your connection to the inference server in this mode. Only a single instance serves requests in this mode.
Note that you will be billed hourly in this mode. Verifying your connection in test mode first helps keep costs low before you move to production mode.
Production
In this mode, we will deploy your model on reliable, auto-scaling infrastructure. As the request load increases or decreases, we will allocate sufficient resources to meet your needs.
As in test mode, you will be billed hourly while you are in this mode.
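The three modes described above can be summarized in a small lookup table. This is only an illustrative sketch: the names DeploymentMode, ModeTraits, TRAITS, and can_send_requests are hypothetical and are not part of the inference API. The values simply restate what this page says about each mode.

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentMode(Enum):
    # Hypothetical names; the mode strings mirror the ones used on this page.
    OFF = "off"
    TEST = "test"
    PRODUCTION = "production"


@dataclass(frozen=True)
class ModeTraits:
    accepts_requests: bool  # can inference requests be served?
    billed_hourly: bool     # is the project billed while in this mode?
    auto_scaling: bool      # does the infrastructure scale with load?


# Values restate the mode descriptions above; nothing here is queried
# from the platform.
TRAITS = {
    DeploymentMode.OFF: ModeTraits(False, False, False),
    DeploymentMode.TEST: ModeTraits(True, True, False),  # single instance
    DeploymentMode.PRODUCTION: ModeTraits(True, True, True),
}


def can_send_requests(mode: DeploymentMode) -> bool:
    """Return True if the given mode serves inference requests."""
    return TRAITS[mode].accepts_requests
```

A client could consult a table like this before sending traffic, for example to skip requests while the project is still in off mode.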
Non-Use Timings
Non-Use Timings are time periods during which you do not expect any requests to be sent. Setting them helps you reduce cost.
You can expect requests made during Non-Use Timings to be rejected or to fail.
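Because requests sent during Non-Use Timings are expected to fail, a client may want to check the current time before sending anything. The sketch below is one way to do that, under stated assumptions: the window format (pairs of start and end times) and the function name in_non_use_window are hypothetical, and the time zone used by the platform is not specified on this page, so the caller must supply times in whatever zone the Non-Use Timings were configured in.

```python
from datetime import time
from typing import Sequence, Tuple

# Hypothetical representation: each window is (start, end), with the start
# inclusive and the end exclusive. A window whose start is later than its
# end is treated as wrapping past midnight.
Window = Tuple[time, time]


def in_non_use_window(now: time, windows: Sequence[Window]) -> bool:
    """Return True if `now` falls inside any configured non-use window."""
    for start, end in windows:
        if start <= end:
            # Same-day window, e.g. 12:00 to 13:00.
            if start <= now < end:
                return True
        else:
            # Overnight window, e.g. 22:00 to 06:00.
            if now >= start or now < end:
                return True
    return False


# Example configuration (assumed values, for illustration only).
NON_USE_WINDOWS = [(time(22, 0), time(6, 0)), (time(12, 0), time(13, 0))]
```

A caller could skip or queue a request whenever in_non_use_window returns True, rather than sending it and handling the failure.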