Launching a Spot instance VM on AWS EC2
AWS spot instances run on unused EC2 capacity at a steep discount (up to 90%) compared to the On-Demand price. The tradeoff is that AWS can interrupt your instance when it needs the capacity back, for example to sell it to someone else at the On-Demand price. So, before launching a spot instance, plan how your job will stop and then restart or continue after an interruption (see tips from this Cloud Clinic).
This post walks through launching an H100 spot instance on AWS EC2.
First, select the instance type, “p5.48xlarge”. When a high-demand GPU instance type is selected, additional menu options appear under Advanced Configuration. We recommend the selections shown in the screen capture below to launch a VM instance with a spot configuration.
Note the following in the instance configuration:
- “Custom” and “spot instance” are selected. Capacity blocks and capacity reservations are explicitly disabled because NAIRR credits currently cannot be used to purchase them.
- The price limit is set to “No maximum” here. You can set a maximum price instead.
- Interruption behavior is set to “Stop”, overriding the default “Terminate”, which would delete your instance once it is interrupted. “Stop” retains the volume.
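The same selections can also be expressed programmatically. The following is a minimal sketch, assuming you launch with boto3's `ec2.run_instances` rather than the console; the helper name `build_spot_market_options` is hypothetical, and the AMI ID in the commented usage is a placeholder you must fill in yourself.

```python
from typing import Optional


def build_spot_market_options(max_price: Optional[str] = None) -> dict:
    """Build the InstanceMarketOptions argument for boto3's ec2.run_instances."""
    spot_options = {
        # "persistent" keeps the request open so a stopped instance can restart later.
        "SpotInstanceType": "persistent",
        # "stop" keeps the EBS volume; the default behavior is "terminate".
        "InstanceInterruptionBehavior": "stop",
    }
    if max_price is not None:
        # Maximum hourly price in USD, passed as a string.
        # Leaving it out corresponds to the "No maximum" choice above.
        spot_options["MaxPrice"] = max_price
    return {"MarketType": "spot", "SpotOptions": spot_options}


options = build_spot_market_options()
print(options)

# Hypothetical usage (needs boto3, AWS credentials, and a real AMI ID):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.run_instances(ImageId="ami-...", InstanceType="p5.48xlarge",
#                   MinCount=1, MaxCount=1, InstanceMarketOptions=options)
```

Building the options in a small function keeps the "stop" and "persistent" settings together, since the "stop" interruption behavior only makes sense for a request that stays open.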
In the network configuration for the instance above, note that SSH is allowed only from “My IP” to help protect your VM once it is started. Since AWS IPs are continuously scanned by hackers, limiting SSH to your IP address blocks their attempts to log into your VM.
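For readers who script their setup, the “My IP” restriction corresponds to a security group rule that opens TCP port 22 to a single /32 address. Here is a rough sketch, assuming boto3's `authorize_security_group_ingress` is used to apply it; the helper name `ssh_only_from`, the example address (from the documentation range), and the `sg-...` placeholder are illustrative only.

```python
import ipaddress


def ssh_only_from(my_ip: str) -> list:
    """Return an IpPermissions list allowing TCP port 22 from one IPv4 address."""
    address = ipaddress.IPv4Address(my_ip)  # raises ValueError on malformed input
    return [{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        # /32 means exactly one address, the equivalent of "My IP" in the console.
        "IpRanges": [{"CidrIp": f"{address}/32", "Description": "SSH from my IP only"}],
    }]


rule = ssh_only_from("203.0.113.7")  # documentation-range address; replace with yours
print(rule)

# Hypothetical usage (needs boto3, AWS credentials, and a real security group ID):
# ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=rule)
```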
Select enough storage for your job. We recommend storing large datasets in S3 and mounting the S3 bucket on your VM using Mountpoint. The screen capture below sets only 8 GB as an example.
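As an illustration of the Mountpoint step, the sketch below builds the basic `mount-s3 BUCKET DIRECTORY` command from Python. It assumes Mountpoint is already installed on the VM and that the instance has credentials to read the bucket; the bucket name and mount directory are hypothetical.

```python
import shlex


def mountpoint_command(bucket: str, mount_dir: str) -> list:
    """Build the argument list for Mountpoint's `mount-s3 BUCKET DIRECTORY` form."""
    if not bucket or not mount_dir:
        raise ValueError("bucket and mount_dir must be non-empty")
    return ["mount-s3", bucket, mount_dir]


cmd = mountpoint_command("my-dataset-bucket", "/mnt/data")  # hypothetical names
print(shlex.join(cmd))

# To actually mount on the VM (the mount directory must already exist):
# import subprocess
# subprocess.run(cmd, check=True)
```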
After you press the “Launch Instance” button, the instance starts once capacity is available (an unused instance exists and the current spot price is within your price limit). The following screenshot shows the details for a running spot instance:
The screen capture above shows the details of the currently running instance. Lifecycle indeed shows “spot”, along with other spot-related options for this run such as “stop protection” and “stop-hibernate”.
Further details of the spot request linked to the instance are shown above. Note the max price on this spot instance request and the “persistence” type of “persistent”, which automatically resubmits the request so the instance restarts when AWS capacity is available again.
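A restarted instance only helps if the job can pick up where it left off. The following is a minimal checkpoint-and-resume sketch, assuming the job's progress fits in a small JSON file stored on the retained volume; `save_checkpoint`, `load_checkpoint`, and `run_job` are hypothetical names, and the "work" is a placeholder counter rather than a real training step.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so an interruption mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as handle:
        return json.load(handle)


def run_job(path: str, total_steps: int, stop_after: Optional[int] = None) -> int:
    """Run from the last checkpoint; stop_after simulates an interruption."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulated spot interruption
        state["step"] += 1  # placeholder for one unit of real work
        save_checkpoint(path, state)
        steps_this_run += 1
    return state["step"]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first = run_job(checkpoint_path, total_steps=5, stop_after=2)  # interrupted early
second = run_job(checkpoint_path, total_steps=5)               # resumes and finishes
print(first, second)
```

Writing to a temporary file and renaming it means the checkpoint on disk is always either the previous complete state or the new complete state, which matters when the instance can be stopped at any moment.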
Here are some general spot instance tips:
- Be flexible about instance types and zones
- Use a sound pricing and capacity strategy, including autoscaling if it suits your workload
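One way to act on the flexibility tip is to compare recent spot prices across zones before requesting capacity. The sketch below assumes records shaped like the `SpotPriceHistory` entries returned by boto3's `ec2.describe_spot_price_history`; the function names are hypothetical, and the sample prices are made-up placeholders, not real AWS prices.

```python
from datetime import datetime, timezone


def latest_prices(price_history: list) -> dict:
    """Keep only the most recent record per (instance type, availability zone)."""
    latest = {}
    for record in price_history:
        key = (record["InstanceType"], record["AvailabilityZone"])
        if key not in latest or record["Timestamp"] > latest[key]["Timestamp"]:
            latest[key] = record
    return latest


def cheapest_offer(price_history: list) -> dict:
    """Return the record with the lowest current price (SpotPrice is a string)."""
    latest = latest_prices(price_history)
    if not latest:
        raise ValueError("no price records to compare")
    return min(latest.values(), key=lambda record: float(record["SpotPrice"]))


# Made-up placeholder records for illustration only, NOT real AWS prices.
sample = [
    {"InstanceType": "p5.48xlarge", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "10.00", "Timestamp": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"InstanceType": "p5.48xlarge", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "50.00", "Timestamp": datetime(2025, 1, 2, tzinfo=timezone.utc)},
    {"InstanceType": "p5.48xlarge", "AvailabilityZone": "us-east-1b",
     "SpotPrice": "45.00", "Timestamp": datetime(2025, 1, 2, tzinfo=timezone.utc)},
]
best = cheapest_offer(sample)
print(best["AvailabilityZone"], best["SpotPrice"])
```

Filtering to the latest record per zone first avoids picking a stale low price that is no longer on offer. The same comparison works across several instance types if the records include them.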
More tips are available in the following documents:
- Spot Instance Best Practices - Overview of Amazon EC2 Spot Instances
- Managed Spot Training in Amazon SageMaker AI: https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html
Termination of Spot Instance request
To terminate the requested spot instance service at any time, use the “cancel request” option under “Spot Requests” (located in the “Instances” column of the EC2 dashboard). The option can be found in the “Request Spot Instances” menu button:
Canceling will show a confirmation popup window:
After you confirm, the spot instance request is canceled and any instances associated with it are terminated.
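The console steps above have a scripted counterpart. This is a minimal sketch, assuming boto3's `cancel_spot_instance_requests` and `terminate_instances` calls; the function name `cancel_spot_request`, the `RecordingClient` stand-in, and the IDs are hypothetical, and the stand-in exists only so the example runs without contacting AWS.

```python
def cancel_spot_request(ec2_client, request_id: str, instance_ids: list) -> None:
    """Cancel a spot request, then terminate the instances it launched."""
    # Cancel first so a persistent request cannot restart or relaunch an instance.
    ec2_client.cancel_spot_instance_requests(SpotInstanceRequestIds=[request_id])
    if instance_ids:
        ec2_client.terminate_instances(InstanceIds=instance_ids)


class RecordingClient:
    """Stand-in for boto3.client("ec2") that records calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def cancel_spot_instance_requests(self, **kwargs):
        self.calls.append(("cancel_spot_instance_requests", kwargs))

    def terminate_instances(self, **kwargs):
        self.calls.append(("terminate_instances", kwargs))


client = RecordingClient()
cancel_spot_request(client, "sir-example", ["i-example"])  # hypothetical IDs
print(client.calls)

# With real credentials, pass boto3.client("ec2") instead of RecordingClient().
```

In the API, canceling a request and terminating its instances are separate calls, so the sketch issues both explicitly.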