How much does it cost to train an ML model on the cloud?

The cost of a project like training an ML model breaks down into compute, which is charged by the hour, and storage, which is charged per gigabyte-month. Because ML models vary so much in complexity, it’s hard for us to say in general how much it would cost to retrain something like a GAN, but if you can identify:

  • roughly how long (in hours) it takes you to retrain a model on local hardware
  • how much training data you need

you can then match these numbers against the pricing pages published by the cloud vendors.

Purely as an example: if it takes you 48 hours to train a model on a machine with 4 CPU cores, 16 GB of RAM and a good GPU, and you need 80 GB of training data, we can look up the cost of an equivalent virtual machine and that much data storage. On AWS, we’d look through their VM descriptions for something that references machine learning, and pick from the lower end of the price range. Maybe we’d settle on a VM of type “g4dn.xlarge”, which costs $0.526 per hour to rent. Storage will cost us $0.043 per GB per month. So now we have something like:

($0.526/VM-hour * 48 hours) + ($0.043/GB-month * 80 GB) ~= $25.25 compute + $3.44 storage ~= $29

Keep in mind that the storage cost is per month, so once your data is in the cloud, the only additional cost you rack up for the rest of that month is the compute for further training runs.
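If you want to plug in your own numbers, the arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the same assumptions as the example: the function name and its parameters are made up for illustration, the rates are the example figures quoted above rather than live prices, and it ignores anything beyond VM rental and storage (data transfer, for instance).

```python
def estimate_training_cost(hours_per_run, vm_rate_per_hour, data_gb,
                           storage_rate_per_gb_month, runs_per_month=1):
    """Rough one-month cost estimate in dollars: (compute, storage, total).

    Hypothetical helper for illustration only. Rates must be looked up
    on the vendor's pricing page.
    """
    # Compute is billed per VM-hour, so it scales with the number of runs.
    compute = hours_per_run * vm_rate_per_hour * runs_per_month
    # Storage is billed per GB-month, so it is paid once for the month
    # no matter how many training runs use the data.
    storage = data_gb * storage_rate_per_gb_month
    return compute, storage, compute + storage


# The example from the text: one 48-hour run on a $0.526/hour VM
# with 80 GB of training data stored at $0.043 per GB-month.
compute, storage, total = estimate_training_cost(48, 0.526, 80, 0.043)
print(f"compute ${compute:.2f} + storage ${storage:.2f} = ${total:.2f}")
# prints: compute $25.25 + storage $3.44 = $28.69
```

Passing runs_per_month=3 triples the compute figure but leaves storage unchanged, which is the point made above about storage being a once-a-month cost.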