I have a question on the basic viability of using the cloud for research.
If I purchase six computers to do my data analysis: I own them and I can use them until the smoke comes out of them in a few years. When I have a new idea for a computation: No problem, I just do it, many times over as needed.
In contrast my cloud use has a calculated budget that corresponds to how many calculations I can do. So how do I “plan” to use the cloud to come in on budget when – by the nature of research – there is always going to be a new idea for a new calculation? I just don’t see the value-for-cost of renting VMs; how do cloud users deal with this?
First, we feel very strongly: don't use the cloud “just because you can” or “because it is there”. The public cloud has to make financial sense as a research computing platform. On the one hand this is a case-by-case decision; on the other hand it is easy to misjudge the costs versus the benefits.
Cloudbank advocates taking some time to do a careful “buy versus rent” cost analysis. A key idea (we find) is wall-clock time: because cloud pricing is per instance-hour, having a single computer do a highly parallelizable job in 1000 hours costs the same as having 1000 computers do the same job in one hour. That’s great when wall-clock time is the issue. But if your “best possible” research project timeline involves many more calculations than your budget will allow, then your budget is the limiting resource.
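As a back-of-the-envelope illustration of that trade-off, here is a minimal sketch of the arithmetic. The hourly rate and job size are made-up placeholder numbers, not a quote for any particular provider or instance type:

```python
# Rough "buy versus rent" arithmetic for a parallelizable workload.
# All figures here are illustrative assumptions, not real price quotes.

HOURLY_RATE = 0.50          # assumed on-demand cost per VM per hour, in dollars
TOTAL_COMPUTE_HOURS = 1000  # total VM-hours the job needs

def cloud_cost(num_vms: int) -> tuple[float, float]:
    """Return (wall-clock hours, total dollar cost) when splitting the job across num_vms."""
    wall_clock = TOTAL_COMPUTE_HOURS / num_vms
    cost = num_vms * wall_clock * HOURLY_RATE   # you pay per instance-hour
    return wall_clock, cost

for vms in (1, 10, 1000):
    hours, dollars = cloud_cost(vms)
    print(f"{vms:>5} VMs: {hours:>7.1f} h wall clock, ${dollars:,.2f} total")
```

The dollar cost is the same in every row; only the elapsed time changes, which is exactly why the cloud shines when wall-clock time rather than budget is the scarce resource.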
To get an accurate cost estimate, the best procedure we know of is to time and cost a small-scale but typical computation on the cloud, if at all possible. This lets you work from an observed cost rather than a guess. It does, unfortunately, put a research team in the position of needing some cloud skills before running that test and arriving at a good estimate, so there is some risk in investing in that learning time.
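A hedged sketch of how that pilot run might be turned into a budget estimate, assuming you can run a representative slice of the workload and read the resulting charge from your provider's billing console (every number below is a placeholder to be replaced with your observed values):

```python
# Extrapolate a full-scale cloud cost from one small, timed pilot run.
# Every input below is an assumption you would replace with measured values.

pilot_fraction = 0.01         # the pilot processed 1% of the full dataset
pilot_wall_clock_hours = 2.5  # measured runtime of the pilot on one VM
pilot_cost_dollars = 1.20     # charge reported by the provider for the pilot

# Naive linear scaling: assumes cost and runtime grow in proportion to data size.
estimated_full_cost = pilot_cost_dollars / pilot_fraction
estimated_full_hours_single_vm = pilot_wall_clock_hours / pilot_fraction

print(f"Estimated full-run cost:        ${estimated_full_cost:,.2f}")
print(f"Estimated single-VM wall clock: {estimated_full_hours_single_vm:,.1f} h")

# Add a safety margin for re-runs, debugging, and the "new idea" calculations
# that research inevitably produces.
BUDGET_MARGIN = 2.0
print(f"Budget with {BUDGET_MARGIN:.0f}x margin:        ${estimated_full_cost * BUDGET_MARGIN:,.2f}")
```

Linear scaling is only a first approximation: if your computation scales worse than linearly with data size, the pilot will under-estimate the real bill, which is one more argument for building in a generous margin.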