The fastest and safest way to bring object storage to market was to use existing software. At the time we re-evaluated the solutions that had been shortlisted beforehand. There were three of them: Ceph, an open source industry standard; OpenIO, a provider of open source object storage; and Scality, a vendor of a closed source product.
There were multiple criteria to take into account: the price of the solution, support from the vendor, ease of use and installation, the ability to patch the product to fix issues and add features, and lastly how the solution would behave over time as the cluster grew.
The last item ended up being the deciding factor, for reasons we believed were critical. We were building a public object storage service that sold apparently infinite storage to customers. Of course, nothing is infinite in the physical world; everything ends up being a large amount of hardware that you have to handle. But because the product presents itself as infinite, you need to be able to keep growing your clusters. There is no limit for a customer: they could store one gigabyte or several hundred petabytes in a bucket if they had the funds.
Because of how the protocol works, a bucket can be unbounded, so you either have to grow the cluster indefinitely or be able to spread a bucket across multiple clusters. The latter option I considered extremely risky, because at the time we didn't have sufficient knowledge of the internals of S3 to understand the full limitations.
Our choice was to have one cluster per region and therefore to grow it indefinitely or, in practice, enough to keep up with customer demand. That meant we would initially have to grow the number of machines quite a lot.
Ceph and Scality used a placement mechanism that spreads the data evenly over the machines. This is great for performance, but it meant that adding a single storage medium to the cluster triggered a rebalance of the whole cluster, with an immediate performance impact until the operation completed. We had already lived through this with the private object storage clusters, where it had translated into customer impact multiple times.
We knew the cluster would grow quickly at first. Our planned unit of deployment was a rack; to grow, we would add a rack. The first time, half of the data would have to be moved; the second time, a third of it; the third time, a fourth… Over time the percentage would shrink, but the absolute volume would keep growing. In the distant future such an algorithm might become the better solution, but for us, at least in the short term, because we had to grow these clusters frequently, it was better not to have one: it could cause too many issues. OpenIO did not have this problem because it did not balance data with the same mechanism. It used another approach that we preferred: it would not rebalance the data by itself.
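To make that arithmetic concrete, here is a minimal sketch of my own (not taken from any of the products) assuming an idealised even-spread placement, where the data always ends up uniformly distributed across racks: going from N to N+1 racks means roughly 1/(N+1) of the data has to move onto the new rack.

```python
# Hypothetical sketch: data movement when growing an evenly balanced cluster
# one rack at a time. Assumes an idealised placement that keeps data perfectly
# uniform across racks; real systems (e.g. CRUSH in Ceph) differ in detail.

def data_moved_on_expansion(total_data_tb: float, racks_before: int) -> float:
    """Data (in TB) that must migrate so the new rack holds its even share."""
    racks_after = racks_before + 1
    # After rebalancing, each rack holds total / racks_after; the new rack's
    # share is exactly the amount that has to move off the old racks.
    return total_data_tb / racks_after

total = 1000.0  # TB already stored, purely illustrative
for racks in range(1, 6):
    moved = data_moved_on_expansion(total, racks)
    print(f"{racks} -> {racks + 1} racks: move {moved:.0f} TB "
          f"({100 / (racks + 1):.0f}% of the data)")
    total *= 1.5  # assume the dataset also keeps growing between expansions
```

With these made-up numbers the fraction moved drops from 50% to 17%, but the tonnage moved per expansion keeps climbing, which is exactly the operational cost we wanted to avoid while the clusters were growing frequently.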
Another consideration was API support: both Ceph and OpenIO supported the OpenStack Swift API, and OpenIO seemed more consistent in its support of S3 in addition to Swift.
We chose OpenIO. The team agreed. We knew its limitations, but they were less problematic than those of the others. What we didn't know was how much drama the choice would generate. Some engineers in other teams tried to force our hand into using Ceph. We didn't budge. We were fiercely independent. We marched on.
(1) Photos available here by Emmanuel Caillé
(2) Photos by the launch event photographer - if you know who it was please tell me
If you have missed it, you can read the previous episode here
To pair with:
- Interlude - The Blaze
- The Family Trade by Charles Stross