Splitting light

Season 2 Episode 39

Next steps


Around the end of May 2019

We continued to push forward. The next enemy on our path, the next boss in our raid, was lifecycle policies. Now that we had both Object Storage and cold storage (Carbon14), we could link them together. The damage dealers started hitting it. Nicolas (a) and Louis (b) worked on making this happen. It was a multi-step journey, and the preparations started around May 2019.

Nicolas worked on the integration with OpenIO. This consisted of an engine that modifies the stored data, plus additions to make the API compliant with the S3 lifecycle API. Louis, on his side, worked on the storage hardware integration: writing the software needed to store data on the custom cold storage hardware, the same hardware used by Carbon14.
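To make "lifecycle policy" more concrete: an S3 lifecycle configuration is a set of rules, each matching objects (for example by key prefix) and stating after how many days they should transition to another storage class or expire. Below is a minimal Python sketch of the per-object decision such an engine makes. It is not the OpenIO engine Nicolas wrote; the class and function names, the `GLACIER` storage class name (borrowed from AWS S3 naming), and the simplified day counting (plain elapsed time, no rounding to day boundaries) are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LifecycleRule:
    """Hypothetical, simplified S3-style rule: prefix filter, one transition, optional expiration."""
    prefix: str
    transition_days: Optional[int] = None
    transition_class: str = "GLACIER"  # assumed name for the cold storage tier
    expiration_days: Optional[int] = None


@dataclass
class StoredObject:
    """Hypothetical object metadata the engine would read from the bucket listing."""
    key: str
    last_modified: datetime
    storage_class: str = "STANDARD"


def evaluate(rule: LifecycleRule, obj: StoredObject, now: datetime) -> Optional[str]:
    """Return the action to apply to obj: "expire", "transition", or None."""
    if not obj.key.startswith(rule.prefix):
        return None  # rule does not apply to this object
    age = now - obj.last_modified
    # Check expiration first so an object due for deletion is not moved to cold storage for nothing.
    if rule.expiration_days is not None and age >= timedelta(days=rule.expiration_days):
        return "expire"
    if (
        rule.transition_days is not None
        and obj.storage_class != rule.transition_class  # already in cold storage: nothing to do
        and age >= timedelta(days=rule.transition_days)
    ):
        return "transition"
    return None
```

A real engine would run this kind of check periodically over every bucket that has a lifecycle configuration, then perform the actual move to the cold storage hardware or the deletion, which is where most of the hard work lies.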

We split off part of Carbon14 to enhance Object storage. In essence, a Dragon Ball Z fusion 👉👈💥

In other dungeons, Théo (c) was pushing for more deployments, meaning more regions or private cloud setups. We discussed how to reduce the size of a deployment and where we could deploy additional racks. His business plan required a certain number of Object Storage and Block Storage deployments to break even on our costs.

We had designed the architecture in a certain way, but once we launched, we realized quite a few things could be simplified. So I worked on merging some of the components together to reduce part of the complexity. We also fixed bugs as they appeared and improved the dashboards further.

Purple elements are on the critical path: they all have to be functional for Object Storage to work. It is better to have a single point of failure than three single points of failure.
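The reasoning behind that last sentence can be made concrete with the usual series-availability formula. This is a sketch assuming component failures are independent, with an illustrative availability of 99.9% per component (not a measured figure for these services). A request path that needs all n components is up only when every one of them is up:

```latex
\begin{aligned}
A_{\text{path}} &= \prod_{i=1}^{n} A_i \\
n = 1:\quad A_{\text{path}} &= 0.999 \quad (\text{unavailability } 0.1\%) \\
n = 3:\quad A_{\text{path}} &= 0.999^{3} \approx 0.99700 \quad (\text{unavailability} \approx 0.30\%)
\end{aligned}
```

Under these assumptions, folding three critical components into one cuts the expected downtime of the path by roughly a factor of three, from about 26 hours a year to about 9.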

On his side, Folays (d) was working on replacing the existing FTP backup service. Théo had him build a filesystem that used S3 as its underlying storage. The idea was to build it the right way so that, when we built the Filesystem as a Service product, it would be plug-and-play.

A lot of the work was managing production. Bugs happened. Failures happened. Automatic configuration triggered incidents. It was a process of disabling and enabling the right things.

We found out that, in a specific flow, some shards of the underlying data were not deleted correctly. Quentin (e) and Nicolas worked on fixing it. It took multiple attempts to fix the issue properly, and it was finally fixed for good a year later.

Another part of our work was helping the compute team migrate their data from the previous private Object Storage to the new public Object Storage. We were able to safely retire the old clusters in summer 2019: we turned the machines off and freed the rack space for new hardware or products.