Splitting light
Season 2 Episode 11
The bunker
If you are no longer interested in the newsletter, please unsubscribe
When we launched Carbon14 a year and a half earlier, Arnaud had decided to buy a specific building in Paris and turn it into a new datacenter. The building was very particular: it used to be a ministry building and, most importantly, exactly 30 meters underneath it was a nuclear-strike-proof bunker built in the 1960s. Its purpose was to shelter government officials in case of a nuclear attack on Paris.
One of the rooms of the Iena office "La maison", photo by Quentin Selle
This bunker was also famous because it used to be connected to Paris's underground tunnels, the catacombs, where people go crawling underground for fun.
It was a good place to host a long-term storage product. The work to transform the building and the bunker into a datacenter was almost finished: a 30-meter elevator shaft had been dug and the alcoves had been rehabilitated. Interesting fact, the tiles used were the same as the ones used in the Paris metro. Around this time, in February 2018, we could finally start plugging in computer hardware.
Racking hardware in the bunker (1)
The team huddled up. We needed to deploy Carbon14 there. The storage elements were being assembled: multiple 10-petabyte racks, 600 watts each. But we needed a storage buffer as well. Florian had experience with ZFS and we decided to go fancy, using two caching mechanisms on SSD. A detail that would be very significant later.
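The exact pool layout never made it into the newsletter, but ZFS has two classic SSD options: an L2ARC read cache and a SLOG write log. Assuming that is what "two caching mechanisms" means here, a minimal sketch of such a setup could look like this (pool name and device paths are invented):

```python
import subprocess

# All names below are invented for illustration; the real Carbon14 pool
# layout was never described in the newsletter.
DATA_DISKS = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]
L2ARC_SSD = "/dev/nvme0n1"   # assumed read-cache device
SLOG_SSD = "/dev/nvme1n1"    # assumed synchronous-write log device

def zpool(*args: str) -> None:
    """Run a zpool command, raising if it fails."""
    subprocess.run(["zpool", *args], check=True)

# Bulk buffer pool on spinning disks.
zpool("create", "buffer", "raidz2", *DATA_DISKS)
# First SSD mechanism: L2ARC, a second-level read cache.
zpool("add", "buffer", "cache", L2ARC_SSD)
# Second SSD mechanism: SLOG, a dedicated ZFS intent log for synchronous writes.
zpool("add", "buffer", "log", SLOG_SSD)
```

Two extra tiers also means two extra ways for things to go subtly wrong, which is exactly where this story is heading.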
Loic took charge of the deployment automation, using SaltStack to deploy the software. We deployed, ran a few tests and were happy with the system. We plugged the zone into the old Scaleway console; we were a click away from launch.
During that time I had to go down into the bunker several times. Underground, you had no phone signal. Nothing. Wifi had not been installed yet. You could get network connectivity if you physically plugged into an ethernet cable, and even then it had to be the right socket. Like any datacenter, it was very cold. You would sit on the floor but had to remember not to lean against the walls, where water infiltration flowed into purpose-built gutters. It was not a comfortable place. But unlike most datacenters, there were no screaming fans around me. It was near silent.
Fun facts about the bunker, aka DC4. Each Carbon14 server weighed a bit more than 50 kg, so a full rack of 22 of them came to about a metric ton. The floor had to be specifically engineered to support roughly a metric ton per rack footprint; C14 racks were very heavy. The heat generated by the machines underground was reused in a nearby building for district heating, reducing the datacenter's carbon footprint.
Carbon14 chassis, aka MrFreeze (2)
As soon as we pushed to production and opened the zone to customers, a strange issue started to appear.
Sometimes, the container through which we exported customer data would not shut down. A restart got it working again, but left on its own it would hang. Searching the web, I eventually found a GitHub issue describing exactly what was happening. The issue was in ZFS. We didn't really have the time to dig into the ZFS codebase, so we waited for a fix. For several months, we connected to the machines by hand and performed the restarts.
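For the record, a chore like that is about as unglamorous as it sounds. A minimal sketch of what we effectively did over and over, assuming SSH access to the storage nodes and a systemd-managed export container (host names and the unit name are invented):

```python
import subprocess

# Invented host names and unit name; the real fleet and container runtime
# were not described in the post.
STORAGE_NODES = ["c14-node-01", "c14-node-02", "c14-node-03"]
EXPORT_UNIT = "c14-export.service"

for node in STORAGE_NODES:
    # Restart the export container that sometimes hangs on shutdown.
    result = subprocess.run(
        ["ssh", node, "sudo", "systemctl", "restart", EXPORT_UNIT],
        capture_output=True,
        text=True,
    )
    status = "ok" if result.returncode == 0 else "failed: " + result.stderr.strip()
    print(f"{node}: {status}")
```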
A few months later, someone posted a comment saying that disabling a specific cache mechanism, one we also used, made the bug disappear. I promptly did that with Florian's help. Lo and behold, it fixed the issue. More importantly, there was no visible performance impact. It turned out we had over-engineered that specific part.
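The episode doesn't name the cache mechanism that was disabled. If it was the L2ARC, for instance, turning it off really is a one-line change, which is part of why the workaround was so cheap to try (same invented pool and device names as above):

```python
import subprocess

# Assuming the culprit was the L2ARC read cache (the post doesn't say which
# mechanism it was): tell datasets on the pool to stop using it.
subprocess.run(["zfs", "set", "secondarycache=none", "buffer"], check=True)

# Alternatively, the cache SSD could be detached from the pool entirely:
# subprocess.run(["zpool", "remove", "buffer", "/dev/nvme0n1"], check=True)
```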
Screenshot of the time saving comment in the ZFS bug report (3)
The few minutes that person took to leave a single comment on the bug report saved us many hours and paved the way for a final fix.
But before that happened, we continued on Object Storage!