Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
Splitting Light: Season 1 - Episode 22
Published about 2 months ago • 4 min read
My very first storage product
If you are no longer interested in the newsletter, please unsubscribe
As I was qualifying the storage hardware, one person from the cloud team was assigned to work on the product part. Just as I was about to hand it over, he decided that he wanted to spend more time with his family, who lived halfway across the globe, so he resigned. As we were between cycles (the C2 manufacturing order had been sent and the C3 wasn't "physical" yet), I took over the product and started working on it.
At the time, AWS had a cold storage product called Glacier; not the S3 storage class, but a distinct product that I think AWS has since sunsetted. Given the specificity of our hardware, the only sensible thing to design was a similar product. I studied how it operated and started designing the different aspects of how ours would work. I experimented with several technologies to get more abstraction, but none of the software I tried was up to the task: Ceph could not be used because of hardware constraints, Network Block Device (NBD) didn't seem production-ready after I triggered multiple weird bugs, and ATA over Ethernet (AoE) wasn't flexible enough for us. I was stumped.
I took a step back and put things into perspective. I had hardware with very high density in GB per cm³ per watt. The goal was to archive data for customers. Access latency did not need to be short, but durability was paramount. The constraints were that not all the drives could be powered up at once, and random access on them was suboptimal.
After a few rounds of thinking and brainstorming, I eventually decided that I would not be afraid of handling data myself. I put on a brave mask and started designing the data storage system in its entirety. I separated the product into three isolated parts:
1. The hardware itself and the small software to manage the drives;
2. The data pipeline, where I would process the data stream and control the integrity of the data;
3. The data intake, which customers would interact with to send the data.
The first part was done at the handover. I dived into the second one. The idea was that customers could send a bunch of data and then "seal" it into an archive. We did not care about what the data was, and we set a maximum size of 10 terabytes. Handling that amount of data meant we could not process it in situ; we had to stream it. To do that, I built a glorified shell pipe.
Let me explain for the people who didn't understand that last sentence. Data would arrive through the network in a single connection and be processed by a Python script. It would count the number of bytes and compute a very fast checksum on the data; I needed integrity checksums, not tamper-proof checksums. Then the data would be sliced, and I would apply Reed-Solomon encoding to it using a fast library. Reed-Solomon encoding is a mathematical method of adding some extra data so that if you lose part of it, you can still recover the original. It's used on space probes and in many things you use in your day-to-day life.
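As a rough illustration, here is a minimal sketch of that streaming stage in Python. The newsletter does not name the actual library, block size, or checksum, so the reedsolo codec, the 1 MiB blocks, and crc32 below are purely illustrative stand-ins, not the real pipeline:

```python
# Sketch of the "glorified shell pipe" stage: stream data from stdin,
# count bytes, compute a fast integrity checksum, slice into fixed-size
# blocks, and erasure-code each block before handing it to the next stage.
import sys
import zlib
from reedsolo import RSCodec  # pip install reedsolo (illustrative choice)

BLOCK_SIZE = 1 << 20   # 1 MiB blocks (assumed, not from the original)
PARITY_BYTES = 32      # parity symbols per 255-byte codeword (assumed)

def stream_blocks(stream, block_size=BLOCK_SIZE):
    """Yield fixed-size blocks from a byte stream without loading it all in RAM."""
    buf = bytearray()
    while chunk := stream.read(64 * 1024):
        buf.extend(chunk)
        while len(buf) >= block_size:
            yield bytes(buf[:block_size])
            del buf[:block_size]
    if buf:
        yield bytes(buf)

def process(stream):
    rsc = RSCodec(PARITY_BYTES)
    total, crc = 0, 0
    for block in stream_blocks(stream):
        total += len(block)
        crc = zlib.crc32(block, crc)      # fast integrity checksum, not tamper-proof
        encoded = rsc.encode(block)       # Reed-Solomon parity added here
        sys.stdout.buffer.write(encoded)  # next stage would write to the drives
    return total, crc

if __name__ == "__main__":
    n, crc = process(sys.stdin.buffer)
    print(f"{n} bytes, crc32=0x{crc:08x}", file=sys.stderr)
```

In the real pipeline, the encoded blocks would presumably go to the drive-management layer rather than stdout; the shape of the code, a stream flowing through counting, checksumming, and encoding steps, is the point.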
After prototyping everything, I built the real pipeline with the associated metadata database. Then came the testing phase: performance, reliability, and integrity. Performance was important because moving terabytes of data takes time, and processing it slowly takes even more. Reliability was important because if the system was down, customers could not access their data. Integrity was the most important of all: whatever happened to the storage media, we had to give the data back in its original form.
I tested the code on data archives from Wikipedia, on home movies, on random data, and even on empty data. After checking performance, I started to corrupt the data bit by bit, then byte by byte, continuing until I had corrupted it to the limit of our system and further, to check the failure-mode code. I killed some of the processes during an archival or a retrieval. I cut the data stream in the middle or suspended it. Every edge case had to be ironed out. Each time, I compared the data that had gone in with the data that came out, using checksums from multiple algorithms.
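A very reduced sketch of that kind of corruption test, reusing the same illustrative reedsolo codec as above; the real test harness, data formats, and thresholds are not described in the newsletter:

```python
# Corrupt an encoded buffer byte by byte until decoding fails, verifying the
# recovered data with several independent checksums at each step.
import hashlib
import os
import random
import zlib
from reedsolo import RSCodec, ReedSolomonError

def checksums(data) -> tuple:
    """Multiple checksum algorithms, compared together as described above."""
    return zlib.crc32(data), hashlib.md5(data).hexdigest(), hashlib.sha256(data).hexdigest()

rsc = RSCodec(32)                      # illustrative parity level
original = os.urandom(4096)            # stand-in for a real test archive
encoded = bytearray(rsc.encode(original))

rng = random.Random(0)
for corrupted_bytes in range(1, 256):
    encoded[rng.randrange(len(encoded))] ^= 0xFF   # flip a whole byte
    try:
        decoded = rsc.decode(bytes(encoded))[0]    # recent reedsolo returns a 3-tuple
    except ReedSolomonError:
        print(f"decode failed after {corrupted_bytes} corrupted bytes")
        break
    assert checksums(decoded) == checksums(original)
```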
Why did I do all these tests? When you store data over a long period of time, many things can happen, ranging from a bit flip due to cosmic rays to a mechanical disk failure, a silicon failure in a chip, or a software failure. Each of these cases had to be handled as best we could.
After I was confident that the code could handle multiple failures, I built an additional safety mechanism. All the metadata was stored in a database. If that database died, we would still have the data, but we would not be able to reconstruct it. At every step, whether hardware discovery, data archival, or data extraction, I emitted log lines with every detail of the action. With those logs, I was able to rebuild the database. It was similar to a write-ahead log, except it was more of a write-after log. The logs were stored elsewhere, as an additional failsafe mechanism. As with the rest, I tested it by removing lines from the database, then more and more data, until I was sure everything worked and that database dumps taken before and after rebuilding showed no differences except auto-generated timestamps.
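Here is a minimal sketch of that write-after-log idea as I read it: every action also appends a self-contained log line, and the metadata database can be rebuilt by replaying those lines. The field names, JSON format, and SQLite schema below are illustrative assumptions, not the original system:

```python
# Emit one structured log line per completed action, then rebuild the
# metadata database from scratch by replaying the log.
import json
import sqlite3
import time

def log_action(logfile, action: str, **details) -> None:
    """Append a self-contained record after the action has succeeded (write-after)."""
    record = {"ts": time.time(), "action": action, **details}
    logfile.write(json.dumps(record) + "\n")
    logfile.flush()

def rebuild_metadata(log_path: str, db_path: str) -> None:
    """Replay the log to reconstruct the metadata table."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS archives (archive_id TEXT PRIMARY KEY,"
               " size_bytes INTEGER, crc32 INTEGER, drive TEXT)")
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["action"] == "archive_sealed":
                db.execute("INSERT OR REPLACE INTO archives VALUES (?, ?, ?, ?)",
                           (rec["archive_id"], rec["size_bytes"], rec["crc32"], rec["drive"]))
    db.commit()
    db.close()
```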
While I was doing all these things, two people from the dedicated team were busy building the customer-facing part, which was soon to be launched.
To pair with:
Kuj Yato - Clap! Clap!
Heartsnatcher (L'Arrache-coeur) by Boris Vian
If you have missed it, you can read the previous episode here