Splitting Light: Season 2 - Episode 03


Remote hands

Before we could do anything more than look at what object and block storage were, we had to ensure the continuity of the existing products. That meant getting the people who used to take care of them to hand over everything they knew and everything we needed to keep running them.

One thing helped us very much. Both Folays and Loic had worked on some of the products, so we could learn more easily how they worked. I had built the carbon14 product, and Théo its interface. Florent also had some experience with the existing object storage product.

Having that knowledge inside the team helped, but it still wasn’t easy. We had to find where and how the products were monitored, what the alerts meant, and how to fix the issues raised by the support team. These were live products with live customers.

After documenting internally what we could, we had one last problem. The problem everyone who has worked with hardware in a distant datacenter has... Replacing parts. Hardware fails. It’s inevitable. Parts fail at different rates, but eventually something fails. The most frequent failure for us was hard drives. We had a fleet of machines that was quite large, at least for me at that time, each equipped with a number of disks. It’s been a long time, so the number is imprecise, but it was probably more than a thousand drives. Many of them were several years old. They failed sometimes. Each machine could tolerate two, maybe three dead disks before losing data.

Having them replaced was, how to say, painful at first. We had to find out how the drives were configured, what types of drives they were, and how to tell a datacenter technician which drives to swap… Many things had to be learned with live products.

Blinking the disks worked sometimes, but most of the time it did not. Getting the defective disk’s serial number worked until the disk was dead, which was exactly when we needed it… Using a slightly different disk model worked or not depending on the RAID card and hardware model… Sometimes changing the faulty drive triggered a failure in another drive while the data was being rebuilt. Sometimes this happened multiple times in a very short span.

It is safe to say that the experience vaccinated me against ever using hardware RAID again. Combined with my prior experience in building hardware, it cemented my understanding that any piece of hardware has to be redundant enough to keep running until an intervention can be scheduled.

We all eventually learned a lot and documented it. We had handbook procedures with to-do lists. What steps, how, which interface, which tool… Each time something new happened, it was supposed to be written down.

Eventually, we managed to take care of things in a timely manner. While this was painful, we learned many lessons, and team members who had never worked with hardware gained experience. It gave us a better understanding of the products, which would lead us to make some hard decisions…

To pair with:

  • Acid Tracks - Phuture
  • The Caves of Steel by Isaac Asimov

Vincent Auclair


Symbol Sled

Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
