2024-01-02
md
More Trouble
<-Things Like SD Cards and Code Do Fail

In mid-August, the 2.5" 1 TB SSD holding the Linux operating system of my desktop machine started to fail some S.M.A.R.T. diagnostic tests. There was no panic because a spare 1 TB SSD by Western Digital (a WD Blue SA510) was on hand. But first, it seemed best to open up the case to check that the SATA cable and power cable were well seated. While doing that and removing an amazing amount of accumulated dust in the case, I noticed that there was an mSATA SSD connector on the motherboard almost hidden under the video card. Immediately, I hatched a plan to replace the no longer trusted 2.5" SSD with an mSATA drive and to add another SATA SSD.

mSATA and SATA drives

A spare 128 GB 2.5" SSD from a refurbished computer that I converted into a small Linux server for someone else was available. Windows 10 could be installed on that since I do not really use Windows for much. So I had to purchase a 1 TB mSATA drive to replace the flaky 2.5" SSD as the system drive while the WD Blue SA510 would remain a spare.

Initially called mini-SATA, mSATA was announced in 2009. It basically provides the means to add a small solid-state drive compatible with the SATA specification using a PCI Express Mini Card connector. Given the comparatively small size of mSATA drives and their performance (SATA III, just like 2.5" SATA SSDs), they were mostly used in notebook and portable computers. However, the standard has become much less popular with the introduction of M.2 SSDs, which can be SATA compatible or even faster if using the NVMe protocol. Maybe that explains why finding an affordable 1 TB mSATA drive was difficult. Indeed, I could not find a Samsung, WD or Lexar model at any price in that size. So I settled on a KingSpec M1-1TB. I had never used a product from this company, but at about half the price of a 1 TB EVO SSD from Samsung, I decided to give it a try.

While waiting for the mSATA drive, I removed all drives from the computer, connected the 128 GB Samsung SATA drive and installed Windows on it. Luckily, there was no problem reactivating Windows 10 because the new drive was the only real change to the system from the last time Windows 10 had been used with that computer.

When the mSATA drive arrived, I removed the Windows 10 SSD and installed the new drive, which was a snap because, thankfully, the small screw needed to pull the board down was supplied with the SSD. I then installed Linux on the new drive. It would have been faster to just clone the 2.5" SSD, but I wanted to upgrade from Mint 20.1 to Mint 21.2. Furthermore, I no longer trusted the 2.5" SSD. Thankfully, there was not much difference between the two versions of Mint, which made it possible to copy most configuration files from the Mint 20.1 SSD. That drive was placed in an external USB 3 enclosure to have temporary access to its content.

The installation was also simplified by the presence of other drives on the system: a 500 GB SSD that contains two partitions and an 8 TB HD that contains three equal partitions. The bigger partition on the SSD is mounted on ~/Downloads so that the many OS images and much information about devices accumulated over months remained available. The other partition on the SSD is called Versions and it is used to store bare copies of the Mercurial repositories of the source files used to build this site and the various bits of code created. The spinning rust partitions hold, respectively, all the digital photos I possess, backups of older drives accumulated over many years, and the Timeshift snapshots. Note that the latter were not used to rebuild the system because of the upgrade.

In all it took about three days to get most everything running as before. I was quite pleased because

My old desktop machine was working well with this new Linux installation. While nominally the mSATA drive is as fast as the 2.5" SATA SSD, I did notice that there was an occasional "latency" problem not unlike when a traditional hard drive at rest has to be spun up when accessed. I didn't think much of that, and S.M.A.R.T. reported that the drive was fine.

I was so pleased with the changes that I made some hardware upgrades with the aim of reducing the noise level as much as possible. That meant installing a new Noctua case fan and replacing the video card with a slightly more powerful yet affordable fanless NVIDIA video card. The machine became noticeably quieter.

In early September, my small NAS crashed. I initially thought that the problem was with the power supply. Since it was running on an HP SFF machine with a proprietary PSU which would have been relatively expensive to replace, I decided it was time to build a new NAS from the ground up. I slowly began to gather components and in the process decided to upgrade part of the LAN to 2.5 Gb/s. While running initial tests on the new NAS, I noticed that the SATA connector of one of the NAS drives was in pretty bad shape. It seemed that plastic from the connector had melted onto the drive pads. Closer inspection of the power cable in the HP showed that it was broken. So perhaps the power supply had not malfunctioned after all. At that juncture, the point was moot: the new NAS would be going on line whether or not the old HP could be made to work again. The old NAS was running OpenMediaVault. I cloned its image onto an NVMe drive and it booted the new NAS, requiring only one tweak in the network setup because the interface had a different name. The new NAS with the old OMV configuration seemed to run well at 2.5 Gb/s, which I thought was quite impressive, but I didn't test thoroughly. I was impatient to try TrueNAS. Things have not gone smoothly; the system configuration had to be completed on the 1 Gb/s portion of the LAN and then results were disappointing when switching to 2.5 Gb/s. The Ethernet interface constantly goes down and then reconnects quickly. I don't yet know if this is a problem with the Intel I226 driver or with a cheap TP-Link 2.5 Gb/s switch. All that to say that the NAS has been down for the last couple of months. Of course, that was bad timing.

In early November, the home automation system broke down spectacularly as described here. With a lot of work, I managed to get the system fixed. There was a silver lining: it became clear that it was possible to run the home automation system on a converted Alfawise S92 TV box as I had wanted to do previously. Once the home automation system was running correctly on the converted TV box, the system on the Orange Pi PC 2 was updated and made functionally equal to the system on the Alfawise. The latter was taken out of service, and Android was reinstalled in preparation for a complete rewrite of Turning an Amlogic S912 Android TV Box Into a Linux Appliance. This was a major undertaking because of recent developments. There are two competing Armbian images for the Amlogic S912, which I was also comparing to Armbian on the Orange Pi PC 2 (Allwinner H5). At the same time, I wanted to better describe updating Android and how to use the Amlogic flash tool which runs on Windows. In other words, I was creating many files and saving many links and references. When I get into such a groove, I often leave the computer on all day and night, not worrying too much about backing up working files.

It's not quite clear how it happened, but I got infected by a real virus. As a consequence of a high fever that lasted about three days, I slept most of the time. When I returned to the computer, it was frozen. It was possible to reboot, although I lost all the open files. Checking showed that all drives passed the quick S.M.A.R.T. tests. Note that "all drives" included the new mSATA drive. Yet things felt wrong. I tried rebooting a number of times, with progressively worse results, and finally the machine would not reboot at all. It was possible to boot from a live USB version of Mint but the mSATA drive failed to show. I removed all other drives except for the new drive, to no avail. No matter what I tried, the mSATA drive was no longer visible.

To add insult to injury, when I tried to install Mint 21.2 on the WD Blue SA510 1 TB SATA drive, it could not be partitioned. It did not matter to which SATA port it was connected or even if it was placed in a USB 3 external enclosure; the drive was dead for all intents and purposes. There was nothing to do except get a Samsung 870 EVO 1 TB SSD from a store about a half-hour drive away. I was back in the same situation faced in August. I basically had to install Mint 21.2 from scratch on this new drive with reference to the old Mint 20.1 installation for many configuration files. Again, it took about three days of constant work to get the system back on line. Luckily, I had backups for most of the work done between August and November except for all the work done on the new installation of Armbian on the Alfawise S92 TV box. The source code for a couple of posts on this site was lost, but that was no big problem since the HTML and image files were on the Web. As far as I can tell, I lost a single changed password, but that was easily recovered.

How unlucky can a guy be? In the last four months,

Questioning my luck is probably not the correct attitude. First of all, I am largely to blame for the first two problems. I probably forced the power connector when shoehorning the drive into the small NAS case. Thankfully, the drive and its data survived. And as I explained before, the immediate cause of the home automation µSD card failure was in all probability a programming error on my part. Of course, I am not responsible for the breakdown of the two SATA drives. Nevertheless, I should have anticipated the possibility of these failures. Failures most commonly occur near the beginning and near the end of the lifetime of the parts, resulting in the bathtub curve graph of failure rates (source: Failure of electronic components). The Wikipedia article goes on to say that burn-in procedures are used to detect early failures. I had clearly not done any burn-in of the mSATA drive, and the only use of the spare WD Blue SATA drive that I can recall was a few speed tests done in May 2023. I should have known better.
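To make the idea of a burn-in a little more concrete, here is a minimal Python sketch of a write-and-verify pass. It is only an illustration under stated assumptions, not what was done here and not a substitute for established tools such as badblocks or a S.M.A.R.T. extended self-test (smartctl -t long). The function names, the file names and the default sizes are all hypothetical.

```python
import hashlib
import os


def write_and_verify(path, total_bytes, block_size=1024 * 1024):
    """Write random blocks to `path`, read them back, and compare SHA-256 digests.

    Returns True when the data read back matches what was written.
    """
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(block_size, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push buffered data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            read_back.update(block)
    return written.hexdigest() == read_back.hexdigest()


def burn_in(directory, passes=3, total_bytes=64 * 1024 * 1024):
    """Run several write/verify passes in `directory`; return the number of failed passes."""
    failures = 0
    for n in range(passes):
        path = os.path.join(directory, f"burnin-{n}.bin")  # hypothetical file name
        try:
            ok = write_and_verify(path, total_bytes)
        except OSError:
            ok = False  # an I/O error counts as a failed pass
        finally:
            if os.path.exists(path):
                os.remove(path)
        if not ok:
            failures += 1
    return failures
```

Calling burn_in on a directory located on the drive under test returns 0 when every pass verifies. One caveat: a file smaller than the machine's RAM may be read back from the operating system's cache rather than from the drive itself, so a meaningful test would need much larger sizes than the defaults shown, repeated over several days.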

Have I learned any lesson? The rush to get the desktop system back into operation was so great that I did not test the newly purchased Samsung EVO SSD, except for looking at the quick S.M.A.R.T. test results. However, I did do something proactive. As the reconstruction of the system neared its completion, I did make a complete backup of the new Samsung SSD to a disk image with Clonezilla. Furthermore, Timeshift is enabled and daily snapshots of the system are being made to another drive. It would be best to add regular automatic backup of my home directory with Déjà Dup or something similar, because my manual backup in Mercurial repositories was not a foolproof solution. I have considerably improved backups of the home automation system by turning on automatic database backups in Domoticz and by adding an automatic weekly backup of all the configuration files of the home automation software and of the operating system. This is still a work in progress; the code is awful and backups are only to a USB thumb drive. Remote backup has yet to be implemented. I should also look into installing smartmontools to automatically monitor the state of the hard drives. Hopefully, many of these needed improvements will be in place before the next failure!
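For what it is worth, a weekly backup of configuration files does not need much code. The following Python sketch is not the script mentioned above; it only shows one possible shape for it, using the standard library alone. The function name, the archive prefix and the retention count are assumptions made for the example.

```python
import tarfile
import time
from pathlib import Path


def make_backup(sources, dest_dir, keep=4, prefix="config"):
    """Archive `sources` into a timestamped .tar.gz in `dest_dir`, then prune old archives.

    Returns the path of the archive just written.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{prefix}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            if src.exists():  # skip missing paths instead of aborting the backup
                tar.add(str(src), arcname=src.name)
    # Timestamped names sort chronologically, so the oldest archives come first.
    archives = sorted(dest.glob(f"{prefix}-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Run once a week from cron or a systemd timer with the list of configuration directories and the mount point of the backup drive, this would keep the four most recent archives. Because it only needs a destination directory, pointing it at a mounted network share instead of a USB thumb drive would be one step toward remote backups.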

Epilogue

Both the mSATA drive and the backup SATA drive were under warranty. I decided against trying to get redress for the KingSpec mSATA drive. It contained too much confidential information (passwords, financial records and so on) to pass it on to any party. Had I read Is it possible to restore data, and what does it cost? beforehand, perhaps I would have chanced it. As it was, I tried reworking the drive with a hot air gun in the hope that the problem was a cold solder joint. It was not the best idea because the drive was in a worse state after that. Lesson learned.

I obtained an RMA from WD and returned the Blue SA510 not long before Christmas. This Monday morning, on the first of January, my spouse found a small box by the front door which must have been there for a couple of days. The box contained a new WD Blue SA510 in a sealed retail package with no information about why the old drive failed or anything else for that matter. It was fortunate that WD has a Canadian representative, because I had to pay postage to return the old drive and that was not cheap - almost a third of the cost of a new drive. In all likelihood, it would have been too expensive to return the drive to the US. So kudos to Western Digital for such a quick return right in the middle of the holiday rush.

Now I have to establish the best approach to back up my boot drive. Timeshift is already running, as it has been for a few years. Its default settings may not be optimal because the local settings are not saved. That may be a moot point, because the last two installations of my Linux desktop caused by hardware failures were from scratch since the OS was updated in both instances.

Many experienced Ubuntu users don't bother with system backups and just start with a fresh installation in a new computer. They may also keep a list of apps they have installed, and install them again after installing Ubuntu. (source).

That's certainly what I did, but having installed Linux Mint twice in the last few months has made it obvious that it takes a lot of time and effort to restore applications and system settings from scratch.

Currently I am toying with the idea of cloning the boot drive on a regular basis, and switching to the freshly cloned drive after each operation. The safest approach might be a 3-drive setup, so that there are two presumably working backups available at any time. That means purchasing another drive, but it does not seem to matter whether the manufacturer is Samsung, Western Digital, Crucial, SanDisk or another: the reviews all contain horror stories about sudden breakdowns and lost data. So new drives mean more chances for things to go wrong... of course.
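The rotation in such a 3-drive setup can be sketched in a few lines of Python. This is only one reading of the scheme, with hypothetical drive labels, in which each cycle clones the active drive onto the drive holding the oldest copy and then boots from that fresh clone.

```python
from collections import deque


def rotation_plan(drives, cycles):
    """Yield (source, target) pairs for a rotating-clone scheme.

    `drives[0]` is the drive currently booted. Each cycle clones the active
    drive onto the drive holding the oldest copy, then boots from that clone.
    """
    ring = deque(drives)  # ring[0] is active, ring[1] holds the oldest copy
    for _ in range(cycles):
        source, target = ring[0], ring[1]
        yield source, target
        ring.rotate(-1)  # the fresh clone becomes the active drive


for source, target in rotation_plan(["A", "B", "C"], 4):
    print(f"clone {source} -> {target}, then boot from {target}")
```

With three drives, the drive being overwritten is always the one with the oldest copy, so while a clone is in progress the active drive and the most recent previous clone are both still intact.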
