Over the past 15 years, my homelab has seen a biiiig variety of iterations and solutions, in somewhat of an order:
Then, the latest upgrade about 6 years ago:
I kept the Synology for biiig VMs, but for the smaller ones, this 4TB of SSD was quiiiiiiick.
But.. of all days, Jan 2nd, I saw that an SSD had failed. For those who are used to hard drive failures, you might not think too much about it. For those who have used SSDs before (especially consumer SSDs, like in my case), this spelt baaad news. Since these are in RAID, all SSDs get the same number of writes (which is the killing factor here), meaning it was just a matter of time before more failed..
I had luckily bought 2 spares when I originally purchased the array, so I replaced the failed drive and things were fine.
Of course this had to happen in 2026, the year where SATA SSDs are basically non-existent (I literally could not find any on Amazon), and those that did exist (higher capacities) were much more expensive.
So, moving to 2TB SSDs, I’d be looking at ~£500 to replace the array - keeping in mind the original array was only £400, I wasn’t too happy - particularly because only having 3 drives would mean:
I considered how much of a performance impact I’d see with spinning rust. Looking at (at least what I’d consider) the crème de la crème of hard drives (2.5" 15K SAS drives), I could make somewhat of a comparison:
| | Crucial MX500 SSD | Dell 15K 2.5" SAS |
|---|---|---|
| IOPs | ~95K | ~150-210 |
| Throughput (write) | ~510MB/s | ~115MB/s |
| Throughput (read) | ~560MB/s | ~115MB/s |
| Latency | <1ms | ~2-3ms |
Unfortunately, this isn’t a surprise - we can clearly see who the real winner is.
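(The numbers above are from spec sheets - if you want figures for your own drives, something like `fio` gets you the same columns. A sketch, assuming `fio` is installed and `/dev/sdX` is a disk you're happy to hammer - these are read-only jobs, so non-destructive:)

```shell
# Random 4K reads - the IOPs and latency columns
fio --name=randread --filename=/dev/sdX --direct=1 --rw=randread \
    --bs=4k --iodepth=32 --ioengine=libaio \
    --runtime=30 --time_based --group_reporting

# Sequential 1M reads - the throughput column
fio --name=seqread --filename=/dev/sdX --direct=1 --rw=read \
    --bs=1M --iodepth=8 --ioengine=libaio \
    --runtime=30 --time_based --group_reporting
```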
And to be clear, my memory of running 50+ VMs on spinning disk is definitely a bit hazy, but I started to remember bursts of high load averages (> 300 !!!) and times where a machine could barely claw 1MB/s of read/write. I know it’s a homelab, but I’ve gotten used to the SSD lifestyle.
But, at the same time, 10 x 1.2TB SAS drives for £120 is beyond cheap.
This should really be a non-issue, but after putting in a single drive to test, I could not get it to be detected by the operating system. After checking a variety of things (the server is not easily physically accessible and I didn’t want to turn it off whilst the SSDs were actively working).. I came to the conclusion that the drive backplane was SSD-only. So, after buying a replacement backplane and cable (£30), only at this point did I log in to the iLO (I leave the network cable unplugged) and check - and of course, the drive showed up as a foreign RAID member, so needed initialising :cry:
So I considered: what if I could cache as much as possible?
Read cache would obviously be free and easy - I’ve got a whole bunch of SSDs that are yet to fail, so these were a no-brainer: 2 x 500GB read cache was a definite must.
However, the question of a write cache is more problematic… failure situations, data corruption and data loss. My knowledge here wasn’t great, but at least from some very ancient history on the topic, my worst case is:
So I set out what I care about and what I don’t care about:
As long as I could get a write cache that could handle this, then I might be good - and I still have 2 basically brand new SSDs that had just gone into the array.
The technology chain up to this point had been:
SSDs -> md -> LVM PV -> VG -> LV (per VM) -> Exposed to qemu as VM disk
I’d used some per-LV caching in LVM before for some read/write caches for the NAS, but honestly I wasn’t keen:
ZFS on the other hand..
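(For reference, the layout I was heading towards looks something like this. A sketch only - the device names are placeholders and the raidz2 choice is illustrative, not a recommendation - but it maps onto the parts above: 10 SAS drives, 2 SSDs as read cache, 2 SSDs mirrored as the write-intent log:)

```shell
# 10 x 1.2TB SAS drives as the main vdev
zpool create vm_pool raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj

# 2 x 500GB consumer SSDs as read cache (L2ARC)
zpool add vm_pool cache sdk sdl

# 2 near-new SSDs mirrored as the SLOG (write cache for sync writes)
zpool add vm_pool log mirror sdm sdn
```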
This isn’t particularly interesting, but thought I’d note, just because I was quite particular about ensuring this transfer worked:
This isn’t my first rodeo with copying block devices over a network, so for anyone who’s interested:
dd if=/dev/md0 bs=1M | pv | ssh user@temp-r720 "dd of=/dev/backup_vg/tmp_backup bs=1M"
I then ran the following to verify the transfer:
ssh user@temp-r720 "cmp /dev/backup_vg/tmp_backup - " < /dev/md0
The next step was to set up each ZFS volume, copy the data from the SSD array, and then compare each ZFS VM volume against the temp server (this verifies the copy from the original against the backup):
LV=VM-DISK-NAME
LV_BYTES=$(blockdev --getsize64 /dev/ssd-vg/$LV)
# Create ZFS volume with same size
zfs create -V ${LV_BYTES}B vm_pool/$LV
# Disable sync for the copy
zfs set sync=disabled vm_pool/$LV
dd if=/dev/ssd-vg/$LV of=/dev/zvol/vm_pool/$LV bs=1M oflag=direct
# Re-enable sync
zfs set sync=standard vm_pool/$LV
zpool sync vm_pool
# Checksum target volume to compare against temp backup volumes
sha256sum /dev/zvol/vm_pool/$LV
ssh user@temp-r720 "sha256sum /dev/ssd-vg/$LV"
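With one volume proven out, the same steps wrap naturally into a loop over the whole VG. A hypothetical sketch (the volume names and size are placeholders), with a dry-run guard so the commands can be eyeballed before anything destructive runs:

```shell
# DRY_RUN=1 prints each command instead of executing it
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# In reality: for LV in $(lvs --noheadings -o lv_name ssd-vg); do ...
for LV in vm-one vm-two; do
  # Placeholder; really $(blockdev --getsize64 /dev/ssd-vg/$LV)
  LV_BYTES=10737418240
  run zfs create -V ${LV_BYTES}B vm_pool/$LV
  run zfs set sync=disabled vm_pool/$LV
  run dd if=/dev/ssd-vg/$LV of=/dev/zvol/vm_pool/$LV bs=1M oflag=direct
  run zfs set sync=standard vm_pool/$LV
done
run zpool sync vm_pool
```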
I noted that, as with a lot of my homelab, this physical server had been set up and then maintained only as necessary.. meaning it was running Ubuntu 14.04. This meant it had an old version of ZFS - to avoid tempting fate with any horrendous old bugs, I decided to upgrade to the latest version.
Fortunately, this was incredibly easy… 5 rounds of do-release-upgrade and reboots actually worked flawlessly.
I went into this quite blind - I had to migrate and get the data safe, no matter the performance.
However, somewhat magically, I found some of the applications were actually running quicker.
ZFS provides a wealth of information about its status, and I found it wonderful that I could just take all of the raw data and ChatGPT could give me great insight into how it’s doing.
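(The raw data in question came from the standard OpenZFS tooling - the pool name is mine, everything else is stock commands:)

```shell
zpool status -v vm_pool     # pool health and per-device errors
zpool iostat -v vm_pool 5   # per-device IO, including cache/log devices
arc_summary                 # ARC/L2ARC sizes and hit rates
zfs get all vm_pool         # every property on the pool's root dataset
```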
I took three takeaways from the output:
The final thing I really really hadn’t considered which was very interesting:
Overall, I’ve reduced my reliance on consumer SSDs, allowing for more of a mix-and-match (I can replace the current SSDs without a data transfer).
I’ve now moved to enterprise hard drives, which are available cheaply from eBay, and I can hoard a bunch of spares for little money.