The title could also be ``How to change a bad drive in TrueNAS?`` and, in my case, ``How to upgrade the pool storage space?``, but in the end it was ``How to troubleshoot during this process?``.
The documentation I used can be found at https://www.ixsystems.com/documentation/freenas/11.2/storage.html#replacing-a-failed-disk
How to fix a degraded pool state in TrueNAS?
In my homelab environment, one of the disks in my TrueNAS server went bad. I had to disconnect the faulty drive to be able to boot the server again.
To replace the disk you need a replacement disk of the same size or bigger. Because I want to upgrade my pool to a bigger capacity anyway, I will change all my 2TB drives to 6TB drives, but for obvious reasons I will begin with the bad one: ``I like my data``. Before replacing the drive I wanted to scrub the pool and set the faulty drive offline (which was not possible because the drive was not connected). Then I replace the drive and add the new one to the pool as a replacement. Resilvering then starts. When the resilvering is done, the pool should be ONLINE again and no longer in a DEGRADED state.
And how do you upgrade the capacity of the pool? Repeat this process for the other drives in the pool with the remaining new 6TB drives.
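Why the extra space only shows up after the last drive has been swapped can be illustrated with a rough calculation. This is a minimal sketch that assumes the pool is a single RAIDZ1 vdev of four drives (the pool is called RAID5 and four SATA ports come up in this post, but the exact layout is an assumption). It ignores ZFS metadata overhead and the TB/TiB difference, and the function name `raidz_usable_tb` is made up for illustration.

```python
def raidz_usable_tb(disk_sizes_tb, parity=1):
    """Rough usable capacity of one RAIDZ vdev, in TB.

    Every member only contributes as much space as the smallest disk,
    and `parity` disks' worth of space goes to redundancy.
    """
    if len(disk_sizes_tb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


# Assumed four-drive layout, one drive swapped at a time.
for sizes in ([2, 2, 2, 2], [6, 2, 2, 2], [6, 6, 6, 2], [6, 6, 6, 6]):
    print(sizes, "->", raidz_usable_tb(sizes), "TB")
```

As long as a single 2TB drive is left in the vdev, the smallest disk still sets the limit, so the capacity only grows once every drive has been replaced.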
Do not forget to back up the valuable data in the pool whose drives you are changing. I use replication to the other pool in my system so that I have a copy of all my valuable data on hand.
How do you know which drive is the bad one?
Go to the disk list: the drive whose S/N is missing from the list is the bad one. Put differently, the bad drive is connected to the port that is missing from your list. In my case it is SATA7.
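If you noted down which serial number sits on which port while everything still worked, the comparison is a simple lookup. The sketch below only illustrates the idea: the helper `missing_ports` is hypothetical, and the serial numbers are placeholders, not the ones from my drives.

```python
def missing_ports(expected, visible_serials):
    """Return the ports whose expected serial number is not in the disk list."""
    visible = set(visible_serials)
    return sorted(port for port, serial in expected.items() if serial not in visible)


# Hypothetical inventory: the serial numbers are placeholders, not real ones.
expected = {"SATA0": "SN-A", "SATA4": "SN-B", "SATA6": "SN-C", "SATA7": "SN-D"}
visible = ["SN-A", "SN-B", "SN-C"]  # what the disk list currently shows
print(missing_ports(expected, visible))
```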
Day 1
- Replacing the broken drive. The first attempt used an HDD that was dead on arrival, so I redid it with another drive.
Day 2
- Resilvering finished after the second attempt to replace the drive on SATA7. The RAID5 pool is ONLINE.
- Pool scrub
- Setting the second drive to replace, the one on port SATA6, OFFLINE. Shutting down the TrueNAS server.
- Replacing the offline disk on SATA6 in the server. I use the serial number to check that I am pulling the right disk. On the back of almost every HDD you can find a label with the S/N on it.
- After the resilvering I did the same thing for the HDD on SATA port 4.
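Before pulling the next drive, I want the pool to be ONLINE with no resilver still running. A minimal sketch of that check on the text printed by `zpool status` could look like this. It assumes the usual `state:` line and the phrase "resilver in progress" in the scan line, which may differ between versions, and both function names are hypothetical.

```python
import re


def pool_state(status_text):
    """Extract the value of the 'state:' line from `zpool status` output."""
    match = re.search(r"^\s*state:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def ready_for_next_drive(status_text):
    """True when the pool is ONLINE and no resilver is still running."""
    resilvering = "resilver in progress" in status_text
    return pool_state(status_text) == "ONLINE" and not resilvering
```

The text itself could come from something like `subprocess.run(["zpool", "status", "RAID5"], capture_output=True, text=True).stdout` on the TrueNAS shell.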
Day 10
- The same thing for the HDD on SATA0 that I did with the HDDs on SATA6 and SATA4.
- But this was another story: checksum errors all over the place once the resilvering process had started. After a scrub the same thing happened, so I had to find the cause of the problem.
- The first thing I did was replace the SATA cable (lucky shot or not). Once the SATA cable on SATA port 0 was replaced, I went to the command line and ran "zpool clear RAID5" followed by a scrub. After that there were no more checksum warnings, and all the files were usable like before, even those that had been listed with errors. I had run "zpool status -v RAID5" to see the list of affected files before the clear and scrub, and tested a few of those files.
- Finally I used "Expand Pool", something you cannot do while there are checksum errors. When "Expand Pool" is executed, the extra storage is added to the pool.
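The command-line part of the Day 10 fix can be sketched as a small script. This is only an illustration of the order of the commands, assuming the pool name RAID5 from this post. The helper names are hypothetical, and by default it only prints the commands instead of running them.

```python
import subprocess


def checksum_recovery_commands(pool):
    """Commands in the order used after fixing the cabling."""
    return [
        ["zpool", "status", "-v", pool],  # list files affected by errors
        ["zpool", "clear", pool],         # reset the pool's error counters
        ["zpool", "scrub", pool],         # re-verify all data in the pool
    ]


def run_all(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_all(checksum_recovery_commands("RAID5"))
```

Note that `zpool scrub` returns right away while the scrub keeps running in the background, so check `zpool status` again afterwards before using "Expand Pool".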
Well done and thanks. I'll keep this bookmarked for when that day arrives.