When you format a hard disk, ext4 reserves around 5% of it to ensure you don't run out of space with catastrophic (non-bootable) consequences. If you are formatting an external USB drive for backup purposes, you will never boot from it, so there's little point in reserving any space at all (especially as 5% of a 6TB drive is 300GB!). Therefore, un-reserve that space with the command:
sudo tune2fs -m 0 /dev/sdX1
(where sdX1 is actually /dev/sda1, /dev/sdb1 or whatever else your actual physical disk partition is). That sets the “minimum [to keep] free” to nothing, meaning you get to use all your hard disk space. Whilst (for example) the BSD man pages for tune2fs seem to indicate that setting -m to less than 5% is going to hammer performance, that's not what an Ext4 developer says. Obviously, this assumes your backup hard disk is using ext4!
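To confirm the change took effect, the reserved percentage can be read back from the filesystem's superblock. What follows is only a minimal sketch and not part of the original tip: `reserved_percent` is a hypothetical helper name, and it assumes the `Block count` and `Reserved block count` fields that `tune2fs -l` prints.

```shell
# Hypothetical helper: read `tune2fs -l` style output on stdin and print the
# percentage of blocks that are reserved.
reserved_percent() {
    LC_ALL=C awk -F: '
        /^Block count/          { total = $2 + 0 }      # total blocks in the filesystem
        /^Reserved block count/ { reserved = $2 + 0 }   # blocks held back from normal users
        END { if (total > 0) printf "%.1f\n", 100 * reserved / total }
    '
}

# Usage (needs root and a real ext4 partition; /dev/sdX1 is a placeholder):
#   sudo tune2fs -l /dev/sdX1 | reserved_percent
```

After running tune2fs with -m 0, the helper should print 0.0 for that partition.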
If you've got a hard disk whose contents are in an unknown state, and you want to wipe the entire drive as quickly as possible so you can start with a clean slate for a new series of backups, the following commands will help:
sudo wipefs -a /dev/sdX
echo "label: gpt" | sudo sfdisk /dev/sdX && echo ",," | sudo sfdisk /dev/sdX
sudo mkfs.ext4 -F /dev/sdX1
lsblk -no UUID /dev/sdX1
Replace the “X” with the correct drive letter (e.g., /dev/sda for the whole disk, which makes /dev/sda1 its first partition). Wipefs doesn't laboriously clean a disk: it simply wipes the filesystem and partition-table signatures from the disk, so it's a quick operation and effectively renders the disk blank for nearly all known tools. The two sfdisk commands then create a new GPT partition table and a single partition using the default start and size, mkfs.ext4 formats that new partition, and lsblk prints the UUID of the new filesystem. Very dangerous, very destructive… but also very efficient and very quick for 'starting from a blank slate'.
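Because these commands are so destructive, it may be worth checking the target before running them. The function below is a hedged sketch, not part of the original recipe: `check_wipe_target` is a hypothetical name, and it assumes `lsblk` is available with its `TYPE` and `MOUNTPOINT` columns.

```shell
# Hypothetical pre-flight check: succeed only if the argument is a whole disk
# with nothing mounted from it.
check_wipe_target() {
    dev=$1
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 1
    fi
    # TYPE is "disk" for a whole drive and "part" for a partition.
    if [ "$(lsblk -dno TYPE "$dev")" != "disk" ]; then
        echo "$dev is not a whole disk" >&2
        return 1
    fi
    # Any non-empty MOUNTPOINT means the disk or one of its partitions is in use.
    if lsblk -no MOUNTPOINT "$dev" | grep -q .; then
        echo "$dev has mounted filesystems" >&2
        return 1
    fi
    return 0
}

# Usage (placeholder device name):
#   check_wipe_target /dev/sdX && sudo wipefs -a /dev/sdX
```

This does not tell you that you have picked the right disk, only that the one you named is a whole, unmounted drive.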
If I mis-catalogue a new recording, it will sometimes be because I've said its composer is “Arvo Part” rather than “Arvo Pärt”. The lack of an umlaut on the 'a' is a tiny typo, but it means Giocoso and Niente will report one more composer in my music collection than I really have. This Niente query helps spot near-misses like this:
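-- Generate the integers 1 to 100, used as character positions
-- (names longer than 100 characters are only summed up to that point)
WITH RECURSIVE numbers(n) AS (
    SELECT 1
    UNION ALL
    SELECT n+1 FROM numbers WHERE n<100
),
-- For each composer name, add up the Unicode code points of its characters
-- across every row in tracks that carries that name
name_values AS (
    SELECT
        composername,
        SUM(unicode(substr(composername, n, 1))) AS value
    FROM tracks
    JOIN numbers ON n<=length(composername)
    GROUP BY composername
)
-- Compare each name's value with that of the alphabetically previous name
SELECT
    composername,
    LAG(composername) OVER (ORDER BY composername) AS prev_name,
    LAG(value) OVER (ORDER BY composername) AS prev_value,
    value - LAG(value) OVER (ORDER BY composername) AS diff,
    CASE WHEN abs(value - LAG(value) OVER (ORDER BY composername)) < 100 THEN 'NEAR MATCH' ELSE '' END AS near_match
FROM name_values
ORDER BY composername;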
This takes unique composer names, orders them alphabetically, then compares the name in row n+1 with the name in row n. For each composer name, a numeric value is computed by adding up the Unicode code points of its letters. If row n was “Benjamin Britten” and row n+1 was “Richard Wagner”, you'd expect the two numbers to be wildly different from each other. If row n was “Benjamin Britten” and row n+1 was “Benjmin Britten”, however, you'd expect the two numbers to be very close to each other. If the two numbers are within 100 of each other, then they're flagged as a 'near match'. You can then investigate whether that's just coincidence or an accident of catalogue mis-typing! Bear in mind that the sum runs over every row in the tracks table, so a name's value is also multiplied by the number of tracks that carry it: two spellings only land close together when their track counts are similar as well.
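The letter sum for a single occurrence of a name can be checked by hand. The sketch below is an illustration only: `name_value` is a hypothetical helper, and it adds up byte values, which match the code points SQLite's unicode() returns only for plain-ASCII names (an accented letter such as 'ä' is more than one byte in UTF-8, so it would not match).

```shell
# Hypothetical helper: sum the byte values of a string.
# ASCII-only assumption: for unaccented names this equals the sum of code points.
name_value() {
    printf '%s' "$1" \
        | od -An -v -tu1 \
        | awk '{ for (i = 1; i <= NF; i++) sum += $i } END { print sum + 0 }'
}

name_value "Aaron Copland"   # prints 1234
name_value "Aaton Copland"   # prints 1236
```

The two spellings differ only in 'r' (114) versus 't' (116), so their sums differ by 2, comfortably inside the threshold of 100.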
It's not a perfect way of doing it: missed letters or badly typed letters might mean a row's composer name is being compared to quite the wrong composer name. For example, “Aaron Copland”, “Aarre Merikanto” and “Aaton Copland” would mean that “Aaton Copland” would be compared to “Aarre Merikanto” not “Aaron Copland”, because the 't' makes it sort after Merikanto's 'r' in the same spot, even though it clearly involves a mis-typing of Copland's first name. Nevertheless, that might still be helpful:
composername    | prev_name       | prev_value | diff   | near_match
Aaron Copland   |                 |            |        |
Aarre Merikanto | Aaron Copland   | 104890     | -80053 |
Aaton Copland   | Aarre Merikanto | 24837      | -23601 |
Adolph Weiss    | Aaton Copland   | 1236       | -81    | NEAR MATCH
Adolphe Adam    | Adolph Weiss    | 1155       | 4365   |
Adrian Willaert | Adolphe Adam    | 5520       | 4693   |
Even though “Aaton Copland” is being compared to the wrong composer name, the 'NEAR MATCH' flag is still raised on the following row, where “Adolph Weiss” happens to land within 100 of it, and that's enough of a pointer to make one realise what has gone on.
You will also need to run the report several times, with different values for the 'near match' threshold. It's not a sure thing that a typo of “Part” for “Pärt” will trigger the near match flag, for example, even with the threshold set in the thousands. Raising the threshold significantly will produce plenty of false positives, but it also improves your chances of spotting the actual mis-catalogues.