Integrity Checks
1.0 Introduction
When you populate a Niente database, you fill it with data about which FLACs exist, and not much else. Here is a screenshot of the Niente database, viewed in a third-party tool that can read and display databases. It was taken after scanning for FLACs (Database menu, Option 2) but before performing any integrity checks:
You'll note that each row lists a FLAC file that was found: that's the data which a Database menu, Option 2 scan generates. You'll also note, however, that each row in this TRACKS table is supposed to have 14 columns of data (one of them fell off the right-hand edge of that screenshot!). These range from MD5 hash values to extracted PERFORMER, COMMENT and ALBUM tag data (to mention just a few). Immediately after a file scan, all these columns are NULL for every row: Niente knows nothing at all about the FLAC files, except that they exist.
A Niente integrity check is the process by which these columns are filled in with data. It visits each listed FLAC in turn, reads all its metadata tags and copies the data found there into the relevant columns. Without an integrity check, therefore, Niente can report nothing meaningful about your music collection beyond the mere number of FLAC files it contains.
Generally speaking, populating these missing columns of data doesn't take a huge amount of time: one only needs to visit a FLAC file once to read all its tags in one fell swoop, so it's not particularly demanding on your CPU or hard disk. Tag data is, in other words, logical data about the FLAC; and logical data is easily and swiftly read by merely opening the file in question.
However, one particular pair of columns requires quite a bit more work to populate. HASH_ORIG and HASH_NEW are the columns in which the MD5 'digital fingerprint' of the audio signal within a FLAC is recorded. The HASH_ORIG column is easy to fill, because every FLAC is 'born' with an in-built MD5 hash value the moment it is created …and so all we have to do is read it, like any other logical tag. The HASH_NEW column, however, is where Niente stores the result of re-computing the MD5 fingerprint from scratch. Re-computation involves reading the entire audio stream (for a Wagner opera, that's a lot of music to read!) and performing a fairly CPU-intensive computation on the data thereby retrieved. Filling the HASH_NEW column is therefore a major slog… and it is how Niente determines the physical state of a FLAC.
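To make the difference between the two hash columns concrete, here is a minimal Python sketch. It is not Niente's code, and the function names are hypothetical. It also assumes the FLAC's audio has already been decoded to interleaved integer samples by some other tool.

Reading the stored MD5 (the HASH_ORIG side) only touches the 34-byte STREAMINFO block at the start of the file. Re-computing it (the HASH_NEW side) means hashing every single sample, which is where the time goes.

```python
import hashlib


def stored_md5(flac_bytes: bytes) -> str:
    """Return the MD5 recorded in a FLAC's STREAMINFO block (the HASH_ORIG side).

    A FLAC file starts with the 4-byte marker b"fLaC", then a 4-byte metadata
    block header, then STREAMINFO (34 bytes), whose last 16 bytes are the MD5
    of the unencoded audio. An all-zero value means no MD5 was recorded.
    """
    if flac_bytes[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    if (flac_bytes[4] & 0x7F) != 0:        # low 7 bits = metadata block type
        raise ValueError("first metadata block is not STREAMINFO")
    streaminfo = flac_bytes[8:42]
    if len(streaminfo) != 34:
        raise ValueError("truncated STREAMINFO block")
    return streaminfo[18:34].hex()


def recomputed_md5(samples, bits_per_sample: int) -> str:
    """Re-hash decoded audio the way FLAC does (the HASH_NEW side).

    `samples` are integers interleaved across channels; each is written as a
    little-endian signed value padded to a whole number of bytes. Every sample
    has to be visited, which is why this is the slow part of a check.
    """
    width = (bits_per_sample + 7) // 8
    md5 = hashlib.md5()
    for s in samples:
        md5.update(s.to_bytes(width, "little", signed=True))
    return md5.hexdigest()
```

A physical integrity failure is simply the case where these two values disagree.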
In short, Niente integrity checks can be logical or physical, or both (though Niente doesn't have an option to perform only physical checks). Logical ones tend to be relatively fast, and populate around 13 of the columns in the TRACKS table. Physical ones tend to be quite slow and computationally expensive and end up populating the HASH_NEW column in the TRACKS table.
- The Integrity Checks menu, Option 1 : Perform a full integrity check is the option used to perform logical+physical integrity checks, for all records in the TRACKS table, ab initio.
- The Integrity Checks menu, Option 2 : Perform a differential integrity check also performs logical+physical integrity checks, but only for those records which are already known to have failed the various logical and physical integrity tests that Niente performs. Suppose, for example, you add 3 new recordings to the database. They start off without any of the 14 columns of associated data, and Niente regards that as meaning they fail its logical and physical tests. Only those 3 new recordings would therefore have their logical and physical characteristics read, computed and loaded into the database using this option. The other 15,000 rows in the database, known to have passed the logical and physical integrity checks previously performed, would be skipped this time round.
- The Integrity Checks menu, Option 3 : Perform a fast integrity check only performs a logical integrity check. It ignores hash value mismatches when deciding which TRACKS rows to visit, and whenever it does visit a FLAC, it won't re-compute the MD5 'fingerprint' for that file (which is why it's quite a fast check to perform). Like the differential integrity check, however, the fast integrity check only visits those FLACs which are recorded in the database as having previously failed one or more of the logical tests that Niente applies. It will not, therefore, visit every row in the TRACKS table (unless, of course, every row lacks data in the metadata columns). You'd use this option if you get a report saying 'three of your FLACs have a recording year that doesn't match what's in their ALBUM tag'. That's a logical tagging failure, so you'd fix those tags and then take this menu option to get Niente to re-check just those three files for logical consistency.
By way of summary, then:
- Full: Logical and physical tests of all known FLACs; previous results are wiped
- Differential: Logical and physical re-tests of only those FLACs that have failed previous logical or physical tests
- Fast: Logical-only re-tests for FLACs that are known to have failed previous logical tests
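The three selection rules above can be sketched as SQL queries against a cut-down TRACKS table. This is a minimal illustration, not Niente's actual schema or code. The FLAC_PATH and LOGICAL_FAIL columns are hypothetical stand-ins. "Logical failure" is also simplified here to mean "tag columns still NULL, or a previous check flagged a tagging problem".

```python
import sqlite3

# Cut-down, hypothetical TRACKS table: FLAC_PATH and LOGICAL_FAIL are
# illustrative names, not Niente's real schema.
SCHEMA = """
CREATE TABLE TRACKS (
    FLAC_PATH    TEXT PRIMARY KEY,
    ALBUM        TEXT,
    PERFORMER    TEXT,
    HASH_ORIG    TEXT,
    HASH_NEW     TEXT,
    LOGICAL_FAIL INTEGER   -- 1 if a previous logical test failed
)
"""

# Simplified logical failure: tags never read, or a tagging problem flagged.
LOGICAL_FAILED = "(ALBUM IS NULL OR PERFORMER IS NULL OR LOGICAL_FAIL = 1)"
# Physical failure: a hash is missing, or the re-computed hash differs.
PHYSICAL_FAILED = ("(HASH_ORIG IS NULL OR HASH_NEW IS NULL "
                   "OR HASH_ORIG <> HASH_NEW)")


def rows_to_visit(conn: sqlite3.Connection, mode: str) -> list[str]:
    """Return the FLAC paths a check of the given mode would visit."""
    if mode == "full":
        where = "1 = 1"            # every row (a real full check also wipes old results)
    elif mode == "differential":
        where = f"{LOGICAL_FAILED} OR {PHYSICAL_FAILED}"
    elif mode == "fast":
        where = LOGICAL_FAILED     # hash mismatches are ignored
    else:
        raise ValueError(f"unknown mode: {mode}")
    sql = f"SELECT FLAC_PATH FROM TRACKS WHERE {where} ORDER BY FLAC_PATH"
    return [row[0] for row in conn.execute(sql)]


def demo_db() -> sqlite3.Connection:
    """Four example rows: healthy, newly scanned, mis-tagged, corrupt."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO TRACKS VALUES (?, ?, ?, ?, ?, ?)", [
        ("a.flac", "Album", "Performer", "h1", "h1", 0),   # passes everything
        ("b.flac", None, None, None, None, None),           # just scanned
        ("c.flac", "Album", "Performer", "h2", "h2", 1),   # bad tags
        ("d.flac", "Album", "Performer", "h3", "zz", 0),   # corrupt audio
    ])
    return conn
```

With the demo rows, a full check visits all four files and a differential check visits b, c and d. A fast check visits only b and c: the corrupt d.flac is left alone because its tags are fine.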
When you have just created and populated a database with Niente, therefore, your first integrity check should be a full one. That check may reveal logical tagging errors that you need to fix (such as the recording year not appearing in the ALBUM tag). If so, you re-tag the affected FLACs and get Niente to notice your corrections by running a new fast integrity check. Since the fixes were all in tag metadata, there's no need to re-compute the physical hash value for the audio signal in any of those FLACs. If Niente ever reports physical file corruption, however, you'd re-rip that CD or restore the files from a known good backup. You then get Niente to notice the fix by performing a new differential integrity check, since that will re-compute the MD5 hash values for the affected FLACs. Finally, if you are continuously adding to your music collection, you'd regularly add the new recordings to the database (Database menu, Option 3) and then run a differential check to get them tested for both logical and physical issues.
2.0 Performing an Integrity Check
All integrity checks use the contents of the TRACKS table in the database to determine what FLACs should be visited and inspected. That is, they don't go off and do a re-visit of the file system to search for FLACs: that's what loading the database is for.
All integrity checks are performed in the same way: select the Integrity Checks menu, then select option 1, 2 or 3 (as explained in Section 1.0 above, each type of check is used in different circumstances). As soon as one of these options is taken, Niente starts visiting the appropriate FLACs listed in its database and reading their contents from disk:
The program display shows you which file is currently being read, and lists some of the data it has found within the file. Since this is a screenshot of a full integrity check, you'll notice that it displays the MD5 Hash which Niente has re-computed from the FLAC's audio signal. Remember, full and differential checks both concern themselves with a file's physical integrity, and MD5 hash computations are precisely the mechanism they use to do that. Had I chosen to do a fast integrity check, the screen would have looked like this:
It's the same sort of logical data as shown before, but this time there's no 'MD5 Hash' item to display. Fast checks don't read or re-compute MD5 hashes at all, because they deliberately don't concern themselves with the physical integrity aspect of a FLAC.
Note that you can interrupt an integrity check at any time by pressing Ctrl+C (which exits Niente completely). A new integrity check will then simply resume from where the previous one left off (unless it's a full integrity check: those always start from row 1, from scratch). There is a slight catch to terminating-and-resuming an integrity check, however. Niente always creates a 'scan lock' (the program lock) the moment it starts an integrity check, to prevent a second, simultaneous integrity check from confusing it. Terminate Niente abruptly, therefore, and the program is killed off before it has a chance to remove that lock. This means that if you restart Niente and try to re-run an integrity check, you'll see this message:
The fix for this is to tap the L key: that will forcibly remove the program lock and allow a second integrity check to pick up from where the original one got to.
Be warned: if you remove a program lock when it's not safe to do so, you can corrupt your database. That's not fatal, of course: you'd simply wipe it, re-load it and start a fresh integrity check. For a large music collection, however, that would be a lot of work to repeat, so it's not recommended! Remove the program lock only if you have to, and only if you can be certain that no other integrity check is running; don't do it casually or unnecessarily.
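The lock behaviour described above can be illustrated with an exclusive-create lock file. This is only a sketch of the general technique: Niente's own lock may well be stored somewhere else entirely (inside its database, for example). The lock path and function names here are hypothetical.

The key point is that creating the lock either succeeds atomically or fails because a lock already exists. A second check therefore can't start by accident, while a killed process leaves its lock behind.

```python
import os


def acquire_lock(lock_path: str) -> bool:
    """Try to create the lock; return False if one already exists."""
    try:
        # O_EXCL makes creation atomic: it fails if the file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record which process holds the lock
    return True


def release_lock(lock_path: str) -> None:
    """Normal clean-up at the end of a check (never reached if the process is killed)."""
    os.remove(lock_path)


def force_remove_lock(lock_path: str) -> None:
    """The equivalent of tapping 'L': remove the lock whoever owns it.

    Only safe if you are certain no other check is still running.
    """
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```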
At the end of an integrity check, if you were able to see inside the Niente database, you'd see this sort of thing in the TRACKS table:
Compare that to the first screenshot shown in the introduction to this page and you'll see that every row now has meaningful data in its columns, not just the word 'NULL'. This means we now have data about each FLAC that we can read and interpret… and that lets Niente point out when something appears logically inconsistent (which is what Niente's various reporting options will do for you).
3.0 Album Art and Volume Boost Checks
Integrity checks gather metadata tags from FLACs and (for full or differential checks) re-compute the MD5 hash or 'fingerprint' of each FLAC's audio stream. This allows Niente to report on most sources of logical inconsistency or physical corruption. There are two special exceptions, however. One is checking whether album art embedded within a FLAC is of a suitable size and shape. The other is checking whether the volume levels of the FLACs within a folder are as loud as they could be. These album art and volume level checks are not performed as part of a 'standard' integrity check. That's simply because not everyone wants to volume-boost their FLACs or to embed album art within them, whereas everyone wants to know if their FLACs are physically corrupt!
Accordingly, the Integrity Checks menu, Option 4 : Check album art for all files and Option 5 : Check album art for new files both perform album art checks, either against every single FLAC in your collection (option 4), or only for those files known to Niente but not previously checked for album art status (option 5).
Similarly, Option 6 : Check all files for volume boosts and Option 7 : Check new files for volume boosts are provided to perform an analysis of the music stream of a FLAC file and determine its peak loudness. Option 6 performs that analysis afresh for every FLAC file known to Niente, whilst option 7 performs the same sort of analysis, but only for those FLAC files known to Niente but which have not previously been volume-level assessed at all.
In both cases, the 'all files' option literally wipes the existing album art or volume data from the Niente database and starts collecting it once more, from scratch, for every FLAC in your music collection: that makes these potentially lengthy options to run. When you've just added a couple of new CD rips to your collection and want their album art and volume levels collected, however, that 'start over' approach would be overkill. The 'new files' version of each type of check therefore only collects such data for files which Niente already knows about (thanks to their being added via the Database menu, Option 3) but which lack any prior volume or album art data.
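The difference between the 'all files' and 'new files' variants comes down to which TRACKS rows get picked up. Below is a minimal sketch for the album art case. It assumes, hypothetically, that album art results sit in an ALBUMART table keyed by the same FLAC path as TRACKS, so the column names are illustrative rather than Niente's own. MAXVOLUMES is recorded per folder rather than per file, so a 'new files' volume check would need a slightly different join.

```python
import sqlite3


def files_to_check(conn: sqlite3.Connection, all_files: bool) -> list[str]:
    """Pick the TRACKS rows an album art check would visit.

    Assumes a hypothetical ALBUMART(FLAC_PATH, WIDTH, HEIGHT) table keyed by
    the same path as TRACKS(FLAC_PATH).
    """
    if all_files:
        # 'All files' variant: discard every earlier result, then visit everything.
        conn.execute("DELETE FROM ALBUMART")
        sql = "SELECT FLAC_PATH FROM TRACKS ORDER BY FLAC_PATH"
    else:
        # 'New files' variant: only tracks with no ALBUMART row yet (an anti-join).
        sql = """
            SELECT t.FLAC_PATH
            FROM TRACKS AS t
            LEFT JOIN ALBUMART AS a ON a.FLAC_PATH = t.FLAC_PATH
            WHERE a.FLAC_PATH IS NULL
            ORDER BY t.FLAC_PATH
        """
    return [row[0] for row in conn.execute(sql)]
```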
Both the Album Art and Volume Boost checks work the same way a 'regular' integrity check does: you tap the relevant menu option and Niente immediately runs off to read the FLAC files in its TRACKS table and work out the appropriate data to store back in its database.
An album art check-up displays minimal information as it works:
Such checks finish fairly swiftly, even on huge music collections, as it's trivially easy to check a piece of artwork's pixel dimensions. For this reason, an interrupted album art check cannot be resumed. You're free to press Ctrl+C any time you like, but if you do so in the middle of an album art check, all work performed up to that point is lost, and a renewed art check has to start over from scratch.
The results of these album art scans are stored in their own table (called ALBUMART), as follows:
As you can see, only two pieces of data are collected about any piece of embedded album art: its height and width, in pixels. Niente does not (cannot!) concern itself with whether your album art is in focus, too old, too new or of questionable artistic merit! This data is, however, sufficient to tell us (via the appropriate reports, of course) that such-and-such a file has non-square album art (which is a problem for some people!). It can equally tell us that this-or-that FLAC has album art which is generally deemed too small or stupidly huge.
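One reason dimension checks can be cheap is that FLAC's PICTURE metadata block records the image's width and height directly, ahead of the image data itself. The sketch below reads them from the body of a PICTURE block and flags the kinds of problem mentioned above.

It is an illustration, not Niente's code: how Niente actually obtains the dimensions isn't stated here. The 500-pixel and 1500-pixel limits are hypothetical placeholders, not Niente's thresholds.

```python
import struct


def picture_dimensions(block: bytes) -> tuple[int, int]:
    """Return (width, height) from the body of a FLAC PICTURE metadata block.

    Layout (integers are 32-bit big-endian): picture type, MIME-type length,
    MIME type, description length, description, width, height, ...
    """
    pos = 4                                            # skip picture type
    (mime_len,) = struct.unpack_from(">I", block, pos)
    pos += 4 + mime_len                                # skip MIME type string
    (desc_len,) = struct.unpack_from(">I", block, pos)
    pos += 4 + desc_len                                # skip description string
    width, height = struct.unpack_from(">II", block, pos)
    return width, height


def art_problems(width: int, height: int,
                 min_side: int = 500, max_side: int = 1500) -> list[str]:
    """Flag non-square, too-small or too-large art (limits are hypothetical)."""
    problems = []
    if width != height:
        problems.append("non-square")
    if min(width, height) < min_side:
        problems.append("too small")
    if max(width, height) > max_side:
        problems.append("too large")
    return problems
```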
The volume boost check display contains a little more information as it proceeds:
The filename of the FLAC being analysed is displayed, along with the loudest volume level detected in that file. If the loudness is below a configurable threshold (by default, it's -2dB, but the Administration menu, Option 1 lets you alter that), then the display shows 'Volume boost possible' in the main part of the screen and increments a counter of possible volume boosts shown in the bottom right-hand corner of the display. It bears repeating that Niente isn't actually applying a volume boost: it's not its job and Niente never modifies the FLACs it knows about anyway. It simply collects the data that lets you determine whether to apply a volume boost or not.
The volume boost scan populates its own table in the database, called MAXVOLUMES. If you could see inside the Niente database, you'd see it looks like this:
You'll note that the table does not list individual FLACs, but folders. That's because you cannot meaningfully volume boost individual FLACs regardless of the volumes of the other files in a folder: you'd end up making the quiet movement of a concerto or symphony as loud as the loud movement, which is not the effect you're after! You'd never want to adjust the relative loudness of movements like that. On the other hand, if the loudest movement was really quite quiet, then it would be fine to boost the volume of all the tracks in the folder so that whilst their relative loudness remained the same, their absolute volume increased.
This is what Niente is calculating with its volume boost check. It finds the loudest file in a folder of FLACs and works out the maximum volume boost that could be applied to that FLAC without introducing distortion or clipping. If you can apply a boost of (say) +6dB to the loudest file, then you can also apply the same +6dB boost to the quietest file. Both files then have an absolute volume increase of +6dB, but their loudness relative to one another remains constant.
Performing a volume boost check is CPU-intensive and can take a long time, since the audio stream in every FLAC has to be read to determine its peak loudness. Imagine having to wade through Wagner's entire Ring Cycle to work out what the loudest bits sound like! It's going to take a while …and therefore volume boost checks can be interrupted (with Ctrl+C). You'd resume from where you left off by taking the Integrity Checks menu, Option 7, which only assesses files that have not yet been volume-checked. Option 6, by contrast, re-calculates everything from scratch. You'd have to remove the program lock (press 'L') to resume an interrupted boost scan, of course.
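The arithmetic is simple once each file's peak level is known. The sketch below first measures a file's peak in dBFS from decoded integer samples. It then derives one boost for a whole folder from its loudest file, using the same -2dB default threshold mentioned above.

It assumes the samples have already been decoded by some other tool, and it uses a plain sample-peak measurement. Niente's own measurement method may differ, and the function names are hypothetical.

```python
import math


def peak_dbfs(samples, bits_per_sample: int = 16) -> float:
    """Peak level of integer PCM samples, in dB relative to full scale (0 dBFS)."""
    full_scale = 2 ** (bits_per_sample - 1)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")                      # digital silence
    return 20 * math.log10(peak / full_scale)


def folder_boost(file_peaks_db: dict, threshold_db: float = -2.0):
    """Return (boost_possible, boost_db) for a whole folder.

    The loudest file sets the limit: raising everything by -loudest dB brings
    that file to 0 dBFS without clipping, and every other file rises by the
    same amount, so relative loudness between tracks is unchanged.
    """
    loudest = max(file_peaks_db.values())
    return loudest < threshold_db, -loudest
```

For example, a folder whose files peak at -8.0, -6.5 and -12.0 dBFS is limited by its -6.5 dBFS file. `folder_boost` reports that a boost is possible and that every file in the folder could take +6.5 dB.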
4.0 Scheduling Unattended Integrity Checks
Some integrity checks take a long time to complete (especially full integrity checks and volume boost checks). You generally don't want to be hanging around watching your screen for multiple hours as they plough through their work! Accordingly, Niente lets you schedule unattended integrity checks by launching it with various run-time parameters. This means that instead of launching Niente with the bare command niente, you add 'switches' to the command to make Niente do something immediately without further user intervention.
For example, if I launch Niente by opening a terminal and typing the command niente --check-full, Niente will immediately start to perform a new full integrity check. Or I could type the command niente --check-volume and a new complete volume boost check will begin. When Niente is launched with one of these run-time parameters, it provides absolutely no feedback at all to the user. Nothing appears on the screen, which will simply go black and sit there as if nothing is happening at all (even though the program is running like crazy in the background!). The program provides no feedback in these situations, of course, because you're not meant to be sitting there watching it: these parameters are for unattended operation of the program!
The complete list of run-time parameters that you can launch Niente with is as follows:
| Parameter | Purpose/Function | Menu Equivalent |
|---|---|---|
| --scan-full | Wipes the existing tracks from a database and then re-scans default music folder to re-populate it from scratch | Database → 2 |
| --scan-new | Scans the default music folder for new or modified recordings and adds them to the existing database | Database → 3 |
| --check-full | Performs a full integrity check (i.e., physical and logical checks for all recordings in the database) | Integrity Checks → 1 |
| --check-differential | Performs a differential integrity check (i.e., physical & logical checks for recordings with previously-detected physical or logical corruption issues) | Integrity Checks → 2 |
| --check-fast | Performs a fast integrity check (i.e., logical checks only, for recordings previously found to have logical issues) | Integrity Checks → 3 |
| --check-art | Performs a new album art check for all recordings known to the database | Integrity Checks → 4 |
| --check-volume | Performs a complete check of possible volume boosts for all recordings in the database | Integrity Checks → 6 |
| --aggstats | Generates a quick aggregate statistics report and writes it to /tmp/nientestats.csv | Reporting → General → 1 |
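One way to picture how a run-time switch stands in for a menu choice is as a lookup from switch to menu option. The sketch below is purely illustrative and is not Niente's code (Niente itself is launched via a shell script, niente.sh). It uses Python's argparse, treats the switches as mutually exclusive, and maps each one to the menu equivalent listed in the table above. Making the switches mutually exclusive is a choice of this sketch, not something the table states.

```python
import argparse

# Switch -> (menu, option), taken from the table above.
MENU_EQUIVALENTS = {
    "scan_full": ("Database", 2),
    "scan_new": ("Database", 3),
    "check_full": ("Integrity Checks", 1),
    "check_differential": ("Integrity Checks", 2),
    "check_fast": ("Integrity Checks", 3),
    "check_art": ("Integrity Checks", 4),
    "check_volume": ("Integrity Checks", 6),
    "aggstats": ("Reporting/General", 1),
}


def parse_unattended(argv):
    """Return the (menu, option) a switch stands for, or None for interactive use."""
    parser = argparse.ArgumentParser(prog="niente")
    group = parser.add_mutually_exclusive_group()
    for dest in MENU_EQUIVALENTS:
        group.add_argument("--" + dest.replace("_", "-"), dest=dest,
                           action="store_true")
    args = parser.parse_args(argv)
    for dest, menu_option in MENU_EQUIVALENTS.items():
        if getattr(args, dest):
            return menu_option
    return None  # no switch given: show the interactive menus as usual
```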
The real idea of these parameters is, of course, that you'll use your operating system's scheduler to make these activities happen during the dead of night. In most cases, this means adding a suitable entry to your system's crontab. Here's the crontab I use:
59 23 * * * /usr/bin/niente.sh --scan-new
00 1 * * SUN /usr/bin/niente.sh --check-full
00 1 * * MON-SAT /usr/bin/niente.sh --check-differential
00 3 2 * * /usr/bin/niente.sh --check-art
00 3 3 * * /usr/bin/niente.sh --check-volume
…which means, as follows:
- Run a scan for FLACs which aren't already in the database and add them, at 11:59PM every night
- Run a full integrity check at 1AM every Sunday
- Run a differential integrity check at 1AM every night that isn't a Sunday
- On the 2nd day of every month, at 3AM, run a fresh, all-files album art check
- On the 3rd day of every month, at 3AM, run a fresh, all-files volume check
There is no run-time parameter, you'll note, to trigger an album art or volume check for only those files that don't already have the relevant information collected: it's the full 'all files' version of those checks or nothing.
There's also a potential issue if the 2nd or 3rd day of the month happens to be a Sunday. My full integrity check starts at 1AM on Sundays, so by 3AM it's only been running for a couple of hours and won't have finished. The art or volume check scheduled for 3AM that morning is thus likely to find a program lock in place and be unable to run. I consider this a rare enough occurrence not to care about a missed art or volume check, though, so I live with it. If that ever changes, I'll have to launch those checks later in the day, perhaps in the afternoon.
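How rare is that clash? A quick way to find out is to list, for a given year, the months whose 2nd day (art check) or 3rd day (volume check) falls on a Sunday. This small sketch uses only Python's standard datetime module and mirrors the crontab schedule shown above. It says nothing about how long a full check actually takes on any particular collection, and the function name is hypothetical.

```python
import datetime

SUNDAY = 6  # datetime.date.weekday(): Monday == 0 ... Sunday == 6


def clashing_months(year: int, day_of_month: int) -> list[int]:
    """Months in `year` where `day_of_month` falls on a Sunday."""
    return [month for month in range(1, 13)
            if datetime.date(year, month, day_of_month).weekday() == SUNDAY]
```

For 2024, for example, `clashing_months(2024, 2)` returns `[6]` and `clashing_months(2024, 3)` returns `[3, 11]`. That works out to one missed art check and two missed volume checks that year.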
Anyway: you get the idea. You can use the run-time parameters to schedule integrity checks to be performed in the dead of night, when you're not having to sit around and press menu options to make them happen!
Just be aware that there's no runtime parameter to specify the database to use or the music folder to scan for recordings: those have to come from a configured default database and music folder (so visit the Administration menu, Option 1 to set them).
5.0 Conclusion
Integrity checks of all kinds end with a whimper, not a bang! That is, no alarm bells go off, nor flashing lights annoy, whenever an integrity check reveals data corruption, logical inconsistencies or poorly-sized album art. The job of an integrity check is simply to collect the data that indicates those things exist: it's up to you to run the various reports that will tell you, precisely, which files are affected by such things.
Having run an integrity check, therefore, it's important to run one or more reports to find out what the integrity check discovered as it worked.
Remember that most integrity checks can be interrupted by pressing Ctrl+C, but that doing so leaves behind a program lock that prevents any future integrity check from starting. Tapping the 'L' key removes that lock, permitting new checks to start. But if you remove the lock while another check is still running, your new check will completely screw up the work being performed by the already-running one.