This site's Aggregate Statistics page shows the current state of play with my music collection as analysed by my own Niente FLAC checking software -but the significance or precise meaning of each number might not be obvious. Here are some explanations.
Hopefully self-explanatory, but maybe not obvious: Niente scans for individual files and it would be wrong to count files as 'recordings' (since Beethoven's Fifth symphony probably comes as four separate files, one for each movement, but should only count as one recording). Accordingly, Niente groups files by the physical folder they're found in …and it's the count of these folders that is displayed on the statistics report, where one folder is taken to mean one recording, no matter how many individual FLACs it may contain.
Technically, the number comes as a count(distinct substr(filename, 1, length(filename) - instr(replace(filename, '/', 'x'), '/'))) as 'Count', which looks horrible but basically takes each individual FLAC's filename, removes the bit after the last forward slash (which is literally the FLAC's filename) and thus creates a 'folder name' for each FLAC. It then 'distincts' those folder names, so that four FLACs' folder is only counted once. The count shown is thus a count of distinct folders, not individual FLACs.
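As a rough illustration only (this is Python, not Niente's actual query), the same "strip the filename, then count distinct folders" idea might be sketched like this. The function name and the example paths are made up for the purpose.

```python
import posixpath

def count_recordings(flac_paths):
    """Count distinct parent folders, treating each folder as one recording."""
    folders = {posixpath.dirname(path) for path in flac_paths}
    return len(folders)

# Hypothetical paths: four movements in one folder, plus a one-file recording.
example_paths = [
    "/music/Symphony No. 5 (Bernstein - 1961)/01-First.flac",
    "/music/Symphony No. 5 (Bernstein - 1961)/02-Second.flac",
    "/music/Symphony No. 5 (Bernstein - 1961)/03-Third.flac",
    "/music/Symphony No. 5 (Bernstein - 1961)/04-Fourth.flac",
    "/music/Peter Grimes (Britten - 1958)/01-Peter Grimes.flac",
]
print(count_recordings(example_paths))  # 5 files, but only 2 recordings
```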
Every FLAC ever created is 'born' with an internal bit of data which is an MD5 hash of the audio signal component of the FLAC. Niente then performs a fresh MD5 hash of the audio signal during its physical integrity checks: you would hope that the new MD5 hash matches the one the FLAC was initially created with. If it does, it proves that the FLAC's audio signal is bit-for-bit identical today with the signal as it was first created: there's been no bit-rot and the file is entirely without corruption.
The definition for this statistic, therefore, is that the new MD5 hash is not identical to the 'natal' MD5 hash: something about this FLAC's audio signal has changed since the FLAC was first created. That strongly implies internal, physical corruption of the FLAC file -and that can only be fixed by restoring the file from a known good backup… or by re-ripping it in its entirety.
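For the curious, here is a minimal Python sketch of the comparison, not Niente's own code. It assumes the file begins with the usual "fLaC" marker followed directly by the STREAMINFO block, whose last 16 bytes hold the 'natal' MD5. The fresh hash is assumed to come from decoding the audio with a real FLAC decoder, which the sketch does not attempt; the function names are hypothetical.

```python
import hashlib

def read_stored_md5(flac_bytes):
    """Return the MD5 recorded in STREAMINFO as a hex string."""
    if flac_bytes[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    if flac_bytes[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    streaminfo = flac_bytes[8:8 + 34]  # skip 4-byte marker and 4-byte block header
    if len(streaminfo) < 34:
        raise ValueError("truncated STREAMINFO block")
    return streaminfo[18:34].hex()     # the MD5 is the final 16 bytes

def audio_changed(stored_md5_hex, fresh_md5_hex):
    """True when the freshly computed hash no longer matches the stored one."""
    return stored_md5_hex.lower() != fresh_md5_hex.lower()

# Synthetic header for demonstration: stand-in bytes, not real decoded audio.
natal = hashlib.md5(b"pretend this is decoded audio").digest()
fake_flac = b"fLaC" + bytes([0x80, 0, 0, 34]) + bytes(18) + natal
print(audio_changed(read_stored_md5(fake_flac), natal.hex()))
```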
This site's Axioms of Classical Music Tagging's very first Axiom is that there are eight metadata tags that we definitely ought to have when tagging up FLAC files. If Niente detects that all of those tags are empty, it will declare the file(s) involved as likely having never been subject to one of its integrity checks and thus count them on this part of the aggregate statistics report. This is a sort-of “process violation”: you've added FLACs to the Niente database by performing an incremental database refresh. You've then failed to go on to perform a differential or full integrity check on those new FLACs, so data about them in Niente's database is empty. A fresh integrity check will sort this problem out swiftly.
This is a corollary to the above statistic: if some of the eight essential tags are empty but others are not, this suggests that the file(s) concerned have been integrity checked at some point, but that the empty tags are empty because they were never populated. That's not a process issue: that's a “you're tagging your FLACs incorrectly” problem… and the only fix for it is to re-tag your FLACs in a more standard way.
Niente won't double-count, by the way: any file which is on the 'not analysed at all' report will definitely not be on the 'some tags are missing' report.
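The 'all empty' versus 'some empty' split, and the fact that a file can only land in one bucket, might be sketched in Python as follows. Please note that the list of eight tag names below is a placeholder I've assumed for illustration: the authoritative list lives in the first Axiom, and the function name is hypothetical too.

```python
# ASSUMPTION: placeholder names only; the real eight tags are set by Axiom 1.
ESSENTIAL_TAGS = (
    "ARTIST", "COMPOSER", "ALBUM", "PERFORMER",
    "COMMENT", "DATE", "TRACKNUMBER", "GENRE",
)

def classify_tags(tags):
    """Put a file's tag dictionary into exactly one of three buckets."""
    empty = [name for name in ESSENTIAL_TAGS if not (tags.get(name) or "").strip()]
    if len(empty) == len(ESSENTIAL_TAGS):
        return "not analysed"        # every essential tag is empty
    if empty:
        return "some tags missing"   # a genuine tagging problem
    return "ok"

print(classify_tags({}))
print(classify_tags({"ARTIST": "Benjamin Britten"}))
```

Because the 'not analysed' test is made first and returns straight away, no file can be counted on both reports.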
Axiom 5 of the Axioms of Classical Music Tagging states that the ALBUM tag is used to store the extended composition name. The extended composition name is further defined in that Axiom as “the actual composition name plus the distinguishing artist's name plus the year of recording”. That means your ALBUM tag might be something like Peter Grimes (Britten - 1958), so there's a recording year mentioned in the ALBUM tag. Unfortunately, there's also a recording year mentioned in the DATE (or YEAR) tag: see Axiom 7.
As two independent tags, there's no earthly reason why the year mentioned in the ALBUM tag should agree with the year mentioned in the YEAR or DATE tag… but common sense suggests that they ought to do so. This statistic counts any recording for which the “YEAR=ALBUM's date component” equation is not true. The fix for this statistic is to re-tag the affected FLACs so that the recording year is correct and consistent in both tags, and then to perform a fresh integrity check.
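A minimal Python sketch of that equation follows. It assumes the ALBUM tag ends with a four-digit year just before a closing bracket, as in the example above, and it treats an ALBUM tag with no recognisable year as a mismatch: that last choice is my assumption, not necessarily what Niente does. The names are hypothetical.

```python
import re

# A four-digit year immediately before a closing bracket at the end of the tag.
ALBUM_YEAR = re.compile(r"(\d{4})\)\s*$")

def album_year(album):
    """Return the year embedded in the ALBUM tag, or None if there isn't one."""
    match = ALBUM_YEAR.search(album)
    return match.group(1) if match else None

def year_mismatch(album, date):
    """True when the ALBUM year and the DATE/YEAR tag disagree."""
    year = album_year(album)
    return year is None or year != date.strip()[:4]

print(year_mismatch("Peter Grimes (Britten - 1958)", "1958"))  # False
print(year_mismatch("Peter Grimes (Britten - 1958)", "1959"))  # True
```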
Left to their own devices, most CD rippers will say tracks 1 to 4 belong to Mozart's symphony and tracks 5 to 8 to Beethoven's, but Axiom 9 states that if Mozart deserves tracks 1 to 4, Beethoven's symphony must also have tracks 1 to 4, because it's just as valid and independent a recording as Mozart's, even though it was shipped on the same CD. Thus, every independent recording should always start with a track number of '1' -and this report will list any folders where that's not true. The fix for this statistic is to renumber your FLACs' track numbers, starting from 1, and then perform a fresh integrity check.
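By way of illustration, a Python sketch of the check might look like this. It assumes we already have (folder, track number) pairs to hand; the function name and the sample folders are invented.

```python
def folders_not_starting_at_one(tracks):
    """Given (folder, track_number) pairs, list folders whose lowest track isn't 1."""
    lowest = {}
    for folder, number in tracks:
        lowest[folder] = min(number, lowest.get(folder, number))
    return sorted(folder for folder, number in lowest.items() if number != 1)

# Hypothetical rip of one CD holding two symphonies, numbered the ripper's way.
ripped = [
    ("Mozart/Symphony", 1), ("Mozart/Symphony", 2),
    ("Mozart/Symphony", 3), ("Mozart/Symphony", 4),
    ("Beethoven/Symphony", 5), ("Beethoven/Symphony", 6),
    ("Beethoven/Symphony", 7), ("Beethoven/Symphony", 8),
]
print(folders_not_starting_at_one(ripped))  # ['Beethoven/Symphony']
```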
Axiom 3 states that we list all the performers on a recording in the COMMENT tag. We might therefore expect to see listed there such names as “Leonard Bernstein, New York Philharmonic”, say. Axiom 6 then goes on to say that the PERFORMER tag is where we mention the distinguishing artist for a recording: the one performer whose presence tells us the difference between this recording of a particular symphony and that recording of the same symphony. In our example, the New York Philharmonic will have recorded Mahler's symphonies under numerous conductors, so they cannot possibly be used to distinguish between those recordings. You can, however, tell Bernstein's Mahler 2nd from Bruno Walter's or Zubin Mehta's, even though the New York Phil were involved on each occasion: so, the artist that distinguishes between recordings is, in this case, the conductor.
The logical issue then becomes: if you're going to mention Bernstein in the COMMENT tag and if he's the distinguishing artist that gets mentioned in the PERFORMER tag, you'd expect the COMMENT and PERFORMER tags to agree on his name! You cannot have “COMMENT includes Leonard Bernstein” and “PERFORMER mentions Lennard Bernsteen”, for example. This statistic highlights how many times that correspondence between the two tags has failed to be implemented. The fix is to re-tag your FLACs and then perform a fresh integrity check.
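As a sketch only, and assuming the test is a simple case-insensitive "is the PERFORMER text found somewhere in the COMMENT text" (Niente's real comparison may well be subtler), the idea looks like this in Python. The function name is hypothetical.

```python
def performer_in_comment(performer, comment):
    """True when the PERFORMER text appears, ignoring case, inside COMMENT."""
    needle = performer.strip().casefold()
    return bool(needle) and needle in comment.casefold()

comment = "Leonard Bernstein, New York Philharmonic"
print(performer_in_comment("Leonard Bernstein", comment))  # True
print(performer_in_comment("Lennard Bernsteen", comment))  # False
```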
We've already seen that Axiom 5 states that the ALBUM tag should contain the extended composition name -and that this means it will consist, in part, of part of the name of the distinguishing artist for a given recording. For example, “Symphony No. 5 (Bernstein - 1961)”. We've also just seen that the PERFORMER tag is expected, by Axiom 6, to mention the full name of the distinguishing artist. Put the two axioms together and they logically imply that the part-name in the ALBUM tag ought to be present to some extent in the PERFORMER tag. If it's “Bernstein” in one, it can't be “Bernstten” in the other.
This statistic counts the number of FLACs for which this correspondence between PERFORMER and ALBUM tags is not true: the fix is to re-tag your FLACs and perform a fresh integrity check.
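Here's a hedged Python sketch of that test. It assumes the ALBUM tag always ends in the “(Name - Year)” shape shown above and that 'present to some extent' means a case-insensitive substring match; both are my assumptions, and the names are hypothetical.

```python
import re

# Capture the name part of a trailing "(Name - 1961)" in the ALBUM tag.
ALBUM_ARTIST = re.compile(r"\(([^()]*?)\s*-\s*\d{4}\)\s*$")

def album_artist(album):
    """Return the part-name from the ALBUM tag, or None if the shape is wrong."""
    match = ALBUM_ARTIST.search(album)
    return match.group(1).strip() if match else None

def album_matches_performer(album, performer):
    """True when the ALBUM part-name can be found inside the PERFORMER tag."""
    name = album_artist(album)
    return bool(name) and name.casefold() in performer.casefold()

album = "Symphony No. 5 (Bernstein - 1961)"
print(album_matches_performer(album, "Leonard Bernstein"))  # True
print(album_matches_performer(album, "Leonard Bernstten"))  # False
```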
Axiom 2 states that the ARTIST tag is used to store the name of the composer of a recorded piece and that the COMPOSER tag will store exactly the same information. Duplicating data in this way is not ideal but is done to assuage the music player gods: some of them display “ARTIST”, some display “COMPOSER” and some expose both: it's therefore a good idea to have identical information in both tags so that your music is accessible in a variety of music players.
This statistic therefore simply counts the number of tracks for which the ARTIST and COMPOSER equivalence is not true: the fix is, as usual, to re-tag your files and perform a fresh integrity check.
Unusually, we now come to a Niente Statistic which is not mandated by the Axioms of Classical Music Tagging! When you rip music from a standard CD, it will conform to the Red Book standard for CD Audio, meaning that it will consist of one sample created every 1/44,100th of a second and with each sample having one of 65,536 possible values. That's a 16-bit number, so we say that standard CD Audio is a 16-bit value, sampled at 44,100Hz. SACD audio is likely to be a 24-bit value sampled at 88,200Hz. Even higher resolution audio might be produced as 24-bit values sampled at 192,000Hz: it remains the case (at the time of writing, at least) that no-one is selling audiophiles 32-bit audio samples! The point is that we expect “ordinary” audio to be 16-bit and 44.1KHz (or sometimes 48KHz); we similarly expect 24-bit audio to be 48KHz or higher. If we encounter a 192KHz FLAC at 16-bit, or a 24-bit FLAC at 44.1KHz, something would appear to be a bit 'off'!
This statistic therefore counts the number of FLACs with oddly contradictory bit depths and sample rates. If it's 16 bit and sampled higher than 48KHz, it's counted. If it's 24 bit and sampled less than 48KHz, it's also counted. The only way to fix such apparently 'odd' FLACs would be to re-rip them from the source media at more appropriate bit depths and sampling rates (and then do a fresh integrity check).
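The two rules translate almost word for word into Python. This is just a sketch of the logic as described, with a hypothetical function name, and it deliberately says nothing about bit depths other than 16 and 24.

```python
def is_odd_resolution(bit_depth, sample_rate_hz):
    """Flag the contradictory bit-depth and sample-rate pairings described above."""
    if bit_depth == 16 and sample_rate_hz > 48_000:
        return True   # CD-style bit depth at a high-resolution sample rate
    if bit_depth == 24 and sample_rate_hz < 48_000:
        return True   # high-resolution bit depth at a CD-style sample rate
    return False

print(is_odd_resolution(16, 44_100))   # False
print(is_odd_resolution(16, 192_000))  # True
print(is_odd_resolution(24, 44_100))   # True
```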
The last few statistics on the report tend to be 'impressionistic' ones: there's no absolute standard to judge things by and some of them may be matters of taste or personal preference.
The first is the use of embedded album art and whether that artwork is of sufficient quality. Axiom 16 mentions that album art is important and ought to be of sufficient quality so as to aid in music identification and recognition. It mandates it should therefore be of 'decently-large size'.
Quite what counts as 'decently large' is obviously an entirely subjective matter -which is why Niente lets you configure a 'too small' threshold (defaults to 300x300px) and a 'too large' one (defaults to 1400x1400px). This statistic counts the number of FLACs with embedded art which breaches one of those configured conditions. The fix for FLACs caught in this count is to obtain fresh album art of a more appropriate size and to re-embed that new art within your FLACs using suitable tagging tools.
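A small Python sketch of the threshold test, using the default values mentioned above. I've assumed that artwork is flagged when either its width or its height falls outside the limits; how Niente really treats non-square art is not something this sketch can tell you. The function and parameter names are hypothetical.

```python
def art_size_breach(width, height, min_px=300, max_px=1400):
    """Return 'too small', 'too large' or None for embedded art dimensions."""
    if width < min_px or height < min_px:
        return "too small"
    if width > max_px or height > max_px:
        return "too large"
    return None

print(art_size_breach(200, 200))    # too small
print(art_size_breach(600, 600))    # None
print(art_size_breach(3000, 3000))  # too large
```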
This statistic only appears after a special 'volume boost check' integrity test has been performed. Niente measures the peak loudness detected in each file and then groups by folder to determine if everything in a folder (which corresponds to a single 'work' or recording of a work) could be physically volume boosted, to bring the peak loudness closer to the maximum, theoretical, non-distorting peak loudness of 0dB. In Niente's configuration file, you can set a “Threshold dB for volume boosts”, which defaults to 2dB. That is, anything which could be volume-boosted by less than 2dB will be ignored, but anything that can be boosted by more than 2dB is counted as a 'too quiet' FLAC that could benefit from real volume boosting.
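The arithmetic can be sketched in Python like so. I'm assuming peaks are stored as linear values between 0 and 1 (where 1 is 0dB), and that a folder can only be boosted as far as its loudest file allows; both are assumptions on my part rather than a description of Niente's internals, and the names are hypothetical.

```python
import math
import posixpath

def headroom_db(peak):
    """Gap in dB between a linear peak (0..1) and the 0dB maximum."""
    if peak <= 0:
        return math.inf  # digital silence has no peak to measure against
    return -20 * math.log10(peak)

def boostable_folders(peak_by_path, threshold_db=2.0):
    """List folders whose loudest file still sits more than threshold_db below 0dB."""
    loudest = {}
    for path, peak in peak_by_path.items():
        folder = posixpath.dirname(path)
        loudest[folder] = max(peak, loudest.get(folder, 0.0))
    return sorted(
        folder for folder, peak in loudest.items()
        if threshold_db < headroom_db(peak) < math.inf
    )

peaks = {"/a/1.flac": 0.5, "/a/2.flac": 0.4, "/b/1.flac": 0.5, "/b/2.flac": 0.9}
print(boostable_folders(peaks))  # ['/a']
```

In the example, folder /a peaks at 0.5, which is about 6dB short of the maximum, so it's counted; folder /b already has a file peaking at 0.9 (under 1dB short), so the whole folder is left alone.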
Peak volume boosting involves re-coding the audio signal in a FLAC: lots of people dislike the idea of physically (and irreversibly) messing with the audio ripped from a CD or an SACD. The ReplayGain industry standard for volume boosting is a way of analysing an audio file and measuring its 'perceived loudness' and comparing that to an 'ideal loudness' (of +89dB). If a FLAC can be boosted up (or down) in volume to achieve that ideal loudness, a metadata tag can be written to the FLAC describing what that boost should be. A player can then use that information to dynamically perform a volume boost as the FLAC is being played, without altering the audio signal stored within the FLAC in any way: this is potentially a more desirable way to volume boost, rather than messing with the audio signal physically.
Niente reads these metadata tags if they're present -and counts them for this statistic if they're not. Semplice can perform this ReplayGain analysis and write the necessary tags for you, so that's the fix for any files listed on this report.
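A Python sketch of the 'are the tags there?' test. REPLAYGAIN_TRACK_GAIN and REPLAYGAIN_TRACK_PEAK are the usual ReplayGain tag names, but precisely which tags Niente looks for is an assumption here, as is the function name.

```python
# ASSUMPTION: Niente may check a different set of ReplayGain tags.
REPLAYGAIN_TAGS = ("REPLAYGAIN_TRACK_GAIN", "REPLAYGAIN_TRACK_PEAK")

def lacks_replaygain(tags):
    """True when any of the expected ReplayGain tags is absent or empty."""
    # Tag names in FLAC files are case-insensitive, so normalise first.
    upper = {name.upper(): value for name, value in tags.items()}
    return any(not (upper.get(name) or "").strip() for name in REPLAYGAIN_TAGS)

print(lacks_replaygain({"ARTIST": "Benjamin Britten"}))  # True
print(lacks_replaygain({
    "replaygain_track_gain": "-3.20 dB",
    "replaygain_track_peak": "0.987654",
}))  # False
```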
When you are searching through your music collection using nothing more than your operating system's File Manager, it would be helpful (perhaps) to know whether a particular file was originally ripped from a standard CD or is a high-resolution FLAC ripped from the likes of an SACD. For that to be possible, however, you need to see the “24 bit, 88.2KHz” or “16-bit, 44.1KHz” audio data in the physical file name.
Semplice can be auto-configured to add these 'markers' into a file name, so that “01-Allegro.flac” becomes, for example, “01-Allegro-16-44100.flac”. Other FLAC taggers can be configured similarly. This statistic then counts the number of FLAC files for which this file naming convention is not true. The fix is to manually re-name the files (and then to perform a new integrity check), or to use better ripping/tagging software that puts the relevant bits of information into the physical file names at the time of ripping/tagging automatically.
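Assuming the marker always takes exactly the “-bits-rate.flac” form shown in the example (other taggers may use a different layout), the test could be sketched in Python like this. The function name is hypothetical.

```python
def has_resolution_marker(filename, bit_depth, sample_rate_hz):
    """True when the filename ends with the '-<bits>-<rate>.flac' marker."""
    return filename.lower().endswith(f"-{bit_depth}-{sample_rate_hz}.flac")

print(has_resolution_marker("01-Allegro-16-44100.flac", 16, 44100))  # True
print(has_resolution_marker("01-Allegro.flac", 16, 44100))           # False
```

Comparing against the file's real bit depth and sample rate, rather than just looking for any two numbers, also catches a file whose name claims a resolution it doesn't actually have.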
When you rip a Beethoven fifth symphony, you probably expect that it will be stored as four separate FLACs, one for each movement. There is no actual requirement for that to be true, however: some rippers will 'merge' multiple tracks into a single 'work'; my own Semplice tool will similarly merge tracks into single 'SuperFLACs' after the ripping is complete.
A single SuperFLAC that contains all the audio belonging to a recorded work can be practically useful: file systems tend to prefer a few larger files to lots of smaller ones. Some music playing software also cannot really handle 'gapless playback' when switching from one track to another: it'll introduce an audible pause or glitch between tracks. Having everything in one file means there is no track switching to worry about, so gapless playback becomes automatic.
For these sorts of reasons, Niente has been coded to believe that if there's more than one FLAC in a folder, you're “doing it wrong”! This statistic therefore counts the number of folders in which multiple FLACs simultaneously exist. The fix for any folders so identified is to use a tool like Semplice to merge the multiple per-track FLACs into a single SuperFLAC.
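As a sketch in Python (hypothetical names and paths, not Niente's code), counting the offending folders is just a matter of tallying FLACs per folder and keeping the folders with more than one.

```python
from collections import Counter
import posixpath

def folders_with_multiple_flacs(paths):
    """List the folders that hold more than one FLAC file."""
    counts = Counter(
        posixpath.dirname(path) for path in paths
        if path.lower().endswith(".flac")
    )
    return sorted(folder for folder, count in counts.items() if count > 1)

library = [
    "/music/Fifth/01-First.flac",
    "/music/Fifth/02-Second.flac",
    "/music/Grimes/Peter Grimes.flac",
    "/music/Grimes/folder.jpg",
]
print(folders_with_multiple_flacs(library))  # ['/music/Fifth']
```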
We've already mentioned the MD5 hash 'fingerprint' of a FLAC's audio signal: change one single bit in that audio signal and the MD5 hash value will change wildly, such is the hash computation algorithm. It therefore follows that if two MD5 hashes are identical, the chances of that happening by happenstance are utterly minuscule. Put another way, if two MD5 hashes are identical, it is almost an absolute certainty that we're actually looking at identical recordings appearing in the same Niente database.
That is what this statistic records: instances where the same MD5 hash appears more than once in the database of recordings. It has happened to me when I've re-purchased a recording, having forgotten that I already own it, and on the re-rip, I decide the piece is an orchestral one, rather than a symphonic work. I thus end up with two folders in two different genre sub-folders, but both ripped from the same recording at separate times. This statistic will flag that sort of cataloguing (and purchasing!) mishap.
The fix for this statistic is to pick one of the recordings as 'the winner' and to delete the second and redundant copy. Then do a fresh integrity check!
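Finding the duplicates amounts to grouping files by their MD5 hash and keeping any hash that turns up more than once. A minimal Python sketch, with invented paths and shortened stand-in hash values, might be:

```python
from collections import defaultdict

def duplicate_recordings(md5_by_path):
    """Map each MD5 that occurs more than once to the sorted paths sharing it."""
    paths_by_md5 = defaultdict(list)
    for path, md5 in md5_by_path.items():
        paths_by_md5[md5].append(path)
    return {
        md5: sorted(paths)
        for md5, paths in paths_by_md5.items()
        if len(paths) > 1
    }

# Hypothetical database rows: the same rip filed under two genres.
rows = {
    "/music/symphonic/Work/Work.flac": "aaa111",
    "/music/orchestral/Work/Work.flac": "aaa111",
    "/music/opera/Peter Grimes/Peter Grimes.flac": "bbb222",
}
print(duplicate_recordings(rows))
```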