The perennial problem with training neural networks is finding training data. Samples of individual notes are readily available on the internet; samples of chords are not.
The Audiophile's Analyzer recognizes 16 chord types, and there are 96 notes across 8 octaves, so it is possible to play 1536 (16 × 96) different chords on a piano keyboard. It is not practical to play, record, and label that many samples.
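Enumerating that combination space is straightforward. Below is a minimal sketch in plain Python; the 16 chord types and their interval patterns are my own assumption, since the article does not list which types the Audiophile's Analyzer actually recognizes, and the MIDI numbering of octave 0 is likewise assumed.

```python
# Assumed chord-type table -- the article does not enumerate the
# analyzer's 16 recognized types, so these are illustrative.
CHORD_TYPES = {
    "maj":  (0, 4, 7),  "min":  (0, 3, 7),  "dim":  (0, 3, 6),
    "aug":  (0, 4, 8),  "sus2": (0, 2, 7),  "sus4": (0, 5, 7),
    "6":    (0, 4, 7, 9),   "m6":   (0, 3, 7, 9),
    "7":    (0, 4, 7, 10),  "maj7": (0, 4, 7, 11),
    "m7":   (0, 3, 7, 10),  "dim7": (0, 3, 6, 9),
    "m7b5": (0, 3, 6, 10),  "9":    (0, 4, 7, 10, 14),
    "maj9": (0, 4, 7, 11, 14), "add9": (0, 4, 7, 14),
}

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def all_chords():
    """Yield (name, midi_note_numbers) for every chord type on every
    root in the 8-octave range: 96 roots x 16 types = 1536 chords."""
    for octave in range(8):
        for pitch, note in enumerate(NOTES):
            root = 12 * (octave + 1) + pitch  # MIDI number; C0 = 12 assumed
            for ctype, intervals in CHORD_TYPES.items():
                yield f"{octave}{note}{ctype}", [root + i for i in intervals]

chords = list(all_chords())
print(len(chords))  # 1536
```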
To solve this, a feature has been added to the CNN tab of the Audiophile's Analyzer that produces MIDI files for every possible chord on a selected instrument.
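The article does not show how the MIDI files are built, but generating one per chord needs nothing beyond the Standard MIDI File byte layout. The sketch below writes a minimal format-0 file for a single chord using only the standard library; the tick values and velocity are illustrative, not the analyzer's actual settings.

```python
import struct

def vlq(n: int) -> bytes:
    """Encode a MIDI variable-length quantity (used for delta times)."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))

def chord_midi(notes, ticks=960, tpq=480, program=0) -> bytes:
    """Build a minimal format-0 MIDI file that plays one chord.
    notes   -- MIDI note numbers, e.g. [60, 64, 67] for a C major triad
    ticks   -- chord duration in ticks (two beats at the default tpq)
    program -- General MIDI program (instrument) number
    """
    ev = bytearray()
    ev += vlq(0) + bytes([0xC0, program])        # program change, channel 0
    for n in notes:                              # all notes start together
        ev += vlq(0) + bytes([0x90, n, 0x60])    # note on, velocity 96
    first = True
    for n in notes:                              # ...and stop together
        ev += vlq(ticks if first else 0) + bytes([0x80, n, 0x40])
        first = False
    ev += vlq(0) + bytes([0xFF, 0x2F, 0x00])     # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, tpq)
    track = b"MTrk" + struct.pack(">I", len(ev)) + bytes(ev)
    return header + track

data = chord_midi([60, 64, 67], program=70)      # 70 = bassoon in GM
```

Iterating this over all 1536 chords yields the complete set of files in seconds.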
These MIDI files are then converted to .wav audio files using a third-party application. When the Audiophile's Analyzer reads each .wav file, it produces a spectrogram.
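For readers unfamiliar with the step, a spectrogram is the magnitude of a short-time Fourier transform taken over overlapping frames of the audio. A crude stdlib-only sketch follows; the frame and hop sizes are illustrative guesses, not the analyzer's documented parameters.

```python
import cmath, math

def spectrogram(samples, frame=256, hop=128):
    """Return one magnitude spectrum per overlapping frame -- a crude
    short-time Fourier transform. Frame/hop sizes are assumptions."""
    frames = []
    for start in range(0, len(samples) - frame + 1, hop):
        window = samples[start:start + frame]
        spectrum = []
        for k in range(frame // 2):  # keep positive frequencies only
            s = sum(window[n] * cmath.exp(-2j * math.pi * k * n / frame)
                    for n in range(frame))
            spectrum.append(abs(s))
        frames.append(spectrum)
    return frames

# A 440 Hz sine sampled at 8 kHz should peak near bin 440 / (8000/256) = 14.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(1024)]
spec = spectrogram(tone)
peak = max(range(len(spec[0])), key=spec[0].__getitem__)
```

A real implementation would use an FFT and a tapered window, but the structure is the same.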
The Training Set Creation utility is then used to slice the spectrogram into the training images.
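The slicing itself amounts to cutting the spectrogram into fixed-width windows and reducing each one to grayscale intensities. The slice width below is a guess; the utility's actual image size is not stated in the article.

```python
def slice_spectrogram(spec, width=64, step=64):
    """Cut a spectrogram (list of per-frame spectra) into fixed-width
    windows, one training image each. width/step are assumptions."""
    return [spec[i:i + width] for i in range(0, len(spec) - width + 1, step)]

def to_monochrome(image):
    """Normalize magnitudes to 0-255 grayscale. The CNN trains on
    intensity only, so any color mapping is discarded here."""
    peak = max(max(row) for row in image) or 1.0
    return [[int(255 * v / peak) for v in row] for v_row in [image] for row in v_row]
```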
Monochrome images are used to train the CNN; color is for humans.
This utility recognizes the filename format and produces the labels file. Multiple labels are applied to each file, one per note in the chord.
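The article does not document the filename convention, so the sketch below assumes one of the form `<instrument>_<octave><root><type>.wav` (e.g. `Piano_4Cmaj.wav`) and derives the per-note labels from it. The abridged interval table is likewise illustrative.

```python
import re

# Assumed filename convention: <instrument>_<octave><root><type>.wav.
# The real format used by the Training Set Creation utility is not given.
CHORD_INTERVALS = {"maj": (0, 4, 7), "min": (0, 3, 7)}  # abridged
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def labels_for(filename):
    """Return the note labels for one file -- one label per chord note,
    i.e. a multi-label target (e.g. ['4C', '4E', '4G'])."""
    m = re.match(r"(\w+)_(\d)([A-G]#?)(\w+)\.wav$", filename)
    octave, root, ctype = int(m.group(2)), m.group(3), m.group(4)
    idx = 12 * octave + NOTES.index(root)
    return [f"{(idx + i) // 12}{NOTES[(idx + i) % 12]}"
            for i in CHORD_INTERVALS[ctype]]

print(labels_for("Piano_4Cmaj.wav"))  # ['4C', '4E', '4G']
```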
The Audiophile’s Analyzer is used to build and train a CNN using the labeled images.
The output layer of the neural network has 96 outputs, labeled from 0C to 7B. During training, the required outputs are set for each note in the file's label.
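In other words, each training example gets a 96-element multi-hot target vector. A minimal sketch of constructing the label names and the target:

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# One output per note: index 0 -> "0C" ... index 95 -> "7B".
OUTPUT_LABELS = [f"{octave}{note}" for octave in range(8) for note in NOTES]

def target_vector(note_labels):
    """Multi-hot training target: 1.0 at each note named in the file's
    label, 0.0 everywhere else."""
    t = [0.0] * len(OUTPUT_LABELS)
    for label in note_labels:
        t[OUTPUT_LABELS.index(label)] = 1.0
    return t

y = target_vector(["4C", "4E", "4G"])  # C major triad
print(len(OUTPUT_LABELS), sum(y))      # 96 3.0
```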
Training results are recorded:
Using the Audiophile’s Analyzer to transcribe a .wav file:
Version 1.2, now released, supports both MIDI file output from a score transcribed from audio and MIDI file input for analysis and transcription.
To illustrate the MIDI functionality simply, I will use a single note played on a bassoon.
Using the metronome this transcribes as:
Saving to MIDI, we get these events:
Opening the saved MIDI file, we see:
Scoring produces:
Note the missing rest at the end. The MIDI standard has no way of explicitly encoding a rest. The Audiophile's Analyzer infers rests that occur between notes, but this final rest cannot be inferred because no following note marks where it ends.
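The between-note inference described above can be sketched as follows: a rest is emitted wherever one note's note-off precedes the next note's note-on. This is my own illustration of the idea, not the analyzer's code; it also makes the limitation concrete, since nothing follows the final note-off to bound a trailing rest.

```python
def infer_rests(notes, quantum=0.25):
    """notes: list of (start_beat, end_beat, pitch) decoded from MIDI.
    Insert a rest wherever one note ends before the next begins.
    No event follows the final note-off, so a trailing rest has no
    endpoint in the MIDI data and cannot be inferred."""
    events = []
    for (s, e, p), nxt in zip(notes, notes[1:] + [None]):
        events.append(("note", s, e, p))
        if nxt is not None and nxt[0] - e >= quantum:
            events.append(("rest", e, nxt[0]))
    return events

track = [(0.0, 1.0, 60), (2.0, 3.0, 62)]  # one-beat gap between the notes
print(infer_rests(track))
# [('note', 0.0, 1.0, 60), ('rest', 1.0, 2.0), ('note', 2.0, 3.0, 62)]
```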