Data from an Investigation of Music Analysis by the Application of Grammar-based Compressors
This dataset is composed of output and result data from various experiments performed on a substantial collection of digital musical scores. It does not contain these scores, but at the time of writing they are publicly available from the following resources:
1. The Acadia Early Music Archive (http://www.acadiau.ca/~gcallon/www/archive/)
2. The Choral Public Domain Library (http://www2.cpdl.org/)
3. Musopen (https://musopen.org/)
4. Music21 (https://web.mit.edu/music21/)
5. KernScores (http://kern.ccarh.org/)
6. The 1850 edition of O’Neill’s Music of Ireland (http://trillian.mit.edu/~jc/music/book/oneills/1850)
7. The Meertens Tune Collections (http://www.liederenbank.nl/mtc/)
8. The Johannes Kepler University Patterns Development Database (http://tomcollinsresearch.net/research/data/mirex/JKUPDD-Aug2013.zip)
Digital scores were transformed into specific representations, and compressive models were built from them using various compressors (ZZ, IRR, LZW, BWT, GZIP and COSIATEC). Performance on various tasks was evaluated from the model attributes (primarily model size, measured in symbols). Where possible, model metrics and computed performance figures are included in this dataset.
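Model size in symbols can be illustrated with Re-Pair, a well-known grammar-based compressor, used here only as a stand-in: it is not ZZ or IRR, and the size accounting below (symbols in the start rule plus two symbols per pair rule) is an assumption rather than the measure used to produce this dataset. All function names are hypothetical.

```python
from collections import Counter

def count_pairs(seq):
    """Count adjacent pairs, ignoring overlapping repeats such as (a, a) in 'a a a'."""
    counts = Counter()
    prev_pair, prev_i = None, -2
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if pair == prev_pair and i == prev_i + 1:
            prev_pair = None  # overlaps the previous occurrence; do not count it
            continue
        counts[pair] += 1
        prev_pair, prev_i = pair, i
    return counts

def repair(seq):
    """Naive Re-Pair: replace the most frequent pair until no pair repeats."""
    seq, rules = list(seq), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < 2:
            break
        nonterminal = ("R", len(rules))  # hypothetical rule label
        rules[nonterminal] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def model_size(start, rules):
    """Assumed size in symbols: start rule length plus two symbols per rule."""
    return len(start) + 2 * len(rules)

def expand(symbol, rules):
    """Recursively expand a symbol back into terminal symbols."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return [symbol]
```

For example, a hypothetical interval sequence of 16 symbols consisting of four repeats of `[2, 2, -1, -3]` yields a grammar of size 10 under this accounting, and expanding the start rule reproduces the input exactly.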
3.1.1.1. An exhaustive test of sensitivity to an increasing number of errors (for Bach’s Fugue No. 10 from Das Wohltemperierte Clavier Book I)
----------------------------------------------------------------------------------------------------------------------------------------------
Rows: 2-46
Compressor used: ZZ.
Representations used: chromatic and diatonic pitch, chromatic and diatonic intervals, chromatic and diatonic contour, chromatic and diatonic pitch modulo 12, note duration.
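The representations listed above can be sketched for a single voice as follows. This assumes chromatic pitch is given as MIDI note numbers and diatonic pitch as integer staff steps (the dataset does not state its encoding), and the dictionary keys are hypothetical, not attribute names from the dataset.

```python
def sign(x):
    """Contour value: +1 for an ascent, -1 for a descent, 0 for a repeat."""
    return (x > 0) - (x < 0)

def differences(values):
    """Intervals between adjacent notes."""
    return [b - a for a, b in zip(values, values[1:])]

def representations(midi_pitches, diatonic_pitches, durations):
    """Derive hypothetical representations for one voice."""
    chromatic_intervals = differences(midi_pitches)
    diatonic_intervals = differences(diatonic_pitches)
    return {
        "chromatic_pitch": list(midi_pitches),
        "diatonic_pitch": list(diatonic_pitches),
        "chromatic_interval": chromatic_intervals,
        "diatonic_interval": diatonic_intervals,
        "chromatic_contour": [sign(i) for i in chromatic_intervals],
        "diatonic_contour": [sign(i) for i in diatonic_intervals],
        "chromatic_pitch_mod12": [p % 12 for p in midi_pitches],
        "duration": list(durations),
    }
```

With C4 D4 E4 C4 encoded as MIDI `[60, 62, 64, 60]` and as staff steps `[28, 29, 30, 28]`, the chromatic intervals are `[2, 2, -4]` and the diatonic intervals `[1, 1, -2]` (a step of 1 meaning a second under this assumed encoding).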
Attributes:
Common to all experiments:
representation: Type of data taken from each score as input to the compressor.
Compressor output:
The size of the unaltered, compressed model.
Increase in size from initial_model_size as an error is introduced to each position in sequence.
Experiment result:
The average increase in model size over all positions.
Standard deviation within model_size_change.
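The procedure for this experiment can be sketched as below. This is a minimal sketch, not the code used to produce the data, under several assumptions: zlib stands in for ZZ (so sizes are bytes, not symbols), an error means adding a hypothetical `change_value` to the symbol at a position, errors accumulate in sequence as the section title suggests, and the population standard deviation is used.

```python
import statistics
import zlib

def model_size(seq):
    """Stand-in size measure: zlib-compressed length in bytes (not ZZ symbols)."""
    return len(zlib.compress(",".join(map(str, seq)).encode(), 9))

def cumulative_error_sensitivity(seq, change_value, size=model_size):
    """Introduce an error at each position in turn, keeping earlier errors."""
    initial = size(seq)
    altered = list(seq)
    changes = []
    for pos in range(len(altered)):
        altered[pos] += change_value  # assumed error model: add change_value
        changes.append(size(altered) - initial)
    return initial, changes, statistics.mean(changes), statistics.pstdev(changes)
```

The returned tuple corresponds loosely to the initial model size, the per-position size changes, and their average and standard deviation.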
3.1.1. Sensitivity to point errors
----------------------------------
Rows: 47-260460
Compressors used: ZZ, IRR, LZW, BWT, GZIP, COSIATEC.
Representation used: diatonic intervals.
Attributes:
Common to all experiments:
The changes made to a given position. The experiment is repeated in its entirety for each value.
A list of indices into the input data; an alteration (change_value), representing an error, is made at each index.
Compressor output:
The size of the unaltered, compressed model.
Change in size from initial_model_size when an error is present at exactly one location (from an index in positions_tested).
Only one error is present within the piece at each iteration. One set of size changes exists for each value in change_made.
Experiment result:
The average increase in model size over all positions, for each change made.
Standard deviation within each set of values in model_size_change.
Time taken to perform all compression and measurement operations, in seconds. Times are system-dependent and circumstantial.
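In the point-error experiment each model contains exactly one error, so the sequence is copied afresh before every alteration, and the whole pass is repeated for each value in change_made. The sketch below assumes the same additive error model as a hypothetical simplification and takes a caller-supplied `size` function (for example, a zlib-based stand-in for the listed compressors); the function name is hypothetical.

```python
import statistics

def point_error_sensitivity(seq, change_made, positions_tested, size):
    """Size change caused by a single error at each tested position."""
    initial = size(seq)
    results = {}
    for change_value in change_made:  # experiment repeated for each value
        deltas = []
        for pos in positions_tested:
            altered = list(seq)  # fresh copy: exactly one error per model
            altered[pos] += change_value  # assumed additive error model
            deltas.append(size(altered) - initial)
        results[change_value] = (deltas, statistics.mean(deltas),
                                 statistics.pstdev(deltas))
    return initial, results
```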
3.1.2. Sensitivity to increasing number of errors
-------------------------------------------------
Rows: 260461-501987
Compressors used: ZZ, IRR, LZW, BWT, GZIP, COSIATEC.
Representation used: diatonic intervals.
Attributes:
Common to all experiments:
The changes made to a given position. The experiment is repeated once for each value, but each change is chosen randomly from this list of values.
A list of indices into the input data; an alteration (change_value), representing an error, is made at each index.
Compressor output:
The size of the unaltered, compressed model.
Change in size from initial_model_size when an error is added to a new location (from the indices in positions_tested).
The number of errors within the piece increases at each iteration. One set of size changes exists for each value in change_made, but the actual change at each position is a random selection from change_made.
Experiment result:
The average increase in model size over all positions, for each additional change made.
Standard deviation within each set of values in model_size_change.
Time taken to perform all compression and measurement operations, in seconds. Times are system-dependent and circumstantial.
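The increasing-error variant differs in two ways: earlier errors are kept as new ones are added, and each change is drawn at random from change_made. A minimal sketch follows, assuming an additive error model, a caller-supplied `size` function and a fixed random seed for repeatability; the function name and the `seed` parameter are hypothetical.

```python
import random
import statistics

def increasing_error_sensitivity(seq, change_made, positions_tested, size, seed=0):
    """Add errors one position at a time, each with a randomly chosen change."""
    rng = random.Random(seed)  # hypothetical fixed seed for repeatability
    initial = size(seq)
    runs = []
    for _ in change_made:  # one repetition per value in change_made
        altered = list(seq)
        deltas = []
        for pos in positions_tested:
            altered[pos] += rng.choice(change_made)  # earlier errors are kept
            deltas.append(size(altered) - initial)
        runs.append((deltas, statistics.mean(deltas), statistics.pstdev(deltas)))
    return initial, runs
```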
3.1.3. Automatic selection of candidate Transcription Error Positions
---------------------------------------------------------------------
Rows: 501988-3079962
Compressors used: ZZ, IRR, LZW, BWT, GZIP, COSIATEC.
Representation used: diatonic intervals.
Attributes:
Common to all experiments:
The changes made to a given position. The experiment is repeated in its entirety for each value.
A list of indices into the input data; an alteration (change_value), representing an error, is made at each index.
A list of indices into the input data; an alteration (change_value), representing a possible correction, is made at each index of a model that contains exactly one alteration (an error) at one of the positions in the error_positions list.
Compressor output:
The size of a model containing exactly one alteration (representing an error). One model exists for each position in error_positions.
Change in size from initial_model_size when a potential correction is added to a new location (from the indices in positions_tested). For each error, one correction is attempted at every position in positions_tested.
One output set exists for each value in change_made.
Number of true and false positives, found by selecting all indices where the "corrected" model is smaller than the model containing only the error.
Number of true positives; when 1, this means an attempt to correct the value at the index of the error resulted in a smaller model size.
Of all sorted unique model sizes occurring at the index of the error, the correct change belongs to the nth group.
When -1, no attempt to correct the error resulted in a smaller model.
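The candidate-selection rule above (flag every position where the attempted correction shrinks the model) can be sketched as follows. This assumes an additive correction and a caller-supplied `size` function; the function and parameter names other than positions_tested are hypothetical.

```python
def select_candidates(erroneous, error_pos, correction, positions_tested, size):
    """Flag positions where applying the correction makes the model smaller."""
    baseline = size(erroneous)  # model containing only the error
    candidates = []
    for pos in positions_tested:
        attempt = list(erroneous)
        attempt[pos] += correction  # assumed additive correction
        if size(attempt) < baseline:
            candidates.append(pos)
    true_positives = 1 if error_pos in candidates else 0
    return candidates, true_positives, len(candidates) - true_positives
```

For instance, with a toy size function that sums absolute values, an error of +1 inserted at position 2 of `[0, 0, 0, 0]` is the only position where a correction of -1 reduces the size.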
Experiment result:
Average F-measure from all attempts to select the correct position of the error.
Average Precision from all attempts to select the correct position of the error.
Average Recall from all attempts to select the correct position of the error.
Average rank from all instances where the correct position of the error was found.
Time taken to perform all compression and measurement operations, in seconds. Times are system-dependent and circumstantial.
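The averaged scores above follow the usual definitions: precision is true positives over all selected positions, recall is true positives over actual errors (one per attempt here), and the F-measure is their harmonic mean. The sketch below assumes that an attempt with no selected positions scores a precision of 0; that convention and the function names are hypothetical.

```python
import statistics

def precision_recall_f(true_positives, positives, actual=1):
    """Scores for one attempt; each model contains exactly one true error."""
    precision = true_positives / positives if positives else 0.0
    recall = true_positives / actual
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def average_scores(attempts):
    """Average precision, recall and F-measure over (tp, positives) pairs."""
    scores = [precision_recall_f(tp, pos) for tp, pos in attempts]
    return tuple(statistics.mean(column) for column in zip(*scores))
```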
3.2.1. Classification of the Meertens Tune Collections by Family
----------------------------------------------------------------
Rows: 3079963-3136586
Compressors used: ZZ, IRR, LZW.
Representation used: chromatic and diatonic pitch, chromatic and diatonic intervals, chromatic and diatonic contour, chromatic and diatonic pitch modulo 12, note duration.
Attributes:
Results (for the given representation):
The size of the unaltered, compressed model for the named piece.
The names of all pieces to which distance is calculated.
The size of the model resulting from the concatenation [ piece_a piece_b ].
The size of the model resulting from the concatenation [ piece_b piece_a ].
Normalised Compression Distance between piece_a and piece_b, calculated by the specified formula.
Normalised Compression Distance between piece_a and piece_b, calculated by the specified formula.
Normalised Compression Distance between piece_a and piece_b, calculated by the specified formula.
Simple distance calculated as the sum of sizes of models for piece_a and piece_b.
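The normalised compression attributes above do not state their formulas. For orientation only, the sketch below implements the commonly used Normalised Compression Distance, NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)), plus an illustrative variant that uses the smaller of the two concatenation orders recorded in this dataset. Neither is claimed to be one of the dataset's formulas; zlib stands in for ZZ, IRR and LZW, and the function names are hypothetical.

```python
import zlib

def compressed_size(seq):
    """Stand-in size measure: zlib-compressed length in bytes."""
    return len(zlib.compress(",".join(map(str, seq)).encode(), 9))

def ncd(a, b, size=compressed_size):
    """Common NCD definition using the concatenation [a b]."""
    ca, cb = size(a), size(b)
    cab = size(list(a) + list(b))
    return (cab - min(ca, cb)) / max(ca, cb)

def ncd_min_order(a, b, size=compressed_size):
    """Illustrative variant: the smaller of [a b] and [b a]."""
    ca, cb = size(a), size(b)
    c = min(size(list(a) + list(b)), size(list(b) + list(a)))
    return (c - min(ca, cb)) / max(ca, cb)
```

Values near 0 indicate that one piece adds little information given the other; values near 1 indicate little shared structure.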
3.3.1. MIREX 2016 Discovery of Repeated Themes & Sections task
--------------------------------------------------------------
Rows: 3136587-3136671
Compressor used: ZZ.
Representation used: diatonic intervals.
Attributes, as defined in the MIREX 2016 Discovery of Repeated Themes and Sections task:
Experiment result:
Number of patterns sought, from ground truth.
Number of patterns identified by the algorithm.
Establishment precision.
Establishment recall.
Establishment F-measure.
Occurrence precision, with detection threshold of 0.75.
Occurrence recall, with detection threshold of 0.75.
Occurrence F-measure, with detection threshold of 0.75.
Three-layer precision.
Three-layer recall.
Three-layer F-measure.
Occurrence precision, with detection threshold of 0.5.
Occurrence recall, with detection threshold of 0.5.
Occurrence F-measure, with detection threshold of 0.5.
Standard precision.
Standard recall.
Standard F-measure.
3.3.2. Structural Analysis of Bach's Well-Tempered Clavier
----------------------------------------------------------
Rows: 3136672-3136719
Compressor used: ZZ
Representation used: diatonic intervals.
Attributes:
Result of matching model segmentation to that specified by S. Bruhn:
Semantic name of defined segment.
Number of the closest-matching rule.
Jaccard Index (intersection over union) for the match.
Voice numbers containing instances of the chosen rule.
Position within the first voice where the first instance of the chosen rule occurs.
Position within the first voice where the first instance of the chosen rule ends.
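The matching described above can be sketched with the Jaccard index over the note positions covered by a segment and by a rule instance. This assumes spans are inclusive (start, end) note indices within the first voice; the rule numbers, spans and function names are hypothetical.

```python
def positions(span):
    """Set of note positions covered by an inclusive (start, end) span."""
    start, end = span
    return set(range(start, end + 1))

def jaccard(segment, instance):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = positions(segment), positions(instance)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closest_rule(segment, rule_spans):
    """Return (rule number, Jaccard index) of the best-matching rule instance."""
    return max(((rule, jaccard(segment, span)) for rule, span in rule_spans.items()),
               key=lambda item: item[1])
```

For a segment spanning positions 0 to 9 and a rule instance spanning 5 to 14, the intersection covers 5 positions and the union 15, giving a Jaccard index of 1/3.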
Research results based upon these data are published at https://doi.org/10.1080/09298215.2021.1978505
Funding
A Study of Music Analysis by Compressive Modelling (2016-10-01 - 2021-12-31); Humphreys, David. Funder: Cardiff University
Specialist software required to view data files
None
Language(s) in dataset
- English-Great Britain (EN-GB)