Last updated: 04.05.2026
| [Asbert2026] |
Gerard Asbert, Pau Torras, Lei Kang, Alicia Fornés, and Josep Lladós.
GAN-Based Content-Conditioned Generation of Handwritten Musical
Symbols.
In Document Analysis and Recognition - ICDAR 2025
Workshops, Lecture Notes in Computer Science. Springer, Cham, 2026.
[ bib |
DOI |
arXiv ]
The field of Optical Music Recognition (OMR) is hindered by the scarcity of real annotated data, particularly when dealing with handwritten historical musical scores. This study explores the generation of realistic, handwritten-looking music scores by implementing a music symbol-level Generative Adversarial Network (GAN) and assembling its output into a full score using the Smashcima engraving software. The generated symbols exhibit a high degree of realism, marking significant progress in synthetic score generation for OMR.
|
| [Ma2026] |
Menghe Ma, Siqing Wei, Yuecheng Xing, Yaheng Wang, Fanhong Meng, Peijun Han,
Luu Anh Tuan, and Haoran Luo.
ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level
Music Intelligence, Apr 2026b.
[ bib |
DOI |
arXiv ]
Omnimodal Notation Processing (ONP) represents a unique frontier for omnimodal AI due to the rigorous, multi-dimensional alignment required across auditory, visual, and symbolic domains. Current research remains fragmented, focusing on isolated transcription tasks that fail to bridge the gap between superficial pattern recognition and the underlying musical logic. This landscape is further complicated by severe notation biases toward Western staff and the inherent unreliability of “LLM-as-a-judge” metrics, which often mask structural reasoning failures with systemic hallucinations. To establish a more rigorous standard, we introduce ONOTE, a multi-format benchmark that utilizes a deterministic pipeline, grounded in canonical pitch projection, to eliminate subjective scoring biases across diverse notation systems. Our evaluation of leading omnimodal models exposes a fundamental disconnect between perceptual accuracy and music-theoretic comprehension, providing a necessary framework for diagnosing reasoning vulnerabilities in complex, rule-constrained domains.
|
| [Ma2026b] |
Junwen Ma, Huhu Xue, Xingyuan Zhao, and Weicheng Fu.
A High-Accuracy Optical Music Recognition Method Based on Bottleneck
Residual Convolutions, Apr 2026a.
[ bib |
DOI |
arXiv ]
Optical Music Recognition (OMR) aims to convert printed or handwritten music score images into editable symbolic representations. This paper presents an end-to-end OMR framework that combines residual bottleneck convolutions with bidirectional gated recurrent unit (BiGRU)-based sequence modeling. A convolutional neural network with ResNet-v2-style residual bottleneck blocks and multi-scale dilated convolutions is used to extract features that encode both fine-grained symbol details and global staff-line structures. The extracted feature sequences are then fed into a BiGRU network to model temporal dependencies among musical symbols. The model is trained using the Connectionist Temporal Classification loss, enabling end-to-end prediction without explicit alignment annotations. Experimental results on the Camera-PrIMuS and PrIMuS datasets demonstrate the effectiveness of the proposed framework.
|
| [Rigaux2026] | Philippe Rigaux, Bertrand Coüasnon, Christophe Guillotel-Nothmann, Fabien Guilloux, and Aurélie Lemaitre. CollabScore, a collaborative system for OMR validation and correction. Forthcoming, 2026a. [ bib | http ] |
| [Rigaux2026b] | Philippe Rigaux, Bertrand Coüasnon, Christophe Guillotel-Nothmann, Fabien Guilloux, and Aurélie Lemaitre. The CollabScore Dataset. Towards Robust and Generalized OMR Evaluation. Forthcoming, 2026b. Dataset and tools available at https://github.com/collabscore/dataset. [ bib | http ] |
| [Romao2026] |
Gustavo Henrique Romão, Hygor Santiago Lara, Jesuliana Nascimento
Ulysses, and Jorge Nei Brito.
A Hybrid Approach to Optical Music Recognition With Object Detection
and Multimodal LLMs.
Revista Eletrônica de Iniciação
Científica em Computação, 24 (1): 134-137, 2026.
ISSN 1519-8219.
[ bib |
DOI |
http ]
This research introduces a hybrid methodology for Optical Music Recognition (OMR), integrating multimodal language models (LLMs) with contemporary object detection approaches. For clef identification, Gemini 2.0 Flash was employed, capitalizing on its visual and contextual interpretation capabilities, while YOLOv8 and YOLOv11 were adopted for processing pitch value and rhythm detection. This task distribution minimizes object detection complexity, enabling YOLO models to concentrate on precise localization and classification of musical symbols. The proposed methodology demonstrated promising outcomes in the task of recognizing digital monophonic scores, with YOLOv11 achieving a mAP50 of 0.995 in the pitch detection network when clef detection is performed through LLMs.
|
| [Villarreal2026] |
Manuel Villarreal, Joan Andreu Sánchez, and Daniel Parres.
Full-page recognition and alignment of historical musical documents.
International Journal on Document Analysis and Recognition
(IJDAR), pages 1-16, Mar 2026.
[ bib |
DOI ]
Optical Music Recognition aims to transcribe musical manuscript images into digital formats by using automatic methods for enhanced accessibility and preservation. This task is challenging for handwritten historical musical pieces from the Late Middle Ages, Early Renaissance, and previous time periods. This music has the interesting characteristic that both musical and lyrical elements are present with an implicit time alignment between them. This paper introduces techniques for simultaneously transcribing the musical and lyrical elements. We research how to automatically obtain the time alignment for an accurate musicological interpretation. Convolutional and Recurrent Neural Networks and Transformer models are explored for holistically transcribing and aligning historical pieces. This paper explores different techniques to improve the training of the models in limited data scenarios. Experiments are conducted on two different datasets from the same time period. Our findings highlight the potential of Transformer models in overcoming the alignment challenge, providing the best alignment capabilities without compromising the quality of transcriptions and offering a promising direction for future research in the automatic recognition of historical musical documents.
|
| [Xu2026] |
Nan Xu, Shiheng Li, and Shengchao Hou.
From Image to Music Language: A Two-Stage Structure Decoding Approach
for Complex Polyphonic OMR, Apr 2026.
[ bib |
DOI |
arXiv ]
We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.
|
| [Li2025] |
Yang Li and Boris Zhang.
Music Notation Recognition Method Based on Improved Generative
Adversarial Network.
Journal of Network Intelligence, 10 (2): 996-1009, May 2025.
ISSN 2414-8105.
[ bib |
.pdf ]
Image fuzzy enhancement is a research hotspot in the field of image processing, which aims to recover enhanced beginning clear images from degraded images. Based on the research of traditional particle swarm optimization algorithm and fuzzy enhancement algorithm, an image fuzzy enhancement method based on membrane computing particle swarm algorithm is proposed. Firstly, in order to make full use of the sparse characteristics of the clear image, the coefficient decomposition under wavelet domain and tightly supported wavelet domain is performed on the image respectively. Then, a joint optimisation model is constructed using L1 parametric constraints to achieve pretzel noise cancellation. Next, the MMH-PSO algorithm is designed by improving the particle swarm algorithm using membrane computing and Metropolis-Hastings sampling. Based on the simulated annealing algorithm temperature drop process, Metropolis-Hastings sampling is used to add randomness to the particle swarm algorithm so that it has the ability to jump out of the local optimum. The use of membrane computing enhances the parallelism of the particle swarm algorithm and can reduce the time complexity in solving complex problems. Finally, MMH-PSO is used to simultaneously search out the magnitude of the two fuzzy parameters in the traditional fuzzy enhancement algorithm in order to improve the accuracy of the algorithm. The experimental results show that the proposed algorithm has better SSIM values than the traditional fuzzy enhancement algorithm, which effectively improves the image quality and makes the image edge information more abundant.
|
| [Amezcua2025] |
Alejandro Romero Amezcua and Mariano José Juan Rivera Meraz.
VisionScores - A System-Segmented Image Score Dataset for Deep
Learning Tasks, Jun 2025.
[ bib |
DOI |
arXiv ]
VisionScores presents a novel proposal being the first system-segmented image score dataset, aiming to offer structure-rich, high information-density images for machine and deep learning tasks. Delimited to two-handed piano pieces, it was built to consider not only certain graphic similarity but also composition patterns, as this creative process is highly instrument-dependent. It provides two scenarios in relation to composer and composition type. The first, formed by 14k samples, considers works from different authors but the same composition type, specifically, Sonatinas. The latter, consisting of 10.8k samples, presents the opposite case, various composition types from the same author, being the one selected Franz Liszt. All of the 24.8k samples are formatted as grayscale jpg images of 128 × 512 pixels. VisionScores supplies the users not only the formatted samples but the systems' order and pieces' metadata. Moreover, unsegmented full-page scores and the pre-formatted images are included for further analysis.
|
| [Asbert2025] |
Gerard Asbert, Pau Torras, Lei Kang, Alicia Fornés, and Josep Lladós.
GAN-Based Content-Conditioned Generation of Handwritten Musical
Symbols, Oct 2025.
[ bib |
DOI |
arXiv ]
Optical Music Recognition (OMR) systems face significant challenges when dealing with handwritten historical musical scores, largely due to the scarcity of real annotated data. This work explores the generation of realistic, handwritten-looking scores by implementing a music symbol-level Generative Adversarial Network (GAN) and assembling its output into a full score using the Smashcima engraving software. The generated symbols exhibit a high degree of realism, marking significant progress in synthetic score generation.
|
| [CervetoSerrano2025] |
Joan Cerveto-Serrano, David Rizo, and Jorge Calvo-Zaragoza.
kernpy: a Humdrum **Kern Oriented Python Package for Optical
Music Recognition Tasks.
In Proceedings of the Music Encoding Conference, London,
United Kingdom, 2025. Knowledge Commons.
[ bib |
DOI ]
kernpy is a Python package that provides comprehensive tools for working with symbolic modern and mensural notations in Humdrum format. The package facilitates Optical Music Recognition (OMR) tasks by providing utilities for loading, creating, and manipulating **kern files, supporting both modern and mensural music notations.
|
| [Della2025] |
Michele Della Ventura.
Automatic Correction of Symbolic Musical Text During Optical Music
Recognition.
In Proceedings of the Fourth International Conference on
Innovations in Computing Research (ICR'25), Lecture Notes in Networks and
Systems, pages 97-107. Springer Nature Switzerland, 2025.
ISBN 978-3-031-95651-5.
[ bib |
DOI ]
Automatic message correction is a ubiquitous feature of modern optical recognition systems, suggesting possible solutions to eliminate noise in a message transmitted by the user. Efficiency is crucial to ensure that the system has a real-time responsiveness when operating with different writing codes such as letters and/or numbers. Previous works were based on the use of a trie data structure for fast prefix-search operations even if this method is not always accurate as only completions that are prefixed by the query are returned. This paper aims to describe a method for correcting a symbolic music text and discuss its efficiency and effectiveness in relation to other possible approaches. The solution is based on the use of Shannon and Weaver information theory to reconstruct the message, a direction not yet explored in the literature. The solution is based on eliminating noise by inserting musical notes into the musical phrase that allow it to have a low entropy value, corresponding to a high information value. The method has been tested on tonal polyphonic compositions and has given encouraging results. Future improvements of the method and possible applications in other practical areas are briefly discussed at the end of the paper.
|
| [Jung2025] |
Jongmin Jung, Dongmin Kim, Sihun Lee, Seola Cho, Hyungjoon Soh, Irmak Bukey,
Chris Donahue, and Dasaem Jeong.
U-MusT: A Unified Framework for Cross-modal Translation of Score
Images, Symbolic Music, and Performance Audio.
In Proceedings of the LLM4Music Workshop at the 26th
International Society for Music Information Retrieval Conference (ISMIR
2025), Sep 2025.
[ bib |
DOI |
http ]
Traditional Music Information Retrieval (MIR) tasks like Optical Music Recognition (OMR) and Automatic Music Transcription (AMT) typically rely on specialized, single-task models. We challenge this paradigm by proposing a unified framework that trains a single Transformer on multiple cross-modal translation tasks simultaneously. Our approach is enabled by two key contributions: a novel large-scale dataset (YTSV) with over 1,300 hours of paired score-image and audio data, and a unified tokenization scheme that converts all music modalities into a common sequence format. Experiments show our multitask model significantly outperforms specialized baselines, reducing the OMR symbol error rate from 24.58% to a state-of-the-art 13.67%. Most notably, our framework achieves the first successful end-to-end generation of audio directly from a score image, marking a significant breakthrough in cross-modal music understanding and generation.
|
| [Kim2025] |
Dongmin Kim, Danbinaerin Han, Dasaem Jeong, and Jose J. Valero-Mas.
On the Automatic Recognition of Jeongganbo Music Notation: Dataset
and Approach.
ACM Journal on Computing and Cultural Heritage, 18 (3):
52:1-52:21, 2025.
[ bib |
DOI ]
The Jeongganbo notation, the first music representation system in East Asia capable of jointly expressing pitch and duration, has been extensively used, and still is, in the Korean music tradition since its inception in the 15th century. In this regard, there exists a plethora of music works that exclusively endure as physical sheets, which not only constitutes a heritage preservation challenge due to the inherent degradation of this format but also impedes the use of computational tools to study and exploit this music tradition. While the Optical Music Recognition (OMR) field has addressed this issue in a number of music notations from the Western tradition, no previous research has considered the preservation of Jeongganbo scores. This work presents: (i) the first data assortment of real Jeongganbo scores for OMR tasks; (ii) a collection of synthetic data generation and augmentation mechanisms to alleviate the scarcity of manual annotation; and (iii) a neural-based transcription scheme based on state-of-the-art OMR strategies specifically tailored to Jeongganbo scores.
|
| [Marselletti2025] |
Andrea Marselletti, Elia Pacioni, Francisco Fernández de Vega, and Davide
Calvaresi.
Decentralized Optical Music Recognition Using YOLO and FedGP for
Music Education.
In Proceedings of the IEEE International Conference on
Metrology for eXtended Reality, Artificial Intelligence, and Neural
Engineering (MetroXRAINE 2025), 2025.
[ bib |
http ]
This work explores privacy-aware Optical Music Recognition (OMR) by coupling a YOLOv9c detector with FedGP, a genetic-programming-based aggregation strategy tailored to highly non-IID client distributions. The research casts OMR as the end-to-end transcription of printed and handwritten four-part harmony scores into structured MusicXML. A hybrid corpus of 1,810 page images with 112,024 bounding-box annotations across 166 symbol classes is assembled and deliberately partitioned in a non-IID manner among ten virtual clients for federated training.
|
| [MartinezSevilla2025] |
Juan C. Martinez-Sevilla, Francesco Foscarin, Patricia Garcia-Iasci, David
Rizo, Jorge Calvo-Zaragoza, and Gerhard Widmer.
Optical Music Recognition of Jazz Lead Sheets.
In Proceedings of the 26th International Society for Music
Information Retrieval Conference, pages 696-702, 2025.
[ bib |
DOI |
arXiv ]
This paper addresses Optical Music Recognition (OMR) for handwritten jazz lead sheets, a widely used musical score type that encodes melody and chords. The task is challenging due to the presence of chords, a score component not handled by existing OMR systems, and the high variability and quality issues associated with handwritten images. The authors present a novel dataset consisting of 293 handwritten jazz lead sheets of 163 unique pieces, amounting to 2021 total staves aligned with Humdrum **kern and MusicXML ground truth scores.
|
| [Mayer2025] |
Jiří Mayer, Filip Jebavý, Markéta
Herzánová Vlková, and Martina Dvořáková.
MuNG Studio: Annotation Tool for Music Notation Graph.
In Proceedings of the 12th International Conference on Digital
Libraries for Musicology, DLfM '25, New York, NY, USA, Sep 2025.
Association for Computing Machinery.
ISBN 979-8-4007-1636-1.
[ bib |
DOI ]
MuNG Studio is a new annotation tool for the Music Notation Graph (MuNG) format, a high-detail graphical annotation format designed for Optical Music Recognition (OMR) tasks, originally proposed for the MUSCIMA++ dataset. The original MUSCIMarker tool is now obsolete and impossible to install; MuNG Studio provides an easy-to-install web-based viewer and editor for the MuNG format with the goal of expanding and supporting the growing ecosystem around MuNG.
|
| [OlivaBulpitt2025] |
Samuel B. Oliva-Bulpitt, Juan P. Martinez-Esteso, Alejandro Galán-Cuenca,
Francisco J. Castellanos, and Antonio Javier Gallego.
Enhancing Music Score Analysis with Monte Carlo Dropout: A
Probabilistic Approach to Staff-Region Detection.
International Journal on Document Analysis and Recognition
(IJDAR), 28: 441-456, 2025.
ISSN 1433-2825.
[ bib |
DOI ]
Layout Analysis (LA) is a critical process for detecting and isolating different components within a scanned document, allowing for more straightforward and precise processing of each part independently. In Optical Music Recognition (OMR), LA is essential for identifying and extracting music staves, which enables effective music notation recognition and processing. While the literature includes several studies exploring methods for staff retrieval, there remains room for improvement in terms of robustness and accuracy. In this work, we introduce a methodology that integrates Monte Carlo Dropout (MCD) into a neural network model in order to improve reliability in staff retrieval from scanned sheet music. Our approach leverages multiple non-deterministic predictions using standard dropout layers during inference and aggregates them through pixel-level combination policies. We extend the MCD technique, originally designed for classification and regression tasks using averaged predictions, to the LA task and introduce new combination strategies: maximum and voting criteria. Experiments on three diverse music score corpora, including printed and handwritten documents, demonstrated the effectiveness of our approach. The averaging and voting (with 25% and 50% of votes) criteria reduced the relative error by 63.6% compared to the baseline and achieved a 32.1% improvement over state-of-the-art methods. Our methodology notably enhanced detection accuracy without requiring modifications to the neural architecture, especially at the edges of staves, where conventional models tend to show higher error rates.
|
| [Prionggo2025] |
Dyatmika Mahardhi Hari Prionggo and Ericko Tanuwijaya.
Rule-Based Pitch Inference in Optical Music Recognition on Polyphonic
Scores using YOLOv12.
Jurnal Teknologi dan Manajemen Informatika, 2025.
[ bib |
http ]
This paper presents a rule-based pitch inference method for Optical Music Recognition (OMR) applied to polyphonic scores, using YOLOv12 for symbol detection. The system detects and classifies music notation objects and applies rule-based logic to infer pitch values from detected noteheads, accounting for clef, key signature, and staff position.
|
| [Repolusk2025] |
Tristan Repolusk and Eduardo Veas.
KuiSCIMA v2.0: Improved Baselines, Calibration, and Cross-Notation
Generalization for Historical Chinese Music Notations in Jiang Kui's
Baishidaoren Gequ.
In Document Analysis and Recognition - ICDAR 2025, Lecture
Notes in Computer Science, Cham, 2025. Springer Nature Switzerland.
[ bib |
DOI ]
This paper extends the KuiSCIMA dataset to include all 109 pieces from Baishidaoren Gequ, encompassing suzipu, lülüpu, and jianzipu notations. Improved baselines reduce the Character Error Rate (CER) from 10.4% to 7.1% for suzipu despite 77 highly imbalanced classes, and achieve a CER of 0.9% for lülüpu. The models outperform human transcribers, with an average human CER of 15.9%.
|
| [RiosVila2025] |
Antonio Rios-Vila, Eliseo Fuentes-Martinez, and Francisco J. Castellanos.
An implicit layout-aware transformer for full-page end-to-end optical
music recognition.
International Journal of Multimedia Information Retrieval, 14:
34, 2025.
ISSN 2192-6670.
[ bib |
DOI ]
This paper presents the Layout-Aware Sheet Music Transformer, a novel method based on a single-step transcription strategy that leverages transformer attention mechanisms to extract layout information without explicit annotations. The approach addresses the challenge that recent single-step OMR approaches lose layout information from the original image, which is crucial for accurately interpreting musical documents in practical applications.
|
| [Sanjurjo2025] | Antonio Sanjurjo Rodríguez. Optical Music Recognition (OMR) Tool Based on Machine Learning and Computer Vision Techniques. Bachelor's thesis (TFG), Universidade da Coruña, Jun 2025. [ bib | http ] |
| [Saynganthone2025] |
Hritik Saynganthone.
An Examination of High-Entropy Alternatives of Connectionist Temporal
Classification Loss for Optical Music Recognition Using Convolutional
Recurrent Neural Networks.
MS thesis, Rochester Institute of Technology, Rochester, NY, USA,
Dec 2025.
[ bib |
http ]
The Connectionist Temporal Classification (CTC) loss function is the most commonly used loss function in the field of Optical Music Recognition (OMR). However, OMR suffers from a massive class imbalance problem, exacerbated by the fact that CTC loss is subject to the spiky distribution problem, wherein the blank token introduced by CTC is vastly overpredicted and appears in timesteps where it would make more sense to predict a non-blank token, since CTC will collapse repeated tokens into a single token. This work posits that alternative loss functions to CTC that optimize for an increase in entropy of the prior probability distribution output of the model will lead to better generalization and lower error rates. The three main loss functions tested are FocalCTC, SR-CTC, and EnCTC, each of which optimizes for increased entropy for different aspects of the estimated prior distribution. Experiments are conducted on all three. Both FocalCTC and EnCTC show an improvement over baseline CTC.
|
| [Wu2025] |
Jinghao Wu and Wei Guo.
Autoregressive ConvNeXt-Transformer Fusion Framework for Polyphonic
Optical Music Recognition with Focal Loss Optimization.
Acoustical Science and Technology, 2025.
ISSN 1346-3969.
[ bib |
DOI |
http ]
This paper proposes the ConvNeXt-Transformer Fusion (CNTF) framework, an autoregressive end-to-end neural network employing an image-to-sequence architecture optimized for automated transcription of intricate musical scores. It integrates a ConvNeXt-based encoder for sheet music feature extraction and a Transformer-based decoder that generates transcription sequences through autoregressive prediction. Focal Loss optimization is implemented to address class imbalance during training. Experimental results demonstrate state-of-the-art performance in polyphony-rich score recognition.
|
| [CalvoZaragoza2024] |
Jorge Calvo-Zaragoza, Eliseo Fuentes-Martínez, Noelia Luna-Barahona, and
Antonio Ríos-Vila.
Can multimodal large language models read music score images?
In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors,
Proceedings of the 6th International Workshop on Reading Music
Systems, pages 4-6, Online, 2024.
[ bib |
DOI |
http ]
This paper investigates whether multimodal large language models (MLLMs), which combine visual and textual understanding, can effectively read and interpret music score images. Given their ability to process and integrate information from multiple modalities, MLLMs present a promising approach for Optical Music Recognition (OMR). Through empirical evaluation, we demonstrate that while MLLMs exhibit potential in recognizing musical structures, challenges remain in addressing the complexity of music notation. This work highlights the need for further refinements in ML
|
| [Castellanos2024] |
Francisco J. Castellanos, Juan P. Martinez-Esteso, Alejandro Galán-Cuenca,
and Antonio Javier Gallego.
A Region-Based Approach for Layout Analysis of Music Score Images in
Scarce Data Scenarios.
In Document Analysis and Recognition - ICDAR 2024: 18th
International Conference, Athens, Greece, August 30-September 4, 2024,
Proceedings, Part IV, volume 14807 of Lecture Notes in Computer
Science, pages 58-75. Springer, 2024.
ISBN 978-3-031-70545-8.
[ bib |
DOI ]
This work presents a novel region-based layout analysis (LA) method for Optical Music Recognition (OMR) systems, aimed at overcoming the data scarcity challenge. Contemporary OMR techniques, grounded in machine learning principles, have a critical requirement: a labeled dataset for training. This presents a practical challenge due to the extensive manual effort required, coupled with the fact that the availability of suitable data for creating training sets is not always guaranteed. Unlike other approaches, our method focuses on adapting the training and sample extraction processes within an existing neural network framework. Our approach incorporates a labeled data-driven oversampling technique, a masking layer to enable training with partial labeling, and an adaptive scaling process to improve results for varying score sizes. Through comprehensive experimentation, we established the minimal labeled data necessary for an effective model and demonstrated that our method could achieve a performance comparable with the state-of-the-art with just 8 to 32 labeled samples.
|
| [Coueasnon2024] |
Bertrand Coüasnon, Mathieu Giraud, Christophe Guillotel-Nothmann,
Aurélie Lemaitre, and Philippe Rigaux.
CollabScore project - From Optical Recognition to Multimodal Music
Sources.
In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors,
Proceedings of the 6th International Workshop on Reading Music
Systems, pages 33-37, Online, 2024.
[ bib |
DOI |
http ]
The CollabScore project is funded by the French National Research Agency and is devoted to the design and production of tools and methods to improve access to large collections of sheet music scans. The OMR approach developed in CollabScore is part of a larger goal of interlinking multimodal documents related to music works.
|
| [Dvorak2024] | Vojtěch Dvořák, Jan Hajič jr., and Jiří Mayer. Staff Layout Analysis Using the YOLO Platform. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 18-22, Online, 2024. [ bib | DOI | http ] |
| [FuentesMartinez2024] |
Eliseo Fuentes-Martínez, Antonio Ríos-Vila, Juan C. Martinez-Sevilla,
David Rizo, and Jorge Calvo-Zaragoza.
Aligned Music Notation and Lyrics Transcription, Dec 2024.
[ bib |
DOI |
arXiv ]
The digitization of vocal music scores goes beyond traditional Optical Music Recognition (OMR) and Optical Character Recognition (OCR), as it necessitates preserving the critical alignment between music notation and lyrics. This paper introduces and formalizes, for the first time, the Aligned Music Notation and Lyrics Transcription (AMNLT) challenge, which addresses the complete transcription of vocal scores by jointly considering music symbols, lyrics, and their synchronization. Four datasets of Gregorian chants are introduced, comprising both real and synthetic sources, along with custom metrics to assess both transcription and alignment accuracy.
|
| [Graczyk2024] |
Stanislaw Graczyk, Zuzanna Piniarska, Mateusz Kalamoniak, Tomasz
Lukaszewski, and Ewa Lukasik.
An Online Tool for Semi-Automatically Annotating Music Scores for
Optical Music Recognition.
In Proceedings of the 11th International Conference on Digital
Libraries for Musicology, pages 73-77, New York, NY, USA, Jun 2024.
Association for Computing Machinery.
[ bib |
DOI ]
We describe OMRAT, an online tool for semi-automatic annotation of music scores for Optical Music Recognition (OMR) systems. OMRAT uses deep neural networks, machine learning, and music notation ontologies at different stages to, respectively, detect musical objects, establish relationships between them, and convert them into the machine-readable MEI format. A human editor verifies the output of the recognition stage to correct potential errors and remove incorrect labels as needed.
|
| [Hartelt2024] | Alexander Hartelt and Frank Puppe. OMMR4all revisited - a Semiautomatic Online Editor for Medieval Music Notations. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 46-49, Online, 2024. [ bib | DOI | http ] |
| [Harvanova2024] |
Kristýna Harvanóvá.
Postprocessing syntetických notopisů v kontextu jejich
rozpoznávání.
Master's thesis, Charles University, Faculty of Mathematics and
Physics, 2024.
[ bib |
http ]
This work focuses on improving training data synthesis methods for the Optical Music Recognition (OMR) task. The study concentrates on creating realistic, colored, and degraded images of musical scores (postprocessing). These degraded data are generated from synthetic, purely black-and-white images. After applying postprocessing methods, the musical scores closely mimic physical documents, thereby enhancing the quality of training data for OMR models. The proposed postprocessing methods were tested on object detection tasks, specifically recognizing various types of musical symbols. Experiments demonstrated that all proposed methods positively impact the resulting OMR model, with the greatest benefit coming from the generation of synthetic backgrounds for musical scores.
|
| [Hristov2024] |
Christo Hristov and Maddox de Bretteville.
Vision Transformers for Optical Music Recognition of Monophonic
Scores, 2024.
[ bib |
.pdf ]
This work explores a purely transformer-based approach to Optical Music Recognition (OMR), employing a pretrained Vision Transformer (ViT) alongside a transformer decoder to generate musical symbol sequences. The model incorporates an explicitly defined semantic musical vocabulary tailored for the transformer encoder-decoder architecture.
|
| [Lambertye2024] | Grégoire de Lambertye and Alexander Pacha. Semantic Reconstruction of Sheet Music with Graph-Neural Networks. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 12-17, Online, 2024. [ bib | DOI | http ] |
| [Lin2024] |
Yuheng Lin, Zheqi Dai, and Qiuqiang Kong.
MusicScore: A Dataset for Music Score Modeling and Generation, Jun
2024.
[ bib |
DOI |
arXiv ]
Music scores are written representations of music and contain rich information about musical components. The visual information on music scores includes notes, rests, staff lines, clefs, dynamics, and articulations. This visual information in music scores contains more semantic information than audio and symbolic representations of music. Previous music score datasets have limited sizes and are mainly designed for optical music recognition (OMR). There is a lack of research on creating a large-scale benchmark dataset for music modeling and generation. In this work, we propose MusicScore, a large-scale music score dataset collected and processed from the International Music Score Library Project (IMSLP). MusicScore consists of image-text pairs, where the image is a page of a music score and the text is the metadata of the music. The metadata includes rich information about the composer, instrument, piece style, and genre of the music pieces. MusicScore is curated into small, medium, and large scales of 400, 14k, and 200k image-text pairs with varying diversity, respectively. We build a score generation system based on a UNet diffusion model to generate visually readable music scores conditioned on text descriptions to benchmark the MusicScore dataset for music score generation.
|
| [Mayer2024] |
Jiří Mayer, Milan Straka, Jan Hajič, and Pavel Pecina.
Practical End-to-End Optical Music Recognition for Pianoform Music.
In Document Analysis and Recognition - ICDAR 2024, volume
14809 of Lecture Notes in Computer Science, pages 55-73, Cham, 2024.
Springer.
ISBN 9783031705519.
[ bib |
DOI |
arXiv ]
The majority of recent progress in Optical Music Recognition (OMR) has been achieved with Deep Learning methods, especially models following the end-to-end paradigm, reading input images and producing a linear sequence of tokens. Unfortunately, many music scores, especially piano music, cannot be easily converted to a linear sequence. This has led OMR researchers to use custom linearized encodings, instead of broadly accepted structured formats for music notation. Their diversity makes it difficult to compare the performance of OMR systems directly. To bring recent OMR model progress closer to useful results: (a) We define a sequential format called Linearized MusicXML, allowing to train an end-to-end model directly and maintaining close cohesion and compatibility with the industry-standard MusicXML format. (b) We create a dev and test set for benchmarking typeset OMR with MusicXML ground truth based on the OpenScore Lieder corpus. They contain 1,438 and 1,493 pianoform systems, each with an image from IMSLP. (c) We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate the model. We also test our model against the recently published synthetic pianoform dataset GrandStaff and surpass the state-of-the-art results.
|
| [MenarguezBox2024] |
Aitana Menárguez-Box, Alejandro H. Toselli, and Enrique Vidal.
Enhanced User-Machine Interaction for Historical Sheet Music
Retrieval: a Musical Notation Approach.
In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors,
Proceedings of the 6th International Workshop on Reading Music
Systems, pages 28-32, Online, 2024.
[ bib |
DOI |
http ]
[...] of interaction. Leveraging a web piano interface, users can now input queries using real musical notes, enhancing both usability and accuracy. Even though previous works have already explored this kind of implementation, it has only been tested in already transcribed and digitalized sheet music. Our approach, based on fully automatic Probabilistic Indexing (PrIx) of a manuscript, addresses the intricacies inherent in historical scores, including variations in clef types and positions, to transform musical queries into complex Boolean geometric expressions. By integrating these enhancements into an existing search engine, we provide researchers with a more accessible and efficient means of exploring vast collections of historical sheet music. This paper underscores the significance of user-machine interaction improvements in facilitating meaningful discoveries and insights in musicology and historical research.
|
| [Nugroho2024] |
Douglas Rakasiwi Nugroho and Amalia Zahra.
Musical Note Position and Duration Recognition Model in Optical Music
Recognition Using Convolutional Neural Network.
Journal of Image and Graphics, 12 (1): 32-39, 2024.
[ bib |
DOI |
.html ]
The study aims to solve Optical Music Recognition (OMR) problems using a non-End-to-End (non-E2E) approach, with separate models for Position Recognition (PR) and Duration Recognition (DR) constructed using Convolutional Neural Networks (CNN). The PR and DR models achieved accuracies of 97.88% and 99.23%, respectively.
|
| [Penarrubia2024] |
Carlos Peñarrubia, Jose J. Valero-Mas, and Jorge Calvo-Zaragoza.
Contrastive Self-Supervised Learning for Optical Music Recognition.
In Document Analysis Systems: 16th IAPR International
Workshop, DAS 2024, Athens, Greece, August 30-31, 2024, Proceedings,
volume 14994 of Lecture Notes in Computer Science, pages 275-289.
Springer, 2024.
ISBN 978-3-031-70441-3.
[ bib |
DOI ]
Optical Music Recognition (OMR) is the research area focused on transcribing images of musical scores. In recent years, this field has seen great development thanks to the emergence of Deep Learning. However, these types of solutions require large volumes of labeled data. To alleviate this problem, Contrastive Self-Supervised Learning (SSL) has emerged as a paradigm that leverages large amounts of unlabeled data to train neural networks, yielding meaningful and robust representations. In this work, we explore its first application to the field of OMR. By utilizing three datasets that represent the heterogeneity of musical scores in notations and graphic styles, and through multiple evaluation protocols, we demonstrate that contrastive SSL delivers promising results, significantly reducing data scarcity challenges in OMR. To the best of our knowledge, this is the first study that integrates these two fields.
|
| [Repolusk2024] | Tristan Repolusk and Eduardo Veas. Semi-Automatic Annotation of Chinese Suzipu Notation Using a Component-Based Prediction and Similarity Approach. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 38-42, Online, 2024. [ bib | DOI | http ] |
| [RiosVila2024] | Antonio Ríos-Vila, Eliseo Fuentes-Martínez, and Jorge Calvo-Zaragoza. Towards Sheet Music Information Retrieval: A Unified Approach Using Multitask Transformers. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 7-11, Online, 2024. [ bib | DOI | http ] |
| [Romao2024] |
Gustavo Henrique Romão, Henrique Santos Lara, and João Neto Brito.
Testing YOLOv8's Efficacy as a Pitch and Duration Detector Across
Digitally Written Monophonic Music Scores.
Observatorio de la Economía Latinoamericana, 22 (9):
1-16, 2024.
ISSN 1696-8352.
[ bib |
DOI |
http ]
This study uses the Ultralytics YOLOv8 algorithm to detect, classify, and reconstruct digitally written monophonic music scores from the Printed Images of Music Staves (PrIMuS) dataset. The approach achieves mAP50-95 scores above 85%, demonstrating the efficacy of YOLOv8 as a pitch and duration detector for digitally written monophonic music scores.
|
| [RoselloPedraza2024] |
Adrián Roselló Pedraza.
Técnicas de adaptación al dominio libres de origen para
transcripción de partituras.
Master's thesis, Universidad de Alicante, 2024.
[ bib |
http ]
This master's thesis focuses on the development of a music recognition system that can work effectively for any type of score, investigating source-free domain adaptation techniques for optical music score transcription. This work received an honorable mention (accésit) in the 2024 Arquímedes University Competition of the Ministry of Science, Innovation and Universities of Spain.
|
| [RoselloPedraza2024a] |
Adrián Roselló Pedraza, Eliseo Fuentes-Martínez, María
Alfaro-Contreras, David Rizo, and Jorge Calvo-Zaragoza.
Source-Free Domain Adaptation for Optical Music Recognition.
In Document Analysis and Recognition - ICDAR 2024: 18th
International Conference, Athens, Greece, August 30-September 4, 2024,
Proceedings, Part VI, volume 14809 of Lecture Notes in Computer
Science, pages 1-14. Springer, 2024.
ISBN 978-3-031-70551-9.
[ bib |
DOI ]
This work addresses the problem of Domain Adaptation (DA) in the context of staff-level end-to-end Optical Music Recognition. Specifically, we consider a source-free DA approach to adapt a given trained model to a new collection, an extremely useful scenario for preserving musical heritage. The method involves re-training the pre-trained model to align the statistics stored from the original data in normalization layers with those of the new collection, while also including a regularization mechanism to prevent the model from converging to undesirable solutions. Unlike conventional DA techniques, this approach is very efficient and practical, as it only requires the pre-trained model and unlabeled data from the new collection, without relying on data from the original training collections (i.e., source-free). Evaluation of diverse music collections in Mensural notation and a synthetic-to-real scenario of common Western modern notation demonstrates consistent improvements over the baseline.
|
| [Setyo2024] |
Ciara Setyo and Gede Putra Kusuma.
Recognition of Music Symbol Notation Using Convolutional Neural
Network.
International Journal of Electrical and Computer Engineering,
14 (2): 2055-2067, Apr 2024.
ISSN 2088-8708.
[ bib |
DOI |
http ]
Musical notation is one thing that needs to be learned to play music. This notation has an important role in music because it can help in visualizing instructions for playing musical instruments and singing. Unfortunately, musical symbols that are commonly written in musical notation are difficult for beginners who have just started learning music. This research proposed a solution to create an optical music recognition (OMR) using a deep learning model to classify musical notes more accurately with some of the latest convolutional neural network (CNN) architectures. The research was carried out by implementing vision transformer (ViT), CoAtNet-0, and ConvNeXtTiny architecture. The training process was also combined with data augmentation to provide more information for the model to learn. The best accuracy for the Andrea dataset is 98.15% and for the Attwenger dataset is 98.43%.
|
| [Shatri2024] |
Elona Shatri and George Fazekas.
Knowledge Discovery in Optical Music Recognition: Enhancing
Information Retrieval with Instance Segmentation, 2024.
[ bib |
DOI |
arXiv ]
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI, significantly reducing the costs and time of manual transcription. This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music. Unlike Optical Character Recognition (OCR), OMR must handle the intricate semantics of Common Western Music Notation (CWMN), where symbol meanings depend on shape, position, and context. Our approach leverages instance segmentation to manage the density and overlap of musical symbols, facilitating more precise information retrieval from music scores. Evaluations on the DoReMi and MUSCIMA++ datasets demonstrate substantial improvements, with our method achieving a mean Average Precision (mAP) of up to 59.70% in dense symbol environments.
|
| [Tirupati2024] | Nivesara Tirupati, Elona Shatri, and György Fazekas. Crafting Handwritten Notations: Towards Sheet Music Generation. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 50-56, Online, 2024. [ bib | DOI | http ] |
| [Torras2024] |
Pau Torras, Sanket Biswas, and Alicia Fornés.
A Unified Representation Framework for the Evaluation of Optical
Music Recognition Systems.
International Journal on Document Analysis and Recognition, 27
(3): 379-393, 2024a.
[ bib |
DOI |
arXiv ]
Modern-day Optical Music Recognition (OMR) is a fairly fragmented field. Most OMR approaches use datasets that are independent and incompatible with each other, making it difficult to both combine them and compare recognition systems built upon them. In this paper we identify the need for a common music representation language and propose the Music Tree Notation (MTN) format, with the idea to construct a common endpoint for OMR research that allows coordination, reuse of technology and fair evaluation of community efforts. This format represents music as a set of primitives that group together into higher-abstraction nodes, a compromise between the expression of fully graph-based and sequential notation formats. We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.
|
| [Torras2024b] | Pau Torras, Sanket Biswas, and Alicia Fornés. On Designing a Representation for the Evaluation of Optical Music Recognition Systems. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 23-27, Online, 2024b. [ bib | DOI | http ] |
| [Tuggener2024] |
Lukas Tuggener, Raphael Emberger, Adhiraj Ghosh, Pascal Sager, Yvan Putra
Satyawan, Javier Montoya, Simon Goldschagg, Florian Seibold, Urs Gut, Philipp
Ackermann, Jürgen Schmidhuber, and Thilo Stadelmann.
Real World Music Object Recognition.
Transactions of the International Society for Music Information
Retrieval, 7 (1): 1-14, 2024.
[ bib |
DOI ]
Music object recognition, the task of detecting and classifying notation symbols in images of music scores, is a core challenge in Optical Music Recognition (OMR). While significant progress has been made on synthetic and clean datasets, performance on real-world scanned and photographed scores remains limited. This paper presents a large-scale study of music object recognition under real-world conditions, introducing new benchmarks and evaluating state-of-the-art object detection models.
|
| [Umbreit2024] | Janosch Umbreit and Silvana Schumann. OMR on Early Music Sources at the Bavarian State Library with MuRET - Prototyping, Automating, Scaling. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 6th International Workshop on Reading Music Systems, pages 43-45, Online, 2024. [ bib | DOI | http ] |
| [Villarreal2024] |
Manuel Villarreal and Joan Andreu Sánchez.
Enhancing Recognition of Historical Musical Pieces with Synthetic and
Composed Images.
In Document Analysis and Recognition - ICDAR 2024: 18th
International Conference, Athens, Greece, August 30-September 4, 2024,
Proceedings, Part III, volume 14806 of Lecture Notes in Computer
Science, pages 74-90. Springer, 2024.
ISBN 978-3-031-70542-7.
[ bib |
DOI ]
Handwritten Music Recognition (HMR) poses the problem of transcribing historical musical pieces from digital image to text. The vast number of untranscribed pieces, together with the scarcity of manually annotated training data renders the manual transcription impractical. Historical musical pieces of particular interest are those dating back to the XVth century and earlier, available only in their original manuscripts. Current state-of-the-art approaches leverage Convolutional and Recurrent Neural Networks (CRNN) due to their effectiveness in processing information without relying on extensive datasets. This paper addresses the data scarcity challenge in HMR by proposing two approaches. Firstly, the utilization of synthetic images to augment the training data, leveraging its successful applications in Handwritten Text Recognition (HTR). Secondly, the paper advocates for image composition, combining the images from a manuscript page to mitigate the contextual limitations associated with single-line processing.
|
| [Xia2024] |
Yingqi Xia and Ying Zhao.
Research on Music Symbol Recognition Model Based on YOLOv8s.
In Fourth International Conference on Computer Graphics, Image,
and Virtualization (ICCGIV 2024), 2024.
[ bib |
DOI ]
This paper introduces YOLOv8 object detection algorithms into music symbol recognition and proposes an improved model named YOLO-Score based on YOLOv8s. The model incorporates SPD-Conv into the backbone feature network, an LSK selective attention mechanism, a redesigned detection layer with a small target detection branch, and Shape-IoU as the bounding box regression loss function. Experimental results show an 11.2% increase in precision, a 33.0% increase in recall, and a 26.6% increase in mAP compared to baseline YOLOv8s.
|
| [Yang2024] |
Guang Yang, Muru Zhang, Lin Qiu, Yanming Wan, and Noah A. Smith.
Toward a More Complete OMR Solution, 2024.
[ bib |
DOI |
arXiv ]
Optical music recognition (OMR) aims to convert music notation into digital formats. One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image (object detection) and then assembles them into a music notation (notation assembly). Most previous work on notation assembly unrealistically assumes perfect object detection. In this study, we focus on the MUSCIMA++ v2.0 dataset, which represents musical notation as a graph with pairwise relationships among detected music objects, and we consider both stages together. First, we introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
|
| [Yesilkanat2024] | Ali Yesilkanat, Yann Soullard, Bertrand Coüasnon, and Nathalie Girard. Full-Page Music Symbols Recognition: State-of-the-Art Deep Model Comparison for Handwritten and Printed Music Scores. In Document Analysis Systems - DAS 2024, Lecture Notes in Computer Science, pages 327-343, Cham, 2024. Springer. ISBN 9783031704413. [ bib | DOI ] |
| [Yu2024] |
Ping Yu and HaiLing Chen.
Deep Multilevel Cascade Residual Recurrent Framework (MCRR) for
Sheet Music Recognition.
IEEE Access, 12: 6941-6960, 2024.
[ bib |
DOI |
http ]
Sheet music recognition is a vital technology aimed at converting printed or handwritten musical scores into digital or machine-readable formats. The significance of this technology lies in making music compositions more accessible for editing, performance, learning, and sharing, thereby fostering music education, composition, and culture. It also provides a powerful tool for music analysis, research, and preservation. Our aim is to investigate a sheet music recognition method that offers a simple workflow, high recognition accuracy, and fast model convergence. Specifically, the proposed Deep Multilevel Cascade Residual Recurrent (MCRR) framework for sheet music recognition consists of the following components. Firstly, we introduce additive Gaussian white noise, additive Perlin noise, and elastic deformations such as rotation and stretching to simulate real-world noise in the sheet music images, thereby augmenting the dataset, enhancing model robustness, and mitigating overfitting. Secondly, in the feature extraction phase, we employ a residual Convolutional Neural Network (ConvNet) to address the issue of model degradation and use the multilevel cascade fusion technique to obtain comprehensive feature information, improving the model’s feature extraction capability and reducing recognition errors. For note recognition, we use a variant of RNN (Recurrent Neural Network) called SRU (Simple Recurrent Unit), which transforms most computations into parallel processing, speeding up model convergence. Finally, we combine the Connectionist Temporal Classification (CTC) loss function with SRU to eliminate the requirement for strict alignment between data and labels, enabling note classification and recognition. Extensive ablation experiments and comparative analyses, including visual analysis, intuitive illustrations, and quantitative assessments, confirm the effectiveness of the proposed method, demonstrating its superiority over various state-of-the-art methods. 
The proposed method achieved promising results in both the PrIMuS and Camera-PrIMuS datasets. Specifically, in the PrIMuS dataset, the method obtained an SeER (Symbol Error Rate) of 1.4571% and a SyER (System Error Rate) of 0.3234%. Notably, it demonstrated high accuracy in pitch, type, and note recognition, scoring approximately 97% in pitch and type accuracy and around 94% in note accuracy. The training time per epoch was relatively low, recorded at 0.56 seconds. In the case of the Camera-PrIMuS dataset, the method achieved slightly lower but still competitive results. It exhibited an SeER of 5.1488% and a SyER of 1.0612%, with pitch and type accuracies around 90%, and note accuracy at approximately 88%. The training time per epoch was slightly higher at 1.93 seconds. Furthermore, we compare our method with existing commercial software, namely Capella-scan, PhotoScore, and SmartScore. Among these, Capella-scan delivers the best performance but exhibits lower robustness compared to the proposed method.
|
| [RosVila2023] |
Antonio Ríos-Vila, David Rizo, José Manuel Iñesta, and Jorge
Calvo-Zaragoza.
End-to-End Optical Music Recognition for Pianoform Sheet Music.
International Journal on Document Analysis and Recognition,
26: 347-362, May 2023.
[ bib |
DOI ]
End-to-end solutions have brought about significant advances in the field of Optical Music Recognition. These approaches directly provide the symbolic representation of a given image of a musical score. Despite this, several documents, such as pianoform musical scores, cannot yet benefit from these solutions since their structural complexity does not allow their effective transcription. This paper presents a neural method whose objective is to transcribe these musical scores in an end-to-end fashion. We also introduce the GrandStaff dataset, which contains 53,882 single-system piano scores in common western modern notation. The sources are encoded in both a standard digital music representation and its adaptation for current transcription technologies. The method proposed in this paper is trained and evaluated using this dataset. The results show that the approach presented is, for the first time, able to effectively transcribe pianoform notation in an end-to-end manner.
|
| [AlfaroContreras2023] | María Alfaro-Contreras. Few-Shot Music Symbol Classification via Self-Supervised Learning and Nearest Neighbor. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 39-43, Milan, Italy, 2023. [ bib | DOI | http ] |
| [Ayllon2023] |
Eva Ayllon, Francisco J. Castellanos, and Jorge Calvo-Zaragoza.
A Weakly-Supervised Approach for Layout Analysis in Music Score
Images.
In Pattern Recognition and Image Analysis: 11th Iberian
Conference, IbPRIA 2023, volume 14062 of Lecture Notes in Computer
Science, pages 170-181. Springer, Jun 2023.
[ bib |
DOI ]
In this paper, we propose a data-efficient holistic method for layout analysis in the context of Optical Music Recognition (OMR). Our approach can be trained by just providing the number of staves present in the document collection at issue (weak label), thereby making it practical for real use cases where other fine-grained annotations are expensive. We consider a Convolutional Recurrent Neural Network trained with the Connectionist Temporal Classification loss function, which must retrieve a pretext sequence that encodes the number of staves per page. As a by-product, the model learns to relate every image row according to the presence or not of a staff. We demonstrate that our approach achieves performances close to the fully supervised scenario on two OMR benchmarks, according to the eventual performance of the full transcription pipeline. We believe that our work will be useful for researchers working on music score recognition, and will open up new avenues for research in this field.
|
| [CalvoZaragoza2023] |
Jorge Calvo-Zaragoza, Juan C. Martínez-Sevilla, Carlos Peñarrubia, and
Antonio Ríos-Vila.
Optical Music Recognition: Recent Advances, Current Challenges, and
Future Directions.
In Document Analysis and Recognition - ICDAR 2023
Workshops, volume 14193 of Lecture Notes in Computer Science, pages
94-104. Springer, Aug 2023.
ISBN 978-3-031-41497-8.
[ bib |
DOI ]
Optical Music Recognition (OMR) is an interdisciplinary field that aims to automate the process of transcribing sheet music into a digital format. Over the past few years, significant progress has been made in developing OMR systems that can recognize musical symbols with high accuracy. However, completing the pipeline of OMR remains a challenging endeavor due to the complexity and variability of music notation, and there are several open challenges that need to be addressed. In this position paper, we provide an overview of the current state-of-the-art in OMR through the two main lines of research. We include the problems that have been recently addressed and the techniques that have been considered. We then identify the key challenges that remain, such as learning to reconstruct the music notation, recognizing multiple voices, or dealing with artifacts such as lyrics. Finally, we suggest some possible directions for future research. We argue that addressing these challenges is crucial to making OMR a more practical and useful tool for musicians, scholars, and librarians alike.
|
| [Castellanos2023] | Francisco J. Castellanos, Antonio Javier Gallego, and Ichiro Fujinaga. A Preliminary Study of Few-shot Learning for Layout Analysis of Music Scores. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 44-48, Milan, Italy, 2023. [ bib | DOI | http ] |
| [Ferano2023] |
Francisco Calvin Arnel Ferano, Amalia Zahra, and Gede Putra Kusuma.
Stacking Ensemble Learning for Optical Music Recognition.
Bulletin of Electrical Engineering and Informatics, 12 (5):
3095-3104, Oct 2023.
[ bib |
DOI |
http ]
The development of music culture has resulted in a problem called optical music recognition (OMR). OMR is a task in computer vision that explores the algorithms and models to recognize musical notation. This study proposed the stacking ensemble learning model to complete the OMR task using the common western musical notation (CWMN). The ensemble learning model used four deep convolutional neural network (DCNN) models, namely ResNeXt50, Inception-V3, RegNetY-400MF, and [...]
|
| [Fujinaga2023] | Ichiro Fujinaga and Gabriel Vigliensoni. Optical Music Recognition Workflow for Medieval Music Manuscripts. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 4-6, Milan, Italy, 2023. [ bib | DOI | http ] |
| [Hajic2023] | Jan Hajič jr., Petr Žabička, Jan Rychtář, Jiří Mayer, Martina Dvořáková, Filip Jebavý, Markéta Vlková, and Pavel Pecina. The OmniOMR Project. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 12-14, Milan, Italy, 2023. [ bib | DOI | http ] |
| [Hande2023] | Pranjali Hande, Elona Shatri, Benjamin Timms, and György Fazekas. Towards Artificially Generated Handwritten Sheet Music Datasets. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 25-30, Milan, Italy, 2023. [ bib | DOI | http ] |
| [Havelka2023] | Jonáš Havelka, Jiří Mayer, and Pavel Pecina. Symbol Generation via Autoencoders for Handwritten Music Synthesis. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 20-24, Milan, Italy, 2023. [ bib | DOI | http ] |
| [He2023] |
Ruichen He and Junfeng Yao.
End-to-End Optical Music Recognition with Attention Mechanism and
Memory Units Optimization.
In Pattern Recognition and Computer Vision - PRCV 2023,
volume 14432 of Lecture Notes in Computer Science, pages 400-411,
Singapore, Dec 2023. Springer.
[ bib |
DOI ]
Optical Music Recognition (OMR) is a research field aimed at exploring how computers can read sheet music in music documents. In this paper, we propose an end-to-end OMR model based on memory units optimization and attention mechanisms, named ATTML. Firstly, we replace the original LSTM memory unit with a better Mogrifier LSTM memory unit, which enables the input and hidden states to interact fully and obtain better context-related expressions. Meanwhile, the decoder part is augmented with the ECA attention mechanism, enabling the model to better focus on salient features and patterns present in the input data. We use the existing excellent music datasets, PrIMuS, DoReMi, and DeepScores, for joint training. Ablation experiments were conducted in our study with the incorporation of diverse attention mechanisms and memory optimization units. Furthermore, we used the musical score density metric, SnSl, to measure the superiority of our model over others, as well as its performance specifically in dense musical scores. Comparative and ablation experiment results show that the proposed method outperforms previous state-of-the-art methods in terms of accuracy and robustness.
|
| [Janmohamed2023] | Nashir A. Janmohamed. Musical Form Reconstruction in Printed and Handwritten Lead Sheets via Optical Recognition of Chord Symbols. Honors undergraduate thesis, University of Central Florida, 2023. [ bib | http ] |
| [Li2023] |
Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, and Peng Li.
TrOMR: Transformer-Based Polyphonic Optical Music Recognition.
In ICASSP 2023 - 2023 IEEE International Conference on
Acoustics, Speech and Signal Processing, pages 1-5. IEEE, 2023.
[ bib |
DOI |
arXiv ]
Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches for OMR are usually based on CNN for image understanding and RNN for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a camera scene dataset for full-page music scores in real-world. The code and datasets will be made available for reproducibility.
|
| [Lou2023] |
Fengbin Lou, Yaling Lu, and Guangyu Wang.
Design of a Semantic Understanding System for Optical Staff Symbols.
Applied Sciences, 13 (23): 12627, Nov 2023.
[ bib |
DOI |
http ]
Symbolic semantic understanding of staff images is an important technological support to achieve “intelligent score flipping”. Due to the complex composition of staff symbols and the strong semantic correlation between symbol spaces, it is difficult to understand the pitch and duration of each note when the staff is performed. In this paper, we design a semantic understanding system for optical staff symbols. The system uses the YOLOv5 to implement the optical staff’s low-level semantic understanding stage, which understands the pitch and duration in natural scales and other symbols that affect the pitch and duration. The proposed note encoding reconstruction algorithm is used to implement the high-level semantic understanding stage. Such an algorithm understands the logical, spatial, and temporal relationships between natural scales and other symbols based on music theory and outputs digital codes for the pitch and duration of the main notes during performances. The model is trained with a self-constructed SUSN dataset. Experimental results with YOLOv5 show that the precision is 0.989 and that the recall is 0.972. The system’s error rate is 0.031, and the omission rate is 0.021. The paper concludes by analyzing the causes of semantic understanding errors and offers recommendations for further research. The results of this paper provide a method for multimodal music artificial intelligence applications such as notation recognition through listening, intelligent score flipping, and automatic performance.
|
| [MartinezSevilla2023] | Juan Carlos Martinez-Sevilla and Francisco J. Castellanos. Towards Music Notation and Lyrics Alignment: Gregorian Chants as Case Study. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 15-19, Milan, Italy, 2023. [ bib | DOI | http ] |
| [MartinezSevilla2023b] | Juan Carlos Martínez-Sevilla. SheetVision. Aplicación para el reconocimiento óptico de música. Master's thesis, Universitat d'Alacant (University of Alicante), Jun 2023. [ bib | http ] |
| [Penarrubia2023] |
Carlos Peñarrubia, Carlos Garrido-Muñoz, Jose J. Valero-Mas, and Jorge
Calvo-Zaragoza.
Efficient Notation Assembly in Optical Music Recognition.
In Proceedings of the 24th International Society for Music
Information Retrieval Conference, pages 182-189, Milan, Italy, Nov 2023.
[ bib |
.pdf ]
Notation assembly is the last stage of a modular Optical Music Recognition (OMR) pipeline, responsible for converting the detected musical objects into a structured notation format. This work presents an efficient approach for notation assembly that leverages graph-based representations and heuristic rules to reconstruct the musical content from detected symbols.
|
| [Repolusk2023] | Tristan Repolusk and Eduardo Veas. The Suzipu Musical Annotation Tool for the Creation of Machine-Readable Datasets of Ancient Chinese Music. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 7-11, Milan, Italy, 2023. [ bib | DOI | http ] |
| [RiosVila2023] | Antonio Ríos-Vila. Rotations Are All You Need: A Generic Method For End-To-End Optical Music Recognition. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 34-38, Milan, Italy, 2023. [ bib | DOI | http ] |
| [Vania2023] |
Stella Vania, Patrick Sutanto, Ricky Sutanto, and Joan Santoso.
Ekstraksi Partitur Balok Monofonik untuk Instrumen Flute dengan
CRNN dan CRF.
INSYST: Journal of Intelligent System and Computation, 5
(1): 01-09, Apr 2023.
[ bib |
DOI |
http ]
Staff notation is not easy for beginners in music to read. This is where Optical Music Recognition (OMR) can play a role. OMR is the study of how computers can recognize objects in staff notation. A program that applies OMR and presents its output in a format that users can easily understand can help beginners read staff notation. This work takes a deep learning approach with several architectures. The dataset used is Camera-PrIMuS, which consists of images of single-staff music excerpts together with per-object ground truth for each image. The architectures used are CRNN, CRNN-CRF, and Attention. Of the three, the best results were obtained with the Attention architecture, with a symbol error rate (SER) of about 9%, followed by CRNN with an SER of about 84%; according to the experiments, CRNN-CRF is not suitable for OMR, as its loss value never decreased during training. The Attention architecture broadly consists of an encoder block and a decoder block. The encoder receives the input image and encodes it. The encoding is then passed to the decoder, which decodes it and predicts the next sequence element based on the encoder's output. In the implementation, the program can accept an image of a full, slightly skewed score page as input, so it also performs skew correction and cuts the image into individual lines so that the user's input can be processed by the model. The model's output, which is still in the form of predicted labels, is post-processed to produce numbered notation and a MIDI file, which are relatively easier for users to understand.
|
| [Villarreal2023] |
Manuel Villarreal and Joan Andreu Sánchez.
Synchronous Recognition of Music Images Using Coupled N-Gram
Models.
In Proceedings of the ACM Symposium on Document Engineering
2023, pages 1-9. ACM, Aug 2023.
[ bib |
DOI ]
Handwritten music recognition researches the use of technologies to automatically transcribe handwritten music pieces that are only found in image format, and make them available to the general public. Many historical music pieces are composed by a music part and a lyrics part. Handwritten music recognition has focused mainly on transcribing the music elements in historical images, but there exist many pieces where both music and lyrics are present and of relevance. The recognition of both music and lyrics is generally carried out as separate tasks. Both parts are synchronized in many historical documents at line level and loosely at word level. These two elements are strongly related having each one affecting the other. Discovering this relation may be very relevant to improve recognition results in both parts and to further steps like music analysis, composition analysis, etc. This paper introduces a preliminary system that transcribes synchronously and simultaneously both the music and lyrics elements of handwritten historical music images. The results obtained over a historical manuscript dataset show that this system obtains an improvement of up to 15.4% at symbol rate on stave recognition and up to an approximately average 7.6% improvement when both the music and lyrics part are jointly considered.
|
| [Zhang2023] | Zihui Zhang, Elona Shatri, and György Fazekas. Improving Sheet Music Recognition using Data Augmentation and Image Enhancement. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 5th International Workshop on Reading Music Systems, pages 31-33, Milan, Italy, 2023. [ bib | DOI | http ] |
| [AlfaroContreras2022] |
María Alfaro-Contreras, Antonio Ríos-Vila, José J. Valero-Mas,
José M. Iñesta, and Jorge Calvo-Zaragoza.
Decoupling music notation to improve end-to-end Optical Music
Recognition.
Pattern Recognition Letters, 158: 157-163, 2022.
ISSN 0167-8655.
[ bib |
DOI |
http ]
This paper exploits the two-dimensional nature of music notation symbols to improve end-to-end OMR. Two CRNN schemes simultaneously exploit shape and height information of notation elements, merging them at the neural level. Three integration policies are evaluated, with the InterRNN approach (merging after the first recurrent layer) producing significantly better recognition rates than the baseline.
|
| [Barbosa2022] |
JA Barbosa and EB dos Santos.
Convolutional Neural Networks and Ensemble Methods to Identify
Musical Elements in Optical Music Recognition.
Revista Eletrônica de Iniciação Científica
em Computação, 2022.
[ bib |
http ]
Optical Music Recognition (OMR) is an important tool to recognize a scanned page of music sheet automatically, which has been applied to preserving music scores. In this paper, we present a comparative study among a Convolutional Neural Network (CNN) architecture, named CREATES, and Ensemble Learning methods, such as Random Forest and XGBoost, to classify musical symbols. The initial results show that CREATES is promising in this task and outperforms ensemble methods on the HOMUS dataset. However, CNNs require more computing power.
|
| [Baro2022] |
Arnau Baró, Pau Riba, and Alicia Fornés.
Musigraph: Optical Music Recognition Through Object Detection and
Graph Neural Network.
In Proceedings of the 18th International Conference on
Frontiers in Handwriting Recognition (ICFHR), volume 13639 of
Lecture Notes in Computer Science, pages 171-184. Springer, 2022.
[ bib |
DOI ]
During the last decades, the performance of optical music recognition has been increasingly improving. However, and despite the 2-dimensional nature of music notation (e.g., notes have rhythm and pitch), most works treat musical scores as a sequence of symbols in one dimension, which make their recognition still a challenge. Thus, in this work we explore the use of graph neural networks for musical score recognition. First, because graphs are suited for n-dimensional representations, and second, because the combination of graphs with deep learning has shown a great performance in similar applications. Our methodology consists of: First, we will detect each isolated / atomic symbols (those that can not be decomposed in more graphical primitives) and the primitives that form a musical symbol. Then, we will build the graph taking as root node the notehead and as leaves those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for a final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results.
|
| [Beltrn2022] | V Beltrán, M Coustaty, R Agresta, and A Doucet. Weighting Sliding Tiles for Writer Identification in Handwritten Musical Scores. In 2022 IEEE International Conference on Image Processing (ICIP), 2022. [ bib | DOI | http ] |
| [De2022] |
FF Fernández de Vega, J Alvarado, and J Villegas Cortez.
Optical Music Recognition and Deep Learning: An Application to
4-Part Harmony.
In 2022 IEEE Congress on Evolutionary Computation (CEC),
2022.
[ bib |
DOI |
http ]
Optical Music Recognition (OMR) applied to hand-written scores is widely recognized as a hard real-world problem. The number of different symbols that must be recognized in a score, such as the key, time signatures, tempo, dynamics, notes, alterations, duration, etc, as well as the different meanings some symbols may embody depending on the position in the score, such as a quarter that may mean notes C, D, E, …, makes OMR a much harder challenge than Optical Character Recognition (OCR), particularly when dealing with handwritten scores for Computational Intelligence (CI) methods. This paper addresses this hard problem using deep learning based approaches, specifically Mask R-CNN, in a specific context: music students that write their scores in a ruled paper notebook when learning 4-part harmony. Preliminary results show that high accuracy levels are obtained, both during training+validation and also during tests, and this allows us to foresee new tools for students that could be combined with available CI methods for 4-part harmony learning.
|
| [Egozy2022] | Eran Egozy and Ian Clester. Computer-Assisted Measure Detection in a Music Score-Following Application. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 33-36, Online, 2022. [ bib | DOI | http ] |
| [GarridoMunoz2022] | Carlos Garrido-Munoz, Antonio Ríos-Vila, and Jorge Calvo-Zaragoza. End-to-End Graph Prediction for Optical Music Recognition. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 25-28, Online, 2022. [ bib | DOI | http ] |
| [Gut2022] | Urs Gut. Confidence-Rated Predictions from Deep Learning Ensembles for Music Object Detection, 2022. Student thesis, ZHAW School of Engineering, Zurich University of Applied Sciences. Contributed to the RealScore project on optical music recognition for real-world sheet music. [ bib ] |
| [Hartelt2022] |
Alexander Hartelt and Frank Puppe.
Optical Medieval Music Recognition Using Background Knowledge.
Algorithms, 15 (7): 221, Jun 2022.
ISSN 1999-4893.
[ bib |
DOI |
http ]
This paper presents an OMR pipeline for transcribing medieval, monophonic, handwritten music from the 12th-14th century. Various types of background knowledge about overlapping notes and text, clefs, graphical connections (neumes), and their implications for note position on the staff are incorporated and evaluated within a deep learning framework.
|
| [Jacquemard2022] | Florent Jacquemard, Lydia Rodriguez-de la Nava, and Martin Digard. Automated Transcription of Electronic Drumkits. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 37-41, Online, 2022. [ bib | DOI | http ] |
| [Li2022] |
Na Li.
Generative Adversarial Network for Musical Notation Recognition
during Music Teaching.
Computational Intelligence and Neuroscience, 2022: 1-9, 2022.
ISSN 1687-5265.
[ bib |
DOI ]
This paper improves a generative adversarial network (GAN) to enhance the recognition accuracy and efficiency of music short scores for automated music notation teaching. Using an embedded matching structure based on adversarial neural networks, the system unifies generators and discriminators from the note input side, achieving improved recognition of printed music notation.
|
| [Liang2022] |
Mingheng Liang.
Music Score Recognition and Composition Application Based on Deep
Learning.
Mathematical Problems in Engineering, 2022: 1-9, Jun 2022.
ISSN 1024-123X.
[ bib |
DOI ]
This paper proposes a deep learning-based music score recognition model that accepts a complete score image as input and outputs note time values and pitch directly. The model addresses low note recognition accuracy in prior work and achieves note identification accuracy of 0.95 for time values and 0.97 for pitch.
|
| [Mayer2022] | Jiří Mayer and Pavel Pecina. Obstacles with Synthesizing Training Data for OMR. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 15-19, Online, 2022. [ bib | DOI | http ] |
| [Moss2022] | Fabian C. Moss, Néstor Nápoles López, Maik Köster, and David Rizo. Challenging sources: a new dataset for OMR of diverse 19th-century music theory examples. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 4-8, Online, 2022. [ bib | DOI | http ] |
| [Paul2022] |
Ashis Paul, Rishav Pramanik, Samir Malakar, and Ram Sarkar.
An ensemble of deep transfer learning models for handwritten music
symbol recognition.
Neural Computing and Applications, 34 (13): 10409-10427,
2022.
[ bib |
DOI ]
In ancient times, there was no system to record or document music. A basic notation system to write European music was formulated around 14th century in the Baroque period which slowly evolved into the standard notation system that we have today. Later, the musical pieces from the classical and post classical period of European music were documented as scores using this standard European staff notations. These notations are used by most of the modern genres of music due to their versatility. Hence, it is very important to develop a method that can store such music sheets containing handwritten music scores digitally. Optical music recognition (OMR) is a system that automatically interprets the scanned handwritten music scores. In this work, we have proposed a classifier ensemble of deep transfer learning models with support vector machine (SVM) as the aggregator for handwritten music symbol recognition. We have applied three pre-trained deep learning models, namely ResNet50, GoogleNet and DenseNet161 (each trained on ImageNet), and fine tuned on our target datasets i.e., music symbol image datasets. The proposed ensemble technique can capture a more complex association of the base classifiers, thus improving the overall performance. We have evaluated the proposed model on five publicly available standard datasets, namely Handwritten Online Music Symbols (HOMUS), Capitan_Score_Uniform, Capitan_Score_Non-uniform, Rebelo_real and Fornés, and achieved state-of-the-art results for all these datasets. Additionally, we have evaluated our model on publicly available two non-music symbols datasets, namely CMATERdb 2.1.2 containing 120 handwritten Bangla city names and CMATERdb 3.1.1 dataset containing handwritten Bangla numerals to validate its effectiveness on diversified datasets. The source code of this present work is available at https://github.com/ashis0013/ Music Symbol Recognition.
|
| [Penarrubia2022] | Carlos Penarrubia, Carlos Garrido-Muñoz, Jose J. Valero-Mas, and Jorge Calvo-Zaragoza. Efficient Approaches for Notation Assembly in Optical Music Recognition. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 29-32, Online, 2022. [ bib | DOI | http ] |
| [RiosVila2022] | Antonio Ríos-Vila, Jose M. Iñesta, and Jorge Calvo-Zaragoza. End-To-End Full-Page Optical Music Recognition of Monophonic Documents via Score Unfolding. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 20-24, Online, 2022. [ bib | DOI | http ] |
| [RosVila2022] | A Ríos-Vila, JM Iñesta, and J Calvo-Zaragoza. End-to-End Full-Page Optical Music Recognition for Mensural Notation. In Proceedings of the 23rd International Society for Music Information Retrieval Conference, pages 226-232, 2022. [ bib | DOI | .pdf ] |
| [Santamaria2022] |
Gonzalo Santamaría, Carlos Domínguez, Jónathan Heras, Eloy J. Mata,
and Vico Pascual.
Combining Image Processing Techniques, OCR, and OMR for the
Digitization of Musical Books.
In Document Analysis Systems, volume 13237 of Lecture
Notes in Computer Science, pages 553-567. Springer, Cham, 2022.
ISBN 978-3-031-06554-5.
[ bib |
DOI ]
Digitizing historical music books is challenging because staves are usually mixed with typewritten text. This paper proposes a methodology using image processing to detect and organize text and stave blocks, then applying OCR and OMR methods respectively, with information stored in MusicXML format. The methodology was applied to digitize the book “The Music in the Santo Domingo's Cathedral”, yielding F1-score of 90% for symbol detection and 98.4% pitch accuracy.
|
| [Song2022] |
Y Song, Y Shen, P Ding, X Zhang, X Shi, and Y Xue.
Optical Music Recognition Based Deep Neural Networks.
In Signal and Information Processing, Networking and Computers
(ICSINC 2021), volume 895 of Lecture Notes in Electrical
Engineering. Springer, 2022.
[ bib |
DOI ]
In this paper, it is proposed that an approach for the task of optical music recognition (OMR) is based on deep neural networks. Recognition of notation is a very important analysis content of optical score recognition. In order to work out the dilemma of complex processing and low recognition accuracy of OMR, the design based on YOLOv5 which is a peer to peer printed music recognition model is proposed. Multi-task learning is taken by the deep learning model, such as tasks and considerations for pitch and duration classification, to improve the abstraction capability. With the introduced augmentations, 97 of pitch recognition accuracy and 86 of duration accuracy are obtained and note recognition accuracy is higher than other music score recognition models. Finally, when competed with traditional methods, the experimental verification shows that the method is promising.
|
| [Torras2022] | Pau Torras, Arnau Baró, Lei Kang, and Alicia Fornés. Improving Handwritten Music Recognition through Language Model Integration. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, Online, 2022. [ bib | DOI | http ] |
| [Walwadkar2022] | Dnyanesh Walwadkar, Elona Shatri, Benjamin Timms, and György Fazekas. CompIdNet: Sheet Music Composer Identification using Deep Neural Network. In Jorge Calvo-Zaragoza, Alexander Pacha, and Elona Shatri, editors, Proceedings of the 4th International Workshop on Reading Music Systems, pages 9-14, Online, 2022. [ bib | DOI | http ] |
| [Wen2022] |
C Wen and L Zhu.
A Sequence-to-Sequence Framework Based on Transformer with Masked
Language Model for Optical Music Recognition.
IEEE Access, 10: 118243-118252, 2022.
[ bib |
DOI |
http ]
Optical music recognition technology is of great significance in the development of the digital music. In recent years, the convolutional recurrent neural network framework with connectionist temporal classification has been used in music recognition. However, its loss function is calculated in serial mode, which leads to low efficiency in training and difficulty in convergence. Additionally, because of the gradient disappearance of excessive long music sequences, the existing music recognition models are hard to learn the relationships between musical symbols, resulting in high sequence error rate. Therefore, we propose a sequence-to-sequence framework based on transformer with masked language model to deal with these problems. The context representation between musical symbols can be captured further by the self-attention module in the transformer, which will reduce the sequence error rate. In addition, we refer to the masked language model and design a mask matrix to predict each musical symbol in a parallel way, so as to speed up the training process. Our experiments are carried out on the printed images of music stave dataset, and the results show that our proposed method is training-efficient and has great improvement in sequence accuracy rate.
|
| [Xiao2022] |
Z Xiao, X Chen, and L Zhou.
A Multi-layer Image Operator Learning Based on Sample Structure for
Staff Lines Removal.
Applied Intelligence, 53: 8436-8452, 2022.
[ bib |
DOI ]
The removal of staff lines is the most significant step to separate notes from the score images in optical music recognition (OMR). However, musical images are often affected by different deformations, and it is difficult to delete the staff lines completely without affecting the integrity of the notes. A novel multi-layer image operator learning algorithm based on sample structure is proposed in this paper to solve the problem of staff lines deletion. Our algorithm is dedicated to obtain the structural characteristics of staff lines via image operator learning. Firstly, an iterative strategy is proposed to update the distribution of the samples for learning multiple image operators with different sample structure features. Further, based on the learned image operators, a multi-layer image operators network is designed to obtain the optimal combination of multiple operators. Finally, we have verified the feasibility of our algorithm on the 2013 ICDAR/GREC staff lines removal competition data set. The experiment shows that the proposed algorithm is robust against many kinds of deformation. Moreover, our algorithm is more competitive by comparing with state of the art algorithms.
|
| [AlfaroContreras2021] | María Alfaro-Contreras, Jose J. Valero-Mas, and José Manuel Iñesta. Neural architectures for exploiting the components of Agnostic Notation in Optical Music Recognition. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 33-37, Alicante, Spain, 2021. [ bib | http ] |
| [Baro2021] | Arnau Baró, Carles Badal, Pau Torras, and Alicia Fornés. Handwritten Historical Music Recognition through Sequence-to-Sequence with Attention Mechanism. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 55-59, Alicante, Spain, 2021. [ bib | http ] |
| [Bhumichitr2021] |
Kiratijuta Bhumichitr, Menh Keo, and Aung Khant Oo.
Musical Pitch Alphabets Generator Using Haar-like Feature.
In 2021 18th International Joint Conference on Computer Science
and Software Engineering (JCSSE), pages 1-5. IEEE, Jul 2021.
[ bib |
DOI |
http ]
Optical Music Recognition (OMR) has become a study trend with the increasing demand for digital sheet music. In this paper, we explore techniques and algorithms to implement optical music recognition. This paper aims to encourage people who just begin and enjoy learning object detection by using a simple and comprehensible framework called Haar-like Feature to detect the music notation. Furthermore, it also assists beginner musicians who have a difficult time in memorizing the music theory and rules by generating musical alphabets. The paper will include the process of how to generate the cascade classifier model and how to apply it to detect the target object.
|
| [Braae2021] |
David O. Braae and Thomas R. Rusbjerg.
DETR for Combined Object Detection and Notation Assembly in
Optical Music Recognition.
Master's thesis, Aalborg University, 2021.
[ bib |
.pdf ]
This thesis investigates the use of DETR (Detection Transformer) for combined object detection and notation assembly in optical music recognition, proposing an end-to-end transformer-based approach that simultaneously detects music objects and predicts their relationships.
|
| [Castellanos2021] | Francisco J. Castellanos and Antonio-Javier Gallego. Unsupervised Neural Document Analysis for Music Score Images. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 50-54, Alicante, Spain, 2021. [ bib | http ] |
| [Echevarria2021] |
Daniel Echevarría Naharro.
Sistema de reconocimiento de partituras musicales y generación de
archivos sonoros.
Master's thesis, Universitat Autònoma de Barcelona, Escola
d'Enginyeria, 2021.
[ bib |
http ]
Final degree project (TFG) implementing an optical music recognition system using a Sequence-to-Sequence model that recognizes musical scores from images and generates audio output files directly, without intermediate steps. Directed by Alicia Fornés Bisquerra.
|
| [Edirisooriya2021] |
Sachinda Edirisooriya, Hao-Wen Dong, Julian McAuley, and Taylor
Berg-Kirkpatrick.
An Empirical Evaluation of End-to-End Polyphonic Optical Music
Recognition.
In Proceedings of the 22nd International Society for Music
Information Retrieval Conference, pages 167-173, Nov 2021.
[ bib |
DOI |
arXiv ]
Previous work has shown that neural architectures are able to perform optical music recognition (OMR) on monophonic and homophonic music with high accuracy. However, piano and orchestral scores frequently exhibit polyphonic passages, which add a second dimension to the task. Monophonic and homophonic music can be described as homorhythmic, or having a single musical rhythm. Polyphonic music, on the other hand, can be seen as having multiple rhythmic sequences, or voices, concurrently. We first introduce a workflow for creating large-scale polyphonic datasets suitable for end-to-end recognition from sheet music publicly available on the MuseScore forum. We then propose two novel formulations for end-to-end polyphonic OMR: one treating the problem as a type of multi-task binary classification, and the other treating it as multi-sequence detection. Building upon the encoder-decoder architecture and an image encoder proposed in past work on end-to-end OMR, we propose two novel decoder models, FlagDecoder and RNNDecoder, that correspond to the two formulations. Finally, we compare the empirical performance of these end-to-end approaches to polyphonic OMR and observe a new state-of-the-art performance with our multi-sequence detection decoder, RNNDecoder.
|
| [Fuente2021] | Carlos de la Fuente, Jose J. Valero-Mas, Francisco J. Castellanos, and Jorge Calvo-Zaragoza. Multimodal Audio and Image Music Transcription. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 18-22, Alicante, Spain, 2021. [ bib | http ] |
| [Furukawa2021] |
Erick Seiji Furukawa and Hélio Pedrini.
Leitura de partituras em imagens digitais.
Technical Report IC-PFG-20-26, Instituto de Computação,
Universidade Estadual de Campinas (UNICAMP), Jan 2021.
[ bib |
.pdf ]
Final graduation project (Projeto Final de Graduação) studying optical music recognition systems and developing a reader for musical scores in digital images, with the goal of converting score images into computer-readable musical file formats.
|
| [Kletz2021] | Marc Kletz and Alexander Pacha. Detecting Staves and Measures in Music Scores with Deep Learning. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 8-12, Alicante, Spain, 2021. [ bib | http ] |
| [LpezGutirrez2021] |
Juan Carlos López-Gutiérrez, José J. Valero-Mas, Francisco J.
Castellanos, and Jorge Calvo-Zaragoza.
Data Augmentation for End-to-End Optical Music Recognition.
In Document Analysis and Recognition - ICDAR 2021
Workshops, volume 12916 of Lecture Notes in Computer Science, pages
59-73. Springer, Sep 2021.
[ bib |
DOI ]
Optical Music Recognition (OMR) is the research area that studies how to transcribe the content from music documents into a structured digital format. Within this field, techniques based on Deep Learning represent the current state of the art. Nevertheless, their use is constrained by the large amount of labeled data required, which constitutes a relevant issue when dealing with historical manuscripts. This drawback can be palliated by means of Data Augmentation (DA), which encompasses a series of strategies to increase data without the need to manually label new images. This work studies the applicability of specific DA techniques in the context of end-to-end staff-level OMR methods. More precisely, considering two corpora of historical music manuscripts, we applied different types of distortions to the music scores and assessed their contribution in an end-to-end system. Our results show that some transformations are much more appropriate than others, yielding a relative improvement of up to 34.5% with respect to the scenario without DA.
|
| [MasCandela2021] | Enrique Mas-Candela and María Alfaro-Contreras. Sequential Next-Symbol Prediction for Optical Music Recognition. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 13-17, Alicante, Spain, 2021. [ bib | http ] |
| [Mayer2021] |
Jiří Mayer and Pavel Pecina.
Synthesizing Training Data for Handwritten Music Recognition.
In Document Analysis and Recognition - ICDAR 2021, volume
12823 of Lecture Notes in Computer Science, pages 626-641. Springer,
Sep 2021.
[ bib |
DOI ]
Handwritten music recognition is a challenging task that could be of great use if mastered, e.g., to improve the accessibility of archival manuscripts or to ease music composition. Many modern machine learning techniques, however, cannot be easily applied to this task because of the limited availability of high-quality training data. Annotating such data manually is expensive and thus not feasible at the necessary scale. This problem has already been tackled in other fields by training on automatically generated synthetic data. We bring this approach to handwritten music recognition and present a method to generate synthetic handwritten music images (limited to monophonic scores) and show that training on such data leads to state-of-the-art results.
|
| [Pacha2021] | Alexander Pacha. The Challenge of Reconstructing Digits in Music Scores. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 4-7, Alicante, Spain, 2021. [ bib | http ] |
| [RiosVila2021] | Antonio Ríos-Vila, David Rizo, Jorge Calvo-Zaragoza, and José Manuel Iñesta. Completing Optical Music Recognition with Agnostic Transcription and Machine Translation. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 28-32, Alicante, Spain, 2021b. [ bib | http ] |
| [Ros2021] | Antonio Ríos Vila. Development of a Complete Optical Music Recognition Workflow. Máster universitario en ciencia de datos, Universidad de Alicante, Jul 2021. [ bib | http ] |
| [RosVila2021] |
Antonio Ríos-Vila, David Rizo, and Jorge Calvo-Zaragoza.
Complete Optical Music Recognition via Agnostic Transcription and
Machine Translation.
In Document Analysis and Recognition - ICDAR 2021, volume
12823 of Lecture Notes in Computer Science, pages 661-675. Springer,
Sep 2021a.
[ bib |
DOI ]
Optical Music Recognition workflows currently involve several steps to retrieve information from music documents, focusing on image analysis and symbol recognition. However, despite many efforts, there is little research on how to bring these recognition results to practice, as there is still one step that has not been properly discussed: the encoding into standard music formats and its integration within OMR workflows to produce practical results that end users could benefit from. In this paper, we approach this topic and propose options for completing OMR, eventually exporting the score image into a standard digital format. Specifically, we discuss the possibility of attaching Machine Translation systems to the recognition pipeline to perform the encoding step. After discussing the most appropriate systems for the process and proposing two options for the translation, we evaluate its performance in contrast to a direct encoding pipeline. Our results confirm that the proposed addition to the pipeline establishes itself as a feasible and interesting solution for complete OMR processes, especially when limited training data is available, which represents a common scenario in music heritage.
|
| [Samiotis2021] | Ioannis Petros Samiotis, Christoph Lofi, and Alessandro Bozzon. Hybrid Annotation Systems for Music Transcription. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 23-27, Alicante, Spain, 2021. [ bib | http ] |
| [Schneider2021] |
Daniel Schneider, Nikolaus Korfhage, Matthias Mühling, and Peter
Lüttig.
Automatic Transcription of Organ Tablature Music Notation with Deep
Neural Networks.
Transactions of the International Society for Music Information
Retrieval, 4 (1): 14-28, Feb 2021.
ISSN 2514-3298.
[ bib |
DOI ]
This paper presents a deep learning approach for the automatic recognition of New German Organ Tablature in scanned documents, transcribing historical organ tablature notation to modern music notation using convolutional neural networks.
|
| [Shatri2021] | Elona Shatri and György Fazekas. DoReMi: First glance at a universal OMR dataset. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 43-49, Alicante, Spain, 2021. [ bib | http ] |
| [Shishido2021] |
Tomoyuki Shishido, Farida Fati, Daisuke Tokushige, Yuya Ono, and Itsuo
Kumazawa.
Listen to Your Favorite Melodies with img2Mxml, Producing MusicXML
from Sheet Music Image by Measure-based Multimodal Deep Learning-driven
Assembly, Jun 2021.
[ bib |
DOI |
arXiv ]
This paper presents img2Mxml (MMdA), a measure-based multimodal deep learning-driven assembly method for end-to-end optical music recognition that produces MusicXML files from sheet music images, including locally inclined photo images of piano pieces.
|
| [Torras2021] | Pau Torras, Arnau Baró, Lei Kang, and Alicia Fornés. On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition. In Proceedings of the 22nd International Society for Music Information Retrieval Conference, pages 690-696, Nov 2021. [ bib | DOI | http ] |
| [Wang2021] |
Yanfang Wang.
Research on Handwritten Note Recognition in Digital Music Classroom
Based on Deep Learning.
Journal of Internet Technology, 22 (6): 1443-1455, 2021.
[ bib |
DOI ]
Keywords: Digital music classroom, Handwritten note recognition, Deep learning, Gaussian process, Non-parametric estimation. Music is an indispensable subject in quality education, which plays an important role in improving students’ overall quality. Traditional music teaching is mainly a
|
| [Wenzlitschke2021] | Nils Wenzlitschke. Implementation and evaluation of a neural network for the recognition of handwritten melodies. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, Proceedings of the 3rd International Workshop on Reading Music Systems, pages 38-42, Alicante, Spain, 2021. [ bib | http ] |
| [Wick2021] |
Christoph Wick and Frank Puppe.
Experiments and detailed error-analysis of automatic square notation
transcription of medieval music manuscripts using CNN/LSTM-networks and a
neume dictionary.
Journal of New Music Research, 50 (1): 18-36, Jan 2021.
ISSN 0929-8215.
[ bib |
DOI ]
This paper presents experiments on automatic recognition of scanned medieval manuscripts written in square notation using CNN/LSTM networks trained with segmentation-free CTC loss, achieving a diplomatic Symbol Accuracy Rate of 86.0-92.2% across three manuscripts, with a neume dictionary yielding approximately 5% relative improvement during decoding.
|
| [Sands2020] |
Janelle C. Sands.
Efficient Optical Music Recognition Validation Using MIDI
Sequence Data.
Master's thesis, Massachusetts Institute of Technology, Department of
Electrical Engineering and Computer Science, May 2020.
[ bib |
http ]
This thesis develops an OMR corrector that automatically identifies discrepancies between resultant OMR scores and corresponding MIDI scores, then either automatically fixes errors or flags ambiguous cases for manual correction by the user.
|
| [AlfaroContreras2020] | María Alfaro-Contreras, Jorge Calvo-Zaragoza, and José M. Iñesta. Reconocimiento holístico de partituras musicales. Technical report, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain, 2020. [ bib | .pdf ] |
| [Calvo-Zaragoza2020] |
Jorge Calvo-Zaragoza, Jan Hajič Jr., and Alexander Pacha.
Understanding Optical Music Recognition.
ACM Comput. Surv., 53 (4), 2020.
ISSN 0360-0300.
[ bib |
DOI ]
For over 50 years, researchers have been trying to teach computers to read music notation, referred to as Optical Music Recognition (OMR). However, this field is still difficult to access for new researchers, especially those without a significant musical background: Few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, with most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords.
|
| [Castellanos2020] |
Francisco J. Castellanos, Antonio-Javier Gallego, and Jorge Calvo-Zaragoza.
Automatic scale estimation for music score images.
Expert Systems with Applications, page 113590, 2020.
ISSN 0957-4174.
[ bib |
DOI |
http ]
Optical Music Recognition (OMR) is the research field focused on the automatic reading of music from scanned images. Its main goal is to encode the content into a digital and structured format with the advantages that this entails. This discipline is traditionally aligned to a workflow whose first step is the document analysis. This step is responsible for recognizing and detecting different sources of information (e.g. music notes, staff lines and text) to extract them and then automatically process the content in the following steps of the workflow. One of the most difficult challenges it faces is to provide a generic solution to analyze documents with diverse resolutions. The endless number of existing music sources does not meet a standard that normalizes the data collections, giving complete freedom for a wide variety of image sizes and scales, thereby making this operation unsustainable. In the literature, this question is commonly overlooked and a uniform scale is assumed. In this paper, a machine learning-based approach to estimate the scale of music documents with respect to a reference scale is presented. Our goal is to propose a robust and generalizable method to adapt the input image to the requirements of an OMR system. For this, two goal-directed case studies are included to evaluate the proposed approach over common tasks within the OMR workflow, comparing the behavior with other state-of-the-art methods. Results suggest that it is necessary to perform this additional step in the first stage of the workflow to correct the scale of the input images. In addition, it is empirically demonstrated that our specialized approach is more promising than image augmentation strategies for the multi-scale challenge.
|
| [Elezi2020] | Ismail Elezi. Exploiting Contextual Information with Deep Neural Networks. mathesis, Ca' Foscari, University of Venice, 2020. [ bib | .pdf ] |
| [Henkel2020] | Florian Henkel, Rainer Kelz, and Gerhard Widmer. Learning to Read and Follow Music in Complete Score Sheet Images. In Proceedings of the 21st Int. Society for Music Information Retrieval Conf., 2020. [ bib | .html ] |
| [Malakar2020] |
Samir Malakar, Manosij Ghosh, Agneet Chaterjee, and Showmik Bhowmik.
Offline music symbol recognition using Daisy feature and quantum
Grey wolf optimization based feature selection.
Multimedia Tools and Applications, 79: 32011-32036, Aug 2020.
ISSN 1573-7721.
[ bib |
DOI ]
This paper applies the Daisy feature descriptor for offline music symbol recognition and uses Quantum Grey Wolf Optimization (QGWO) for feature selection from the high-dimensional feature vector, achieving improved recognition accuracy on classical music symbol datasets.
|
| [Mico2020] |
Luisa Micó, Jose Oncina, and José M. Iñesta.
Adaptively Learning to Recognize Symbols in Handwritten Early Music.
In Peggy Cellier and Kurt Driessens, editors, Machine Learning
and Knowledge Discovery in Databases, pages 470-477, Cham, 2020. Springer
International Publishing.
ISBN 978-3-030-43887-6.
[ bib |
DOI ]
Human supervision is necessary for a correct edition and publication of handwritten early music collections. The output of an optical music recognition system for that kind of documents may contain a significant number of errors, making it tedious to correct for a human expert. An adequate strategy is needed to optimize the human feedback information during the correction stage to adapt the classifier to the specificities of each manuscript. In this paper, we compare the performance of a neural system, difficult and slow to be retrained, and a nearest neighbor strategy, based on the neural codes provided by a neural net, trained offline, used as a feature extractor.
|
| [MuNG] | Alexander Pacha and Jan Hajič jr. The Music Notation Graph (MuNG) Repository. https://github.com/OMR-Research/mung, 2020. [ bib | http ] |
| [RiosVila2020] |
Antonio Ríos-Vila, Jorge Calvo-Zaragoza, and José M. Iñesta.
Exploring the Two-Dimensional Nature of Music Notation for Score
Recognition with End-to-End Approaches.
In Proceedings of the 17th International Conference on
Frontiers in Handwriting Recognition (ICFHR), pages 193-198, 2020.
[ bib |
DOI |
http ]
This paper explores alternative output representations for end-to-end neural music score recognition that take into account the inherent two-dimensional nature of music symbols, which is typically ignored in sequential approaches.
|
| [Tardon2020] | Lorenzo J. Tardón, Isabel Barbancho, Ana M. Barbancho, and Ichiro Fujinaga. Automatic Staff Reconstruction within SIMSSA Project. Applied Sciences, 10 (7): 2468-2484, 2020. [ bib | DOI | http ] |
| [Tsai2020] | Timothy J. Tsai, Daniel Yang, Mengyi Shan, Thitaree Tanprasert, and Teerapat Jenrungrot. Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages. IEEE Transactions on Multimedia, pages 1-13, 2020. [ bib | DOI | arXiv ] |
| [Tuggener2020] |
Lukas Tuggener, Yvan Putra Satyawan, Alexander Pacha, Jürgen Schmidhuber,
and Thilo Stadelmann.
The DeepScoresV2 Dataset and Benchmark for Music Object Detection.
In Proceedings of the 25th International Conference on Pattern
Recognition, Milan, Italy, 2020.
[ bib |
DOI ]
In this paper, we present DeepScoresV2, an extended version of the DeepScores dataset for optical music recognition (OMR). We improve upon the original DeepScores dataset by providing much more detailed annotations, namely (a) annotations for 135 classes including fundamental symbols of non-fixed size and shape, increasing the number of annotated symbols by 23%; (b) oriented bounding boxes; (c) higher-level rhythm and pitch information (onset beat for all symbols and line position for noteheads); and (d) a compatibility mode for easy use in conjunction with the MUSCIMA++ dataset for OMR on handwritten documents. These additions open up the potential for future advancement in OMR research. Additionally, we release two state-of-the-art baselines for DeepScoresV2 based on Faster R-CNN and the Deep Watershed Detector. An analysis of the baselines shows that regular orthogonal bounding boxes are unsuitable for objects which are long, small, and potentially rotated, such as ties and beams, which demonstrates the need for detection algorithms that naturally incorporate object angles.
|
| [Wick2020] | Christoph Wick and Frank Puppe. Automatic Neume Transcription of Medieval Music Manuscripts using CNN/LSTM-Networks and the segmentation-free CTC-Algorithm. Technical report, University of Würzburg, 2020. [ bib | DOI ] |
| [Wick2019] | Christoph Wick, Alexander Hartelt, and Frank Puppe. Staff, Symbol, and Melody Detection of Medieval Manuscripts Written in Square Notation Using Deep Fully Convolutional Networks. May 2019a. [ bib | DOI | http ] |
| [Baro2019] |
Arnau Baró, Pau Riba, Jorge Calvo-Zaragoza, and Alicia Fornés.
From Optical Music Recognition to Handwritten Music Recognition: A
baseline.
Pattern Recognition Letters, 123: 1-8, 2019.
ISSN 0167-8655.
[ bib |
DOI |
http ]
Optical Music Recognition (OMR) is the branch of document image analysis that aims to convert images of musical scores into a computer-readable format. Despite decades of research, the recognition of handwritten music scores, concretely the Western notation, is still an open problem, and the few existing works only focus on a specific stage of OMR. In this work, we propose a full Handwritten Music Recognition (HMR) system based on Convolutional Recurrent Neural Networks, data augmentation and transfer learning, that can serve as a baseline for the research community.
|
| [Calvo-Zaragoza2019] |
Jorge Calvo-Zaragoza, Alejandro H. Toselli, and Enrique Vidal.
Hybrid hidden Markov models and artificial neural networks for
handwritten music recognition in mensural notation.
Pattern Analysis and Applications, Mar 2019b.
ISSN 1433-755X.
[ bib |
DOI ]
In this paper, we present a hybrid approach using hidden Markov models (HMM) and artificial neural networks to deal with the task of handwritten Music Recognition in mensural notation. Previous works have shown that the task can be addressed with Gaussian density HMMs that can be trained and used in an end-to-end manner, that is, without prior segmentation of the symbols. However, the results achieved using that approach are not sufficiently accurate to be useful in practice. In this work, we hybridize HMMs with deep multilayer perceptrons (MLPs), which lead to remarkable improvements in optical symbol modeling. Moreover, this hybrid architecture maintains important advantages of HMMs such as the ability to properly model variable-length symbol sequences through segmentation-free training, and the simplicity and robustness of combining optical models with N-gram language models, which provide statistical a priori information about regularities in musical symbol concatenation observed in the training data. The results obtained with the proposed hybrid MLP-HMM approach outperform previous works by a wide margin, achieving symbol-level error rates around 26%, as compared with about 40% reported in previous works.
|
| [Calvo-Zaragoza2019a] | Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha. Understanding Optical Music Recognition. Computing Research Repository, 2019a. [ bib | DOI | arXiv ] |
| [Calvo-Zaragoza2019b] |
Jorge Calvo-Zaragoza, Alejandro H. Toselli, and Enrique Vidal.
Handwritten Music Recognition for Mensural notation with
convolutional recurrent neural networks.
Pattern Recognition Letters, 128: 115-121, 2019c.
ISSN 0167-8655.
[ bib |
DOI |
http ]
Optical Music Recognition is the technology that allows computers to read music notation, which is also referred to as Handwritten Music Recognition when it is applied over handwritten notation. This technology aims at efficiently transcribing written music into a representation that can be further processed by a computer. This is of special interest to transcribe the large amount of music written in early notations, such as the Mensural notation, since they represent largely unexplored heritage for the musicological community. Traditional approaches to this problem are based on complex strategies with many explicit rules that only work for one particular type of manuscript. Machine learning approaches offer the promise of generalizable solutions, based on learning from just labelled examples. However, previous research has not achieved sufficiently acceptable results for handwritten Mensural notation. In this work we propose the use of deep neural networks, namely convolutional recurrent neural networks, which have proved effective in other similar domains such as handwritten text recognition. Our experimental results achieve, for the first time, recognition results that can be considered effective for transcribing handwritten Mensural notation, decreasing the symbol-level error rate of previous approaches from 25.7% to 7.0%.
|
| [Colesnicov2019] |
Alexandru Colesnicov, Svetlana Cojocaru, Mihaela Luca, and Ludmila Malahov.
On Digitization of Documents with Script Presentable Content.
In Proceedings of the Fifth Conference of Mathematical Society
of Moldova, 2019.
[ bib |
.pdf ]
The paper is dedicated to details of the digitization of printed documents that include formalized script presentable content, in connection with the revitalization of the cultural heritage. We discuss the process and the necessary software by an example of music, as the recognition of scores is a solved task.
|
| [Eipert2019] | Tim Eipert, Felix Herrman, Christoph Wick, Frank Puppe, and Andreas Haug. Editor Support for Digital Editions of Medieval Monophonic Music. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 4-7, Delft, The Netherlands, 2019. [ bib | http ] |
| [Goularas2019] |
Dionysis Goularas and Kürsat Çinar.
Optical Music Recognition of the Hamparsum Notation.
In 2019 Ninth International Conference on Image Processing
Theory, Tools and Applications (IPTA), pages 1-7, Nov 2019.
[ bib |
DOI ]
This paper presents a method for the recognition of music notes from the Hamparsum music notation system. This notation was widely used during the last two centuries of the Ottoman Empire and it is still in use today. The Hamparsum notation presents significant differences compared to the European music notation, in terms of symbols and structure. Moreover, the notes can consist of more than one individual symbol. The proposed recognition method comprises several steps and algorithms, including a feature extraction based on Gabor Filters, recognition of symbols using a Support Vector Machine classifier, a method for assigning recognized symbols to a candidate Hamparsum note and a final recognition system based on template matching. This work will help to popularize this unique cultural heritage by providing Hamparsum scores in a machine-readable format.
|
| [Gover2019] | Matan Gover and Ichiro Fujinaga. A Notation-Based Query Language for Searching in Symbolic Music. In 6th International Conference on Digital Libraries for Musicology, DLfM ’19, pages 79-83, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450372398. [ bib | DOI ] |
| [Hajicjr.2019] |
Jan Hajič jr.
Optical Recognition of Handwritten Music Notation.
PhD thesis, Charles University, Prague, 2019.
[ bib ]
Optical Music Recognition (OMR) is the field of computationally reading music notation. This thesis presents, in the form of dissertation by publication, contributions to the theory, resources, and methods of OMR especially for handwritten notation. The main contributions are (1) the Music Notation Graph (MuNG) formalism for describing arbitrarily complex music notation using an oriented graph that can be unambiguously interpreted in terms of musical semantics, (2) the MUSCIMA++ dataset of musical manuscripts with MuNG as ground truth that can be used to train and evaluate OMR systems and subsystems from the image all the way to extracting the musical semantics encoded therein, and (3) a pipeline for performing OMR on musical manuscripts that relies on machine learning both for notation symbol detection and the notation assembly stage, and on properties of the inferred MuNG representation to deterministically extract the musical semantics. While the OMR pipeline does not perform flawlessly, this is the first OMR system to perform basic useful tasks over musical semantics extracted from handwritten music notation of arbitrary complexity.
|
| [Hakim2019] | Dzikry Maulana Hakim and Ednawati Rainarli. Convolutional Neural Network untuk Pengenalan Citra Notasi Musik. Techno.COM, 18 (3): 214-226, 2019. ISSN 2356-2579. [ bib | DOI | http ] |
| [Henkel2019] | Florian Henkel, Rainer Kelz, and Gerhard Widmer. Audio-Conditioned U-Net for Position Estimation in Full Sheet Images. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 8-11, Delft, The Netherlands, 2019. [ bib | http ] |
| [Huang2019] |
Zhiquing Huang, Xiang Jia, and Yifan Guo.
State-of-the-Art Model for Music Object Recognition with Deep
Learning.
Applied Sciences, 9 (13): 2645-2665, 2019.
ISSN 2076-3417.
[ bib |
DOI |
http ]
Optical music recognition (OMR) is an area in music information retrieval. Music object detection is a key part of the OMR pipeline. Notes are used to record pitch and duration and have semantic information. Therefore, note recognition is the core and key aspect of music score recognition. This paper proposes an end-to-end detection model based on a deep convolutional neural network and feature fusion. This model is able to directly process the entire image and then output the symbol categories and the pitch and duration of notes. We show a state-of-the-art recognition model for general music symbols which can get 0.92 duration accuracy and 0.96 pitch accuracy.
|
| [Inesta2019] | José M. Iñesta, David Rizo, and Jorge Calvo-Zaragoza. MuRET as a software for the transcription of historical archives. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 12-15, Delft, The Netherlands, 2019. [ bib | http ] |
| [Ju2019] | Qinjie Ju, René Chalon, and Stéphane Derrode. Assisted Music Score Reading Using Fixed-Gaze Head Movement: Empirical Experiment and Design Implications. Proc. ACM Hum.-Comput. Interact., 3 (EICS): 3:1-3:29, 2019. ISSN 2573-0142. [ bib | DOI ] |
| [Mateiu2019] |
Tudor Nicolae Mateiu.
Unsupervised Learning for Domain Adaptation in automatic
classification tasks through Neural Networks.
Master's thesis, Universidad de Alicante, 2019.
[ bib |
http ]
Machine Learning systems have improved dramatically in recent years for automatic recognition and artificial intelligence tasks. In general, these systems are based on the use of a large amount of labeled data - also called training sets - in order to learn a model that fits the problem in question. The training set consists of examples of possible inputs to the system and the output expected from them. Achieving this training set is the main limitation to use Machine Learning systems, since it requires human effort to find and map possible inputs with their corresponding outputs. The situation is often frustrating since systems learn to solve the task for a specific domain - that is, a type of input with relatively homogeneous conditions – and they are not able to generalize to correctly solve the same task in other domains. This project considers the use of Domain Adaptation algorithms, which are capable of learning to adapt a Machine Learning model to work in an unknown domain based on only unlabeled data (unsupervised learning). This facilitates the transfer of systems to new domains because obtaining unlabeled data is relatively cheap, since the cost is to label them. To date, Domain Adaptation algorithms have been used in very restricted contexts, so this project aims to make an empirical evaluation of these algorithms in a greater number of cases, as well as propose possible improvements.
|
| [Mateiu2019a] |
Tudor N. Mateiu, Antonio-Javier Gallego, and Jorge Calvo-Zaragoza.
Domain Adaptation for Handwritten Symbol Recognition: A Case of Study
in Old Music Manuscripts.
In Aythami Morales, Julian Fierrez, José Salvador Sánchez,
and Bernardete Ribeiro, editors, Pattern Recognition and Image
Analysis, pages 135-146, Cham, 2019. Springer International Publishing.
ISBN 978-3-030-31321-0.
[ bib |
DOI ]
The existence of a large amount of untranscribed music manuscripts has caused initiatives that use Machine Learning (ML) for Optical Music Recognition, in order to efficiently transcribe the music sources into a machine-readable format. Although most music manuscripts are similar in nature, they inevitably vary from one another. This fact can negatively influence the complexity of the classification task because most ML models fail to transfer their knowledge from one domain to another, thereby requiring learning from scratch on new domains after manually labeling new data. This work studies the ability of a Domain Adversarial Neural Network for domain adaptation in the context of classifying handwritten music symbols. The main idea is to exploit the knowledge of a specific manuscript to classify symbols from different (unlabeled) manuscripts. The reported results are promising, obtaining a substantial improvement over a conventional Convolutional Neural Network approach, which can be used as a basis for future research.
|
| [Mengarelli2019] |
Luciano Mengarelli, Bruno Kostiuk, João G. Vitório, Maicon A. Tibola,
William Wolff, and Carlos N. Silla.
OMR metrics and evaluation: a systematic review.
Multimedia Tools and Applications, Dec 2019.
ISSN 1573-7721.
[ bib |
DOI ]
Music is rhythm, timbre, tones, intensity and performance. Conventional Western Music Notation (CWMN) is used to generate Music Scores in order to register music on paper. Optical Music Recognition (OMR) studies techniques and algorithms for converting music scores into a readable format for computers. This work presents a systematic literature review (SLR) searching for metrics and methods of evaluation and comparison for OMR systems and algorithms. The most commonly used metrics on OMR works are described. A research protocol is elaborated and executed. From 802 publications found, 94 are evaluated. All results are organized and classified focusing on metrics, stages, comparisons, OMR datasets and related works. Although there is still no standard methodology for evaluating OMR systems, a good number of datasets and metrics are already available and apply to all the stages of OMR. Some of the analyzed works can give good directions for future works.
|
| [Metaj2019] | Stiven Metaj and Federico Magnolfi. MNR: MUSCIMA Notes Recognition. Using Faster R-CNN on handwritten music dataset. Research report, Politecnico di Milano, 2019. [ bib | DOI ] |
| [Miro2019] |
Jordi Burgués Miró.
Recognition of musical symbols in scores using neural networks.
Master's thesis, Universitat Politècnica de Catalunya, Barcelona,
Jun 2019.
[ bib |
http ]
Object detection is present nowadays in many aspects of our life. From security to entertainment, its applications play a key role in computer vision and image processing worlds.
|
| [Noll2019] | Justus Noll. Intelligentes Notenlesen. c't, 18: 122-126, 2019. [ bib | http ] |
| [NunezAlcover2019] |
Alicia Núñez Alcover.
Glyph and Position Classification of Music Symbols in Early
Manuscripts.
Master's thesis, Universidad de Alicante, 2019.
[ bib |
http ]
In this research, we study how to classify handwritten music symbols in early music manuscripts written in white Mensural notation, a common notation system used since the fourteenth century and until the Renaissance. The field of Optical Music Recognition researches how to automate the reading of musical scores to transcribe their content to a structured digital format such as MIDI. When dealing with music manuscripts, the traditional workflow establishes two separate stages of detection and classification of musical symbols. In the classification stage, most of the research focuses on detecting musical symbols, without taking into account that a musical note is defined by two components: its glyph and its position with respect to the staff. Our purpose will consist of the design and implementation of architectures in the field of Deep Learning, using Convolutional Neural Networks (CNNs), as well as their evaluation and comparison to determine which model provides the best performance in terms of efficiency and precision for its implementation in an interactive scenario.
|
| [Nunez-Alcover2019] |
Alicia Nuñez-Alcover, Pedro J. Ponce de León, and Jorge
Calvo-Zaragoza.
Glyph and Position Classification of Music Symbols in Early Music
Manuscripts.
In Aythami Morales, Julian Fierrez, José Salvador Sánchez,
and Bernardete Ribeiro, editors, Pattern Recognition and Image
Analysis, pages 159-168, Cham, 2019. Springer International Publishing.
ISBN 978-3-030-31321-0.
[ bib |
DOI ]
Optical Music Recognition is a field of research that automates the reading of musical scores so as to transcribe their content into a structured digital format. When dealing with music manuscripts, the traditional workflow establishes separate stages of detection and classification of musical symbols. In the latter, most of the research has focused on detecting musical glyphs, ignoring that the meaning of a musical symbol is defined by two components: its glyph and its position within the staff. In this paper we study how to perform both glyph and position classification of handwritten musical symbols in early music manuscripts written in white Mensural notation, a common notation system used for the most part of the XVI and XVII centuries. We make use of Convolutional Neural Networks as the classification method, and we tested several alternatives such as using independent models for each component, combining label spaces, or using both multi-input and multi-output models. Our results on early music manuscripts provide insights about the effectiveness and efficiency of each approach.
|
| [OmrBibliography] | Alexander Pacha. The definitive bibliography for research on Optical Music Recognition. https://omr-research.github.io, 2019a. [ bib | http ] |
| [Pacha2019] |
Alexander Pacha.
Self-Learning Optical Music Recognition.
PhD thesis, TU Wien, 2019b.
[ bib |
.pdf ]
Music is an essential part of our culture and heritage. Throughout the centuries, millions of songs were composed and written down in documents using music notation. Optical Music Recognition (OMR) is the research field that investigates how the computer can learn to read those documents. Despite decades of research, OMR is still considered far from being solved. One reason is that traditional approaches rely heavily on heuristics and often do not generalize well. In this thesis, I propose a different approach to let the computer learn to read music notation documents mostly by itself using machine learning, especially deep learning.
|
| [Pacha2019a] | Alexander Pacha, Jorge Calvo-Zaragoza, and Jan Hajič jr. Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition. In 20th International Society for Music Information Retrieval Conference, pages 75-82, 2019. [ bib | .pdf ] |
| [Pacha2019b] | Alexander Pacha. Incremental Supervised Staff Detection. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 16-20, Delft, The Netherlands, 2019c. [ bib | http ] |
| [Panadero2019] |
Ivan Santos Panadero.
Alignment of handwritten music scores.
Technical report, Universitat Autónoma de Barcelona, 2019.
[ bib |
.pdf ]
There are musicologists who spend their time analyzing musical pieces from more than a century ago in order to link them to other pre-existing pieces by the same author but written by different hands. It is a tedious task, since there are many representations of a single piece over time, and the writing variability among those representations can be extensive. The purpose is to build a varied database of these old compositions for study, reproduction and diffusion. This work is divided into two phases. The first one consists of detecting the primitive elements present in each of the measures of a score using the existing transcription of the piece, thus obtaining the desired guided alignment. The second one seeks to analyze this alignment. The obtained results are encouraging.
|
| [Parada-Cabaleiro2019] |
Emilia Parada-Cabaleiro, Anton Batliner, and Björn Schuller.
A Diplomatic Edition of Il Lauro Secco: Ground Truth for OMR of White
Mensural Notation.
In 20th International Society for Music Information Retrieval
Conference, pages 557-564, Delft, The Netherlands, 2019.
[ bib |
.pdf ]
Early musical sources in white mensural notation, the most common notation in European printed music during the Renaissance, are nowadays preserved by libraries worldwide through digitalisation. Still, the application of music information retrieval to this repertoire is restricted by the use of digitalisation techniques which produce an uncodified output. Optical Music Recognition (OMR) automatically generates a symbolic representation of image-based musical content, thus making this repertoire reachable from the computational point of view; yet, further improvements are often constricted by the limited ground truth available. We address this lacuna by presenting a symbolic representation in original notation of Il Lauro Secco, an anthology of Italian madrigals in white mensural notation. For musicological analytic purposes, we encoded the repertoire in **mens and MEI formats; for OMR ground truth, we automatically codified the repertoire in agnostic and semantic formats, via conversion from the **mens files.
|
| [Regimbal2019] | Juliette Regimbal, Zoé McLennan, Gabriel Vigliensoni, Andrew Tran, and Ichiro Fujinaga. Neon2: A Verovio-based square-notation editor. In Music Encoding Conference 2019, Vienna, Austria, 2019. [ bib | .pdf ] |
| [Reuse2019] | Timothy de Reuse and Ichiro Fujinaga. Robust Transcript Alignment on Medieval Chant Manuscripts. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 21-26, Delft, The Netherlands, 2019. [ bib | http ] |
| [Rios-Vila2019] | Antonio Ríos-Vila, Jorge Calvo-Zaragoza, David Rizo, and José M. Iñesta. ReadSco: An Open-Source Web-Based Optical Music Recognition Tool. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 27-30, Delft, The Netherlands, 2019. [ bib | http ] |
| [Thomae2019] | Martha E. Thomae, Julie E. Cumming, and Ichiro Fujinaga. The Mensural Scoring-up Tool. In 6th International Conference on Digital Libraries for Musicology, DLfM ’19, pages 9-19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450372398. [ bib | DOI ] |
| [Vigliensoni2019] | Gabriel Vigliensoni, Alex Daigle, Eric Liu, Jorge Calvo-Zaragoza, Juliette Regimbal, Minh Anh Nguyen, Noah Baxter, Zoé McLennan, and Ichiro Fujinaga. From image to encoding: Full optical music recognition of Medieval and Renaissance music. In Music Encoding Conference, 2019. [ bib | .pdf ] |
| [Waloschek2019] | Simon Waloschek, Aristotelis Hadjakos, and Alexander Pacha. Identification and Cross-Document Alignment of Measures in Music Score Images. In 20th International Society for Music Information Retrieval Conference, pages 137-143, 2019. [ bib | .pdf ] |
| [Wick2019a] |
Christoph Wick, Alexander Hartelt, and Frank Puppe.
Staff, Symbol and Melody Detection of Medieval Manuscripts Written in
Square Notation Using Deep Fully Convolutional Networks.
Applied Sciences, 9 (13): 2646-2673, 2019b.
ISSN 2076-3417.
[ bib |
DOI |
http ]
Even today, the automatic digitisation of scanned documents in general, but especially the automatic optical music recognition (OMR) of historical manuscripts, still remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the Medieval so-called square notation developed in the 11th–12th century, which is already composed of staff lines, staves, clefs, accidentals, and neumes that are roughly spoken connected single notes. The aim is to develop an algorithm that captures both the neumes, and in particular its melody, which can be used to reconstruct the original writing. Our pipeline is similar to the standard OMR approach and comprises a novel staff line and symbol detection algorithm based on deep Fully Convolutional Networks (FCN), which perform pixel-based predictions for either staff lines or symbols and their respective types. Then, the staff line detection combines the extracted lines to staves and yields an F1-score of over 99% for both detecting lines and complete staves. For the music symbol detection, we choose a novel approach that skips the step to identify neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume. Furthermore, the algorithm detects clefs and accidentals. Our algorithm predicts the symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%, which includes symbol type and location. If only the NCs without their respective connection to a neume, all clefs and accidentals are of interest, the algorithm reaches an harmonic symbol accuracy rate (hSAR) of approximately 90%. In general, the algorithm recognises a symbol in the manuscript with an F1-score of over 96%.
|
| [Wick2019b] | Christoph Wick and Frank Puppe. OMMR4all - a Semiautomatic Online Editor for Medieval Music Notations. In Jorge Calvo-Zaragoza and Alexander Pacha, editors, 2nd International Workshop on Reading Music Systems, pages 31-34, Delft, The Netherlands, 2019. [ bib | http ] |
| [Xiao2019] |
Zhe Xiao, Xin Chen, and Li Zhou.
Real-Time Optical Music Recognition System for Dulcimer Musical
Robot.
Journal of Advanced Computational Intelligence and Intelligent
Informatics, 23 (4): 782-790, 2019.
[ bib |
DOI ]
Traditional optical music recognition (OMR) is an important technology that automatically recognizes scanned paper music sheets. In this study, traditional OMR is combined with robotics, and a real-time OMR system for a dulcimer musical robot is proposed. This system gives the musical robot a stronger ability to perceive and understand music. The proposed OMR system can read music scores, and the recognized information is converted into a standard electronic music file for the dulcimer musical robot, thus achieving real-time performance. During the recognition steps, we treat note groups and isolated notes separately. Specially structured note groups are identified by primitive decomposition and structural analysis. The note groups are decomposed into three fundamental elements: note stem, note head, and note beams. Isolated music symbols are recognized based on shape model descriptors. We conduct tests on real pictures taken live by a camera. The tests show that the proposed method has a higher recognition rate.
|
| [Zalkow2019] | Frank Zalkow, Angel Villar Corrales, TJ Tsai, Vlora Arifi-Müller, and Meinard Müller. Tools For Semi-Automatic Bounding Box Annotation Of Musical Measures In Sheet Music. In Late Breaking/Demo at 20th International Society for Music Information Retrieval, Delft, The Netherlands, 2019. [ bib ] |
| [Achankunju2018] | Sanu Pulimootil Achankunju. Music Search Engine from Noisy OMR Data. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 23-24, Paris, France, 2018. [ bib | http ] |
| [Balke2018] |
Stefan Balke, Christian Dittmar, Jakob Abeßer, Klaus Frieler, Martin
Pfleiderer, and Meinard Müller.
Bridging the Gap: Enriching YouTube Videos with Jazz Music
Annotations.
Frontiers in Digital Humanities, 5: 1-11, 2018.
ISSN 2297-2668.
[ bib |
DOI ]
Web services allow permanent access to music from all over the world. Especially in the case of web services with user-supplied content, e.g., YouTube(TM), the available metadata is often incomplete or erroneous. On the other hand, a vast amount of high-quality and musically relevant metadata has been annotated in research areas such as Music Information Retrieval (MIR). Although they have great potential, these musical annotations are often inaccessible to users outside the academic world. With our contribution, we want to bridge this gap by enriching publicly available multimedia content with musical annotations available in research corpora, while maintaining easy access to the underlying data. Our web-based tools offer researchers and music lovers novel possibilities to interact with and navigate through the content. In this paper, we consider a research corpus called the Weimar Jazz Database (WJD) as an illustrating example scenario. The WJD contains various annotations related to famous jazz solos. First, we establish a link between the WJD annotations and corresponding YouTube videos employing existing retrieval techniques. With these techniques, we were able to identify 988 corresponding YouTube videos for 329 solos out of 456 solos contained in the WJD. We then embed the retrieved videos in a recently developed web-based platform and enrich the videos with solo transcriptions that are part of the WJD. Furthermore, we integrate publicly available data resources from the Semantic Web in order to extend the presented information, for example, with a detailed discography or artists-related information. Our contribution illustrates the potential of modern web-based technologies for the digital humanities, and novel ways for improving access and interaction with digitized multimedia content.
|
| [Baro2018] | Arnau Baró, Pau Riba, and Alicia Fornés. A Starting Point for Handwritten Music Recognition. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 5-6, Paris, France, 2018. [ bib | http ] |
| [Bonnici2018] | Alexandra Bonnici, Julian Abela, Nicholas Zammit, and George Azzopardi. Automatic Ornament Localisation, Recognition and Expression from Music Sheets. In ACM Symposium on Document Engineering, pages 25:1-25:11, Halifax, NS, Canada, 2018. ACM. ISBN 978-1-4503-5769-2. [ bib | DOI ] |
| [Calvo-Zaragoza2018] |
Jorge Calvo-Zaragoza and David Rizo.
End-to-End Neural Optical Music Recognition of Monophonic Scores.
Applied Sciences, 8 (4), 2018a.
ISSN 2076-3417.
[ bib |
DOI |
http ]
Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the use of the so-called Connectionist Temporal Classification loss function, these models can be directly trained from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Music Scores dataset, containing more than 80,000 monodic single-staff real scores in common western notation, that is used to train and evaluate the neural approach. In our experiments, it is demonstrated that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, as well as the ability of this approach to locate symbols in the input score.
|
| [Calvo-Zaragoza2018a] |
Jorge Calvo-Zaragoza, Francisco J. Castellanos, Gabriel Vigliensoni, and Ichiro
Fujinaga.
Deep Neural Networks for Document Processing of Music Score Images.
Applied Sciences, 8 (5), 2018a.
ISSN 2076-3417.
[ bib |
DOI |
http ]
There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information on these documents still poses difficulties. The use of Deep Neural Networks techniques has reported outstanding results in many areas related to computer vision. Consequently, in this paper, we study the so-called Convolutional Neural Networks (CNN) for performing the automatic document processing of music score images. This process is focused on layering the image into its constituent parts (namely, background, staff lines, music notes, and text) by training a classifier with examples of these parts. A comprehensive experimentation in terms of the configuration of the networks was carried out, which illustrates interesting results as regards to both the efficiency and effectiveness of these models. In addition, a cross-manuscript adaptation experiment was presented in which the networks are evaluated on a different manuscript from the one they were trained. The results suggest that the CNN is capable of adapting its knowledge, and so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data.
|
| [Calvo-Zaragoza2018b] | Jorge Calvo-Zaragoza and David Rizo. Camera-PrIMuS: Neural End-to-End Optical Music Recognition on Realistic Monophonic Scores. In 19th International Society for Music Information Retrieval Conference, pages 248-255, Paris, France, 2018b. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Calvo-Zaragoza2018c] | Jorge Calvo-Zaragoza. Why WoRMS? In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 7-8, Paris, France, 2018. [ bib | http ] |
| [Calvo-Zaragoza2018d] |
Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha.
Discussion Group Summary: Optical Music Recognition.
In Alicia Fornés and Bart Lamiroy, editors, Graphics
Recognition, Current Trends and Evolutions, Lecture Notes in Computer
Science, pages 152-157. Springer International Publishing, 2018b.
ISBN 978-3-030-02283-9.
[ bib |
DOI ]
This document summarizes the discussion of the interest group on Optical Music Recognition (OMR) that took place in the 12th IAPR International Workshop on Graphics Recognition, and presents the main conclusions drawn during the session: OMR should revisit how it describes itself, and the OMR community should intensify its collaboration both internally and with other stakeholders.
|
| [Calvo-Zaragoza2018e] |
Jorge Calvo-Zaragoza, Alejandro H. Toselli, and Enrique Vidal.
Probabilistic Music-Symbol Spotting in Handwritten Scores.
In 16th International Conference on Frontiers in Handwriting
Recognition, pages 558-563, Niagara Falls, USA, 2018d.
[ bib |
DOI ]
Content-based search on musical manuscripts is usually performed assuming that there are accurate transcripts of the sources in a symbolic, structured format. Given that current systems for Handwritten Music Recognition are far from offering guarantees about their accuracy, this traditional approach does not represent a scalable scenario. In this work we propose a probabilistic framework for Music-Symbol Spotting (MSS) that allows for content-based music search directly over the images of the manuscripts. By means of statistical recognition systems, a probabilistic index is built upon which the search can be carried out efficiently. Our experiments over a dataset of an Early handwritten music manuscript in Mensural notation demonstrate that this MSS framework can be presented as a promising alternative to the traditional approach for content-based music search.
|
| [Castellanos2018] | Francisco J. Castellanos, Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga. Document Analysis of Music Score Images with Selectional Auto-Encoders. In 19th International Society for Music Information Retrieval Conference, pages 256-263, Paris, France, 2018. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Chen2018] | Liang Chen and Christopher Raphael. Optical Music Recognition and Human-in-the-loop Computation. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 11-12, Paris, France, 2018. [ bib | http ] |
| [Choi2018] | Kwon-Young Choi, Bertrand Coüasnon, Yann Ricquebourg, and Richard Zanibbi. Music Symbol Detection with Faster R-CNN Using Synthetic Annotations. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 9-10, Paris, France, 2018. [ bib | http ] |
| [Crawford2018] | Tim Crawford, Golnaz Badkobeh, and David Lewis. Searching Page-Images of Early Music Scanned with OMR: A Scalable Solution Using Minimal Absent Words. In 19th International Society for Music Information Retrieval Conference, pages 233-239, Paris, France, 2018. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Diet2018] | Jürgen Diet. Optical Music Recognition in der Bayerischen Staatsbibliothek. BIBLIOTHEK - Forschung und Praxis, 2018a. [ bib | DOI ] |
| [Diet2018a] |
Jürgen Diet.
Innovative MIR Applications at the Bayerische Staatsbibliothek.
In 5th International Conference on Digital Libraries for
Musicology, Paris, France, 2018b.
[ bib |
.pdf ]
This short position paper gives an insight into the digitization of music prints in the Bayerische Staatsbibliothek and describes two music information retrieval applications in the Bayerische Staatsbibliothek. One of them is a melody search application based on OMR data that has been generated with 40.000 pages of digitized music prints containing all compositions of L. van Beethoven, G. F. Händel, F. Liszt, and F. Schubert. The other one is the incipit search in the International Inventory of Musical Sources (Répertoire International des Sources Musicales, RISM).
|
| [Dorfer2018] | Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, and Gerhard Widmer. Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification. Transactions of the International Society for Music Information Retrieval, 1 (1): 22-33, 2018a. [ bib | DOI ] |
| [Dorfer2018a] | Matthias Dorfer, Florian Henkel, and Gerhard Widmer. Learning To Listen, Read And Follow: Score Following As A Reinforcement Learning Game. In 19th International Society for Music Information Retrieval Conference, pages 784-791, Paris, France, 2018b. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Elezi2018] | Ismail Elezi, Lukas Tuggener, Marcello Pelillo, and Thilo Stadelmann. DeepScores and Deep Watershed Detection: current state and open issues. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 13-14, Paris, France, 2018. [ bib | http ] |
| [Fornes2018] | Alicia Fornés and Bart Lamiroy, editors. Graphics Recognition, Current Trends and Evolutions, volume 11009 of Lecture Notes in Computer Science, 2018. Springer International Publishing. ISBN 978-3-030-02283-9. [ bib | DOI ] |
| [Fujinaga2018] |
Ichiro Fujinaga, Andrew Hankinson, and Laurent Pugin.
Automatic Score Extraction with Optical Music Recognition (OMR).
In Springer Handbook of Systematic Musicology, pages 299-311.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2018.
ISBN 978-3-662-55004-5.
[ bib |
DOI ]
Optical music recognition (OMR) describes the process of automatically transcribing music notation from a digital image. Although similar to optical character recognition (OCR), the process and procedures of OMR diverge due to the fundamental differences between text and music notation, such as the two-dimensional nature of the notation system and the overlay of music symbols on top of staff lines. The OMR process can be described as a sequence of steps, with techniques adapted from disciplines including image processing, machine learning, grammars, and notation encoding. The sequence and specific techniques used can differ depending on the condition of the image, the type of notation, and the desired output.
|
| [Gotham2018] | Mark Gotham, Peter Jonas, Bruno Bower, William Bosworth, Daniel Rootham, and Leigh VanHandel. Scores of Scores: An Openscore Project to Encode and Share Sheet Music. In 5th International Conference on Digital Libraries for Musicology, pages 87-95, Paris, France, 2018. ACM. ISBN 978-1-4503-6522-2. [ bib | DOI ] |
| [Hajicjr.2018] |
Jan Hajič jr., Marta Kolárová, Alexander Pacha, and Jorge
Calvo-Zaragoza.
How Current Optical Music Recognition Systems Are Becoming Useful for
Digital Libraries.
In 5th International Conference on Digital Libraries for
Musicology, pages 57-61, Paris, France, 2018b. ACM.
ISBN 978-1-4503-6522-2.
[ bib |
DOI ]
Optical Music Recognition (OMR) promises to make large collections of sheet music searchable by their musical content. It would open up novel ways of accessing the vast amount of written music that has never been recorded before. For a long time, OMR was not living up to that promise, as its performance was simply not good enough, especially on handwritten music or under non-ideal image conditions. However, OMR has recently seen a number of improvements, mainly due to the advances in machine learning. In this work, we take an OMR system based on the traditional pipeline and an end-to-end system, which represent the current state of the art, and illustrate in proof-of-concept experiments their applicability in retrieval settings. We also provide an example of a musicological study that can be replicated with OMR outputs at much lower costs. Taken together, this indicates that in some settings, current OMR can be used as a general tool for enriching digital libraries.
|
| [Hajicjr.2018a] | Jan Hajič jr., Matthias Dorfer, Gerhard Widmer, and Pavel Pecina. Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets. In 19th International Society for Music Information Retrieval Conference, pages 225-232, Paris, France, 2018a. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Hajicjr.2018b] | Jan Hajič jr. A Case for Intrinsic Evaluation of Optical Music Recognition. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 15-16, Paris, France, 2018. [ bib | http ] |
| [Hemmatifar2018] | Ali Hemmatifar and Ashish Krishna. DeepPiano: A Deep Learning Approach to Translate Music Notation to English Alphabet. Technical report, Stanford University, 2018. [ bib | .pdf ] |
| [Inesta2018] | José Manuel Iñesta, Pedro J. Ponce de León, David Rizo, José Oncina, Luisa Micó, Juan Ramón Rico-Juan, Carlos Pérez-Sancho, and Antonio Pertusa. HISPAMUS: Handwritten Spanish Music Heritage Preservation by Automatic Transcription. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 17-18, Paris, France, 2018. [ bib | http ] |
| [Konwer2018] |
Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Prithaj
Banerjee, Partha Pratim Roy, and Umapada Pal.
Staff line Removal using Generative Adversarial Networks.
In 2018 24th International Conference on Pattern Recognition
(ICPR), pages 1103-1108, Aug 2018.
[ bib |
DOI ]
Staff line removal is a crucial pre-processing step in Optical Music Recognition. In this paper we propose a novel approach for staff line removal, based on Generative Adversarial Networks. We convert staff line images into patches and feed them into a U-Net, used as Generator. The Generator intends to produce staff-less images at the output. Then the Discriminator does binary classification and differentiates between the generated fake staff-less image and the real ground truth staff-less image. For training, we use a Loss function which is a weighted combination of L2 loss and Adversarial loss. L2 loss minimizes the difference between the real and fake staff-less images. Adversarial loss helps to retrieve more high-quality textures in generated images. Thus our architecture supports solutions which are closer to ground truth, and this is reflected in our results. For evaluation we consider the ICDAR/GREC 2013 staff removal database. Our method achieves superior performance in comparison to other conventional approaches on the same dataset.
|
| [Li2018] |
Chuanzhen Li, Jiaqi Zhao, Juanjuan Cai, Hui Wang, and Huaichang Du.
Optical Music Notes Recognition for Printed Music Score.
In 11th International Symposium on Computational Intelligence
and Design (ISCID), volume 01, pages 285-288, Dec 2018.
[ bib |
DOI ]
To convert a printed music score into a machine-readable format, a system that can automatically decode the symbolic image and play the music is proposed. The system takes a music score image as input, segments music symbols after preprocessing the image, then recognizes their pitch and duration. Finally, MIDI files are generated. The experiments on the Rebelo Database show that the proposed method obtains superior recognition accuracy against other methods.
|
| [McLeod2018] | Andrew McLeod and Mark Steedman. Evaluating Automatic Polyphonic Music Transcription. In 19th International Society for Music Information Retrieval Conference, pages 42-49, Paris, France, 2018. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Mico2018] | Luisa Micó, José Manuel Iñesta, and David Rizo. Incremental Learning for Recognition of Handwritten Mensural Notation. In 11th International Workshop on Machine Learning and Music, 2018. [ bib | http ] |
| [Moonlight] | Dan Ringwalt. Moonlight. https://github.com/ringw/moonlight, 2018. [ bib | http ] |
| [Napoles2018] | Néstor Nápoles, Gabriel Vigliensoni, and Ichiro Fujinaga. Encoding Matters. In 5th International Conference on Digital Libraries for Musicology, pages 69-73, Paris, France, 2018. ACM. ISBN 978-1-4503-6522-2. [ bib | DOI ] |
| [Niitsuma2018] |
Masahiro Niitsuma, Yo Tomita, Wei Qi Yan, and David Bell.
Towards Musicologist-Driven Mining of Handwritten Scores.
IEEE Intelligent Systems, 33 (4): 24-34, 2018.
ISSN 1541-1672.
[ bib |
DOI ]
Historical musicologists have been seeking objective and powerful techniques to collect, analyse and verify their findings for many decades. The aim of this study is to propose a musicologist-driven mining method for extracting quantitative information from early music manuscripts. Our focus is on finding evidence for the chronological ordering of J.S. Bach's manuscripts. Bach's C-clefs were extracted from a wide range of manuscripts under the direction of domain experts, and with these the classification of C-clefs was conducted. The proposed methods were evaluated on a dataset containing over 1000 clefs extracted from J.S. Bach's manuscripts. The results show more than 70% accuracy for dating J.S. Bach's manuscripts, providing a rough barometer to be combined with other evidence to evaluate musicologists' hypotheses, and the practicability of this domain-driven approach is demonstrated.
|
| [OmrDatasetTools] | Alexander Pacha. Documentation of the OMR Dataset Tools Python package. https://omr-datasets.readthedocs.io/en/latest, 2018a. [ bib | http ] |
| [OmrTutorialOnYoutube] | Jorge Calvo-Zaragoza, Jan Hajič jr., Alexander Pacha, and Ichiro Fujinaga. The recording of the ISMIR Tutorial "OMR for Dummies" on YouTube. https://www.youtube.com/playlist?list=PL1jvwDVNwQke-04UxzlzY4FM33bo1CGS0, 2018c. [ bib | http ] |
| [Pacha2018] |
Alexander Pacha, Kwon-Young Choi, Bertrand Coüasnon, Yann Ricquebourg,
Richard Zanibbi, and Horst Eidenberger.
Handwritten Music Object Detection: Open Issues and Baseline Results.
In 13th International Workshop on Document Analysis Systems,
pages 163-168, 2018a.
[ bib |
DOI ]
Optical Music Recognition (OMR) is the challenge of understanding the content of musical scores. Accurate detection of individual music objects is a critical step in processing musical documents because a failure at this stage corrupts any further processing. So far, all proposed methods were either limited to typeset music scores or were built to detect only a subset of the available classes of music symbols. In this work, we propose an end-to-end trainable object detector for music symbols that is capable of detecting almost the full vocabulary of modern music notation in handwritten music scores. By training deep convolutional neural networks on the recently released MUSCIMA++ dataset which has symbol-level annotations, we show that a machine learning approach can be used to accurately detect music objects with a mean average precision of over 80%.
|
| [Pacha2018a] | Alexander Pacha. Self-learning Optical Music Recognition. In Philipp Hans, Gerald Artner, Johanna Grames, Heinz Krebs, Hamid Reza Mansouri Khosravi, and Taraneh Rouhi, editors, Vienna Young Scientists Symposium, pages 34-35. TU Wien, Book-of-Abstracts.com, Heinz A. Krebs, 2018b. ISBN 978-3-9504017-8-3. [ bib | http ] |
| [Pacha2018b] |
Alexander Pacha and Jorge Calvo-Zaragoza.
Optical Music Recognition in Mensural Notation with Region-Based
Convolutional Neural Networks.
In 19th International Society for Music Information Retrieval
Conference, pages 240-247, Paris, France, 2018.
ISBN 978-2-9540351-2-3.
[ bib |
.pdf ]
In this work, we present an approach for the task of optical music recognition (OMR) using deep neural networks. Our intention is to simultaneously detect and categorize musical symbols in handwritten scores, written in mensural notation. We propose the use of region-based convolutional neural networks, which are trained in an end-to-end fashion for that purpose. Additionally, we make use of a convolutional neural network that predicts the relative position of a detected symbol within the staff, so that we cover the entire image-processing part of the OMR pipeline. This strategy is evaluated over a set of 60 ancient scores in mensural notation, with more than 15000 annotated symbols belonging to 32 different classes. The results reflect the feasibility and capability of this approach, with a weighted mean average precision of around 76% for symbol detection, and over 98% accuracy for predicting the position.
|
| [Pacha2018c] |
Alexander Pacha, Jan Hajič jr., and Jorge Calvo-Zaragoza.
A Baseline for General Music Object Detection with Deep Learning.
Applied Sciences, 8 (9): 1488-1508, 2018b.
ISSN 2076-3417.
[ bib |
DOI |
http ]
Deep learning is bringing breakthroughs to many computer vision subfields including Optical Music Recognition (OMR), which has seen a series of improvements to musical symbol detection achieved by using generic deep learning models. However, so far, each such proposal has been based on a specific dataset and different evaluation criteria, which made it difficult to quantify the new deep learning-based state-of-the-art and assess the relative merits of these detection models on music scores. In this paper, a baseline for general detection of musical symbols with deep learning is presented. We consider three datasets of heterogeneous typology but with the same annotation format, three neural models of different nature, and establish their performance in terms of a common evaluation standard. The experimental results confirm that direct music object detection with deep learning is indeed promising, but at the same time illustrate some of the domain-specific shortcomings of the general detectors. A qualitative comparison then suggests avenues for OMR improvement, based both on properties of the detection model and how the datasets are defined. To the best of our knowledge, this is the first time that competing music object detection systems from the machine learning paradigm are directly compared to each other. We hope that this work will serve as a reference to measure the progress of future developments of OMR in music object detection.
|
| [Pacha2018d] | Alexander Pacha. Advancing OMR as a Community: Best Practices for Reproducible Research. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 19-20, Paris, France, 2018c. [ bib | http ] |
| [Paeaekkoenen2018] | Tuula Pääkkönen, Jukka Kervinen, and Kimmo Kettunen. Digitisation and Digital Library Presentation System - Sheet Music to the Mix. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 21-22, Paris, France, 2018. [ bib | http ] |
| [PhotoScore] | Neuratron. PhotoScore 2018. http://www.neuratron.com/photoscore.htm, 2018. [ bib | http ] |
| [Rizo2018] | David Rizo, Jorge Calvo-Zaragoza, and José M. Iñesta. MuRET: A Music Recognition, Encoding, and Transcription Tool. In 5th International Conference on Digital Libraries for Musicology, pages 52-56, Paris, France, 2018. ACM. ISBN 978-1-4503-6522-2. [ bib | DOI ] |
| [Roggenkemper2018] | Heinz Roggenkemper and Ryan Roggenkemper. How can Machine Learning make Optical Music Recognition more relevant for practicing musicians? In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 25-26, Paris, France, 2018. [ bib | http ] |
| [Sotoodeh2017] |
Mahmood Sotoodeh, Farshad Tajeripour, Sadegh Teimori, and Kirk Jorgensen.
A music symbols recognition method using pattern matching along with
integrated projection and morphological operation techniques.
Multimedia Tools and Applications, 77 (13): 16833-16866,
2018.
ISSN 1573-7721.
[ bib |
DOI ]
Optical Music Recognition (OMR) can be divided into three main phases: (i) staff line detection and removal. The goal of this phase is to detect and to remove staff lines from sheet music images. (ii) music symbol detection and segmentation. The purpose of this phase is to detect the remaining musical symbols such as single symbols and group symbols, then segment the group symbols to single or primitive symbols after removing staff lines. (iii) musical symbols recognition. In this phase, recognition of musical symbols is the main objective. The method presented in this paper covers all three phases. One advantage of the first phase of the proposed method is that it is robust to staff lines rotation and staff lines which have curvature in sheet music images. Moreover, the staff lines are removed accurately and quickly and also fewer details of the musical symbols are omitted. The proposed method in the first phase focuses on the hand-written documents databases which have been introduced in the CVC-MUSCIMA and ICDAR 2013. It has the lowest error rate among well-known methods and outperforms the state of the art in CVC-MUSCIMA database. In ICDAR 2013, the specificity measure of this method is 99.71% which is the highest specificity among available methods. Also, in terms of accuracy, recall rate and f-measure, it is only slightly less than the best method. Therefore, our method compares favorably to the existing methods. In the second phase, the symbols are divided into two categories, single and group. In the recognition phase, we use a pattern matching method to identify single symbols. For recognizing group symbols, a hierarchical method is proposed. The proposed method in the third phase has several advantages over the previous methods. It is quite robust to skewness of musical group symbols. Furthermore, it provides high accuracy in recognition of the symbols.
|
| [Tuggener2018] |
Lukas Tuggener, Ismail Elezi, Jürgen Schmidhuber, Marcello Pelillo, and
Thilo Stadelmann.
DeepScores - A Dataset for Segmentation, Detection and Classification
of Tiny Objects.
In 24th International Conference on Pattern Recognition,
Beijing, China, 2018a.
[ bib |
DOI |
arXiv ]
We present the DeepScores dataset with the goal of advancing the state-of-the-art in small objects recognition, and by placing the question of object recognition in the context of scene understanding. DeepScores contains high quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred millions of small objects, this makes our dataset not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, beyond the scope of optical music recognition (OMR) research. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets like Caltech101/256, PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, smaller computer vision datasets, as well as with other OMR datasets. Finally, we provide baseline performances for object classification and give pointers to future research based on this dataset.
|
| [Tuggener2018a] | Lukas Tuggener, Ismail Elezi, Jürgen Schmidhuber, and Thilo Stadelmann. Deep Watershed Detector for Music Object Recognition. In 19th International Society for Music Information Retrieval Conference, pages 271-278, Paris, France, 2018b. ISBN 978-2-9540351-2-3. [ bib | .pdf ] |
| [Vigliensoni2018] | Gabriel Vigliensoni, Jorge Calvo-Zaragoza, and Ichiro Fujinaga. Developing an environment for teaching computers to read music. In Jorge Calvo-Zaragoza, Jan Hajič jr., and Alexander Pacha, editors, 1st International Workshop on Reading Music Systems, pages 27-28, Paris, France, 2018. [ bib | http ] |
| [Vo2017] |
Quang Nhat Vo, Guee Sang Lee, Soo Hyung Kim, and Hyung Jeong Yang.
Recognition of Music Scores with Non-Linear Distortions in Mobile
Devices.
Multimedia Tools and Applications, 77 (12): 15951-15969,
2018.
ISSN 1573-7721.
[ bib |
DOI ]
Optical music recognition (OMR), when the input music score is captured by a handheld or a mobile phone camera, suffers from severe degradation in the image quality and distortions caused by non-planar document curvature and perspective projection. Hence the binarization of the input often fails to preserve the details of the original music score, leading to a poor performance in recognition of music symbols. This paper addresses the issue of staff line detection, which is the most important step in OMR, in the presence of nonlinear distortions and describes how to cope with severe degradations in recognition of music symbols. First, a RANSAC-based detection of curved staff lines is presented and staves are segmented into sub-areas for the rectification with bi-quadratic transformation. Then, run length coding is used to recognize music symbols such as stem, note head, flag, and beam. The proposed system is implemented on smart phones, and it shows promising results with music score images captured in the mobile environment.
|
| [Yin2018] | Yu Yin, Zhenya Huang, Enhong Chen, Qi Liu, Fuzheng Zhang, Xing Xie, and Guoping Hu. Transcribing Content from Structural Images with Spotlight Mechanism. In 24th International Conference on Knowledge Discovery & Data Mining, pages 2643-2652, London, United Kingdom, 2018. ACM. ISBN 978-1-4503-5552-0. [ bib | DOI ] |
| [Baro2017] | Arnau Baró, Pau Riba, Jorge Calvo-Zaragoza, and Alicia Fornés. Optical Music Recognition by Recurrent Neural Networks. In 14th International Conference on Document Analysis and Recognition, pages 25-26, Kyoto, Japan, 2017. IEEE. [ bib | DOI ] |
| [Baro-Mas2017] | Arnau Baró-Mas. Optical Music Recognition by Long Short-Term Memory Recurrent Neural Networks. Master's thesis, Universitat Autònoma de Barcelona, 2017. [ bib | .pdf ] |
| [Bountouridis2017] |
Dimitrios Bountouridis, Frans Wiering, Dan Brown, and Remco C. Veltkamp.
Towards Polyphony Reconstruction Using Multidimensional Multiple
Sequence Alignment.
In João Correia, Vic Ciesielski, and Antonios Liapis, editors,
Computational Intelligence in Music, Sound, Art and Design, pages
33-48, Cham, 2017. Springer International Publishing.
ISBN 978-3-319-55750-2.
[ bib |
DOI ]
The digitization of printed music scores through the process of optical music recognition is imperfect. In polyphonic scores, with two or more simultaneous voices, errors of duration or position can lead to badly aligned and inharmonious digital transcriptions. We adapt biological sequence analysis tools as a post-processing step to correct the alignment of voices. Our multiple sequence alignment approach works on multiple musical dimensions and we investigate the contribution of each dimension to the correct alignment. Structural information, such as musical phrase boundaries, is of major importance; therefore, we propose the use of the popular bioinformatics aligner Mafft which can incorporate such information while being robust to temporal noise. Our experiments show that a harmony-aware Mafft outperforms sophisticated, multidimensional alignment approaches and can achieve near-perfect polyphony reconstruction.
|
| [Calvo-Zaragoza2017] |
Jorge Calvo-Zaragoza, Antonio Pertusa, and Jose Oncina.
Staff-line detection and removal using a convolutional neural
network.
Machine Vision and Applications, pages 1-10, 2017b.
ISSN 1432-1769.
[ bib |
DOI ]
Staff-line removal is an important preprocessing stage for most optical music recognition systems. Common procedures to solve this task involve image processing techniques. In contrast to these traditional methods based on hand-engineered transformations, the problem can also be approached as a classification task in which each pixel is labeled as either staff or symbol, so that only those that belong to symbols are kept in the image. In order to perform this classification, we propose the use of convolutional neural networks, which have demonstrated an outstanding performance in image retrieval tasks. The initial features of each pixel consist of a square patch from the input image centered at that pixel. The proposed network is trained by using a dataset which contains pairs of scores with and without the staff lines. Our results in both binary and grayscale images show that the proposed technique is very accurate, outperforming both other classifiers and the state-of-the-art strategies considered. In addition, several advantages of the presented methodology with respect to traditional procedures proposed so far are discussed.
|
| [Calvo-Zaragoza2017a] |
Jorge Calvo-Zaragoza, Alejandro H. Toselli, and Enrique Vidal.
Early handwritten music recognition with Hidden Markov Models.
In 15th International Conference on Frontiers in Handwriting
Recognition, pages 319-324. Institute of Electrical and Electronics
Engineers Inc., 2017d.
ISBN 9781509009817.
[ bib |
DOI ]
This work presents a statistical method to tackle the Handwritten Music Recognition task for Early notation, which comprises more than 200 different symbols. Unlike previous approaches to deal with music notation, our strategy is to perform a holistic recognition without any previous segmentation or staff removal process. The input consists of a page of a music book, which is processed to extract and normalize the staves contained. Then, a feature extraction process is applied to define such sections as a sequence of numerical vectors. The recognition is based on the use of Hidden Markov Models for the optical processing and smoothed N-grams as language model. Experimentation results over a historical archive of Hispanic music reported an error around 40% [...] account the difficulty of the task.
|
| [Calvo-Zaragoza2017b] |
Jorge Calvo-Zaragoza, Alejandro Toselli, and Enrique Vidal.
Handwritten Music Recognition for Mensural Notation: Formulation,
Data and Baseline Results.
In 14th International Conference on Document Analysis and
Recognition, pages 1081-1086, Kyoto, Japan, 2017c.
[ bib |
DOI ]
Music is a key element for cultural transmission, and so large collections of music manuscripts have been preserved over the centuries. In order to develop computational tools for analysis, indexing and retrieval from these sources, it is necessary to transcribe the content to some machine-readable format. In this paper we discuss the Handwritten Music Recognition problem, which refers to the development of automatic transcription systems for musical manuscripts. We focus on mensural notation, one of the most widespread varieties of Western classical music. For that, we present a labeled corpus containing 576 staves, along with a baseline recognition system based on a combination of hidden Markov models and N-gram language models. The baseline error obtained at symbol level is about 40 % which, given the difficulty of the task, can be considered a good starting point for future developments. Our aim is that these data and preliminary results help to promote this research field, serving as a reference in future developments.
|
| [Calvo-Zaragoza2017c] | Jorge Calvo-Zaragoza, Jose J. Valero-Mas, and Antonio Pertusa. End-to-end Optical Music Recognition using Neural Networks. In 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017e. ISBN 978-981-11-5179-8. [ bib | .pdf ] |
| [Calvo-Zaragoza2017d] | Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga. One-step detection of background, staff lines, and symbols in medieval music manuscripts with convolutional neural networks. In 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017f. ISBN 978-981-11-5179-8. [ bib | .pdf ] |
| [Calvo-Zaragoza2017e] | Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga. A machine learning framework for the categorization of elements in images of musical documents. In 3rd International Conference on Technologies for Music Notation and Representation, A Coruña, Spain, 2017g. University of A Coruña. [ bib | .pdf ] |
| [Calvo-Zaragoza2017f] |
Jorge Calvo-Zaragoza, Antonio-Javier Gallego, and Antonio Pertusa.
Recognition of Handwritten Music Symbols with Convolutional Neural
Codes.
In 14th International Conference on Document Analysis and
Recognition, pages 691-696, Kyoto, Japan, 2017a.
[ bib |
DOI ]
There are large collections of music manuscripts preserved over the centuries. In order to analyze these documents it is necessary to transcribe them into a machine-readable format. This process can be done automatically using Optical Music Recognition (OMR) systems, which typically consider segmentation plus classification workflows. This work is focused on the latter stage, presenting a comprehensive study for classification of handwritten musical symbols using Convolutional Neural Networks (CNN). The power of these models lies in their ability to transform the input into a meaningful representation for the task at hand, and that is why we study the use of these models to extract features (Neural Codes) for other classifiers. For the evaluation we consider four datasets containing different configurations and notation styles, along with a number of network models, different image preprocessing techniques and several supervised learning classifiers. Our results show that a remarkable accuracy can be achieved using the proposed framework, which significantly outperforms the state of the art in all datasets considered.
|
| [Calvo-Zaragoza2017g] |
Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga.
Pixelwise classification for music document analysis.
In 7th International Conference on Image Processing Theory,
Tools and Applications, pages 1-6, 2017h.
[ bib |
DOI ]
Content within musical documents not only contains music symbols but also includes different elements such as staff lines, text, or frontispieces. Before attempting to automatically recognize components in these layers, it is necessary to perform an analysis of the musical documents in order to detect and classify each of these constituent parts. The obstacle for this analysis is the high heterogeneity amongst music collections, especially with ancient documents, which makes it difficult to devise methods that can be generalizable to a broader range of sources. In this paper we propose a data-driven document analysis framework based on machine learning that focuses on classifying regions of interest at pixel level. For that, we make use of Convolutional Neural Networks trained to infer the category of each pixel. The main advantage of this approach is that it can be applied regardless of the type of document provided, as long as training data is available. Since this work represents first efforts in that direction, our experimentation focuses on reporting a baseline classification using our framework. The experiments show promising performance, achieving an accuracy around 90% in two corpora of old music documents.
|
| [Calvo-Zaragoza2017h] | Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga. Pixel-wise binarization of musical documents with convolutional neural networks. In 15th International Conference on Machine Vision Applications, pages 362-365, 2017i. [ bib | DOI ] |
| [Calvo-Zaragoza2017i] |
Jorge Calvo-Zaragoza and Jose Oncina.
Recognition of pen-based music notation with finite-state machines.
Expert Systems with Applications, 72: 395-406, 2017.
ISSN 0957-4174.
[ bib |
DOI ]
This work presents a statistical model to recognize pen-based music compositions using stroke recognition algorithms and finite-state machines. The series of strokes received as input is mapped onto a stochastic representation, which is combined with a formal language that describes musical symbols in terms of stroke primitives. Then, a Probabilistic Finite-State Automaton is obtained, which defines probabilities over the set of musical sequences. This model is eventually crossed with a semantic language to avoid sequences that do not make musical sense. Finally, a decoding strategy is applied in order to output a hypothesis about the musical sequence actually written. Comprehensive experimentation with several decoding algorithms, stroke similarity measures and probability density estimators is carried out and evaluated following different metrics of interest. Results found have shown the goodness of the proposed model, obtaining competitive performances in all metrics and scenarios considered.
|
| [Calvo-Zaragoza2017j] |
Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga.
Staff-Line Detection on Grayscale Images with Pixel Classification.
In Luís A. Alexandre, José Salvador Sánchez, and João
M. F. Rodrigues, editors, Pattern Recognition and Image Analysis,
pages 279-286, Cham, 2017j. Springer International Publishing.
ISBN 978-3-319-58838-4.
[ bib |
DOI ]
Staff-line detection and removal are important processing steps in most Optical Music Recognition systems. Traditional methods make use of heuristic strategies based on image processing techniques with binary images. However, binarization is a complex process for which it is difficult to achieve perfect results. In this paper we describe a novel staff-line detection and removal method that deals with grayscale images directly. Our approach uses supervised learning to classify each pixel of the image as symbol, staff, or background. This classification is achieved by means of Convolutional Neural Networks. The features of each pixel consist of a square window from the input image centered at the pixel to be classified. As a case study, we performed experiments with the CVC-Muscima dataset. Our approach showed promising performance, outperforming state-of-the-art algorithms for staff-line removal.
|
| [Calvo-Zaragoza2017k] |
Jorge Calvo-Zaragoza, Ké Zhang, Zeyad Saleh, Gabriel Vigliensoni, and
Ichiro Fujinaga.
Music Document Layout Analysis through Machine Learning and Human
Feedback.
In 2017 14th IAPR International Conference on Document Analysis
and Recognition (ICDAR), volume 02, pages 23-24, Nov 2017k.
[ bib |
DOI ]
Music documents often include musical symbols as well as other relevant elements such as staff lines, text, and decorations. To detect and separate these constituent elements, we propose a layout analysis framework based on machine learning that focuses on pixel-level classification of the image. For that, we make use of supervised learning classifiers trained to infer the category of each pixel. In addition, our scenario considers a human-aided computing approach in which the user is part of the recognition loop, providing feedback where relevant errors are made.
|
| [Chen2017] | Liang Chen, Rong Jin, and Christopher Raphael. Human-Guided Recognition of Music Score Images. In 4th International Workshop on Digital Libraries for Musicology. ACM Press, 2017. [ bib | DOI ] |
| [Chen2017b] | Liang Chen and Christopher Raphael. Renotation of Optical Music Recognition Data. In 14th Sound and Music Computing Conference, Espoo, Finland, 2017. [ bib | .pdf ] |
| [Choi2017] |
Kwon-Young Choi, Bertrand Coüasnon, Yann Ricquebourg, and Richard Zanibbi.
Bootstrapping Samples of Accidentals in Dense Piano Scores for
CNN-Based Detection.
In 14th International Conference on Document Analysis and
Recognition, Kyoto, Japan, 2017. IAPR TC10 (Technical Committee on Graphics
Recognition), IEEE Computer Society.
ISBN 978-1-5386-3586-5.
[ bib |
DOI ]
State-of-the-art Optical Music Recognition systems often fail to process dense and damaged music scores, where many symbols can present complex segmentation problems. We propose to resolve these segmentation problems by using a CNN-based detector trained with few manually annotated data. A data augmentation bootstrapping method is used to accurately train a deep learning model to do the localization and classification of an accidental symbol associated with a note head, or the note head if there is no accidental. Using 5-fold cross-validation, we obtain an average of 98.5% [...] and a classification accuracy of 99.2%.
|
| [Gallego2017] |
Antonio-Javier Gallego and Jorge Calvo-Zaragoza.
Staff-line removal with selectional auto-encoders.
Expert Systems with Applications, 89: 138-148, 2017.
ISSN 0957-4174.
[ bib |
DOI |
http ]
Staff-line removal is an important preprocessing stage as regards most Optical Music Recognition systems. The common procedures employed to carry out this task involve image processing techniques. In contrast to these traditional methods, which are based on hand-engineered transformations, the problem can also be approached from a machine learning point of view if representative examples of the task are provided. We propose doing this through the use of a new approach involving auto-encoders, which select the appropriate features of an input feature set (Selectional Auto-Encoders). Within the context of the problem at hand, the model is trained to select those pixels of a given image that belong to a musical symbol, thus removing the lines of the staves. Our results show that the proposed technique is quite competitive and significantly outperforms the other state-of-the-art strategies considered, particularly when dealing with grayscale input images.
|
| [Gomez2017] | Ashley Antony Gomez and C. N. Sujatha. Optical Music Recognition: Staffline Detection and Removal. International Journal of Application or Innovation in Engineering & Management, 2017. [ bib ] |
| [Hajicjr.2017] | Jan Hajič jr. and Pavel Pecina. In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++. Computing Research Repository, abs/1703.04824: 1-16, 2017a. [ bib | DOI | arXiv ] |
| [Hajicjr.2017a] | Jan Hajič jr. and Pavel Pecina. Detecting Noteheads in Handwritten Scores with ConvNets and Bounding Box Regression. Computing Research Repository, abs/1708.01806, 2017b. [ bib | DOI | arXiv ] |
| [Hajicjr.2017b] | Jan Hajič jr. and Matthias Dorfer. Prototyping Full-Pipeline Optical Music Recognition with MUSCIMarker. In Extended abstracts for the Late-Breaking Demo Session of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017. [ bib | .pdf ] |
| [Hajicjr.2017c] |
Jan Hajič jr. and Pavel Pecina.
Groundtruthing (Not Only) Music Notation with MUSICMarker: A
Practical Overview.
In 14th International Conference on Document Analysis and
Recognition, pages 47-48, Kyoto, Japan, 2017c.
[ bib |
DOI ]
Dataset creation for graphics recognition, especially for hand-drawn inputs, is often an expensive and time-consuming undertaking. The MUSCIMarker tool used for creating the MUSCIMA++ dataset for Optical Music Recognition (OMR) led to efficient use of annotation resources, and it provides enough flexibility to be applicable to creating datasets for other graphics recognition tasks where the ground truth can be represented similarly. First, we describe the MUSCIMA++ ground truth to define the range of tasks for which using MUSCIMarker to annotate ground truth is applicable. We then describe the MUSCIMarker tool itself, discuss its strong and weak points, and share practical experience with the tool from creating the MUSCIMA++ dataset.
|
| [Hajicjr.2017d] |
Jan Hajič jr. and Pavel Pecina.
The MUSCIMA++ Dataset for Handwritten Optical Music Recognition.
In 14th International Conference on Document Analysis and
Recognition, pages 39-46, Kyoto, Japan, 2017d.
[ bib |
DOI ]
Optical Music Recognition (OMR) promises to make accessible the content of large amounts of musical documents, an important component of cultural heritage. However, the field does not have an adequate dataset and ground truth for benchmarking OMR systems, which has been a major obstacle to measurable progress. Furthermore, machine learning methods for OMR require training data. We design and collect MUSCIMA++, a new dataset for OMR. Ground truth in MUSCIMA++ is a notation graph, which our analysis shows to be a necessary and sufficient representation of music notation. Building on the CVC-MUSCIMA dataset for staffline removal, the MUSCIMA++ dataset v1.0 consists of 140 pages of handwritten music, with 91254 manually annotated notation symbols and 82247 explicitly marked relationships between symbol pairs. The dataset allows training and directly evaluating models for symbol classification, symbol localization, and notation graph assembly, and indirectly musical content extraction, both in isolation and jointly. Open-source tools are provided for manipulating the dataset, visualizing the data and annotating further, and the data is made available under an open license.
|
| [iSeeNotes] | Gear Up AB. iSeeNotes. http://www.iseenotes.com, 2017. [ bib | http ] |
| [Jin2017] | Rong Jin. Graph-Based Rhythm Interpretation in Optical Music Recognition. PhD thesis, Indiana University, 2017. [ bib | http ] |
| [KompApp] | Gene Ragan. KompApp. http://kompapp.com, 2017. [ bib | http ] |
| [Mexin2017] | Yevgen Mexin, Aristotelis Hadjakos, Axel Berndt, Simon Waloschek, Anastasia Wawilow, and Gerd Szwillus. Tools for Annotating Musical Measures in Digital Music Editions. In 14th Sound and Music Computing Conference, pages 279-286, Espoo, Finland, 2017. [ bib | .pdf ] |
| [Montagner2017] |
Igor dos Santos Montagner, Nina S.T. Hirata, and Roberto Hirata Jr.
Staff removal using image operator learning.
Pattern Recognition, 63: 310-320, 2017.
ISSN 0031-3203.
[ bib |
DOI ]
Staff removal is an image processing task that aims to facilitate further analysis of music score images. Even when restricted to images in specific domains such as music score recognition, solving image processing problems usually requires the design of customized algorithms. To cope with image variabilities and the growing amount of data, machine learning based techniques emerge as a natural approach to be employed in image processing problems. In this sense, image operator learning methods are concerned with estimating, from sample pairs of input-output images of a transformation, a local function that characterizes the image transformation. These methods require the definition of some parameters, including the local information to be considered in the processing which is defined by a window. In this work we show how to apply the image operator learning technique to the staff line removal problem. We present an algorithm for window determination and show that it captures visual information relevant for staff removal. We also present a reference window set to be used in cases where the training set is not sufficiently large. Experimental results obtained with respect to synthetic and handwritten music scores under varying image conditions show that the learned image operators are comparable with especially designed state-of-the-art heuristic algorithms.
|
| [MusicScoreClassifier] | Alexander Pacha. Github Repository of the Music Score Classifier. https://github.com/apacha/MusicScoreClassifier, 2017a. [ bib | http ] |
| [Oh2017] | Jiyong Oh, Sung Joon Son, Sangkuk Lee, Ji-Won Kwon, and Nojun Kwak. Online recognition of handwritten music symbols. International Journal on Document Analysis and Recognition, 20 (2): 79-89, 2017. [ bib | DOI ] |
| [OmrDatasetsProject] | Alexander Pacha. The OMR Datasets Project. https://apacha.github.io/OMR-Datasets, 2017b. [ bib | http ] |
| [Pacha2017] |
Alexander Pacha and Horst Eidenberger.
Towards a Universal Music Symbol Classifier.
In 14th International Conference on Document Analysis and
Recognition, pages 35-36, Kyoto, Japan, 2017a. IAPR TC10 (Technical
Committee on Graphics Recognition), IEEE Computer Society.
ISBN 978-1-5386-3586-5.
[ bib |
DOI ]
Optical Music Recognition (OMR) aims to recognize and understand written music scores. With the help of Deep Learning, researchers were able to significantly improve the state-of-the-art in this research area. However, Deep Learning requires a substantial amount of annotated data for supervised training. Various datasets have been collected in the past, but without a common standard that defines data formats and terminology, combining them is a challenging task. In this paper we present our approach towards unifying multiple datasets into the largest currently available body of over 90000 musical symbols that belong to 79 classes, containing both handwritten and printed music symbols. A universal music symbol classifier, trained on such a dataset using Deep Learning, can achieve an accuracy that exceeds 98%.
|
| [Pacha2017a] |
Alexander Pacha and Horst Eidenberger.
Towards Self-Learning Optical Music Recognition.
In 16th International Conference on Machine Learning and
Applications, pages 795-800, 2017b.
[ bib |
DOI ]
Optical Music Recognition (OMR) is a branch of artificial intelligence that aims at automatically recognizing and understanding the content of music scores in images. Several approaches and systems have been proposed that try to solve this problem by using expert knowledge and specialized algorithms that tend to fail at generalization to a broader set of scores, imperfect image scans or data of different formatting. In this paper we propose a new approach to solve OMR by investigating how humans read music scores and by imitating that behavior with machine learning. To demonstrate the power of this approach, we conduct two experiments that teach a machine to distinguish entire music sheets from arbitrary content through frame-by-frame classification and distinguishing between 32 classes of handwritten music symbols which can be a basis for object detection. Both tasks can be performed at high rates of confidence (>98%), comparable to the performance of humans on the same task.
|
| [Parada-Cabaleiro2017] | Emilia Parada-Cabaleiro, Anton Batliner, Alice Baird, and Björn Schuller. The SEILS Dataset: Symbolically Encoded Scores in Modern-Early Notation for Computational Musicology. In 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017. ISBN 978-981-11-5179-8. [ bib | .pdf ] |
| [Riba2017] |
Pau Riba, Alicia Fornés, and Josep Lladós.
Towards the Alignment of Handwritten Music Scores.
In B. Lamiroy and R. D. Lins, editors, Graphic Recognition. Current
Trends and Challenges, Lecture Notes in Computer Science, pages 103-116.
Springer Verlag, 2017.
ISBN 9783319521589.
[ bib |
DOI ]
It is very common to find different versions of the same music work in archives of Opera Theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results. © Springer International Publishing AG 2017.
|
| [RicoBlanes2017] | Adrià Rico Blanes and Alicia Fornés Bisquerra. Camera-Based Optical Music Recognition Using a Convolutional Neural Network. In 14th International Conference on Document Analysis and Recognition, pages 27-28, Kyoto, Japan, 2017. IEEE. [ bib | DOI ] |
| [Roy2017] |
Partha Pratim Roy, Ayan Kumar Bhunia, and Umapada Pal.
HMM-based writer identification in music score documents without
staff-line removal.
Expert Systems with Applications, 89: 222-240, 2017.
ISSN 0957-4174.
[ bib |
DOI |
http ]
Writer identification from musical score documents is a challenging task due to its inherent problem of overlapping of musical symbols with staff-lines. Most of the existing works in the literature of writer identification in musical score documents were performed after a pre-processing stage of staff-lines removal. In this paper we propose a novel writer identification framework in musical score documents without removing staff-lines from the documents. In our approach, Hidden Markov Model (HMM) has been used to model the writing style of the writers without removing staff-lines. The sliding window features are extracted from musical score-lines and they are used to build writer specific HMM models. Given a query musical sheet, writer specific confidence for each musical line is returned by each writer specific model using a log-likelihood score. Next, a log-likelihood score in page level is computed by weighted combination of these scores from the corresponding line images of the page. A novel Factor Analysis-based feature selection technique is applied in sliding window features to reduce the noise appearing from staff-lines which proves efficiency in writer identification performance. In our framework we have also proposed a novel score-line detection approach in musical sheet using HMM. The experiment has been performed in CVC-MUSCIMA data set and the results obtained show that the proposed approach is efficient for score-line detection and writer identification without removing staff-lines. To get the idea of computation time of our method, detail analysis of execution time is also provided.
|
| [Saleh2017] | Zeyad Saleh, Ke Zhang, Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga. Pixel.js: Web-Based Pixel Classification Correction Platform for Ground Truth Creation. In 14th International Conference on Document Analysis and Recognition, pages 39-40, Kyoto, Japan, 2017. [ bib | DOI ] |
| [Shi2017] |
Baoguang Shi, Xiang Bai, and Cong Yao.
An End-to-End Trainable Neural Network for Image-Based Sequence
Recognition and Its Application to Scene Text Recognition.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 39 (11): 2298-2304, 2017.
ISSN 0162-8828.
[ bib |
DOI ]
Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.
|
| [SmartScore] | Musitek. SmartScore X2. http://www.musitek.com/smartscore-pro.html, 2017. [ bib | .html ] |
| [Sober-Mira2017] | Javier Sober-Mira, Jorge Calvo-Zaragoza, David Rizo, and José Manuel Iñesta. Pen-Based Music Document Transcription. In 14th International Conference on Document Analysis and Recognition, pages 21-22, Kyoto, Japan, 2017a. IEEE. [ bib | DOI ] |
| [Sober-Mira2017a] | Javier Sober-Mira, Jorge Calvo-Zaragoza, David Rizo, and José Manuel Iñesta. Multimodal Recognition for Music Document Transcription. In 10th International Workshop on Machine Learning and Music, Barcelona, Spain, 2017b. [ bib | .pdf ] |
| [StaffPad] | StaffPad Ltd. StaffPad. http://www.staffpad.net, 2017. [ bib | http ] |
| [Wel2017] | Eelco van der Wel and Karen Ullrich. Optical Music Recognition with Convolutional Sequence-to-Sequence Models. In 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017. ISBN 978-981-11-5179-8. [ bib | .pdf ] |
| [Wu2017] |
Fu-Hai Frank Wu.
Applying Machine Learning in Optical Music Recognition of Numbered
Music Notation.
In International Journal of Multimedia Data Engineering and
Management, page 21. IGI Global, 2017.
[ bib |
DOI ]
Although research of optical music recognition (OMR) has existed for few decades, most of efforts were put in step of image processing to approach upmost accuracy and evaluations were not in common ground. And major music notations explored were the conventional western music notations with staff. On contrary, the authors explore the challenges of numbered music notation, which is popular in Asia and used in daily life for sight reading. The authors use different way to improve recognition accuracy by applying elementary image processing with rough tuning and supplementing with methods of machine learning. The major contributions of this work are the architecture of machine learning specified for this task, the dataset, and the evaluation metrics, which indicate the performance of OMR system, provide objective function for machine learning and highlight the challenges of the scores of music with the specified notation.
|
| [Zhang2017a] |
Emily H. Zhang.
An Efficient Score Alignment Algorithm and its Applications.
Master's thesis, Massachusetts Institute of Technology, 2017.
[ bib |
http ]
String alignment and comparison in Computer Science is a well-explored space with classic problems such as Longest Common Subsequence that have practical application in bioinformatic genomic sequencing and data comparison in revision control systems. In the field of musicology, score alignment and comparison is a problem with many similarities to string comparison and alignment but also vast differences. In particular we can use ideas in string alignment and comparison to compare a music score in the MIDI format with a music score generated from Optical Musical Recognition (OMR), both of which have incomplete or wrong information, and correct errors that were introduced in the OMR process to create an improved third score. This thesis creates a set of algorithms that align and compare MIDI and OMR music scores to produce a corrected version of the OMR score that borrows ideas from classic computer science string comparison and alignment algorithm but also incorporates optimizations and heuristics from music theory.
|
| [Baro2016] |
Arnau Baró, Pau Riba, and Alicia Fornés.
Towards the recognition of compound music notes in handwritten music
scores.
In 15th International Conference on Frontiers in Handwriting
Recognition, pages 465-470. Institute of Electrical and Electronics
Engineers Inc., 2016.
ISBN 9781509009817.
[ bib |
DOI ]
The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising.
|
| [Byrd2016] | Donald Byrd and Eric Isaacson. A Music Representation Requirement Specification for Academia. Technical report, Indiana University, Bloomington, 2016. [ bib | http ] |
| [Calvo-Zaragoza2016c] | Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga. Document Analysis for Music Scores via Machine Learning. In 3rd International Workshop on Digital Libraries for Musicology, pages 37-40, New York, USA, 2016c. ACM. ISBN 978-1-4503-4751-8. [ bib | DOI ] |
| [Calvo-Zaragoza2016d] | Jorge Calvo-Zaragoza, David Rizo, and José Manuel Iñesta. Two (note) heads are better than one: pen-based multimodal interaction with music scores. In Johanna Devaney et al., editors, 17th International Society for Music Information Retrieval Conference, pages 509-514, New York City, 2016b. ISBN 978-0-692-75506-8. [ bib | .pdf ] |
| [Calvo-Zaragoza2016e] |
Jorge Calvo-Zaragoza, Luisa Micó, and Jose Oncina.
Music staff removal with supervised pixel classification.
International Journal on Document Analysis and Recognition, 19
(3): 211-219, 2016a.
[ bib |
DOI ]
This work presents a novel approach to tackle the music staff removal. This task is devoted to removing the staff lines from an image of a music score while maintaining the symbol information. It represents a key step in the performance of most optical music recognition systems. In the literature, staff removal is usually solved by means of image processing procedures based on the intrinsics of music scores. However, we propose to model the problem as a supervised learning classification task. Surprisingly, although there is a strong background and a vast amount of research concerning machine learning, the classification approach has remained unexplored for this purpose. In this context, each foreground pixel is labelled as either staff or symbol. We use pairs of scores with and without staff lines to train classification algorithms. We test our proposal with several well-known classification techniques. Moreover, in our experiments no attempt of tuning the classification algorithms has been made, but the parameters were set to the default setting provided by the classification software libraries. The aim of this choice is to show that, even with this straightforward procedure, results are competitive with state-of-the-art algorithms. In addition, we also discuss several advantages of this approach for which conventional methods are not applicable such as its high adaptability to any type of music score.
|
| [Campos2016] |
Vicente Bosch Campos, Jorge Calvo-Zaragoza, Alejandro H. Toselli, and Enrique
Vidal Ruiz.
Sheet Music Statistical Layout Analysis.
In 15th International Conference on Frontiers in Handwriting
Recognition, pages 313-318, 2016.
[ bib |
DOI ]
In order to provide access to the contents of ancient music scores to researchers, the transcripts of both the lyrics and the musical notation is required. Before attempting any type of automatic or semi-automatic transcription of sheet music, an adequate layout analysis (LA) is needed. This LA must provide not only the locations of the different image regions, but also adequate region labels to distinguish between different region types such as staff, lyric, etc. To this end, we adapt a stochastic framework for LA based on Hidden Markov Models that we had previously introduced for detection and classification of text lines in typical handwritten text images. The proposed approach takes a scanned music score image as input and, after basic preprocessing, simultaneously performs region detection and region classification in an integrated way. To assess this statistical LA approach several experiments were carried out on a representative sample of a historical music archive, under different difficulty settings. The results show that our approach is able to tackle these structured documents providing good results not only for region detection but also for classification of the different regions.
|
| [Chen2016] |
Liang Chen and Kun Duan.
MIDI-assisted egocentric optical music recognition.
In Winter Conference on Applications of Computer Vision.
Institute of Electrical and Electronics Engineers Inc., 2016.
ISBN 9781509006410.
[ bib |
DOI ]
Egocentric vision has received increasing attention in recent years due to the vast development of wearable devices and their applications. Although there are numerous existing work on egocentric vision, none of them solve Optical Music Recognition (OMR) problem. In this paper, we propose a novel optical music recognition approach for egocentric device (e.g. Google Glass) with the assistance of MIDI data. We formulate the problem as a structured sequence alignment problem as opposed to the blind recognition in traditional OMR systems. We propose a linear-chain Conditional Random Field (CRF) to model the note event sequence, which translates the relative temporal relations contained by MIDI to spatial constraints over the egocentric observation. We performed evaluations to compare the proposed approach with several different baselines and proved that our approach achieved the highest recognition accuracy. We view our work as the first step towards egocentric optical music recognition, and believe it will bring insights for next-generation music pedagogy and music entertainment.
|
| [Chen2016a] | Liang Chen and Christopher Raphael. Human-Directed Optical Music Recognition. Electronic Imaging, 2016 (17): 1-9, 2016. [ bib | DOI ] |
| [Chen2016b] | Liang Chen, Erik Stolterman, and Christopher Raphael. Human-Interactive Optical Music Recognition. In Michael I. Mandel, Johanna Devaney, Douglas Turnbull, and George Tzanetakis, editors, 17th International Society for Music Information Retrieval Conference, pages 647-653, 2016b. ISBN 978-0-692-75506-8. [ bib | .pdf ] |
| [Chen2016e] | Liang Chen, Rong Jin, Simo Zhang, Stefan Lee, Zhenhua Chen, and David Crandall. A Hybrid HMM-RNN Model for Optical Music Recognition. In Extended abstracts for the Late-Breaking Demo Session of the 17th International Society for Music Information Retrieval Conference, 2016a. [ bib | .pdf ] |
| [Dinh2016] |
Cong Minh Dinh, Hyung-Jeong Yang, Guee-Sang Lee, and Soo-Hyung Kim.
Fast lyric area extraction from images of printed Korean music
scores.
IEICE Transactions on Information and Systems, E99D (6):
1576-1584, 2016.
ISSN 0916-8532.
[ bib |
DOI ]
In recent years, optical music recognition (OMR) has been extensively developed, particularly for use with mobile devices that require fast processing to recognize and play live the notes in images captured from sheet music. However, most techniques that have been developed thus far have focused on playing back instrumental music and have ignored the importance of lyric extraction, which is time consuming and affects the accuracy of the OMR tools. The text of the lyrics adds complexity to the page layout, particularly when lyrics touch or overlap musical symbols, in which case it is very difficult to separate them from each other. In addition, the distortion that appears in captured musical images makes the lyric lines curved or skewed, making the lyric extraction problem more complicated. This paper proposes a new approach in which lyrics are detected and extracted quickly and effectively. First, in order to resolve the distortion problem, the image is undistorted by a method using information of stave lines and bar lines. Then, through the use of a frequency count method and heuristic rules based on projection, the lyric areas are extracted, the cases where symbols touch the lyrics are resolved, and most of the information from the musical notation is kept even when the lyrics and music notes are overlapping. Our algorithm demonstrated a short processing time and remarkable accuracy on two test datasets of images of printed Korean musical scores: The first set included three hundred scanned musical images; the second set had two hundred musical images that were captured by a digital camera. © 2016 The Institute of Electronics, Information and Communication Engineers.
|
| [Dorfer2016] | Matthias Dorfer, Andreas Arzt, and Gerhard Widmer. Towards End-to-End Audio-Sheet-Music Retrieval. Computing Research Repository, abs/1612.05070, 2016a. [ bib | DOI | arXiv ] |
| [Dorfer2016a] | Matthias Dorfer, Andreas Arzt, and Gerhard Widmer. Towards Score Following In Sheet Music Images. In Michael I. Mandel, Johanna Devaney, Douglas Turnbull, and George Tzanetakis, editors, 17th International Society for Music Information Retrieval Conference, pages 789-795, 2016b. ISBN 978-0-692-75506-8. [ bib | .pdf ] |
| [Hajicjr.2016] | Jan Hajič jr., Jiří Novotný, Pavel Pecina, and Jaroslav Pokorný. Further Steps towards a Standard Testbed for Optical Music Recognition. In Michael I. Mandel, Johanna Devaney, Douglas Turnbull, and George Tzanetakis, editors, 17th International Society for Music Information Retrieval Conference, pages 157-163, New York, USA, 2016. New York University. ISBN 978-0-692-75506-8. [ bib | http ] |
| [Jastrzebska2016] |
Agnieszka Jastrzebska and Wojciech Lesinski.
Optical Music Recognition as the Case of Imbalanced Pattern
Recognition: A Study of Single Classifiers.
In Andrzej M.J. Skulimowski and Janusz Kacprzyk, editors,
Knowledge, Information and Creativity Support Systems: Recent Trends,
Advances and Solutions, pages 493-505, Cham, 2016. Springer International
Publishing.
ISBN 978-3-319-19090-7.
[ bib |
DOI ]
The article is focused on a particular aspect of classification, namely the imbalance of recognized classes. The paper contains a comparative study of results of musical symbols classification using known algorithms: k-nearest neighbors, k-means, Mahalanobis minimal distance, and decision trees. Authors aim at addressing the problem of imbalanced pattern recognition. First, we theoretically analyze difficulties entailed in the classification of music notation symbols. Second, in the enclosed case study we investigate the fitness of named single classifiers on real data. Conducted experiments are based on own implementations of named algorithms with all necessary image processing tasks. Results are highly satisfying.
|
| [Laplante2016] | Audrey Laplante and Ichiro Fujinaga. Digitizing musical scores: Challenges and opportunities for libraries. In 3rd International Workshop on Digital Libraries for Musicology, pages 45-48. ACM, 2016. [ bib | DOI ] |
| [Lee2016a] |
Sangkuk Lee, Sung Joon Son, Jiyong Oh, and Nojun Kwak.
Handwritten Music Symbol Classification Using Deep Convolutional
Neural Networks.
In International Conference on Information Science and
Security, pages 1-5, 2016.
[ bib |
DOI ]
In this paper, we utilize deep Convolutional Neural Networks (CNNs) to classify handwritten music symbols in HOMUS data set. HOMUS data set is made up of various types of strokes which contain time information and it is expected that online techniques are more appropriate for classification. However, experimental results show that CNN which does not use time information achieved classification accuracy around 94.6%, which is higher than the prior state-of-the-art online technique. Finally, we achieved the best accuracy around 95.6% with the ensemble of CNNs.
|
| [Lehman-Borer2016] | Ryerson Lehman-Borer. Optical Music Recognition. Technical report, Swarthmore College, 2016. [ bib | http ] |
| [Pedersoli2016] |
Fabrizio Pedersoli and George Tzanetakis.
Document segmentation and classification into musical scores and
text.
International Journal on Document Analysis and Recognition, 19
(4): 289-304, 2016.
ISSN 1433-2825.
[ bib |
DOI ]
A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step prior to applying optical character recognition and optical music recognition on scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) in order to detect the bounding boxes containing the musical score and text within a document image. The RBV procedure consists of extracting a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that “over”-covers the input image. Each block is automatically classified as either coming from musical score or text and votes with a particular posterior probability of classification in its spatial domain. An initial coarse segmentation is obtained by summarizing all the votes in a single image. Subsequently, the final segmentation is obtained by subdividing the image in microblocks and classifying them using a N-nearest neighbor classifier which is trained using the coarse segmentation. We demonstrate the potential of the proposed method by experiments on two different datasets. One is on a challenging dataset of images collected and artificially combined and manipulated for this project. The other is a music dataset obtained by the scanning of two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85 %. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility.
|
| [PinheiroPereira2016] | Roberto M. Pinheiro Pereira, Caio E.F. Matos, Geraldo Braz Jr., João D.S. de Almeida, and Anselmo C. de Paiva. A Deep Approach for Handwritten Musical Symbols Recognition. In 22nd Brazilian Symposium on Multimedia and the Web, pages 191-194, Teresina, Piauí, Brazil, 2016. ACM. ISBN 978-1-4503-4512-5. [ bib | DOI ] |
| [PlayScore] | Organum. PlayScore. http://www.playscore.co, 2016. [ bib | http ] |
| [Rhodes2016] |
Christophe Rhodes, Tim Crawford, and Mark d'Inverno.
Duplicate Detection in Facsimile Scans of Early Printed Music.
In Analysis of Large and Complex Data, pages 449-459.
Springer International Publishing, Cham, 2016.
ISBN 978-3-319-25226-1.
[ bib |
DOI ]
There is a growing number of collections of readily available scanned musical documents, whether generated and managed by libraries, research projects, or volunteer efforts. They are typically digital images; for computational musicology we also need the musical data in machine-readable form. Optical Music Recognition (OMR) can be used on printed music, but is prone to error, depending on document condition and the quality of intermediate stages in the digitization process such as archival photographs. This work addresses the detection of one such error (duplication of images) and the discovery of other relationships between images in the process.
|
| [Vo2016] |
Quang Nhat Vo, Soo Hyung Kim, Hyung Jeong Yang, and Gueesang Lee.
An MRF model for binarization of music scores with complex
background.
Pattern Recognition Letters, 69: 88-95, 2016.
ISSN 0167-8655.
[ bib |
DOI ]
We present a Gaussian Mixture Markov Random Field (GMMRF) model that is effective for the binarization of music score images with complex backgrounds. The binarization of music score documents containing noises with arbitrary shapes and/or non-uniform colors in the background area is a very challenging problem. In order to extract the content knowledge of music score documents, the staff lines are extracted by first applying a stroke width transform. With the color and spatial information of the detected staff lines, we can accurately model the foreground and background color distribution, in which a GMMRF framework is used to make the binarization robust to variations in colors. Then, the staff line information is employed for guiding the GMMRF labeling process. In the experiment, the music score images captured by camera show promising results compared to existing methods.
|
| [Wen2016] |
Cuihong Wen, Jing Zhang, Ana Rebelo, and Fanyong Cheng.
A Directed Acyclic Graph-Large Margin Distribution Machine Model for
Music Symbol Classification.
PLoS ONE, 11 (3): 1-11, 2016.
[ bib |
DOI ]
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
|
| [Wu2016] |
Fu-Hai Frank Wu.
An Evaluation Framework of Optical Music Recognition in Numbered
Music Notation.
In International Symposium on Multimedia, pages 626-631,
2016.
[ bib |
DOI ]
In this study, we refine the ecosystem for optical music recognition (OMR) of numbered music notation with better accuracy. The ecosystem includes users, OMR system, dataset of music scores, groundtruth building, symbolic representation of sheet music, checking by musicological rules and performance evaluation. Especially, the evaluation metric includes exact and approximate approach to count accuracy automatically. The hands-on dataset comprises of 110 music score manuscripts in a songbook for singing reference. The experimental results justify the value of evaluation framework and show the necessity of checks complying with musicological properties.
|
| [Adamska2015] |
Julia Adamska, Mateusz Piecuch, Mateusz Podgórski, Piotr Walkiewicz, and
Ewa Lukasik.
Mobile System for Optical Music Recognition and Music Sound
Generation.
In Khalid Saeed and Wladyslaw Homenda, editors, Computer
Information Systems and Industrial Management, pages 571-582, Cham, 2015.
Springer International Publishing.
ISBN 978-3-319-24369-6.
[ bib |
DOI ]
The paper presents a mobile system for generating a melody based on a photo of a musical score. The client-server architecture was applied. The client role is designated to a mobile application responsible for taking a photo of a score, sending it to the server for further processing and playing mp3 file received from the server. The server role is to recognize notes from the image, generate mp3 file and send it to the client application. The key element of the system is the program realizing the algorithm of notes recognition. It is based on the decision trees and characteristics of the individual symbols extracted from the image. The system is implemented in the Windows Phone 8 framework and uses a cloud operating system Microsoft Azure. It enables easy archivization of photos, recognized notes in the Music XML format and generated mp3 files. An easy transition to other mobile operating systems is possible as well as processing multiple music collections scans.
|
| [Balke2015] |
Stefan Balke, Sanu Pulimootil Achankunju, and Meinard Müller.
Matching Musical Themes Based on Noisy OCR and OMR Input.
In International Conference on Acoustics, Speech and Signal
Processing, pages 703-707. Institute of Electrical and Electronics
Engineers Inc., 2015.
ISBN 9781467369978.
[ bib |
DOI ]
In the year 1948, Barlow and Morgenstern published the book 'A Dictionary of Musical Themes', which contains 9803 important musical themes from the Western classical music literature. In this paper, we deal with the problem of automatically matching these themes to other digitally available sources. To this end, we introduce a processing pipeline that automatically extracts from the scanned pages of the printed book textual metadata using Optical Character Recognition (OCR) as well as symbolic note information using Optical Music Recognition (OMR). Due to the poor printing quality of the book, the OCR and OMR results are quite noisy containing numerous extraction errors. As one main contribution, we adjust alignment techniques for matching musical themes based on the OCR and OMR input. In particular, we show how the matching quality can be substantially improved by fusing the OCR- and OMR-based matching results. Finally, we report on our experiments within the challenging Barlow and Morgenstern scenario, which also indicates the potential of our techniques when considering other sources of musical themes such as digital music archives and the world wide web.
|
| [Burgoyne2015] |
John Ashley Burgoyne, Ichiro Fujinaga, and J. Stephen Downie.
Music Information Retrieval.
In Susan Schreibman, Ray Siemens, and John Unsworth, editors, A
New Companion to Digital Humanities, pages 213-228. Wiley Blackwell, 2015.
ISBN 9781118680605.
[ bib |
DOI ]
Music information retrieval (MIR) is "a multidisciplinary research endeavor that strives to develop innovative content-based searching schemes, novel interfaces, and evolving networked delivery mechanisms in an effort to make the world's vast store of music accessible to all." MIR was born from computational musicology in the 1960s and has since grown to have links with music cognition and audio engineering, a dedicated annual conference (ISMIR) and an annual evaluation campaign (MIREX). MIR combines machine learning with expert human knowledge to use digital music data - images of music scores, "symbolic" data such as MIDI files, audio, and metadata about musical items - for information retrieval, classification and estimation, or sequence labeling. This chapter gives a brief history of MIR, introduces classical MIR tasks from optical music recognition to music recommendation systems, and outlines some of the key questions and directions for future developments in MIR. © 2016 John Wiley & Sons, Ltd.
|
| [Byrd2015] |
Donald Byrd and Jakob Grue Simonsen.
Towards a Standard Testbed for Optical Music Recognition:
Definitions, Metrics, and Page Images.
Journal of New Music Research, 44 (3): 169-195, 2015.
ISSN 0929-8215.
[ bib |
DOI ]
We posit that progress in Optical Music Recognition (OMR) has been held up for years by the absence of anything resembling the standard testbeds in use in other fields that face difficult evaluation problems. One example of such a field is text information retrieval (IR), where the Text Retrieval Conference (TREC) has annually-renewed IR tasks with accompanying data sets. In music informatics, the Music Information Retrieval Exchange (MIREX), with its annual tests and meetings held during the ISMIR conference, is a close analog to TREC; but MIREX has never had an OMR track or a collection of music such a track could employ. We describe why the absence of an OMR testbed is a problem and how this problem may be mitigated. To aid in the establishment of a standard testbed, we provide (1) a set of definitions for the complexity of music notation; (2) a set of performance metrics for OMR tools that gauge score complexity and graphical quality; and (3) a small corpus of music for use as a baseline for a proper OMR testbed.
|
| [Calvo-Zaragoza2015] |
Jorge Calvo-Zaragoza, Isabel Barbancho, Lorenzo J. Tardón, and Ana M.
Barbancho.
Avoiding staff removal stage in optical music
recognition: application to scores written in white mensural notation.
Pattern Analysis and Applications, 18 (4): 933-943, 2015.
ISSN 1433-755X.
[ bib |
DOI ]
Staff detection and removal is one of the most important issues in optical music recognition (OMR) tasks since common approaches for symbol detection and classification are based on this process. Due to its complexity, staff detection and removal is often inaccurate, leading to a great number of errors in subsequent stages. For this reason, a new approach that avoids this stage is proposed in this paper, which is expected to overcome these drawbacks. Our approach is put into practice in a case study focused on scores written in white mensural notation. Symbol detection is performed by using the vertical projection of the staves. The cross-correlation operator for template matching is used at the classification stage. The quality of our proposal is shown in an experiment in which it attains an extraction rate of 96% and a classification rate of 92%, on average. The results have reinforced the idea of pursuing a new research line in OMR systems that does not require the removal of staff lines.
|
| [Calvo-Zaragoza2015a] |
Jorge Calvo-Zaragoza and Jose Oncina.
Clustering of strokes from pen-based music notation: An experimental
study.
Lecture Notes in Computer Science, 9117: 633-640, 2015.
ISSN 0302-9743.
[ bib |
DOI ]
A comfortable way of digitizing a new music composition is by using a pen-based recognition system, in which the digital score is created with the sole effort of the composition itself. In this kind of system, the input consists of a set of pen strokes. However, it is so far unclear which types of strokes must be considered for this task. This paper presents an experimental study on automatic labeling of these strokes using the well-known k-medoids algorithm. Since recognition of pen-based music scores is highly related to stroke recognition, it may be profitable to repeat the process when new data is received through user interaction. Therefore, our intention is not to propose some stroke labeling but to show which stroke dissimilarities perform better within the clustering process. Results show that some methods achieve a good trade-off between cluster complexity and classification accuracy, whereas others offer very poor performance. © Springer International Publishing Switzerland 2015.
|
| [Chen2015] | Liang Chen, Rong Jin, and Christopher Raphael. Renotation from Optical Music Recognition. In Mathematics and Computation in Music, pages 16-26, Cham, 2015. Springer International Publishing. [ bib | DOI ] |
| [Chen2015a] | Liang Chen and Christopher Raphael. Ceres: An Interactive Optical Music Recognition System. In Extended abstracts for the Late-Breaking Demo Session of the 16th International Society for Music Information Retrieval Conference, Málaga, Spain, 2015. [ bib | .pdf ] |
| [Fang2015] |
Yang Fang and Teng Gui-fa.
Visual music score detection with unsupervised feature learning
method based on K-means.
International Journal of Machine Learning and Cybernetics, 6
(2): 277-287, 2015.
ISSN 1868-8071.
[ bib |
DOI ]
Automatic music score detection plays an important role in optical music recognition (OMR). In a visual image, the characteristics of the music scores are frequently degraded by illumination, distortion and other background elements. In this paper, to reduce the influence of those degradations on OMR, especially the interference of Chinese characters, an unsupervised feature learning detection method is proposed for improving the correctness of music score detection. Firstly, a detection framework was constructed. Then sub-image block features were extracted by a simple unsupervised feature learning (UFL) method based on K-means and classified by SVM. Finally, music score detection was completed by a connected component searching algorithm based on the sub-image block labels. Taking Chinese text as the main interference, the detection rate was compared between the UFL method and a texture feature method based on 2D Gabor filters in the same framework. The experimental results show that the unsupervised feature learning method yields a lower error detection rate than the Gabor texture feature method with a limited training set. © 2014, Springer-Verlag Berlin Heidelberg.
|
| [Huang2015] | Yu-Hui Huang, Xuanli Chen, Serafina Beck, David Burn, and Luc Van Gool. Automatic Handwritten Mensural Notation Interpreter: From Manuscript to MIDI Performance. In Meinard Müller and Frans Wiering, editors, 16th International Society for Music Information Retrieval Conference, pages 79-85, Málaga, Spain, 2015. ISBN 978-84-606-8853-2. [ bib | .pdf ] |
| [Lesinski2015] | Wojciech Lesinski and Agnieszka Jastrzebska. Optical Music Recognition: Standard and Cost-Sensitive Learning with Imbalanced Data. In IFIP International Conference on Computer Information Systems and Industrial Management, pages 601-612. Springer, 2015. [ bib | DOI ] |
| [Liu2015] |
Xiaoxiang Liu, Mi Zhou, and Peng Xu.
A Robust Method for Musical Note Recognition.
In 14th International Conference on Computer-Aided Design and
Computer Graphics, pages 212-213. Institute of Electrical and Electronics
Engineers Inc., 2015.
ISBN 9781467380201.
[ bib |
DOI ]
Musical note recognition plays a fundamental role in the process of the optical music recognition system. In this paper, we propose a robust method for recognizing notes. The method includes three parts: (1) the description of relationships between primitives by introducing the concept of interaction field, (2) the definition of six hierarchical structure features for analyzing note structures, (3) the workflow of primitive assembly under the guidance of giving priority to key structure features. To evaluate the performance of our method, we present experimental results on real-life scores and comparisons with two commercial products. Experiments show that our method leads to quite good results, especially for complicated scores.
|
| [Mehta2015] |
Apurva A. Mehta and Malay S. Bhatt.
Optical Music Notes Recognition for Printed Piano Music Score Sheet.
In International Conference on Computer Communication and
Informatics, Coimbatore, India, 2015.
ISBN 9781479968053.
[ bib |
DOI ]
Entertainment, therapy and education are fields where music always accompanies humans. Music is presented to us in various formats, such as aural and visual, and in one more: the written form of music, which is much less known to us. In a way, music dominates our life. The system discussed in this paper takes as input an image of a music score written for piano using modern staff notation. Segmentation is carried out by hierarchical decomposition using thresholding along with the stave lines of the score sheet. Segmented symbols are recognized through an established artificial neural network based on a boosting approach. Recognized symbols are represented in an admissible way. The system is capable of addressing very complex cases, and validation is done over 53 songs available at various global music score resources. The segmentation algorithms achieve an accuracy of 99.12%, and segmented symbols are recognized with an accuracy of 92.38% with the help of PCA and AdaBoost.
|
| [Nguyen2015] | Tam Nguyen and Gueesang Lee. A Lightweight and Effective Music Score Recognition on Mobile Phones. Journal of Information Processing Systems, 11 (3): 438-449, 2015. [ bib | DOI ] |
| [NotateMe] | Neuratron. NotateMe. http://www.neuratron.com/notateme.html, 2015. [ bib | .html ] |
| [Novotny2015] |
Jiri Novotny and Jaroslav Pokorny.
Introduction to Optical Music Recognition: Overview and Practical
Challenges.
In J. Pokorny, M. Necasky, and P. Moravec, editors, Annual
International Workshop on DAtabases, TExts, Specifications and Objects,
pages 65-76. CEUR-WS, 2015.
[ bib |
.pdf ]
Music has been always an integral part of human culture. In our computer age, it is not surprising that there is a growing interest to store music in a digitized form. Optical music recognition (OMR) refers to a discipline that investigates music score recognition systems. This is similar to well-known optical character recognition systems, except OMR systems try to automatically transform scanned sheet music into a computer-readable format. In such a digital format, semantic information is also stored (instrumentation, notes, pitches and duration, contextual information, etc.). This article introduces the OMR field and presents an overview of the relevant literature and basic techniques. Practical challenges and questions arising from the automatic recognition of music notation and its semantic interpretation are discussed as well as the most important open issues.
|
| [Pham2015] | Viet-Khoi Pham, Hai-Dang Nguyen, and Minh-Triet Tran. Virtual Music Teacher for New Music Learners with Optical Music Recognition. In International Conference on Learning and Collaboration Technologies, pages 415-426. Springer, 2015b. [ bib | DOI ] |
| [Pham2015a] |
Van Khien Pham and Guee-Sang Lee.
Music Score Recognition Based on a Collaborative Model.
International Journal of Multimedia and Ubiquitous
Engineering, 10 (8): 379-390, 2015.
ISSN 1975-0080.
[ bib |
DOI ]
Recognizing musical symbols is very important in a music score system, and the result depends on the method used. Most existing approaches for OMR (optical music recognition) remove staff lines before symbols are detected, so the symbols can easily get damaged. Other methods recognize symbols without staff line removal, but all of them have a low accuracy rate and a high processing time. In this paper, recognition both without and with staff removal is suggested, and these new methods are proposed to improve the recognition result of symbols. Many symbols, such as vertical lines, note heads, pitches, beams and tails, are detected before the staff lines are deleted, and then the staff lines are removed to identify other symbols using connected components. The proposed method is applied to a Samsung smartphone with an embedded high-resolution camera. Experimental results show that the recognition rate is higher than that of existing methods and the computation time is reduced significantly.
|
| [Pham2015b] |
Viet-Khoi Pham, Hai-Dang Nguyen, Tung-Anh Nguyen-Khac, and Minh-Triet Tran.
Apply lightweight recognition algorithms in optical music
recognition.
In 7th International Conference on Machine Vision. SPIE,
2015a.
ISBN 9781628415605.
[ bib |
DOI ]
The problems of digitalization and transformation of musical scores into machine-readable format are necessary to be solved since they help people to enjoy music, to learn music, to conserve music sheets, and even to assist music composers. However, the results of existing methods still require improvements for higher accuracy. Therefore, the authors propose lightweight algorithms for Optical Music Recognition to help people to recognize and automatically play musical scores. In our proposal, after removing staff lines and extracting symbols, each music symbol is represented as a grid of identical M × N cells, and the features are extracted and classified with multiple lightweight SVM classifiers. Through experiments, the authors find that the size of 10 × 12 cells yields the highest precision value. Experimental results on the dataset consisting of 4929 music symbols taken from 18 modern music sheets in the Synthetic Score Database show that our proposed method is able to classify printed musical scores with accuracy up to 99.56%.
|
| [Ringwalt2015] | Dan Ringwalt, Roger Dannenberg, and Andrew Russell. Optical Music Recognition for Interactive Score Display. In Edgar Berdahl and Jesse T. Allison, editors, International Conference on New Interfaces for Musical Expression, pages 95-98, Baton Rouge, Louisiana, USA, 2015. The School of Music and the Center for Computation and Technology (CCT), Louisiana State University. ISBN 978-0-692-49547-6. [ bib | http ] |
| [Ringwalt2015a] | Dan Ringwalt and Roger B. Dannenberg. Image Quality Estimation for Multi-Score OMR. In 16th International Society for Music Information Retrieval Conference, pages 17-23, 2015. ISBN 978-84-606-8853-2. [ bib | .pdf ] |
| [Taele2015] |
Paul Taele, Laura Barreto, and Tracy Hammond.
Maestoso: An Intelligent Educational Sketching Tool for Learning
Music Theory.
In 27th Conference on Innovative Applications of Artificial
Intelligence, pages 3999-4005, Austin, Texas, 2015. AAAI Press.
ISBN 0-262-51129-0.
[ bib |
http ]
Learning music theory not only has practical benefits for musicians to write, perform, understand, and express music better, but also for non-musicians to improve critical thinking, math analytical skills, and music appreciation. However, current external tools applicable for learning music theory through writing when human instruction is unavailable are either limited in feedback, lacking a written modality, or assuming already strong familiarity with music theory concepts. In this paper, we describe Maestoso, an educational tool for novice learners to learn music theory through sketching practice of quizzed music structures. Maestoso first automatically recognizes students' sketched input of quizzed concepts, then relies on existing sketch and gesture recognition techniques to automatically recognize the input, and finally generates instructor-emulated feedback. From our evaluations, we demonstrate that Maestoso performs reasonably well on recognizing music structure elements and that novice students can comfortably grasp introductory music theory in a single session.
|
| [Wen2015] |
Cuihong Wen, Ana Rebelo, Jing Zhang, and Jamie dos Santos Cardoso.
A new optical music recognition system based on combined neural
network.
Pattern Recognition Letters, 58: 1-7, 2015.
ISSN 0167-8655.
[ bib |
DOI |
http ]
Optical music recognition (OMR) is an important tool to recognize a scanned page of music sheet automatically, which has been applied to preserving music scores. In this paper, we propose a new OMR system to recognize the music symbols without segmentation. We present a new classifier named combined neural network (CNN) that offers superior classification capability. We conduct tests on fifteen pages of music sheets, which are real and scanned images. The tests show that the proposed method constitutes an interesting contribution to OMR.
|
| [Alirezazadeh2014] | Fatemeh Alirezazadeh and Mohammad Reza Ahmadzadeh. Effective staff line detection, restoration and removal approach for different quality of scanned handwritten music sheets. Journal of Advanced Computer Science & Technology, 3 (2): 136-142, 2014. [ bib | DOI ] |
| [Bainbridge2014] |
David Bainbridge, Xiao Hu, and J. Stephen Downie.
A Musical Progression with Greenstone: How Music Content Analysis and
Linked Data is Helping Redefine the Boundaries to a Music Digital Library.
In 1st International Workshop on Digital Libraries for
Musicology. Association for Computing Machinery, 2014.
ISBN 9781450330022.
[ bib |
DOI ]
Despite the recasting of the web's technical capabilities through Web 2.0, conventional digital library software architectures, from which many of our leading Music Digital Libraries (MDLs) are formed, result in digital resources that are, surprisingly, disconnected from other online sources of information, and embody a "read-only" mindset. Leveraging from Music Information Retrieval (MIR) techniques and Linked Open Data (LOD), in this paper we demonstrate a new form of music digital library that encompasses management, discovery, delivery, and analysis of the musical content it contains. Utilizing open source tools such as Greenstone, audioDB, Meandre, and Apache Jena we present a series of transformations to a musical digital library sourced from audio files that steadily increases the level of support provided to the user for musicological study. While the seed for this work was motivated by better supporting musicologists in a digital library, the developed software architecture alters the boundaries to what is conventionally thought of as a digital library, and in doing so challenges core assumptions made in mainstream digital library software design. Copyright 2014 ACM.
|
| [Bui2014] |
Hoang-Nam Bui, Iin-Seop Na, and Soo-Hyung Kim.
Staff Line Removal Using Line Adjacency Graph and Staff Line Skeleton
for Camera-Based Printed Music Scores.
In 22nd International Conference on Pattern Recognition, pages
2787-2789, 2014.
[ bib |
DOI ]
On camera-based music scores, curved and uneven staff-lines tend to occur more frequently, and with the loss in performance of binarization methods, line thickness variation and space variation between lines are inevitable. We propose a novel and effective staff-line removal method based on the following three main ideas. First, the state-of-the-art staff-line detection method, Stable Path, is used to extract staff-line skeletons of the music score. Second, a line adjacency graph (LAG) model is exploited in a different manner of over-segmentation to cluster pixel runs generated from the run-length encoding (RLE) of the image. Third, a two-pass staff-line removal pipeline called filament filtering is applied to remove clusters lying on the staff-line. Our method shows impressive results on music score images captured from cameras, and gives high performance when applied to the ICDAR/GREC 2013 database.
|
| [Calvo-Zaragoza2014] |
Jorge Calvo-Zaragoza and Jose Oncina.
Recognition of Pen-Based Music Notation: The HOMUS Dataset.
In 22nd International Conference on Pattern Recognition, pages
3038-3043. Institute of Electrical & Electronics Engineers (IEEE), 2014.
[ bib |
DOI ]
A profitable way of digitizing a new musical composition is by using a pen-based (online) system, in which the score is created with the sole effort of the composition itself. However, the development of such systems is still largely unexplored. Some studies have been carried out, but the use of small, particular datasets has prevented objective comparisons between different approaches. To solve this situation, this work presents the Handwritten Online Musical Symbols (HOMUS) dataset, which consists of 15200 samples of 32 types of musical symbols from 100 different musicians. Several alternatives of recognition for the two modalities (online, using the strokes drawn by the pen, and offline, using the image generated after drawing the symbol) are also presented. Some experiments are included, aimed at drawing the main conclusions about the recognition of these data. It is expected that this work can establish a binding point in the field of recognition of online handwritten music notation and serve as a baseline for future developments.
|
| [Chanda2014] | Sukalpa Chanda, Debleena Das, Umapada Pal, and Fumitaka Kimura. Offline Hand-Written Musical Symbol Recognition. 14th International Conference on Frontiers in Handwriting Recognition, pages 405-410, 2014. [ bib | DOI | http ] |
| [Chen2014] |
Gen-Fang Chen and Jia-Shing Sheu.
An optical music recognition system for traditional Chinese Kunqu
Opera scores written in Gong-Che Notation.
EURASIP Journal on Audio, Speech, and Music Processing, 2014
(1): 7, 2014.
ISSN 1687-4722.
[ bib |
DOI ]
This paper presents an optical music recognition (OMR) system to process the handwritten musical scores of Kunqu Opera written in Gong-Che Notation (GCN). First, it introduces the background of Kunqu Opera and GCN. Kunqu Opera is one of the oldest forms of musical activity, spanning the sixteenth to eighteenth centuries, and GCN has been the most popular notation for recording musical works in China since the seventh century. Many Kunqu Operas that use GCN are available as original manuscripts or photocopies, and transforming these versions into a machine-readable format is a pressing need. The OMR system comprises six stages: image pre-processing, segmentation, feature extraction, symbol recognition, musical semantics, and musical instrument digital interface (MIDI) representation. This paper focuses on the symbol recognition stage and obtains the musical information with Bayesian, genetic algorithm, and K-nearest neighbor classifiers. The experimental results indicate that symbol recognition for Kunqu Opera's handwritten musical scores is effective. This work will help to preserve and popularize Chinese cultural heritage and to store Kunqu Opera scores in a machine-readable format, thereby ensuring the possibility of spreading and performing original Kunqu Opera musical scores.
|
| [Chen2014a] | Liang Chen, Rong Jin, and Christopher Raphael. Optical Music Recognition with Human Labeled Constraints. In CHI'14 Workshop on Human-Centred Machine Learning, Toronto, Canada, 2014. [ bib | .pdf ] |
| [Church2014] | Maura Church and Michael Scott Cuthbert. Improving Rhythmic Transcriptions via Probability Models Applied Post-OMR. In Hsin-Min Wang, Yi-Hsuan Yang, and Jin Ha Lee, editors, 15th International Society for Music Information Retrieval Conference, pages 643-648, 2014. [ bib | .pdf ] |
| [Ding2014] |
Ing-Jr Ding, Chih-Ta Yen, Che-Wei Chang, and He-Zhong Lin.
Optical music recognition of the singer using formant frequency
estimation of vocal fold vibration and lip motion with interpolated GMM
classifiers.
Journal of Vibroengineering, 16 (5): 2572-2581, 2014.
ISSN 1392-8716.
[ bib |
http ]
The main work of this paper is to identify the musical genres of the singer by performing the optical detection of lip motion. Recently, optical music recognition has attracted much attention. Optical music recognition in this study is a type of automatic techniques in information engineering, which can be used to determine the musical style of the singer. This paper proposes a method for optical music recognition where acoustic formant analysis of both vocal fold vibration and lip motion are employed with interpolated Gaussian mixture model (GMM) estimation to perform musical genre classification of the singer. The developed approach for such classification application is called GMM-Formant. Since humming and voiced speech sounds cause periodic vibrations of the vocal folds and then the corresponding motion of the lip, the proposed GMM-Formant firstly operates to acquire the required formant information. Formant information is important acoustic feature data for recognition classification. The proposed GMM-Formant method then uses linear interpolation for combining GMM likelihood estimates and formant evaluation results appropriately. GMM-Formant will effectively adjust the estimated formant feature evaluation outcomes by referring to certain degree of the likelihood score derived from GMM calculations. The superiority and effectiveness of presented GMM-Formant are demonstrated by a series of experiments on musical genre classification of the singer.
|
| [Fornes2014] |
Alicia Fornés, Van Cuong Kieu, Muriel Visani, Nicholas Journet, and Anjan
Dutta.
The ICDAR/GREC 2013 Music Scores Competition: Staff Removal.
In Bart Lamiroy and Jean-Marc Ogier, editors, Graphics
Recognition. Current Trends and Challenges, pages 207-220, Berlin,
Heidelberg, 2014. Springer Berlin Heidelberg.
ISBN 978-3-662-44854-0.
[ bib |
DOI ]
The first competition on music scores that was organized at ICDAR and GREC in 2011 awoke the interest of researchers, who participated in both staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario concerning old and degraded music scores. For this purpose, we have generated a new set of semi-synthetic images using two degradation models that we previously introduced: local noise and 3D distortions. In this extended paper we provide an extended description of the dataset, degradation models, evaluation metrics, the participants' methods and the obtained results that could not be presented at ICDAR and GREC proceedings due to page limitations.
|
| [Fujinaga2014] | Ichiro Fujinaga, Andrew Hankinson, and Julie E. Cumming. Introduction to SIMSSA (Single Interface for Music Score Searching and Analysis). In 1st International Workshop on Digital Libraries for Musicology, pages 1-3. ACM, 2014. [ bib | DOI ] |
| [Fujinaga2014a] | Ichiro Fujinaga and Andrew Hankinson. SIMSSA: Single Interface for Music Score Searching and Analysis. Journal of the Japanese Society for Sonic Arts, 6 (3): 25-30, 2014. [ bib | .pdf ] |
| [Galea2014] | Dan Gâlea, Florin Rotaru, Silviu-Ioan Bejinariu, Mihai Bulea, Dan Murgu, Simona Pescaru, Vasile Apopei, Mihaela Murgu, and Irina Rusu. A review on printed music recognition system developed in institute of computer science iasi. Technical Report Lxiv, Universitatea Tehnica Gheorghe Asachi din Iasi, 2014. [ bib | .pdf ] |
| [Geraud2014] |
Thierry Géraud.
A morphological method for music score staff removal.
In International Conference on Image Processing, pages
2599-2603. Institute of Electrical and Electronics Engineers Inc., 2014.
ISBN 9781479957514.
[ bib |
DOI ]
Removing the staff in music score images is key to improving the recognition of music symbols and, with ancient and degraded handwritten music scores, it is not a straightforward task. In this paper we present the method that won the staff removal competition in 2013, organized at the International Conference on Document Analysis and Recognition (ICDAR). The main characteristic of this method is that it essentially relies on mathematical morphology filtering. So it is simple, fast, and its full source code is provided to favor reproducible research. © 2014 IEEE.
|
| [Han2014] | Sejin Han and Gueesang Lee. Optical Music Score Recognition System for Smart Mobile Devices. International Journal of Contents, 10 (4): 63-68, 2014. [ bib | DOI ] |
| [Hankinson2014] | Andrew Hankinson. Optical music recognition infrastructure for large-scale music document analysis. PhD thesis, McGill University, 2014. [ bib | http ] |
| [Helsen2014] | Kate Helsen, Jennifer Bain, Ichiro Fujinaga, Andrew Hankinson, and Debra Lacoste. Optical music recognition and manuscript chant sources. Early Music, 42 (4): 555-558, 2014. [ bib | DOI ] |
| [Homenda2014] |
Wladyslaw Homenda and Wojciech Lesinski.
Decision trees and their families in imbalanced pattern recognition:
Recognition with and without Rejection.
Lecture Notes in Computer Science, 8838: 219-230, 2014.
ISSN 0302-9743.
[ bib |
DOI ]
Decision trees are considered to be among the best classifiers. In this work we apply decision trees and their families to the problem of imbalanced data recognition. Aspects of recognition without rejection and with rejection are considered: it is assumed that all recognized elements belong to desired classes in the first case and that some of them are outside of such classes and are not known at the classifier training stage in the second. The facets of imbalanced data and recognition with rejection affect different real-world problems. In this paper we discuss results of an experiment on imbalanced data recognition using music notation symbols as a case study. Decision trees and three methods of joining decision trees (simple voting, bagging and random forest) are studied. These methods are used for recognition without and with rejection. © IFIP International Federation for Information Processing 2014.
|
| [Jastrzebska2014] |
Agnieszka Jastrzebska and Wojciech Lesinski.
Optical Music Recognition as the Case of Imbalanced Pattern
Recognition: A Study of Complex Classifiers.
In International Conference on Systems Science 2013, pages
325-335. Springer International Publishing, Cham, 2014.
ISBN 978-3-319-01857-7.
[ bib |
DOI ]
The article is focused on a particular aspect of classification, namely the imbalance of recognized classes. Imbalanced data adversely affects the recognition ability and requires proper construction of the classifier. The aim of the presented study is to explore the capabilities of classifier combining methods on this problem. In this paper the authors discuss results of an experiment on imbalanced data recognition using music notation symbols as a case study. Applied classification methods include: simple voting method, bagging and random forest.
|
| [Jastrzebski2014] | Krzysztof Jastrzebski. OMR for sheet music digitization. Master's thesis, Politechnika Wroclawska, 2014. [ bib | .pdf ] |
| [Kiriella2014] |
Dawpadee B. Kiriella, Shyama C. Kumari, Kavindu C. Ranasinghe, and Lakshman
Jayaratne.
Music Training Interface for Visually Impaired through a Novel
Approach to Optical Music Recognition.
GSTF Journal on Computing, 3 (4): 45, 2014.
ISSN 2010-2283.
[ bib |
DOI ]
Some inherited barriers which limits the human abilities can be surprisingly win through technology. This research focuses on defining a more reliable and a controllable interface for visually impaired people to read and study eastern music notations which are widely available in printed format. One of another concept behind was that differently-abled people should be assisted in a way which they can proceed interested tasks in an independent way. The research provide means to continue on researching the validity of using a controllable auditory interface instead using Braille music scripts converted with the help of 3rd parties. The research further summarizes the requirements aroused by the relevant users, design considerations, evaluation results on user feedbacks of proposed interface.
|
| [Kodirov2014] |
Elyor Kodirov, Sejin Han, Guee-Sang Lee, and YoungChul Kim.
Music with Harmony: Chord Separation and Recognition in Printed Music
Score Images.
In 8th International Conference on Ubiquitous Information
Management and Communication, pages 1-8, Siem Reap, Cambodia, 2014. ACM.
ISBN 978-1-4503-2644-5.
[ bib |
DOI ]
Optical music recognition systems are in the general interest recently. These systems achieve accurate symbol recognition at some level. However, chords are not considered in these systems yet they play a role in music. Therefore, we aimed to develop an algorithm that can deal with separation and recognition of chords in music score images. Separation is necessary because the chords can be touched, overlapped or/and broken due to noise and other reasons. By considering these problems, we propose top-down based separation using domain information and characteristics of the chords. To handle recognition, we propose a modified zoning method with k-nearest neighbor classifier. Also, we analyzed several classifiers with different features to see which method is reliable for the chord recognition. Since this topic is not considered with special focus before, there is not a standard benchmark to evaluate performance of the algorithm. Thus, we introduce a new dataset, namely OMR-ChSR6306, which includes a wide range of chords such as single chords, touched chords, and overlapped chords. Experiments on the proposed dataset demonstrate that our algorithm can separate and recognize the chords, with 100 and 98.98% recognition accuracy respectively.
|
| [Kusakunniran2014] | Worapan Kusakunniran, Attapol Prempanichnukul, Arthid Maneesutham, Kullachut Chocksawud, Suparus Tongsamui, and Kittikhun Thongkanchorn. Optical music recognition for traditional Thai sheet music. In International Computer Science and Engineering Conference, pages 157-162. IEEE, 2014. [ bib | DOI ] |
| [Mehta2014] | Apurva Ashokbhai Mehta and Malay S. Bhatt. Practical Issues in the Field of Optical Music Recognition. International Journal of Advance Research in Computer Science and Management Studies, 2 (1): 513-518, 2014. ISSN 2321-7782. Dubious Journal. [ bib | .pdf ] |
| [Montagner2014] |
Igor dos Santos Montagner, Roberto Hirata Jr., and Nina S. T. Hirata.
Learning to remove staff lines from music score images.
In International Conference on Image Processing, pages
2614-2618, 2014a.
[ bib |
DOI ]
The methods for removal of staff lines rely on characteristics specific to musical documents and they are usually not robust to some types of imperfections in the images. To overcome this limitation, we propose the use of binary morphological operator learning, a technique that estimates a local operator from a set of example images. Experimental results in both synthetic and real images show that our approach can adapt to different types of deformations and achieves similar or better performance than existing methods in most of the test scenarios.
|
| [Montagner2014a] |
Igor dos Santos Montagner, Roberto Hirata Jr., and Nina S. T. Hirata.
A Machine Learning based method for Staff Removal.
In 22nd International Conference on Pattern Recognition, pages
3162-3167. Institute of Electrical and Electronics Engineers Inc.,
2014b.
ISBN 9781479952083.
[ bib |
DOI ]
Staff line removal is an important pre-processing step to convert content of music score images to machine readable formats. Many heuristic algorithms have been proposed for staff removal and recently a competition was organized in the 2013 ICDAR/GREC conference. Music score images are often subject to different deformations and variations, and existing algorithms do not work well for all cases. We investigate the application of a machine learning based method for the staff removal problem. The method consists in learning multiple image operators from training input-output pairs of images and then combining the results of these operators. Each operator is based on local information provided by a neighborhood window, which is usually manually chosen based on the content of the images. We propose a feature selection based approach for automatically defining the windows and also for combining the operators. The performance of the proposed method is superior to several existing methods and is comparable to the best method in the competition.
|
| [Ng2014] | Kia Ng, Alex McLean, and Alan Marsden. Big Data Optical Music Recognition with Multi Images and Multi Recognisers. In EVA London 2014 on Electronic Visualisation and the Arts, pages 215-218. BCS, 2014. [ bib | DOI | .pdf ] |
| [Nguyen2014] |
Hong Quy Nguyen, Hyung-Jeong Yang, Soo-Hyung Kim, and Guee-Sang Lee.
Automatic Touching Detection and Recognition of Music Chord Using
Auto-encoding and Softmax.
In 8th International Conference on Ubiquitous Information
Management and Communication, Siem Reap, 2014. Association for Computing
Machinery.
[ bib |
DOI ]
Humankind envisioned an age of automatic where many machines perform all cumbersome and tedious tasks and we just enjoy. Playing music is not a tedious work but a program that plays music from music sheet image automatically can increase productivity of musician or bring convenience to amateurs. Following its requirement, we studied a specific task in Optical Music Recognition problem that is touching chord. Specially, touching chord becomes a critical problem on mobile device captured image because of some objective conditions. In this paper we showed our proposed method which used Autoencoder and Softmax classifier. The experiment results showed that our method is very promising. We get 94.117 96.261% in separate phase.
|
| [Nhat2014] |
Vo Quang Nhat and Guee-Sang Lee.
Adaptive Line Fitting for Staff Detection in Handwritten Music Score
Images.
In 8th International Conference on Ubiquitous Information
Management and Communication, pages 991-996, Siem Reap, Cambodia, 2014.
ACM.
ISBN 978-1-4503-2644-5.
[ bib |
DOI ]
The target of staff line detection is to extract staff lines accurately in order to remove them while preserves the shape of musical symbols. There are several researches in staff line detection and removal which provide good results with printed scores. However, in case of handwritten music scores, detecting staff lines still has problems due to the diversity of musical symbol shape, line curvature and disconnection. In this paper, we present a novel line fitting method for detecting the staff line in handwritten music score images. Our method first starts with the estimation of staff line height and staff space height. Then the staff segments are selected. Based on these staff candidates, we construct a line with the orientation of the staff segment and gradually fit it to the real lines. The staff line is then removed and the process is continuing until no line is detected. To show the effectiveness of our proposed method with different types of handwritten music score, images from the ICDAR/GREC 2013 dataset are tested. The experiment results show the advantages of our algorithm comparing with the previous approaches.
|
| [Padilla2014] |
Victor Padilla, Alan Marsden, Alex McLean, and Kia Ng.
Improving OMR for Digital Music Libraries with Multiple Recognisers
and Multiple Sources.
In 1st International Workshop on Digital Libraries for
Musicology, pages 1-8, London, United Kingdom, 2014. ACM.
ISBN 978-1-4503-3002-2.
[ bib |
DOI ]
Large quantities of scanned music are now available in public digital music libraries. However, the information in such sources is represented as pixel data in images rather than symbolic information about the notes of a piece of music, and therefore it is opaque to musically meaningful computational processes (e.g., to search for a particular melodic pattern). Optical Music Recognition (Optical Character Recognition for music) holds out the prospect of a solution to this issue and allowing access to very large quantities of musical information in digital libraries. Despite the efforts made by the different commercial OMR developers to improve the accuracy of their systems, mistakes in the output are currently too frequent to make OMR a practical tool for bulk processing.
|
| [Ramirez2014] |
Carolina Ramirez and Jun Ohya.
Automatic Recognition of Square Notation Symbols in Western
Plainchant Manuscripts.
Journal of New Music Research, 43 (4): 390-399, 2014.
ISSN 0929-8215.
[ bib |
DOI ]
While the Optical Music Recognition (OMR) of printed and handwritten music scores in modern standard notation has been broadly studied, this is not the case for early music manuscripts. This is mainly due to the high variability in the sources introduced by their severe physical degradation, the lack of notation standards and, in the case of the scanned versions, by non-homogenous image-acquisition protocols. The volume of early musical manuscripts available is considerable, and therefore we believe that computational methods can be extremely useful in helping to preserve, share and analyse this information. This paper presents an approach to recognizing handwritten square musical notation in degraded western plainchant manuscripts from the XIVth to XVIth centuries. We propose the use of image processing techniques that behave robustly under high data variability and which do not require strong hypotheses regarding the condition of the sources. The main differences from traditional OMR approaches are our avoidance of the staff line removal stage and the use of grey-level images to perform primitive segmentation and feature extraction. We used 136 images from the Digital Scriptorium repository (DS, 2007), from which we were able to extract over 90% of all symbols present. For symbol classification, we used gradient-based features and SVM classifiers, obtaining over 90% over eight basic symbol classes.
|
| [Saitis2014] | Charalampos Saitis, Andrew Hankinson, and Ichiro Fujinaga. Correcting Large-Scale OMR Data with Crowdsourcing. In 1st International Workshop on Digital Libraries for Musicology, pages 1-3. ACM, 2014. [ bib | DOI ] |
| [Stramer2014] | Tal Stramer. Digitizing sheet music. Technical report, Stanford University, 2014. [ bib | .pdf ] |
| [Vo2014] |
Quang Nhat Vo, Tam Nguyen, Soo-Hyung Kim, Hyung-Jeong Yang, and Guee-Sang Lee.
Distorted music score recognition without Staffline removal.
In 22nd International Conference on Pattern Recognition, pages
2956-2960. Institute of Electrical and Electronics Engineers Inc., 2014.
ISBN 9781479952083.
[ bib |
DOI |
http ]
This paper proposes a new approach for recognizing the primitive musical symbols in distorted music scores without the staff line removal. We try to overcome two main issues. The first problem is the difficult and unreliable removal of staff lines required as a pre-processing step for most of recognition systems. The second problem is the non-linear distortion of the music score images captured by digital cameras. At the beginning, we detect the locations of bar-lines on each staff and segment it into sub-areas which can be rectified into undistorted shapes by biquadratic transformation. Then, musical rules, template matching, run length coding and projection methods are employed to extract the musical note information without the application of staff removal. The proposed method is implemented on smart phones and shows promising results.
|
| [Wallner2014] | Matthias Wallner. A System for Optical Music Recognition and Audio Synthesis. Master's thesis, TU Wien, 2014. [ bib | .pdf ] |
| [Wen2014] |
Cuihong Wen, Ana Rebelo, Jing Zhang, and Jaime dos Santos Cardoso.
Classification of optical music symbols based on combined neural
network.
In International Conference on Mechatronics and Control, pages
419-423, 2014.
[ bib |
DOI ]
In this paper, a new method for music symbol classification named Combined Neural Network (CNN) is proposed. Tests are conducted on more than 9000 music symbols from both real and scanned music sheets, which show that the proposed technique offers superior classification capability. At the same time, the performance of the new network is compared with the single Neural Network (NN) classifier using the same music scores. The average classification accuracy increased more than ten percent, reaching 98.82%.
|
| [Chen2013] | Yung-Sheng Chen, Feng-Sheng Chen, and Chin-Hung Teng. An Optical Music Recognition System for Skew or Inverted Musical Scores. International Journal of Pattern Recognition and Artificial Intelligence, 27 (07), 2013. [ bib | DOI ] |
| [Fornes2013] |
Alicia Fornés, Anjan Dutta, Albert Gordo, and Josep Lladós.
The 2012 Music Scores Competitions: Staff Removal and Writer
Identification.
In Young-Bin Kwon and Jean-Marc Ogier, editors, Graphics
Recognition. New Trends and Challenges, pages 173-186, Berlin, Heidelberg,
2013. Springer Berlin Heidelberg.
ISBN 978-3-642-36824-0.
[ bib |
DOI ]
Since there has been a growing interest in the analysis of handwritten music scores, we have tried to foster this interest by proposing in ICDAR and GREC two different competitions: Staff removal and Writer identification. Both competitions have been tested on the CVC-MUSCIMA database of handwritten music score images. In the corresponding ICDAR publication, we have described the ground-truth, the evaluation metrics, the participants' methods and results. As a result of the discussions with attendees in ICDAR and GREC concerning our music competition, we decided to propose a new experiment for an extended competition. Thus, this paper is focused on this extended competition, describing the new set of images and analyzing the new results.
|
| [Gordo2013] |
Albert Gordo, Alicia Fornés, and Ernest Valveny.
Writer identification in handwritten musical scores with bags of
notes.
Pattern Recognition, 46 (5): 1337-1345, 2013.
ISSN 0031-3203.
[ bib |
DOI |
http ]
Writer Identification is an important task for the automatic processing of documents. However, the identification of the writer in graphical documents is still challenging. In this work, we adapt the Bag of Visual Words framework to the task of writer identification in handwritten musical scores. A vanilla implementation of this method already performs comparably to the state-of-the-art. Furthermore, we analyze the effect of two improvements of the representation: a Bhattacharyya embedding, which improves the results at virtually no extra cost, and a Fisher Vector representation that very significantly improves the results at the cost of a more complex and costly representation. Experimental evaluation shows results more than 20 points above the state-of-the-art in a new, challenging dataset.
|
| [Hankinson2013] | Andrew Hankinson and Ichiro Fujinaga. Using optical music recognition to navigate and retrieve music documents. In Conference of the International Association of Music Libraries, Vienna, Austria, 2013. [ bib | .pdf ] |
| [Malik2013] |
Rakesh Malik, Partha Pratim Roy, Umapada Pal, and Fumitaka Kimura.
Handwritten Musical Document Retrieval Using Music-Score Spotting.
In 12th International Conference on Document Analysis and
Recognition, pages 832-836, 2013.
[ bib |
DOI ]
In this paper, we present a novel approach for retrieval of handwritten musical documents using a query sequence/word of musical scores. In our algorithm, the musical score-words are described as sequences of symbols generated from a universal codebook vocabulary of musical scores. Staff lines are removed first from musical documents using structural analysis of staff lines and symbol codebook vocabulary is created in offline. Next, using this symbol codebook the music symbol information in each document image is encoded. Given a query sequence of musical symbols in a musical score-line, the symbols in the query are searched in each of these encoded documents. Finally, a sub-string matching algorithm is applied to find query words. For codebook, two different feature extraction methods namely: Zernike Moments and 400 dimensional gradient features are tested and two unsupervised classifiers using SOM and K-Mean are evaluated. The results are compared with a baseline approach of DTW. The performance is measured on a collection of handwritten musical documents and results are promising.
|
| [Pugin2013] | Laurent Pugin and Tim Crawford. Evaluating OMR on the Early Music Online Collection. In Alceu de Souza Britto Jr., Fabien Gouyon, and Simon Dixon, editors, 14th International Society for Music Information Retrieval Conference, pages 439-444, Curitiba, Brazil, 2013. [ bib | .pdf ] |
| [Raphael2013] | Christopher Raphael and Rong Jin. Optical music recognition on the international music score library project. In IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2013. [ bib | DOI ] |
| [Rebelo2013] | Ana Rebelo, André Marçal, and Jaime dos Santos Cardoso. Global constraints for syntactic consistency in OMR: an ongoing approach. In International Conference on Image Analysis and Recognition, 2013. [ bib | .pdf ] |
| [Rebelo2013a] |
Ana Rebelo and Jaime dos Santos Cardoso.
Staff Line Detection and Removal in the Grayscale Domain.
In 12th International Conference on Document Analysis and
Recognition, pages 57-61, 2013.
[ bib |
DOI ]
The detection of staff lines is the first step of most Optical Music Recognition (OMR) systems. Its great significance derives from the ease with which we can then proceed with the extraction of musical symbols. All OMR tasks are usually achieved using binary images by setting thresholds that can be local or global. These techniques however, may remove relevant information of the music sheet and introduce artifacts which will degrade results in the later stages of the process. It arises therefore a need to create a method that reduces the loss of information due to the binarization. The baseline for the methodology proposed in this paper follows the shortest path algorithm proposed in [CardosoTPAMI08]. The concept of strong staff pixels (SSP's), which is a set of pixels with a high probability of belonging to a staff line, is proposed to guide the cost function. The SSP allows to overcome the results of the binary based detection and to generalize the binary framework to grayscale music scores. The proposed methodology achieves good results.
|
| [Sapp2013] | Craig Sapp. OMR Comparison of SmartScore and SharpEye. https://ccrma.stanford.edu/~craig/mro-compare-beethoven, 2013. [ bib | http ] |
| [Silva2013] | Rui Miguel Filipe da Silva. Mobile framework for recognition of musical characters. Master's thesis, Universidade do Porto, 2013. [ bib | .pdf ] |
| [Tambouratzis2013] |
Tatiana Tambouratzis.
The Digital Music Stand as a Minimal Processing Custom-Made Optical
Music Recognition System, Part 1: Key Music Symbol Recognition.
International Journal of Intelligent Systems, 28 (5):
474-504, 2013.
ISSN 0884-8173.
[ bib |
DOI ]
The digital music stand is proposed as a minimal-processing optical music recognition implementation, where music score (MS) presentation is realized without prior alignment, noise, or staff line removal. After each MS page is segmented into systems, staves, measures, and candidate music symbols, music symbol recognition is accomplished via probabilistic neural networks: Only the key music symbols (namely clefs, global accidentals, time signatures) of the MS are identified, while the remaining music symbols are generally classified. Subsequently, satisfactory quality of on-screen MS viewing is accomplished via the concatenation and/or substitution of appropriately selected parts and isolated music symbols of the original MS. In this piece of research, the processing stages leading to on-screen MS presentation are detailed.
|
| [Timofte2013] |
Radu Timofte and Luc Van Gool.
Automatic Stave Discovery for Musical Facsimiles.
In Kyoung Mu Lee, Yasuyuki Matsushita, James M. Rehg, and Zhanyi Hu,
editors, Computer Vision - ACCV 2012, pages 510-523, Berlin,
Heidelberg, 2013. Springer Berlin Heidelberg.
ISBN 978-3-642-37447-0.
[ bib |
DOI ]
Lately, there is an increased interest in the analysis of music score facsimiles, aiming at automatic digitization and recognition. Noise, corruption, variations in handwriting, non-standard page layouts and notations are common problems affecting especially the centuries-old manuscripts.
|
| [Vigliensoni2013] | Gabriel Vigliensoni, Gregory Burlet, and Ichiro Fujinaga. Optical measure recognition in common music notation. In 14th International Society for Music Information Retrieval Conference, Curitiba, Brazil, 2013. [ bib | .pdf ] |
| [Visaniy2013] |
Muriel Visani, V.C. Kieu, Alicia Fornés, and Nicholas Journet.
The ICDAR 2013 Music Scores Competition: Staff Removal.
In 12th International Conference on Document Analysis and
Recognition, pages 1407-1411, 2013.
[ bib |
DOI ]
The first competition on music scores that was organized at ICDAR in 2011 awoke the interest of researchers, who participated both at staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario: old music scores. For this purpose, we have generated a new set of images using two kinds of degradations: local noise and 3D distortions. This paper describes the dataset, distortion methods, evaluation metrics, the participant's methods and the obtained results.
|
| [Witt2013] |
Carl Witt.
Optical Music Recognition Symbol Detection using Contour Traces,
2013.
[ bib ]
A novel approach to symbol detection in optical music recognition is presented. The binarized image of a scanned score is transformed into an intermediate representation by computing its contours and assigning additional visual features to them. The resulting contour points are accessed via a high dimensional spatial index that aids a heuristic search to detect a given symbol as described by a template image. An automatic and a manual method for generating ground truth data are presented, amongst other web-based tools to evaluate and supervise the recognition process.
|
| [Baba2012] | Tetsuaki Baba, Yuya Kikukawa, Toshiki Yoshiike, Tatsuhiko Suzuki, Rika Shoji, Kumiko Kushiyama, and Makoto Aoki. Gocen: A Handwritten Notational Interface for Musical Performance and Learning Music. In ACM SIGGRAPH 2012 Emerging Technologies, pages 9-9, New York, USA, 2012. ACM. ISBN 978-1-4503-1680-4. [ bib | DOI ] |
| [Burlet2012] | Gregory Burlet, Alastair Porter, Andrew Hankinson, and Ichiro Fujinaga. Neon.js: Neume Editor Online. In 13th International Society for Music Information Retrieval Conference, pages 121-126, Porto, Portugal, 2012. [ bib | .pdf ] |
| [Fornes2012] |
Alicia Fornés, Anjan Dutta, Albert Gordo, and Josep Lladós.
CVC-MUSCIMA: A Ground-truth of Handwritten Music Score Images for
Writer Identification and Staff Removal.
International Journal on Document Analysis and Recognition, 15
(3): 243-251, 2012.
ISSN 1433-2825.
[ bib |
DOI ]
The analysis of music scores has been an active research field in the last decades. However, there are no publicly available databases of handwritten music scores for the research community. In this paper, we present the CVC-MUSCIMA database and ground truth of handwritten music score images. The dataset consists of 1,000 music sheets written by 50 different musicians. It has been especially designed for writer identification and staff removal tasks. In addition to the description of the dataset, ground truth, partitioning, and evaluation metrics, we also provide some baseline results for easing the comparison between different approaches.
|
| [Hankinson2012] | Andrew Hankinson, John Ashley Burgoyne, Gabriel Vigliensoni, Alastair Porter, Jessica Thompson, Wendy Liu, Remi Chiu, and Ichiro Fujinaga. Digital Document Image Retrieval Using Optical Music Recognition. In Fabien Gouyon, Perfecto Herrera, Luis Gustavo Martins, and Meinard Müller, editors, 13th International Society for Music Information Retrieval Conference, pages 577-582, 2012b. [ bib | .pdf ] |
| [Hankinson2012a] | Andrew Hankinson, John Ashley Burgoyne, Gabriel Vigliensoni, and Ichiro Fujinaga. Creating a Large-scale Searchable Digital Collection from Printed Music Materials. In 21st International Conference on World Wide Web, pages 903-908, Lyon, France, 2012a. ACM. ISBN 978-1-4503-1230-1. [ bib | DOI ] |
| [Hankinson2012b] | Andrew Hankinson and Ichiro Fujinaga. SIMSSA: Single Interface for Music Score Searching and Analysis. In Conference of the International Association of Music Libraries, Montréal, QC, 2012. [ bib | .pdf ] |
| [Hankinson2012c] | Andrew Hankinson. Optical Music Recognition Bibliography. http://ddmal.music.mcgill.ca/research/omr/omr_bibliography, 2012. [ bib | http ] |
| [Jin2012] | Rong Jin and Christopher Raphael. Interpreting Rhythm in Optical Music Recognition. In Fabien Gouyon, Perfecto Herrera, Luis Gustavo Martins, and Meinard Müller, editors, 13th International Society for Music Information Retrieval Conference, pages 151-156, Porto, Portugal, 2012. [ bib | .pdf ] |
| [Liu2012] |
Xiaoxiang Liu.
Note Symbol Recognition for Music Scores.
In Jeng-Shyang Pan, Shyi-Ming Chen, and Ngoc Thanh Nguyen, editors,
Intelligent Information and Database Systems, pages 263-273, Berlin,
Heidelberg, 2012. Springer Berlin Heidelberg.
ISBN 978-3-642-28490-8.
[ bib |
DOI ]
Note symbol recognition plays a fundamental role in the process of an OMR system. In this paper, we propose new approaches for recognizing notes by extracting primitives and assembling them into constructed symbols. Firstly, we propose robust algorithms for extracting primitives (stems, noteheads and beams) based on Run-Length Encoding. Secondly, introduce the concept of interaction field to describe the relationship between primitives, and define six hierarchical categories for the structure of notes. Thirdly, propose an effective sequence to assemble the primitives into notes, guided by the mechanism of giving priority to the key structures. To evaluate the performance of those approaches, we present experimental results on real-life scores and comparisons with commercial systems. The results show our approaches can recognize notes with high-accuracy and powerful adaptability, especially for the complicated scores with high density of symbols.
|
| [Low2012] | Grady Low and Yung-Ho Chang. Optical Music Recognition Application, 2012. [ bib | .pdf ] |
| [Luangnapa2012] | Nawapon Luangnapa, Thongchai Silpavarangkura, Chakarida Nukoolkit, and Pornchai Mongkolnam. Optical Music Recognition on Android Platform. In International Conference on Advances in Information Technology, pages 106-115. Springer, 2012. [ bib | DOI ] |
| [Rebelo2012] | Ana Rebelo, Ichiro Fujinaga, Filipe Paszkiewicz, Andre R.S. Marcal, Carlos Guedes, and Jaime dos Santos Cardoso. Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1 (3): 173-190, 2012. [ bib | DOI ] |
| [Rebelo2012a] | Ana Rebelo. Robust Optical Recognition of Handwritten Musical Scores based on Domain Knowledge. PhD thesis, University of Porto, 2012. [ bib | .pdf ] |
| [Sebastien2012] | Véronique Sébastien, Henri Ralambondrainy, Olivier Sébastien, and Noël Conruyt. Score Analyzer: Automatically Determining Scores Difficulty Level for Instrumental e-Learning. In Fabien Gouyon, Perfecto Herrera, Luis Gustavo Martins, and Meinard Müller, editors, 13th International Society for Music Information Retrieval Conference, pages 571-576, Porto, Portugal, 2012. [ bib | .pdf ] |
| [Su2012] |
Bolan Su, Shijian Lu, Umapada Pal, and Chew Lim Tan.
An effective staff detection and removal technique for musical
documents.
In 10th International Workshop on Document Analysis Systems,
pages 160-164. IEEE, 2012.
ISBN 9780769546612.
[ bib |
DOI ]
Musical staff line detection and removal techniques detect the staff positions in musical documents and segment musical score from musical documents by removing those staff lines. It is an important preprocessing step for ensuing the Optical Music Recognition ...
|
| [Tsandilas2012] | Theophanis Tsandilas. Interpreting Strokes on Paper with a Mobile Assistant. In 25th Annual ACM Symposium on User Interface Software and Technology, pages 299-308, Cambridge, Massachusetts, USA, 2012. ACM. ISBN 978-1-4503-1580-7. [ bib | DOI ] |
| [Vidal2012] | Vitor Hugo Couto Vidal. Optical Music Recognition in the grey-scale domain. Technical report, Universidade do Porto, 2012. [ bib | .pdf ] |
| [Yin-xian2012] |
Yang Yin-xian and Yang Ding-li.
Staff Line Removal Algorithm Based on Trajectory Tracking and
Topological Structure of Score.
In 4th International Conference on Computer Modeling and
Simulation, 2012.
[ bib ]
Staff line removal plays a vital role in OMR technology, and is the preconditions of succeeding segmentation & recognition of music sheets. For the phenomena of over-deletion or mistaken deletion and under-deletion which often appear in removal process of staff lines, a novel staff line removal algorithm based on trajectory tracking and topological structure of music symbols is put forward to solve the deletion faults of partial notions. Experimental results show the presented algorithms can remove staff lines fast and effectively.
|
| [Bugge2011] |
Esben Paul Bugge, Kim Lundsteen Juncher, Brian Soborg Mathiasen, and Jakob Grue
Simonsen.
Using Sequence Alignment and Voting To Improve Optical Music
Recognition From Multiple Recognizers.
In 12th International Society for Music Information Retrieval
Conference, pages 405-410, 2011.
ISBN 9780615548654.
[ bib |
.pdf ]
Digitalizing sheet music using Optical Music Recognition (OMR) is error-prone, especially when using noisy images created from scanned prints. Inspired by DNA-sequence alignment, we devise a method to use multiple sequence alignment to automatically compare output from multiple third party OMR tools and perform automatic error-correction of pitch and duration of notes. We perform tests on a corpus of 49 one-page scores of varying quality. Our method on average reduces the amount of errors from an ensemble of 4 commercial OMR tools. The method achieves, on average, fewer errors than each recognizer by itself, but statistical tests show that it is significantly better than only 2 of the 4 commercial recognizers. The results suggest that recognizers may be improved somewhat by sequence alignment and voting, but that more elaborate methods may be needed to obtain substantial improvements. All software, scanned music data used for testing, and experiment protocols are open source and available at: http://code.google.com/p/omr-errorcorrection/
|
| [Fornes2011] |
Alicia Fornés, Anjan Dutta, Albert Gordo, and Josep Lladós.
The ICDAR 2011 Music Scores Competition: Staff Removal and Writer
Identification.
In International Conference on Document Analysis and
Recognition, pages 1511-1515, 2011.
[ bib |
DOI ]
In the last years, there has been a growing interest in the analysis of handwritten music scores. In this sense, our goal has been to foster the interest in the analysis of handwritten music scores by the proposal of two different competitions: Staff removal and Writer Identification. Both competitions have been tested on the CVC-MUSCIMA database: a ground-truth of handwritten music score images. This paper describes the competition details, including the dataset and ground-truth, the evaluation metrics, and a short description of the participants, their methods, and the obtained results.
|
| [Min2011] |
Du Min.
Research on numbered musical notation recognition and performance in
a intelligent system.
In International Conference on Business Management and
Electronic Information, pages 340-343, 2011.
[ bib |
DOI ]
An intelligent system with numbered musical notation recognition and performance (NMRPIS) is presented, which is based on notation recognition and can play digital music automatically. The system combines with OMR to analyze musical notation, interpret it completely, and form the output quickly and efficiently by the embedded program. The experimental result indicates that this system has a high classification rate and higher recognition performance.
|
| [Pinto2011] |
Telmo Pinto, Ana Rebelo, Gilson Giraldi, and Jamie dos Santos Cardoso.
Music Score Binarization Based on Domain Knowledge.
In Jordi Vitrià, João Miguel Sanches, and Mario
Hernández, editors, Pattern Recognition and Image Analysis, pages
700-708. Springer Berlin Heidelberg, 2011.
ISBN 978-3-642-21257-4.
[ bib |
DOI ]
Image binarization is a common operation in the pre-processing stage in most Optical Music Recognition (OMR) systems. The choice of an appropriate binarization method for handwritten music scores is a difficult problem. Several works have already evaluated the performance of existing binarization processes in diverse applications. However, no goal-directed studies for music sheets documents were carried out. This paper presents a novel binarization method based in the content knowledge of the image. The method only needs the estimation of the staffline thickness and the vertical distance between two stafflines. This information is extracted directly from the gray level music score. The proposed binarization procedure is experimentally compared with several state of the art methods.
|
| [Raphael2011] | Christopher Raphael. Optical Music Recognition on the IMSLP. Technical report, Indiana University, Bloomington, 2011. [ bib ] |
| [Raphael2011a] | Christopher Raphael and Jingya Wang. New Approaches to Optical Music Recognition. In Anssi Klapuri and Colby Leider, editors, 12th International Society for Music Information Retrieval Conference, pages 305-310, Miami, Florida, 2011. University of Miami. [ bib | .pdf ] |
| [Rebelo2011] |
Ana Rebelo, Jakub Tkaczuk, Ricardo Sousa, and Jamie dos Santos Cardoso.
Metric Learning for Music Symbol Recognition.
In 10th International Conference on Machine Learning and
Applications and Workshops, pages 106-111, 2011b.
[ bib |
DOI ]
Although Optical Music Recognition (OMR) has been the focus of much research for decades, the processing of handwritten musical scores is not yet satisfactory. The efforts made to find robust symbol representations and learning methodologies have not found a similar quality in the learning of the dissimilarity concept. Simple Euclidean distances are often used to measure dissimilarity between different examples. However, such distances do not necessarily yield the best performance. In this paper, we propose to learn the best distance for the k-nearest neighbor (k-NN) classifier. The distance concept will be tuned both for the application domain and the adopted representation for the music symbols. The performance of the method is compared with the support vector machine (SVM) classifier using both real and synthetic music scores. The synthetic database includes four types of deformations inducing variability in the printed musical symbols which exist in handwritten music sheets. The work presented here can open new research paths towards a novel automatic musical symbols recognition module for handwritten scores.
|
| [Rebelo2011a] | Ana Rebelo, Filipe Paszkiewicz, Carlos Guedes, Andre R. S. Marcal, and Jamie dos Santos Cardoso. A Method for Music Symbols Extraction based on Musical Rules. In Bridges 2011: Mathematics, Music, Art, Architecture, Culture, pages 81-88, 2011a. ISBN 098460426X. [ bib | .pdf ] |
| [Tambouratzis2011] |
Tatiana Tambouratzis.
Identification of key music symbols for optical music recognition and
on-screen presentation.
In International Joint Conference on Neural Networks, pages
1935-1942, 2011.
[ bib |
DOI ]
A novel optical music recognition (OMR) system is put forward, where the custom-made on-screen presentation of the music score (MS) is promoted via the recognition of key music symbols only. The proposed system does not require perfect manuscript alignment or noise removal. Following the segmentation of each MS page into systems and, subsequently, into staves, staff lines, measures and candidate music symbols (CMS's), music symbol recognition is limited to the identification of the clefs, accidentals and time signatures. Such an implementation entails significantly less computational effort than that required by classic OMR systems, without an observable compromise in the quality of the on-screen presentation of the MS. The identification of the music symbols of interest is performed via probabilistic neural networks (PNN's), which are trained on a small set of exemplars from the MS itself. The initial results are promising in terms of efficiency, identification accuracy and quality of viewing.
|
| [Thompson2011] | Jessica Thompson, Andrew Hankinson, and Ichiro Fujinaga. Searching the Liber Usualis: Using CouchDB and ElasticSearch to Query Graphical Music Documents. In 12th International Society for Music Information Retrieval Conference, 2011. [ bib | .pdf ] |
| [Vigliensoni2011] | Gabriel Vigliensoni, John Ashley Burgoyne, Andrew Hankinson, and Ichiro Fujinaga. Automatic Pitch Detection in Printed Square Notation. In Anssi Klapuri and Colby Leider, editors, 12th International Society for Music Information Retrieval Conference, pages 423-428, Miami, Florida, 2011. University of Miami. [ bib | .pdf ] |
| [Viro2011] |
Vladimir Viro.
Peachnote: Music Score Search and Analysis Platform.
In 12th International Society for Music Information Retrieval
Conference, pages 359-362, Miami, FL, 2011.
[ bib |
.pdf ]
Our system takes the scores in PDF format, runs optical music recognition (OMR) software over them, indexes the data and makes them accessible for querying and data mining. The search engine is built upon Hadoop and HBase and runs on a cluster.
|
| [Byrd2010] | Donald Byrd, William Guerin, Megan Schindele, and Ian Knopke. OMR Evaluation and Prospects for Improved OMR via Multiple Recognizers. Technical report, Indiana University, Bloomington, IN, USA, 2010. [ bib | http ] |
| [Dutta2010] |
Anjan Dutta, Umapada Pal, Alicia Fornés, and Josep Lladós.
An Efficient Staff Removal Approach from Printed Musical Documents.
In 20th International Conference on Pattern Recognition, pages
1965-1968, 2010.
[ bib |
DOI ]
Staff removal is an important preprocessing step of the Optical Music Recognition (OMR). The process aims to remove the stafflines from a musical document and retain only the musical symbols, later these symbols are used effectively to identify the music information. This paper proposes a simple but robust method to remove stafflines from printed musical scores. In the proposed methodology we have considered a staffline segment as a horizontal linkage of vertical black runs with uniform height. We have used the neighbouring properties of a staffline segment to validate it as a true segment. We have considered the dataset along with the deformations described in for evaluation purpose. From experimentation we have got encouraging results.
|
| [Gozzi2010] | Gianmarco Gozzi. OMRJX: A framework for piano scores optical music recognition. Master's thesis, Politecnico di Milano, 2010. [ bib | .pdf ] |
| [Hankinson2010] | Andrew Hankinson, Laurent Pugin, and Ichiro Fujinaga. An Interchange Format for Optical Music Recognition Applications. In 11th International Society for Music Information Retrieval Conference, pages 51-56, Utrecht, The Netherlands, 2010. [ bib | .pdf ] |
| [Pinto2010] | Telmo Pinto, Ana Rebelo, Gilson Giraldi, and Jamie dos Santos Cardoso. Content Aware Music Score Binarization. Technical report, Universidade do Porto, Portugal, 2010. [ bib | .pdf ] |
| [Rebelo2010] |
Ana Rebelo, G. Capela, and Jamie dos Santos Cardoso.
Optical recognition of music symbols.
International Journal on Document Analysis and Recognition, 13
(1): 19-31, 2010.
ISSN 1433-2825.
[ bib |
DOI ]
Many musical works produced in the past are still currently available only as original manuscripts or as photocopies. The preservation of these works requires their digitalization and transformation into a machine-readable format. However, and despite the many research activities on optical music recognition (OMR), the results for handwritten musical scores are far from ideal. Each of the proposed methods lays the emphasis on different properties and therefore makes it difficult to evaluate the efficiency of a proposed method. We present in this article a comparative study of several recognition algorithms of music symbols. After a review of the most common procedures used in this context, their respective performances are compared using both real and synthetic scores. The database of scores was augmented with replicas of the existing patterns, transformed according to an elastic deformation technique. Such transformations aim to introduce invariances in the prediction with respect to the known variability in the symbols, particularly relevant on handwritten works. The following study and the adopted databases can constitute a reference scheme for any researcher who wants to confront a new OMR algorithm face to well-known ones.
|
| [Rizo2010] | David Rizo. Symbolic music comparison with tree data structures. PhD thesis, Universidad de Alicante, 2010. [ bib | .pdf ] |
| [Burgoyne2009] | John Ashley Burgoyne, Yue Ouyang, Tristan Himmelman, Johanna Devaney, Laurent Pugin, and Ichiro Fujinaga. Lyric Extraction and Recognition on Digital Images of Early Music Sources. In 10th International Society for Music Information Retrieval Conference, pages 723-727, Kobe, Japan, 2009. [ bib | .pdf ] |
| [Byrd2009] | Donald Byrd. Studying Music is Difficult and Important: Challenges of Music Knowledge Representation. In Eleanor Selfridge-Field, Frans Wiering, and Geraint A. Wiggins, editors, Knowledge representation for intelligent music processing, number 09051 in Dagstuhl Seminar Proceedings, Wadern, Germany, 2009. Leibniz-Center for Informatics, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany. [ bib | http ] |
| [Cardoso2009] |
Jamie dos Santos Cardoso, Artur Capela, Ana Rebelo, Carlos Guedes, and Joaquim
Pinto da Costa.
Staff Detection with Stable Paths.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 31 (6): 1134-1139, 2009.
ISSN 0162-8828.
[ bib |
DOI ]
The preservation of musical works produced in the past requires their digitalization and transformation into a machine-readable format. The processing of handwritten musical scores by computers remains far from ideal. One of the fundamental stages to carry out this task is the staff line detection. We investigate a general-purpose, knowledge-free method for the automatic detection of music staff lines based on a stable path approach. Lines affected by curvature, discontinuities, and inclination are robustly detected. Experimental results show that the proposed technique consistently outperforms well-established algorithms.
|
| [Dalitz2009] |
Christoph Dalitz and Christine Pranzas.
German Lute Tablature Recognition.
In 10th International Conference on Document Analysis and
Recognition, pages 371-375, 2009.
[ bib |
DOI ]
This paper describes a document recognition system for 16th century German staffless lute tablature notation. We present methods for page layout analysis, symbol recognition and symbol layout analysis and report error rates for these methods on a variety of historic prints. Page layout analysis is based on horizontal separator lines, which may interfere with other symbols. The proposed algorithm for their detection and removal is also applicable to other single staff line detection problems (like percussion notation), for which common staff line removal algorithms fail.
|
| [Fornes2009] | Alicia Fornés, Josep Lladós, Gemma Sánchez, and Horst Bunke. On the Use of Textural Features for Writer Identification in Old Handwritten Music Scores. 10th International Conference on Document Analysis and Recognition, pages 996-1000, 2009. [ bib | DOI | http ] |
| [Fornes2009a] | Alicia Fornés. Writer Identification by a Combination of Graphical Features in the Framework of Old Handwritten Music Scores. PhD thesis, Universitat Autònoma de Barcelona, 2009. [ bib | .pdf ] |
| [Fremerey2009] | Christian Fremerey, David Damm, Frank Kurth, and Michael Clausen. Handling Scanned Sheet Music and Audio Recordings in Digital Music Libraries. In International Conference on Acoustics NAG/DAGA, pages 1-2, 2009. [ bib | .pdf ] |
| [Genfang2009] |
Chen Genfang, Zhang Wenjun, and Wang Qiuqiu.
Pick-up the Musical Information from Digital Musical Score Based on
Mathematical Morphology and Music Notation.
In 1st International Workshop on Education Technology and
Computer Science, pages 1141-1144, 2009.
[ bib |
DOI ]
The basic rule of musical notation for image processing is analyzed in this paper. Using the structuring elements of musical notation and the basic algorithms of mathematical morphology, a new method for recognizing the musical information of a digital musical score is presented, and the musical information is then transformed into a MIDI file for the communication and restoration of the musical score. The results of the experiment show that the average recognition rate for musical information from digital musical scores is 94.4%, which can satisfy practical application demands, and that this is a new approach for applications in digital libraries, musical education, musical theory analysis and so on.
|
| [Johansen2009] | Linn Saxrud Johansen. Optical Music Recognition. Master's thesis, University of Oslo, 2009. [ bib | http ] |
| [Sharif2009] | Muhammad Sharif, Quratul-Ain Arshad, Mudassar Raza, and Wazir Zada Khan. [COMSCAN]: An Optical Music Recognition System. In 7th International Conference on Frontiers of Information Technology, page 34. ACM, 2009. [ bib | DOI ] |
| [Tardon2009] |
Lorenzo J. Tardón, Simone Sammartino, Isabel Barbancho, Verónica
Gómez, and Antonio Oliver.
Optical Music Recognition for Scores Written in White Mensural
Notation.
EURASIP Journal on Image and Video Processing, 2009 (1):
843401, 2009.
ISSN 1687-5281.
[ bib |
DOI ]
An Optical Music Recognition (OMR) system especially adapted for handwritten musical scores of the XVII-th and the early XVIII-th centuries written in white mensural notation is presented. The system performs a complete sequence of analysis stages: the input is the RGB image of the score to be analyzed and, after a preprocessing that returns a black and white image with corrected rotation, the staves are processed to return a score without staff lines; then, a music symbol processing stage isolates the music symbols contained in the score and, finally, the classification process starts to obtain the transcription in a suitable electronic format so that it can be stored or played. This work will help to preserve our cultural heritage keeping the musical information of the scores in a digital format that also gives the possibility to perform and distribute the original music contained in those scores.
|
| [Vrist2009] | Søren Bjerregaard Vrist. Optical Music Recognition for structural information from high-quality scanned music, 2009. [ bib ] |
| [Bellini2008] |
Pierfrancesco Bellini, Ivan Bruno, and Paolo Nesi.
Optical Music Recognition: Architecture and Algorithms.
In Kia Ng and Paolo Nesi, editors, Interactive Multimedia Music
Technologies, pages 80-110. IGI Global, Hershey, PA, USA, 2008.
[ bib |
DOI ]
Optical music recognition is a key problem for coding western music sheets in the digital world. This problem has been addressed in several manners obtaining suitable results only when simple music constructs are processed. To this end, several different strategies have been followed, to pass from the simple music sheet image to a complete and consistent representation of music notation symbols (symbolic music notation or representation). Typically, image processing, pattern recognition and symbolic reconstruction are the technologies that have to be considered and applied in several manners in the architecture of the so-called OMR (Optical Music Recognition) systems. In this chapter, the O3MR (Object Oriented Optical Music Recognition) system is presented. It allows producing from the image of a music sheet the symbolic representation and saving it in XML format (WEDELMUSIC XML and MUSICXML). The algorithms used in this process are those of image processing, image segmentation, neural network pattern recognition, and symbolic reconstruction and reasoning. Most of the solutions can be applied in other fields of image understanding. The development of the O3MR solution with all its algorithms has been partially supported by the European Commission, in the IMUTUS Research and Development project, while the related music notation editor has been partially funded by the research and development WEDELMUSIC project of the European Commission. The paper also includes a methodology for the assessment of other OMR systems. The set of metrics proposed has been used to assess the quality of results produced by the O3MR with respect to the best OMR on the market.
|
| [Bullen2008] | Andrew H. Bullen. Bringing Sheet Music to Life: My Experiences with OMR. code4lib Journal, 3 (84), 2008. ISSN 1940-5758. [ bib | http ] |
| [Burgoyne2008] | John Ashley Burgoyne, Johanna Devaney, Laurent Pugin, and Ichiro Fujinaga. Enhanced Bleedthrough Correction for Early Music Documents with Recto-Verso Registration. In 9th International Conference on Music Information Retrieval, pages 407-412, Philadelphia, PA, 2008. [ bib | .pdf ] |
| [Capela2008] | Artur Capela, Jamie dos Santos Cardoso, Ana Rebelo, and Carlos Guedes. Integrated recognition system for music scores. In International Computer Music Conference, pages 3-6, 2008a. [ bib | http ] |
| [Capela2008a] | Artur Capela, Ana Rebelo, Jamie dos Santos Cardoso, and Carlos Guedes. Staff Line Detection and Removal with Stable Paths. In International Conference on Signal Processing and Multimedia Applications, 2008b. [ bib | .pdf ] |
| [Cardoso2008] |
Jamie dos Santos Cardoso, Artur Capela, Ana Rebelo, and Carlos Guedes.
A connected path approach for staff detection on a music score.
In 15th International Conference on Image Processing, pages
1005-1008, 2008.
[ bib |
DOI ]
The preservation of many music works produced in the past entails their digitalization and consequent accessibility in an easy-to-manage digital format. Carrying out this task manually is very time consuming and error prone. While optical music recognition systems usually perform well on printed scores, the processing of handwritten musical scores by computers remains far from ideal. One of the fundamental stages to carry out this task is the staff line detection. In this paper a new method for the automatic detection of music staff lines based on a connected path approach is presented. Lines affected by curvature, discontinuities, and inclination are robustly detected. Experimental results show that the proposed technique consistently outperforms well-established algorithms.
|
| [Craig-McFeely2008] | Julia Craig-McFeely. Digital Image Archive of Medieval Music: The evolution of a digital resource. Digital Medievalist, 3, 2008. [ bib | DOI ] |
| [Dalitz2008] |
Christoph Dalitz, Michael Droettboom, Bastian Pranzas, and Ichiro Fujinaga.
A Comparative Study of Staff Removal Algorithms.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 30 (5): 753-766, 2008a.
ISSN 0162-8828.
[ bib |
DOI ]
This paper presents a quantitative comparison of different algorithms for the removal of stafflines from music images. It contains a survey of previously proposed algorithms and suggests a new skeletonization-based approach. We define three different error metrics, compare the algorithms with respect to these metrics, and measure their robustness with respect to certain image defects. Our test images are computer-generated scores on which we apply various image deformations typically found in real-world data. In addition to modern western music notation, our test set also includes historic music notation such as mensural notation and lute tablature. Our general approach and evaluation methodology is not specific to staff removal but applicable to other segmentation problems as well.
|
| [Dalitz2008a] |
Christoph Dalitz, Georgios K. Michalakis, and Christine Pranzas.
Optical recognition of psaltic Byzantine chant notation.
International Journal on Document Analysis and Recognition, 11
(3): 143-158, 2008b.
ISSN 1433-2825.
[ bib |
DOI ]
This paper describes a document recognition system for the modern neume based notation of Byzantine music. We propose algorithms for page segmentation, lyrics removal, syntactical symbol grouping and the determination of characteristic page dimensions. All algorithms are experimentally evaluated on a variety of printed books for which we also give an optimal feature set for a nearest neighbour classifier. The system is based on the Gamera framework for document image analysis. Given that we cover all aspects of the recognition process, the paper can also serve as an illustration how a recognition system for a non standard document type can be designed from scratch.
|
| [Damm2008] | David Damm, Christian Fremerey, Frank Kurth, Meinard Müller, and Michael Clausen. Multimodal Presentation and Browsing of Music. In 10th International Conference on Multimodal Interfaces, pages 205-208, Chania, Greece, 2008. ACM. ISBN 978-1-60558-198-9. [ bib | DOI ] |
| [Fornes2008] |
Alicia Fornés, Josep Lladós, Gemma Sánchez, and Horst Bunke.
Writer Identification in Old Handwritten Music Scores.
In 8th International Workshop on Document Analysis Systems,
pages 347-353, Nara, Japan, 2008.
[ bib |
DOI ]
The aim of writer identification is determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores. Even though an important amount of compositions contains handwritten text in the music scores, the aim of our work is to use only music notation to determine the author. The steps of the system proposed are the following. First of all, the music sheet is preprocessed and normalized for obtaining a single binarized music line, without the staff lines. Afterwards, 100 features are extracted for every music line, which are subsequently used in a k-NN classifier that compares every feature vector with prototypes stored in a database. By applying feature selection and extraction methods on the original feature set, the performance is increased. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving a recognition rate of about 95%.
|
| [Fornes2008a] |
Alicia Fornés, Josep Lladós, and Gemma Sánchez.
Old Handwritten Musical Symbol Classification by a Dynamic Time
Warping Based Method.
In Wenyin Liu, Josep Lladós, and Jean-Marc Ogier, editors,
Graphics Recognition. Recent Advances and New Opportunities, pages
51-60, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
ISBN 978-3-540-88188-9.
[ bib |
DOI ]
A growing interest in the document analysis field is the recognition of old handwritten documents, towards the conversion into a readable format. The difficulties when we work with old documents are increased, and other techniques are required for recognizing handwritten graphical symbols that are drawn in such documents. In this paper we present a Dynamic Time Warping based method that outperforms the classical descriptors, being also invariant to scale, rotation, and elastic deformations typically found in handwritten musical notation.
|
| [Fremerey2008] |
Christian Fremerey, Meinard Müller, Frank Kurth, and Michael Clausen.
Automatic Mapping of Scanned Sheet Music to Audio Recordings.
In 9th International Conference on Music Information
Retrieval, pages 413-418, 2008.
ISBN 978-0-615-24849-3.
[ bib |
.pdf ]
Significant digitization efforts have resulted in large multimodal music collections comprising visual (scanned sheet music) as well as acoustic material (audio recordings). In this paper, we present a novel procedure for mapping scanned pages of sheet music to a given collection of audio recordings by identifying musically corresponding audio clips. To this end, both the scanned images as well as the audio recordings are first transformed into a common feature representation using optical music recognition (OMR) and methods from digital signal processing, respectively. Based on this common representation, a direct comparison of the two different types of data is facilitated. This allows for a search of scan-based queries in the audio collection. We report on systematic experiments conducted on the corpus of Beethoven’s piano sonatas showing that our mapping procedure works with high precision across the two types of music data in the case that there are no severe OMR errors. The proposed mapping procedure is relevant in a real-world application scenario at the Bavarian State Library for automatically identifying and annotating scanned sheet music by means of already available annotated audio material.
|
| [Jones2008] |
Graham Jones, Bee Ong, Ivan Bruno, and Kia Ng.
Optical Music Imaging: Music Document Digitisation, Recognition,
Evaluation, and Restoration.
In Interactive Multimedia Music Technologies, pages 50-79.
IGI Global, 2008.
[ bib |
DOI ]
This paper presents the applications and practices in the domain of music imaging for musical scores (music sheets and music manuscripts), which include music sheet digitisation, optical music recognition (OMR) and optical music restoration. With a general background of Optical Music Recognition (OMR), the paper discusses typical obstacles in this domain and reports currently available commercial OMR software. It reports hardware and software related to music imaging, discusses the SharpEye optical music recognition system and provides an evaluation of a number of OMR systems. Besides the main focus on the transformation from images of music scores to symbolic format, this paper also discusses optical music image restoration and the application of music imaging techniques for graphical preservation and potential applications for cross-media integration.
|
| [Kolakowska2008] |
Agata Kolakowska.
Applying decision trees to the recognition of musical symbols.
In 1st International Conference on Information Technology,
pages 1-4, 2008.
[ bib |
DOI ]
The paper presents an experimental study on the recognition of printed musical scores. The first part of the study focuses on data preparation. Bitmaps containing musical symbols are converted to feature vectors using various methods. The vectors created in such a way are used to train classifiers which are the essential part of the study. Several decision tree classifiers are applied to this recognition task. These classifiers are created using different decision tree induction methods. The algorithms incorporate different criteria to select attributes in the nodes of the trees. Moreover, some of them apply stopping criteria, whereas the others perform tree pruning. The classification accuracy of the decision trees is estimated on data taken from musical scores. Eventually the usefulness of decision trees in the recognition of printed musical symbols is evaluated.
|
| [Kurth2008] |
Frank Kurth, David Damm, Christian Fremerey, Meinard Müller, and Michael
Clausen.
A Framework for Managing Multimodal Digitized Music Collections.
In Birte Christensen-Dalsgaard, Donatella Castelli, Bolette
Ammitzbøll Jurik, and Joan Lippincott, editors, Research and
Advanced Technology for Digital Libraries, pages 334-345, Berlin,
Heidelberg, 2008. Springer Berlin Heidelberg.
ISBN 978-3-540-87599-4.
[ bib |
DOI ]
In this paper, we present a framework for managing heterogeneous, multimodal digitized music collections containing visual music representations (scanned sheet music) as well as acoustic music material (audio recordings). As a first contribution, we propose a preprocessing workflow comprising feature extraction, audio indexing, and music synchronization (linking the visual with the acoustic data). Then, as a second contribution, we introduce novel user interfaces for multimodal music presentation, navigation, and content-based retrieval. In particular, our system offers high quality audio playback with time-synchronous display of the digitized sheet music. Furthermore, our system allows a user to select regions within the scanned pages of a musical score in order to search for musically similar sections within the audio documents. Our novel user interfaces and search functionalities will be integrated into the library service system of the Bavarian State Library as part of the Probado project.
|
| [Pugin2008] | Laurent Pugin, Jason Hockman, John Ashley Burgoyne, and Ichiro Fujinaga. Gamera versus Aruspix - Two Optical Music Recognition Approaches. In 9th International Conference on Music Information Retrieval, 2008. [ bib | .pdf ] |
| [Rebelo2008] | Ana Rebelo. New Methodologies Towards an Automatic Optical Recognition of Handwritten Musical Scores. Master's thesis, Universidade do Porto, 2008. [ bib | .pdf ] |
| [Smiatacz2008] |
Maciej Smiatacz and Witold Malina.
Matrix-based classifiers applied to recognition of musical notation
symbols.
In 1st International Conference on Information Technology,
pages 1-4, 2008.
[ bib |
DOI ]
The paper presents the application of matrix-based classifiers to the problem of automatic recognition of musical notation symbols. The idea of classification algorithms operating on matrices instead of feature vectors is briefly introduced together with a short description of methods that we have recently proposed. The experiments that we report show that the matrix-based approach can be used to improve the effectiveness and usefulness of the OMR system developed in our department as a part of the digital library of musical documents.
|
| [Szwoch2008] |
Mariusz Szwoch.
Using MusicXML to Evaluate Accuracy of OMR Systems.
In International Conference on Theory and Application of
Diagrams, pages 419-422, Herrsching, Germany, 2008. Springer,
Springer-Verlag.
ISBN 978-3-540-87729-5.
[ bib |
DOI ]
In this paper a methodology for automatic accuracy evaluation in optical music recognition (OMR) applications is proposed. The presented approach assumes using ground truth images together with digital music scores describing their content. The automatic evaluation algorithm measures differences between the tested score and the reference one, both stored in MusicXML format. Some preliminary test results of this approach are presented based on the algorithm’s implementation in the OMR Guido application.
|
| [Wei2008] |
Lee Ling Wei, Qussay A. Salih, and Ho Sooi Hock.
Optical Tablature Recognition (OTR) system: Using Fourier
Descriptors as a recognition tool.
In International Conference on Audio, Language and Image
Processing, pages 1532-1539, 2008.
[ bib |
DOI ]
This paper presents an optical recognition system for the guitar tablature. Images of guitar tablature are fed as input to the system whereby each image undergoes four main stages of processing to produce a music output in MIDI format. Algorithms both existing and self-devised were used. Each input image was first cropped to the desired region, followed by a process for removal of the string lines and detection of the numbers. Recognition of the numbers was carried out using Fourier descriptors based on 8 selected feature points. Once completed, the numbers were matched to their corresponding chords and then rearranged and played. The algorithms and methods used within the system are presented here with a justification on the selection of Fourier descriptors as the recognition tool.
|
| [Yoo2008] |
JaeMyeong Yoo, Nguyen Dinh Toan, DeokJai Choi, HyukRo Park, and Gueesang Lee.
Advanced Binarization Method for Music Score Recognition Using Local
Thresholds.
In 8th International Conference on Computer and Information
Technology Workshops, pages 417-420, 2008.
[ bib |
DOI ]
Application technology of mobile phone has been developing for the delivery of various contents over a simple voice channel. Music score recognition is one of such application services provided by mobile phone manufacturers which transform a music score taken by the phone camera into a midi file. For the successful recognition of the music score, the input image should be properly binarized to be fed into the recognition process. In this paper, Adaptive binary algorithm is proposed which exploits local thresholds with several levels to deal with illumination changes over the entire image. Experimental results show advanced performance of music score recognition.
|
| [Bellini2007] | Pierfrancesco Bellini, Ivan Bruno, and Paolo Nesi. Assessing Optical Music Recognition Tools. Computer Music Journal, 31 (1): 68-93, 2007. [ bib | DOI ] |
| [Burgoyne2007] | John Ashley Burgoyne, Laurent Pugin, Greg Eustace, and Ichiro Fujinaga. A Comparative Survey of Image Binarisation Algorithms for Optical Recognition on Degraded Musical Sources. In 8th International Conference on Music Information Retrieval, 2007. [ bib | .pdf ] |
| [Castro2007] |
Pedro Castro and J. R. Caldas Pinto.
Methods for Written Ancient Music Restoration.
In Mohamed Kamel and Aurélio Campilho, editors, Image
Analysis and Recognition, pages 1194-1205, Berlin, Heidelberg, 2007.
Springer Berlin Heidelberg.
ISBN 978-3-540-74260-9.
[ bib |
DOI ]
Degradation in old documents has been a matter of concern for a long time. With the easy access to information provided by technologies such as the Internet, new ways have arisen for consulting those documents without exposing them to yet more dangers of degradation. While restoration methods are present in the literature in relation to text documents and artworks, little attention has been given to the restoration of ancient music. This paper describes and compares different methods to restore images of ancient music documents degraded over time. Six different methods were tested, including global and adaptive thresholding, color clustering and edge detection. In this paper we conclude that those based on the Sauvola's thresholding algorithm are the better suited for our proposed goal of ancient music restoration.
|
| [Castro2007a] |
Pedro Castro, R. J. Almeida, and J. R. Caldas Pinto.
Restoration of Double-Sided Ancient Music Documents with
Bleed-Through.
In Luis Rueda, Domingo Mery, and Josef Kittler, editors,
Progress in Pattern Recognition, Image Analysis and Applications,
pages 940-949, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
ISBN 978-3-540-76725-1.
[ bib |
DOI ]
Access to collections of cultural heritage is increasingly becoming a topic of interest for institutions like libraries. With the easy access to information provided by technologies such as the Internet, new ways exist for consulting ancient documents without exposing them to more dangers of degradation. One of those types of documents is written ancient music. These documents suffer from multiple kinds of degradation, where bleed-through outstands as the most damaging. This paper proposes a new method based on the Takagi Sugeno fuzzy classification algorithm to classify the pixels as bleed-through, after performing a general background restoration. This method is applied to a set of double-sided ancient music documents, and the obtained results compared with methods present in the literature.
|
| [Diet2007] | Jürgen Diet and Frank Kurth. The Probado Music Repository at the Bavarian State Library. In 8th International Conference on Music Information Retrieval, pages 501-504, Vienna, Austria, 2007. [ bib | .pdf ] |
| [Knopke2007] |
Ian Knopke and Donald Byrd.
Towards Musicdiff: A Foundation for Improved Optical Music
Recognition Using Multiple Recognizers.
In 8th International Conference on Music Information
Retrieval, pages 123-126, Vienna, Austria, 2007.
ISBN 978-3-85403-218.
[ bib |
.pdf ]
This paper presents work towards a “musicdiff” program for comparing files representing different versions of the same piece, primarily in the context of comparing versions produced by different optical music recognition (OMR) programs. Previous work by the current authors and others strongly suggests that using multiple recognizers will make it possible to improve OMR accuracy substantially. The basic methodology requires several stages: documents must be scanned and submitted to several OMR programs, programs whose strengths and weaknesses have previously been evaluated in detail. We discuss techniques we have implemented for normalization, alignment and rudimentary error correction. We also describe a visualization tool for comparing multiple versions on a measure-by-measure basis.
|
| [McKay2007] | Cory McKay and Ichiro Fujinaga. Style-independent computer-assisted exploratory analysis of large music collections. Journal of Interdisciplinary Music Studies, 1 (1): 63-85, 2007. [ bib | .pdf ] |
| [Pugin2007] | Laurent Pugin, John Ashley Burgoyne, and Ichiro Fujinaga. Goal-directed Evaluation for the Improvement of Optical Music Recognition on Early Music Prints. In 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 303-304, Vancouver, Canada, 2007b. ACM. ISBN 978-1-59593-644-8. [ bib | DOI ] |
| [Pugin2007a] | Laurent Pugin, John Ashley Burgoyne, and Ichiro Fujinaga. MAP Adaptation to Improve Optical Music Recognition of Early Music Documents Using Hidden Markov Models. In 8th International Conference on Music Information Retrieval, pages 513-516, 2007c. [ bib | .pdf ] |
| [Pugin2007b] |
Laurent Pugin, John Ashley Burgoyne, and Ichiro Fujinaga.
Reducing Costs for Digitising Early Music with Dynamic Adaptation.
In László Kovács, Norbert Fuhr, and Carlo Meghini,
editors, Research and Advanced Technology for Digital Libraries, pages
471-474, Berlin, Heidelberg, 2007d. Springer Berlin Heidelberg.
ISBN 978-3-540-74851-9.
[ bib ]
Optical music recognition (OMR) enables librarians to digitise early music sources on a large scale. The cost of expert human labour to correct automatic recognition errors dominates the cost of such projects. To reduce the number of recognition errors in the OMR process, we present an innovative approach to adapt the system dynamically, taking advantage of the human editing work that is part of any digitisation project. The corrected data are used to perform MAP adaptation, a machine-learning technique used previously in speech recognition and optical character recognition (OCR). Our experiments show that this technique can reduce editing costs by more than half.
|
| [Pugin2007c] | Laurent Pugin, John Ashley Burgoyne, Douglas Eck, and Ichiro Fujinaga. Book-Adaptive and Book-Dependent Models to Accelerate Digitization of Early Music. Technical report, McGill University, Whistler, BC, 2007a. [ bib | http ] |
| [Rebelo2007] |
Ana Rebelo, Artur Capela, Joaquim F. Pinto da Costa, Carlos Guedes, Eurico
Carrapatoso, and Jaime dos Santos Cardoso.
A Shortest Path Approach for Staff Line Detection.
In 3rd International Conference on Automated Production of
Cross Media Content for Multi-Channel Distribution, pages 79-85, 2007.
[ bib |
DOI ]
Many music works produced in the past still exist only as original manuscripts or as photocopies. Preserving them entails their digitalization and consequent accessibility in a digital format easy-to-manage. The manual process to carry out this task is very time consuming and error prone. Optical music recognition (OMR) is a form of structured document image analysis where music symbols are isolated and identified so that the music can be conveniently processed. While OMR systems perform well on printed scores, current methods for reading handwritten musical scores by computers remain far from ideal. One of the fundamental stages of this process is the staff line detection. In this paper a new method for the automatic detection of music stave lines based on a shortest path approach is presented. Lines with some curvature, discontinuities, and inclination are robustly detected. The proposed algorithm behaves favourably when compared experimentally with well-established algorithms.
|
| [Szwoch2007] |
Mariusz Szwoch.
Guido: A Musical Score Recognition System.
In 9th International Conference on Document Analysis and
Recognition, pages 809-813, 2007.
[ bib |
DOI ]
This paper presents an optical music recognition system Guido that can automatically recognize the main musical symbols of music scores that were scanned or taken by a digital camera. The application is based on object model of musical notation and uses linguistic approach for symbol interpretation and error correction. The system offers musical editor with a partially automatic error correction.
|
| [Bainbridge2006] |
David Bainbridge and Tim Bell.
Identifying music documents in a collection of images.
In 7th International Conference on Music Information
Retrieval, pages 47-52, Victoria, Canada, 2006.
[ bib |
http ]
Digital libraries and search engines are now well-equipped to find images of documents based on queries. Many images of music scores are now available, often mixed up with textual documents and images. For example, using the Google “images” search feature, a search for “Beethoven” will return a number of scores and manuscripts as well as pictures of the composer. In this paper we report on an investigation into methods to mechanically determine if a particular document is indeed a score, so that the user can specify that only musical scores should be returned. The goal is to find a minimal set of features that can be used as a quick test that will be applied to large numbers of documents. A variety of filters were considered, and two promising ones (run-length ratios and Hough transform) were evaluated. We found that a method based around run-lengths in vertical scans (RL) out-performs a comparable algorithm using the Hough transform (HT). On a test set of 1030 images, RL achieved recall and precision of 97.8% and 88.4% respectively while HT achieved 97.8% and 73.5%. In terms of processor time, RL was more than five times as fast as HT.
|
| [Byrd2006] | Donald Byrd and Megan Schindele. Prospects for Improving OMR with Multiple Recognizers. In 7th International Conference on Music Information Retrieval, pages 41-46, 2006. ISBN 1-55058-349-2. [ bib | .pdf ] |
| [Desaedeleer2006] |
Arnaud F. Desaedeleer.
Reading Sheet Music.
Master's thesis, University of London, 2006.
[ bib |
http ]
Optical Music Recognition is the process of recognising a printed music score and converting it to a format that is understood by computers. This process involves detecting all musical elements present in the music score in such a way that the score can be represented digitally. For example, the score could be recognised and played back through the computer speakers. Much research has been carried out in this area and several approaches to performing OMR have been suggested. A more recent approach involves segmenting the image using a neural network to recognise the segmented symbols from which the score can be reconstructed. This project will survey the different techniques that have been used to perform OMR on printed music scores and an application by the name of OpenOMR will be developed. One of the aims is to create an open source project in which developers in the open source community will be able to contribute their ideas in order to enhance this application and progress the research in the OMR field.
|
| [Fornes2006] |
Alicia Fornés, Josep Lladós, and Gemma Sánchez.
Primitive Segmentation in Old Handwritten Music Scores.
In Wenyin Liu and Josep Lladós, editors, Graphics
Recognition. Ten Years Review and Future Perspectives, pages 279-290,
Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
ISBN 978-3-540-34712-5.
[ bib |
DOI ]
Optical Music Recognition consists in the identification of music information from images of scores. In this paper, we propose a method for the early stages of the recognition: segmentation of staff lines and graphical primitives in handwritten scores. After introducing our work with modern musical scores (where projections and Hough Transform are effectively used), an approach to deal with ancient handwritten scores is exposed. The recognition of these old scores is more difficult due to paper degradation and the lack of a standard in musical notation. Our method has been tested with several scores of 19th century with high performance rates.
|
| [Homenda2006] |
Wladyslaw Homenda and Marcin Luckner.
Automatic Knowledge Acquisition: Recognizing Music Notation with
Methods of Centroids and Classifications Trees.
In International Joint Conference on Neural Network, pages
3382-3388, Vancouver, Canada, 2006.
[ bib |
DOI ]
This paper presents a pattern recognition study aimed at music symbols recognition. The study is focused on classification methods of music symbols based on decision trees and clustering method applied to classes of music symbols that face classification problems. Classification is made on the basis of extracted features. A comparison of selected classifiers was made on some classes of notation symbols distorted by a variety of factors as image noise, printing defects, different fonts, skew and curvature of scanning, overlapped symbols.
|
| [Homenda2006a] |
Wladyslaw Homenda.
Automatic understanding of images: integrated syntactic and semantic
analysis of music notation.
In International Joint Conference on Neural Network, pages
3026-3033, Vancouver, Canada, 2006.
[ bib |
DOI ]
The paper introduces an approach to image processing and recognition based on the perception of images as subjects being exchanged in the man-computer communication. The approach reveals the parallel syntactic and semantic attempts to automatic image understanding. Both attempts are reflected in the paradigms of information granulation and granular computing. The parallel syntactic and semantic processing of images allows for solving problems raised by difficulties and complexity of the detailed syntactic description of images as well as difficulties of detailed semantic analysis. The study presented in this paper is cast on the practical task of the music notation recognition.
|
| [Luckner2006] |
Marcin Luckner.
Recognition of Noised Patterns Using Non-Disruption Learning Set.
In 6th International Conference on Intelligent Systems Design
and Applications, pages 557-562, 2006.
[ bib |
DOI ]
In this paper the recognition of strongly noised symbols on the basis of non-disruption patterns is discussed taking music symbols as an example. Although Optical Music Recognition technology is not developed as successfully as OCR technology, several systems do recognize typical musical symbols to quite a good level. However, the recognition of non-typical fonts is still an unsolved issue. In this paper a model of a recognition system for unusual scores is presented. In the model described non-disruption symbols are used to generate a learning set that makes possible improved recognition as is presented on a real example of rests and accidentals recognition. Some techniques are presented with various recognition rates and computing times including supervised and unsupervised ones.
|
| [McPherson2006] | John R. McPherson. Coordinating Knowledge To Improve Optical Music Recognition. PhD thesis, The University of Waikato, 2006. [ bib | .pdf ] |
| [Pugin2006] | Laurent Pugin. Optical Music Recognition of Early Typographic Prints using Hidden Markov Models. In 7th International Conference on Music Information Retrieval, pages 53-56, Victoria, Canada, 2006a. [ bib | .pdf ] |
| [Pugin2006a] | Laurent Pugin. Aruspix: an Automatic Source-Comparison System. Computing in Musicology, 14: 49-59, 2006b. ISSN 1057-9478. [ bib | http ] |
| [Pugin2006b] | Laurent Pugin. Lecture et traitement informatique de typographies musicales anciennes: un logiciel de reconnaissance de partitions par modèles de Markov cachés. PhD thesis, Geneva University, Geneva, Switzerland, 2006c. [ bib | DOI ] |
| [Rossant2006] |
Florence Rossant and Isabelle Bloch.
Robust and Adaptive OMR System Including Fuzzy Modeling, Fusion of
Musical Rules, and Possible Error Detection.
EURASIP Journal on Advances in Signal Processing, 2007 (1):
081541, 2006.
ISSN 1687-6180.
[ bib |
DOI ]
This paper describes a system for optical music recognition (OMR) in case of monophonic typeset scores. After clarifying the difficulties specific to this domain, we propose appropriate solutions at both image analysis level and high-level interpretation. Thus, a recognition and segmentation method is designed, that allows dealing with common printing defects and numerous symbol interconnections. Then, musical rules are modeled and integrated, in order to make a consistent decision. This high-level interpretation step relies on the fuzzy sets and possibility framework, since it allows dealing with symbol variability, flexibility, and imprecision of music rules, and merging all these heterogeneous pieces of information. Other innovative features are the indication of potential errors and the possibility of applying learning procedures, in order to gain in robustness. Experiments conducted on a large data base show that the proposed method constitutes an interesting contribution to OMR.
|
| [Toyama2006] |
Fubito Toyama, Kenji Shoji, and Juichi Miyamichi.
Symbol Recognition of Printed Piano Scores with Touching Symbols.
In 18th International Conference on Pattern Recognition, pages
480-483, 2006.
[ bib |
DOI ]
To build a music database efficiently, an automatic score recognition system is a critical component. Many previous methods are applicable only to some simple music scores. In case of complex music scores it becomes difficult to detect symbols correctly because of noise and connection between symbols included in the scores. In this paper, we propose a score recognition method which is applicable to the complex music scores. Symbol candidates are detected by template matching. From these candidates correct symbols are selected by considering their relative positions and mutual connections. Under the presence of noise and connected symbols, the proposed method outperformed "Score Maker" which is an optical music score recognition software.
|
| [Barton2005] | Louis W. G. Barton, John A. Caldwell, and Peter G. Jeavons. E-library of Medieval Chant Manuscript Transcriptions. In 5th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 320-329, Denver, CO, USA, 2005. ACM. ISBN 1-58113-876-8. [ bib | DOI ] |
| [Dalitz2005] | Christoph Dalitz and Thomas Karsten. Using the Gamera framework for building a lute tablature recognition system. In 6th International Conference on Music Information Retrieval, pages 478-481, London, UK, 2005. [ bib | .pdf ] |
| [Fornes2005] | Alicia Fornés. Analysis of Old Handwritten Musical Scores. Master's thesis, Universitat Autònoma de Barcelona, 2005. [ bib | .pdf ] |
| [Gan2005] | Ting Gan. Música Colonial: 18th Century Music Score Meets 21st Century Digitalization Technology. In 5th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 379-379, Denver, USA, 2005. ACM. ISBN 1-58113-876-8. [ bib | DOI ] |
| [Homenda2005] |
Wladyslaw Homenda.
Optical Music Recognition: the Case Study of Pattern Recognition.
In Marek Kurzyński, Edward Puchala, Michal Woźniak,
and Andrzej Żolnierek, editors, Computer Recognition Systems,
pages 835-842, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
ISBN 978-3-540-32390-7.
[ bib |
DOI ]
The paper presents a pattern recognition study aimed on music notation recognition. The study is focused on practical aspect of optical music recognition; it presents a variety of methods applied in optical music recognition technology. The following logically separated stages of music notation recognition are distinguished: acquiring music notation structure, recognizing symbols of music notation, analyzing contextual information. The directions for OMR package development are drawn.
|
| [Rossant2005] |
Florence Rossant and Isabelle Bloch.
Optical music recognition based on a fuzzy modeling of symbol classes
and music writing rules.
In IEEE International Conference on Image Processing 2005,
pages II-538, 2005.
[ bib |
DOI ]
We propose an OMR method based on fuzzy modeling of the information extracted from the scanned score and of musical rules. The aim is to disambiguate the recognition hypotheses output by the individual symbol analysis process. Fuzzy modeling allows to account for imprecision in symbol detection, for typewriting variations, and for flexibility of rules. Tests conducted on a hundred of music sheets result in a global recognition rate of 98.55%, and show good performances compared to SmartScore.
|
| [Szwoch2005] |
Mariusz Szwoch.
A Robust Detector for Distorted Music Staves.
In André Gagalowicz and Wilfried Philips, editors, Computer
Analysis of Images and Patterns, pages 701-708, Berlin, Heidelberg, 2005.
Springer Berlin Heidelberg.
ISBN 978-3-540-32011-1.
[ bib |
DOI ]
In this paper an algorithm for music staves detection is presented. The algorithm bases on horizontal projections in local windows of a score image and farther processing of resulting histograms and their connections. Experiments carried out, proved high efficiency of presented algorithm and its robustness in case of non-ideal staff lines: skew and with barrel and pincushion distortions. The algorithm allows for usage of acquisition devices alternative to scanner such as digital cameras.
|
| [Taubman2005] | Gabriel Taubman. MusicHand: A Handwritten Music Recognition System. Technical report, Brown University, 2005. [ bib | .pdf ] |
| [Audiveris] | Hervé Bitteur. Audiveris. https://github.com/audiveris, 2004. [ bib | http ] |
| [Bellini2004] | Pierfrancesco Bellini, Ivan Bruno, and Paolo Nesi. An Off-Line Optical Music Sheet Recognition. In Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 40-77. IGI Global, 2004. [ bib | DOI ] |
| [Clausen2004] |
Michael Clausen and Frank Kurth.
A unified approach to content-based and fault-tolerant music
recognition.
IEEE Transactions on Multimedia, 6 (5): 717-731, 2004.
ISSN 1520-9210.
[ bib |
DOI ]
In this paper, we propose a unified approach to fast index-based music recognition. As an important area within the field of music information retrieval (MIR), the goal of music recognition is, given a database of musical pieces and a query document, to locate all occurrences of that document within the database, up to certain possible errors. In particular, the identification of the query with regard to the database becomes possible. The approach presented in this paper is based on a general algorithmic framework for searching complex patterns of objects in large databases. We describe how this approach may be applied to two important music recognition tasks: The polyphonic (musical score-based) search in polyphonic score data and the identification of pulse-code modulation audio material from a given acoustic waveform. We give an overview on the various aspects of our technology including fault-tolerant search methods. Several areas of application are suggested. We describe several prototypic systems we have developed for those applications including the notify! and the audentify! systems for score- and waveform-based music recognition, respectively.
|
| [Dovey2004] |
Matthew J. Dovey.
Overview of the OMRAS Project: Online Music Retrieval and
Searching.
Journal of the American Society for Information Science and
Technology, 55 (12): 1100-1107, 2004.
[ bib |
DOI ]
Until recently, most research on music information retrieval concentrated on monophonic music. Online Music Retrieval and Searching (OMRAS) is a three-year project funded under the auspices of the JISC (Joint Information Systems Committee)/NSF (National Science Foundation) International Digital Library Initiative which began in 1999 and whose remit was to investigate the issues surrounding polyphonic music information retrieval. Here we outline the work OMRAS has achieved in pattern matching, document retrieval, and audio transcription, as well as some prototype work in how to implement these techniques into library systems.
|
| [Droettboom2004] | Michael Droettboom and Ichiro Fujinaga. Symbol-level groundtruthing environment for OMR. In 5th International Conference on Music Information Retrieval, pages 497-500, 2004. [ bib | .pdf ] |
| [Fujinaga2004] | Ichiro Fujinaga. Staff detection and removal. In Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 1-39. IGI Global, 2004. [ bib | DOI ] |
| [George2004] | Susan E. George. Visual Perception of Music Notation On-Line and Off-Line Recognition. IRM Press, 2004a. ISBN 1931777942. [ bib | http ] |
| [George2004a] | Susan E. George. Evaluation in the Visual Perception of Music Notation. In S. George, editor, Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 304-349. IRM Press, Hershey, PA, 2004b. [ bib | DOI ] |
| [George2004b] | Susan E. George. Lyric Recognition and Christian Music. In S. George, editor, Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 198-226. IRM Press, Hershey, PA, 2004c. [ bib | DOI ] |
| [George2004c] | Susan E. George. Wavelets for Dealing with Super-Imposed Objects in Recognition of Music Notation. In S. George, editor, Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 78-107. IRM Press, Hershey, PA, 2004d. [ bib | DOI ] |
| [George2004d] | Susan E. George. Pen-Based Input for On-Line Handwritten Music Notation. In S. George, editor, Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 128-160. IRM Press, Hershey, PA, 2004e. [ bib | DOI ] |
| [Homenda2004] | Wladyslaw Homenda and Marcin Luckner. Automatic Recognition of Music Notation Using Neural Networks. In International Conference on AI and Systems, Divnomorskoye, Russia, 2004. [ bib | http ] |
| [Homenda2004a] | Wladyslaw Homenda and K. Mossakowski. Music Symbol Recognition: Neural Networks vs. Statistical Methods. In B. De Baets, R. De Caluwe, G. De Tre, Janos Fodor, J. Kacprzyk, and S. Zadrozny, editors, EUROFUSE Workshop On Data And Knowledge Engineering, Warszawa, Poland, 2004. [ bib | http ] |
| [Mitobe2004] |
Youichi Mitobe, Hidetoshi Miyao, and Minoru Maruyama.
A fast HMM algorithm based on stroke lengths for on-line recognition
of handwritten music scores.
In 9th International Workshop on Frontiers in Handwriting
Recognition, pages 521-526, 2004.
[ bib |
DOI ]
The hidden Markov model (HMM) has been successfully applied to various kinds of on-line recognition problems including, speech recognition, handwritten character recognition, etc. In this paper, we propose an on-line method to recognize handwritten music scores. To speed up the recognition process and improve usability of the system, the following methods are explained: (1) The target HMMs are restricted based on the length of a handwritten stroke, and (2) Probability calculations of HMMs are successively made as a stroke is being written. As a result, recognition rates of 85.78% and average recognition times of 5.19 ms/stroke were obtained for 6,999 test strokes of handwritten music symbols, respectively. The proposed HMM recognition rate is 2.4% higher than that achieved with the traditional method, and the processing time was 73% of that required by the traditional method.
|
| [Miyao2004] | Hidetoshi Miyao and Minoru Maruyama. An online handwritten music score recognition system. In 17th International Conference on Pattern Recognition. Institute of Electrical & Electronics Engineers (IEEE), 2004. [ bib | DOI ] |
| [Ng2004] | Kia Ng. Optical Music Analysis for Printed Music Score and Handwritten Music Manuscript. In Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 108-127. IGI Global, 2004. [ bib | DOI ] |
| [Rossant2004] |
Florence Rossant and Isabelle Bloch.
A fuzzy model for optical recognition of musical scores.
Fuzzy Sets and Systems, 141 (2): 165-201, 2004.
ISSN 0165-0114.
[ bib |
DOI |
http ]
Optical music recognition aims at reading automatically scanned scores in order to convert them in an electronic format, such as a midi file. We only consider here classical monophonic music: we exclude any music written on several staves, but also any music that contains chords. In order to overcome recognition failures due to the lack of methods dealing with structural information, non-local rules and corrections, we propose a recognition approach integrating structural information in the form of relationships between symbols and of musical rules. Another contribution of this paper is to solve ambiguities by accounting for sources of imprecision and uncertainty, within the fuzzy set and possibility theory framework. We add to a single symbol analysis several rules for checking the consistency of hypotheses: graphical consistency (compatibility between accidental and note, between grace note and note, between note and augmentation dot, etc.), and syntactic consistency (accidentals, tonality, metric). All these rules are combined in order to lead to better decisions. Experimental results on 65 music sheets show that our approach leads to very good results, and is able to correct errors made by other approaches, such as the one of SmartScore.
|
| [Sheridan2004] |
Scott Sheridan and Susan E. George.
Defacing Music Scores for Improved Recognition.
In 2nd Australian Undergraduate Students' Computing
Conference, pages 142-148, 2004.
[ bib |
.pdf ]
The area of Optical Music Recognition (OMR) has long been plagued by an inability to provide a definitive method for locating and identifying musical objects superimposed on musical stave lines. The first step in the process of recognising musical symbols in OMR has previously been to either remove the stave lines, or ignore them. Removing stave lines leads to many problems of fragmented and deformed musical symbols, or in the case of ignoring them, a lowered chance of recognition. Most OMR systems attempt to correct these deficiencies later on in the process through many varied approaches including bounding box analysis, k-nearest-neighbour (k-NN) and neural network (ANN) classification schemes. All of these have a level of success, but none have provided nearly the desired level of accuracy.
|
| [Bainbridge2003] |
David Bainbridge and Tim Bell.
A music notation construction engine for optical music recognition.
Software: Practice and Experience, 33 (2): 173-200, 2003.
ISSN 1097-024X.
[ bib |
DOI ]
Optical music recognition (OMR) systems are used to convert music scanned from paper into a format suitable for playing or editing on a computer. These systems generally have two phases: recognizing the graphical symbols (such as note-heads and lines) and determining the musical meaning and relationships of the symbols (such as the pitch and rhythm of the notes). In this paper we explore the second phase and give a two-step approach that admits an economical representation of the parsing rules for the system. The approach is flexible and allows the system to be extended to new notations with little effort—the current system can parse common music notation, Sacred Harp notation and plainsong. It is based on a string grammar and a customizable graph that specifies relationships between musical objects. We observe that this graph can be related to printing as well as recognizing music notation, bringing the opportunity for cross-fertilization between the two areas of research. Copyright © 2003 John Wiley & Sons, Ltd.
|
| [Bruder2003] |
Ilvio Bruder, Andreas Finger, Andreas Heuer, and Temenushka Ignatova.
Towards a Digital Document Archive for Historical Handwritten Music
Scores.
In Tengku Mohd Tengku Sembok, Halimah Badioze Zaman, Hsinchun Chen,
Shalini R. Urs, and Sung-Hyon Myaeng, editors, Digital Libraries:
Technology and Management of Indigenous Knowledge for Global Access, pages
411-414, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
ISBN 978-3-540-24594-0.
[ bib |
DOI ]
Contemporary digital libraries and archives of music scores focus mainly on providing efficient storage and access methods for their data. However, digital archives of historical music scores can enable musicologists not only to easily store and access research material, but also to derive new knowledge from existing data. In this paper we present the first steps in building a digital archive of historical music scores from the 17th and 18th century. Along with the architectural and accessibility aspects of the system, we describe an integrated approach for classification and identification of the scribes of music scores.
|
| [Byrd2003] | Donald Byrd and Eric Isaacson. A Music Representation Requirement Specification for Academia. Computer Music Journal, 27 (4): 43-57, 2003. ISSN 01489267, 15315169. [ bib | http ] |
| [George2003] | Susan E. George. Online Pen-Based Recognition of Music Notation with Artificial Neural Networks. Computer Music Journal, 27 (2): 70-79, 2003. [ bib | DOI ] |
| [Goecke2003] |
Roland Göcke.
Building a system for writer identification on handwritten music
scores.
In IASTED International Conference on Signal Processing,
Pattern Recognition, and Applications, pages 250-255. Acta Press, 2003.
ISBN 0 88986 363 6.
[ bib |
.pdf ]
A significant example of the integration of musicology and computer science. The paper identifies the problem of writer identification as practised by historical musicologists and assesses possible solutions using computer technology. The system outline is unique and seems convincing, including interesting ideas such as feature trees and a consistency check. However, it lacks any concrete methods for implementing the proposed system, and any evaluation.
|
| [Nehab2003] | Diego Nehab. Staff Line Detection by Skewed Projection. Technical report, 2003. [ bib | .pdf ] |
| [Pinto2003] |
João Caldas Pinto, Pedro Vieira, and João M. Sousa.
A new graph-like classification method applied to ancient handwritten
musical symbols.
Document Analysis and Recognition, 6 (1): 10-22, 2003.
ISSN 1433-2825.
[ bib |
DOI ]
Several algorithms have been proposed in the past to solve the problem of binary pattern recognition. The problem of finding features that clearly distinguish two or more different patterns is a key issue in the design of such algorithms. In this paper, a graph-like recognition process is proposed that combines a number of different classifiers to simplify the type of features and classifiers used in each classification step. The graph-like classification method is applied to ancient music optical recognition, and a high degree of accuracy has been achieved.
|
| [Riley2003] | Jenn Riley and Ichiro Fujinaga. Recommended best practices for digital image capture of musical scores. OCLC Systems & Services, 19 (2): 62-69, 2003. ISSN 1065-075X. [ bib | DOI ] |
| [Barton2002] |
Louis W. G. Barton.
The NEUMES Project: digital transcription of medieval chant
manuscripts.
In 2nd International Conference on Web Delivering of Music,
pages 211-218, 2002.
[ bib |
DOI ]
This paper introduces the NEUMES Project from a top-down perspective. The purpose of the project is to design a software infrastructure for digital transcription of medieval chant manuscripts, such that transcriptions can be interoperable across many types of applications programs. Existing software for modern music does not provide an effective solution. A distributed library of chant document resources for the Web is proposed, to encompass photographic images, transcriptions, and searchable databases of manuscript descriptions. The NEUMES encoding scheme for chant transcription is presented, with NeumesXML serving as a 'wrapper' for transmission, storage, and editorial markup of transcription data. A scenario of use is given and future directions for the project are briefly discussed.
|
| [Clausen2002] |
Michael Clausen and Frank Kurth.
A unified approach to content-based and fault tolerant music
identification.
In 2nd International Conference on Web Delivering of Music,
pages 56-65, 2002.
[ bib |
DOI ]
In this paper we propose a unified approach to content-based search in different kinds of music data. Our approach is based on a general algorithmic framework for searching patterns of complex objects in large databases. In particular we describe how this approach may be used to allow for polyphonic search in polyphonic scores as well as for the identification of PCM audio material. We give an overview on the various aspects of our technology including fault tolerant search methods. Several areas of application are suggested. We give an overview on several prototypic systems we developed for those applications including the notify! and the audentify! systems.
|
| [Droettboom2002] |
Michael Droettboom, Ichiro Fujinaga, and Karl MacMillan.
Optical Music Interpretation.
In Terry Caelli, Adnan Amin, Robert P. W. Duin, Dick de Ridder, and
Mohamed Kamel, editors, Structural, Syntactic, and Statistical Pattern
Recognition, pages 378-387, Berlin, Heidelberg, 2002a. Springer Berlin
Heidelberg.
ISBN 978-3-540-70659-5.
[ bib |
DOI ]
A system to convert digitized sheet music into a symbolic music representation is presented. A pragmatic approach is used that conceptualizes this primarily two-dimensional structural recognition problem as a one-dimensional one. The transparency of the implementation owes a great deal to its implementation in a dynamic, object-oriented language. This system is a part of a locally developed end-to-end solution for the conversion of digitized sheet music into symbolic form.
|
| [Droettboom2002a] | Michael Droettboom, Ichiro Fujinaga, Karl MacMillan, G. Sayeed Choudhury, Tim DiLauro, Mark Patton, and Teal Anderson. Using the Gamera framework for the recognition of cultural heritage materials. In Joint Conference on Digital Libraries, pages 12-17, London, UK, 2002b. [ bib | .pdf ] |
| [Gezerlis2002] |
Velissarios G. Gezerlis and Sergios Theodoridis.
Optical character recognition of the Orthodox Hellenic Byzantine
Music notation.
Pattern Recognition, 35 (4): 895-914, 2002.
ISSN 0031-3203.
[ bib |
DOI |
http ]
In this paper we present for the first time, the development of a new system for the off-line optical recognition of the characters used in the orthodox Hellenic Byzantine Music notation, that has been established since 1814. We describe the structure of the new system and propose algorithms for the recognition of the 71 distinct character classes, based on Wavelets, 4-projections and other structural and statistical features. Using a nearest neighbor classifier, combined with a post classification schema and a tree-structured classification philosophy, an accuracy of 99.4% was achieved, in a database of about 18,000 Byzantine character patterns that have been developed for the needs of the system.
|
| [Lopresti2002] | Daniel Lopresti and George Nagy. Issues in Ground-Truthing Graphic Documents. In Graphics Recognition Algorithms and Applications, pages 46-67. Springer Berlin Heidelberg, Ontario, Canada, 2002. ISBN 978-3-540-45868-5. [ bib | DOI ] |
| [Luth2002] | Nailja Luth. Automatic Identification of Music Notations. In 2nd International Conference on WEB Delivering of Music, 2002. ISBN 0769518621. [ bib | DOI ] |
| [MacMillan2002] | Karl MacMillan, Michael Droettboom, and Ichiro Fujinaga. Gamera: Optical music recognition in a new shell. In International Computer Music Conference, pages 482-485, 2002. [ bib | .pdf ] |
| [McPherson2002] | John R. McPherson. Introducing Feedback into an Optical Music Recognition System. In 3rd International Conference on Music Information Retrieval, Paris, France, 2002. [ bib | .pdf ] |
| [McPherson2002a] | John R. McPherson and David Bainbridge. Coordinating Knowledge Within an Optical Music Recognition System. Technical report, University of Waikato, Hamilton, New Zealand, 2002. [ bib | http ] |
| [Miyao2002] |
Hidetoshi Miyao.
Stave Extraction for Printed Music Scores.
In Hujun Yin, Nigel Allinson, Richard Freeman, John Keane, and Simon
Hubbard, editors, Intelligent Data Engineering and Automated Learning,
pages 562-568. Springer Berlin Heidelberg, 2002.
ISBN 978-3-540-45675-9.
[ bib |
DOI ]
In this paper, a satisfactory method is described for the extraction of staff lines in which there are some inclinations, discontinuities, and curvatures. The extraction calls for four processes: (1) Extraction of specific points on a stave on vertical scan lines, (2) Connection of the points using DP matching, (3) Composition of stave groups using labeling, and (4) Extraction and adjustment of the edges of lines. The experiment resulted in an extraction rate of 99.4% for 71 printed music scores that included lines with some inclinations, discontinuities, and curvatures.
|
| [Ng2002] | Kia Ng. Music manuscript tracing. Lecture Notes in Computer Science, 2390: 322-334, 2002. ISSN 1611-3349. [ bib | DOI | .pdf ] |
| [Roland2002] | Perry Roland. The music encoding initiative (MEI). In 1st International Conference on Musical Applications Using XML, pages 55-59, 2002. [ bib | .pdf ] |
| [Rossant2002] |
Florence Rossant.
A global method for music symbol recognition in typeset music sheets.
Pattern Recognition Letters, 23 (10): 1129-1141, 2002.
ISSN 0167-8655.
[ bib |
DOI ]
This paper presents an optical music recognition (OMR) system that can automatically recognize the main musical symbols of a scanned paper-based music score. Two major stages are distinguished: the first one, using low-level pre-processing, detects the isolated objects and outputs some hypotheses about them; the second one has to take the final correct decision, through high-level processing including contextual information and music writing rules. This article exposes both stages of the method: after explaining in detail the first one, the symbol analysis process, it shows through first experiments that its outputs can efficiently be used as inputs for a high-level decision process.
|
| [Soak2002] |
Sang Moon Soak, Seok Cheol Chang, Taehwan Shin, and Byung-Ha Ahn.
Music recognition system using ART-1 and GA.
In AeroSense 2002, 2002.
[ bib |
DOI ]
Previously, most optical music recognition (OMR) systems have used the neural network, and used mainly back-propagation training method. One of the disadvantages of BP is that much time is required to train data sets. For example, when new data sets are added, all data sets have to be trained. Another disadvantage is that weighting values cannot be guaranteed as global optima after training them. It means that weighting values can fall down to local optimum solution. In this paper, we propose the new OMR method which combines the adaptive resonance theory (ART-1) with the genetic algorithms (GA). For reducing the training time, we use ART-1 which classifies several music symbols. It has another advantage to reduce the number of datasets, because classified symbols through ART-1 are used as input vectors of BP. And for guaranteeing the global optima in training data set, we use GA which is known as one of the best method for finding optimal solutions at complex problems.
|
| [Bainbridge2001] |
David Bainbridge and Tim Bell.
The Challenge of Optical Music Recognition.
Computers and the Humanities, 35 (2): 95-121, 2001.
ISSN 1572-8412.
[ bib |
DOI ]
This article describes the challenges posed by optical music recognition - a topic in computer science that aims to convert scanned pages of music into an on-line format. First, the problem is described; then a generalised framework for software is presented that emphasises key stages that must be solved: staff line identification, musical object location, musical feature classification, and musical semantics. Next, significant research projects in the area are reviewed, showing how each fits the generalised framework. The article concludes by discussing perhaps the most open question in the field: how to compare the accuracy and success of rival systems, highlighting certain steps that help ease the task.
|
| [Bainbridge2001a] | David Bainbridge, Gerry Bernbom, Mary Wallace Davidson, Andrew P. Dillon, Matthew Dovey, Jon W. Dunn, Michael Fingerhut, Ichiro Fujinaga, and Eric J. Isaacson. Digital Music Libraries - Research and Development. In 1st ACM/IEEE-CS Joint Conference on Digital Libraries, pages 446-448, Roanoke, Virginia, USA, 2001. [ bib | DOI ] |
| [Bellini2001] |
Pierfrancesco Bellini, Ivan Bruno, and Paolo Nesi.
Optical music sheet segmentation.
In 1st International Conference on WEB Delivering of Music,
pages 183-190. Institute of Electrical & Electronics Engineers (IEEE),
2001.
ISBN 0769512844.
[ bib |
DOI ]
The optical music recognition problem has been addressed in several ways, obtaining suitable results only when simple music constructs are processed. The most critical phase of the optical music recognition process is the first analysis of the image sheet. The first analysis consists of segmenting the acquired sheet into smaller parts which may be processed to recognize the basic symbols. The segmentation module of the O3MR (Object Oriented Optical Music Recognition) system is presented. The proposed approach is based on the adoption of projections for the extraction of basic symbols that constitute a graphic element of the music notation. A set of examples is also included.
|
| [Choudhury2001] | G. Sayeed Choudhury, Tim DiLauro, Michael Droettboom, Ichiro Fujinaga, and Karl MacMillan. Strike Up the Score: Deriving searchable and playable digital formats from sheet music. D-Lib Magazine, 7 (2), 2001. ISSN 1082-9873. [ bib | DOI | .html ] |
| [Coueasnon2001] |
Bertrand Coüasnon.
DMOS: a generic document recognition method, application to an
automatic generator of musical scores, mathematical formulae and table
structures recognition systems.
In 6th International Conference on Document Analysis and
Recognition, pages 215-220, 2001.
[ bib |
DOI ]
Genericity in structured document recognition is a difficult challenge. We therefore propose a new generic document recognition method, called DMOS (Description and MOdification of Segmentation), that is made up of a new grammatical formalism, called EPF (Enhanced Position Formalism) and an associated parser which is able to introduce context in segmentation. We implement this method to obtain a generator of document recognition systems. This generator can automatically produce new recognition systems. It is only necessary to describe the document with an EPF grammar, which is then simply compiled. In this way, we have developed various recognition systems: one on musical scores, one on mathematical formulae and one on recursive table structures. We have also defined a specific application to damaged military forms of the 19th Century. We have been able to test the generated system on 5,000 of these military forms. This has permitted us to validate the DMOS method on a real-world application.
|
| [Droettboom2001] | Michael Droettboom and Ichiro Fujinaga. Interpreting the semantics of music notation using an extensible and object-oriented system. Technical report, Johns Hopkins University, 2001. [ bib | http ] |
| [Homenda2001] |
Wladyslaw Homenda.
Optical Music Recognition: the Case of Granular Computing.
In Granular Computing: An Emerging Paradigm, pages 341-366.
Physica-Verlag HD, Heidelberg, 2001.
ISBN 978-3-7908-1823-9.
[ bib |
DOI ]
The paper deals with optical music recognition (OMR) as a process of structured data processing applied to music notation. Granularity of OMR in both its aspects, data representation and data processing, is especially emphasised in the paper. OMR is a challenge in intelligent computing technologies, especially in such fields as pattern recognition and knowledge representation and processing. Music notation is a language allowing for communication in music, one of the most sophisticated fields of human activity, and has a high level of complexity itself. On the one hand, music notation symbols vary in size and have complex shapes; they often touch and overlap each other. This feature makes the recognition of music symbols a very difficult and complicated task. On the other hand, music notation is a two-dimensional language in which the importance of geometrical and logical relations between its symbols may be compared to the importance of the symbols alone. Due to the complexity of music nature and music notation, music representation, necessary to store and reuse recognised information, is also the key issue in music notation recognition and music processing. Both the data representation and the data processing used in OMR are highly structured, granular rather than numeric. OMR technology fits the paradigm of granular computing.
|
| [MacMillan2001] | Karl MacMillan, Michael Droettboom, and Ichiro Fujinaga. Gamera: A structured document recognition application development environment. In 2nd International Symposium on Music Information Retrieval, pages 15-16, Bloomington, IN, 2001. [ bib | http ] |
| [McPherson2001] | John R. McPherson. Using feedback to improve Optical Music Recognition, 2001. [ bib ] |
| [Pugin2001] | Laurent Pugin. Réalisation d'un système de superposition de partitions de musique anciennes. Technical report, Geneva University, Geneva, Switzerland, 2001. [ bib | .pdf ] |
| [Rossant2001] | Florence Rossant and Isabelle Bloch. Reconnaissance de Partitions Musicales par Modélisation Floue et Intégration de Règles Musicales. In GRETSI, Toulouse, France, 2001. [ bib | http ] |
| [Su2001] |
Mu-Chun Su, Chee-Yuen Tew, and Hsin-Hua Chen.
Musical symbol recognition using SOM-based fuzzy systems.
In Joint 9th IFSA World Congress and 20th NAFIPS International
Conference, pages 2150-2153 vol.4, 2001.
[ bib |
DOI ]
A large number of research activities have been undertaken to investigate optical music recognition (OMR). OMR involves identifying musical symbols on a scanned sheet of music and transforming them into a computer readable format. We propose an efficient method based on SOM-based fuzzy systems to recognize musical symbols. A database consisting of 9 kinds of musical symbols was used to test the performance of the SOM-based fuzzy systems.
|
| [Vieira2001] |
Pedro Vieira and João Caldas Pinto.
Recognition of musical symbols in ancient manuscripts.
In International Conference on Image Processing, pages 38-41
vol.3, 2001.
[ bib |
DOI ]
This paper presents a system for the automatic retrieval of music from ancient music collections (XVI-XVIII century), creating digital documents of music from images of music sheets. This is an optical music recognition system that uses image processing and pattern recognition techniques. Finally, we obtain a document that contains the music semantics: description of the notes, in time and pitches, as well as other relevant information.
|
| [Anquetil2000] |
Éric Anquetil, Bertrand Coüasnon, and Frédéric Dambreville.
A Symbol Classifier Able to Reject Wrong Shapes for Document
Recognition Systems.
In Atul K. Chhabra and Dov Dori, editors, Graphics Recognition
Recent Advances, pages 209-218, Berlin, Heidelberg, 2000. Springer Berlin
Heidelberg.
ISBN 978-3-540-40953-3.
[ bib |
DOI ]
We propose in this paper a new framework to develop a transparent classifier able to deal with reject notions. The generated classifier can be characterized by a strong reliability without losing good properties in generalization. We show on a musical scores recognition system that this classifier is very well suited to develop a complete document recognition system. Indeed this classifier allows them firstly to extract known symbols in a document (text for example) and secondly to validate segmentation hypotheses. Tests had been successfully performed on musical and digit symbols databases.
|
| [Choudhury2000] | G. Sayeed Choudhury, M. Droettboom, Tim DiLauro, Ichiro Fujinaga, and Brian Harrington. Optical Music Recognition System within a Large-Scale Digitization Project. In 1st International Symposium on Music Information Retrieval, 2000a. [ bib | http ] |
| [Choudhury2000a] |
G. Sayeed Choudhury, Cynthia Requardt, Ichiro Fujinaga, Tim DiLauro,
Elisabeth W. Brown, James W. Warner, and Brian Harrington.
Digital workflow management: The Lester S. Levy digitized collection
of sheet music.
First Monday, 5 (6), 2000b.
[ bib |
DOI ]
The paper describes the development of a set of workflow management tools (WMS) that will reduce the manual input necessary to manage the workflow of large-scale digitization projects. The WMS will also support the path from physical object and/or digitized material into a digital library repository by providing effective tools for perusing multimedia elements. The Lester S. Levy Collection of Sheet Music Project at the Milton S. Eisenhower Library at The Johns Hopkins University provides an ideal testbed for the development and evaluation of the WMS. Building upon previous effort to digitize the entire collection of over 29000 pieces of sheet music, optical music recognition (OMR) software will create sound files and full-text lyrics. The combination of image, text and sound files provide a comprehensive multimedia environment. The functionality of the collection will be enhanced by the incorporation of metadata, the implementation of a disk based search engine for lyrics, and the development of toolkits for searching sound files.
|
| [Fotinea2000] | Stavroula-Evita Fotinea, George Giakoupis, Aggelos Livens, Stylianos Bakamidis, and George Carayannis. An Optical Notation Recognition System for Printed Music Based on Template Matching and High Level Reasoning. In RIAO '00 Content-Based Multimedia Information Access, pages 1006-1014, Paris, France, 2000. Le centre de hautes etudes internationales d'informatique documentaire. [ bib | http ] |
| [Fujinaga2000] | Ichiro Fujinaga. Optical Music Recognition Bibliography. http://www.music.mcgill.ca/~ich/research/omr/omrbib.html, 2000. [ bib | .html ] |
| [Lallican2000] | P. M. Lallican, C. Viard-Gaudin, and S. Knerr. From Off-Line to On-Line Handwriting Recognition. In L. R. B. Schomaker and L. G. Vuurpijl, editors, 7th International Workshop on Frontiers in Handwriting Recognition, pages 303-312, Amsterdam, 2000. International Unipen Foundation. ISBN 90-76942-01-3. [ bib | .pdf ] |
| [Lin2000] |
Karen Lin and Tim Bell.
Integrating Paper and Digital Music Information Systems.
In International Society for Music Information Retrieval,
pages 23-25, 2000.
[ bib |
.pdf ]
Active musicians generally rely on extensive personal paper-based music information retrieval systems containing scores, parts, compositions, and arrangements of published and hand-written music. Many have a bias against using computers to store, edit and retrieve music, and prefer to work in the paper domain rather than using digital documents, despite the flexibility and powerful retrieval opportunities available. In this paper we propose a model of operation that blurs the boundaries between the paper and digital domains, offering musicians the best of both worlds. A survey of musicians identifies the problems and potential of working with digital tools, and we propose a system using colour printing and scanning technology that simplifies the process of moving music documents between the two domains.
|
| [Miyao2000] | Hidetoshi Miyao and Robert Martin Haralick. Format of Ground Truth Data Used in the Evaluation of the Results of an Optical Music Recognition System. In 4th International Workshop on Document Analysis Systems, pages 497-506, Brasil, 2000. [ bib | .pdf ] |
| [Pinto2000] |
João Caldas Pinto, Pedro Vieira, M. Ramalho, M. Mengucci, P. Pina, and
F. Muge.
Ancient Music Recovery for Digital Libraries.
In José Borbinha and Thomas Baker, editors, Research and
Advanced Technology for Digital Libraries, pages 24-34, Berlin, Heidelberg,
2000. Springer Berlin Heidelberg.
ISBN 978-3-540-45268-3.
[ bib |
DOI ]
The purpose of this paper is to present a description and current state of the “ROMA” (Reconhecimento Óptico de Música Antiga or Ancient Music Optical Recognition) Project, which consists of building an application specialised in the recognition and restoration of ancient music manuscripts (from the XVI to the XVIII century). This project, beyond the inventory of the Biblioteca Geral da Universidade de Coimbra musical funds, aims to develop algorithms for score restoration and musical symbol recognition in order to allow a suitable representation and restoration in digital format. Both objectives have an intrinsic research nature, one in the area of musicology and the other in digital libraries.
|
| [Bainbridge1999] |
David Bainbridge and K. Wijaya.
Bulk processing of optically scanned music.
In 7th International Conference on Image Processing and its
Applications, pages 474-478. Institution of Engineering and Technology,
1999.
[ bib |
DOI ]
For many years now optical music recognition (OMR) has been advocated as the leading methodology for transferring the vast repositories of music notation from paper to digital database. Other techniques exist for acquiring music on-line; however, these methods require operators with musical and computer skills. The notion, therefore, of an entirely automated process through OMR is highly attractive. It has been an active area of research since its inception in 1966 (Pruslin), and even though there has been the development of many systems with impressively high accuracy rates it is surprising to note that there is little evidence of large collections being processed with the technology - work by Carter (1994) and Bainbridge and Carter (1997) being the only known notable exception. This paper outlines some of the insights gained, and algorithms implemented, through the practical experience of converting collections in excess of 400 pages. In doing so, the work demonstrates that there are additional factors not currently considered by other research centres that are necessary for OMR to reach its full potential.
|
| [Beran1999] |
Tomáš Beran and Tomáš Macek.
Recognition of Printed Music Score.
In Petra Perner and Maria Petrou, editors, Machine Learning and
Data Mining in Pattern Recognition, pages 174-179. Springer Berlin
Heidelberg, 1999.
ISBN 978-3-540-48097-6.
[ bib |
DOI ]
This article describes our implementation of the Optical Music Recognition System (OMR). The system implemented in our project is based on the binary neural network ADAM. ADAM has been used for recognition of music symbols. Preprocessing was implemented by conventional techniques. We decomposed the OMR process into several phases. The results of these phases are summarized.
|
| [Blostein1999] |
Dorothea Blostein and Lippold Haken.
Using diagram generation software to improve diagram recognition: a
case study of music notation.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 21 (11): 1121-1136, 1999.
ISSN 0162-8828.
[ bib |
DOI ]
Diagrams are widely used in society to transmit information such as circuit designs, music, mathematical formulae, architectural plans, and molecular structure. Computers must process diagrams both as images (marks on paper) and as information. A diagram recognizer translates from image to information and a diagram generator translates from information to image. Current technology for diagram generation is ahead of the technology for diagram recognition. Diagram generators have extensive knowledge of notational conventions which relate to readability and aesthetics, whereas current diagram recognizers focus on the hard constraints of the notation. To create a recognizer capable of exploiting layout information, it is expedient to reuse the expertise in existing diagram generators. In particular, we discuss the use of Lime (our editor and generator for music notation) to proofread and correct the raw output of MIDIScan (a third-party commercial recognizer for music notation). Over the past several years, this combination of software has been distributed to thousands of users.
|
| [Ferrand1999] | Miguel Ferrand, João Alexandre Leite, and Amilcar Cardoso. Hypothetical reasoning: An application to Optical Music Recognition. In Appia-Gulp-Prode'99 joint conference on declarative programming, pages 367-381, 1999a. [ bib | http ] |
| [Ferrand1999a] |
Miguel Ferrand, João Alexandre Leite, and Amilcar Cardoso.
Improving Optical Music Recognition by Means of Abductive Constraint
Logic Programming.
In Pedro Barahona and José J. Alferes, editors, Progress in
Artificial Intelligence, pages 342-356, Berlin, Heidelberg, 1999b.
Springer Berlin Heidelberg.
ISBN 978-3-540-48159-1.
[ bib |
DOI ]
In this paper we propose a hybrid system that bridges the gap between traditional image processing methods, used for low-level object recognition, and abductive constraint logic programming used for high-level musical interpretation. Optical Music Recognition (OMR) is the automatic recognition of a scanned page of printed music. All such systems are evaluated by their rate of successful recognition; therefore a reliable OMR program should be able to detect and eventually correct its own recognition errors. Since we are interested in dealing with polyphonic music, some additional complexity is introduced as several concurrent voices and simultaneous musical events may occur. In RIEM, the OMR system we are developing, when events are inaccurately recognized they will generate inconsistencies in the process of voice separation. Furthermore if some events are missing a consistent voice separation may not even be possible.
|
| [Hori1999] |
Toyokazu Hori, Shinichiro Wada, Howzan Tai, and S. Y. Kung.
Automatic music score recognition/play system based on decision based
neural network.
In 3rd Workshop on Multimedia Signal Processing, pages
183-184, 1999.
[ bib |
DOI ]
This paper proposes an automatic music score recognition system based on a hierarchically structured decision based neural network (DBNN), which can classify patterns with nonlinear decision boundaries. Currently, this system yields around a 97% recognition rate for printed music scores.
|
| [Marinai1999] |
Simone Marinai and Paolo Nesi.
Projection Based Segmentation of Musical Sheets.
In 5th International Conference on Document Analysis and
Recognition, pages 3-6, 1999.
ISBN 0-7695-0318-7.
[ bib |
DOI ]
The automatic recognition of music scores is a key process for the electronic treatment of music information. In this paper we present the segmentation module of an OMR system. The proposed approach is based on the use of projection profiles for the location of elementary symbols that constitute the music notation. An extensive experimentation was made with the help of a tool developed for this purpose. Reported results show a high efficiency in the correct location of elementary symbols.
|
| [McPherson1999] | John R. McPherson. Page Turning - Score Automation for Musicians. Technical report, University of Canterbury, New Zealand, 1999. [ bib | http ] |
| [Ng1999] | Kia Ng, David Cooper, Ewan Stefani, Roger Boyle, and Nick Bailey. Embracing the Composer: Optical Recognition of Handwritten Manuscripts. In International Computer Music Conference, pages 500-503, 1999. [ bib | http ] |
| [VuilleumierStueckelberg1999] |
Marc Vuilleumier Stückelberg and David Doermann.
On musical score recognition using probabilistic reasoning.
In 5th International Conference on Document Analysis and
Recognition, pages 115-118, 1999.
ISBN 0-7695-0318-7.
[ bib |
DOI |
http ]
We present a probabilistic framework for document analysis and recognition and illustrate it on the problem of musical score recognition. Our system uses an explicit descriptive model of the document class to find the most likely interpretation of a scanned document image. In contrast to the traditional pipeline architecture, we carry out all stages of the analysis with a single inference engine, allowing for an end-to-end propagation of the uncertainty. The global modeling structure is similar to a stochastic attribute grammar, and local parameters are estimated using hidden Markov models.
|
| [Wijaya1999] |
K. Wijaya and David Bainbridge.
Staff line restoration.
In 7th International Conference on Image Processing and its
Applications, pages 760-764. Institution of Engineering and Technology,
1999.
[ bib |
DOI ]
Optical music recognition (OMR), the conversion of scanned pages of music into a musical database, has reached an exciting level of maturity. Like optical character recognition, it has now reached the point where the returns in accuracy from increasingly sophisticated pattern recognition algorithms appear saturated and more significant gains are being made from the application of structured a priori knowledge. This paper describes one such technique for improved staff line processing: the detection and subsequent correction of bowing in the staff lines, which is an important category given the significant source of music in book form. Two versions of the algorithm are tested: the first, based on mathematical morphology, has the added benefit of automatically fusing small breaks in staff lines, common for example in older works; the second, based on a flood-fill algorithm, requires a minor modification if fragmented staff lines are to be repaired. The correct detection and processing of staff lines is fundamental to OMR. Without adequate knowledge of staff line location, notation superimposed on the staves cannot be correctly separated, classified and processed.
|
| [Bainbridge1998] |
David Bainbridge and Stuart Inglis.
Musical image compression.
In Data Compression Conference, pages 209-218, 1998.
[ bib |
DOI ]
Optical music recognition aims to convert the vast repositories of sheet music in the world into an on-line digital format. In the near future it will be possible to assimilate music into digital libraries and users will be able to perform searches based on a sung melody in addition to typical text-based searching. An important requirement for such a system is the ability to reproduce the original score as accurately as possible. Due to the huge amount of sheet music available, the efficient storage of musical images is an important topic of study. This paper investigates whether the "knowledge" extracted from the optical music recognition (OMR) process can be exploited to gain higher compression than the JBIG international standard for bi-level image compression. We present a hybrid approach where the primitive shapes of music extracted by the optical music recognition process (note heads, note stems, staff lines and so forth) are fed into a graphical symbol based compression scheme originally designed for images containing mainly printed text. Using this hybrid approach the average compression rate for a single page is improved by 3.5% over JBIG. When multiple pages with similar typography are processed in sequence, the file size is decreased by 4-8%. The relevant background to both optical music recognition and textual image compression is presented. Experiments performed on 66 test images are described, outlining the combinations of parameters that were examined to give the best results.
|
| [Chhabra1998] |
Atul K. Chhabra.
Graphic symbol recognition: An overview.
In Karl Tombre and Atul K. Chhabra, editors, Graphics
Recognition Algorithms and Systems, pages 68-79, Berlin, Heidelberg, 1998.
Springer Berlin Heidelberg.
ISBN 978-3-540-69766-4.
[ bib |
DOI ]
Symbol recognition is one of the primary stages of any graphics recognition system. This paper reviews the current state of the art in graphic symbol recognition and raises some open issues that need further investigation. Work on symbol recognition tends to be highly application specific. Therefore, this review presents the symbol recognition methods in the context of specific applications.
|
| [Fahmy1998] |
Hoda M. Fahmy and Dorothea Blostein.
A graph-rewriting paradigm for discrete relaxation: Application to
sheet-music recognition.
International Journal of Pattern Recognition and Artificial
Intelligence, 12 (6): 763-799, 1998.
[ bib |
DOI ]
In image analysis, recognition of the primitives plays an important role. Subsequent analysis is used to interpret the arrangement of primitives. This subsequent analysis must make allowance for errors or ambiguities in the recognition of primitives. In this paper, we assume that the primitive recognizer produces a set of possible interpretations for each primitive. To reduce this primitive-recognition ambiguity, we use contextual information in the image, and apply constraints from the image domain. This process is variously termed constraint satisfaction, labeling or discrete relaxation. Existing methods for discrete relaxation are limited in that they assume a priori knowledge of the neighborhood model: before relaxation begins, the system is told (or can determine) which sets of primitives are related by constraints. These methods do not apply to image domains in which complex analysis is necessary to determine which primitives are related by constraints. For example, in music notation, we must recognize which notes belong to one measure, before it is possible to apply the constraint that the number of beats in the measure should match the time signature. Such constraints can be handled by our graph-rewriting paradigm for discrete relaxation: here neighborhood-model construction is interleaved with constraint-application. In applying this approach to the recognition of simple music notation, we use approximately 180 graph-rewriting rules to express notational constraints and semantic-interpretation rules for music notation. The graph rewriting rules express both binary and higher-order notational constraints. As image-interpretation proceeds, increasingly abstract levels of interpretation are assigned to (groups of) primitives. This allows application of higher-level constraints, which can be formulated only after partial interpretation of the image.
|
| [Ferrand1998] |
Miguel Ferrand and Amílcar Cardoso.
Scheduling to Reduce Uncertainty in Syntactical Music Structures.
In Flávio Moreira de Oliveira, editor, Advances in
Artificial Intelligence, pages 249-258, Berlin, Heidelberg, 1998. Springer
Berlin Heidelberg.
ISBN 978-3-540-49523-9.
[ bib |
DOI ]
In this paper, we focus on the syntactical aspects of music representation. We look at a music score as a structured layout of events with intrinsic temporal significance and we show that important basic relations between these events can be inferred from the topology of symbol objects in a music score. Within this framework, we propose a scheduling algorithm to find consistent assignments of events to voices, in the presence of uncertain information. Based on some experimental results, we show how we may use this approach to improve the accuracy of an Optical Music Recognition system.
|
| [Fujinaga1998] | Ichiro Fujinaga, Stephan Moore, and David S. Sullivan. Implementation of exemplar-based learning model for music cognition. In International Conference on Music Perception and Cognition, pages 171-179, Seoul, South Korea, 1998. [ bib | .pdf ] |
| [Bainbridge1997] |
David Bainbridge and Tim Bell.
Dealing with Superimposed Objects in Optical Music Recognition.
In 6th International Conference on Image Processing and its
Applications, pages 756-760, 1997.
ISBN 0 85296 692 X.
[ bib |
DOI ]
Optical music recognition (OMR) involves identifying musical symbols on a scanned sheet of music, and interpreting them so that the music can either be played by the computer, or put into a music editor. Applications include providing an automatic accompaniment, transposing or extracting parts for individual instruments, and performing an automated musicological analysis of the music. A key problem with music recognition, compared with character recognition, is that symbols very often overlap on the page. The most significant form of this problem is that the symbols are superimposed on a five-line staff. Although the staff provides valuable positional information, it creates ambiguity because it is difficult to determine whether a pixel would be black or white if the staff line was not there. The other main difference between music recognition and character recognition is the set of permissible symbols. In text, the alphabet size is fixed. Conversely, in music notation there is no standard "alphabet" of shapes, with composers inventing new notation where necessary, and music for particular instruments using specialised notation where appropriate. The focus of this paper is on techniques we have developed to deal with superimposed objects.
|
| [Bainbridge1997a] | David Bainbridge. Extensible optical music recognition. PhD thesis, University of Canterbury, 1997. [ bib | http ] |
| [Bainbridge1997b] |
David Bainbridge and Nicholas Paul Carter.
Automatic reading of music notation.
In H. Bunke and P. Wang, editors, Handbook of Character
Recognition and Document Image Analysis, pages 583-603. World Scientific,
Singapore, 1997.
[ bib |
DOI ]
The aim of Optical Music Recognition (OMR) is to convert optically scanned pages of music into a machine-readable format. In this tutorial level discussion of the topic, an historical background of work is presented, followed by a detailed explanation of the four key stages to an OMR system: stave line identification, musical object location, symbol identification, and musical understanding. The chapter also shows how recent work has addressed the issues of touching and fragmented objects—objectives that must be solved in a practical OMR system. The report concludes by discussing remaining problems, including measuring accuracy.
|
| [VuilleumierStueckelberg1997] | Marc Vuilleumier Stückelberg, Christian Pellegrini, and Mélanie Hilario. A preview of an architecture for musical score recognition. Technical report, University of Geneva, 1997b. [ bib | http ] |
| [VuilleumierStueckelberg1997a] |
Marc Vuilleumier Stückelberg, Christian Pellegrini, and Mélanie
Hilario.
An architecture for musical score recognition using high-level domain
knowledge.
In 4th International Conference on Document Analysis and
Recognition, pages 813-818 vol.2, 1997a.
[ bib |
DOI ]
Proposes an original approach to musical score recognition, a particular case of high-level document analysis. In order to overcome the limitations of existing systems, we propose an architecture which allows for a continuous and bidirectional interaction between high-level knowledge and low-level data, and which is able to improve itself over time by learning. This architecture is made of three cooperating layers, one made of parameterized feature detectors, another working as an object-oriented knowledge repository and the other as a supervising Bayesian metaprocessor. Although the implementation is still in progress, we show how this architecture is adequate for modeling and processing knowledge.
|
| [Anstice1996] |
Jamie Anstice, Tim Bell, Andy Cockburn, and Martin Setchell.
The design of a pen-based musical input system.
In 6th Australian Conference on Computer-Human Interaction,
pages 260-267, 1996.
[ bib |
DOI ]
Computerising the task of music editing can avoid a considerable amount of tedious work for musicians, particularly for tasks such as key transposition, part extraction, and layout. However the task of getting the music onto the computer can still be time consuming and is usually done with the help of bulky equipment. This paper reports on the design of a pen-based input system that uses easily-learned gestures to facilitate fast input, particularly if the system must be portable. The design is based on observations of musicians writing music by hand, and an analysis of the symbols in samples of music. A preliminary evaluation of the system is presented, and the speed is compared with the alternatives of handwriting, synthesiser keyboard input, and optical music recognition. Evaluations suggest that the gesture-based system could be approximately three times as fast as other methods of music data entry reported in the literature.
|
| [Bainbridge1996] | David Bainbridge and Tim Bell. An extensible optical music recognition system. Australian Computer Science Communications, 18: 308-317, 1996. [ bib | .html ] |
| [CapellaScan] | capella-software AG. Capella Scan. https://www.capella-software.com, 1996. [ bib | http ] |
| [Dan1996] |
Lee Sau Dan.
Automatic Optical Music recognition.
Technical report, The University of Waikato, New Zealand, 1996.
[ bib |
.ps.gz ]
In this project, the topic of automatic optical music recognition was studied. It is the conversion of an optically sampled image of a musical score into a representation that can be conveniently stored in computer storage and retrieved for various purposes. It is analogous to optical character recognition. Optical character recognition recognizes text characters in the input images and outputs the text in a machine-readable format. Similarly, an optical music recognition system recognizes the symbols on a musical score and outputs the results in a binary format. Subsequent processing on this output can provide a wide variety of applications, such as reprinting and archiving.
|
| [Fujinaga1996] | Ichiro Fujinaga. Exemplar-based learning in adaptive optical music recognition system. In International Computer Music Conference, pages 55-56, Hong Kong, 1996a. ISBN 962-85092-1-7. [ bib | http ] |
| [Fujinaga1996a] | Ichiro Fujinaga. Adaptive optical music recognition. PhD thesis, McGill University, 1996b. [ bib | .pdf ] |
| [Homenda1996] |
Wladyslaw Homenda.
Automatic recognition of printed music and its conversion into
playable music data.
Control and Cybernetics, 25 (2): 353-367, 1996.
[ bib |
.pdf ]
The paper describes MIDISCAN, a recognition system for printed music notation. Music notation recognition is a challenging problem in both fields: pattern recognition and knowledge representation. Music notation symbols, though well characterized by their features, are arranged in an elaborate way in real music notation, which makes the recognition task very difficult and still open for new ideas, as for example, fuzzy set application in skew correction and stave location. On the other hand, the aim of the system, i.e. conversion of acquired printed music into playable MIDI format, requires special representation of music data. The problems of pattern recognition and knowledge representation in the context of music processing are discussed in this paper.
|
| [Kopec1996] |
Gary E. Kopec, Philip A. Chou, and David A. Maltz.
Markov source model for printed music decoding.
Journal of Electronic Imaging, 5, 1996.
[ bib |
DOI |
.pdf ]
A Markov source model is described for a simple subset of printed music notation that was developed as an extended example of the document image decoding (DID) approach to document image analysis. The model is based on the Adobe Sonata music symbol set and a finite-state language of textual music messages. The music message language is defined and several important aspects of message imaging are discussed. Aspects of music notation that appear problematic for a finite-state representation are identified. Finally, an example of music image decoding and resynthesis using the model is presented. Development of the model was greatly facilitated by the duality between image synthesis and image decoding that is fundamental to the DID paradigm.
|
| [Miyao1996] |
Hidetoshi Miyao and Yasuaki Nakano.
Note symbol extraction for printed piano scores using neural
networks.
IEICE Transactions on Information and Systems, E79-D (5):
548-554, 1996.
[ bib |
http ]
In the traditional note symbol extraction processes, extracted candidates of note elements were identified using complex if-then rules based on the note formation rules and they needed subtle adjustment of parameters through many experiments. The purpose of our system is to avoid the tedious tasks and to present an accurate and high-speed extraction of note heads, stems and flags according to the following procedure. (1) We extract head and flag candidates based on the stem positions. (2) To identify heads and flags from the candidates, we use a couple of three-layer neural networks. To make the networks learn, we give the position information and reliability factors of candidates to the input units. (3) With the weights learned by the net, the head and flag candidates are recognized. As an experimental result, we obtained a high extraction rate of more than 99% for thirteen printed piano scores on A4 sheets which have various difficulties. Using a workstation (SPARC Station 10), it took about 90 seconds on average. It means that our system can analyze piano scores 5 times or more as fast as the manual work. Therefore, our system can execute the task without the traditional tedious work, and can recognize them quickly and accurately.
|
| [Modayur1996] | Bharath R. Modayur. Music Score Recognition - A Selective Attention Approach using Mathematical Morphology. Technical report, Electrical Engineering Department, University of Washington, Seattle, 1996. [ bib | http ] |
| [Ng1996] |
Kia Ng and Roger Boyle.
Recognition and reconstruction of primitives in music scores.
Image and Vision Computing, 14 (1): 39-46, 1996.
ISSN 0262-8856.
[ bib |
DOI |
http ]
Music recognition bears similarities and differences to OCR. In this paper we identify some of the problems peculiar to musical scores, and propose an approach which succeeds in a wide range of non-trivial cases. The composer customarily proceeds by writing notes, then stems, beams, ties and slurs — we have inverted this approach by segmenting and then subsegmenting scores to recapture the component parts of symbols. In this paper, we concentrate on the strategy of recognizing sub-segmented primitives, and the reassembly process which reconstructs low level graphical primitives back to musical symbols. The sub-segmentation process proves to be worthwhile, since many primitives complement each other and high level musical theory can be employed to enhance the recognition process.
|
| [Reed1996] |
K. Todd Reed and J. R. Parker.
Automatic Computer Recognition of Printed Music.
In 13th International Conference on Pattern Recognition, pages
803-807, 1996.
ISBN 081867282X.
[ bib |
DOI ]
This paper provides an overview of the implementation of Lemon, a complete optical music recognition system. Among the techniques employed by the implementation are: template matching, the Hough transform, line adjacency graphs, character profiles, and graph grammars. Experimental results, including comparisons with commercial systems, are provided.
|
| [Yadid-Pecht1996] |
Orly Yadid-Pecht, Moty Gerner, Lior Dvir, Eliyahu Brutman, and Uri Shimony.
Recognition of handwritten musical notes by a modified Neocognitron.
Machine Vision and Applications, 9 (2): 65-72, 1996.
ISSN 1432-1769.
[ bib |
DOI ]
A neural network for recognition of handwritten musical notes, based on the well-known Neocognitron model, is described. The Neocognitron has been used for the “what” pathway (symbol recognition), while contextual knowledge has been applied for the “where” (symbol placement). This way, we benefit from dividing the process for dealing with this complicated recognition task. Also, different degrees of intrusiveness in “learning” have been incorporated in the same network: More intrusive supervised learning has been implemented in the lower neuron layers and less intrusive in the upper one. This way, the network adapts itself to the handwriting of the user. The network consists of a 13x49 input layer and three pairs of “simple” and “complex” neuron layers. It has been trained to recognize 20 symbols of unconnected notes on a musical staff and was tested with a set of unlearned input notes. Its recognition rate for the individual unseen notes was up to 93%, averaging 80% for all categories. These preliminary results indicate that a modified Neocognitron could be a good candidate for identification of handwritten musical notes.
|
| [Baumann1995] | Stephan Baumann. A Simplified Attributed Graph Grammar for High-Level Music Recognition. In 3rd International Conference on Document Analysis and Recognition, pages 1080-1083. IEEE, 1995. ISBN 0-8186-7128-9. [ bib | DOI ] |
| [Baumann1995a] | Stephan Baumann and Karl Tombre. Report of the line drawing and music recognition working group. In A. Lawrence Spitz and Andreas Dengel, editors, Document Analysis Systems, pages 1080-1083, 1995. [ bib | DOI ] |
| [Coueasnon1995] |
Bertrand Coüasnon, Pascal Brisset, and Igor Stéphan.
Using Logic Programming Languages For Optical Music Recognition.
In 3rd International Conference on the Practical Application of
Prolog, 1995.
[ bib |
http ]
Optical Music Recognition is a particular form of document analysis in which there is much knowledge about document structure. Indeed there exists an important set of rules for musical notation, but current systems do not fully use them. We propose a new solution using a grammar to guide the segmentation of the graphical objects and their recognition. The grammar is essentially a description of the relations (relative position and size, adjacency, etc) between the graphical objects. Inspired by Definite Clause Grammar techniques, the grammar can be directly implemented in λProlog, a higher-order dialect of Prolog. Moreover, the translation from the grammar into λProlog code can be done automatically. Our approach is justified by the first encouraging results obtained with a prototype for music score recognition.
|
| [Coueasnon1995a] |
Bertrand Coüasnon and Jean Camillerapp.
A Way to Separate Knowledge From Program in Structured Document
Analysis: Application to Optical Music Recognition.
In 3rd International Conference on Document Analysis and
Recognition, pages 1092-1097, 1995.
[ bib |
DOI ]
Optical Music Recognition is a form of document analysis for which a priori knowledge is particularly important. Musical notation is governed by a substantial set of rules, but current systems fail to use them adequately. In complex scores, existing systems cannot overcome the well-known segmentation problems of document analysis, due mainly to the high density of music information. This paper proposes a new method of recognition which uses a grammar in order to formalize the syntactic rules and represent the context. However, where objects touch, there is a discrepancy between the way the existing knowledge (grammar) will describe an object and the way it is recognized, since touching objects have to be segmented first. Following a description of the grammar, this paper shall go on to propose the use of an operator to modify the way the grammar parses the image so that the system can deal with certain touching objects (e.g. where an accidental touches a notehead).
|
| [Coueasnon1995b] |
Bertrand Coüasnon and Bernard Rétif.
Using a grammar for a reliable full score recognition system.
In International Computer Music Conference, pages 187-194,
1995.
[ bib |
.pdf ]
Optical Music Recognition needs to be reliable so that users do not have to detect and correct errors by checking the entire recognized score. Reliability can be reached by improving the recognition quality (on segmentation problems) and by making the system able to detect its own recognition errors. This is possible only by using as much musical knowledge as possible. Therefore, we propose a grammar to formalize the musical knowledge on full scores with polyphonic staves. We then show how this grammar can help detect most errors in note duration. The presented system is in an implementation phase but is already able to deal with full scores and to point out errors.
|
| [Homenda1995] |
Wladyslaw Homenda.
Optical pattern recognition for printed music notation.
In Symposium on OE/Aerospace Sensing and Dual Use Photonics,
1995.
[ bib |
DOI ]
The paper presents problems related to automated recognition of printed music notation. Music notation recognition is a challenging problem in both fields: pattern recognition and knowledge representation. Music notation symbols, though well characterized by their features, are arranged in an elaborate way in real music notation, which makes the recognition task very difficult and still open for new ideas. On the other hand, the aim of the system, i.e. application of acquired printed music into further processing, requires special representation of music data. Due to the complexity of music and music notation, music representation is one of the key issues in music notation recognition and music processing. The problems of pattern recognition and knowledge representation in the context of music processing are discussed in this paper. MIDISCAN, the computer system for music notation recognition and music processing, is presented.
|
| [Miyao1995] |
Hidetoshi Miyao and Yasuaki Nakano.
Head and stem extraction from printed music scores using a neural
network approach.
In 3rd International Conference on Document Analysis and
Recognition, pages 1074-1079, 1995.
ISBN 0-8186-7128-9.
[ bib |
DOI ]
In an automatic music score recognition system, it is very important to extract heads and stems of notes, since these symbols are most ubiquitous in a score and musically important. The purpose of our system is to present an accurate and high-speed extraction of note heads (except the whole notes) and stems according to the following procedure. (1) We extract all regions which are considered as candidates of stems or heads. (2) To identify heads from the candidates, we use a three-layer neural network. (3) The weights for the network are learned by the back propagation method. In the learning, the network learns the spatial constraints between heads and surroundings rather than the shapes of heads. (4) After the learning process is completed we use this network to identify a number of test head candidates. (5) The stem candidates touching the detected heads are extracted as true stems. As an experimental result, we obtained high recognition rates of 99.0% and 99.2% for stems and note heads, respectively. It took between 40 and 100 seconds to process a printed piano score on A4 sheet using a workstation. Therefore, our system can analyze it at least 10 times as fast as manual methods.
|
| [Ng1995] |
Kia Ng, Roger Boyle, and David Cooper.
Low- and high-level approaches to optical music score recognition.
In IEE Colloquium on Document Image Processing and Multimedia
Environments, pages 31-36, 1995.
[ bib |
DOI ]
The computer has become an increasingly important device in music. It can not only generate sound but is also able to perform time consuming and repetitive tasks, such as transposition and part extraction, with speed and accuracy. However, a score must be represented in a machine readable format before any operation can be carried out. Current input methods, such as using an electronic keyboard, are time consuming and require human intervention. Optical music recognition (OMR) provides an interesting, efficient and automatic method to transform paper-based music scores into a machine representation. The authors outline the techniques for pre-processing and discuss the heuristic and musical rules employed to enhance recognition. A spin-off application that makes use of the intermediate results to enhance stave lines is also presented. The authors concentrate on the techniques used for time-signature detection, discuss the application of frequently-found rhythmical patterns to clarify the results of OMR, and propose possible enhancements using such knowledge. They believe that domain-knowledge enhancement is essential for complex document analysis and recognition. Other possible areas of development include melodic, harmonic and stylistic analysis to improve recognition results further.
|
| [PoulaindAndecy1995] | Vincent Poulain d'Andecy, Jean Camillerapp, and Ivan Leplumey. Analyse de Partitions Musicales. Traitement du Signal, 12 (6): 653-661, 1995. [ bib | http ] |
| [Seales1995] |
W. Brent Seales and Arcot Rajasekar.
Interpreting music manuscripts: A logic-based, object-oriented
approach.
In Roland T. Chin, Horace H. S. Ip, Avi C. Naiman, and Ting-Chuen
Pong, editors, Image Analysis Applications and Computer Graphics,
pages 181-188, Berlin, Heidelberg, 1995. Springer Berlin Heidelberg.
ISBN 978-3-540-49298-6.
[ bib |
DOI ]
This paper presents a complete framework for recognizing classes of machine-printed musical manuscripts. Our framework is designed around the decomposition of a manuscript into objects such as staves and bars which are processed with a knowledge base module that encodes rules in Prolog. Object decomposition focuses the recognition problem, and the rule base provides a powerful and flexible way to encode the rules of a particular manuscript class. Our rule-base registers notes and stems, eliminates false-positives and correctly labels notes according to their position on the staff. We present results that show 99% accuracy at detecting note-heads and 95% accuracy in finding stems.
|
| [Yoda1995] |
Ikushi Yoda, Kazuhiko Yamamoto, and Hiromitsu Yamada.
Automatic Construction of Recognition Procedures for Musical Notes by
Genetic Algorithm.
In A. Lawrence Spitz and Andreas Dengel, editors, Document
Analysis Systems, 1995.
[ bib |
DOI ]
|
| [Bainbridge1994] | David Bainbridge. A complete optical music recognition system: Looking to the future. Technical report, University of Canterbury, 1994a. [ bib | http ] |
| [Bainbridge1994a] | David Bainbridge. Optical music recognition: Progress report 1. Technical report, Department of Computer Science, University of Canterbury, 1994b. [ bib | http ] |
| [Carter1994] |
Nicholas Paul Carter.
Conversion of the Haydn symphonies into electronic form using
automatic score recognition: a pilot study.
In International Symposium on Electronic Imaging: Science and
Technology, pages 2181 - 2181 - 12, 1994.
[ bib |
DOI ]
As part of the development of an automatic recognition system for printed music scores, a series of 'real-world' tasks are being undertaken. The first of these involves the production of a new edition of an existing 104-page, engraved, chamber-music score for Oxford University Press. The next substantial project, which is described here, has begun with a pilot study with a view to conversion of the 104 Haydn symphonies from a printed edition into machine-readable form. The score recognition system is based on a structural decomposition approach which provides advantages in terms of speed and tolerance of significant variations in font, scale, rotation and noise. Inevitably, some editing of the output data files is required, partially due to the limited vocabulary of symbols supported by the system and their permitted superimpositions. However, the possibility of automatically processing the bulk of the contents of over 600 pages of orchestral score in less than a day of compute time makes the conversion task manageable. The influence that this undertaking is having on the future direction of system development is also discussed.
|
| [Coueasnon1994] | Bertrand Coüasnon and Jean Camillerapp. Using Grammars to Segment and Recognize Music Scores. In International Association for Pattern Recognition Workshop on Document Analysis Systems, pages 15-27, Kaiserslautern, Germany, 1994. [ bib | .ps ] |
| [Essmayr1994] | Wolfgang Essmayr. Optische-Musik-Erkennung (OME), Erkennung von Notenschrift. Master's thesis, Johannes Kepler University Linz, Austria, 1994. [ bib | .ps.gz ] |
| [Fahmy1994] |
Hoda M. Fahmy and Dorothea Blostein.
Graph-rewriting approach to discrete relaxation: application to music
recognition.
In International Symposium on Electronic Imaging: Science and
Technology, pages 2181 - 2181 - 12, 1994.
[ bib |
DOI ]
In image analysis, low-level recognition of the primitives plays a very important role. Once the primitives of the image are recognized, depending on the application, many types of analyses can take place. It is likely that associated with each object or primitive is a set of possible interpretations, herein referred to as the label set. The low-level recognizer may associate a probability with each label in the label set. We can use the constraints of the application domain to reduce the ambiguity in the object's identity. This process is variously termed constraint satisfaction, labeling, or relaxation. In this paper, we focus on the discrete form of relaxation. Our contribution lies in the development of a graph-rewriting approach which does not assume the degree of localness is high. We apply our approach to the recognition of music notation, where non-local interactions between primitives must be used in order to reduce ambiguity in the identity of the primitives. We use graph-rewriting rules to express not only binary constraints, but also higher-order notational constraints.
|
| [PoulaindAndecy1994] | Vincent Poulain d'Andecy, Jean Camillerapp, and Ivan Leplumey. Kalman filtering for segment detection: application to music scores analysis. In 12th International Conference on Pattern Recognition. IEEE Comput. Soc. Press, 1994a. [ bib | DOI ] |
| [PoulaindAndecy1994a] | Vincent Poulain d'Andecy, Jean Camillerapp, and Ivan Leplumey. Détecteur robuste de segments; Application à l'analyse de partitions musicales. In Actes 9 ème Congrés AFCET Reconnaissance des Formes et Intelligence Artificielle, 1994b. [ bib ] |
| [Roth1994] | Martin Roth. An approach to recognition of printed music. Technical report, Swiss Federal Institute of Technology, 1994. [ bib | DOI ] |
| [Armand1993] |
Jean-Pierre Armand.
Musical score recognition: A hierarchical and recursive approach.
In 2nd International Conference on Document Analysis and
Recognition, pages 906-909, 1993.
[ bib |
DOI ]
Musical scores for live music show specific characteristics: large format, orchestral score, bad quality of (photo) copies. Moreover such music is generally handwritten. The author addresses the music recognition problem for such scores, and shows a dedicated filtering that has been developed, both for segmentation and correction of copy defects. Recognition process involves geometrical and topographical parameters evaluation. The whole process (filtering + recognition) is recursively applied on images and sub-images, in a knowledge-based way.
|
| [Baumann1993] | Stephan Baumann. Document recognition of printed scores and transformation into MIDI. Technical report, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, 1993. [ bib | DOI ] |
| [Clarke1993] |
Alastair T. Clarke, B. Malcom Brown, and M. P. Thorne.
Recognizing musical text.
In Machine Vision Applications, Architectures, and Systems
Integration, 1993.
[ bib |
DOI ]
This paper reports on some recent developments in a software product that recognizes printed music notation. There are a number of computer systems available which assist in the task of printing music; however the full potential of these systems cannot be realized until the musical text has been entered into the computer. It is this problem that we address in this paper. The software we describe, which uses computationally inexpensive methods, is designed to analyze a music score, previously read by a flat bed scanner, and to extract the musical information that it contains. The paper discusses the methods used to recognize the musical text: these involve sampling the image at strategic points and using this information to estimate the musical symbol. It then discusses some hard problems that have been encountered during the course of the research; for example the recognition of chords and note clusters. It also reports on the progress that has been made in solving these problems and concludes with a discussion of work that needs to be undertaken over the next five years in order to transform this research prototype into a commercial product.
|
| [Fahmy1993] |
Hoda M. Fahmy and Dorothea Blostein.
Graph Grammar Processing of Uncertain Data.
In Advances in Structural and Syntactic Pattern Recognition,
pages 373-382. World Scientific, 1993a.
[ bib |
DOI ]
Graph grammars may be used to extract the information content from diagrams where there is uncertainty about symbol identity. The input to the graph grammar is derived from the output of a symbol recognizer. We propose a way in which uncertainty can be represented by a graph and a method which extracts the information content of the diagram. We consider the application of graph grammars to the recognition of diagrams such as music scores.
|
| [Fahmy1993a] |
Hoda M. Fahmy and Dorothea Blostein.
A graph grammar programming style for recognition of music notation.
Machine Vision and Applications, 6 (2): 83-99, 1993b.
ISSN 1432-1769.
[ bib |
DOI ]
Graph grammars are a promising tool for solving picture processing problems. However, the application of graph grammars to diagram recognition has been limited to rather simple analysis of local symbol configurations. This paper introduces the Build-Weed-Incorporate programming style for graph grammars and shows its application in determining the meaning of complex diagrams, where the interaction among physically distant symbols is semantically important. Diagram recognition can be divided into two stages: symbol recognition and high-level recognition. Symbol recognition has been studied extensively in the literature. In this work we assume the existence of a symbol recognizer and use a graph grammar to assemble the diagram's information content from the symbols and their spatial relationships. The Build-Weed-Incorporate approach is demonstrated by a detailed discussion of a graph grammar for high-level recognition of music notation.
|
| [Fujinaga1993] |
Ichiro Fujinaga.
Optical music recognition system which learns.
In Enabling Technologies for High-Bandwidth Applications,
1993.
[ bib |
DOI ]
This paper describes an optical music recognition system composed of a database and three interdependent processes: a recognizer, an editor, and a learner. Given a scanned image of a musical score, the recognizer locates, separates, and classifies symbols into musically meaningful categories. This classification is based on the k-nearest neighbor method using a subset of the database that contains features of symbols classified in previous recognition sessions. Output of the recognizer is corrected by a musically trained human operator using a music notation editor. The editor provides both visual and high-quality audio feedback of the output. Editorial corrections made by the operator are passed to the learner which then adds the newly acquired data to the database. The learner's main task, however, involves selecting a subset of the database and reweighing the importance of the features to improve accuracy and speed for subsequent sessions. Good preliminary results have been obtained with everything from professionally engraved scores to hand-written manuscripts.
|
| [Leplumey1993] |
Ivan Leplumey, Jean Camillerapp, and G. Lorette.
A robust detector for music staves.
In 2nd International Conference on Document Analysis and
Recognition, pages 902-905, 1993.
[ bib |
DOI ]
A method for the automatic recognition of music staves based on a prediction-and-check technique is presented in order to extract staves. It can detect lines with some curvature, discontinuities, and inclination. Lines are asserted to be a part of a staff if they can be grouped by five, thus completing the staff. This last phase also identifies additional staff lines.
|
| [Modayur1993] |
Bharath R. Modayur, Visvanathan Ramesh, Robert M. Haralick, and Linda G.
Shapiro.
MUSER: A prototype musical score recognition system using
mathematical morphology.
Machine Vision and Applications, 6 (2): 140-150, 1993.
ISSN 1432-1769.
[ bib |
DOI ]
Music representation utilizes a fairly rich repertoire of symbols. These symbols appear on a score sheet with relatively little shape distortion, differing from the prototype symbol shapes mainly by a positional translation and scale change. The prototype system we describe in this article is aimed at recognizing printed music notation from digitized music score images. The recognition system is composed of two parts: a low-level vision module that uses morphological algorithms for symbol detection and a high-level module that utilizes prior knowledge of music notation to reason about spatial positions and spatial sequences of these symbols. The high-level module also employs verification procedures to check the veracity of the output of the morphological symbol recognizer. The system produces an ASCII representation of music scores that can be input to a music-editing system. Mathematical morphology provides us the theory and the tools to analyze shapes. This characteristic of mathematical morphology lends itself well to analyzing and subsequently recognizing music scores that are rich in well-defined musical symbols. Since morphological operations can be efficiently implemented in machine vision systems that have special hardware support, the recognition task can be performed in near real-time. The system achieves accuracy in excess of 95% on the sample scores processed so far with a peak accuracy of 99.7% for the quarter and eighth notes, demonstrating the efficacy of morphological techniques for shape extraction.
|
| [Randriamahefa1993] |
R. Randriamahefa, J. P. Cocquerez, C. Fluhr, F. Pepin, and S. Philipp.
Printed music recognition.
In 2nd International Conference on Document Analysis and
Recognition, pages 898-901, 1993.
[ bib |
DOI ]
The different steps to recognize printed music are described. The first step is to detect and to eliminate the staff lines. A robust method is used, based on finding regions that contain only staff lines and then linking the staff-line pieces found in these regions to one another. After staff lines elimination, symbols are isolated and a representation called attributed graph is constructed for each symbol. Thinning, polygonalization, spurious segments cleaning, and segment fusion are performed. A first classification, separating all notes with black heads from others, is performed. To recognize notes with black heads (beamed group or quarter notes), a straightforward structural approach using this representation is sufficient and efficient in most cases. In the ambiguous cases (chord or black head linked to two stems), an ellipse matching method is used. To recognize half notes and bar lines, a structural method using the graph is used.
|
| [Baumann1992] |
Stephan Baumann and Andreas Dengel.
Transforming Printed Piano Music into MIDI.
In Advances in Structural and Syntactic Pattern Recognition,
pages 363-372. World Scientific, 1992.
[ bib |
DOI ]
This paper describes a recognition system for transforming printed piano music into the international standard MIDI for acoustic output generation. Because the system is adapted for processing musical scores, it follows a top-down strategy in order to take advantage of the hierarchical structuring. Applying a decision tree classifier and various musical rules, the system comes up with a recognition rate of 80 to 100% depending on the musical complexity of the input. The resulting symbolic representation in terms of so called MIDI-EVENTs can be easily understood by musical devices such as synthesizers, expanders, or keyboards.
|
| [Blostein1992] |
Dorothea Blostein and Henry S. Baird.
A Critical Survey of Music Image Analysis.
In Structured Document Image Analysis, pages 405-434.
Springer Berlin Heidelberg, 1992.
ISBN 978-3-642-77281-8.
[ bib |
DOI ]
The research literature concerning the automatic analysis of images of printed and handwritten music notation, for the period 1966 through 1990, is surveyed and critically examined.
|
| [Blostein1992a] |
Dorothea Blostein and Nicholas Paul Carter.
Recognition of Music Notation: SSPR'90 Working Group Report.
In Structured Document Image Analysis, pages 573-574.
Springer Berlin Heidelberg, 1992.
ISBN 978-3-642-77281-8.
[ bib |
DOI ]
This report summarizes the discussions of the Working Group on the Recognition of Music Notation, of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13-15 June 1990. The participants were: D. Blostein, N. Carter, R. Haralick, T. Itagaki, H. Kato, H. Nishida, and R. Siromoney. The discussion was moderated by Nicholas Carter and recorded by Dorothea Blostein.
|
| [Bulis1992] | Alex Bulis, Roy Almog, Moti Gerner, and Uri Shimony. Computerized recognition of hand-written musical notes. In International Computer Music Conference, pages 110-112, 1992. [ bib | http ] |
| [Carter1992] |
Nicholas Paul Carter and Richard A. Bacon.
Automatic Recognition of Printed Music.
In Structured Document Image Analysis, pages 456-465.
Springer Berlin Heidelberg, Berlin, Heidelberg, 1992.
ISBN 978-3-642-77281-8.
[ bib |
DOI ]
There is a need for an automatic recognition system for printed music scores. The work presented here forms the basis of an omnifont, size-independent system with significant tolerance of noise and rotation of the original image. A structural decomposition technique is used based on an original transformation of the line adjacency graph. An example of output is given in the form of a data file and its score reconstruction.
|
| [Carter1992a] |
Nicholas Paul Carter.
A New Edition of Walton's Façade Using Automatic Score
Recognition.
In Advances in Structural and Syntactic Pattern Recognition,
pages 352-362. World Scientific, 1992a.
[ bib |
DOI ]
The availability of an automatic recognition system for printed music will facilitate applications such as musicological analysis, point-of-sale printing, creation of large format or braille scores and computer-based production of new editions. An example of the last of these possibilities is described here. A score-reading system is under development which makes use of a structural decomposition technique that is intended to be tolerant of significant variation in font, size, notation and noise in the source images. A description is given of the first "real-world" task to be undertaken using the system, i.e. the production of a new edition of Façade by William Walton. Sample output files and their corresponding reconstructions are given together with a discussion of the problems involved and the implications for future work.
|
| [Carter1992b] |
Nicholas Paul Carter.
Segmentation and preliminary recognition of madrigals notated in
white mensural notation.
Machine Vision and Applications, 5 (3): 223-229, 1992b.
ISSN 1432-1769.
[ bib |
DOI ]
An automatic music score-reading system will facilitate applications including computer-based editing of new editions, production of databases for musicological research, and creation of braille or large-format scores for the blind or partially-sighted. The work described here deals specifically with initial processing of images containing early seventeenth century madrigals notated in white mensural notation. The problems of segmentation involved in isolating the musical symbols from the word-underlay and decorative graphics are compounded by the poor quality of the originals which present a significant challenge to any recognition system. The solution described takes advantage of structural decomposition techniques based on a novel transformation of the line adjacency graph which have been developed during work on a score-reading system for conventional music notation.
|
| [Itagaki1992] |
Takebumi Itagaki, Masayuki Isogai, Shuji Hashimoto, and Sadamu Ohteru.
Automatic Recognition of Several Types of Musical Notation.
In Structured Document Image Analysis, pages 466-476.
Springer Berlin Heidelberg, Berlin, Heidelberg, 1992.
ISBN 978-3-642-77281-8.
[ bib |
DOI ]
This paper describes recent progress towards systems for automatic recognition of several different types of musical notation, including printed sheet music, Braille music, and dance notation.
|
| [Kato1992] |
Hirokazu Kato and Seiji Inokuchi.
A Recognition System for Printed Piano Music Using Musical Knowledge
and Constraints.
In Structured Document Image Analysis, pages 435-455.
Springer Berlin Heidelberg, Berlin, Heidelberg, 1992.
ISBN 978-3-642-77281-8.
[ bib |
DOI ]
We describe a recognition system for printed piano music, which presents challenging problems in both image pattern matching and semantic analysis. In music notation, the shape of symbols is simple, but confusing connections and overlaps among symbols occur. In order to deal with these difficulties, proper knowledge is required, so our system adopts a top-down approach based on bar-unit recognition to use musical knowledge and constraints effectively. Recognition results, described with a symbolic playable data format, exceed 90% correct on beginner's piano music.
|
| [Martin1992] |
Philippe Martin and Camille Bellisant.
Neural Networks for the Recognition of Engraved Musical Scores.
International Journal of Pattern Recognition and Artificial
Intelligence, 06 (01): 193-208, 1992.
[ bib |
DOI ]
The image analysis levels of a recognition system for engraved musical scores are described. Recognizing musical score images requires an accurate segmentation stage to isolate symbols from staff lines. This symbols/staves segregation is achieved by the use of inscribed line (chord) information. This information, processed by a multilayer perceptron, allows an efficient segmentation in terms of the remaining connected components. Some of these components are then classified, using another network, according to a coding of their skeleton graph. Special attention is paid to the design of the networks: the architectures are adapted to the specificities of each task. Multilayer perceptrons are employed here together with other more classical image analysis techniques which are also presented.
|
| [Martin1992a] | Philippe Martin. Artificial neural networks : application to optical musical score recognition. Theses, Université Joseph-Fourier - Grenoble I, 1992. [ bib | http ] |
| [Ng1992] |
Kia Ng and Roger Boyle.
Segmentation of Music Primitives.
In David Hogg and Roger Boyle, editors, BMVC92, pages
472-480, London, 1992. Springer London.
ISBN 978-1-4471-3201-1.
[ bib |
DOI ]
In this paper, low-level knowledge directed pre-processing and segmentation of music scores are presented. We discuss some of the problems that have been overlooked by existing research but have proved to be major obstacles for robust optical music recognisers [1] to help entering music into a computer, including sub-segmentation of interconnected primitives and identification of nonstraight stave lines, and present solutions to these problems. We conclude that, with knowledge, a significant improvement in low-level segmentations can be achieved.
|
| [Sicard1992] |
Etienne Sicard.
An efficient method for the recognition of printed music.
In 11th International Conference on Pattern Recognition, pages
573-576, 1992.
[ bib |
DOI ]
Deals with the recognition mechanisms of printed music scores. The techniques for extracting linear features, keys, noteheads and other musical figures from a digitized image are presented. Experimental results are given to show the effectiveness of the proposed methodology with a discussion of its performances and limits. Applications to fully automated music score extraction, printed or handwritten, are also discussed.
|
| [Stevens1992] |
Catherine Stevens and Cyril Latimer.
A comparison of connectionist models of music recognition and human
performance.
Minds and Machines, 2 (4): 379-400, 1992.
ISSN 1572-8641.
[ bib |
DOI ]
Current artificial neural network or connectionist models of music cognition embody feature-extraction and feature-weighting principles. This paper reports two experiments which seek evidence for similar processes mediating recognition of short musical compositions by musically trained and untrained listeners. The experiments are cast within a pattern recognition framework based on the vision-audition analogue wherein music is considered an auditory pattern consisting of local and global features. Local features such as inter-note interval, and global features such as melodic contour, are derived from a two-dimensional matrix in which music is represented as a series of frequencies plotted over time.
|
| [Wolman1992] | Amnon Wolman, James Choi, Shahab Asgharzadeh, and Jason Kahana. Recognition of Handwritten Music Notation. In International Computer Music Conference, 1992. [ bib ] |
| [Bainbridge1991] | David Bainbridge. Preliminary experiments in musical score recognition, 1991. [ bib ] |
| [Blostein1991] | Dorothea Blostein and Lippold Haken. Justification of Printed Music. Communications of the ACM, 34 (3): 88-99, 1991. ISSN 0001-0782. [ bib | DOI ] |
| [McGee1991] | William McGee and Paul Merkley. The Optical Scanning of Medieval Music. Computers and the Humanities, 25 (1): 47-53, 1991. ISSN 1572-8412. [ bib | DOI ] |
| [Ruttenberg1991] | Alan Ruttenberg. Optical Reading of Typeset Music. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 1991. [ bib | .pdf ] |
| [Blostein1990] |
Dorothea Blostein and Lippold Haken.
Template matching for rhythmic analysis of music keyboard input.
In 10th International Conference on Pattern Recognition, pages
767-770, 1990.
[ bib |
DOI ]
A system that recognizes common rhythmic patterns through template matching is described. The use of template matching gives the user the unusual ability to modify the set of templates used for analysis. This modification effects a tradeoff between the temporal accuracy required of the input and the complexity of the recognizable rhythm patterns that happen to be common in a particular piece of music. The evolving implementation of this algorithm has received heavy use over a six-year period and has proven itself as a practical and reliable input method for fast music transcription. It is concluded that templates demonstrably provide the necessary temporal context for accurate rhythm recognition.
|
| [Diener1990] | Glendon Ross Diener. Modeling music notation: A three-dimensional approach. PhD thesis, Stanford University, Palo Alto, CA, 1990. [ bib | .ps.Z ] |
| [Hewlett1990] | Walter B. Hewlett and Eleanor Selfridge-Field, editors. Computing in Musicology: A Directory of Research, volume 6. Center for Computer, 1990. [ bib | .pdf ] |
| [Katayose1990] |
H. Katayose, T. Fukuoka, K. Takami, and S. Inokuchi.
Expression extraction in virtuoso music performances.
In 10th International Conference on Pattern Recognition, pages
780-784 vol.1, 1990.
[ bib |
DOI ]
An approach to music interpretation by computers is discussed. A rule-based music interpretation system is being developed that generates sophisticated performance from a printed music score. The authors describe the function of learning how to play music, which is the most important process in music interpretation. The target to be learned is expression rules and grouping strategy: expression rules are used to convert dynamic marks and motives into concrete performance data, and grouping strategy is used to extract motives from sequences of notes. They are learned from a given virtuoso performance. The delicate control of attack timing and of the duration and strength of the notes is extracted by the music transcription function. The performance rules are learned by investigating how the same or similar musical primitives are played in a performance. As for the grouping strategy, the system analyzes how the player grouped music and registers dominant note sequences to extract motives.
|
| [Clarke1989] |
Alastair T. Clarke, B. Malcom Brown, and M. P. Thorne.
Coping with some really rotten problems in automatic music
recognition.
Microprocessing and Microprogramming, 27 (1): 547-550, 1989.
ISSN 0165-6074.
Fifteenth EUROMICRO Symposium on Microprocessing and
Microprogramming.
[ bib |
DOI |
http ]
This paper describes some of the problems encountered, and some of the techniques that have been used and implemented, during the development of an Optical Character Recognition system for printed music. It focuses on the recognition of chords and clusters, subdivision into single “lines” of music, and translation into musical code. Whereas other, mainframe based, music recognition systems have rarely attacked these problems, our methods have given some considerable success with an IBM PC.
|
| [Bacon1988] |
Richard A. Bacon and Nicholas Paul Carter.
Recognising music automatically.
Physics Bulletin, 39 (7): 265, 1988.
[ bib |
http ]
Recognising characters typed in at a keyboard is a familiar task to most computers and one at which they excel, except that they (usually) insist on recognising what we have typed, rather than what we meant to type. A number of programs now on the market, however, go rather beyond merely recognising keystrokes on a keyboard, to actually recognising printed words on paper.
|
| [Carter1988] |
Nicholas Paul Carter, Richard A. Bacon, and T. Messenger.
The acquisition, representation and reconstruction of printed music
by computer: A review.
Computers and the Humanities, 22 (2): 117-136, 1988.
ISSN 1572-8412.
[ bib |
DOI ]
Material published on the subject of Acquisition, Representation and Reconstruction of printed music by computer is reviewed.
|
| [Clarke1988] |
Alastair T. Clarke, B. Malcom Brown, and M. P. Thorne.
Using a micro to automate data acquisition in music publishing.
Microprocessing and Microprogramming, 24 (1): 549-553, 1988.
ISSN 0165-6074.
Supercomputers: Technology and Applications.
[ bib |
DOI |
http ]
With the number of computer applications involving music information growing, and the transition from traditional music printing methods to computer typesetting that is being faced by music publishers, there is an increasing need for an efficient and accurate method of getting musical information into computers. This paper describes some of the technical problems encountered in developing a system, based upon the IBM PC and a low-cost scanning device, to automatically recognise the printed music notation on a sheet of music that is fed through the scanner.
|
| [Fujinaga1988] |
Ichiro Fujinaga.
Optical Music Recognition using Projections.
Master's thesis, McGill University, 1988.
[ bib |
.pdf ]
This research examines the feasibility of implementing an optical music score recognition system on a microcomputer. Projection technique is the principal method employed in the recognition process, assisted by some of the structural rules governing musical notation. Musical examples, excerpted mostly from solo repertoire for monophonic instruments and representing various publishers, are used as samples to develop a computer program that recognizes a set of musical symbols. A final test of the system is undertaken, involving additional samples of monophonic music which were not used in the development stage. With these samples, an average recognition rate of 70% is attained without any operator intervention. On an IBM-AT-compatible microcomputer, the total processing time including the scanning operation is about two minutes per page.
|
| [Roach1988] |
J. W. Roach and J. E. Tatem.
Using domain knowledge in low-level visual processing to interpret
handwritten music: an experiment.
Pattern Recognition, 21 (1): 33-44, 1988.
ISSN 0031-3203.
[ bib |
DOI |
http ]
Turning handwritten scores into engraved scores consumes a significant portion of music publishing companies' budgets. Pattern recognition is the major bottleneck holding up automation of this process. Human beings who know music can easily read a handwritten score, but without musical knowledge, even people cannot correctly perceive the markings in a handwritten score. This paper reports an experiment in which knowledge of music, a highly structured domain, is applied to extract primitive musical features. This experiment shows that if the domain of image processing is well defined, significant improvements in low-level segmentations can be achieved.
|
| [Kato1987] |
Ichiro Kato, Sadamu Ohteru, Katsuhiko Shirai, Toshiaki Matsushima, Seinosuke
Narita, Shigeki Sugano, Tetsunori Kobayashi, and Eizo Fujisawa.
The robot musician 'wabot-2' (waseda robot-2).
Robotics, 3 (2): 143-155, 1987.
ISSN 0167-8493.
Special Issue: Sensors.
[ bib |
DOI |
http ]
The wabot-2 is an anthropomorphic robot playing keyboard instruments, developed by the study group of Waseda University's Science and Engineering Department. The wabot-2 is equipped with hands tapping softly on keys, with legs handling bass keys and expression pedal, with eyes reading a score, and with a mouth and ears to converse with humans. Based on wabot-2, wasubot has been developed by Sumitomo Electric Industries Ltd., whose artistic skill has been demonstrated in performing music at the Japanese Government Pavilion in Expo'85. The present paper summarizes the wabot-2's motion, visual and vocal subsystems as well as its supervisory system and singing voice-tracking subsystem.
|
| [Kim1987] | W. J. Kim, M. J. Chung, and Z. Bien. Recognition system for a printed music score. In TENCON 87- Computers and Communications Technology Toward 2000, pages 573-577, 1987. [ bib | http ] |
| [Sugano1987] |
Shigeki Sugano and Ichiro Kato.
WABOT-2: Autonomous robot with dexterous finger-arm-Finger-arm
coordination control in keyboard performance.
In IEEE International Conference on Robotics and Automation,
pages 90-97, 1987.
[ bib |
DOI ]
Advanced robots will have to not only have 'hard' functions but also have 'soft' functions. Therefore, the purpose of this study is to realize 'soft' functions of robots such as dexterity, speediness and intelligence by the development of an anthropomorphic intelligent robot playing keyboard instrument. This paper describes the development of keyboard playing robot WABOT-2(WAseda roBOT-2) with a focus on the mechanisms of arm-and-hand which has 21 degrees of freedom in total, their hierarchically structured control computer system, the information processing method at the high level computer and finger-arm coordination control which realizes the autonomous movement of WABOT-2.
|
| [Roads1986] | Curtis Roads. The Tsukuba Musical Robot. Computer Music Journal, 10 (2): 39-43, 1986. ISSN 01489267, 15315169. [ bib | http ] |
| [Matsushima1985] |
T. Matsushima, I. Sonomoto, T. Harada, K. Kanamori, and S. Ohteru.
Automated High Speed Recognition of Printed Music (WABOT-2 Vision
System).
In International Conference on Advanced Robotics, pages
477-482, 1985.
[ bib |
http ]
Concerns the intelligent robot WABOT-2, which can play an electronic piano, using ten fingers and feet, while reading printed music. It can hold a conversation with a man using an artificial voice. The paper reports on its vision system, which can recognize not only a printed score but also fine hand-written score or instant lettering score. The resulting musical robot vision performance is sufficient to permit the reading of one sheet of commercially available printed music for an electric piano with three parts. Pertinent data can be recognized in about 15 seconds, with 100% accuracy.
|
| [Byrd1984] | Donald Byrd. Music Notation by Computer. PhD thesis, Indiana University, 1984. [ bib | http ] |
| [Andronico1982] | Alfio Andronico and Alberto Ciampa. On Automatic Pattern Recognition and Acquisition of Printed Music. In International Computer Music Conference, Venice, Italy, 1982. Michigan Publishing. [ bib | http ] |
| [Kassler1972] |
Michael Kassler.
Optical Character-Recognition of Printed Music: A Review of Two
Dissertations. Automatic Recognition of Sheet Music by Dennis Howard Pruslin;
Computer Pattern Recognition of Standard Engraved Music Notation by David
Stewart Prerau.
Perspectives of New Music, 11 (1): 250-254, 1972.
[ bib |
http ]
Stable URL: http://www.jstor.org/stable/832471
|
| [Prerau1971] |
David S. Prerau.
Computer pattern recognition of printed music.
In Fall Joint Computer Conference, pages 153-162, 1971.
[ bib ]
The standard notation used to specify most instrumental and vocal music forms a conventionalized, two-dimensional, visual pattern class. This paper discusses computer recognition of the music information specified by a sample of this standard notation. A sample of printed music notation is scanned optically, and a digitized version of the music sample is fed into the computer. The digitized sample may be considered the data-set sensed by the computer. The computer performs the recognition and then produces an output in the Ford-Columbia music representation. Ford-Columbia is an alphanumeric language isomorphic to standard music notation. It is therefore capable of representing the music information specified by the original sample.
|
| [Prerau1970] | David S. Prerau. Computer pattern recognition of standard engraved music notation. PhD thesis, Massachusetts Institute of Technology, 1970. [ bib ] |
| [Pruslin1966] | Dennis Howard Pruslin. Automatic Recognition of Sheet Music. PhD thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, 1966. [ bib ] |
| [RISM] | Robert Eitner. Répertoire International des Sources Musicales. http://www.rism.info, 1952. [ bib | http ] |
This file was generated by bibtex2html 1.96.