Jun 22, 2015
Institute of Information Systems
Separating compound figures in journal articles to allow for subfigure classification
Ajad Chhatkuli
Antonio Foncubierta-Rodríguez
Dimitrios Markonis
Henning Müller
Motivation
• Figures in biomedical journals contain a lot of information
• Content-based image retrieval (CBIR) has been proposed for accessing the medical literature
• Modality classification
• Improves accessibility
• Allows result filtering
• But 50% of figures are compound or multipanel
Aim
• Develop a system that separates compound figures in the biomedical literature
• Visual-information only
• Textual information is discarded
• Modality-independent
• One method for many image types
• Rather than many methods for few image types
• Tunable according to the dataset
• Large-scale tested
• Approximately 250 open access journals
Compound figure examples
Methods. Dataset
• 2982 manually classified figures from the ImageCLEF 2012 dataset
• Ground truth:
• Image subclass: 2x1, 1x2, ...
• Position of separators
Methods. Overview
• The problem is split into two steps
• Find subfigure separator candidates
• Preprocessing if required
• Analyze candidates
• Remove false positives
• Rule-based decisions
Methods. Separator detection
• Based on minimum pixel projection for white-space separated figures
• Horizontal detection first, then vertical detection
• Order inverted by rotating the image according to its aspect ratio
• Recursive
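The detection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a grayscale image stored as a list of rows with values from 0 (black) to 255 (white), and it interprets "minimum pixel projection" as taking the darkest pixel of each row or column. The names `find_gaps`, `interior_gaps`, `segments` and `split_figure`, the thresholds `white` and `min_width`, and the depth limit are all hypothetical. The rotation by aspect ratio is omitted for brevity.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white (assumed layout)


def find_gaps(profile, white=250, min_width=2) -> List[Tuple[int, int]]:
    """Return (start, end) runs where the projection profile is near white."""
    gaps, start = [], None
    # A trailing sentinel of -1 closes a run that reaches the end of the profile.
    for i, value in enumerate(list(profile) + [-1]):
        if value >= white:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_width:
                gaps.append((start, i))
            start = None
    return gaps


def interior_gaps(profile) -> List[Tuple[int, int]]:
    """Keep only gaps that do not touch the border (those are margins)."""
    n = len(profile)
    return [(s, e) for s, e in find_gaps(profile) if s > 0 and e < n]


def segments(gaps, n) -> List[Tuple[int, int]]:
    """Turn gap positions into the (start, end) ranges of the pieces between them."""
    edges = [0] + [x for gap in gaps for x in gap] + [n]
    return [(edges[i], edges[i + 1]) for i in range(0, len(edges), 2)]


def split_figure(img: Image, depth: int = 0, max_depth: int = 4) -> List[Image]:
    """Recursively cut on horizontal gaps first, then on vertical gaps."""
    if depth >= max_depth or not img or not img[0]:
        return [img]
    row_gaps = interior_gaps([min(row) for row in img])
    if row_gaps:
        parts = [img[a:b] for a, b in segments(row_gaps, len(img))]
    else:
        col_gaps = interior_gaps([min(col) for col in zip(*img)])
        if not col_gaps:
            return [img]  # no separator candidate in either direction
        parts = [[row[a:b] for row in img]
                 for a, b in segments(col_gaps, len(img[0]))]
    panels = []
    for part in parts:
        panels.extend(split_figure(part, depth + 1, max_depth))
    return panels


# Toy figure: one wide panel on top, two panels side by side below.
top = [[0] * 10 for _ in range(4)]
gap = [[255] * 10 for _ in range(2)]
bottom = [[0] * 4 + [255] * 2 + [0] * 4 for _ in range(4)]
panels = split_figure(top + gap + bottom)
```

On the toy figure, the horizontal cut is found first and the lower half is then cut vertically in the recursive call, which gives three panels.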
Methods. Separator detection
• Rule-based processing
• Progressive truncation to remove labels if no separators are found
• Text removal based on connected components if no separators are found
• Complement image for black-space separations
• Standard deviation image for subtle separations
• Binarization of non-graph figures, i.e. figures in which less than 40% of the image is white or almost white
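Some of the preprocessing rules above can be illustrated with a short sketch. It assumes the same list-of-rows grayscale layout with values from 0 to 255. Only the 40% white criterion comes from the slide; the "almost white" cut-off of 240, the binarization threshold of 128 and all function names are hypothetical choices made for the example. The per-row standard deviation is one possible reading of the "standard deviation image".

```python
import statistics
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white (assumed layout)


def white_fraction(img: Image, almost_white: int = 240) -> float:
    """Fraction of pixels that are white or almost white (cut-off is assumed)."""
    pixels = [p for row in img for p in row]
    return sum(p >= almost_white for p in pixels) / len(pixels)


def is_non_graph(img: Image, max_white: float = 0.40) -> bool:
    """Rule from the slide: a non-graph figure has less than 40% white pixels."""
    return white_fraction(img) < max_white


def binarize(img: Image, threshold: int = 128) -> Image:
    """Map every pixel to pure black or pure white (threshold is assumed)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def complement(img: Image) -> Image:
    """Invert intensities so black separators look like white-space gaps."""
    return [[255 - p for p in row] for row in img]


def row_std(img: Image) -> List[float]:
    """Per-row standard deviation; near-uniform rows are subtle gap candidates."""
    return [statistics.pstdev(row) for row in img]


example = [[0, 0, 255, 255],
           [0, 0, 0, 0]]
prepared = binarize(example) if is_non_graph(example) else example
```

After `complement`, a black-space separated figure can be passed to the same white-space projection step, which keeps a single detection routine for both cases.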
Methods. Separator analysis
• Classification problem
• True/false separator
• Features used:
• Closeness to border, division ratio, standard deviation, text removal analysis, histogram, gap comparison
• Classifiers:
• SVM
• Rule-based classifier
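As a rough illustration of the analysis step, the sketch below computes two of the listed features for a separator candidate and combines them in a weighted rule. It is an assumption-laden example, not the classifier from the paper: the `Candidate` class, the feature definitions, the weights and the decision threshold are all hypothetical, and the remaining features (standard deviation, text removal analysis, histogram, gap comparison) as well as the SVM are left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int   # first index of the gap along the cutting axis
    end: int     # index one past the last gap line
    length: int  # image extent along the same axis


def closeness_to_border(c: Candidate) -> float:
    """Normalized distance of the gap centre to the nearest border (0 to 0.5)."""
    centre = (c.start + c.end) / 2
    return min(centre, c.length - centre) / c.length


def division_ratio(c: Candidate) -> float:
    """Size ratio of the two pieces the cut would produce (0 to 1)."""
    first, second = c.start, c.length - c.end
    larger = max(first, second)
    return min(first, second) / larger if larger > 0 else 0.0


# Hypothetical rule weights; in practice these would be tuned on the dataset.
WEIGHTS = {"border": 0.5, "ratio": 0.5}


def score(c: Candidate, weights=WEIGHTS) -> float:
    """Weighted sum of the two features, each scaled to the range 0 to 1."""
    return (weights["border"] * 2 * closeness_to_border(c)
            + weights["ratio"] * division_ratio(c))


def is_true_separator(c: Candidate, threshold: float = 0.5) -> bool:
    """Accept the candidate when its score reaches the (assumed) threshold."""
    return score(c) >= threshold


central = Candidate(start=48, end=52, length=100)  # splits the figure in half
margin = Candidate(start=2, end=4, length=100)     # hugs the top border
```

With these illustrative weights, the central candidate is accepted and the margin candidate is rejected as a false positive.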
Results
Successful examples
Unsuccessful examples
• No separation gap
• Separation is not horizontal or vertical
Conclusions and future work
• Good results for a wide range of images
• Using purely visual information
• Separation problem: detection and analysis
• Rule weights can be fine-tuned according to dataset
• What would be the impact of a larger training set?
• What would be the impact on existing modality classification accuracy?
Thanks for your attention!
More information at http://medgift.hevs.ch
Ajad Chhatkuli, Dimitrios Markonis, Antonio Foncubierta-Rodríguez, Fabrice Meriaudeau
and Henning Müller, Separating compound figures in journal articles to allow for subfigure
classification, in: SPIE, Medical Imaging, Orlando, FL, USA, 2013