Chapter 22
The Mangrove Information System MAIS:
Managing and Integrating Interdisciplinary
Research Data
U. Salzmann, G. Krause, B.P. Koch, and I. Puch Rojo
22.1 Introduction
Regular and efficient exchange of data between investigators is essential for the
progress of interdisciplinary and integrative scientific research. Research data have
to be easily accessible and retrievable in a structured and understandable format in
order to facilitate comparative studies and make them available to a wider commu-
nity. Both intra-scientific communication and transfer of scientific knowledge to
stakeholders have been integral parts of the interdisciplinary research project
MADAM (Mangrove Dynamics and Management) that aims at supporting envi-
ronmental management in northern Brazil (Berger et al. 1999). In order to ensure
data availability, quality and exchange, a central GIS database called MAIS
(Mangrove Information System) has been developed during the initial stage of
the MADAM project (Koch 1997). MAIS archives and synthesizes heterogeneous
data collected during 10 years of interdisciplinary research in biology, geography,
biogeochemistry, socio-economy and meteorology in north Brazil.
Facilitated by modern computer performance and memory capacity, the typical
scientist stores and analyzes research data on his personal workstation or local
server using the resources and applications of his local system. If data are regularly
backed up, this method of scientific “data management” is relatively secure and
straightforward. However, the volume of valuable and often unique information
and data continually increases throughout the “life cycle” of a scientific project,
which should result in publications in peer-reviewed journals. Journal publications
contain figures, tables and interpretations, whereas digital primary data are rarely
published. The primary (raw) data and supporting information (metadata), which
are stored in the investigators’ personal file system, become rapidly unmanageable
and are, at the end of each research project, in danger of being permanently
“buried” in private archives. This equates to an effective loss of data and knowledge
to the scientific community (Helly et al. 2003). Raw or primary research data
are unique and must be stored and managed for the long term. Concerted initiatives
to prevent research data loss started more than 40 years ago with the
establishment of large data repositories such as the World Data Center System
(WDC), which promotes open access and exchange of scientific data (e.g. Mounsey
and Tomlinson 1988; Alverson and Eakin 2001, http://www.ngdc.noaa.gov/wdc).
U. Saint-Paul and H. Schneider (eds.), Mangrove Dynamics and Management in North Brazil, Ecological Studies 211, DOI 10.1007/978-3-642-13457-9_22, © Springer-Verlag Berlin Heidelberg 2010
Today, numerous archiving facilities for environmental scientific data are available
worldwide, ranging from large central data repositories to rather small databases
addressing specific research fields, disciplines or even single research projects (e.g.,
Baba et al. 2004; Diepenbroek et al. 2002; Konnen and Koek 2005). However, to
date, there are no internationally binding regulations for scientific data manage-
ment. The subject of an ongoing debate is how scientific data should be managed
and made available to the general scientific and public community (Klump et al.
2006; Dittert et al. 2001).
Here, we present a description of concept, design and functionality of the GIS-
database MAIS of the MADAM project. Our main objective is to highlight the
potential of such a central data management system for improving interdisciplinary
research. In regard to the ongoing debate on the freedom of scientific information,
we also discuss the challenges in running a project database and outline the
necessary conceptual prerequisites for a successful management of heterogeneous
research data.
22.2 Implementation of a GIS-Database
During the initial planning stage of MAIS, questionnaires were distributed and
meetings organized to fully assess the project collaborators’ needs and expectations
in regard to scientific data management. The interdisciplinary character of the
MADAM project resulted in the production of extremely heterogeneous datasets,
developed from zoological, botanical, geochemical and meteorological measure-
ments as well as the socio-economic census. This heterogeneity made high
demands on the design of the MAIS database. The following major requirements
and objectives for a central project data management were identified:
(1) Long-term data availability in Brazil and Germany (preferably via the Internet)
(2) Secure and long-term data storage
(3) High quality of supporting information (metadata)
(4) Protection of scientific ownership (data privacy)
(5) Use of geo-referenced data to facilitate spatial data comparison
(6) Flexibility and adaptability to specific project requirements
(7) User-friendly graphical user interface (GUI) for data search and visualization
As the program MADAM aims at delivering its scientific outcomes to relevant
stakeholders, the project database has also been regarded as a tool for supporting
management decisions in north Brazil. Therefore, considerable effort has been
devoted to the development of a user-friendly GUI and a clear description of
metadata, which should also be understandable to non-scientists. MAIS was imple-
mented in 1995 when standards in information technology were relatively low
compared with today, and when MADAM was still in its initial programme phase.
This necessitated the design of an upgradeable, flexible database, adaptable to
varying project requirements and progress in information technology.
22.3 Database Model and Data Management
Research data within the MADAM project were managed on different levels. Data
storage and processing on the investigators’ personal computers guaranteed a
maximum of data privacy and adaptability to individual research needs. Specific
data analyses and individual software was employed at this level. If necessary, users
were advised by database administrators on how to archive data professionally to
facilitate a subsequent data transfer to the central project database. Before import
into MAIS, the investigators’ raw data were restructured and redundancies were
removed. Each dataset was catalogued and documented using standardized meta-
data, which assured internal consistency and unambiguous description. Data were
distributed through the intranet and thereby could be made accessible to all project
members.
The MAIS database was initially developed on Microsoft Access Version
97–2000 for Windows. Access is one of the most popular relational database
management systems, which is easily programmable and has a very user-friendly
interface and fast search engine (Viescas 2004). The Access database was connected
to ArcView 3.2 using a special GIS-interface, programmed in Avenue,
which enabled visualization and basic analysis of geo-referenced data. The master
version of the MAIS database and GIS-interface was based in Germany and copies
were regularly distributed to servers in Brazil. In 2002, we migrated MAIS onto
a platform-independent and entirely internet-based data management system,
which facilitated data distribution and did not require expensive software licensing.
We employed the open source software MySQL (http://www.mysql.com) for
database management and MapServer (http://mapserver.gis.umn.edu) for the visu-
alization of geo-referenced data. The technical update also increased data security
and privacy. Protection against plagiarism was a major concern of many project
members storing their research data in a central data management system. There-
fore, if requested by the author, unpublished data were protected with a username
and password. Such a protection of data ownership is particularly important for
ongoing research projects as they contain high numbers of sensitive and unpublished
research data or preliminary work. While many datasets in MAIS are password
protected, their meta-information remains accessible to all users, providing
information about the status of ongoing research within the MADAM project.
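The separation between protected data and open metadata can be illustrated with a minimal sketch. It is not the MAIS implementation: the `Dataset` class, its field names and the salted PBKDF2 hashing are assumptions chosen for illustration, and the user name is omitted for brevity.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional


def _hash(password: str, salt: bytes) -> bytes:
    # Derive a salted hash so the password itself is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class Dataset:
    """Hypothetical dataset record: metadata is always open, values may be locked."""
    metadata: dict
    values: list
    salt: bytes = b""
    password_hash: Optional[bytes] = None  # None means freely accessible

    def protect(self, password: str) -> None:
        # Called on request of the author for unpublished data.
        self.salt = os.urandom(16)
        self.password_hash = _hash(password, self.salt)

    def read_metadata(self) -> dict:
        # Metadata stay visible to every user, protected or not.
        return self.metadata

    def read_values(self, password: Optional[str] = None) -> list:
        if self.password_hash is None:
            return self.values
        if password is not None and hmac.compare_digest(
            _hash(password, self.salt), self.password_hash
        ):
            return self.values
        raise PermissionError("dataset is password protected")


# Illustrative usage with invented content.
crab_counts = Dataset(metadata={"title": "Crab burrow counts"}, values=[12, 7, 9])
crab_counts.protect("secret")
print(crab_counts.read_metadata())         # always allowed
print(crab_counts.read_values("secret"))   # allowed with the right password
```

Keeping the metadata path free of any password check is the design point: other project members can still see that a dataset exists and what it describes, even when its values are locked.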
The design and implementation of the MAIS database followed general stan-
dards for software and database development (e.g., Lang and Lockemann 1995). The
database was fully normalized and structured using a relational data model (Codd
1990). A full normalization implies (e.g., Carleton et al. 2005): (1) elimination of
repeating groups and redundant data; (2) elimination of columns not dependent on
key fields, which uniquely identify each record; and (3) isolation of independent
and semantically related multiple relationships.
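The three normalization steps listed above can be sketched as follows. The table and column names are hypothetical, the sample values are invented, and Python's built-in SQLite module stands in for the MySQL server used by the project; the sketch only shows the principle that each fact is stored once and referenced by key.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the project's MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce references between tables

# Hypothetical tables: methods and parameters are stored once and
# referenced by key instead of being repeated in every measurement row.
conn.executescript("""
CREATE TABLE method    (method_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE parameter (parameter_id INTEGER PRIMARY KEY,
                        name TEXT NOT NULL UNIQUE, unit TEXT);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    method_id  INTEGER NOT NULL REFERENCES method(method_id),
    latitude   REAL,
    longitude  REAL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    dataset_id     INTEGER NOT NULL REFERENCES dataset(dataset_id),
    parameter_id   INTEGER NOT NULL REFERENCES parameter(parameter_id),
    value          REAL NOT NULL
);
""")

# Invented sample rows for illustration only.
conn.execute("INSERT INTO method VALUES (1, 'CTD probe')")
conn.execute("INSERT INTO parameter VALUES (1, 'salinity', 'PSU')")
conn.execute("INSERT INTO dataset VALUES (1, 'Creek transect', 1, -0.85, -46.65)")
conn.executemany(
    "INSERT INTO measurement (dataset_id, parameter_id, value) VALUES (?, ?, ?)",
    [(1, 1, 28.5), (1, 1, 31.0)],
)

# Joins reassemble the full description of every measurement on demand.
rows = conn.execute("""
    SELECT d.title, m.name, p.name, p.unit, x.value
    FROM measurement AS x
    JOIN dataset   AS d ON d.dataset_id = x.dataset_id
    JOIN method    AS m ON m.method_id = d.method_id
    JOIN parameter AS p ON p.parameter_id = x.parameter_id
    ORDER BY x.measurement_id
""").fetchall()
print(rows)
```

Because the unit of a parameter or the name of a method lives in exactly one row, a correction has to be made only once and cannot leave inconsistent copies behind.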
MAIS-data are grouped in three main units: natural science, social science, and
climate data (derived from climate data loggers) (Fig. 22.1). A separate publication
unit links project papers and reports with respective research data. The primary data
are described by metadata standardized following the “Global Change Master
Directory” (http://gcmd.nasa.gov). MAIS metadata provide information on project,
staff, method, parameter, equipment and sample type. Geographical latitudes
and longitudes are assigned to each dataset, which allows a spatial synthesis and
comparison of research results. An internal species key connected to each bio-
logical dataset enables data retrieval on different taxonomic levels. The species key
follows the nomenclature provided by the Integrated Taxonomic Information
System, ITIS (http://www.itis.gov).
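How a species key supports retrieval on different taxonomic levels can be sketched with a simple parent lookup. The `TAXA` mapping, the dataset list and both function names are hypothetical; the family assignment of Ucides cordatus follows the usage in the references of this chapter, and the sketch does not reproduce the actual MAIS key or the ITIS data model.

```python
# Hypothetical species key: taxon -> (rank, parent taxon or None).
TAXA = {
    "Decapoda": ("order", None),
    "Ocypodidae": ("family", "Decapoda"),
    "Ucides": ("genus", "Ocypodidae"),
    "Ucides cordatus": ("species", "Ucides"),
    "Rhizophoraceae": ("family", None),
    "Rhizophora": ("genus", "Rhizophoraceae"),
    "Rhizophora mangle": ("species", "Rhizophora"),
}

# Invented datasets, each linked to one entry of the species key.
DATASETS = [
    {"id": 1, "taxon": "Ucides cordatus"},
    {"id": 2, "taxon": "Rhizophora mangle"},
    {"id": 3, "taxon": "Ucides"},
]


def belongs_to(taxon, ancestor):
    """Return True if taxon equals ancestor or lies below it in the key."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = TAXA[taxon][1]  # step up to the parent taxon
    return False


def datasets_for(ancestor):
    """Return ids of all datasets whose taxon falls under the given level."""
    return [d["id"] for d in DATASETS if belongs_to(d["taxon"], ancestor)]


print(datasets_for("Ocypodidae"))         # family-level search
print(datasets_for("Rhizophora mangle"))  # species-level search
```

A search at family level thus returns datasets recorded at genus or species level without the investigator having to enter the higher ranks for each dataset.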
The heterogeneity of datasets generated by the MADAM project was a major
challenge and required a flexible and dynamic data model. This was particularly
true for the integration of fishery and socio-economic census data for which we
had to denormalize the relational database to optimize performance and size of
database queries and applications. For the same reason, most data on the mangrove
crab Ucides cordatus were managed in a separate database unit, which is used for
fisheries assessments (Araujo 2006; Chap. 19).
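The kind of denormalization described here can be illustrated by materializing a pre-joined, flat table. The census tables, column names and sample rows below are invented, and SQLite again stands in for MySQL; which tables were actually denormalized in MAIS is not specified in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized, hypothetical census tables.
CREATE TABLE village   (village_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE household (household_id INTEGER PRIMARY KEY,
                        village_id INTEGER REFERENCES village(village_id));
CREATE TABLE answer    (household_id INTEGER REFERENCES household(household_id),
                        question TEXT, response TEXT);

INSERT INTO village VALUES (1, 'Village A');
INSERT INTO household VALUES (10, 1), (11, 1);
INSERT INTO answer VALUES (10, 'main_income', 'crab collecting'),
                          (11, 'main_income', 'fishing');

-- Denormalized copy: one wide row per answer, with the village name
-- repeated on purpose so that frequent queries need no joins.
CREATE TABLE census_flat AS
SELECT v.name AS village, h.household_id, a.question, a.response
FROM answer AS a
JOIN household AS h ON h.household_id = a.household_id
JOIN village   AS v ON v.village_id = h.village_id;
""")

# The application query now reads a single table.
flat = conn.execute(
    "SELECT village, household_id, response FROM census_flat "
    "WHERE question = 'main_income' ORDER BY household_id"
).fetchall()
print(flat)
```

The price of the simpler query is redundancy: the flat copy has to be rebuilt or updated whenever the underlying tables change.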
A multilevel menu-based, user-friendly web interface with applications for
advanced data retrieval was developed and made accessible through the intranet
of the Leibniz-Center of Tropical Marine Ecology in Bremen and Brazil. The MAIS
graphical user interface consists of data retrieval forms and a graphical tool
Fig. 22.1 Schematic design of MAIS database showing grouping of main data and metadata and
accessibility through different graphical user interfaces
supported by MapServer to visualize geospatial data (Fig. 22.2). After login,
different web-based forms provide for each data unit (climate data logger, publica-
tion, social and natural science data) access to all data tables of the relational
system. The forms allow the user to retrieve individually queried subsets of
research results. Queries are based on the combination of the following fields,
which can be selected using drop-down lists of available values:
method, (6) taxon/group (with advanced search on different taxonomic levels); (7)
time period (start, end); (8) author (for publication); (9) title and year of publica-
tion; (10) keywords (publication).
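A retrieval form that combines optional fields can be translated into a database query roughly as sketched below. The `FILTERS` mapping, the `dataset` table and its columns are assumptions for illustration and cover only a few of the fields named above; the actual MAIS forms and SQL are not documented here.

```python
import sqlite3

# Hypothetical whitelist mapping form fields to SQL conditions.
# Only the values are supplied by the user, never the SQL text itself.
FILTERS = {
    "method": "method = ?",
    "taxon": "taxon = ?",
    "start": "sampled_on >= ?",
    "end": "sampled_on <= ?",
}


def build_query(selection):
    """Combine all filled-in form fields with AND; empty fields are ignored."""
    clauses, params = [], []
    for field, condition in FILTERS.items():
        value = selection.get(field)
        if value not in (None, ""):
            clauses.append(condition)
            params.append(value)
    sql = "SELECT id FROM dataset"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


# Invented demo data; dates are ISO strings so text comparison orders them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER, method TEXT, taxon TEXT, sampled_on TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?)",
    [
        (1, "transect count", "Ucides cordatus", "1997-03-01"),
        (2, "transect count", "Rhizophora mangle", "1999-06-15"),
        (3, "litter trap", "Rhizophora mangle", "2001-01-10"),
    ],
)

sql, params = build_query({"taxon": "Rhizophora mangle", "start": "2000-01-01"})
hits = [row[0] for row in conn.execute(sql, params)]
print(sql, params, hits)
```

Passing the selected values as parameters rather than pasting them into the SQL string keeps the drop-down values from being interpreted as SQL.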
The result pages are interactive in providing additional metadata, such as
advanced project or parameter descriptions for every dataset on point-and-click in
Fig. 22.2 MAIS examples showing login page, main menu, data retrieval form and thematic map of study area (panels: MAIS Security Login; MAIS Menu & Description; Data Retrieval & Metadata Information; Visualisation in Dynamic Maps)
a pop-up window. A MapServer module allows the integration and visualization
of geo-referenced data in thematic maps (Fig. 22.2). MapServer was developed
by the University of Minnesota as an Open Source development environment for
building spatially enabled Internet applications (http://mapserver.gis.umn.edu). With
MapServer, we created thematic maps, which allowed the user to browse through
GIS data stored in MAIS. The maps are fully dynamic and different layers can be
added and zoomed (see example in Chap. 19, Fig. 19.3). However, the mapping tool
provided by MapServer does not replace a full geographical information system
and we still employed ArcView to conduct advanced geospatial analyses and used
MapServer to publish the thematic maps.
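Preparing geo-referenced records for display as a map layer can be sketched as follows. The station records and coordinates are invented, and GeoJSON is used here only as a widely supported interchange format; the sketch does not show how MapServer was configured or which data source format MAIS actually used.

```python
import json

# Invented stations with latitude/longitude in decimal degrees.
STATIONS = [
    {"name": "Station A", "lat": -0.85, "lon": -46.65, "salinity": 28.5},
    {"name": "Station B", "lat": -1.05, "lon": -46.75, "salinity": 12.0},
    {"name": "Station C", "lat": -2.50, "lon": -44.30, "salinity": 30.1},
]


def in_bbox(rec, west, south, east, north):
    """Return True if the record lies inside the map extent."""
    return west <= rec["lon"] <= east and south <= rec["lat"] <= north


def to_feature_collection(records):
    """Convert records to a GeoJSON FeatureCollection (coordinates are lon, lat)."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {k: v for k, v in r.items() if k not in ("lat", "lon")},
        }
        for r in records
    ]
    return {"type": "FeatureCollection", "features": features}


# Keep only the stations inside a hypothetical map extent, then serialize.
visible = [r for r in STATIONS if in_bbox(r, west=-47.0, south=-1.5, east=-46.0, north=-0.5)]
layer = to_feature_collection(visible)
geojson_text = json.dumps(layer, indent=2)
print(geojson_text)
```

The bounding-box filter mirrors what happens when a user zooms a dynamic map: only the records inside the current extent need to be delivered to the browser.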
22.4 MAIS: A Tool for Supporting Interdisciplinary Research?
Archiving and managing scientific data in central databases is a time consuming
and costly endeavor. A professional scientific data management requires a clear
separation of data archiving and integration from “data gathering” and analysis,
which is the responsibility of the respective investigator. In fixed-term projects
in particular, both tasks, database management and research, often compete for the same
funding, and databases are managed at the expense of “real” science. This raises
the question whether a central project data management is really worthwhile in
terms of costs and benefits.
Besides assuring long-term data storage, MAIS aimed at being flexible and
dynamic to actively support interdisciplinary research within the running project.
This was particularly important for bridging natural science and social science
data. The flexible design of MAIS greatly facilitated the storage of heterogeneous
datasets, including those originating from social science and fisheries biology. Instead
of reorganising the data to fit into a predefined archive structure, we modified the
project database to meet the requirements of specific scientific demands. This
flexibility made MAIS a useful tool for supporting interdisciplinary science.
MAIS was successfully applied by MADAM researchers to analyze, synthesize
and visualize project data (e.g., Glaser and Diele 2004, 2005; Goch et al. 2005;
Krumme et al. 2005, 2007; Araujo 2006; Chap. 19). Close cooperation and regular
communication between database administrators and researchers was a prerequisite
for successful data management. Both sides actively benefited from this coopera-
tion. While administrators needed to understand the structure of research data, staff
and in particular MSc and PhD students took advantage by receiving professional
advice in scientific data management and analysis.
Although MAIS had a great potential for initiating and supporting interdiscip-
linary science, the overall number of project investigators who regularly used the
project database was rather limited. The problems of acceptance of research
databases are well known in the scientific database management community (e.g.,
French et al. 1990; Gray et al. 2005; Grobe and Diepenbroek 2006). The reasons for
these problems are manifold, and combined efforts towards a better collaboration
can be made on both the investigator and database management side. Several
attempts were made to make MAIS more popular by introducing user-friendly
and efficient applications. In the following, we will discuss three major problem
areas that we identified while working with MAIS. We will also define the pre-
requisites which are needed to ensure a successful scientific data management
within research projects.
22.4.1 Quality Control and Improved Analysis Tools
MAIS put much effort into metadata quality and the design of a user-friendly GUI.
Advanced metadata standards, which facilitate the exploration of existing data, as
well as improved analysis and visualization tools, are key factors for a successful
scientific data management in the coming decade (Gray et al. 2005). The increasing
heterogeneity of data in interdisciplinary projects puts even higher demands on the
quality of metadata. Data must be self-describing and must follow international
standards. Good metadata are central for data analysis, data visualization and data
sharing among different disciplines (Gray et al. 2005). However, to recognize the
benefits of a central scientific data management, the user must also be able to
retrieve, interchange, compare, analyze and visualize data in a most efficient way.
Unfortunately, available database applications are often insufficient and cumber-
some and do not address the investigators’ specific needs. Failures in the design of a
user-friendly man-machine interface are not only caused by technical limitations
but also by a lack of communication between database managers and scientists.
Whereas in MAIS, metadata standards reached highest levels, database tools for
retrieving, analyzing and visualizing geo-referenced data were rather limited. More
sophisticated applications could have significantly increased the viability of MAIS
for project investigators, but their implementation would have surpassed the financial
and technical scope of the MADAM project.
22.4.2 Appropriate Support and Funding
One of the biggest nontechnical barriers hampering the efficient operation of
scientific databases is the low attention that researchers and funding bodies often pay to
scientific data management. This results in an insecure funding situation, which is a
major threat to long-term archives. Whereas the production of data is often well-
funded, its management is chronically underfunded (French et al. 1990). As a result,
scientific databases are often managed part-time by regularly changing scientific
staff and students, who are primarily interested in data production rather than in
its management. MAIS was also affected by this lack of continuity.
22.4.3 Intellectual Property Rights and Better Incentives for Data Sharing
Protection of intellectual property rights and free exchange of information are
subjects of an ongoing controversial debate within the scientific community (e.g.
Dittert et al. 2001; Klump et al. 2006). In fact, researchers have little incentive
to release their unpublished datasets into central data management systems. The
fear of plagiarism combined with the lack of binding standards for citing database
sources are major reasons that prevent researchers from publishing their data in
central databases. However, the ability of investigators to share data is vital to the
progress of interdisciplinary and integrative scientific research (Helly et al. 2003).
In MAIS, we protected the property rights by offering an optional password system,
which could be selected by researchers to protect their unpublished data. While
most project members, in particular PhD students, felt confident with this solution,
it had the major drawback that the number of password-protected datasets quickly
exceeded the number of those freely accessible. Once a dataset was password protected, it proved
very difficult to obtain permission from authors to release it thereafter. The high
number of password-protected data finally reduced the capability and usability of
MAIS. Our experience underlined that, within a research project, binding rules for
data transfer and release are essential for a successful central data management.
These regulations must include timetables and deadlines. Today, many funding
agencies, research organizations or projects actively encourage data sharing and
transfer to data centres (e.g., Natural Environment Research Council,
http://www.nerc.ac.uk/research/sites/data/policy.asp, or US Geological Survey,
http://www.usgs.gov/foia/). Internationally binding regulations, however, are still missing
and many principal investigators still refuse to archive their data in appropriate
databases (Dittert et al. 2001). As long as standards for data citations are missing
and data collectors are not adequately credited in a way comparable to journal
publication standards, internationally binding rules are not applicable.
22.5 Concluding Remarks
There is a growing need for central and integrative data management solutions in
interdisciplinary research projects, where research data must be continuously avail-
able and freely exchangeable. The Mangrove Information System (MAIS) has
proven to be a useful tool for promoting interdisciplinary research and data synth-
eses within the MADAM project. The GIS database MAIS managed heterogeneous
datasets on biology, chemistry, geography and socioeconomics collected in north
Brazil over a period of 10 years. Research data were accessible for the Brazilian and
German project members through the Internet by a user-friendly graphical user
interface.
Although MAIS has been successfully used for research data synthesis, the
project database did not work to full capacity and some project investigators showed
little interest in a further use of a central data management system. A general
unwillingness of investigators to share data, coupled with a critical attitude towards
databases, is a common phenomenon in the scientific community. Binding regula-
tions, such as making data sharing part of the funding policy, are an effective way to
improve data availability and to increase the quality of scientific databases. How-
ever, the introduction of such regulations (and sanctions) must be accompanied by
efforts to give better incentives for scientists to release their data into central data
management systems and to create the necessary metadata. Such key incentives
include binding standards and regulations for the citation of archived datasets,
which should be supported by technical mechanisms to track usage of archived data.
References
Alverson K, Eakin CM (2001) Making sure that the world’s palaeodata do not get buried.
Nature 412:269
Araujo A (2006) Fishery statistics and commercialisation of the mangrove crab, Ucides cordatus (L.), in Braganca – Para-Brazil. PhD thesis, University of Bremen, Bremen
Baba S, Gordon C, Kainuma M, Ayivor JS, Dahdouh-Guebas F, Brown M (2004) The Global
Mangrove Database and Information System (GLOMIS): present status and future trends. In:
Vanden Berghe E, Costello MJ, Heip C, Levitus S, Pissierssens P (eds) Proceedings ‘The
Colour of Ocean Data’: international symposium on oceanographic data and information
management with special attention to biological data Brussels, Belgium, November 25–27,
2002. IOC Workshop Report 188. UNESCO, Paris, pp 3–14
Berger U, Glaser MEL, Koch BP, Krause G, Lara R, Saint-Paul U, Schories D, Wolff M (1999) An
integrated approach to mangrove dynamics and management. J Coast Conserv 5:125–134
Carleton CJ, Dahlgren RA, Tate KW (2005) A relational database for the monitoring and analysis
of watershed hydrologic functions: I. Database design and pertinent queries. Comput Geosci
31:393–402
Codd EF (1990) The relational model for database management, version 2. Addison-Wesley,
Reading
Diepenbroek M, Grobe H, Reinke M, Schindler U, Schlitzer R, Sieger R, Wefer G (2002)
PANGAEA – an information system for environmental sciences. Comput Geosci
28:1201–1210
Dittert N, Diepenbroek M, Grobe H (2001) Scientific data must be made available to all. Nature
414:393
French JC, Jones AK, Pfaltz JL (1990) Scientific Database Management (Final Report). Report of
the Invitational NSF Workshop on Scientific Database Management, Technical Report 90–21,
Department of Computer Science, University of Virginia, Charlottesville, VA
Glaser M, Diele K (2004) Asymmetric Outcomes: Assessing the biological economic and social
sustainability of a mangrove crab fishery, Ucides cordatus (Ocypodidae), in North Brazil. Ecol Econ 49:361–373
Glaser M, Diele K (2005) Resultados assimetricos: Avaliando aspectos centrais da sustentabilidade
biologica, economica e social da pesca de caranguejo, Ucides cordatus (Ocypodidae). In: Glaser M, Cabral N, Ribeiro AL (eds) Gente, ambiente e pesquisa: Manejo transdisciplinar
no manguezal, Belem, pp 51–68
Goch YG, Krumme U, Saint-Paul U, Zuanon JAS (2005) Seasonal and diurnal changes in the
fish fauna composition of a mangrove lake in the Caete estuary, north Brazil. Amazonia
18:299–315
Gray J, Liu DT, Nieto-Santisteban M, Szalay AS, DeWitt D, Heber G (2005) Scientific data
management in the coming decade. CTWatch Quarterly 1(1). http://www.ctwatch.org/