Rafał Słota, Michał Wrzeszcz, Renata G. Słota, Łukasz Dutka, Jacek Kitowski

ACC Cyfronet AGH
Department of Computer Science, AGH - UST

CGW 2015
Kraków, Poland, October 26-28, 2015

Efficient Storing of Metadata for Distributed Data Management

Agenda

Distributed data management in global environment
onedata
System’s description
Data and metadata organization
Metadata challenges in onedata
Analyzed solutions
Proposed solution
Performance tests
Conclusions

28.10.15

Distributed Data Management in Global Environment

Managing data across different storage solutions in globally dispersed environments is a hot topic.
Global data management challenges are investigated by many research and commercial groups.

Onedata – overall description

Onedata is a distributed data management system that virtualizes access to organizationally distributed data and hides the complexity of an environment in which resource providers do not trust each other.

Data and metadata organization is the key to providing:

an easy view of the data for each user,
automatic data management for better efficiency.

Onedata – work in distributed environment

Direct access whenever possible

Management of blocks’ replicas to minimize delays

Caching, prefetching and fast parallel transport

Data organization

[Diagram: Users, Groups, Spaces, Logical files, Providers, Storages]

Organizing logical files into spaces shields users from the problems of managing resources and data locations.

Results of data organization design

Easy management and sharing of data for users.

A limit on the metadata that each provider stores and processes.

Metadata organization

Three levels of metadata for data organization and usage description:

1. Metadata used to coordinate providers’ cooperation
2. File metadata stored by each provider
3. Current usage metadata

Usage optimization

Lower level -> more frequent usage -> higher distribution

Metadata challenges in onedata

Storing metadata is too slow when all of it is kept on disk
Important metadata may be lost when it is kept only in memory

Examples:
metadata that describes the location of the actual data file has to be persistent
metadata that describes how files are used by current sessions should be available for no longer than the session is active, and must be available extremely fast

Analyzed solutions

Various solutions
In-memory vs. persistent databases
Standalone vs. built-in applications
Examples: Mnesia, Redis, Riak, Couchbase, Cassandra

No solution with all 3 features:
Safety
High throughput (many operations per second)
Low delay

Proposed solution - datastore

Models
API that defines how specific types of metadata should be stored (e.g. in global memory)

Stores
Elements where data is kept

Worker with API
Set of functionalities for data access optimization
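
The split into models, stores and a worker can be illustrated with a minimal sketch. It is not the onedata implementation: the class names (Store, Model, DatastoreWorker), the "disk" store name and the assignment of models to stores are assumptions made for illustration only. Only the model names session and file_meta and the "global memory" example come from the slides.

```python
from dataclasses import dataclass
from typing import Any, Dict


class Store:
    """An element where data is kept; here just a named in-process dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, Any] = {}

    def save(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]

    def __contains__(self, key: str) -> bool:
        return key in self._data


@dataclass(frozen=True)
class Model:
    """Declares how one type of metadata should be stored."""

    name: str
    store_name: str  # hypothetical store name, e.g. "global_memory" or "disk"


class DatastoreWorker:
    """Routes each model's documents to the store the model asks for."""

    def __init__(self, stores: Dict[str, Store]) -> None:
        self._stores = stores

    def save(self, model: Model, key: str, value: Any) -> None:
        self._stores[model.store_name].save(f"{model.name}:{key}", value)

    def get(self, model: Model, key: str) -> Any:
        return self._stores[model.store_name].get(f"{model.name}:{key}")


# Assumed mapping: session data lives in memory, file locations on disk.
stores = {name: Store(name) for name in ("global_memory", "disk")}
worker = DatastoreWorker(stores)
session = Model("session", "global_memory")
file_meta = Model("file_meta", "disk")

worker.save(session, "s1", {"user": "alice"})
worker.save(file_meta, "f1", {"location": "storage-a"})
```

In this sketch the caller never names a store directly. The model carries that decision, so changing where a type of metadata is kept means changing one Model definition.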

Datastore key features and examples

Dynamic Cache System
Datastore allows one store to be set as a cache for another
Reads and writes are done on the cache
Writes are aggregated and performed asynchronously
Data is dynamically loaded into and unloaded from the cache when needed
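
The cache behaviour listed above resembles a write-back cache, and a minimal sketch under that assumption follows. The class WriteBackCache and its method names are hypothetical, both stores are plain dicts, and flush() is called explicitly to keep the example deterministic, whereas a real system would run it in the background.

```python
from typing import Any, Dict, Set


class WriteBackCache:
    """Memory cache in front of a slower persistent store (both dicts here)."""

    def __init__(self, backing: Dict[str, Any]) -> None:
        self.backing = backing            # stands in for the disk store
        self.cache: Dict[str, Any] = {}   # stands in for the memory store
        self.dirty: Set[str] = set()      # keys changed since the last flush
        self.backing_writes = 0           # counts writes that reached "disk"

    def write(self, key: str, value: Any) -> None:
        # Writes touch only the cache; persistence is deferred.
        self.cache[key] = value
        self.dirty.add(key)

    def read(self, key: str) -> Any:
        # Load on demand: a miss pulls the value from the backing store.
        if key not in self.cache:
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def flush(self) -> None:
        # Aggregation: many writes to one key become one backing write.
        for key in self.dirty:
            self.backing[key] = self.cache[key]
            self.backing_writes += 1
        self.dirty.clear()

    def unload(self, key: str) -> bool:
        # Only clean entries may leave the cache, so nothing is lost.
        if key in self.dirty or key not in self.cache:
            return False
        del self.cache[key]
        return True


disk: Dict[str, Any] = {}
cache = WriteBackCache(disk)
for size in range(100):
    cache.write("f1", {"size": size})   # 100 updates, none hits the disk yet
cache.flush()                           # one aggregated write reaches the disk
```

Until flush() runs, the updates exist only in memory. That window is the price paid for speed, which is why the sketch refuses to unload entries that have not been persisted yet.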

Hooks for model cooperation
Separation of models
Easy reaction to other models’ actions
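
One way such hooks could work is a registry of callbacks keyed by model and action, sketched below. The names HookRegistry, HookedModel and the open_files model are hypothetical and not taken from onedata. The sketch only shows how one model can react to another model's actions while the two stay separate.

```python
from typing import Any, Callable, Dict, List, Tuple

Hook = Callable[[str, Any], None]


class HookRegistry:
    """Maps (model name, action) to callbacks owned by other models."""

    def __init__(self) -> None:
        self._hooks: Dict[Tuple[str, str], List[Hook]] = {}

    def register(self, model: str, action: str, hook: Hook) -> None:
        self._hooks.setdefault((model, action), []).append(hook)

    def notify(self, model: str, action: str, key: str, value: Any) -> None:
        for hook in self._hooks.get((model, action), []):
            hook(key, value)


class HookedModel:
    """A model that stores documents and announces every change."""

    def __init__(self, name: str, registry: HookRegistry) -> None:
        self.name = name
        self.docs: Dict[str, Any] = {}
        self._registry = registry

    def save(self, key: str, value: Any) -> None:
        self.docs[key] = value
        self._registry.notify(self.name, "save", key, value)

    def delete(self, key: str) -> None:
        value = self.docs.pop(key)
        self._registry.notify(self.name, "delete", key, value)


registry = HookRegistry()
session = HookedModel("session", registry)
open_files = HookedModel("open_files", registry)  # hypothetical model

# The open_files side reacts to session deletion; session knows nothing of it.
registry.register("session", "delete",
                  lambda key, _value: open_files.docs.pop(key, None))

session.save("s1", {"user": "alice"})
open_files.save("s1", ["/space/a.txt"])
session.delete("s1")  # the hook removes the matching open_files entry
```

The session model contains no reference to open_files. The dependency lives only in the registered hook, which is what keeps the models separated.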

Exemplary models: file_meta, session, task_pool

Performance tests

Speed vs. risk of metadata loss
Cache as a compromise

Conclusions

For systems that globalize data access, efficient metadata management is a key element.
The proposed datastore provides a flexible, efficient and safe solution for storing metadata.
The proposed datastore allows onedata to provide data access in a globally distributed environment.

Thank you

onedata homepage: http://www.onedata.org

See also:

Łukasz Dutka, Michał Wrzeszcz, Tomasz Lichoń, Rafał Słota, Konrad Zemek, Krzysztof Trzepla, Łukasz Opioła, Renata Słota, and Jacek Kitowski. Onedata - a Step Forward towards Globalization of Data Access for Computing Infrastructures, ICCS 2015 Computational Science at the Gates of Nature, Procedia Computer Science, volume 51, pages 2843–2847. 2015.

M. Wrzeszcz, T. Lichoń, R. Słota, K. Zemek, K. Trzepla, Ł. Opioła, D. Nikolow, Ł. Dutka, R. Słota and J. Kitowski, Metadata Organization and Management for Globalization of Data Access with onedata, PPAM 2015 : book of abstracts, 2015, pp. 31

Michał Wrzeszcz, Łukasz Dutka, Renata Słota, and Jacek Kitowski. VeilFS - A new face of Storage as a Service. In eChallenges e-2014, 2014 Conference, pages 1–10, Oct 2014.

Łukasz Dutka, Renata Słota, Michał Wrzeszcz, Dariusz Król, and Jacek Kitowski. Uniform and Efficient Access to Data in Organizationally Distributed Environments. eScience on Distributed Computing Infrastructure, volume 8500 of Lecture Notes in Computer Science, pages 178–194. Springer International Publishing, 2014.

Słota, R., Dutka, Ł., Wrzeszcz, M., Kryza, B., Nikolow, D., Król, D., Kitowski, J.: Storage Management Systems for Organizationally Distributed Environments - PLGrid PLUS Case Study. Lecture Notes in Computer Science, Vol. 8384, 2014, pp. 724–733.
