BIT17 Book of Abstracts 22–24 June 2017, Toruń, Poland

BIT17 Book of Abstracts - UMKfizyka.umk.pl/~bit/BIT17/boa_BIT17.pdf · Data mining in biomedical sciences Big data perspective W. Minor 1 1 University of Virginia, Charlottesville,

Jul 18, 2020


dariahiddleston

BIT17 Book of Abstracts

22–24 June 2017, Torun, Poland


PROGRAM COMMITTEE:

• Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)

• Prof. Jarek Meller (University of Cincinnati, USA)

• Prof. Jerzy Tiuryn (University of Warsaw, Poland)

• Dr. hab. Witold Rudnicki (University of Warsaw, Poland)

LOCAL ORGANIZING COMMITTEE:

• Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)

• Dr. Aleksandra Gruca (Silesian University of Technology, Gliwice, Poland)

• Dr. Lukasz Pepłowski (Nicolaus Copernicus University, Torun, Poland)

• M. Eng. Jakub Rydzewski (Nicolaus Copernicus University, Torun, Poland)


Contents

Lectures

Harel Weinstein: Formal representations and quantification of allostery in functional mechanisms of molecular machines
Krzysztof Kuczera: Modeling of peptide permeation across biological membranes
Andrew Smith: ELIXIR Europe: progress so far and opportunities for engaging
Wladek Minor: Data mining in biomedical sciences - big data perspective
Dan Staines: Ensembl data - going beyond the browser
Bartosz Wilczynski: Machine learning from genomic data: finding the needle in multiple haystacks
Weida Tong: Make genomics reproducible again – understanding statistics underpinning reproducible genomics
J. Vondrasek: Challenges and solutions for life science data in Europe - building bioinformatics capacity in the Czech Republic via ELIXIR Infrastructure
Jarek Meller: Harnessing advances in proteomics and data science to characterize biological states in cellular systems
Sebastian Deorowicz: Flood of genomic (big) data
Zeynep Kurkcuoglu: Integrative modelling of biomolecular complexes using HADDOCK
Jacek Błazewicz: DNA sequencing - from SBH to Polish genome project
Feroz Zahid: Efficient and cost-effective data-intensive computing on multi-clouds: introduction to the MELODIC project
Karina Kubiak-Ossowska: How to run not so big computer/data center: Management of tier-2 HPC: ARCHIE-WeST
Maciej Antczak: New approaches for determination of RNA pseudoknot order
Marcin Radom: Stochastic Petri net model of a cholesterol metabolism and its analysis
Rafał Jakubowski: Jammed conformation of slipknoted membrane protein during mechanical unfolding
Francisco Carrascoza: On the origins of life: theoretical studies of reactions catalyzed by montmorillonite on atmospheric-like gases

Posters

O. Bryzghalov • 1: Retroposition as a source of antisense long non-coding RNAs with possible regulatory functions
A. Danek • 2: Pan-genome index for resequencing
H. Kranas • 3: A comparison of frequencies of promoter-enhancer interactions in HUVEC and fetal brain cells
K. Kurasz • 4: A model of methylation and demethylation of cytosine forms
A. Rybarczyk • 5: A role of immune and inflammatory mechanisms in essential hypertension and cardiovascular disease - modeled and analyzed using Petri nets
J. Rydzewski • 6: Camphor’s and Huperzine’s adventures in Proteinland
K. Sienkiewicz • 7: Integrative Galaxy tool for genomic data visualization in JBrowse
B. Sokołowska • 8: Machine learning algorithms for the estimation of DNA repair processes in the Escherichia coli model and for significance of redox balance disturbance in muscular dystrophy patients
B. Szawulak • 9: Nets decomposition as base for biochemical similarity method
C. Pareek • 10: Big data analysis of porcine spermatozoa transcriptome by RNA-seq
L. Peplowski • 11: Haptic device facilitates big data analysis in structural biology
T. Zok • 12: RNAComposer allows the user to improve accuracy of predicted RNA 3D structures
A. Antkowiak • 13: Graph theoretical models of the problem of chemical compounds structural formulas construction
K. Chmielewska • 14: Modeling and analysis of disorders in prothrombotic states as a reason of atherosclerosis development using Petri net-based approach
J. Miskiewicz • 15: Bioinformatics Study of Structural Patterns in Plant MicroRNA
K. Rzosinska • 16: A comparison of Petri net-based models of monocyte-macrophage axis in steady-state and in inflammation?
J. Wiedemann • 17: LCS-TA to identify similarity in molecular structures


Lectures


Formal representations and quantification of allostery in functional mechanisms of molecular machines

H. Weinstein1

1Department of Physiology and Biophysics and Institute for Computational Biomedicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA

Colloquially, an allosteric mechanism is invoked when a change in the state of one component of a molecular system leads to a change in the state of another component of the same system. These components are coupled allosterically although they are typically not adjacent to each other in sequence or structure. Allosteric couplings can thus be propagated far within a molecular assembly. Although believed to be a ubiquitous biological phenomenon (allostery has been suggested to be present in nearly all proteins), how such allosteric couplings are achieved is generally not well defined. We set out to develop formal frameworks designed to define, represent rigorously, and assess quantitatively the coupling, the propagation, and the role of allostery.

Because it is experimentally well established that functional mechanisms of membrane proteins involved in cell signaling require dynamic coupling between the conformations of relatively discrete structural domains within the protein over long molecular distances, these molecular machines serve as excellent prototypes for the formal study of allostery. Using novel physics-based analyses of extensive MD simulations we obtain quantitative representations of such biomolecular “action at a distance”, in formal and quantitative structure-based models of function in two major membrane protein families: the neurotransmitter:Na+ symporters (NSS) and the G protein coupled receptors (GPCRs). We show that this framework reveals with increasing accuracy the molecular mechanisms of these molecular machines and their functional association with membranes and surrounding proteins that evolved to regulate and transduce their function.

1 Han Y, et al. - Allosteric communication between protomers of dopamine class A GPCR dimers modulates activation. Nature Chem Biol. 2009 5(9):688-95.

2 Zhao Y, et al. - Substrate-modulated gating dynamics in a Na+-coupled neurotransmitter transporter homolog. Nature 2011 June 2; 474(7349):109-13.

3 Khelashvili G, et al. - Spontaneous inward opening of the dopamine transporter is triggered by PIP2-regulated dynamics of the N-terminus. ACS Chemical Neuroscience 2015, 6(11):1825-37.

4 Akyuz N, et al. - Transport domain unlocking sets the uptake rate of an aspartate transporter. Nature 2015 518(7537):68-73.

5 LeVine MV, Weinstein H - NbIT - a new information theory-based analysis of allosteric mechanisms reveals residues that underlie function in the leucine transporter LeuT. PLoS Comput Biol 2014 May 1; 10(5): e1003603.

6 Stolzenberg S, et al. - Computational approaches to detect allosteric pathways in transmembrane molecular machines. Biochim Biophys Acta. 2016 1858:1652-62.

7 Cuendet AM, Weinstein H, LeVine MV - The Allostery Landscape: Quantifying Thermodynamic Couplings in Biomolecular Systems. J Chem Theory Comput. 2016 12(12):5758-67.

8 LeVine MV, et al. - Allosteric Mechanisms of Molecular Machines at the Membrane: Transport by Sodium:Coupled Symporters. Chem Rev. 2016 116(11):6552-87.

9 Razavi AM, Khelashvili G, Weinstein H - A Markov State-based Quantitative Kinetic Model of Sodium Release from the Dopamine Transporter. Nature Sci Reports 2017 7:40076.

10 Stolzenberg S, et al. - The Role of TM5 in Na2 Release and the Conformational Transition of Neurotransmitter:Sodium Symporters toward the Inward-Open State. J Biol Chem. 2017, in press, jbc.M116.757153. doi:10.1074/jbc.M116.757153.


Modeling of peptide permeation across biological membranes

K. Kuczera1

1University of Kansas, Lawrence, Kansas, USA


ELIXIR Europe: progress so far and opportunities for engaging

A. Smith1

1ELIXIR Hub, Hinxton, Cambridge, UK

ELIXIR Europe is a new initiative to connect, integrate and help sustain Europe’s life science data resources. Established as an inter-governmental organisation with 21 Members and recognised by the European Council as a ‘priority’ research infrastructure for Europe, ELIXIR brings together over 180 institutes and 600 individuals to support the data-related needs of life scientists. In his talk, Andrew Smith, Head of External Relations for ELIXIR, will introduce ELIXIR, its structure and purpose, and show how current operations are progressing. Highlights from each of ELIXIR’s five Platforms - Data, Tools, Interoperability, Compute and Training - will be presented, showing the services and resources that users can already access. The work of ELIXIR’s four Use Cases - Marine Metagenomics, Plant Sciences, Rare Diseases and Human Data - will also be showcased. ELIXIR is a membership-based infrastructure in which each country that joins also establishes an ‘ELIXIR Node’ to bring together the national bioinformatics community, so the talk will also give examples of other ELIXIR Nodes and set out the process that bioinformatics communities have followed in other countries to become a member. The talk will describe in detail the various ways of engaging in the initiative as users of the services being developed within ELIXIR, such as the TeSS portal for accessing training courses and online training content and the BioTools registry for discovering tools and databases. It will also look at the initiatives ELIXIR supports that welcome community input, such as the BioSchemas initiative to improve the interoperability of data.


Data mining in biomedical sciences - big data perspective

W. Minor1

1University of Virginia, Charlottesville, VA, 22908-0736, USA

Experimental reproducibility is the cornerstone of scientific research, upon which all progress rests. The veracity of scientific publications is crucial because subsequent lines of investigation rely on previous knowledge. Several recent systematic surveys of academic results published in biomedical journals reveal that a large fraction of representative sets of studies in a variety of fields cannot be reproduced in another laboratory. The Big Data approach, and especially the NIH Big Data to Knowledge (BD2K) program, is coming to the rescue.

The goal of the presented research is to provide the biomedical community with a strategy to increase the reproducibility of reported results for a wide range of experiments by building a set of “best practices”, culled by extensive data harvesting and curation combined with experimental verification of the parameters crucial for reproducibility. Experimental verification assisted by the automatic/semi-automatic harvesting of data from laboratory equipment into the already developed sophisticated laboratory information management system (LIMS) will be presented. This data-in, information-out paradigm will be discussed.


Ensembl data - going beyond the browser

D. M. Staines1 and A. D. Yates1

1European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

Ensembl (www.ensembl.org) and Ensembl Genomes (www.ensemblgenomes.org) provide a platform enabling research on genomes across the taxonomic range. We import, analyse, curate and integrate a diverse collection of large-scale reference data from over 40,000 genomes to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory annotation, genome variation and gene trees, accessible via a common platform of a web-based genome browser and an accompanying suite of tools, infrastructure and programmatic access methods.

The primary method of accessing data is via our genome browser, but this is not always the most efficient form of access. Many queries involve retrieving a subset of our data for use in downstream analysis, combining complex, custom queries with user-defined output specifications. Queries can be complex or simple, involve a single genome or thousands of genomes, and query across the taxonomy and multiple ontologies. Our current data warehousing solution meets a number of these criteria, but scales poorly as the number of genomes and the volume of data increase.

I will describe our work to meet these needs, covering how we have evaluated a range of technologies, and implemented a comprehensive set of REST services for accessing diverse data from both Ensembl and other related resources hosted at the European Bioinformatics Institute.
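To make the idea of programmatic access concrete, the following is a minimal sketch of querying a REST service from Python. It is not taken from the abstract: the base URL `https://rest.ensembl.org` and the `lookup/symbol` endpoint are assumed from the public Ensembl REST service, and the helper names `lookup_symbol_url` and `lookup_symbol` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the public Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build the URL that looks up a gene by its symbol (hypothetical helper)."""
    return (
        f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
        "?content-type=application/json"
    )


def lookup_symbol(species: str, symbol: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON record for a gene symbol; requires network access."""
    with urllib.request.urlopen(lookup_symbol_url(species, symbol), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here, so the example runs offline.
    print(lookup_symbol_url("homo_sapiens", "BRCA2"))
```

Retrieving a small JSON record this way avoids loading a browser page when only a subset of the data is needed for downstream analysis.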


Machine learning from genomic data: finding the needle in multiple haystacks

B. Wilczynski1, ∗

1University of Warsaw, Faculty of Mathematics, Informatics and Mechanics

Our lab has experience in using machine learning to predict the location and activity of regulatory regions based on multiple types of genomic data. I will discuss the methods we have found to work best for these purposes, including Bayesian Networks and Random Forests, and how they can be applied to large publicly available datasets, including ENCODE and modENCODE. I will present some previously published results, such as the identification of enhancers and insulator elements in the fly genome, followed by some not yet published results on the human data from the ENCODE consortium.

∗ To whom the correspondence should be addressed: [email protected]


Make genomics reproducible again – understanding statistics underpinning reproducible genomics

W. Tong1

1The National Center for Toxicological Research and U.S. Food and Drug Administration, Jefferson, Arkansas, USA


Challenges and solutions for life science data in Europe - building bioinformatics capacity in the Czech Republic via ELIXIR Infrastructure

J. Vondrasek1

1Czech Academy of Sciences / ELIXIR - Czechia, Prague, Czech Republic


Harnessing advances in proteomics and data science to characterize biological states in cellular systems

J. Meller1

1Cincinnati Children’s Hospital Medical Center and University of Cincinnati, Cincinnati, USA

The NIH LINCS Project (http://www.lincsproject.org/) constitutes one of the most ambitious attempts to date to systematically map the effects of genetic and chemical perturbation in cellular systems. It has led to data science challenges and to new methods and resources that are being developed by the LINCS Data Coordination and Integration Center and the community. The current state of the LINCS project will be discussed, with emphasis on proteomics profiling and the derivation of multilevel (transcriptional, proteomic and other) signatures of cellular perturbation. piLINCS and pNET will be presented as examples of new resources for LINCS proteomic data being developed by our group. Applications to proteomic profiling of postsynaptic density in the context of brain disorders (based on joint work with Rob McCullumsmith and his group) will be used to illustrate the use of LINCS data and the challenges related to it.


Flood of genomic (big) data

S. Deorowicz1, ∗

1Silesian University of Technology, Institute of Informatics

In the last two decades the throughput of genome sequencers has increased by a few orders of magnitude, and nowadays it is possible to sequence an H. sapiens genome in a day. At the same time the cost of sequencing a single human individual has decreased from over 1 billion to 1 thousand dollars, which makes it possible to consider genome sequencing a useful procedure for personalized medicine. In their recent study, Stephens et al. (PLOS Biology 2015) predicted that in 10 years genomic data will be acquired at 1 ZB per year. Of course only a fraction of it, 2–40 EB as predicted, should be stored for the long term. Nevertheless, the numbers are still huge, especially when we compare them with the costs of storage and transfer, which have decreased only moderately in recent years.
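The figures quoted above can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal SI prefixes (1 ZB = 1000 EB), and the function name `stored_fraction` is introduced here for illustration.

```python
# Assumption: decimal SI prefixes, so 1 ZB = 1000 EB.
EB_PER_ZB = 1000

# Figures quoted in the abstract.
ACQUIRED_EB_PER_YEAR = 1 * EB_PER_ZB          # 1 ZB acquired per year
STORED_LOW_EB, STORED_HIGH_EB = 2, 40         # 2-40 EB kept long term
COST_THEN_USD, COST_NOW_USD = 1_000_000_000, 1_000


def stored_fraction(stored_eb: float, acquired_eb: float) -> float:
    """Fraction of the acquired data that is kept for long-term storage."""
    return stored_eb / acquired_eb


low = stored_fraction(STORED_LOW_EB, ACQUIRED_EB_PER_YEAR)
high = stored_fraction(STORED_HIGH_EB, ACQUIRED_EB_PER_YEAR)
cost_reduction = COST_THEN_USD // COST_NOW_USD

print(f"stored fraction: {low:.1%} to {high:.1%}")
print(f"sequencing cost reduction: {cost_reduction:,}x")
```

Under these assumptions only a few percent of the acquired data would be archived, while the per-genome cost has dropped by about six orders of magnitude.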

An obvious way to deal with the flood of genomic data seems to be data compression. Unfortunately it is not easy to design compression algorithms that are able to reduce the data more than a few times, which is crucial. Moreover, the compression scheme should not complicate the access to the data.
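As a baseline for what "a few times" means, a fixed-width encoding of the four nucleotides already shrinks plain text fourfold while keeping random access simple. The sketch below is an illustration only, not the compression scheme discussed in the talk; it assumes the input contains only the letters A, C, G and T, and the names `pack` and `unpack` are hypothetical.

```python
# Assumption: input contains only A, C, G, T (no N or lowercase symbols).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so that decoding is uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Decode `length` bases from data produced by pack()."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    packed = pack(seq)
    print(len(seq), "characters ->", len(packed), "bytes")
    print(unpack(packed, len(seq)) == seq)
```

Going beyond this fixed fourfold reduction is where the design difficulty mentioned above begins, since more aggressive schemes tend to make direct access to the data harder.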

In this study we discuss the problems related to storage, transfer, and easy access to the genomic data obtained at various stages of processing.

∗ To whom the correspondence should be addressed: [email protected]


Integrative modelling of biomolecular complexes using HADDOCK

Z. Kurkcuoglu1

1Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, the Netherlands

The dynamic network of biological macromolecules is critical for cellular processes, and its study necessitates computational approaches that complement experimental techniques. For this purpose, our group (bonvinlab.org) focuses on developing reliable bioinformatics and computational methods to study the structure, the dynamics and the interactions of biomolecular machines at the atomistic level. Since 2003, the group has developed and maintained the HADDOCK software,1,2 an information-driven approach for modelling biomolecular complexes which can integrate experimental data from various methods including, for example, NMR, mutagenesis and cryo-electron microscopy data.3,4 HADDOCK is available via a user-friendly web interface to its community of over 9000 users worldwide. The operation of the portal is supported by HTC resources of EGI (www.egi.eu) and various other European projects (WeNMR, West-Life VRE, INDIGO-DataCloud, BioExcel CoE).

In my talk, I will explain the basic concepts of HADDOCK and give various examples of its use in integrative modelling. I will also present recent results on the modelling of conformational changes upon binding, which is still an open challenge in the field. We address those by combining an iterative, unbiased, elastic network method named ClustENM5,6 with HADDOCK.

1 G.C.P. van Zundert, J.P.G.L.M. Rodrigues, M. Trellet, C. Schmitz, P.L. Kastritis, E. Karaca, A.S.J. Melquiond, M. van Dijk, S.J. de Vries and A.M.J.J. Bonvin. The HADDOCK2.2 webserver: User-friendly integrative modeling of biomolecular complexes. J. Mol. Biol., 428, 720-725 (2015).

2 S.J. de Vries, M. van Dijk and A.M.J.J. Bonvin. The HADDOCK web server for data-driven biomolecular docking. Nature Protocols, 5, 883-897 (2010).

3 G.C.P. van Zundert, A.S.J. Melquiond and A.M.J.J. Bonvin. Integrative modeling of biomolecular complexes: HADDOCKing with Cryo-EM data. Structure, 23, 949-960 (2015).

4 G.C.P. van Zundert and A.M.J.J. Bonvin. Defining the limits and reliability of rigid-body fitting in cryo-EM maps using multi-scale image pyramids. J. Struct. Biol., 195, 252-258 (2016).

5 Z. Kurkcuoglu, I. Bahar and P. Doruker. ClustENM: ENM-based sampling of essential conformational space at full atomic resolution. J. Chem. Theory Comput., 12 (9), 4549-4562 (2016).

6 Z. Kurkcuoglu and P. Doruker. Ligand docking to intermediate and close-to-bound conformers generated by an elastic network model based algorithm for highly flexible proteins. PLoS One, 11(6): e0158063 (2016).


DNA Sequencing - from SBH to Polish Genome Project

J. Blazewicz1, ∗

1Poznan University of Technology, Poznan, Poland

DNA sequencing has received the attention of various research communities for several decades. In the talk we overview its different phases, starting from the roots, Sequencing by Hybridization and the underlying graph theory, and moving to the modern algorithms that follow NGS technology. The ultimate goal of our study will be the Polish Genome Project (PGP), which aims to develop a resource that gives insights into the origins of the Polish people and facilitates an understanding of genetic variation among Polish people.

In order to achieve high-quality, phased, de novo assembled genomes, the genomes of sequenced trios will be characterized by combining long-read Single Molecule Real-Time (SMRT) sequencing on the Pacific Biosciences (PacBio) platform with the 10X Genomics (10XG) platform.
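The graph theory behind Sequencing by Hybridization can be illustrated with a small sketch. One classical formulation treats every k-mer in the hybridization spectrum as an edge between its (k-1)-mer prefix and suffix and reads the sequence off an Eulerian path. The abstract does not say which graph model the talk covers, so this choice is an assumption, as are the ideal error-free spectrum without repeated k-mers and the function name `reconstruct`.

```python
from collections import defaultdict
from typing import Dict, List


def reconstruct(spectrum: List[str]) -> str:
    """Rebuild a sequence from an ideal, error-free spectrum of k-mers.

    Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix;
    an Eulerian path over these edges spells the sequence.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    balance: Dict[str, int] = defaultdict(int)  # out-degree minus in-degree
    for kmer in spectrum:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # The path starts at the node with one more outgoing than incoming edge;
    # fall back to any prefix if the spectrum happens to form a cycle.
    start = next((node for node, b in balance.items() if b == 1), spectrum[0][:-1])

    # Hierholzer's algorithm with an explicit stack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(reconstruct(["GGC", "ATG", "CGT", "TGG", "GCG"]))
```

With real data the spectrum contains errors and repeats, so the path is generally not unique and practical methods need additional information to choose among candidates.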

∗ To whom the correspondence should be addressed: [email protected]


Efficient and cost-effective data-intensive computing on multi-clouds: An introduction to the MELODIC project

F. Zahid1, ∗

1Simula Research Laboratory, Martin Linges vei 25, 1364 Fornebu, Norway

Data-intensive computing, often simply referred to as big data, is one of the major current trends in ICT. In areas as diverse as social media, business intelligence, information security, Internet-of-Things, and scientific research, a tremendous amount of data is created or collected at a speed surpassing what we can handle using traditional data management techniques. The life sciences are no different. With the vast amount of biological information available, such as Omics data, unprecedented opportunities for modern research and scientific breakthroughs arise, all depending on efficient and cost-effective data analysis. Cloud computing, characterized by the paradigm of on-demand network access to computational resources and a pay-as-you-go economic model, holds great potential for providing the computational resources required for data analytics in Bioinformatics. However, challenges such as the lack of data privacy and of data-aware cloud federation keep cloud computing from realizing its full potential for data-intensive applications. At the same time, non-standardized cloud interfaces make it complex to migrate big data applications between platforms. This encourages vendor lock-in and prevents cloud users from achieving an optimal cost-performance ratio for their applications.

In this talk, we will provide an introduction to the MELODIC H2020 project and show how it can be of great value in Bioinformatics. The vision of MELODIC is to enable federated cloud computing for data-intensive applications, and to provide the user with an easy-to-use unified cloud environment, hiding the complexity of a multi-cloud. The MELODIC platform enables big data applications to transparently take advantage of distinct characteristics of available private and public clouds by dynamically optimizing resource allocations considering data locality and the user’s performance and privacy needs. From the perspective of the user, the MELODIC framework appears as an infrastructure-agnostic middleware platform supporting development, deployment, and execution of data-intensive applications on distributed and heterogeneous multi-clouds. For the Bioinformatics community, this could mean utilizing the resources available from multiple cloud providers and private infrastructures in a secure, transparent, efficient, cost-effective, and reliable manner for their big data workloads.

∗ To whom the correspondence should be addressed: [email protected]


How to run not so big computer/data center: Management of Tier-2 HPC: ARCHIE-WeSt

K. Kubiak1

1ARCHIE-WeSt, Department of Physics, University of Strathclyde, Glasgow, UK

We are a supercomputer centre for the West of Scotland, based at the University of Strathclyde and dedicated to wealth creation and research excellence in the region. Funded by EPSRC, we operate in partnership with the Universities of Glasgow, Glasgow Caledonian, West of Scotland and Stirling. The aim of the centre is to support multi-disciplinary research, with a centre of gravity in engineering and physical sciences, while reaching out to other disciplines, and to encourage and enable industrial usage and collaboration.

The talk will cover the technical specification of the facility, access routes, and operational procedures regarding users and project management, support, training, reporting and accounting.


New approaches for determination of RNA pseudoknot order

M. Antczak,1, ∗ T. Zok,1 M. Popenda,2 and M. Szachniuk1, 2

1Institute of Computing Science, Poznan University of Technology, Poznan, Poland
2Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland

RNAs are flexible molecules that often fold into complex 3D structures involving various kinds of canonical as well as non-canonical interactions. The longer the sequence, the more likely it is to contain fascinating motifs called pseudoknots. In fact, pseudoknots are regular Watson-Crick base pairs formed between sequentially distant nucleotides that would unwind during melting just as any other base pair. They are represented by crossing arcs linking two distant locations separated by a well-structured fragment such as a helix.

Pseudoknots are evolutionarily conserved and are therefore considered mainly responsible for the stabilization of RNA structure. Due to their topological diversity, their detection and classification is still a challenge. In the literature, a few ways of classifying pseudoknots have been proposed, namely by type, by family or by topological genus. We have also introduced the term pseudoknot order, which measures the structural complexity of an RNA and is defined as the minimum number of decompositions of the base-pair set that result in a nested (i.e., non-pseudoknotted) structure.1 Unfortunately, it is difficult to determine the appropriate minimum set of decompositions. Here, we propose new algorithms based on a well-known graph theory problem, vertex coloring, to identify RNA pseudoknots and encode them in extended dot-bracket notation. The order corresponds to the chromatic number, while the color assignments represent levels of pseudoknots. We also show a complementary way of assigning these levels that we hope will be more accurate in terms of RNA folding pathways. Finally, we present a comparison of both approaches based on a set of RNA 3D structures deposited in the Protein Data Bank repository.
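The link between crossing base pairs, vertex coloring and dot-bracket levels can be illustrated with a small sketch. It is not the algorithm proposed in this abstract: it assumes 0-based base pairs, builds a conflict graph in which two pairs are adjacent when they cross, and colors it greedily, which is a heuristic and does not guarantee the minimum number of colors. The function names and the bracket alphabet are illustrative choices.

```python
from itertools import combinations

# Bracket pairs for successive levels of extended dot-bracket notation.
# Only six levels are listed here; deeper structures would need more symbols.
BRACKETS = ["()", "[]", "{}", "<>", "Aa", "Bb"]


def crosses(p, q):
    """Return True if base pairs p and q cross, i.e. i < k < j < l."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def greedy_levels(pairs):
    """Greedy vertex coloring of the conflict graph (heuristic, not minimal)."""
    pairs = sorted(pairs)
    conflicts = {p: set() for p in pairs}
    for p, q in combinations(pairs, 2):
        if crosses(p, q):
            conflicts[p].add(q)
            conflicts[q].add(p)
    level = {}
    for p in pairs:
        used = {level[q] for q in conflicts[p] if q in level}
        # Lowest color not used by an already colored crossing pair.
        level[p] = next(c for c in range(len(pairs)) if c not in used)
    return level


def to_dot_bracket(length, pairs):
    """Encode 0-based base pairs (i, j), i < j, as an extended dot-bracket string."""
    out = ["."] * length
    for (i, j), lv in greedy_levels(pairs).items():
        out[i], out[j] = BRACKETS[lv]
    return "".join(out)


# Two stems that cross each other: the second one is pushed to level 1.
print(to_dot_bracket(12, [(0, 8), (1, 7), (4, 11), (5, 10)]))  # ((..[[.)).]]
```

In this toy input the greedy pass uses two colors; on larger conflict graphs a greedy ordering can use more colors than necessary.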

∗ To whom the correspondence should be addressed: [email protected]
1 M. Antczak, T. Zok et al., 2014. RNApdbee–a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. Nucleic Acids Research, 42(W1), W368-W372. Available at: https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gku330


Stochastic Petri net model of a cholesterol metabolism and its analysis

M. Radom,1, 2, ∗ D. Formanowicz,3 and P. Formanowicz1, 2

1Institute of Computing Science, Poznan University of Technology, Poznan
2Institute of Bioorganic Chemistry, Poznan, Polish Academy of Science
3Department of Clinical Biochemistry and Laboratory Medicine, Poznan University of Medical Sciences, Poland

Cholesterol is a chemical compound that performs an important role in membrane structure. It is also a precursor for the synthesis of the steroid hormones and bile acids. Both the synthesis and utilization of cholesterol must be regulated in order to prevent its accumulation or abnormal deposition within the body, which may lead to, e.g., atherosclerotic plaque formation.

To model cholesterol metabolism, Petri net theory has been used. Classical Petri nets allow various analytical approaches, e.g., t-invariant-based analysis, which often includes calculation of Maximum Common Transition sets (MCT sets) or grouping of t-invariants into similar groups called t-clusters for the purpose of finding interactions between subprocesses in the modeled system. On the other hand, a model based on a stochastic Petri net allows more sophisticated simulation-based analysis.

In the presented work we created such a stochastic Petri net based model of cholesterol metabolism in order to run several simulation scenarios on it, representing different reaction speeds, together with some basic simulation knockout analysis. The obtained results are valuable complements to the ones obtained on the basis of a qualitative model based on classical Petri nets.
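How a stochastic Petri net is simulated, and what a simulation knockout means, can be sketched on a toy net. Everything specific below is an assumption made for illustration: the three places, the three transitions, the rates, and the choice of mass-action firing rates with a Gillespie-style scheduler. It is not the cholesterol model described in this abstract.

```python
import random


def simulate_spn(marking, transitions, t_end, seed=0):
    """Gillespie-style simulation of a stochastic Petri net.

    transitions: list of (name, rate, inputs, outputs), where inputs and
    outputs map place names to arc weights. The firing rate of an enabled
    transition is assumed to be rate * product of its input token counts.
    """
    rng = random.Random(seed)
    m = dict(marking)
    fired = {name: 0 for name, *_ in transitions}
    t = 0.0
    while True:
        candidates = []
        for name, rate, ins, outs in transitions:
            if all(m[p] >= w for p, w in ins.items()):
                a = rate
                for p in ins:
                    a *= m[p]
                if a > 0:
                    candidates.append((a, name, ins, outs))
        total = sum(a for a, *_ in candidates)
        if total == 0:
            break  # dead marking: nothing can fire
        t += rng.expovariate(total)  # waiting time to the next firing
        if t > t_end:
            break
        r, acc = rng.uniform(0, total), 0.0
        for a, name, ins, outs in candidates:
            acc += a
            if r <= acc:
                break
        for p, w in ins.items():
            m[p] -= w
        for p, w in outs.items():
            m[p] = m.get(p, 0) + w
        fired[name] += 1
    return m, fired


# Hypothetical toy net: synthesis feeds a cholesterol pool that is drained
# towards bile acids and steroid hormones.
MARKING = {"cholesterol": 10, "bile_acid": 0, "steroid": 0}
NET = [
    ("synthesis", 2.0, {}, {"cholesterol": 1}),
    ("to_bile", 0.10, {"cholesterol": 1}, {"bile_acid": 1}),
    ("to_steroid", 0.05, {"cholesterol": 1}, {"steroid": 1}),
]

wild_type, _ = simulate_spn(MARKING, NET, t_end=50.0, seed=1)
# Knockout: remove one transition from the net and simulate again.
knockout, _ = simulate_spn(MARKING, [t for t in NET if t[0] != "to_bile"], 50.0, seed=1)
print(wild_type, knockout)
```

Changing a rate models a different reaction speed; removing a transition models a knockout, and comparing the two final markings shows its effect.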

ACKNOWLEDGMENTS

This research has been partially supported by the National Science Centre (Poland) grant No. 2012/07/B/ST6/01537.

∗ To whom the correspondence should be addressed: [email protected]


Jammed conformation of slipknotted membrane protein during mechanical unfolding

R. Jakubowski,1, ∗ S. Niewieczerzal,1 and J. Sulkowska1, 2

1Centre of New Technologies, University of Warsaw, ul. Banacha 2c, 02-097 Warsaw
2Faculty of Chemistry, University of Warsaw, ul. Pasteura 1, Warszawa

Although undoubted progress in science has been made, we still do not fully understand many basic phenomena, for instance the protein folding process. What forces drive a polypeptide chain to form a specific and functional three-dimensional structure? This question, not yet fully answered, seems even more challenging in light of the discovery of proteins with non-trivial topologies.

One possible approach to exploring events that could be observed during protein folding is to conduct the reverse process, protein unfolding. Mechanical unfolding of proteins, both experimental and computational, is one of the most widespread techniques. It allows the most important events and transition states under conditions of mechanical stress to be recognized and hence brings insights into various properties of a protein. Many successful studies have been conducted for a whole diversity of soluble proteins,1 while membrane proteins, and especially those with non-trivial topology,2 have been omitted due to the lack of tools to interpret both theoretical and experimental results.

Here we present preliminary results on the unfolding pathway of the membrane protein BetP, composed of three internal knots, which we determined theoretically for the first time. BetP is a member of the betaine/choline/carnitine transporter family and can perform three functions: betaine transport, osmosensing, and osmoregulation. Calculations are conducted using the structure-based model3 and a newly implemented implicit membrane model, based on the simplified IMM1 energy function,4 validated with unfolding of bacteriorhodopsin against existing data. Our results show a surprising metastable conformation,5 which arises from tying a slipknot when the protein is pulled.

ACKNOWLEDGMENTS

This research was funded by the Ministry of Science and Higher Education of Poland through the Ideas Plus II grant. Computational facilities of the Interdisciplinary Centre for Modern Technologies are acknowledged.

∗ To whom the correspondence should be addressed: [email protected]
1 Mechanical Stretching of proteins – A theoretical survey of the Protein Data Bank, JI Sulkowska, M Cieplak, J. Phys. Cond. Mat. (2007) 19, 283201
2 Conservation of complex knotting and slipknotting patterns in proteins, JI Sulkowska, EJ Rawdon, KC Millett, JN Onuchic, A Stasiak, Proc. Natl. Acad. Sci (USA), (2012) 109(26): E1715-23.
3 Selection of optimal variants of Go-like models of proteins through studies of stretching, JI Sulkowska, M Cieplak, Biophys. J. (2008) 95, 3174
4 Effective energy function for proteins in lipid membranes, Lazaridis, T., Proteins: Structure, Function, and Bioinformatics 52.2 (2003): 176-192
5 Jamming proteins with slipknots and their free energy landscape, JI Sulkowska, P Sułkowski, JN Onuchic, Phys. Rev. Lett. (2009) 103 268103


On the origins of life: Theoretical studies of reactions catalyzed by montmorillonite on atmospheric-like gases

J. Carrascoza Mayen,1, ∗ J. Rydzewski,2 N. Szostak,1 J. Blazewicz,1, 3 and W. Nowak2

1Institute of Computer Science, Poznan University of Technology, Poland
2Institute of Physics, Faculty of Physics, Astronomy and Informatics, Torun, Poland
3Institute of Bioorganic Chemistry, Poznan Academy of Sciences, Poland

Here we present preliminary results from computationally modeled boxes containing elementary gases in contact with montmorillonite, a clay mineral commonly found on Earth. Our findings suggest the formation of molecules crucial for the development of life. These results are consistent with previous experimental studies important for understanding the formation of biopolymeric compounds such as proteins and RNA. In this work we attempt to understand, at the quantum mechanical level, the reaction mechanism and the potential role of the clay mineral, using pseudopotentials within Car-Parrinello molecular dynamics run in a highly parallel computing environment.

∗ To whom the correspondence should be addressed: [email protected]


Posters


Retroposition as a source of antisense long non-coding RNAs with possible regulatory functions

O. Bryzghalov,1, ∗ M. Szczesniak,1 and I. Makalowska1

1Department of Integrative Genomics, Institute of Anthropology, Adam Mickiewicz University in Poznan, Poznan, Poland

Recent studies have shown that long noncoding RNAs (lncRNAs) appear to regulate gene expression through a diverse group of mechanisms. Some lncRNAs originate from retrocopies, intronless copies of so-called parental genes that arise in the process of retroposition. Once a retrocopy is transcribed in the antisense orientation, the resulting lncRNA shares sequence similarity with the parental gene in the sense/antisense orientation, meaning the two are able to interact and form RNA:RNA duplexes with possible regulatory implications. We found 58 lncRNAs that were transcribed antisense to 35 human retroposition-derived copies of protein-coding genes. Further analysis of possible RNA:RNA duplexes revealed 10 lncRNAs with potential regulatory roles exerted on their parental genes, which include stability control and pre-mRNA and mRNA processing. We focused on three possible base-pairings with statistically significant correlations of expression in our analysis. These cases include the following parental genes: hnRNPA1, CHMP1A, and RPL23A. Our findings suggest that retroposition-derived antisense lncRNAs might affect the expression and processing of parental genes in a number of ways. This statement is supported by the in silico base-pairing of the RNA molecules, followed by computational function assignment, co-expression data and, occasionally, correlation of expression and evolutionary conservation.
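Why an antisense transcript of a retrocopy can pair with the parental mRNA can be shown with a minimal sketch. It only searches for the longest stretch of perfect Watson-Crick complementarity between two RNA strings; the sequences are invented, and the abstract does not say which duplex prediction tool was used, so this is not a stand-in for it (no G:U pairs, mismatches or energies are considered).

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def longest_duplex(rna_a, rna_b):
    """Length of the longest perfectly complementary antiparallel stretch.

    Pairing rna_a with rna_b antiparallel is the same as finding a common
    substring of rna_a and the reverse complement of rna_b, so a standard
    longest-common-substring dynamic program is used.
    """
    target = reverse_complement(rna_b)
    best = 0
    prev = [0] * (len(target) + 1)
    for a in rna_a:
        cur = [0]
        for j, t in enumerate(target, 1):
            cur.append(prev[j - 1] + 1 if a == t else 0)
            best = max(best, cur[-1])
        prev = cur
    return best


# Invented example: the "lncRNA" is the antisense of a 7-nt window of the "mRNA".
mrna = "AUGGCUAACGU"
lncrna = reverse_complement("GGCUAAC")
print(lncrna, longest_duplex(mrna, lncrna))  # GUUAGCC 7
```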

∗ To whom the correspondence should be addressed: [email protected]


Pan-genome index for resequencing

A. Danek1, ∗ and S. Deorowicz1

1Institute of Informatics, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland

A linear reference genome, typically used in resequencing procedures, does not represent well the existing genetic variation within a species. Thus, there is growing interest in designing specialized data structures and methods for processing sequencing data that make use of the preexisting biological knowledge about the diversity of members of the same species.

We present an index data structure that incorporates already called variants into the reference genome. It allows seed-and-extend style queries about any string. Basically, it keeps track of all possible k-length sequences that can occur in a genome with any combination of variants. Our solution is compact and fast. For example, 3 GB are enough for an index of a human genome with over 8 million known variant sites from the 1000GP, which answers exact queries in less than 100 µs.
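The idea of indexing every k-length sequence that can occur under any combination of variants can be illustrated with a deliberately naive sketch. It assumes SNPs only and stores the k-mers in a plain dictionary, so it shows the content of such an index, not the compact data structure presented here; the function name and the input format are hypothetical.

```python
from itertools import product


def build_index(reference, snps, k):
    """Map every k-mer reachable under any SNP combination to its start positions.

    snps maps a 0-based reference position to a string of alternative bases.
    """
    index = {}
    for start in range(len(reference) - k + 1):
        # For each position in the window: the reference base plus alternatives.
        choices = [reference[p] + snps.get(p, "") for p in range(start, start + k)]
        for combo in product(*choices):
            index.setdefault("".join(combo), set()).add(start)
    return index


# Toy reference with one SNP (G>T at position 2).
idx = build_index("ACGTAC", {2: "T"}, k=3)
print(sorted(idx))        # ['ACG', 'ACT', 'CGT', 'CTT', 'GTA', 'TAC', 'TTA']
print(idx.get("CTT"))     # {1}: a seed that exists only with the variant allele
```

A seed found this way would then be extended against the reference and the variants around the reported position. The dictionary grows quickly when many variant sites fall within one window, which is one reason a dedicated compact structure is needed at genome scale.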

ACKNOWLEDGMENTS

Agnieszka Danek was supported by Silesian University of Technology under the BKM509/RAU2/2017 project.

∗ To whom the correspondence should be addressed: [email protected]


A comparison of frequencies of promoter-enhancer interactions in HUVEC and fetal brain cells

H. Kranas1, ∗ and B. Wilczynski1

1University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Institute of Informatics, ul. Banacha 2, 02-097 Warsaw, Poland

Here we present an implementation of the method for identifying interacting regions and creating interaction profiles based on Hi-C data introduced by Won et al. (2016). The focus is mainly on promoter-enhancer interactions: we generated interaction profiles for all fetal brain and HUVEC enhancers from Enhancer Atlas (Gao et al. 2016) and extracted positions of interest (statistically significant contacts). Results are shown as a comparative analysis of contacts in those two distinctly different cell lines.

∗ To whom the correspondence should be addressed: [email protected]


A model of methylation and demethylation of cytosine forms

K. Kurasz,1, ∗ D. Hudy,1 M. Skonieczna,1 E. Zarakowska,2 K. Fujarewicz,1 and J. Rzeszowska-Wolny1

1Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
2Department of Clinical Biochemistry, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, Karlowicza 24, 85-092 Bydgoszcz, Poland

The aim of the study is to propose a mathematical model of methylation and demethylation of cytosine forms that is able to predict the amounts of different cytosine forms based on biological data. The second aim of this work is to select the model structure and, on its basis, to explain the contribution of the TET family proteins. DNA methyltransferases are involved in the process of DNA methylation. DNMT1 is mainly responsible for DNA methylation during replication, where the daughter strand should receive the same methylation pattern. DNA demethylation is believed to involve the successive oxidation of 5-methylcytosine (5-mC) to 5-hydroxymethyl- (5-hmC), 5-formyl- (5-fC) and 5-carboxy- (5-caC) cytosine in a process that involves the TET (ten-eleven translocation) family of enzymes, including TET1, TET2 and TET3. The model is described by six ordinary differential equations. To calculate the parameters of the models we used Nonnegative Linear Least Squares, which computes a nonnegative solution to a linear least squares problem. The predictive ability of the model was assessed by leave-one-out cross-validation, a model validation technique for evaluating how the results of a statistical analysis will generalize to an independent data set.

Our research was focused on the steps at which TET proteins act. The best model structures suggest that TET3 proteins are not involved in the transformation of 5-mC to 5-hmC and that TET2 proteins are not involved in the step between 5-hmC and 5-fC.
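The successive oxidation 5-mC, 5-hmC, 5-fC, 5-caC can be sketched as a linear chain of first-order reactions. This is only an illustration of what an ODE description of such a cascade looks like: the abstract does not give its six equations, so the four-species chain, the first-order kinetics, the forward-Euler integrator and the rate constants below are all assumptions. A nonnegativity constraint on the rates, as enforced by nonnegative least squares during fitting, is what keeps such a model physically meaningful.

```python
def simulate_cascade(y0, k, t_end, dt=0.01):
    """Forward-Euler integration of species[0] -> species[1] -> ... -> species[-1].

    y0: initial amounts, one per species.
    k:  first-order rate constants, k[i] for the step species[i] -> species[i + 1].
    """
    y = list(y0)
    for _ in range(int(round(t_end / dt))):
        dy = [0.0] * len(y)
        for i, rate in enumerate(k):
            flux = rate * y[i]      # amount converted per unit time
            dy[i] -= flux
            dy[i + 1] += flux
        y = [yi + dt * d for yi, d in zip(y, dy)]
    return y


SPECIES = ["5-mC", "5-hmC", "5-fC", "5-caC"]
# Hypothetical rate constants (arbitrary units), chosen only for the demo.
amounts = simulate_cascade([100.0, 0.0, 0.0, 0.0], k=[0.5, 0.3, 0.2], t_end=5.0)
for name, value in zip(SPECIES, amounts):
    print(f"{name}: {value:.2f}")
```

Setting one rate constant to zero removes that step from the chain, which is one simple way to compare alternative model structures.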

ACKNOWLEDGMENTS

This work was funded by the Silesian University of Technology under grant BKM/506/RAU1/2016/11

∗ To whom the correspondence should be addressed: [email protected]


A role of immune and inflammatory mechanisms in essential hypertension and cardiovascular disease–modeled and analyzed using Petri nets

A. Rybarczyk,1, 2, ∗ D. Formanowicz,3 and P. Formanowicz1, 2

1Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
3Department of Clinical Biochemistry and Laboratory Medicine, Poznan University of Medical Sciences, Grunwaldzka 6, 60-780 Poznan, Poland

Essential hypertension is the world’s most prevalent cardiovascular disorder; however, its etiology remains poorly understood, making it difficult to study. The evidence suggests that inflammation can lead to the development of hypertension and that oxidative stress and endothelial dysfunction are involved in the inflammatory cascade. In this work, to investigate the influence of these factors on the development of essential hypertension, a Petri net based model has been built and then analyzed. The analysis consisted of generating MCT-sets and t-clusters using a specifically selected clustering method and of performing knock-out analyses. The systems approach used in this research has enabled an in-depth analysis of the studied phenomenon and has allowed valuable biological conclusions to be drawn.
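The two analysis steps named above can be made concrete on a toy example. The sketch assumes that the t-invariants of a net have already been computed and are given as dictionaries mapping transition names to firing counts. It groups transitions that occur in exactly the same t-invariants (the usual definition of an MCT set) and lists which invariants are lost when a transition is knocked out. The invariants and transition names are invented; t-clustering is not covered.

```python
from collections import defaultdict


def mct_sets(t_invariants, transitions):
    """Group transitions that belong to exactly the same t-invariants."""
    groups = defaultdict(list)
    for t in transitions:
        signature = frozenset(
            i for i, inv in enumerate(t_invariants) if inv.get(t, 0) > 0
        )
        groups[signature].append(t)
    # Transitions covered by no invariant have an empty signature and are skipped.
    return [sorted(group) for signature, group in groups.items() if signature]


def knocked_out_invariants(t_invariants, transition):
    """Indices of the t-invariants that cannot occur without the given transition."""
    return [i for i, inv in enumerate(t_invariants) if inv.get(transition, 0) > 0]


# Invented t-invariants of a small net with transitions t1..t5.
INVARIANTS = [
    {"t1": 1, "t2": 1, "t3": 1},
    {"t1": 1, "t2": 1, "t4": 2},
]
TRANSITIONS = ["t1", "t2", "t3", "t4", "t5"]

print(mct_sets(INVARIANTS, TRANSITIONS))         # [['t1', 't2'], ['t3'], ['t4']]
print(knocked_out_invariants(INVARIANTS, "t4"))  # [1]
```

Here t1 and t2 always act together and form one functional block, whereas knocking out t4 disables only the second subprocess.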

ACKNOWLEDGMENTS

This research has been partially supported by the National Science Centre (Poland) grant No. 2012/07/B/ST6/01537.

∗ To whom the correspondence should be addressed: [email protected]


Camphor’s and Huperzine’s adventures in Proteinland

J. Rydzewski1, ∗ and W. Nowak1

1Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland

Biologically relevant ligands migrate between solvent environments and enzymatic active sites to fulfil their physiological roles. Theoretical prediction of such transport pathways is a challenging task1,2. Recently, we developed an advanced computational scheme based on memetic sampling (MS) during molecular dynamics simulations3. MS effectively predicts ligand pathways within crowded protein matrices. After dimensionality reduction of the ligand-protein conformational space and calculation of reaction coordinates4, we selected ligand exit pathways in two enzymatic systems: cytochrome P450cam-camphor and acetylcholinesterase-huperzine A. For these pathways, comprehensive metadynamics simulations5,6 revealed the free-energy barriers encountered by both ligands during unbinding from their enzymatic active sites. The chemical nature of these rate-limiting regions was determined.

ACKNOWLEDGMENTS

JR acknowledges funding (grants no. 2015/19/N/ST3/02171 and 2016/20/T/ST3/00488) from the National Science Centre, Poland, and UMK grants 2406-F, 2539-F. The results were obtained using the Interdisciplinary Centre for Modern Technologies computational facilities, NCU Torun, Poland.

∗ To whom the correspondence should be addressed: [email protected]
1 J. Rydzewski & W. Nowak. Phys. Life Rev. doi.org/10.1016/j.plrev.2017.03.003 (2017)
2 W. Nowak. Handbook of Computational Chemistry 2nd ed. J. Leszczynski, Springer (2017)
3 J. Rydzewski & W. Nowak. J. Chem. Phys. 143, 09B617 1 (2015)
4 J. Rydzewski & W. Nowak. J. Chem. Theory Comp. 12, 2110 (2016)
5 J. Rydzewski & W. Nowak. Sci. Rep., submitted (2017)
6 J. Rydzewski, R. Jakubowski, W. Nowak & H. Grubmueller, in preparation (2017)


Integrative Galaxy tool for genomic data visualization in JBrowse

K. Sienkiewicz1, ∗ and B. Wilczynski1

1University of Warsaw, Faculty of Mathematics, Informatics and Mechanics

We are creating an automated tool that exports data from the Galaxy web platform and imports it into a genome browser such as JBrowse. The main purpose is to automate the data flow between computational biomedical research and data visualization for projects that conduct genome analysis. We also want to give the user a simple way to manage access settings, keeping data secure while sharing and collaborating on it.

∗ To whom the correspondence should be addressed: [email protected]


Machine learning algorithms for the estimation of DNA repair processes in the Escherichia coli model and for significance of redox balance disturbance in muscular dystrophy patients

B. Sokolowska,1, ∗ A. M. Maciejewska,2 M. Dylewska,2 I. Niebroj-Dobosz,1 A. Madej-Pilarczyk,1 M. Hallay-Suszek,3 J. Kuzmierek,2 and B. Lesyng1, 4

1Mossakowski Medical Research Center, Polish Academy of Sciences, Warsaw, Poland
2Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
3Interdisciplinary Center for Mathematics and Computational Modelling, University of Warsaw, Poland
4Faculty of Physics, University of Warsaw, Poland

This work presents preliminary statistical analyses of effects observed in molecular biology and in clinical applications that result from oxidative stress, using machine learning algorithms. Oxidative stress is defined as the presence of active oxygen species in excess of the antioxidant buffering capacity. Reactive oxygen species may damage lipids, proteins, DNA and carbohydrates, changing their structure and functions. The first, microbiological project is focused on the analysis of the adaptive response system (Ada response) in Escherichia coli cells, which involves the expression of genes encoding four proteins: Ada, AlkA, AlkB and AidB.1 The Ada response is activated after exposure to non-toxic doses of alkylating agents. E. coli AlkB dioxygenase belongs to the superfamily of 2-oxoglutarate- and iron-dependent dioxygenases, which remove alkyl lesions from bases via an oxidative mechanism, restoring the native DNA structure. A new AlkB dioxygenase substrate recently discovered by us, 1,N6-α-hydroxypropanoadenine (HPA), is an exocyclic DNA adduct of acrolein, an environmental pollutant and endocellular oxidative stress product.2

The objective of the project is to estimate the significance of AlkB dioxygenase in the adaptive response system by applying pattern recognition algorithms. Differentiation between wild-type and alkB bacterial strains, with or without an induced Ada response system, in their ability to repair HPA is based on the k nearest neighbors (k-NN) classifier. The observed differences in mutation frequencies in the tested strains indicate that HPA is mutagenic and is repaired by AlkB dioxygenase in vivo (the calculated misclassification rates, Er, are the smallest ones). In the second, clinical project the algorithms are applied to the estimation of redox balance and of the significance of parameters/biomarkers such as total oxidant status (TOS), total antioxidant capacity (TAC) and the oxidative stress index (OSI, the ratio of TOS to TAC) in Emery-Dreifuss muscular dystrophy (EDMD) patients, which may have diagnostic/prognostic value.3 EDMD is a rare and very serious genetic disease: it affects the (skeletal) muscles used for movement as well as the heart (cardiac) muscle, and is caused by a deficit of lamin A/C or emerin.4 The perfect differentiation between healthy subjects and EDMD patients indicates that TOS (Er=0) and also TAC might help to recognize and assess the progress of the disease. In conclusion, the results of the microbiological studies (acrolein is a marker of oxidative stress in lipid peroxidation processes) and the clinical studies (oxidative stress resulting in oxidative forces that exceed the antioxidant systems due to loss of the balance between them, as in patients with EDMD) show that not only the classical statistical approach but also machine learning algorithms may provide valuable information and may be very helpful both for researchers and clinicians.
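How a misclassification rate Er can be obtained for a single biomarker is sketched below with a plain k-NN classifier evaluated by leave-one-out. The abstract does not state how Er was estimated or which k-NN variant was used, so the leave-one-out protocol, the Euclidean distance and k = 3 are assumptions, and the TOS values are made-up numbers, not patient data.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to query (Euclidean)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_error_rate(samples, k=3):
    """Leave-one-out misclassification rate Er of the k-NN classifier."""
    errors = sum(
        knn_predict(samples[:i] + samples[i + 1:], x, k) != y
        for i, (x, y) in enumerate(samples)
    )
    return errors / len(samples)


def oxidative_stress_index(tos, tac):
    """OSI defined as the ratio of TOS to TAC."""
    return tos / tac


# Made-up single-feature samples: (TOS,) with a class label.
SAMPLES = [
    ((1.0,), "control"), ((1.2,), "control"), ((0.9,), "control"), ((1.1,), "control"),
    ((3.0,), "EDMD"), ((3.3,), "EDMD"), ((2.9,), "EDMD"), ((3.1,), "EDMD"),
]

print(loo_error_rate(SAMPLES, k=3))  # 0.0: the two groups are perfectly separated
```

A feature or feature set with a smaller Er separates the groups better, which is how such rates can be used to rank biomarkers such as TOS, TAC and OSI.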

ACKNOWLEDGMENTS

Authors thank Profs. Iwona Fijałkowska and Piotr Jonczyk from Institute of Biochemistry and Biophysics PAS, for kindly providing the pIF plasmids for mutagenesis tests, and also Prof. Adam Jozwik from Institute of Biocybernetics and Biomedical Engineering PAS for releasing his k-NN software. This study was partially supported by National Science Centre, Poland, grant no UMO-2012/04/M/NZ1/00068. Computations and analyses were carried out using the computational infrastructure of the Biocentrum - Ochota project (POIG.02.03.00-00-0030/09).

∗ To whom the correspondence should be addressed: [email protected]
1 Mielecki D, Grzesiuk E: Ada response – a strategy for repair of alkylated DNA in bacteria. FEMS Microbiology Letters 2014, 355: 1–11
2 Dylewska M, Kusmierek JT, Pilzys T, Poznanski J, Maciejewska AM: 1,N6-α-hydroxypropanoadenine, the acrolein adduct to adenine, is a substrate for AlkB dioxygenase. Under revision
3 Niebroj-Dobosz I, Sokołowska B, Madej-Pilarczyk A, Marchel M, Hausmanowa-Petrusewicz I: Redox balance and the dysfunctional lamina in Emery-Dreifuss muscular dystrophy. Under revision
4 Madej-Pilarczyk A, Kochanski A: Emery-Dreifuss muscular dystrophy: the most recognizable laminopathy. Folia Neuropathologica 2016, 54(1): 1-8


Nets decomposition as base for biochemical similarity method.

B. Szawulak1, ∗ and P. Formanowicz1, 2

1Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland

Petri nets are a formalism widely used in many areas of computer science to model and analyze concurrent processes. In recent years there has been growing interest in applying nets of this type to the modeling and analysis of complex biological systems. Comparison of Petri net based models is still not a well-solved problem. An effective method for such a comparison would make it possible to determine common structures occurring in different biological systems, which may have a great impact on understanding their functions. Unfortunately, using standard graph comparison methods for Petri nets has several disadvantages, related to the higher complexity of Petri nets and to loss of information. Because of that, a dedicated method is required. In this work we propose a method for calculating the degree of similarity of two Petri nets by decomposing them into biologically meaningful subnets for structural comparison.

ACKNOWLEDGMENTS

This research has been partially supported by the National Science Centre (Poland) grant No. 2012/07/B/ST6/01537.

∗ To whom the correspondence should be addressed: [email protected]


Big data analysis of porcine spermatozoa transcriptome by RNA-seq

C. Pareek1, ∗ and L. Fraser2

1Centre for Modern Interdisciplinary Technologies, Nicolaus Copernicus University, ul. Wilenska 4, Torun 87-100, Poland

2Faculty of Animal Bio-engineering, University of Warmia and Mazury, Olsztyn, Poland

Revolutionary high-throughput next-generation genome sequencing (HT NGS) based RNA-seq technologies and advancements in bioinformatics tools have posed the challenging task of analysing big transcriptomics data of domestic animal genomes, including Sus scrofa.1 In this study the NextSeq 500 (Illumina) NGS platform was used to give a brief outline of the transcriptome analysis of spermatozoa of Polish Large White boars differing in semen freezability.2 Using advanced bioinformatics tools for big transcriptome data analysis, an experimental design to generate big data of boar spermatozoa has been implemented in this study. Furthermore, bioinformatics analysis of the big data has been used to develop new sets of expressed sequence tag databases (dbESTs) and single nucleotide polymorphism databases (dbSNPs) for the boar sperm transcriptome, as well as to develop novel EST and SNP markers for traits associated with gene expression profiling, particularly for transcriptome profiling of boar spermatozoa differing in freezability. It is envisaged that bioinformatics analysis of the big data will be used to create new sets of annotated and un-annotated genes for the boar sperm transcriptome. The analysis of the boar sperm transcriptome on RNA-seq data is required to elucidate the biological significance of the specific sperm-related gene transcripts in semen freezability.

ACKNOWLEDGMENTS

Supported by a NCN project in Poland (2015/19/B/NZ9/01333).

∗ To whom the correspondence should be addressed: [email protected]
1 Pareek C.S., Smoczynski R., Tretyn A. Sequencing technologies and genome sequencing. Journal of Applied Genetics, 2011, 52: 413-435
2 Fraser L. Sperm transcriptome profiling for assessment of boar semen freezability. IJASRM, 2016, 1(12): 9-12.


Haptic device facilitates big data analysis in structural biology

L. Peplowski,1, ∗ J. Rydzewski,1 and W. Nowak1

1Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland

Supercomputers and large UNIX clusters nowadays allow the generation of 100 ns molecular dynamics trajectories of large proteins per day.1 Data used in a single study may require petabytes of storage. Particularly difficult is the assessment of ligand diffusion paths within protein tunnels.2

In this communication we analyze the advantages of using electronic haptic devices (HD) for a crude determination of possible paths for ligand (i.e., drug) unbinding from biological cavities. Interactive molecular dynamics (IMD)3 implemented in the NAMD/VMD code4,5 and coupled with a mechanical robotic device6 allows a researcher to feel by hand the feedback coming from the interaction between a protein and a ligand. The user may catch the ligand and effectively probe possible low-hindrance transport paths within the protein matrix in real time. There are reports that in advanced IMD systems researchers managed to steer molecular systems of 1.7 M atoms at about 25 Hz using 384 CPU cores.7 Our experience shows that a plausible path in a protein of some 3000 atoms may be found within 10 minutes of work with the IMD/HD system running on a standard Linux graphical workstation. During this short period, 120 Gb of visual data is processed and reduced to a few kb of useful force-distance plots. Achieving similar results without the interactive use of the human brain and sense of touch is still much more time-demanding and may take a few days on a standard 100 core cluster.8,9

ACKNOWLEDGMENTS

JR thanks NCN for the financial support (grants 2015/19/N/ST3/02171 and 2016/20/T/ST3/00488). The infrastructure of NCU Interdisciplinary Centre for Modern Technologies was used in this study.

∗ To whom the correspondence should be addressed: [email protected]
1 W. Nowak, in Handbook of Computational Chemistry, ed. J. Leszczynski, Springer (2017)
2 J. Rydzewski & W. Nowak. Phys. Life Rev. (2017)
3 P. Grayson, E. Tajkhorshid & K. Schulten. Biophys. J. 85, 36 (2003)
4 W. Humphrey, A. Dalke & K. Schulten. J. Mol. Graph. 14, 33 (1996)
5 J. C. Phillips et al. J. Comp. Chem. 26, 1781 (2005)
6 http://www.geomagic.com/en/products-landing-pages/haptic/
7 M. Dreher et al. Procedia Computer Science 18, 20 (2013)
8 J. Rydzewski & W. Nowak. J. Chem. Phys. 143, 09B617 1 (2015)
9 J. Rydzewski & W. Nowak. J. Chem. Theory Comp. 12, 2110 (2016)


RNAComposer allows the user to improve accuracy of predicted RNA 3D structures

T. Zok,1, ∗ M. Antczak,1 M. Popenda,2 J. Sarzynska,2 T. Ratajczak,1 K. Tomczyk,1 R. W. Adamiak,1, 2 and M. Szachniuk1, 2

1 Institute of Computing Science, Poznan University of Technology, Poland
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poland

Cellular functions of RNAs strongly depend on their three-dimensional structures. These structures are either determined by combining experimental and computational methods or predicted by in silico algorithms. Owing to significant progress in automated RNA 3D structure prediction over the last decade, predicted 3D models have often become the starting point of experimental research.

RNAComposer (http://rnacomposer.cs.put.poznan.pl, http://rnacomposer.ibch.poznan.pl) is one of the most popular fully automated predictors dedicated to RNA worldwide. Due to its user-friendliness, efficiency, and capacity to predict high-resolution 3D structures of large RNAs, it continues to attract interest among experimental molecular biologists. As reported throughout the literature, RNAComposer is regularly applied to reliably predict initial 3D models of unknown RNAs, which are then treated as templates and processed, among other ways, by applying restraints derived from experimental data. So far, post-processing of predictions has been performed manually using external computational tools. Now the RNAComposer web server provides a platform for both automated and user-adjusted RNA 3D structure prediction. It allows users to modify preliminary structure fragments, apply their own structural elements, define atom distance and torsion angle restraints, etc. In the hands of experimentalists, these mechanisms greatly facilitate access to accurate RNA 3D models. We will exemplify them by modeling 3D structures of miR160 precursors.

∗ To whom the correspondence should be addressed: [email protected]


Graph theoretical models of the problem of chemical compounds structural formulas construction

A. Antkowiak1, ∗ and P. Formanowicz1, 2

1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland.
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.

In mass spectrometry, there arises the problem of constructing structural formulas of chemical compounds on the basis of information about the number of atoms of various elements and their valencies. One possible approach to solving this problem is based on graph theory. A known problem is the construction of a simple graph with given vertex degrees. An instance of this problem consists of the number of vertices of the graph to be constructed and a multiset of their degrees. This problem corresponds to a simplified version of the problem of structural formula construction. Here, the vertices are counterparts of the atoms, their degrees correspond to valencies, and the edges of the constructed graph represent chemical bonds. We consider some extensions of the graph problem that can be more realistic models of the chemical one. Among others, we allow parallel edges, which results in the construction of a multigraph. Moreover, instead of a multiset of positive integers (i.e., vertex degrees), we assume that a multiset of sets of such numbers is given. The parallel edges correspond to multiple bonds between pairs of atoms, while the sets of integers describe the possible valencies of the elements (since most of them have more than one valence). The considered graph problems are a basis for the construction of more precise algorithms solving the problem of structural formula construction.
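The simple-graph version of the problem can be illustrated with a short sketch. The abstract does not say which algorithms the authors use; the code below assumes the classical Havel-Hakimi procedure for the fixed-degree case and a plain exhaustive search over valency choices for the "multiset of sets" extension. The function names are hypothetical, and parallel edges (multiple bonds) are not handled here.

```python
from itertools import product


def havel_hakimi(degrees):
    """Build a simple graph with the given vertex degrees, or return None.

    degrees[i] is the required degree (valency) of vertex i.
    Returns a sorted list of edges (u, v) with u < v.
    """
    remaining = {v: d for v, d in enumerate(degrees)}
    edges = []
    while True:
        # Vertices whose degree is already satisfied take no further edges.
        remaining = {v: d for v, d in remaining.items() if d > 0}
        if not remaining:
            return sorted(edges)
        # Pick the vertex with the largest remaining degree
        # (ties broken by the smaller index).
        v = max(remaining, key=lambda u: (remaining[u], -u))
        d = remaining.pop(v)
        if d > len(remaining):
            return None  # not enough unsaturated vertices left
        # Connect it to the d vertices with the largest remaining degrees.
        neighbours = sorted(remaining, key=lambda u: (-remaining[u], u))[:d]
        for u in neighbours:
            remaining[u] -= 1
            edges.append((min(u, v), max(u, v)))


def construct_from_valency_sets(valency_sets):
    """Try every choice of one valency per atom (exhaustive, assumption of this sketch).

    Returns (chosen_valencies, edges) for the first realizable choice, else None.
    """
    for choice in product(*(sorted(s) for s in valency_sets)):
        edges = havel_hakimi(choice)
        if edges is not None:
            return choice, edges
    return None


# One tetravalent atom and four monovalent atoms (a methane-like skeleton).
print(havel_hakimi([4, 1, 1, 1, 1]))
# An atom with possible valencies {2, 4} and two monovalent atoms.
print(construct_from_valency_sets([{2, 4}, {1}, {1}]))
```

Note that a realization found this way need not be connected, whereas a structural formula of a single molecule must be; a connectivity check would be an additional step.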

∗ To whom the correspondence should be addressed: [email protected]


Modeling and analysis of disorders in prothrombotic states as a reason of atherosclerosis development using Petri net-based approach

K. Chmielewska,1, ∗ D. Formanowicz,2 and P. Formanowicz1, 3

1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2 Department of Clinical Biochemistry and Laboratory Medicine, Poznan University of Medical Sciences, Grunwaldzka 6, 60-780 Poznan, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland

Atherosclerosis, one of the cardiovascular diseases (CVDs), is the major cause of morbidity and mortality worldwide. Knowledge about this multifocal immunoinflammatory disease of arteries is still incomplete. To better understand this complex biological phenomenon and to systematize the existing knowledge, a systems approach has been used. Because many mechanisms influence atherosclerosis development, this research has focused on selected subprocesses: disorders in prothrombotic states and the extrinsic pathway of coagulation. An analysis of the proposed model distinguished significant components of the modeled system, e.g., thrombin (the main enzyme of coagulation) and the tissue factor (a specific cellular lipoprotein). These mechanisms and components are closely associated with the formation of a thrombus, which is one of the contributing risk factors leading to the development of atherosclerosis and other cardiovascular diseases. This complex biological process has been modeled using Petri nets. The analysis of the proposed model has been based on t-invariants. It allowed us to determine the biological meaning of components of the modeled system. Moreover, this systems approach confirmed the important role of disorders in prothrombotic states and the coagulation process in atherosclerosis development through thrombus formation.
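For readers unfamiliar with t-invariants, a minimal sketch may help. A t-invariant is a non-zero vector x of non-negative integers with C·x = 0, where C is the place-by-transition incidence matrix: firing every transition j exactly x[j] times returns the net to its starting marking. The model from the abstract is not reproduced here; the three-transition net below is an invented toy, the function names are hypothetical, and the brute-force search is only sound when `max_count` is large enough to reach every minimal invariant.

```python
from functools import reduce
from itertools import product
from math import gcd


def is_t_invariant(incidence, x):
    """True if x is a non-zero, non-negative integer vector with C * x = 0."""
    if any(n < 0 for n in x) or not any(x):
        return False
    return all(sum(c * n for c, n in zip(row, x)) == 0 for row in incidence)


def minimal_t_invariants(incidence, max_count=2):
    """Brute-force search for minimal-support t-invariants.

    Only firing counts 0..max_count are tried (an assumption of this sketch).
    """
    n_transitions = len(incidence[0])
    candidates = [
        x
        for x in product(range(max_count + 1), repeat=n_transitions)
        # Require gcd 1 so that multiples of an invariant are skipped.
        if is_t_invariant(incidence, x) and reduce(gcd, x) == 1
    ]
    supports = [frozenset(j for j, n in enumerate(x) if n) for x in candidates]
    # Keep an invariant only if no other candidate has a strictly smaller support.
    return [
        x
        for x, s in zip(candidates, supports)
        if not any(other < s for other in supports)
    ]


# Toy net: rows are places p0, p1; columns are transitions t0, t1, t2.
# t0 and t2 both move a token from p0 to p1; t1 moves it back.
toy_incidence = [
    [-1, 1, -1],
    [1, -1, 1],
]
print(minimal_t_invariants(toy_incidence))
```

Each minimal t-invariant of the toy net corresponds to one cycle (t0 with t1, or t2 with t1); in a biological model, such invariants are read as elementary subprocesses.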

ACKNOWLEDGMENTS

This research has been partially supported by the National Science Centre (Poland) grant No. 2012/07/B/ST6/01537.

∗ To whom the correspondence should be addressed: [email protected]


Bioinformatics Study of Structural Patterns in Plant MicroRNA

J. Miskiewicz1, ∗ and M. Szachniuk2

1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, Poland
2 Institute of Bioorganic Chemistry PAS

MicroRNA (miRNA), a small non-coding molecule (19-24 nt), is related to both positive and negative aspects of the lives of various organisms. The amount of miRNA produced within an organism is highly correlated with key processes in humans, plants, and animals, and determines whether the system works properly or defectively.

Before miRNA reaches its mature form, it appears as a primary transcript and then as a precursor structure. Biogenesis starts in the nucleus, where the long hairpin loop structure containing the microRNA is trimmed to a shorter version. Using specific enzymes, the resulting molecule is transferred into the cytoplasm and further processed by molecular rulers. In animals this role is fulfilled by the Dicer enzyme; in plants, Dicer-like 1 (DCL1) is responsible for performing the cleavages in the miRNA maturation process. Since animal miRNA biogenesis is almost fully understood, it would seem that the same process in a different organism should also be almost completely known. The situation is quite different, however: the last phase of miRNA maturation in plants, the recognition of miRNA in pre-miRNA structures by DCL1, remains an enigma.

Herein we present a bioinformatic approach to discovering specific motifs in the closest vicinity of plant microRNAs. We believe that the sequence or secondary structure of pre-miRNA contains a motif, or motifs, that guide the DCL1 enzyme to perform a cleavage right before the miRNA occurrence. To test our hypotheses, we use known bioinformatic programs and develop scripts to analyse data taken from miRBase, the database of miRNA precursors.
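One simple way to look for sequence motifs near the cleavage site is to count short k-mers in a fixed window upstream of the miRNA start across many precursors. The sketch below is only an illustration of that idea, not the authors' scripts: the function name, the window and k-mer sizes, and the three sequences are invented for the example (they are not miRBase entries), and secondary structure is not considered.

```python
from collections import Counter


def upstream_kmer_counts(precursors, k=3, window=8):
    """Count k-mers found in the `window` nucleotides preceding each miRNA start.

    precursors: iterable of (precursor_sequence, mirna_start) pairs,
                where mirna_start is a 0-based index into the sequence.
    Each k-mer is counted at most once per precursor, so a single
    repetitive flank cannot dominate the totals.
    """
    counts = Counter()
    for sequence, mirna_start in precursors:
        flank = sequence[max(0, mirna_start - window):mirna_start]
        kmers = {flank[i:i + k] for i in range(len(flank) - k + 1)}
        counts.update(kmers)
    return counts


# Invented toy precursors: (sequence, 0-based start of the mature miRNA).
toy_precursors = [
    ("GGAUCCGAUGGCUUAGCA", 10),
    ("CCGAUGAAUCG", 6),
    ("UUGAUGCC", 6),
]
for kmer, n in upstream_kmer_counts(toy_precursors).most_common(4):
    print(kmer, n)
```

K-mers shared by most precursors would be candidates for a closer look; on real data they would have to be compared against a background frequency before being called motifs.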

∗ To whom the correspondence should be addressed: [email protected]


A comparison of Petri net-based models of monocyte-macrophage axis in steady-state and in inflammation

K. Rzosinska,1, ∗ D. Formanowicz,2 and P. Formanowicz1, 3

1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2 Department of Clinical Biochemistry and Laboratory Medicine, Poznan University of Medical Sciences, Grunwaldzka 6, 60-780 Poznan, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland

Monocytes are circulating leukocytes that are key players in tissue homeostasis and immunity. The major monocyte subset is referred to as classical monocytes. During inflammation these monocytes are rapidly recruited to invade the inflamed tissue and contribute to immunological responses, such as recognizing and removing microorganisms and dying cells. On the other hand, non-classical monocytes display a patrolling behavior and constantly survey the endothelium as a part of the innate local control. Due to the high complexity of these phenomena, there are many unknowns. Therefore, using Petri net theory, two models of the monocyte-macrophage axis, in steady-state and during inflammation, have been created and analyzed. The analysis has been based mainly on t-invariants, i.e., t-clusters and MCT sets have been calculated and investigated in detail. A comparison of the models helped to better understand these complex issues.
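As a brief illustration of one of these tools: MCT (maximal common transition) sets are commonly defined as groups of transitions that take part in exactly the same t-invariants. The sketch below assumes that definition and assumes the t-invariants have already been computed; the function name and the example invariants are hypothetical and unrelated to the models in the abstract.

```python
from collections import defaultdict


def mct_sets(t_invariants):
    """Group transitions that occur in exactly the same t-invariants.

    t_invariants: list of equal-length vectors of firing counts,
                  one vector per t-invariant.
    Returns a sorted list of transition-index groups; transitions that
    appear in no invariant are left out.
    """
    n_transitions = len(t_invariants[0])
    groups = defaultdict(list)
    for j in range(n_transitions):
        # The set of invariants in which transition j fires at least once.
        membership = frozenset(
            i for i, x in enumerate(t_invariants) if x[j] > 0
        )
        groups[membership].append(j)
    return sorted(group for member, group in groups.items() if member)


# Two made-up invariants over four transitions t0..t3.
example_invariants = [
    (1, 1, 0, 0),
    (1, 1, 1, 1),
]
print(mct_sets(example_invariants))
```

Here t0 and t1 always occur together, as do t2 and t3, so each pair forms one MCT set that can be interpreted as a single functional module.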

ACKNOWLEDGMENTS

This research has been partially supported by the National Science Centre (Poland) grant No. 2012/07/B/ST6/01537.

∗ To whom the correspondence should be addressed: [email protected]


LCS-TA to identify similarity in molecular structures

J. Wiedemann,1, ∗ T. Zok,1, 2 M. Milostan,1, 2 and M. Szachniuk1, 3

1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2 Poznan Supercomputing and Networking Center, Jana Pawla II 10, 61-139 Poznan, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Z. Noskowskiego 12/14, 61-704 Poznan, Poland

Identification of common features and differences in biomolecule structures is an important task whose solution requires bioinformatics methods. There is a need to develop and tune similarity measures to better analyse and evaluate structures, especially those predicted by computational approaches. Here, we present LCS-TA, a new method to detect local structural similarity. It finds the longest continuous segments in 3D structures that are folded in a like manner. The folds are compared in torsion angle space, and the measure of similarity is computed as the length of a segment.
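The core idea can be sketched in a few lines. The abstract gives no implementation details, so the code below is a guess at the simplest variant: it assumes the two structures have the same number of residues in one-to-one correspondence, that each residue is described by a tuple of torsion angles in degrees, and that two residues match when every angle differs by at most a fixed threshold. The function names and the 15-degree default are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def longest_similar_segment(torsions_a, torsions_b, threshold=15.0):
    """Find the longest run of consecutive residues with similar torsion angles.

    torsions_a, torsions_b: equal-length lists; element i is a tuple of
        torsion angles (degrees) of residue i in each structure.
    Returns (start_index, length) of the longest matching segment.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (res_a, res_b) in enumerate(zip(torsions_a, torsions_b)):
        if all(angle_diff(a, b) <= threshold for a, b in zip(res_a, res_b)):
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # the segment must be continuous
    return best_start, best_len


# Made-up two-angle descriptions of six residues in two models.
model_a = [(-60, -45), (-60, -45), (-120, 130), (-60, -45), (-60, -45), (-60, -45)]
model_b = [(-65, -40), (100, 0), (-115, 135), (-55, -50), (-62, -44), (179, -45)]
print(longest_similar_segment(model_a, model_b))
```

Comparing angles modulo 360 degrees matters because torsion angles are periodic: 179 and -179 degrees differ by 2 degrees, not 358.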

∗ To whom the correspondence should be addressed: [email protected]
