e-Science Data Information and Knowledge Transformation

BinX – A Tool for Binary File Access

eDIKT project team
Ted Wen, tedwen@edikt.org
Robert Carroll

Mar 27, 2015




Agenda

About the BinX project
Introduction to the BinX language
Introduction to the BinX library
Example application
Overview of the BinX API
Discussion


The problem

Most scientific data are stored in binary files
Binary data files are not all standardized
Binary data files are platform-dependent
XML is useful for representing metadata
Scientific datasets can be too large to represent in XML


What is BinX?

Binary in XML
– Annotation language: uses XML, descriptive, low-level
– Software components: BinX library, generic utilities, API


How and Why BinX is used

[Figure: without BinX, each binary data file is read by its own special application program; with BinX, binary data described by a <dataset> … </dataset> document is read through the BinX library by many application programs.]


The BinX Language

Annotating a binary data stream
– Mark up data types
– Mark up sequences
– Mark up arrays
– Complex structures


Data elements

Primitive data elements
– Byte, character, integer, real
Complex data elements
– Arrays, struct, union
User-defined data elements


Primitive Data Types

Character
– <character-8>
– <string> (fixed length, variable length and delimited)
Integer
– <byte-8>
– <short-16>, <unsignedShort-16>
– <integer-32>, <unsignedInteger-32>
– <long-64>, <unsignedLong-64>
Real
– <float-32>
– <double-64>
– <quadruple-128>
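Every fixed-size primitive above carries its width in bits after the hyphen, so its storage size can be read straight off the name. The sketch below relies only on that naming convention; `binxTypeSize` is a hypothetical helper, not part of the BinX library, and it deliberately rejects variable-length types such as `<string>`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: storage size in bytes of a BinX primitive type,
// taken from the bit width after the last '-' in its name ("short-16" -> 2).
// Names without a width (e.g. "string") are rejected.
std::size_t binxTypeSize(const std::string& typeName) {
    const std::size_t dash = typeName.rfind('-');
    if (dash == std::string::npos || dash + 1 == typeName.size()) {
        throw std::invalid_argument("no bit width in type name: " + typeName);
    }
    const unsigned long bits = std::stoul(typeName.substr(dash + 1));
    assert(bits % 8 == 0 && "primitive widths are whole bytes");
    return static_cast<std::size_t>(bits / 8);
}
```

For example, `binxTypeSize("double-64")` returns 8 and `binxTypeSize("quadruple-128")` returns 16.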


Primitive Data Types

Mark up data types

1. <short-16 byteOrder="littleEndian">32767</short-16>
2. <integer-32 byteOrder="bigEndian">2147483647</integer-32>
3. <float-32 byteOrder="littleEndian">100.0</float-32>
4. <float-32 byteOrder="bigEndian">100.0</float-32>

Resulting byte stream (hex), grouped by example:
1: FF 7F   2: 7F FF FF FF   3: 00 00 C8 42   4: 42 C8 00 00
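The 14-byte stream above can be reproduced by writing each value most or least significant byte first. This is a minimal sketch that does not use the BinX library; the helper names (`appendBytes`, `slideStream`, etc.) are hypothetical, and the float case assumes the host uses IEEE-754 single precision, under which 100.0 has the bit pattern 0x42C80000.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

enum class ByteOrder { LittleEndian, BigEndian };

// Appends the low `nbytes` bytes of `bits` to `out` in the requested order.
void appendBytes(std::vector<std::uint8_t>& out, std::uint64_t bits,
                 int nbytes, ByteOrder order) {
    assert(nbytes >= 1 && nbytes <= 8);
    for (int i = 0; i < nbytes; ++i) {
        // Little endian emits the least significant byte first,
        // big endian the most significant byte first.
        const int shift = (order == ByteOrder::LittleEndian) ? 8 * i
                                                             : 8 * (nbytes - 1 - i);
        out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFFu));
    }
}

void appendShort16(std::vector<std::uint8_t>& out, std::int16_t v, ByteOrder o) {
    appendBytes(out, static_cast<std::uint16_t>(v), 2, o);
}

void appendInteger32(std::vector<std::uint8_t>& out, std::int32_t v, ByteOrder o) {
    appendBytes(out, static_cast<std::uint32_t>(v), 4, o);
}

void appendFloat32(std::vector<std::uint8_t>& out, float v, ByteOrder o) {
    static_assert(sizeof(float) == 4, "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);  // IEEE-754 bit pattern of v
    appendBytes(out, bits, 4, o);
}

// Rebuilds the stream for examples 1-4.
std::vector<std::uint8_t> slideStream() {
    std::vector<std::uint8_t> s;
    appendShort16(s, 32767, ByteOrder::LittleEndian);      // FF 7F
    appendInteger32(s, 2147483647, ByteOrder::BigEndian);  // 7F FF FF FF
    appendFloat32(s, 100.0f, ByteOrder::LittleEndian);     // 00 00 C8 42
    appendFloat32(s, 100.0f, ByteOrder::BigEndian);        // 42 C8 00 00
    return s;
}
```

The same value yields different bytes depending only on `byteOrder`, which is exactly the information the BinX attribute records.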


Abstract “struct” types

Mark up a sequence

<struct>
  <unsignedShort-16 />
  <unsignedShort-16 />
  <byte-8 />
  <byte-8 />
  <byte-8 />
</struct>

Screen descriptor in GIF:
– Screen width: unsigned short
– Screen height: unsigned short
– Packed field: byte
– Background colour index: byte
– Pixel aspect ratio: byte
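To show how the `<struct>` maps onto bytes, here is a sketch that decodes the seven-byte screen descriptor by hand. It assumes GIF's little-endian storage of the two 16-bit fields (the markup above states no byte order, so a complete BinX document would need `byteOrder="littleEndian"`) and reads the `<byte-8>` fields as unsigned octets. All names are hypothetical; this is not BinX library code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Mirrors the <struct> above, field by field.
struct ScreenDescriptor {
    std::uint16_t width;            // <unsignedShort-16 />
    std::uint16_t height;           // <unsignedShort-16 />
    std::uint8_t packedFields;      // <byte-8 />
    std::uint8_t backgroundColour;  // <byte-8 />
    std::uint8_t pixelAspectRatio;  // <byte-8 />
};

// GIF stores 16-bit integers least significant byte first.
std::uint16_t readU16LE(const std::vector<std::uint8_t>& b, std::size_t at) {
    assert(at + 1 < b.size());
    return static_cast<std::uint16_t>(b[at] | (b[at + 1] << 8));
}

// Decodes the 7-byte descriptor at `offset`; in a GIF file it follows the
// 6-byte signature ("GIF87a" or "GIF89a"), so offset is 6.
ScreenDescriptor parseScreenDescriptor(const std::vector<std::uint8_t>& bytes,
                                       std::size_t offset) {
    if (bytes.size() < offset + 7) {
        throw std::out_of_range("buffer too short for a screen descriptor");
    }
    ScreenDescriptor d{};
    d.width = readU16LE(bytes, offset);
    d.height = readU16LE(bytes, offset + 2);
    d.packedFields = bytes[offset + 4];
    d.backgroundColour = bytes[offset + 5];
    d.pixelAspectRatio = bytes[offset + 6];
    return d;
}
```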


Abstract “array” types

Mark up an array

<arrayFixed>
  <integer-32 />
  <dim indexTo="99">
    <dim indexTo="9" />
  </dim>
</arrayFixed>

A two-dimensional array of 32-bit integers: an outer dimension of 100 elements (indices 0–99) containing an inner dimension of 10 elements (indices 0–9), 1,000 integers in total.
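Locating one element of such an array reduces to offset arithmetic. This sketch assumes indices start at 0, that the outer `<dim>` varies slowest (row-major order, as in C) and that the 4-byte integers are packed without padding; `elementOffset` and the constants are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Dimensions from the markup: the outer <dim indexTo="99"> has 100 entries
// and encloses the inner <dim indexTo="9" /> with 10 entries.
constexpr std::size_t kOuter = 99 + 1;
constexpr std::size_t kInner = 9 + 1;
constexpr std::size_t kElemBytes = 4;  // <integer-32 />

// Total size of the array in the binary stream.
constexpr std::size_t kArrayBytes = kOuter * kInner * kElemBytes;
static_assert(kArrayBytes == 4000, "1,000 integers of 4 bytes each");

// Byte offset of element [i][j] from the start of the array, assuming the
// outer dimension varies slowest (row-major) and there is no padding.
std::size_t elementOffset(std::size_t i, std::size_t j) {
    if (i >= kOuter || j >= kInner) {
        throw std::out_of_range("index outside the declared dimensions");
    }
    return (i * kInner + j) * kElemBytes;
}
```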


Embedded abstract types

Complex structures

<struct>
  <short-16 />
  <arrayFixed>
    <byte-8 />
    <dim indexTo="7" />
  </arrayFixed>
  <struct>
    <integer-32 />
    <float-32 />
    <double-64 />
  </struct>
</struct>
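Because BinX describes a byte stream rather than an in-memory C structure, the nested fields can be expected to sit back to back with no alignment padding. Under that assumption, flattening the example and summing field sizes gives 2 + 8 + 4 + 4 + 8 = 26 bytes. The `Field` table and the function names below are hypothetical illustrations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One leaf of the flattened layout (hypothetical helper type).
struct Field {
    std::string path;   // position in the nesting, for readability
    std::size_t bytes;  // size of one value
    std::size_t count;  // number of values: array length, or 1 for a scalar
};

// The nested <struct> above, flattened in document order.
const std::vector<Field> kLayout = {
    {"short-16", 2, 1},
    {"arrayFixed/byte-8", 1, 8},  // <dim indexTo="7" /> gives 8 elements
    {"struct/integer-32", 4, 1},
    {"struct/float-32", 4, 1},
    {"struct/double-64", 8, 1},
};

// Offset of each field, assuming values follow one another with no
// alignment padding (unlike a C compiler's struct layout).
std::vector<std::size_t> fieldOffsets(const std::vector<Field>& layout) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    for (const Field& f : layout) {
        offsets.push_back(pos);
        pos += f.bytes * f.count;
    }
    return offsets;
}

// Total number of bytes the structure occupies in the stream.
std::size_t totalSize(const std::vector<Field>& layout) {
    std::size_t n = 0;
    for (const Field& f : layout) n += f.bytes * f.count;
    return n;
}
```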


User-defined metadata

Label the data types and structures

<struct varName="Data Sample">
  <short-16 varName="ID" />
  <arrayFixed varName="List of 10 complex numbers">
    <struct varName="Complex">
      <float-32 varName="Real" />
      <float-32 varName="Imaginary" />
    </struct>
    <dim indexTo="9" />
  </arrayFixed>
</struct>
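The `varName` labels map naturally onto named members of a C++ type. This sketch decodes one "Data Sample" (2 + 10 × 8 = 82 bytes) into such a type. The markup gives no byte order, so little-endian storage and IEEE-754 floats are assumptions here, and the type and function names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct Complex {      // varName="Complex"
    float real;       // varName="Real"
    float imaginary;  // varName="Imaginary"
};

struct DataSample {                  // varName="Data Sample"
    std::int16_t id;                 // varName="ID"
    std::array<Complex, 10> values;  // varName="List of 10 complex numbers"
};

// Reads an n-byte little-endian unsigned value at pos and advances pos.
std::uint32_t readLE(const std::vector<std::uint8_t>& b, std::size_t& pos, std::size_t n) {
    assert(n <= 4);
    if (pos + n > b.size()) throw std::out_of_range("truncated Data Sample");
    std::uint32_t v = 0;
    for (std::size_t k = 0; k < n; ++k) {
        v |= static_cast<std::uint32_t>(b[pos + k]) << (8 * k);
    }
    pos += n;
    return v;
}

// Reinterprets a 32-bit pattern as an IEEE-754 float.
float bitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Decodes one 82-byte "Data Sample": 2 bytes of ID, then 10 x 8 bytes.
DataSample readDataSample(const std::vector<std::uint8_t>& b) {
    std::size_t pos = 0;
    DataSample s{};
    s.id = static_cast<std::int16_t>(readLE(b, pos, 2));
    for (Complex& c : s.values) {
        c.real = bitsToFloat(readLE(b, pos, 4));
        c.imaginary = bitsToFloat(readLE(b, pos, 4));
    }
    return s;
}
```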


Reusable type definitions

Define macros for reuse

<definitions>
  <defineType typeName="FourCC">
    <arrayFixed>
      <character-8 />
      <dim count="4" />
    </arrayFixed>
  </defineType>
</definitions>

<struct varName="Wave_Header">
  <useType typeName="FourCC" varName="Keyword" />
  <integer-32 varName="Chunk_Size" />
</struct>
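`Wave_Header` describes the start of a RIFF chunk: a four-character code followed by a 32-bit size. The sketch reads it by hand, assuming RIFF's little-endian integers (the BinX fragment above gives no byte order). `readFourCC`, `WaveHeader` and `readWaveHeader` are hypothetical names, not BinX API calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// "FourCC": an <arrayFixed> of four <character-8>.
std::string readFourCC(const std::vector<std::uint8_t>& b, std::size_t at) {
    if (at + 4 > b.size()) throw std::out_of_range("truncated FourCC");
    const auto first = b.begin() + static_cast<std::ptrdiff_t>(at);
    return std::string(first, first + 4);
}

struct WaveHeader {
    std::string keyword;     // varName="Keyword" (useType FourCC)
    std::int32_t chunkSize;  // varName="Chunk_Size"
};

// Reads Wave_Header from the start of the buffer; RIFF integers are little-endian.
WaveHeader readWaveHeader(const std::vector<std::uint8_t>& b) {
    if (b.size() < 8) throw std::out_of_range("truncated Wave_Header");
    WaveHeader h;
    h.keyword = readFourCC(b, 0);
    std::uint32_t raw = 0;
    for (int i = 0; i < 4; ++i) {
        raw |= static_cast<std::uint32_t>(b[4 + i]) << (8 * i);
    }
    h.chunkSize = static_cast<std::int32_t>(raw);
    return h;
}
```

Defining `FourCC` once and reusing it with `useType` corresponds here to calling `readFourCC` wherever a four-character code appears.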


Linking to binary data

Reference the binary data file

<definitions>
  <defineType typeName="Header">… …</defineType>
  <defineType typeName="Format_Chunk">… …</defineType>
  <defineType typeName="Data_Chunk">… …</defineType>
</definitions>

<dataset src="myfile.wav">
  <useType typeName="Header" />
  <useType typeName="Format_Chunk" />
  <useType typeName="Data_Chunk" />
</dataset>


The BinX document

<?xml version="1.0"?>
<binx xmlns="http://www.edikt.org/binx">
  <dataset src="binary.bin" byteOrder="littleEndian">
    <short-16/>
    <integer-32/>
    <double-64/>
  </dataset>
</binx>
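What this document tells a reader to do can be spelled out directly: take 2, 4 and 8 bytes from `binary.bin`, least significant byte first. The sketch below does that with the C++ standard library only (no BinX library calls); it assumes IEEE-754 doubles, and `Record`, `readRecord` and `readRecordFromBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Values described by the <dataset> above, in stream order.
struct Record {
    std::int16_t s;  // <short-16/>
    std::int32_t i;  // <integer-32/>
    double d;        // <double-64/>
};

// Reads n bytes and assembles them least significant byte first, as
// byteOrder="littleEndian" requires, whatever the host's own byte order.
std::uint64_t readLittleEndian(std::istream& in, int n) {
    assert(n >= 1 && n <= 8);
    std::uint64_t v = 0;
    for (int k = 0; k < n; ++k) {
        const int c = in.get();
        if (c == std::char_traits<char>::eof()) {
            throw std::runtime_error("binary stream ended early");
        }
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(c)) << (8 * k);
    }
    return v;
}

Record readRecord(std::istream& in) {
    Record r{};
    r.s = static_cast<std::int16_t>(readLittleEndian(in, 2));
    r.i = static_cast<std::int32_t>(readLittleEndian(in, 4));
    const std::uint64_t bits = readLittleEndian(in, 8);
    static_assert(sizeof(double) == 8, "expects a 64-bit double");
    std::memcpy(&r.d, &bits, sizeof r.d);  // assumes IEEE-754 doubles
    return r;
}

// Convenience wrapper for bytes already held in memory.
Record readRecordFromBytes(const std::string& bytes) {
    std::istringstream in(bytes);
    return readRecord(in);
}
```

To read the file itself, open it in binary mode, e.g. `std::ifstream in("binary.bin", std::ios::binary);` (from `<fstream>`), and pass `in` to `readRecord`.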


A BinX document

<binx byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X" />
  </dataset>
</binx>

Parts of the document: root element (<binx>), data class section (<definitions>), data instance section (<dataset>), abstract data type (<arrayFixed>).


DataBinX

DataBinX = BinX with data

<dataset src="myfile.bin">
  <struct>
    <short-16 />
    <long-64 />
    <double-64 />
  </struct>
  <arrayFixed>
    <integer-32 />
    <dim count="2" />
  </arrayFixed>
</dataset>

The corresponding DataBinX, with the values read from the binary file:

<dataset>
  <struct>
    <short-16>100</short-16>
    <long-64>1000</long-64>
    <double-64>5.257</double-64>
  </struct>
  <arrayFixed>
    <dim>
      <integer-32>1</integer-32>
    </dim>
    <dim>
      <integer-32>2</integer-32>
    </dim>
  </arrayFixed>
</dataset>
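The step from decoded values to DataBinX can be sketched as plain string building. This is not the BinX library's serializer: `toDataBinX` is a hypothetical function hard-wired to this example's structure, it writes no indentation or escaping, and it uses the stream's default six significant digits for the double, which is enough for 5.257 but not for round-tripping arbitrary values (that would need `std::setprecision(std::numeric_limits<double>::max_digits10)`).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Builds a DataBinX fragment in the shape shown above: the BinX element
// names stay, and each element now carries its decoded value.
std::string toDataBinX(std::int16_t s, std::int64_t l, double d,
                       const std::vector<std::int32_t>& ints) {
    std::ostringstream xml;
    xml << "<dataset>";
    xml << "<struct>"
        << "<short-16>" << s << "</short-16>"
        << "<long-64>" << l << "</long-64>"
        << "<double-64>" << d << "</double-64>"
        << "</struct>";
    xml << "<arrayFixed>";
    for (std::int32_t v : ints) {
        // One <dim> wrapper per array element, as in the example output.
        xml << "<dim><integer-32>" << v << "</integer-32></dim>";
    }
    xml << "</arrayFixed>";
    xml << "</dataset>";
    return xml.str();
}
```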


The BinX Library

Core library

Utilities

Applications


Output from the library

DataBinX: combined data and BinX document
SchemaBinX
Binary data stream

DataBinX = SchemaBinX + Binary data


BinX Components

The library provides the core functionality that supports generic utilities and applications.

Applications
– Domain-specific
Utilities
– Generic tools: DataBinX pack/unpack, extractor
BinX library core
– Parse/generate BinX documents
– Read/write binary data
– Parse/generate DataBinX


BinX application models

Data manipulation model

Data transportation model

Data service model

Data query model

Data catalogue model


Data manipulation model

Extraction
– Subset of a dataset
Combination
– Merge several datasets
Transformation
– Conversion of data types
– Change of sequence order
– Transposition of array dimensions
Transparency
– Automatic change of byte order
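Two of these operations, transposing array dimensions and changing byte order, are small enough to sketch directly. The functions below are hypothetical helpers, not BinX library calls, and they assume a two-dimensional array stored in row-major order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Transposition of array dimensions: element (i, j) of a rows x cols
// row-major array becomes element (j, i) of a cols x rows array.
template <typename T>
std::vector<T> transpose(const std::vector<T>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    std::vector<T> t(a.size());
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            t[j * rows + i] = a[i * cols + j];
        }
    }
    return t;
}

// Change of byte order for one 32-bit value: reverse its four bytes.
std::uint32_t swapBytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Applies swapBytes32 to every element, e.g. big-endian to little-endian.
void swapAll(std::vector<std::uint32_t>& values) {
    for (std::uint32_t& v : values) v = swapBytes32(v);
}
```

Swapping twice restores the original value, so the same routine converts in either direction.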


Data transportation model

DataBinX as interlingua

[Figure: on both the sending and the receiving side, XML documents, DataBinX, SchemaBinX and BinX + binary data are converted with XSLT and the BinX utility and packed into ZIP (MIME) with a ZIP tool; the packages are sent and received between the two sides.]


Data service model

Publishing logical datasets in BinX

[Figure: a client accesses datasets published in BinX over the Grid; the BinX documents describe binary files and a database (DB).]

– Dataset from one binary file
– Dataset from several binary files
– Dataset from multiple data sources


Data query model

Create DataBinX
– From binary data and BinX
Query DataBinX
– Use XPath
Create new DataBinX
– Results from the query
Parse DataBinX
– Create new binary data and BinX

[Figure: BinX + binary → DataBinX → XPath query → new DataBinX → BinX + binary]


Data catalogue model

Primary storage
– Binary data files
Metadata
– Syntactic annotation
– Semantic annotation
– Classification
– Domain specific
– Cross-reference (XLink)

[Figure: metadata organised as a hierarchy of BinX documents, from abstract (BinX1) through BinX 1.1 and 1.2 to detailed (BinX 1.2.1, 1.2.2, 1.2.3), linked to the binary data files.]


Application in Astronomy

Case study
Data conversion between FITS and VOTable


Application in astronomy

FITS and VOTable conversion

[Figure: a FITS file (header "SIMPLE = T … END" followed by binary data) and a VOTable document ("<?xml version=…><VOTABLE> … </VOTABLE>") are converted in both directions by the DataBinX utility, built on the BinX library core.]


FITS file

Primary HDU
Header:
SIMPLE  = T / file does conform to FITS standard
BITPIX  = 8 / number of bits per data pixel
NAXIS   = 1 / number of data axes
… …
END
Data:
3D 4A 14 0F 1C FE 25 04 … …

Extension
Header:
XTENSION= 'BINTABLE' / binary table extension
BITPIX  = 8 / 8-bit bytes
NAXIS   = 2 / 2-dimensional binary table
… …
END
Data:
7B 3E 40 2C 16 70 E7 6F … …

(Each header line is 80 characters wide, columns 0–79.)
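Each FITS header line is a fixed-width 80-character card: in 1-based terms the keyword occupies columns 1–8 and, when the card carries a value, columns 9–10 hold `= `, followed by the value and an optional `/` comment. The sketch below splits a card along those lines. It is a simplification (it ignores escaped `''` quotes and continued long strings, and trims all spaces inside string values), and `Card`/`parseCard` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Card {
    std::string keyword;
    std::string value;  // value text; quotes removed for string values
};

std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Splits one header card: keyword in columns 1-8, value indicator "= " in
// columns 9-10, then the value and an optional "/ comment".
Card parseCard(const std::string& card) {
    Card c;
    c.keyword = trim(card.substr(0, 8));
    if (card.size() < 10 || card.compare(8, 2, "= ") != 0) {
        return c;  // END, COMMENT, HISTORY, ... carry no value
    }
    const std::string rest = card.substr(10);
    const std::size_t q = rest.find('\'');
    if (q != std::string::npos && trim(rest.substr(0, q)).empty()) {
        // String value: the text between the opening and closing quote.
        const std::size_t close = rest.find('\'', q + 1);
        const std::size_t len = (close == std::string::npos) ? std::string::npos : close - q - 1;
        c.value = trim(rest.substr(q + 1, len));
    } else {
        // Other values end at the comment separator, if any.
        c.value = trim(rest.substr(0, rest.find('/')));
    }
    return c;
}
```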


VOTable

<VOTABLE>
  <RESOURCE>
    <PARAM name="Obs" value="Bob"/>
    <TABLE name="Stars">
      <FIELD name="Star-name" datatype="char" arraysize="10" />
      <FIELD name="RA" datatype="float" />
      <FIELD name="Dec" datatype="float" />
      <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
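The `Counts` field declares `arraysize="2x3x*"`: two fixed dimensions and a variable last one. Since the cell holds 12 integers, the last dimension resolves to 12 / (2 × 3) = 2. A sketch of that resolution follows; it handles only plain numbers plus a bare `*` on the last dimension, and `resolveArraySize` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Resolves an arraysize such as "2x3x*" against the number of values
// found in a cell. Only the last dimension may be "*".
std::vector<std::size_t> resolveArraySize(const std::string& arraysize,
                                          std::size_t nValues) {
    std::vector<std::size_t> dims;
    std::size_t fixedProduct = 1;
    bool variableLast = false;
    std::istringstream parts(arraysize);
    std::string part;
    while (std::getline(parts, part, 'x')) {
        if (variableLast) {
            throw std::invalid_argument("only the last dimension may be '*'");
        }
        if (part == "*") {
            variableLast = true;
            dims.push_back(0);  // resolved after the loop
        } else {
            const std::size_t n = std::stoul(part);
            dims.push_back(n);
            fixedProduct *= n;
        }
    }
    if (variableLast) {
        if (fixedProduct == 0 || nValues % fixedProduct != 0) {
            throw std::invalid_argument("value count does not fit the fixed dimensions");
        }
        dims.back() = nValues / fixedProduct;
    } else if (fixedProduct != nValues) {
        throw std::invalid_argument("value count does not match arraysize");
    }
    return dims;
}
```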


FITS → DataBinX → VOTable

FITS to VOTable conversion

[Figure: FITS → preprocessor → SchemaBinX → DataBinX utility → DataBinX → XSLT transformer (with an XSLT stylesheet) → VOTable]


VOTable → DataBinX → FITS

VOTable to FITS conversion

[Figure: VOTable → XSLT transformer (with an XSLT stylesheet) → DataBinX → DataBinX utility → SchemaBinX and binary data → postprocessor (adds the FITS header) → FITS]


Support

Information and software download: http://www.edikt.org/binx
Questions: [email protected]
Requirements and suggestions: [email protected], [email protected]


BinX API


Parsing a BinX document

// Parse the BinX document and, on success, get its dataset
BxBinxFile* pReader = new BxBinxFile();
if (pReader->parse("mybinx.xml"))
{
    BxDataset* pDataset = pReader->getDataset();
}


Reading a BinX document

Get an array object, either by position or by name:
BxArrayFixed* pArray = pDataset->getArray(0);        // by index
BxArrayFixed* pArray = pDataset->getArray("fixed");  // alternatively, by name

Get a struct from the array:
BxDataset* pStruct = pArray->get(0, 0);


Reading a BinX document

Get a named element from the struct, then its value:
BxFloat32* pReal = pStruct->getFloat("Real");
float real = pReal->getFloat();  // get the data value


Creating a BinX document

// Create an object to write out the document
BxBinxFileWriter* pWriter = new BxBinxFileWriter();
// Create a new dataset (an in-memory BinX document)
BxDataset* pData = new BxDataset();
// Add a 16-bit integer with the value 100 to the dataset
BxShort16* i16 = new BxShort16(100);
pData->addDataObject(i16);


Creating a BinX document

// Create a new binary file
BxBinaryFile* pbf = new BxBinaryFile();
// Link the binary file to the BinX document (dataset)
pbf->setDatasetPointer(pData);
// Attach the binary file to the writer and save the BinX document
pWriter->setBinaryFilePtr(pbf);
pWriter->save("TestDataset.xml");


Merge binary data

#include <cstdio>  // fopen, fclose

// Open the two BinX documents
BxBinxFileReader* pFile1 = new BxBinxFileReader("file1.xml");
BxBinxFileReader* pFile2 = new BxBinxFileReader("file2.xml");
// Get the first array of each dataset
BxDataset* pDataset1 = pFile1->getDataset();
BxDataset* pDataset2 = pFile2->getDataset();
BxArray* pArray1 = pDataset1->getArray(0);
BxArray* pArray2 = pDataset2->getArray(0);
// Take the next element from each array
BxDataObject* pData1 = pArray1->getNext();
BxDataObject* pData2 = pArray2->getNext();
// Write both elements, one after the other, to a single binary file
FILE* fo = fopen("output.dat", "wb");
pData1->toStreamBinary(fo);
pData2->toStreamBinary(fo);
fclose(fo);


Summary

One BinX document can describe many binary files
Generate BinX documents from code
Easy-to-use interfaces
Flexible