Blackray @ SAPO CodeBits 2009

Jan 28, 2015

fschupp

These are the slides of the BlackRay Talk at SAPO CodeBits in Lisboa, Portugal on December 4th.
Page 1: Blackray @ SAPO CodeBits 2009

Presentation Agenda

➔ Brief History
➔ Technology Overview
➔ Positioning towards other Projects
➔ Roadmap
➔ The Team
➔ Wrap-Up

Brief BlackRay History

What is BlackRay?

● BlackRay is a relational, in-memory database
● SQL with JDBC and ODBC Driver support
● Fulltext (Tokenized) Search in Text fields
● Object-Oriented API Support
● Persistence via Files
● Scalable and Fault Tolerant
● Open Source, Open Community
● Available under the GPLv2

BlackRay History

● Concept of BlackRay was developed in 1999 for a Web Phone Directory at Deutsche Telekom

● Development of current BlackRay started in 2005, as a Data Access Library

● First Production Use at Deutsche Telekom in 2006

● Evolution to Data Engine in 2007 and 2008

● Open Source under GPLv2 since June 2009

Why Build Another Database?

● Rather unique set of requirements:
  ● Phone Directory with approx. 80 Million subscribers
  ● All queries in the 10-500 Millisecond range
  ● Approximately 2000 concurrent users
  ● Over 500 Queries per Second (sustained)
  ● Updates once a day may only take several minutes
  ● Index needs to be tokenized (SQL: CONTAINS)
  ● Phonetic search
  ● Extensive Wildcard queries (leading/midspan/trailing)

Decision to Implement BlackRay

● Decision was made in mid-2005
● Designed as a lightweight data access engine
● Implementation in C++ for performance and maintainability
● APIs for access across the network from different languages (Java, C++)
● Feature set was designed for our specific business case in a particular project

Current Release

● Current release 0.9.0, released on June 12th, 2009
● Production level quality
● All relevant index functions for large scale search applications
● APIs in C++ and Java fully functional
● SQL still only a small subset of the API functionality

Some more details

● Written in C++
● Relies heavily on boost
● Compiles well with gcc and Sun Studio 12
● Behaves well on Linux, Solaris, OpenSolaris and MacOS
● Complete 64 Bit development and platform support
● Use of cmake for multi platform support

Technology Overview

Why call it Data Engine?

● BlackRay is a hybrid between a relational database and a search engine → thus we call it "data engine"

● Database features:
  ● Relational structure, with Join between tables
  ● Wildcards and index functions
  ● SQL and JDBC/ODBC

● Search Engine Features:
  ● Fulltext retrieval (token index)
  ● Phonetic and similar approximation search
  ● Extremely low latency

BlackRay Architecture

[Architecture diagram. Labeled components: C++ API, Java API, Python API, PHP API, C# API; SQL Interface with Postgres* Clients; Management Server; Instance Server; Data Universe; Redo Log; Snapshots; 5-Perspective Index with layers L1: Dictionary, L2: Postings, L3: Row Index, L4: Multi-Tokens, L5: Multi-Values.]

Hierarchical Model

● Each BlackRay node can hold many Instances
● One Management process per node
● Each Instance runs in a separate Process
● Instances are completely separated from each other
● Snapshots (for persistence) are taken on an Instance level
● In an Instance, Schemas and Tables can be created
● Within one Instance, Queries can span across Tables and Schemas
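
The containment hierarchy on this slide (node, instances, schemas, tables) can be pictured with a small sketch. This is only an illustration: the class names and methods below are hypothetical and are not BlackRay's API, and the separate operating system processes per instance are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    rows: list = field(default_factory=list)


@dataclass
class Schema:
    name: str
    tables: dict = field(default_factory=dict)


@dataclass
class Instance:
    """One instance; the slides state that each one runs in its own process."""
    name: str
    schemas: dict = field(default_factory=dict)

    def create_table(self, schema: str, table: str) -> Table:
        # Create the schema on first use, then the table inside it.
        sch = self.schemas.setdefault(schema, Schema(schema))
        return sch.tables.setdefault(table, Table(table))


@dataclass
class Node:
    """One node: a single management process that knows many instances."""
    instances: dict = field(default_factory=dict)

    def create_instance(self, name: str) -> Instance:
        return self.instances.setdefault(name, Instance(name))


node = Node()
directory = node.create_instance("directory")
directory.create_table("white", "subscribers").rows.append({"name": "Meyer"})
directory.create_table("yellow", "companies").rows.append({"name": "Meyer GmbH"})
print(sorted(directory.schemas))
```

Because both schemas live in the same instance, a query could span them; tables in a different instance would be out of reach.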

Getting Data Into BlackRay

● Once an Instance is created, data can be loaded
● Schemas, Tables, and Indexes are created using an XML description language
● Standard loader utility to load CSV data
● Bulk loading is done with logging disabled
● Indexing is done with maximum degree of parallelism depending on CPUs
● After all data is indexed, a snapshot can be taken
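
The load-then-index flow can be sketched roughly as follows. This is not the BlackRay loader: the CSV content, the function name and the one-task-per-column split are assumptions made for illustration, and Python threads only stand in for the CPU-level parallelism a C++ engine would get.

```python
import csv
import io
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data; a real load would read a CSV file.
CSV_DATA = "name,city\nMeyer,Berlin\nSchmidt,Hamburg\nMeyer,Hamburg\n"


def build_column_index(rows, column):
    """Map each value of one column to the ids of the rows containing it."""
    index = {}
    for row_id, row in enumerate(rows):
        index.setdefault(row[column], []).append(row_id)
    return column, index


rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
columns = list(rows[0].keys())

# One indexing task per column, with as many workers as there are CPUs.
with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
    indexes = dict(pool.map(lambda c: build_column_index(rows, c), columns))

print(indexes["name"]["Meyer"])
```

Once every column index is built, the in-memory state would be written out as a snapshot.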

Basic Load Performance Data

● German yellowpage and whitepage data
  ● 60 Million subscribers
  ● 100 Million phone numbers
  ● Raw data approx 16GB

● Indexing performance
  ● Total index size 11GB
  ● Time to index: 40 Minutes, on dual 2GHz Xeon (Linux)
  ● Updates: 300MB, 200K rows in approx 5 minutes

● Time to load snapshot: 3.5 Minutes for 11GB
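
For orientation, the figures above translate into rough throughput rates. The calculation below uses only the numbers from this slide and assumes 1 GB = 1024 MB and that "subscribers" corresponds to indexed rows.

```python
subscribers = 60_000_000
index_minutes = 40
index_size_gb = 11
snapshot_load_minutes = 3.5
update_rows = 200_000
update_minutes = 5

# Rows indexed per second during the initial bulk load.
rows_per_second = subscribers / (index_minutes * 60)

# Read rate when loading the 11GB snapshot (assuming 1 GB = 1024 MB).
snapshot_mb_per_second = index_size_gb * 1024 / (snapshot_load_minutes * 60)

# Rows applied per second during the daily update.
update_rows_per_second = update_rows / (update_minutes * 60)

print(f"{rows_per_second:.0f} rows/s indexed")
print(f"{snapshot_mb_per_second:.1f} MB/s snapshot load")
print(f"{update_rows_per_second:.0f} rows/s updated")
```

This works out to about 25,000 rows per second for bulk indexing, roughly 54 MB/s for the snapshot load, and about 667 rows per second for updates.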

Data Universe

● BlackRay features a 5-Perspective Index
  ● Layer 1: Dictionary
  ● Layer 2: Postings
  ● Layer 3: Row Index
  ● Layer 4: Multi-Token Layer
  ● Layer 5: Multi-Value Layer

● Layer 1 and 2 comprise a fully inverted Index
● Statistics in this Index used for Query Plan Building
● All data - index and raw output - are held in memory
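
A generic inverted index shows how a dictionary and postings layer work together. This is a textbook sketch, not BlackRay's data layout: the sample rows, the whitespace tokenizer and the function names are assumptions, and layers 3 to 5 are not modeled.

```python
from collections import defaultdict

rows = [
    "Meyer Hans Berlin",
    "Schmidt Anna Hamburg",
    "Meyer Anna Hamburg",
]

# Layer 1 (dictionary): token -> token id.
dictionary = {}
# Layer 2 (postings): token id -> ascending list of row ids containing it.
postings = defaultdict(list)

for row_id, text in enumerate(rows):
    for token in sorted(set(text.lower().split())):
        token_id = dictionary.setdefault(token, len(dictionary))
        postings[token_id].append(row_id)


def lookup(token):
    """Return the row ids that contain the token, or an empty list."""
    token_id = dictionary.get(token.lower())
    return postings[token_id] if token_id is not None else []


def estimated_cost(token):
    """Postings length as a simple selectivity statistic for planning."""
    return len(lookup(token))


print(lookup("meyer"), estimated_cost("berlin"))
```

A planner could evaluate the term with the shortest postings list first, which is one way index statistics can feed query plan building.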

Snapshots and Data Versioning

● Persistence is done via file based snapshots
● Snapshots consist of all schemas in one instance
● Snapshots have a version number
● To make a backup of the data, simply copy the snapshot file to a backup medium
● It is possible to load an older snapshot: Data is version controlled if older snapshots are stored
● Note: Currently Snapshots are not fully portable.

Transactions in BlackRay

● BlackRay supports transactions via a Redo Log
● All commands that modify data are logged if requested
● In case of a crash, the latest snapshot will be loaded
● Replay of the transaction log will then bring the database back to a consistent state
● Redo Log is eliminated when a snapshot is persisted
● For better performance snapshots should be taken periodically, ideally after each bulk update
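
The recovery scheme described above (load the latest snapshot, then replay the redo log) can be sketched with a toy key-value store. The class, file names and JSON format are hypothetical and chosen only for illustration; a real engine would also force log writes to disk before acknowledging them.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store with a snapshot file and a redo log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.redo_path = os.path.join(directory, "redo.log")
        self.data = {}
        self.version = 0

    def put(self, key, value):
        # Log the change first, then apply it in memory.
        with open(self.redo_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.data[key] = value

    def snapshot(self):
        self.version += 1
        with open(self.snapshot_path, "w") as f:
            json.dump({"version": self.version, "data": self.data}, f)
        # The snapshot holds every logged change, so the log can be dropped.
        if os.path.exists(self.redo_path):
            os.remove(self.redo_path)

    def recover(self):
        self.data, self.version = {}, 0
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                saved = json.load(f)
            self.data, self.version = saved["data"], saved["version"]
        if os.path.exists(self.redo_path):
            with open(self.redo_path) as log:
                for line in log:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]


workdir = tempfile.mkdtemp()
store = TinyStore(workdir)
store.put("4930123", "Meyer")
store.snapshot()
store.put("4940456", "Schmidt")  # logged, but not yet part of a snapshot

restarted = TinyStore(workdir)   # simulates a restart after a crash
restarted.recover()
print(restarted.version, restarted.data)
```

The shorter the redo log at crash time, the faster the replay, which is why a snapshot after each bulk update keeps recovery quick.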

Native Query APIs

● C++, Java and Python Object Oriented APIs are available

● Built with ICE (ZeroC) as the Network and Object Brokerage Protocol

● Query Objects are constructed using an Object Builder

● Execution via the network, Results as Objects

● Load balancing and failover built into the protocol

● Native support for Ruby, PHP and C# upcoming

Native Query APIs

● Queries can use any combination of OR, AND
● Index functions (phonetic, synonyms, stopwords) can be stacked
● Token search (fulltext) and wildcard are also supported
● Advantage of the APIs:
  ● Minimized overhead
  ● Very low latency
  ● High availability
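
Composing query objects with AND/OR and stacking index functions might look like the sketch below. None of these classes are BlackRay's API: `Term`, `And`, `Or` and `stack` are hypothetical names, and the two normalizers (lowercasing and umlaut folding) merely stand in for real index functions such as phonetic matching.

```python
def lowercase(token):
    return token.lower()


def fold_umlauts(token):
    return token.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")


def stack(*functions):
    """Chain index functions; each one receives the previous one's output."""
    def apply(token):
        for fn in functions:
            token = fn(token)
        return token
    return apply


class Term:
    """Matches rows where a token of `column` equals `value` after normalizing."""

    def __init__(self, column, value, normalize=lambda token: token):
        self.column, self.normalize = column, normalize
        self.value = normalize(value)

    def matches(self, row):
        return self.value in {self.normalize(t) for t in row[self.column].split()}


class And:
    def __init__(self, *parts):
        self.parts = parts

    def matches(self, row):
        return all(p.matches(row) for p in self.parts)


class Or:
    def __init__(self, *parts):
        self.parts = parts

    def matches(self, row):
        return any(p.matches(row) for p in self.parts)


rows = [
    {"name": "Hans Müller", "city": "Berlin"},
    {"name": "Anna Mueller", "city": "Hamburg"},
    {"name": "Anna Schmidt", "city": "Berlin"},
]
normalize = stack(lowercase, fold_umlauts)
query = And(
    Term("name", "MÜLLER", normalize),
    Or(Term("city", "berlin", normalize), Term("city", "hamburg", normalize)),
)
hits = [row["name"] for row in rows if query.matches(row)]
print(hits)
```

This toy version scans every row; an engine would resolve each term against its index instead.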

Standard Query Interface

● BlackRay implements the PostgreSQL server socket interface

● All PostgreSQL compatible frontends can be utilized against BlackRay

● JDBC, ODBC, native drivers using socket interface are all supported

● Limitations: SQL is only a reasonable subset of SQL92, Metadata is not yet fully implemented...

● Currently not all index functions can be accessed via SQL statements
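
To make "socket interface" concrete, the sketch below builds the first packet a PostgreSQL client sends after connecting: the StartupMessage of the PostgreSQL frontend/backend protocol version 3.0. Which protocol version BlackRay expects is not stated in the slides, so treat this as an assumption; the user and database names are made up, and no connection is opened.

```python
import struct

PROTOCOL_VERSION_3_0 = 3 << 16  # major version 3, minor version 0


def build_startup_message(user, database):
    """Encode a StartupMessage: length, version, then key/value strings."""
    params = b""
    for key, value in (("user", user), ("database", database)):
        params += key.encode() + b"\x00" + value.encode() + b"\x00"
    params += b"\x00"  # terminator after the last key/value pair
    body = struct.pack("!I", PROTOCOL_VERSION_3_0) + params
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


message = build_startup_message("reader", "directory")
print(len(message), message[:8].hex())
```

Any server that answers this handshake and the messages that follow it can be reached through standard PostgreSQL drivers, which is what makes JDBC and ODBC access possible without BlackRay-specific drivers.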

Management Features

● Management Server acts as central broker for all Instances

● Command line tools for administration

● SNMP management:
  ● Health check of Instances
  ● Statistics, including access counters and performance measurements

Clustering and Replication

● BlackRay supports a multi-node setup for fault tolerance

● A cluster has N query nodes and a single update node
● All query nodes are equal and require the full index data available
● Index creation is done offline on the update node
● Query nodes can reload snapshots to receive updated data, via shared storage or local copy
● Load-balancing handled by native APIs
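
Client-side load balancing with failover across equal query nodes can be sketched as a round-robin loop that skips unreachable nodes. In BlackRay this is handled by the native APIs; the classes below are hypothetical and only mimic the behavior.

```python
import itertools


class QueryNode:
    """Stand-in for a query node holding the full index."""

    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def query(self, text):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}: results for {text!r}"


class RoundRobinClient:
    """Rotates over the nodes and fails over when one is unreachable."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next = itertools.cycle(range(len(nodes)))

    def query(self, text):
        # Try each node at most once per request.
        for _ in range(len(self.nodes)):
            node = self.nodes[next(self._next)]
            try:
                return node.query(text)
            except ConnectionError:
                continue
        raise ConnectionError("no query node available")


nodes = [QueryNode("q1"), QueryNode("q2", healthy=False), QueryNode("q3")]
client = RoundRobinClient(nodes)
answers = [client.query("meyer") for _ in range(4)]
served_by = [a.split(":")[0] for a in answers]
print(served_by)
```

Because every query node holds the complete index, any healthy node can answer any request, so failover needs no data movement.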

Administration Requirements

● In-Memory Databases require very few administrative tasks

● Configuration via one file per Instance
● Disk layout etc. is of no importance
● Backups are performed by copying snapshots
● Recovery is done by restoring a snapshot
● No daily administration required
● SNMP allows remote supervision with common tools

Positioning BlackRay

BlackRay and other Projects

● BlackRay is being positioned as a Query intensive database addition

● Ideally suited where updates are done in Bulk and searches outweigh updates by many orders of magnitude

● Wildcards come with little overhead: No overhead for trailing wildcard, some overhead for leading and midspan wildcard

● Good match when index functions such as phonetic or word rotation/position search combined with relational data are required
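
Why a trailing wildcard can be nearly free while leading and midspan wildcards cost more is easiest to see on a sorted token dictionary. The sketch below uses a common technique (binary search for prefixes, a reversed dictionary for suffixes, a scan for infixes); how BlackRay actually implements wildcards is not described in the slides, so this is an assumption.

```python
import bisect

tokens = sorted(["meier", "meyer", "meyerhof", "mueller", "schmidt", "obermeyer"])
# A second dictionary with every token reversed, used for leading wildcards.
reversed_tokens = sorted(t[::-1] for t in tokens)


def prefix_search(sorted_tokens, prefix):
    """Trailing wildcard (prefix*): two binary searches, no scan."""
    lo = bisect.bisect_left(sorted_tokens, prefix)
    hi = bisect.bisect_left(sorted_tokens, prefix + "\uffff")
    return sorted_tokens[lo:hi]


def suffix_search(suffix):
    """Leading wildcard (*suffix): prefix search on the reversed dictionary."""
    found = prefix_search(reversed_tokens, suffix[::-1])
    return sorted(t[::-1] for t in found)


def infix_search(fragment):
    """Midspan wildcard (*fragment*): falls back to scanning the dictionary."""
    return [t for t in tokens if fragment in t]


print(prefix_search(tokens, "meyer"))
print(suffix_search("meyer"))
print(infix_search("eye"))
```

The prefix case costs only two binary searches, the suffix case needs the extra reversed dictionary in memory, and the infix case touches every dictionary entry, which matches the ordering of overheads on the slide.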

FOSS Projects with similar Goals

● Relational Databases (the usual suspects):
  ● MySQL, MariaDB, Drizzle....
  ● PostgreSQL...
  ● Many more alternatives, including embedded etc....

● Fulltext Search:
  ● Sphinx
  ● Lucene

● In-Memory Databases:
  ● FastDB
  ● HSQLDB/H2 (Java → Garbage collection issues....)

Commercial Alternatives

● Commercial In-Memory Databases
  ● ORACLE/TimesTen (Acquired by ORACLE in 2005)
  ● IBM/SolidDB (Acquired by IBM in 2007)
  ● VoltDB (No real data available as of yet)
  ● eXtremeDB (embedded use only)

● Dual-Licensed Alternatives
  ● CSQL (Open Source Version is severely crippled)
  ● MySQL with memcached

Is it the right thing for me?

● BlackRay is not designed as a 100% RDBMS replacement

● Questions to ask:
  ● Do I need ad-hoc data updates, or are updates done in bulk?
  ● How important are fulltext search and extensive wildcards?
  ● How large is my data? Gigabytes: OK. Terabytes: Not yet
  ● Do I need a relational data model?
  ● Is SQL an important feature?

BlackRay will fit you well when...

… searches outweigh updates

… data is primarily updated in bulk

… fulltext search is important

… you have Gigabytes (not Terabytes) of data

… index functions (phonetic …) are required

… SQL is necessary

… a relational data model is required (JOIN)

… source code must be available

… high availability/clustering is required

Project Roadmap

Immediate Roadmap

● Upcoming 0.10.0 – Due in December 2009
  ● Complete rewrite of SQL Parser (boost::spirit2)
  ● PostgreSQL client compatibility (via network protocol) to allow JDBC/ODBC... via PostgreSQL driver
  ● Rewritten CLI tools
  ● Major bugfixes (potential memory leaks)
  ● Authentication supported for Instances

● BlackRay Admin Console (Remora) 0.10

Immediate Roadmap

● Planned 0.11.0 – Due in February 2010
  ● Support for ad-hoc data manipulation via SQL (INSERT/UPDATE/DELETE)
  ● Aggregate functions for SELECT
  ● Make all index functions available in SQL
  ● Support for Prepared Statements (ODBC/JDBC)
  ● Improved thread and memory management (Perftools?)

● BlackRay Admin Console (Remora) 0.11
  ● Engine Statistics via GUI
  ● Cluster Node management

Midterm Roadmap

● Scalability Features
  ● Sharding & Partitioning Options
  ● Federated Search

● Fully portable snapshot format (across platforms)
● Query Performance Analyzer
● Improved Statistics Module with GUI
● Removal of ICE as a core component
● BlackRay as a Storage Backend for SUN OpenDS LDAP Engine

Midterm Roadmap

● Security Features
  ● Improved User and Access Control concepts
  ● SSL for all connections
  ● External User Store (LDAP/OpenSSO/PAM...)

● Increased Platform support
  ● Windows 7 and Windows Server platforms
  ● Embedded platforms

● Other, random features by popular request.

Longterm Roadmap

● Integration with other DBMSs
  ● Storage Engine for other DBMSs (MariaDB/MySQL) → Depends on potential modification needs of the storage engine interfaces
  ● Trigger-based updates to support BlackRay as a Query cache instance over a regular DBMS

● Update: The Storage Engine Interface issue was discussed at the OpenSQL Camp 2009 in Portland; Tokutek suggested a portable and more efficient Universal Storage Engine Interface

Longterm Roadmap

● Standalone Engine Improvements
  ● SQL92 compliance: SUBSELECT/UNION support
  ● Triggers in BlackRay
  ● Full ACID Transaction support

The Team behind BlackRay

SoftMethod GmbH

● SoftMethod GmbH initiated the project in 2005 and
● Company was founded in 2004 and currently has 10 employees
● Focus of SoftMethod is high performance software engineering
● Product portfolio includes directory assistance and LDAP enabled applications
● SoftMethod also offers load testing and technical software quality assurance support.

Development Team

● SoftMethod Core Team
  ● Thomas Wunschel – Lead Developer
  ● Felix Schupp – Project Sponsor
  ● Andreas Meyer – Documentation, Porting
  ● Frank Fiedler – PhD Student in Database Systems

● Key Contributors
  ● Mike Alexeev – Senior Developer
  ● Souvik Roy – Intern at SoftMethod GmbH
  ● Andreas Strafner – Developer, Porting to AIX

Thomas Wunschel

● Director of Development, SoftMethod GmbH
● Almost 10 years of development experience
● Involved with BlackRay and its applications since 2005
● Currently involved in the Network Protocol Stack
● Lead Designer and Decision Lead for new Features

Felix Schupp

● Managing Director, SoftMethod GmbH
● Over 10 years of commercial software development
● Designer of first BlackRay predecessor in 1999
● Project sponsor and spokesperson
● Responsible for funding and applications
● Guide and coach in the development process

Mike Alexeev

● Senior Software Developer
● Over 10 years of C++ experience
● First outside committer to BlackRay
● Currently involved in rewriting the SQL grammar to support all index features available via the APIs

Souvik Roy

● Computer Science Student and Software Developer
● Several years of C++ and Java experience
● Current Intern at SoftMethod GmbH
● Google Summer of Code participant
● Working on reliable performance comparison with other Engines
● Provides additional sample applications

Wrap-Up

What to do next

● Get BlackRay:
  ● Register yourself on http://forge.softmethod.de
  ● SVN checkout available at http://svn.softmethod.de/opensource/blackray/trunk

● Get Involved
  ● Anyone can register and create tickets, news etc.
  ● We have an active mailing list for discussion as well

● Contribute
  ● We require a signed Contributor agreement before granting commit access to the repository

Contact Us

● Website: http://www.blackray.org
● Twitter: http://twitter.com/dataengine
● Facebook: http://facebook.com/dataengine
● Mailing List: http://lists.softmethod.de
● Download: http://sourceforge.net/projects/blackray

● Felix: [email protected]
● Thomas: [email protected]