DBA BDA
■ FAE
Qunar DBA MySQL
HBase
RDBMS
■ xzwen96
Contents
Part 01
1
2
3
4
database
1 2 MySQL 3 Namespace
SQL ?
hash
MySQL/HBase
Part 02
01
03
05
02
04
06
…
-HBase
HBase
row-key
HBase
MySQL VS HBase
MySQL
InnoDB
B+tree
HBase
Key-Value
LSM-tree Phoenix
web
String
SQL SQL kv scan
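The B+tree versus LSM-tree contrast above can be made concrete with a toy model. This is a minimal sketch, not how HBase is implemented: the class name `ToyLSM`, the `flush_threshold` parameter, and the in-memory "runs" standing in for HFiles are all hypothetical. It shows writes landing in a memstore-like buffer, flushes producing immutable sorted runs, reads checking the newest data first, and a compaction that merges runs.

```python
from bisect import bisect_left


class ToyLSM:
    """Toy LSM-tree: a mutable write buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical knob, not an HBase setting
        self.memstore = {}   # in-memory write buffer
        self.runs = []       # flushed sorted runs, oldest first

    def put(self, key, value):
        # Writes only touch memory; no in-place update of on-disk pages.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the buffer as one immutable sorted run.
        if self.memstore:
            self.runs.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: buffer first, then runs from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = ToyLSM(flush_threshold=2)
db.put("a", "1")
db.put("b", "2")   # triggers a flush
db.put("a", "3")
db.put("c", "4")   # triggers a second flush
print(len(db.runs), db.get("a"))
db.compact()
print(len(db.runs), db.get("a"))
```

The read path has to consult several runs until compaction merges them, which is the usual trade-off against a B+tree: cheaper writes, potentially more work per read.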
MySQL VS HBase
MySQL
Galera/Group replication
/Proxy
NDB
HBase
Namenode QJM
Datanode
HBase Master
RegionServer
HDFS
RS
HBase
GC G1
Zookeeper
HBase
OOM
HBase
HBase
Scan.setCaching RPC
Get RPC
scan.setBlockCache(false)
rowkey <startrow,stoprow>
blockcache LRUBlockCache SlabCache BucketCache
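To illustrate why `Scan.setCaching` and batched `Get` calls are listed next to "RPC" above, a back-of-the-envelope model can help. This sketch assumes a simplified model in which every round trip to the RegionServer returns at most `caching` rows; it ignores region boundaries, result size limits, and scanner open/close calls. `scan_rpc_count` is a hypothetical helper, not part of the HBase client API.

```python
import math


def scan_rpc_count(total_rows, caching):
    """Approximate round trips needed to fetch total_rows rows
    when each RPC carries at most `caching` rows (simplified model)."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(total_rows / caching)


# Compare a few caching values for a 10,000-row scan.
for caching in (1, 100, 500):
    print(caching, scan_rpc_count(10_000, caching))
```

Under this model a larger caching value trades client and server memory per call for fewer round trips, so the right value depends on row size.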
HBase
rowkey
BlockCache LRUBlockCache+BucketCache
HFile hbase.hstore.compactionThreshold, hbase.hstore.compaction.max.size
Compaction IO
vs BloomFilter row rowcol
HDFS
Region major compaction
Hedged Read dfs.client.hedged.read.threadpool.size, dfs.client.hedged.read.threshold.millis
Short Circuit Local Read dfs.client.read.shortcircuit, dfs.domain.socket.path
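The "BloomFilter row rowcol" item in the list above can be sketched as follows. This is an illustrative Bloom filter, not HBase's implementation: the class, the bit-array size, the SHA-256-based hashing, and the `row + 0x00 + qualifier` key encoding are all assumptions made for the example. The point it shows is that a ROW-style filter is keyed on the row alone, while a ROWCOL-style filter is keyed on row plus column qualifier.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by prefixing the key with a counter byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def row_key(row):
    # ROW-style key: the row alone.
    return row


def rowcol_key(row, qualifier):
    # ROWCOL-style key: row plus qualifier (hypothetical encoding).
    return row + b"\x00" + qualifier


row_filter = BloomFilter()
rowcol_filter = BloomFilter()
row_filter.add(row_key(b"user1"))
rowcol_filter.add(rowcol_key(b"user1", b"name"))

print(row_filter.might_contain(row_key(b"user1")))
print(rowcol_filter.might_contain(rowcol_key(b"user1", b"name")))
```

A row-keyed filter can only answer "might this file contain the row", so it helps row-level gets; a row-plus-column filter holds more entries but can also rule out files for reads that name a specific column.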
HBase
HBase
WAL ?
VS (PUT)
(keyvalue length) RS ( 100G)
rowkey
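Assuming the rowkey item above refers to avoiding write hotspots, one common technique is to prefix each key with a hash-derived salt so that sequential keys spread across regions. The sketch below is only an illustration: `salted_rowkey`, the bucket count of 16, the MD5-based salt, and the `NN|key` layout are hypothetical choices, not something the slides prescribe.

```python
import hashlib


def salted_rowkey(rowkey, buckets=16):
    """Prefix rowkey with a stable two-digit bucket derived from its hash."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    salt = digest[0] % buckets
    return f"{salt:02d}|{rowkey}"


# Monotonically increasing keys (e.g. timestamps) end up in several buckets.
keys = [f"20240101{i:06d}" for i in range(200)]
used_buckets = {salted_rowkey(k).split("|", 1)[0] for k in keys}
print(len(used_buckets))
```

The cost of this design is on the read side: a range scan over the original key order has to be issued once per bucket and merged by the client.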
HBase
RegionServer flush memstore Flush
blockingStoreFiles hbase.hstore.blockingStoreFiles
Full GC Java GC
HBase BlockCache
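The eviction idea behind an LRU-style block cache can be sketched in a few lines. This is plain LRU for illustration only; HBase's LRUBlockCache is more elaborate, and the class name `LRUCache`, the `load` callback, and the hit/miss counters are hypothetical. The example also hints at why `scan.setBlockCache(false)` is suggested for large scans: a one-off pass over many blocks pushes frequently used blocks out of the cache.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache of blocks keyed by block id."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, block_id, load):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        block = load(block_id)                 # simulate reading from disk
        self.blocks[block_id] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return block


def read_from_disk(block_id):
    return f"data-for-{block_id}"


cache = LRUCache(capacity=2)
for block_id in ["hot", "hot", "scan-1", "scan-2", "hot"]:
    cache.get(block_id, read_from_disk)
print(cache.hits, cache.misses)
```

In this run the two scan blocks evict the hot block, so the final read of it is a miss again.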
Part 03
HBase
• Java, HDFS
• SQL, Schema-less
• RW
• MySQL, HBase
HBase
SQL on Hadoop
SQL ? Hadoop Hive
1
2
3
HQL
MR
Hive Hadoop MySQL
Hive Client Hadoop
4 Hive -> SparkSQL, SparkSQL Hive HiveContext (1.x) SparkSession (2.x)
MR VS Spark
Hadoop vs Spark
SQL Query: Hadoop: Hive; Spark: SparkSQL
Machine Learning: Hadoop: Mahout; Spark: Spark MLlib
Batch processing: Hadoop: MR (Java, Pig, Hive); Spark: RDDs (Java, Scala, Python)
Stream Processing: Hadoop: Storm+Kafka; Spark: Spark Streaming
SQL ? SparkSQL
1
2
3
Spark: Local, Standalone, YARN, Mesos
DataFrame vs SQL
4: spark.sql.shuffle.partitions, spark.sql.sources.partitionColumnTypeInference.enabled
MySQL NoSQL/NewSQL
Part 04
1 2 SQL 3 4 MySQL
TiDB
MySQL
01
03
05
HTAP
02
04
06
TiDB
THANKS