Transcript
InnoDB Anatomy
The InnoDB EngineIntroduction to InnoDB
• Currently the default in MySQL (as of 5.5)
• Referential/Structural Integrity
• Consistent data
• Transactional
InnoDB is atomic in that its transactions have only two possible outcomes - complete fully, or fail completely.
• SUCCESS:
• FAILURE:
All changes committed.
All changes rolled back.
InnoDB Anatomy – ACID Compliance
ATOMICITY
COMMITROLLBACKUnchanged Data
Changes Applied
• Data stays consistent before, during, and after a transaction.
• No conflict of “versions”
• Successful transactions end with a commit. InnoDB maintains optimism.
Introduction to InnoDB Structure
CONSISTENCY
Valid State Work performed
Still a valid state
• Transactions cannot interact with each other
• Adjustable level of isolation.*
• Row-level locking
Introduction to InnoDB Structure
ISOLATION
* Isolation level changed via the transaction-isolation configuration option.
• Atomic transactions keep data durable.
• Changes are permanent once committed.
• Doublewrite buffer helps to recover from crashes that occur during page writes.
Introduction to InnoDB Structure
DURABILITY
Explore the InnoDB structure within the file system, and at its lower levels, to find out how it can affect database operations.
InnoDB Anatomy
The Goal
Understanding the Physical Structure of Data in InnoDB
InnoDB Anatomy – Physical StructurePhysical File Structure
Data
Dire
ctor
y(D
efau
lt: /v
ar/li
b/m
ysql
) Ibdata1System Tablespace
ib_logfile0Redo/Transaction Log File
ib_logfile1Redo/Transaction Log File
Database Folder
table.ibdTablespace File
table.frmFormat File
InnoDB Anatomy – Physical Structure
InnoDB
Tablespace
Page
Extent
Segment
Inode
File System
Partition
Disk Block
Allocation Unit
File
Inode
InnoDB Anatomy – Physical Structure
InnoDB
Tablespace
Page
Extent
Segment
Inode
User Records (Index Pages)…
First Inode Page Number
File HeaderInsert Buffer Bitmap
IBD File
InnoDB Anatomy – Physical Structure
InnoDB
Tablespace
Page
Extent
Segment
Inode
PAGE…
FIL Header - 38 bytes…
FIL Trailer - 8 bytes1638
4
InnoDB Anatomy – Physical Structure
InnoDB
Tablespace
Page
Extent
Segment
Inode
1MB Page 1
…Page 64
EXTENT
InnoDB Anatomy – Physical Structure
InnoDB
Tablespace
Page
Extent
Segment
Inode
ExtentExtent
…
SEGMENT
InnoDB Anatomy – Physical Structure
InnoDB
Tablespace
Page
Extent
Segment
Inode
File segment ID
Extent Data (Free/Partial/Full Listing)
InnoDB Anatomy – Physical Structure
Tabl
espa
ceSegment
Segment
Segment
Segm
entExtent
Extent
Extent
Extent
Extent
Extent Exte
nt
SYSTEM TABLESPACEibdata File
InnoDB Anatomy – System Tablespace
Undo Logs
Rollback Segment
Rollback Segment
Data Dictionary
SYS_TABLES
SYS_INDEXES
SYS_COLUMNS
SYS_FIELDS
Doublewrite Buffer
Block 1 (64
Pages)
Block 2 (64
Pages)
Change Buffer
Insert Buffering
Update Buffering
Purge Buffering
Undo Log Space
Rollback Segment
…
InnoDB Anatomy – System Tablespace
Separate your Undo Logs (5.6+ only)innodb_undo_logs; innodb_undo_tablespaces; innodb_undo_directory
I/O to the undo logs is random, instead of sequential like some other areas of InnoDB. Because of this, it makes sense to separate your Undo Log tablespaces out from the system tablespace onto a disk that handles random reads and writes more effectively, such as a SSD.
Use the Information Schema, or innodb_table_monitor, to view the Data Dictionary table data.
In 5.6, the information_schema database contains INNODB_SYS* tables that allow you to view data dictionary information directly in MySQL. Alternatively, you can create a table called “innodb_table_monitor” to dump the data dictionary into the MySQL error logs.
How can you use this?
InnoDB Anatomy – System Tablespace
Do you need the Doublewrite buffer?innodb_doublewrite
With Doublewrite buffer enabled, there is a 5-10% impact on I/O. If you operate on a transactional file system, you disable this to avoid this impact.
Customizing Change Buffering for your Workloadinnodb_change_buffering
Change buffering, by default in 5.5+, encompasses insert, update, and delete buffering. If your workload consists almost entirely of one or the other, it can make sense to limit this down to only one type of buffering.
How can you use this?
InnoDB in Memory and on Disk
Memory
Buffer Pool
Insert Buffer
Log BufferAdditional Memory
Disk
System TablespaceDoublewrite Buffer
Transaction Log Files
Insert Buffer
Undo Logs
Rollback Segment
Data Dictionary
Undo Buffering
Indexing
Thread Processing
Tablespace Files
InnoDB Anatomy – PagesPage Headers/Trailers
Name Byte Length Offset DescriptionFIL_PAGE_SPACE 4 0 Space ID
FIL_PAGE_OFFSET 4 4 Page Number
FIL_PAGE_PREV 4 8 Previous Page (in key order)
FIL_PAGE_NEXT 4 12 Next Page (in key order)
FIL_PAGE_LSN 8 16 LSN of page’s latest log record
FIL_PAGE_TYPE 2 24 Page Type
FIL_PAGE_FILE_FLUSH_LSN 8 26 Flushed-up-to LSN (only in space ID 0, page 0)
FIL_PAGE_ARCH_LOG_NO 4 34 Latest archived LSN (only in space ID 0, page 0)
FIL Header (38)
FIL Trailer (8)Name Byte Length Offset Description
FIL_PAGE_END_LSN 8 16376 Low 4 bytes: Checksum, Last 4 bytes: FIL_PAGE_LSN
storage/innobase/include/fil0fil.h
InnoDB Anatomy - Demonstration
Changing values directly
At the byte level, these values can be changed directly in many situations to “trick” InnoDB in one way or another. One good example of this is to get around a page checksum failure. You can change the stored checksum to match the calculated checksum, bypassing the crash and often allowing you sufficient access to your records.
How can you use this?
InnoDB: Page checksum 2047964429, prior-to-4.0.14-form checksum 4196043695InnoDB: stored checksum 1873408413, prior-to-4.0.14-form stored checksum 1946395024# printf '%X\n' 2047964429; printf '%X\n' 41960436957A11750D Primary calculated checksumFA1A8BAF “Old-style” calculated checksum# expr 16384 \* 6 Example Page 698304 Starting byte offset for Page 6Writing the primary calculated checksum over the stored value of page 6:# printf ‘\x7A\x11\x75\x0D’ | dd of=table.ibd bs=1 seek=98304 count=4 conv=notrunc
Writing the “old-style” calculated checksum over the stored value of page 6:# printf ‘\x7A\x11\x75\x0D’ | dd of=table.ibd bs=1 seek=98304 count=4 conv=notrunc
•Stored in 2 files by default (ib_logfile0/1)
•Treated as single file•Circular buffer
LOG BLOCKHeader (12)Log Records
Trailer (4)…
LOG BLOCKHeader (12)Log Records
Trailer (4)…
LOG BLOCKHeader (12)Log Records
Trailer (4)…
ib_lo
gfile
0ib
_logfi
le1
InnoDB Anatomy – Redo Logs
Structure• Log blocks are 512 bytes•Each block contains
checkpoint data
The Logical Log FileThe Redo Logs
InnoDB Anatomy – Redo Logs
Optimized log file sizeinnodb_log_file_size
Larger size means less checkpoint flushing required, reducing I/O impact. Balance with expected recovery time required as a result of the size (less of an issue in 5.6).General Formula: (Current LSN – LSN 60 seconds later) * 60 / 1024 / 1024
Optimized log buffer sizeinnodb_log_buffer_size
Log buffer allows transactions to move forward without having to write the log to disk before commit. Increased size allows larger transactions to run without requiring writes to disk before a commit is performed.
How can you use this?
InnoDB Anatomy – Index PagesINDEX Pages - B+Tree Structure
•Efficient method of storing data on disk in a tree format.•Actual records stored in leaf pages (level 0).•Root-level pages exist at the top of the tree structure.•Non-leaf pages contain only pointers to leaf pages.
Level 0
Level 1
Level 2 Root
Non-Leaf
Leaf Leaf
Non-Leaf
Leaf
InnoDB Anatomy – Index PagesB+Tree Structure – Basic Index Example
Root NodeCustomer IDs
1-500
Non-Leaf 1-250
Non-Leaf251-500
Leaf Node251-260
Leaf Node261-270
Leaf Node1-10
Leaf Node11-20
…
InnoDB Anatomy – Index PagesINDEX Pages
•Not physically in order
•User data “grows down”
•Page directory “grows up”
FIL Header (38)
… Page Directory
FIL Trailer (8)
INDEX Header (36)FSEG Header (20)
System Records (26)
User Data …
EMPTY
InnoDB Anatomy – Index PagesINDEX Page Header (after FIL Header)
Name Byte Length Offset DescriptionPAGE_N_DIR_SLOTS 2 38 + 0 Number of Slots in Page Directory
PAGE_HEAP_TOP 2 38 + 2 Pointer to Record Heap Top
PAGE_N_HEAP 2 38 + 4 Number of Records in Heap
PAGE_FREE 2 38 + 6 Pointer to start of page’s free-record list
PAGE_GARBAGE 2 38 + 8 Number of bytes in “deleted” records
PAGE_LAST_INSERT 2 38 + 10 Pointer to last inserted record, or NULL if this has been reset – eg. by a delete.
PAGE_DIRECTION 2 38 + 12 Last Insert direction, PAGE_LEFT, PAGE_RIGHT …
PAGE_N_DIRECTION 2 38 + 14 Consecutive inserts in the same direction
PAGE_N_RECS 2 38 + 16 Number of user records on the page
PAGE_MAX_TRX_ID 8 38 + 18 Highest ID of transaction that may have modified a record on the page.
PAGE_LEVEL 2 38 + 26 Level of node in index tree
PAGE_INDEX_ID 4 38 + 28 Index ID that page belongs to
storage/innobase/include/page0page.h
InnoDB Anatomy – Demonstration
Demonstration
Determining page level on an INDEX pageFirst, find your page’s start byte:# expr 16384 \* 349152The offset of the PAGE_LEVEL value is 26 after the FIL Header (38):# expr 49152 + 38 + 2649216The byte-length is 2# xxd –ps –s 49216 –l 2 customer.ibd0001Page Level: 1
/var/lib/mysql/testdb/
Additional Resources
Conclusion
• Jeremy Cole• http://blog.jcole.us/• https://github.com/jeremycole/
• Percona• http://www.percona.com/files/percona-live/justin-inno
db-internals.pdf• MySQL Internals Documentation & Source
• http://dev.mysql.com/doc/internals/en/innodb.html• https://launchpad.net/mysql
Sources and Thanks
top related