Experience Report:
Exploiting Advanced Database Optimization Features for Large-Scale SAP R/3 Installations*
Bernhard Zeller, Alfons Kemper
University of Passau
<last name>@db.fmi.uni-passau.de
* This work was supported by an SAP contract within the so-called Terabyte-Project.
20.08.2002 University of Passau - http://www.db.fmi.uni-passau.de
Outline
• Brief Overview of SAP R/3
• Motivation
• Related Work
• Traditional Performance Tuning Techniques
• Exploiting Horizontal Partitioning for Tuning Purposes
• Partitioning Scenarios/Techniques and their Pros and Cons
• Possible Benefits and Drawbacks of Partitioning
• Performance Analysis
• Conclusion
Overview of SAP R/3
• SAP is the market leader for integrated business solutions
• SAP R/3 is SAP’s enterprise resource planning (ERP) product
• SAP R/3 provides modules for finance, human resources, material management, etc.
• today about 18,000 customers worldwide use SAP R/3 (used by most Fortune 500 companies)
• more than 44,000 installations worldwide
• Three-Tier Client/Server-Architecture
Three-Tier Client/Server-Architecture of SAP R/3
[Figure: M presentation servers ("even more") connect over a LAN/WAN to N application servers ("many"), which connect over a LAN to a single (second party) relational DBMS: ONE DBMS on ONE host!]
Motivation (1)
Today's high-end SAP R/3 installations have reached their load capacity limits:
• data volumes of SAP R/3 systems (i.e., the database volumes) are growing tremendously (several hundred Terabytes)
• hard to maintain (7 x 24)
• performance worsens
Exploiting advanced features like horizontal partitioning can widen these load capacity limits.
Already implemented by most database vendors: no additional effort, just switch on.
Motivation (2)
High-end systems are the most important systems:
• revenue
• prestige (new contracts, business competition)
Therefore: everyday business is the top priority, i.e.,
• only a tolerable slowdown of important (= OLTP) transactions due to the use of new techniques
• if this can't be guaranteed: don't do it!
In this case: the benefits are obvious. Prove that horizontal partitioning doesn't conflict with daily business!
Related Work
• G. Copeland, W. Alexander, E. Boughter, and T. Keller. Data placement in Bubba. In Proc. of the ACM SIGMOD Conf. on Management of Data, Chicago, IL, USA, 1988.
• D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. Gamma - a high performance dataflow database machine. In Proc. of the Conf. on Very Large Data Bases (VLDB), Kyoto, Japan, 1986.
• M. Mehta and D. J. DeWitt. Data placement in shared-nothing parallel database systems. VLDB Journal, 6(1):53-72, 1997.
• S. Ceri, M. Negri, and G. Pelagatti. Horizontal data partitioning in database design. In Proc. of the ACM SIGMOD Conf. on Management of Data, Orlando, USA, 1982.
- use of local computing power, i.e., #partitions #CPUs, #disks
- network between the nodes is the bottleneck
• "centralized" systems like SAP R/3:
- limited number of CPUs, main memory, disks, i.e., #partitions >> #CPUs, #disks
- shared memory/disk access
- hazards at the disk page / CPU level are likely (thrashing)
Traditional Tuning Techniques
• Reduce data (archiving)
- sometimes not possible
• Additional indices
- even more data
- huge, hard to maintain
- slow down updates/inserts
• Additional job instances
- limited number of CPUs
- problems due to data skews
• Additional/better hardware
- too expensive
- already the newest HW installed
• Specially designed software
- really expensive
- long deployment times
Partitioning Techniques and their Pros and Cons
• round robin, hash partitioning
+ balanced partition sizes
- not obvious in which partition a record R will be stored
• range partitioning
+ users know the data distribution and can use this knowledge at the application level (workload balancing, definition of working sets, ...)
- unbalanced partition sizes due to data skews
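The trade-off between the two families can be sketched in a few lines of Python. This is only an illustration: the function names, the partition count, the range bounds, and the skewed sample plant numbers are all assumptions and do not come from any DBMS or from the measured system.

```python
from bisect import bisect_right
from collections import Counter
from zlib import crc32

NUM_PARTITIONS = 4  # assumed partition count for the illustration

def hash_partition(key: str, n: int = NUM_PARTITIONS) -> int:
    """Hash partitioning: the target partition is opaque to the user."""
    return crc32(key.encode("utf-8")) % n

# Assumed exclusive upper bounds of the first three range partitions.
RANGE_BOUNDS = ["0100", "0200", "0300"]

def range_partition(key: str) -> int:
    """Range partitioning: the target partition follows from the key value."""
    return bisect_right(RANGE_BOUNDS, key)

# Skewed, made-up sample: most records belong to plant 0001.
plants = ["0001"] * 70 + ["0150"] * 15 + ["0250"] * 10 + ["0350"] * 5

# Hashing a unique record key (plant plus running number) ignores the skew.
hash_sizes = Counter(hash_partition(f"{p}-{i}") for i, p in enumerate(plants))
# Range partitioning on the plant number inherits the skew of the data.
range_sizes = Counter(range_partition(p) for p in plants)

print("hash :", sorted(hash_sizes.items()))
print("range:", sorted(range_sizes.items()))
```

With range partitioning, an application can tell from the plant number alone which partition a record lives in, at the price of partition sizes that mirror the data skew.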
Partitioning Scenarios
[Decision tree]
• no partitioning: done
• partitioning:
  - partition only table: choose one partitioning algorithm
  - partition only indices: choose one partitioning algorithm
  - partition table and indices:
    - different # of partitions: choose partitioning algorithm for index and table
    - same # of partitions:
      - equipartitioned: choose one uniform partitioning algorithm
      - non-equipartitioned: choose separate algorithm for index and table
Benefits of Partitioning
Keep data manageable by processing data partition-wise:
• administrative tasks like index re-creation, gathering statistics, table re-organization, ...
• partition-wise, parallel processing at application level, e.g., partition by plant number and start an inventory job for each plant
• (equi-)join processing (when the partitioning fields are a subset of the join attributes)
• bulk deletes: drop whole partitions
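Two of these benefits, one application-level job per partition and bulk deletes by dropping a partition, can be sketched with an in-memory stand-in for a partitioned table. The table layout, the function names, and the sample rows are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table partitioned by plant.
partitions = defaultdict(list)

def insert(row):
    """Routes a row to the partition of its plant."""
    partitions[row["plant"]].append(row)

def inventory_job(plant):
    """One application-level job per partition: sums the stock of one plant."""
    return plant, sum(r["stock"] for r in partitions[plant])

def drop_partition(plant):
    """Bulk delete: discards a whole partition instead of deleting row by row."""
    return len(partitions.pop(plant, []))

for plant, stock in [("LA", 5), ("LA", 7), ("NY", 3), ("SF", 1), ("SF", 9)]:
    insert({"plant": plant, "stock": stock})

# Start one inventory job per plant; the jobs never touch the same partition.
with ThreadPoolExecutor() as pool:
    totals = dict(pool.map(inventory_job, list(partitions)))

removed = drop_partition("SF")
print(totals, removed, sorted(partitions))
```

Because each job reads exactly one partition, the jobs do not compete for the same rows, and removing a plant is a single operation on the partition rather than one delete per row.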
Possible Drawbacks: Row Movement (RM)
• RM = movement of data from one partition to another because of an update
• doubles the cost of an update transaction
• produces additional logging and locking overhead
Possible Drawbacks: Row Movement (Example)
Table Equipment: equipment of company COMP Inc.
Partition "Plant LA"
PlantID  EquID  Description
LA       006    Desk
LA       007    Chair de luxe
LA       008    Office copier
Partition "Plant NY"
PlantID  EquID  Description
NY       015    LCD Display
NY       017    Chair simple
NY       018    Mainframe
NY       008    Office copier (after the update)
Task: new office copier for LA, old one to NY
DB: update equipment set PlantID = 'NY' where PlantID = 'LA' and EquID = '008'
DB Step 1: delete in partition "Plant LA"
DB Step 2: insert into partition "Plant NY"
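The two database steps can be replayed with a toy model. The class below is a hypothetical sketch, not the behavior of a specific DBMS; it only logs which physical operations an update of the partitioning key causes.

```python
class PartitionedTable:
    """Toy table partitioned by PlantID; logs the physical operations."""

    def __init__(self):
        self.partitions = {}    # PlantID -> {EquID: Description}
        self.physical_ops = []  # log of (operation, partition, EquID)

    def insert(self, plant, equ_id, description):
        self.partitions.setdefault(plant, {})[equ_id] = description
        self.physical_ops.append(("insert", plant, equ_id))

    def delete(self, plant, equ_id):
        description = self.partitions[plant].pop(equ_id)
        self.physical_ops.append(("delete", plant, equ_id))
        return description

    def update_plant(self, old_plant, equ_id, new_plant):
        """Changing the partitioning key forces a delete plus an insert."""
        if old_plant == new_plant:
            self.physical_ops.append(("update", old_plant, equ_id))
            return
        description = self.delete(old_plant, equ_id)
        self.insert(new_plant, equ_id, description)

table = PartitionedTable()
rows = [("LA", "006", "Desk"), ("LA", "007", "Chair de luxe"),
        ("LA", "008", "Office copier"), ("NY", "015", "LCD Display"),
        ("NY", "017", "Chair simple"), ("NY", "018", "Mainframe")]
for plant, equ_id, desc in rows:
    table.insert(plant, equ_id, desc)

table.physical_ops.clear()  # only log the work caused by the update
table.update_plant("LA", "008", "NY")
print(table.physical_ops)
```

A single logical update turns into two physical operations in two different partitions, which is where the doubled cost and the extra logging and locking come from.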
Possible Drawbacks: Conflicts with Parallel Jobs
• resources of ERP systems (CPU, memory, storage) are managed at the application level
• the ERP system has no knowledge of partitioning (i.e., parallelization) at the database level
Consequence: conflicts at the CPU and disk page level are likely
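The conflict described above can be made concrete with a little arithmetic. The model is a deliberately simplified assumption: the ERP scheduler budgets one CPU per job, while the DBMS is assumed to start one worker per partition that a job touches. The job names and the four-CPU host are illustrative.

```python
HOST_CPUS = 4  # the host in this sketch is assumed to have four CPUs

def cpus_assumed_by_erp(jobs):
    """The ERP scheduler is assumed to budget one CPU per running job."""
    return len(jobs)

def cpus_demanded_at_dbms(jobs, partitions_touched, parallel_dbms=True):
    """Assumed DBMS behavior: one worker per partition that a job touches."""
    if not parallel_dbms:
        return len(jobs)
    return sum(partitions_touched.get(job, 1) for job in jobs)

# Hypothetical workload: "analyze A" scans a table with four partitions,
# every other job touches a single partition.
partitions_touched = {"analyze A": 4}
jobs = ["analyze A", "show", "calculate", "transform"]

assumed = cpus_assumed_by_erp(jobs)
demanded = cpus_demanded_at_dbms(jobs, partitions_touched)
print(f"ERP budget: {assumed} of {HOST_CPUS} CPUs, DBMS demand: {demanded}")
print("oversubscribed:", demanded > HOST_CPUS)
```

Under these assumptions the ERP system believes the host is exactly full, while the database level asks for more CPUs than exist.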
Possible Drawbacks: Conflicts with Parallel Jobs
[Figure: a host with four CPUs (CPU 1 to CPU 4) runs the DBMS and the ERP system. Table A in the DBMS is split into partitions A_1 to A_4. The ERP job list contains Job 1: analyze A, Job 2: show ..., Job 3: calculate, Job 4: transform. "CPUs used" counters of 0, 1, and 4 are shown as the jobs are started.]
Analyzed Scenarios
• Flat: table and index are not partitioned
• Global Index: the table is partitioned but the index is not
• Partitioned Index: the table and the index are partitioned using the same partitioning algorithm (range partitioning)
Used Data and Hardware (1)
Copies of SAP R/3 standard tables (Material Management):
Table  indexed key columns         avg. row length  # rows      table size
MARC   mandt, matnr, werks         496 bytes        5 million   2.5 GB
MARD   mandt, matnr, werks, lgort  142 bytes        25 million  3.5 GB
Used Data and Hardware (2)
• range partitioned by the value of werks, resulting in 100 partitions
• anonymized data from a production SAP R/3 system
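As a plausibility check, the reported table sizes follow from row length times row count, and the 100 partitions give an average partition size. The sketch below assumes decimal gigabytes and ignores block and index overhead. The per-partition figure is only an average, since real plant data is skewed.

```python
TABLES = {
    # name: (average row length in bytes, number of rows)
    "MARC": (496, 5_000_000),
    "MARD": (142, 25_000_000),
}
NUM_PARTITIONS = 100  # range partitions on werks

def table_size_gb(row_bytes, rows):
    """Raw data volume in decimal gigabytes, ignoring block and index overhead."""
    return row_bytes * rows / 1e9

def avg_rows_per_partition(rows, partitions=NUM_PARTITIONS):
    """Average only; skewed plants make the actual partitions differ."""
    return rows / partitions

for name, (row_bytes, rows) in TABLES.items():
    print(name, round(table_size_gb(row_bytes, rows), 2), "GB,",
          int(avg_rows_per_partition(rows)), "rows per partition on average")
```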
Hardware:
• SUN Enterprise 450 with four 400 MHz CPUs and 4 GB RAM
• SUN A1000 500 GB RAID with RAID level 5
• the R/3 system and the DBMS used 512 MB main memory each
Analyzed Statements
• selects (single, for all entries, up to n rows, parallel selects,...)
• inserts, updates, and deletes
• joins
• parallel jobs at application level
• administrative tasks
• ...
Evaluated Dimensions
• number of processed rows
• set-oriented and one-record-at-a-time approach
• number of commits
• number of indices
• parallel jobs: processing data partition-wise and across partition borders
Joining Single Plants (Hash)
[Chart: time (sec), axis 0 to 700, for Flat, Global Indices, and Partitioned Indices]
Join where mard.matnr = marc.matnr and mard.werks = marc.werks and mard.werks = '0001'
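Why a single-plant join can profit from partitioning on werks can be sketched as follows: since werks is both a join attribute and the partitioning key, only one partition per table has to be read. The rows are tiny made-up data; only the column names follow the slides, and the in-memory hash join is a stand-in for the DBMS operator.

```python
from collections import defaultdict

def partition_by_werks(rows):
    """Groups rows by plant (werks), mimicking one partition per plant."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["werks"]].append(row)
    return parts

def hash_join(left, right, keys):
    """Plain in-memory hash join on the given key columns."""
    index = defaultdict(list)
    for row in left:
        index[tuple(row[k] for k in keys)].append(row)
    return [(l, r) for r in right for l in index.get(tuple(r[k] for k in keys), [])]

# Tiny made-up rows; only the column names follow the slides.
marc = [{"matnr": m, "werks": w} for w in ("0001", "0002") for m in ("A", "B")]
mard = [{"matnr": m, "werks": w, "lgort": l}
        for w in ("0001", "0002") for m in ("A", "B") for l in ("L1", "L2")]

marc_parts, mard_parts = partition_by_werks(marc), partition_by_werks(mard)

# Pruned join: only the partition of plant 0001 is read on each side.
pruned = hash_join(marc_parts["0001"], mard_parts["0001"], ("matnr", "werks"))
# Reference: join the whole tables, then filter on the plant.
full = [p for p in hash_join(marc, mard, ("matnr", "werks"))
        if p[1]["werks"] == "0001"]
print(len(pruned), len(full), pruned == full)
```

Both variants return the same pairs, but the pruned variant never looks at rows of other plants.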
Joining Whole Tables (Hash)
[Chart: time (min), axis 0 to 140, for Flat, Global Indices, and Partitioned Indices]
Join where mard.matnr = marc.matnr and mard.werks = marc.werks
Joining Single Plants (Merge)
select /*+ use_merge(c,d) index(c) index(d) */ c.abcin, d.speme
from marc c, mard d
where c.mandt = d.mandt and c.werks = d.werks and c.matnr = d.matnr and c.werks = '0001'
[Chart: time (ms), axis 0 to 400, for Flat, Global Indices, and Partitioned Indices]
Insertions and Deletions
MARD, with commits
[Chart: time (sec), axis 0 to 140, for Insert and Delete of 10, 100, and 1000 rows; series: Flat, Global Index, Partitioned Index]
Insertions and Deletions
MARD, without commits
[Chart: time (ms), axis 0 to 1200, for Insert and Delete of 10, 100, and 1000 rows; series: Flat, Global Index, Partitioned Index]
Update Statements
[Chart: time (min), axis 0 to 30, for update with RM (10, 100, 1000 rows) and update without RM (100, 1000 rows); series: Flat, Global Index, Partitioned Index]
Select Statements
Select one record
[Chart: time (ms), axis 0 to 1.6, for Flat, Global Index, and Partitioned Index; series: MARC, MARD]
Select Statements
Select across partition borders (10 records from each partition)
[Chart: time (ms), axis 0 to 60, for Flat, Global Index, and Partitioned Index; series: MARC, MARD]
Conclusion
• Results show: horizontal partitioning is applicable (with tolerable costs)
• especially joins, administrative tasks, and parallel selects benefit greatly
• horizontal partitioning is already used in some large-scale systems