Performance tuning dataset refresh in Power BI

Aug 02, 2022

Chris Webb

Power BI Customer Advisory Team

Microsoft

Agenda

• Gathering requirements for data refresh in Power BI

• Choosing a storage mode

• Import refresh tuning methodology

• Measuring refresh performance

• Data modelling

• Tuning your data source

• Tuning the Power Query engine

• Tuning the Analysis Services engine

• Refresh in the Power BI Service

Why is refresh performance important?

• Your reports are ready for your users to view faster

• You can refresh more frequently during the day if you need to

• Dataset development is easier

• If something goes wrong with your data you can fix and reload faster

• Slow refresh of one dataset may impact:

  • Refresh performance of other datasets

  • Report performance

• But how fast is fast enough?

How often do you want your data to refresh?

I want real-time data!

Requirements for data refresh

• Don’t ask what your users want; ask what they need

• Questions:

  • When is your source data ready to use?

  • How often does your source data change?

  • What time do you need your data by?

  • How many times do you need to refresh in a day? What is the business need?

  • What if you unexpectedly need to refresh (e.g. to fix data problems)?

  • How important is keeping data up-to-date versus report performance?

Choosing a storage mode

• Import – fastest query performance but data must be refreshed

• Push – data is pushed into a dataset; many limitations

• DirectQuery – no need to refresh but query performance is slower

  • Composite models allow you to mix DirectQuery and Import tables

  • Aggregations are pre-aggregated tables that improve query performance

  • Use automatic page refresh to make sure your report always shows the latest data

• Use Import unless you have a good reason not to!

What happens during refresh?

[Diagram: queries read from the data sources into Power BI, where Power Query passes the data to Analysis Services, which holds the dataset]

Import refresh tuning methodology

• Steps:

  • Model your data properly

  • Remove all data that isn’t needed for your reports/analysis

  • Tune your data source

  • Tune your Power Query queries

  • Tune the Analysis Services engine inside Power BI

• You need to check:

  • Performance of a single refresh while developing

  • Actual performance of dataset refresh in production

Measuring overall refresh performance

• SQL Server Profiler is the best tool for measuring refresh performance

  • Connect to Power BI Desktop via DAX Studio

  • Connect to Power BI Premium capacities via the XMLA endpoint

    • Not possible to connect to Power BI Shared capacity

  • Displays all activity in the Analysis Services engine

  • Look for the Process command and the Duration column

• Power BI Service refresh history also has overall refresh times

• Refresh summary page (and API) shows refresh times for datasets in Premium

• Power BI Capacity Metrics app shows refresh times for Premium
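
Once refresh history has been exported, overall refresh times can be compared from one run to the next. The following is a minimal Python sketch, assuming records that carry `status`, `startTime` and `endTime` fields in the style of the refresh history API; the field names should be checked against the API documentation, and the sample records are invented for illustration.

```python
from datetime import datetime

# Invented sample records, shaped like refresh history entries
# (the field names are an assumption to verify against the API docs).
SAMPLE_HISTORY = [
    {"status": "Completed", "startTime": "2022-08-01T06:00:12Z", "endTime": "2022-08-01T06:14:42Z"},
    {"status": "Failed", "startTime": "2022-08-01T12:00:05Z", "endTime": "2022-08-01T12:01:05Z"},
    {"status": "Completed", "startTime": "2022-08-01T18:00:30Z", "endTime": "2022-08-01T18:20:30Z"},
]

def parse_utc(ts):
    # Drop the trailing "Z" and any fractional seconds before parsing
    return datetime.strptime(ts.rstrip("Z").split(".")[0], "%Y-%m-%dT%H:%M:%S")

def refresh_durations(history, status="Completed"):
    """Return the duration in seconds of each refresh with the given status."""
    return [
        (parse_utc(r["endTime"]) - parse_utc(r["startTime"])).total_seconds()
        for r in history
        if r.get("status") == status and r.get("endTime")
    ]

durations = refresh_durations(SAMPLE_HISTORY)
average_seconds = sum(durations) / len(durations)
print(durations, average_seconds)
```

Failed refreshes are left out here so that a refresh that aborted early does not make the average look better than it is.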

Data modelling and refresh performance

• Good data modelling is important for many reasons – data refresh performance is only one of them

• Good modelling may make refresh performance slower, but will make report query performance faster

• Basic rule: always build a star schema!

• Common problems:

  • Tables with lots of columns

    • Do you need to unpivot measures?

    • Do some of your fact table columns actually belong on a dimension table?

    • Are you even going to use all of these columns?

  • One big table instead of fact tables and dimension tables

  • Use of expensive data types, e.g. Double instead of Currency
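
To make the "one big table" problem concrete, here is a small Python sketch that splits a flat table into a dimension table and a fact table joined by a surrogate key. The table, the column names and the `split_star` helper are all invented for illustration; in practice this reshaping would normally be done in the data source or in Power Query, not in Python.

```python
# Invented flat "one big table": each sale repeats the product attributes.
flat_rows = [
    {"date": "2022-08-01", "product": "Bike", "category": "Outdoor", "amount": 250.0},
    {"date": "2022-08-01", "product": "Tent", "category": "Outdoor", "amount": 120.0},
    {"date": "2022-08-02", "product": "Bike", "category": "Outdoor", "amount": 300.0},
]

def split_star(rows, dim_columns, key_name):
    """Split flat rows into a dimension table and a fact table joined by a surrogate key."""
    dim_keys = {}  # tuple of attribute values -> surrogate key
    dimension, fact = [], []
    for row in rows:
        attrs = tuple(row[c] for c in dim_columns)
        if attrs not in dim_keys:
            # First time this combination is seen: add one dimension row
            dim_keys[attrs] = len(dim_keys) + 1
            dimension.append({key_name: dim_keys[attrs], **dict(zip(dim_columns, attrs))})
        # The fact row keeps everything except the dimension attributes
        fact_row = {k: v for k, v in row.items() if k not in dim_columns}
        fact_row[key_name] = dim_keys[attrs]
        fact.append(fact_row)
    return dimension, fact

product_dim, sales_fact = split_star(flat_rows, ["product", "category"], "product_key")
print(product_dim)
print(sales_fact)
```

The repeated text attributes end up stored once per product, and the fact table is left with a narrow key column in their place.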

Only load the data you need

• The more data you load, the slower refresh will be

• So:

  • Remove any columns you don’t need

  • Filter out any rows you don’t need

  • Think about applying a limit on history, e.g. only loading one year of data

• Do this as soon as possible, ideally before the data even reaches Power BI

• It’s easier to add data back if you need it than remove data from a dataset in production

• Deployment pipelines (in Premium) can be used to limit the amount of data you work with in a development environment
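
One way to trim data before it reaches Power BI is a view in the source database that exposes only the columns and history the reports need. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational source; the `sales` table, its columns and the view name are invented, and the one-year cut-off is counted back from a fixed date so the example is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL, internal_notes TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2020-03-15", "Bike", 250.0, "old row"),
        ("2022-01-10", "Tent", 120.0, "recent row"),
        ("2022-07-20", "Bike", 300.0, "recent row"),
    ],
)

# View that drops the unused column and keeps one year of history,
# counted back from a fixed reference date.
conn.execute(
    """
    CREATE VIEW sales_for_reporting AS
    SELECT sale_date, product, amount
    FROM sales
    WHERE sale_date >= date('2022-08-02', '-1 year')
    """
)

rows = conn.execute("SELECT * FROM sales_for_reporting ORDER BY sale_date").fetchall()
print(rows)
```

Pointing the dataset at a view like this means the unwanted rows and columns never leave the source, and widening the view later is easier than removing data from a dataset in production.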

Data source type and refresh performance

• How quickly can your data source send data to Power BI?

• Some tips:

  • Relational databases perform better than files

  • CSV files will perform better than JSON, XML and especially Excel

  • Files stored in SharePoint may be slow to load compared to local files

  • Web services may also be slow

  • Consider loading your data into a fast data source before loading it into Power BI

Tuning your data source

• If your data source is a relational database, tune the SQL queries that are run when refresh takes place

  • Tools like SQL Server Profiler can be used to see what queries are run

• Other useful tools:

  • Fiddler for viewing requests made to web services

  • Process Monitor for viewing reads from text files

  • Power Query Query Diagnostics

Data source location

• Network latency between your data source and Power BI can affect refresh performance

  • If you’re using an On-premises data gateway, think about the location of the gateway machine

  • Power BI Premium allows you to locate different capacities in different Azure Regions

Power Query engine performance

• Power Query performance can vary depending on where Power Query queries are run:

  • Power BI Desktop – when you are developing

  • Power BI Service – if you’re only connecting to cloud data sources

  • On-premises data gateway – if any of your data sources are on-prem, all traffic has to go through a gateway

• Performance will depend on:

  • Hardware of the machine where queries are run

  • Configuration settings and properties

  • Efficiency of the queries themselves

Power Query in Power BI Desktop

• Measure performance of Power Query queries in Desktop using:

  • SQL Server Profiler

  • Power Query Query Diagnostics

• Settings to improve performance in Power BI Desktop:

  • Disable queries that you don’t need to load into the dataset

  • Turn off “Allow data preview to download in the background”

  • Turn off data privacy checks – but only if you know what this means!

  • Experiment with “Enable parallel loading of tables”

  • Use Table.View to stop multiple reads

  • Turn off “Include in report refresh” if a query doesn’t need to be refreshed

Query folding

• Query folding refers to the way the Power Query engine can push transformations back to the data source

• Almost always results in much better performance

• Only possible with some data sources: relational databases, Analysis Services, OData feeds, some others

• Only possible for some transformations

  • Different data sources support folding for different transformations

  • Some transformations prevent the steps that follow them from folding

• Writing your own SQL queries also prevents folding
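
The effect of folding can be illustrated outside Power BI. The Python sketch below uses the built-in `sqlite3` module as a stand-in for a relational source and compares filtering after fetching every row with pushing the filter into the SQL sent to the source. It is an analogy only, not how the Power Query engine is implemented, and the `orders` table is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "UK" if i % 10 == 0 else "US", float(i)) for i in range(1, 1001)],
)

def filter_locally(conn):
    """No folding: every row leaves the source, then the filter runs in the client."""
    all_rows = conn.execute("SELECT id, country, amount FROM orders").fetchall()
    kept = [r for r in all_rows if r[1] == "UK"]
    return len(all_rows), kept

def filter_at_source(conn):
    """Folding: the filter becomes a WHERE clause, so only matching rows are transferred."""
    kept = conn.execute(
        "SELECT id, country, amount FROM orders WHERE country = ?", ("UK",)
    ).fetchall()
    return len(kept), kept

rows_moved_local, result_local = filter_locally(conn)
rows_moved_folded, result_folded = filter_at_source(conn)
print(rows_moved_local, rows_moved_folded)
```

Both approaches return the same rows, but the folded version moves a tenth of the data out of the source in this example.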

Tuning the Power Query engine

• If query folding is not taking place, then the Power Query engine does the transformations in your queries

• Some transformations such as sort, merge, pivot/unpivot require all data to be loaded into RAM

  • A query is limited to using 256 MB of RAM, so paging may take place

• Some transformations force multiple reads from a data source

  • Using Table.Buffer may help – but may also cause paging
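
The trade-off behind buffering can be shown with a small Python analogy: a stand-in data source that counts how often it is read, queried once without and once with an in-memory copy. The `CountingSource` class and both helper functions are invented for illustration and say nothing about how Table.Buffer is implemented.

```python
class CountingSource:
    """Stand-in for a slow data source that records how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._rows)

def summarise_unbuffered(source):
    # Each step goes back to the source, so it is read twice.
    total = sum(source.read())
    largest = max(source.read())
    return total, largest

def summarise_buffered(source):
    # Read once and hold the rows in memory: fewer reads, more RAM.
    rows = source.read()
    return sum(rows), max(rows)

unbuffered = CountingSource([3, 1, 4, 1, 5])
buffered = CountingSource([3, 1, 4, 1, 5])
result_unbuffered = summarise_unbuffered(unbuffered)
result_buffered = summarise_buffered(buffered)
print(unbuffered.reads, buffered.reads)
```

The buffered version halves the number of reads here, at the cost of keeping every row in memory, which is why buffering a large table can itself lead to paging.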

Tuning the on-premises data gateway

• If you are using an on-premises data gateway to load data, your Power Query queries will be executed on the gateway machine

• Tips:

  • Locate the gateway machine close to the data source

  • Make sure the gateway server has enough CPU and memory

  • Clustered gateways allow for the load to be spread across multiple servers

  • Turn on performance logging and use the Power BI template report to analyse it

Using dataflows to improve performance

• Dataflows let you share the output of a Power Query query between multiple datasets

  • Do complex transformations once instead of inside multiple datasets

  • Do transformations when the data for one query is ready, no need to wait until all data needed by the dataset is ready

  • Data privacy checks are off by default -> better performance

• In a Premium capacity:

  • Enhanced compute engine improves performance by loading data into SQL

  • Container Size property = more RAM for the Power Query engine

  • Computed entities allow you to stage data from slow data sources

[Diagram: without a dataflow, Dataset A and Dataset B in Power BI each run their own query against the same table in the data source]

[Diagram: with a dataflow, the table in the data source is loaded into a dataflow entity in Power BI, and Dataset A and Dataset B each query that entity]

Tuning the Analysis Services engine

• SQL Server Profiler displays a lot of detail about what happens during refresh in the Analysis Services engine

• Official support for Tabular Editor within Desktop will allow changing more properties:

  • IsAvailableInMDX – controls whether hierarchies are built on columns (only relevant for clients that query using MDX such as Excel)

  • EncodingHint – requests a certain type of encoding for a column

Calculated columns and calculated tables

• Calculated columns and calculated tables are evaluated during refresh

  • So the more you have, the slower refresh will be

• Can you replace a calculated column with a measure?

  • Strange but true: this may also help query performance

• Can you replace a calculated table with a Power Query query or a table in your data source?

• Loading data into hidden tables and then using DAX to transform it is usually a bad thing

• BUT certain calculations will be much quicker in DAX

Incremental refresh

• Incremental refresh lets you refresh only the data that is new or has changed

  • Less data to load -> faster refresh

• Works by creating and managing partitions within the table

• Now available in Power BI Shared as well as Premium

• Designed for use with data warehouses built on relational databases

• Can be adapted for use with other data sources such as:

  • Web services

  • Folders containing multiple files
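
The partitioning idea can be sketched in a few lines of Python. This is a simplified illustration that assumes monthly partitions and a policy of the form "store the last N months, refresh the last M months". The function names are invented, and the partition layout that Power BI actually manages is not described in these slides.

```python
def month_partitions(current_year, current_month, store_months):
    """List (year, month) partitions for the stored window, oldest first."""
    partitions = []
    year, month = current_year, current_month
    for _ in range(store_months):
        partitions.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(partitions))

def plan_refresh(current_year, current_month, store_months, refresh_months):
    """Split the stored partitions into those left untouched and those reloaded."""
    if not 1 <= refresh_months <= store_months:
        raise ValueError("refresh_months must be between 1 and store_months")
    partitions = month_partitions(current_year, current_month, store_months)
    return partitions[:-refresh_months], partitions[-refresh_months:]

kept, refreshed = plan_refresh(2022, 8, store_months=12, refresh_months=2)
print(len(kept), refreshed)
```

With twelve months stored and two refreshed, ten of the twelve partitions are never reloaded, which is where the time saving comes from.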

Refresh in the Power BI Service

• Refresh in the Power BI Service only starts when resources are available

• Therefore, refresh does not always start at the scheduled time

• Refresh may be slower in the Service because:

  • You have a very fast development PC

  • It takes longer to load data into the cloud than into Power BI Desktop

• Refresh may run faster on Premium because:

  • More resources = more parallelism, but only on a P2 or larger capacity

  • More likely to start on time – assuming your capacity isn’t overloaded