Optimizing Performance in Qt Applications 11/16/09
May 12, 2015
Introduction
• Bjørn Erik Nilsen
– Software Engineer / Qt Widget Team
– The architect behind Alien Widgets
– Rewrote the Backing Store for Qt 4.5
– One of the guys implementing WidgetsOnGraphicsView
– Author of QMdiArea/QMdiSubWindow
– Author of QGraphicsEffect/QGraphicsEffectSource
Agenda
• Why Performance Matters
• Performance Improvements in Qt 4.6
• How You Can Improve Performance
Why Performance Matters
• Attractive to users
• Looks more professional
• Helps you get things done more efficiently
• Keeps the flow
Why Performance Matters
• An example explains more than a thousand words
Why Performance Matters
• Performance is more important than ever before
– Dynamic user interfaces
• Qt Everywhere
– Desktop
– Embedded platforms with limited hardware
• We cannot just buy better hardware anymore
• Clock speed vs. number of cores
Why Performance Matters
• Not all applications can take advantage of multiple cores
• And some will actually run slower:
– Each core in the processor is slower
– Most applications are not programmed to be multi-threaded
• Multi-core crisis?
Performance Improvements in Qt 4.6
• We continuously strive to optimize the performance
– QWidget painting performance, for example
– Qt 4.6 is no exception!
Performance Improvements in Qt 4.6
[Diagram: Qt modules: QtCore, QtGui, QtOpenGL, QtOpenVG, QtSvg, QtNetwork, QtScript, QtWebKit]
Performance Improvements in Qt 4.6 (QtGui)
• Graphics View
– New update mechanism
– New painting algorithm
– New scene indexing
– Reduced QTransform/QVariant/floating point overhead
• QPixmapCache
– Extended with an int based API
Performance Improvements in Qt 4.6 (QtGui)
• Item Views
– Item selection
– Drag 'n' drop
– QTableView and QHeaderView
• QTransform
– fromTranslate/fromScale
– mapRect for projective transforms
• QRegion
– No longer a GDI object on Windows
Performance Improvements in Qt 4.6 (QtCore)
• QObject
– Destruction
– Connect and disconnect
– Signal emission
• QVariant
– Construction from float and pointers
• QIODevice
– Less (re)allocations in readAll()
Performance Improvements in Qt 4.6 (QtNetwork)
• QNetworkAccessManager
– HTTP back-end
• QHttpNetworkConnectionChannel
– Pipelining HTTP requests (off by default)
• QHttpNetworkConnection
– Increased the number of concurrent connections
• QLocalSocket
– New Windows implementation
– Major performance improvements
Performance Improvements in Qt 4.6 (QtScript)
• QtScript now uses JavaScriptCore as the back-end!
– Still the same API, but with JSC performance
Performance Improvements in Qt 4.6 (QtOpenGL)
• New OpenGL 2.x paint engine
• General improvements
– Clipping
– Text drawing
Performance Improvements in Qt 4.6 (QtOpenVG)
• New module!
• New OpenVG paint engine
– Uses Khronos EGL API
– Configure Qt with “-openvg”
• Support for hardware-accelerated 2D vector graphics on:
– Embedded, mobile and consumer electronic devices
– Desktop
• More info: http://labs.trolltech.com/blogs
Performance Improvements in Qt 4.6 (Embedded)
• Improved support for DirectFB
– Enabling hardware graphics acceleration on embedded platforms
• Maemo Harmattan optimizations
How You Can Improve Performance
• Theory of Constraints (TOC) by Eliyahu M. Goldratt
• The theory is based on the idea that in any complex system, there is usually one aspect of that system that limits its ability to achieve its goal or optimal functioning. To achieve any significant improvement of the system, the constraint must be identified and resolved.
• An application performs only as fast as its bottleneck allows
Theory of Constraints
• Define a goal:
– For example: This application must run at 30 FPS
• Then:
1) Identify the constraint
2) Decide how to exploit the constraint
3) Improve
4) If goal not reached, go back to 1)
5) Done
Identifying hot spots (1)
• The number one and most important task
• Make sure you have plausible data
• Don't randomly start looking for slow code paths!
– An O(n²) algorithm isn't necessarily bad
– Don't spend time on making it O(n log n) just for fun
• Don't spend time on optimizing bubble sort
Identifying hot spots (1)
• “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that is where the bottleneck is” -- Rob Pike
Identifying hot spots (1)
• The right approach for identifying hot spots:
– Any profiler suitable for your platform
  • Shark (Mac OS X)
  • Valgrind (X11)
  • Visual Studio Profiler (Windows)
  • Embedded Trace Macrocell (ETM) (ARM devices)
• NB! Always profile in release mode
Identifying hot spots (1)
• Run application: “valgrind --tool=callgrind ./application”
• This will collect data and information about the program
• Data saved to file: callgrind.out.<pid>
• Beware:
– I/O costs won't show up
– Cache misses are only simulated with --simulate-cache=yes
• The next step is to analyze the data/profile
• Example
Identifying hot spots (1)
• Profiling a section of code (run with “--instr-atstart=no”):

#include <valgrind/callgrind.h>

int myFunction()
{
    // Collect data only between the START and STOP macros
    CALLGRIND_START_INSTRUMENTATION;
    int number = 10;
    ...
    CALLGRIND_STOP_INSTRUMENTATION;
    CALLGRIND_DUMP_STATS; // dump the collected profile data
    return number;
}
Identifying hot spots (1)
• When a hot spot is identified:
– Look at the code and ask yourself: Is this the right algorithm for this task?
• Once the best algorithm is selected, you can exploit the constraint
How to exploit the constraint (2)
• Optimize
– Design level
– Source code level
– Compile level
• Optimization trade-offs:
– Memory consumption, cache misses
– Code clarity and conciseness
How to exploit the constraint (2)
• “Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius – and a lot of courage – to move in the opposite direction.” -- Einstein
How to exploit the constraint (2)
• Wouldn't it be great to have a cross-platform tool to measure performance?
QTestLib
• Say hello to QBENCHMARK
• Extension to the QTestLib framework
• Cross-platform
• Straightforward: QBENCHMARK { <code here> }
• Code will then be measured based on
– Walltime (default)
– CPU tick counter (-tickcounter)
– Valgrind/Callgrind (-callgrind)
– Event counter (-eventcounter)
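To make the idea of a wall-time measurement concrete without depending on QTestLib, here is a minimal sketch in standard C++. It is a stand-in, not how QBENCHMARK is implemented: the helper name `measureWallTime` and the explicit iteration count are assumptions made for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of QTestLib): runs `code` `iterations` times
// and returns the average wall time per iteration in nanoseconds.
inline double measureWallTime(const std::function<void()> &code,
                              std::size_t iterations)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        code();
    const Clock::time_point stop = Clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return iterations ? elapsed.count() / static_cast<double>(iterations) : 0.0;
}
```

Running the same helper over two candidate implementations gives comparable numbers, which is the kind of decision support the benchmark macro is meant to provide.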
QTestLib
• Let's create a benchmark
• Run with ./mytest -xml -o results.xml
• git clone git://gitorious.org/qt-labs/qtestlib-tools.git
• Visualize with
– Graph (generatereport results.xml)
– BMCompare (bmcompare results1.xml results2.xml)
• Now that we have a tool, it is easier to measure and decide which algorithm to use
How to exploit the constraint (2)
• General tricks:
– Caching
– Delay a computation until the result is required
– Reduce computation in tight loops
– Compiler optimizations
• Optimization Techniques for Qt:
– Choose the right container
– Use implicit data sharing efficiently
– Discover the magic flags
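The first two general tricks, caching and delaying a computation until its result is required, can be combined in one small class. The sketch below uses only standard C++; `LazySum` and its members are hypothetical names for illustration, not Qt API.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical class for illustration: the sum is computed only when it is
// first requested, then cached until the data changes.
class LazySum
{
public:
    explicit LazySum(std::vector<int> values) : m_values(std::move(values)) {}

    void append(int value)
    {
        m_values.push_back(value);
        m_dirty = true;            // invalidate the cache, but do not recompute yet
    }

    long long sum() const
    {
        if (m_dirty) {             // delayed computation: runs only on demand
            m_cache = std::accumulate(m_values.begin(), m_values.end(), 0LL);
            m_dirty = false;
            ++m_computations;
        }
        return m_cache;            // cached result for repeated queries
    }

    // Number of times the sum was actually recomputed.
    int computations() const { return m_computations; }

private:
    std::vector<int> m_values;
    mutable long long m_cache = 0;
    mutable bool m_dirty = true;
    mutable int m_computations = 0;
};
```

Appending many values in a row costs no recomputation at all; the work happens once, at the next call to sum().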
Implicit data sharing in Qt
• Maximize resource usage and minimize copying
[Diagram: Object 0, Object 1, Object 2 and Object 3 all point to one ObjectData through shallow copies]

Object obj0; // Creates ObjectData
// Copies (share the same data)
Object obj1 = obj0, obj2 = obj0, obj3 = obj0;
Implicit data sharing in Qt
• Data is only copied if someone modifies it:
[Diagram: after a modification, the modified object gets its own ObjectData through a deep copy; the remaining objects keep sharing the original ObjectData through shallow copies]
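The copy-on-write behaviour described above can be sketched in standard C++ with a shared pointer. This is a simplified, single-threaded stand-in and not Qt's implementation; `SharedInts`, `detach` and `sharesDataWith` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write container; a simplified stand-in for an
// implicitly shared class (single-threaded, no atomic reference count).
class SharedInts
{
public:
    SharedInts() : d(std::make_shared<std::vector<int>>()) {}

    // Copying a SharedInts copies only the pointer (shallow copy); the
    // compiler-generated copy constructor does exactly that.

    void append(int v) { detach(); d->push_back(v); }

    // Const access never detaches.
    int at(std::size_t i) const { return (*d)[i]; }

    // Non-const access must assume a write, so it detaches first.
    int &operator[](std::size_t i) { detach(); return (*d)[i]; }

    bool sharesDataWith(const SharedInts &other) const { return d == other.d; }

private:
    void detach()
    {
        if (d.use_count() > 1)                             // shared with others?
            d = std::make_shared<std::vector<int>>(*d);    // deep copy
    }

    std::shared_ptr<std::vector<int>> d;
};
```

Note that merely reading through the non-const operator[] already triggers the deep copy, which is why const accessors are preferable for read-only access.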
Implicit data sharing in Qt
• How to avoid deep copies:
– Only use const operators and functions if possible
– Be careful with the foreach keyword
• For classes that are not implicitly shared:
– Always pass them around as const references
– Passing const references is a good habit in any case
• Examples
Implicit data sharing in Qt
Original:  T *readOnly = list[index];
Optimized: T *readOnly = list.at(index);

Original:  QList<T>::iterator i;
           i = list.begin();
Optimized: QList<T>::const_iterator i;
           i = list.constBegin();

Original:  foreach (QString s, strings)
Optimized: foreach (const QString &s, strings)

Original:  void foo(QTransform t);
Optimized: void foo(const QTransform &t);

NB! QTransform is not implicitly shared!
Implicit data sharing in Qt
• See the “Implicitly Shared Classes” documentation for a complete list of implicitly shared classes in Qt
• http://doc.trolltech.com/4.6-snapshot/shared.html
• Note: All Qt containers are implicitly shared
Qt Containers
[Diagram: the Qt containers: QList, QLinkedList, QVector, QStack, QQueue, QSet, QMap, QMultiMap, QHash, QMultiHash]
Qt Containers
[Diagram: Associative Containers: QMap, QMultiMap, QHash, QMultiHash]
Qt Containers
[Diagram: Sequential Containers: QList, QLinkedList, QVector, QStack, QQueue; QSet is also shown]
Qt Containers
[Diagram: Sequential Containers: QList, QLinkedList, QVector, QStack, QQueue; QList vs. QVector]
QVector<T>
• Items are stored contiguously in memory
• One block of memory is allocated:
[Diagram: QVector<T> holds a pointer d to a single QVectorTypedData<T> block containing ref (QBasicAtomicInt), alloc (int), size (int), flags (uint) and the items array[0] ... array[alloc - 1] (each a T)]
QVector<T>
• Reserves space at the end
• Growth strategy depends on the type T
– Movable types: realloc by increments of 4096 bytes
– Non-movable types: 50% increments
• What is a movable type?
– Primitive types: bool, int, char, enums, pointers, …
– Plain Old Data (POD) with no constructor/destructor
– Basically everything that can be moved around in memory using memcpy() or memmove()
– Good article: http://www.ddj.com/cpp/184401508
Movable types
• User-defined classes are treated as non-movable by default
• Oh no!
• Have no fear, Q_DECLARE_TYPEINFO is here
• You can tell Qt that your class is a:
– Q_PRIMITIVE_TYPE: POD with no constructor/destructor
– Q_MOVABLE_TYPE: has a constructor/destructor, but can be moved in memory using memcpy()/memmove()
Movable types (Q_PRIMITIVE_TYPE)
struct Point2D
{
    int x;
    int y;
};

Q_DECLARE_TYPEINFO(Point2D, Q_PRIMITIVE_TYPE);
Movable types (Q_MOVABLE_TYPE)
class Point2D
{
public:
    Point2D() { data = new int[2]; }
    Point2D(const Point2D &other) { … }
    ~Point2D() { delete [] data; }

    Point2D &operator=(const Point2D &other) { … }

    int x() const { return data[0]; }
    int y() const { return data[1]; }

private:
    int *data;
};

Q_DECLARE_TYPEINFO(Point2D, Q_MOVABLE_TYPE);
QVector<T>
• Insertion in the middle:
– Movable type: memmove()
– Non-movable type: operator=()

[Diagram: an item is inserted at index 1 and the items after it are shifted by one position]
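The two insertion strategies can be sketched in standard C++ as follows. This is an illustration of the idea, not QVector's code: the trait `IsMovable` is a hypothetical stand-in for the information Q_DECLARE_TYPEINFO provides, and `insertAt` assumes the caller supplies an array with room for size + 1 constructed items and a value that does not alias the array.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical trait standing in for Q_DECLARE_TYPEINFO: by default only
// trivially copyable types are treated as movable.
template <typename T>
struct IsMovable : std::is_trivially_copyable<T> {};

// Movable types: shift the tail with a single memmove().
template <typename T>
typename std::enable_if<IsMovable<T>::value>::type
insertAt(T *array, std::size_t size, std::size_t index, const T &value)
{
    std::memmove(array + index + 1, array + index, (size - index) * sizeof(T));
    array[index] = value;
}

// Non-movable types: shift the tail item by item with operator=().
template <typename T>
typename std::enable_if<!IsMovable<T>::value>::type
insertAt(T *array, std::size_t size, std::size_t index, const T &value)
{
    for (std::size_t i = size; i > index; --i)
        array[i] = array[i - 1];
    array[index] = value;
}
```

The first overload moves the whole tail in one call regardless of its length, while the second pays for one assignment operator call per shifted item.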
QList<T>
• Two representations
• Array of pointers to items on the heap (general case)
[Diagram: QList<T> holds a pointer d to QListData::Data, containing ref (QBasicAtomicInt), alloc (int), begin (int), end (int), flags (uint) and array[0] ... array[alloc - 1] (each a void * pointing to a T on the heap)]
QList<T>
• Special case: T is movable and sizeof(T) <= sizeof(void *)
• Items are stored directly (same as QVector)
[Diagram: QList<T> holds a pointer d to QListData::Data, containing ref (QBasicAtomicInt), alloc (int), begin (int), end (int), flags (uint) and array[0] ... array[alloc - 1] (each a T stored in place)]
QList<T>
• Reserves space at the beginning and at the end
• Benefits of reserving space at the beginning
– Prepending an item usually takes constant time
– Removing the first item usually takes constant time
– Faster insertion
QVector<T> vs. QList<T>
• QList expands to less code in the executable
• For most purposes, QList is the right class to use
• If all you do is append(), use QVector
– Use reserve() if you know the size in advance
– Also consider QVarLengthArray or a plain C array
• When T is movable and sizeof(T) <= sizeof(void *)
– Almost no difference, except that QList provides faster insertions/removals in the first half of the list
• (Constant time insertions in the middle: Use QLinkedList)
General Qt Container Advice
• Avoid deep copies, e.g.:
– Use at() rather than operator[]
– constData()/constBegin()/constEnd()
– Basically: limit usage of non-const functions
• When you know the size in advance:
– Use reserve()
• Let Qt know whether your class is movable or not
– Q_DECLARE_TYPEINFO
• Choose the right container for the right circumstance
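The effect of reserve() is easy to observe by counting how often a growing array changes its capacity. The sketch below uses std::vector as a stand-in for QVector, so the exact growth pattern is that of the standard library rather than Qt's; `countReallocations` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Counts how often the vector had to grow (capacity change) while appending
// `count` items. std::vector is used here as a stand-in for QVector.
inline std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector<int> v;
    if (reserveFirst)
        v.reserve(count);                 // one allocation up front

    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {   // the buffer was reallocated
            capacity = v.capacity();
            ++reallocations;
        }
    }
    return reallocations;
}
```

With the size reserved in advance the loop never reallocates; without it, every growth step allocates a new buffer and moves the existing items.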
General Painting Optimizations
• Prefer QPixmap over QImage (if possible)
– QPixmap is accelerated
– QPixmap caches information about the pixels
• Avoid QPixmap/QImage::setAlphaChannel()
– Use QPainter::setCompositionMode instead
• Avoid QPixmap/QImage::transformed()
– Use QPainter::setWorldTransform instead
• If you know for sure that the image has alpha:
– Qt::NoOpaqueDetection (QPixmap::fromImage)
General Painting Optimizations
Original (NB! Image is 32 bit):

int width = image.width();
int height = image.height();

for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        QRgb pixel = image.pixel(x, y);
        …
    }
}
General Painting Optimizations
Optimized:

int width = image.width();
int height = image.height();

for (int y = 0; y < height; ++y) {
    // Fetch the scan line once per row instead of calling pixel() per pixel
    QRgb *line = reinterpret_cast<QRgb *>(image.scanLine(y));
    for (int x = 0; x < width; ++x) {
        QRgb pixel = line[x];
        …
    }
}
General Painting Optimizations
Even more optimized:

int numPixels = image.width() * image.height();
QRgb *pixels = reinterpret_cast<QRgb *>(image.bits());

for (int i = 0; i < numPixels; ++i) {
    QRgb pixel = pixels[i];
    …
}
General Painting Optimizations
Original:

void MyWidget::paintEvent(...)
{
    QPainter painter(this);
    painter.fillRect(rect(), Qt::red);
}

int main(int argc, char **argv)
{
    ...
    MyWidget widget;
    ...
}

Optimized:

void MyWidget::paintEvent(...)
{
    QPainter painter(this);
    painter.fillRect(rect(), Qt::red);
}

int main(int argc, char **argv)
{
    ...
    MyWidget widget;
    // The widget paints all of its pixels itself
    widget.setAttribute(Qt::WA_OpaquePaintEvent);
    ...
}
General Painting Optimizations
Original:
painter.drawLine(line1);
painter.drawLine(line2);
painter.drawLine(line3);

Optimized:
QLine lines[3];
...
painter.drawLines(lines, 3);

Original:
painter.drawPoint(point1);
painter.drawPoint(point2);
painter.drawPoint(point3);

Optimized:
QPoint points[3];
...
painter.drawPoints(points, 3);

Original:
QString key("abcd");
QPixmapCache::insert(key, pm);
QPixmapCache::find(key, pm);

Optimized:
QPixmapCache::Key key;
key = QPixmapCache::insert(pm);
QPixmapCache::find(key, &pm);
Other Optimizations
Original:
const QString s = s1 + s2 + s3;

Optimized:
#include <QStringBuilder>
...
const QString s = s1 % s2 % s3;

Optimized (alternative, keeps operator+):
#define QT_USE_FAST_CONCATENATION
#define QT_USE_FAST_OPERATOR_PLUS

Original:
QTransform xform = a.inverted();
xform *= b.inverted();

Optimized:
QTransform xform = b;
xform *= a;
xform = xform.inverted();

Original:
foreach (const QString &s, slist) {
    if (s.size() < 5)
        continue;
    const QString m = s.mid(2, 3);
    if (m == magicString)
        doMagicStuff();
}

Optimized:
foreach (const QString &s, slist) {
    if (s.size() < 5)
        continue;
    QStringRef m(&s, 2, 3);
    if (m == magicString)
        doMagicStuff();
}
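The QStringRef variant avoids creating a temporary string for every item. The same idea can be sketched in standard C++, where std::string::compare() compares a sub-range in place; `countMagic` is a hypothetical function and std::string stands in for QString.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts the strings whose characters [2, 5) equal `magic`, comparing in place
// instead of building a temporary substring for every item.
inline std::size_t countMagic(const std::vector<std::string> &list,
                              const std::string &magic)
{
    std::size_t matches = 0;
    for (const std::string &s : list) {
        if (s.size() < 5)
            continue;
        // Same result as s.substr(2, 3) == magic, but without the temporary copy.
        if (s.compare(2, 3, magic) == 0)
            ++matches;
    }
    return matches;
}
```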
Other Optimizations
Original:  qFuzzyCompare(opacity + 1, 1)
Optimized: qFuzzyIsNull(opacity)

Original:  int nRects = qregion.rects().size();
Optimized: int nRects = qregion.numRects();

Original:  #button1 { background:red }
           #button2 { background:red }
Optimized: #button1, #button2 { background:red }

Original:  *[readOnly="1"] { color:blue }
Optimized: /* Only QLineEdit can possibly be read-only in my application */
           QLineEdit[readOnly="1"] { color:blue }

Original:  if (expensive() && cheap())
Optimized: if (cheap() && expensive())
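Putting the cheap test first pays off because && evaluates its operands left to right and stops at the first false one. The sketch below makes that visible with call counters; `Checks` and its member functions are hypothetical and exist only for this illustration.

```cpp
// Hypothetical predicates with call counters, for illustration only.
struct Checks
{
    int cheapCalls = 0;
    int expensiveCalls = 0;

    bool cheap(int value) { ++cheapCalls; return value > 0; }

    bool expensive(int value)
    {
        ++expensiveCalls;
        // Stand-in for costly work.
        long long acc = 0;
        for (int i = 0; i < 1000; ++i)
            acc += static_cast<long long>(value) * i;
        return acc % 2 == 0;
    }

    // && stops at the first false operand, so the cheap test can spare us
    // the expensive one.
    bool cheapFirst(int value) { return cheap(value) && expensive(value); }
    bool expensiveFirst(int value) { return expensive(value) && cheap(value); }
};
```

When the cheap test rejects most inputs, the cheapFirst() ordering skips the expensive call for all of them, while expensiveFirst() always pays for it.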
Graphics View Optimizations
• Viewport update modes
• Scene index
– BSP tree index
– No index
• Avoid QGraphicsScene::changed signal
• QGraphicsScene::setSceneRect
• Cache modes
– Device coordinates
– Item coordinates
• OpenGL viewport
Platform Specific Optimizations
• Link time optimization LTCG (Windows only)
– Approx. 10%-15% speedup
– Configure Qt with “-ltcg”
• Don't use explicit double arithmetic
– qreal is float on embedded (QWS)
– 100 / 2.54 → 100 / qreal(2.54)
• It's time to take advantage of what we have learned
• Let's do some real optimizations!
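The advice about explicit double arithmetic follows from the C++ conversion rules, which the compiler can confirm at build time. In this sketch the typedef `real` is an assumption standing in for qreal on a platform where qreal is float, and `inchesFromCentimeters` is a hypothetical helper.

```cpp
#include <type_traits>

// Assumption for this sketch: `real` plays the role of qreal on a platform
// where qreal is float.
typedef float real;

// A bare floating-point literal is a double, so the whole division is
// carried out in double precision...
static_assert(std::is_same<decltype(100 / 2.54), double>::value,
              "a double literal forces double arithmetic");

// ...whereas wrapping the literal keeps the division in single precision.
static_assert(std::is_same<decltype(100 / real(2.54)), float>::value,
              "wrapping the literal keeps the arithmetic in float");

// Hypothetical helper that stays in single precision throughout.
inline real inchesFromCentimeters(real cm) { return cm / real(2.54); }
```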
Theory of Constraints
• Define a goal:
– For example: This application must run at 30 FPS
• Then:
1) Identify the constraint
2) Decide how to exploit the constraint
3) Improve
4) If goal not reached, go back to 1)
5) Done
Questions?