Copyright © 2015 Itseez 1
Yury Gorbachev
12-May-2015
OpenCV for Embedded: Lessons Learned
• Open-source computer vision library (>2500 algorithms)
• De facto standard in CV, BSD license
• Written in C++; the C interface is now deprecated
• Supports multiple platforms (Linux, Windows, OS X, Android, iOS, QNX)
• Used by Google, NVIDIA, Microsoft, Intel, Stanford, etc.
• Funding/contributions from Willow Garage, NVIDIA, GSoC, AMD, Intel
• Maintained by Itseez
• OpenCV provides extensive means to create an entire application
  • Camera interface (for example, the V4L2 interface on Linux)
  • Video reading interface (using ffmpeg)
  • UI primitives (windows, keyboard/mouse input, etc.)
• Decent performance out of the box
  • Scalar performance is already good enough
  • Some algorithms run at ~100 FPS on average desktops
  • Extra optimization is not required in most cases
• Good and fairly stable acceleration options
  • Intel® TBB is sufficient for multi-core
  • AVX, IPP, OpenCL, CUDA
• Mostly ARM platforms
• Exotic execution environments
  • C++ is not the default language (e.g. on Android)
  • Different interfaces (camera, UI, logging)
  • Hard to troubleshoot
• Insufficient and unpredictable performance
  • Mobile and embedded are still behind desktop
  • Thermal protection, power saving, and other tricky issues
• Zoo of acceleration options
  • SIMD, DSP, GPU offload, FPGA
  • Multi-core systems, heterogeneous systems
[Diagram: OpenCV structure]
Platform-agnostic modules: core, imgproc, calib3d, video, ml, objdetect, features2d, photo, …
Platform-dependent modules: gpu, highgui, androidcamera, Python and Java bindings
Dependencies: JPEG, PNG, Jasper, multimedia, OpenNI
Dependencies: CMake
Accelerations: TBB/GCD/Concurrency, IPP, Eigen
• Algorithm modules are easy to migrate to a new environment
  • C++ and CMake are the only requirements!
• OpenCV accuracy tests
  • Easily verify the correctness of OpenCV on a new platform
  • Some vendors use them as regression tests during environment updates
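The accuracy-test idea can be sketched without OpenCV's own test harness. The block below is only an illustration: `maxAbsDiff` and `matchesReference` are hypothetical helper names, not OpenCV APIs, and the per-pixel tolerance is an assumed acceptance criterion. It compares an output buffer produced on the new platform against a reference buffer recorded on a known-good one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not an OpenCV API): largest absolute per-pixel
// difference between a reference buffer and a candidate buffer.
int maxAbsDiff(const std::vector<std::uint8_t>& reference,
               const std::vector<std::uint8_t>& candidate) {
    assert(reference.size() == candidate.size());
    int worst = 0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const int d = std::abs(static_cast<int>(reference[i]) -
                               static_cast<int>(candidate[i]));
        if (d > worst) worst = d;
    }
    return worst;
}

// The port is accepted when no pixel deviates by more than `tolerance`
// gray levels (the tolerance value is an assumption of this sketch).
bool matchesReference(const std::vector<std::uint8_t>& reference,
                      const std::vector<std::uint8_t>& candidate,
                      int tolerance) {
    return maxAbsDiff(reference, candidate) <= tolerance;
}
```

A small tolerance rather than exact equality is a common choice here, because rounding may differ between scalar and SIMD code paths.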
[Diagram: development workflow. Stages: Prototyping (x86), Porting, Profiling, Bottleneck optimization, Fine tuning, Productization. Supporting activities: Regression tests, Performance tests]
• Video input, more debug possibilities, simple UI, higher speed
• Focus on the algorithm, not the environment!
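For the profiling and performance-test stages, a timing helper along the following lines may be enough to get started. This is a sketch under stated assumptions: `medianMillis` is a hypothetical name, and taking the median of several runs is one possible way to damp outliers, not something the slides prescribe.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` `runs` times and returns the median
// wall-clock duration in milliseconds (the upper median for even counts).
double medianMillis(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Running the same helper on the x86 prototype and on the target board gives a first per-function comparison before any optimization work starts.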
• HW performance is always an issue for vision systems
  • Heavy image processing requires significant memory bandwidth
  • Bandwidth is the usual bottleneck; multiple cores do not help
  • Collocation of multiple algorithms on a single system (e.g. ADAS)
• Mobile platforms are even more complicated
  • Thermal protection and power saving are hard to control and influence
  • Hard to predict when/if we are consuming too much
  • Unstable FPS impacts algorithm complexity (e.g. object tracking)
• Hardware selection is not easy
  • Very hard to predict final application performance beforehand
  • No valid benchmarks that emulate computer vision patterns
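The memory-bandwidth point can be made concrete with back-of-the-envelope arithmetic. The function below is a rough sketch: the name `bandwidthBytesPerSecond` and the model (each pass reads every pixel once and writes it once, with no cache reuse) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second moved through memory when every processing pass reads
// each frame once and writes it once. Caches are ignored (assumption).
std::uint64_t bandwidthBytesPerSecond(std::uint32_t width,
                                      std::uint32_t height,
                                      std::uint32_t bytesPerPixel,
                                      std::uint32_t fps,
                                      std::uint32_t passes) {
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    const std::uint64_t frameBytes =
        static_cast<std::uint64_t>(width) * height * bytesPerPixel;
    return frameBytes * 2 * passes * fps;  // 2 = one read + one write
}
```

Under this model, a 1920x1080 grayscale stream at 30 FPS needs 124,416,000 bytes/s for a single pass and five times that for a five-stage pipeline, regardless of how many cores execute the passes.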
• OpenCV was initially optimized for desktop, where it runs fast
• ARM optimizations are far behind
  • Scalar code does not perform as well on ARM as on x86
  • Optimization might help to some extent
[Chart: Number of optimized functions within OpenCV, by category: SSE, IPP, NEON (OpenCV 3), NEON]
• Optimize the algorithm first, and only then the hotspots
  • Reduce search and tracking areas, use grayscale, reduce resolution
• Select proper HW if possible
  • At least compare development kit performance
  • Try ARMv8; its scalar performance is better
• Use OpenCV packages from HW vendors (NVIDIA, TI)
  • Vendor-specific packages yield out-of-the-box improvements on specific HW and are very easy to try
  • Not a cross-platform solution
• Optimize functions yourself
  • NEON, DSP, and other HW-specific options
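As an illustration of the "reduce resolution" advice, the sketch below halves a grayscale image in each dimension, so every later stage touches a quarter of the pixels. `GrayImage` and `downscaleByTwo` are hypothetical names for this example and stand in for a real image type; the 2x2 box average is just one simple choice of filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image type (stand-in for a real one).
struct GrayImage {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint8_t> pixels;  // row-major, width * height bytes
};

// Halves each dimension by averaging 2x2 blocks with rounding.
// A trailing odd row or column is dropped.
GrayImage downscaleByTwo(const GrayImage& src) {
    assert(src.pixels.size() == src.width * src.height);
    GrayImage dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.pixels.resize(dst.width * dst.height);
    for (std::size_t y = 0; y < dst.height; ++y) {
        const std::uint8_t* top = &src.pixels[(2 * y) * src.width];
        const std::uint8_t* bottom = top + src.width;
        for (std::size_t x = 0; x < dst.width; ++x) {
            const unsigned sum = top[2 * x] + top[2 * x + 1] +
                                 bottom[2 * x] + bottom[2 * x + 1];
            dst.pixels[y * dst.width + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}
```

Whether the algorithm still meets its accuracy target at the lower resolution has to be checked separately, for example against recorded test sequences.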
[Chart: Processing on ARM v7A, OpenCV vs. Itseez, for Filter 2D, Adaptive Threshold, Blur, and FAST]
• Note the scalar difference between ARM v7A and v8
[Chart: Processing on ARM v8, OpenCV vs. Itseez, for Filter 2D, Adaptive Threshold, Blur, and FAST]
• Itseez ADAS solution
  • Traffic Sign Recognition
  • Front Collision Warning
  • Lane Departure Warning
  • Pedestrian Detection
• All algorithms run in real time on an off-the-shelf ARM device
  • Designed and tested using OpenCV
  • Product implements an intelligent pipeline layer to reduce load
  • Uses custom accelerated functions
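One way a pipeline layer might reduce load is to compute shared intermediate data once per frame and hand it to every algorithm. The class below is a hypothetical sketch of that idea, not the Itseez implementation: `FrameContext` is an invented name, and the integer luma weights (77, 150, 29) are an assumed approximation of the usual grayscale conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-frame context: the grayscale plane is computed lazily on
// the first request and then reused by every algorithm on the same frame.
// Not thread-safe; a real pipeline would need synchronization.
class FrameContext {
public:
    explicit FrameContext(std::vector<std::uint8_t> rgb)
        : rgb_(std::move(rgb)) {
        assert(rgb_.size() % 3 == 0);  // packed R, G, B triples
    }

    const std::vector<std::uint8_t>& gray() {
        if (!grayReady_) {
            gray_.resize(rgb_.size() / 3);
            for (std::size_t i = 0; i < gray_.size(); ++i) {
                const unsigned r = rgb_[3 * i];
                const unsigned g = rgb_[3 * i + 1];
                const unsigned b = rgb_[3 * i + 2];
                // Integer luma approximation: weights sum to 256.
                gray_[i] = static_cast<std::uint8_t>(
                    (77 * r + 150 * g + 29 * b) >> 8);
            }
            grayReady_ = true;
            ++conversions_;
        }
        return gray_;
    }

    // Number of times the conversion actually ran (for illustration).
    int conversions() const { return conversions_; }

private:
    std::vector<std::uint8_t> rgb_;
    std::vector<std::uint8_t> gray_;
    bool grayReady_ = false;
    int conversions_ = 0;
};
```

A sign detector and a pedestrian detector that both call `gray()` on the same context trigger a single conversion instead of two.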
• Intelligent pipeline
  • Shares computation results between algorithms
  • Complicated processing is performed only once and used by all
  • Multiple frame sizes are used where appropriate
• Custom NEON optimizations
  • Heavily optimized using only NEON; no GPU or DSP
  • Multiple processing functions are joined to reduce memory access
  • E.g. demosaicing with conversion to grayscale & RGBA
• Some interesting statistics
  • Algorithm optimizations accelerate by a factor of 2-3
  • NEON accelerations give another 3-4x
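Joining functions to reduce memory access can be shown with a simpler pair of operations than the demosaicing example: grayscale conversion followed by thresholding. The scalar sketch below is an assumption-laden illustration (the function names and the integer luma weights are invented for this example, and no NEON is used). The two-pass version writes and re-reads an intermediate grayscale buffer; the fused version produces the same mask in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer luma approximation; the weights (77, 150, 29) sum to 256.
static std::uint8_t luma(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b) >> 8);
}

// Two separate functions joined naively: an intermediate grayscale
// buffer is written in pass 1 and read back in pass 2.
std::vector<std::uint8_t> thresholdTwoPass(
        const std::vector<std::uint8_t>& rgb, std::uint8_t level) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> gray(rgb.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i)
        gray[i] = luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    std::vector<std::uint8_t> mask(gray.size());
    for (std::size_t i = 0; i < mask.size(); ++i)
        mask[i] = gray[i] > level ? 255 : 0;
    return mask;
}

// Fused version: each pixel is converted and thresholded while it is
// still in a register, so the intermediate buffer disappears.
std::vector<std::uint8_t> thresholdFused(
        const std::vector<std::uint8_t>& rgb, std::uint8_t level) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> mask(rgb.size() / 3);
    for (std::size_t i = 0; i < mask.size(); ++i)
        mask[i] = luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]) > level
                      ? 255 : 0;
    return mask;
}
```

The fused loop is also a natural unit to vectorize later, since all arithmetic for a pixel sits in one place.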
• OpenVX standard by Khronos
  • Hardware-accelerated vision: easier life for everyone
  • Currently being implemented by a number of vendors
• OpenCV HAL (a part of OpenCV 3.x)
  • Low-level API beneath the standard OpenCV
  • Open-source, but can potentially use proprietary components
• Generic multi-core scheduler (planned feature)
  • Make the multi-core scheduler more intelligent on mobile architectures
  • pthread-based backend in addition to existing options
• Vision benchmarks for hardware (desired feature)
  • Some performance tests are already present in OpenCV
  • They cannot be used for benchmarking directly; some work is needed
  • OpenCV Manager for Android could also contain benchmarking
• Itseez Web: www.itseez.com
• OpenCV home: www.opencv.org
• OpenCV documentation: docs.opencv.org
• GitHub: https://github.com/Itseez/opencv
• OpenCV resources on Embedded Vision Alliance (plenty of info): http://www.embedded-vision.com/opencv-resources
• OpenCV on TI: http://www.ti.com/lit/wp/spry175/spry175.pdf
• OpenCV on NVIDIA: https://developer.nvidia.com/opencv
• E-mail me: yury.gorbachev@itseez.com
Q & A