EXPLOIT THE INTEGRATED GRAPHICS IN PACKET PROCESSING
Jan 27, 2015
Speaker: Francesco Corazza
Supervisor: Prof. Fulvio Risso
Course: Progetto di Reti Locali
Academic year: 2010/2011
Scenario
Packet processing applications are demanding more performance:
• Increasing network speeds
• More intelligence in network devices
• Deeper packet analysis
• …
Intel platforms are the best network hardware choice thanks to:
• Economies of scale
• Price/quality ratio
• Power consumption
We will deal with packet processing on Intel platforms…
Overview
Issues:
• Intel
  • Has not yet deployed efficient tools for our needs
• Discrete GPUs
  • Heavy
  • Expensive
  • Not power-efficient
  • Affected by the bus bottleneck
Focus:
• Consumer platforms
• CPU + GPU solutions
Two different objectives can be identified…
Presentation Structure
Objectives:
• What kind of application is packet processing?
• Which features differentiate it from general-purpose computing?
• What hardware best fits these applications?
• Which hardware is most cost-effective for these applications?
• How can cost-effective hardware be exploited in these applications?
Chapter Division:
• Focus on the Field
• Focus on Integrated Graphics
Solutions covered:
• GPU solutions
• CPU+GPU solutions
FOCUS ON THE FIELD
Focus on the field
• What kind of application is packet processing?
• Which features differentiate it from general-purpose computing?
• What hardware best fits these applications?
• Which hardware is most cost-effective for these applications?
• How can cost-effective hardware be exploited in these applications?
Packet processing Applications
• Memory intensive
  • Frequent data loads from the packet
  • Huge amount of data involved in the processing
• No data locality
  • Unpredictable loads from different memory areas
• Small tasks, over a large number of packets
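To make these traits concrete, here is a minimal C sketch of a typical per-packet task, assuming a hypothetical flow table indexed by a hash of the destination address. Each packet needs only a handful of operations, but the table update lands at an effectively random location in a large array, so caches help little and the loop is memory-bound. The names (`struct pkt`, `FLOW_TABLE_SIZE`, `process_batch`) and the choice of the FNV-1a hash are illustrative assumptions, not taken from this presentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal packet descriptor: only the fields we inspect. */
struct pkt {
    uint32_t dst_addr;   /* destination address, host byte order */
    uint16_t len;        /* packet length in bytes */
};

/* Hypothetical flow table size: a power of two, far larger than L1/L2. */
#define FLOW_TABLE_SIZE (1u << 20)

/* 32-bit FNV-1a hash over the 4 bytes of the address. */
static uint32_t fnv1a_u32(uint32_t x)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++) {
        h ^= (x >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Small task repeated over many packets: hash the address and update one
 * byte counter. The slot index is effectively random, so each update is
 * likely a cache miss. */
void process_batch(const struct pkt *pkts, size_t n, uint64_t *flow_table)
{
    assert(n == 0 || (pkts != NULL && flow_table != NULL));
    for (size_t i = 0; i < n; i++) {
        uint32_t slot = fnv1a_u32(pkts[i].dst_addr) & (FLOW_TABLE_SIZE - 1);
        flow_table[slot] += pkts[i].len;   /* unpredictable load + store */
    }
}
```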
Focus on the field
• What kind of application is packet processing?
• Which features differentiate it from general-purpose computing?
• What hardware best fits these applications?
• Which hardware is most cost-effective for these applications?
• How can cost-effective hardware be exploited in these applications?
Differences in hardware will mirror differences in software…
General computing vs. Packet processing
Aspect | General computing application | Packet processing application
Core activity | CPU-bound; ALU-based computation | Memory-bound; load/store-based computation
Memory access patterns | Locality pattern; caches are useful | Random pattern; unpredictable loads from memory
Structure | Complex tasks launched once; small amount of memory required | Very repetitive small tasks; huge amount of memory involved
Focus on the field
• What kind of application is packet processing?
• Which features differentiate it from general-purpose computing?
• What hardware best fits these applications?
• Which hardware is most cost-effective for these applications?
• How can cost-effective hardware be exploited in these applications?
Network Processors
• Memory
  • Narrow data buses
  • Multiple data buses
  • Memory hierarchies
  • Few caches
• Superscalar execution
• Massive number of threads
• Thread-level parallelism
• Zero-overhead switching
• Asynchronous code
Packet processing is a market niche, so the industry has been forced to move to solutions borrowed from the mainstream consumer market…
Packet processing Applications
• Memory intensive
  • Huge amount of data involved in the processing
  • Frequent data loads from the packet
• No data locality
  • Unpredictable loads from different memory areas
• Small tasks, over a large number of packets
Network Hardware Evolution
Economies of scale have pushed specialized hardware out of the market (evolution over time):
• Network Processors
  • Cisco
  • Tilera
  • …
• Consumer Processors
  • GPU solutions
    • Nvidia Fermi
  • CPU+GPU solutions
    • Our investigation lies here
• Hybrid Processors
  • Intel Many Integrated Core
  • AMD Fusion
Focus on the field
• What kind of application is packet processing?
• Which features differentiate it from general-purpose computing?
• What hardware best fits these applications?
• Which hardware is most cost-effective for these applications?
  • GPU
  • CPU + GPU
  • Intel MIC
• How can cost-effective hardware be exploited in these applications?
GPU – Features
• Shared memory
  • High bandwidth
  • Coalesced access
• Lots of execution units
  • Slow cores
  • Massive parallelism
• SIMT execution model
  • More flexible than SIMD
Packet processing Applications
• Memory intensive
  • Huge amount of data involved in the processing
  • Frequent data loads from the packet
• No data locality
  • Unpredictable loads from different memory areas
• Small tasks, over a large number of packets
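The tension between coalesced access and the unpredictable loads of packet processing can be shown with a little arithmetic. The C sketch below counts how many distinct memory segments a group of threads touches when thread t reads one 4-byte word at index base + t × stride. It assumes a 32-thread warp and 128-byte segments, as on Nvidia Fermi-class GPUs; both are parameters of the sketch, not figures from this presentation. Contiguous indices fit in a single segment, whereas a large stride (like scattered per-packet loads) costs one transaction per thread.

```c
#include <assert.h>
#include <stddef.h>

#define WARP_SIZE     32    /* assumed threads issued together (SIMT) */
#define SEGMENT_BYTES 128   /* assumed size of one memory transaction */

/* Count distinct SEGMENT_BYTES-sized segments touched when thread t
 * reads the 4-byte word at index base + t * stride. */
size_t segments_touched(size_t base, size_t stride)
{
    size_t seen[WARP_SIZE];
    size_t count = 0;
    for (size_t t = 0; t < WARP_SIZE; t++) {
        size_t seg = ((base + t * stride) * 4) / SEGMENT_BYTES;
        size_t k;
        for (k = 0; k < count; k++)
            if (seen[k] == seg)
                break;
        if (k == count)
            seen[count++] = seg;   /* new segment: one more transaction */
    }
    assert(count >= 1 && count <= WARP_SIZE);
    return count;
}
```

For example, `segments_touched(0, 1)` gives 1 (fully coalesced), while `segments_touched(0, 32)` gives 32.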
CPU + GPU solutions
… just wait a few slides to find out how this turns out
Let's take a look at the architectures we will face in the future…
Intel MIC (Many Integrated Core)
• Built on the Single-Chip Cloud Computer and Larrabee research projects
• GPU-style parallel programming with the x86 instruction set
• Development tools in common with Xeon
  • The same tools can compile both for the processor and for the co-processor
  • HPC market target
• Knights Corner (first commercial implementation):
  • More than 50 x86 cores, each with four threads, 64 KB of L1 and 512 KB of L2 cache, and a 512-bit vector unit
  • GDDR5 memory, PCI Express 2.0
Focus on the field
• What kind of application is packet processing?
• Which features differentiate it from general-purpose computing?
• What hardware best fits these applications?
• Which hardware is most cost-effective for these applications?
• How can cost-effective hardware be exploited in these applications?
  • GPGPU
  • DirectCompute
  • OpenCL
GPGPU – Overview
• General-purpose computing on graphics processing units
• Programming GPUs through accessible programming interfaces and industry-standard languages such as C
• Allows software developers to use stream processing on non-graphics data
• Competing interfaces
  • Nvidia Compute Unified Device Architecture (CUDA)
  • AMD Stream (now merged into OpenCL)
  • Microsoft DirectCompute (a new subset of the DirectX 10/11 APIs)
• Convergence towards standardization (as happened with OpenGL)
  • Khronos Group OpenCL
These frameworks lie just above the hardware…
GPGPU – Layer representation
From top to bottom:
• Applications: media playback or processing, media UI, recognition, technical, etc.
• Domain Libraries: MKL, ACML, cuFFT, D3DX, etc.
• Domain Languages: Accelerator, Brook+, RapidMind, Ct
• Compute Languages: DirectCompute, CUDA, CAL, OpenCL, LRB Native, etc.
• Processors: CPU, GPU, Larrabee; nVidia, Intel, AMD, S3, etc.
GPGPU – Analysis
• CUDA
  • Tight hardware integration
  • Dependence on Nvidia hardware
• OpenCL
  • Gives up lower-level hooks into the architecture
  • Heterogeneous computational resources
  • Integration in the Khronos family (e.g. OpenGL)
• DirectCompute
  • Windows only (Wine/Mono are immature)
  • Integration in the DirectX APIs
  • GPGPU under the hood of Windows 7
Because of their wider reach, we are going to cover the latter two languages…
DirectCompute
Exposes the compute functionality of the GPU as a new type of shader (a shader is a program that determines the final appearance of an object's surface)
• Compute Shader
  • Delivers the performance of 3-D games to new applications
• Rendering integration
  • Demonstrates tight integration between computation and rendering
• Supported by all processor vendors
  • DirectX 10/10.1/11 hardware supports Compute Shader 4.0/4.1/5.0 respectively
• Scalable parallel processing model
  • Code should scale for several generations
DirectCompute – Rendering Pipeline
Render scene → Write out scene image → Use Compute for image post-processing → Output final image
DirectCompute – Programming Model
• Dispatch
  • 3D grid of thread groups
• Thread Group
  • 3D grid of threads
  • [numthreads(nX, nY, nZ)]
  • Threads in the same group run concurrently
• Thread
  • One invocation of a shader
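This hierarchy determines the system-value IDs each thread receives. As documented for HLSL compute shaders, SV_DispatchThreadID = SV_GroupID × numthreads + SV_GroupThreadID, and SV_GroupIndex flattens SV_GroupThreadID as z·nX·nY + y·nX + x. The C sketch below reproduces that arithmetic on the CPU; the `uint3` struct and the function names are illustrative stand-ins for the HLSL types, not part of any API.

```c
#include <assert.h>

/* Stand-in for HLSL's uint3 vector type. */
typedef struct { unsigned x, y, z; } uint3;

/* SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID */
uint3 dispatch_thread_id(uint3 group_id, uint3 num_threads, uint3 thread_in_group)
{
    assert(thread_in_group.x < num_threads.x &&
           thread_in_group.y < num_threads.y &&
           thread_in_group.z < num_threads.z);
    uint3 id = {
        group_id.x * num_threads.x + thread_in_group.x,
        group_id.y * num_threads.y + thread_in_group.y,
        group_id.z * num_threads.z + thread_in_group.z,
    };
    return id;
}

/* SV_GroupIndex: SV_GroupThreadID flattened within its group. */
unsigned group_index(uint3 num_threads, uint3 thread_in_group)
{
    return thread_in_group.z * num_threads.x * num_threads.y
         + thread_in_group.y * num_threads.x
         + thread_in_group.x;
}
```

With numthreads(4, 4, 1), thread (1, 2, 0) of group (2, 3, 0) gets dispatch ID (9, 14, 0) and group index 9.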
DirectCompute – Execution Model
• A thread is executed by a scalar processor
• A thread group is executed on a multiprocessor
• A compute shader kernel is launched as a grid of thread groups (only one grid of thread groups can execute on a device at a time)
DirectCompute – Example HLSL code
struct BufferStruct
{
    uint4 color;
};

// group size
#define thread_group_size_x 4
#define thread_group_size_y 4

RWStructuredBuffer<BufferStruct> g_OutBuff;

/* This is the number of threads in a thread group, 4x4x1 in this example case */
// e.g.: [numthreads( 4, 4, 1 )]
[numthreads( thread_group_size_x, thread_group_size_y, 1 )]
void main( uint3 threadIDInGroup : SV_GroupThreadID, uint3 groupID : SV_GroupID,
           uint groupIndex : SV_GroupIndex, uint3 dispatchThreadID : SV_DispatchThreadID )
{
    uint N_THREAD_GROUPS_X = 16; // assumed equal to 16 in Dispatch(16,16,1)

    // buffer stride, assumes data stride = data width (i.e. no padding)
    uint stride = thread_group_size_x * N_THREAD_GROUPS_X;
    uint idx = dispatchThreadID.y * stride + dispatchThreadID.x;

    // store group and global thread IDs; uint4 matches BufferStruct.color
    uint4 color = uint4(groupID.x, groupID.y, dispatchThreadID.x, dispatchThreadID.y);
    g_OutBuff[ idx ].color = color;
}
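One way to sanity-check the indexing in this example is to replay it on the CPU. The C sketch below loops over every group of an assumed Dispatch(16, 16, 1) and every thread of a 4x4x1 group, computes `idx` exactly as the shader does, and counts how often each buffer element is written. With 64 threads per row, every index in 0..4095 should be written exactly once. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define THREAD_GROUP_SIZE_X 4
#define THREAD_GROUP_SIZE_Y 4
#define N_THREAD_GROUPS_X   16   /* as in Dispatch(16, 16, 1) */
#define N_THREAD_GROUPS_Y   16
#define STRIDE  (THREAD_GROUP_SIZE_X * N_THREAD_GROUPS_X)           /* 64 */
#define N_ELEMS (STRIDE * THREAD_GROUP_SIZE_Y * N_THREAD_GROUPS_Y)  /* 4096 */

/* Replays the shader's index computation for every thread and counts
 * writes per element; returns 1 if each element is written exactly once. */
int check_dispatch_coverage(void)
{
    static unsigned writes[N_ELEMS];
    memset(writes, 0, sizeof writes);
    for (unsigned gy = 0; gy < N_THREAD_GROUPS_Y; gy++)
        for (unsigned gx = 0; gx < N_THREAD_GROUPS_X; gx++)
            for (unsigned ty = 0; ty < THREAD_GROUP_SIZE_Y; ty++)
                for (unsigned tx = 0; tx < THREAD_GROUP_SIZE_X; tx++) {
                    /* dispatchThreadID = groupID * numthreads + groupThreadID */
                    unsigned dx = gx * THREAD_GROUP_SIZE_X + tx;
                    unsigned dy = gy * THREAD_GROUP_SIZE_Y + ty;
                    unsigned idx = dy * STRIDE + dx;
                    assert(idx < N_ELEMS);
                    writes[idx]++;
                }
    for (unsigned i = 0; i < N_ELEMS; i++)
        if (writes[i] != 1)
            return 0;
    return 1;
}
```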
OpenCL – Overview
Open Computing Language
• Access to heterogeneous computational resources
• Parallel execution on single or multiple processors
  • GPU, CPU, GPU + CPU or multiple GPUs
• Desktop and handheld profiles
• Works with graphics APIs
  • OpenGL
• C99 with extensions
  • Familiar to developers
  • Rich set of built-in functions
  • Easy to develop data- and task-parallel compute programs
  • Defines hardware and numerical precision requirements
OpenCL – Execution Model (I)
• Work-item
  • Basic unit of work on an OpenCL device
• Kernel
  • Basic unit of executable code
  • Similar to a C function
  • Data-parallel or task-parallel
• Program
  • Collection of kernels and functions
  • Analogous to a dynamic library
• Context
  • Environment within which work-items execute
• Applications queue kernel execution instances
  • Queued in-order: one queue to a device
  • Executed in-order or out-of-order
OpenCL – Coding (I)
• Work-item
  • Smallest execution entity
  • Every time a kernel is launched, lots of work-items (a number specified by the programmer) are launched, each one executing the same code
  • Unique ID
    • Accessible from the kernel
    • Used to distinguish the data to be processed by each work-item
• Work-group
  • Allows communication and cooperation between work-items
  • Reflects the work-items' organization (N-dimensional grid of work-items, N = 1, 2 or 3)
  • Independent element of execution in the N-D domain
• ND-Range
  • Computation domain (organization level)
  • Specifies how work-groups are organized (N-dimensional grid of work-groups, N = 1, 2 or 3)
  • Defines the total number of work-items that execute in parallel
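The relationship between these levels can be written out explicitly. With a zero global work offset, the OpenCL specification defines get_global_id(d) = get_group_id(d) × get_local_size(d) + get_local_id(d), and the number of work-groups in dimension d is the global size divided by the local size. The C sketch below applies this per dimension; for a 1024 x 1024 ND-Range with an assumed 16 x 16 work-group size (our choice, not the source's), it yields 64 x 64 = 4096 work-groups of 256 work-items each.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension. OpenCL 1.x requires the
 * global size to be a multiple of the local size, which we check here. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Global ID along one dimension, assuming a zero global work offset. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

The last work-item of the last group, `global_id(63, 16, 15)`, is 1023, the final pixel index in that dimension.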
OpenCL – Coding (II)
OpenCL – Coding (III)
Process a 1024 x 1024 image
Global problem dimensions:
• 1024 x 1024, i.e. 1 kernel execution per pixel
• 1,048,576 total executions
Scalar:
void scalar_mul(int n, const float *a, const float *b, float *result)
{
    int i;
    for (i = 0; i < n; i++)
        result[i] = a[i] * b[i];
}
Data-parallel:
kernel void dp_mul(global const float *a, global const float *b, global float *result)
{
    int id = get_global_id(0);
    result[id] = a[id] * b[id];
}
// execute dp_mul over “n” work-items
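To see why the two versions compute the same result, the data-parallel kernel can be emulated on the CPU: each loop iteration below plays the role of one work-item, and the loop counter stands in for get_global_id(0). This is only a serial sketch for illustration, and the helper names are hypothetical; on a real OpenCL device the n work-items would be scheduled in parallel by the runtime once the host enqueues the kernel with a global size of n.

```c
#include <assert.h>
#include <stddef.h>

/* Body of dp_mul for one work-item; `id` plays the role of get_global_id(0). */
static void dp_mul_work_item(size_t id, const float *a, const float *b,
                             float *result)
{
    result[id] = a[id] * b[id];
}

/* Serial emulation of "execute dp_mul over n work-items". Work-items are
 * independent, so any execution order would give the same result. */
void emulate_dp_mul(size_t n, const float *a, const float *b, float *result)
{
    assert(n == 0 || (a != NULL && b != NULL && result != NULL));
    for (size_t id = 0; id < n; id++)
        dp_mul_work_item(id, a, b, result);
}
```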
FOCUS ON INTEGRATED GRAPHICS
CPU+GPU solutions
The architectures involved are:
• 2nd Generation Intel Core (Sandy Bridge)
• Intel Atom E600 Series (Tunnel Creek)
• Nvidia Tegra (Tegra 2)
• AMD Fusion
Let’s compare them…
CPU+GPU solutions
Architecture | Market Target | Release Date
2nd Generation Intel Core (Sandy Bridge) | Desktop / Hi-End | 01/2011
Intel Atom E600 Series (Tunnel Creek) | Mobile / Industrial embedded | 11/2010
Nvidia Tegra (Tegra 2) | Mobile / Tablets | 01/2010
AMD Fusion | Consumer / Desktop | 01/2011
Focus on Integrated Graphics
• 2nd Generation Intel Core (Sandy Bridge)
  • Features
  • Integrated GPU
  • AVX (Advanced Vector Extensions)
• Intel Atom E600 Series (Tunnel Creek)
• Nvidia Tegra (Tegra 2)
• AMD Fusion
Sandy Bridge – Features (I)
• CPU die redesigned
  • The chip’s northbridge and GPU are both on-die (in previous versions they were on a physically separate chip)
• LLC (Last Level Cache, formerly L3 cache)
  • Thanks to the new ring bus, the LLC is shared among all components, including the GPU
  • Previously, each individual core had its own private path to the L3 cache
• Unified Memory Architecture (UMA)
  • Architecture where the graphics subsystem does not have exclusive dedicated memory and uses the host system’s memory
  • Dynamic Video Memory Technology (DVMT)
• Hyper-Threading
Sandy Bridge – Features (II)
• Turbo Boost Technology 2.0
  • Adjusts the processor core and GPU frequencies to increase performance while staying within the allotted power/thermal budget
  • The processor can increase individual core speed or graphics speed as the workload dictates
  • Developers cannot directly control it
• AVX (Advanced Vector Extensions)
  • Extends SIMD instructions from 128 bits to 256 bits
  • A single AVX instruction can operate on eight single-precision floating-point values at a time instead of the four handled by SSE
  • Increased processor performance with a minimal increase in power (HUGI: Hurry Up and Get Idle)
The next diagram shows the level of integration Intel has reached…
Sandy Bridge – Block Diagram
Now we have to zoom in on the graphics processor…
Sandy Bridge – Integrated GPU (I)
Sandy Bridge – Integrated GPU (II)
• DirectCompute support
  • DirectX 10.1
  • The internal ISA maps one-to-one with most DirectX 10 API instructions, resulting in a very CISC-like architecture
• Execution Unit (EU)
  • The pipeline decoder uses only fixed-function logic to limit overall power consumption (unlike NVIDIA and AMD, which have programmable stream processors)
  • Each EU can dual-issue, picking instructions from multiple threads
  • Transcendental math is handled in hardware by the EU, and its performance has improved considerably
The GPU’s parallel capabilities can be exploited through DirectCompute, but what about the CPU?
AVX – Overview
• Key features
  • Wider vectors
    • Increased from 128 to 256 bits
    • Two 128-bit load ports
  • Enhanced data rearrangement
    • Uses the new 256-bit primitives to broadcast, mask loads and stores, and permute data
  • Three and four operands
    • Non-destructive source for both AVX-128 and AVX-256
  • Flexible unaligned memory access support
  • Extensible new opcode encoding (VEX)
• Benefits
  • Higher peak FLOPs with good power efficiency
  • Organize, access and pull only the necessary data more quickly and efficiently
  • Fewer register copies, better register use for both vector and scalar code
  • More opportunities to fuse load and compute operations
  • Code size reduction
Some assembly instructions can show the power of AVX…
AVX – Instructions (I)
AVX – Instructions (II)
AVX – Code Example (I)
High-level code:
#include <immintrin.h>

/* r[0..7] = a[0..7] + b[0..7] with a single 256-bit AVX addition;
   the unaligned load/store variants accept any float pointer */
void foo(float *a, float *b, float *r)
{
    __m256 s1, s2, res;
    s1 = _mm256_loadu_ps(a);      /* load 8 floats from a */
    s2 = _mm256_loadu_ps(b);      /* load 8 floats from b */
    res = _mm256_add_ps(s1, s2);  /* 8 additions in one instruction */
    _mm256_storeu_ps(r, res);     /* store 8 results to r */
}
Assembly:
; -- Begin _foo
        ALIGN     16
        PUBLIC _foo
_foo    PROC NEAR
; parameter 1: 4 + esp
; parameter 2: 8 + esp
; parameter 3: 12 + esp
$B2$1:                          ; Preds $B2$0
        mov       eax, DWORD PTR [4+esp]
        mov       edx, DWORD PTR [8+esp]
        mov       ecx, DWORD PTR [12+esp]
        vmovups   ymm0, YMMWORD PTR [eax]
        vaddps    ymm1, ymm0, YMMWORD PTR [edx]
        vmovups   YMMWORD PTR [ecx], ymm1
                                ; LOE ebx ebp esi edi
$B2$2:                          ; Preds $B2$1
        ret                     ;10.1
        ALIGN     16
                                ; LOE
_foo ENDP
;_foo ENDS
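The example above handles exactly eight floats. Real buffers rarely have a length that is a multiple of eight, so a loop typically processes full 256-bit vectors and finishes the remainder with scalar code. The sketch below assumes GCC or Clang on x86: the `target("avx")` attribute lets the AVX path compile without a global -mavx flag, and `__builtin_cpu_supports("avx")` falls back to the scalar loop on CPUs without AVX. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Scalar reference: r[i] = a[i] + b[i]. */
static void add_scalar(const float *a, const float *b, float *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        r[i] = a[i] + b[i];
}

/* AVX path: 8 floats per iteration with unaligned loads/stores as in foo(),
 * then a scalar tail for the last n % 8 elements. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *r, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(r + i, _mm256_add_ps(va, vb));
    }
    add_scalar(a + i, b + i, r + i, n - i);   /* remainder */
}

/* Choose the path at run time so the code also runs without AVX. */
void add_arrays(const float *a, const float *b, float *r, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && r != NULL));
    if (__builtin_cpu_supports("avx"))
        add_avx(a, b, r, n);
    else
        add_scalar(a, b, r, n);
}
```

With n = 11, the AVX path handles elements 0..7 in one vector step and elements 8..10 in the scalar tail.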
AVX – Benchmarks
SIMD processing works best with data-parallel applications where the data is arranged in a structure of arrays (SoA) format. Graphics and image processing applications are often highly parallel and well-structured, and thus are typically good candidates for SIMD processing. Geometry or mesh data, on the other hand, is not always uniformly structured in a neat grid.
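The difference between the two layouts is easy to see in code. In the sketch below, built on hypothetical packet-header fields (not from the source), the array-of-structures (AoS) layout interleaves fields, so a loop that reads only `ttl` strides over whole records. The structure-of-arrays (SoA) layout stores each field contiguously, so the same loop reads consecutive values that map directly onto SIMD vector loads. Only the layout differs; both functions compute the same count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_PKTS 1024

/* Array of Structures: one record per packet, fields interleaved. */
struct hdr_aos {
    uint32_t src, dst;
    uint32_t ttl;
    uint32_t len;
};

/* Structure of Arrays: one contiguous array per field. */
struct hdr_soa {
    uint32_t src[N_PKTS], dst[N_PKTS];
    uint32_t ttl[N_PKTS];
    uint32_t len[N_PKTS];
};

/* Count packets whose TTL has expired (ttl <= 1): strided access in AoS. */
size_t expired_aos(const struct hdr_aos *h, size_t n)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (h[i].ttl <= 1);    /* uses 4 bytes out of every 16 */
    return c;
}

/* Same computation on SoA: unit-stride access, easy to vectorize. */
size_t expired_soa(const struct hdr_soa *h, size_t n)
{
    assert(n <= N_PKTS);
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (h->ttl[i] <= 1);   /* consecutive 32-bit values */
    return c;
}
```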
Sandy Bridge – Conclusion
• Interesting features for packet processing
  • Integrated memory controller
  • DirectCompute
  • AVX
• CPU+GPU integration is only at the physical level
  • Packet processing can exploit either the CPU or the GPU
• Unpredictable evolution
  • DirectCompute could exploit the CPU
  • AVX could exploit the GPU
• The upcoming Ivy Bridge will support both OpenCL and DirectX 11
Focus on Integrated Graphics
• 2nd Generation Intel Core (Sandy Bridge)
• Intel Atom E600 Series (Tunnel Creek)
  • Features
  • Block Diagram
  • Customization
• Nvidia Tegra (Tegra 2)
• AMD Fusion
Atom E600 – Features (I)
• SoC (System on Chip)
  • Power optimized
  • Fanless performance
• Flexible and open I/O
  • Flexible for application-specific needs
  • PCIe instead of a proprietary FSB
• 7-year long-life support
• Hyper-Threading Technology
  • Two logical processors
• SSE3 (Streaming SIMD Extensions 3)
  • Support for SIMD instructions
Atom E600 – Features (II)
• Power saving
  • Intel SpeedStep Technology
    • Enables the operating system to program the processor to transition to lower frequency and/or voltage levels while executing a workload
  • Deep power down technology
    • Reduces static power consumption by turning off power to the cache and other sub-systems in the processor
• In-order processing
  • Yields greater power efficiency: the CPU does not reorder the instruction stream to extract instruction-level parallelism
• No DirectCompute support
  • Tunnel Creek supports only DirectX 9
The next diagram shows the internals of the Atom architecture…
Atom E600 – Block Diagram
Atom does not support DirectCompute, so we have to concentrate on the great flexibility of the architecture…
Atom E600 – Customization
• Open connection
  • Developers can attach the processor to a variety of chipsets
    • Application-specific third-party chipsets
    • FPGAs
    • ASICs
  • The processor can be used without a chipset (limited I/O needs)
    • The processor’s four PCIe connections can attach to discrete PCIe peripherals such as Ethernet controllers
Atom E600 – Conclusion
• Interesting features for packet processing
  • Power-saving features
  • Long support
  • Flexible architecture
• No GPGPU support
  • Old-school GPGPU
    • Use OpenGL ES 2.0 shaders (programmable shaders)
    • Rewrite the code as a fragment shader
  • Wait for Cedar Trail (2011, not yet released)
    • DirectX 10.1
Focus on Integrated Graphics
• 2nd Generation Intel Core (Sandy Bridge)
• Intel Atom E600 Series (Tunnel Creek)
• Nvidia Tegra (Tegra 2)
  • Features
  • Block Diagram
• AMD Fusion
Tegra – Features
• SoC (System on a Chip)
  • Dual-core ARM CPU
  • GeForce GPU
  • ULP (ultra-low power consumption)
• Graphics support
  • No DirectX support
  • No CUDA support
  • OpenGL ES 2.0 support
The next diagram gives a quantitative view of a Tegra chip…
Tegra – Block Diagram
Tegra – Conclusion
• Interesting features for packet processing
  • Integrated memory controller
  • Low power consumption
• No GPGPU support
  • Old-school GPGPU
    • Use OpenGL ES 2.0 shaders (programmable shaders)
    • Rewrite the code as a fragment shader
  • Wait for Tegra 3 (third quarter of 2011)
    • DirectX 11
    • CUDA
Focus on Integrated Graphics
• 2nd Generation Intel Core (Sandy Bridge)
• Intel Atom E600 Series (Tunnel Creek)
• Nvidia Tegra (Tegra 2)
• AMD Fusion
  • AMD Vision
  • Features
  • APU Roadmap
  • Integration Highlights
Fusion – AMD Vision
Fusion is a step-forward technology:
AMD has implemented this heterogeneous architecture by developing APUs (Accelerated Processing Units)…
Fusion – Features (I)
(Video)
Fusion – Features (II)
• DirectCompute support (DirectX 11)
• OpenCL 1.1
• Additive capabilities of an APU and a discrete graphics solution
• Power-oriented benefits
• Massive SIMD GPU (SSE5)
  • Programmable scalar and vector processor cores
• APU family
  • Bulldozer (Sandy Bridge’s opponent)
    • Performance and scalability
  • Bobcat (Atom’s opponent)
Let’s compare these two solutions…
Fusion – Features (III)
Bulldozer and Bobcat also differ in their market target…
Fusion – APU roadmap
The high level of integration differentiates APUs from CPUs…
Fusion – Integration Highlights
• Shared memory
  • Lower latencies
• PCI Express
  • Cuts down some latencies
• No discrete GPU, hence lower:
  • Cost
  • Power
  • Motherboard complexity
Fusion – Conclusion
• Interesting features for packet processing
  • OpenCL/DirectCompute/SSE5
  • Tightly integrated architecture
  • New technology (first-come, first-served)
• OpenCL
  • Could be the “El Dorado” for packet processing
    • CPU/GPU working in AND/OR configuration
    • Shared memory
    • Embedded implementation of Fusion technology
  • AMD explicitly supports it to bring the power of heterogeneous computing into the mainstream
CONCLUSIONS
Summary (I)
This presentation has outlined several ways of exploiting integrated graphics and, more generally, consumer architectures for packet processing:
• GPGPU-driven solutions
  • CUDA, OpenCL, DirectX 11
• SIMD-driven solutions
  • Exploit highly parallel operations through the processor’s SIMD extensions
  • AVX, SSE
• Custom hardware solutions
  • Design flexible modules tailored to specific needs
  • FPGA
The former solutions are the most in vogue at the moment…
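Summary (II)
(V = supported, X = not supported; rows follow the order in which the platforms were presented)
Platform | OpenCL | SSE | FPGA | DirectCompute | OpenGL
Sandy Bridge | X | V (AVX) | X | V | V
Atom E600 | X | V (SSE 3) | V | X | V
Tegra 2 | X | V (SSE 3) | X | X | V
AMD Fusion | V | V (SSE 5) | X | V | V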
Recommendations
Writing explicitly parallel code is more efficient than relying on hardware parallelization:
THANK YOU
Questions?