Embedded System
Build Your Own HD Player
Fred Chien (錢逢祥)
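Course Goals
A first look at ARM
A quick introduction to embedded systems
Understand the details of the Linux architecture
Build a real application by hand
Learning by doing
What Is an Embedded System?
A special-purpose computer system, designed for a specific application and fully embedded inside the device it controls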
Embedded System Example
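Why ARM?
Advanced RISC Machine
ARM Advantages
High performance
Low power consumption
Small size
Low total production cost
Markets
Mobile phones
PDA/MID/consumer electronics
Micro devices
Home appliances
Network equipment (routers, Wi-Fi APs...)
Industrial control
The Many Manufacturers of ARM Chips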
TI
Atmel
Broadcom
Freescale
Samsung
Marvell
IBM
Intel
Sharp
Nintendo
Nvidia
… more
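ARM History
Acorn Computers Ltd began development in 1983
The goal was to build a chip similar to the MOS Technology 6502
The first ARM1 chip was produced in 1985
In the late 1980s, Apple and Acorn collaborated on a new chip
In 1990, the design team formed Advanced RISC Machines Ltd.
The Development Board Used in This Course
DevKit 8000 (Beagleboard Clone)
DevKit 8000 Connector Diagram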
Processor
TI (Texas Instruments) OMAP3530
600-MHz ARM Cortex™-A8
430-MHz TMS320C64x+™ DSP Core
Core Supply is 0.8V to 1.35V
Video/Audio Support
S-Video
HDMI (1280x720 DVI-D)
LCD (2048x2048 24bits)
Mic In
Audio Out
Our LCD and Touch Panel
4.3 inch
480x272 24bits
Transfer Interface
USB2.0 Host/OTG (480Mbps)
SD/MMC
10/100M Ethernet
RS232
More DevKit8000 Specifications
110mm x 95mm
Input Voltage: +5 V
0.5 A at 5 V (2.5 W)
ARM Development
Serial Port (COM Port, USB)
Booting from NAND/SD/NFS
UART, SDIO, SPI, GPIO
General Boot Process
BIOS (x86 Only)
Boot Loader
Linux Kernel
/sbin/init
RC Scripts (Services)
Graphical User Interface
Applications
Standby
Details for PC
Load BIOS
BIOS checks the hardware and scans the hard drive (POST)
Find the MBR on the hard drive
Jump to the MBR (boot loader)
Boot loader loads the Linux kernel
Jump to the Linux kernel
Detect and initialize hardware
Mount the rootfs from hard disk, ramdisk, or other storage
Execute /sbin/init or /init
The init program reads /etc/inittab
The init program executes scripts to start services (system initialization, daemons, servers, etc.)
The GDM/SLiM service and the graphical environment (Xorg server) are started
After login, GNOME/KDE/another desktop environment starts (here, the Chromium browser for Chromium OS)
Bootstrap
Details for ARM
xloader (initializes the clock and RAM)
Jump to RAM to start "uboot"
uboot loads the Linux kernel from NAND/NFS/SD card
Jump to the Linux kernel
Detect and initialize hardware
Mount the rootfs from hard disk, ramdisk, or other storage
Execute /sbin/init or /init
The init program reads /etc/inittab
The init program executes scripts to start services (system initialization, daemons, servers, etc.)
The GDM/SLiM service and the graphical environment (Xorg server) are started
After login, GNOME/KDE/another desktop environment starts (here, the Chromium browser for Chromium OS)
Bootstrap
Bootstrap for PC
POST (Power On Self Test)
Initializes system hardware and chipset registers
Initializes power management
Tests RAM (Random Access Memory)
Enables the keyboard
Tests serial and parallel ports
Initializes hard disk drive controllers
Displays system summary information
Compares the system configuration data
Looks for the boot program
Bootstrap for ARM
xloader
Initializes the clock
Initializes RAM
Starts uboot
uboot
Loads the kernel (from NAND/SD/Ethernet)
Executes the kernel
Words for ARM Development
BSP
DSP
OpenGL ES
Development Environments
OpenEmbedded
Emdebian
Ubuntu for ARM
Debian Based
Linux Basics
Root Filesystem
/bin - executable binaries for basic commands (ls, mount...)
/dev - device files (tty*, ram*, null, loop*, sd*...)
/etc - config files (resolv.conf, ld.so.conf, hostname...)
/lib - libraries
/home - users' data
/sbin - executable binaries for system utilities (mke2fs, init, ifconfig...)
/tmp
/usr/bin - executable binaries for software
/usr/lib - libraries for software
/usr/sbin - executable binaries for software
/proc - Linux kernel information
/sys - Linux kernel information
Linux Boot Process
Linux Kernel
/sbin/init or /init
/etc/init.d/rcS
/etc/init.d/rc
/etc/init.d/rc*.d/S* (services)
X
GDM
GNOME/KDE/LXDE Desktop Environment
Graphical Applications
GUI
Early Stage
Understanding LiveCD/USB
xPUD
Damn Small Linux
Knoppix
More...
Analyse xPUD
Made in Taiwan
Size: 25MB
Boot Time: ~6s
Based on Ubuntu/Debian/Fedora...etc
xPUD Boot Process
Bootloader loads kernel/initrd
Jump to Kernel
Uncompress initrd
Mount initrd
Execute /init
Initializing system
Execute startx
Execute /etc/X11/xinitrc
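Using Tools to Understand Linux Details
dpkg -i <package>
● Install a package
dpkg -S <path>
● Find the package that owns the target file
dpkg -L <package>
● List the files contained in a package
Important Files and Programs
/etc/resolv.conf
/etc/passwd and /etc/group
/etc/ld.so.conf and /sbin/ldconfig
/sbin/ifconfig
/sbin/iwconfig
/sbin/insmod and /sbin/modprobe
Building Your Own OS Step by Step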
Kernel
Command
Utility
Library
Write Scripts (/init)
Rootfs
Booting ARM by Hand
Power ON
xLoader
Uboot
Linux Kernel
Mount Rootfs
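The ARM Boot Process
Prepare
Kernel (uImage)
Root Filesystem
Uboot Commands
printenv - list the environment settings
setenv - set a variable
saveenv - write the environment settings back to flash for use on the next boot
boot - start the boot sequence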
Booting from MMC/SD Card
mmcinit
fatload mmc 0:1 0x82000000 uImage
bootm 0x82000000
MMC/SD Card
0:1 - FAT File System: uImage (Linux kernel)
EXT2 File System: /bin /lib /sbin /usr ...
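The uImage that fatload places at 0x82000000 is a kernel wrapped in a U-Boot image header, which bootm inspects before starting the kernel. As a rough sketch, assuming the legacy 64-byte header layout (big-endian fields, magic number 0x27051956), the following C code checks whether a buffer looks like a uImage. The names `parse_uimage_header` and `struct uimage_info` are hypothetical helpers for illustration, not part of U-Boot.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed legacy U-Boot image header: 64 bytes, multi-byte fields big-endian. */
#define UIMAGE_MAGIC       0x27051956u
#define UIMAGE_HEADER_SIZE 64u

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical summary of a few header fields. */
struct uimage_info {
    uint32_t data_size;   /* payload size in bytes (offset 12) */
    uint32_t load_addr;   /* load address (offset 16) */
    uint32_t entry_point; /* entry point (offset 20) */
};

/* Returns 0 and fills *out if buf starts with a legacy uImage header,
 * or -1 if the buffer is too short or the magic number does not match. */
int parse_uimage_header(const uint8_t *buf, size_t len, struct uimage_info *out)
{
    if (buf == NULL || out == NULL || len < UIMAGE_HEADER_SIZE)
        return -1;
    if (read_be32(buf) != UIMAGE_MAGIC)
        return -1;
    out->data_size   = read_be32(buf + 12);
    out->load_addr   = read_be32(buf + 16);
    out->entry_point = read_be32(buf + 20);
    return 0;
}
```

On a PC, the first 64 bytes of the uImage file can be read with fread and passed to this function to check the file before copying it to the SD card.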
Booting from Ethernet
Devkit8000 (IP: 140.128.36.30)
PC (IP: 140.128.36.210): TFTP Server, NFS Server
uImage via TFTP
Rootfs via NFS
Install TFTP Server
Install Package
sudo apt-get install tftpd openbsd-inetd
Create a Directory
mkdir /home/student/tftp
Edit /etc/inetd.conf
tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd /home/student/tftp
Restart TFTP server
sudo /etc/init.d/openbsd-inetd restart
Install NFS Server
Install Package:
sudo apt-get install nfs-kernel-server
Create a Directory:
mkdir /home/student/rootfs
Add a line to /etc/exports:
/home/student/rootfs *(rw,no_root_squash,no_subtree_check,async)
Restart NFS server:
sudo /etc/init.d/nfs-kernel-server restart
Uboot Settings for tFTP
setenv bootcmd tftpboot 82000000\; bootm 82000000;
setenv ipaddr 140.128.36.30
setenv serverip 140.128.36.210
setenv bootfile uImage
setenv bootargs console=ttyS2,115200n8 noinitrd root=/dev/nfs rw nfsroot=140.128.36.210:/home/student/rootfs ip=140.128.36.30::140.128.36.254:255.255.255.0:non:eth0:off omapdss.def_disp=lcd
(Enter the setenv bootargs command above on a single line.)
boot
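The `ip=` part of bootargs follows the kernel's colon-separated layout: client IP, server IP, gateway, netmask, hostname, device, autoconf. In the setting above the server field is left empty, which is why two colons appear in a row. A minimal sketch in C of how that string is assembled; `build_ip_param` is a hypothetical helper name used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes
 *   ip=<client>:<server>:<gateway>:<netmask>:<hostname>:<device>:<autoconf>
 * into out. An empty string leaves a field blank. Returns the string length,
 * or -1 if a field is NULL, contains ':', or the buffer is too small. */
int build_ip_param(char *out, size_t out_len,
                   const char *client, const char *server,
                   const char *gateway, const char *netmask,
                   const char *hostname, const char *device,
                   const char *autoconf)
{
    const char *fields[7];
    int i, n;

    fields[0] = client;   fields[1] = server; fields[2] = gateway;
    fields[3] = netmask;  fields[4] = hostname;
    fields[5] = device;   fields[6] = autoconf;

    /* A colon inside a field would shift every field after it. */
    for (i = 0; i < 7; i++) {
        if (fields[i] == NULL || strchr(fields[i], ':') != NULL)
            return -1;
    }

    n = snprintf(out, out_len, "ip=%s:%s:%s:%s:%s:%s:%s",
                 client, server, gateway, netmask, hostname, device, autoconf);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return n;
}
```

Called with the values from this slide (client 140.128.36.30, empty server, gateway 140.128.36.254, netmask 255.255.255.0, hostname non, device eth0, autoconf off), it reproduces the `ip=` string used in bootargs.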
Hands-on Example: Installing Android
Suppose We have...
A Linux Kernel (uImage)
An Android Rootfs (android-rootfs.tgz)
Setup Android on Server
Move Kernel Image to tftp Directory:
mv uImage /home/student/tftp
Setup Android Rootfs on NFS Directory:
sudo tar -zxvf android-rootfs.tgz -C /home/student/rootfs
Setup Uboot
Setting bootargs for kernel:
setenv bootargs console=ttyS2,115200n8 noinitrd root=/dev/nfs rw nfsroot=140.128.36.210:/home/student/rootfs ip=140.128.36.30::140.128.36.254:255.255.255.0:non:eth0:off init=/init omapdss.def_disp=lcd
(Enter the command above on a single line.)
Build Your Own Kernel + Rootfs
How to install Cross-compiler
Add Repository to /etc/apt/sources.list
● deb http://www.emdebian.org/debian/ unstable main
Install Emdebian Tools
● apt-get install emdebian-tools
Install cross-compiler and libraries
● apt-get install linux-libc-dev-armel-cross
● apt-get install libc6-armel-cross libc6-dev-armel-cross
● apt-get install binutils-arm-linux-gnueabi
● apt-get install gcc-4.3-arm-linux-gnueabi
● apt-get install g++-4.3-arm-linux-gnueabi
How to Cross-Compile kernel
Get Kernel Source
Prepare Kernel Config
Edit .config (make menuconfig)
Cross-compile for ARM
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- uImage
Build up Root Filesystem
sudo apt-get install rootstock dctrl-tools
bzr branch lp:project-rootstock
cd project-rootstock
sudo ./rootstock --fqdn beagleboard --login ubuntu --password temppwd --imagesize 2G --dist karmic --serial ttyS2 --seed xfce4,gdm,xubuntu-gdm-theme,xubuntu-artwork
sudo apt-get install qemu debootstrap
Setup Our System
Change Root:
cd /home/student/rootfs
sudo chroot .
Add two components to /etc/apt/sources.list: universe multiverse
Update repository and install mplayer:
apt-get update
apt-get install mplayer
Write Your First ARM Program with a Cross-compiler
Source Code (void main(){ ...}) -> Compiler (e.g., gcc) -> Program (Binary)
How to compile
Compile directly:
gcc -o example example.c
Compile Process: *.c files are compiled into *.o files (machine code), which are then linked into the Program
Diagram: many target architectures: x86 (32 bits), x86 (64 bits, AMD64), IA64, ARM (32 bits, optionally with NEON), ARC, PowerPC, Alpha, MIPS, and more
fred@Fred-Debian:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU U9400 @ 1.40GHz
stepping : 6
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm ida tpr_shadow vnmi flexpriority
bogomips : 2793.05
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
How About your System?
fred@Fred-Debian:~$ ls -l /bin/ls
-rwxr-xr-x 1 root root 91728 2010-03-06 21:23 /bin/ls
fred@Fred-Debian:~$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
NOTE (ARM binary): ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.16, stripped
How to Identify the Format of a Binary File
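The "Intel 80386" versus "ARM" part of the file output comes from the machine field in the ELF header. As a rough sketch of what such a check involves, the C code below reads that field from the first 20 bytes of a file. The enum and function names are hypothetical, and only three machine codes from the ELF specification (3, 40, 62) are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum elf_arch {
    ELF_ARCH_NOT_ELF,
    ELF_ARCH_UNKNOWN,
    ELF_ARCH_X86,     /* e_machine == 3  (Intel 80386) */
    ELF_ARCH_ARM,     /* e_machine == 40 (ARM) */
    ELF_ARCH_X86_64   /* e_machine == 62 (x86-64) */
};

/* Returns 32 or 64 from the EI_CLASS byte (offset 4), or 0 if unknown. */
int elf_class_bits(const uint8_t *hdr, size_t len)
{
    if (hdr == NULL || len < 5)
        return 0;
    if (hdr[4] == 1)
        return 32;
    if (hdr[4] == 2)
        return 64;
    return 0;
}

/* Inspects the first 20 bytes of a file and reports the target machine. */
enum elf_arch elf_machine(const uint8_t *hdr, size_t len)
{
    unsigned machine;

    if (hdr == NULL || len < 20)
        return ELF_ARCH_NOT_ELF;
    /* Magic number: 0x7f 'E' 'L' 'F'. */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return ELF_ARCH_NOT_ELF;

    /* EI_DATA (offset 5): 1 = little-endian (LSB), 2 = big-endian (MSB). */
    if (hdr[5] == 1)
        machine = (unsigned)hdr[18] | ((unsigned)hdr[19] << 8);
    else if (hdr[5] == 2)
        machine = ((unsigned)hdr[18] << 8) | (unsigned)hdr[19];
    else
        return ELF_ARCH_UNKNOWN;

    switch (machine) {
    case 3:  return ELF_ARCH_X86;
    case 40: return ELF_ARCH_ARM;
    case 62: return ELF_ARCH_X86_64;
    default: return ELF_ARCH_UNKNOWN;
    }
}
```

Passing the first 20 bytes of a cross-compiled binary should yield `ELF_ARCH_ARM`, while /bin/ls on the PC shown here would yield `ELF_ARCH_X86`.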
fred@Fred-Debian:~$ ldd /bin/ls
linux-gate.so.1 => (0xb7896000)
libselinux.so.1 => /lib/libselinux.so.1 (0xb7862000)
librt.so.1 => /lib/i686/cmov/librt.so.1 (0xb7859000)
libacl.so.1 => /lib/libacl.so.1 (0xb7851000)
libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb770a000)
libdl.so.2 => /lib/i686/cmov/libdl.so.2 (0xb7706000)
/lib/ld-linux.so.2 (0xb7897000)
libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb76ed000)
libattr.so.1 => /lib/libattr.so.1 (0xb76e8000)
Check all Shared libs
fred@Fred-Debian:~$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.3-5' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-plugin --enable-objc-gc --enable-targets=all --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.3 (Debian 4.4.3-5)
Compiler Flags
Source Code (void main(){ ...}) -> Cross-Compiler (e.g., arm-linux-gnueabi-gcc) -> Program (Binary for Other Platforms)
gcc -mcpu=cortex-a8 -mfpu=vfp -mfloat-abi=softfp
gcc -mcpu=cortex-a8 -mfpu=neon
...
Compiler options for Platforms
Optimization with NEON intrinsics
Accelerate multimedia and signal processing algorithms
Video/Audio encode/decode
2D/3D graphic
Gaming
Works on 64-bit or 128-bit registers in parallel
void reference_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int n)
{
  int i;
  for (i=0; i<n; i++) {
    int r = *src++; // load red
    int g = *src++; // load green
    int b = *src++; // load blue

    // build weighted average:
    int y = (r*77)+(g*151)+(b*28);

    // undo the scale by 256 and write to memory:
    *dest++ = (y>>8);
  }
}
Reference Implementation in C
15.1 cycles per pixel.
void neon_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int n)
{
  int i;
  uint8x8_t rfac = vdup_n_u8 (77);
  uint8x8_t gfac = vdup_n_u8 (151);
  uint8x8_t bfac = vdup_n_u8 (28);
  n/=8;

  for (i=0; i<n; i++) {
    uint16x8_t temp;
    uint8x8x3_t rgb = vld3_u8 (src);
    uint8x8_t result;

    temp = vmull_u8 (rgb.val[0], rfac);
    temp = vmlal_u8 (temp,rgb.val[1], gfac);
    temp = vmlal_u8 (temp,rgb.val[2], bfac);

    result = vshrn_n_u16 (temp, 8);
    vst1_u8 (dest, result);
    src += 8*3;
    dest += 8;
  }
}
Optimization with NEON intrinsics
9.9 cycles per pixel.
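Because neon_convert divides n by 8, any pixels left over when n is not a multiple of 8 are never converted. One possible way to cover them, shown here only as a sketch, is to process full groups of 8 with the intrinsics and finish the remainder with the scalar formula. The function names below are hypothetical, and the `__ARM_NEON` check is an assumption that lets the same file also build on a PC, where only the scalar path is compiled.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Scalar path: same arithmetic as reference_convert. */
static void convert_scalar(uint8_t *dest, const uint8_t *src, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        int r = src[3 * i];
        int g = src[3 * i + 1];
        int b = src[3 * i + 2];
        /* Weights sum to 256, so the result always fits in 8 bits. */
        dest[i] = (uint8_t)((r * 77 + g * 151 + b * 28) >> 8);
    }
}

/* Converts n RGB pixels to gray for any n, not only multiples of 8. */
void convert_any_length(uint8_t *dest, const uint8_t *src, int n)
{
    int done = 0;

#if HAVE_NEON
    uint8x8_t rfac = vdup_n_u8(77);
    uint8x8_t gfac = vdup_n_u8(151);
    uint8x8_t bfac = vdup_n_u8(28);

    /* Full groups of 8 pixels go through the NEON path. */
    for (; done + 8 <= n; done += 8) {
        uint8x8x3_t rgb = vld3_u8(src + 3 * done);
        uint16x8_t temp = vmull_u8(rgb.val[0], rfac);
        temp = vmlal_u8(temp, rgb.val[1], gfac);
        temp = vmlal_u8(temp, rgb.val[2], bfac);
        vst1_u8(dest + done, vshrn_n_u16(temp, 8));
    }
#endif

    /* Remaining pixels (all of them when NEON is unavailable). */
    convert_scalar(dest + done, src + 3 * done, n - done);
}
```

With n = 10, for example, the first 8 pixels take the NEON path on ARM and the last 2 take the scalar path. The outputs match the reference implementation in both cases.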
160: f46a040f vld3.8 {d16-d18}, [sl]
164: e1a0c005 mov ip, r5
168: ecc80b06 vstmia r8, {d16-d18}
16c: e1a04007 mov r4, r7
170: e2866001 add r6, r6, #1 ; 0x1
174: e28aa018 add sl, sl, #24 ; 0x18
178: e8bc000f ldm ip!, {r0, r1, r2, r3}
17c: e15b0006 cmp fp, r6
180: e1a08005 mov r8, r5
184: e8a4000f stmia r4!, {r0, r1, r2, r3}
188: eddd0b06 vldr d16, [sp, #24]
18c: e89c0003 ldm ip, {r0, r1}
190: eddd2b08 vldr d18, [sp, #32]
194: f3c00ca6 vmull.u8 q8, d16, d22
198: f3c208a5 vmlal.u8 q8, d18, d21
19c: e8840003 stm r4, {r0, r1}
1a0: eddd3b0a vldr d19, [sp, #40]
1a4: f3c308a4 vmlal.u8 q8, d19, d20
1a8: f2c80830 vshrn.i16 d16, q8, #8
1ac: f449070f vst1.8 {d16}, [r9]
1b0: e2899008 add r9, r9, #8 ; 0x8
1b4: caffffe9 bgt 160
Assembly output
convert_asm_neon:

  # r0: Ptr to destination data
  # r1: Ptr to source data
  # r2: Iteration count:

  push {r4-r5,lr}
  lsr r2, r2, #3

  # build the three constants:
  mov r3, #77
  mov r4, #151
  mov r5, #28
  vdup.8 d3, r3
  vdup.8 d4, r4
  vdup.8 d5, r5

.loop:

  # load 8 pixels:
  vld3.8 {d0-d2}, [r1]!

  # do the weighted average:
  vmull.u8 q3, d0, d3
  vmlal.u8 q3, d1, d4
  vmlal.u8 q3, d2, d5

  # shift and store:
  vshrn.u16 d6, q3, #8
  vst1.8 {d6}, [r0]!

  subs r2, r2, #1
  bne .loop

  pop { r4-r5, pc }
NEON and Assembler
2.0 cycles per pixel.
Optimizations with Compiler
Analyze Source code
Logic Optimization
Optimize with specific hardware (NEON...etc)
How to use Cross-compiler
Compile directly:
arm-linux-gnueabi-gcc -o example example.c
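The slides do not show the contents of example.c. As an illustrative assumption only, a first program could report which architecture it was compiled for, using macros that GCC predefines for each target. The function names are hypothetical.

```c
#include <stdio.h>

/* Returns a label for the architecture this code was compiled for,
 * based on macros GCC predefines for each target. */
const char *target_arch(void)
{
#if defined(__arm__)
    return "ARM (32-bit)";
#elif defined(__aarch64__)
    return "ARM (64-bit)";
#elif defined(__x86_64__)
    return "x86 (64-bit)";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}

/* Prints the label; call this from main() in example.c. */
void print_target(void)
{
    printf("compiled for: %s\n", target_arch());
}
```

Adding `int main(void) { print_target(); return 0; }` makes this a complete example.c. Built with gcc on the PC it reports an x86 target, and built with arm-linux-gnueabi-gcc it reports ARM. The ARM binary does not run on the PC directly; it has to be copied to the board or run in an emulator.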
Development without Real Hardware: Developing on an Emulator
Get and compile QEMU
Download The Source Code
wget http://qemu-omap3.googlecode.com/files/qemu-omap3-v0.01.tar.bz2
tar jxvf qemu-omap3-v0.01.tar.bz2
cd qemu-omap3
Build qemu-omap3
./configure --target-list=arm-softmmu
make
Download and Configure
Download u-boot/kernel/rootfs
● cd arm-softmmu
● wget http://qemu-omap3.googlecode.com/files/image-v0.01.tar.bz2
● tar jxvf image-v0.01.tar.bz2
● wget http://beagleboard.googlecode.com/files/rd-ext2-8M.bin
Generate nand flash image
● dpkg-reconfigure dash
Generate images
Generate u-boot/kernel/rootfs
● cp ../bb_nandflash.sh .
● cp ../bb_nandflash_ecc .
● ./bb_nandflash.sh x-load.bin.ift beagle-nand.bin x-loader
● ./bb_nandflash.sh u-boot.bin beagle-nand.bin u-boot
● ./bb_nandflash.sh uImage beagle-nand.bin kernel
● ./bb_nandflash.sh rd-ext2-8M.bin beagle-nand.bin rootfs
● ./bb_nandflash_ecc beagle-nand.bin 0x0 0xe80000
Run QEMU
Run with commands:
./qemu-system-arm -M beagle -mtdblock beagle-nand.bin
Use Ctrl-Alt-3 to switch to the serial port (uboot)
● nand read 0x80000000 0x280000 0x400000
● nand read 0x81600000 0x680000 0x800000
● setenv bootargs 'console=ttyS2,115200n8 ramdisk_size=8192 root=/dev/ram0 rw rootfstype=ext2 initrd=0x81600000,8M nohz=off'
● bootm 0x80000000