Simon Jackson James Sleeman Pete Hemery

Simon Jackson James Sleeman Pete Hemery. Simon Jackson.

Jan 12, 2016

Page 1

Simon Jackson, James Sleeman, Pete Hemery

Page 2

Simon Jackson

Page 3

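Physical memory is divided into a number of “zones” (boundaries shown for 32-bit x86):
- ZONE_DMA: 0 to 16 MB
- ZONE_NORMAL: 16 MB to 896 MB
- ZONE_HIGHMEM: 896 MB to 4 GB

Most kernel operations can only take place in ZONE_NORMAL.

Memory is organised into pages; x86 uses 4 KB pages (struct page is defined in include/linux/mm_types.h).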
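As a rough illustration of the zone list above, the following userspace C sketch classifies a physical address into a zone. It assumes the 32-bit x86 boundaries listed here. `zone_of()` and `ZONE_NONE` are hypothetical names; the real kernel sets up zones per architecture at boot rather than through a fixed function like this.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_NONE };

/* Assumed 32-bit x86 boundaries from the list above; other architectures
 * and 64-bit kernels lay their zones out differently. */
static enum zone_type zone_of(uint64_t phys)
{
    if (phys < 16 * MB)
        return ZONE_DMA;        /* reachable by legacy ISA DMA */
    if (phys < 896 * MB)
        return ZONE_NORMAL;     /* permanently mapped "low memory" */
    if (phys < 4096 * MB)
        return ZONE_HIGHMEM;    /* must be mapped before the kernel can use it */
    return ZONE_NONE;           /* above 4 GB: outside this simple table */
}
```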

Page 4

Each page has a struct page associated with it

The kernel maintains one or more arrays of these that track all of the physical memory on the system

Functions and macros are defined for translating between struct page pointers and virtual addresses

```c
struct page *virt_to_page(void *kaddr);
struct page *pfn_to_page(int pfn);
void *page_address(struct page *page);
```
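The sketch below models these conversions in userspace, for low memory only. It assumes 4 KB pages and a 32-bit x86-style `PAGE_OFFSET` of 0xC0000000 (the common 3 GB/1 GB split). `struct toy_page`, `mem_map[]` and the `toy_*` helpers are hypothetical stand-ins for the kernel macros. They show the arithmetic, not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12u            /* 4 KB pages, as on x86 */
#define PAGE_OFFSET  0xC0000000u    /* assumed start of the kernel's logical mapping */
#define NR_PAGES     1024u          /* toy machine: 4 MB of physical memory */

/* Stand-in for struct page: the real one carries flags, refcounts, etc. */
struct toy_page {
    unsigned int flags;
};

/* One entry per physical page frame, like the kernel's mem_map array. */
static struct toy_page mem_map[NR_PAGES];

/* Logical (low-memory) addresses sit at a constant offset from physical ones. */
static uint32_t toy_pa(uint32_t vaddr) { return vaddr - PAGE_OFFSET; }
static uint32_t toy_va(uint32_t paddr) { return paddr + PAGE_OFFSET; }

static struct toy_page *toy_pfn_to_page(uint32_t pfn)
{
    assert(pfn < NR_PAGES);         /* only frames the toy machine has */
    return &mem_map[pfn];
}

static uint32_t toy_page_to_pfn(const struct toy_page *page)
{
    return (uint32_t)(page - mem_map);
}

/* virt_to_page(): logical address -> physical address -> frame number -> entry */
static struct toy_page *toy_virt_to_page(uint32_t vaddr)
{
    return toy_pfn_to_page(toy_pa(vaddr) >> PAGE_SHIFT);
}

/* page_address(): entry -> frame number -> physical address -> logical address */
static uint32_t toy_page_address(const struct toy_page *page)
{
    return toy_va(toy_page_to_pfn(page) << PAGE_SHIFT);
}
```

Because low memory sits at a constant offset, each conversion is just an addition or subtraction plus a shift. High-memory pages have no fixed logical address, which is why they need the separate treatment described for ZONE_HIGHMEM.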

Page 5

ZONE_DMA (0 to 16 MB)

Used for Direct Memory Access.

Legacy ISA devices can only address the first 16 MB of memory, so the kernel tries to dedicate this area to them.

Page 6

ZONE_NORMAL (16 MB to 896 MB)

Also known as low memory: the region the kernel can normally address directly.

The kernel addresses that map it are called logical addresses, and they sit at a constant offset from their physical addresses.

Page 7

ZONE_HIGHMEM (896 MB to 4 GB)

The kernel can only access this memory by mapping it into its own address space with kmap. The result is a virtual address, not a logical one.

kmap first checks whether the page is already in low memory; if so, it simply returns the page's logical address.

kmap tracks the mappings it creates in a page table called pkmap_page_table, which is located at PKMAP_BASE and set up during system initialisation.
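A much simplified userspace model of the persistent-kmap logic follows. The window size (`LAST_PKMAP` of 4), the `PKMAP_BASE` value and the 896 MB low-memory limit expressed as `MAX_LOW_PFN` are assumptions for illustration. The real kmap() sleeps when no slot is free and flushes TLB entries lazily; this model does neither.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12u
#define PAGE_OFFSET  0xC0000000u   /* assumed start of low-memory logical mapping */
#define MAX_LOW_PFN  0x38000u      /* 896 MB / 4 KB: frames below this are low memory */
#define PKMAP_BASE   0xFF800000u   /* assumed address of the persistent-kmap window */
#define LAST_PKMAP   4u            /* tiny window so the example stays small */

/* pkmap_pfn[i] is the frame mapped in slot i and pkmap_count[i] its user count;
 * together they play the role of pkmap_page_table in this toy model. */
static uint32_t pkmap_pfn[LAST_PKMAP];
static unsigned int pkmap_count[LAST_PKMAP];

/* Returns a kernel virtual address for frame pfn, or 0 if the window is full. */
static uint32_t toy_kmap(uint32_t pfn)
{
    unsigned int i;

    /* Low memory is already permanently mapped: return its logical address. */
    if (pfn < MAX_LOW_PFN)
        return PAGE_OFFSET + (pfn << PAGE_SHIFT);

    /* High page already mapped? Reuse its slot and bump the count. */
    for (i = 0; i < LAST_PKMAP; i++) {
        if (pkmap_count[i] > 0 && pkmap_pfn[i] == pfn) {
            pkmap_count[i]++;
            return PKMAP_BASE + (i << PAGE_SHIFT);
        }
    }
    /* Otherwise claim a free slot and "install" the mapping. */
    for (i = 0; i < LAST_PKMAP; i++) {
        if (pkmap_count[i] == 0) {
            pkmap_pfn[i] = pfn;
            pkmap_count[i] = 1;
            return PKMAP_BASE + (i << PAGE_SHIFT);
        }
    }
    return 0;
}

static void toy_kunmap(uint32_t pfn)
{
    unsigned int i;

    if (pfn < MAX_LOW_PFN)
        return;                 /* nothing to undo for low memory */
    for (i = 0; i < LAST_PKMAP; i++) {
        if (pkmap_count[i] > 0 && pkmap_pfn[i] == pfn) {
            pkmap_count[i]--;   /* slot becomes reusable when the count hits 0 */
            return;
        }
    }
    assert(!"kunmap of a page that is not mapped");
}
```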

Page 8

Virtual addresses are mapped to physical memory by page tables.

Each process has its own page tables. Once the MMU is enabled, virtual memory applies to all programs, including the kernel.

The kernel doesn't necessarily use that much physical memory; it just has that address space available to map physical memory.

Page 9

Kernel space is always present and maps the same physical memory in all processes: it is resident.

It is marked in the page tables as exclusive to privileged code, i.e. kernel only.

The mapping for the user-land virtual memory changes whenever a process switch happens.

Page 10

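Bounce buffers

Used for devices that cannot access the full address range, such as 32-bit devices on 64-bit systems.

The bounce buffer is allocated in memory low enough for the device to address. Its contents are copied to (or from) the desired page in high memory.

Bounce buffers are used as buffer pages for DMA to and from the device.

Data is copied via the bounce buffer differently depending on whether it is a read or a write buffer:
- For a write, the data is copied into the bounce buffer before the transfer.
- For a read, the data is copied out to the high page after the transfer completes.

The bounce buffer can be reclaimed once the I/O is done.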
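The read/write asymmetry is easiest to see in a small userspace simulation. Everything here is hypothetical: a "device" limited by a DMA mask, a single statically reserved bounce buffer, and a `do_dma()` helper that decides whether to bounce. In the kernel, the equivalent work is done by the block layer's bounce code and by swiotlb, not by driver code like this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64u

enum dir { TO_DEVICE, FROM_DEVICE };

/* A buffer's "physical address" is just a number compared against the
 * device's DMA mask; the bytes themselves live in ordinary arrays. */
struct buffer {
    uint64_t phys;
    unsigned char data[BUF_SIZE];
};

static struct buffer bounce = { 0x00100000u, {0} };  /* assumed low, reachable memory */
static unsigned char device_fifo[BUF_SIZE];          /* what the "device" sees */

/* The device can only reach addresses covered by its mask. */
static int reachable(uint64_t phys, uint64_t dma_mask)
{
    return (phys + BUF_SIZE - 1) <= dma_mask;
}

/* The "device" transfers to or from whichever buffer it is handed. */
static void device_dma(struct buffer *b, enum dir d)
{
    if (d == TO_DEVICE)
        memcpy(device_fifo, b->data, BUF_SIZE);
    else
        memcpy(b->data, device_fifo, BUF_SIZE);
}

/* Performs the transfer; returns 1 if the bounce buffer was needed. */
static int do_dma(struct buffer *buf, enum dir d, uint64_t dma_mask)
{
    if (reachable(buf->phys, dma_mask)) {
        device_dma(buf, d);                         /* direct DMA, no copy */
        return 0;
    }
    assert(reachable(bounce.phys, dma_mask));
    if (d == TO_DEVICE) {
        memcpy(bounce.data, buf->data, BUF_SIZE);   /* write: copy in first */
        device_dma(&bounce, d);
    } else {
        device_dma(&bounce, d);
        memcpy(buf->data, bounce.data, BUF_SIZE);   /* read: copy out afterwards */
    }
    return 1;                                       /* bounce buffer is free again */
}
```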

Page 11

In 2.4, the high memory manager was the only subsystem that maintained emergency pools of pages.

In 2.6, memory pools are implemented as a generic concept, for use wherever a minimum amount of memory is needed even when memory is low.

Two emergency pools are maintained for the exclusive use of bounce buffers.

Page 12

Linux maintains a three-level, architecture-independent page table to handle 64-bit addresses.

Architectures that manage their MMU differently emulate the three-level page tables.

Each process has a pointer to its own Page Global Directory (PGD), which is a physical page frame.

Each active PGD entry points to a page containing an array of Page Middle Directory (PMD) entries.

Each PMD entry points to a page of Page Table Entries (PTEs), which in turn point at pages of actual data.

Page 13

Linear addresses may be broken up into parts to yield offsets within these three page table levels and an offset within the actual page

Macro definitions on x86
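The x86 macro table from the slide is not reproduced here, but the idea can be sketched in userspace. The constants assume 32-bit x86 with PAE: a 2-bit PGD index, a 9-bit PMD index, a 9-bit PTE index and a 12-bit offset. The `toy_*` table types and `walk()` are hypothetical. In the upper levels, C pointers stand in for the physical frame numbers that real entries hold.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit x86 PAE split: 2 + 9 + 9 + 12 bits. */
#define PAGE_SHIFT    12
#define PMD_SHIFT     21
#define PGDIR_SHIFT   30
#define PTRS_PER_PTE  512u
#define PTRS_PER_PMD  512u
#define PTRS_PER_PGD  4u

static unsigned pgd_index(uint32_t addr) { return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1); }
static unsigned pmd_index(uint32_t addr) { return (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1); }
static unsigned pte_index(uint32_t addr) { return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); }
static unsigned page_off(uint32_t addr)  { return addr & ((1u << PAGE_SHIFT) - 1); }

/* Toy entries: a PTE holds a frame number; upper levels hold C pointers
 * where real entries would hold the frame number of the next table. */
typedef struct { uint32_t pfn; int present; } toy_pte;
typedef struct { toy_pte *ptes; } toy_pmd;   /* points at a page of PTEs */
typedef struct { toy_pmd *pmds; } toy_pgd;   /* points at a page of PMD entries */

/* Walk PGD -> PMD -> PTE. Returns 0 and sets *phys, or -1 if unmapped. */
static int walk(const toy_pgd *pgd, uint32_t addr, uint32_t *phys)
{
    const toy_pgd *g = &pgd[pgd_index(addr)];
    if (!g->pmds)
        return -1;

    const toy_pmd *m = &g->pmds[pmd_index(addr)];
    if (!m->ptes)
        return -1;

    const toy_pte *p = &m->ptes[pte_index(addr)];
    if (!p->present)
        return -1;

    *phys = (p->pfn << PAGE_SHIFT) | page_off(addr);
    return 0;
}
```

For example, 0xC0123456 splits into PGD index 3, PMD index 0, PTE index 0x123 and offset 0x456.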

Page 14

James Sleeman

Page 15

Slab allocation

Buddy allocation

Mempools

Lookaside buffers

Page 16

The main motivation for slab allocation is that the cost of initialising and freeing kernel data objects can outweigh the cost of allocating them.

With slab allocation, memory chunks suitable for data objects of a certain type or size are preallocated.
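A minimal userspace sketch of that idea follows. It builds one slab of preallocated objects, runs each through a constructor once, and hands them out and takes them back through a free list. `struct inode_like`, `OBJS_PER_SLAB` and the `cache_*` functions are hypothetical. The corresponding kernel calls are `kmem_cache_create()`, `kmem_cache_alloc()` and `kmem_cache_free()`, which also grow and shrink the cache.

```c
#include <assert.h>
#include <stddef.h>

#define OBJS_PER_SLAB 8              /* toy slab size */

struct inode_like {                  /* hypothetical object type */
    int refcount;
    int initialised;
    struct inode_like *next_free;    /* free-list link while unallocated */
};

struct toy_cache {
    struct inode_like slab[OBJS_PER_SLAB];   /* preallocated storage */
    struct inode_like *free_list;
    int in_use;
};

/* The expensive set-up runs once per object, when the slab is created. */
static void ctor(struct inode_like *obj)
{
    obj->refcount = 0;
    obj->initialised = 1;
}

static void cache_init(struct toy_cache *c)
{
    c->free_list = NULL;
    c->in_use = 0;
    for (int i = OBJS_PER_SLAB - 1; i >= 0; i--) {
        ctor(&c->slab[i]);
        c->slab[i].next_free = c->free_list;
        c->free_list = &c->slab[i];
    }
}

/* Allocation is just popping an already-constructed object. */
static struct inode_like *cache_alloc(struct toy_cache *c)
{
    struct inode_like *obj = c->free_list;

    if (!obj)
        return NULL;                 /* a real cache would add another slab */
    c->free_list = obj->next_free;
    c->in_use++;
    return obj;
}

/* Freeing pushes the object back; it must be in its constructed state. */
static void cache_free(struct toy_cache *c, struct inode_like *obj)
{
    assert(obj >= c->slab && obj < c->slab + OBJS_PER_SLAB);
    assert(obj->refcount == 0);
    obj->next_free = c->free_list;
    c->free_list = obj;
    c->in_use--;
}
```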

Page 18

Buddy allocation is a fast memory allocation technique. It divides memory into power-of-2 partitions and satisfies each request with the smallest block that fits (a best-fit approach).

When a block is freed, its buddy (the neighbouring block of the same size it was split from) is checked to see whether it is also free. If so, the two are combined into a larger block, and the check repeats at the next size up, to minimise fragmentation.
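The splitting and merging can be sketched in a few dozen lines of userspace C. This toy manages 16 pages with one free-flag array per order. `TOY_MAX_ORDER`, `free_block` and the `buddy_*` functions are assumptions for illustration. The kernel's allocator in mm/page_alloc.c uses per-zone free lists instead.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER 4                    /* manage 2^4 = 16 pages */
#define NPAGES        (1 << TOY_MAX_ORDER)

/* free_block[o][p] is true when a free block of 2^o pages starts at page p. */
static bool free_block[TOY_MAX_ORDER + 1][NPAGES];

static void buddy_init(void)
{
    for (int o = 0; o <= TOY_MAX_ORDER; o++)
        for (int p = 0; p < NPAGES; p++)
            free_block[o][p] = false;
    free_block[TOY_MAX_ORDER][0] = true;   /* start with one big free block */
}

/* Allocate 2^order pages; returns the first page number, or -1. */
static int buddy_alloc(int order)
{
    assert(order >= 0 && order <= TOY_MAX_ORDER);
    /* Search the smallest sizes first, so the best-fitting block is used. */
    for (int o = order; o <= TOY_MAX_ORDER; o++) {
        for (int p = 0; p < NPAGES; p += 1 << o) {
            if (!free_block[o][p])
                continue;
            free_block[o][p] = false;
            /* Split down to the requested size, freeing each upper half. */
            while (o > order) {
                o--;
                free_block[o][p + (1 << o)] = true;
            }
            return p;
        }
    }
    return -1;
}

static void buddy_free(int page, int order)
{
    while (order < TOY_MAX_ORDER) {
        int buddy = page ^ (1 << order);   /* buddies differ in exactly one bit */
        if (!free_block[order][buddy])
            break;                         /* buddy in use: stop merging */
        free_block[order][buddy] = false;  /* merge the pair ... */
        page &= ~(1 << order);             /* ... into the lower-addressed block */
        order++;
    }
    free_block[order][page] = true;
}
```

Computing the buddy with an XOR works because blocks of size 2^order always start at multiples of 2^order.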

Page 19

A memory pool has the type mempool_t, defined in <linux/mempool.h>
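The behaviour of a memory pool can be modelled in userspace: try the normal allocator first, and only dip into a preallocated reserve when that fails. The `toy_mempool` type, `POOL_MIN` and the `simulate_oom` test hook are hypothetical. The kernel API it loosely mirrors is `mempool_create()`, `mempool_alloc()` and `mempool_free()`; the real `mempool_alloc()` may also sleep until an element is returned, which this model does not do.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_MIN 2                   /* hypothetical reserved minimum */

struct toy_mempool {
    void *reserve[POOL_MIN];
    int curr_nr;                     /* elements currently held in reserve */
    size_t elem_size;
    int simulate_oom;                /* test hook: pretend malloc() fails */
};

static int toy_mempool_init(struct toy_mempool *pool, size_t elem_size)
{
    pool->elem_size = elem_size;
    pool->simulate_oom = 0;
    for (pool->curr_nr = 0; pool->curr_nr < POOL_MIN; pool->curr_nr++) {
        pool->reserve[pool->curr_nr] = malloc(elem_size);
        if (!pool->reserve[pool->curr_nr]) {
            while (pool->curr_nr > 0)        /* undo a partial fill */
                free(pool->reserve[--pool->curr_nr]);
            return -1;
        }
    }
    return 0;
}

static void *toy_mempool_alloc(struct toy_mempool *pool)
{
    void *p = pool->simulate_oom ? NULL : malloc(pool->elem_size);

    if (p)
        return p;                            /* normal allocator succeeded */
    if (pool->curr_nr > 0)
        return pool->reserve[--pool->curr_nr];   /* emergency reserve */
    return NULL;
}

static void toy_mempool_free(struct toy_mempool *pool, void *p)
{
    assert(p != NULL);
    if (pool->curr_nr < POOL_MIN)
        pool->reserve[pool->curr_nr++] = p;  /* refill the reserve first */
    else
        free(p);
}

static void toy_mempool_destroy(struct toy_mempool *pool)
{
    while (pool->curr_nr > 0)
        free(pool->reserve[--pool->curr_nr]);
}
```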

Page 20

kmalloc is a memory allocation function that returns physically contiguous memory from kernel space.

```c
#include <linux/slab.h>   /* kmalloc(), kfree() */
#include <linux/gfp.h>    /* GFP_* allocation flags */

void *kmalloc(size_t size, gfp_t flags);
void kfree(const void *ptr);

buf = kmalloc(BUF_SIZE, GFP_DMA | GFP_KERNEL);
kfree(buf);
```

Page 21

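```c
#define BUF_LEN 2048

void function(void)
{
	char buf[BUF_LEN];   /* 2 KB on the kernel stack, which is small */

	/* Do stuff with buf */
}
```

```c
#include <linux/slab.h>

#define BUF_LEN 2048

void function(void)
{
	char *buf;

	buf = kmalloc(BUF_LEN, GFP_KERNEL);
	if (!buf)
		return;       /* error! allocation failed */

	/* Do stuff with buf */

	kfree(buf);
}
```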

Page 22

All flags are listed in include/linux/gfp.h.

Type flags: GFP_ATOMIC, GFP_NOIO, GFP_NOFS, GFP_KERNEL, GFP_USER, GFP_HIGHUSER, GFP_DMA

Page 23

```c
unsigned long get_zeroed_page(gfp_t flags);
unsigned long __get_free_page(gfp_t flags);
unsigned long __get_free_pages(gfp_t flags, unsigned int order);
unsigned long __get_dma_pages(gfp_t flags, unsigned int order);
```
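The `order` argument requests 2^order physically contiguous pages. The helper below, modelled on the kernel's `get_order()`, computes the smallest order that covers a byte count. `toy_get_order()` and the 4 KB page size are assumptions for this userspace sketch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)   /* 4 KB, as on x86 */

/* Smallest order such that (PAGE_SIZE << order) >= size; size must be > 0. */
static unsigned int toy_get_order(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

For example, a 10 KB request needs order 2, i.e. four pages (16 KB), so 6 KB of the allocation goes unused.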

Page 24

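```c
#include <linux/percpu.h>

DEFINE_PER_CPU(type, name);              /* one instance of name per CPU */

get_cpu_var(sockets_in_use)++;           /* this CPU's copy; disables preemption */
put_cpu_var(sockets_in_use);             /* re-enables preemption */

per_cpu(variable, int cpu_id);           /* access another CPU's copy */

cpu = get_cpu();                         /* current CPU id; disables preemption */
ptr = per_cpu_ptr(per_cpu_var, cpu);     /* for dynamically allocated per-CPU data */
put_cpu();                               /* re-enables preemption */
```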

Page 25

```
sudo cat /proc/slabinfo | awk '{printf "%5d MB %s\n", $3*$4/(1024*1024), $1}' | sort -n
```

```
    0 MB vm_area_struct
    1 MB dentry
    2 MB ext4_inode_cache
    2 MB inode_cache
    8 MB buffer_head
```
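The awk one-liner multiplies `num_objs` ($3) by `objsize` ($4) for each cache. A C equivalent of the per-line step is sketched below. It assumes the slabinfo version 2.x column layout, and `parse_slabinfo_line()` is a hypothetical helper rather than a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses one data line of /proc/slabinfo, assuming the 2.x layout
 *   name active_objs num_objs objsize objperslab pagesperslab : tunables ...
 * and stores num_objs * objsize (awk's $3*$4) in *bytes.
 * Returns 1 on success, 0 for header lines or anything unparsable. */
static int parse_slabinfo_line(const char *line, char name[64],
                               unsigned long long *bytes)
{
    unsigned long active_objs, num_objs, objsize;

    assert(line && name && bytes);
    if (sscanf(line, "%63s %lu %lu %lu", name, &active_objs, &num_objs,
               &objsize) != 4)
        return 0;
    *bytes = (unsigned long long)num_objs * objsize;
    return 1;
}
```

A full tool would pass each line read with `fgets()` from /proc/slabinfo to this function. The original command uses `sudo` because that file is typically readable only by root.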

Page 26

Some of the causes of OOM (out of memory):
- The kernel is really out of memory: more memory has been used than the system has in RAM and swap.
- Kernel memory leaks.
- Deadlocks, of a kind: writing data to disk (to free memory) may itself require a memory allocation.

OOM killer: linux/mm/oom_kill.c

vm_enough_memory(); out_of_memory();

Page 27

Thomas Habets had an unfortunate experience recently. His Linux system ran out of memory, and the dreaded "OOM killer" was loosed upon the system's unsuspecting processes. One of its victims turned out to be his screen locking program.

Page 28

DMA (Direct Memory Access) is a feature of modern microcontrollers that allows other hardware subsystems to access system memory independently of the CPU.

Without DMA, transfers take up a large number of CPU cycles: with programmed I/O (PIO), the CPU is tied up for the entire duration of the read or write.

Page 29

Useful websites (kmalloc and more):

lwn.net/images/pdf/LDD3/ch08.pdf

http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/

http://www.makelinux.net/books/lkd2/ch11lev1sec4

Page 30

Pete Hemery

Page 31

Programmed I/O (polling): the simplest method, but inefficient.

Interrupt-driven I/O: an interrupt service routine in the device driver.

How does the CPU know when a device is ready?

Page 32

Direct Memory Access: bypasses the CPU to get to system memory.

Page 33

A DMA controller deals with physical addresses, so:
- Programming a DMA transfer requires retrieving a physical address at some point (code usually works with virtual addresses).
- The memory accessed by the DMA must be physically contiguous.

The CPU can access memory through a data cache. Using the cache can be more efficient, because cache accesses are faster than bus accesses. But the DMA does not access the CPU cache, so care must be taken to keep the cache coherent (cache content vs. memory content).

Either flush or invalidate the cache lines corresponding to the buffer accessed by both the DMA and the processor, at strategic times.

Page 34

The buffer must be physically contiguous.
- Any memory allocated by kmalloc (up to 128 KB) or __get_free_pages (up to 8 MB) can be used.
- Block I/O and networking buffers, which are designed to support DMA, can be used.
- vmalloc memory cannot be used: the DMA would have to be set up separately for each individual physical page.
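The page frame numbers behind a buffer show why vmalloc memory is awkward for DMA. In the sketch below, the frame lists are made-up examples and both helper names are hypothetical. A buffer can be handled by a single DMA transfer only if its frames are consecutive. Otherwise the transfer has to be split into one piece per contiguous run.

```c
#include <assert.h>
#include <stddef.h>

/* Given the frame numbers backing a buffer, one per virtual page, report
 * whether one DMA transfer can cover it: the frames must be consecutive.
 * kmalloc()/__get_free_pages() memory always is; vmalloc() memory is not
 * guaranteed to be. */
static int physically_contiguous(const unsigned long *pfns, size_t npages)
{
    assert(npages > 0);
    for (size_t i = 1; i < npages; i++)
        if (pfns[i] != pfns[0] + i)
            return 0;
    return 1;
}

/* How many separate transfers (one per contiguous run of frames) are needed. */
static size_t count_chunks(const unsigned long *pfns, size_t npages)
{
    size_t chunks = 1;

    assert(npages > 0);
    for (size_t i = 1; i < npages; i++)
        if (pfns[i] != pfns[i - 1] + 1)
            chunks++;
    return chunks;
}
```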

Page 35

Memory caching can interfere with DMA.
- Before DMA to the device: make sure all writes to the DMA buffer have been committed to memory (flush the dirty cache lines).
- After DMA from the device: before the driver reads the DMA buffer, make sure the corresponding cache lines are flushed (invalidated), so the CPU does not read stale data.
- Bidirectional DMA: the caches need to be flushed both before and after the DMA transfer.
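The stale-data problem can be simulated in userspace with a single write-back cache line. The model is hypothetical throughout (`memory`, `cache`, `cache_clean()`, `cache_invalidate()`). On a real system whose caches are not DMA-coherent, such as many ARM SoCs, the kernel's streaming DMA API performs the equivalent cache maintenance. That API includes `dma_map_single()` with `DMA_TO_DEVICE`, `DMA_FROM_DEVICE` or `DMA_BIDIRECTIONAL`, and `dma_sync_single_for_cpu()`.

```c
#include <assert.h>
#include <string.h>

#define LINE 16

/* One write-back cache line in front of a 16-byte DMA buffer. */
static unsigned char memory[LINE];     /* what the DMA engine reads and writes */
static unsigned char cache[LINE];      /* what the CPU reads and writes */
static int cache_valid, cache_dirty;

static void cpu_write(const char *s)
{
    memcpy(cache, s, LINE);            /* write-back: memory is not updated yet */
    cache_valid = cache_dirty = 1;
}

static void cpu_read(unsigned char *out)
{
    if (!cache_valid) {                /* miss: fill the line from memory */
        memcpy(cache, memory, LINE);
        cache_valid = 1;
    }
    memcpy(out, cache, LINE);
}

/* Clean (flush): write dirty data back before DMA to the device. */
static void cache_clean(void)
{
    if (cache_valid && cache_dirty)
        memcpy(memory, cache, LINE);
    cache_dirty = 0;
}

/* Invalidate: drop the line so the next CPU read comes from memory,
 * needed after DMA from the device. */
static void cache_invalidate(void)
{
    cache_valid = cache_dirty = 0;
}

/* The device only ever sees memory, never the CPU cache. */
static void dma_from_device(const char *s) { memcpy(memory, s, LINE); }
static void dma_to_device(unsigned char *out) { memcpy(out, memory, LINE); }
```

Without `cache_clean()`, the device reads whatever was in memory before the CPU's write. Without `cache_invalidate()`, the CPU keeps reading its old cached copy after the device has written new data.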

Page 36

The ARM Cortex™-A8 processor is based on the ARMv7 architecture and has the ability to scale in speed from 600MHz to greater than 1GHz. The Cortex-A8 processor can meet the requirements for power-optimized mobile devices needing operation in less than 300mW; and performance-optimized consumer applications requiring 2000 Dhrystone MIPS.

Cortex A8 Netbook

Page 39

Arbitration

“The process by which the parties to a dispute submit their differences to the judgment of an impartial person or group appointed by mutual consent or statutory provision.”

Page 42

Sitara™ ARM® Microprocessors

Welcome to the Sitara™ ARM® Microprocessors Section of the TI E2E Support Community. Ask questions, share knowledge, explore ideas, and help solve problems with fellow engineers. To post a question, click on the forum tab then "New Post".

This group contains forums for discussion on Cortex A8 based AM35x, AM37x and AM335x processors and ARM9 based AM1x processors. For faster response please be sure to tag your post.

http://e2e.ti.com/support/dsp/sitara_arm174_microprocessors/f/416/t/159602.aspx

Page 48

I am currently working on getting WLAN up and running. It seems that the SDIO driver is broken for libertas_sdio:

libertas_sdio: probe of mmc1:0001:1 failed with error -16

A second problem is the USB host interface. It seems to be completely broken.

Hotplugging USB mouse:

[ 156.736999] drivers/hid/usbhid/hid-core.c: can't reset device, ehci-omap.0-2.3/input0, status -71

Adding a webcam:

```
[ 25.468078] Linux video capture interface: v2.00
[ 25.566772] gspca: main v2.9.0 registered
[ 25.703247] gspca: probing 046d:08da
[ 25.725189] twl_rtc twl_rtc: rtc core: registered twl_rtc as rtc0
[ 25.964385] lib80211: common routines for IEEE802.11 drivers
[ 25.964416] lib80211_crypt: registered algorithm 'NULL'
[ 26.142181] ads7846 spi1.0: touchscreen, irq 274
[ 26.143157] input: ADS7846 Touchscreen as /devices/platform/omap2_mcspi.1/spi1.0/input/input1
[ 26.617645] cfg80211: Calling CRDA to update world regulatory domain
[ 26.829406] libertas_sdio: Libertas SDIO driver
[ 26.829437] libertas_sdio: Copyright Pierre Ossman
[ 26.835327] zc3xx: probe 2wr ov vga 0x0000
[ 26.865203] zc3xx: probe sensor -> 0011
[ 26.865234] zc3xx: Find Sensor HV7131R(c)
[ 26.865936] input: zc3xx as /devices/platform/ehci-omap.0/usb1/1-2/1-2.3/input/input2
[ 26.866424] gspca: video0 created
[ 26.866455] gspca: found int in endpoint: 0x82, buffer_len=8, interval=10
[ 26.866516] kernel BUG at arch/arm/mm/dma-mapping.c:409!
[ 26.871887] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 26.885864] libertas_sdio: probe of mmc1:0001:1 failed with error -16
[ 26.915069] cfg80211: World regulatory domain updated:
[ 26.915100] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 26.915130] (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 26.915161] (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[ 26.915161] (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[ 26.915191] (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 26.915222] (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 26.938995] pgd = cff58000
[ 26.942321] [00000000] *pgd=8ff36031, *pte=00000000, *ppte=00000000
[ 26.994537] Internal error: Oops: 817 [#1] PREEMPT
[ 26.999359] last sysfs file: /sys/devices/platform/ehci-omap.0/usb1/1-2/1-2.3/bcdDevice
[ 27.007415] Modules linked in: libertas_sdio libertas cfg80211 joydev rfkill ads7846 mailbox_mach lib80211 mailbox rtc_twl gspca_zc3xx(+) rtc_core gspca_main videodev v4l1_compat
[ 27.023590] CPU: 0 Not tainted (2.6.35.3 #1)
```

Page 49

This is a case where a thorough knowledge of the hardware is essential to making the software work. DMA is almost impossible to troubleshoot without using a logic analyzer.

No matter what mode the transfers will ultimately use, and no matter what the source and destination devices are, I always first write a routine to do a memory to memory DMA transfer. This is much easier to troubleshoot than DMA to a complex I/O port. You can use your ICE to see if the transfer happened (by looking at the destination block), and to see if exactly the right number of bytes were transferred.

At some point you'll have to recode to direct the transfer to your device. Hook up a logic analyzer to the DMA signals on the chip to be sure that the addresses and byte count are correct. Check this even if things seem to work - a slight mistake might trash part of your stack or data space.

Some high integration CPUs with internal DMA controllers do not produce any sort of cycle that you can flag as being associated with DMA. This drives me nuts - one lousy extra pin would greatly ease debugging. The only way to track these transfers is to trigger the logic analyzer on address ranges associated with the transfer, but unfortunately these ranges may also have non-DMA activity in them.

Be aware that DMA will destroy your timing calculations. Bit banging UARTs will not be reliable; carefully crafted timing loops will run slower than expected. In the old days we all counted T-states to figure how long a loop ran, but DMA, prefetchers, cache, and all sorts of modern exoticness makes it almost impossible to calculate real execution time.

http://www.ganssle.com/articles/adma.htm

Page 51

Modified Version of omap_hsmmc_start_dma_transfer

Page 52

http://www.talktoanit.com/A+/aplus-website/lessons-io-principles.html

http://www.ti.com/lit/ug/spru234c/spru234c.pdf

http://www.ti.com/lsds/ti/dsp/platform/sitara/whats_new.page?DCMP=AM33x_Announcement&HQS=am335x

http://www.arm.com/products/processors/cortex-a/cortex-a8.php

http://www.ti.com/lit/ds/symlink/omap3530.pdf

http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/t/96365.aspx

http://www.ganssle.com/articles/adma.htm