DMA(1)

Lab6: Using Direct Memory Access

1.1. DMA DAC

1.1.1. Objective

In this lab, we will discuss two methods to generate a continuous-time analog signal using a DAC. The first method uses an ISR and therefore involves the CPU. The second method uses DMA transfers to write data from flash to the DAC data register. Although both methods are conceptually similar, the latter gives us a way to learn about DMA data transfer in general. Since we focused on ISRs in Lab2, we will focus more on the DMA DAC data transfer here. On completion of the exercise you will:

• Understand the basic concepts of DMA: channels, Transaction Descriptors (TDs), channel configuration (burst count, source/destination addresses), TD configuration (transfer count, TD properties) and hardware signals (such as drq, trq and nrq).

• Learn about the different kinds of DMA transfer possible in the context of the PSoC architecture: (1) Point-to-Point transfer, (2) Point-to-Array transfer, (3) Array-to-Point transfer, (4) Array-to-Array transfer and (5) Ping-Pong buffering.

• Perform a simple Array-to-Point transfer using DMA at the end of this lab.

• Correlate the ISR-based and DMA-based methods of generating a continuous-time analog signal and be able to determine where one should be used instead of the other.

1.1.2. Conceptual Background

Direct Memory Access (DMA) is a ubiquitous feature of modern computers that allows subsystems within a computer to access memory independently of the CPU. Many high-end microcontrollers and SoCs have adopted this architectural feature. When using DMA, we copy one or more bytes of data from a source to a destination without any CPU intervention. At most, the CPU may be responsible for initiating a transfer and for receiving an interrupt from the Interrupt Controller when the transfer is complete. On a traditional computer, DMA is used in disk drive controllers, network cards, GPUs and sound cards. Essentially, computation and data transfer can proceed in parallel. This is particularly useful when we need the CPU to perform other useful work rather than wait (blocked) while a data transfer takes place (in this lab exercise, we keep the CPU free for other functions). DMA is typically used when we want to unburden the CPU from the task of transferring data, or when we know that the data needs to be transferred in a predictable way (this is the case for the lab exercise).

On the PSoC, a Peripheral Hub (PHUB) connects the different on-chip peripherals (both analog and digital) through data buses called spokes. These spokes can be 8, 16 or 32 bits wide. The bus can have two masters: the CPU and the DMA Controller (DMAC). If the CPU and the DMAC both try to access a data bus at the same time, an arbitration logic circuit resolves the conflict (we will not need to worry about it). The DMAC can manage 24 independent channels and a maximum of 128 descriptors. The number of DMA channels indicates the number of independent DMA transfers that can take place at one time. Each channel is attached to one or more Transaction Descriptors (TDs), typically organized in a TD chain (in this lab exercise, we will use one channel and one TD for simplicity).
Each TD in turn describes a single transfer transaction and contains information such as the source address, the destination address, how many bytes to transfer and the next TD (if in a chain). The following snapshot should make this description clear.

When a new DMA request signal is received (typically from the CPU or, in our case, from a peripheral), the DMAC gets access to the peripheral data buses. It looks up the source and destination addresses as configured in both the channel and TD configuration registers. When we configure a DMA channel, we need to specify the following: (1) the upper 16 bits of the source address, (2) the upper 16 bits of the destination address, (3) the number of bytes the DMA channel must move per burst (called the burst count), (4) whether subsequent bursts will be transferred automatically or with individual DMA request signals, (5) the first TD in the channel and (6) whether the TD must be preserved after a DMA request is complete. Likewise, to configure a Transaction Descriptor (TD), we need to specify the following: (1) the lower 16 bits of the source address, (2) the lower 16 bits of the destination address, (3) how many bytes are to be moved in one transfer, (4) the next TD (if in a chain, similar to a linked list) and (5) how to find the next source and destination addresses. This is shown in the subsequent snapshot.
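The split between channel configuration and TD configuration, and the linked-list nature of a TD chain, can be modeled on a host PC. The sketch below is not the PSoC register layout: the type and function names (ChannelConfig, TdConfig, td_chain_total_bytes) and the TD_END marker are hypothetical and only mirror the fields listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-channel settings listed above. */
typedef struct {
    uint16_t upper_src;          /* upper 16 bits of the source address      */
    uint16_t upper_dst;          /* upper 16 bits of the destination address */
    uint8_t  burst_count;        /* bytes moved per burst                    */
    uint8_t  request_per_burst;  /* 1: every burst needs its own request     */
    uint8_t  first_td;           /* index of the first TD in the chain       */
    uint8_t  preserve_tds;       /* 1: restore the TD after it completes     */
} ChannelConfig;

/* Assumed "no next TD" marker, used by this model only. */
#define TD_END 0xFFu

/* Hypothetical model of the per-TD settings listed above. */
typedef struct {
    uint16_t lower_src;       /* lower 16 bits of the source address      */
    uint16_t lower_dst;       /* lower 16 bits of the destination address */
    uint16_t transfer_count;  /* bytes moved by this TD                   */
    uint8_t  next_td;         /* index of the next TD, or TD_END          */
    uint8_t  inc_src;         /* 1: increment source after each byte      */
    uint8_t  inc_dst;         /* 1: increment destination after each byte */
} TdConfig;

/* Walk a TD chain like a linked list and sum the bytes it moves.
 * Stops at TD_END or after max_tds descriptors, so a looped chain
 * (a TD whose next_td points back into the chain) is counted once. */
uint32_t td_chain_total_bytes(const TdConfig *tds, uint8_t first_td,
                              size_t max_tds)
{
    uint32_t total = 0;
    uint8_t idx = first_td;
    size_t visited = 0;

    while (idx != TD_END && visited < max_tds) {
        total += tds[idx].transfer_count;
        idx = tds[idx].next_td;
        visited++;
    }
    return total;
}
```

In this model, the single looped TD used later in the lab would be one TdConfig entry with transfer_count = 64 and next_td pointing at itself.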

Why do we need to handle the upper and lower 16 bits of the source and destination addresses separately? Isn’t this overkill? Yes, it is. But it has to do with the way the PSoC5 has been designed (you may see slightly different behavior in other microcontrollers). It turns out that the DMA cannot directly access SRAM at internal addresses 0x1FFF8000 to 0x1FFFFFFF (32-bit addresses), but it can access them if they are remapped to 0x20008000 to 0x2000FFFF (a device peculiarity). Something needs to make this translation. As firmware developers, we would prefer to be unaware of these details. PSoC’s designers enabled the DMA APIs to perform this translation for us. All we need to do is pass the upper and lower 16 bits of the source and destination addresses separately. You can think of this as coarse-grained control with the upper 16 bits at the channel configuration stage and finer-grained control with the lower 16 bits at the TD configuration stage. Recall that the TD property that applies an auto-increment to the source or destination address is set at the TD configuration level.

Please see AN52075: Getting Started with DMA on PSoC3 and PSoC5 (Introduction, Basic Concepts and DMA Configuration) – http://www.cypress.com/?docID=38384 to learn more. In the second half of this lab, we will perform a simplified version of Example 3: Array-to-Point Transfer (please see the screenshot below).
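To make the address split concrete, here is a minimal host-side sketch. The functions hi16 and lo16 are local stand-ins for the HI16 and LO16 macros used in the lab code, and remap_for_dma is a hypothetical helper that only illustrates the address mapping described above; it is not the code the DMA API actually runs.

```c
#include <stdint.h>

/* Local stand-in for HI16: upper 16 bits of a 32-bit address. */
uint16_t hi16(uint32_t addr)
{
    return (uint16_t)(addr >> 16);
}

/* Local stand-in for LO16: lower 16 bits of a 32-bit address. */
uint16_t lo16(uint32_t addr)
{
    return (uint16_t)(addr & 0xFFFFu);
}

/* Hypothetical illustration of the remap described in the text:
 * 0x1FFF8000..0x1FFFFFFF is seen by the DMA at 0x20008000..0x2000FFFF.
 * Addresses outside that window are returned unchanged. */
uint32_t remap_for_dma(uint32_t addr)
{
    if (addr >= 0x1FFF8000u && addr <= 0x1FFFFFFFu) {
        return addr + 0x00010000u;
    }
    return addr;
}
```

For example, an SRAM buffer at 0x1FFF8000 splits into an upper half of 0x1FFF and a lower half of 0x8000; after the remap the upper half becomes 0x2000 while the lower half is unchanged.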

We will update the data register (digital value) of an 8-bit VDAC at regular intervals (controlled directly by our clock) with a sequence of values kept in a flash array (lookup table, LUT). These values create a sine wave at the DAC output. An exercise for this lab will be to replace the values in this lookup table to create square, triangular and sawtooth waves. A digital clock signal is used whose frequency determines (1) the update rate of the DAC and (2) the rate of new DMA requests. The lookup table we use has 64 points representing one period of a sine wave (you could graph these points using Excel to see for yourself). In one clock cycle, 1 byte is transferred from the LUT to the DAC’s data register. The DAC responds immediately and generates the analog equivalent of this value. After 64 clock cycles, all 64 bytes will have been transferred sequentially (and a single period of the sine wave generated). We therefore use a source clock that is 64 times the desired frequency of the output wave. For example, if we want a sine wave at 2 kHz, we make the source clock 64 x 2 kHz = 128 kHz.
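The clock arithmetic above can be written as two small helpers. This is a host-side sketch with hypothetical function names; it assumes one table entry is transferred per clock cycle, as described, and a nonzero number of points.

```c
#include <stdint.h>

/* Clock frequency needed so that one pass over the table
 * produces exactly one period of the output wave. */
uint32_t request_clock_hz(uint32_t points_per_period, uint32_t output_hz)
{
    return points_per_period * output_hz;
}

/* Output frequency obtained from a given clock and table length.
 * points_per_period must be nonzero. */
uint32_t output_hz_from_clock(uint32_t clock_hz, uint32_t points_per_period)
{
    return clock_hz / points_per_period;
}
```

With 64 points and a 2 kHz target, request_clock_hz(64, 2000) gives the 128 kHz used in this lab.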

With this information in mind, we will be doing the second half of this lab. Before we look at an example using DMA, let us try to understand how this would be implemented without DMA, i.e. using the Interrupt Service Routines we learnt about in Lab2. We will trigger an ISR once every clock cycle. The ISR will read a value from a lookup table in sequence and increment a position counter. The ISR will then write the value to the DAC data register using the VDAC8_1_SetValue() function call. Once we hit the end of the lookup table, we reset this counter back to 0. This process goes on indefinitely. Although this method works well, note that the CPU will almost constantly be “servicing” the Interrupt Service Routine. If we would like our code to do some other operation, it is very likely that events will be missed (because the CPU will constantly be dropping everything to service the ISR). The recommended way to avoid this is to “offload” the task of loading the VDAC register to a DMA component, which will do it independently of the CPU (second half of this lab).

1.1.3. Equipment Required

• PSoC Creator
• PSoC Programmer
• CY8CKIT-001 with CY8C5588AX*-060 (PSoC5) based processor module
• Oscilloscope

Create a new project (say ISR_DAC)

1.1.4. Choose and Place User Modules

• Analog->DAC->Voltage DAC
• System->Clock
• System->Interrupt
• Ports and Pins->Digital Output Pin
• Ports and Pins->Analog Pin

You could wire them up in the schematic view as shown below:

1.1.5. Setting up Global and User Modules

• Configure the DAC as shown in the snapshots below.

• We select the output range as 0-4.080V.

• We also select the High Speed mode, CPU/DMA Data source and External Strobe. External strobe will allow us to trigger the VDAC from an external digital signal, in this case Clock_1.

• Default configuration for the Interrupt should be sufficient. We trigger it from Clock_1.

• For the moment, let’s choose Clock_1 to be 100 Hz. Changing the clock will be an exercise for this lab.

1.1.6. Define Pin Out for the Device – Pins tab in CYDWR file for project.

• Clock1 is routed to P1[2].
• VDAC_Out is routed to P1[4].

1.1.7. Application Code

• In main.c

    #include <device.h>

    CY_ISR_PROTO(waveGen);  // Declare the ISR function prototype

    void main()
    {
        // Enable the clock used to trigger the VDAC updates.
        Clock_1_Enable();

        // Enable interrupt
        isr_1_Start();
        isr_1_SetVector(waveGen);  // Set waveGen function as ISR
        CyGlobalIntEnable;

        // Start VDAC8_1
        VDAC8_1_Start();

        for(;;)
        {
        }
    }

    CY_ISR(waveGen)
    {
        static uint8 cnt = 0;
        extern volatile char8 recByte;  // Not used in this example

        // Sine waveform lookup table, 64 steps.
        // static: the table is built once, not on every interrupt.
        static const uint8 sin[] = {
            127, 139, 152, 164, 176, 187, 198, 208,
            217, 226, 233, 239, 245, 249, 252, 254,
            255, 254, 252, 249, 245, 239, 233, 226,
            217, 208, 198, 187, 176, 164, 152, 139,
            127, 115, 102,  90,  78,  67,  56,  46,
             37,  28,  21,  15,   9,   5,   2,   0,
              0,   0,   2,   5,   9,  15,  21,  28,
             37,  46,  56,  67,  78,  90, 102, 115
        };

        cnt++;        // Advance through table
        cnt &= 0x3F;  // Wrap back to 0 once cnt exceeds 63 (0x3F = 63)

        // Update VDAC8 with value from the waveform table
        VDAC8_1_SetValue(sin[cnt]);
    }
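The cnt &= 0x3F step in the ISR works because the table length (64) is a power of two. The host-side sketch below, with hypothetical function names, shows the same wrap for any power-of-two table length, next to a modulo version that works for any nonzero length.

```c
#include <stdint.h>

/* Advance a table index and wrap with a bit mask.
 * table_len must be a power of two (for example 64 or 128). */
uint8_t next_index_masked(uint8_t idx, uint8_t table_len)
{
    return (uint8_t)((idx + 1u) & (uint8_t)(table_len - 1u));
}

/* Advance a table index and wrap with modulo.
 * Works for any nonzero table_len, at the cost of a division. */
uint8_t next_index_modulo(uint8_t idx, uint8_t table_len)
{
    return (uint8_t)((idx + 1u) % table_len);
}
```

For a 64-entry table, next_index_masked(63, 64) returns 0, which matches what the ISR does with the 0x3F mask.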

1.1.8. Programming

• Build->Build Project
• Select Debug->Program

1.1.9. Demonstrate Working Configuration

• Connect P1[2] (Clock1) and P1[4] (VDAC_Out) to an oscilloscope. You should see something like this:

1.1.10. Exercises and Modifications

1. Replace the sine waveform with (1) a square wave, (2) a triangular wave and (3) a sawtooth wave. You should be able to replace the lookup table values and generate these other waves with minimal change in code.

2. In creating this lab, we intentionally chose a very low clock speed (Clock_1 = 100 Hz). Calculate the clock speed required to produce a 2 kHz wave (say, a sine wave). Demonstrate that you can produce a 2 kHz sine wave.

3. What happens if we use 128 data points in the LUT instead of 64 as shown below? Create a 128-point LUT to simulate a sine wave using this method. What should Clock_1 be if a 128-point LUT is used to generate a sine wave? Optional: Write a C program to generate this LUT for an arbitrary number of points.

1.1.11. Choose and Place User Modules

• Analog->DAC->Voltage DAC
• System->Clock
• System->DMA
• Ports and Pins->Digital Output Pin
• Ports and Pins->Analog Pin

We will wire these components up as shown in the schematic view below:

1.1.12. Setting up Global and User Modules

• Configure the DAC as shown in the snapshots below.

• This is the same as before.

• We select the output range as 0-4.080V.

• We also select the High Speed mode, CPU/DMA Data source and External Strobe. External strobe will allow us to trigger the VDAC from an external digital signal, in this case Clock_1.

• We set up the DMA component’s hardware request line to be triggered by the rising edge of a digital signal. By default, Hardware Request is disabled (i.e. the data request ‘drq’ pin is hidden); if drq is unavailable, DMA transactions can only be started by the CPU. The “Notify on completion of request” ‘nrq’ pin is always available by default. This terminal may be connected to an interrupt or to some other component to indicate that a DMA transfer is complete. The second option in the window is for hardware termination (trq). When enabled, this allows a DMA transfer already in progress to be terminated prematurely based on an input. For this lab, we will not use the trq or nrq lines. Please look at the DMA component datasheet to learn more.

• We set up the clock at 128 kHz. Why we choose this value will become clear later. Changing this clock rate is part of the exercises in this lab.

1.1.13. Define Pin Out for the Device – Pins tab in CYDWR file for project.

• Clock1 is routed to P1[2].
• VDAC_Out is routed to P1[4].

1.1.14. Application Code

• In main.c

    #include <device.h>

    void MyDMA(void);

    // Sine waveform lookup table, 64 steps.
    uint8 sine_table[64] = {
        128, 140, 153, 165, 177, 188, 199, 209,
        218, 226, 234, 240, 245, 250, 253, 254,
        255, 254, 253, 250, 245, 240, 234, 226,
        218, 209, 199, 188, 177, 165, 153, 140,
        128, 116, 103,  91,  79,  68,  57,  47,
         38,  30,  22,  16,  11,   6,   3,   2,
          1,   2,   3,   6,  11,  16,  22,  30,
         38,  47,  57,  68,  79,  91, 103, 116
    };

    void main()
    {
        VDAC8_1_Start();
        MyDMA();

        for(;;)
        {
            /* Place your application code here. */
        }
    }

    void MyDMA(void)
    {
        uint8 td0, MyChannel;

        // Allocate and initialize a DMA channel to be used
        MyChannel = DMA_1_DmaInitialize(1, 1, HI16(sine_table), HI16(VDAC8_1_Data_PTR));

        // Allocate TD0 for the DMA channel
        td0 = CyDmaTdAllocate();

        // Configure TD0 to transfer 64 bytes and
        // increment the source address after each byte
        CyDmaTdSetConfiguration(td0, 64, td0, TD_INC_SRC_ADR);

        // Make TD0 transfer data from the sine table to the VDAC data register
        CyDmaTdSetAddress(td0, LO16((uint32)sine_table), LO16((uint32)VDAC8_1_Data_PTR));

        // Set TD0 as the initial TD of the channel
        CyDmaChSetInitialTd(MyChannel, td0);

        // Enable the DMA channel
        CyDmaChEnable(MyChannel, 1);
    }

• The DMA_1_DmaInitialize function has the template: uint8 DMA_DmaInitialize(uint8 burstCount, uint8 requestPerBurst, uint16 upperSrcAddress, uint16 upperDestAddress) and returns a channel handle to be used for the DMA transfer. We specify burstCount = 1 and requestPerBurst = 1, so that every burst after the first one must also be individually requested. Next we specify the upper 16 bits of the source address, followed by the upper 16 bits of the destination address. Look at the API section of the DMA component datasheet to learn more.

• The source address pointer is sine_table (effectively &sine_table[0]). Where do we find the destination address pointer? Let’s look up VDAC8_1.h (Generated Sources->VDAC8_1->VDAC8_1.h). You should see a section marked “Registers”. You will find the following two entries:

    #define VDAC8_1_Data      (* (reg8 *) VDAC8_1_viDAC8__D )
    #define VDAC8_1_Data_PTR  (  (reg8 *) VDAC8_1_viDAC8__D )

VDAC8_1_Data_PTR is a macro that gives a pointer to the internal 8-bit register VDAC8_1_viDAC8__D, which is the actual data register. VDAC8_1_Data is a macro that refers to the data itself (the dereferenced register). Refer to the content of the function VDAC8_1_SetValue() to see how VDAC8_1_Data is actually set when we call the function.

• HI16 and LO16 are PSoC Creator macros that return the upper and lower 16 bits of a 32-bit address.

• Next, the CyDmaTdAllocate() function has the template: uint8 CyDmaTdAllocate(void) and returns an index for the TD to be used with an allocated DMA channel. This is a Transaction Descriptor function.

• Next, we configure the Transaction Descriptor (TD0 in our case) to transfer 64 bytes and increment the source address after each byte. Recall that in the ISR version of this lab we manually incremented a cnt variable. CyDmaTdSetConfiguration() has the template: CyDmaTdSetConfiguration(uint8 tdHandle, uint16 transferCount, uint8 nextTd, uint8 configuration). Here tdHandle is the Transaction Descriptor handle returned by the CyDmaTdAllocate() function in the previous line. transferCount is the transfer size of the data in bytes. nextTd is the index of the next Transaction Descriptor in the TD chain. We are not doing DMA chaining in this simple example; we refer to the single TD we have defined already, so the next TD to be executed is the same td0 (looped). The last function parameter is a bit field of configuration bits. We choose to increment the source address by the size of each data transaction in the burst (= 1 byte).

• Next, we set the lower 16 bits of the source and destination addresses for this TD only. The function has the template: CyDmaTdSetAddress(uint8 tdHandle, uint16 source, uint16 destination). Here tdHandle is the Transaction Descriptor handle returned earlier by the CyDmaTdAllocate() function. source and destination refer to the lower 16 bits of the addresses of sine_table and the VDAC data register respectively.

• Next we set the initial TD to be executed for the channel when a channel enable function is called. The function template is: CyDmaChSetInitialTd(uint8 chHandle, uint8 startTd). Here chHandle is the channel handle returned earlier by the DMA_1_DmaInitialize() function. startTd is the index of the TD to be set as the first TD associated with the channel. Again, for this simple lab exercise, we use a single channel and a single TD for the channel; therefore our arguments are MyChannel and td0.

• Finally, we call CyDmaChEnable(..). Although this enables the DMA channel, a software or hardware request must still happen before the channel is executed, i.e. the drq pin on DMA_1 must be clocked. This has the template: CyDmaChEnable(uint8 chHandle, uint8 preserveTds). chHandle has the same meaning as before. preserveTds lets us restore the original TD state when the TD has completed (= 1) or leave the TD in its current state (= 0). We choose to keep the same TD settings (source, destination, transfer count) because we are using the TD in looped mode: the same TD is executed on every new DMA request.
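The combined effect of these calls (burst count of 1, one request per burst, source increment, a TD that points back to itself, and preserveTds = 1) can be sketched as a host-side simulation. This is only a model of the behavior described above, not the DMAC hardware or the Cypress API; LoopedTd, looped_td_init and looped_td_request are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical software model of one looped TD. */
typedef struct {
    const uint8_t *src_start;      /* configured source address           */
    uint16_t       transfer_count; /* configured number of bytes          */
    const uint8_t *src;            /* current source address              */
    uint16_t       remaining;      /* bytes left before the TD completes  */
} LoopedTd;

/* Set up the TD with a source table and a transfer count. */
void looped_td_init(LoopedTd *td, const uint8_t *table, uint16_t count)
{
    td->src_start = table;
    td->transfer_count = count;
    td->src = table;
    td->remaining = count;
}

/* Model one DMA request (one rising edge on drq) with a burst count of 1:
 * copy one byte to the fixed destination register, increment the source,
 * and when the TD completes, restore it (preserveTds = 1) and start the
 * same TD again (next TD = itself). */
void looped_td_request(LoopedTd *td, volatile uint8_t *dest_reg)
{
    *dest_reg = *td->src;
    td->src++;
    td->remaining--;

    if (td->remaining == 0u) {
        td->src = td->src_start;
        td->remaining = td->transfer_count;
    }
}
```

Calling looped_td_request() 64 times on a 64-entry table writes one full period to the destination, and the 65th call starts again from the first entry, with no counter maintained by application code.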

Note: Writing DMA function calls may become tedious even for as simple a task as the one in this lab. Tools->DMA Wizard in the PSoC Creator IDE generates the necessary C code automatically (basically the content of MyDMA() in this lab exercise). Please refer to the DMA component datasheet and Getting Started with DMA (Appendix B) - http://www.cypress.com/?docID=38384 to learn more. Such a wizard may not be available on all high-end microcontrollers; therefore we leave this for your self-study.

1.1.15. Programming

• Build->Build Project
• Select Debug->Program

1.1.16. Demonstrate Working Configuration

• Connect P1[2] (Clock1) and P1[4] (VDAC_Out) to an oscilloscope. You should see something like this:

1.1.17. Exercises and Modifications

1. Replace the sine waveform with (1) a square wave, (2) a triangular wave and (3) a sawtooth wave. You should be able to replace the lookup table values and generate these other waves with minimal change in code.
