Top Banner
Talend: tips and tricks part 2 Posted on August 26, 2014 by Jessica Smets In the first part of these entries we discussed how to test your expressions, the importance of optimizing the appearance of a tLogRow component and how to handle windows and views within Talend. This time around, we will be talking about the different ways to get components into your job, how to trace your dataflow and how to easily sync columns. As last time, this post will be useful for both starting and experienced users. 4. Getting components into your job There are many ways to get components into your job. Most people search the palette (by either the search-function or by manually exploring the folders) and drag/drop the components into their job. You can achieve the same thing by simply clicking on a random place in your job and then type the name of the component. Obviously this is only recommended once you’re familiar with the different components and their names. When working with metadata, you can use certain shortcuts to save a bit of time. Usually people just click on the metadata and then
15
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Talend

Talend: tips and tricks part   2 Posted on August 26, 2014 by Jessica SmetsIn the first part of these entries we discussed how to test your expressions, the importance of optimizing the appearance of a tLogRow component and how to handle windows and views within Talend. This time around, we will be talking about the different ways to get components into your job, how to trace your dataflow and how to easily sync columns. As last time, this post will be useful for both starting and experienced users.4. Getting components into your jobThere are many ways to get components into your job. Most people search the palette (by either the search-function or by manually exploring the folders) and drag/drop the components into their job. You can achieve the same thing by simply clicking on a random place in your job and then type the name of the component. Obviously this is only recommended once you’re familiar with the different components and their names.

When working with metadata, you can use certain shortcuts to save a bit of time. Usually people just click on the metadata and then drop it onto their job. This will pop up a window allowing you to choose which type of component you want to use. Holding the Control-key while dragging the component will directly create an Output-component. Holding Control+Shift will result into an Input-component.

Page 2: Talend

5. Syncing columnsOccasionally, you may have to change the schema of a certain component in the middle of development. This might affect other components in your job. In some cases, Talend asks if you want to propagate the changes you’ve made (to the other components).

You may accidently close this window, click “No” or not get this message at all, resulting in the following error: “The schema from the input link “youroutputlink”is different from the schema defined in the component”.

When this happens, you can go to the basic settings of the component that has the error and click on “Sync columns”. The error should now be gone.

6. Tracing your dataflow (Debug Run)Lastly, I would like to say a few words about the debug run. In some cases we want to closely watch our dataflow in order to get a better understanding of what’s exactly happening. You can achieve this by running your job in debug mode. This can be done by clicking on the Run-window, then click on the “Debug Run” tab on the left side of the window and start it by clicking on “Traces Debug”.

Page 3: Talend

The moment you open the “Debug run” tab, you’ll immediately see extra icons in your job. These magnifying glass icons indicate that details will be shown when you debug-run your job. The result should look something like this:

You can Pause and Resume the run at any time. You can also add breakpoints if you like. Do this by right-clicking on a dataflow and then selecting “Show Breakpoint Setup”.

Page 4: Talend

This brings you to the “Breakpoint” tab of the data flow you clicked on. You can also go there by clicking on the specific flow and manually selecting “Breakpoint”. Let’s add a breakpoint to pause our run whenever we come across a record with “Bloom” as last name. Firstly, make sure to check the “Activate conditional breakpoint” option. After that, click on the plus-icon underneath the conditions. Then select the InputColumn we want to put our condition on, in our case this is “Last_name”, and add a value (“Bloom” in this example). The default Operation is “Equals”, which is the one we want. You can also specify an Operation if you need to, but this is unnecessary for this case.

Page 5: Talend

You can add multiple breakpoints if you like. Whenever you debug run your job now, it will stop at a record where the Last_name is “Bloom” (if any exist).

That’s it for now. Thank you for reading!

Posted in data integration, ETL, Talend, Tips and Tricks | Tagged ETL, Talend | Leave a comment

Talend: tips and tricks part   1 Posted on August 4, 2014 by Jessica SmetsThis blog contains some convenient tips and tricks that will make working with the open source tool Talend for data integration a lot more efficient. This blogpost will be especially useful for people who are just discovering this amazing tool, yet I am sure that people who have been using it for a while will also find it very helpful. These series of tips will be spread over multiple blog entries so make sure to check back often for future tips!

1. Testing expressions in the tMap componentUsing the tMap component, you have the possibility to test your expressions. This way you can easily see whether or not the result is what you expected it to be. You can also use this to determine whether or not your expression will error. Let’s create an example.

We’ve got details of employees as input for our tMap. We would like the first name to be shown in uppercase. First of all, go into the expression builder by clicking the ellipsis next to your expression.

Page 6: Talend

To convert the first name to uppercase, we have to use the StringHandling function “UPCASE”. This will result in the following expression:StringHandling.UPCASE(employee.First_name)After you’re done filling in test values, click on the “Test!” button and wait for the result. If everything goes as expected, you should see your first name in uppercase on the right side of the window.

2. Optimizing the appearance of the tLogRow component outputtLogRow is one of the most frequently used components. It is recommended that you learn how to optimize its use. Firstly, make sure that you always have the right appearance selected for your output. You can find this property in the basic settings of your tLogRow-component.

Page 7: Talend

There are three types of Modes that you can choose between:

BasicBasic will generate a new line for each record, separated by the “Field Separator” you’ve chosen (see image above). When using basic mode, I highly recommend to check the “Print header” option when working with multiple column records or multiple outputs, purely for visibility reasons.

Table (print values in cells of a table)The table mode shows the records and their headers in a table-format, including the name of the component that generated this output (in our case: “tLogRow_1”). This emphasizes the importance of properly naming everything, especially when you have multiple components that generate output. In this case, it would have been better to rename our component to “EMPLOYEES”. Personally, I prefer this mode.

Page 8: Talend

Vertical (each row is a key value/list)Vertical mode will show a table for each one of your records.

The output mode you decide to use depends on what you’re trying to visualize. For example, when your goal is to show a single string, I would recommend using the basic mode. But when you have multiple table outputs (for example: departments, customers and employees in a single output), I’m certain the table mode would be the best option.

Sometimes your data is spread over multiple lines, resulting in an unclear output, like shown in the image below.

Page 9: Talend

To force the output to put all the data on one single line, you can uncheck the “Wrap” option. This option is located underneath your output and will enable a horizontal scrollbar.

Do you also want to be able to get data regarding tweets using Talend, as shown in the image above? Read my previous blogpost and find out how!3. Resetting windows and maximizing/minimizing themSometimes you accidently close a window and have a hard time finding a way to get it back. You can very easily reset your environment by clicking on “Window” – “Reset Perspective”.

Page 10: Talend

You can see all of the views by clicking on “Windows” – “Show View” – “Talend”. Some of the views are not shown by default, such as “Modules”. Modules can be used to import .jar-files without having to restart your studio, which will most likely save you some time.

Lastly, because Talend is Eclipse-based, you have the possibility to maximize and minimize windows. I personally use this function when examining the output of a tLogRow-component including a lot of data. You can achieve this by either double-clicking on the window or by right-clicking on it and selecting “Minimize”/”Maximize”.

That’s it for now. I hope you enjoyed reading this blog and make sure to return soon for future blogs!

Posted in data integration, ETL, Talend, Tips and Tricks | Tagged ETL, Talend | Leave a comment

Use of contexts within   Talend Posted on May 27, 2014 by Dieter Van RansbeekWhen developing jobs in Talend, it’s sometimes necessary to run them on different environments. For other business cases, you need to pass values between multiple sub-jobs in a project. To solve this kind of issues, Talend introduced the notion of “contexts”.In this blogpost we elaborate on the usage of contexts for easily switching between a development and a production environment by storing the connection data in context variables. This allows you to determine on which environment the job should run, at runtime, without having to recompile or modify your project.

Page 11: Talend

To start using contexts in Talend you have two possible scenario’s:1) you can create a new context group and its corresponding context variables manually, or2) you can export an existing connection as a context.In this example we’ll go over exporting an existing Oracle connection as a context.

Double click an existing database connection to edit it and click Next. ClickExport as context

Page 12: Talend

NOTE There are some connections that don’t allow you to export them as a context. In that case you’ll have to create the context group and its variables manually, add the group/variables to your job, and use the variables in the properties of the components of your job.

Page 13: Talend

After you’ve clicked the Export as context button you’ll see the Create/Edit context group screen. Enter a name, purpose and description and click Next.

Now you’ll see all the context variables that belong to this context group. Notice that Talend has already created all the context variables that are needed for the HR connection. If you want to change their names you can simply click them and they become editable.

Click the Values as table tab.