Page 1: RapidAnalytics Manual

RapidAnalytics 1.0 User and Installation Manual

Simon Fischer, Rapid-I GmbH November 7, 2011

Contents

1 Installation
  1.1 Common Prerequisites
  1.2 RapidAnalytics Installer
    1.2.1 Headless Installation
  1.3 RapidAnalytics/JBoss Bundle
    1.3.1 Extracting the RapidAnalytics Archive
    1.3.2 Configuring the Database
    1.3.3 Additional Configuration
  1.4 Manual Installation
    1.4.1 Prerequisites
    1.4.2 Configuring the Database
    1.4.3 Copy Additional Files
    1.4.4 Configuring a Security Domain
    1.4.5 Additional Configuration
2 Launching RapidAnalytics
3 Initial Web-based Configuration
4 Migration from Earlier Versions of RapidAnalytics
5 Further Configuration
  5.1 Setting up Database Connections
  5.2 Creating a User
6 Connecting RapidMiner to RapidAnalytics
7 Working with RapidAnalytics and RapidMiner
  7.1 Using the Repository
    7.1.1 Storing and Accessing Data in the Repository
    7.1.2 Managing Access Rights
    7.1.3 Accessing Data in Processes
  7.2 Remote Process Execution
    7.2.1 Running a Process Remotely
    7.2.2 Scheduled Process Execution
    7.2.3 Monitoring Job Execution
  7.3 Accessing Processes as Services

1 Installation

There are three ways to download and install RapidAnalytics: the installer, the JBoss bundle, and the manual installation. The installer is the easiest way and is recommended for most users. The other installation types are recommended only for users who are experienced in configuring application servers.

1.1 Common Prerequisites

Before you proceed with installing RapidAnalytics using any of the three methods, make sure you have completed the following steps:

• Download and install a Java Runtime Environment (JRE), at least version 1.6, e.g. from http://www.java.com.

• Install any SQL database. RapidAnalytics will store data and administration information there. The download bundle already contains JDBC drivers for MySQL, Ingres, Postgres, and Microsoft SQL Server. If you are using a different database, make sure you also download an appropriate JDBC driver (jar file) for this database.

• In your SQL database server, create a new database and call it rapidanalytics. Also create a user rapidanalytics for this database and assign it a password. (Of course you can choose other names, but then you must also change the default values in the corresponding configuration steps below.)

• Install RapidMiner, version 5.1 or above. Get it from www.rapid-i.com. (Alternatively, the Web Start version of RapidMiner can be used.)

In case you have another JBoss instance installed on the same host, please make sure it does not conflict with the RapidAnalytics installation. To avoid such a conflict, make sure the environment variable JBOSS_HOME is not set.

1.2 RapidAnalytics Installer

Unzip the downloaded file RapidAnalytics-Installer-1.1.xxx.zip to a directory of your choice and run the installer. If your java executable is on the path, open a command shell and type:

java -jar RapidAnalytics-Installer-1.1.012.jar

On many systems you can also double-click the jar file to execute it. However, note that on Windows, system-wide installation or installation of a Windows service requires administrator privileges. To obtain these, use “Run as administrator” when opening the command shell.


In the installer you can make various settings that are explained in the individual steps. The most important one is the configuration of the database and user that you just created. Don’t forget to check the connection once all settings are made.

Note: The installer does not start RapidAnalytics. Please continue reading in Section 2.

1.2.1 Headless Installation

If you want to install RapidAnalytics on a headless server, you can run the installer on any other machine to generate an installation configuration file without actually installing RapidAnalytics. Check the respective option in the last step of the installer. Specify all settings, especially directories, as if installing on the target server. Finally, copy the configuration file to the server and run the installer with a single command line parameter that points to this file. This will run the installer without bringing up a window, using the settings in the file you created.

1.3 RapidAnalytics/JBoss Bundle

1.3.1 Extracting the RapidAnalytics Archive

First unzip RapidAnalytics-JBoss-bundle-1.1.xxx.zip to a location of your choice. Make sure no blank character appears in the parent path. (Beware of the Program Files directory!) We refer to the top-level directory that is thus created as ${RapidAnalytics}.

If you are using any of the databases listed in Section 1.1 above for which we already provide a JDBC driver, you are done. If not, place the JDBC driver jar file in ${RapidAnalytics}/server/default/lib.

1.3.2 Configuring the Database

Change to the directory ${RapidAnalytics}/server/default/deploy/ and search for files with names of the form rapidanalytics-XXX-ds.xml.template. Choose the one where XXX matches your database name, copy it to rapidanalytics-ds.xml, and edit it. Search for the string XXX_PASSWORD and replace it with the database password of the user you created for RapidAnalytics. If you like, you can also change the user name and database name in case you did not name them rapidanalytics as recommended. In case your database server does not run on the same host as RapidAnalytics, you must also change the host name from localhost to the appropriate host name.
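The copy-and-edit step above can also be scripted. The following Python sketch is only an illustration: it works on an in-memory string rather than on a real template file, and the XML element names and the placeholder in the sample are invented for the example, not taken from the actual RapidAnalytics templates. Pass the password placeholder exactly as it appears in your template file.

```python
from xml.sax.saxutils import escape


def fill_datasource_template(template_text, placeholder, password, db_host="localhost"):
    """Return template_text with the password placeholder and host name filled in."""
    if placeholder not in template_text:
        raise ValueError("placeholder not found in template: " + placeholder)
    # Escape &, < and > so that the password cannot break the XML file.
    filled = template_text.replace(placeholder, escape(password))
    if db_host != "localhost":
        # The templates are described as defaulting to localhost.
        filled = filled.replace("localhost", db_host)
    return filled


# Hypothetical template fragment; the real template files will look different.
sample = (
    "<connection-url>jdbc:mysql://localhost:3306/rapidanalytics</connection-url>\n"
    "<user-name>rapidanalytics</user-name>\n"
    "<password>PLACEHOLDER</password>\n"
)
result = fill_datasource_template(sample, "PLACEHOLDER", "s3cret", db_host="dbserver")
print(result)
```

In practice you would read the chosen template file, pass its contents through such a function, and write the result to rapidanalytics-ds.xml.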

If you run MySQL, Oracle, Microsoft SQL Server, or PostgreSQL, RapidAnalytics will create the necessary database schema for you. Otherwise, you must create some tables manually. Go to ${RapidAnalytics}/config/quartz/, select the file tables_XXX where XXX matches your database system name, and run this script in your database to create the necessary tables.


1.3.3 Additional Configuration

The default installation of RapidAnalytics is configured to use 1024 MB of main memory. To change this, edit bin/run.conf or bin/run.bat.conf, search for -Xmx1024m, and replace the number with the desired one.
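As a small illustration of this edit, the Python sketch below replaces the -Xmx value in a piece of configuration text. The sample line and the variable name JAVA_OPTS in it are assumptions made for the example; check your own configuration file for the exact line before changing it.

```python
import re


def set_max_heap(conf_text, megabytes):
    """Replace every -Xmx<number>m option in conf_text with the given size in MB."""
    new_text, count = re.subn(r"-Xmx\d+m", "-Xmx%dm" % megabytes, conf_text)
    if count == 0:
        raise ValueError("no -Xmx...m option found")
    return new_text


# Hypothetical configuration line; the real file may contain further options.
sample_line = 'JAVA_OPTS="-Xms128m -Xmx1024m"'
updated = set_max_heap(sample_line, 2048)
print(updated)
```

Only the maximum heap size (-Xmx) is touched; other options such as an initial heap size are left unchanged.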

Please continue reading in Section 2.

1.4 Manual Installation

This is only recommended for experienced users and developers who know how to configure an application server according to their needs, including defining a data source and security domain.

1.4.1 Prerequisites

RapidAnalytics will run on several recent application servers, but is tested only on JBoss 6.0.0 Final. Download it and follow the vendor’s installation instructions. Also install Spring and Spring Security, version 3.1.

1.4.2 Configuring the Database

Follow the steps described in Section 1.3.2. In addition, copy the JDBC driver jar file for your database to a place where the application server can find it.

1.4.3 Copy Additional Files

Create a folder for extensions and a folder for temporary files at a place of your choice. Preferred names are plugins and tmp.

Go to the folder into which RapidMiner was installed. Copy all jar files from lib and lib/freehep except rapidminer.jar and launcher.jar to a place where your application server can find them.

If you want to enable Web Start, copy the same files including rapidminer.jar and launcher.jar to a place that is served by your application server under the context root webstart. Sign the jars with jarsigner, or edit the security settings of your Web browser’s Java plugin to accept unsigned classes.

To make the server redirect you directly to RapidAnalytics, place a file index.xhtml so that your application server serves it in the root directory:

<html>
  <head>
    <meta HTTP-EQUIV="REFRESH"
          content="0; url=RA/faces/restricted/index.xhtml">
  </head>
  <body></body>
</html>


Finally, copy the file RapidAnalytics-1.1.xxx.ear to the deploy directory of your application server.

1.4.4 Configuring a Security Domain

In your application server, define a security domain RapidAnalyticsEJBDomain. For JBoss, edit server/default/conf/login-config.xml and copy the application policy entry named client-login to the new name RapidAnalyticsEJBDomain.

1.4.5 Additional Configuration

Configure your application server according to your needs: set memory consumption, port numbers, etc.

2 Launching RapidAnalytics

Change to the bin directory inside your installation directory. To start RapidAnalytics, run

run.bat -b 0.0.0.0

on Windows or

run.sh -b 0.0.0.0

on Unix-like systems (you may have to make the shell script executable by typing chmod 755 run.sh). This will launch RapidAnalytics listening on port 8080 on the local host or on the port and host name you configured.

You will see a lot of messages. Check whether anything unusual appears in them. The message “WARNING [com.sun.xml.bind.v2.runtime.reflect.opt.Injector] (HDScanner) duplicate class definition bug occured? Please report this. . . ” can be ignored. Please don’t report it.

3 Initial Web-based Configuration

Point your favourite Web browser to http://localhost:8080 (assuming your application server listens on port 8080, which it does for the bundled download). You will be presented with a login screen (Figure 1). The initial user and password are admin and changeit.

After logging in, you will be presented with a setup screen (Figure 2). Check whether RapidAnalytics detects your database system correctly so it can create the necessary tables. If this is not the case, create the tables manually as described in Section 1.3.2, and check again.

Figure 1: RapidAnalytics’ login screen. The initial user name is “admin”, password “changeit”.

Specify an absolute directory on your file system where RapidAnalytics searches for extensions and a directory in which RapidAnalytics places temporary files (“Upload directory”). If you chose the installer or the bundled installation, these are the directories

• ${RapidAnalytics}/plugins and

• ${RapidAnalytics}/tmp.

For the manual installation, these are the folders you created in Section 1.4.3.

Click Start installation now, and check for potential error messages. If everything looks as in Figure 3, you can click on Complete installation. You should then see the RapidAnalytics welcome screen. That’s it.

You now see the Web interface of RapidAnalytics. In most views, there is a navigation bar on the left and a box with possible actions and online help on the right. The first thing you should do is go to Administration, Preferences and change your administrator password (Figure 4).

4 Migration from Earlier Versions of RapidAnalytics

To migrate from RapidAnalytics 1.0 to 1.1 (or any other version), follow these steps:

• Back up your database.

• Run the installer or follow one of the other installation methods, using the same settings as in your previous installation.

• Start RapidAnalytics.

• Go to the login screen. As of RapidAnalytics 1.1, passwords are MD5 hashed by default, so the old password no longer works. If you have configured a mail server, reset your password by clicking on “Forgot password”. If you haven’t, use your database administration tool and edit the table ra_ent_user. Reset the password of the admin user (but of no other user). This statement may help:

UPDATE ra_ent_user
SET passwd = MD5(passwd)
WHERE userName = 'admin';

• Once you can log in again, do so. You will be presented with a migration screen which will perform additional steps, including the hashing of the remaining passwords.

Figure 2: The RapidAnalytics installation procedure.

Figure 3: The RapidAnalytics installation procedure is complete.

Figure 4: Changing the password is one of the first things you should do.
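To see what the UPDATE statement in the migration steps does to the stored value, the following Python sketch computes an MD5 digest as a hexadecimal string, which is the form that MySQL's MD5() function returns. That RapidAnalytics expects exactly this representation is an assumption based on the statement above, not something verified against the product.

```python
import hashlib


def md5_hex(password):
    """Return the MD5 digest of password as 32 lowercase hexadecimal characters."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


print(md5_hex("abc"))
```

The digest always has 32 characters, regardless of the length of the password, so a hashed column is easy to tell apart from one that still holds plain text.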

5 Further Configuration

RapidAnalytics is now ready to use. Before you start, you probably want to set up a few more things, including users and database connections.

5.1 Setting up Database Connections

One of the first things you probably want to configure in RapidAnalytics is your database connections. Here, the term “database connection” refers to connections to databases that contain data that is to be analysed by RapidMiner and RapidAnalytics. This does not refer to the database connection you created exclusively for RapidAnalytics to store administrative information. You can create multiple database connections in RapidAnalytics.

To do so, click on Administration, Database Connections in the menu on the left. You will see the screen depicted in Figure 5. Now, choose Create new connection entry from the box on the right and enter the data for your database connection as seen in Figure 6: database system, host, port, username, password, and a name under which it will be known in RapidMiner and RapidAnalytics. Press Submit and then Test in the box on the right hand side. You should see “Ping succeeded” as in the figure. Otherwise, check your settings and network connection.

Figure 5: Creating a database connection in RapidAnalytics.

5.2 Creating a User

In everyday work, you should not work in RapidAnalytics as administrator. Instead, create a regular user by going to Administration, User management and selecting Create new user from the box on the right hand side (see Figure 7).

You can also create user groups in a very similar fashion. A list of users and groups is available from the main User management view.

6 Connecting RapidMiner to RapidAnalytics

From the RapidAnalytics Web interface, you can launch RapidMiner via Web Start using the “Launch RapidMiner” button. If you do this, RapidMiner will be automatically connected to RapidAnalytics: In your “Repositories” view, you will see a repository named “Home”. This repository is actually your RapidAnalytics instance. Furthermore, database connections etc. will automatically be shared with RapidMiner.

If for some reason you do not like the Web Start solution, you can configure the connection to RapidAnalytics manually. Start RapidMiner and open the Repositories view. Click the Add Repository button (first button in the toolbar of the Repositories view), select Remote repository, and enter the URL to your server, e.g. http://localhost:8080. Also fill in the username and password of the RapidAnalytics user you created.


Figure 6: Entering connection details. The database connection test succeeded as indicated by the message box.

Figure 7: Managing users with RapidAnalytics.


Figure 8: Connecting RapidMiner to RapidAnalytics.

Note: A common mistake that becomes apparent at this stage is that the host that runs RapidAnalytics does not know its own name. To check this, go to http://localhost:8080/RAWS/RepositoryService?wsdl. Scroll to the bottom and search for something like this:

<port binding="tns:RepositoryServiceBinding" name="RepositoryServicePort">
  <soap:address location="http://HOSTNAME:8080/RAWS/RepositoryService"/>
</port>

Check whether the host name is actually a name under which the host is known in the local network. If it is not, you will get weird error messages when connecting to it. This is mainly because ISPs nowadays tend to redirect HTTP requests for unknown hosts to a search engine when they can’t resolve a DNS entry rather than letting the request fail.
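The check described in this note can be partly automated. The Python sketch below extracts the advertised host names from WSDL text. It is only a sketch under assumptions: the sample document is a shortened, hand-written stand-in for the real WSDL, the tns namespace URI in it is made up, and the code assumes that the address element lives in the standard SOAP 1.1 WSDL binding namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the SOAP 1.1 binding for WSDL (assumed to be the one in use).
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"


def advertised_hosts(wsdl_text):
    """Return the host names found in all soap:address locations of the WSDL."""
    root = ET.fromstring(wsdl_text)
    hosts = []
    for address in root.iter("{%s}address" % SOAP_NS):
        # urlparse(...).hostname is lower-cased, or None if no host is present.
        host = urlparse(address.get("location", "")).hostname
        if host:
            hosts.append(host)
    return hosts


# Shortened, hypothetical WSDL; the real document contains much more.
sample_wsdl = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="http://example.invalid/repository">
  <service name="RepositoryService">
    <port binding="tns:RepositoryServiceBinding" name="RepositoryServicePort">
      <soap:address location="http://HOSTNAME:8080/RAWS/RepositoryService"/>
    </port>
  </service>
</definitions>"""
hosts = advertised_hosts(sample_wsdl)
print(hosts)
```

Each returned name can then be tested from a client machine, for example with a DNS lookup, to see whether it resolves to the RapidAnalytics host.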

Once you are connected to RapidAnalytics, settings made in RapidAnalytics, like database connections etc., are also shared with RapidMiner. To check this, go to Tools, Manage database connections and check whether the database connections you defined in Section 5.1 have been published to RapidMiner.

7 Working with RapidAnalytics and RapidMiner

7.1 Using the Repository

Using RapidAnalytics as a server repository is straightforward if you know how to use repositories in RapidMiner: In the Repositories view you see a tree of folders, data, and RapidMiner processes. Using RapidMiner without RapidAnalytics, each of these entries is stored on the local file system. With RapidAnalytics, the behaviour of RapidMiner stays the same, but the entries are stored on the server and can be accessed by a group of people.


7.1.1 Storing and Accessing Data in the Repository

We will walk you through some common steps in everyday work with RapidAnalytics. We assume that you connected RapidMiner to RapidAnalytics as described in Section 6 and that you assigned the alias “RapidAnalytics” to the RapidAnalytics repository in RapidMiner. To use the repository, we first create a few folders and copy some data.

As a first step, locate the Repositories view in RapidMiner. If that view is currently not showing on screen, go to View, Show View, Repositories. The top-level elements of the Repositories view show the defined repositories. You should at least see a “Samples” repository and your “RapidAnalytics” repository. Now, open your home folder in the “RapidAnalytics” repository. RapidAnalytics automatically created a folder /home/username where username is replaced by your user name.

First create two folders named data and processes: Right-click the folder corresponding to your user name, select Create Folder, and enter data. Repeat the same for creating the processes folder. Now, also open the Samples repository, and navigate to the data folder. Right-click the entry Labor-negotiations to open the context menu and select Copy. Now, right-click the data folder you just created and select Paste in the same way.

Finally, create a new process. In RapidMiner, you typically specify the place at which a process is saved even before you create the process. Although this behaviour may seem uncommon, you will soon see why saving the process first is a useful practice. Click File, New process (or use the first button in the tool bar), select the processes folder you created, and enter Cleanse Data as a file name. Your (yet empty) process will then be saved at this location. Your repository should now look as in Figure 9.

Figure 9: The repository populated with demo data.

You can use the repository just as any local repository in RapidMiner. However, you can also inspect it using the RapidAnalytics Web interface. To that end, either go to the Web interface and select Repository, Browse Repository from the navigation bar, or right-click a repository entry in RapidMiner’s repository tree and select Browse.

Each type of repository entry has an individual representation in the Web interface,but all have certain common parts:

• Actions available for an entry are at the top of the box at the right: Here, you can, e.g., rename and delete entries.

• Access rights can be defined for each entry. See Section 7.1.2 for details.

• Entries can be downloaded in a format appropriate for the type of entry.

• Entries can be navigated using the breadcrumbs at the top.

Folders. Folders can contain other items (including sub-folders). Figure 10 shows an example of our folders. Folders can be downloaded as a zip dump. In the box on the right, you can create subfolders or upload new entries.

Figure 10: A folder stored in the RapidAnalytics repository.

Data and Tables. This subsumes all kinds of objects that RapidMiner understands, including, e.g., example sets (i.e., tables), models, etc. Figure 11 shows the “Labor-Negotiations” data set we just copied to RapidAnalytics. The preview in the Web interface displays the meta data of the table, i.e., the types and possible values of the columns. Also, you can download the table in various formats, e.g., as an HTML table or as an Excel spreadsheet.

If you click on Dependencies you will see which processes read or generate this data set.


Figure 11: A table stored in the RapidAnalytics repository.

Processes. RapidMiner processes can also be stored on the server. They can be downloaded as an XML file. Furthermore, in the Dependencies panel, RapidAnalytics shows the input and output files of the processes, so you can navigate between linked objects with a click.

Other Objects (Blobs). Finally, you can store objects like images and HTML files on the server, in case you want to use them for reporting or other functionality. RapidAnalytics does not interpret them, but just provides them for download exactly as they were uploaded. Furthermore, you can use these blobs in processes by using operators like Open File and Read CSV.

7.1.2 Managing Access Rights

You can define individual access rights on a per-entry basis. For an example, look at the box on the right in Figure 12. In the Permissions panel you see a list of three groups for which we have assigned access rights for this folder: the groups “users”, “simon”, and “rapid-i”. The group “users” contains all users that are created. The group “simon” contains only the user “simon”, and this cannot be changed. It is the user’s so-called singleton group. Finally, the group “rapid-i” is a custom group that contains the user “simon”, among others. To edit the access rights for this entry, first click the small edit link. For each group you can grant (green check mark) or revoke (red cross) the rights to read, write, and execute, respectively. You can remove the specifications for a particular group entirely by pressing the delete button in the rightmost column, and you can add a new group to the list of permission specifications by selecting a group from the menu and pressing the plus sign.


Figure 12: A RapidMiner process stored in the RapidAnalytics repository.

In this case, the user group “simon” has full access, whereas the group “users” is denied access. The group “rapid-i” has only the right to read from this folder. All other permissions are inherited from the parent folder.

7.1.3 Accessing Data in Processes

Now that we know how to access entries in our repository, let’s get back to designing our first process in RapidMiner. You should still have the empty process named Cleanse Data opened. The first thing you probably want to do in almost every process is to load some data. To that end, you have two choices:

• Drag the data set Labor Negotiations from the repository tree right into the process. RapidMiner will create a Retrieve operator and set the appropriate parameter referencing this entry.

• Drag the data set onto the input port in the upper left corner of the process. RapidMiner will connect it in the so-called process context. To show the process context, select View, Show view, Context. Using this option has two advantages over the Retrieve operator: First, it can save space in the process view, since processes typically start with data loading operators. Second, the entries referenced in the process context are those that are displayed by the Web interface as links, as outlined in Section 7.1.1.

Note that RapidMiner automatically inserted ../data/Labor-Negotiations as the repository entry parameter of the Retrieve operator or into the process context. This is a relative addressing of the repository entry: The sequence .. navigates one folder up (from the processes folder). This is a practice you should always use for two reasons:

• You can move around folders without destroying functionality.

• RapidAnalytics can resolve relative locations properly. Do not use the absolute repository name in the repository location (e.g. as //RapidAnalytics/home/data) because RapidAnalytics is an alias that only exists on your client. RapidAnalytics does not know that you are referencing it under this name in RapidMiner (you could, e.g., have several RapidAnalytics instances connected), and hence cannot resolve this name. You can, however, use absolute locations without the leading repository reference //RapidAnalytics, i.e., only the part /home/simon/data/Labor-negotiations.
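The effect of relative addressing can be illustrated with a few lines of Python. This is not RapidMiner's own resolver, merely a sketch that assumes repository locations behave like POSIX-style paths; the user name simon and the process location are taken from the running example.

```python
import posixpath


def resolve_entry(process_location, entry):
    """Resolve a repository entry relative to the folder containing the process."""
    if entry.startswith("//"):
        # A leading //alias only exists on the client, so the server cannot resolve it.
        raise ValueError("location starts with a client-side repository alias: " + entry)
    if entry.startswith("/"):
        # Absolute location without repository alias: use as is.
        return posixpath.normpath(entry)
    folder = posixpath.dirname(process_location)
    return posixpath.normpath(posixpath.join(folder, entry))


resolved = resolve_entry("/home/simon/processes/Cleanse Data", "../data/Labor-Negotiations")
print(resolved)
```

Because the entry is resolved against the folder of the process, moving the common parent folder elsewhere keeps the reference intact.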

Before we execute our first RapidMiner process, we first add a bit of functionality. You may have noticed while looking at the Labor-Negotiations data set that it contains a lot of missing values, indicated as question marks in RapidMiner. We replace these values with more useful ones. Since we do not know what the correct values are, we just replace them with the average of the respective attribute (column). This is exactly what the Replace Missing Values operator does. In the Operators view, open Data Transformation, Data Cleansing, and drag the Replace Missing Values operator into the process. Connect its input port to the process input port on the left of the process view.
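Conceptually, replacing missing values by the average works as in the following Python sketch for a single numerical column. It only illustrates the idea and says nothing about how the operator is implemented; the sample column is made up, and missing values are represented by None.

```python
def replace_missing_with_average(values):
    """Replace None entries by the average of the known values of the column."""
    known = [v for v in values if v is not None]
    if not known:
        # Nothing to average over, so the column is returned unchanged.
        return list(values)
    average = sum(known) / len(known)
    return [average if v is None else v for v in values]


# Made-up numerical column with two missing values.
column = [2.0, None, 4.0, None, 6.0]
print(replace_missing_with_average(column))
```

The average is computed from the known values only, so the missing entries do not distort it.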

We must now tell RapidMiner to store the result of this process. As with retrieving data from the RapidAnalytics repository, we have two choices:

• Choose the Store operator from the Repository Access group in the Operators view. Drag it into the process, and connect its input to the topmost output of the Replace Missing Values operator. Enter ../data/Cleansed Data as the repository entry parameter or select it using the repository location chooser available from the folder button next to this parameter. Again, RapidMiner will resolve the relative location for you.

• Instead, we can again use the process context as above: Just connect the topmost output of the Replace Missing Values operator to the process output port on the right and enter ../data/Cleansed Data as the first entry in the output port list in the Context view. Here, too, you can use the folder button to bring up a repository location chooser dialog.

For the same reasons as for loading data, it is recommended to use the process context rather than the Store operator. Your process should now look as depicted in Figure 13.

7.2 Remote Process Execution

You could now run your process locally on your desktop as usual in RapidMiner by pressing the blue Play button. With RapidAnalytics, you have a more powerful solution: You can run the RapidMiner process on the server, consuming no resources on the desktop, or run multiple processes simultaneously.


Figure 13: Your first RapidAnalytics process.

7.2.1 Running a Process Remotely

Open the Remote Processes view using View, Show View. This view shows one top-level entry for each RapidAnalytics installation you are connected to. To execute your process on the RapidAnalytics instance rather than on your local client machine, use the first button in the toolbar at the top of the Remote Processes view. This item is also available directly from the Process menu. RapidMiner will show the dialog presented in Figure 14.

For now, leave all options unchanged and press Ok. After a few seconds, you will see that you can open the RapidAnalytics node in the Remote Processes view. You will see an entry for the process you just started, together with information about when it started, when it completed, etc. If the process was still running, you would see at which stage it was, but for such a small process this is unlikely to happen. Furthermore, you should see the output produced by the process: the Cleansed Data table. The fact that this output (which is now stored on RapidAnalytics) is listed here is another advantage of using the process context. You can open this data in RapidMiner by selecting it and clicking the open folder icon in the toolbar of the Remote Processes view.

7.2.2 Scheduled Process Execution

In case you have long-running processes that you do not want to execute immediately, the remote execution dialog shown in Figure 14 provides the option to run a process once, but later. In that case, you can choose a date and time using a date picker. Apart from that, the behaviour is the same as for immediate execution.

Figure 14: Executing a process on a remote RapidAnalytics instance.

For regular execution, choose the option to schedule the process using a so-called cron expression. Cron expressions are a compact yet powerful way to describe repeating events. In general, they take the form:

seconds minutes hours dayOfMonth month dayOfWeek [year]

For each entry you can specify a number or an asterisk (*), meaning “any”, or a question mark, meaning “don’t care”. Use SUN-SAT for dayOfWeek and JAN-DEC for month. E.g., the expression

0 0 1 * * ? *

means every day at 1:00 am, on every day of the month, no matter what day of the week it is. Note that you can use the asterisk for only one of dayOfMonth and dayOfWeek. Use the question mark for the other.

For dayOfMonth, you can use L to specify the last day of the month, or use MON#2 for dayOfWeek to specify the second Monday in a month, and FRI#L to specify the last Friday. Furthermore, k/n means every n units of this interval, starting with k, so 5/20 in the minutes field means every twenty minutes, starting at 5, i.e., at 5, 25, and 45 minutes after the hour. The complete cron expression would then be “0 5/20 * * * ? *”.
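The field layout and the k/n notation can be made concrete with a short Python sketch. It is not a full cron parser and is unrelated to the scheduler used by RapidAnalytics; it merely splits an expression into the named fields given above and expands a single k/n field. Both function names are made up for this illustration.

```python
FIELD_NAMES = ["seconds", "minutes", "hours", "dayOfMonth", "month", "dayOfWeek", "year"]


def split_cron(expression):
    """Map the 6 or 7 whitespace-separated fields of a cron expression to their names."""
    fields = expression.split()
    if len(fields) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(fields))
    return dict(zip(FIELD_NAMES, fields))


def expand_step(field, low, high):
    """Expand a 'k/n' field into all values k, k+n, k+2n, ... that do not exceed high."""
    start_text, step_text = field.split("/")
    start, step = int(start_text), int(step_text)
    if step <= 0 or not low <= start <= high:
        raise ValueError("invalid step field: " + field)
    return list(range(start, high + 1, step))


fields = split_cron("0 0 1 * * ? *")
print(fields["hours"], fields["dayOfWeek"])
print(expand_step("5/20", 0, 59))
```

Splitting an expression this way is a quick check that each value sits in the field you intended, which is the most common source of errors in hand-written cron expressions.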

7.2.3 Monitoring Job Execution

As mentioned in Section 7.2.1, you can monitor the running and completed processes in the Remote Processes view of RapidMiner. You can filter the displayed list of processes by showing only the ones executed in this RapidMiner session (make sure you use the same system time as the server does), showing only the processes of today, or by showing all processes (with a cap on the number of displayed processes).

You can also monitor the running and scheduled processes in the Web interface. To that end, select Processes, Process Scheduler from the navigation bar. You will see a screen similar to Figure 15.


Figure 15: The Web interface to the process scheduler.

At the bottom, you can see a list of running and completed processes, together with error messages in case they aborted abnormally. E.g., the first process in the figure was aborted because the user entered a wrong name for the input data (the dash is missing).

If a process is complete, you can directly click on the process’s output to navigate to the corresponding repository entry and browse the data. Using the icon in the rightmost column of the table, you can also access the log file. The log file is also accessible from the Remote Processes view of RapidMiner.

The current state of long-running processes is displayed in this view, similar to the familiar RapidMiner status bar. Processes can also be stopped here.

The list of processes scheduled for future execution is at the top of the page. You see a list of processes together with their last and next execution time. Each entry can be removed using the leftmost icon or temporarily disabled using the icon next to it. The entire scheduler can be paused using the link in the box on the right. This can be useful for system maintenance or before a system restart.

7.3 Accessing Processes as Services

One of the strengths of RapidAnalytics is the fact that you can access processes (or rather, their results) from outside, even without RapidMiner. To that end, we have introduced the concept of so-called services: You can simply expose RapidAnalytics processes as Web services and easily define input parameters and output format. To understand this, you must first understand the concept of macros. In RapidMiner, a process can use macros in place of any operator parameter. You can think of macros as variables that take on different values.

To understand this concept, we re-use the process designed earlier. Recall that the Replace Missing Values operator by default replaces missing values by the respective average of the attribute. For the sake of simplicity, let us assume that we want to specify the replacement value explicitly, but we want to make this particular value a configurable number. First, tell RapidMiner that the value replacement should only be applied to numerical attributes: Select the Replace Missing Values operator, set the parameter attribute filter type to value type, value type to numeric, and default to value. For the actual replacement value we can now specify the parameter replenishment value. This is a regular parameter and we could fill in a regular number here, but we use a macro: simply enter %{replacementValue}.

If you ran the process now, RapidMiner would complain since the macro is not yet defined. Besides defining input and output, defining macros is the third and last function of the Context view: In the bottom third of the Context view, press the Add macro button to add a new (the first) line to the macro table, enter “replacementValue” as the macro name and a number, say 2, as the value. Your screen should look as depicted in Figure 16.

Figure 16: A service process configurable through macros.
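The idea of a macro as a named placeholder can be sketched in a few lines of Python. This is not how RapidMiner implements macros; it is a minimal illustration, assuming only the %{name} syntax shown above, and the function name `expand_macros` is hypothetical.

```python
import re

# Matches placeholders of the form %{name}.
MACRO_PATTERN = re.compile(r"%\{(\w+)\}")


def expand_macros(text, macros):
    """Replace every %{name} in `text` by the value defined in `macros`."""
    def lookup(match):
        name = match.group(1)
        if name not in macros:
            # Mirrors the complaint about an undefined macro.
            raise KeyError(f"macro not defined: {name}")
        return str(macros[name])
    return MACRO_PATTERN.sub(lookup, text)


# The parameter value entered in the operator, and the macro table.
print(expand_macros("%{replacementValue}", {"replacementValue": 2}))
```

With the macro table from the example, the placeholder expands to 2; with an empty table, the sketch raises an error, just as the process would fail before the macro is defined.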

If you run this process now, you will see that all missing values were replaced by the number 2. Defining the macro in the process context is convenient in RapidMiner, because we can edit parameters we change frequently in a single place, but it has an additional advantage. Save the process and open it in the Web interface of RapidAnalytics (Figure 12). In the box on the right, you have an action Export as service. If you click it, you will see a screen similar to Figure 17.

As you see, RapidAnalytics displays a list of macros defined in the process. In our case, there is only one such macro, replacementValue. RapidAnalytics proposes to bind this macro to a service parameter of the same name. In this view, you can also make settings that affect the output of the service. We select HTML as the output format. For now, we leave the remaining settings unchanged. Click Submit, and then choose Test from the box on the right. You will see the screen in Figure 18.

Figure 17: Exporting a process as a service in the RapidAnalytics Web interface.

As you see, you are presented with a form into which you can enter a value for the replacementValue parameter. In our example we have filled in 5. At the bottom you see the output of the service: The example set in HTML format, where all missing values were replaced by 5.

Although this toy example is somewhat artificial, it shows that RapidAnalytics services are a powerful tool to embed your processes into other IT environments: In Figure 18 you also see that there is a direct link to the process and embeddable HTML code. You can use this link to embed the process into any other page, simply supplying the process macro replacementValue as a query parameter. In addition to the representation as an HTML table, you can also generate interactive charts, images, or machine-readable formats like XML or JSON files.
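Supplying the macro as a query parameter could look as follows. This is a minimal sketch: the base URL is a placeholder and not an actual RapidAnalytics address (use the direct link shown on the Test page of your own installation), and the helper name `service_link` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder only; copy the real direct link from the service's Test page.
SERVICE_URL = "http://rapidanalytics.example.com/path/to/service"


def service_link(base_url, **macros):
    """Append the given macro values to the service link as query parameters."""
    return base_url + "?" + urlencode(macros)


print(service_link(SERVICE_URL, replacementValue=5))
```

The resulting link ends in ?replacementValue=5 and can be placed in another page or requested by any HTTP client.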


Figure 18: Applying a service process in the RapidAnalytics Web interface.
