Prerequisite Activity: Deploying the HDP and the Data Science VM
Contents
Overview
Deploy HDP Sandbox
Create the VM
Configure Azure Data Lake and SQL Data Warehouse
Terms of Use
Interactive queries using Spark SQL on Azure HDInsight
Summary
In order for you to complete the labs we have prepared, you need
to ensure that you have an Azure subscription with admin
rights. This will allow you to create small clusters (max 4 nodes)
that we will utilize during the lab. We ask that you create the HDP
sandbox before arriving at the lab (see ‘Deploy HDP Sandbox’).
Please liaise with your internal IT organization to gain the
necessary privileges to complete the lab.
Once your internal IT organization has granted you access to the
Azure Portal we highly recommend you complete the sections in
this document before coming to the lab to test the access granted.
This document should take no more than 30 minutes to complete.
If you have any difficulties at all then please get in contact with
your Microsoft representative.
The first lab will work with the Hortonworks sandbox environment.
We recommend you deploy this and shut it down before attending
the lab. We will also use the Twitter API; the ‘Deploy HDP
Sandbox’ section includes a link to instructions for setting that up.
As part of this lab we will also be using Visual Studio to submit
Hive Queries. The software required to complete the lab is
already installed on a pre-configured VM in Azure called The Data
Science Virtual Machine. This virtual machine has the following
software installed:
Visual Studio 2015 Community Edition
Azure SDK
Revolution R Open
Power BI Desktop
SQL Server Express 2014
IPython
Azure PowerShell
Azure Storage Explorer
In this activity we will create an instance of this virtual machine
and install tools on the VM.
Overview
Summary
Deploying the Hortonworks sandbox can be done before the lab,
and ensures you have all the necessary rights within your Azure
subscription. It only takes a few minutes, and the sandbox can be shut
down once set up to prevent any further charges.
The instructions to set up a single-node Hortonworks environment
are here:
http://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/
Once deployed, you can shut the sandbox down as follows:
1. Log into the Azure portal at https://portal.azure.com/
2. If you cannot see a VM on the dashboard with the name of
your cluster, you can search for it using the dialog at the
top.
3. Select your VM and in the Dashboard select ‘Stop’.
4. Confirm you wish to stop the VM, then check that the status
updates to ‘Stopped (Deallocated)’ after a few minutes to
ensure you aren’t charged for the virtual machine.
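The status check in the last step above can also be scripted. The following is a minimal Python sketch, not part of the lab itself: it polls a status source until it reports the deallocated state. The `get_status` callable is hypothetical and is left for you to supply (for example, a wrapper around whichever Azure tooling you use); the status text compared against is simply the one quoted in the step above.

```python
import time
from typing import Callable

# Status text quoted in the step above; compared case-insensitively.
TARGET_STATUS = "stopped (deallocated)"


def wait_until_deallocated(get_status: Callable[[], str],
                           timeout_s: float = 600.0,
                           poll_s: float = 15.0,
                           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status() until it reports the deallocated state.

    get_status is a hypothetical callable supplied by the caller; it should
    return the VM status as a string. Returns True once the target status is
    seen, or False if timeout_s elapses first.
    """
    waited = 0.0
    while True:
        if get_status().strip().lower() == TARGET_STATUS:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

Only the deallocated state counts as success here. A VM that is shut down from inside the guest operating system, and therefore shows as stopped without being deallocated, continues to incur compute charges.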
Deploy HDP Sandbox
Create the Twitter API Keys
As part of the lab we’ll be collecting and processing Twitter feeds.
In order to connect to the Twitter API you’ll need to create a
Twitter app and collect the API keys. Instructions for this can be
obtained here: http://www.gabfirethemes.com/create-twitter-api-key/
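Once you have collected the keys, it helps to keep them out of scripts and notebooks. The Python sketch below is one possible approach and is not prescribed by the lab: it reads the four values a Twitter app normally issues (consumer key, consumer secret, access token, access token secret) from environment variables. The variable names are hypothetical; use whatever names suit your environment.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, chosen for illustration only.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the four Twitter credentials, or raise KeyError naming any that are missing."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]
    if missing:
        raise KeyError("Missing Twitter API settings: " + ", ".join(missing))
    return {k: env[k].strip() for k in REQUIRED_KEYS}
```

Calling `load_twitter_keys()` before the lab is a quick way to confirm that all four values were recorded.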
The HDInsight lab will also use Visual Studio to connect to an
HDInsight cluster and run a Hive script. Visual Studio Community
Edition is installed and configured on the Data Science VM, which
is freely available from the Azure Marketplace.
1. Sign in to the Azure portal - https://portal.azure.com/
2. Click on + New.
3. In the search box type Data Science Virtual Machine and press
the Return key. You should see the following:
4. Click on the Data Science Virtual Machine (published by
Microsoft)
5. Click on Create.
6. In the Basics blade fill out a Name (n.b. this has to be
unique across the whole of Azure), User name, Password, and
Resource group. Select the location nearest to you (this is the
location of the Microsoft data center). An example entry is
outlined below:
Create the VM
7. The Size blade will pop up next. Select A3 (n.b. we will shut
down the VM at the end of this lab).
8. On the Settings blade click OK:
9. On the Summary Blade click OK:
10. On the Buy Blade click Purchase:
11. On the startboard you will see the VM being deployed. This
will take approximately 5-10 minutes.
12. Once it is successfully deployed you will see the following on
the startboard:
13. Click on the VM you created from the startboard to get the
following page:
14. Click on the Connect button as highlighted above. Save the
RDP file.
15. Double click on the downloaded RDP file to connect to the VM
and enter your credentials (note the \ before the username):
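Before connecting, you may want to check what the downloaded RDP file points at. An RDP file is plain text with one `name:type:value` setting per line (for example `full address:s:host:port`). The Python sketch below parses that format and also builds the backslash-prefixed user name mentioned in the last step; the leading backslash presumably ensures the credentials are treated as an account local to the VM. The helper names and the sample host are illustrative only, and reading the file from disk is left to the caller.

```python
from typing import Dict


def parse_rdp(text: str) -> Dict[str, str]:
    """Parse RDP file text into a {setting name: value} dictionary.

    Each setting line has the form name:type:value. The value may itself
    contain colons (e.g. host:port), so split at most twice.
    """
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) == 3:
            name, _type, value = parts
            settings[name.lower()] = value
    return settings


def local_username(user: str) -> str:
    """Return the user name with exactly one leading backslash."""
    return "\\" + user.lstrip("\\")


if __name__ == "__main__":
    # Hypothetical sample content; your file will name your own VM.
    sample = "full address:s:myvm.example.net:3389\nprompt for credentials:i:1\n"
    print(parse_rdp(sample)["full address"])
    print(local_username("labuser"))
```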
The next steps are optional, should you wish to explore other
features of the SDKs.
1. Once you have connected to the Data Science Virtual
Machine install the Azure SDK by double clicking on the
Microsoft Web Platform shortcut on the desktop:
In the installer click on Add for Microsoft Azure SDK for .Net
(VS 2015) - <VERSION NUMBER> and then install:
This takes approximately 5 minutes to finish installing.
2. Ensure Azure Data Lake Tools for Visual Studio are installed
(Data Lake Tools for Visual Studio). Once Data Lake Tools
for Visual Studio is installed, you will see a Data Lake menu in
Visual Studio.
3. Next, install Rtools by visiting the following site -
https://cran.r-project.org/bin/windows/Rtools/ - in Internet
Explorer and downloading Rtools33.exe
Run through the installer ensuring that at the additional tasks
stage the following checkbox is ticked:
This updates the PATH environment variable so that the various
R tools are available.
4. Close the remote desktop connection to the VM by clicking the X
on the blue bar highlighted below:
5. Shut down the VM by clicking on the Stop button on the VM
blade in the Azure preview portal (this will take a couple of
minutes).
6. If you managed to successfully complete all these steps, then
you are ready for the Advanced Analytics lab!
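The Rtools installer step above relies on the PATH environment variable being updated. If you want to confirm that this took effect, the following Python sketch (the VM already has Python installed via IPython) looks up tools on PATH and lists the PATH entries that mention a given fragment. The tool names and the "rtools" fragment used in the usage note are assumptions about what the installer adds, not something this document specifies.

```python
import os
import shutil
from typing import Dict, Iterable, List, Optional


def find_on_path(tools: Iterable[str],
                 path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved location, or None if it is not on PATH."""
    return {tool: shutil.which(tool, path=path) for tool in tools}


def path_entries_containing(fragment: str,
                            path: Optional[str] = None) -> List[str]:
    """Return the PATH entries whose text contains fragment (case-insensitive)."""
    raw = os.environ.get("PATH", "") if path is None else path
    return [p for p in raw.split(os.pathsep) if fragment.lower() in p.lower()]
```

For example, open a new command prompt after the install (so that the updated PATH is picked up) and call `find_on_path(["gcc", "make"])` and `path_entries_containing("rtools")`. A `None` value or an empty list suggests the checkbox was not ticked during installation.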
Introduction
This is an optional activity and not required to complete the labs.
However, during the labs you may also wish to review some of the
other big data services available in Azure: Azure Data Lake and
the Data Lake Analytics service.
1) Familiarize yourself with Azure Data Lake Store by reading this
2) By completing this tutorial you will enable your Azure
subscription for Data Lake Store Public Preview,
create an Azure Data Lake Store account and test
some basic Data Lake Store functionalities. At the end
don’t delete the ADL account.
3) Understand this post.
4) By completing this tutorial you will create a Data Lake
Analytics account, prepare source data and submit
Data Lake Analytics jobs
5) Familiarize yourself with Azure SQL Data Warehouse by
reading this
6) Create a SQL Data Warehouse by completing this
tutorial
7) Configure integration with Visual Studio by
completing this tutorial
Configure Azure Data Lake and SQL Data Warehouse
© 2015 Microsoft Corporation. All rights reserved.
By using this Hands-on Lab, you agree to the following terms:
The technology/functionality described in this Hands-on Lab is provided by
Microsoft Corporation in a “sandbox” testing environment for purposes of
obtaining your feedback and to provide you with a learning experience. You may
only use the Hands-on Lab to evaluate such technology features and
functionality and provide feedback to Microsoft. You may not use it for any other
purpose. You may not modify, copy, distribute, transmit, display, perform,
reproduce, publish, license, create derivative works from, transfer, or sell this
Hands-on Lab or any portion thereof.
COPYING OR REPRODUCTION OF THE HANDS-ON LAB (OR ANY
PORTION OF IT) TO ANY OTHER SERVER OR LOCATION FOR FURTHER
REPRODUCTION OR REDISTRIBUTION IS EXPRESSLY PROHIBITED.
THIS HANDS-ON LAB PROVIDES CERTAIN SOFTWARE
TECHNOLOGY/PRODUCT FEATURES AND FUNCTIONALITY,
INCLUDING POTENTIAL NEW FEATURES AND CONCEPTS, IN A
SIMULATED ENVIRONMENT WITHOUT COMPLEX SET-UP OR
INSTALLATION FOR THE PURPOSE DESCRIBED ABOVE. THE
TECHNOLOGY/CONCEPTS REPRESENTED IN THIS HANDS-ON LAB MAY
NOT REPRESENT FULL FEATURE FUNCTIONALITY AND MAY NOT WORK
THE WAY A FINAL VERSION MAY WORK. WE ALSO MAY NOT RELEASE A
FINAL VERSION OF SUCH FEATURES OR CONCEPTS. YOUR
EXPERIENCE WITH USING SUCH FEATURES AND FUNCTIONALITY IN A
PHYSICAL ENVIRONMENT MAY ALSO BE DIFFERENT.
FEEDBACK. If you give feedback about the technology features, functionality
and/or concepts described in this Hands-on Lab to Microsoft, you give to
Microsoft, without charge, the right to use, share and commercialize your
feedback in any way and for any purpose. You also give to third parties, without
charge, any patent rights needed for their products, technologies and services to
use or interface with any specific parts of a Microsoft software or service that
includes the feedback. You will not give feedback that is subject to a license that
requires Microsoft to license its software or documentation to third parties
because we include your feedback in them. These rights survive this
agreement.
MICROSOFT CORPORATION HEREBY DISCLAIMS ALL WARRANTIES AND
CONDITIONS WITH REGARD TO THE HANDS-ON LAB, INCLUDING ALL
WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER
EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR
PURPOSE, TITLE AND NON-INFRINGEMENT. MICROSOFT DOES NOT
MAKE ANY ASSURANCES OR REPRESENTATIONS WITH REGARD TO THE
ACCURACY OF THE RESULTS, OUTPUT THAT DERIVES FROM USE OF
THE VIRTUAL LAB, OR SUITABILITY OF THE INFORMATION CONTAINED IN
THE VIRTUAL LAB FOR ANY PURPOSE.
DISCLAIMER
This lab contains only a portion of the features and enhancements in Microsoft
Azure. Some of the features might change in future releases of the product.
Terms of Use