Interacting with Data on HDP using Apache Zeppelin and Apache Spark

In this section, we walk through how to use Apache Zeppelin and Apache Spark to interactively analyze data on an Apache Hadoop cluster.

By the end of this tutorial, you will have learned:

  1. How to interact with Apache Spark from Apache Zeppelin
  2. How to read a text file from HDFS and create an RDD (Resilient Distributed Dataset)
  3. How to interactively analyze a data set through a rich set of Spark API operations

A short primer on Scala

Object-Oriented Meets Functional

Scala is a relatively new language that runs on the JVM. The main difference between Scala and other object-oriented languages is that in Scala everything is an object. Types that are primitives in Java, such as int and boolean, are objects in Scala (Int and Boolean). Functions are objects too, so they can be passed as arguments to other functions. This enables the functional programming style used to write applications for Apache Spark.
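
As a quick illustration, here is a minimal sketch of both ideas: calling methods on an Int, and passing a function value to another method. The object name ObjectsAndFunctions and the helpers addOne and applyTwice are hypothetical names chosen only for this example.

```scala
object ObjectsAndFunctions {
  // A function value of type Int => Int. It is itself an object (a Function1 instance).
  val addOne: Int => Int = x => x + 1

  // A higher-order method: it takes a function as an argument and applies it twice.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    val a = 1
    // `a + 2` is shorthand for calling the `+` method on the Int object `a`.
    println(a.+(2))                  // 3
    // Methods can be called on an Int literal like on any other object.
    println(42.toString.length)      // 2
    // A function value is an object with an `apply` method.
    println(addOne.apply(41))        // 42
    // Passing a function as an argument.
    println(applyTwice(addOne, 5))   // 7
    // Passing functions to collection methods, the same style Spark's RDD API uses.
    println(List(1, 2, 3, 4).filter(_ % 2 == 0).map(addOne)) // List(3, 5)
  }
}
```

The last line mirrors how Spark code is typically written: you pass small functions to operations such as filter and map, and Spark applies them to each element of the data set.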

If you have programmed in Java or C#, you should feel at home in Scala with very little effort.

You can also compile and run Scala programs from the command line or from IDEs such as Eclipse.
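
A minimal sketch of a standalone program, assuming a Scala 2 distribution whose scalac and scala commands are on your PATH. The file name Hello.scala, the object Hello, and its greeting helper are hypothetical. Save the code as Hello.scala, compile it with scalac Hello.scala, and run it with scala Hello, which prints Hello, Scala!. Running scala Hello Spark prints Hello, Spark! instead.

```scala
// Hello.scala: a minimal standalone Scala program (hypothetical example).
object Hello {
  // Builds the greeting in a separate method so it can be reused and checked.
  def greeting(name: String): String = s"Hello, $name!"

  // Entry point: uses the first command-line argument if one is given, otherwise "Scala".
  def main(args: Array[String]): Unit = {
    val name = args.headOption.getOrElse("Scala")
    println(greeting(name))
  }
}
```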

Tested on HDP 2.3.2 with Spark 1.4.1

