Interacting with Data on HDP using Apache Zeppelin and Apache Spark

In this section we will walk through using Apache Zeppelin and Apache Spark to interactively analyze data on an Apache Hadoop cluster.

By the end of this tutorial, you will have learned:

  1. How to interact with Apache Spark from Apache Zeppelin
  2. How to read a text file from HDFS and create an RDD
  3. How to interactively analyze a data set through a rich set of Spark API operations (see the sketch after this list)
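
To give a flavor of what this looks like in practice, here is a minimal sketch of a Zeppelin paragraph using the Scala Spark interpreter. The HDFS path and the "ERROR" filter are placeholders of mine, not taken from the tutorial.

    // In a Zeppelin notebook the Spark interpreter pre-defines the SparkContext as "sc".
    // Read a text file from HDFS into an RDD (the path below is a placeholder).
    val lines = sc.textFile("hdfs:///tmp/example/data.txt")

    // A few typical RDD operations: count the lines, filter them, and build a word count.
    println(lines.count())

    val errors = lines.filter(line => line.contains("ERROR"))
    println(errors.count())

    val wordCounts = lines
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    wordCounts.take(10).foreach(println)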

Continue reading

Running a MapReduce Job (Nov 2015)

Tested on:

  • Ubuntu 14.04.3 x64
  • Hadoop 2.7.1 (Pseudo-Distributed Mode)

I will use the examples that come with the Hadoop package.

1. Preparation
2. Pi
3. WordCount

3.1 Download example input data
3.2 Copy local example data to HDFS
3.3 Run the MapReduce job
3.4 Retrieve the job result from HDFS

1. Preparation

Change directory to $HADOOP_INSTALL.
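
The excerpt stops here, but as a rough illustration of the steps outlined above, the sketch below drives the Hadoop 2.7.1 command-line tools from a small Scala script. The examples jar path matches the standard 2.7.1 layout under $HADOOP_INSTALL; the local input file and the HDFS directory names are placeholders of mine, not taken from the post.

    import scala.sys.process._

    // Assumes $HADOOP_INSTALL points at the Hadoop 2.7.1 installation, as in the post.
    val hadoopHome  = sys.env("HADOOP_INSTALL")
    val hadoop      = s"$hadoopHome/bin/hadoop"
    val hdfs        = s"$hadoopHome/bin/hdfs"
    val examplesJar = s"$hadoopHome/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar"

    // 2. Pi: estimate pi with 10 map tasks and 100 samples per task.
    Seq(hadoop, "jar", examplesJar, "pi", "10", "100").!

    // 3.2 Copy local example data to HDFS (the local file name is a placeholder).
    Seq(hdfs, "dfs", "-mkdir", "-p", "wordcount/input").!
    Seq(hdfs, "dfs", "-copyFromLocal", "/tmp/example.txt", "wordcount/input").!

    // 3.3 Run the WordCount job from the bundled examples jar.
    Seq(hadoop, "jar", examplesJar, "wordcount", "wordcount/input", "wordcount/output").!

    // 3.4 Retrieve the job result from HDFS.
    Seq(hdfs, "dfs", "-cat", "wordcount/output/part-r-00000").!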

Continue reading

Install a single-node Hadoop cluster on Ubuntu (ubuntu-14.04.3-desktop-amd64)

Install Hadoop 2.7.1 on Ubuntu (ubuntu-14.04.3-desktop-amd64), then access it with PuTTY from Windows 8.1

Hadoop can be installed in three modes (this article installs it in Pseudo-Distributed Mode):

– Local (Standalone) Mode
– Pseudo-Distributed Mode
– Fully-Distributed Mode

1. Create a user and install the basic software

1.1 LXDE
1.2 PuTTY
1.3 Java
1.4 Hadoop user
1.5 SSH
1.6 SSH Certificates

2. Install Hadoop

2.1 Download Hadoop
2.2 Setup Configuration Files

2.2.1 ~/.bashrc
2.2.2 hadoop-env.sh
2.2.3 core-site.xml
2.2.4 mapred-site.xml
2.2.5 hdfs-site.xml

2.3 Starting Hadoop (a small smoke-test sketch follows this outline)
2.4 Stopping Hadoop
2.5 Hadoop Web Interfaces
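
Once step 2.3 has Hadoop running, a quick way to confirm the pseudo-distributed setup is to list the HDFS root through the Hadoop client API. The sketch below assumes core-site.xml (2.2.3) sets fs.defaultFS to hdfs://localhost:9000, which is a common choice but is not confirmed by this excerpt.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsSmokeTest {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Assumed fs.defaultFS value from core-site.xml; adjust to match your configuration.
        conf.set("fs.defaultFS", "hdfs://localhost:9000")

        val fs = FileSystem.get(conf)
        // Listing the root directory only succeeds if the NameNode started in step 2.3.
        fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
        fs.close()
      }
    }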

Continue reading