Interacting with Data on HDP using Apache Zeppelin and Apache Spark

Posted on 2 December, 2015 by jack

In this section we are going to walk through the process of using Apache Zeppelin and Apache Spark to interactively analyze data on an Apache Hadoop cluster.

By the end of this tutorial, you will have learned:

How to interact with Apache Spark from Apache Zeppelin
How to read a text file from HDFS and create an RDD
How to interactively analyze a data set through a rich set of Spark API operations

Continue reading →

Hands-on Tour of Apache Spark in 5 Minutes

Posted on 1 December, 2015 by jack

Hortonworks Sandbox

A Hands-On Example

Let’s open a shell to our Sandbox through SSH:

ssh -p 2222 root@127.0.0.1


Alternatively, connect with PuTTY.

Continue reading →

Install Ambari

Posted on 26 November, 2015 by jack

Apache Ambari
https://ambari.apache.org/

Tested on:

ubuntu-14.04.3-desktop-amd64
Ambari 2.1.2

1. Download
2. Install, Setup, and Start Ambari Server

2.1 Install Ambari Server
2.2 Setup Ambari Server
2.3 Start Ambari Server

3. Deploy Cluster using Ambari Web UI

Continue reading →

Benchmarking

Posted on 24 November, 2015 by jack

The Hadoop distribution includes a number of benchmarks we can use.

1. Preparation
2. TestDFSIO

2.1 Run TestDFSIO in write mode and create data.
2.2 Run TestDFSIO in read mode.
2.3 Clean up the TestDFSIO data.

Continue reading →

MapReduce: WordCount Example

Posted on 23 November, 2015 by jack

WordCount v1.0

This works with a local-standalone, pseudo-distributed or fully-distributed Hadoop installation.
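To make the idea behind WordCount concrete, here is a minimal sketch of the same map, shuffle, and reduce steps in plain Python. It is not the Java source from the post and does not touch Hadoop at all; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every whitespace-separated token.
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle step: group all emitted counts by word (the key).
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    # Reduce step: sum the counts collected for one word.
    return word, sum(counts)


def word_count(lines):
    # Run the three steps in sequence over an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a real cluster the map and reduce steps run in parallel on different nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.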

Continue reading →

Running a MapReduce Job

Posted on 22 November, 2015 by jack

Running a MapReduce Job (Nov 2015)

Tested on:

Ubuntu 14.04.3 x64
Hadoop 2.7.1 (Pseudo-Distributed Mode)

I will use some of the examples that come with the Hadoop package.

1. Preparation
2. Pi
3. WordCount

3.1 Download example input data
3.2 Copy local example data to HDFS
3.3 Run the MapReduce job
3.4 Retrieve the job result from HDFS

1. Preparation

Switch to the hduser account and change directory to $HADOOP_INSTALL:

$ su hduser
$ cd /usr/local/hadoop


Continue reading →

Install a single-node Hadoop on Ubuntu (ubuntu-14.04.3-desktop-amd64)

Posted on 14 September, 2015 by jack

Install Hadoop 2.7.1 on Ubuntu (ubuntu-14.04.3-desktop-amd64), then access it with PuTTY from Windows 8.1.

Hadoop can be installed in three modes (this article uses Pseudo-Distributed Mode):

– Local (Standalone) Mode
– Pseudo-Distributed Mode
– Fully-Distributed Mode

1. Create a user and install the basic software

1.1 LXDE
1.2 putty
1.3 Java
1.4 Hadoop user
1.5 SSH
1.6 SSH Certificates

2. Install Hadoop

2.1 Download Hadoop
2.2 Setup Configuration Files

2.2.1 ~/.bashrc:
2.2.2 hadoop-env.sh
2.2.3 core-site.xml
2.2.4 mapred-site.xml
2.2.5 hdfs-site.xml

2.3 Starting Hadoop
2.4 Stopping Hadoop
2.5 Hadoop Web Interfaces

Continue reading →

Phaisarn Sutheebanjard

blog.phaisarn.com

Category Archives: Hadoop
