Install a single-node Hadoop on Ubuntu (ubuntu-14.04.3-desktop-amd64)

Install Hadoop 2.7.1 on Ubuntu (ubuntu-14.04.3-desktop-amd64), then access the machine with PuTTY from Windows 8.1.

Hadoop can be installed in one of three modes (this article installs in Pseudo-Distributed Mode):

– Local (Standalone) Mode
– Pseudo-Distributed Mode
– Fully-Distributed Mode

1. Create a user and install prerequisite software

1.1 LXDE
1.2 PuTTY
1.3 Java
1.4 Hadoop user
1.5 SSH
1.6 SSH Certificates

2. Install Hadoop

2.1 Download Hadoop
2.2 Setup Configuration Files

2.2.1 ~/.bashrc:
2.2.2 hadoop-env.sh
2.2.3 core-site.xml
2.2.4 mapred-site.xml
2.2.5 hdfs-site.xml

2.3 Starting Hadoop
2.4 Stopping Hadoop
2.5 Hadoop Web Interfaces

1. Create a user and install prerequisite software

1.1 LXDE

LXDE – Lightweight X11 Desktop Environment (Optional)
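The install command was lost from the original post; on Ubuntu 14.04 it would typically be (package name lxde assumed):

```shell
# Install the lightweight LXDE desktop (optional; lighter than Unity)
sudo apt-get update
sudo apt-get install -y lxde
```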

1.2 PuTTY

Download PuTTY on Windows.

1.3 Java

Append the following to the end of ~/.bashrc:
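The original snippet is missing; a minimal sketch, assuming OpenJDK 7 installed via sudo apt-get install openjdk-7-jdk on amd64 (the package and path are assumptions), would append lines like these:

```shell
# Lines appended to ~/.bashrc -- path assumes OpenJDK 7 on amd64
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
```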

Test the installation:
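The check itself was stripped from the post; presumably it was simply:

```shell
# Print the installed Java version
java -version
```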

1.4 Hadoop user

Adding a dedicated Hadoop user

Add hduser to the sudo group.
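The commands were stripped from the post; creating the dedicated account and granting it sudo would conventionally look like this (group name hadoop and user name hduser match what the rest of the article assumes):

```shell
# Create a 'hadoop' group and a dedicated 'hduser' account in it
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
# Allow hduser to use sudo
sudo adduser hduser sudo
```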

1.5 SSH

Test ssh into the local machine to verify that it works (you will be prompted for a password).
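Assuming the OpenSSH server still needs to be installed, the lost commands were presumably along these lines:

```shell
# Install the SSH server, then log in to the local machine
sudo apt-get install -y ssh
ssh localhost   # prompts for a password at this point
```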

 

1.6 SSH Certificates

Create and set up SSH certificates: normally ssh prompts for a password, but once key-based SSH certificates are configured, ssh will no longer ask for one.

Test ssh again; it should no longer prompt for a password.
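The key-generation commands were lost; the usual recipe for passwordless ssh to the local machine is:

```shell
# Generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the new public key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost   # should no longer prompt for a password
```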

2. Install Hadoop

2.1 Download Hadoop
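The download commands are missing; fetching and unpacking Hadoop 2.7.1 into /usr/local/hadoop (the path the rest of the article assumes) would look like:

```shell
# Download and unpack Hadoop 2.7.1 from the Apache archive
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar xzf hadoop-2.7.1.tar.gz
sudo mv hadoop-2.7.1 /usr/local/hadoop
# Hand ownership to the dedicated Hadoop user
sudo chown -R hduser:hadoop /usr/local/hadoop
```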

2.2 Setup Configuration Files

The following files will have to be modified to complete the Hadoop setup:
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

2.2.1 ~/.bashrc:

Before editing the .bashrc file in our home directory, we need to find the path where Java has been installed to set the JAVA_HOME environment variable using the following command:
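The command itself was lost; one common way to locate the Java installation path (assuming the alternatives system manages Java) is:

```shell
# Show the installed Java alternative(s) and their paths
update-alternatives --display java
# Or derive the directory just above .../bin/ directly
readlink -f /usr/bin/javac | sed "s:/bin/javac::"
```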

Now we can append the following to the end of ~/.bashrc:
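The appended block is missing from the post; a typical Hadoop 2.x .bashrc fragment, assuming the install path /usr/local/hadoop and OpenJDK 7 on amd64, is:

```shell
# Hadoop environment -- all paths here are assumptions matching this article
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
```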

If installing on i386 or Raspbian, change the JAVA_HOME path from the amd64 variant (java-7-openjdk-amd64) to the one matching that platform (e.g. java-7-openjdk-i386).

Check the Hadoop version:
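With the PATH updated (open a new shell or source ~/.bashrc first), the version can be checked with:

```shell
hadoop version
```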

This step is probably only necessary when more than one Java is installed (e.g. OpenJDK and Oracle JDK); with just a single Java installation it can most likely be skipped.

Note that JAVA_HOME should be set to the path just before ‘…/bin/’:

2.2.2 hadoop-env.sh

location: $HADOOP_INSTALL/etc/hadoop/
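The actual edit is missing from the post; the one line that normally has to change in hadoop-env.sh is JAVA_HOME, set to an explicit path (the OpenJDK 7 path here is an assumption):

```shell
# In /usr/local/hadoop/etc/hadoop/hadoop-env.sh, replace
# 'export JAVA_HOME=${JAVA_HOME}' with an explicit path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```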

2.2.3 core-site.xml

location: $HADOOP_INSTALL/etc/hadoop/

make directory tmp
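The commands are missing; a conventional choice of temporary directory (the path /app/hadoop/tmp is an assumption) is:

```shell
# Create a tmp directory for Hadoop and hand it to hduser
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
```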

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up.
This file can be used to override the default settings that Hadoop starts with.

Open the file and enter the following in between the <configuration></configuration> tag:
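The XML snippet was lost; a typical pseudo-distributed core-site.xml body of this vintage, using an assumed tmp directory of /app/hadoop/tmp and an assumed NameNode port of 54310 (fs.defaultFS is the non-deprecated property name in Hadoop 2.x), is:

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
```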

 

2.2.4 mapred-site.xml

location: $HADOOP_INSTALL/etc/hadoop/

Copy mapred-site.xml.template to mapred-site.xml:
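By default only a template ships; copying it:

```shell
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template \
   /usr/local/hadoop/etc/hadoop/mapred-site.xml
```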

The mapred-site.xml file is used to specify which framework is being used for MapReduce.
We need to enter the following content in between the <configuration></configuration> tag:
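The snippet itself is gone; for Hadoop 2.x the usual choice, consistent with starting YARN later in this article, is to run MapReduce on YARN:

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```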

2.2.5 hdfs-site.xml

location: $HADOOP_INSTALL/etc/hadoop/

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used. 
It is used to specify the directories which will be used as the namenode and the datanode on that host.

Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation. 
This can be done using the following commands:
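The commands were stripped; a common layout (the /usr/local/hadoop_store path is an assumption) is:

```shell
# Create namenode and datanode directories and hand them to hduser
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store
```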

Open the file and enter the following content in between the <configuration></configuration> tag:
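The XML body is missing; a sketch for a single-node setup (replication of 1, and directory paths assumed to match the namenode/datanode directories created above) is:

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
```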

Format the New Hadoop Filesystem

Note that the hadoop namenode -format command should be executed only once, before Hadoop is first used.
If it is run again after Hadoop has been used, it will destroy all data on the Hadoop file system.
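The command referenced above (in Hadoop 2.x, hdfs namenode -format is the non-deprecated spelling):

```shell
hadoop namenode -format
```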

2.3 Starting Hadoop
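The start commands were lost; with the sbin directory on the PATH, everything can be started at once (start-all.sh is deprecated in Hadoop 2.x but still works):

```shell
start-all.sh
```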

or
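Alternatively, start the HDFS and YARN daemons separately:

```shell
start-dfs.sh
start-yarn.sh
```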

We can check if it’s really up and running:
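jps (shipped with the JDK) lists the running Java processes; on a healthy pseudo-distributed node the Hadoop daemons should all appear:

```shell
jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
```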

Another way to check is using netstat:
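The netstat invocation was lost; a typical form that lists the ports the Hadoop Java processes are listening on is:

```shell
netstat -plten | grep java
```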

2.4 Stopping Hadoop
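Mirroring startup, everything can be stopped at once (stop-all.sh is likewise deprecated in 2.x but works):

```shell
stop-all.sh
```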

or
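Alternatively, stop the daemons separately:

```shell
stop-dfs.sh
stop-yarn.sh
```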

2.5 Hadoop Web Interfaces

Let’s start Hadoop again and look at its web UI:

http://localhost:50070/ – web UI of the NameNode daemon (HDFS layer)


http://localhost:50090/ – SecondaryNameNode (HDFS layer)


http://localhost:50070/logs/ or http://localhost:50090/logs/ (HDFS layer)


Links

Single Node Setup
http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html

Hadoop 2.6 Installing on Ubuntu 14.04 (Single-Node Cluster) – 2015
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php