Contents
- 1. Install basic software
- 1.1 LXDE
- 1.2 putty
- 1.3 Java
- 1.4 Hadoop user
- 1.5 SSH
- 1.6 SSH Certificates
- 2. Install Hadoop
- 2.1 Download Hadoop
- 2.2 Setup Configuration Files
- 2.2.1 ~/.bashrc
- 2.2.2 hadoop-env.sh
- 2.2.3 core-site.xml
- 2.2.4 mapred-site.xml
- 2.2.5 hdfs-site.xml
- 2.3 Starting Hadoop
- 2.4 Stopping Hadoop
- 2.5 Hadoop Web Interfaces
- 3. Links
Install Hadoop 2.7.1 on Ubuntu (ubuntu-14.04.3-desktop-amd64) and access it with putty from Windows 8.1.
Hadoop can be installed in three modes (this article uses Pseudo-Distributed Mode):
– Local (Standalone) Mode
– Pseudo-Distributed Mode
– Fully-Distributed Mode
1. Create a user and install basic software
1.1 LXDE
1.2 putty
1.3 Java
1.4 Hadoop user
1.5 SSH
1.6 SSH Certificates
2. Install Hadoop
2.1 Download Hadoop
2.2 Setup Configuration Files
2.2.1 ~/.bashrc
2.2.2 hadoop-env.sh
2.2.3 core-site.xml
2.2.4 mapred-site.xml
2.2.5 hdfs-site.xml
2.3 Starting Hadoop
2.4 Stopping Hadoop
2.5 Hadoop Web Interfaces
1. Install basic software
1.1 LXDE
LXDE – Lightweight X11 Desktop Environment (Optional)
$ sudo apt-get install lxde
1.2 putty
Download putty for Windows:
http://www.putty.org/
1.3 Java
$ sudo apt-get install openjdk-7-jdk
Append the following to the end of ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Test:
$ echo $JAVA_HOME
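As a further sanity check that JAVA_HOME points at a real JDK, run the java binary under it directly (a hedged example; the exact version string on your machine will vary):

$ $JAVA_HOME/bin/java -version
java version "1.7.0_75"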
1.4 Hadoop user
Adding a dedicated Hadoop user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
Add hduser to the sudo group:
$ sudo adduser hduser sudo
Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.
jack@jack14043:~$
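A quick way to confirm that both group memberships took effect (a hedged check; the output format can vary slightly between releases):

$ groups hduser
hduser : hadoop sudo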
1.5 SSH
$ sudo apt-get install openssh-server
$ which ssh
/usr/bin/ssh
$ which sshd
/usr/sbin/sshd
Test ssh into the local machine to verify that it works (it will ask for a password):
$ ssh localhost
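If the connection is refused, check that the SSH daemon is actually running. On Ubuntu 14.04 (which uses upstart) something like the following should work; the process id shown is of course illustrative:

$ sudo service ssh status
ssh start/running, process 1234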
1.6 SSH Certificates
Create and Setup SSH Certificates: normally ssh asks for a password, but by setting up SSH key-based authentication, ssh will stop prompting for one.
$ su hduser
hduser $ ssh-keygen -t rsa -P ""
hduser $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Test ssh again; it should no longer ask for a password:
hduser $ ssh localhost
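If ssh still prompts for a password, the usual culprit is file permissions: sshd ignores an authorized_keys file that is group- or world-writable. A hedged fix:

hduser $ chmod 700 $HOME/.ssh
hduser $ chmod 600 $HOME/.ssh/authorized_keys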
2. Install Hadoop
2.1 Download Hadoop
hduser $ wget http://www.us.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
hduser $ tar zxvf hadoop-2.7.1.tar.gz
hduser $ cd hadoop-2.7.1
hduser $ sudo mkdir /usr/local/hadoop
hduser $ sudo mv * /usr/local/hadoop
hduser $ sudo chown -R hduser:hadoop /usr/local/hadoop
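At this point /usr/local/hadoop should contain the usual Hadoop 2.7.1 tarball layout; a quick hedged sanity check (the listing below is what a stock 2.7.1 tarball contains):

hduser $ ls /usr/local/hadoop
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share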
2.2 Setup Configuration Files
The following files will have to be modified to complete the Hadoop setup:
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
2.2.1 ~/.bashrc
Before editing the .bashrc file in our home directory, we need to find the path where Java has been installed to set the JAVA_HOME environment variable using the following command:
hduser $ update-alternatives --config java
Now we can append the following to the end of ~/.bashrc:
hduser $ nano ~/.bashrc

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

hduser $ source ~/.bashrc
If installing on i386 or Raspbian, change
java-7-openjdk-amd64
to, respectively:
java-7-openjdk-i386
java-7-openjdk-armhf
Check the Hadoop version:
hduser $ hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
hduser $
The following step is probably only needed when more than one Java is installed (e.g. OpenJDK and Oracle JDK); with a single Java installation it can most likely be skipped.
Note that JAVA_HOME should be set to the part of the path just before ‘…/bin/’:
hduser $ javac -version
javac 1.7.0_75

hduser $ which javac
/usr/bin/javac

hduser $ readlink -f /usr/bin/javac
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac
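Given that symlink chain, JAVA_HOME can also be derived mechanically instead of typed by hand; a hedged one-liner, assuming javac resolves as shown above:

hduser $ export JAVA_HOME=$(readlink -f /usr/bin/javac | sed 's:/bin/javac::')
hduser $ echo $JAVA_HOME
/usr/lib/jvm/java-7-openjdk-amd64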
2.2.2 hadoop-env.sh
location: $HADOOP_INSTALL/etc/hadoop/
$ nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
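If editing with nano is inconvenient, the same change can be scripted; a hedged sed one-liner that assumes hadoop-env.sh still contains its stock export JAVA_HOME=... line:

hduser $ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh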
2.2.3 core-site.xml
location: $HADOOP_INSTALL/etc/hadoop/
Create the tmp directory:
$ mkdir /usr/local/hadoop/tmp
The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up.
This file can be used to override the default settings that Hadoop starts with.
Open the file and enter the following in between the <configuration></configuration> tag:
hduser $ nano /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
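To confirm that Hadoop actually picks the value up, hdfs getconf can read a key back. This is a hedged check: it assumes $HADOOP_INSTALL/bin is already on PATH from the ~/.bashrc step, and Hadoop 2.x may additionally print a warning that fs.default.name is deprecated in favor of fs.defaultFS:

hduser $ hdfs getconf -confKey fs.default.name
hdfs://localhost:54310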
2.2.4 mapred-site.xml
location: $HADOOP_INSTALL/etc/hadoop/
Copy mapred-site.xml.template to mapred-site.xml:
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
The mapred-site.xml file is used to specify which framework is being used for MapReduce.
We need to enter the following content in between the <configuration></configuration> tag:
hduser $ nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
</configuration>
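A hedged way to read the value back out of the XML, assuming xmllint is available (on Ubuntu it comes with the libxml2-utils package):

hduser $ xmllint --xpath "//property[name='mapred.job.tracker']/value/text()" /usr/local/hadoop/etc/hadoop/mapred-site.xml
localhost:54311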
2.2.5 hdfs-site.xml
location: $HADOOP_INSTALL/etc/hadoop/
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used.
It is used to specify the directories which will be used as the namenode and the datanode on that host.
Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation.
This can be done using the following commands:
hduser $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
hduser $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
hduser $ sudo chown -R hduser:hadoop /usr/local/hadoop_store
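A hedged ownership check before moving on; both directories should now belong to hduser:hadoop (sizes and dates below are illustrative):

hduser $ ls -ld /usr/local/hadoop_store/hdfs/*
drwxr-xr-x 2 hduser hadoop 4096 ... /usr/local/hadoop_store/hdfs/datanode
drwxr-xr-x 2 hduser hadoop 4096 ... /usr/local/hadoop_store/hdfs/namenode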
Open the file and enter the following content in between the <configuration></configuration> tag:
hduser $ nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
Format the New Hadoop Filesystem
hduser $ hadoop namenode -format
...
...
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at jack-VirtualBox/127.0.1.1
************************************************************/
Note that the hadoop namenode -format command should be executed only once, before we start using Hadoop.
If it is executed again after Hadoop has been used, it will destroy all the data on the Hadoop file system.
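To guard against that, one hedged pattern is to format only when the current/ directory that formatting creates inside the namenode dir is absent (in Hadoop 2.x, hdfs namenode -format is the non-deprecated spelling of the same command):

hduser $ [ -d /usr/local/hadoop_store/hdfs/namenode/current ] || hdfs namenode -format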
2.3 Starting Hadoop
hduser $ cd /usr/local/hadoop/sbin
hduser $ start-all.sh
or
hduser $ start-dfs.sh
hduser $ start-yarn.sh
We can check if it’s really up and running:
hduser $ jps
9026 NodeManager
7348 NameNode
9766 Jps
8887 ResourceManager
7507 DataNode
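To turn that into a pass/fail check, here is a hedged shell loop over the expected daemon names; grep -x matches the whole name so that NameNode does not also match SecondaryNameNode (which is missing from the jps listing above but normally runs on a pseudo-distributed node):

hduser $ for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
>   jps | awk '{print $2}' | grep -qx "$d" && echo "$d: running" || echo "$d: NOT running"
> done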
Another way to check is using netstat:
hduser $ netstat -plten | grep java
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp   0   0 0.0.0.0:50020     0.0.0.0:*   LISTEN   1001   21229   3007/java
tcp   0   0 127.0.0.1:49798   0.0.0.0:*   LISTEN   1001   20744   3007/java
tcp   0   0 127.0.0.1:54310   0.0.0.0:*   LISTEN   1001   20309   2882/java
tcp   0   0 0.0.0.0:50090     0.0.0.0:*   LISTEN   1001   21905   3222/java
tcp   0   0 0.0.0.0:50070     0.0.0.0:*   LISTEN   1001   19810   2882/java
tcp   0   0 0.0.0.0:50010     0.0.0.0:*   LISTEN   1001   20731   3007/java
tcp   0   0 0.0.0.0:50075     0.0.0.0:*   LISTEN   1001   21223   3007/java
tcp6  0   0 :::57565          :::*        LISTEN   1001   24753   3494/java
tcp6  0   0 :::8030           :::*        LISTEN   1001   23135   3368/java
tcp6  0   0 :::8031           :::*        LISTEN   1001   23127   3368/java
tcp6  0   0 :::8032           :::*        LISTEN   1001   23142   3368/java
tcp6  0   0 :::8033           :::*        LISTEN   1001   24770   3368/java
tcp6  0   0 :::8040           :::*        LISTEN   1001   24761   3494/java
tcp6  0   0 :::8042           :::*        LISTEN   1001   24766   3494/java
tcp6  0   0 :::8088           :::*        LISTEN   1001   24752   3368/java
hduser $
2.4 Stopping Hadoop
hduser $ cd /usr/local/hadoop/sbin
hduser $ stop-all.sh
or
hduser $ stop-dfs.sh
hduser $ stop-yarn.sh
2.5 Hadoop Web Interfaces
Let’s start Hadoop again and look at its web UIs:
hduser $ /usr/local/hadoop/sbin/start-all.sh
http://localhost:50070/ – web UI of the NameNode daemon (HDFS layer)
http://localhost:50090/ – SecondaryNameNode (HDFS layer)
http://localhost:50070/logs/ or http://localhost:50090/logs/ (HDFS layer)
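A hedged scripted probe of the same endpoints, assuming curl is installed (each should return HTTP 200 once the daemons are up):

hduser $ for url in http://localhost:50070/ http://localhost:50090/; do
>   echo "$url -> $(curl -s -o /dev/null -w '%{http_code}' $url)"
> done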
3. Links
Single Node Setup
http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
Hadoop 2.6 Installing on Ubuntu 14.04 (Single-Node Cluster) – 2015
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php