The Hadoop distribution includes a number of benchmarks that we can use to test a cluster.
Contents
1. Preparation
2. TestDFSIO
2.1 Run TestDFSIO in write mode and create data.
2.2 Run TestDFSIO in read mode.
2.3 Clean up the TestDFSIO data.
1. Preparation
Switch to the hduser account and change to the Hadoop installation directory ($HADOOP_INSTALL, here /usr/local/hadoop):
$ su hduser
$ cd /usr/local/hadoop
2. TestDFSIO
The Hadoop distribution also includes an HDFS benchmark application called TestDFSIO, which is useful for testing the I/O performance of HDFS. The benchmark runs a MapReduce job in which separate map tasks read or write the test files. The map output is used to collect statistics, which are accumulated in the reduce tasks to produce a summary result.
First, see which test programs the bundled jobclient tests JAR (hadoop-mapreduce-client-jobclient-2.7.1-tests.jar) provides by running it without arguments:
$ yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
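Every TestDFSIO command in this post repeats the long JAR path. If you run the benchmark often, a small shell function can shorten that. This is a minimal sketch, not part of Hadoop: the function name run_dfsio and the DFSIO_JAR and DRY_RUN variables are hypothetical, and it assumes you are in /usr/local/hadoop with the 2.7.1 tests JAR listed above. It only passes through the -write, -read, -clean, -nrFiles and -fileSize options used in this post.

```sh
#!/bin/sh
# Hypothetical wrapper around the TestDFSIO commands used in this post.
# Assumes the current directory is the Hadoop install dir (/usr/local/hadoop);
# set DFSIO_JAR if your tests JAR has a different version or location.
DFSIO_JAR="${DFSIO_JAR:-share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar}"

# Usage: run_dfsio write|read|clean [nrFiles] [fileSize in MB]
run_dfsio() {
  local mode nr_files file_size
  mode="$1"
  nr_files="${2:-5}"
  file_size="${3:-100}"

  # Rebuild the positional parameters as the full yarn command line
  case "$mode" in
    write|read)
      set -- yarn jar "$DFSIO_JAR" TestDFSIO "-$mode" -nrFiles "$nr_files" -fileSize "$file_size"
      ;;
    clean)
      set -- yarn jar "$DFSIO_JAR" TestDFSIO -clean
      ;;
    *)
      echo "usage: run_dfsio write|read|clean [nrFiles] [fileSizeMB]" >&2
      return 2
      ;;
  esac

  if [ -n "${DRY_RUN:-}" ]; then
    # DRY_RUN is set: only print the command that would be executed
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With this loaded (for example via `. ./run_dfsio.sh`), `run_dfsio write 5 100` corresponds to the write run in 2.1, `run_dfsio read 5 100` to the read run in 2.2, and `run_dfsio clean` to the clean-up in 2.3. `DRY_RUN=1 run_dfsio write` only prints the command.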
2.1 Run TestDFSIO in write mode and create data.
$ yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 100
15/11/24 10:39:25 INFO fs.TestDFSIO: TestDFSIO.1.8
15/11/24 10:39:25 INFO fs.TestDFSIO: nrFiles = 5
15/11/24 10:39:25 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
15/11/24 10:39:25 INFO fs.TestDFSIO: bufferSize = 1000000
15/11/24 10:39:25 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
…
    File Input Format Counters
        Bytes Read=560
    File Output Format Counters
        Bytes Written=78
15/11/24 10:39:43 WARN hdfs.DFSClient: DFSInputStream has been closed already
15/11/24 10:39:43 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
15/11/24 10:39:43 INFO fs.TestDFSIO: Date & time: Tue Nov 24 10:39:43 ICT 2015
15/11/24 10:39:43 INFO fs.TestDFSIO: Number of files: 5
15/11/24 10:39:43 INFO fs.TestDFSIO: Total MBytes processed: 500.0
15/11/24 10:39:43 INFO fs.TestDFSIO: Throughput mb/sec: 45.41326067211626
15/11/24 10:39:43 INFO fs.TestDFSIO: Average IO rate mb/sec: 47.21734619140625
15/11/24 10:39:43 INFO fs.TestDFSIO: IO rate std deviation: 8.264251533352473
15/11/24 10:39:43 INFO fs.TestDFSIO: Test exec time sec: 14.704
15/11/24 10:39:43 INFO fs.TestDFSIO:
The benchmark summary is appended to a local file named TestDFSIO_results.log in the current directory, in addition to being written to standard output:
$ ls -l TestDFSIO_results.log
-rw-r--r-- 1 hduser hadoop 297 พ.ย. 24 10:39 TestDFSIO_results.log
$ cat TestDFSIO_results.log
----- TestDFSIO ----- : write
Date & time: Tue Nov 24 10:39:43 ICT 2015
Number of files: 5
Total MBytes processed: 500.0
Throughput mb/sec: 45.41326067211626
Average IO rate mb/sec: 47.21734619140625
IO rate std deviation: 8.264251533352473
Test exec time sec: 14.704
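The two rate figures in this summary are easy to misread. Based on our reading of the TestDFSIO source in Hadoop 2.x (worth checking against your own version), each map task i reports the bytes it processed, S_i, and its I/O time, t_i; the reducer sums these, and the summary fields are then roughly:

$$
r_i = \frac{S_i}{t_i}, \qquad
\text{Throughput} = \frac{\sum_{i=1}^{N} S_i}{\sum_{i=1}^{N} t_i}, \qquad
\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i, \qquad
\sigma = \sqrt{\left|\frac{1}{N}\sum_{i=1}^{N} r_i^{2} - \bar{r}^{2}\right|}
$$

Here \bar{r} is the "Average IO rate" and \sigma the "IO rate std deviation". For the write run above (N = 5 files of S = 100 MB each):

$$
\sum_{i=1}^{5} t_i \approx \frac{500\ \text{MB}}{45.41\ \text{MB/s}} \approx 11.0\ \text{s},
\qquad
\text{Throughput} = \frac{N S}{\sum_{i} S / r_i} = \frac{N}{\sum_{i} 1 / r_i} \le \bar{r}
$$

So about 11.0 s of summed map-task I/O time, which is different from the 14.704 s "Test exec time" (the wall-clock time of the whole job). With equal file sizes, Throughput is the harmonic mean of the per-task rates and therefore cannot exceed their arithmetic mean, consistent with 45.41 ≤ 47.22 here. Both figures are per-task averages rather than the aggregate bandwidth of the cluster.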
2.2 Run TestDFSIO in read mode.
$ yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar TestDFSIO -read -nrFiles 5 -fileSize 100
15/11/24 10:48:50 INFO fs.TestDFSIO: TestDFSIO.1.8
15/11/24 10:48:50 INFO fs.TestDFSIO: nrFiles = 5
15/11/24 10:48:50 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
15/11/24 10:48:50 INFO fs.TestDFSIO: bufferSize = 1000000
15/11/24 10:48:50 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
...
    File Input Format Counters
        Bytes Read=560
    File Output Format Counters
        Bytes Written=77
15/11/24 10:48:58 WARN hdfs.DFSClient: DFSInputStream has been closed already
15/11/24 10:48:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
15/11/24 10:48:58 INFO fs.TestDFSIO: Date & time: Tue Nov 24 10:48:58 ICT 2015
15/11/24 10:48:58 INFO fs.TestDFSIO: Number of files: 5
15/11/24 10:48:58 INFO fs.TestDFSIO: Total MBytes processed: 500.0
15/11/24 10:48:58 INFO fs.TestDFSIO: Throughput mb/sec: 209.64360587002096
15/11/24 10:48:58 INFO fs.TestDFSIO: Average IO rate mb/sec: 214.97463989257812
15/11/24 10:48:58 INFO fs.TestDFSIO: IO rate std deviation: 35.67318429936355
15/11/24 10:48:58 INFO fs.TestDFSIO: Test exec time sec: 5.841
15/11/24 10:48:58 INFO fs.TestDFSIO:
The read results are appended to TestDFSIO_results.log as well:
$ cat TestDFSIO_results.log
----- TestDFSIO ----- : read
Date & time: Tue Nov 24 10:48:58 ICT 2015
Number of files: 5
Total MBytes processed: 500.0
Throughput mb/sec: 209.64360587002096
Average IO rate mb/sec: 214.97463989257812
IO rate std deviation: 35.67318429936355
Test exec time sec: 5.841
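Because every run appends another block to TestDFSIO_results.log, comparing runs by eye gets tedious. Below is a minimal sketch of a helper that pulls one metric out of the log with awk. The function name dfsio_metric is hypothetical, and it assumes the block layout shown above: a "----- TestDFSIO ----- : MODE" header followed by "Label: value" lines. When a mode appears more than once, the value from the last matching block wins.

```sh
#!/bin/sh
# Hypothetical helper: print one metric from TestDFSIO_results.log.
# Usage: dfsio_metric LOGFILE MODE LABEL
#   MODE  is the word after "----- TestDFSIO ----- :" (write or read in this post)
#   LABEL is the text before the first colon, e.g. 'Throughput mb/sec'
# Exit status is 1 if no matching label is found.
dfsio_metric() {
  awk -v mode="$2" -v label="$3" '
    # A header line opens a new result block; remember whether it is the wanted mode
    /^----- TestDFSIO ----- :/ { inblock = ($NF == mode); next }
    inblock {
      i = index($0, ":")            # split at the FIRST colon (Date & time values contain colons)
      if (i == 0) next              # skip blank lines
      key = substr($0, 1, i - 1)
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == label) found = val # later blocks overwrite earlier ones
    }
    END { if (found == "") exit 1; print found }
  ' "$1"
}
```

For example, with the write and read results from this post in the log, the read/write throughput ratio can be computed like this (it comes out at about 4.62 for these numbers):

```sh
w=$(dfsio_metric TestDFSIO_results.log write 'Throughput mb/sec')
r=$(dfsio_metric TestDFSIO_results.log read 'Throughput mb/sec')
awk -v w="$w" -v r="$r" 'BEGIN { printf "read/write throughput ratio: %.2f\n", r / w }'
```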
2.3 Clean up the TestDFSIO data.
$ yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar TestDFSIO -clean
15/11/24 12:07:10 INFO fs.TestDFSIO: TestDFSIO.1.8
15/11/24 12:07:10 INFO fs.TestDFSIO: nrFiles = 1
15/11/24 12:07:10 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
15/11/24 12:07:10 INFO fs.TestDFSIO: bufferSize = 1000000
15/11/24 12:07:10 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
15/11/24 12:07:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/24 12:07:12 INFO fs.TestDFSIO: Cleaning up test files
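To confirm that the clean-up actually removed the benchmark data, you can check whether the baseDir reported in the log (/benchmarks/TestDFSIO) still exists in HDFS. This is a minimal sketch: the function name dfsio_is_clean is hypothetical, and it relies on the standard `hdfs dfs -test -e PATH` check, which exits with status 0 when PATH exists.

```sh
#!/bin/sh
# Hypothetical check that "TestDFSIO -clean" removed the benchmark data.
# Defaults to the baseDir printed in the TestDFSIO log (/benchmarks/TestDFSIO).
dfsio_is_clean() {
  local base_dir
  base_dir="${1:-/benchmarks/TestDFSIO}"
  # "hdfs dfs -test -e PATH" exits 0 if PATH exists in HDFS
  if hdfs dfs -test -e "$base_dir"; then
    echo "still present: $base_dir" >&2
    return 1
  fi
  echo "removed: $base_dir"
}
```

Running `dfsio_is_clean` right after the clean command should print "removed: /benchmarks/TestDFSIO"; a non-zero exit status means the data is still there.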