The MapReduce examples are located in hadoop-[version]/share/hadoop/mapreduce. Depending on where you installed Hadoop, this path may vary. For the purposes of this example, we define:

export YARN_EXAMPLES=$YARN_HOME/share/hadoop/mapreduce

$YARN_HOME must be defined as part of the installation. Also, the following examples use a version tag, in this case "2.1.0-beta". Your installation may have a different version tag.
The following sections provide some sample Hadoop YARN benchmarks and programs.
List of available examples
Assuming we have defined the $YARN_EXAMPLES environment variable, we can get a list of the available examples by running:
yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.1.0-beta.jar
This command returns a list of the available examples:

An example program must be given as the first argument. Valid program names are:
aggregatewordcount: an aggregate-based map/reduce program that counts the words in the input files.
aggregatewordhist: an aggregate-based map/reduce program that computes the histogram of the words in the input files.
bbp: a map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: an example job that counts the pageview counts from a database.
distbbp: a map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: a map/reduce program that counts the matches of a regex in the input.
join: a job that effects a join over sorted, equally partitioned data sets.
multifilewc: a job that counts words from several files.
pentomino: a map/reduce tile-laying program to find solutions to pentomino problems.
pi: a map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: a map/reduce program that writes 10 GB of random textual data per node.
randomwriter: a map/reduce program that writes 10 GB of random data per node.
secondarysort: an example defining a secondary sort to the reduce.
sort: a map/reduce program that sorts the data written by the random writer.
sudoku: a Sudoku solver.
teragen: generate data for the terasort.
terasort: run the terasort.
teravalidate: check the results of the terasort.
wordcount: a map/reduce program that counts the words in the input files.
wordmean: a map/reduce program that counts the average length of the words in the input files.
wordmedian: a map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: a map/reduce program that counts the standard deviation of the length of the words in the input files.
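Most of these examples follow the same map-then-reduce pattern. As a rough illustration of what the wordcount example computes, here is a minimal local sketch in Python; this is not Hadoop code (the real example distributes the map and reduce phases across the cluster), just the logic in miniature:

```python
from collections import Counter

def map_phase(text):
    # The mapper emits a (word, 1) pair for every word in its input split.
    return [(word, 1) for word in text.split()]

def reduce_phase(pairs):
    # The reducer sums the counts for each distinct word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

counts = reduce_phase(map_phase("to be or not to be"))
print(counts)
```

On a cluster, the framework runs many mappers in parallel and routes all pairs with the same word to the same reducer; the local version simply chains the two phases.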
To illustrate several capabilities of Hadoop YARN, we will show how to run some of these examples.
Run the pi example
To run the pi example with 16 maps and 100,000 samples, run the following command:
yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.1.0-beta.jar pi 16 100000
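Each of the 16 maps generates a batch of sample points in the unit square and counts how many fall inside the inscribed quarter circle; the single reduce tallies the counts. The following is a simplified local sketch of that estimate in Python. Note that it uses pseudo-random points, whereas the Hadoop pi example uses a deterministic Halton sequence, so the numbers will differ:

```python
import random

def estimate_pi(num_samples, seed=42):
    # Count sample points that land inside the quarter circle of radius 1.
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The area ratio (quarter circle / unit square) is pi/4.
    return 4.0 * inside / num_samples

print(estimate_pi(100000))
```

More samples (the second command-line argument) tighten the estimate; more maps (the first argument) simply split the sampling work across more containers.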
This command should return the following output (after the Hadoop message):
13/10/13 20:10:01 INFO mapreduce.Job:  map 0% reduce 0%
13/10/13 20:10:08 INFO mapreduce.Job:  map 25% reduce 0%
13/10/13 20:10:16 INFO mapreduce.Job:  map 56% reduce 0%
13/10/13 20:10:17 INFO mapreduce.Job:  map 100% reduce 0%
13/10/13 20:10:17 INFO mapreduce.Job:  map 100% reduce 100%
13/10/13 20:10:17 INFO mapreduce.Job: Job job_1381790835497_0003 completed successfully
13/10/13 20:10:17 INFO mapreduce.Job: Counters: 44
        File System Counters
                FILE: Number of bytes read=358
                FILE: Number of bytes written=1365080
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=4214
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=67
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=16
                Launched reduce tasks=1
                Data-local map tasks=14
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=174725
                Total time spent by all reduces in occupied slots (ms)=7294
        Map-Reduce Framework
                Map input records=16
                Map output records=32
                Map output bytes=288
                Map output materialized bytes=448
                Input split bytes=2326
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=448
                Reduce input records=32
                Reduce output records=0
                Spilled Records=64
                Shuffled Maps=16
                Failed Shuffles=0
                Merged Map outputs=16
                GC time elapsed (ms)=195
                CPU time spent (ms)=7740
                Physical memory (bytes) snapshot=6143696896
                Virtual memory (bytes) snapshot=23140454400
                Total committed heap usage (bytes)=4240769024
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1888
        File Output Format Counters
                Bytes Written=97
Job Finished in 20.854 seconds
Estimated value of Pi is 3.14127500000000000000
Note that the MapReduce progress is shown in the same way as with MapReduce version 1, but the application statistics are different. Most of the statistics are self-explanatory. The one important item to note is that the YARN "Map-Reduce Framework" is used to run the program. The use of this framework, which is designed to be compatible with Hadoop version 1, will be discussed further in subsequent chapters.
An example of tracking using the web GUI
The Hadoop YARN web graphical user interface (GUI) has been updated for Hadoop version 2. This section shows how to use the web GUI to monitor YARN jobs and find information about them. In the examples below we use the pi application, which runs quickly and may finish before you have fully explored the GUI. A longer-running application, such as terasort, may be useful when exploring all the various links in the GUI.
The image below shows the YARN main web interface (http://hostname:8088).
If you look at the Cluster Metrics table, you will see some new information. First, you will notice that instead of Hadoop version 1's "Map/Reduce Task Capacity", there is now information on the number of running containers. If YARN is running a MapReduce job, these containers will be used for both map and reduce tasks. Unlike Hadoop version 1, the number of mappers and reducers is not fixed in Hadoop version 2. There are also links to memory metrics and node status. To display a summary of node activity, click Nodes. The image below shows the node activity while the pi application is running. Notice again the number of containers, which are used by the MapReduce framework as either mappers or reducers.
If you return to the main Running Applications window and click the application_138… link, the Application status page will appear. This page provides information similar to the Running Applications page, but only for the selected job.
Clicking the ApplicationMaster link on the Application status page opens the MapReduce Application page, as shown in the figure below. Note that a link to the ApplicationMaster is also found on the main Running Applications screen, in the last column.
Details about the MapReduce process can be observed on the MapReduce Application page. Instead of containers, the MapReduce Application page refers to maps and reduces. Clicking the job_138… link opens the MapReduce Job page:
The MapReduce Jobs page provides more details about job status. After the job is done, the page refreshes as shown in the image below:
If you click on the node running the ApplicationMaster (n0:8042), the NodeManager summary page appears, as shown in the image below. Again, the NodeManager tracks only containers; the actual tasks that the containers run are determined by the ApplicationMaster.
If you return to the MapReduce Job page, you can access the ApplicationMaster log files by clicking the logs link:
If you return to the main cluster page and select Applications > Finished, and then select a completed job, a summary page is displayed:
There are a few things to take note of while working through the GUI above. First, because YARN manages applications, all YARN entries refer to an "application". YARN has no data about the actual application itself; data for the MapReduce job are provided by the MapReduce framework. Thus, two clearly different data streams are combined in the web GUI: YARN applications and framework jobs. If the framework does not provide job information, then certain parts of the web GUI will have nothing to display.
Another interesting aspect to note is the dynamic nature of the mapper and reducer tasks. These tasks are executed as YARN containers, and their number will change as the application runs. This feature provides much better cluster utilization, because mappers and reducers are dynamic resources rather than fixed ones.
Finally, there are other links in the preceding GUI that can be explored. With the MapReduce framework, it is possible to drill down into the individual map and reduce tasks. If log aggregation is enabled, you can view the individual logs for each map and reduce task.
Run the Terasort test
Three separate steps are required to run the terasort benchmark. In general, each row is 100 bytes long; thus the total amount of data written is 100 times the number of rows (i.e., to write 100 GB of data, use 1,000,000,000 rows). You will also need to specify the input and output directories in HDFS.
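Because teragen writes fixed 100-byte rows, sizing a run is simple arithmetic. A small helper illustrates the calculation (the function name is illustrative, not part of the benchmark):

```python
ROW_SIZE_BYTES = 100  # teragen writes fixed-length 100-byte rows

def rows_for(target_bytes):
    # Number of teragen rows needed to produce target_bytes of data.
    return target_bytes // ROW_SIZE_BYTES

# 100 GB (taken as 100 * 10^9 bytes) requires one billion rows.
print(rows_for(100 * 10**9))
```

The resulting row count is what you pass to teragen as its first argument.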
1. Run teragen to generate rows of random data to sort:

yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.1.0-beta.jar teragen <number of 100-byte rows> <output dir>
2. Run terasort to sort the data:

yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.1.0-beta.jar terasort <input dir> <output dir>
3. Run teravalidate to validate the sorted output:

yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.1.0-beta.jar teravalidate <terasort output dir> <report dir>
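Conceptually, teravalidate checks that the keys across the terasort output are in globally sorted order. A toy, single-list version of that core check in Python (the real job also verifies checksums and the boundary keys between output partitions):

```python
def is_sorted(keys):
    # Valid only if every key is <= the key that follows it.
    return all(a <= b for a, b in zip(keys, keys[1:]))

print(is_sorted([b"apple", b"banana", b"cherry"]))
print(is_sorted([b"banana", b"apple"]))
```

If any adjacent pair is out of order, the validation fails and terasort's output cannot be trusted.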
Run the TestDFSIO benchmark
YARN also includes an HDFS benchmark application called TestDFSIO. Like terasort, it requires several steps. Here we will write and then read ten 1 GB files.
1. Run TestDFSIO in write mode to create the data:

yarn jar $YARN_EXAMPLES/hadoop-mapreduce-client-jobclient-2.1.0-beta-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
Example results are as follows (the date and time log prefix has been removed):
fs.TestDFSIO: ----- TestDFSIO ----- : write
fs.TestDFSIO:            Date & time: Wed Oct 16 10:58:20 EDT 2013
fs.TestDFSIO:        Number of files: 10
fs.TestDFSIO: Total MBytes processed: 10000.0
fs.TestDFSIO:      Throughput mb/sec: 10.124306231915458
fs.TestDFSIO: Average IO rate mb/sec: 10.125661849975586
fs.TestDFSIO:  IO rate std deviation: 0.11729341192174683
fs.TestDFSIO:     Test exec time sec: 120.45
fs.TestDFSIO:
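The Throughput mb/sec figure reported by TestDFSIO is effectively a per-map-task average, so a common back-of-envelope estimate of cluster-wide throughput multiplies it by the number of concurrently written files. This is an approximation, not a value TestDFSIO itself reports; a quick sketch:

```python
def aggregate_throughput(per_task_mb_sec, nr_files):
    # Rough cluster-wide estimate, assuming all nr_files map tasks
    # run concurrently (one file per map task in TestDFSIO).
    return per_task_mb_sec * nr_files

# Using the write-test result above: ~10.12 MB/s per task, 10 files.
print(aggregate_throughput(10.124306231915458, 10))
```

For the write test above this suggests roughly 100 MB/s of aggregate write bandwidth across the cluster.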
2. Run TestDFSIO in read mode:

yarn jar $YARN_EXAMPLES/hadoop-mapreduce-client-jobclient-2.1.0-beta-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
Example results are as follows (the date and time log prefix has been removed):
fs.TestDFSIO: ----- TestDFSIO ----- : read
fs.TestDFSIO:            Date & time: Wed Oct 16 11:09:00 EDT 2013
fs.TestDFSIO:        Number of files: 10
fs.TestDFSIO: Total MBytes processed: 10000.0
fs.TestDFSIO:      Throughput mb/sec: 40.946519750553804
fs.TestDFSIO: Average IO rate mb/sec: 45.240928649902344
fs.TestDFSIO:  IO rate std deviation: 18.27387874605978
fs.TestDFSIO:     Test exec time sec: 47.937
fs.TestDFSIO:
3. Clean up the TestDFSIO data:

yarn jar $YARN_EXAMPLES/hadoop-mapreduce-client-jobclient-2.1.0-beta-tests.jar TestDFSIO -clean