
A MapReduce Example in Hadoop

Updated: 2023-06-22 22:14:38

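In the example below we parse Apache access logs and write the results to an HBase table. The user's IP address is the row key, and the table stores the access time, request method, request protocol, browser (user agent), response status, and so on for each request.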

First, we create a table in HBase to hold the data. [ol]

public static void creatTable(String table) throws IOException {
    HConnection conn = HConnectionManager.getConnection(conf);
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Create the table only if it does not exist yet.
    if (!admin.tableExists(new Text(table))) {
        System.out.println("1. " + table + " table creating ... please wait");
        HTableDescriptor tableDesc = new HTableDescriptor(table);
        // Three column families: HTTP fields, requested URLs, referrers.
        tableDesc.addFamily(new HColumnDescriptor("http:"));
        tableDesc.addFamily(new HColumnDescriptor("url:"));
        tableDesc.addFamily(new HColumnDescriptor("referrer:"));
        admin.createTable(tableDesc);
    } else {
        System.out.println("1. " + table + " table already exists.");
    }
    System.out.println("2. access_log files fetching using map/reduce");
}
[/ol]
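To make the row layout concrete, here is a small in-memory sketch of what one row of this table ends up holding. It is not HBase code: `RowLayoutSketch` and `putAccess` are hypothetical names, a plain `Map` stands in for the table, and the column names are the ones used by the Map program in this article. A later write to the same column simply overwrites the earlier one here, whereas HBase keeps versions by timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the table: row key (client IP) -> column -> value.
class RowLayoutSketch {

    // Records one request under the row for the given IP.
    static void putAccess(Map<String, Map<String, String>> table, String ip,
                          String protocol, String method, String code,
                          String byteSize, String agent, String url, String referrer) {
        Map<String, String> row = table.computeIfAbsent(ip, k -> new LinkedHashMap<>());
        // Fixed columns in the "http:" family.
        row.put("http:protocol", protocol);
        row.put("http:method", method);
        row.put("http:code", code);
        row.put("http:bytesize", byteSize);
        row.put("http:agent", agent);
        // Dynamic columns: the qualifier is the URL (or referrer) itself.
        row.put("url:" + url, referrer);
        row.put("referrer:" + referrer, url);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        // Made-up sample values, for illustration only.
        putAccess(table, "192.168.2.10", "HTTP/1.1", "GET", "200", "1043",
                  "Mozilla/5.0", "/index.html", "http://example.com/");
        table.forEach((ip, row) ->
            row.forEach((col, val) -> System.out.println(ip + "  " + col + " = " + val)));
    }
}
```

Because the URL and the referrer become part of the column name, a single IP row accumulates one `url:` column per distinct page visited and one `referrer:` column per distinct referrer.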

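Next we run a MapReduce job that reads every line of the log. Since we only need to extract the data and do not need to reduce the results, writing a Map program is enough.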

[ol]

public static class MapClass extends MapReduceBase implements Mapper {

    @Override
    public void configure(JobConf job) {
        // The target table name is passed in through the job configuration.
        tableName = job.get(TABLE, "");
    }

    public void map(WritableComparable key, Text value,
            OutputCollector output, Reporter reporter) throws IOException {
        try {
            // Parse one log line.
            AccessLogParser log = new AccessLogParser(value.toString());
            if (table == null)
                table = new HTable(conf, new Text(tableName));
            // The client IP is the row key.
            long lockId = table.startUpdate(new Text(log.getIp()));
            table.put(lockId, new Text("http:protocol"), log.getProtocol().getBytes());
            table.put(lockId, new Text("http:method"), log.getMethod().getBytes());
            table.put(lockId, new Text("http:code"), log.getCode().getBytes());
            table.put(lockId, new Text("http:bytesize"), log.getByteSize().getBytes());
            table.put(lockId, new Text("http:agent"), log.getAgent().getBytes());
            table.put(lockId, new Text("url:" + log.getUrl()), log.getReferrer().getBytes());
            table.put(lockId, new Text("referrer:" + log.getReferrer()), log.getUrl().getBytes());
            // Commit the row using the access time from the log as the timestamp.
            table.commit(lockId, log.getTimestamp());
        } catch (ParseException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

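[/ol]

In the Map program, each incoming line is first handed to AccessLogParser. The AccessLogParser constructor matches the line against the regular expression "([^ ]*) ([^ ]*) ([^ ]*) \\[([^]]*)\\] \"([^\"]*)\" ([^ ]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\".*". We then write the fields that AccessLogParser extracted into the HBase table. That completes the program. To launch a MapReduce job, we still have to configure it.

[ol]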

public static void runMapReduce(String table, String dir) throws IOException {
    Path tempDir = new Path("log/temp");
    Path InputDir = new Path(dir);
    FileSystem fs = FileSystem.get(conf);
    JobConf jobConf = new JobConf(conf, LogFetcher.class);
    jobConf.setJobName("apache log fetcher");
    jobConf.set(TABLE, table);
    Path[] in = fs.listPaths(InputDir);
    if (fs.isFile(InputDir)) {
        jobConf.setInputPath(InputDir);
    } else {
        // Add the files in the directory and in its immediate subdirectories.
        for (int i = 0; i < in.length; i++) {
            if (fs.isFile(in[i])) {
                jobConf.addInputPath(in[i]);
            } else {
                Path[] sub = fs.listPaths(in[i]);
                for (int j = 0; j < sub.length; j++) {
                    if (fs.isFile(sub[j])) {
                        jobConf.addInputPath(sub[j]);
                    }
                }
            }
        }
    }
    jobConf.setOutputPath(tempDir);
    jobConf.setMapperClass(MapClass.class);

    JobClient client = new JobClient(jobConf);
    ClusterStatus cluster = client.getClusterStatus();
    jobConf.setNumMapTasks(cluster.getMapTasks());
    // Map-only job: no reduce phase.
    jobConf.setNumReduceTasks(0);
    JobClient.runJob(jobConf);
    fs.delete(tempDir);
    fs.close();
}
[/ol]

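In the code above we first create a JobConf object, then set the InputPath and OutputPath, tell MapReduce which Map class to use, and set how many map and reduce tasks to run. Finally we submit the job through JobClient. More detailed material on MapReduce is available on the Hadoop Wiki.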
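The source of AccessLogParser is not listed in this article, so the parsing step can only be sketched. The class below applies the regular expression quoted in the description of the Map program to a single line, assuming the log is in Apache's combined format. `AccessLogLineSketch` and `parse` are hypothetical names, the sample line is made up, and the `]` inside the character class is escaped here for clarity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the parsing done inside AccessLogParser.
class AccessLogLineSketch {

    // Groups: 1 ip, 2 identity, 3 user, 4 time, 5 request line,
    //         6 status code, 7 byte size, 8 referrer, 9 user agent.
    private static final Pattern LINE = Pattern.compile(
        "([^ ]*) ([^ ]*) ([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\" ([^ ]*) ([^ ]*) "
        + "\"([^\"]*)\" \"([^\"]*)\".*");

    // Returns the nine captured fields, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        String line = "192.168.2.10 - - [22/Jun/2023:22:14:38 +0800] "
            + "\"GET /index.html HTTP/1.1\" 200 1043 "
            + "\"http://example.com/\" \"Mozilla/5.0\"";
        String[] f = parse(line);
        // The request line splits further into method, URL and protocol.
        String[] request = f[4].split(" ");
        System.out.println("ip=" + f[0] + " method=" + request[0]
            + " url=" + request[1] + " protocol=" + request[2] + " code=" + f[5]);
    }
}
```

Returning null for a non-matching line is one possible choice. The Map program in this article instead catches a ParseException, which suggests the real parser throws on malformed input.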

Download: source code and a precompiled jar file. The command for running the example is:

bin/hadoop jar examples.jar logfetcher [table]

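How do we run the application above? Assume the Hadoop distribution has been unpacked into %HADOOP%.

Copy the files under %HADOOP%\contrib\hbase\bin to %HADOOP%\bin, the files under %HADOOP%\contrib\hbase\conf to %HADOOP%\conf, the files under %HADOOP%\src\contrib\hbase\lib to %HADOOP%\lib, and %HADOOP%\src\contrib\hbase\hadoop-*-hbase.jar to %HADOOP%\lib. Then edit the HBase configuration file and set hbase.master, for example 192.168.2.92:60000. Distribute these files to the machines that run Hadoop, and add the addresses of those machines to the regionservers file. Run bin/start-hbase.sh to start HBase, copy your Apache log files into the apache-log directory on HDFS, and once startup has finished run the following command.

bin/hadoop jar examples.jar logfetcher apache-log apache

Visit localhost:50030/ to see the status of your MapReduce job, and localhost:60010/ to see the status of HBase.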

Published: 2023-06-22 22:14:38

Article link: https://www.wtabcd.cn/fanwen/fan/82/1016489.html
