Installing Hadoop on Ubuntu


Before installing Hadoop you will need:

1. Java 1.6.x (Sun's JDK is preferred; 1.5.x also works)

2. ssh

Install ssh and rsync:


$ sudo apt-get install ssh
$ sudo apt-get install rsync





Download Hadoop

Download the latest release from http://hadoop.apache.org/core/releases.html



It is best to create a dedicated user for Hadoop, for example a user named hadoop in a group named hadoop:


$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop



Unpack the downloaded Hadoop archive into /home/hadoop and rename the extracted directory to hadoop.
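The unpack-and-rename step can be sketched as below. So that the sketch runs anywhere, it builds a throwaway archive first; with a real download you would only run the last two commands in /home/hadoop against the tarball you fetched (the version number 0.20.2 here is purely an example):

```shell
# Demo of the unpack-and-rename step using a throwaway archive.
# With a real download, run only the tar xzf / mv lines in /home/hadoop
# against the release you fetched (0.20.2 below is an assumed version).
WORK=$(mktemp -d)
mkdir "$WORK/hadoop-0.20.2"                      # stand-in for the release contents
tar czf "$WORK/hadoop-0.20.2.tar.gz" -C "$WORK" hadoop-0.20.2
rm -r "$WORK/hadoop-0.20.2"

tar xzf "$WORK/hadoop-0.20.2.tar.gz" -C "$WORK"  # unpack the archive
mv "$WORK/hadoop-0.20.2" "$WORK/hadoop"          # rename so paths read .../hadoop
ls "$WORK"
```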

Configure JAVA_HOME:


gedit ~/hadoop/conf/hadoop-env.sh







# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun



Change it to your Java installation directory (mine is /usr/lib/jvm/java-6-sun-1.6.0.15):


# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
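A quick way to sanity-check the value before starting Hadoop (the path below is the one from this article; substitute your own installation directory):

```shell
# Check that JAVA_HOME actually points at a JDK (the path is an example).
JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
if [ -x "$JAVA_HOME/bin/java" ]; then
    "$JAVA_HOME/bin/java" -version
else
    echo "No JDK found at $JAVA_HOME - fix JAVA_HOME in conf/hadoop-env.sh"
fi
```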




Hadoop can now be run in standalone (single-node) mode:


$ cd hadoop
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
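The example job above is essentially a distributed grep: it scans the copied config files for strings matching the regex dfs[a-z.]+ and counts the matches. A rough single-machine approximation with plain grep, using a scratch directory rather than the real input:

```shell
# Approximate what the Hadoop grep example computes, using plain grep:
# extract every match of dfs[a-z.]+ and count occurrences of each.
mkdir -p /tmp/grep-demo
printf 'dfs.replication\ndfs.name.dir\nunrelated line\n' > /tmp/grep-demo/sample.txt
grep -ohE 'dfs[a-z.]+' /tmp/grep-demo/*.txt | sort | uniq -c
```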



Running in pseudo-distributed mode:



Configure ssh:


$ su - hadoop
$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu



Allow ssh login without a password:


hadoop@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
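If key-based login still prompts for a password, the usual culprit is permissions: sshd refuses to use authorized_keys when the file or ~/.ssh is writable by others. A quick fix, assuming the default ~/.ssh layout:

```shell
# Tighten permissions so sshd will accept the key (default ~/.ssh layout assumed).
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"                  # directory: owner-only access
chmod 600 "$SSH_DIR/authorized_keys"  # key file: owner read/write only
```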



Then try:


$ ssh localhost



and check that you are logged in without being prompted for a password.





Hadoop configuration files:

conf/core-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>



Set hadoop.tmp.dir to whatever path you prefer; ${user.name} expands automatically to the name of the user running Hadoop.
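For example, if the daemons run as the hadoop user created earlier, the effective data directory would be the path printed below (illustration only; at runtime the value comes from the JVM's user.name property):

```shell
# Show the path hadoop.tmp.dir resolves to for a given user (illustration).
user=hadoop   # stands in for the JVM's user.name property
echo "/home/hadoop/hadoop-datastore/hadoop-$user"
```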



conf/hdfs-site.xml


<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>



dfs.replication is the default number of replicas kept for each block.

conf/mapred-site.xml



<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>



Running Hadoop



Format the distributed filesystem:


$ bin/hadoop namenode -format



Start Hadoop:



$ bin/start-all.sh



You can view the NameNode and JobTracker web interfaces at:


NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/



Run the example job:




$ bin/hadoop fs -put conf input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'



Examine the results:

$ bin/hadoop fs -get output output
$ cat output/*



References:

1. http://hadoop.apache.org/common/docs/current/quickstart.html
2. http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29