Quick Setup Steps for Installing & Integrating Spark, Hive and Hadoop Together!


If you want an in-depth understanding, go to my previous blog posts and install everything separately.

You can install all the packages separately, or install them together using the following commands.

SSH:                                                                             # for passwordless interaction between all the services

 - ssh-keygen -t rsa -P ""
 - cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
 - ssh-add -l                                             # if this command fails, just continue with the next step
 - ssh localhost
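
If the key setup worked, the 'ssh localhost' step above should log you in without asking for a password. A quick one-off check (a small sketch):

 - ssh localhost "echo passwordless ssh is working"       # prints the message with no password prompt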


Installation and Configuration for Spark, Hive and Hadoop:

Make a separate directory for installing all the packages.

You can use the 'mkdir' command to create the directory.

 - mkdir /home/user/Dilli/Installed/All_Setup
 - cd /home/user/Dilli/Installed/All_Setup
 - java -version
 - readlink -f /usr/bin/java | sed "s:bin/java::"
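
The readlink command above prints where your Java installation actually lives; this is the path you will reuse for JAVA_HOME later. A small sketch of capturing it (assuming OpenJDK 8 is installed; drop any trailing '/jre/' when you set JAVA_HOME):

 - JAVA_PATH=$(readlink -f /usr/bin/java | sed "s:bin/java::")
 - echo $JAVA_PATH                                        # e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/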

 - wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz
 - tar -xzvf hadoop-2.9.0.tar.gz
 - mv hadoop-2.9.0 hadoop

 - wget http://www-eu.apache.org/dist/hive/hive-3.0.0/apache-hive-3.0.0-bin.tar.gz
 - tar -xzvf apache-hive-3.0.0-bin.tar.gz
 - mv apache-hive-3.0.0-bin hive

 - cd /home/user/Dilli/Installed/All_Setup                # make sure you are back in the setup directory (SPARK_HOME points here later)
 - wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
 - tar -xzvf spark-2.3.0-bin-hadoop2.7.tgz
 - mv spark-2.3.0-bin-hadoop2.7 spark
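
At this point the setup directory should contain the three extracted folders; a quick check (assuming the paths used above):

 - ls /home/user/Dilli/Installed/All_Setup                # should list hadoop, hive and spark, plus the downloaded archives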

# If you haven't installed SciTE yet, install it using the following command.
 - sudo apt-get install scite

 - scite ~/.bashrc
        # Add the following lines to the file
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Set Hadoop-related environment variables
export HADOOP_HOME=/home/user/Dilli/Installed/All_Setup/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

export HADOOP_CONF=/home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop
export PATH=$PATH:$HADOOP_CONF/bin

export HADOOP_PREFIX=/home/user/Dilli/Installed/All_Setup/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

# Set Spark-related environment variables
export SPARK_CONF=/home/user/Dilli/Installed/All_Setup/spark/conf
export PATH=$PATH:$SPARK_CONF/bin

export SPARK_HOME=/home/user/Dilli/Installed/All_Setup/spark
export PATH=$PATH:$SPARK_HOME/bin

# Set Hive-related environment variables
export HIVE_HOME=/home/user/Dilli/Installed/All_Setup/hive
export PATH=$PATH:$HIVE_HOME/bin


 - . ~/.bashrc                                       # don't forget this step; it reloads the new environment variables
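
To confirm the new environment variables are picked up, a quick check (a minimal sketch; paths assume the layout above):

 - echo $HADOOP_HOME $SPARK_HOME $HIVE_HOME
 - hadoop version                                         # should report Hadoop 2.9.0
 - spark-submit --version                                 # should report Spark 2.3.0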

 - sudo apt-get install mysql-server                 # MySQL will hold the Hive metastore

 - sudo apt-get install libmysql-java                # provides /usr/share/java/mysql-connector-java.jar


Hadoop Configurations:

- cd /home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop

 - vim core-site.xml
# Add the following,
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
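
fs.defaultFS tells every HDFS client where the NameNode lives. You can read the effective value back with a quick check like this (a small sketch):

 - hdfs getconf -confKey fs.defaultFS                     # should print hdfs://localhost:8020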


 - vim hdfs-site.xml
# Add the following,
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/user/Dilli/Installed/All_Setup/process_data/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/user/Dilli/Installed/All_Setup/process_data/name</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:50090</value>
</property>
</configuration>
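
For reference, dfs.replication is 1 because this is a single-node setup, and the dfs.blocksize value of 268435456 bytes is 256 MB. Any of these values can be read back the same way (a small sketch):

 - hdfs getconf -confKey dfs.blocksize                    # should print 268435456
 - hdfs getconf -confKey dfs.replication                  # should print 1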


 - vim hadoop-env.sh
# Change the 'JAVA_HOME' path
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"


Hive Configurations:


 - cd /home/user/Dilli/Installed/All_Setup/hive/conf

# To improve performance you can add some more properties to hive-site.xml.
   For details, see my post "Performance Improvement in Hive (Properties in HIVE-SITE.XML)"
   and add the properties your workload needs. When specifying the metastore database, make sure you use the correct database name.

 - vim hive-site.xml
# Create a 'hive-site.xml' file and add the following,

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore_db_3?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
  <description>password to use against metastore database</description>
</property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
    </description>
  </property>

  <property>
    <name>hive.metastore.schema.verification.record.version</name>
    <value>true</value>
    <description>
      When true the current MS version is recorded in the VERSION table. If this is disabled and verification is
       enabled the MS will be unusable.
    </description>
  </property>

<property>
    <name>hive.execution.engine</name>
    <value>mr</value>
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>
</configuration>
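
As an alternative to sourcing the schema file manually in MySQL (covered further below), Hive 3 ships a schematool that can initialise the metastore using the JDBC settings from this hive-site.xml. A hedged sketch, to be run only after the MySQL user and connector jar are in place:

 - $HIVE_HOME/bin/schematool -dbType mysql -initSchema    # initialises the metastore schema in MySQL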


Copy hive-site.xml:

 
  # Copy 'hive-site.xml' from '/home/user/Dilli/Installed/All_Setup/hive/conf' path to '/home/user/Dilli/Installed/All_Setup/spark/conf' path

   - cp /home/user/Dilli/Installed/All_Setup/hive/conf/hive-site.xml /home/user/Dilli/Installed/All_Setup/spark/conf/hive-site.xml
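
Copying hive-site.xml into Spark's conf directory is what lets Spark SQL talk to the same MySQL-backed metastore as Hive. Once everything has been started (see the start-up section below), you can confirm it with something like this (a minimal sketch; the databases listed depend on what you have created):

 - hive -e "SHOW DATABASES;"
 - spark-sql -e "SHOW DATABASES;"                         # should list the same databases as the Hive CLI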


Hadoop/Spark Configuration - DataNode/slave node addresses in the 'slaves' files:

 - vim /home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop/slaves
localhost

 - vim /home/user/Dilli/Installed/All_Setup/spark/conf/slaves
localhost


Create the directories where the NameNode and DataNode will store their data, and assign ownership
-----------------------------------
- cd /home/user/Dilli/Installed/All_Setup/
- mkdir process_data
- cd process_data

# NameNode
- mkdir name
- chown -R user:user name        # set the owner to the user that will run Hadoop; adjust 'user:user' to match your own user and group

# DataNode
- mkdir data
- chown -R user:user data



Create the Hive user in MySQL (the user name and password must match the values in hive-site.xml):

 - mysql -u root -proot123
CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
GRANT ALL ON *.* TO 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
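
You can quickly verify that the new user can log in and that the credentials match hive-site.xml (a small sketch):

 - mysql -u hiveuser -phivepassword -e "SELECT CURRENT_USER();"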

Create the metastore in MySQL:

 Make sure you point to the correct schema: if you are using Hive 3, source the schema below; otherwise source the schema file that matches your Hive version.

 - cd /home/user/Dilli/Installed/All_Setup/hive/scripts/metastore/upgrade/mysql

 - mysql -u hiveuser -phivepassword
create database metastore_db_3;
use metastore_db_3;
source hive-schema-3.0.0.mysql.sql
exit;
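
If the schema was sourced correctly, the metastore database now contains Hive's internal tables; a quick check (a small sketch):

 - mysql -u hiveuser -phivepassword -e "USE metastore_db_3; SHOW TABLES;"     # should list tables such as DBS, TBLS and VERSION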

 Hive conf - add the MySQL connector jar to Hive's lib directory:

 - ll /usr/share/java/mysql-connector-java.jar

 - cp /usr/share/java/mysql-connector-java.jar /home/user/Dilli/Installed/All_Setup/hive/lib

 Spark conf - add the MySQL connector jar to Spark's jars directory:

 - cd /home/user/Dilli/Installed/All_Setup/spark/conf
 - cp /usr/share/java/mysql-connector-java.jar /home/user/Dilli/Installed/All_Setup/spark/jars
 - cp  spark-env.sh.template  spark-env.sh

 - vim spark-env.sh
# add the line
SPARK_CLASSPATH=/usr/share/java/mysql-connector-java.jar
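
Note that Spark 2.x may warn that SPARK_CLASSPATH is deprecated. As an alternative (a sketch, not from the original steps), you can point Spark at the jar through spark-defaults.conf instead; copying the jar into spark/jars as done above is usually enough on its own:

 - cp spark-defaults.conf.template spark-defaults.conf
 - echo "spark.driver.extraClassPath /usr/share/java/mysql-connector-java.jar" >> spark-defaults.conf
 - echo "spark.executor.extraClassPath /usr/share/java/mysql-connector-java.jar" >> spark-defaults.conf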

 - cp log4j.properties.template log4j.properties


If the Hive databases are not showing, or Hive fails with a NoClassDefFoundError for Scala classes, add the following lines to the 'hive' launcher script in the bin directory

 - cd /home/user/Dilli/Installed/All_Setup/hive/bin

 - vim hive
# add Spark's jars to Hive's classpath
for f in ${SPARK_HOME}/jars/*.jar; do
     CLASSPATH=${CLASSPATH}:$f;
done
# reference url: http://stackoverflow.com/questions/38345447/apache-hive-exception-noclassdeffounderror-scala-collection-iterable.


Hadoop NameNode format and start-up:

 - cd /home/user/Dilli/Installed/All_Setup/hadoop
 - hadoop namenode -format                 # format the NameNode (first run only)
 - sbin/start-all.sh                       # starts the HDFS and YARN daemons
 - sbin/start-dfs.sh                       # optional; start-all.sh already starts HDFS
 - spark-shell                             # quick check that the Spark shell starts (exit with :quit)
 - hive                                    # quick check that the Hive CLI starts (exit with quit;)
 - cd ../spark/sbin/
 - ./start-all.sh                          # starts the Spark master and worker
 - jps                                     # lists the running Java daemons

You should see output something like this:
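
On a single-node setup like this, the jps listing typically includes the following daemons (process IDs are omitted here and will differ on your machine):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Master
Worker
Jps

You can also check the web UIs: for these versions the NameNode UI normally listens on port 50070 and the Spark master UI on port 8080.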











If the DataNode is not starting, remove its old data directory using the step below. This usually happens when the NameNode has been re-formatted and the DataNode still holds metadata from the previous cluster ID.

Make sure Hadoop is stopped first.

sudo rm -r /home/user/Dilli/Installed/All_Setup/process_data/data/current
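
After removing the directory, restart HDFS and check that the DataNode comes up (a small sketch, assuming the paths used above):

 - cd /home/user/Dilli/Installed/All_Setup/hadoop
 - sbin/start-dfs.sh
 - jps                                                    # DataNode should now appear in the list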


That's it! You have successfully installed all three packages and integrated them together.

Thanks,
Have fun!!!!

Please leave a comment below if you need any clarification regarding the installation, or if you have any suggestions.








