Quick Setup Steps for Installing & Integrating Spark, Hive and Hadoop Together!


If you want an in-depth understanding, go to my previous blog posts and install everything separately.

You can install all the packages separately, or install them together using the following commands.

SSH:                                                                             # for passwordless interaction between all the services

 - ssh-keygen -t rsa -P ""
 - cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
 - ssh-add -l                                             # if this command fails, just continue with the next step
 - ssh localhost
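
If the key setup worked, the 'ssh localhost' step above should log you in without asking for a password. A quick one-off check (a small sketch):

 - ssh localhost "echo passwordless ssh is working"       # prints the message with no password prompt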


Installation and Configuration for Spark, Hive and Hadoop:

Make a separate directory for installing all the packages.

You can use the 'mkdir' command to create the directory.

 - mkdir /home/user/Dilli/Installed/All_Setup
 - cd /home/user/Dilli/Installed/All_Setup
 - java -version
 - readlink -f /usr/bin/java | sed "s:bin/java::"
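
The readlink command above prints where your Java installation actually lives; this is the path you will reuse for JAVA_HOME later. A small sketch of capturing it (assuming OpenJDK 8 is installed; drop any trailing '/jre/' when you set JAVA_HOME):

 - JAVA_PATH=$(readlink -f /usr/bin/java | sed "s:bin/java::")
 - echo $JAVA_PATH                                        # e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/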

 - wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz
 - tar -xzvf hadoop-2.9.0.tar.gz
 - mv hadoop-2.9.0 hadoop

 - wget http://www-eu.apache.org/dist/hive/hive-3.0.0/apache-hive-3.0.0-bin.tar.gz
 - tar -xzvf apache-hive-3.0.0-bin.tar.gz
 - mv apache-hive-3.0.0-bin hive

 - cd /home/user/Dilli/Installed/All_Setup                # make sure you are back in the setup directory (SPARK_HOME points here later)
 - wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
 - tar -xzvf spark-2.3.0-bin-hadoop2.7.tgz
 - mv spark-2.3.0-bin-hadoop2.7 spark
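
At this point the setup directory should contain the three extracted folders; a quick check (assuming the paths used above):

 - ls /home/user/Dilli/Installed/All_Setup                # should list hadoop, hive and spark, plus the downloaded archives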

# If you haven't installed SciTE yet, install it using the following command.
 - sudo apt-get install scite

 - scite ~/.bashrc
        # Add the following lines to the file
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Set Hadoop-related environment variables
export HADOOP_HOME=/home/user/Dilli/Installed/All_Setup/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

export HADOOP_CONF=/home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop
export PATH=$PATH:$HADOOP_CONF/bin

export HADOOP_PREFIX=/home/user/Dilli/Installed/All_Setup/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

# Set Spark-related environment variables
export SPARK_CONF=/home/user/Dilli/Installed/All_Setup/spark/conf
export PATH=$PATH:$SPARK_CONF/bin

export SPARK_HOME=/home/user/Dilli/Installed/All_Setup/spark
export PATH=$PATH:$SPARK_HOME/bin

# Set Hive-related environment variables
export HIVE_HOME=/home/user/Dilli/Installed/All_Setup/hive
export PATH=$PATH:$HIVE_HOME/bin


 - . ~/.bashrc                                       # don't forget this step; it reloads the new environment variables
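
To confirm the new environment variables are picked up, a quick check (a minimal sketch; paths assume the layout above):

 - echo $HADOOP_HOME $SPARK_HOME $HIVE_HOME
 - hadoop version                                         # should report Hadoop 2.9.0
 - spark-submit --version                                 # should report Spark 2.3.0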

 - sudo apt-get install mysql-server                 # MySQL will hold the Hive metastore

 - sudo apt-get install libmysql-java                # provides /usr/share/java/mysql-connector-java.jar


Hadoop Configurations:

- cd /home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop

 - vim core-site.xml
# Add the following,
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
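
fs.defaultFS tells every HDFS client where the NameNode lives. You can read the effective value back with a quick check like this (a small sketch):

 - hdfs getconf -confKey fs.defaultFS                     # should print hdfs://localhost:8020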


 - vim hdfs-site.xml
# Add the following,
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/user/Dilli/Installed/All_Setup/process_data/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/user/Dilli/Installed/All_Setup/process_data/name</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:50090</value>
</property>
</configuration>
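
For reference, dfs.replication is 1 because this is a single-node setup, and the dfs.blocksize value of 268435456 bytes is 256 MB. Any of these values can be read back the same way (a small sketch):

 - hdfs getconf -confKey dfs.blocksize                    # should print 268435456
 - hdfs getconf -confKey dfs.replication                  # should print 1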


 - vim hadoop-env.sh
# Change the 'JAVA_HOME' path
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"


Hive Configurations:


 - cd /home/user/Dilli/Installed/All_Setup/hive/conf

# To improve performance you can add some more properties to hive-site.xml.
   For details, see my post "Performance Improvement in Hive (Properties in HIVE-SITE.XML)"
   and add the properties your workload needs. When specifying the metastore database, make sure you use the correct database name.

 - vim hive-site.xml
# Create a 'hive-site.xml' file and add the following,

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore_db_3?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
  <description>password to use against metastore database</description>
</property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
    </description>
  </property>

  <property>
    <name>hive.metastore.schema.verification.record.version</name>
    <value>true</value>
    <description>
      When true the current MS version is recorded in the VERSION table. If this is disabled and verification is
       enabled the MS will be unusable.
    </description>
  </property>

<property>
    <name>hive.execution.engine</name>
    <value>mr</value>
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>
</configuration>
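
As an alternative to sourcing the schema file manually in MySQL (covered further below), Hive 3 ships a schematool that can initialise the metastore using the JDBC settings from this hive-site.xml. A hedged sketch, to be run only after the MySQL user and connector jar are in place:

 - $HIVE_HOME/bin/schematool -dbType mysql -initSchema    # initialises the metastore schema in MySQL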


Copy hive-site.xml:

 
  # Copy 'hive-site.xml' from '/home/user/Dilli/Installed/All_Setup/hive/conf' path to '/home/user/Dilli/Installed/All_Setup/spark/conf' path

   - cp /home/user/Dilli/Installed/All_Setup/hive/conf/hive-site.xml /home/user/Dilli/Installed/All_Setup/spark/conf/hive-site.xml
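
Copying hive-site.xml into Spark's conf directory is what lets Spark SQL talk to the same MySQL-backed metastore as Hive. Once everything has been started (see the start-up section below), you can confirm it with something like this (a minimal sketch; the databases listed depend on what you have created):

 - hive -e "SHOW DATABASES;"
 - spark-sql -e "SHOW DATABASES;"                         # should list the same databases as the Hive CLI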


Hadoop/Spark Configuration - DataNode/slave node addresses in the 'slaves' files:

 - vim /home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop/slaves
localhost

 - vim /home/user/Dilli/Installed/All_Setup/spark/conf/slaves
localhost


Create the directories where the NameNode and DataNode will store their data, and assign ownership
-----------------------------------
- cd /home/user/Dilli/Installed/All_Setup/
- mkdir process_data
- cd process_data

# NameNode
- mkdir name
- chown -R user:user name        # set the owner to the user that will run Hadoop; adjust 'user:user' to match your own user and group

# DataNode
- mkdir data
- chown -R user:user data



Create the Hive user in MySQL (the user name and password must match the values in hive-site.xml):

 - mysql -u root -proot123
CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
GRANT ALL ON *.* TO 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
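
You can quickly verify that the new user can log in and that the credentials match hive-site.xml (a small sketch):

 - mysql -u hiveuser -phivepassword -e "SELECT CURRENT_USER();"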

Create the metastore in MySQL:

 Make sure you point to the correct schema: if you are using Hive 3, source the schema below; otherwise source the schema file that matches your Hive version.

 - cd /home/user/Dilli/Installed/All_Setup/hive/scripts/metastore/upgrade/mysql

 - mysql -u hiveuser -phivepassword
create database metastore_db_3;
use metastore_db_3;
source hive-schema-3.0.0.mysql.sql
exit;
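
If the schema was sourced correctly, the metastore database now contains Hive's internal tables; a quick check (a small sketch):

 - mysql -u hiveuser -phivepassword -e "USE metastore_db_3; SHOW TABLES;"     # should list tables such as DBS, TBLS and VERSION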

 Hive conf - add the MySQL connector jar to Hive's lib directory:

 - ll /usr/share/java/mysql-connector-java.jar

 - cp /usr/share/java/mysql-connector-java.jar /home/user/Dilli/Installed/All_Setup/hive/lib

 Spark conf - add the MySQL connector jar to Spark's jars directory:

 - cd /home/user/Dilli/Installed/All_Setup/spark/conf
 - cp /usr/share/java/mysql-connector-java.jar /home/user/Dilli/Installed/All_Setup/spark/jars
 - cp  spark-env.sh.template  spark-env.sh

 - vim spark-env.sh
# add the line
SPARK_CLASSPATH=/usr/share/java/mysql-connector-java.jar
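
Note that Spark 2.x may warn that SPARK_CLASSPATH is deprecated. As an alternative (a sketch, not from the original steps), you can point Spark at the jar through spark-defaults.conf instead; copying the jar into spark/jars as done above is usually enough on its own:

 - cp spark-defaults.conf.template spark-defaults.conf
 - echo "spark.driver.extraClassPath /usr/share/java/mysql-connector-java.jar" >> spark-defaults.conf
 - echo "spark.executor.extraClassPath /usr/share/java/mysql-connector-java.jar" >> spark-defaults.conf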

 - cp log4j.properties.template log4j.properties


If the Hive databases are not showing, or Hive fails with a NoClassDefFoundError for Scala classes, add the following lines to the 'hive' launcher script in the bin directory

 - cd /home/user/Dilli/Installed/All_Setup/hive/bin

 - vim hive
# add Spark's jars to Hive's classpath
for f in ${SPARK_HOME}/jars/*.jar; do
     CLASSPATH=${CLASSPATH}:$f;
done
# reference url: http://stackoverflow.com/questions/38345447/apache-hive-exception-noclassdeffounderror-scala-collection-iterable.


Hadoop NameNode format and start-up:

 - cd /home/user/Dilli/Installed/All_Setup/hadoop
 - hadoop namenode -format                 # format the NameNode (first run only)
 - sbin/start-all.sh                       # starts the HDFS and YARN daemons
 - sbin/start-dfs.sh                       # optional; start-all.sh already starts HDFS
 - spark-shell                             # quick check that the Spark shell starts (exit with :quit)
 - hive                                    # quick check that the Hive CLI starts (exit with quit;)
 - cd ../spark/sbin/
 - ./start-all.sh                          # starts the Spark master and worker
 - jps                                     # lists the running Java daemons

You should see output something like this:
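
On a single-node setup like this, the jps listing typically includes the following daemons (process IDs are omitted here and will differ on your machine):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Master
Worker
Jps

You can also check the web UIs: for these versions the NameNode UI normally listens on port 50070 and the Spark master UI on port 8080.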











If the DataNode is not starting, remove its old data directory using the step below. This usually happens when the NameNode has been re-formatted and the DataNode still holds metadata from the previous cluster ID.

Make sure Hadoop is stopped first.

sudo rm -r /home/user/Dilli/Installed/All_Setup/process_data/data/current
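
After removing the directory, restart HDFS and check that the DataNode comes up (a small sketch, assuming the paths used above):

 - cd /home/user/Dilli/Installed/All_Setup/hadoop
 - sbin/start-dfs.sh
 - jps                                                    # DataNode should now appear in the list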


That's it! You have successfully installed all three packages and integrated them together.

Thanks,
Have fun!!!!

Please leave a comment below if you need any clarification regarding the installation, or if you have any suggestions.








