Quick Setup Steps for Installing and Integrating Spark, Hive and Hadoop Together!
If you want an in-depth understanding, go to my previous blogs and install each component separately.
You can install all the packages separately, or install them together using the following commands.
SSH: # for passwordless login to localhost (Hadoop uses SSH to start its daemons)
- ssh-keygen -t rsa -P ""
- cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
- ssh-add -l # optional: if passwordless login is not working, run this to see whether your key is loaded
- ssh localhost
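To double-check that key-based login really works without a password prompt (this check is my own addition, not part of the original steps), you can run:
- ssh -o BatchMode=yes localhost echo "ssh ok" # prints 'ssh ok' only if passwordless login works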
Install and Configuration for Spark, Hive and Hadoop:
Make a separate directory for installing all the packages.
You can use the 'mkdir' command to create it.
- mkdir /home/user/Dilli/Installed/All_Setup
- cd /home/user/Dilli/Installed/All_Setup
- java -version
- readlink -f /usr/bin/java | sed "s:bin/java::"
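If 'java -version' reports that Java is not installed, you can install OpenJDK 8 first (assuming a Debian/Ubuntu system, since the rest of this post uses apt-get):
- sudo apt-get install openjdk-8-jdk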
- wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz
- tar -xzvf hadoop-2.9.0.tar.gz
- mv hadoop-2.9.0 hadoop
- wget http://www-eu.apache.org/dist/hive/hive-3.0.0/apache-hive-3.0.0-bin.tar.gz
- tar -xzvf apache-hive-3.0.0-bin.tar.gz
- mv apache-hive-3.0.0-bin hive
- cd /home/user/Dilli/Installed/All_Setup # stay in the setup directory so Spark ends up next to Hadoop and Hive
- wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
- tar -xzvf spark-2.3.0-bin-hadoop2.7.tgz
- mv spark-2.3.0-bin-hadoop2.7 spark
# If you don't have SCITE (or any other text editor) installed,
then install it using the following command.
- sudo apt-get install scite
- scite ~/.bashrc
# Add the Following to the file
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/user/Dilli/Installed/All_Setup/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=/home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop
export HADOOP_PREFIX=/home/user/Dilli/Installed/All_Setup/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
# Set Spark-related environment variables
export SPARK_CONF_DIR=/home/user/Dilli/Installed/All_Setup/spark/conf
export SPARK_HOME=/home/user/Dilli/Installed/All_Setup/spark
export PATH=$PATH:$SPARK_HOME/bin
# Set Hive-related environment variables
export HIVE_HOME=/home/user/Dilli/Installed/All_Setup/hive
export PATH=$PATH:$HIVE_HOME/bin
- . ~/.bashrc # don't forget this step, it reloads the environment variables into the current shell
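As a quick sanity check (my own addition), confirm the variables were picked up and the binaries are on the PATH:
- echo $HADOOP_HOME $SPARK_HOME $HIVE_HOME
- hadoop version # should report Hadoop 2.9.0
- spark-submit --version # should report Spark 2.3.0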
- sudo apt-get install mysql-server # MySQL will hold the Hive metastore
- sudo apt-get install libmysql-java # MySQL JDBC connector jar
Hadoop Configurations:
- cd /home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop
- vim core-site.xml
# Add the following,
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
- vim hdfs-site.xml
# Add the following,
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/user/Dilli/Installed/All_Setup/process_data/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/user/Dilli/Installed/All_Setup/process_data/name</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:50090</value>
</property>
</configuration>
- vim hadoop-env.sh
# Change the 'JAVA_HOME' path
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
Hive Configurations:
- cd /home/user/Dilli/Installed/All_Setup/hive/conf
# To improve performance you can add some more properties to speed up processing.
# For that, see my post "Performance Improvement in Hive (Properties in HIVE-SITE.XML)"
# and add the properties your workload needs. When configuring the metastore, specify the correct database name.
- vim hive-site.xml
# Create 'hive-site.xml' (it does not exist by default) and add the following,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore_db_3?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
</description>
</property>
<property>
<name>hive.metastore.schema.verification.record.version</name>
<value>true</value>
<description>
When true the current MS version is recorded in the VERSION table. If this is disabled and verification is
enabled the MS will be unusable.
</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
<description>
Expects one of [mr, tez, spark].
Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
remains the default engine for historical reasons, it is itself a historical engine
and is deprecated in Hive 2 line. It may be removed without further warning.
</description>
</property>
</configuration>
Copy hive-site.xml:
# Copy 'hive-site.xml' from '/home/user/Dilli/Installed/All_Setup/hive/conf' path to '/home/user/Dilli/Installed/All_Setup/spark/conf' path
- cp /home/user/Dilli/Installed/All_Setup/hive/conf/hive-site.xml /home/user/Dilli/Installed/All_Setup/spark/conf/hive-site.xml
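As an optional alternative (my suggestion, not from the original steps), you can symlink the file instead of copying it, so that later edits to Hive's hive-site.xml are automatically picked up by Spark:
- ln -sf /home/user/Dilli/Installed/All_Setup/hive/conf/hive-site.xml /home/user/Dilli/Installed/All_Setup/spark/conf/hive-site.xml
The plain cp above works fine as well; just remember to copy the file again if you change it later.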
Hadoop/Spark Configuration - DataNode/slave node addresses in the slaves files:
- vim /home/user/Dilli/Installed/All_Setup/hadoop/etc/hadoop/slaves
localhost
- vim /home/user/Dilli/Installed/All_Setup/spark/conf/slaves
localhost
Create the directories where the NameNode and DataNode store their data, and assign ownership:
- cd /home/user/Dilli/Installed/All_Setup/
- mkdir process_data
- cd process_data
# NameNode
- mkdir name
- chown -R user:user name # use your own user and group, e.g. dilli:dilli if your user is 'dilli'
#Datanode
- mkdir data
- chown -R user:user data
Create the Hive user in MySQL:
# the username/password must match javax.jdo.option.ConnectionUserName / ConnectionPassword in hive-site.xml
- mysql -u root -proot123 # use your own MySQL root password
CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
GRANT ALL ON *.* TO 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
Create the metastore in MySQL:
Make sure you point to the correct schema script: for Hive 3 use the one below, otherwise use the script matching your Hive version.
- cd /home/user/Dilli/Installed/All_Setup/hive/scripts/metastore/upgrade/mysql
- mysql -u hiveuser -phivepassword
create database metastore_db_3;
use metastore_db_3;
source hive-schema-3.0.0.mysql.sql
exit;
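To confirm that the schema was created (my own addition), list the metastore tables; you should see tables such as DBS, TBLS and VERSION:
- mysql -u hiveuser -phivepassword -e "use metastore_db_3; show tables;"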
Hive conf - add the MySQL connector jar to Hive's lib directory:
- ll /usr/share/java/mysql-connector-java.jar
- cp /usr/share/java/mysql-connector-java.jar /home/user/Dilli/Installed/All_Setup/hive/lib
Spark conf - add the MySQL connector jar to Spark's jars directory:
- cd /home/user/Dilli/Installed/All_Setup/spark/conf
- cp /usr/share/java/mysql-connector-java.jar /home/user/Dilli/Installed/All_Setup/spark/jars
- cp spark-env.sh.template spark-env.sh
- vim spark-env.sh
# add the line
SPARK_CLASSPATH=/usr/share/java/mysql-connector-java.jar
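Note that SPARK_CLASSPATH is deprecated in Spark 2.x. As an alternative (my suggestion, not from the original steps), you can point the driver at the connector jar through spark-defaults.conf instead:
- cp spark-defaults.conf.template spark-defaults.conf
- vim spark-defaults.conf
# add the line
spark.driver.extraClassPath /usr/share/java/mysql-connector-java.jar
Copying the jar into spark/jars as done above already makes it visible to Spark, so this is simply a cleaner place to declare it.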
- cp log4j.properties.template log4j.properties
If the Hive databases are not showing up (or Hive fails with a NoClassDefFoundError for Scala classes), add the following lines to the 'hive' script:
- cd /home/user/Dilli/Installed/All_Setup/hive/bin
- vim hive
for f in ${SPARK_HOME}/jars/*.jar; do
CLASSPATH=${CLASSPATH}:$f;
done
# reference url: http://stackoverflow.com/questions/38345447/apache-hive-exception-noclassdeffounderror-scala-collection-iterable.
Hadoop NameNode format and startup:
- cd /home/user/Dilli/Installed/All_Setup/hadoop
- hadoop namenode -format # only the first time, this wipes HDFS
- sbin/start-all.sh # starts the HDFS and YARN daemons
- sbin/start-dfs.sh # not strictly needed after start-all.sh
- spark-shell # quick check that Spark starts
- hive # quick check that Hive starts
- cd ../spark/sbin/
- ./start-all.sh # starts the Spark standalone master and worker
- jps
You will see output something like this...
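For a single-node setup like this one, the jps listing typically contains entries like the following (the numbers are example PIDs, not from a real run):
12101 NameNode
12245 DataNode
12410 SecondaryNameNode
12590 ResourceManager
12688 NodeManager
12850 Master
12902 Worker
12990 Jps
Once everything is up, a quick way (my own addition) to confirm that Spark and Hive are really sharing the MySQL-backed metastore is to list the databases from both sides and compare:
- spark-sql -e "show databases;" # should list at least 'default'
- hive -e "show databases;" # should show the same list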
If the DataNode is not starting, remove its old data files using the step below.
Make sure Hadoop is stopped first.
- sudo rm -r /home/user/Dilli/Installed/All_Setup/process_data/data/current
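After removing the stale directory (a common fix when the NameNode has been reformatted and the DataNode's cluster ID no longer matches), restart HDFS and check again; this follow-up is my own addition:
- cd /home/user/Dilli/Installed/All_Setup/hadoop
- sbin/start-dfs.sh
- jps # DataNode should now appear in the list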
That's it, you have successfully installed all three components and integrated them together.
Thanks,
Have fun!
Please leave a comment below if you need any clarification about the installation, or if you have any suggestions.