
I am running a HDP 2.4 multinode cluster with Ubuntu Trusty 14.04 on all my nodes. The Spark in this post is installed on my client node. My cluster has HDFS and YARN, among other services. Spark normally ships as part of HDP, but this is not the case for Apache Spark 2.0: Hortonworks does not offer Spark 2.0 on HDP 2.4.0, so it has to be installed and configured manually.

The documentation on the latest Spark version can be found here. My notes on Spark 2.0 can be found here (if anyone finds them useful), as can my post on setting up Apache Spark 1.6.0.

Update 12 January 2018: a post on how to install Apache Spark 2.2.1 is available: Apache Spark 2.2.1 on Ubuntu 16.04 – Hadoop-less instance.

Update and upgrade the system and install Java

Add the OpenJDK repository:

sudo add-apt-repository ppa:openjdk-r/ppa

Add JAVA_HOME in the system variables file:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Create user spark and add it to group hadoop. Then create the user's directory in HDFS and change its ownership:

sudo -u hdfs hadoop fs -mkdir -p /user/spark
sudo -u hdfs hadoop fs -chown -R spark:hdfs /user/spark

Spark installation and configuration

Install Spark

Create the directory where the Spark directory is going to reside, then unpack the archive there:

sudo tar -xvzf spark-2.0.0-bin-hadoop2.7.tgz

Remove the tar file after it has been unpacked. Change the ownership of the folder and its elements:

sudo chown -R spark:spark spark-2.0.0-bin-hadoop2.7

Step into the spark-2.0.0 directory and run pwd to get the full path. Update the system environment file by adding SPARK_HOME and adding SPARK_HOME/bin to the PATH:

export SPARK_HOME=/usr/apache/spark-2.0.0-bin-hadoop2.7

Hive configuration

Create a Hive warehouse and give permissions to the users (if the Hive service is set up, the path to the Hive warehouse could be /apps/hive/warehouse):

sudo -u hive hadoop fs -mkdir /user/hive/warehouse

Spark configuration files
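Putting the installation steps described in this post together, the sequence might look like the sketch below. Note that the post shows only some commands explicitly; the Java package name, the spark user-creation command, the /usr/apache location, and the download URL are my assumptions, not taken from the post.

```shell
#!/usr/bin/env bash
# Sketch only -- package name, useradd flags, /usr/apache and the
# download URL are assumptions; adjust to your environment.
set -e

# Java 8 from the OpenJDK PPA (Trusty ships an older Java by default)
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get install -y openjdk-8-jdk             # assumed package name

# Create user spark and add it to group hadoop (assumed command)
sudo useradd -m -G hadoop spark

# HDFS home directory for the spark user
sudo -u hdfs hadoop fs -mkdir -p /user/spark
sudo -u hdfs hadoop fs -chown -R spark:hdfs /user/spark

# Download and unpack Spark 2.0.0 (directory and URL are assumptions)
sudo mkdir -p /usr/apache && cd /usr/apache
sudo wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
sudo tar -xvzf spark-2.0.0-bin-hadoop2.7.tgz
sudo rm spark-2.0.0-bin-hadoop2.7.tgz
sudo chown -R spark:spark spark-2.0.0-bin-hadoop2.7

# System-wide environment variables, appended to /etc/environment;
# also extend the PATH line there with /usr/apache/spark-2.0.0-bin-hadoop2.7/bin
echo 'JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' | sudo tee -a /etc/environment
echo 'SPARK_HOME=/usr/apache/spark-2.0.0-bin-hadoop2.7' | sudo tee -a /etc/environment

# Hive warehouse directory
sudo -u hive hadoop fs -mkdir /user/hive/warehouse
```

Since /etc/environment does not expand variables, the PATH entry has to be written with the full Spark path rather than $SPARK_HOME/bin.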
