Introduction
This tutorial is intended for people who really need to run Apache Spark on Windows. Usually it is better to run Spark in a Linux VM or in Docker, but the installation method described here minimizes the problems Spark tends to have on Windows.
Getting the needed files
First, download the spark-1.6.0-bin-hadoop2.6.tgz file from the Apache Spark Downloads website. You will also need a program that can extract .tgz archives, for example 7-Zip.
Next, download the Hadoop 2.6.0 binaries for Windows (the package containing winutils.exe, which Hadoop needs in order to run on Windows).
This is all we will need.
Putting everything together
Extract both files to their own folders.
Open the environment variables settings. Add your Spark bin folder to your PATH variable, and create a new SPARK_HOME environment variable set to your Apache Spark base folder. Next, create another environment variable named HADOOP_HOME and set it to your Hadoop base folder.
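If you prefer the command line, the same can be done with setx; the paths below are only examples, so adjust them to wherever you extracted the archives:
setx SPARK_HOME "C:\spark\spark-1.6.0-bin-hadoop2.6"
setx HADOOP_HOME "C:\hadoop\hadoop-2.6.0"
Note that setx only affects newly opened command prompts, and it truncates long values, so appending %SPARK_HOME%\bin to PATH is safer done through the dialog.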
Preparing the configuration files
Open the conf folder in your Apache Spark base location. There should be a file called log4j.properties.template; rename it to log4j.properties. For the sake of convenience you can open the file and change the value of log4j.rootCategory from INFO to WARN, which makes the console output far less noisy.
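The edited line should look like this:
log4j.rootCategory=WARN, console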
Launching it
After trying different approaches, this one seems to be the least error-prone.
To launch the master, open a command prompt and run:
spark-class.cmd org.apache.spark.deploy.master.Master
Next, open your browser, and navigate to: http://localhost:8080/
There you should see the address of your Spark master, something like spark://192.168.1.27:7077
Open another command prompt and start one worker, replacing masteraddress with the spark:// URL from the previous step:
spark-class.cmd org.apache.spark.deploy.worker.Worker masteraddress
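With the example address above, the full command would be:
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.1.27:7077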
That’s it, Spark is running. The worker should now show up on the master's web UI at http://localhost:8080/.
To run the spark shell, use:
spark-shell --master masteraddress
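Once the shell is up, a quick sanity check (a minimal example of my own, not from the official docs) confirms the cluster actually does work; it should print 500500.0:
sc.parallelize(1 to 1000).sum()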
To submit a packaged application jar, use:
spark-submit --class mainclassname --master masteraddress packagedjarlocation
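For example, with a hypothetical main class com.example.MyApp packaged in C:\apps\myapp.jar (both names are placeholders), the call would look like:
spark-submit --class com.example.MyApp --master spark://192.168.1.27:7077 C:\apps\myapp.jar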
Have fun with it!