Archive for December, 2014

mvn build with dependencies

Wednesday, December 31st, 2014

mvn clean assembly:assembly -DdescriptorId=jar-with-dependencies

Spark Error: Too many open files

Monday, December 22nd, 2014

This is a typical Spark error that happens on Ubuntu (and probably on other Linux distributions too). To resolve it, you can do the following:

Edit the file /etc/security/limits.conf and add:

* soft nofile 55000
* hard nofile 55000

55000 is just the value I use as an example; you could choose a larger or smaller number. It means each process is allowed to have as many as 55000 files open at the same time.

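After saving the changes, you will need to REBOOT for them to take effect (logging out and back in is usually enough, since the limits are applied when a new session starts).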

One note: I recommend not going crazy with this number. For example, I once set it to 1,000,000, and Spark generated so many temporary files that my hard disk had a very hard time deleting them.
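
To confirm that the new limit is actually in effect, the shell's built-in ulimit can be queried from a fresh session. The following is only a minimal sketch: check_nofile is a hypothetical helper name, not something Spark or Ubuntu provides, and 55000 is simply the example value used above.

```shell
# Hypothetical helper (not part of Spark or Ubuntu): succeeds when the
# soft open-file limit of the current shell is at least the wanted value.
check_nofile() {
  want="$1"
  soft="$(ulimit -Sn)"
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
    echo "ok: soft nofile limit is $soft (wanted at least $want)"
  else
    echo "low: soft nofile limit is $soft (wanted at least $want)"
    return 1
  fi
}

# Show the soft and hard limits, then compare against the value from limits.conf.
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"
check_nofile 55000 || echo "log out and back in (or reboot), then check again"
```

If the reported soft limit is still the old value, the session was probably started before the file was changed.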

Install Spark on AWS EC2 Ubuntu instances

Friday, December 19th, 2014

Having to install Spark several times has been annoying, so I finally decided to document the process.

After the new instance is spun up, run the commands below; you may skip any step whose software is already installed.

sudo apt-get update
sudo apt-get install git

#install java 8
#uncomment the next line if the add-apt-repository command is missing
#sudo apt-get install software-properties-common
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

#install scala
wget http://www.scala-lang.org/files/archive/scala-2.11.4.deb
sudo dpkg -i scala-2.11.4.deb
sudo apt-get update
sudo apt-get install scala

#install sbt
wget http://dl.bintray.com/sbt/debian/sbt-0.13.6.deb
sudo dpkg -i sbt-0.13.6.deb
sudo apt-get update
sudo apt-get install sbt

#you may want to switch to the latest stable version; this was the latest as of December 19, 2014
git clone https://github.com/apache/spark.git
cd spark
sbt/sbt clean assembly
#it took about 8 minutes on my laptop

#Is it ready?

./bin/spark-shell
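
Before starting the build step in the list above, it may help to confirm that the prerequisites really ended up on the PATH. This is only a sketch: missing_tools is a hypothetical helper name, and the tool names are the ones installed by the commands above.

```shell
# Hypothetical helper: print each command name from the arguments
# that cannot be found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the tools that the install steps are expected to provide.
missing="$(missing_tools git java scala sbt)"
if [ -z "$missing" ]; then
  echo "all prerequisites found"
else
  echo "still missing:"
  echo "$missing"
fi
```

Anything reported as missing points at the install step to repeat before running the assembly.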