nvidia_346_uvm error in Caffe on AWS g2 instance

July 26th, 2016

I got the following error after rebooting my g2 instance. The code is in Python, and prior to the reboot it worked fine.

modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_346_uvm'

modprobe: ERROR: could not insert 'nvidia_346_uvm': Function not implemented

I googled around, found no direct solution, and took a stab at it myself by doing:

sudo apt-get remove nvidia-346-uvm

and then rebooting.

Surprise. It works!

I still don't know why it works.

Simple But Surprisingly Good Clustering “Algorithm”

January 17th, 2016

This is one of those jaw-dropping papers. Density Clustering: astonishingly simple, yet phenomenal performance. There is no need to put it into heavy math, and one can hardly call it an "algorithm" (though it truly is one). Here is how it works:

Given a distance/similarity matrix (pairwise distance/similarity for all data points) and a cutoff distance/similarity.

  1. For every point, count how many other points are within the cutoff distance. The count is the density of the current point.
  2. For every point, find all other points having a higher density. Among those with higher density, find the smallest distance, and use it as the current point’s distance.
  3. Plot density vs distance.
  4. The outliers in the plot (points with both high density and large distance) are the cluster centers.
  5. Assign every remaining point to the same cluster as its nearest neighbor with higher density.

No fancy math at all, and it seems to work. There is an R library for it too. Well done.
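Here is a rough Python sketch of the recipe above (my own sketch, not from the paper). It assumes Euclidean distances, a user-chosen cutoff, and that the "outliers" of step 4 are picked automatically as the points with the largest density-times-distance product, with the number of centers given by the user:

import numpy as np

def density_cluster(points, cutoff, n_centers):
    n = len(points)
    # pairwise Euclidean distance matrix
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # 1. density = number of other points within the cutoff distance
    density = (dist < cutoff).sum(axis=1) - 1

    # 2. delta = distance to the nearest point of higher density
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(density > density[i])[0]
        # the densest point gets the largest distance by convention
        delta[i] = dist[i].max() if len(higher) == 0 else dist[i, higher].min()

    # 3.-4. centers = the outliers in the density-vs-delta plot,
    # approximated here as the points with the largest density*delta
    centers = np.argsort(density * delta)[-n_centers:]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)

    # 5. in order of decreasing density, give every remaining point the
    # cluster of its nearest neighbor with higher density
    for i in np.argsort(-density):
        if labels[i] != -1:
            continue
        higher = np.where(density > density[i])[0]
        if len(higher) == 0:
            labels[i] = labels[centers[np.argmin(dist[i, centers])]]
        else:
            labels[i] = labels[higher[np.argmin(dist[i, higher])]]
    return labels

For example, density_cluster(np.random.rand(200, 2), cutoff=0.1, n_centers=3) returns a cluster label for each of the 200 points.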

mvn build with dependencies

December 31st, 2014

mvn clean assembly:assembly -DdescriptorId=jar-with-dependencies

Spark Error: Too many open files

December 22nd, 2014

This is a typical Spark error that happens on Ubuntu (and probably other Linux distributions too). To resolve it, you can do the following:

Edit the file /etc/security/limits.conf to add:

* soft nofile 55000
* hard nofile 55000

55000 is the value I use as an example; you could choose a larger or smaller number. It means we allow as many as 55000 open files.

After saving the changes, you will need to REBOOT to make them effective.
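A quick way to confirm the new limit took effect after the reboot is to ask the OS, for example from Python:

# print the soft and hard open-file limits of the current process;
# after the reboot they should reflect the values set in limits.conf
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("nofile soft limit:", soft, "hard limit:", hard)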

One note: I recommend not going crazy with this number. For example, I once set it to 1,000,000, and Spark generated so many temporary files that my hard disk had a very hard time deleting them.

Install Spark on AWS EC2 Ubuntu instances

December 19th, 2014

It has been annoying to install Spark a couple of times, so I finally decided to document the process.

After the new instance is spun up, here is the list; you might want to skip some steps if they are already installed.

sudo apt-get update
sudo apt-get install git

#install java 8
#sudo apt-get install software-properties-common
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

#install scala
wget http://www.scala-lang.org/files/archive/scala-2.11.4.deb
sudo dpkg -i scala-2.11.4.deb
sudo apt-get update
sudo apt-get install scala

#install sbt
wget http://dl.bintray.com/sbt/debian/sbt-0.13.6.deb
sudo dpkg -i sbt-0.13.6.deb
sudo apt-get update
sudo apt-get install sbt

#change to the latest stable version if needed; this was the latest as of December 19, 2014
git clone https://github.com/apache/spark.git
cd spark
sbt/sbt clean assembly
#it took about 8 minutes on my laptop

#Is it ready?
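To answer that, here is a minimal smoke test, assuming the build succeeded and that it is run from the spark directory with ./bin/spark-submit (or pasted into ./bin/pyspark):

# square the numbers 0..999 on a local 2-core Spark context and sum them
from pyspark import SparkContext

sc = SparkContext("local[2]", "smoke-test")
print(sc.parallelize(range(1000)).map(lambda x: x * x).sum())   # expect 332833500
sc.stop()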

Redis vs Python dictionary

June 18th, 2014




I was using a Python dictionary to manage my database, which turned out to be a disaster. Well, I shouldn't have tried it in the first place: it was painfully slow with more than 10,000 data entries. Redis, on the other hand, is an in-memory database that has been gaining fame quite dramatically among tech followers, so I gave it a try. Here is the time consumed on my Mac.
[Figure: Redis performance. The x-axis is the number of data entries; the y-axis is the time in seconds consumed by redis+python to store those entries.]

For the record, I did manage to store about 10,000 data entries using a dictionary, but after waiting so long I decided to abandon it entirely. It is safe to say it is off this chart.
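For reference, a minimal sketch of this kind of timing experiment, assuming a local Redis server and the redis-py client (the keys and values are made up):

import time
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

for n in (1000, 10000, 100000):
    r.flushdb()                      # start from an empty database
    start = time.time()
    for i in range(n):
        r.set("entry:%d" % i, "value-%d" % i)
    print(n, "entries:", round(time.time() - start, 2), "seconds")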


Process the wikipedia dump data

May 6th, 2014

The entire Wikipedia dump can be downloaded from here.

In order to get the articles, one way is to use the wikiprep code, which is written in Perl, my ex-favorite language. I ran into problems when I tried to run it after installation. For example, when running wikiprep, the output on screen was:

Can't locate Log/Handler.pm in @INC (@INC contains: /Library/Perl/5.16/darwin-thread-multi-2level /Library/Perl/5.16 /Network/Library/Perl/5.16/darwin-thread-multi-2level /Network/Library/Perl/5.16 /Library/Perl/Updates/5.16.2/darwin-thread-multi-2level /Library/Perl/Updates/5.16.2 /System/Library/Perl/5.16/darwin-thread-multi-2level /System/Library/Perl/5.16 /System/Library/Perl/Extras/5.16/darwin-thread-multi-2level /System/Library/Perl/Extras/5.16 .) at /usr/local/bin/wikiprep line 40.

BEGIN failed--compilation aborted at /usr/local/bin/wikiprep line 40.

To solve this problem, after several rounds of trial and error and Google searches, the solution is to install whatever module is missing, here Log::Handler. So I ran

sudo cpanm Log::Handler

A note: I had already installed cpanm. Installing the missing module with cpanm made the problem go away, and now I'm running wikiprep to get the actual articles out of the dump with this command:

wikiprep -format composite -compress -f ../enwiki-20140402-pages-articles-multistream.xml.bz2 > out
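As a side note, not part of the wikiprep pipeline: if all you need is to stream pages straight out of the compressed dump, a minimal Python sketch could look like the following. It assumes the standard MediaWiki XML export schema; the namespace string may differ between dump versions.

import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.8/}"   # adjust to the dump's schema version

with bz2.open("enwiki-20140402-pages-articles-multistream.xml.bz2", "rb") as f:
    for _, elem in ET.iterparse(f):
        if elem.tag == NS + "page":
            print(elem.find(NS + "title").text)      # article title
            elem.clear()                             # free memory as we go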

Port binding in DigitalOcean Ubuntu

February 23rd, 2014

I had trouble binding port 80 in DigitalOcean Ubuntu and was rescued by this page on Stack Overflow. Following the page, what one needs is:

sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3000

Then edit the file "/etc/rc.local"; notice that it is NOT the other file suggested on the webpage. Editing means adding the above command with a small modification:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3000

So far, this has solved my problem and everything is working.
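For completeness, one way to sanity-check the redirect is to put a trivial stand-in server on port 3000 (sketch below; assumes the real app is not running yet) and then hit the droplet on plain port 80 from a browser:

# minimal stand-in HTTP server listening on port 3000;
# with the iptables rule in place, port 80 should return the same reply
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"port 80 to 3000 redirect is working\n")

HTTPServer(("0.0.0.0", 3000), Hello).serve_forever()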


Limit RAM size used by redis

February 18th, 2014

If you want to limit the maximum amount of RAM allocated to Redis, you need to change the file named "redis.conf" under the path where Redis is installed. The trick is that when you run "./redis-server", you have to run it from the directory where the "redis.conf" file is; unless you specifically point it to that modified "redis.conf" when starting "redis-server", it won't use the modified file at all. What I did, for example, to allocate only 100MB of RAM to Redis:

Just add one line, "maxmemory 100M", to "redis.conf", then start "./src/redis-server". It should work. But be aware of the possible consequences of such a restriction: an out-of-memory error might come uninvited…
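As an aside, the same cap can also be set at runtime instead of through redis.conf, for example from Python with the redis-py client (assuming CONFIG commands are not disabled on the server; a runtime change is not persisted across restarts):

import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory", "100mb")     # the runtime counterpart of the "maxmemory" line in redis.conf
print(r.config_get("maxmemory"))       # shows the limit in bytes, e.g. {'maxmemory': '104857600'}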


Comparing PaaS, Heroku vs Elastic Beanstalk vs Linode vs DigitalOcean?

February 17th, 2014

For this part, I really did my homework, after a lot of time spent on trial and error. Google's PaaS doesn't support Java, so sorry, it is out of the list.

The free tiers of Heroku and Elastic Beanstalk work great if your app is relatively small, that is, under 300MB for Heroku and 512MB for a t1.micro on EB. If your files are larger than those two limits, you simply won't be able to submit. When I was blindly trying to fix my upload problem, there was no obvious statement about these two limits anywhere (at least where I looked). On EB, the t1.micro allocates at most 650MB of RAM; if you need more, you will have to upgrade to a paid tier.

MEAN infrastructure. Both have excellent support for it, in particular Heroku, where users generally don't need to run npm install at all. They also have many integrated libraries available through their web consoles; you can simply click "add". Redis is available on Heroku as well. However, I never figured out how to ssh in to change anything on the server. In contrast, EB doesn't offer Redis at all, only PostgreSQL and DynamoDB for databases. The good thing is that it allows you to connect to the server via ssh, so you can hopefully do more exotic things. One trivial but very time-consuming thing to figure out: you need to zip your entire app with package.json and app.js in the root of the archive. If your zip creates a subdirectory, you will never see your app come up (at least as of now). You can find your deployed app at /var/local/current. Again, don't zip the directory itself; zip all the files and subdirectories instead.
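Since zipping the wrong thing cost me so much time, here is a small Python sketch of building the archive with the contents at the root (the directory name "myapp" is just a placeholder):

import os
import zipfile

app_dir = "myapp"   # placeholder: the folder that contains package.json and app.js

with zipfile.ZipFile("app.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk(app_dir):
        for name in files:
            path = os.path.join(root, name)
            # arcname is relative to app_dir, so there is no top-level folder in the zip
            zf.write(path, os.path.relpath(path, app_dir))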

I didn't explore the price tags on Heroku, but EB/AWS is not cheap at all, particularly compared to Linode.com and DigitalOcean.com. I settled on DigitalOcean because it offers 2GB RAM + 40GB SSD for $20/month. Their tech support is more than excellent: every question so far has been answered within a few hours, some within 30 minutes. The nice thing about DigitalOcean is that they provide a MEAN image as well. The catch is that some things you are on your own for; for example, DNS binding you need to figure out yourself. They do provide excellent docs to help, though.