Archive for February, 2014

Port binding in DigitalOcean Ubuntu

Sunday, February 23rd, 2014

I had trouble binding port 80 on a DigitalOcean Ubuntu droplet and was rescued by this page on Stack Overflow. (Ports below 1024 require root privileges, so redirecting port 80 to a higher port via iptables lets the app run as an ordinary user.) Following that page, all one needs is:

sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3000
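To verify the rule took effect, listing the NAT PREROUTING chain should show the redirect (a standard iptables query, not from the original page):

sudo iptables -t nat -L PREROUTING -n --line-numbers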

Then, to make the rule survive a reboot, edit the file "/etc/rc.local" (notice that it is NOT the other file suggested on the webpage). Editing means adding the above command with one small modification, dropping the sudo, since rc.local already runs as root:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3000
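For context, here is a minimal sketch of the edited "/etc/rc.local" on a stock Ubuntu install (the header comment is abridged); the iptables line must go before the final "exit 0", and the file runs as root at boot:

#!/bin/sh -e
#
# rc.local - executed at the end of each multiuser runlevel

# redirect incoming port 80 traffic to the app listening on 3000
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3000

exit 0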

So far, this has solved my problem and everything is working.


Limit RAM size used by redis

Tuesday, February 18th, 2014

If you want to limit the maximum amount of RAM allocated to Redis, you need to change the file named "redis.conf" under the path where Redis is installed. The trick is that the modified "redis.conf" must actually be loaded: unless you specifically pass that file when starting "redis-server", the server won't use the modified file at all. What I did, for example, to allocate only 100MB of RAM to Redis:

Just add one line, "maxmemory 100M", into "redis.conf", then start the server with that file: "./src/redis-server redis.conf". It should work. But be aware of the possible consequences of such a restriction: an "out of memory" error might arrive uninvited…
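Here is a minimal sketch of the whole procedure, assuming a source build with the binary under ./src and the config file in the build root:

# in redis.conf, add one line:
#   maxmemory 100M

# start the server with the config file passed explicitly
./src/redis-server ./redis.conf

# verify the limit took effect (the value is reported in bytes)
redis-cli config get maxmemory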


Comparing PaaS: Heroku vs Elastic Beanstalk vs Linode vs DigitalOcean?

Monday, February 17th, 2014

For this part I really did my homework, after a lot of time spent on trial and error. Google's PaaS doesn't support Java, so, sorry, it is off the list.

The free tiers of Heroku and Elastic Beanstalk. They work great if your app is relatively small, that is, under 300MB for Heroku and under 512MB for a t1.micro on EB. If your upload is larger than those two sizes, you simply won't be able to submit it. When I was blindly trying to fix my upload problem, there was no obvious statement about these two limits anywhere (at least not where I looked). On EB, the t1.micro will allocate at most 650MB of RAM; if you need more, you will have to upgrade to a paid tier.

MEAN infrastructure. Both have excellent support for this stack, Heroku in particular: users generally don't need to run npm install at all. They also have many integrated libraries available through their web consoles; you can simply click "add". Redis is available on Heroku as well. However, I never figured out how to ssh in to change anything on the server. In contrast, EB doesn't have Redis at all, only PostgreSQL and DynamoDB for databases. The good thing is that it allows you to connect to the server via ssh, so you can do more exotic things, hopefully.

One trivial but definitely very time-consuming thing to figure out is that you need to zip your entire app with package.json and app.js in the root of the archive. If your zip wraps everything in a subdirectory, you will never see your app come up (at least as of now). The deployed files end up at /var/local/current. Again, don't zip the directory itself; zip the files and subdirectories inside it instead, as in the sketch below.
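A minimal sketch of the difference, with "myapp" as a hypothetical app directory name:

# correct: zip the contents, so package.json and app.js
# sit at the root of the archive
cd myapp
zip -r ../myapp.zip .

# wrong: this wraps everything in a "myapp/" folder inside
# the archive, and the deployed app never comes up
# zip -r myapp.zip myapp/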

I didn't explore Heroku's price tags, but EB/AWS is not cheap at all, in particular compared to Linode.com and digitalocean.com. I settled on digitalocean.com, because it offers 2GB RAM + 40GB SSD for $20/month. Their tech support is more than excellent: every question so far has been answered within a few hours, some within 30 minutes. The nice thing about DigitalOcean is that they provide a MEAN image as well. The catch is that some things you will be on your own with; DNS binding, for example, you need to figure out yourself. They do provide excellent docs to help, though.


Lucene NGramTokenizer

Wednesday, February 12th, 2014

From what I found, NGramTokenizer in Lucene is really a character-level tokenizer. If a gram should be a word, then you need ShingleFilter. The output of ShingleFilter behaves as follows:

InputText = “This is the best, buy it.”

For bigrams, you will get word pairs such as "best buy".

Here is a demo of how I tested it.

NGramTokenizer()

	// LocaleAnalyzer is a custom Analyzer subclass (not part of stock Lucene);
	// matchVersion is assumed to be its Lucene Version field.
	private static Analyzer WV_ANALYZER = new LocaleAnalyzer(new Locale("en")) {
		@Override
		protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
			// character n-grams of length 1 to 2
			Tokenizer source = new NGramTokenizer(reader, 1, 2);
			TokenStream result = new StandardFilter(matchVersion, source);
			return new TokenStreamComponents(source, result);
		}
	};

That's the definition of the Analyzer using NGramTokenizer().

ShingleFilter()

	private static Analyzer ANALYZER_NGRAM = new LocaleAnalyzer(new Locale("en")) {
		@Override
		protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
			Tokenizer source = new LDATokenizer(matchVersion, reader); // custom word-level tokenizer
			TokenStream result = new StandardFilter(matchVersion, source);
			result = new LowerCaseFilter(matchVersion, result);
			ShingleFilter shingle = new ShingleFilter(result, 2); // NOTE: asking for bigrams
			shingle.setOutputUnigrams(true); // emit single words alongside the bigrams
			result = new StopFilter(matchVersion, shingle, getStopwords());
			return new TokenStreamComponents(source, result);
		}
	};

Now let’s call them:

		String text = "This is the best, buy it.";

		TokenStream stream = ANALYZER_NGRAM.tokenStream(null, new StringReader(text));
		stream.reset();
		System.out.println("ANALYZER_NGRAM:");
		while (stream.incrementToken()) {
			String token = stream.getAttribute(CharTermAttribute.class).toString();
			System.out.println(token);
		}
		stream.end();   // per the TokenStream contract
		stream.close();

		stream = WV_ANALYZER.tokenStream(null, new StringReader(text));
		stream.reset();
		System.out.println("WV_ANALYZER:");
		while (stream.incrementToken()) {
			String token = stream.getAttribute(CharTermAttribute.class).toString();
			System.out.println(token);
		}
		stream.end();
		stream.close();

Results are:

ANALYZER_NGRAM:
this is
is the
the best
best buy
buy
buy it
WV_ANALYZER:
Th
hi
is
s 
 i
is
s 
 t
th
he
e 
 b
be
es
st
t,
, 
 b
bu
uy
y 
 i
it
t.