There is good stuff here on natural language processing.
Norvig Ngram collection
April 27th, 2012
Very interesting website for understanding the universe
April 26th, 2012
Text mining package in R
April 14th, 2012
There is a nice text mining package in R (tm). I’d like to try it out.
Also, there is a very good site on R, http://www.r-bloggers.com/.
Vector Space Models in Text Analysis
January 18th, 2012
This is a very good paper on vector space models in text analysis. I’d like to summarize it a bit here. One of the major purposes of text analysis is to enable machines to understand human language.
There are currently three ways to represent text for a machine: term-document, word-context, and pair-pattern matrices.
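As a rough illustration of the first of these, the term-document view, here is a minimal C++ sketch. It is not taken from the paper: the names `TermDocMatrix`, `buildTermDocMatrix`, and `cosineSimilarity` are made up for this example, tokens are split on whitespace only, and raw counts are used without any weighting.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Term-document counts: one row per term, one column per document.
// (Hypothetical type name, chosen for this sketch.)
using TermDocMatrix = std::map<std::string, std::vector<int>>;

// Build the matrix from whitespace-tokenized documents.
// Assumption: no stemming, stop-word removal, or tf-idf weighting.
TermDocMatrix buildTermDocMatrix(const std::vector<std::string> &docs) {
    TermDocMatrix m;
    for (std::size_t d = 0; d < docs.size(); ++d) {
        std::istringstream in(docs[d]);
        std::string term;
        while (in >> term) {
            std::vector<int> &row = m[term];
            if (row.empty()) {
                row.assign(docs.size(), 0);  // first time this term is seen
            }
            row[d] += 1;
        }
    }
    return m;
}

// Cosine similarity between document columns a and b.
double cosineSimilarity(const TermDocMatrix &m, std::size_t a, std::size_t b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto &entry : m) {
        assert(a < entry.second.size() && b < entry.second.size());
        double x = entry.second[a];
        double y = entry.second[b];
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if (na == 0.0 || nb == 0.0) {
        return 0.0;  // an empty document is similar to nothing
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Documents that share many terms end up with a cosine close to 1, and documents with no terms in common get 0.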
A post about memory in R
December 20th, 2011
Hello Chicago!
August 27th, 2011
My next stop is Chicago. I never thought of staying in Chicago, but here I am. So it is true that life is full of surprises. And as always, lots of hopes and dreams…
Machine Learning Summer School 2009 Cambridge
August 4th, 2011
This is good.
A new machine learning challenge on Kaggle
June 29th, 2011
This time it is Wikipedia’s participation challenge, which is to predict how many edits an editor will make in 5 months. It is very intriguing…
Someday I should set up a challenge myself: how often do I update my blog? Given that bloggers tend to be older than Facebook and Twitter users, and older people tend to do things more slowly, a hint for an excellent model is that it should incorporate that too!
Interview series II: sum of two elements
June 29th, 2011
I was asked the following interview question.
Given an unsorted array of positive integers and a target value K, find two elements that sum to K.
So I suggested sorting the array first, then starting from the last (maximum) element that is smaller than K, x_max, and using binary search to find K - x_max, moving on to the next smaller element if that fails. The complexity of sorting is O(n log n), and the search step is another O(n log n). Any improvement once the array is sorted? I got stuck there. Five seconds later, he tipped me off with something like "how about using two index pointers?" WOW, great hint. Now the complexity of this step is O(n). Below is one solution.
#include <iostream>
using namespace std;

// Sort arr[left..right] in place with quicksort (Hoare-style partition).
void quickSort(int *arr, int left, int right) {
    if (left >= right) { return; }
    int i = left;
    int j = right;
    int pivotV = arr[left + (right - left) / 2];
    int tmp;
    while (i <= j) {
        // Skip elements that are already on the correct side of the pivot.
        while (arr[i] < pivotV) { i++; }
        while (arr[j] > pivotV) { j--; }
        if (i <= j) {
            tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
            i++;
            j--;
        }
    }
    // Now arr[left..j] <= pivotV <= arr[i..right].
    quickSort(arr, left, j);
    quickSort(arr, i, right);
}

// Two-pointer scan over a sorted array: O(n).
void ksum(int *arr, int len, int k) {
    int left = 0;
    int right = len - 1;
    int ktmp;
    while (left < right) {
        ktmp = arr[left] + arr[right];
        if (ktmp == k) {
            cout << arr[left] << " + " << arr[right] << " = " << k << endl;
            return;
        } else if (ktmp > k) {
            right--;  // sum too large: drop the largest remaining element
        } else {
            left++;   // sum too small: drop the smallest remaining element
        }
    }
    cout << "Couldn't find the two elements!\n";
}

int main() {
    int a[10] = {10, 3, 4, 9, 8, 7, 2, 1, 5, 6};
    quickSort(a, 0, 9);
    ksum(a, 10, 12);
    return 0;
}
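For comparison, the binary-search idea I first suggested could look roughly like the sketch below. It is only an illustration under a few assumptions: `pairSumBinarySearch` is a made-up helper name, `std::sort` stands in for the hand-written quickSort, and the scan starts from the smallest element rather than from x_max, which does not change the O(n log n) bound.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original solution): looks for two
// elements of v that sum to k. On success it stores them in a and b and
// returns true; otherwise it returns false and leaves a and b untouched.
bool pairSumBinarySearch(std::vector<int> v, int k, int &a, int &b) {
    std::sort(v.begin(), v.end());  // O(n log n)
    for (auto it = v.begin(); it != v.end(); ++it) {
        int need = k - *it;
        // Search only to the right of the current element, so the same
        // element is never used twice. Each search is O(log n).
        if (std::binary_search(it + 1, v.end(), need)) {
            a = *it;
            b = need;
            assert(a + b == k);  // sanity check on the result
            return true;
        }
    }
    return false;  // no pair found
}
```

Called with the array from main above and K = 12, this should report 2 and 10, the same pair the two-pointer version finds.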
install twitteR on Ubuntu
June 23rd, 2011
Well, I have heard that Google is yesterday’s M$, Facebook is today’s Google, and Twitter is tomorrow’s Facebook. It is still a surprise to see that there is already an R package for Twitter. It is “twitteR”. The following is what worked for me to install it on Ubuntu.
twitteR can be downloaded here.
As stated in its documentation, it requires some libraries to be installed first.
1). Install libraries for RCurl:
From the Ubuntu menu, open
System -> Administration -> Synaptic Package Manager
A quick search for “curl” lists all programs with curl in their names or descriptions. Select “curl” alone, or also the additional packages related to libcurl development.
2). Install RCurl
Download it, then go to the directory where it is saved and run:
R CMD INSTALL RCurl_1.6-6.tar.gz
3). Install twitteR
Start the R console, then install the following packages:
install.packages("bitops")
install.packages("RJSONIO")
install.packages("twitteR")
install.packages("ROAuth")
4). Run twitteR:
library(bitops)
library(RJSONIO)
library(RCurl)
library(twitteR)
Enjoy.