Archive for October, 2013

Is the Legal Industry Ready for Big Data?

Monday, October 28th, 2013

Although the legal industry has stubbornly resisted IT, it may be more eager to join the big data world. In particular, the advanced analytics tools driven by big data can be applied to legal data and transform legal analysis. After all, two of the goals of big data analytics are to discover patterns from a more holistic view and to make reliable, actionable predictions. Given the availability of large amounts of data, a suitably chosen algorithm could help reach both goals. And some of the goals of legal analysis are very similar to those of big data analysis.

But as a biased machine learning person, the question I have is: is the legal industry ready for big data? Or, what are the reasons why big data can’t help the legal industry?

A small OpenMP practice

Monday, October 21st, 2013

I had to parallelize a piece of code. The requirements were a single node with multiple cores, and RAM that might be limited depending on the data set. OpenMP seemed a good fit for my purpose; here is my solution:

	#pragma omp parallel num_threads(MAX_THREADS_NUMBER)
	{
		// Split [0, N) into one contiguous block per thread.
		int total_threads = omp_get_num_threads();
		int ID = omp_get_thread_num();
		int block = (int) N / total_threads;	// integer division, rounds down
		int start = ID * block;
		int end = (ID + 1) * block;
		// The last thread also takes the remainder.
		if (ID == (total_threads - 1)) {end = N;}

		// Each thread builds its own private tree.
		VpTree* tree = new VpTree(X);

		for(int n = start; n < end; n++) {
			search(n, tree);
		}
		delete tree;
	}
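
The block arithmetic in the snippet above can be checked on its own, outside any parallel region. The following is only a minimal sketch: `block_range` is a hypothetical helper that is not part of the original code, and it assumes `N` fits in an `int`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not in the original post): returns the half-open
// range [start, end) of loop indices handled by thread `id` when `n`
// items are split across `total_threads` threads.
std::pair<int, int> block_range(int n, int id, int total_threads) {
    assert(total_threads > 0 && id >= 0 && id < total_threads);
    int block = n / total_threads;   // integer division, rounds down
    int start = id * block;
    int end = (id + 1) * block;
    if (id == total_threads - 1) {
        end = n;                     // last thread also takes the remainder
    }
    return {start, end};
}
```

With 10 items and 4 threads this gives the ranges [0, 2), [2, 4), [4, 6) and [6, 10), so the last thread can carry up to `total_threads - 1` extra items.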

At the beginning of the parallel region, the start and end of each thread's loop segment are computed manually. This is not necessary in some cases, since one could use

	#pragma omp parallel for
	for(int n = 0; n < N; n++) {
		search(n, tree);
	}

But the problem was that “search()” reads through the pointer tree in a rather messy, complicated way, so with a single shared tree I always ended up with a “Segmentation fault”. That’s why I create a separate tree for each thread. The maximum number of threads is pre-defined to keep RAM use within limits, since every thread holds its own tree. There might be a more elegant solution, but I’m easy to satisfy.
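
One possibly tidier variant keeps the per-thread copy but lets OpenMP split the loop: declare the object inside the parallel region, then use `#pragma omp for` on the loop. The sketch below is not the original code. `ToyIndex`, `search_all` and `max_threads` are made-up names, and `ToyIndex` merely assumes one plausible cause of the crash (a `search()` that writes to internal scratch state); the post does not say what actually goes wrong inside `search()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for VpTree: search() overwrites internal scratch
// state, so one instance must not be used by two threads at once.
struct ToyIndex {
    const std::vector<int>& data;
    std::vector<int> scratch;   // mutated by every search() call

    explicit ToyIndex(const std::vector<int>& d) : data(d) {}

    // Returns how many elements are <= data[n].
    int search(int n) {
        assert(n >= 0 && n < static_cast<int>(data.size()));
        scratch.clear();
        for (int v : data) {
            if (v <= data[n]) {
                scratch.push_back(v);
            }
        }
        return static_cast<int>(scratch.size());
    }
};

// max_threads is assumed to be positive; it caps how many ToyIndex
// copies exist at once, which is what bounds the memory use.
std::vector<int> search_all(const std::vector<int>& X, int max_threads) {
    const int N = static_cast<int>(X.size());
    std::vector<int> result(N);

    #pragma omp parallel num_threads(max_threads)
    {
        ToyIndex index(X);      // declared inside the region: one per thread

        #pragma omp for         // OpenMP divides the iterations among threads
        for (int n = 0; n < N; n++) {
            result[n] = index.search(n);   // each n is written by one thread
        }
    }   // every thread's index is destroyed here, no delete needed

    return result;
}
```

Compiled without OpenMP support the pragmas are ignored and the loop simply runs serially, which makes it easy to compare results against the parallel build.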

Last but not least, I benefited a lot from the presentation by Tim Mattson and Larry Meadows of Intel: A “Hands-on” Introduction to OpenMP.