"We crashed, now what?"
The other day I read an interesting article by researchers from my old Computer Science department in Amsterdam: “We crashed, now what?”
The paper is a short description of an experiment they did with real-time recovery from operating system crashes on the Minix operating system. Minix, of course, is message-driven, with most of the operating system's components running as processes in user space. With some smart bookkeeping they were able to put simple checkpoints in place that allow for successful recovery from crashes of OS components, caused for example by memory errors. Pretty cool stuff:
“Preliminary results showed that our approach is able to restart even the most critical OS components flawlessly during normal system operation, keeping the system fully functional and without exposing the failure to user processes. For instance, our approach can successfully restart the process manager (PM), which stores and manages the most critical information about all the running processes—both regular and OS-related—in the system. Our preliminary experiments showed that the global state of PM was always correctly restored upon restart and no information was ever lost.”
One of the co-authors of the article is Andrew Tanenbaum, professor at the Vrije Universiteit and creator of the Minix operating system.
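To make the idea a bit more concrete, here is a toy sketch of checkpoint-based recovery for a message-driven component. It is not the mechanism from the paper: the `ProcessTable` and `Supervisor` classes, the "checkpoint after every message" policy and the simulated bug are all hypothetical. The supervisor keeps a copy of the component's last consistent state; when handling a message crashes, it starts a fresh instance, restores that copy and rejects only the failing request.

```python
import copy


class ProcessTable:
    """Hypothetical stand-in for an OS component such as Minix's PM."""

    def __init__(self):
        self.state = {"procs": {}, "next_pid": 1}

    def handle(self, msg):
        """Handle one request message and return the reply."""
        if msg["type"] == "fork":
            pid = self.state["next_pid"]
            self.state["next_pid"] += 1  # state is already modified here...
            if not msg.get("name"):
                raise RuntimeError("simulated bug")  # ...when the crash happens
            self.state["procs"][pid] = msg["name"]
            return pid
        raise ValueError(f"unknown message type: {msg['type']!r}")


class Supervisor:
    """Keeps a checkpoint of the component's state and restores it on a crash."""

    def __init__(self, component):
        self.component = component
        self.checkpoint = copy.deepcopy(component.state)
        self.restarts = 0

    def send(self, msg):
        try:
            reply = self.component.handle(msg)
        except Exception:
            # Crash: throw away the half-updated component, start a fresh
            # instance and restore the last consistent checkpoint.
            self.restarts += 1
            self.component = type(self.component)()
            self.component.state = copy.deepcopy(self.checkpoint)
            return None  # only the failing request is rejected
        # The message completed cleanly, so its effects become the new checkpoint.
        self.checkpoint = copy.deepcopy(self.component.state)
        return reply


sup = Supervisor(ProcessTable())
print(sup.send({"type": "fork", "name": "init"}))  # 1
print(sup.send({"type": "fork", "name": None}))    # None: crash, state rolled back
print(sup.send({"type": "fork", "name": "sh"}))    # 2, not 3
print(sup.component.state["procs"], sup.restarts)  # {1: 'init', 2: 'sh'} 1
```

Copying the entire state after every message is the simplest possible policy and would be far too slow for a real OS component; it only shows why a consistent checkpoint lets a restarted component carry on as if nothing happened.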
Uncovering Spoken Phrases in Encrypted Voice over IP Conversations
Today I read ‘Uncovering Spoken Phrases in Encrypted Voice over IP Conversations’, a very interesting article from the December 2010 issue of ACM Transactions on Information and System Security.
The paper details a gap in the security of encrypted VoIP streams that are compressed with variable bit-rate (VBR) codecs. The authors had earlier shown that it is possible to determine the language spoken on such a VoIP call from packet lengths alone. Now they have expanded their research and show that it's possible to detect entire spoken phrases during a call. On average, their method achieved a recall of 50% and a precision of 51% for a wide variety of phrases spoken by a diverse collection of speakers. Some phrases are easier to detect than others: recall varies from 0% to 98%, depending on the length of the phrase and the speaker.
In other words: they can detect fairly well if a certain phrase is being used in a conversation, even though the VoIP conversation is encrypted!
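Recall and precision measure different things. Recall is the fraction of the actual occurrences of a phrase that the detector finds; precision is the fraction of the detector's alarms that are correct. A minimal sketch, with made-up counts (not data from the paper) chosen to land near the reported averages:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a phrase detector."""
    detected = true_positives + false_positives  # everything the detector flagged
    actual = true_positives + false_negatives    # every real occurrence of the phrase
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical: a phrase occurs 100 times, the detector finds 50 of them
# and also raises 48 false alarms.
p, r = precision_recall(true_positives=50, false_positives=48, false_negatives=50)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.51 recall=0.50
```

So numbers like these mean the detector catches about half of the real occurrences, and about half of its alarms are genuine.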
Fundamentally, this is possible because VoIP packets are compressed using variable bit-rate compression and are not typically padded. Some phonemes, such as vowels, are encoded at a higher bit rate and produce longer packets; others, such as fricatives like ‘s’, ‘sh’ or ‘th’, produce shorter packets. Using sophisticated statistical analysis of the resulting sequence of packet lengths, they can detect whole phrases.
A solution would be to pad VoIP packets, but that increases the bandwidth needed in two ways. The padding bytes themselves add overhead, and padding also negates a big benefit of VBR compression: quiet periods in a conversation, when one party is listening to the other, can normally be sent as very small packets.
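A rough illustration of that trade-off, assuming one simple padding scheme (rounding every payload up to a fixed block size). The payload sizes are hypothetical, not measurements from the paper, and protocol headers are ignored:

```python
import math


def pad_to_block(length, block_size):
    """Round a payload length up to the next multiple of block_size bytes."""
    return math.ceil(length / block_size) * block_size


def padding_overhead(lengths, block_size):
    """Extra bytes sent, as a fraction of the unpadded total."""
    raw = sum(lengths)
    padded = sum(pad_to_block(n, block_size) for n in lengths)
    return (padded - raw) / raw


# Hypothetical VBR payload sizes in bytes: small packets for silence and
# fricatives, larger ones for vowels.
lengths = [6, 6, 20, 38, 42, 15, 6, 6]
for block in (16, 48):
    padded = [pad_to_block(n, block) for n in lengths]
    print(block, padded, f"overhead={padding_overhead(lengths, block):.0%}")
```

With 16-byte blocks the overhead is about 50%, but three distinct packet sizes remain, so some information still leaks. Padding everything to 48 bytes makes all packets identical at a cost of about 176% extra bytes, much of it spent inflating the tiny packets of the quiet periods.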
A fun read, quite accessible.
The Unreasonable Effectiveness of Data
Last week I finished a very interesting book, Data-Intensive Text Processing with MapReduce. For those of you interested in such matters, I can recommend this short paper by researchers at Google: “The Unreasonable Effectiveness of Data”. It makes the case that simple algorithms and models that scale well will outperform sophisticated algorithms and models that scale less well, given enough data.
This is particularly important in the field of human language processing, where two developments are intersecting. First, vast corpora of text harvested from the internet have become available. Second, frameworks such as MapReduce can now provide near-linear scaling of computational power for suitable problems: if you double the number of computers available to a job, it runs (almost) twice as fast. That provides the scalability needed to deal with these huge data sets.
This is in contrast to older approaches in the field, where researchers tried to model language with hand-coded grammars and ontologies, represented as complex networks of relations. As the article points out, this dichotomy is an oversimplification, and in practice researchers combine “deep” approaches with statistical approaches.
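MapReduce is a programming model rather than a single algorithm: you write a map function and a reduce function, and the framework distributes the calls over many machines. Because each map call, and each reduce call per key, is independent of the others, adding machines lets more of them run at once, which is where the near-linear scaling comes from. A single-process sketch of the classic word-count example; the function names are illustrative and not part of Hadoop's or Google's API:

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


documents = ["the cat sat", "the dog sat down", "The end"]
# In a real cluster each document could be mapped on a different machine;
# here everything runs in one loop.
intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1, 'end': 1}
```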
From the article:
“So, follow the data. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data. Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail. For natural language applications, trust that human language has already evolved words for the important concepts. See how far you can go by tying together the words that are already there, rather than by inventing new concepts with clusters of words. Now go out and gather some data, and see what it can do.”
Cool stuff, and fun to read about.
Drupal 7 on OpenBSD: PDO extension required
I installed the second beta release of Drupal 7 on my OpenBSD server. Overall, the beta looks very solid. This morning I spent some time testing and porting modules from version 6 to 7.
One thing I ran into is that Drupal 7 now requires the PDO extension to be installed on your server. During the installation I saw this error message (I’m running OpenBSD 4.5):
“Your web server does not appear to support any common PDO database extensions. Check with your hosting provider to see if they support PDO (PHP Data Objects) and offer any databases that Drupal supports.”
Here are the steps I took to install these PDO database extensions on my OpenBSD server:
As root, run this:
pkg_add ftp://ftp.openbsd.org/pub/OpenBSD/4.5/packages/i386/php5-pdo_mysql-5.2.8.tgz
To complete the installation, add the following two extensions to your php.ini (for me, /var/www/conf/php.ini):
extension=pdo.so
extension=pdo_mysql.so
Restart Apache and you’re good to go.
Lego building robot
Wired had a fun video today about a machine, made from Lego, that can build Lego objects.
http://www.wired.com/gadgetlab/2010/10/legobot:
“Here’s how the MakerLegoBot works: A feed system that’s about two-and-a-half feet tall and can hold about 35 bricks connects to the LegoBot. The object that the MakerLegoBot is to assemble is designed in MLCad, a modeling program. A Java app that runs on a PC takes the file from the MLCad software, determines a set of print instructions and sends those instructions over USB to the LegoBot.
The machine retrieves a brick from the feed system and places it in the exact location where it should be. It uses an axle-based release mechanism to leave the brick in place.”
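The article doesn't say how the Java app turns an MLCad model into print instructions. MLCad saves models in the LDraw file format, where each placed part is a "type 1" line holding a colour, a position, a rotation matrix and the part file. A plausible first step, sketched here in Python and entirely hypothetical (not the MakerLegoBot's code), is to read those lines and order the bricks so that each layer is placed before the layer above it:

```python
from dataclasses import dataclass


@dataclass
class Placement:
    part: str
    colour: int
    x: float
    y: float
    z: float


def parse_ldraw(text):
    """Collect part placements (LDraw line type 1) from an MLCad/LDraw model."""
    placements = []
    for line in text.splitlines():
        tokens = line.split()
        # Type 1 line: 1 <colour> x y z a b c d e f g h i <part file>
        if len(tokens) >= 15 and tokens[0] == "1":
            x, y, z = (float(t) for t in tokens[2:5])
            part = " ".join(tokens[14:])
            placements.append(Placement(part, int(tokens[1]), x, y, z))
    return placements


def build_order(placements):
    """Hypothetical 'print instructions': lowest layer first.

    LDraw's y axis points down, so a larger y means a lower brick.
    Within a layer, z and x only make the order deterministic.
    """
    return sorted(placements, key=lambda p: (-p.y, p.z, p.x))


# A two-brick tower: a red 2x4 brick (3001.dat) on top of a yellow one.
model = """0 Two-brick tower
1 4 0 -24 0 1 0 0 0 1 0 0 0 1 3001.dat
1 14 0 0 0 1 0 0 0 1 0 0 0 1 3001.dat
"""
for step, p in enumerate(build_order(parse_ldraw(model)), start=1):
    print(step, p.part, "colour", p.colour, "at", (p.x, p.y, p.z))
```

The yellow brick at y = 0 comes out as step 1 and the red brick 24 LDraw units higher as step 2. A real planner would also have to handle rotations and check that every brick is supported before it is placed.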
Broken laptop screen
Just before the weekend Sasha’s laptop screen died. When you start the laptop the screen displays vertical, colored stripes.
Dell sent a replacement screen today, and tomorrow an engineer will come to my work to install it. I could easily have replaced the part myself, but it’s a nice service.
Modern times
A while back I connected my accounts on several social networks and tools to each other. If I update my ‘status’ in one place it shows up in the others automatically. So if I type a status update on my phone, it gets routed to my website, to Facebook and to Hyves (a Dutch Facebook clone).
I maintain two separate streams, one of them for work-related updates. Those show up in my LinkedIn account, on Skype and on Yammer, which is a sort of Twitter for companies.
This week I changed the way status updates are displayed on my website. They are no longer confined to a ‘What am I doing?’ box; they are now real nodes on the site, so you can comment on them.
Actuate & BIRT
This morning I attended a roadshow by Actuate, the company that created the open source project BIRT. I recently introduced BIRT in one of my products, and I’m very happy with that decision.
The roadshow was in Plainsboro near Princeton, about 40 minutes from our place. Most of the presentations weren’t very informative: ‘they had a low information density’, as one of my friends would put it. I always wonder, am I the only one who feels that things could be told 5 times faster?
The parts I liked were the short 3-minute demos. While my product uses BIRT mainly to generate PDFs and other files, BIRT could be used for dashboard functionality as well. Apparently you can hook your own Flash charting library into it, which is nice because we already use FusionCharts.
Disabled Windows XP beep
It’s 2010 and I’m still using Windows XP, and I’m actually quite happy with it.
One of the small annoyances is the ‘beep’ that sounds when you change the volume. Since it’s a quiet Saturday morning and the snow is disrupting some of the plans we had, I googled for a solution.
Howtogeek.com has the answer I was looking for.
What phone are you?