Guus Bosman

software engineering manager


Here I keep track of some of the books that I’ve read, often with a short review and some personal thoughts. These are only a selection since I read a lot more books for work.

I like to read books in their original languages where possible: French, German, Dutch, English, and I even read three books in Bulgarian. Here is the list of books I’d like to read. See also books about technology or management, and my all-time favorite books.

I’m an engineer, and enjoy science fiction novels. Some of my favorite authors are Vernor Vinge, Terry Pratchett and LE Modesitt Jr. No overview of my reading habits would be complete without mentioning The Economist — I love that magazine.

Books below are in order of date read; this overview starts in October 2002.

The Signal and the Noise: why so many predictions fail-but some don't

work

This was an entertaining book by Nate Silver, whom I got to know during the 2010 and 2012 elections as an insightful commentator. His background in statistics and love for numbers give a nice dose of realism to the superficial world of political commentary.

This book describes Mr. Silver’s eclectic career so far and dives into several separate subjects where he believes his data-based analysis is useful. From climate change to the stock market, his point of view as a statistician is valuable and he does a nice job explaining Bayesian logic to the general public.

The book is a little repetitive at times, and could have been 20% shorter, but this is not a big deal.

Big Data, over-fitting

He is skeptical of the Big Data ‘movement’, which sometimes seems to imply that “if we only capture enough data, insight will follow automatically”. Mr. Silver has a lot of experience with large data sets; he convincingly shows the dangers of over-fitting and emphasizes that large amounts of data are no substitute for human research and insight. This is a refreshing counter-argument to some of the hype in the commercial data-gathering world.
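To make the over-fitting point concrete, here is a minimal sketch. It is an illustration only, not an example from the book: the data set, the alternating "noise" and all function names are assumptions chosen to keep the result deterministic. A polynomial that passes through every training point has zero training error but does far worse than a plain straight line on points it has not seen.

```python
# Illustrative toy example (not from the book): over-fitting in pure Python.

def fit_line(xs, ys):
    """Ordinary least-squares line; returns a function x -> prediction."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Polynomial of degree len(xs) - 1 through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed true signal: y = 2x. The "noise" simply alternates between +1 and -1.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Held-out points halfway between the training points, without noise.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

print("line:        train", mse(line, train_x, train_y), "test", mse(line, test_x, test_y))
print("interpolant: train", mse(wiggly, train_x, train_y), "test", mse(wiggly, test_x, test_y))
```

The interpolant "explains" the training data perfectly because it has fitted the noise as well as the signal, which is exactly why it generalizes so poorly.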



Book details:

   The Signal and the Noise: why so many predictions fail-but some don't by Nate Silver. ISBN: 978-1594204111.
   I read this book in English.

Liars and Outliers: enabling the trust that society needs to thrive

internet

In February of this year Bruce Schneier released his latest book, Liars & Outliers — enabling the trust that society needs to thrive. This accessible book does a good job exploring the scientific theory of trust and collaboration and combines a theoretical framework with real-life examples. It does not bring many new insights to people who have followed Schneier’s other work but the theoretical framework is useful and this is a book worth reading.



Book details:

   Liars and Outliers: enabling the trust that society needs to thrive by Bruce Schneier. ISBN: 978-1-118-14330-8.
   I read this book in English.

Responsive Web Design

internet

This highly readable book introduces Responsive Web Design, a name coined by the author Ethan Marcotte for creating pages that work well on different devices, be it mobile phones, tablets or desktops.



Book details:

   Responsive Web Design by Ethan Marcotte. ISBN: 978-0984442577.
   I read this book in English.

Scalable Internet Architectures

internet

Scalable Internet Architectures provides a good introduction to scalability and performance engineering for large internet applications. The book has useful high-level discussions and interesting real-world insight but could have benefited from better editing. The book would have been even stronger with more focus on theoretical aspects — which the author explains well — and less emphasis on specific tools and code-snippets. Overall, even though the book is from 2006 it is worth a read, especially for engineers new to the field.

The author of the book, Theo Schlossnagle, is a principal at a consulting company, and his real-world experience with scalability and other aspects of large-scale engineering clearly shows in the book. He excels at outlining the challenges and possible solutions at a high level, giving the reader a good background to make informed choices.

Still relevant 6 years later

The book was written in 2006 but most of the material is still relevant; the architectures and concepts that are described are still valid today. The code examples and the recurring emphasis on the author’s favorite tools, Spread and Whackamole, are less useful for a book on this level.

The book is almost exclusively focused on the ‘back-end’ server architecture and doesn’t talk much about ‘front-end’ items except for mentioning that cookies make an excellent ‘super local’ cache for web applications. Most of the development in the field since 2006 has been client-side, with the possible exception of experimental things like SPDY, Google’s new protocol. It would be interesting to read more about the impact on the back-end architecture of increased Ajax use and of streaming partial page rendering such as Facebook’s.

“Developers have no qualms about pushing code live…”

The excellent first three chapters introduce the field of scalability and performance engineering and explain the challenges that occur once an internet application reaches a large scale. The classic tension between flexibility and stability is summarized succinctly, where “developers” are really a proxy for the demands of the business to deal with a changing internal and external world:

“In my experience, developers have no qualms about pushing code live to satisfy urgent business needs without regard to the fact that it may capsize an entire production environment at the most inopportune time. […] My assumption is that a developer feels that refusing to meet a demand from the business side is more likely to result in termination than the huge finger-pointing that will ensue post-launch”.

For me this is a very familiar discussion — part of being an engineering manager is to make these types of judgment calls: when will we push back, when will we take risk, what is the risk/benefit trade-off.

High-level problems and solutions

The author is at his best when explaining high-level problems and their possible solutions. The author explains the need for horizontal scaling and introduces various techniques that make this possible. He goes into advanced topics but doesn’t forget to cover the basics. For example, there is an excellent walk-through on the performance gains from serving static content vs dynamic content. This is a good description for people new to the field and it is well illustrated, including the slowness of the initial TCP handshake and the dramatic difference in memory footprint of Apache ‘bare-bones’ versus Apache with Perl or PHP compiled in.

An interesting piece of first-hand knowledge is the author’s claim that on web servers (in clusters of more than three servers) one can expect up to 70% resource utilization. That’s a good benchmark to have.

I also liked the explanation on caching semantics. The author illustrates the problems of having shared, non-scalable resources (such as databases) and explains how introducing caches can provide the ability to create a more scalable architecture. The sample PHP code is helpful in explaining caching and two-tier execution. The book discusses transparent caches, look-aside caches and distributed caches.
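As a rough illustration of the look-aside idea, here is a minimal sketch in Python rather than the book's PHP. The class name, the `slow_database_lookup` function and the time-to-live parameter are all hypothetical and not taken from the book. The application asks the cache first and only goes to the shared resource on a miss.

```python
import time


class LookAsideCache:
    """The caller checks the cache first and falls back to the slow,
    shared resource (`load_from_backend`) only on a miss."""

    def __init__(self, load_from_backend, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: backend is not touched
        value = self._load(key)          # cache miss: ask the backend
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry, for example after the underlying row was updated."""
        self._entries.pop(key, None)


backend_calls = []


def slow_database_lookup(user_id):
    """Hypothetical stand-in for an expensive database query."""
    backend_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = LookAsideCache(slow_database_lookup, ttl_seconds=30.0)
first = cache.get(42)    # miss: goes to the backend
second = cache.get(42)   # hit: served from memory
cache.invalidate(42)
third = cache.get(42)    # miss again after invalidation
print(len(backend_calls), "backend calls for 3 reads")
```

A transparent cache would hide this logic from the application entirely; with a look-aside cache the application code decides when to read, populate and invalidate.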

The descriptions of the various types of database replication were good too: master-master, master-slave, and even cross-vendor database replication, where an expensive Oracle master is used in combination with open source PostgreSQL slaves. The latter definitely has its pros and cons and would introduce quite a bit of extra maintenance, but the author is right that it opens the mind to think about possibilities like that.

Peer-to-peer

Throughout the book Schlossnagle discusses peer-to-peer high availability software. The tools Spread and Whackamole are being pushed quite a lot; they are part of a project the author worked on at Johns Hopkins University. This peer-to-peer concept brings in an interesting perspective. For me, looking at these solutions makes sense, although it is not something I have worked with yet. However, the author gets too specific in the last chapters of the book: instead of high-level discussions he delves into the specifics of using Spread for logging, which is a missed opportunity to really discuss the various architectures in that area.

The book is clearly written by someone who has been in the trenches, although the tone is a little cynical at times: “And yes, 1 fault tolerant and N-1 fault tolerant are the same with two machines, but trying to make that argument is good way to look stupid”. The book could have benefited from a stronger editor who would have kept those things in check. The book is woolly, especially chapters 4 and 5, and could have been a bit shorter.

Recommended

The book provides a good high-level discussion of concepts such as various caching models, fail-over and scalability, combined with real-world experiences of the author. The book would have been stronger if it had had a better editor but is worth a read, especially for engineers new to the field of large scale websites.

There are very few books out there that discuss all these aspects on a high level. Perhaps a second edition can fix some of the minor shortcomings, but the book is recommended.

More info: http://scalableinternetarchitectures.com



Book details:

   Scalable Internet Architectures by Theo Schlossnagle. ISBN: 0-672-32699-X.
   I read this book in English.

Data-Intensive Text Processing with MapReduce

internet

It’s beautiful to see a real change in paradigm happening. I remember in college how much I enjoyed programming in functional languages, and how cool it is to be able to look at problems from a different viewpoint. What Google and others have achieved with MapReduce is a similar change in the way of looking at problems.

MapReduce is the name of Google’s base algorithm for their processing of huge data sets. Since then, other companies have followed suit. I didn’t know much about this field and this book is a great introduction. It provides a good description of the foundation, and I love it that it describes practical uses. Examples they gave are machine translations, Google’s PageRank, shortest path in a graph etc.
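For readers new to the model, here is a minimal single-process sketch of the map, shuffle and reduce steps using word counting. It is only an illustration: the function names and the tiny in-memory "framework" are assumptions, and a real system such as Hadoop distributes each of these steps across many machines.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: combine all values that were emitted for one key."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Toy single-process runner: map, group by key (shuffle), then reduce."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


documents = {
    "d1": "the quick brown fox",
    "d2": "the lazy dog",
    "d3": "The fox",
}
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

The appeal of the abstraction is that the programmer only writes the mapper and the reducer; grouping, distribution and fault handling are left to the framework.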

Actually in use

What I like about MapReduce is that it provides an abstraction for distributed computing that is actually being used and is successful. The book showed the scaling characteristics of an example algorithm (stripes, for computing word co-occurrence) on Hadoop: an R^2 of 0.997! That means that the scaling is almost perfectly linear when you add extra machines.
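The stripes approach to word co-occurrence can be sketched as follows. This is a single-process illustration under stated assumptions: the function names, the `window` parameter and the sample sentences are made up here and are not the book's code. Each mapper call emits, per word, a small associative array (a "stripe") of its neighbours, and the reducer adds the stripes for the same word element by element.

```python
from collections import Counter, defaultdict


def map_stripes(words, window=2):
    """Map step: for each word emit (word, stripe), where the stripe counts
    the words that occur within `window` positions of it."""
    for i, word in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        stripe = Counter(words[j] for j in range(lo, hi) if j != i)
        yield word, stripe


def reduce_stripes(stripes):
    """Reduce step: element-wise sum of all stripes emitted for one word."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)
    return total


def cooccurrence(sentences, window=2):
    """Toy runner: map every sentence, group stripes by word, then reduce."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for word, stripe in map_stripes(sentence.lower().split(), window):
            grouped[word].append(stripe)
    return {word: reduce_stripes(stripes) for word, stripes in grouped.items()}


sentences = ["a b c", "a b d"]
matrix = cooccurrence(sentences, window=1)
print(matrix["b"])
```

Compared with emitting one key per word pair, the stripes variant produces fewer but larger intermediate records.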

Want to read more

This is one of those books that makes you want to read more. For example, since reading this book I’ve looked into terms such as Zipfian, Brewer’s CAP Theorem and Heaps’ Law. I still need to learn more about Expectation Maximization and “Hidden Markov Models”, harking back to some fundamental mathematics I had in college.

I want to read more about machine translation now, perhaps Koehn’s book. I also definitely want to read the Google article about the “unreasonable effectiveness of data”.

This is an excellent book, which provides a very readable introduction to the algorithms and real-world implementations.



Book details:

   Data-Intensive Text Processing with MapReduce by Jimmy Lin, Chris Dyer. ISBN: 9781608453429.
   I read this book in English.

   This book is one of my all-time favorites.

HTML5 for Web Designers

internet

HTML5 for Web Designers is a short and pleasant introduction to HTML5.

The book, 87 pages long, is published by the folks of A List Apart, a blog about website design that I follow. It’s a quick read — the book probably took me no more than 30 minutes — and it gives you the highlights of HTML5 quickly. The introduction, with the history of the development of HTML standards, was interesting.

HTML5

Web Forms 2.0 is very useful. I think the microformat-like elements such as mark and time are good additions, but I’m not so sure about the new structural elements. The distinction between article and section is a little confusing, and I’m not sure what their added value is. I’m also not convinced of the benefits of the more flexible nesting and outlining that the author describes.

Obviously, the standardization of video and audio playback is huge (as long as we can all agree on the encoding…).

For my work, the Web Forms 2.0 elements are probably going to be the most useful: marking fields as required, specifying that input fields can take numeric input only, etc. Today we use JavaScript libraries for this. A library like ExtJS already allows you to specify this declaratively but native browser support would be even better.

The book purposely did not go into the new standardized JavaScript APIs that are part of HTML5; that would be a nice topic to read up on.



Book details:

   HTML5 for Web Designers by Jeremy Keith. ISBN: 9780984442508.
   I read this book in English.

Rework

internet

A small book with great ideas. It describes an ‘agile’ approach to business — how to think small and be effective.

It’s an inspirational book, written with a great mindset: keep it simple, release early, be nimble.



Book details:

   Rework by Jason Fried and David Heinemeier Hansson. ISBN: 9780307463746.
   I read this book in English.

Operating Systems: Design and Implementation

internet

My first introduction to large scale development.

When I was 16 years old I borrowed this book from our next-door neighbor. I brought it on vacation to France, and I still remember the smell of fresh-cut grass as I read it over and over again. The Appendix contained the entire source code of Minix.

Years later, when I did my Master’s Degree in Amsterdam, I took two courses taught by the author, Andrew Tanenbaum.



Book details:

   Operating Systems: Design and Implementation by Andrew S. Tanenbaum.
   I read this book in English.

   This book is one of my all-time favorites.

Gödel, Escher, Bach: An Eternal Golden Braid

This book needs no further introduction. I read it when I was 14 or 15, and it helped me decide to pursue a degree in Computer Science.



Book details:

   Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter.
   I read this book in English.

   This book is one of my all-time favorites.

Design Patterns: Elements of Reusable Object-Oriented Software

internet

In my first job, at Chess, patterns were just coming into fashion in the mid-1990s. I can’t say that the GoF book is a great read, but it distills a wealth of knowledge.



Book details:

   Design Patterns: Elements of Reusable Object-Oriented Software by Gang of Four.
   I read this book in English.

   This book is one of my all-time favorites.

About me

I’m a software engineering manager in Arlington, Virginia. I love technology and working with people to build great software.

Contact me

Send me a message, find me on Twitter, Facebook and LinkedIn.

Random facts

I was quoted in The Economist and my site was posted on Slashdot. I speak English and Dutch fluently, and pretty decent German, French and Bulgarian. I founded Dutch in America.com which has more than 3,300 Facebook fans.

More about Guus.
