Guus Bosman

software engineering manager


Books I read in English for work

Here I keep track of some of the books that I’ve read. It’s only a selection (I read many more books for work), but it’s nice to keep track. I like to read books in their original languages where possible: French, German, Dutch, English, and I even read a book in Bulgarian. See also books about technology or management, and my favorite books.

I’m an engineer, and I enjoy science fiction novels. Some of my favorite authors are Vernor Vinge, Terry Pratchett and L.E. Modesitt Jr. No overview of my reading habits would be complete without mentioning The Economist; I love that magazine.

Books below are in order of date read; this overview starts with October 2002.

Responsive Web Design

internet

This highly readable book introduces Responsive Web Design, a name coined by the author, Ethan Marcotte, for creating pages that work well on different devices, be they mobile phones, tablets or desktops.



Book details:

   Responsive Web Design by Ethan Marcotte. ISBN: 978-0984442577.
   I read this book in English.

Scalable Internet Architectures

internet

Scalable Internet Architectures provides a good introduction to scalability and performance engineering for large internet applications. The book has useful high-level discussions and interesting real-world insight but could have benefited from better editing. It would have been even stronger with more focus on theoretical aspects, which the author explains well, and less emphasis on specific tools and code snippets. Overall, even though the book is from 2006, it is worth a read, especially for engineers new to the field.

The author of the book, Theo Schlossnagle, is a principal at a consulting company, and his real-world experience with scalability and other aspects of large-scale engineering clearly shows in the book. He excels at outlining the challenges and possible solutions at a high level, giving the reader a good background to make informed choices.

Still relevant 6 years later

The book was written in 2006 but most of the material is still relevant; the architectures and concepts that are described are still valid today. The code examples and the recurring emphasis on the author’s favorite tools, Spread and Wackamole, are less useful for a book at this level.

The book is almost exclusively focused on the ‘back-end’ server architecture and doesn’t talk much about ‘front-end’ items except for mentioning that cookies make an excellent ‘super local’ cache for web applications. Most of the development in the field since 2006 has been client-side, with the possible exception of experimental things like SPDY, Google’s new protocol. It would be interesting to read more about the impact on the back-end architecture of increased Ajax use and of streaming partial page rendering such as Facebook’s.

“Developers have no qualms about pushing code live…”

The excellent first three chapters introduce the field of scalability and performance engineering and explain the challenges that occur once an internet application reaches a large scale. The classic tension between flexibility and stability is summarized succinctly, where “developers” are really a proxy for the demands of the business to deal with a changing internal and external world:

“In my experience, developers have no qualms about pushing code live to satisfy urgent business needs without regard to the fact that it may capsize an entire production environment at the most inopportune time. […] My assumption is that a developer feels that refusing to meet a demand from the business side is more likely to result in termination than the huge finger-pointing that will ensue post-launch”.

For me this is a very familiar discussion — part of being an engineering manager is to make these types of judgment calls: when will we push back, when will we take risk, what is the risk/benefit trade-off.

High-level problems and solutions

The author is at his best when explaining high-level problems and their possible solutions. The author explains the need for horizontal scaling and introduces various techniques that make this possible. He goes into advanced topics but doesn’t forget to cover the basics. For example, there is an excellent walk-through on the performance gains from serving static content vs dynamic content. This is a good description for people new to the field and it is well illustrated, including the slowness of the initial TCP handshake and the dramatic difference in memory footprint of Apache ‘bare-bones’ versus Apache with Perl or PHP compiled in.

An interesting piece of first-hand knowledge is the author’s claim that on web servers (in clusters of more than 3 servers) one can expect up to 70% resource utilization. That’s a good benchmark to have.
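
To make the 70% figure concrete, here is a rough back-of-the-envelope sketch of how it could feed into sizing a cluster. The function name, the requests-per-second framing and the example numbers are my own assumptions for illustration; they are not taken from the book.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float,
                   target_utilization: float = 0.70) -> int:
    """Smallest cluster size that keeps every server at or below the target.

    Assumption: load is spread evenly and capacity is expressed as
    requests per second. Both are simplifications for illustration.
    """
    if peak_rps < 0 or rps_per_server <= 0:
        raise ValueError("load must be >= 0 and per-server capacity > 0")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = rps_per_server * target_utilization
    return max(1, math.ceil(peak_rps / usable_rps))


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 requests/s at peak, 500 requests/s per server.
    print(servers_needed(2000, 500))
```

With these made-up numbers, planning for 70% utilization asks for 6 servers where planning for 100% would ask for 4; the difference is the headroom.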

I also liked the explanation on caching semantics. The author illustrates the problems of having shared, non-scalable resources (such as databases) and explains how introducing caches can provide the ability to create a more scalable architecture. The sample PHP code is helpful in explaining caching and two-tier execution. The book discusses transparent caches, look-aside caches and distributed caches.
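
As an illustration of the look-aside idea, here is a minimal sketch in Python (the book’s own sample code is PHP, and this is not it). The class name, the TTL-based expiry and the `loader` callback are assumptions of mine; the point is only that the application checks the cache first and goes to the shared resource on a miss.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LookAsideCache:
    """Application-managed cache: look in the cache first, load on a miss."""

    def __init__(self, loader: Callable[[Hashable], Any],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical stand-in for a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fall through to the shared resource.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, for example right after the underlying row changes."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    cache = LookAsideCache(lambda user_id: {"id": user_id}, ttl_seconds=30.0)
    cache.get(42)  # miss: calls the loader
    cache.get(42)  # hit: served from memory
    print(cache.hits, cache.misses)
```

A transparent cache would hide this logic behind the data source itself, and a distributed cache would replace the in-process dictionary with a shared store; the sketch covers only the look-aside case.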

The descriptions of the various types of database replication were good too: master-master, master-slave, and even cross-vendor database replication, where an expensive Oracle master is used in combination with open source PostgreSQL slaves. The latter definitely has its pros and cons and would introduce quite a bit of extra maintenance, but the author is right that it opens the mind to think about possibilities like that.
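
One consequence of a master-slave setup is that the application has to decide where each query goes. Below is a deliberately naive sketch of that routing, assuming that plain SELECT statements are the only safe reads and that server names are just strings; the class and the host names are hypothetical and do not come from the book.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master: str, replicas: Sequence[str]) -> None:
        self.master = master
        # Round-robin over the replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Assumption: only plain SELECTs count as reads. Anything else,
        # including empty or unknown statements, goes to the master.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    router = ReplicaRouter("oracle-master", ["pg-replica-1", "pg-replica-2"])
    print(router.route("SELECT name FROM users WHERE id = 1"))
    print(router.route("UPDATE users SET name = 'x' WHERE id = 1"))
```

The sketch ignores replication lag: a read sent to a replica right after a write may not see that write yet, which is one of the trade-offs to weigh in a setup like this.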

Peer-to-peer

Throughout the book Schlossnagle discusses peer-to-peer high-availability software. The tools Spread and Wackamole are pushed quite a lot; they are part of a project the author worked on at Johns Hopkins University. This peer-to-peer concept brings in an interesting perspective; for me, looking at these solutions makes sense, although it is not something I have worked with yet. However, the author gets too specific in the last chapters of the book: instead of high-level discussions he delves into the specifics of using Spread for logging, which is a missed opportunity to really discuss the various architectures in that area.

The book is clearly written by someone who has been in the trenches, although the tone is a little cynical at times: “And yes, 1 fault tolerant and N-1 fault tolerant are the same with two machines, but trying to make that argument is good way to look stupid”. The book could have benefited from a stronger editor who would have kept those things in check. The book is woolly, especially chapters 4 and 5, and could have been a bit shorter.

Recommended

The book provides a good high-level discussion of concepts such as various caching models, fail-over and scalability, combined with real-world experiences of the author. The book would have been stronger if it had had a better editor but is worth a read, especially for engineers new to the field of large scale websites.

There are very few books out there that discuss all these aspects on a high level. Perhaps a second edition can fix some of the minor shortcomings, but the book is recommended.

More info: http://scalableinternetarchitectures.com



Book details:

   Scalable Internet Architectures by Theo Schlossnagle. ISBN: 0-672-32699-X.
   I read this book in English.

Data-Intensive Text Processing with MapReduce

internet

It’s beautiful to see a real change in paradigm happening. I remember how much I enjoyed programming in functional languages in college, and how cool it is to be able to look at problems from a different viewpoint. What Google and others have achieved with MapReduce is a similar change in the way of looking at problems.

MapReduce is the name of Google’s base algorithm for processing huge data sets. Since its introduction, other companies have followed suit. I didn’t know much about this field and this book is a great introduction. It provides a good description of the foundation, and I love that it describes practical uses. Examples the authors give include machine translation, Google’s PageRank and finding the shortest path in a graph.
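
For readers who have not seen the model before, here is a toy, single-process sketch of its shape, using the classic word count. It is not Hadoop or Google’s implementation; `run_mapreduce` is a hypothetical helper that only imitates the map, group-by-key and reduce steps in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: add up all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records: Iterable[Tuple[str, str]],
                  mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
                  reducer: Callable[[str, List[int]], Tuple[str, int]]
                  ) -> Dict[str, int]:
    """Run the map, shuffle (group by key) and reduce phases in memory."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog")]
    print(run_mapreduce(docs, map_words, reduce_counts))
```

The appeal of the abstraction is that the mapper and reducer contain no distribution logic at all; a real framework can run many copies of them on different machines and do the grouping over the network.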

Actually in use

What I like about MapReduce is that it provides an abstraction for distributed computing that is actually being used and is successful. The book showed the scaling characteristics of an example algorithm (strips for computing word co-occurrence) on Hadoop: an R^2 of 0.997! That means that there is almost a linear scalability increase when you add extra machines.
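
To show what such an R^2 value measures, here is a small sketch that fits a straight line by least squares and computes R^2 for it. The cluster sizes and speedups in the example are invented for illustration; they are not the measurements from the book.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of the ordinary least-squares line fitted through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("xs and ys must each contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: how far the points are from the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data: cluster size versus relative speedup.
    machines = [10, 20, 40, 80]
    speedup = [1.0, 1.9, 3.9, 7.6]
    print(round(r_squared(machines, speedup), 4))
```

A value close to 1 says the points lie almost exactly on a straight line. How much each extra machine actually buys you is given by the slope of that line, not by R^2 itself.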

Want to read more

This is one of those books that makes you want to read more. For example, since reading this book I’ve looked into terms such as Zipfian distributions, Brewer’s CAP theorem and Heaps’ law. I still need to learn more about Expectation Maximization and Hidden Markov Models, harking back to some fundamental mathematics I had in college.

I want to read more about machine translation now, Koehn’s book perhaps. I also definitely want to read the Google article about the “unreasonable effectiveness of data”.

This is an excellent book, which provides a very readable introduction to the algorithms and real-world implementations.



Book details:

   Data-Intensive Text Processing with MapReduce by Jimmy Lin, Chris Dyer. ISBN: 9781608453429.
   I read this book in English.

   This book is one of my all-time favorites.

HTML5 for Web Designers

internet

HTML5 for Web Designers is a short and pleasant introduction to HTML5.

The book, 87 pages long, is published by the folks of A List Apart, a blog about website design that I follow. It’s a quick read — the book probably took me no more than 30 minutes — and it gives you the highlights of HTML5 quickly. The introduction, with the history of the development of HTML standards, was interesting.

HTML5

Web Forms 2.0 is very useful. I think the microformat-like elements such as mark and time are good additions, but I’m not so sure about the new structural elements. The distinction between article and section is a little confusing, and I’m not sure what their added value is. I’m also not convinced of the benefits of the more flexible nesting and outlining that the author describes.

Obviously, the standardization of video and audio playback is huge (as long as we can all agree on the encoding…).

For my work, the Web Forms 2.0 elements are probably going to be the most useful: marking fields as required, specifying that input fields can take numeric input only, etc. Today we use JavaScript libraries for this. A library like ExtJS already allows you to specify this declaratively but native browser support would be even better.

The book purposely did not go into the new standardized JavaScript APIs that are part of HTML5; that would be a nice topic to read up on.



Book details:

   HTML5 for Web Designers by Jeremy Keith. ISBN: 9780984442508.
   I read this book in English.

Rework

internet

A small book with great ideas. It describes an ‘agile’ approach to business — how to think small and be effective.

It’s an inspirational book, written with a great mindset: keep it simple, release early, be nimble.



Book details:

   Rework by Jason Fried and David Heinemeier Hansson. ISBN: 9780307463746.
   I read this book in English.

Operating Systems: Design and Implementation

internet

My first introduction to large scale development.

When I was 16 years old I borrowed this book from our next-door neighbor. I brought it on vacation to France, and I still remember the smell of fresh-cut grass as I read it over and over again. The appendix contained the entire source code of Minix.

Years later, when I did my Master’s degree in Amsterdam, I took two courses taught by the author, Andrew Tanenbaum.



Book details:

   Operating Systems: Design and Implementation by Andrew S. Tanenbaum.
   I read this book in English.

   This book is one of my all-time favorites.

Gödel, Escher, Bach: An Eternal Golden Braid

This book needs no further introduction. I read it when I was 14 or 15, and it helped me decide to pursue a degree in Computer Science.



Book details:

   Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter.
   I read this book in English.

   This book is one of my all-time favorites.

Design Patterns: Elements of Reusable Object-Oriented Software

internet

In my first job, at Chess, patterns were just coming into fashion in the mid-1990s. I can’t say that the GoF book is great to read, but it distills a wealth of knowledge.



Book details:

   Design Patterns: Elements of Reusable Object-Oriented Software by Gang of Four.
   I read this book in English.

   This book is one of my all-time favorites.

The Big Switch: Rewiring the World, from Edison to Google

I especially enjoyed the first part: the history of electricity and how technology transformed entire industries. The book makes the case that a similar revolution will take place in computing, where providers of cloud computing facilities will play a role like the one electricity producers play today.



Book details:

   The Big Switch: Rewiring the World, from Edison to Google by Nicholas Carr.
   I read this book in English.

Enterprise Integration Patterns

Enterprise Integration Patterns is part of the same series as Patterns of Enterprise Application Architecture, a book I didn’t care much for 6 years ago because it stated the obvious too often. The EIP book is from 2004 and is somewhat better, although at times it suffers from the same weakness.

I used it to look up good definitions of components I wanted to use in our product. The definitions of Message Bus and Message Router were particularly helpful: not immediately useful for deciding on implementation details, but good for documenting and communicating the design we had in mind.

On the other hand, the descriptions are superficial and don’t offer much insight. This is the same beef I had with Patterns of Enterprise Application Architecture: the content is too obvious.



Book details:

   Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf. ISBN: 0321200683.
   I read this book in English.

About me

I’m a software engineering manager in Arlington, Virginia. I love technology and working with people to build great software.

Random facts

I was quoted in The Economist and my site was posted on Slashdot. I speak English and Dutch fluently, and pretty decent German, French and Bulgarian. I founded Dutch in America.com which has 2,700 fans on Facebook.
