Wednesday, December 8, 2010

How to create cleaner production code

One of the reasons creating software seems so hard, and is so error-prone, is that we (programmers) try to do too much all at once. The way most programmers write code is very much like a musician trying to compose and perform a piece simultaneously. Management, in the meantime, is interested only in the performance, and is frustrated by the delays and bugs.

How about we decompose the process a bit?

Most people who write software would agree that the work they do every day goes something like this:

(1) An issue comes up -- a new feature is needed, a feature needs to be changed, a bug needs to be fixed.

(2) The issue is investigated.

(3) A solution is conceived.

(4) The code is changed, tested and submitted.


Steps (2), (3) and (4) usually are done while looking at the code.

First, the code is run (usually from the debugger) to look into the issue.

Then the code is examined closely for what might be going wrong (in the case of a bug), or how it might be changed (in the case of adding or modifying a feature or fixing a bug). Often mingled with this step are some code changes either to write out debugging information or to try out some ideas.

More often than not, at some point, there is an aha! moment: a realization of what is causing a bug, a clever and inspired fix for a bug, a slick UI change, a devious way to create some new or changed functionality.

Then the code is changed, tested (to varying degrees, depending on the culture of the organization), and submitted to source control.

Task completed.

Nothing really wrong with this process, one might think.

So why does software development have a reputation for taking too much time and for being too error-prone?

I submit that it is because programmers are trying to do too much all at once. If the software end product is analogous to the music you hear from your iPod, then programmers are like composers trying to make changes while performers are laying down the production tracks.

In the case of music, the process works better when it is decoupled a bit. Likewise in software.

How can programming activities be decoupled? Let's start here:

One of the biggest offenses committed in programming is that the initial changes programmers make tend to be included in the final changes. The changes that programmers make while they are thinking things through, say, to test out an idea, often aren't completely undone if the idea is abandoned.

In other words, the "production" code usually bears the scars of edits that were made for the purposes of investigating issues and trying out solutions.

How can these scars be avoided?

A long time ago, I learned by accident that it is not particularly difficult to make the same changes twice. The second round goes very quickly, and produces infinitely better code.

Specifically, I had written some code for a CAD application I was working on, and then accidentally deleted the source files. My initial reaction was horror (this was before the days of recycle bins). Then I pulled myself together to recreate the lost code.

To my surprise, it took almost no time to recreate what I had deleted. Even more surprising was how much better the second version of the code looked.

Now I don't recommend regularly deleting the code you write (although you might want to try it as an experiment sometime). I do recommend the following, however:

(1) To investigate an issue (especially if it involves source code changes), and to try out ideas, create a "sandbox" version, where you can code and fix to your heart's content.

(2) Once you have a clear understanding of how you are going to resolve the issue, write it up in some way that will be useful to you and future programmers. Writing it up won't take much time if (a) your organization has some structured way for recording the thinking behind its code, and (b) you know what you are talking about.

Playing around in the sandbox is a good way to meet qualification (b). For (a), I'll provide some suggestions in a future post.

(3) Get a fresh copy of the production code, and then change it to incorporate only the solution you have decided upon (and which you have proven out in the sandbox code).

Most programmers do not bother with (2) or (3), and it is no surprise that most codebases degrade to nearly incomprehensible states long before their time.
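For readers who like to see a procedure spelled out, here is a minimal Python sketch of the three steps above. It is only an illustration under stated assumptions: the function names, the "sandbox" and "clean" directory names, the render.py file and the resolution.txt note are all hypothetical, and in practice the "fresh copy" would normally come from your source control system rather than from a directory copy.

```python
import shutil
import tempfile
from pathlib import Path


def make_sandbox(production: Path, workspace: Path) -> Path:
    """Step 1: copy the production tree somewhere safe to experiment in."""
    sandbox = workspace / "sandbox"  # hypothetical directory name
    shutil.copytree(production, sandbox)
    return sandbox


def write_up(workspace: Path, issue: str, solution: str) -> Path:
    """Step 2: record the issue and the chosen solution in a note."""
    note = workspace / "resolution.txt"  # hypothetical file name
    note.write_text(f"Issue: {issue}\nSolution: {solution}\n", encoding="utf-8")
    return note


def fresh_copy(production: Path, workspace: Path) -> Path:
    """Step 3: start again from an untouched copy of the production code."""
    clean = workspace / "clean"  # hypothetical directory name
    shutil.copytree(production, clean)
    return clean


def demo() -> dict:
    """Walk through the three steps on a throwaway directory tree."""
    original = "def render():\n    return 'slow'\n"
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        production = root / "production"
        production.mkdir()
        (production / "render.py").write_text(original, encoding="utf-8")

        # Step 1: experiment freely; debug output and dead ends stay here.
        sandbox = make_sandbox(production, root)
        (sandbox / "render.py").write_text(
            "print('debug')\ndef render():\n    return 'fast'\n",
            encoding="utf-8",
        )

        # Step 2: write down what was learned before touching production.
        note = write_up(root, "Why is rendering slow?", "Return the fast result.")

        # Step 3: apply only the decided change to a fresh copy.
        clean = fresh_copy(production, root)
        (clean / "render.py").write_text(
            "def render():\n    return 'fast'\n", encoding="utf-8"
        )

        shutil.rmtree(sandbox)  # the scars are thrown away with the sandbox
        return {
            "production_untouched": (production / "render.py").read_text(
                encoding="utf-8"
            ) == original,
            "clean_has_debug": "debug" in (clean / "render.py").read_text(
                encoding="utf-8"
            ),
            "note": note.read_text(encoding="utf-8"),
            "sandbox_exists": sandbox.exists(),
        }


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the separation: the debugging line written in the sandbox never reaches the clean copy, because the clean copy is edited from scratch with only the solution that was decided upon.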

You can't afford to do steps (2) and (3)? Consider another example from my own personal experience:

Some time ago, I was assigned a project to develop a particular graphics algorithm. This algorithm had to be highly reliable, and had to handle a lot of odd situations. A number of my co-workers eagerly provided me with problematic cases, but no one offered any ideas about how to handle them. That was up to me.

I was quite sure I could come up with a very good algorithm. Each Monday, at group meeting, I would assure my manager that the project would be done by "Friday". I just didn't say which Friday.

As it turns out, it took a lot of Fridays, about fourteen to be exact. However, eventually there was a Friday on which the code worked really, really well. It handled all the cases my co-workers could think of, and more.

At this point, I could have just submitted the code, but I didn't. To my manager's dismay, I told him that I was planning to rewrite the code first. A good way to get fired, eh?

What I felt was that the code -- despite its brilliant functioning -- was a mess. I had borrowed some data structures for convenience that really had no business being part of this code. On top of it, my naming convention had evolved over the course of the project, and now there was a lot of inconsistency.

It turns out that it took Saturday and part of Sunday to completely rewrite the code -- on the order of five thousand lines of code. What surprised even me was (a) how easily the whole rewrite went, and (b) how it all ran as well as the original code.

Thinking back, it should not have been a surprise. I knew what I was talking about, I had decided on a naming convention, and all I had to do was crank out what was already in the original implementation. I did copy and paste a certain fraction of the code, refactoring it in the process.

So keep this ratio in mind: the first version took fourteen weeks, the second version about a day and a half. Also keep in mind that you usually don't want to pull this kind of thing on your manager.

On Monday, when my manager finally calmed down, I received an additional assignment related to the project: the company wanted to file a patent for the algorithm. This involved writing a clear explanation of what I had created.

Since I understood the functioning of the algorithm in great detail, it was not particularly difficult to provide explanations. The structure of the documents needed for the patent application made things easier, since my creative options were limited. The on-going feedback from the patent attorney focused my writing on the purpose at hand, namely to explain the algorithm to the patent examiner.

The total time to create all the technical documentation, including hand-drawn sketches of diagrams: about three working days.

Note that writing the production code preceded the documentation of the thinking behind the code. However, I am certain that the patent documentation could have been written before the production code just as quickly.

The moral of this story is: it is highly beneficial -- for the purposes of keeping production code clean, and for leaving a record of one's thinking -- to decouple the step where one is just working out one's ideas from the steps where production code is changed and the underlying intentions are recorded.

Programming in this fashion will significantly increase the engineering quality of your work. Moreover, no one really needs to know you are doing this.

If you do get caught, you can use this argument: your organization benefits in real economic terms. Your documentation will enable its programming staff to understand the thinking behind the code more easily, and it will be working with cleaner production code, undoubtedly resulting in faster project completion and more reliable software products.

Friday, November 12, 2010

So What?

The previous post reached the conclusion that there *is* such a thing as software engineering, and it raised the question: "So what?"

What is the value of this conclusion? Does it matter what we call the process that -- in the end -- produces some software product?

To be frank, no. The argument over what is software engineering, and whether software engineering is really a kind of engineering, is a distraction from the real issue at hand. Hopefully the previous post provides a reasonable resolution to this debate.

The real issue is how to make the process of creating software more reliable and predictable. This "software crisis" (ironically a product of advances in computer capability) encompasses a collection of shortcomings, including (i) a lack of effective techniques for managing software projects, (ii) disconnects between what users need and want and what programmers produce, and (iii) overall poor software quality and performance. The common factor in each of these problems is complexity, and poor management thereof.

The complexity is unavoidable. Therefore the only way out is to find effective ways to deal with it. The key is to decouple these problems to identify solutions for each.

Thursday, November 4, 2010

Software engineering is...

The debate about whether software engineering is really "engineering" has gone on since the term first came into popular use after the 1968 NATO Conference on software engineering.

Since I teach software engineering at Boston University, I'm highly motivated to assert: "Of course software engineering is engineering." Joking aside, it is clear to me that software is "engineered", no matter how it is created.

Perhaps the place to start is to see how Webster's Dictionary defines "engineering" :

1. The practical application of science to commerce or industry.

2. The discipline dealing with the art or science of applying scientific knowledge to practical problems; "he had trouble deciding which branch of engineering to study".


This definition squares well with the commonly-heard expression, "to engineer a product".

More specifically, all engineering fields are characterized by the following features:

a. Formal communication methods -- mechanical engineering uses drawings and engineering change orders, electrical engineering uses schematics, civil engineering uses elevation and detail drawings, etc.

b. Process-oriented -- creating a product ad hoc can be accomplished only when a very small number of like-minded people are involved; otherwise contributing, reviewing and modifying the information comprising an engineering project must follow some defined procedures.

c. Conscious of time, requirements and economic constraints -- in most cases, someone or some organization is investing in a product to address a particular market opportunity, and usually does not have endless patience and cash.

d. Relies on science -- try convincing yourself a bridge will be sufficiently rigid by looking at a sketch on paper; science provides some definite answers through math and logic.

e. Relies on body of knowledge -- when science can't quite provide a definite answer, past experience can.

f. Must sometimes meet standards and code on account of legal liability -- a major consideration in consumer products.

g. Iterative -- much of engineering is creativity, and creativity rarely follows a straight path without errors and insights.

So is software engineered? Let's look at each of these characteristics.

Formal communication methods and process -- Even if programmers engage in ad hoc code-and-fix, one can say that "formal communication" and "process" occur nonetheless. They just are not particularly apparent to the casual observer, since these things largely happen in the programmers' heads. Certainly there is concrete process in almost any software development organization today: source code control, bug reports, requirements documents, agile activities.

Conscious of time, requirements and economic constraints -- A reality of any business, including software. Agile has done a lot to bring prominence to requirements through its user stories.

Relies on science -- Many programmers don't have formal computer science training. Those who do, however, carry with them a store of algorithm and abstract data structure knowledge that influences their judgment, whether they are aware of it or not. Sometimes they explicitly apply this knowledge.

Relies on a body of knowledge -- There's nothing like experience, and programming is no exception. The best code is written by programmers who have written similar things before.

Must sometimes meet standards and code -- Certainly is true of software used in any sort of aviation. Internally, companies define (and increasingly apply) coding standards.

Iterative -- Would anyone disagree? Agile has done a lot to make this characteristic plainly clear.

The bottom line is that all software development is a form of software engineering, done in a wide range of professional quality.

OK, so what? We can argue that software engineering is engineering, but what is the value of this realization? Does this realization lead us to change how software is developed so it is created faster, better, and with more pleasantness all around?

I'll address these questions in the next few posts.

A good start

Two recent articles (MITnews), (blogs.msdn.com), both entitled "Teaching real-world programming", report how MIT computer science students improve their code-writing skills through novel code reviews. Specifically, MIT professors Saman Amarasinghe and Charles Leiserson require students taking 6.172 — Performance Engineering of Software Systems to submit their project code for face-to-face reviews by experienced Boston area software developers.

Some comments:

(1) Having taught software engineering for some time, I find it clear that a lack of basic code-writing skills hampers students' ability to succeed in computer science and software engineering. The 6.172 effort is a very good step forward.

A one-time code review certainly is helpful. To take things to a level where lifelong code-writing habits are formed may require students to carry out projects as "apprentices" to experienced software developers. In such apprenticeships, for a certain period of time, students would be responsible only for converting pseudo-code to code, allowing them to focus on developing code-writing skills without the distractions of design and algorithm development.

(2) Both the MITnews and blogs.msdn.com articles discuss the usefulness of comments. Comments in code are great if they are well-written, and certainly more helpful if they address the purpose of the code, as Barry Perlman states in a comment about the MITnews piece. It seems to be human nature, however (or at least the nature of humans who become programmers), to feel that the purpose for which code was written is obvious and does not need to be explained. Accordingly, most programmers write only about the solution represented by the code, and leave implicit some very useful information -- namely, the issue that calls for the solution.

Additionally, words by themselves are a cumbersome medium, in particular for explaining the purpose and the workings of code. Figures and videos are often necessary for effective and efficient communication. In programming languages with which I'm familiar, however, it is not possible to include figures and videos in comments.

Lastly, a comment may need to refer to two or more sections of code, raising the question, "In which section should the comment about these sections be placed?"

A commenting method termed "Issue/Solution/Link" (ISL) addresses these shortcomings by allowing comments to be composed in a format such as html, and linked with code. The ISL technique contains a mandatory "issue" section, where the issue/purpose of the code, and not the solution, is explained and described. Often, the issue can be expressed with a single question, such as, "How does the system render so smoothly?"

Although an issue may address conspicuous software features such as impossibly-fast rendering, many issues are mundane, such as: "How does the system undo a delete operation?"

The solution section of an ISL comment provides an explanation of the solution chosen to resolve the issue, largely to spare programmers the time and effort to infer the intentions underlying the code. The best-written solutions unambiguously reveal the techniques employed, and allow programmers to quickly become fluent with the related code.

Both the issue and solution sections may contain figures, videos and other non-textual data that support and clarify the written comments.

Links emanating from the issue/solution sections provide a bidirectional connection with the corresponding sections of code.
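To make the three parts of an ISL comment concrete, here is a minimal Python sketch of how such a comment might be represented. This is not a description of any actual ISL tool: the class names (CodeRef, ISLComment, ISLIndex), the file-and-line-range linking scheme, the file paths and the example solution text are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeRef:
    """A section of code, identified by file and line range (assumed scheme)."""
    path: str
    first_line: int
    last_line: int

    def covers(self, path: str, line: int) -> bool:
        return path == self.path and self.first_line <= line <= self.last_line


@dataclass
class ISLComment:
    issue: str                  # mandatory: the question the code answers
    solution: str               # explanation of the chosen solution, e.g. HTML
    attachments: list[str] = field(default_factory=list)  # figures, videos
    links: list[CodeRef] = field(default_factory=list)    # comment -> code

    def __post_init__(self) -> None:
        # The issue section is mandatory, so an empty one is rejected.
        if not self.issue.strip():
            raise ValueError("an ISL comment must state its issue")


class ISLIndex:
    """Holds comments and answers the reverse question: code -> comments."""

    def __init__(self) -> None:
        self._comments: list[ISLComment] = []

    def add(self, comment: ISLComment) -> None:
        self._comments.append(comment)

    def comments_for(self, path: str, line: int) -> list[ISLComment]:
        return [
            c for c in self._comments
            if any(ref.covers(path, line) for ref in c.links)
        ]


if __name__ == "__main__":
    index = ISLIndex()
    undo = ISLComment(
        issue="How does the system undo a delete operation?",
        solution="<p>Hypothetical: deleted items are kept on a holding list.</p>",
        attachments=["undo-flow.png"],  # hypothetical figure
        links=[
            CodeRef("editor/delete.py", 10, 42),  # hypothetical paths
            CodeRef("editor/undo.py", 5, 30),
        ],
    )
    index.add(undo)
    # From code to comment: which issues does this line help resolve?
    for comment in index.comments_for("editor/undo.py", 12):
        print(comment.issue)
```

Because one comment can link to several sections of code, the question of "in which section should the comment be placed?" goes away: the comment lives outside the code, and the links run in both directions.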

The process of developing the habit of programming with ISL comments requires a bit of discipline and effort. With time and practice, the process becomes easier and more natural.

The benefit of using ISL is quite substantial. Estimates vary. However, Peter Hallam, Microsoft's tech lead for C#, estimated once that the fraction of time a programmer spends studying code is on the order of 75%. ISL vastly reduces this 75%.

(3) Barry Perlman's comment at the end of the MITnews article is right on target -- the big payoff is the thought process required to write [good comments].