Wednesday, July 14, 2010

Micro vs. Macro

Economics is usually broken up into at least two sub-disciplines: micro-economics and macro-economics. The first is really the science of individual behavior, not to be confused (too much) with behavioral economics. It concerns what individuals are going to do relative to the economy.

The second sub-discipline is how the economy -- as a large entity -- acts and moves. It is the more visible version of economics, but without its micro cousin it would be far less robust.

Programming too can be broken up into a micro/macro decomposition. Micro-programming is algorithmic or connective. It focuses on getting some set of instructions to perform a specific set of operations on a computer.

Macro-programming is more architectural, and is primarily concerned with how the larger pieces interact: the technologies, the domain, the users, the interface, the “product” -- all of the larger issues circulating within a project.

Between the two, micro-programming is far easier. You simply focus your efforts on a finite set of instructions and, using writing and debugging skills, get the code to match your expectations. It is not uncommon for domain experts, with little computer science background, to be working on algorithmic sections of code. With a bit of training, most people can micro-program (although some people inherently make it more complex than necessary).

Macro-programming on the other hand is a world of conflicting trade-offs and easy mistakes. It involves understanding a lot more, particularly being able to correctly guess the overall direction of the project, the users and the market. It’s far harder, far more blurry and requires considerable experience to get it right.

A popular way to hire programmers right now is to give them ‘programming’ exams: tricky questions that require prior micro-programming exposure to answer well. For junior or intermediate positions, where most of the work is micro-programming, this may be appropriate, but it is entirely the worst way to pick seniors.

What you don’t want at the top level of a software project is a group of people mostly focused on pounding out neat little algorithms. That’s what the kids are for. What you need is a group of people with real hardcore experience that can guide the entire project away from the many pitfalls that commonly sink coding projects. You get that if you look for attributes like experience, independence and leadership. Not coding ability. If your team is run by non-vocal people who think that every problem -- including the personal and organizational ones -- is solvable by just adding more code to the system, you can easily guess how this will quickly result in a mess.

Because there are so many seniors out there lacking the right kind of skills, you often see career managers, or domain experts stepping in to run the projects. But leaving a large engineering problem in the hands of someone without the prerequisite background is just courting disaster. If you don’t have enough micro-programming experience, and then even more macro-programming experience, how will you acquire the underlying knowledge to make good decisions? We’re completely unable, right now, to get any of this down into nice short textbooks, and it changes quickly depending on the project’s environment. Experience is the only source of understanding. Guessing at the standards or rigorously following “industry best practices” to the letter are both easy ways to find serious trouble.

Most people with little or no experience confuse micro-programming with macro-programming. They believe that the two are the same, and that experience with one is the same as experience with the other. This colors their expectations and often their own self-confidence. It’s probably the main reason why so many think that code is malleable, and that years’ worth of work should be doable in a couple of months. It’s a nice delusion, but often a very costly one. We see its effect in our failure rate.

In an immature field, absolutely nothing beats experience. Without it, a project is dependent on sheer luck, and given the dismal state of our knowledge and technologies, it’s a massive amount of luck that is required. Software projects generally succeed when they are driven by experts who understand their volatile nature. They usually fail otherwise.

Tuesday, July 13, 2010

Intellectual Leverage

In primitive times we used really crude tools. Rocks lightly chiseled and attached to sticks. Rock fragments tied to poles. Whatever was simple and would get the job done.

As we absorbed and evolved these tools, we began to include more and more different materials. From sticks and stones, we moved to metals, cloth, concrete and plastics.

As we practiced, we became more and more precise.

But our understanding exploded when we invented tools to help us build tools. The industrial revolution set us off on an accelerating path towards better and more focused tools, culminating in generalized robots that eventually will be able to build anything, at any time.

As we did for our muscles, we can also do for our intellect. Currently we are extending our understanding with crude tools. We’re still in the intellectual bronze age, but hopefully that won’t last too long. Next, we’ll build powerful tools that will help us build better and more precise tools. We’ll be able to leverage our crude knowledge into something more sophisticated. In this way we will accelerate knowledge acquisition in the same way we accelerated our constructive abilities with tools and robots.

Is this future dependent on artificial intelligence (AI)? I doubt that it is a necessity, particularly in the form commonly discussed in the popular media, where AI is often seen as a sentient, independent being. Viewed instead as just an ability to extend and transform a group’s capabilities to accomplish complex tasks with minimal effort, one could see the end product as an organized collective intellect: an intelligence beyond that of just a simple being.

If you’ve ever watched a gifted operator work a dozer or a backhoe at a construction site, you’ve seen the symbiotic relationship between man and machine. The equipment extends the man, enables him, but is still subservient to him; it obeys him. Someday we’ll be able to leverage our minds in this same way.

Sunday, July 11, 2010

Design by Committee

I love this video:

http://youtu.be/Wac3aGn5twc

It’s great because it quickly gets down to the essence of what happens when a group of people comes together to pursue a constructive project.

The things that work best when building something new have always had a strong, narrow vision. One strong person’s idea about the final design. One person’s aesthetic. When a committee gets involved, everyone has their own goals and agenda, usually resulting in a mixture of watered-down, inconsistent ideas. Without a single clear vision, the necessary compromises made by the group weaken the whole.

Committees do have their place, particularly when dealing with negatives or oversight. If you’re looking to protect things from going out of bounds, but the real substance of those bounds is not quantified, then throwing a whole bunch of different brains -- different perspectives -- at the problem works well. A group of people is more likely to catch the full range of potential problems than a single person. In that way, a committee can be proactive and keep something on track.

Construction, however, needs a single focus. An architect designs a building, a painter produces a masterpiece, and a composer creates a melody that reaches out and touches us. Rarely will you see even a small team in sync and productive, and never a committee. It just doesn’t work.

We see this over and over again, everywhere in modern life, but we see it frequently in software. Most committee-designed software projects fail outright. Most “successful” projects, even if they required man-years to complete, started initially with very small teams. The larger the group, the muddier the code and interfaces. The final version of their “stop sign” in the video is a great example: just too many different goals, perspectives, and agendas.

Not unexpectedly, because software projects are organic -- they continue to grow long after the initial creators have left -- the later versions get more features but become increasingly more painful to use.

My favorite example of this is always Microsoft Word. As each new generation of programmers “contributes” their vision to that product, it becomes increasingly unstable. It just gets worse. It becomes harder to use and less predictable. I liked Word a few decades ago when it was clean and young, but I’ll do anything to avoid it these days. It has succumbed to a committee mindset.

Software suffers more than other disciplines because the projects are always huge and long, and the developers jump ship frequently. Design by committee doesn’t have to be a formal process occurring all at once: a string of single developers extending a single product, each without fully refactoring it to their own vision, will leave behind too many compromises. Too many partial, inconsistent implementations all layered over the same code base.

The reasonable thing to do is for projects to get strong initial leadership, and for all ‘assisting’ programmers to figure out the initial vision first, and then to try very hard to keep their work consistent with that approach. In practice, however, most software programmers are more interested in doing it “their way”, even if their way isn’t the best way for the overall project. We have a culture of strong egos and diverse opinions that when left uncontrolled gradually leads to substandard work, to ugly systems, and to outright failure. Too many chefs continually spoil the broth. 

Tuesday, July 6, 2010

Syntactic Noise

When programming, we usually know in some simplistic terms the behavior we expect from the computer. We can easily express this notion in some vague non-standard pseudo-code-like notation, such as:

for all bonds in the index
    calculate the yield
    add the weighted sum to the total
end
divide total by the number of bonds

This, in a sense, is the essence of the instructions that we want to perform, in a manner that is somewhat ambiguous to the computer, but clear enough that we can understand it.

By the time we’ve coded this in a format precise enough for the computer to correctly understand, the meaning has been obscured by a lot of strange symbols, odd terms and inelegant syntax:

public double calcIndex()
{
   double weightedSum = 0.0;
   int bondCount = 0;
   for (int i = 0; i < bonds.length; i++) {
       boolean status = yieldCalc(bondInfo, Calculate.YIELD,
           true, bondFacts);

       if (status != true) {
           throw new
               YieldCalculationException(getCalcError());
       }
       weightedSum += calcWeight(bondFacts, Weight.NORMAL);
       bondCount++;
   }
   return weightedSum / bondCount;
}

This is a considerably longer and more complex representation of the steps we want the computer to perform.

In the above contrived example, with the exception of the error handling and the flags for the calculation options, the code retains a close similarity to the pseudo code. Still, even as close as it is, the added characters for the blocks, the function calls and the operators make the original intent of the code somewhat obscured.

Programmers learn to see through this additional syntactic noise in their own code, but it becomes a significant factor in making their work less readable to others.

Now my above example is pretty light in this regard. Most computer languages allow users to express and compact their code using an unlimited amount of noise. The ultimate example is the Obfuscated C contest, where entries often make extreme use of the language’s preprocessor, cpp.

Used well, cpp can allow programmers to encapsulate syntactic noise in a simple macro that gets expanded before the code is compiled. Used poorly, macros can contribute to strange bugs and misleading error messages.

For example, a decade (or so) ago, I used something like:

/* 'last' is declared in function scope so that RETURN can see it */
#define BEGIN(function) \
    char last[] = #function; \
    TRACE("Entered", __LINE__, __FILE__, #function)

#define RETURN(value) do {\
    TRACE("Exited", last);\
    return value;\
    } while(0)

in C for each function, so that the code could be easily profiled (the original was quite a bit more complex; I’ve forgotten most of what was there) and every call used the same consistent mechanism for handling its tracing.

The cost was the requirement for all code to look like:

int functionCall(int x, int y) {
   BEGIN(functionCall);
       ...
       ...
   RETURN(output);
}

But for the minor extra work and discipline, the result was the ability to easily and quietly encapsulate a lot of really powerful tracing ‘scaffolding’ into the code as needed. More importantly, the macros hide a lot of the ugly syntactic noise, such as language constants like __FILE__.

In a language like Java, the programmer doesn’t have a powerful tool like a preprocessor, but they still have lots of options. Developers could use an external preprocessor like m4, but it would likely interfere with the cushy IDEs that most programmers have become addicted to. Still, there are always ways to increase or decrease the noise in the code:

return new String[] { new Data().modifyOnce().andAgain().toString(),
    "value", ( x == 2 ? new NameString() : null )};

While the above is a completely contrived example -- hopefully people aren’t doing horrible things like this -- it shows that it is fairly easy to “build up” very noisy statements and blocks. However:

String modified = twiceModified();
String name = nameString(x);

return stringArray(modified, "value", name);

accomplishes the same result, at the cost of having to create three methods to encapsulate some of the logic (sketched below). The noise in the initial example is completely unnecessary. What we want to capture are the three values that are returned; how they are generated is a problem that can be encapsulated elsewhere in the code.
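
To make that concrete, here is one plausible shape for those three helpers. The bodies are a guess, reconstructed from the noisy version above, not the only way to write them:

private String twiceModified() {
    return new Data().modifyOnce().andAgain().toString();
}

private String nameString(int x) {
    // preserves the original conditional: null whenever x isn't 2
    return (x == 2) ? new NameString().toString() : null;
}

// varargs, so the caller just lists the values it wants captured
private String[] stringArray(String... values) {
    return values;
}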

Along with excess syntactic noise there are a couple of other related issues.

First, while try/catch blocks can be useful, they form a secondary flow of logic in and on top of the base code, and thus they generate confusion. Recently a friend was convinced that the JVM was acting up (objects were apparently only getting ‘partially initialized’), but it was actually just a combination of sloppy error handling and threading problems.
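
A hedged sketch of the kind of code that produces that ‘partially initialized’ illusion -- the class and its fields are invented purely for illustration:

import java.io.IOException;
import java.net.Socket;

class Connection {
    private final String host;
    private Socket socket;   // stays null if the catch below fires

    Connection(String host) {
        this.host = host;
        try {
            this.socket = new Socket(host, 8080);
        } catch (IOException e) {
            // swallowed: the object is now 'partially initialized',
            // and the failure surfaces much later, far from this line
        }
    }
}

Nothing is wrong with the JVM here; the secondary flow of logic simply hid the real failure.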

Naming, too, is a big factor in making the code readable. Data are usually nouns, and methods are usually verbs. The shortest possible “full name” at the given level of abstraction should match either the business domain or the technical one. If you don’t immediately know what something is called, it’s time for some research, not just a random guess.
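
Sticking with the bond example from earlier (the names here are hypothetical), the difference is stark:

// vague: the reader has to reverse-engineer the intent
double v = calc(b, true);

// a noun for the data, a verb phrase for the method, straight from the domain
double weightedYield = calculateWeightedYield(bond);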

And finally, method calls are essentially free. OK, they are not, but unless you’re coding against some ultra-super extreme specs, they’re not going to make any significant difference, so use them wherever and whenever possible. Big fat dense code is extremely hard to extend, which increases the likelihood that the next coder is going to trash your work. You don’t want that, so to preserve your labors, make it easy to extend the code.

Syntactic noise is one of the easiest problems to remove from a code base, but it seems to also be one of the more common ones. Programmers get used to ugly syntax and just assume that a) it is impossible to remove and b) everyone else will tolerate it. However, with a bit of thinking and a small amount of refactoring it can be quickly reduced, and often even nearly eliminated.

Good, readable code blocks have always been able to minimize the noise, thus allowing the intent of the code to shine through. As for other programmers, a few may tolerate or even contribute to the mess, but most will seize the opportunity to send it to The Daily WTF and then re-write it from scratch. Great programmers can do more than just get their code to work; they can also build a foundation that allows their efforts to be extended.