Monday, October 14, 2013

Out of Control

The complexity of modern software systems has gotten out of control. Decades back, computers served very well-defined roles. They computed specific values, or they kept track of manually entered data. It was all very constrained. Nice and simple.

These days, technology has embedded itself deeply into many industries; they need their computers to remain competitive. And these machines need massive amounts of data just to keep up. Everything is now interconnected, churning around in an information haze.

But the silos that enabled the earlier systems are now the impediments to utilizing all of this collected data. Above this lie layers of spaghetti so intertwined by history that there is no hope of sorting through the whole hideous clump of knots. All of this is serviced by increasingly stressed operations departments just trying to stay on top of the shifting technologies, security issues, devices, weird processes and out-of-control user expectations. Most of these groups are just one small step away from catastrophe.

Modern systems are intrinsically complex, but their rough evolution has hugely amplified the problems. Underneath, software is fairly simple. It rigorously models attributes of the world, grinding through the data to provide some insight into behaviour. We have enough knowledge now to really understand how to organize the data and construct the code to achieve our goals. However, history, politics and a general lack of understanding inflate the issues, forcing what could have been a straightforward engineering effort into a swirling cloud of chaos whose results are most often disappointing. People rush to build these systems, skipping over what is known in order to bend to the pressure of getting things out too early. These are, of course, self-inflicted injuries. Weak code, badly organized data, a lack of standards, and no empathy for users or operations result in piles of unstable code fragments that behave badly when exposed to the real world. Compound this with an obsessive need to continually restart from scratch or to just blindly build on something else, and what results are sprints that prematurely burn out and then come crashing down.

There are no short-cuts in software development. There are trade-offs, but there is no easy way to bypass the consequences of a long series of poor decisions. To get things working, people have to work through all of the problems diligently and in great detail, which is often a slow and painful process. You can’t skip it, defer it until later or rely on luck. For code to not suck, it has to be well thought out; there is no way around this, since code is essentially a manifestation of the underlying knowledge of the programmers involved. If they don’t understand what they are writing, then the code reflects that. And if above them the environment is not organized, then the system and the data reflect that as well. In that sense a system is just a mirror of where it was created and where it is running. It is only as stable and reliable as its environment.

The irony of software development is that lots of experience teaches you how easy it could be, yet exposes you to the full ugliness of how it usually is done. For programmers, once you can see above the code, the silos, the technologies, etc., you can imagine the possibilities of building far more sophisticated and usable systems, things that would radically change the value to the users, but you can also see the environmental, organizational and chronological roadblocks that will often prevent you from achieving this. Software could be better, but it rarely is.

Monday, September 2, 2013

Form Factor Free

I haven’t written any ‘crazy idea’ posts for months, so I figured I must be due. Over the years I’ve been playing around with various ‘dynamic’ behaviors in the code as a way of maximizing reuse. The fundamental design has been to encapsulate the domain issues within a simple DSL, then drive both the interface and the database dynamically.


On the interface side, I’ve used declarative dynamic forms as the atomic primitive for the screen layouts. This allows me to dynamically alter the navigation, reduce the amount of code required to define a screen, and persist user-constructed screens and workflows.
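
To make that concrete, here is a minimal sketch of what I mean by a declarative form primitive; the names (FormDef, FieldDef, the fluent field() calls) are purely illustrative, not taken from any of my actual systems:

    // A screen is declared as data rather than hand-written layout code, so it
    // can be persisted, versioned and altered at runtime by the users.
    import java.util.ArrayList;
    import java.util.List;

    final class FormDef {
        final String name;
        final List<FieldDef> fields = new ArrayList<>();

        FormDef(String name) { this.name = name; }

        FormDef field(String label, String type, boolean required) {
            fields.add(new FieldDef(label, type, required));
            return this; // fluent style keeps the declaration readable
        }
    }

    final class FieldDef {
        final String label, type;
        final boolean required;

        FieldDef(String label, String type, boolean required) {
            this.label = label; this.type = type; this.required = required;
        }
    }

    // Usage: declare a screen once; a generic renderer walks the definition.
    class FormExample {
        static FormDef clientScreen() {
            return new FormDef("client")
                .field("Name", "string", true)
                .field("Date of Birth", "date", false)
                .field("Account Balance", "currency", true);
        }
    }

Because the definition is just data, persisting a user-constructed screen is no harder than persisting any other record.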


This type of paradigm is too expressive for basic relational database usage, so initially I built a key/value NoSQL-like (but not distributed) database. For another attempt I wanted the external connectivity of an RDBMS, so I went with Hibernate and a long-skinny generic schema. The earlier attempt was significantly less code and easier to use, but the latter allowed for reporting and integration once I wrapped Hibernate with an OODB-like interface.
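
For anyone curious what a ‘long-skinny’ schema looks like, here is a rough sketch as a single Hibernate/JPA entity; the table and column names are hypothetical, but the shape is the point: one narrow row per attribute, so new kinds of data need no schema changes:

    // Every attribute of every logical object becomes one narrow row.
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Table;

    @Entity
    @Table(name = "datum")
    class Datum {
        @Id @GeneratedValue
        Long id;

        @Column(name = "object_id")
        Long objectId;      // which logical object this row belongs to

        @Column(name = "object_type")
        String objectType;  // e.g. "client", "account" -- defined by the DSL

        @Column(name = "attribute")
        String attribute;   // e.g. "name", "balance"

        @Column(name = "value", length = 4000)
        String value;       // stored as text; the model layer interprets it
    }

Reassembling an object is then just a query for all of the rows sharing an objectId, which is why an OODB-like wrapper over Hibernate makes this comfortable to work with.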


Driven by a DSL, these systems have been very flexible, allowing the users to essentially move into the system and adapt it to their specific needs. The downside has been that the abstractions involved require fairly deep thinking about extending the systems. Most programmers prefer writing new code via brute force, so the speed of development is limited by finding people who won’t just hack madly around the existing code base, but are willing to read and reuse the infrastructure.


In thinking about these types of resource issues, my feeling is that the kernel of any such architecture should be as small as possible and need very few modifications. Growing the system is then a matter of just inserting domain-specific algorithms and features into a predefined location in the architecture. That almost works, but in my last version I really ended up with three different places where the code had to be extended. With three different choices, I found that some programmers would pick the wrong location, slam their square peg into the round hole, and then try to compensate by shoving in lots of extra, artificial complexity. Choice, it seems, leads people to want to ‘creatively’ subvert the architecture.


My thinking these days (although it may be a while before I get a chance to try it out) is that I want the extendability of the system to come down to one, and only one, place. Taking away choice may sound mean, but I’ve always found it better to balance programmer freedoms against system success. If too much freedom incurs a massive risk of failure, well… I’d rather the system really worked properly at the end of the day. It’s a happy user vs. a happy programmer tradeoff.


As well as encapsulating the extendability, I got to thinking that the next wave of computing is going to take place on a nearly infinite number of form factors. That is, the screen size will vary from watch size all the way up to wall size. It doesn’t make a lot of sense to write a huge number of nearly identical systems -- one for each size -- if we can enlist the computer to dynamically handle them for us.


ORMs and OODBs allow the programmers to specify their internal data models, then have these drive the persistent storage structures. The slight wrinkle is that the persistent storage may be shared across several different applications, so its underlying model is likely a domain-driven ‘universal’ one instead of the various application-specific models. Subsets and inherent context are likely the bulk of the differences.


Without worrying too much about the model differences, the other half of the dynamic equation is for that application model to directly drive the interface layout. Way back, many companies tried to drive interfaces off relational schemas, but those systems proved too awkward and cumbersome to catch on. My sense is that the application modelling of the data needs to be driven heavily from the user/navigation side, rather than the storage side. That is, the application model reflects both the domain structure of the data and how the users want to manipulate that data.


If we can find an appropriate intermediate representation then the rest of it is easy. For each entity/datum in the model we attach a presentation template. To cope with the form-factor-free ability we attach links that handle both navigation and neighborhood relationships. When the user navigates to a screen, we get both the primary entities and the current form factor. From that we simply find anything in the neighborhood that fits in as well. Go to the user’s screen, and if your screen is big enough, you’ll see all sorts of related information. Of course one has to deal with paging, dynamic navigation as well as widgets, validation and normal dynamic-forms problems like cross-field validations/updates. The base problems in my earlier systems weren’t simple, but they weren’t really cutting edge either.
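
A rough sketch of that selection logic, with hypothetical names (ModelEntity, ScreenAssembler) and a crude ‘height’ standing in for whatever sizing the templates actually use:

    import java.util.ArrayList;
    import java.util.List;

    // Each entity in the application model carries a presentation template plus
    // links to its neighborhood of related entities.
    class ModelEntity {
        final String name;
        final int templateHeight;                  // rough size of its template
        final List<ModelEntity> neighborhood = new ArrayList<>();

        ModelEntity(String name, int templateHeight) {
            this.name = name;
            this.templateHeight = templateHeight;
        }
    }

    class ScreenAssembler {
        // Fill the available form factor with the primary entity first, then
        // pull in whatever from the neighborhood still fits.
        static List<ModelEntity> assemble(ModelEntity primary, int formFactorHeight) {
            List<ModelEntity> onScreen = new ArrayList<>();
            onScreen.add(primary);
            int used = primary.templateHeight;
            for (ModelEntity related : primary.neighborhood) {
                if (used + related.templateHeight > formFactorHeight) break;
                onScreen.add(related);
                used += related.templateHeight;
            }
            return onScreen; // anything left over stays reachable via navigation
        }
    }

A watch-sized form factor only ever shows the primary entity; a wall-sized one pulls in most of the neighborhood, with no per-size code written by hand.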


One time-saving possibility is for the screen construction to happen at compile time rather than run time. Building the system would then produce many different components -- one for each form factor. It would be nicer to do this dynamically on-the-fly, but one always has to be wary of eating up too much CPU.


If it all worked as planned (it almost never does), extending the system is just extending the application data model. If you needed to add a new feature you’d start by integrating any new data into the model. New calculations would go in by adding new ‘derived’ entities which would be bound with calculations underneath. All of the presentation/navigation stuff would decorate the data, then all you’d need to do is recompile, test and re-release. Changes that might normally take months could fall to weeks or days. The model intrinsically enforces whatever organization or conventions are needed and can easily be reviewed by other programmers. With the extendability encapsulated, the base work would pay off in producing a system that could expand for years or decades without having clocked up much technical debt.
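
As a small, purely illustrative sketch of what a ‘derived’ entity might look like (the names DerivedEntity and ‘total exposure’ are made up for the example):

    import java.util.List;
    import java.util.function.Function;

    // A new feature enters the system as a derived entity in the model, bound
    // to a calculation underneath; presentation and navigation decorate it like
    // any other entity, so no screen code is written by hand.
    class DerivedEntity<T, R> {
        final String name;
        final Function<List<T>, R> calculation;

        DerivedEntity(String name, Function<List<T>, R> calculation) {
            this.name = name;
            this.calculation = calculation;
        }

        R value(List<T> inputs) { return calculation.apply(inputs); }
    }

    class ModelExtensions {
        // e.g. a "total exposure" feature is just a sum over existing balances
        static final DerivedEntity<Double, Double> TOTAL_EXPOSURE =
            new DerivedEntity<>("total exposure",
                balances -> balances.stream().mapToDouble(Double::doubleValue).sum());
    }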

Saturday, August 3, 2013

Time and Shortcuts

I haven't posted anything for a while now. This has been the longest gap I've had since I started blogging five years ago. It's not that I don't have anything to say -- my hard drive is littered with half-finished posts -- but rather that I just haven't had the time or energy to get my thoughts down nicely.

Lately I've been wrapped up in a large complicated system that has a wide array of problems, none of which are new to me, but it's unusual to see them all together in the same project. I like this because it is extremely challenging, but I definitely prefer projects to be well-organized and focused on engineering problems, not organizational ones.

My sense over the last couple of decades is that software has shifted from being an intense search for well-refined answers to just a high-stress fight against allowing all of the earlier shortcuts to swamp the project. I find that unfortunate because the part of the job that I've always loved was the satisfaction of building something good. Just whipping together a barely functioning mess, I find depressing.
 
What I've noticed over the years is that there is a whole lot more progress made when the code is well thought out and clean at all levels (from syntax to the interface). The increasingly rare elegant solution takes it one step higher. You get so much more out of the work, since there are so many more places to leverage it. But of course spending the time to get it right means coming out of the gate that much slower, and it seems that people are increasingly impatient these days. They just want a quick band-aid, whether or not that will make the problem worse.

Getting a mess back under control means having to say 'no' often. It's not a popular word and saying it comes with a lot of angst. It does not help that there are so many silver-bullet approaches floating about out there. Anyone with little software knowledge is easily fooled by the overabundance of snake charmers in our industry. It's easy to promise that the work will be fast and simple, but experience teaches us that the only way to get it done is hard work and a lot of deep thinking. Writing code isn't that hard; you can teach it to high school students rapidly, but writing industrial-strength code is a completely different problem.

I'd love to take some more time off and write another book that just lists out the non-controversial best practices we've learned over the last few decades -- a software 101 primer -- but given that my last effort sold a massive 56 copies and I'm a wage slave, it's not very likely that I'll get a chance to do this anytime soon. The trick I think is to shy away from the pop philosophies and stick to what we know actually works. Software development is easy to talk about, easy to theorize about, but what often really works in practice is counter-intuitive for people with little experience. That's not unusual for complex systems, and a large development project contains millions of moving parts (people, code, technology, requirements, data, etc.) with odd shifting dependencies. You can't understand how to organize such a volatile set of relationships without first delving deep into actual experience, and even then it's hard to structure the understanding and communicate it.

What flows around the production of the code is always more complex than the code itself and highly influenced by its environment. A good process helps the work progress as rapidly as possible with high-quality results; most modern methodologies don't do that. That's one of our industry's most embarrassing secrets and it seems to be only getting worse.

Hopefully one of these days I'll catch my breath and get some of my half-finished posts completed. There are some good lessons buried in those posts; it just takes time and patience to convert them into something shareable.

Sunday, June 16, 2013

Relationships

“Everything is relative in this world, where change alone endures.”

A huge problem in software development is that we create static, rigid models of a world constantly in flux. It’s easy to capture some of the relationships, but getting them all correct is an impossible task.
Often, in the rush, people hold the model constant and then overload parts of it to handle the change. Those types of hacks usually end badly. Screwed-up data in a computer can often be worse than no data. It can take longer to fix the problem than it would to just start over. But of course if you do that, all of the history is lost.
One way to handle the changing world is to make the meta-relationships dynamic. Binding the rules to the data gets pushed upward towards the users; they become responsible for enhancing the model. The abstractions to do this are complex, and it always takes longer to build than just belting out the static connections, but it is often worth adding this type of flexibility directly into the system. There are plenty of well-known examples such as DSLs, dynamic forms and generic databases. Technologies such as NoSQL and ORMs support this direction. Dynamic systems (not to be confused with the mathematical ‘dynamic programming’) open up the functionality to allow the users to extend it as the world turns. Scope creep ceases to be a problem for the developers; it becomes standard practice for the users.
Abstracting a model to accommodate reality without just letting all of the constraints run free is tricky. All data could be stored as unordered variable strings for instance, but the total lack of structure renders the data useless. There needs to be categorization and relationships to add value, but they need to exist at a higher level. The trick I’ve found over the years is to start very statically. For all domains there are well-known nouns and verbs that just don’t change. These form the basic pieces. Structurally, as you model these pieces, the same types of meta-structures reappear often. We know for example that information can be decomposed into relational tables and linked together. We know that information can also be decomposed into data structures (lists, trees, graphs, etc.) and linked together. A model gets constructed on these types of primitives, whose associations form patterns. If multiple specific models share the same structure, they can usually be combined and, with a little careful thought, named properly. Thus all of the different types of lists can become just one set of lists, all of the trees can come together, etc. This lifts the relationships up, by structural similarity, into a considerably smaller set of common relationships. This generic set of models can then be tested against the known or expected corner-cases to see how flexible it will be. In this practice, ambiguity and scope changes just get built directly into the model. They become expected.
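A tiny sketch of what that lifting might look like in practice; instead of separate ClientList, AccountList and TradeList classes, one named list primitive covers them all (the class names here are hypothetical):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // One structural primitive, many named instances.
    class GenericList<T> {
        final String name;                 // "clients", "accounts", "trades", ...
        final List<T> items = new ArrayList<>();

        GenericList(String name) { this.name = name; }
    }

    class Model {
        final Map<String, GenericList<Object>> lists = new LinkedHashMap<>();

        // New categories of data appear without new container classes.
        GenericList<Object> list(String name) {
            return lists.computeIfAbsent(name, GenericList::new);
        }
    }
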
Often when enhancing the dynamic capabilities of a system there are critics who complain of over-engineering. Sometimes that is a valid issue, but only if the underlying model is undeniably static. There is a difference between ‘extreme’ and ‘impossible’ corner-cases; building for impossible is a waste of energy. Oftentimes, though, the general idea of abstraction and dynamic systems just scares people. They have trouble ‘seeing it’, so they assume it won’t work. From a development point of view that’s where encapsulation becomes really important. Abstractions need to be tightly wrapped in a black box. From the outside the boxes are as static as any other piece of the system. This opens up the development to allow a wide range of people to work on the code, while still leveraging sophisticated dynamic behavior.
I’ve often wondered how abstract a system could go before its performance was completely degraded. There is a classic tradeoff involved. A generic schema in an RDBMS for example will ultimately have slower queries than a static 4th NF schema, and a slightly denormalized schema will perform even better. Still, in a big system, is losing a little bit of performance an acceptable cost for not having to wait four months for a predictable code change to get done? I’ve always found it reasonable.
But it is possible to go way too far and cause massive performance problems. Generic relationships wash out the specifics and drive the code into NP-complete territory or worse. You can model anything and everything with a graph, but the time to extract out the specifics is deadly and climbs at least exponentially with increases in scale. A fully generic model of everything just being a relationship between everything else is possible, but rather impractical at the moment. Somewhere down the line, some relationships have to be held static in order for the system to perform. Fewer is better, but some are always necessary.
Changing relationships between digital symbols mapped back to reality is the basis of all software development. These can be modeled with higher level primitives and merged together to avoid redundancies and cope with expected changes. These models drive the heart of our software systems, they are the food for the algorithmic functionality that helps users solve their problems. Cracks in these foundations propagate across the system and eventually disrupt the user’s ability to complete their tasks. From this perspective, a system is only as strong as its models of reality. It’s only as flexible as they allow. Compromise these relationships and all you get is unmanageable and unnecessary complexity that invalidates the usefulness of the system. Get them right and the rest is easy. 

Saturday, June 1, 2013

Process

A little process goes a long way. Process is, after all, just a manifestation of organization. It lays out an approach to some accomplishment as a breakdown of its parts. For simple goals the path may be obvious, but for highly complex things the process guides people through the confusion and keeps them from missing important aspects.
Without any process there is just disorganization. Things get done, but much is ignored or forgotten. This anti-work usually causes big problems and these feed back into the mix preventing more work from getting accomplished. A cycle ensues, which among other problems generally affects morale, since many people start sensing how historic problems are continuously repeating themselves. Things either swing entirely out of control, or wise leadership steps in with some "process" to restore the balance.
Experience with the chaotic non-process can often lead people to believe that any and all processes are a good thing. But the effectiveness of process is essentially a bell curve. On the left, with no process, the resulting work accomplished is low. As more process is added, the results get better. But there is a maximal point. A point at which the process has done all that it can, after which the results start falling again. A huge over-the-top process can easily send the results right back to where they started. So too much process is a bad thing. Often a very bad thing.
Since the intent of having a process is to apply organization to an effort, a badly thought-out process defeats this goal. At its extreme, a random process for example is just formalized disorganization. Most bad processes are not truly random, but they can be overlapping, contradictory or even have huge gaps in what they cover. These problems all help reduce the effectiveness. Enough of them can drive the results closer to being random.
Since a process is keyed to a particular set of activities or inquiries, it needs to take the underlying reality into account. To do this it should be drafted from a 'bottom-up' perspective. Top-down process rules are highly unlikely to be effective, primarily because they are drafted from an over-simplification of the details. This causes a mismatch between the rules and the work, enhancing the disorganization rather than fixing it.
Often a bad process survives, even thrives, because its originators incorrectly claim success. A defective software development process, for instance, may appear to be reducing the overall number of bugs reaching the users, but the driving cause of the decrease might just be the throttling of the development effort. Less work gets done, thus there are fewer bugs created, but there is also a greater chance for upper management to claim a false victory.
It's very easy to add complexity to an existing process. It can be impossible to remove it later. As such, an overly complex process is unlikely to improve. It just gets stuck in place, becoming an incentive for any good employees to leave, and then continues to stagnate over time. This can go on for decades. Thus arguing for the suitability of a process based on the fact that it's been around for a long time is invalid. All that shows is that it is somewhat better than random, not that it is good or particularly useful in any way.
Bad process leaves a lot of evidence lying around that it is bad. Often the amount of work getting accomplished is pitifully low, while the amount of useless make-work is huge. Sometimes the people stuck in the process are forced to bend the truth just to get anything done. They get caught between getting fired for getting nothing done or lying to get past the artificial obstacles. The division between the real work and the phantom variant required by the process manifests in a negative, conflict-based culture.
For software, picking a good process is crucial. Unfortunately the currently available choices out there in the industry are all seriously lacking in their design. From experience, the great processes have all been carefully homegrown and driven directly by the people most affected by them. The key has been promoting a good engineering culture that has essentially self-organized. This type of evolution has been orders of magnitude more successful than going out and hiring a bunch of management consultants who slap on a pre-canned methodology and start tweaking it.
That being said, there have also been some horrific homegrown processes constructed that revel in stupid make-work and creatively kill off the ability to get anything done. Pretty much any process created by someone unqualified to do so is going to work badly. It takes a massive amount of direct experience with doing something over and over again before one can correctly take a step back and abstract out the qualities that make it successful. And abstraction itself is a difficult and rare skill, so just putting in the 10,000+ hours doesn't mean someone is qualified to organize the effort.
Picking a bad process and sticking to it is nearly the same as having no process. They converge on the same level of ineffectiveness.

Monday, May 20, 2013

Death by Code

A mistake I've commonly seen in software development is for many programmers to believe that things would improve on a project if only they had more code.
It's natural I guess, as we initially start by learning how to write loops and functions. From there we move on to being able to structure larger pieces like objects. This gradual broadening of our perspective continues as we take on modules, architectures and eventually whole products. The scope of our understanding is growing, but so far it's all been contained within a technical perspective. So, why not see the code as the most important aspect?
But not all code is the same. Not all code is useful. Just because it works on a 'good' day doesn't mean that it should be used. Code can be fragile and unstable, requiring significant intervention by humans on a regular basis. Good code not only does the right thing when all is in order, but it also anticipates the infrequent problems and handles them gracefully. The design of the error handling is as critical (if not more) than the primary algorithms themselves. A well-built system should require almost no intervention.
Some code is written to purposely rely on humans. Sometimes it is necessary -- computers aren't intelligent -- but often it is either ignorance, laziness or a sly form of job security. A crude form of some well-known algorithms or best practices can take far less time to develop, but it's not something you want to rely on. After decades we have a great deal of knowledge about how to do things properly, utilizing this experience is necessary to build in reliability.
Some problems are just too complex to be built correctly. Mapping the real world back to a rigid set of formal mechanics is bound to involve many unpleasant trade-offs. Solving these types of problems is definitely state-of-the-art, but there are fewer of them out there than most programmers realize. Too often coders assume that their lack of knowledge equates to exploring new challenges, but that's actually quite rare these days. Most of what is being written right now has been written multiple times in the past in a wide variety of different technologies. It's actually very hard to find real untouched ground. Building on past knowledge hugely improves the quality of the system and it takes far less time, since the simple mistakes are quickly avoided.
So not all code is good code. Just because someone spent the time to write it doesn't mean that it should be deployed. What a complex system needs isn't more code, but usually less code that is actually industrial strength. Readable code that is well thought out and written with a strong understanding of how it will interact with the world around it. Code that runs fast, but is also defensive enough to make problems easier to diagnose. Code that fits nicely together into a coherent system, with some consistent form of organization. Projects can always use more industrial-strength code -- few have enough -- but that code is rare and takes time to develop properly. Anything else is just more "code".

Sunday, April 21, 2013

Monitoring

The primary usage of software is collecting data. As it is collected, it gets used to automate activities directly for the users. A secondary effect of this collection is the ability to monitor how these activities are progressing. That is, if you've built a central system for document creation and dissemination, you also get the ability to find out who's creating these documents and, more importantly, how much time they are spending on this effort.
Monitoring the effectiveness of some ongoing work allows for it to be analyzed and improved, but it is a nasty double-edged sword. The same information can be used incorrectly to pressure the users into artificial speedups, forcing them to do unpaid work or to degrade the quality of their effort. In this modern age it isn't unusual for some overly ambitious upper management to demand outrageous numbers like 150% effort from their staff. In the hands of someone dangerous, monitoring information is a strong tool for abuse. Such managers use it to get significant short-term gains, but these come at the expense of inflicting long-term damage. They don't care; they're usually savvy enough to move on to their next gig long before that debt actually becomes a crisis.
Because of its dual nature, monitoring the flow of work through a big system is both useful and difficult. It is done well when the information gets collected but is limited in its availability. Software that rats out its users is not appreciated, but feedback that helps improve working effectiveness is. One way of achieving this latter goal is to collect fine-grained information about all of the activities, but only make it available as generalized, anonymous statistics. That is, the system might know the minimum and maximum times people spend on particular activities, but all management can see is the average and perhaps the standard deviation. No interface exists for them to pull up the info on a specific user, so they can’t pressure or punish them.
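A minimal sketch of that idea; the class is hypothetical, but the key design point is that fine-grained timings are collected while the only exposed methods are aggregates:

    import java.util.ArrayList;
    import java.util.List;

    class ActivityStats {
        // fine-grained timings are recorded, but never keyed to a user here
        private final List<Double> durations = new ArrayList<>();

        void record(double minutes) { durations.add(minutes); }

        // Deliberately no method that returns the timings for a specific person.
        double average() {
            return durations.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        }

        double stdDev() {
            double avg = average();
            double variance = durations.stream()
                .mapToDouble(d -> (d - avg) * (d - avg)).average().orElse(0);
            return Math.sqrt(variance);
        }
    }
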
Interestingly enough, when collecting requirements for systems, fine-grained monitoring often shows up. Not only that, but there is usually some 'nice sounding' justification for having it. Most of software development these days is oriented to giving the ‘stakeholders’ exactly what they want, or even what they ask for, but this is one of those areas where professional software developers shouldn't bow directly to the pressure. It takes some contemplation, but a good developer should always empathize with their users -- all of them -- and not build anything that they wouldn't like applied to themselves. After all, would you really be happy at work if you had to do something demeaning like punch a timecard in and out? If you don't like it, why would anyone else?

Sunday, March 24, 2013

Organization

“A place for everything, everything in its place.”


As Benjamin Franklin pointed out, there are two parts to organization. The first is that absolutely everything needs to fit somewhere. With software this really translates to having a solid 'reason' for every little bit in the system, be it config files, methods, data, etc., and a reason for its location. It all needs its place, and often for that it also needs some level of categorization. "It doesn't matter" is synonymous with "it's not organized".


This includes names, conventions, even coding styles. Everything. Each tiny piece of work, each little change should all have its place. It should all have a reason for being where it is, as it is.


The second half of the quote is just as important. What's the point of having an organizational scheme if it's not being used? As new things are added, they need to be added into their place. Sometimes it’s clear where they belong, but oftentimes it hasn't been explicitly documented. Assuming that lack of documentation equates to lack of organization (and as such it is a free-for-all) is a common failing amongst inexperienced coders. Things can be organized without having an accompanying manual.


Failing to keep a development project organized is the beginning of the end. Disorganization is probably the worst aspect of technical debt because it is a direct path into various forms of spaghetti. And spaghetti is a quagmire for time, either to fix bugs or to extend out the functionality.


Part of software development culture over the last couple of decades has been a strong desire for 'freedom'. Freedom is great, but used improperly it just becomes an excuse for creating disorganization. For instance, utilizing the full freedom to create a new screen in a much fancier way than the existing ones is really just breaking the organization of what is already there. It's like helping someone dry the dishes in their kitchen, but then insisting on placing the clean stuff in any other location than where it actually belongs. It isn't really helpful is it?


It's true that in some regards a well-organized existing development project is a boring one. If the work isn't exceeding the existing organizational bounds then it can really just fit in like all the other pieces. But success in development doesn't come from making any one small piece of the system great, it's an all or nothing deal. The system is only as good as its crappiest screen or stupidest functionality. Thus changes to enhance it or its organization can't be selective. The whole thing needs to be fixed or it’s actually just making it worse.


If you're evaluating an existing development project, disorganization is easily the best indicator of the future. A small project can get away with disorganization, but anything larger absolutely needs organization to survive. If it is lacking or failing, then the whole effort is in deep trouble.

Sunday, March 3, 2013

Deep Underground

There are two distinct sides to software development. Since software is a solution to someone’s problem -- a tool to help them in some way -- it lies in a specific business or ‘domain’. Its purpose is tied directly to helping automate, track and organize very specific information. But the software itself is constructed using parts like computers, networks, libraries, programming languages, etc. so it is composed of and limited by ‘technology’. Both the domain and the technology are necessary ingredients in being able to help users, rather than make their issues worse.

If a software development project goes wrong by mostly ignoring the domain issues we usually say that it was built in an ‘Ivory Tower’. It’s a nice metaphor to describe how the developers may have completed the technical side, but they were too far up and away from the domain problems. The software is useless because the creators didn’t bother to dig into the details.

But what about the opposite problem? What if the developers pay close attention to the domain, but fail entirely to properly handle the technical issues?

Since that is fundamentally the opposite of an Ivory Tower we can flip the terminology. An ‘Ebony Dungeon’ project then is one that delves deep into the heart of the domain, but so deeply that it ignores the technology side. We often see this in in-house projects where the business or domain experts exert too much influence over the process, techniques and construction of the software.

Most domains have to tie themselves closely to some form of revenue stream. Those ties mean that they need to react quickly to changes. That sets up a culture of just focusing heavily on the immediate short-term, trying to be as malleable as possible. That works for business, but a large software development project is actually a very slow moving ‘ship’. It slowly goes from version to version, plodding along for years. While the business may need to wobble back and forth as its demands change, the only efficient path for development is to steer as straight a course as possible. There are a lot of moving parts in software and to get them all working properly they need to be organized; put into their proper places. A constantly shifting direction undoes any of this organization, which wastes time and causes severe problems.

A large system full of useful domain functionality isn’t actually very useful if it is technologically unstable. If it crashes frequently, or is prone to serious bugs because of mass redundancies, or if its performance is dismal, all of that existing functionality is inaccessible. A smaller, more stable system would be far more effective at helping the users. The features available should work properly, be complete and be organized.

A very clear symptom that a project is trapped in an Ebony Dungeon is that most of the decisions keep getting punted back to the domain experts. “We’ll have to ask the business what they want”. If the project is balanced then at least 50% of the effort is related to the technical side. That includes using industrial-strength components and algorithms, keeping the code base clean and ensuring that the operations and installation side is built up to an automated level as well. Technical debt is unavoidable, but it needs to be controlled and managed with the same importance as any other aspect of the project.

In areas like GUIs, industry conventions should trump individual preferences, so that the screens don’t become a sea of eclectic behaviors and the functionality isn’t randomly distributed throughout the screens. Failure to properly organize the functionality at the interface level will cause a failure to use the functionality at the user level. Proper organization of a huge number of features is an extremely difficult problem that takes decades to master. A domain expert may understand the functionality requirements, but organization is just as, if not more, critical.

Being trapped in a dungeon exerts a lot of pressure on the programmers to create code as fast as possible. This usually manifests itself as a significant amount of “copy and paste” programming. Old code is copied over and then hastily modified with a small number of differences. We’ve known for decades that this style of development is extremely bad, but it is still commonly used to satisfy time pressure. Programmers need time to understand and refactor their work if it’s going to be extended properly. A rushed job is a sloppy job.

An Ivory Tower system misleads its backers because the real problems don’t become apparent until the users start working with the system. An Ebony Dungeon system also misleads its backers because it starts off fast and agile but slams into a wall when the work is to be extended. What appeared to be a big success in the early days ends up dying a premature death, usually costing way more than it should.

The software industry has swung rather heavily towards Ebony Dungeons lately. It’s easy to do because domain experts over-simplify the real work involved in building a system, then get misled by the rapid progress. Without an experienced development crew it becomes easy to miss all of the symptoms of a project in deep trouble. Most that are dying get caught up in tunnel vision, thinking that just a few more features will turn the direction around and save the project. But “just a few more...” is actually the problem.

The best way out of a dungeon is to properly partition the system requirements. Everyone talks about ‘user requirements’ but they are only a small subset of what’s needed for a successful system:

  • user requirements
  • management requirements
  • technical operations/release requirements
  • support/debugging requirements
  • development/programmer requirements

If you factor in all of the different requirements necessary to build and run the system, you see that ‘features’ are only one aspect of the work. If you ignore the other four (or more) categories then obviously there will be serious problems with the system. And if all of the requirements come from the domain experts, who don’t know about or understand the other issues, they won’t place any importance on getting them done.

The real art in building systems is to not go too high or too low, but rather to build on solid ground that is accessible to everyone. Towers and dungeons are equally bad.

Saturday, February 23, 2013

Systemic Breakdown

One of my favorite parts about building software is that I am free to indulge my endless curiosity. I prefer to dig deeply into the domains I’ve tackled so that I can tailor my solutions to actual problems, not just the high-level perception of them. The world is a wonderfully complex place and it can be fascinating to delve into the details and underlying history.

To really solve a person’s problem with some code running on a computer you have to understand the root causes. A 10,000-foot view is just too high to grok the details, which often become counter-intuitive as you descend. As I've switched from domain to domain, I've obviously found huge differences between the issues, but I’ve also found many underlying consistent patterns.
For this post I thought I'd pull back a little and discuss one of these larger veins running through much of my work. I’ll pick a very abstract example, but only because I prefer not to antagonize both my past and future employers.

We can start with the distinction between formal and informal systems. Formal systems are such because the underlying rules are rigid. All things must follow the rules (or they are invalid and no longer within the system). A few examples are mathematics and running software programs.
Informal systems on the other hand often involve people, although not always. The rules are mostly followed, but there are always exceptions. Informal systems are not rigorously constrained. There are a huge number of examples of informal systems, but any sort of social interaction -- companies, governments, etc. -- form the basis of many of them. Basically all you need is some 'objects' (entities, nouns, etc) and a set of rules that are applied to them.

For an overly simple example I keep coming back to cars driving on a highway. The underlying road supports a number of vehicles that are attempting to get somewhere, each with a unique destination. There are mandated constraints such as following the speed limit and staying in particular lanes. Many highways these days are challenged with a lot of problems: congestion, accidents, speeding etc. They are fairly simple systems, but their behavior is fairly complex.

Because it doesn’t really matter, I’ll focus my discussion loosely around the highways in Ontario, Canada that I drive regularly. Please don’t get too lost in the specifics, I’m only using these examples to illustrate the larger patterns that are common amongst similar and very different informal systems. Cars, rules and roads just help as reference points, nothing more.

A key constraint when driving on the highway is the speed limit. The rules and limits on speeding were set long ago, most likely when cars were considerably less sophisticated. So for instance a speed limit of 100 km/h was probably created long ago as a protective measure to decrease the likelihood of serious accidents. When highways were poorly paved and cars crude, no doubt these rules made a great deal of sense and contributed significantly to the safety of the roadways. These days, on a well-maintained highway, modern cars which handle better are quite capable of navigating the same terrain at significantly higher speeds. That is, 140 km/h for many drivers in a modern car is well within their ability to react correctly to hazards, road conditions and other vehicles. Because of this, in Ontario at least, 'speeding' is epidemic. Everybody exceeds what seems like the relatively slow limit of 100 km/h.

The response to people increasingly speeding was the creation of traffic police whose role is to enforce the rules. We can see this as a subsystem that sprang into life because of a degeneration of drivers in obeying rules like the speed limit (although there are probably many other causes). This new subsystem quickly figured out that ticketing people, besides enforcing the rules, was extremely profitable. And even more profitable if they put less effort into the overall ‘general’ enforcement and instead chose to hang out at inconvenient locations where people tend to speed naturally. Thus ‘speed traps’ sprang up from perhaps rather noble goals, but financial incentives quickly had considerable influence. Once created, the local drivers quickly figured out where they were located, so they shifted their behavior to speed only in areas that they knew were safe.

Summarizing: this informal system rule for speed limits sprang up, degenerated, was buttressed with a new subsystem of ticketing but that too degenerated. The key point here is that when we are analysing the creation of speed traps we need to go beyond just the obvious cause, that more people were violating the speed limit. Getting to the 'root' causes is extremely important in analysing informal systems. It is easy to say that speed limits and traps exist because people are speeding more often, but that's not really capturing the whole picture. Rather the deeper root cause is that modern cars started allowing people to drive at higher speeds. They handle better, have bigger engines and can accelerate rapidly. The root cause for speeding is an advance in technology and that technology was getting applied by a large number of drivers to violate a pre-existing set of rules, which is a downstream consequence.

What's particularly true of informal systems is that their interactions are extremely complex. To fix the speeding problem by just creating traffic police is not enough, and because it didn't really fix the problem the traffic enforcement directives gravitated towards finding their own usefulness (generating money). Thus adding speeding tickets to the overall system did not really improve it in any significant way. Some people probably speed less, but most people are likely to just be more picky about where they are speeding. At various times and locations speeding is reduced, but many drivers are probably over-compensating now for the added inconvenience. People aren't going to stop speeding for instance, and their expectation for travel time isn’t going to change.  

A solution like raising the speed limit wouldn't fix the issue either. The newer cars may allow good drivers to go faster, but highways contain more than just good drivers. They contain many poor ones, without the ability to safely handle higher speeds. It's perhaps this constraint that keeps the authorities from changing the rules. A faster highway might be significantly more dangerous if there are enough drivers on the road who can't react fast enough. So the average or less-skilled driver keeps the status quo intact, but as in the case of Toronto, people simply moved forward and started creating their own rules of thumb for how fast to drive. Many in TO believe for example that +10 km/h for the city and +20 km/h for the highway are acceptable. This is so common that in the absence of any traffic police nearby, this is frequently the pace of most vehicles. Thus a breakdown in the informal system causes the creation of another, even less formal, system built on top.
Informal systems based on people always morph as time progresses. Some systems, such as the highways, find their own equilibriums by spawning off new subsystems. Some systems fluctuate, but generally drive themselves downward. We know how to make things more complex, but we seem to lack the ability to reduce complexity. A degenerating informal system can often acquire enough unwarranted artificial complexity that its only path forward is to explode. The issues simply can't be fixed anymore.

The idea of speeding tickets was no doubt someone’s best effort to deal with the speeding problem. But as is true for all complex informal systems, the whole systemic complexity exceeds the ability of most, if not all, people to visualize and correctly modify it. Speeding tickets created speed traps. Speed traps are a rather grey aberration of the laws of the highways, since they aren't helping to keep locals from speeding, but rather they are preying on unsuspecting strangers. This causes a hierarchy of drivers, creating a division between 'us' and 'them'. In that sense it is morally questionable, since a civilized society these days demands that the rules are applied equally to all people, not just some. A speed trap functions because it is based on ignorance. Non-local drivers following the normal rules of thumb are caught simply because they strayed too far from their own locale. Stated that way, the enforcement part of society is actually preying on people for revenue, not to uphold the honor and virtue of the rules.

If technology got traffic into trouble, it is likely that it will also get it out of it. Automatic driving may allow cars to chain together into larger, loosely affiliated trains headed in common directions. If that comes to pass, cars will scout for a suitable train, then join up. As that becomes more common, the efficiencies of the highway -- which are poor now and getting worse -- will start to improve, creating less traffic congestion. These trains of cars will follow limits set by computers, so speeding tickets and speed traps will become a thing of the past. Of course some people will continue to 'manually' drive, but official enforcement of any remaining rules will likely shift to some other basis, like asserting that speeds of 140 km/h are 'reckless' rather than 'speeding'. The rule remains, but gets generalized to include a wider collection of behaviors.

Informal systems are everywhere. They form the basis of all human organizations and they fracture into an endless variety of associated subsystems. These days computers have made it possible to quickly introduce new, poorly thought-out rules and enforce them, so the underlying complexity has exploded. Most large companies, for instance, are so outrageously complicated that no single human could hope to know or understand the basis behind even a fraction of the rules. Thanks to badly designed software and archaic historic rules, many of these are in various stages of systemic breakdown, with rules degenerating and inefficient subsystems spewing off everywhere to replace high-level but hopeless directives. Computers have the capacity to solve problems for people, but their dark side is that they can allow for massive increases in artificial complexity that can quickly get gouged into the status quo. Ambitious but misguided people tunnel in on little aspects of the overall problem and seek change, but once the system has gone beyond a human's ability to comprehend, most such changes ultimately make the problems worse, not better.

Getting back to traffic, if someone really wanted to stop speeding and installed a massive number of high-tech cameras to cover every inch of the road, they might think that they'd be able to stop anyone, anywhere, from speeding. But sorting through that footage would be daunting, and just creating the infrastructure to fine and collect all of the tickets would be massive. So, ultimately there would still be blind spots, and over time these would become known. People, hoping to reduce their travel time, would fly recklessly through the blind spots because they were forced to slow down once in the spotlight. Then some areas would become safer for driving, but some, well... they'd just push the envelope for danger until they too caused various spin-offs. To fix complex system breakdowns, the one thing that definitely doesn't work is to narrow the focus down to something believed to be 'tangible'. That type of tunnel vision only begs for disaster.

Saturday, January 19, 2013

Quality and Scale

[LACK OF EDITOR'S NOTE: I wrote and posted this entirely on my iPad, so there are bound to be spelling and formatting problems. Please feel free to point them out, I'll fix them as I go.]

I use a couple of metrics to size development work. On one axis I always consider the underlying quality of the expected work. It's important because quality scales at least logarithmically. That is, small improvements in quality get considerably more expensive as we try for better systems. On the other axis I use scale. Scale is also at least logarithmic in effort.
My four basic categories for both are:

Software Quality
- Prototype
- Demo
- In-house
- Commercial

Project Scale
- Small
- Medium
- Large
- Huge

Software Quality

The first level of quality is prototypes. They are generally very specific and crude programs that show a proof of concept. There is almost no error handling, and no packaging. Sometimes these are built to test out new technologies or algorithms, sometimes they are just people playing with the coding environment. Prototypes are important for reducing risk; they allow experience to be gained before committing to a serious development effort.

Demos usually bring a number of different things together into one package. They are not full solutions; rather they just focus on 'talking points'. Demos also lack reasonable error handling and packaging, but they usually show the essence of how the software will be used. Occasionally you see someone release one as a commercial product, but this type of premature exposure comes with a strong likelihood of turning off potential users.

My suspicion is that in-house software accounts for most of modern software development. What really identifies a system as in-house is that it is quirky. The interface is a dumping ground for random disconnected functionality, the system looks ugly and it's hard to navigate around. Often, but not always, the internal code is as quirky as the external appearance. There is little architecture, plenty of duplication and usually some very strange solutions to very common problems. Often the systems have haphazardly evolved into their present state. The systems get used, but it's more often because the audience is captive; given a choice they'd prefer a more usable product. Many enterprise commercial systems are really in-house quality. Sometimes they started out higher, but have gradually degenerated to this level after years of people just dumping code into them. Pretty much any unfocused development project is constrained by this level. It takes considerable experience, talent, time and focus to lift the bar.

What really defines commercial quality is that the users couldn't imagine life without the software. Not only does it look good, it's stunningly reliable, simple and intuitive to use. Internally the code is also clean and well organized. It handles all errors correctly, is well packaged and requires minimal or no support. Graphic designers and UX experts have heavily contributed to give the solution both a clear narrative and a simple, easily understood philosophy. A really great example does solve all of the problems that it claims to. Even the smallest detail has been thought out with great care. The true mastery of programming comes from making a hard problem look simple; commercial systems require this both to maintain their quality and to support future extensions. Lots of programmers claim commercial-quality programming abilities, but judging from our industry very few can actually operate at this level. As users' expectations for scale and shortened development times have skyrocketed, the overall quality of software has been declining. Bugs that in the past would have caused a revolt or a lawsuit are now conveniently overlooked. This may lead to more tools being available, but it also means more time wasted and more confusion.

Project Scale

It is impossible to discuss project scale without resorting to construction analogies. I know programmers hate this, but without some tangible basis, people easily focus on the wrong aspects or oversimplify their analysis. Grounding the discussion with links to physical reality really helps to visualise the work involved and the types of experience necessary to do it correctly.
A small project is akin to building a shed out back. It takes some skill, but it is easily learned, and if there are the inevitable problems, they are relatively well contained. A slightly wonky shack still works as a shack; it may not look pretty, but hey, it's only a shack. Small projects generally come in at less than 20,000 lines of code. It's the type of work that can be completed in days, weeks or a few months by one or two programmers.

A medium project is essentially a house. There is a great deal of skill in building a house; it's not an easy job to complete. A well-built house is impressive, but somewhere in the middle of the scale it's hard for someone living in the house to really get a sense of the quality. If it works well enough, then it works. Medium projects vary somewhat, falling in around 50,000 lines of code and generally staying under 100,000. Medium projects usually require a team, or they are spread across a large number of years.

You can't just pile a bunch of houses on top of each other to get an apartment building. Houses are made of smaller, lighter materials and are really scaled towards a family. Building an apartment building, on the other hand, requires a whole new set of skills. Suddenly things like steel frames become necessary to get the size beyond a few floors. Plumbing and electricity are different. Elevators become important. The game changes, and the attention to detail changes as well. A small flaw in a house might be a serious problem in an apartment building. As such more planning is required, there are fewer options and bigger groups need to be involved, often with specializations. For software, large generally starts somewhere after 100,000 lines, but can also get triggered by difficult performance constraints or more than a trivial number of users. In a sense it's going from a single-family dwelling to a system that can accommodate significantly larger groups. That leap upwards in complexity is dangerous. Crossing the line may not look like that much more work, but underneath the rules have changed. It's easy to miss, sending the whole project into what is basically a brick wall.

Skyscrapers are marvels of modern engineering. They seem to go up quickly, so it's easy to miss their stunning degree of sophistication, but they are complex beasts. It's impressive how huge teams come together and manage to stay organized enough to achieve these monuments. These days, there are many similar examples within the software world: systems that run across tens of thousands of machines or cope with millions of users. There isn't that much in common between an apartment building and a skyscraper, although the lines may be somewhat blurred. It's another step in sophistication.

Besides visualization, I like these categories because it's easy to see that they are somewhat independent. That is, just because someone can build a house doesn't mean they can build a skyscraper. Each step upwards requires new sets of skills and more attention to detail. Each step requires more organization and more manpower. It wouldn't make sense to hire an expert from one category and expect them to do a good job in another. An expert house-builder can't necessarily build a skyscraper, and a skyscraper engineer may tragically over-engineer a simple shed. People can move of course, but only if they put aside their hubris and accept that they are entering a new area. This -- I've often seen -- is true as well for software. Each bump up in scale has its own challenges and its own skills.

Finally

A software project has an inherent scale and some type of quality goals. Combining these gives a very reliable way of sizing the work. Factors like the environment and the experience of the developers also come into play, but scale and quality dominate. If you want a commercial-grade skyscraper, for instance, it is going to be hundreds of man-years of effort. It's too easy to dream big, but there are always physical constraints at work, and as Brooks pointed out so very long ago, there are no silver bullets.

Sunday, January 6, 2013

Potential

Computers are incredibly powerful. Sure they are just stupid machines, but they are embodied with infinite patience and unbelievable precision. But so far we’ve barely tapped their potential; we’re still mired in building up semi-intelligent instruction sets by brute force. Someday, however, we’ll get beyond that and finally be able to utilize these machines to improve both our lives and our understanding of the universe.

What we are fighting with now is our inability to bring together massive sets of intelligent instructions. We certainly build larger software systems now than in the past, but we still do this by crudely mashing together individual efforts into loosely related collections of ‘functionality’. We are still extremely dependent on keeping the work separated, e.g. apps, modules, libraries, etc. These are all the works of small groups or individuals. We have no real reliable ways of combining the effort from thousands or millions of people into focused, coherent works. There are some ‘close but no cigar’ examples, such as the Internet or sites like Wikipedia, which are collections from a large number of people, but these have relied heavily on being loosely organized and as such they fall short of the full potential of what could be achieved.

If we take the perspective of software being a form of ‘encoded’ intelligence, then it’s not hard to imagine what could be created if we could merge the collective knowledge of thousands of people together into a single source. In a sense, we know that individual intelligence ranges; that is some people operate really smartly, some do not. But even the brightest of our species isn’t consistently intelligent about all aspects of their life. We all have our stupid moments where we’re not doing things to our best advantage. Instead we’re stumbling around, often just one small step ahead of calamity. In that sense ‘intelligence’ isn’t really about what we are thinking internally, but rather about how we are applying our internal models to the world around us. If you really understood the full consequences of your own actions for instance, then you would probably alter them to make your life better...

If we could combine most of what we collectively know as a species, we’d come to a radically different perspective of our societies. And if we used this ‘greater truth’ constructively we’d be able to fix problems that have long plagued our organizations. So it’s the potential to utilize this superior collective intelligence that I see when I play with computers. We take what we know, what we think we know, and what we assume for as many people as possible, then compile this together into massive unified models of our world. With this information -- a degree of encoded intelligence that far exceeds our own individual intelligence -- we apply it back, making changes that we know for sure will improve our world, not just ones based on wild guesses, hunches or egos.

Keep in mind that this isn’t artificial intelligence in the classic sense. Rather it is a knowledge-base built around our shared understandings. It isn’t sentient or moody, or even interested in plotting our destruction, but instead it is just a massive resource that simplifies our ability to comprehend huge multi-dimensional problems that exceeds the physical limitations of our own biology. We can still choose to apply this higher intelligence at our own discretion. The only difference is that we’ve finally given our species the ability to understand things beyond their own capabilities. We’ve transcended our biological limitations.