Business Models for Software Developers

Recently I’ve been reading Steve Blank’s articles on startups. One of the principles that Steve espouses is that a startup is an organization formed for the purpose of finding a profitable business model. He points out that most business plans are not successful, and the successful startup is ready to quickly iterate, adapt the plan to change, and try again. Anyone familiar with agile development methodologies will quickly see many parallels in his approach to the business of startups. This got me thinking about the various models and approaches for software development today. There are three primary business models for software developers: Consulting, Internal (IT) development, and commercial (external) development.


In consulting, the software developer finds someone with a problem and creates software to address that problem. In most cases, the software is custom tailored for the customer and typically the entire IP (intellectual property, such as copyrights) is sold to the customer. This is the least cost-efficient form of software development because there is no reuse and no amortization of the development cost. Software produced as a consulting effort typically costs tens if not hundreds of thousands of dollars, and often seems rather lackluster in comparison to many commercial offerings. The advantage of custom software is that it is, like a tailored suit, customized to the customer’s requirements. Unfortunately, customers aren’t often skilled application designers and can end up asking for rather unusable software, and an unskilled consultant won’t be able to guide them towards a better solution.

Internal (IT) Software Development

Much like consulting, internal software development involves writing for customers (other departments within the company such as HR, Marketing, and Sales) but is typically full time employment. Companies may look at the cost of software projects and decide it is more economical to hire a group of developers and produce the software themselves. It shares most of the disadvantages of consulting, but has the advantage of being the target of most development tools and methodologies. Virtually every company with more than a few hundred employees has some degree of internal software development, and most development tools (Visual Studio, Eclipse, NetBeans) and methodologies (Agile, Spiral, Scrum, XP) target internal development scenarios.

Commercial Software Development

Commercial software is sold by companies often called Independent Software Vendors (ISVs) or “third party” vendors, and is referred to as a Commercial Off-The-Shelf (COTS) product. The entire purpose of an ISV is to create, market, and sell software. Unlike consulting and internal software, the idea behind commercial software is that you can write software once and sell the same product to multiple customers, each of whom shares a small portion of the overall development cost. Because the cost of development is shared by all, tremendous resources can be put into creating and maintaining commercial software, and it can be sold to anyone who wants it at a small fraction of the cost of development. A disadvantage of commercial software is that the buyer generally has to adapt to conform to the software, rather than the software to the buyer. For example, if you buy QuickBooks, your bookkeeping will be largely dictated by how QuickBooks operates. This is often considered an acceptable trade-off, since QuickBooks costs a few hundred dollars, and having a custom solution implemented would cost hundreds of thousands of dollars.
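The amortization argument comes down to simple division. The sketch below is only an illustration: the development costs and customer counts are made-up numbers, not figures for QuickBooks or any real product, and cost_share_per_customer is a hypothetical helper.

```python
def cost_share_per_customer(development_cost: float, customers: int) -> float:
    """Portion of the development cost each customer must cover to break even."""
    if customers < 1:
        raise ValueError("need at least one customer")
    return development_cost / customers


# Hypothetical numbers, chosen only to show the shape of the trade-off.
custom_project = cost_share_per_customer(300_000, 1)                # one consulting client
commercial_product = cost_share_per_customer(30_000_000, 200_000)   # many buyers

print(f"custom project:     ${custom_project:,.0f} per customer")
print(f"commercial product: ${commercial_product:,.0f} per customer")
```

Even with a development budget a hundred times larger, the commercial product in this made-up case needs far less from each buyer than the custom project needs from its single customer.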

Commercial Software: Markets

Compared to consulting and internal development in which there is a single primary customer (the company paying for the software), commercial software exists in large markets. These markets can be broadly divided into two categories: Horizontal and Vertical. These categories define the overall potential customer base any software can have.

Horizontal software applies broadly across disciplines and industries. Unsurprisingly, the largest and most profitable software companies tend to provide software to horizontal markets. Some of the most common examples are Microsoft Windows and Office (Word, PowerPoint, Excel, Access, Outlook). Other examples include WinZip, VMware, and Firefox. Being sold to a large number of people allows vendors to price the software at relatively low price points, while earning a potentially tremendous amount of money to support research and development. For example, Microsoft’s Office division sells products, typically in the range of hundreds of dollars per seat, yet earns $16 to $20 billion in revenue. Apple’s recently released iWork software (at $10 per app) is expected to earn $40 million annually, comparable to Google Docs.

Vertical markets are specific industries or customers with a common set of needs. By definition, vertical markets are slices of the overall horizontal market. For example, dentists, restaurants, retailers, doctors, lawyers, and developers are vertical markets. Software for these markets would be dental scheduling software, point-of-sale systems, health record databases, and software development environments and control libraries. Vertical market software tends to be priced considerably higher than horizontal market software due to the smaller number of potential and actual customers. It’s not uncommon to see a particular practice management application cost $30,000 up front, with maintenance fees anywhere from 20 to 50% annually.

In practice, there is a continuum between horizontal and vertical markets. Windows would be on the horizontal extreme, software that literally could be applied to anyone in any possible business. Adobe Photoshop would also be considered a horizontal, but you could find many examples of companies without any need for Photoshop. On the vertical side, Araxis Merge is definitely targeted towards the software developer vertical, while dental scheduling software is clearly to the extreme vertical side. Also, horizontal software markets include all possible vertical markets, and so horizontals are necessarily much larger markets.

Another facet of commercial software markets is the type of customer expected to buy the software. Customers can be classified broadly into consumers (individuals) and companies (organizations), and the number of potential customers varies with each. Like vertical and horizontal markets, this translates directly into the number of customers available to amortize the cost of developing the software. Here’s an example breakdown of types of customers:

  • Consumer
  • Prosumer
  • Technical user
  • Small business
  • SMB (Small and Medium Business)
  • Enterprise

Some software appeals to all types of consumer in all markets (Microsoft Windows, for example). Because of the sheer market size, Microsoft can afford to spend literally billions of dollars on research and development of Windows and Office, and sell them for a few hundred dollars per unit. If you look at enterprise focused software like Remedy (IT service desk management, if you didn’t know), it typically only appeals to enterprise customers, and with a limited number of those, the costs per unit skyrocket, and typically involve a long and expensive implementation period in which the software is customized to the particular customer’s environment.

If you have an Enterprise:Vertical plan, your total customer base will only ever number in the hundreds (one could argue that your potential customers are exactly the Fortune 500). If you have a Consumer-Enterprise:Horizontal plan, your potential customer base is in the hundreds of millions. Broad horizontal markets are extremely attractive for this reason, and so they are heavily defended; if you come up with something interesting, competition will quickly become fierce.

In the search for a profitable business model for a software technology company (arguably any company), understanding the potential and actual markets is critical. If you have a great idea for some software solution, how many customers are even possible? How many are likely? What will you have to charge, and how much will it cost to build the software? Software startups fail at a rate of greater than 90%. Answering simple questions like these before you write a single line of code just might be worth it. Perhaps more importantly, what kind of market and development do you want to do? Because if you don’t even want to do it, it’s not likely you will succeed.
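Those questions can be roughed out in a few lines before any product code exists. This is a minimal sketch, assuming a single fixed build cost and a flat price; break_even_customers and is_plausible are hypothetical helpers, and every number is invented for illustration.

```python
import math


def break_even_customers(build_cost: float, price: float, cost_per_sale: float = 0.0) -> int:
    """Smallest number of sales whose margin covers the cost to build."""
    margin = price - cost_per_sale
    if margin <= 0:
        raise ValueError("price must exceed the cost of making each sale")
    return math.ceil(build_cost / margin)


def is_plausible(build_cost: float, price: float, likely_customers: int,
                 cost_per_sale: float = 0.0) -> bool:
    """True if the customers you think are likely would at least cover the build."""
    return likely_customers >= break_even_customers(build_cost, price, cost_per_sale)


# Hypothetical vertical product: few customers, high price.
print(break_even_customers(500_000, 30_000))                 # sales needed
print(is_plausible(500_000, 30_000, likely_customers=40))

# Hypothetical consumer product: low price, so it needs a very large audience.
print(break_even_customers(500_000, 10))
print(is_plausible(500_000, 10, likely_customers=20_000))
```

The sketch ignores marketing, support, and ongoing development, so a real plan would need more customers than the break-even count it reports.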


Rethinking Virtualization

My first encounter with virtualization was in 1988, when a company called ReadySoft released a product called the A-Max. Because the Amiga and Macintosh shared the same Motorola 68000 based CPU, the A-Max allowed the Amiga to run the Mac OS as a virtual machine. The benefits were obvious: you could invest in a single machine (RAM, hard drives, video cards) and get the benefit of both platforms, using and allocating resources as needed for the task at hand. If you needed a full 4MB of RAM given to the Mac, you could bump the memory and work in Photoshop, or if you wanted to leave some breathing room for the Amiga, you could always bump it down to 1MB. This is much like Windows XP mode today in Windows 7, and this type of use still shares the many benefits it had in 1988 on my Amiga running Mac OS 7.

Server virtualization is a bit different. Server virtualization is focused on exposing very specific applications, typically only a single application on a given VM. With server virtualization, we don’t tend to use VMs to keep running old applications ad hoc on older operating systems, nor to enable access to a non-primary platform that we want to share hardware with, as we do with client virtualization. We virtualize servers primarily for isolation and security. It’s certainly much more efficient to run 40 web sites off a single IIS server than to host 40 VMs, each hosting a minimal Windows installation and each running an independent instance of IIS. With separate VMs, however, if someone manages to exploit the web server, they haven’t exploited the email server as well. Isolation also simplifies configuration, as you don’t have to debug a complex set of configuration requirements for each application, potentially conflicting DLLs, and so on.

To this point, it seems that the entire virtualization market and process has been driven by the physical hardware workstation model of virtualization. Current operating systems are designed entirely around a physical hardware stack. Even with JeOS versions of operating systems, there is this entire virtual hardware layer the entire application and OS are going through, and the applications are typically running on established application stacks (ASP.NET, LAMP, etc.) indirectly through this hardware. But what if this legacy didn’t exist? What if someone looked at why we virtualize today, and created a platform specifically to address it?

With CPU virtualization support built in, what would an operating system look like today if it were built from the ground up for a virtual machine? Also, what would a virtual machine look like if it didn’t have to expose hardware abstractions to physical operating systems? Virtual machines are simulations of physical hardware: network cards, hard drives, BIOS, video cards. For most server virtualization functions, this is all at best useless, and at worst a drain on resources. What if, instead, virtual machines were simply a set of well defined services for networking, memory, and storage? Services that virtual operating systems could use directly, without the weight of device drivers, filesystem drivers, and all that cruft from the physical world?

If you carry the JeOS approach to its logical conclusion, what you end up with is a hypervisor that exposes not virtual machines, but virtual platforms. Stacks like ASP.NET, Apache/PHP, Mono, Tomcat/JVM, or even entire applications as operating systems like SQL Server or Exchange. I suspect that we’ll eventually end up somewhere like this, where the lines between the platform and the operating system continue to be erased.

I think it’s about time for the next phase of virtualization, a virtual machine that isn’t a machine at all.


Estimating Chaos: Complexity Levels

One of the most important questions in software development (hopefully) right after “what is it that we are building?” is the question, “when is it going to be done?”
Unfortunately, “when is it going to be done?” isn’t a particularly easy question to answer for any reasonably complex or interesting software project. What’s worse is that software development itself is the most unpredictable activity on a software project. Compounding this problem even further is that while one might expect software projects to both over and underestimate, in fact software projects experience a chronic underestimation problem. This underestimation problem is frequently revealed as projects are decomposed into smaller and smaller pieces. Every time a task is decomposed and better understood, it tends to grow, but rarely does it shrink.
It is important to note that no single estimation technique is appropriate for every project, or even every stage of a project. What is true of any effective estimation technique, though, is that at its core it is a function that takes a set of inputs and produces an immutable output. In other words, the only way to change the estimate is to change the inputs. You can’t argue with the estimate.

Time based estimation

Time based estimation is one of the more common traditional approaches. The projects in which time based estimation is most helpful are those with very little research content (mostly development on well understood, existing systems) and systems where there are few differentiating capabilities. For example, on a maintenance project on a legacy system in which tasks are well understood “cookbook” style upgrades, relying on historical records of how long a similar task took isn’t a terrible approach. Generally speaking though, time based estimation has been one of the more problematic approaches in many more active software projects, especially greenfield and commercial development.
Some of the problems with time based estimation:

  • It gives the impression of precision where there is none. By the time you add the hours up, you have a number that looks like 300.25 days, but in actual fact you are probably looking at 160-600 days of work, and closer to 600.
  • It’s personal. Time based estimates devolve quickly into who is doing the work and how long it should take one person over another. It puts the individual at the forefront rather than the team. Approaches like “Ideal hours” and “Ideal days” attempt to remedy this, but further confuse the issue by having hours that aren’t actually hours.
  • More subject to pressure. Instead of determining the relative difficulty of a task, the development team and management argue about whether something should *really* take 8 hours or not, or whether it would take Joe 6 hours and Bob 8 hours. The focus should be simply on whether a particular task is more or less difficult than another.
  • Time requires constant revision with “fudge factors” to accommodate meetings, lunch, distractions, discussions, communication… all the things a healthy team should be doing are “overhead penalties.” The project’s tasks contain time data that doesn’t correspond to the actual project duration.
  • Time assumes all resources are equal and all tasks are equal over time. Losing certain people will have a more dramatic effect on schedule than others, and adding people may actually cost you. The cost of disruptions has to be manually adjusted.
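The first problem in the list, false precision, is easy to reproduce. In the sketch below, the three tasks and their day counts are invented for illustration; the point is only that adding up single “best guess” figures yields a number with decimals, while the optimistic and pessimistic totals show how wide the honest range is.

```python
# Hypothetical task estimates in days: (optimistic, best guess, pessimistic).
tasks = [
    (2.0, 3.25, 8.0),
    (5.0, 10.5, 20.0),
    (1.0, 1.5, 4.0),
]

best_guess_total = sum(guess for _, guess, _ in tasks)
low_total = sum(low for low, _, _ in tasks)
high_total = sum(high for _, _, high in tasks)

print(f"looks like: {best_guess_total} days")
print(f"really:     {low_total} to {high_total} days")
```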

So if time based estimation has all these problems, what should we use? Over the past decade, point based estimation has been used to significantly improve predictability in software projects. Many agile development processes have adopted complexity points as the overall recommended estimation technique across the project. Point based estimation especially helps projects in which the requirements are not well understood, expected to change over the course of the project, and projects in which the desired capabilities are unique and have little or no historical data supporting them. Well disciplined point based estimation also accounts for and reports the effects of gaining and losing resources, training, time off, and even employee morale.

Complexity Points

Points are an arbitrary scale of numbers which typically follow an ever increasing sequence. One of the most important features of points is that they decouple estimation from time, so the numbers should not represent hours or days. What is important is to establish a relative scale of increasing effort and difficulty that the team all understands and agrees upon. (A sign that the team has internalized the scale is that they all come up with the same complexity point number for a given task in planning poker.) While time based estimates are very subject to interpretation and give the illusion of precision, point estimates are specifically and intentionally abstract, establishing the relative difficulty of tasks within the context of the project and the people on the project. Because they calibrate to the team and project, they tend to become very accurate very quickly when used correctly. Typically the calibration only takes two iterations.
A healthy software project is a cooperative game that relies heavily on teamwork to be fast, efficient, and successful, so the estimation approach and tasking should serve that purpose and be based around teams rather than collections of people. The Team is the abstraction that produces estimates and is assigned work. The members of the team are an implementation detail. Points excel at predicting team performance over individual performance and support a more advanced model of collective code ownership among the team, eliminating bottlenecks and knowledge silos.

Complexity Levels

What I describe here is an augmented form of point based estimation designed to improve the precision of pointing, allow for faster and more effective resource transition onto the team, and improve the accuracy of the overall project estimate. It should be compatible with any agile process that supports points, and because Scrum doesn’t prescribe any particular estimation technique (other than that it not be time based) it works fine with Scrum as well.
Major goals:

  • Answer the question: When will it be done?
  • Increase the accuracy of the answer to the question.
  • Increase the precision of the answer to the question.
  • Monitor project health.
  • Show schedule slippage as soon as possible.
  • Estimate, measure, and task against a team, not a person.
  • Understand and account for all the effort expended indirectly on a project. Whether it’s meetings, lunch, holidays, overtime, training, losing or gaining a team member, the estimation approach should capture this and show the effect on the schedule.

So far, points mostly address the goals, but they’ve also got some drawbacks that can sometimes result in losing these advantages. One problem with points is that they often become a proxy for time. As people estimate with numbers, especially if they’ve done time based estimation in the past, they tend to start associating the points with hours or days. 2 points becomes 2 hours, and so on. It would be better to have a scale that isn’t as easy to confuse with time. To overcome this, we’ll borrow a page from fuzzy logic estimation by assigning names to the levels of difficulty, even though we’ll still translate to points “under the hood.”

Designing the complexity scale

What should we be estimating? We should be estimating stories or features. What we are tasking are capabilities or features of the system. There are many activities that support the creation of the product (design, testing, meetings, support, teamwork) and it is these very activities that will be captured in the velocity of the project. In typical agile form, tasks should result in completed, tested, demonstrable functionality with all code and tests written and accounted for. The idea here is that the code should be in production quality shape at the end of each iteration, and we are not estimating the overhead required to make these stories or features reality. Tasks that are not directly tied to a shippable feature such as “support Developer in this task” or “have a meeting with Paul” should not be included or estimated in the task. Why not? Because these tasks do not result in executable code, they can skew the results dramatically while not informing what the actual execution of a feature will require. They are the things we do as part of accomplishing the task, and so that is how they are factored into the estimate. In the end, we want to know how long it will be before a feature is working, not how long it took to have a meeting to talk about the feature. All of these things still have an effect on the schedule, but they have that effect behind the abstraction of velocity.
Because we humans have a tendency to underestimate large, complex systems, the scale should be designed to prevent that. On one end, the scale should reflect the simplest features of the system, the kind of features that would take less than a day. On the other end of the scale should be something that will be difficult, but can reasonably be expected to be finished within the iteration. If tasks that take 5 minutes (rename a file) are included with tasks that produce features or functional code, it can create an overestimation problem (aka sandbagging) by inflating lots of small cleanup to the level of a feature.
Five complexity levels is a good starting point, particularly for a team not accustomed to this estimation approach. Before you become an expert, five levels is about the limit of discrimination that a human can effectively perform. As you become better and more accurate, you can introduce finer grained levels and other terms, but I recommend starting simpler and increasing complexity only as it proves itself to be necessary, otherwise you’ll end up spending more time trying to understand the minor differences between 10 or 15 levels than estimating.
In a pure fuzzy logic based approach, we could pick any set of terms we liked, so long as the team all understands the relative size and approach. This could be T-Shirt sizes, dog breeds, drink sizes (short, tall, grande, venti, big gulp), or whatever else the team bonds with. So long as the team understands the relative and increasing level of difficulty, you can go with any terminology.
For the sake of example, this is a list of terms I have used in the past (inspiration obvious…):

  • Trivial
  • Simple
  • I can do it
  • Ouch
  • Hurt me, plenty!

Even though we can technically use any terminology, we can increase the effectiveness of our estimation by giving the terms objective characteristics. This serves three purposes: it gives context and objective criteria to new team members who haven’t been “in the know” about what a particular dog breed represents to the team, it clearly identifies the skill level needed for a task, and it helps defeat pressure-based influence on the estimates.

  • Trivial – A trivial task is one in which the approach and design are well understood, no research is expected to be needed, and the effort is not large.
  • Simple – This requires some thought, perhaps evaluation of a few different approaches, but it’s mostly just coding without having to break new ground.
  • I can do it – This is the first feature that could be considered challenging, but the expectation is that it won’t be too difficult, and you’re still excited. It will require some research, perhaps prototyping, and design work. This is a fun problem to solve.
  • Ouch – These tasks need significant research, design, prototyping, and are known to have snags and gotchas. There are definite unknowns, and you wouldn’t want to assign more than a couple of these to any one person.
  • Hurt me, plenty! – The most complex task that anyone believes could reasonably be accomplished in an iteration. There are enough unknowns that these seem risky, but not so risky as to need to be broken down further. As a calibration, the most senior developers on the team should be able to accomplish tasks of this complexity, though not without focus and effort.

To achieve the level of accuracy and precision we want, it is critical that the scale not include anything we don’t expect could be accomplished in an iteration. If you are scheduling “Hurt me, plenty” stories and they aren’t getting finished, your bracketing difficulty level will be lost and “Ouch” will become the new upper bound, reducing the effectiveness of your estimates. If a complexity level ever exceeds the team’s ability to accomplish it, then our ability to predict the end date will suffer, or become entirely meaningless as large unknowns with unreliable numbers are included. Once we don’t know how large a task really is, it could be twice as hard as an “Ouch”, or it could be fifteen times as hard. Typically, tasks should not be carried over from one iteration to the next. If they are, either the team is signing up for too many points or the scale is flawed. With discipline, the accuracy and precision of our velocity should increase.
Velocity is a common metric in iterative agile development. Velocity is a remarkably simple, yet remarkably accurate measure of how fast a team can produce product capabilities and features. To calculate velocity, you just add the number of points the team accomplished in an iteration, and compare that with how many points you have to go. If you complete 10 points in an iteration, an iteration is 2 weeks, and you have 90 points to go, you can expect to be done in 9 more iterations. Also, velocity measures the gestalt performance of a team, not individuals. Adding a member to the team could reduce the velocity (as readers of Mythical Man Month might guess), and removing a (problematic) team member could increase the velocity. Many non-intuitive discoveries about productivity have been made by companies calculating accurate velocity. This is something velocity and points can identify that simple time based approaches are simply incapable of doing.
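The velocity arithmetic from the example (10 points per iteration, 2 week iterations, 90 points to go) can be written down directly. This is a minimal sketch; the function names are my own, and rounding up to whole iterations is an assumption, on the grounds that a partly used iteration still has to be run.

```python
import math


def iterations_remaining(points_remaining: float, velocity: float) -> int:
    """Whole iterations needed to finish the remaining points at the given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(points_remaining / velocity)


def weeks_remaining(points_remaining: float, velocity: float, iteration_weeks: int) -> int:
    """Calendar weeks implied by the remaining iterations."""
    return iterations_remaining(points_remaining, velocity) * iteration_weeks


print(iterations_remaining(90, 10))   # 9 iterations
print(weeks_remaining(90, 10, 2))     # 18 weeks
```

Because the inputs are team totals, the same calculation picks up the effect of adding or losing a team member as soon as the measured velocity changes.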
Gestalt is a German word that means form or shape. In English usage, it refers to a conceptual “wholeness.” It is often said that the gestalt of something is “different than and greater than the sum of its parts.” When a team becomes a high performance team, a new entity emerges that performs orders of magnitude greater than what the group of individuals on that team could ever do independently. Particularly among experienced developers, high performance teams tend to create themselves when permitted. Most simply form in the absence of management interference, so gestalt teams are not actually difficult to create.
Now, so far, we haven’t assigned points to our complexity levels. As with complexity points, the points should be an ever increasing scale, and reflect the magnitude of difference between the problems. For example, if we use powers of two, will a “Hurt me, plenty” take 16 times as long as a “Trivial”, and could I reasonably expect to do 16 “Trivial” tasks or 1 “Hurt me, plenty” task in an iteration? This is also why decoupling the complexity level from the numeric values will allow us to tweak the scale as we start measuring our performance. It may be that the Fibonacci series more accurately reflects the actual results, or it could be that we determine the best fit mathematically from historical project performance data.
Think of the iteration as a truck, and the complexity points as boxes of various sizes. When you plan the iteration, you look at the moving average of the previous iterations, and attempt to fill the truck to capacity for the next iteration. So, if the team has accomplished 20 points of complexity, they should be able to do 20 1-point tasks, or 1 16-point task and 1 4-point task, and so on. The question is, does this prove to be true? What if the team can do 40 1-point tasks, but only 2 8-point tasks? In that case, it may be that the 1-point tasks are best described as 2 points, but the 8-point tasks are fairly accurate. The Fibonacci sequence beginning at 2 is a better fit for where estimates are landing.
Powers of 2: 1, 2, 4, 8, 16
Fibonacci: 1, 2, 3, 5, 8
Fibonacci starting at 2: 2, 3, 5, 8, 13
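The truck analogy can be sketched as a small planning routine. The three-iteration window for the moving average, the greedy fill in priority order, and the story names are all my assumptions for illustration; the text above only says to use a moving average and fill to capacity.

```python
def moving_average(velocities, window=3):
    """Average velocity over the most recent `window` iterations."""
    recent = velocities[-window:]
    return sum(recent) / len(recent)


def fill_iteration(backlog, capacity):
    """Load stories in priority order, skipping any box that no longer fits the truck."""
    planned, used = [], 0
    for name, points in backlog:
        if used + points <= capacity:
            planned.append(name)
            used += points
    return planned, used


# Hypothetical history and backlog, sized on the powers-of-two scale.
history = [18, 22, 20]
backlog = [("login", 8), ("reports", 16), ("export", 8), ("audit log", 4), ("rename field", 1)]

capacity = moving_average(history)
planned, used = fill_iteration(backlog, capacity)
print(capacity, planned, used)
```

Comparing what was planned against what actually got finished, iteration after iteration, is what reveals whether the numeric scale needs to move toward one of the alternatives listed above.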
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
Think about the “midpoint” of any software project you’ve been on. If you said at that time that you thought it could be completed in a week, most everyone would think you’re mad. If you said a month, you’d get an incredulous reaction. By the time you start talking in a three to six month timeframe though, nobody can really conceive of the problem at that level of detail, and it seems like it might even be plausible. This is at the heart of the problem with underestimation in software. Once we are far enough from the details of the problem, or there are too many details to consider, everything just starts looking a little simpler…
Our scale is forcing us to get the problem broken down into chunks small enough that they can be envisioned within a few weeks (an iteration). Until they are broken down, the law of large numbers does not apply to our estimates because they are all skewed by a lack of knowledge and understanding. This is why it is important to exclude the huge unknowns from the estimation process entirely and bound our scale with what we can actually accomplish. If the problem can be broken down enough that we’ve avoided the underestimation problem, we should be able to rely on the law of large numbers to move towards an average over time. What do you do with the huge unknowns? As long as you still have them, you are still in the envisioning or speculation phase of the project, too far in the front end of the cone of uncertainty, and not ready to provide a responsible estimate for the project (at least not with any level of precision). Once you have broken down your epic stories and tasks to points that fit in an iteration, you can begin to have confidence in a fairly precise end date.
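A small simulation illustrates why the law of large numbers only helps once the bias is gone. Everything here is an assumption made for the sake of the sketch: the uniform error model, the 0.5x to 1.5x and 1x to 15x ranges, and the function name. It is not a model of any real project.

```python
import random


def relative_error_of_total(n_tasks, low_factor, high_factor, seed=1):
    """Simulate tasks whose actual effort is estimate * uniform(low_factor, high_factor).

    Returns (actual total - estimated total) / estimated total.
    """
    rng = random.Random(seed)
    estimates = [rng.choice([1, 2, 4, 8, 16]) for _ in range(n_tasks)]
    actuals = [e * rng.uniform(low_factor, high_factor) for e in estimates]
    return (sum(actuals) - sum(estimates)) / sum(estimates)


# Well understood tasks: errors fall evenly on both sides of the estimate.
unbiased = relative_error_of_total(1000, 0.5, 1.5)

# Huge unknowns: the real effort is anywhere from 1x to 15x the estimate.
skewed = relative_error_of_total(1000, 1.0, 15.0)

print(f"unbiased estimates: total off by {unbiased:+.1%}")
print(f"skewed estimates:   total off by {skewed:+.1%}")
```

With symmetric errors the over- and underestimates largely cancel across many tasks, so the total lands close to the estimate. With one-sided errors, more tasks only make the total converge on the wrong answer.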
If you haven’t used points before, this is all why the scale matters and how it works. The points are tied to fixed length iterations, which form the unit by which the time estimate is calculated. At the beginning of the project, choose an iteration length and then stick with it. If a month is too long (and it usually is), try going to 3 weeks. If your project supports it, you can even try two weeks (more difficult if you are creating a lot of differentiable features, easier on a project with a lot of well known solutions). Once you find a comfortable length, don’t change it. Sometimes people may want to change it because of holidays, or because there’s a series of meetings. Do not do it. These are the very events that provide invaluable information. For example, you might find that your velocity goes up over the holidays and find out in the retrospective that developers had a lot more time to focus because of fewer meetings. I’ve met many people who get a tremendous amount done on holidays due to the reduced noise. What’s more, you now know the effect of a major event on the team and project.
By focusing on difficulty terms rather than points, it is possible to analyze the best fitting values for these difficulty levels over time to improve the accuracy and precision of the overall release date. Since the terms are not arbitrary, but reflect specific characteristics that make the tasks more difficult, they are less subjective than arbitrary terms and new terms can be added over time. For example, you might choose to add “Irritating” and “Tedious” with values of 4 and 8 to describe work that isn’t particularly difficult, but is time consuming.
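As a rough illustration of how difficulty terms, points, and fixed-length iterations fit together, here is a minimal C# sketch. Only the values for “Irritating” (4) and “Tedious” (8) come from the text; the value for “Hurt me plenty”, the class and method names, and the simple mean-based refit are assumptions made for the example, not a prescribed formula.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DifficultyScale
{
    // Term -> points. "Irritating" = 4 and "Tedious" = 8 come from the text;
    // the value for "Hurt me plenty" is an assumption for illustration.
    private readonly Dictionary<string, double> points = new Dictionary<string, double>
    {
        { "Hurt me plenty", 5 },
        { "Irritating", 4 },
        { "Tedious", 8 },
    };

    public double PointsFor(string term) { return points[term]; }

    // Sum the current point values for a backlog described in difficulty terms.
    public double TotalPoints(IEnumerable<string> backlog)
    {
        return backlog.Sum(term => points[term]);
    }

    // Replace a term's value with the mean of what such tasks actually cost
    // (a deliberately simple stand-in for "best fitting values").
    public void Refit(string term, IEnumerable<double> observedPoints)
    {
        points[term] = observedPoints.Average();
    }

    // Velocity is the average number of points completed per iteration.
    public static double Velocity(IEnumerable<double> completedPerIteration)
    {
        return completedPerIteration.Average();
    }

    // Whole iterations needed to burn down the remaining points.
    public static int IterationsRemaining(double totalPoints, double velocity)
    {
        return (int)Math.Ceiling(totalPoints / velocity);
    }

    // Fixed-length iterations turn an iteration count into a calendar date.
    public static DateTime ProjectedEnd(DateTime start, int iterations, int iterationDays)
    {
        return start.AddDays(iterations * iterationDays);
    }
}
```

With these assumed values, a backlog of one “Irritating”, two “Tedious”, and one “Hurt me plenty” task totals 25 points; at a velocity of 12 points per iteration that rounds up to three iterations. Because the backlog is stored as terms rather than numbers, refitting a term changes the total without re-estimating any task.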
Something else we’ve improved upon is the ability to manage by skill level. Giving a junior developer a “Hurt me plenty” task is sure to have a detrimental effect on both velocity and morale because either the junior developer is going to need considerable attention from the more experienced developers who then cannot concentrate on complex tasks, or the new developer will fail to complete the work and have reduced morale. People are not interchangeable parts, and this estimation technique highlights difficulty in an unambiguous way.
This isn’t far from the point-based estimation used and proven time and again on agile projects over the past decade. It’s a minor tweak that provides three major benefits. First, by decoupling the estimation process from points, we make the estimates in terms that are less abstract to us humans. Second, we are able to refine the points easily during the project to better fit reality as it emerges. Third, because we are describing the tasks in terms of the nature of the difficulty of the assignment, we can use this to improve efficiency on teams with very mixed skill levels and understand the effect on the schedule.

Programming WPF

There is no ViewModel

I’ve noticed a strong trend of using a ViewModel suffix for classes in the ViewModel layer of Silverlight and WPF applications.
In practice, I discourage people from using the ViewModel suffix because it’s not logical, and tends to lead to more confusion than understanding. I’ve been trying to track down the source of the suffix to figure out if it’s an official naming guideline or just used in examples to clarify to which set or layer a class belongs. I saw this mentioned a few times as the “John Gossman” naming pattern, and I wondered if that were the case. When I contacted John, he expressed that it was, in fact, just a convention used to clarify an example and not intended as a recommendation.
The ViewModel is really a layer containing a set of classes; no one class can really be “the” model (ignoring façades for now). The problem I’ve run into is that people get confused as to whether the ViewModel is a class or a set of classes. When I talk about “the ViewModel,” people invariably think I’m talking about a particular class, and I have to un-train them a bit in order to clarify what we’re talking about. The other side effect I have seen is that people tend to overlook things like the built-in converters that are often a better alternative than adding a dedicated class in the ViewModel layer.
A class is in a ViewModel (or Model) in the same way that a person is in a Group. You wouldn’t call a particular person in the group a MaxxGroup though, because a member of the set isn’t the set. It might be a little pedantic, but then I don’t think I’d be a software architect if I weren’t a little pedantic…
I also generally put my ViewModel classes in a folder/assembly/namespace, but tend to name them with a suffix that indicates the pattern they are generally following. ViewModel classes tend to actually be bridges, wrappers, façades, and adapters (much like Collection has a CollectionView in the ViewModel which adds viewstate to the Collection) but the point is that it’s more important to describe what the class actually is, not where it is.
Without this understanding, there is a de facto assumption that “ViewModel” classes should be created for every class in the model, an idea that the ViewModel layer is a necessary abstraction rather than an optional layer to adapt classes that are unfriendly to the View. In many cases, that’s exactly the wrong thing to do. My approach to explaining how, in practical terms, to approach the MVVM model with WPF/Silverlight goes something like this:

  1. Bind to the object in the model. Some objects will bind just fine. But if the object isn’t binding friendly…
  2. Use a built-in converter to convert the object to and from something more binding friendly,
  3. If you just need to get past a method call to something with properties, use an ObjectDataProvider,
  4. Write a class in the ViewModel and use a converter or custom provider class to adapt the class between the layers.

As an example, in a recent WPF application I wrote, I created two classes in the ViewModel: a DirectoryView and a FileView. I found that I wanted these classes to adapt the binding-unfriendly (because of method calls, primarily) DirectoryInfo and FileInfo classes. (If the binding system handled methods, I could have used the Info classes directly and not added classes to the ViewModel.)
The pattern goes something like this:

public class DirectoryView
{
	// (constructor and the dirInfo field holding the wrapped DirectoryInfo are omitted here)
	///Convert GetFiles() to Files enumerable on DirectoryInfo
	///and return bindable FileViews from FileInfo instances.
	public IEnumerable Files
	{
		get { return dirInfo.GetFiles().Select(file => new FileView(file)); }
	}
}

With this approach, you can write a FileSystemProvider that generally follows the XmlDataProvider/ObjectDataProvider patterns, which allows the developer to create and set up a static resource in XAML to browse a filesystem in XAML-friendly DirectoryView and FileView classes.
So in terms of a pattern, I would say this, “If you have a set of hierarchical classes in the domain that are method-heavy, write a view adapter that converts property calls to method calls, and wrap the results in further view adapters. If there is no hierarchical force in play, just use a ValueConverter to wrap and unwrap the class.”
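To make the adapter pattern concrete, here is a fuller sketch of how the two classes might look when written out. It is not the original application’s code: the constructors, the Name and Length properties on FileView, and the Directories property are assumptions added to show how method calls become bindable properties and how results are wrapped in further view adapters.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// View adapter over FileInfo: exposes only bindable properties.
// The choice of Name and Length is an assumption for the example.
public class FileView
{
    private readonly FileInfo fileInfo;

    public FileView(FileInfo fileInfo) { this.fileInfo = fileInfo; }

    public string Name { get { return fileInfo.Name; } }
    public long Length { get { return fileInfo.Length; } }
}

// View adapter over DirectoryInfo: turns method calls into properties.
public class DirectoryView
{
    private readonly DirectoryInfo dirInfo;

    public DirectoryView(DirectoryInfo dirInfo) { this.dirInfo = dirInfo; }
    public DirectoryView(string path) : this(new DirectoryInfo(path)) { }

    public string Name { get { return dirInfo.Name; } }

    // GetFiles() becomes a Files property; each result is wrapped in a FileView.
    public IEnumerable<FileView> Files
    {
        get { return dirInfo.GetFiles().Select(file => new FileView(file)); }
    }

    // GetDirectories() becomes a Directories property; each child is wrapped
    // again so the hierarchy stays bindable all the way down.
    public IEnumerable<DirectoryView> Directories
    {
        get { return dirInfo.GetDirectories().Select(dir => new DirectoryView(dir)); }
    }
}
```

Each property access re-queries the file system, which keeps the sketch short; a real adapter might cache results or raise change notifications, depending on how the view uses it.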
At this point, we need to advance the discussion beyond the basics of the MVVM pattern and start creating a set of patterns and vocabulary for the types of classes in the ViewModel (adapter, bridge, provider, view), and guidance like the above on when and how to create them. Sort of a “Design Patterns for MVVM” for developers to draw from, and to identify which forces are observed to guide the solution.

Programming WPF

Using Windows in WPF

On our Manning book forum, I received a question from trifonius regarding the use of multiple windows in WPF:

First of all, I’d like to compliment the authors with the book. It’s a great read, thorough explanations and clear examples.
I’m working on an application that needs to be able to edit data as well as showing reports on data. Since these two interfaces are very different I think I need multiple windows in this app, available from a menu or something.
Since there is no mention in the book of multiple windows, I wonder whether this is the recommended way to go. Is it advisable to use panels with controls and set their visibility (or something like that) instead? Seems like a bit of a hassle to me.
If it is fine to use multiple windows, how can I display them from selecting a menu item?

Thank you for the compliments! It’s great to hear that our work is proving to be beneficial. It’s the result of a lot of blood, sweat, and tears, and mostly blood… 🙂
It’s fine to use multiple windows in a WPF application. For various reasons we had to cut some material, including drag and drop and multiple windows, but that should not be interpreted as a recommendation against them. It’s true that some people are very adamant about not having multiple windows, but it really comes down to what problem you are trying to solve, who your users are, and what your application does.
Working in a single window was a limitation of web development for a long time, and it forced web developers to think of new approaches to solve problems typically addressed by windowing operating systems. This resulted in some very nice UI approaches, but also resulted in what I see as an over-correction where a group of those people came to view windowing in general as “evil.” One of the more effective counter arguments comes from multiple monitor configurations.
I have used multiple monitors for years, and have found the productivity gain from having two or more screens to be unquestionable. An application that does not allow you to open multiple windows is fundamentally incapable of taking advantage of such a configuration, so if you have anything in your application that might be able to make better use of monitor real estate, definitely look here. Think about the XAML Designer itself in Visual Studio. If you have two monitors, wouldn’t it be great to open the XAML window on one monitor, and the full design surface on another? You can’t now, and this strikes me as a major omission. (The poor man’s approach is to split the designer and XAML vertically, then stretch the window across both monitors to get nearly the right effect.)
That’s enough pontificating (for now) so on to your question…
On the context menu of your WPF Application project, you can select “Add/Window…” to create the Window XAML and code. For the sake of this example, I created a new project, left Window1 in place, and created a new Window called “DataEditWindow”.
Add a CommandBinding and both a Button and a MenuItem to demonstrate the call:
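A minimal sketch of what that markup might look like follows. The handler names match the new_CanExecute and new_Executed methods in the code-behind; the x:Class value, window title, and layout are assumptions made for the example.

```xml
<Window x:Class="WpfApplication1.Window1"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Window1" Height="300" Width="300">
    <!-- Route ApplicationCommands.New to the handlers in the code-behind -->
    <Window.CommandBindings>
        <CommandBinding Command="ApplicationCommands.New"
                        CanExecute="new_CanExecute"
                        Executed="new_Executed" />
    </Window.CommandBindings>
    <DockPanel>
        <Menu DockPanel.Dock="Top">
            <MenuItem Header="_File">
                <!-- The menu item and the button both invoke the same command -->
                <MenuItem Command="ApplicationCommands.New" />
            </MenuItem>
        </Menu>
        <Button Command="ApplicationCommands.New" Content="New"
                HorizontalAlignment="Center" VerticalAlignment="Center" />
    </DockPanel>
</Window>
```

Because both controls bind to the command rather than to a click handler, they are enabled and disabled together by whatever new_CanExecute reports.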


In the cs file, a couple special notes apply:
If you want to give the new window a particular DataContext for binding, you can set it here. Also, the Window should know who its parent is, so always set the Owner property to the Window you are launching from. In this example I’m binding to the ApplicationCommands.New command, but you could just as easily bind it to a control event or RoutedEvent (I tend to recommend thinking and binding in terms of commands):

private void new_CanExecute(object sender, CanExecuteRoutedEventArgs e)
{
    e.CanExecute = true;
}

private void new_Executed(object sender, ExecutedRoutedEventArgs e)
{
    var window = new DataEditWindow()
    {
        DataContext = DataContext,
        Owner = this,
    };
    bool? result = window.ShowDialog();
    if (result.HasValue && result.Value)
        MessageBox.Show("Ok clicked");
    else if (result.HasValue && !result.Value)
        MessageBox.Show("Cancel clicked");
    //window.Show(); // Show window, but do not wait
}

The next bit is a choice. If you want the window to be modal and block all operations in the parent window (a print dialog is a good example where modal makes sense), then use the window.ShowDialog() method. If the window is more of a palette and you want to allow the user to continue using the application, use the window.Show() method. Too many modal windows can be frustrating, but attempting to handle state changes can also hurt.
Since window.Show() is the simple case, this is what you’ll need to add to the child window for handling the DialogResult. In the XAML, we’ll need some buttons to let the user indicate their intent with the dialog:
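A minimal sketch of such buttons, placed inside the DataEditWindow content, might look like the following. The Click handler names match the Ok_Click and Cancel_Click methods in the code-behind; the layout, margins, and widths are assumptions.

```xml
<StackPanel Orientation="Horizontal" HorizontalAlignment="Right">
    <!-- IsDefault lets Enter activate OK; IsCancel lets Escape activate Cancel -->
    <Button Content="OK" IsDefault="True" Click="Ok_Click"
            Width="75" Margin="4" />
    <Button Content="Cancel" IsCancel="True" Click="Cancel_Click"
            Width="75" Margin="4" />
</StackPanel>
```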


Then you will need to set the DialogResult before you close the window so that the caller can decide what to do. This way, the parent window can decide whether to take the action or not. Note that you will get an exception if you try to set DialogResult on a window that was opened with window.Show(), so if you want this window to be dual purpose, you will have to handle both cases.

private void Ok_Click(object sender, RoutedEventArgs e)
{
    DialogResult = true;
}
private void Cancel_Click(object sender, RoutedEventArgs e)
{
    DialogResult = false;
}

In an application I would tend towards the approach of creating a DataTemplate, and I would create a generic WPF Window class that is simply a ContentControl. Set the DataContext of the Window to the object you want to edit, and let the template system pull up the right template for the editor. Something like this:


With this approach, you can choose to show the editor in-place or show it in a window with very few code changes.


Fine Tuning an LCD Monitor Connected to an Analog Port

Quite a while ago, I discovered the joys and productivity of using multiple monitors. On my current system, I have two identical Samsung 19″ LCD monitors connected to an ATI Radeon 9800 Pro. Unfortunately, the video card only has a single DVI connector so my second monitor is hooked up through the analog SVGA port.
When I first hooked it up, I couldn’t get over how bad the monitor looked when it was hooked to the analog port. I knew it would be worse, but it looked defective, or as if the resolution was not set to the native LCD resolution. Also, I love ClearType, but on the monitor connected via SVGA it looked horrible no matter how much I used the ClearType tuner.
This is what I learned…

For a while, I just hit auto adjust when things looked particularly poor. It usually helped a bit, but invariably the poor quality would show up again in spades in some other context, and it never helped a lot. At one point, I hit auto adjust when the only thing displayed on the monitor was a solid blue background. The next time I dragged a window across, it looked like a car hit my screen. That’s when I suspected the feature was relying on feedback from the signal generated by the image to adjust itself correctly.
With my newfound realization, I started hunting for an image complex enough to give the electronics something difficult to chew on. My first attempt was to see if the old Windows 3.1 desktop patterns survived through Windows XP. The black and white checkerboard gray-scale seemed like an ideal pattern to help the monitor adjust itself. No dice: Microsoft finally put a stake through that particular feature after Windows 2000.
So I put together this quick little utility to assist my monitor with its auto-adjusting woes. All it does is set various background patterns to display on the monitor to adjust. I am happy to say that by using the white and black pixel checkerboard pattern, the auto-adjust function on my Samsung 930B produced an image rivaling the DVI-connected 930B. ClearType looks beautiful, there are no more strange shimmering or jittering effects, and I can be productive on both of my monitors again!
Just to make sure I wasn’t imagining things, I pressed the auto adjust button on the monitor with a solid blue background again. When I dragged the checkerboard window across, it looked horrendous and ClearType was in a shambles. Now I hit the auto adjust with the checkerboard up, and watched as the monitor figured out how to display it perfectly again.
I’ve attached the utility as well as the source code. It is mind-numbingly trivial, but I’ve already thought of more patterns (and color patterns) to add based on what I learned about LCD technology. Right now I am just using some embedded resources and tiling the image…
Usage is pretty straightforward:

  1. Run the program
  2. Maximize the window on the LCD monitor to adjust (connected to an analog port)
  3. The checkerboard pattern should be all you need, but I added a couple other patterns via context menu

Video Patterns
Video Patterns Source Code


Exception Handling 101

I hate buggy software. However, as a developer, I know how difficult it is to write bug-free software and so I am always looking for new ways to learn how to write better software. One of those ways is exception based programming. Sadly, exceptions are often glossed over in samples and books so exception anti-patterns tend to propagate.
Take a look at the following code…

public void HorrifyingMethod()
{
	try
	{
		Cursor preCursor = Cursor.Current;
		Cursor.Current = Cursors.WaitCursor;
		try { Log.Write("I did something"); }
		catch {}
		Cursor.Current = preCursor;
	}
	catch (Exception ex)
	{
		if (ex.Message == "Failed")
			throw ex;
		else
			Log.Write("Something failed", ex);
	}
}

There’s so much wrong with this method it makes my skin crawl. Let’s start with the obvious:
Catching the base exception type
There aren’t a lot of cases where this is ever justified. When you catch an exception, you are effectively stating that you understand the nature of the failure, and you are going to resolve the problem (logging, by the way, is not resolving the problem). Our nested try/catch block may as well say “if the server caught fire, I’m just going to ignore it.” When you write an exception handler, it is helpful to say to yourself, “The managed runtime exploded, therefore I am …” and say what your catch block does. If you find yourself saying something like “The managed runtime exploded, therefore I am returning the default value” you can see how problematic this really is.
Discriminating on exception data rather than exception type
Exceptions should describe the nature of the problem, not where it came from. Every exception already comes with a stack trace so we know where it came from. A good exception is something like “TimeoutException” rather than “MailComponentException”. If you find yourself commonly digging through exceptions to determine what exactly went wrong, you are using a poorly designed library. If you are throwing exceptions, use an existing exception class if it fits the problem, or write a new class that describes the problem. The exception type itself is the filter used for catching, so it’s important for exceptions to describe the nature of the failure.
Re-throwing a caught exception from a new catch block
There are times when you might catch an exception, and after doing some programmatic investigation decide that you can’t actually handle it and you need to rethrow it. Never rethrow the same instance that you caught with “throw ex;” or you’ll wipe out its stack trace. The correct way to rethrow a caught exception is just a single “throw;” statement with no variable.

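public void CorrectRethrow() {
	try { /* ... work that may throw ... */ }
	catch(Exception ex) {
		if (!ex.SomeProperty) throw; // no variable: the original stack trace is preserved
	}
}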
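To see the difference for yourself, a small experiment along these lines can help. The class and method names here are hypothetical, and the NoInlining attributes are only there so the JIT does not fold the innermost frame away and hide the effect.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    // NoInlining keeps this frame visible in the stack trace.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Deepest()
    {
        throw new InvalidOperationException("Failed");
    }

    // "throw ex;" restarts the stack trace at the rethrow point,
    // so the Deepest frame is lost.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void RethrowWithVariable()
    {
        try { Deepest(); }
        catch (InvalidOperationException ex) { throw ex; }
    }

    // "throw;" keeps the frames recorded when the exception was first thrown.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void RethrowBare()
    {
        try { Deepest(); }
        catch (InvalidOperationException) { throw; }
    }

    // Helper: run an action and return the stack trace of whatever it throws.
    public static string TraceOf(Action action)
    {
        try { action(); }
        catch (Exception ex) { return ex.StackTrace; }
        return string.Empty;
    }
}
```

Printing RethrowDemo.TraceOf(RethrowDemo.RethrowBare) and RethrowDemo.TraceOf(RethrowDemo.RethrowWithVariable) side by side shows which trace still mentions Deepest, the method that actually failed.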

Eating the exception
This is probably one of the worst offenders. Exceptions work well because they are an opt-out method of detecting abnormalities rather than past “opt-in” methods such as error codes. Eating exceptions is rarely correct. If you are developing a library for use by other developers, you should not be making decisions for them with regard to exceptions. Always throw the exception to the library user so that she can handle it as she sees fit. The HorrifyingMethod() code shows two exception eating problems. The inner try/catch block is catching both managed and unmanaged exceptions, and completely hiding the fact that anything went wrong. This is all too common, and contributes to bugs, failures, and strange side effects with causes that are nearly impossible to trace. The outer block logs the exception, making a token gesture of “handling” it. Imagine this method being called from a button click event. The user keeps smashing the button, the program continues to fail to execute the procedure, and somewhere an obscure log file is recording all the detail. Logging an exception is not handling it.
(Note that if you do have more than one try/catch/finally in a method, your design is probably screaming for an ExtractMethod refactoring. Your method is almost certainly too large if there is a need for more than one.)
Lack of a finally block to ensure consistency of changes in the method
There should be far more try/finally blocks in your code than try/catch blocks. In order to write exception-safe code, anything in a try block must be reverted by a finally block to leave the application in a consistent state. When an exception was thrown in Visio, it would bring up a dialog box saying that the application state was inconsistent and advised you to restart the application. This is decidedly not okay in the .NET world.
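As a minimal sketch of that idea, the class below stands in for UI state such as the wait cursor in HorrifyingMethod(). The BusyIndicator class, its IsBusy flag, and the RunBusy method are hypothetical names chosen for the example; the point is only that the finally block restores the previous state whether or not the work throws, and that nothing is caught.

```csharp
using System;

public class BusyIndicator
{
    // Stand-in for state such as Cursor.Current (hypothetical).
    public bool IsBusy { get; private set; }

    public void RunBusy(Action work)
    {
        bool previous = IsBusy;
        IsBusy = true;
        try
        {
            work();
        }
        finally
        {
            // Runs on success and on failure, so the state is always reverted.
            // The exception itself still propagates to the caller.
            IsBusy = previous;
        }
    }
}
```

A caller that passes in work which throws a TimeoutException still sees the TimeoutException, but IsBusy is back to its earlier value by the time the exception arrives.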


Eye Nutrition

A few years ago, I got a Mac. It started when I read about Apple jettisoning their entire operating system core and starting over with the UNIX-based NeXT operating system. Every day after work, I stopped by CompUSA and just explored all the OS X systems there. This all happened in late 2002 and, at the time, there was just nothing that looked anything like OS X… it was just incredibly gorgeous. When Apple announced Jaguar, I decided to jump on the train and buy an iBook. It was relatively inexpensive, and a good way to step into the world of OS X.
Today I am using Vista, Ubuntu, and deeply exploring the depths of WPF, and the unique shine of OS X doesn’t seem as unique anymore. Now that everyone has pretty well jumped on the “eye candy” bandwagon, I’ve been doing a lot of thinking and observing how much many of them are missing the point.
The thing that Apple did so well with OS X aesthetically was to use the full power of the hardware and software to create a beautiful and usable system. There are many subtle hints in OS X that employ alpha channels, stencils, and color to great effect, but I don’t call it eye candy. In most cases, it’s eye nutrition. For example, look at the search of the System Preferences in Tiger. Immediately it not only lists the relevant results, but through visual design, you get even more immediate feedback on where you might be most interested to look.

OS X showing some eye nutrition in a System Preferences search

The use of an alpha channel in this way really helps improve the usability, and communicates with the user. Of course, the typical use of an alpha channel in most systems is to make semi-transparent, difficult to read windows.
OS X also uses the alpha channel to display a shadow under each window. The top-most window has the deepest shadow and provides an instant visual cue to the user about the arrangement of the windows on the screen.
None of this is to say that OS X doesn’t have eye candy as well, but it’s a lot sweeter when there’s nutrition to go with it.


Agile Development for Commercial Software

My development experience has primarily been in commercial software. As such, I am quick to recognize that some aspects of agile methodologies aren’t always practical for commercial software development, nor were they designed with commercial developers in mind. For example, sitting with the customer you have not yet won is not really practical (or even possible, unless you happen to know the Doctor), and published interfaces cannot be freely modified and refactored without significant customer impact.
In fact, commercial software must often be written “on spec” prior to winning any customers, and may in fact be written in secret to avoid the possibility of larger competitors with a greater ability to execute from implementing your own product ideas before you have a chance to get them to market. Therefore, some agile cookbooks aren’t going to work for you, but that does not mean agility isn’t desirable or appropriate. Another problem with specific agile methodologies is that they are very geared toward the internal IT developer. This isn’t an unreasonable perspective as well over 90% of software development is internal, but again, this won’t necessarily address the needs of commercial software shops.
In addition, most commercial ISVs have a greater need for predictable software releases. While a missed internal deployment can certainly cost money, a missed commercial deployment involving marketing, sales, partners, and competitors can call the future of a small commercial ISV into question. I have personally found a mutilated mix of waterfall, spiral, and cowboy coding in commercial software companies, often with a culture of heroism.
Regardless of what specific agile methodology you look at, fundamentally they are all based on the same idea: Embrace Change. By “embrace change” I don’t mean “deal with change” with emergency design sessions and marathon coding sessions, or “cope with change” by padding estimates, or “ignore change” because you’ve got some heroic developers to save you. Change comes from many places: software or framework limitations, requirements, leadership, competitors, coworkers, design constraints, and so on. Arguably, commercial software is even more subject to change than internal projects, so there is a lot of benefit to be gained from greater agility.
Unfortunately, and with a few exceptions, internal IT development is generally the focus for both research and practice in agile methodologies. Commercial software companies are on their own in figuring out the best way to take advantage of recent agile practices. Commercial software can definitely benefit from agile development, but it will take a lot more effort and understanding to balance the constraints of commercial software with the tenets of agile development.


Cargo Cult Agile

I first came across the phrase “Cargo Cult” in the book Surely You’re Joking, Mr. Feynman! by Richard Feynman. In the book, Feynman warned researchers against fooling themselves and thus becoming cargo cult researchers. The term is from a 1974 commencement address given by Feynman where he talks about learning not to fool yourself, because you are the easiest person to fool.
What is a cargo cult?
The cargo cults Feynman described were based on natives from the islands of Melanesia in the South Pacific. During the war, the islands of Melanesia served as a staging area for the military where they built temporary operations. The natives observed all the ways in which the allied forces landed the planes and learned the techniques. After the war, the planes stopped landing and the cargo disappeared. The cults decided that they must perform the allied rituals of landing planes and bringing out the cargo, and so they built runways, control towers, bamboo “headsets” and military uniforms.
Apparently they have learned the “rituals” very well, and continue to perform them in the hopes of bringing back more planes full of bountiful cargo. Unfortunately, no matter how well they duplicate the ritual, there is no result.
Waterfall: The Worst Cargo Cult
As I wrote this post, I realized that the Waterfall process is actually the worst cargo cult of all. Waterfall software projects fail at astounding rates, and we still create our Gantt charts and do our huge designs up front, and wonder why the planes aren’t landing or why a simple project costs tens if not hundreds of millions of dollars to complete.
Asked for the chief reasons project success rates have improved, Standish Chairman Jim Johnson says, “The primary reason is the projects have gotten a lot smaller. Doing projects with iterative processing as opposed to the waterfall method, which called for all project requirements to be defined up front, is a major step forward.”

In 1970, W.W. Royce described a software development model in which each phase of the development process follows the previous one in sequence, and in the end results in a finished product. In this article, often cited by waterfall proponents, Royce points out that this “grandiose” model is prone to failure and recommends instead an iterative “feedback” model. Unfortunately this is the worst software process cargo cult of all because it never actually had anything to back it up as being a successful approach. The waterfall cultists stopped reading before they got to the conclusion.
In an ironic twist, about the time waterfall was losing popularity, the US military created a software development methodology called DOD-STD-2167 which not only reintroduced the otherwise failing software development methodology into widespread use, but lent it renewed credibility. I guess the natives weren’t the only ones easily fooled into hopeless ritual destined for failure. Somewhere some cargo cultists are laughing.
Unfortunately, cargo cults tend to give rise to more cargo cults.
What does a cargo cult agile shop look like?
From the outside, it may look a lot like an actual agile shop. After all, a cargo cult shop is imitating what they have seen about agile. However, like waterfall proponents, cargo cult agile shops are led by people who have looked at pictures of agile models, “read” agile books, or “learned” agile development from PowerPoint presentations. Perhaps there are a number of developers who know agile, but they may not be able to move the company towards agility in the face of generations of managers and developers who have been indoctrinated by DOD-STD-2167.
By definition, agility describes an ability to respond to changes. As such, any agile process is going to have some sort of iterative process in which the software is improved and responds to any changes that have occurred during development. In a cargo cult shop, keep a particular eye out for iterations that don’t include tasks other than programming. If the details of analysis, design, and testing are excluded, you are probably looking at a “cowboy” project or a waterfall project in disguise. An iteration should always include or consider changes. While possible, it is unlikely that the project will remain completely unchanged after an iteration.
With any agile process, the speed and effectiveness of adapting to change is the primary measure of how agile your team has become. If you are using an effective agile process, responding to change should be graceful. If a change causes a lot of disruption, or re-visiting a line of code that someone has already typed causes alarms to go off, you may be in a cargo cult.
If you are looking at a potential agile company, look at their process in as much detail as they will allow during the interview. If they use XP, ask to look at story cards. Do their unit tests actually test the functionality, or are they a check box to get past a sign-off? Are the story cards simply a requirements document created in one big design up front, or do they add, change, and remove cards as they get further into the project? On a Scrum project, ask how long a typical scrum meeting lasts. If their idea of a daily scrum is two hours, run, don’t walk, to your next interview because they have missed the point entirely.
If you are already in a cargo cult agile shop, challenge your process. Good agile process encourages a constant feedback loop regarding both your product and your process. If an agile process isn’t working, look into why and fix it. Go past the PowerPoint presentations and the diagrams. Reproduce the results that created the processes you are attempting to emulate. If the agile process is printed and bound, and they haven’t made a single change to it in five years, it is highly likely they have not challenged and enhanced their process to fit their people.
Some agile models are a toolkit and give guidance on how to effectively use them, while other methodologies require practices to be used together to achieve a result. Some practices stand on their own while others may be interdependent. Understand the benefits and consequences of cherry picking from agile, especially if you are doing a project waterfall style.
Agile as a Cult
I am a fan of agile development in general because the Agile movement formally questioned and rejected the dogma of how software development should be done and, in particular, challenged the continuously failing waterfall model. Agile is just a step along the way and isn’t without fault and dogma itself. We sorely need to keep moving towards improving our understanding of successful software development.
Post-agilism is starting to take hold with a tempered view, absent much of the hype and absolutism that has formed in some Agile communities. I look forward to the thoughts of these people who refuse to be constrained by dogma and continually challenge how we write software. It is this sort of skepticism that I think Feynman is looking for when he says we must not fool ourselves.
Related links:

Cargo Cult Science
The Cargo Cults
Scott Ambler
Agile Alliance
Post-Agilism: Process Skepticism
Don’t draw diagrams of wrong practices – or: Why people still believe in the Waterfall model
There’s no such thing as the Waterfall Approach! (and there never was)