Automation Strategy Manifesto

Automation is key to a successful DevOps strategy

In order to accelerate release velocity from quarterly to bi-weekly we need to develop and release software in smaller cycles.

This allows us to make decisions in an agile manner and get faster feedback from users.

But the release cycle has inherent friction.

To reduce this friction and release more frequently, we will need to automate our software build and deployment processes.

By using a DevOps strategy to create create consistent automated delivery processes.

Continuous integration & delivery tools can help with this.

In order to have confidence in our release cycle, we need to be able to test quickly.

To avoid testing becoming a bottleneck, we need to have automated, reliable tests.

Traditional end-to-end functional test automation can be slow, brittle, and expensive to maintain.

So our aim is to provide thorough test coverage with unit tests.

Unit tests are isolated, target a single object, method, or procedure and are faster and more consistent than end-to-end tests.

Unit tests provide a way to mitigate technical debt by making refactoring easier, and giving you confidence to change existing code without introducing regressions.

We will use code coverage tools it identify where we can improve tests, but not as a sole metric of quality.

Since too many unit test add technical debt tool.

Because not everything can be unit tested, we will use integration and system testing to cover areas of the application not easily tested at the unit level.

Also, because integration points (especially with external system) are the most likely place for unexpected errors, we will focus on testing these areas.

Manual acceptance and exploratory testing is also necessary, and accomplishes some things better.

Because of thorough unit, integration, and system test automation, people can focus on those aspects that are more challenging to automate.

100% automation is not the goal, because sometimes that takes more time and effort.

Test automation also incurs technical debt and can make code rigid and resistant to change if you also have to change tests.

In order to have confidence in tests, we will strive to keep results “green” — no failing tests are allowed to live.

To make sure automation is useful and timely, we will create a build-test-deployment pipeline including:

  • source control management
  • mainline development & branching strategy
  • code review process on pull requests
  • static analysis tools for code quality & security
  • automated builds
  • unit testing code while mocking external dependencies
  • code coverage metrics
  • controlled and versioned code artifacts
  • ephemeral & interchangeable environments created on demand
  • automated infrastructure provisioning
  • automated deployment & configuration
  • secure secrets sharing
  • blue green & canary releases
  • post-deployment smoke testing and monitoring & metrics
  • automatic rollback & notification

Two types of tests

There are lots of diffefrent types of tests:

unit tests
functional tests
integration tests
system integration tests
ui tests
api tests
performance tests
load tests
acceptance tests
accessibility tests
security tests
etc…

But I like to group them into two primary categories:

Tests you can perform on the code itself, and tests that reqired a complete system to be deployed.

There are obviously exceptions, and it’s more of a spectrum, of what needs to be compiled, linked, combined, integrated, deployed, and what backend or front end pieces, as well as operating system or network connectivity or other external dependencies.

But most software nowadays requires some sort of deployment — to a server or device, and probably some sort of connection — to a file system, a database, or network service.

And much of it needs some sort of compilation, preprocessing, or integration with external dependencies. Even mcode written with interpreted languages like python or javascript will be written with dependence on common libraries.

While it is possible to test a single unit of code in isolation, it’s often not practical to exclude all external libraries (even if just to print or log output) and you usually need a runtime, libraries, or operating system to execute them on.

So I tend to not be too purist about unit tests vs integration tests — except to make the distinctions that my code — or the code that needs tested — should not be combined with other code that that needs tested (as part of your system) as opposed to code that you assume to be working as designed, although the quality of some external libraries is questionable, and you’re likely to encounter bugs or deficiencies in such libraries (open source or closed.)

The distinction I like to make is this:

Can the code (compiled or interpretted, linked or not) be tested before being deployed or not?

This is an important distinction, because it determines whether the test can be executed before or after said deployment — and that matters for where the test is executed in your delivery pipeline.

Another way this distinction is generally useful is — does the test matter to the developer or the end user? By which I mean not, who cares whether the test passes or fails, but who cares about the result of the test.

A developer test is a test that the developer who wrote the code is primarily interested in. The output of a developer test (whether unit or integration test) is useful to the developer of the code — the implementor — and not really to anyone else.

The assumption here is: Did the code do what I intended it to do?

This is in contrast to functional tests (to use another term which is perhaps too vague to be really useful) where the output is useful to the user (or designer) of the system under test.

That assumption is: Does the software actually do what I wanted it to do?

If the author of the code is both the designer and end user, these questions amount to the same thing, but much software is designed to be used by someone other than the developer, or rather the developer creates code designed by someone else, intended for others to use.

So the functionality of the system is actually being tested, instead of the correctness of the code itself.

To reiterate, my two general categories of testing are:

Developer tests (unit or integration) which are performed on code before it is deployed,
and

System tests (functional or end-to-end) which are performed on a system after it is deployed, which general have external dependencies on the file system, network, or other services (database, etc.)

That’s not to say that there aren’t more categories of tests, or that you can’t do functional testing with mocked external systems — or developer tests which require external dependencies. But that these two grouping are generally useful to think about for 3 reasons:

  1. Who cares about the test?
  2. Does the test have external dependencies that require additional setup?
  3. Can the test be performed before or after a deployment?

These two type of tests usually have other attributes:

The first group are small, fast, and simple
The second group are larger, slower, and more complex

This makes another advantage of developer tests that they can be executed frequently, and relied upon for repeatability, and are easy to debug (generally targeting a single object or function).

And to be fair, functional (or system) tests also have an advantage that they perform a real world scenario that development tests may not be able to anticipate, and they are concerned primarily with outcomes, not implementation.

Working with large datasets in Python

How can you work with large datasets in Python — millions, billions, or more records?

Here is a great answer to that question:

https://www.quora.com/Can-Python-Pandas-handle-10-million-rows-What-are-some-useful-techniques-to-work-with-the-large-data-frames

  1. Pandas is memory hungry, you may need 8-16GB of memory or more to load your dataset into memory to work efficiently.

You can use large / extra large AWS cloud systems for temporary access. Being able to spin up cloud platforms on demand is only one part of the equation. You also need to get your data in and out of the cloud platform. So persistent storage and on-demand compute is a likely strategy.

  1. Work in stages, and save each stage.

You will also want to be able to save your intermediate states. If you process CSV or JSON — or perform filter, map, reduce, etc functions you’ll want those to be atomic processes.

And you’ll want to persist work as you go. If you process 100 million rows of data and something happens on row 99 million, you don’t want to have to re-do the whole process to get a clean data transformation. Especially if it takes several minutes or hours.

Better to save each stage iteratively and incur the IO cost in your ETL or processing loop than to lose your work — or have your data corrupted.

  1. Take adavantage of multiprocessing

Break work into batches that can be performed in parallel. If you have multiple CPUs, take advantage of them.

Python doesn’t do this by default, and Pandas normally works on a single thread. Dask or Vaex can work in parallel where Pandas itself cannot.
You might also consider using a Streaming processor such as Apache Spark instead of doing all your processing in a single DataFrame.

  1. Use efficient functions and data objects

Earlier I talked about saving incrementally within your processing loop. But don’t go overboard.

You do not want to open and close files every iteration in your inner loop millions of times. Make sure you fetch the data you need only once. Don’t reinitialize objects. Even someting as simple as calling len(data) can really add up. Finding out the length of an expanding list with millions of rows millions of times really adds up.

Also, consider when you want to use a list vs a numpy array, etc.

Test Automation isn’t for everyone

I once knew a guy who was a talented craftsman, he could make beautiful hand crafted furniture. His business became popular and demand increased. He got investment and built a large shop, hired a couple assistants and bought addition machinery so he could help meet the demand.

The business was a big success and grew even more. He was almost able to pay back his loan in just a couple years, but then sold his business (which is still successful today under new owners) and went back to working alone on individual orders. He still does quite well and with the proceeds from the sale has a comfortable life, but nothing like it could have been if he’d kept the company.

It turned out he wasn’t as interested in running a manufacturing business as working with his hands, alone, in his little shop in the woods.

Testers (and their bosses) should consider this before thinking about switching from manual QA to automation. It takes different skills and a different mindset, which can be learned, but may not be what you enjoy.

I happened to enjoy the change myself, but I can sympathize with those who don’t.

Challenges inheriting an existing test framework

This post started as a comment on to the following discussion on LinkedIn.

Inheriting a framework can be challenging.

First of all, because a real world framework is more complex than something just created. There is bound to be exception cases and technical debt included.

Secondly, an existing test framework was built by people on deadlines with limited knowledge. There are bound to be mistakes, hacks, and workarounds. Not to mention input from multiple people with different skillsets and opinions.

In this case, the best way to understand your framework is to understand your tests. Your tests exercise the framework. Pick one of low-moderate complexity and work your way through it, understanding the configuration, data, and architecture.

Don’t be afraid to break things. After all, you have all these tests that will let you know if you broke the framework.

Lastly, having a checklist of things to look for, good and bad practices will help you understand the framework better — and help you know what quality of framework you’re dealing with. Is this mystery function really clever or just a bad idea?

Look for standard patterns. OOP, Page Objects, etc. Also look for common problems – hard coded values, repetitious code, etc.

Test Automation Can’t Catch Everything

I remember a time, years ago, when I was working at a company at which I learned a lot about my craft.

Selenium was fairly new and I was one of the early adopters. I’d developed a pattern for structuring tests that I shared with the community and found that several others had independently developed ideas similar to my own “Page Objects.”

Agile was just beginning to penetrate mainstream development, and was at the same time attracting some healthy skipticism. Pair programming and test driven development were considered “extreme” and other patterns like Scrum were thought of as either common sense or a cargo cult.

Continuous Integration was still a novel idea, although my first job at Microsoft was essentially as part of a manual continuous integration process, no tools yet existed at the time to accomplish it. Now there were open source project like Cruise Control and Hudson (which became Jenkins) coming out.

And while I’d been involved in and an advocate of each of these — Selenium, Agile, and Continuous Integration, I’d yet to see them widely adopted and successfully implemented within an organization.

But about the time I came back from Fiji and stepped back into software develpment, all these things were starting to coalesce. At least they were for me in the mid 2000s.

We had a cross functional team of (mainly) developers, testers, sysadmins, designers, analysts, all working together We had business & customers giving feedback and acceptance criteria. We wrote user stories and assigned points. We wrote tests first and testers paired with developers. We sat with customers and took usability notes. We talked design patterns and had self organizing teams. We had fun and I learned a lot. I still keep in contact with some of my co-workers from more than a dozen years ago.

It was about a year and a half into it that we started to hit the wall. Tests were taking too long to execute. Complexity was slowing us down. We kept plugging along and using technology to fight the friction.

Parallel builds, restructuring tests, virtualized development environments. We fought technical debt with technical solutions and beat it back down.

I thought we were winning. But I had a manager, an old school QA guy, who knew that something was off. I think the rest of us were too busy churning out code, delivering features, spinning our wheels to see it.

But he saw it. He tried to alert upper management, he tried to get through to developers. We had lots of automated tests. We were deliver features every two weeks. Occasionally we’d stop and do a spike to experiment, or take some time off to reduce tech debt, refactor code, or rewrite tests. But we still had decent momentum.

Finally he got a together a group of stakeholders. Give me 15 minutes, he said, and I’ll break it. It took him maybe five. And it wasn’t that hard. And it was something users could be expected to do.

The moral of the story is: You can’t catch everything with automation. While automation is good, it frees up time to do other things, it allows feedback to occur faster, it helps testers to concentrate on finding issues instead of verifying functionality. It helps develpers to think about how they’re going to write their code, to reduce complexity, and to define clean interfaces.

Test Automation is like guard rails. It can help you from falling off the edge, or it can give you something to hold onto as you traverse a tricky slope.

It can catch obvious problems, much like the railing that helps you to not fall off the ledge (you’re not planning to do that anyway.) Anyone can walk along a narrow path and not fall off a cliff, test automation makes it so that you can run. But it won’t stop you from going off if you go past it.

In order for tests to be effective you need to have clear requirements, and then you need to explore beyond them. Test automation isn’t good at exploring. It’s good at adding guard rails to a known path.

And even if you catch all the functional problems, you need to be able to check performance, ensure security, and test usability.

If you have testers churning out tests, both automated exploratory tests, but no one else is involved, or hears their feedback, it’s not going to matter much. It’s like having guard rails and warning signs that everyone just hops over and walks right past. It’s no good finding bugs if they don’t get fixed. And eventually, if you have a bug backlog that’s too big, people will just ignore it.

You need test automation, but you also need exploratory, human testing. And you need everyone — not just testers to be concerned with quality. Quality code, environments, user experience, and requirements.

So use test automation for what it does best — providing quick feedback for repetitive, known functionality. Don’t try to get it to replace a complete quality assurance process.

There are two types of tests

Here’s a great question from a fellow test automation expert:

What are your thoughts on doing test automation by developers vs testers?

I think there’s a place for both. (That’s the politic answer.)

But in general, I categorize tests into two main types: developer focused tests, and user focused tests.

The goal of developer focused testing is to help the developer keep their code organized, to verify what they write does what they intended, and to allow them to mitigate technical debt.

Unit tests are the obvious example here. It shows that function x returns y given condition z. Which allows them to have confidence that they did what was expected, they accounted for exceptions and edge cases, and helps with establishing clean interfaces and aids in refactoring.

But there is also a lot more developer logic in front ends these days, so they may write some UI tests — particularly if they are front end developers and they are primarily concerned with the UI. Which usually requires data and user events. It’s great if they can isolate and mock this to test UI components individually, but they also need to be tested for correct integration.

So developers writing UI test automation does make sense.

But…

The other type of testing, user testing / acceptance testing / functional testing / end-to-end testing / regression testing / exploratory testing / quality assurance — whatever you want to call it (and these are overlapping buckets, not synonyms) — is based on the principal that you can’t know what you don’t know.

A developer can’t write a test that checks that a requirement has been met beyond what he understands the requirement to be. You can’t see your blind spots.

And a developer’s task is primarily creative — I want to get the computer to do this. They aren’t thinking in the mindset of what else could go wrong, or what about this other scenario.

It’s like having an editor look at your writing (I could use an editor). You’re too close to it to look at it objectively.

That’s not to say you can’t go back to it with fresh eyes later, take off your “developer” hat and put on your “tester” hat. But it’s likely that someone else will see different things. And looking at it from an object (or at least different) perspective is likely to identify different issues.

Some people are naturally (or trained to be) better at finding those exceptions and edge cases. My wife calls that type of person “critical” or “pessimistic” — I prefer the term tester.

Regardless, the second type of test — the big picture test, that looks at things from a user’s perspective, not a developer’s perspective, is — I think — critical.

And the industry has assumed that for decades. How that is accomplished, and how valuable that is has always been up for debate, but I think the general principals are:

  1. That there are two types of tests: those designed to help the developer do his work, and those that are designed to check what he has done.
  2. That there are two different roles: the creative process of making something work (development) and the exploratory process of making sure something doesn’t go wrong (testing).

Anyway, that’s my (overly) long take, summed up. I could go on about this for hours.

On Customer Development

I was recently asked about Customer Development as a process. I looked it up to see what the formal definition is, and concluded that I don’t know too much about the official “Customer Development” process, but I understand and practice the general principles of Customer Development in my own business.

Here my reply:

Do you know how to do customer development?

Not in a formal way. But I have two strategies I use for finding & acquiring customers.

  1. In areas I have expertise (such as software testing & delivery) I’ve built a strong network and reputation by publishing articles & tutorials and offering training videos and meetups.). This works great.
  2. In areas where I am not an expert where I have built products on a larger scale, I have relied on intuition and research to discover needs. I then build a customer base as I develop the product through social media and direct contact “growth hacking” with a freemium model.

With Resumelink, for example, I contacted individuals I thought would benefit, and offered the service before it was build — and manually did the steps before having a product, and got feedback on features. I initially built it for myself, but with some keyword search I realized the market had a hole.

So I think it takes an initial spark of inspiration to identify a need, with research to see if there is a market niche that can be filled and determine it’s profitability, followed up with a proof of concept that is more manual process that a product, and then targeting your perfect customer to identify ways to refine and improve the product — developing it by small feature increments as the needs become apparent (and keeping other ideas on the backlog), and then growth hacking through social media, content delivery, and community outreach.

Checking state in Selenium Test Automation

I wrote a medium length novel in response to Nikolay Advolodkin’s post about a common Selenium automation pattern. He advocates:

STOP CHECKING IF PAGE IS LOADED IN AUTOMATED UI TEST

You can read his article on his site, Ulimate QA: https://ultimateqa.com/stop-checking-if-page-is-loaded-in-automated-ui-test/

Or follow the discussion on his linkedin post: https://www.linkedin.com/posts/nikolayadvolodkin_testautomation-java-selenium-activity-6674278585743761408-Ud_p

Nikolay is a good friend, so please don’t take this as an attack. We have had many long discussions like this.

Here is my response in full:

I go the other direction on this. I use element checks to verify that the page is loaded.

My version of the Page Object pattern includes an isLoaded() method — which can be overloaded with custom checks as needed. This is to try to keep things synchronized, even though it means extra steps. In this case, I value stability over performance.  

I can understand someone making another decision however, especially when speed is important and latency between steps with a remote driver makes this more costly.

In practical terms, you could just check if the element you want to interact with is available and fail faster if it is not. The result of both success and failure would be the same, and you’d get there slightly faster — perhaps significantly faster if you have a long sequence of many page loads. But having such long test flows is a pattern I try to avoid, unless I’m explicitly testing the long flow through the UI.

Adding the sanity check helps me when it is time to debug the test or analyze the test failure. Knowing that my page is not loaded — or that I’m not on the page I expected helps me to understand the true cause of failure, the page is not loaded, rather than an element is not clicked.

However, I would not call isLoaded() repeatedly, only once, automatically when a page object is initialized, or explicitly if I have a logical reason to think that it is not — some state change.

Selenium tests (and UI tests in general) are brittle, and determining state before attempting to perform an action is one of the biggest challenges.

The challenge here is that an HTTP 200 status code doesn’t really mean a page is loaded anymore, with dynamic page creation, javascript frameworks, single page apps, and prefetch, this is hard to tell. Pages can load in chunks, dynamic elements can be added, and sometimes the concept of a “page” doesn’t even make sense.

Checking status codes or XHR ready state are meaningless (or at least misleading) in many modern web applications. But you see people trying to do this figure out the state of their app, so they can reliably automate it. This usually doesn’t work. So checking the state of the element you need to interact with makes more sense as well as saving time.

The WebDriver team dropped the ball on this — or at least punted. Selenium used to check that a page was loaded (using the methods above) but decided the decision was too complex and left it up to the user. I think this was an abrogation of responsibility — but don’t tell Jim or Simon that. It’s a less discussed detail of their least favorite topic.  

Validating state is hard, and most of the time, leaving it up to the user results in bugs. It’s even harder with mobile apps and the Appium team has had to make many difficult decisions about this, and sometimes a framework gets it wrong, or makes things unnecessarily slow.

So like most things there is a trade off between speed and reliability, and we all need to make our own decisions.

When you adopt the “page object” pattern for use on components that are only on part of a page, or may appear on multiple pages, having the explicit check that is user defined starts making even more sense — because widget.isLoaded() is a state check that can happen more than once, and not just a sanity check

But when you have a (reasonably) static page, an initial check that the page is loaded — rather than checking each element that you can safely assume *should* be there if the page is loaded can actually be more performant, as well as providing a clearer stack trace when things aren’t as expected.

Repeatedly checking if a page is loaded before performing any action is a bad idea in any case. 

public class MyPage {
 public MyPage open() {
  driver.get(this.url);
  if (isLoaded()) { return this; }
 }
public bool isLoaded() {
  get {
   try { driver.findElement(myElement): }
   catch (NoSuchElementException e) { 
    throw new MyException("page isn't loaded", e); }
  }
 }
}

The value of test automation

The real value of test automation doesn’t come from finding bugs. At least not primarily. Automation is not good at that, except in one case which we’ll discuss later.

On a consultation with a client the other day I was talking about this.

Our conversation was interrupted when my daughter came running in, crying, and told me that my 5 year old son had been trampled by horses. Earlier that day, our neighbors horses had gotten into our yard and I’d held him up to pet a horse’s mane and show him that it wasn’t scary.

Later, while the two of them were playing in our yard, picking new spring leaves from an Aspen tree, something spooked the horses and they bolted. We were lucky (and blessed) that he only received scrapes and bruises and no serious injuries and he was fine after a couple days. My boy chafed at the restrictions placed on him, but my wife is still shaken.

It got me thinking about a couple things. First, that horses are big and powerful, and while not malicious, they can be dangerous. Second that fences make good neighbors. Perhaps not coincidentally, I’ve been meaning to put up a stronger fence (mainly to keep our goats and dog in.)

Although I welcomed the horses’ visit, I wasn’t prepared for consequences. And now I have to listen to my wife’s calls for caution more.

To trivialize the incident — and draw a tenuous parallel — tests are like fences. And like other fences, they need gates. Even with a strong fence, I would have let the horses in the yard. And even with good tests that catch bugs, you can still let bugs into production. You may do this intentionally, or not.

If you don’t trust your test results, if there are always “random” errors, if tests take too long to run, or are out of date, or don’t actually check what they purport to be checking, bugs can slip through, and the consequences can be severe.

But despite what you test, there are sometimes what you deem “acceptable risks” and there is always the unknown. You can’t test everything, especially what you don’t know. So you need to be able accept that testing isn’t the solution for preventing bugs.

So what is the value of testing? Is it just snake oil? Or is it good up to a point, and not beyond that.

My belief is that test automation does have value, but that finding bugs is not the primary value. My first posit is that test automation is good at preventing some bugs — often the easy obvious issues. Like a fence.

The act of writing a test, like planning, forces you to think about a problem — to think about problems in general in trying to anticipate problems, and that this itself is good at preventing — by avoiding bugs before they are even created.

Next, tests document features, and describe what they are intended to do, at least as understood by the author. But, like writing, writing tests forces you to articulate your understanding, and then allows you to communicate that understanding to others, who can then share in your understanding, correct you if it is mistaken, or expand upon it.

When you write a test, it should be written primarily to communicate — to communicate an assumption about the software that is being tested. A test forces you to describe *specifically* what that assumption is. If the assumption is wrong, it can be corrected, if it is correct, and the code is wrong, it can be fixed, and the tested again against that assumption.

By it’s existence, a test (especially an automated test) enables youo to repeat that process. And to objectively check if the software does what is expected. Or if it changes. That helps to prevent regressions.

A test verifies that requirements are met, or at least that features are exercised.

Knowing what features are being tested and which requirements (assumptions) are being made is the second powerful value of testing, and the first explicit benefit, after the implicit value (which should not be dismissed) of deliberately thinking about the problem.

Because tests are repeatable (even manual tests), it helps to prevent regressions. And that will speed up development velocity. Developers with good tests in place will need less time to spend reasoning about the impact of their changes on the system, and not have to worry about if a change here will affect a (seemingly) unrelated outcome over there.

Software systems very quickly become too large and complex for people to reason about all at once. Testing helps to break that down. And helps to make sure that you don’t forget a check.

Repetition of test execution, like anything else, means you should get better at it. Your tests will improve over time if they are exercised (and fixed) enough to become more robust, more precise, clearer, faster, etc. So the third key benefit of testing comes from repeatedly executing your tests.

The up front cost of creating a test is non-trivial. The benefit of executing it over time increases.

Getting back to my fence analogy, like building a fence, a lot of the initial work is in surveying the area, digging and setting the posts. This is most of the work in setting up a fence, and at the end of it, you just have a few posts in the ground, and it doesn’t prevent anything from getting through.

This is the situation I was in. In the real world, you have to spend time building infrastructure, defining architecture, and laying down the foundation upon which you will build your fence. This is also true in software, but a lot of the work can be done for you with a good test framework. But you don’t have to have a full framework in place to start writing tests that provide some value.

If I had the money, I could’ve rented a machine that pounds the posts in the ground instead of digging holes with a shovel and them filling them in by hand. A few seconds with a powerful hydraulic tool, and the post is set firmly in the ground.

A good foundational framework can also help with the setup of tests too. But it takes investment in knowing how to use it. But the best framework in the world won’t help you if you’re setting up your fence in the wrong place. I mean, if your tests aren’t covering the features you need to test most.

Test automation allows testers to focus on creative work, like actively finding bugs, instead of checking for regressions around the perimeter (or in the middle) of your system.

Writing a test that can execute again without you having to think about it or do anything, is a huge time saver. This is the third value of test automation. Once it’s written, it can provide residual value over and over again. As long as you keep executing it, and keep it maintained.

But test automation is notoriously hard to maintain. UI changes, for example, can break automation that makes assumptions about the user interface. Even if they were correct at one time.

So your goal when writing automation should be to make as few assumptions as necessary. Test only one thing. Don’t depend on the UI to validate something unless you have to. That way you spend less time fixing tests — or making sure that they are testing the right thing, and more time creating new tests, testing new features, and making sure that changes don’t negatively impact the desired (and expected) behavior of the system.

Good tests written this way will help when changes need to be made later. They can act as documentation for others when they need to understand the system to fix it or add to it. This could be a new developer coming onto the project, or it could be the same person, coming back to code they wrote — and knew was working correctly — days, months, or years later.

This is the fourth value of test automation, and possibly the biggest. Not only does it verify that the system continues to work as expected over time, but it allows you to be confident that when you make one change — it doesn’t change other parts of the system without your knowledge.

Test automation can help you find bugs that are introduced when changes are made to a working system that is well tested. Initially your tests helped you prevent bugs by thinking clearly and precisely about the system, anticipating bugs, and coding around the problem. Because you wrote tests, you documented the behavior and set up checks that you can perform repeatably, repeatedly, to make sure that changes do not contradict those assumptions you made when you initially wrote the system — or they alert you to the fact that those assumptions are wrong, or the circumstances around them have changed, so a previous assumption that may have been valid in the past is no longer valid.

Tests provide their greatest value over time. There is an up front cost, which is hopefully defrayed by the initial planning and checking that prevents bugs from being added in the first place.

Unless maintaining or adapting tests costs more than the benefits they give.

Which is why, although I am generally a test first advocate, when you are doing new development on a blue sky project, testing may actually slow you down, and end up not providing enough value to justify it’s expense.

A lot of the assumptions you make early on may not be valid. And by codifying those assumptions in tests, you may be making the system too rigid and resistant to change.

Often, the first version of software is written with the assumption it will be thrown away and rewritten from scratch once you have clearer understanding of the problem and how you are going to address it. In that case, tests will be a waste of time, and any residual value they provide will never be realized.

But just as often, the quick and dirty, one-off or proof of concept project becomes production code. And it often ends up being too brittle, not scalable, or too tightly coupled to expand. And then, you may be in a situation where you need to rewrite it, but you can’t, because real world business processes depend on it or external forces demand that it stay running.

In this case, tests can again provide a value. With reliable tests you can then dissect are migrate the system with confidence, even if the authors of the original software system (and their domain knowledge) are gone.

That’s where the fifth (and perhaps final) value of testing comes in. If you have tests in place on a working system, you can then make changes to subsystems, even rewriting or removing them, if your tests can verify that your replacements don’t negatively impact other parts of the system, or the system as a hole.

In this case the tests act as a fence (or railing) for safety. To prevent you from going over the precipice or out of bounds of the system. The tests can act as a crutch or scaffold that helps protect the system from falling to pieces while you work up fixing or updating it.

But in order to do so, you need to have good tests. And by “good tests” I mean tests that are clear what they are testing. They test only one thing, so you know what is broken if the test fails.

The tests need to be easy to modify, to adapt to a changing system, and to be able to be changed, eliminated, or replaced, when the assumptions about the system change, or the way they can be tested changes.

Good tests need to be flexible. They need to not be brittle, and they need to be able to work under different conditions. Not tied to the UI, environment, or specific data (except when needed to test that specific area of the UI, depend on some aspect of that environment), or when that specific data illustrates the conditions of that test.

Above all, good tests need to be reliable. They can’t break over every minor change to the system, or for random, indeterminate reasons. Flaky tests might be worse than no tests, because they reduce your confidence in testing. So they should be robust and adaptable to changes in the system.

A good test framework should help you focus on simplicity, help you to write reliable, robust, specific tests, and help you to keep them organized, and make reporting clear, meaningful, and concise.

You should be able to run your tests alt least every day, ideally for every change. And someone should care about the results. In order for people to care about tests, they need to pass reliably, no false positives or intermittent failures.

In summary, testing provide value 5 different time:

  1. The act of writing tests forces you to think about the system and plan for possible issues.
  2. Writing tests while developing software prevents bugs from appearing because you anticipate or catch them while writing tests. It also documents your assumptions about how the system should work.
  3. Executing tests exercises the system and informs you that the requirements are being met. Repeatedly executing tests and keeping them up to date will make tests more robust and adaptable.
  4. Testing provides value over time as the system grows, it prevents regressions and allows you to reason about the system and not be slowed down.
  5. Tests act as checks that allow you to refactor the system and make changes without breaking functionality. Even when part or the whole of the system has become a black box.