Writing tests is always a bit finicky. That said, these frustrations peak whenever your suite of tests turns up indeterminate results. The classic example is when one of your normally passing tests randomly fails, even though the code hasn't changed one iota.
Indeterminism in tests can have various causes, but in my experience the nastiest and most common is unaccounted-for state leakage: leakage between tests and leakage into tests. Below I'll walk through the sources of leakage worth keeping an eye on when your tests start behaving erratically.
Database records
Obviously, data lingering in the database from previous tests can only serve to pollute the current one. Clearing the database between tests is a well-solved problem in most testing frameworks, so there's usually a library to take care of it all for you. Pay careful attention to the documentation, though: different types of tests (unit vs. integration) can run in differing numbers of processes (one or more). This causes divergent behaviour with regard to memory and state, and it can trip you up.
One general gotcha with these libraries is worth highlighting: whenever a test shuts down abnormally (e.g. because of a crash), the database-cleaning library may not get the chance to run, and the test will pollute the database with leftover records. A manual clean-up operation will then be needed.
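The principle can be sketched without any particular framework. Here `FakeDB` is a stand-in for a real database (in practice a library such as Ruby's database_cleaner gem plays this role), and the `ensure` clause mirrors an after-each hook that runs even when the test crashes:

```ruby
# Minimal sketch of per-test database cleanup. FakeDB is an illustrative
# in-memory stand-in for a real database.
class FakeDB
  def self.records
    @records ||= []
  end

  def self.truncate!
    records.clear
  end
end

# Run a test body and guarantee cleanup, even if the test raises.
def run_isolated_test
  yield
ensure
  FakeDB.truncate!   # runs whether the test passed, failed, or crashed
end
```

Note that `ensure` protects against in-process crashes only; if the whole test process is killed, nothing gets a chance to run, which is exactly the gotcha above.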
Configuration settings (both for your application and for its tests)
Every application needs configuration. If it's stored within the database, the database cleaner from above will take care of leakage issues. But often, application configuration is stored in memory as a singleton object that gets shared throughout your application and tests. Such an object is just another guise for global state and, not surprisingly, it therefore attracts exactly the same problems as any global variable would. For example, if one test's setup modifies an application configuration then, barring an explicit post-test reset of that configuration, the next test case in the battery inherits a modified application. This causes unexpected test breakages because the context within which any test runs depends on which tests ran beforehand: a slippery and ever-changing environment.
I once encountered a particularly nasty instance of this sort of leakage. Even though the test suite didn’t modify any configuration options, one of the tests executed a branch of the primary codebase that happened to modify a global configuration. Subsequent tests then inherited this modification and acted strangely.
I now avoid this sort of leakage by automatically resetting in-memory configuration state after every test case.
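A minimal sketch of that reset, assuming an illustrative in-memory `AppConfig` singleton (the names and the snapshot/restore approach are mine, not any particular framework's API):

```ruby
# Hypothetical in-memory configuration singleton.
module AppConfig
  def self.settings
    @settings ||= { emails_enabled: false }
  end

  def self.snapshot
    Marshal.load(Marshal.dump(settings))  # deep copy of the settings
  end

  def self.restore!(saved)
    @settings = saved
  end
end

# Wrap a test so any configuration change is rolled back afterwards,
# even when the test raises.
def with_pristine_config
  saved = AppConfig.snapshot
  yield
ensure
  AppConfig.restore!(saved)
end
```

Because the wrapper snapshots before and restores after every test, it also catches the sneaky case above, where application code (rather than the test itself) modifies the configuration.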
Global/class-level variables
In a vein similar to the above, any global variables or class-level variables can cause leakage if they get modified in tests themselves or within branches of application code executed by tests. And with a dynamic language like Ruby, which has features like class-level methods or macros, global variables—and thus potential leaks—are more pervasive than one might initially suspect.
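A small illustration of class-level state leaking across tests, with an explicit reset hook (the `Mailer` class is hypothetical):

```ruby
# Class-level state is global state in disguise: any test that touches
# it changes the environment for every test that runs afterwards.
class Mailer
  class << self
    attr_accessor :deliveries   # class-level variable
  end
  self.deliveries = []

  def self.deliver(message)
    deliveries << message
  end

  def self.reset!               # explicit reset hook for tests
    self.deliveries = []
  end
end
```

Without a call to `Mailer.reset!` between tests, a later test asserting on `Mailer.deliveries` would see deliveries made by earlier tests.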
File system state
Applications interact with the filesystem of their OS—for example by creating temporary zip files or caches or work-in-progress files during photo conversions. These files also count as system state: When a test is executed, either a file of a given name is present or it isn’t, and variation in this regard can cause sporadic failures should the code implicitly or explicitly expect otherwise. An analogous issue can occur with regard to other aspects of the operating system—for example with respect to code under test that controls, starts up, or kills other OS processes. The presence, absence, or state of these external processes is yet another context that needs to be controlled.
I avoid many of these problems by ensuring that all filesystem activity happens within the project/tmp/ folder, which I then automatically wipe after every test case.
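That policy can be sketched as a wrapper that hands each test a scratch directory and wipes it afterwards (the path constant is illustrative):

```ruby
require 'fileutils'
require 'tmpdir'

# Illustrative scratch location; in a real project this would be the
# project's tmp/ folder.
TMP_ROOT = File.join(Dir.tmpdir, 'my_app_test_tmp')

# Confine all file activity to one directory and wipe it after every
# test, even when the test raises.
def with_clean_tmp
  FileUtils.mkdir_p(TMP_ROOT)
  yield TMP_ROOT
ensure
  FileUtils.rm_rf(TMP_ROOT)
end
```

The code under test then needs to write only inside the directory it's given, which is a discipline worth enforcing anyway.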
Time of day or date
Time, as an eternally shifting variable, has the nasty habit of interfering with hardwired test expectations. If your code makes decisions based on current time (e.g. find records created during this timeframe, send follow-up emails after X weeks, etc.), then hardwired time values can lead to sporadic test failures. After all, time will have one value now and another a second later. Thus it’s more stable to employ a library that freezes time to a specific moment within your tests, thereby controlling for its change.
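In Ruby, the Timecop gem is the usual choice (its `Timecop.freeze` pins `Time.now` for the duration of a block). The underlying idea can be sketched without the gem by injecting a clock the code under test asks instead of `Time` directly:

```ruby
# Minimal sketch of a controllable clock. Libraries like Timecop patch
# Time.now globally; here the code under test calls Clock.now instead.
module Clock
  class << self
    def now
      @frozen_at || Time.now
    end

    # Pin the clock to a specific moment for the duration of the block.
    def while_frozen(moment)
      @frozen_at = moment
      yield
    ensure
      @frozen_at = nil   # unfreeze even if the test raises
    end
  end
end
```

With the clock pinned, assertions like "the follow-up email is due in exactly two weeks" become deterministic.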
External software that keeps its own state
Let me give you two examples common in web apps: a Memcached instance running in another process, or a full-text search index consumed as a SaaS. Whatever state-resetting features your test framework provides (e.g. cleaning up records in SQL databases), it can hardly cater to the infinite range of external processes your software could conceivably interact with. As such, take care to vet all external processes and services you integrate with, and if you discover that these services can have state that might leak, ensure you reset this state between tests. For example, I clear my search index after every test.
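The shape of that reset is the same as for the database, just aimed at the external service. Here `SearchIndex` is a stand-in for a client to an external search service, and `delete_all_documents!` is a hypothetical API; substitute whatever reset call your actual service offers:

```ruby
# Illustrative stand-in for an external full-text search service client.
class SearchIndex
  attr_reader :documents

  def initialize
    @documents = []
  end

  def index(doc)
    @documents << doc
  end

  def delete_all_documents!   # hypothetical reset API
    @documents.clear
  end
end

SEARCH = SearchIndex.new

# After-each equivalent: reset the external index even when a test fails.
def run_test_with_search_reset
  yield
ensure
  SEARCH.delete_all_documents!
end
```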
Caching
Watch out not just for caching within your application, but also for caching that occurs outside it, for example at the SQL, web-browser, or OS level. These caches play by their own, sometimes abstruse rules. All it takes is one overlooked or unfortunate configuration setting, and all your efforts at test isolation may be thwarted.