22 March 2014

Refactor, don't reinvent the wheel

Recently I attended a training on clean code and refactoring. One of the examples of bad code shown there was something like this:
public void setName(String name) {
    this.name = name;
    if (this.name != null ) {
        if (this.name.length() > 30) {
            this.name = name.substring(0,30);
            this.name = this.name.toUpperCase();
        } else {
            this.name = this.name.toUpperCase();
        }
    }
}
And after a few slides the code was refactored to the final version:
public void setName(String name) {
    if (!isValid(name)) {
        this.name = null;
    }

    this.name = limit(name, to(30)).toUpperCase();
}

private boolean isValid(String name) {
    return name != null;
}

private String limit(String input, int limit) {
    return input.substring(0, limit);
}

private int to(int x) {
    return x;
}
And I will argue that this is either a very wrong approach or a really bad example. Why? Because every language has its idioms and its commonly used tools that have become de facto standards and are well known and understood. In the Java world these are Guava, Apache Commons, lambdaj etc. Using those libraries, you can be much more functional, null-safe and concise. You can use well-known existing functions instead of creating new ones and learning them again in each project. In my opinion, this would be much more readable:
public void setName(@Nullable String name) {
    this.name = StringUtils.upperCase( StringUtils.left(name, 30));
}
or in case we'll need it more than once:
public static String upperCasePrefix(@Nullable String input, int limit) {
    return StringUtils.upperCase( StringUtils.left(input, limit));
}

public void setName(@Nullable String name) {
    this.name = upperCasePrefix(name, 30);
}
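
If you don't know these two Commons Lang calls by heart, here is roughly what they give you, spelled out in plain Java. This is only a sketch of the semantics (null in, null out; shorter strings pass through untouched), not a replacement for the library. The class name PrefixSketch is made up for illustration, and using Locale.ROOT for upper-casing is my assumption to keep the result independent of the machine's locale.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of Apache Commons or any other library.
final class PrefixSketch {

    // Returns at most `limit` leading characters of `input`, upper-cased,
    // or null when `input` is null.
    static String upperCasePrefix(String input, int limit) {
        if (input == null) {
            return null; // null in, null out: no NullPointerException
        }
        if (limit < 0) {
            return ""; // a negative limit yields an empty string
        }
        // Strings that already fit are returned whole, so substring never overshoots.
        String prefix = input.length() <= limit ? input : input.substring(0, limit);
        // Locale.ROOT is an assumption of this sketch (locale-independent result).
        return prefix.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upperCasePrefix("john", 30));
        System.out.println(upperCasePrefix("a very long customer name that exceeds the limit", 30));
        System.out.println(upperCasePrefix(null, 30));
    }
}
```

Note how much ceremony this takes compared to the one-liner above, which is exactly why I would reach for the library.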

21 March 2014

Use unicode for better names

Let's say that in a Java application we have a few tabs and sometimes we hide some of them. Now we want to document a new requirement, and of course we do it as a test:
@Test
public void should_hide_more_tab_when_no_additional_information_is_available() {
   ...
}
But wait... what exactly does it mean? Should our application hide more tabs than it usually does? Or is there a tab named 'more' that should be hidden? How can we clarify this? After a quick look at the Unicode character table, we pick the ʻ character (or any other that makes you happy). It's U+02BB, 'modifier letter turned comma', and more information can be found for example here. There is a table with detailed information about that character and the interesting part is:
Character.isJavaIdentifierPart()  Yes
Cool! So let's write:
@Test
public void should_hide_ʻmoreʻ_tab_when_no_additional_information_is_available() {
   ...
}
Is this test more readable now?
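
If you'd rather check a candidate character yourself than trust a table, the JDK can answer directly. Below is a minimal sketch; the class IdentifierCheck and its method are hypothetical names of mine, and the check covers only the character rules, not reserved keywords.

```java
// Hypothetical helper for checking identifier candidates; not a JDK class.
final class IdentifierCheck {

    // True when every code point of `name` is allowed at its position
    // in a Java identifier. Reserved keywords are NOT checked here.
    static boolean isLegalIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int codePoint = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        // The turned comma is written as a Unicode escape so the file encoding does not matter.
        System.out.println(isLegalIdentifier("should_hide_\u02BBmore\u02BB_tab"));
        // A plain apostrophe is not allowed inside an identifier.
        System.out.println(isLegalIdentifier("should_hide_'more'_tab"));
    }
}
```

One practical caveat: once such a character appears literally in a source file, the compiler has to read that file with the right encoding, so make sure your build is configured for it.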

ps.
For example, in Racket (a dialect of Lisp) you can define lambdas using λ:
(λ(x) (+ x 1))

20 March 2014

The myth of random data in unit tests

I often see people generate random data for every irrelevant variable in tests:
String anyName = RandomStringUtils.random(8);

Customer customer = customerBuilder()
                               .withName(anyName)
                               .withAge(18)
                               .build();

assertThat(customer).isAdult();
First of all, it would probably be better if this test looked something like this:
Customer customer = newCustomerWithAge(18);

assertThat(customer).isAdult();
I know, I know: sometimes tests are a bit more complex and badly written and you just need the name as a constant. So why not simply:
private static final String ANY_NAME = "John";
...
customer = customerBuilder()
                      .withName(ANY_NAME)
                      .withAge(18)
                      .build();

assertThat(customer).isAdult();
Does the random generator make you feel safer? If the name is irrelevant, why bother generating it? It just makes your code less readable.

But some people go even further. Let's say we want to test StringUtils.contains from Apache Commons. Some people want to generate even the significant parameters:
String random1 = randomString();
String random2 = randomString();
...
assertTrue(StringUtils.contains(random1 + random2 + random3, random2));
Easy, right? But how will we test if it returns false correctly? Now our random data needs to obey some specific constraints. So it's rather hard to generate the data without, in fact, implementing the functionality again in tests. Another problem is that when you have such tests you think everything is tested and you stop thinking about corner cases.

But is everything really tested? What about nulls? What about empty strings? What about combinations of them? And even if your generator can produce nulls and empty strings, still: is everything tested?
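
To make the point concrete, here is what naming those corner cases explicitly could look like. This is only a sketch: the class ContainsEdgeCases and its helpers are hypothetical, and the local contains method is a stand-in I assume mirrors the documented results of StringUtils.contains, so that the example runs without Apache Commons on the classpath.

```java
// Hypothetical sketch; in a real project these would be ordinary unit tests
// calling StringUtils.contains directly.
final class ContainsEdgeCases {

    // Null-safe substring check, assumed to mirror StringUtils.contains.
    static boolean contains(String text, String search) {
        return text != null && search != null && text.contains(search);
    }

    // Fails with a message naming the inputs, so a red test explains itself.
    static void check(String text, String search, boolean expected) {
        boolean actual = contains(text, search);
        if (actual != expected) {
            throw new AssertionError(
                "contains(" + text + ", " + search + ") was " + actual);
        }
    }

    static void runAll() {
        check("abc", "b", true);    // the happy path
        check("abc", "z", false);   // the negative case random data struggles with
        check("abc", "", true);     // empty search string
        check("", "", true);        // both empty
        check("", "a", false);      // empty text
        check(null, "a", false);    // null text
        check("abc", null, false);  // null search string
        check(null, null, false);   // both null
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all edge cases passed");
    }
}
```

Eight deterministic lines, each one readable as a requirement, and every one of them runs on every build.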

How often will your random test run before the tested code goes to production? If you do continuous delivery then the test will run a few times during your local development, once on your CI server and... that's it. If you're not lucky enough to do continuous delivery then let's assume your commit goes to production in 3 weeks. Probably there will soon be a feature freeze and branch stabilization. How many times will this test run? 50 times on the CI server? Random tests are totally useless when they run only a few times. Of course you may expect those tests to run many times during local development by the rest of your team, but...

If it fails on someone else's machine, are you sure they will record the test result? Wait! There will be no result! There will be only the information that true was expected but false was returned. So you have to remember to add logging to all your random tests. And even if logs are being dumped, are you sure that the other developer (who has to deliver their own, completely different functionality) will care about an irrelevant, non-deterministic test failure? Because the other option is to simply re-run the tests, see the green light, commit and go home. No one will ever know.

Let's face it, it can't work this way. If you are not sure if your test data is good enough then:
  • Simplify your code. Extract methods/classes, avoid ifs, avoid nulls, be more immutable and functional.
  • Try to analyze the edge cases and include them in your tests.
  • If needed, throw away that part of the code and start again using TDD. If you've never tried it, you will be surprised how different the design can be.
Seriously, these rules will almost always be enough. That's because the sad truth is that the vast majority of all development is typical corporate maintenance. It's not rocket science and all the complexity is usually incidental. But the refactoring can be expensive. And if the above rules are not enough:
  • Generate a lot of random data sets, look at them and check if some of them differ from what you had in mind when designing your code. And, of course, add new cases to your tests.
  • Use mutation testing.
  • Whenever a bug is discovered during development, UAT or production, add new cases to your tests to avoid regression.
  • Do real random testing. Keep the testing server running 24/7. Every generated data set that breaks the tests should be logged and added to your deterministic unit tests.
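
For the last point, the key is that a failure must carry everything needed to replay it. One way to get there, sketched below under my own assumptions, is to drive all generated data from a seed and put the seed and the inputs into the failure message. The class SeededRandomCheck and its methods are hypothetical names, and the local contains method merely stands in for the code under test.

```java
import java.util.Random;

// Hypothetical harness for long-running random testing; names are illustrative only.
final class SeededRandomCheck {

    // Stand-in for the code under test (a null-safe substring check).
    static boolean contains(String text, String search) {
        return text != null && search != null && text.contains(search);
    }

    // Builds a lowercase string of length 0..maxLength-1 from the given random source.
    static String randomString(Random random, int maxLength) {
        int length = random.nextInt(maxLength);
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append((char) ('a' + random.nextInt(26)));
        }
        return builder.toString();
    }

    // Runs one randomized check. The same seed always generates the same data,
    // and a failure reports the seed and inputs so the case can be replayed
    // and then copied into a deterministic unit test.
    static void checkOnce(long seed) {
        Random random = new Random(seed);
        String prefix = randomString(random, 8);
        String middle = randomString(random, 8);
        String suffix = randomString(random, 8);
        if (!contains(prefix + middle + suffix, middle)) {
            throw new AssertionError("seed=" + seed + " prefix='" + prefix
                + "' middle='" + middle + "' suffix='" + suffix + "'");
        }
    }

    public static void main(String[] args) {
        long baseSeed = System.nanoTime(); // a different starting point on every run
        for (int i = 0; i < 10_000; i++) {
            checkOnce(baseSeed + i);
        }
        System.out.println("10000 random cases passed, base seed " + baseSeed);
    }
}
```

When a run like this does find something, calling checkOnce with the logged seed reproduces it on any machine, and the concrete inputs from the message become one more line in your ordinary tests.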