13 September 2017

Database integration tests are slow? Bullshit!


Probably we all saw projects where running all the tests took about 30 mins, it was possible only using command line, and required remote debugging. That's insane.

But fortunately it's not the only way. Let's do a few tests and see the numbers. By no means it's a benchmark. I just want to show the orders of magnitude. All the following tests use Postgres and run on my old, cheap desktop.

Sample app: Spring-boot, Flyway, JPA


Let's see a typical spring-boot application with flyway and hibernate jpa:

@SpringBootApplication
public class Application {
    public static void main(String[] args) {SpringApplication.run(Application.class, args);}
}

@Entity @Builder
class SampleEntity {
    @GeneratedValue @Id int id;
    String name;
    int age;
}

@Service
class ExpensiveService {
    public ExpensiveService(Repo repo) {
        throw new RuntimeException("too expensive for our tests");
    }
}

interface Repo extends Repository<SampleEntity, Integer> {
    Long countByNameAndAge(String name, int age);
}
Of course the startup time of the whole application can be arbitrary long depending on your app.
So as a start let's setup db-only tests.

@RunWith(SpringRunner.class)
@AutoConfigureTestDatabase(replace = Replace.NONE)
@DataJpaTest
public class RepoTest extends DbTest {

    @Autowired Repo repo;
    @Autowired TestEntityManager testEntityManager;

    @Test
    public void should_count_by_age_and_name() {

        testEntityManager.persistAndFlush(SampleEntity.builder().age(4).name("john").build());
        testEntityManager.persistAndFlush(SampleEntity.builder().age(5).name("john").build());
        testEntityManager.persistAndFlush(SampleEntity.builder().age(5).name("john").build());

        long count = repo.countByNameAndAge("john", 5);

        assertThat(count).isEqualTo(2);
    }
}

Without optimisations

No database is installed, so for sure it will take some time to start one.

After compilation is done, ./gradlew test --rerun-tasks takes ~14.4s. When we repeat the same test using Spring's @Repeat(20), it takes ~15.2s. Running it from withing IDE using Shift-F10 gives the same results so at least we don't have to switch to console and we can see clickable logs immediately. So what do we see in the logs?

Before the first test starts, spring reports that jvm is running for ~11s. That includes jvm start and building spring context. Building spring context includes, among others, starting and preparing db (~6s), executing flyway's migrations (400ms), setting up hibernate as a jpa provider (~1s)

Each test consists of 3 inserts, a search and a rollback.

first test: ~300ms
next one: ~100ms
every other: ~50ms

With running database

The most obvious improvement would be to have db up and running on the development machine. Let's do
docker run --rm -p 5432:5432 postgres:9.6.1
and then again Shift-F10. As expected we just saved 6s. Now, before 1st test, jvm is running ~5s.

Without migrations

What else we can improve? Current config clears the database and run flyway migrations before first test. Although 400ms is not much, Flyway's migrations usually grow over time and in bigger projects it can take tens of seconds. Especially because some databases have really slow DDL operations.

Often we work on some new queries and we don't modify db structure. So let's temporary disable db cleanup and therefore the need of flyway's migrations using .fullClean(__ -> {}) and flyway's verification using environment variable flyway.enabled=false. Of course to make it convenient it should be handled by some feature switch but it's just a PoC.

So now, it's ~4,6s and ~50ms for each spring's JPA test. And all running from IDE. Not bad.

Sample app: No ORM


Much more spectacular results we can get when we don't use any ORM. That's much common in small apps (e.g. microservices). Let's see a simple spring-boot, flyway, jdbc app:
@SpringBootApplication
@Repository
@AllArgsConstructor(onConstructor = @__(@Autowired))
public class Repo {
    final JdbcTemplate jdbcTemplate;

    /** fails at first batch containing null */
    public void save_ints_in_batches_by_two(List ints) {
        jdbcTemplate.batchUpdate("insert into some_table values (?)", ints, 2,
                (ps, value) -> ps.setInt(1, value));
    }

    /** executes stored function */
    public int count() {
        return jdbcTemplate.queryForObject("select my_count()", Integer.class);
    }

    public static void main(String[] args) {SpringApplication.run(Repo.class, args);}
}
In this case we don't need Spring to run tests:
public class NgTest {

    Repo repo;

    @BeforeMethod
    public void prepareDb() {
        DataSource dataSource = StandaloneDbSetup.prepareDataSourceForTest();
        repo = new Repo(new JdbcTemplate(dataSource));
    }

    @Test
    public void should_insert_all_values() {

        repo.save_ints_in_batches_by_two(Arrays.asList(1,2,3,4,5));

        int count = repo.count();
        assertThat(count).isEqualTo(5);
    }

    @Test
    public void should_fail_on_batch_containing_null() {

        assertThatThrownBy(() ->
                repo.save_ints_in_batches_by_two(Arrays.asList(1,2,3,null,5))
        ).isNotNull();

        int count = repo.count();
        assertThat(count).as("only 1st batch of size 2 should have succeeded")
                .isEqualTo(2);
    }
}

On a clean machine it takes ~6.5s. When Postgres is up, running those two tests takes 0.5s.

Without Flyway

After completely removing Flyway (migration and validation) using .buildSchema(__ -> {}).fullClean(__ ->{}), 2 tests takes: 220ms in total. One of them runs in 7ms. So instead of 2 tests let's run 20: 10 times each by using @Test(invocationCount = 10).

Now, we run 20 tests. There were 2 groups of tests

1. 1 commit + 1 integrity violation; first test took 240ms, others 5-16ms
2. 3 commits; 20-24ms

The whole execution time reported by IDE is less than 1.6s.


And that's all without things like moving database to tmpfs, changing app, etc. All the changes can be done in one test superclass.
Is it slower than normal unit tests? For sure. Is it slow? Not really.

Full source: JPA, standalone.

4 komentarze :

0xMarcin said...

Of course things are fast when you have only a single table, if you had say 120 tables things would slow down considerably. Not only Hibernate analyzes all entity classes up front (to build metamodel), but also it builds all SQL statements at startup - this takes considerable amout of time on larger projects. Also when you DB becomes larger and more compilicated (you have plenty of indexes, foreign keys and historical migrations) Postgres will no longer be so fast - from my experience especially updating/creating indexes can be a slow operation. H2 is also no solution here - because on large project it is not considerable faster than Postgres. The other thing is that for bigger DBs you will also want to have some data there - just to test that migrations are working. In my case this all adds to about 30seconds of setup time, and this is the minimum amout of time for running a single test. Compare this with about a 1second to run all my unit tests (more than 200 of them) and it becomes clear that running integration tests during developemt will slow you down (especially when you work on business logic). So try I always try to limit integration tests, they are still important but the more I cover with unit-tests the fastes I can develop new features.

Paweł said...

Great post Piotrek, but you probably have some typo at the end of first code snippet (20-23) :)

piotrek said...

Thanks Paweł! fixed

piotrek said...

@0xMarcin yes, ORM costs the most here. in dev mode you can always disable validation but still it'll be slow. when testing migration itself, you have to apply them all or use schema templates (but i haven't see any tool to automate this task). for any other tests, you should be able to disable migrations. and there should be no big difference between 1 and 20 integration tests - you pay the cost before starting the first one

Post a comment