code > /dev/null: readability, less code, java, functional programming, ...<br />
<h2>Database integration tests are slow? Bullshit!</h2>
Published 2017-09-13, updated 2017-09-19<br />
We have probably all seen projects where running all the tests took about 30 minutes, was possible only from the command line, and required remote debugging. That's insane.<br />
<br />
Fortunately, it's not the only way. Let's run a few tests and look at the numbers. This is by no means a benchmark; I just want to show the orders of magnitude. All the following tests use Postgres and run on my old, cheap desktop.<br />
<br />
<h3>Sample app: Spring-boot, Flyway, JPA</h3><br />
Let's see a typical spring-boot application with flyway and hibernate jpa:<br />
<br />
<pre class='brush:java'>@SpringBootApplication
public class Application {
public static void main(String[] args) {SpringApplication.run(Application.class, args);}
}
@Entity @Builder
class SampleEntity {
@GeneratedValue @Id int id;
String name;
int age;
}
@Service
class ExpensiveService {
public ExpensiveService(Repo repo) {
throw new RuntimeException("too expensive for our tests");
}
}
interface Repo extends Repository<SampleEntity, Integer> {
Long countByNameAndAge(String name, int age);
}
</pre>Of course, the startup time of the whole application can be arbitrarily long, depending on your app.<br />
So as a start let's set up DB-only tests.<br />
<br />
<pre class='brush:java'>@RunWith(SpringRunner.class)
@AutoConfigureTestDatabase(replace = Replace.NONE)
@DataJpaTest
public class RepoTest extends DbTest {
@Autowired Repo repo;
@Autowired TestEntityManager testEntityManager;
@Test
public void should_count_by_age_and_name() {
testEntityManager.persistAndFlush(SampleEntity.builder().age(4).name("john").build());
testEntityManager.persistAndFlush(SampleEntity.builder().age(5).name("john").build());
testEntityManager.persistAndFlush(SampleEntity.builder().age(5).name("john").build());
long count = repo.countByNameAndAge("john", 5);
assertThat(count).isEqualTo(2);
}
}
</pre><h4>Without optimisations</h4>No database is installed, so for sure it will take some time to start one.<br />
<br />
After compilation is done, <code>./gradlew test --rerun-tasks</code> takes ~14.4s. When we repeat the same test using Spring's <code>@Repeat(20)</code>, it takes ~15.2s. Running it from within the IDE using Shift-F10 gives the same results, so at least we don't have to switch to the console and we can see clickable logs immediately. So what do we see in the logs?<br />
<br />
Before the first test starts, Spring reports that the JVM has been running for ~11s. That includes JVM startup and building the Spring context. Building the Spring context includes, among other things, starting and preparing the DB (~6s), executing Flyway's migrations (~400ms) and setting up Hibernate as the JPA provider (~1s).<br />
<br />
Each test consists of 3 inserts, a search and a rollback.<br />
<br />
first test: ~300ms<br />
next one: ~100ms<br />
every other: ~50ms<br />
<h4>With running database</h4>The most obvious improvement would be to have db up and running on the development machine. Let's do<br />
<pre class='brush:text'>docker run --rm -p 5432:5432 postgres:9.6.1
</pre>and then again Shift-F10. As expected we just saved 6s. Now, before 1st test, jvm is running ~5s.<br />
<h4>Without migrations</h4>What else can we improve? The current config clears the database and runs Flyway migrations before the first test. Although 400ms is not much, Flyway migrations usually grow over time and in bigger projects they can take tens of seconds, especially because some databases have really slow DDL operations.<br />
<br />
Often we work on new queries without modifying the DB structure. So let's temporarily disable the DB cleanup, and with it the need for Flyway's migrations, using <code>.fullClean(__ -> {})</code>, and disable Flyway's verification using environment variable <code>flyway.enabled=false</code>. Of course, to make it convenient this should be handled by some feature switch, but it's just a PoC.<br />
<br />
So now it's ~4.6s before the first test and ~50ms for each Spring JPA test. And all of it runs from the IDE. Not bad.<br />
<br />
<h3>Sample app: No ORM</h3><br />
We can get much more spectacular results when we don't use any ORM. That's quite common in small apps (e.g. microservices). Let's see a simple spring-boot, flyway, jdbc app:<br />
<pre class='brush:java'>@SpringBootApplication
@Repository
@AllArgsConstructor(onConstructor = @__(@Autowired))
public class Repo {
final JdbcTemplate jdbcTemplate;
/** fails at first batch containing null */
public void save_ints_in_batches_by_two(List<Integer> ints) {
jdbcTemplate.batchUpdate("insert into some_table values (?)", ints, 2,
(ps, value) -> ps.setInt(1, value));
}
/** executes stored function */
public int count() {
return jdbcTemplate.queryForObject("select my_count()", Integer.class);
}
public static void main(String[] args) {SpringApplication.run(Repo.class, args);}
}
</pre>In this case we don't need Spring to run tests:<br />
<pre class='brush:java'>public class NgTest {
Repo repo;
@BeforeMethod
public void prepareDb() {
DataSource dataSource = StandaloneDbSetup.prepareDataSourceForTest();
repo = new Repo(new JdbcTemplate(dataSource));
}
@Test
public void should_insert_all_values() {
repo.save_ints_in_batches_by_two(Arrays.asList(1,2,3,4,5));
int count = repo.count();
assertThat(count).isEqualTo(5);
}
@Test
public void should_fail_on_batch_containing_null() {
assertThatThrownBy(() ->
repo.save_ints_in_batches_by_two(Arrays.asList(1,2,3,null,5))
).isNotNull();
int count = repo.count();
assertThat(count).as("only 1st batch of size 2 should have succeeded")
.isEqualTo(2);
}
}
</pre><br />
On a clean machine it takes ~6.5s. When Postgres is up, running those two tests takes 0.5s.<br />
<h4>Without Flyway</h4>After completely removing Flyway (migration and validation) using <code>.buildSchema(__ -> {}).fullClean(__ ->{})</code>, the 2 tests take 220ms in total. One of them runs in 7ms. So instead of 2 tests let's run 20: 10 times each, using <code>@Test(invocationCount = 10)</code>.<br />
<br />
Now we run 20 tests, in 2 groups:<br />
<br />
1. 1 commit + 1 integrity violation; first test took 240ms, others 5-16ms<br />
2. 3 commits; 20-24ms<br />
<br />
The whole execution time reported by IDE is less than 1.6s.<br />
<br />
<br />
And that's all without things like moving the database to tmpfs, changing the app, etc. All the changes can be done in one test superclass.<br />
Is it slower than normal unit tests? For sure. Is it slow? Not really.<br />
<br />
Full source: <a href="https://github.com/piotrturski/testegration/tree/master/testing/jpa-spring-data">JPA</a>, <a href="https://github.com/piotrturski/testegration/tree/master/testing/postgres-testng">standalone</a>.<br />
<h2>Spring's @Value with custom types</h2>
Published 2016-04-19<br />
Everyone knows you can do:<br />
<pre class='brush:java'>@Value("${some.property}") String myString;
@Value("classpath:/${some.file}") Resource file;
</pre>Some know you can also do:<br />
<pre class='brush:java'>@Value("classpath:/myFile.txt") InputStream inputStream;
@Value("classpath:/${some.property}") Reader reader;
</pre>But what if we want to do:<br />
<pre class='brush:java'>@Value("some string ${with.properties}") MyObject myObject;
</pre>In this case Spring will automatically look for a static factory method named <code>valueOf</code>, <code>of</code> or <code>from</code>, or for a suitable constructor (as described in <a href="https://github.com/spring-projects/spring-framework/blob/v4.2.5.RELEASE/spring-core/src/main/java/org/springframework/core/convert/support/ObjectToObjectConverter.java">ObjectToObjectConverter</a>). So <br />
<pre class='brush:java'>public MyObject(String parameter) {...}
</pre>will be enough.<br />
<br />
If a more complex conversion is needed, one possibility is to create a bean named <code>conversionService</code> of type <code>ConversionService</code> with registered converters. For example:<br />
<pre class='brush:java'>@Bean(name="conversionService")
public ConversionService conversionService() {
DefaultConversionService conversionService = new DefaultConversionService();
conversionService.addConverter(String.class, MyObject.class, s -> ...);
return conversionService;
}
</pre><br />
Tested with spring 4.2.5.RELEASE<br />
<h2>Remote git branch with different name</h2>
Published 2016-01-10, updated 2016-01-26<br />
Sometimes we want to publish one of our local branches as a master branch in a new git repository.<br />
<br />
To initialize a newly created GitHub repo, first add a new remote. Let's name it <code>other-repo</code>.<br />
<pre class='brush:text;auto-links:false'>git remote add other-repo https://github.com/xxx/yyy.git
</pre>To push <code>local-branch</code> as remote <code>master</code>:<br />
<pre class='brush:text'>git push -u other-repo local-branch:master
</pre>To later push any new commits (this can be simplified using git's config option <code>push.default</code>):<br />
<pre class='brush:text'>git push other-repo local-branch:master
</pre>To delete the tracking without deleting remote branch:<br />
<pre class='brush:text'>git branch local-branch --unset-upstream
</pre>To make an existing local branch track an existing remote branch (master):<br />
<pre class='brush:text'>git branch -u other-repo/master local-branch
</pre>To see all the trackings:<br />
<pre class='brush:text'>git branch -vv
</pre><br />
git version 1.9.1<br />
<h2>Tips for R beginners</h2>
Published 2015-12-09<br />
I'm not an R expert, but these few things may save you some time, especially when doing Coursera courses.<br />
<br />
<h4>Installation</h4>Don't install R or RStudio from your system package manager. It's a waste of time. Of course it will work and you'll be able to run hello world, but soon you will need some external libraries. Some of them will be outdated and others will have conflicting dependencies, so the installation will fail. At least that's the case with Ubuntu 14.04.<br />
<h4>rJava problem</h4>Some libraries require Java. If you have a problem with the 'rJava' library, it's possible that your R installation looks by default for a different (older) Java version than the one you actually have installed. In this case you may try:<br />
<pre class="brush:bash">sudo R CMD javareconf
</pre><br />
as described here: <a href="http://stackoverflow.com/a/31316527/1100135">http://stackoverflow.com/a/31316527/1100135</a><br />
<br />
<h4>Changing locale</h4>If, for any reason, you can't change the locale from inside R, you can run the whole of R with a different locale:<br />
<pre class="brush:bash">LC_ALL=C rstudio
</pre><br />
You can read more about it using <code>man setlocale</code>. Still, it won't let you use a few different locales at once.<br />
<br />
<h4>Building / transforming formulas</h4>At some point you will want to use the power of lazy evaluation and build or transform formulas instead of writing them by hand. Two functions will be useful: <code>substitute</code> and <code>as.formula</code>. Let's say we want to build a function that takes all the predictors (or, more generally, some part of a formula) and adds the regression variable <code>t</code> (the other part of the formula):<br />
<br />
<pre class="brush:r">make.formula <- function(x) as.formula(substitute(t ~ x))
</pre>
and now we can call it using:
<pre class="brush:r">new.formula <- make.formula(x+y*z)
str(new.formula)
</pre>
to get:
<pre class="brush:text">Class 'formula' length 3 t ~ x + y * z
</pre>
<h4>Tuning knitr rendering</h4>Each code chunk {r } accepts optional parameters that let you control, for example, whether the code is executed, whether diagnostic messages are rendered, whether the computation is cached, and whether each command prints its output or the whole output is displayed at the end. Sample:
<pre class="brush:r">```{r cache=T, message=F, results='hold'}
library(randomForest)
system.time(fit <- randomForest(classe ~ ., data=training))
fit
```</pre>
It will exclude the diagnostic messages from loading the library, cache the trained model and display the whole output at the end.
Run <code>?opts_chunk</code> (after <code>library(knitr)</code>) to see the reference page of available options and links to the online documentation.
<br><br>
For inline R do: <code>`r 2 + 3 * x`</code>
<h4>Benchmarking</h4><pre class="brush:r">system.time(x <- expensive.function())
</pre>
or to compare multiple computations:
<pre class="brush:r">library(rbenchmark)
benchmark(x <- expensive.function1(), y <-expensive.function2())
</pre>
The code above does the actual measurement and also assigns the new variables in the current environment.
<h4>Training prediction models with Caret</h4><code>train</code> delegates to another prediction method based on the requested type. Often it's way faster to call the underlying method directly. We may lose all of caret's meta-parameter tuning, but the model we get is often still good enough while training is orders of magnitude faster. E.g.:
<pre class="brush:r">train(y ~ x, data=training, method='rf')
randomForest(y ~ x, data=training)
</pre>
<h2>Integer overflow: zero from non-zero multiplication</h2>
Published 2015-09-16<br />
People sometimes ignore the overflow problem because their algorithm is still valid (e.g. checking if a counter has changed within some short period of time). But keep in mind that overflow makes some fundamental laws of arithmetic stop holding. One of them is: when we multiply a few integers, we get zero if and only if at least one of the factors is zero. Let's see that in action. Will this loop ever end?<br />
<pre class='brush:java'>int x = 1;
while(true) {
x *= RandomUtils.nextInt(1, Integer.MAX_VALUE); // positive random number
if (x == 0) break;
}</pre>In practice it ends instantly. Even though we multiply only non-zero integers, we get zero as a result. And, of course, it has nothing to do with rounding precision. On closer inspection it becomes obvious that to get zero we just have to produce any number whose binary representation ends with 32 zeros (in the case of Java's int). And <code>x</code> accumulates trailing zeros with each multiplication by a number that has 2 among its prime factors (every second iteration on average). So to get zero you can simply multiply two valid ints: (1 << 30) * 4. The same works with any combination of positive and negative numbers: in <a href="https://en.wikipedia.org/wiki/Two%27s_complement">java's representation</a> negative powers of two also accumulate trailing zeros.<br />
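The (1 << 30) * 4 case can be checked directly. This is only a small sketch: the class and method names are made up for illustration, while <code>Math.multiplyExact</code> and <code>Integer.numberOfTrailingZeros</code> are standard JDK methods.<br />

```java
final class OverflowDemo {
    /** Plain int multiplication: silently wraps around on overflow. */
    static int wrappingProduct(int a, int b) {
        return a * b;
    }

    /** True if a * b does not fit into an int (Math.multiplyExact throws in that case). */
    static boolean overflows(int a, int b) {
        try {
            Math.multiplyExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int a = 1 << 30; // 30 trailing zero bits
        int b = 4;       // 2 trailing zero bits
        System.out.println(Integer.numberOfTrailingZeros(a)); // 30
        System.out.println(wrappingProduct(a, b));            // 0: all 32 low bits are zero
        System.out.println(overflows(a, b));                  // true
    }
}
```

If a zero product must really mean a zero factor, <code>Math.multiplyExact</code> is one way to fail loudly instead of wrapping.<br />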
<br />
<h2>IoC is not DI</h2>
Published 2015-07-05<br />
I often see posts and hear people using the terms 'inversion of control' and 'dependency injection' interchangeably. But they're not the same thing.<br />
<br />
Inversion of control is a design pattern for, let's say, decoupling 'what' from 'when'. It lets some generic code pass the flow of control to custom components. It increases modularity and extensibility. It's about applying the Hollywood Principle: "Don't call us, we'll call you".<br />
<br />
Dependency injection is a design pattern that applies IoC to resolving dependencies. In this pattern a component X doesn't have control over the creation of its own dependencies anymore. Instead, the control is inverted and given to another component Y, which creates the dependencies and injects them into X.<br />
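A minimal sketch of that difference, with entirely hypothetical names (<code>TimeSource</code>, <code>SelfWiredReport</code>, <code>InjectedReport</code> and <code>Wiring</code> are made up for this example): the first report creates its own dependency, the second one only declares it and lets someone else (here <code>Wiring</code>, playing the role of Y) supply it.<br />

```java
/** Dependency needed by both reports below (hypothetical example type). */
interface TimeSource {
    long now();
}

/** No DI: the component itself decides which TimeSource to create. */
final class SelfWiredReport {
    private final TimeSource time = System::currentTimeMillis;

    String render() {
        return "generated at " + time.now();
    }
}

/** DI: component X only declares what it needs in its constructor. */
final class InjectedReport {
    private final TimeSource time;

    InjectedReport(TimeSource time) {
        this.time = time;
    }

    String render() {
        return "generated at " + time.now();
    }
}

/** Component Y: creates the dependency and injects it into X. */
final class Wiring {
    static InjectedReport reportWithFixedTime(long instant) {
        TimeSource fixed = () -> instant;
        return new InjectedReport(fixed);
    }

    public static void main(String[] args) {
        System.out.println(reportWithFixedTime(42L).render());
    }
}
```

In a real application the role of <code>Wiring</code> is usually played by a DI container, but nothing in the pattern requires one.<br />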
<br />
But DI is not the only realization of IoC. Some others are:<br />
<br />
service locator,<br />
aspects,<br />
events / callbacks / event handlers,<br />
strategy,<br />
template method<br />
<br />
and those patterns are often used to implement GUIs, schedulers, plugins, declarative transactions and many frameworks/libraries (like servlets, junit runners etc).<br />
<h2>Hirschberg's algorithm explanation</h2>
Published 2015-04-22<br />
Some time ago I needed to implement Hirschberg's algorithm but I couldn't find any good explanation of it. So here is my attempt.<br />
<br />
First of all: it's used to find the optimal sequence alignment. This alignment has to be measured somehow. As a metric we can use, for example, <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a>, which means that insertion, deletion and change all have the same cost. This translates directly into computing the list of all changes required to turn one word into another.<br />
<br />
Let's say we have a word L of size n and a word R of size m.<br />
<h4>Needleman–Wunsch algorithm</h4>First we need to understand <a href="http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm">Needleman–Wunsch algorithm</a>. It's the easiest way to compute the alignment. The algorithm has just a few lines of code. Its time complexity is O(n*m), space complexity is also O(n*m). This is simple dynamic programming: having edge values we fill the array from top-left to bottom-right corner, choosing minimum (or maximum, depending on the metric. Let's say we are looking for the minimum edit distance). For L = bcd and R = abcde:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRvAFBHmGonUsmWoCjGRpne3ojF8V7QGwnB9THGlMKqEV68RfGqjv-z4PetpeHZ_fYM6gdrWe9IuBqYEAsQOuDPaQECSc0KykJ2PWZeZl7jD3OTE3aJyZ5HLviPFQpRlHOo4vqs71NQOAW/s1600/nw1.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRvAFBHmGonUsmWoCjGRpne3ojF8V7QGwnB9THGlMKqEV68RfGqjv-z4PetpeHZ_fYM6gdrWe9IuBqYEAsQOuDPaQECSc0KykJ2PWZeZl7jD3OTE3aJyZ5HLviPFQpRlHOo4vqs71NQOAW/s320/nw1.png" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiJSqyjIPdcvy38ACyvhmeWZAGWzDjJAs5M2sStYM3tn2Ckf0Ue_MysmRBsZu1Y8IHyGaH34J3fugFlksqzTv4xicGbN2HqKx_ZDvZ4BTtmIOH61hq0Ps9GxxD3rB-iSqDgpZQr_OJ-QCH/s1600/nw2.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiJSqyjIPdcvy38ACyvhmeWZAGWzDjJAs5M2sStYM3tn2Ckf0Ue_MysmRBsZu1Y8IHyGaH34J3fugFlksqzTv4xicGbN2HqKx_ZDvZ4BTtmIOH61hq0Ps9GxxD3rB-iSqDgpZQr_OJ-QCH/s320/nw2.png" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxf8aPT6iVCRSSRhKOa7V1hD3HzA6bTMr5NGRY-1oS2Fq0cMy4RlX29jXd0S09NBgfnGjreo6rMgUteOcFTTy0vlVAZ7gIcwg1K3o-BePZV32ewlk6aHwTepOw4r0w5O6YJmbZE4NSlJpr/s1600/nw3.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxf8aPT6iVCRSSRhKOa7V1hD3HzA6bTMr5NGRY-1oS2Fq0cMy4RlX29jXd0S09NBgfnGjreo6rMgUteOcFTTy0vlVAZ7gIcwg1K3o-BePZV32ewlk6aHwTepOw4r0w5O6YJmbZE4NSlJpr/s320/nw3.png" /></a><br />
<br />
After the whole array is filled we go backward (from bottom-right corner) and recreate the path.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwt-mG8q1hFFXFKhP8QzuaQcpoxNMBNWcW0guez7hTHpaI2ezLyDooqyZnlJOFRC6g3Ckcslxg88cfxLKEyd_evPJbjOGA1B8NAM2ftKxEH0MVXLi9cEelwK89ZO57WpYNQkDcfbmmwC7F/s1600/nw4.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwt-mG8q1hFFXFKhP8QzuaQcpoxNMBNWcW0guez7hTHpaI2ezLyDooqyZnlJOFRC6g3Ckcslxg88cfxLKEyd_evPJbjOGA1B8NAM2ftKxEH0MVXLi9cEelwK89ZO57WpYNQkDcfbmmwC7F/s320/nw4.png" /></a><br />
<br />
Each horizontal arrow represents an insertion, vertical - deletion and diagonal - match or replacement (depending if letters match or not). Therefore this array represents alignment:<br />
<pre class="brush: text; gutter: false;">- b c d -
| | |
a b c d e
</pre>It also means that the edit distance between bcd and abcde is 2, so we can change one into the other with only 2 changes. But that's not all. We can get more information from this table. The last row contains all edit distances (D) between L and all prefixes of R:<br />
<br />
D(bcd, abcde) = 2<br />
D(bcd, abcd) = 1<br />
D(bcd, abc) = 2<br />
D(bcd, ab) = 3<br />
D(bcd, a) = 3<br />
D(bcd, ) = 3<br />
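The table filling described above can be sketched as follows, assuming unit costs for insertion, deletion and change (Levenshtein distance), with L on the vertical axis and R on the horizontal one. The class and method names are made up for illustration.<br />

```java
import java.util.Arrays;

final class NeedlemanWunsch {
    /** Fills the full (n+1) x (m+1) table of edit distances between prefixes of l and r. */
    static int[][] table(String l, String r) {
        int n = l.length(), m = r.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // edge: delete i letters of l
        for (int j = 0; j <= m; j++) d[0][j] = j; // edge: insert j letters of r
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diagonal = d[i - 1][j - 1] + (l.charAt(i - 1) == r.charAt(j - 1) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;  // vertical step
                int insertion = d[i][j - 1] + 1; // horizontal step
                d[i][j] = Math.min(diagonal, Math.min(deletion, insertion));
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = table("bcd", "abcde");
        // Last row: D(bcd, ""), D(bcd, a), ..., D(bcd, abcde)
        System.out.println(Arrays.toString(d[3])); // [3, 3, 3, 2, 1, 2]
    }
}
```

Read from right to left, the printed row is exactly the list of distances above.<br />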
<br />
If we are interested only in edit distance but not the whole alignment then we can reduce Needleman–Wunsch algorithm space complexity. The key observation is that we can fill the array horizontally (or vertically) so at any time we need only 2 adjacent rows of the array:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVOADg-93dAej47VmUTD8_L5bGUySgJcWXFUSgGILBcpkhFQ56dDvd5SFSY9oNweYDrTZ7LiMYl4jgFYunsVJL_I3busmmHHEa3Gh6j-MFJ6W7OGC-KiCxj0yvbEwa8wRC8lFQuVYLdjAY/s1600/nw5.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVOADg-93dAej47VmUTD8_L5bGUySgJcWXFUSgGILBcpkhFQ56dDvd5SFSY9oNweYDrTZ7LiMYl4jgFYunsVJL_I3busmmHHEa3Gh6j-MFJ6W7OGC-KiCxj0yvbEwa8wRC8lFQuVYLdjAY/s320/nw5.png" /></a> <br />
<br />
Its space complexity is O(n) or O(m) (depending on the filling direction) and time complexity remains O(n*m). Let's call this version NW'. If we choose the correct direction then the last row also contains all edit distances (D) between L and all prefixes of R.<br />
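A possible sketch of NW', again assuming unit costs and filling the table row by row, so that only two rows of size m+1 are alive at any time. The names are made up for illustration.<br />

```java
import java.util.Arrays;

final class TwoRowEditDistance {
    /** Returns the last table row: distances between l and every prefix of r. */
    static int[] lastRow(String l, String r) {
        int m = r.length();
        int[] previous = new int[m + 1];
        int[] current = new int[m + 1];
        for (int j = 0; j <= m; j++) previous[j] = j; // row for the empty prefix of l
        for (int i = 1; i <= l.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= m; j++) {
                int diagonal = previous[j - 1] + (l.charAt(i - 1) == r.charAt(j - 1) ? 0 : 1);
                current[j] = Math.min(diagonal, Math.min(previous[j], current[j - 1]) + 1);
            }
            // Reuse the two arrays instead of allocating a new row.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lastRow("bcd", "abcde"))); // [3, 3, 3, 2, 1, 2]
    }
}
```

The result is the same last row as in the full table, but the path can no longer be recreated from it.<br />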
<h4>Hirschberg's algorithm</h4>Hirschberg's algorithm lets us compute the whole alignment using time O(n*m) and space O(m+n). It uses NW' algorithm and divide and conquer approach:<br />
<ol><li>if problem is trivial, compute it:</li>
<ol type="a"><li>if n or m is 0, there are only m insertions (when n is 0) or n deletions (when m is 0)</li>
<li>if n = 1, there are m-1 insertions or deletions and 1 match or change</li>
</ol><li>if the problem is bigger, divide it into 2 smaller, independent problems</li>
<ol type="a"><li>divide L in the middle into 2 sublists L1, L2</li>
<li>find optimum division R = R1 R2</li>
<li>recursively find the alignment of L1 and R1</li>
<li>recursively find the alignment of L2 and R2</li>
<li>concatenate results</li>
</ol></ol>Of course point 2. needs better explanation. How can we divide big problem into smaller ones? Let's imagine some optimal alignment between L = a1...a27 and R = b1...b32:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcf6uAuEcwzR0RgixMDW18EDVm9H3iLoDKWOi64jH8sPKHfs8gKEkFV4pZ-hwhyphenhyphen6w1wQMUjquAcn-jOtAQuBwjYZo6ymEQQUeR_zdeWH0uNiguB7FqivNFHGMEg96c-D_QDR_bw_IPjTg8/s1600/ed1.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcf6uAuEcwzR0RgixMDW18EDVm9H3iLoDKWOi64jH8sPKHfs8gKEkFV4pZ-hwhyphenhyphen6w1wQMUjquAcn-jOtAQuBwjYZo6ymEQQUeR_zdeWH0uNiguB7FqivNFHGMEg96c-D_QDR_bw_IPjTg8/s320/ed1.png" /></a><br />
<br />
It means we can transform L into R in 15 operations (to be more precise: cost of transformation is 15). Obviously we can divide this alignment at any arbitrary position into 2 independent alignments, for example:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7FykTLNvb12IryAen-LH2UNtvg2WHnCZxsw3swGXjNmfc78q062HlaxMirUNwBrPN4uh2EQVSsP-_CJeTwY3Ef3wGEEdCW8pApcoJC9H358Q1-ktqX5yLvzc1omOKOWehiFdr6tnefggL/s1600/ed2.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7FykTLNvb12IryAen-LH2UNtvg2WHnCZxsw3swGXjNmfc78q062HlaxMirUNwBrPN4uh2EQVSsP-_CJeTwY3Ef3wGEEdCW8pApcoJC9H358Q1-ktqX5yLvzc1omOKOWehiFdr6tnefggL/s320/ed2.png" /></a><br />
The edit distances above are of course just an example. The point is, they must add up to the overall edit distance (15).<br />
<br />
So now we have L = (a1...a7) (a8...a27) = L1 L2 and R = (b1...b4) (b5...b32) = R1 R2. It means we can transform L1 into R1 in 4 operations and L2 into R2 in 11 operations. We can concatenate those transformations to get the overall result. It means that if we somehow know the right division, we'll be able to compute the alignment of L with R by computing the alignments of L1 with R1 and L2 with R2 and simply concatenating the results. So it will be possible to split a big problem recursively until we reach trivial cases. For example (from wikipedia) to align AGTACGCA and TATGC we could do:<br />
<pre>(AGTACGCA,TATGC)
/ \
(AGTA,TA) (CGCA,TGC)
/ \ / \
(AG,) (TA,TA) (CG,TG) (CA,C)
/ \ / \
(T,T) (A,A) (C,T) (G,G)
</pre><br />
But if we have L divided into L1 and L2 how do we know where to divide R? Let's say we want to align (bcdce, abcde)<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd5Mnxg-C6SJkcJpgbbGbCnRtmYIOFHL1sHNH2n8CHWSpfRM113kijOMAxDdsjN0uejOUMcWeH5-k3752sCkffQh2JnPIwyBHIn5xPt-l4NU-GNX2ZZw2MFwmqBz8PmeJJIFZnQWA_OYu0/s1600/h1.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd5Mnxg-C6SJkcJpgbbGbCnRtmYIOFHL1sHNH2n8CHWSpfRM113kijOMAxDdsjN0uejOUMcWeH5-k3752sCkffQh2JnPIwyBHIn5xPt-l4NU-GNX2ZZw2MFwmqBz8PmeJJIFZnQWA_OYu0/s320/h1.png" /></a><br />
<br />
Firstly, let's arbitrarily divide L in the middle. So we have two new lists L1 and L2.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJfLOT4YVN3E8neWyPNMmhOJLYlfPrXQZvfRzeF9DGFjPOVYimNCTvn2SvpHpinPCwZpjw40Ikk-aZszyM_OPF-aYYgaSQtKf_q0cNaM7FpkLTeS9T8askwnzAwtBKqmzZpBI8ZKFw0_gf/s1600/h2.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJfLOT4YVN3E8neWyPNMmhOJLYlfPrXQZvfRzeF9DGFjPOVYimNCTvn2SvpHpinPCwZpjw40Ikk-aZszyM_OPF-aYYgaSQtKf_q0cNaM7FpkLTeS9T8askwnzAwtBKqmzZpBI8ZKFw0_gf/s320/h2.png" /></a><br />
<br />
So where is the correct division of R? Well, we just need to ensure that edit distance (L1, R1) + edit distance (L2, R2) will not be larger than the overall edit distance (L, R). That is, we will use the shortest possible transformation (or alignment). How to ensure that? Well, some of the possible divisions are the optimal ones. So just check all of them and pick the one with the smallest sum of edit distances.<br />
Ok, but how to quickly compute edit distances of L1 with all possible R1 and L2 with all possible R2? Here NW' algorithm can help us. It's straightforward to run for the upper half:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZ8NxwTVa07l0gU301YKtqak_AANCgYiM88e_hMPLq6Dc6-GpSzeFf1faKpswtFuEWfP_T8m-TGzq9dVmQHMO-EJ-WPxIgYejULTFaW5c9b1BaYyNduYb_oIHnwd2h9EN-GsJBU50jsdGc/s1600/h3.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZ8NxwTVa07l0gU301YKtqak_AANCgYiM88e_hMPLq6Dc6-GpSzeFf1faKpswtFuEWfP_T8m-TGzq9dVmQHMO-EJ-WPxIgYejULTFaW5c9b1BaYyNduYb_oIHnwd2h9EN-GsJBU50jsdGc/s320/h3.png" /></a><br />
<br />
After NW' run is completed we have edit distances between L1 and all possible R1:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZLt8n9t14m2nrR80DjG4oyD6SYnNCyJvz5aZDMOVWNL4V7ljjWSYcB1knbEr8ipes-dm42Q_JAKduZu3jA-ABVHAHtCrt2gfior6CsiFG494KkXjzJDI6G7-te8caP9Dgxswy_Vf2aSkO/s1600/h4.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZLt8n9t14m2nrR80DjG4oyD6SYnNCyJvz5aZDMOVWNL4V7ljjWSYcB1knbEr8ipes-dm42Q_JAKduZu3jA-ABVHAHtCrt2gfior6CsiFG494KkXjzJDI6G7-te8caP9Dgxswy_Vf2aSkO/s320/h4.png" /></a><br />
<br />
But what about L2 and all possible R2? Again NW' can help, we just need to run it backward. After all, the edit distance is the same whether we look at both words forward or backward, right?<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgylx0KwatJzWxxnBWZkuRmWWuuexWHj7ClCG7yTE-Oy8s1fOcqlT4jKY66uuF2h3RpQDjaTPqebH6BPKu73kgyFBzHp-30JMZY2yYSy3nwuX6BP1-OX9aMmuBz2mGRMV41DGUBk4qoCko3/s1600/h5.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgylx0KwatJzWxxnBWZkuRmWWuuexWHj7ClCG7yTE-Oy8s1fOcqlT4jKY66uuF2h3RpQDjaTPqebH6BPKu73kgyFBzHp-30JMZY2yYSy3nwuX6BP1-OX9aMmuBz2mGRMV41DGUBk4qoCko3/s320/h5.png" /></a><br />
<br />
After NW' run is completed we have edit distances between L2 and all possible R2:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdFRcCESdPwQFq8J5zPkn80mskd8hTtZblcMQ5O5HI4zC-zpfV5OOIGAFhIP6rWHIt6-TfXQjTiwQcD2GglPQXhtY7vn9eVzEIl1SaCB7b1y-Bt16tqm9XjMupJdAY79t7vsIoXYsQHAZC/s1600/h6.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdFRcCESdPwQFq8J5zPkn80mskd8hTtZblcMQ5O5HI4zC-zpfV5OOIGAFhIP6rWHIt6-TfXQjTiwQcD2GglPQXhtY7vn9eVzEIl1SaCB7b1y-Bt16tqm9XjMupJdAY79t7vsIoXYsQHAZC/s320/h6.png" /></a><br />
<br />
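The two runs can be sketched like this. It's only an illustration under the same unit-cost assumption: the backward run is done here by simply reversing both words and reusing the forward NW', and all names are made up.<br />

```java
import java.util.Arrays;

final class ForwardBackwardRows {
    /** NW': distances between l and every prefix of r, keeping two rows only. */
    static int[] lastRow(String l, String r) {
        int m = r.length();
        int[] previous = new int[m + 1];
        int[] current = new int[m + 1];
        for (int j = 0; j <= m; j++) previous[j] = j;
        for (int i = 1; i <= l.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= m; j++) {
                int diagonal = previous[j - 1] + (l.charAt(i - 1) == r.charAt(j - 1) ? 0 : 1);
                current[j] = Math.min(diagonal, Math.min(previous[j], current[j - 1]) + 1);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous;
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** backward[j] = distance between l2 and the last j letters of r. */
    static int[] backwardRow(String l2, String r) {
        return lastRow(reverse(l2), reverse(r));
    }

    public static void main(String[] args) {
        // L1 = bcd against R1 = "", a, ab, abc, abcd, abcde
        System.out.println(Arrays.toString(lastRow("bcd", "abcde")));    // [3, 3, 3, 2, 1, 2]
        // L2 = ce against R2 = "", e, de, cde, bcde, abcde
        System.out.println(Arrays.toString(backwardRow("ce", "abcde"))); // [2, 1, 1, 1, 2, 3]
    }
}
```

Note that the backward row is indexed by the number of trailing letters of R, so it has to be read in the opposite direction when the two rows are combined.<br />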
Now, we need to choose the partition of R. Let's say we choose to divide R = abcde into R1 = empty and R2 = abcde:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjloTun7XwQVh5wvc6Pr9clea3Nepxo6pCMaXmydLxgXZkbzpF0mwr88Yjzq7xsFWwU5z2eYvXp4v349ZglCb21t2KkguDWw-92VIxihlU2ETfJuKdRbznAPnheZLh4psQLXCOH11MDbdwc/s1600/h7.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjloTun7XwQVh5wvc6Pr9clea3Nepxo6pCMaXmydLxgXZkbzpF0mwr88Yjzq7xsFWwU5z2eYvXp4v349ZglCb21t2KkguDWw-92VIxihlU2ETfJuKdRbznAPnheZLh4psQLXCOH11MDbdwc/s320/h7.png" /></a><br />
<br />
Now we would have to align L1 = bcd with R1 = empty and L2 = ce with R2 = abcde. What would be the edit distances?<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNNblx1O848Cih1AzdVT5-P91CkCidFRGHEmtwTjuaO3x6U0RXawt7G9b5Uyd0wSxZhJ-BA03HEtlvk7DCjJQDu_AlIgx9M-MhpCsFblA8QFCZu9eP31_mwXN8B5f-9yKTu_dOlGglIdlN/s1600/h8.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNNblx1O848Cih1AzdVT5-P91CkCidFRGHEmtwTjuaO3x6U0RXawt7G9b5Uyd0wSxZhJ-BA03HEtlvk7DCjJQDu_AlIgx9M-MhpCsFblA8QFCZu9eP31_mwXN8B5f-9yKTu_dOlGglIdlN/s320/h8.png" /></a><br />
<br />
It means we can transform L1 (bcd) into R1 (empty) in 3 operations and L2 (ce) into R2 (abcde) in 3 operations. So choosing this division we would transform L into R in 6 operations. In other words cost of alignment of L and R with this division would be 6. Can we do better? What if we place the dividing line elsewhere?<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdOeC-vi56Qr102mAJ6JKY1N-X5r-o3WVtb1AAp6uzeU9SCTsjFKeDGq22xapFoR1hWNpasBjbqvf2ISXpVvoxrFBe4e0giZqSRg7_pon9XicLwDM4h4iLhQrFupn7aY03TVykccObzxFJ/s1600/h9.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdOeC-vi56Qr102mAJ6JKY1N-X5r-o3WVtb1AAp6uzeU9SCTsjFKeDGq22xapFoR1hWNpasBjbqvf2ISXpVvoxrFBe4e0giZqSRg7_pon9XicLwDM4h4iLhQrFupn7aY03TVykccObzxFJ/s320/h9.png" /></a><br />
<br />
Now we would align L1 (bcd) with R1 (a) with cost 3 and L2 (ce) with R2 (bcde) with cost 2. Overall alignment cost (edit distance) would be 3 + 2 = 5. And so on. The last possible position is:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0NSXZio4LpQXvghnHW1aqn4TOnQ07D9zxFuIcEu4h5ysng54nfDyGcrxHGwrejnv6pwQgxBCn2dbTmtjyaIYdkqhbgd-pKsqvtz36M-jsHZ3gSk9VfUGTPX_kI7FR4oDas2dPpTh8I4N2/s1600/h10.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0NSXZio4LpQXvghnHW1aqn4TOnQ07D9zxFuIcEu4h5ysng54nfDyGcrxHGwrejnv6pwQgxBCn2dbTmtjyaIYdkqhbgd-pKsqvtz36M-jsHZ3gSk9VfUGTPX_kI7FR4oDas2dPpTh8I4N2/s320/h10.png" /></a><br />
<br />
Which is aligning L1 (bcd) with R1 (abcde) with cost 2 and L2 (ce) with R2 (empty) with cost 2. Overall cost (distance) is 2+2 = 4. Obviously the smallest overall edit distance is:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfaknnT-WHaFafvnhc8Phz2Lr7X_CFnTVS1rEA7XKiaiaxGtPs2vdcKkIm0Cn81HHwjRgecwy9m4boABWkKJna-9RFZeJXR789tobEhsutk8SJt4gZhx5Nuq6RddeRse8sIXvBsmN4dyRt/s1600/h11.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfaknnT-WHaFafvnhc8Phz2Lr7X_CFnTVS1rEA7XKiaiaxGtPs2vdcKkIm0Cn81HHwjRgecwy9m4boABWkKJna-9RFZeJXR789tobEhsutk8SJt4gZhx5Nuq6RddeRse8sIXvBsmN4dyRt/s320/h11.png" /></a><br />
<br />
So this is the division we were looking for! Now we have 2 independent subproblems: align L1 = bcd with R1 = abcd and L2 = ce with R2 = e. Let's look at the array. It's divided into 4 parts and now we are only interested in the top-left and bottom-right parts. We will never need to look at the top-right and bottom-left parts. So the overall problem has been reduced by half:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVQ5Tz-RVNtwlEF7FDrdBCOWKNX8anUksU8VphJsnEcU73SBJBpmbwIj4zfJ1iWXVnozhmtJcvJopB8rYPIFEOIZRXjHfOhjlP_IU5Kd7hyphenhyphenOhg3U0JpWvHxCh54O1sI-Hctv7Vd93v-Aem/s1600/h14.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVQ5Tz-RVNtwlEF7FDrdBCOWKNX8anUksU8VphJsnEcU73SBJBpmbwIj4zfJ1iWXVnozhmtJcvJopB8rYPIFEOIZRXjHfOhjlP_IU5Kd7hyphenhyphenOhg3U0JpWvHxCh54O1sI-Hctv7Vd93v-Aem/s320/h14.png" /></a><br />
<br />
That's it. The rest is just basic but slightly tedious arithmetic on subarray boundaries and debugging off-by-one errors.<br />
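The search for the dividing line described above can be sketched roughly like this. It is only a minimal illustration, assuming unit-cost edit distance (insert, delete, substitute), which matches the costs quoted in the example. The names <code>SplitPoint</code>, <code>lastRow</code> and <code>bestSplit</code> are made up for this sketch and are not taken from the original implementation.<br />

```java
final class SplitPoint {

    // Edit distance between a and every prefix of b, keeping only two rows alive.
    // result[j] = distance(a, first j chars of b)
    static int[] lastRow(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty string -> prefix of length j needs j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i; // prefix of a -> empty string needs i deletes
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertOrDelete = Math.min(prev[j], cur[j - 1]) + 1;
                cur[j] = Math.min(substitute, insertOrDelete);
            }
            prev = cur;
        }
        return prev;
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Returns k such that R1 = first k chars of r and R2 = the rest
    // minimise cost(L1, R1) + cost(L2, R2). Ties go to the smallest k.
    static int bestSplit(String l1, String l2, String r) {
        int n = r.length();
        int[] forward = lastRow(l1, r);                      // cost(L1, first k chars of R)
        int[] backward = lastRow(reverse(l2), reverse(r));   // cost(L2, last j chars of R)
        int best = 0;
        for (int k = 1; k <= n; k++) {
            if (forward[k] + backward[n - k] < forward[best] + backward[n - best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // L = bcdce split into L1 = bcd and L2 = ce, R = abcde
        System.out.println(bestSplit("bcd", "ce", "abcde"));
    }
}
```

For the example from the pictures this picks R1 = abcd and R2 = e, with a total cost of 1 + 1 = 2.<br />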
<h4>Complexity</h4>space: We need to store L and R. Each NW' run uses only 2 rows of size n. Each time we divide L by 2, so the recursion depth is O(log n). Therefore the total space used is O(n+m).<br />
<br />
time: At each step we run NW' on the whole L x R array. The 2 subproblems created afterwards have a total size of half of the original array. So the number of operations is at most: (1 + 1/2 + 1/4 + ...) * (n*m) = 2*n*m = O(n*m)Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-91503456627224638042015-03-05T18:00:00.000+01:002015-03-05T21:37:50.859+01:00Java 8: Streaming a stringIt's time to finally start learning the Java 8 API. Recently I needed to modify a string by replacing some characters with randomly generated values. One method for '#' -> random digit, one method for '?' -> random lower case letter and one for both replacements. Each char should be processed independently, therefore a simple <code>replaceAll</code> was not an option. However, mapping a stream sounds easy. Unfortunately java doesn't provide CharStream and CharJoiner so it's a bit uglier and less efficient than it should be.<br />
<pre class="brush:java">import static org.apache.commons.lang3.RandomStringUtils.randomAlphabetic;
import static org.apache.commons.lang3.RandomStringUtils.randomNumeric;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import com.google.common.base.Preconditions;
public class Replacer {
private String replace(String toReplace, char pattern, Supplier<String> replacementSupplier) {
Preconditions.checkNotNull(toReplace);
return toReplace
.chars()
.mapToObj(c -> c != pattern ? String.valueOf((char)c) : replacementSupplier.get())
.collect(Collectors.joining());
}
public String withLetters(String letterString) {
return replace(letterString, '?', () -> randomAlphabetic(1).toLowerCase());
}
public String withNumbers(String numberString) {
return replace(numberString, '#', () -> randomNumeric(1));
}
public String withBoth(String string) {
return withNumbers(withLetters(string));
}
}</pre>I know this can be done more efficiently using direct operations on ints. But can this be done more efficiently without sacrificing readability? Any suggestions are welcome.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-91322959872873086742015-01-19T02:45:00.000+01:002015-01-19T02:49:09.715+01:00Make git never asks for your github username againA lot of answers in SO advise you to change your repo url after you cloned the project. Don't do it. Don't change your repo url per project. That's a waste of time. Soon you will clone another project and you will have to do it again. Better way is to set the username globally so it applies to all projects from github. Current and future ones. Just type:<br />
<pre class="brush:bash; auto-links:false">git config --global credential."https://github.com".username YOUR_USERNAME
</pre>This will add<br />
<pre class="brush:text; auto-links:false">[credential "https://github.com"]
username = YOUR_USERNAME
</pre>to your ~/.gitconfig<br />
<br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-21556181168380657662014-11-20T02:31:00.001+01:002014-11-20T03:11:32.257+01:00Nesting angular directivesUsing angular you quickly find directives really handy. Often you can google a directive that does exactly what you need... almost. For example there are plenty of ready-to-use multichoice dropdowns. But let's say we need a dropdown that is automatically disabled when there are no possible choices. One way is to get the source and change it - with angular it's usually easy. But then we are stuck with this version and won't get any updates or bugfixes. A better solution is to wrap an existing directive in our own.<br />
<br />
Let's say we have an existing <code>dropdown-choices</code> directive that uses one parameter (<code>dropdown-choices</code>) for providing all possible choices and another one (<code>dropdown-selected</code>) for currently selected value. Let's use it to get what we need. <br />
<br />
Tests first. Make sure your karma.conf.js contains a reference to <code>dropdown-choices</code>, for example:<br />
<pre class='brush:js'>module.exports = function(config) {
config.set({
...
files: [
...
'app/vendor/*.js',
...
</pre>Example tests<br />
<pre class='brush:js'>'use strict';
describe('autoOffDropdown', function() {
var multicomboDiv, scope; // same name as used in beforeEach and the helper below
beforeEach(module('myModule'));
beforeEach(inject(function($rootScope, $compile) {
scope = $rootScope.$new();
multicomboDiv = $compile('<div id="someId" auto-off-dropdown \
choices="choices" selected="selected"></div>')(scope);
}));
it('should be disabled when there is nothing to select', function() {
scope.choices = [];
expectSingleVisibleChildToBe("input:disabled");
});
it('should show div with dropdown when anything can be selected', function() {
scope.choices = [1];
expectSingleVisibleChildToBe("div");
});
function expectSingleVisibleChildToBe(jquerySelector) {
scope.$apply();
// may differ depending on angular version
var visibleChildren = multicomboDiv.children().filter(
function() {return $(this).css('display') !== 'none';});
expect(visibleChildren.length).toBe(1);
expect(visibleChildren.is(jquerySelector)).toBe(true);
}
});
</pre>And now our directive<br />
<pre class='brush:js'>'use strict';
angular.module("myModule").directive('autoOffDropdown',function(){
return {
restrict: 'A',
replace: true,
scope: {
choices: '=',
selected: '='
},
template:
'<div>\
<div data-ng-hide="noChoicesAvailable()">\
<div dropdown-choices="choices" dropdown-selected="selected"></div>\
</div>\
<input data-ng-show="noChoicesAvailable()" type="text" disabled="disabled" \
placeholder="No choices available"/> \
</div>',
link: function(scope) {
scope.noChoicesAvailable = function() {
return _.isEmpty(scope.choices);
};
}
};
});
</pre><br />
Tested with angular 1.0.8Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-74976629163192053092014-10-01T22:56:00.000+02:002014-10-01T23:32:34.288+02:00Mocking $location in angular testsLet's say we want to mock a single service, i.e. <code>$location</code>. In case of internal angular's services we can use dedicated mocks, like the one described <a href="https://groups.google.com/forum/#!msg/angular/F0jFWC4G9hI/FNz3oQu0RhYJ">here</a>, but often something much simpler is enough. Let's say we have a service:<br />
<pre class="brush:js">angular.module('myModule').factory('myService', function($location) {
return {
urlSize: $location.absUrl().length
}
});
</pre>and now we want to test it. Obviously all we need is just an object with a single function. In such case, for this test we can simply provide a new definition of the required service:<br />
<pre class="brush:js">var url;
module('myModule');
module(function($provide) {
$provide.factory('$location', function() {
return {
absUrl: function () {return url}
}
})
});
</pre>and now our new <code>$location</code> is registered in angular's DI container and will be provided to all dependent services. Furthermore this test clearly documents that <code>myService</code> uses only this one single function of <code>$location</code>. There are no spies that obfuscate the behaviour of the tested code. How can we use this mock? It's a closure so in every test we can change the global variable <code>url</code> and the mocked <code>$location.absUrl()</code> will return this value. Now we can simply inject the service we want to test:<br />
<pre class="brush:js">inject(function(myService) {
expect(myService.urlSize).toEqual(...);
});
</pre>Tested with angular 1.0.8 and 1.2.16<br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-75942824141627492672014-09-03T01:22:00.001+02:002015-05-07T15:42:49.888+02:009 lines that made me learn monadsSome time ago I saw <a href="https://gist.github.com/ckirkendall/2934374">this</a> programming language comparison based on a small task. The task is to implement an evaluation system for simple expressions: addition and multiplication of numbers and variables. Most solutions (regardless of the language) look like the first one:<br />
<pre class="brush: clojure">(use '[clojure.core.match :only [match]])
(defn evaluate [env [sym x y]]
(match [sym]
['Number] x
['Add] (+ (evaluate env x) (evaluate env y))
['Multiply] (* (evaluate env x) (evaluate env y))
['Variable] (env x)))
(def environment {"a" 3, "b" 4, "c" 5})
(def expression-tree '(Add (Variable "a") (Multiply (Number 2) (Variable "b"))))
(def result (evaluate environment expression-tree))
</pre>It boils down to defining 4 types of expressions, each of which receives subexpressions and the environment and later passes the environment to the subexpressions. Easy, right? There are also a few solutions that avoid passing the environment. Instead they have an 'evaluate' function that preprocesses the expression tree and later evaluates it without the environment at all. For example they replace variables with their values upfront.<br />
<br />
And then I saw <a href="https://gist.github.com/ckirkendall/2934374/#comment-535694">this</a>:<br />
<pre class="brush:haskell">import Data.Map
import Control.Monad
number = return
add = liftM2 (+)
multiply = liftM2 (*)
variable = findWithDefault 0
environment = fromList [("a",3), ("b",4), ("c",7)]
expressionTree = add (variable "a") (multiply (number 2) (variable "b"))
result = expressionTree environment
</pre>And I was wondering how it works. Where the hell is the environment passing or the 'evaluate' function?! Can you write such code? If not, it's time to learn monads.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5922395749618357877.post-68879697211012745312014-05-17T16:12:00.000+02:002014-05-17T18:18:45.449+02:00Spring boot: @DependsOn is not enough anymoreSpring boot can really help speed up starting a new project. Ten lines of java configuration (mostly generated by, let's say, the data-jpa-mvn archetype) and you are ready to write your business logic. All the configuration changes can be postponed until you really need them. But when you start making those changes, you may quickly see that something is reeeeeally missing.<br />
<br />
Let's say we want to use beans A and C created by spring boot. But we also want to create our own bean B that should be initialized in between. A real-world scenario is for example: default DataSource, default EntityManagerFactory and a customized Flyway service which must run before hibernate. We can easily say flyway should depend on the datasource - @Autowired does the trick. But how can we say that hibernate should depend on our flyway? We can't add anything to the hibernate bean because we don't declare it. So what's the solution? First let's check how and where spring declares the hibernate bean. After listing *AutoConfiguration classes we see HibernateJpaAutoConfiguration and later, in its superclass, we can find:<br />
<pre class="brush:java">@Bean
@ConditionalOnMissingBean(name = "entityManagerFactory")
public LocalContainerEntityManagerFactoryBean entityManagerFactory(
JpaVendorAdapter jpaVendorAdapter) {...}
</pre>So one way of enforcing the order is:<br />
<pre class="brush:java">@Configuration
class FlywayConfig extends HibernateJpaAutoConfiguration {
@Autowired Flyway flyway;
}
</pre>And if we need more precise control, we can override the bean:<br />
<pre class="brush:java">@Configuration
class FlywayConfig extends HibernateJpaAutoConfiguration {
@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory(
JpaVendorAdapter jpaVendorAdapter,
Flyway flyway) {
return super.entityManagerFactory(jpaVendorAdapter);
}
}
</pre>Used versions: spring 4.0.3.RELEASE, spring-boot 1.0.2.RELEASEUnknownnoreply@blogger.com2tag:blogger.com,1999:blog-5922395749618357877.post-42742815244284340732014-04-03T23:08:00.000+02:002014-04-06T18:18:30.054+02:00Encrypted filesystem on top of an LVMI always forget how to create an encrypted partition inside an lvm, so that's a good reason to write it down. I'm not an admin so there might be better ways to do the same.<br />
<br />
Objective: To have an encrypted device, unmounted by default, ready for one-click mount that asks for the password.<br />
<br />
My tools:<br />
<ul><li>lvm2</li>
<li>system-config-lvm - gui for LVM management</li>
<li>palimpsest - gui for mounting and unmounting encrypted LUKS devices inside lvm volumes, and for renaming the VG (volume group), the LV (logical volume) and the filesystem inside an encrypted LUKS volume</li>
<li>ubuntu 12.04 LTS</li>
<li>mate (it offers one-click mount but probably any other desktop environment will do)</li>
</ul>During the whole process the Tab key will be your friend. It will help with luks subcommands, with finding luks devices etc. So let's start:<br />
<h4>Creating encrypted FS</h4><ol><li>Create partitions that don't waste space but can store your extents and lvm metadata. Something around (n*extent_size)+1mb. Create new partition in Gparted, format as LVM and compare the total size with the available size.</li>
<li>Create desired VGs (my_VG) and LVs (my_LV) using <code>system-config-lvm</code></li>
<li>Create luks volume using whole available space on created LV. You will be asked for a passphrase<br />
<pre class="brush:bash;gutter:false">sudo cryptsetup luksFormat /dev/mapper/my_VG-my_LV
</pre></li>
<li>Open created luks volume and register it under custom name (my_luks)<br />
<pre class="brush:bash;gutter:false">sudo cryptsetup luksOpen /dev/mapper/my_VG-my_LV my_luks</pre></li>
<li>Create filesystem using whole available space on opened luks volume<br />
<pre class="brush:bash;gutter:false">mkfs.ext4 /dev/mapper/my_luks
</pre></li>
<li>(Un)mount volumes and change all the labels using <code>sudo palimpsest</code></li>
</ol>That's it. Your system (places or drivemount_applet2) will report the existing encrypted LV waiting to be mounted. By default the file system label will be used as the mount point for one-click mount.<br />
<br />
<h4>Shrinking encrypted FS</h4>There is often some confusion which tools use kilo- prefix for 2^10 and which for 10^3. It looks like all commands used below use the 2^10.<br />
<ol><li>Prepare for the shrinking. Open luks, check the FS size and the FS itself. <br />
<pre class="brush:bash;gutter:false">sudo cryptsetup luksOpen /dev/mapper/my_VG-my_LV my_luks
sudo mount -r /dev/mapper/my_luks /any_custom_dir/
df -h
df -B1k # show size in kilobytes (2^10 bytes)
sudo umount /any_custom_dir/
e2fsck -f /dev/mapper/my_luks
</pre></li>
<li>Shrink the FS a bit more than you really need to (about 90%). Provide new size or shrink it maximally with <code>-M</code>. Option <code>-p</code> adds a progress bar. In my case (ext4) system didn't allow me to shrink below the possible minimum size so it looks like you don't have to calculate the new size very precisely. Units: 2^10<br />
<pre class="brush:bash;gutter:false">resize2fs -Mp /dev/mapper/my_luks</pre>or<br />
<pre class="brush:bash;gutter:false">resize2fs -p /dev/mapper/my_luks 50G</pre>Check if all is good<br />
<pre class="brush:bash;gutter:false">e2fsck /dev/mapper/my_luks</pre></li>
<li>Shrink the LV. There is no need to shrink the luks volume as it doesn't have the concept of 'fixed size'. During every mount, luks uses all the space of the underlying device. But shrinking the LV is the tricky part. I couldn't find a way to precisely calculate the space. It looks like <code>df</code> shows the FS size including its metadata but I don't know which size is used by <code>resize2fs</code>. Anyway, remember about the FS metadata (<a href="http://rwmj.wordpress.com/2009/11/08/filesystem-metadata-overhead/">a few percent</a>) and luks metadata (<a href="http://code.google.com/p/cryptsetup/wiki/FrequentlyAskedQuestions#2._Setup">around 2MB</a>). So add about 10% (depending on your FS type) to the FS size and shrink the LV to that size. <br />
<br />
There are many ways to express the new size (absolute value, difference, percentage etc. use man). The man page never says if it uses 2^10 or 10^3 units but it 'seems' like the first one.<br />
<br />
LVM will <b>not</b> warn or prevent you from overwriting your data so bad assumption, calculation or a typo will result in a data loss.<br />
<pre class="brush:bash;gutter:false">sudo cryptsetup luksClose my_luks
sudo lvreduce -L 900G /dev/my_VG/my_LV
</pre>Check how much damage was done<br />
<pre class="brush:bash;gutter:false">sudo cryptsetup luksOpen /dev/mapper/my_VG-my_LV my_luks
e2fsck -f /dev/mapper/my_luks
</pre></li>
<li>If <code>e2fsck</code> says that you've just shrunk your LV too much, don't panic. Just close luks, extend the LV a bit with <code>lvextend -L</code>, open luks and check your FS again.<br />
<code>df</code> said my FS was 811G but shrinking the LV to 820G (<code>lvreduce -L 820G</code>) was too much. However, extending it to 830G was enough.<br />
</li>
<li>Grow the FS to fill whole underlying LV, otherwise it would be a waste of space<br />
<pre class="brush:bash;gutter:false">resize2fs -p /dev/mapper/my_luks
e2fsck /dev/mapper/my_luks
</pre></ol>That's all. Some other useful commands to check the status of devices: http://ubuntuforums.org/showthread.php?t=726724Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-18454731027488341742014-03-22T02:01:00.001+01:002014-03-22T02:01:52.143+01:00Refactor, don't reinvent the wheelRecently I saw a training regarding clean code and refactoring. One of the examples of bad code shown there was something like this:<br />
<pre class="brush: java">public void setName(String name) {
this.name = name;
if (this.name != null ) {
if (this.name.length() > 30) {
this.name = name.substring(0,30);
this.name = this.name.toUpperCase();
} else {
this.name = this.name.toUpperCase();
}
}
}
</pre>And after a few slides the code was refactored to the final version:<br />
<pre class="brush: java">public void setName(String name) {
if (!isValid(name)) {
this.name = null;
}
this.name = limit(name, to(30)).toUpperCase();
}
private boolean isValid(String name) {
return name != null;
}
private String limit(String input, int limit) {
return input.substring(0, limit);
}
private int to(int x) {
return x;
}
</pre>And I will argue that's a very wrong approach or a really bad example. Why? Because every language has its idioms and most commonly used tools that became de-facto standards and are well known and understood. In java world it's guava, apache commons, lambdaj etc. Using those libraries, you can be much more functional, null-safe and concise. You can use well known existing functions instead of creating new ones and learn them again in each project. In my opinion, much more readable would be:<br />
<pre class="brush: java">public void setName(@Nullable String name) {
this.name = StringUtils.upperCase( StringUtils.left(name, 30));
}
</pre>or in case we'll need it more than once:<br />
<pre class="brush: java">public static String upperCasePrefix(@Nullable String input, int limit) {
return StringUtils.upperCase(StringUtils.left(input, limit));
}
public void setName(@Nullable String name) {
this.name = upperCasePrefix(name, 30);
}
</pre>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5922395749618357877.post-29043932802000796922014-03-21T04:12:00.000+01:002014-03-21T04:39:02.561+01:00Use unicode for better namesLet's say in a java application we have a few tabs and sometimes we hide some of them. So now we want to document a new requirement and of course we do it as a test:<br />
<pre class="brush:java">@Test
public void should_hide_more_tab_when_no_additional_information_is_available() {
...
}
</pre>but wait... what exactly does it mean? Should our application hide more tabs than it usually does? Or is there a tab named 'more' that should be hidden? How can we clarify this? After a quick look at the unicode char table, we pick the <code>ʻ</code> char (or any other that makes you happy). It's U+02BB, the 'modifier letter turned comma', and more information can be found for example <a href="http://www.fileformat.info/info/unicode/char/2bb/index.htm">here</a>. There is a table with detailed information about that character and the interesting part is:<br />
<pre class="brush:plain">Character.isJavaIdentifierPart() Yes
</pre>Cool! So let's write:<br />
<pre class="brush:java">@Test
public void should_hide_ʻmoreʻ_tab_when_no_additional_information_is_available() {
...
}
</pre>Is this test more readable now?<br />
<br />
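If you'd rather not rely on a web page, the same check can be done directly in Java. This is just a small sketch; the class name <code>IdentifierCheck</code> and its helper method are made up for the illustration, while <code>Character.isJavaIdentifierPart</code> is the standard JDK method mentioned in the table above.<br />

```java
final class IdentifierCheck {

    // True when the code point may appear inside a Java identifier
    // (anywhere except, possibly, the first position).
    static boolean usableInIdentifier(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        // U+02BB, the turned comma used in the test name above
        System.out.println("U+02BB: " + usableInIdentifier(0x02BB));
        // A plain ASCII apostrophe, for comparison
        System.out.println("apostrophe: " + usableInIdentifier('\''));
    }
}
```

The turned comma is classified as a letter, which is why the compiler accepts it, while the ordinary apostrophe is rejected.<br />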
ps.<br />
For example in racket (a dialect of lisp) you can define lambdas using λ:<br />
<pre class="brush:plain">(λ(x) (+ x 1))
</pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-48827401951889211922014-03-20T02:31:00.000+01:002014-03-20T02:48:58.027+01:00The myth of random data in unit testsMany times I see people generate random data for any irrelevant variable in tests:<br />
<pre class="brush: java">String anyName = RandomStringUtils.random(8);
Customer customer = customerBuilder()
.withName(anyName)
.withAge(18)
.build();
assertThat(customer).isAdult();
</pre>First of all, it would probably be better if this test looked something like this:<br />
<pre class="brush: java">Customer customer = newCustomerWithAge(18);
assertThat(customer).isAdult();
</pre>I know, I know: sometimes tests are a bit more complex and badly written and you just need the name as a constant. So why not simply:<br />
<pre class="brush: java">private static final String ANY_NAME = "John";
...
customer = customerBuilder()
.withName(ANY_NAME)
.withAge(18)
.build();
assertThat(customer).isAdult();
</pre>Does the random generator make you feel safer? If the name is irrelevant, why bother generating it? It just makes your code less readable.<br />
<br />
But some people go even further. Let's say we want to test <a href="http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#contains%28java.lang.CharSequence,%20java.lang.CharSequence%29">StringUtils.contains</a> from the Apache Commons. Some people want to generate the significant parameters:<br />
<pre class="brush: java">String random1 = randomString();
String random2 = randomString();
...
assertTrue(StringUtils.contains(random1 + random2 + random3, random2));
</pre>Easy, right? But how will we test if it returns <code>false</code> correctly? Now our random data needs to obey some specific constraints. So it's rather hard to generate the data without, in fact, implementing the functionality again in tests. Another problem is that when you have such tests you think everything is tested and you stop thinking about corner cases.<br />
<br />
But is everything really tested? What about nulls? What about empty strings? What about combinations of them? And even if your generator can produce nulls and empty strings, still: is everything tested?<br />
<br />
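For comparison, here is roughly what writing those corner cases down explicitly could look like. It's only a sketch: the <code>contains</code> method below is a local stand-in written for this example (so that it runs without any library), assuming the null handling described in the StringUtils javadoc, and the class name and the list of cases are my own picks.<br />

```java
final class ContainsEdgeCases {

    // Hypothetical stand-in for the method under test: null on either side gives false.
    static boolean contains(String text, String search) {
        if (text == null || search == null) {
            return false;
        }
        return text.contains(search);
    }

    // Each row: text, search, expected result. Every row is a deliberate decision.
    static final Object[][] CASES = {
        {null, null, false},
        {null, "a", false},
        {"abc", null, false},
        {"", "", true},
        {"abc", "", true},
        {"", "a", false},
        {"abc", "b", true},
        {"abc", "abc", true},
        {"abc", "abcd", false},
        {"abc", "B", false},
    };

    // Counts the rows where the actual result differs from the expected one.
    static int failures() {
        int failed = 0;
        for (Object[] row : CASES) {
            boolean actual = contains((String) row[0], (String) row[1]);
            if (actual != (Boolean) row[2]) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("failed cases: " + failures());
    }
}
```

Ten boring, deterministic rows, and each of them documents a decision that a random generator would only hit by accident.<br />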
How often will your random test run before the tested code goes to production? If you do continuous delivery then the test will run a few times during your local development, once on your CI server and... that's it. If you're not lucky enough to do continuous delivery then let's assume your commit goes to production in 3 weeks. Probably soon there will be a feature freeze and branch stabilization. How many times will this test run? 50 times on the CI server? Random tests are totally useless when they run only a few times. Of course you may expect those tests to run many times during local development by the rest of your team but...<br />
<br />
If it fails on someone else's machine, are you sure he will record the test result? Wait! There will be no result! There will be only the information that <code>true</code> was expected but <code>false</code> was returned. So you have to remember to add logs to all your random tests. And even if logs are being dumped, are you sure that the other developer (who has to deliver his own, completely different functionality) will care about an irrelevant, non-deterministic test failure? Because the other option is to simply re-run the tests, see the green light, commit and go home. No one will ever know.<br />
<br />
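If you insist on random data anyway, the bare minimum is to make every failure reproducible. A rough sketch of what that means: generate everything from a single seed and put the seed and the generated values into the failure message. All names below are made up for the illustration and plain <code>String.contains</code> stands in for the tested code.<br />

```java
import java.util.Random;

final class SeededRandomCheck {

    // Builds a lowercase string of the given length from the supplied generator.
    static String randomLowercase(Random random, int length) {
        StringBuilder result = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            result.append((char) ('a' + random.nextInt(26)));
        }
        return result.toString();
    }

    // Returns null when the check passes, otherwise a message that is enough to replay it.
    static String failureFor(long seed) {
        Random random = new Random(seed);
        String prefix = randomLowercase(random, 3);
        String needle = randomLowercase(random, 3);
        String suffix = randomLowercase(random, 3);
        String haystack = prefix + needle + suffix;
        if (haystack.contains(needle)) {
            return null;
        }
        return "seed=" + seed + " haystack=" + haystack + " needle=" + needle;
    }

    public static void main(String[] args) {
        long seed = System.nanoTime(); // a different seed on every run
        String failure = failureFor(seed);
        System.out.println(failure == null ? "ok (seed " + seed + ")" : failure);
    }
}
```

The same seed always produces the same data, so a failure seen once can be replayed by calling <code>failureFor</code> with the logged seed. It still doesn't answer who is going to read that log.<br />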
Let's face it, it can't work this way. If you are not sure if your test data is good enough then:<br />
<ul><li>Simplify your code. Extract methods/classes, avoid ifs, avoid nulls, be more immutable and functional.</li>
<li>Try to analyze the edge cases and include them in your tests.</li>
<li>If needed, throw away the part of code and start again doing TDD. If you've never tried it, you will be surprised how different the design can be.</li>
</ul>Seriously, those three rules will almost always be enough. That's because the sad truth is that the vast majority of development is typical corporate maintenance. It's not rocket science and all the complexity is usually <a href="http://en.wikipedia.org/wiki/Accidental_complexity">accidental</a>. But the refactoring can be expensive. And if the above rules are not enough:<br />
<ul><li>Generate a lot of random data sets, look at them and check if some of them differ from what you had in mind when designing your code. And, of course, add new cases to your tests.</li>
<li>Use mutation testing.</li>
<li>Whenever a bug is discovered during development, UAT or production, add new cases to your tests to avoid regression.</li>
<li>Do <i>real</i> random testing. Keep the testing server running 24/7. Every generated data that breaks the tests should be logged and added to your deterministic unit tests.</li>
</ul>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-9150198936121995922013-11-08T01:27:00.000+01:002013-11-08T01:35:11.289+01:00Date type in H2 vs OracleI often use H2 to locally develop code that is supposed to work on Oracle. Yes, yes, I can hear all the 'Why don't you work on production-like environment?!'. Well, I do it because I'm lazy. When the development configuration has no dependency on external infrastructure then every new developer or tester can simply do: <code>git clone ... && mvn test && mvn jetty:run</code> and it just works. Immediately. Without bothering me with questions about the setup, configuration, passwords etc. CI server can build each new branch simultaneously. Local tests run quickly as there is no network communication. It's simply really convenient.<br />
<br />
Of course, it has its price. Compatibility. Most applications are really simple and don't need any vendor-specific features. But sometimes...<br />
<pre class="brush: sql">create table my_table (my_column date);
insert into my_table values (timestamp '2013-01-23 13:23:34');
select my_column from my_table;
</pre>So what's the result of the select statement?<br />
<pre class="brush: text; gutter: false;">oracle: January, 23 2013 13:23:34+0000
H2: 2013-01-23
</pre>What happened to the time? Let's see more details<br />
<pre class="brush: sql">select cast(my_column as timestamp) from my_table;
</pre><pre class="brush: text; gutter: false;">oracle: January, 23 2013 13:23:34+0000
H2: 2013-01-23 00:00:00.0
</pre>Yep, the time is <i>silently</i> truncated. Oracle's <i>Date</i> stores the time, as opposed to H2, mysql, postgres and probably most others. And this will affect all frameworks you use: jdbc, dbunit, hibernate etc. So, if possible, use the <i>Timestamp</i> type or design your application so that it doesn't matter.<br />
<br />
Tested databases (thanks to <a href="http://sqlfiddle.com/">sqlfiddle.com</a>):<br />
<ul><li>Oracle 11g R2</li>
<li>H2 1.3.171</li>
<li>MySql 5.1.61</li>
<li>PostgreSQL 9.3.1</li>
</ul><br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-17661843204960076062013-07-27T01:53:00.000+02:002013-07-27T12:19:37.568+02:00404 error with spring mvc testingWe have a standard rest application built with spring mvc (v3.2.3). Let's add another controller (MyController) with one method that does some computations and returns an empty response with 200 OK. Test first:<br />
<pre class='brush:java'>import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.*;
import org.junit.*;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.setup.MockMvcBuilders;
public class MyControllerTest {
private MockMvc mockMvc;
@Test
public void test() throws Exception {
mockMvc.perform(get("/my/test"))
.andExpect(status().isOk())
.andExpect(content().string(""));
}
@Before
public void setUp() throws Exception {
mockMvc = MockMvcBuilders.standaloneSetup(new MyController()).build();
}
}
</pre>Nothing new, everything just as in spring's tutorial. So now let's write the code:<br />
<pre class='brush:java'>import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
@Controller
public class MyController {
@RequestMapping("/my/test")
public void test() {
System.out.println("test method executed"); //yes, yes, I know
}
}
</pre>Test is green, that was easy. Let's see our code in action. Start jetty and check the address <code>/my/test</code>. Method is executed and we get... 404 not found. Wtf?!<br />
<br />
Probably the quickest way to find the problem is to compare the method with other working controllers. Of course, I forgot about <code>@ResponseBody</code>. After adding it, jetty displays the page correctly and the test still passes. But the purpose of writing tests is to have protection against such mistakes. So why was the test green?<br />
<br />
For <code>void</code> methods without <code>@ResponseBody</code> spring forwards request processing to <code>DispatcherServlet</code> which, in this case, fails trying to resolve a view for the specified url. But for some reason <code>mockMvc</code> reports empty response and status 200. I reported it as a bug but it got status 'Works as Designed'. So how can we eliminate the false positive? We need to explicitly check if no forwarding is done. And there is an existing <code>ResultMatcher</code> for this:<br />
<pre class='brush:java'>forwardedUrl(null)
</pre>It can be added with another <code>andExpect</code> inside each test. But turning it on globally will save you from such mistakes in future:<br />
<pre class='brush:java;first-line:20'>@Before
public void setUp() throws Exception {
mockMvc = MockMvcBuilders.standaloneSetup(new MyController())
.alwaysExpect(forwardedUrl(null))
.build();
}
</pre>After adding this matcher, the test without <code>@ResponseBody</code> fails:<br />
<pre class='brush:text'>java.lang.AssertionError: Forwarded URL expected:<null> but was:<my/test>
</pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-40985244201613582782013-07-08T00:04:00.001+02:002013-07-10T19:56:23.078+02:00Testing time-dependent codeOften people do something like that<br />
<pre class="brush:java">public List<Notification> findNotificationsToSend() {
List<Notification> notifications = repository.findDailyNotifications();
int today = Calendar.getInstance().get(DAY_OF_WEEK);
if (today == SUNDAY)
notifications.addAll(repository.findAllWeeklyNotifications());
return notifications;
}
</pre>and then you can be sure they don't use TDD, because there is absolutely no way you can end up with such bad code when you have tests. But what can we do about the calendar? Same as always. Have you noticed that when there is a call to a database or another external system, people immediately say: 'extract and mock'? But when there is a call to the JVM's infrastructure, they have no idea what to do. And the answer is simple: 'extract and mock'. Does it mean people just repeat previously seen schemes without thinking?<br />
We can start refactoring with:<br />
<pre class="brush:java">public class TimeProvider {
public int dayOfWeek() {
return Calendar.getInstance().get(DAY_OF_WEEK);
}
}
</pre>That's a good start. Now it's easy to test <code>findNotificationsToSend</code>, but <code>TimeProvider</code> can still contain some complicated time calculations which are not testable. And it will grow with calendar-dependent methods. How to clean it up?<br />
<ul><li>Switch to Joda-Time. It has a much better API that protects TimeProvider from uncontrolled growth.</li>
<li>TimeProvider should contain only frequently used, calendar-like, parameterless methods that depend on the current time. And nothing else. 'isSunday' and 'beginningOfQuarter' are fine but 'shouldIncludeWeeklyNotification' is not.</li>
<li>Completely separate access to the JVM's infrastructure from time calculations. In this case I usually choose inheritance over composition because TimeProvider will never gain any additional dependencies. After all, even business guys don't change the definition of Sunday.</li>
</ul>The following code usually works for me.<br />
<pre class="brush:java">abstract class TimeProvider {
protected abstract long currentMillis();
public final DateTime now() {
return new DateTime(currentMillis());
}
public final boolean isSunday() {...} //if often used
// other common business methods. all final.
}
public final class RealTimeProvider extends TimeProvider {
protected long currentMillis() {
return System.currentTimeMillis();
}
}
public class TestTimeProvider extends TimeProvider {
private long currentMillis;
public TestTimeProvider() {
this("2013-05-17"); // preset time; handy for tests
}
public TestTimeProvider(String currentTime) {
setTime(currentTime);
}
public void setTime(String currentTime) {
currentMillis = parseTime(currentTime);
}
protected long currentMillis() {
return currentMillis;
}
private static long parseTime(String time) {...}
}
</pre>Of course, we use <code>TestTimeProvider</code> in unit and spring-context tests. Often it's handier than mocks. If needed, add similar support for time zones. <br />
Now we can test <code>findNotificationsToSend</code>, control time during integration tests and test the <code>isSunday</code> method:<br />
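To make the time zone remark concrete, here is a minimal sketch of the same pattern. It rests on a few assumptions: <code>java.time</code> stands in for Joda-Time so that the example has no dependencies, and <code>ZonedTimeProvider</code>, <code>FixedTimeProvider</code> and <code>NotificationFinder</code> are hypothetical names, with plain strings standing in for <code>Notification</code> objects.<br />

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;

// Sketch only: same shape as TimeProvider above, plus an explicit time zone.
abstract class ZonedTimeProvider {
    private final ZoneId zone;

    protected ZonedTimeProvider(ZoneId zone) {
        this.zone = zone;
    }

    // The only method that touches the JVM's infrastructure (or a test fixture).
    protected abstract long currentMillis();

    protected final ZoneId zone() {
        return zone;
    }

    public final ZonedDateTime now() {
        return Instant.ofEpochMilli(currentMillis()).atZone(zone);
    }

    public final boolean isSunday() {
        return now().getDayOfWeek() == DayOfWeek.SUNDAY;
    }
}

// Test double: time stands still until setTime is called.
final class FixedTimeProvider extends ZonedTimeProvider {
    private long currentMillis;

    FixedTimeProvider(String isoDate, ZoneId zone) {
        super(zone);
        setTime(isoDate);
    }

    static FixedTimeProvider utc(String isoDate) {
        return new FixedTimeProvider(isoDate, ZoneOffset.UTC);
    }

    // Accepts dates such as "2013-04-14"; the time is set to midnight in the provider's zone.
    void setTime(String isoDate) {
        currentMillis = LocalDate.parse(isoDate).atStartOfDay(zone()).toInstant().toEpochMilli();
    }

    @Override
    protected long currentMillis() {
        return currentMillis;
    }
}

// Hypothetical consumer modelled on findNotificationsToSend.
class NotificationFinder {
    private final ZonedTimeProvider time;

    NotificationFinder(ZonedTimeProvider time) {
        this.time = time;
    }

    List<String> findNotificationsToSend() {
        List<String> notifications = new ArrayList<>(List.of("daily"));
        if (time.isSunday()) {
            notifications.add("weekly");
        }
        return notifications;
    }
}
```

With <code>FixedTimeProvider.utc("2013-04-14")</code> (a Sunday) the finder returns both entries; after <code>setTime("2013-04-15")</code> it returns only the daily one.<br />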
<pre class="brush:java">@RunWith(ZohhakRunner.class)
public class TimeProviderTest {
TestTimeProvider timeProvider = new TestTimeProvider();
@TestWith({
"2013-04-14, true",
"2013-04-15, false"
})
public void should_detect_sunday(String date, boolean shouldBeSunday) {
timeProvider.setTime(date);
boolean isSunday = timeProvider.isSunday();
assertThat(isSunday).isEqualTo(shouldBeSunday);
}
}
</pre>One place where a time provider alone is not enough is integration testing, when we start the whole server to imitate the production environment and connect to it over HTTP. But that's a story for another post.<br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-28864827460713807542013-06-23T00:11:00.000+02:002013-06-23T00:34:22.812+02:00Testing cron expression<br />
Many people who use Spring scheduling write<br />
<pre class="brush:java">@Scheduled(cron = "0 0 3 * * ?")
public void myCronJob() {...
</pre>and then they wait a few days and check the logs to see whether the job was triggered correctly. And what about: <br />
<pre class="brush:plain;gutter:false">0 0/5 14,18,3-39,52 ? JAN,MAR,SEP MON-FRI 2002-2010</pre>Fortunately, testing a cron expression is simple. But first we need a constant:<br />
<pre class="brush:java">public static final String EVERYDAY_3_AM = "0 0 3 * * ?";</pre>Note the leading <code>0 0</code>: with <code>* * 3 * * ?</code> the job would fire every second between 3:00 and 3:59. And now, with Spring's scheduling, we can use <code>org.springframework.scheduling.support.CronSequenceGenerator</code>:<br />
<pre class="brush:java; highlight: 27">import static org.fest.assertions.api.Assertions.assertThat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.junit.BeforeClass;
import org.junit.runner.RunWith;
import org.springframework.scheduling.support.CronSequenceGenerator;
import com.googlecode.zohhak.api.Coercion;
import com.googlecode.zohhak.api.TestWith;
import com.googlecode.zohhak.api.runners.ZohhakRunner;
@RunWith(ZohhakRunner.class)
public class CronTest {
static CronSequenceGenerator everyday_3am;
@TestWith({
"2013-06-10 22:20, 2013-06-11 03:00",
"2013-06-13 01:12, 2013-06-13 03:00"
})
public void should_trigger_at_the_nearest_3_AM(Date now, Date nearest_3am) {
// when
Date nextExecution = everyday_3am.next(now);
//then
assertThat(nextExecution).isEqualTo(nearest_3am);
}
@BeforeClass
static public void parseExpression() {
everyday_3am = new CronSequenceGenerator(Constants.EVERYDAY_3_AM);
}
@Coercion
public Date coerce(String date) throws ParseException {
return new SimpleDateFormat("yyyy-MM-dd HH:mm").parse(date); // HH = 24-hour clock
}
}
</pre>We use a static variable only to avoid parsing the same expression repeatedly, because complex scenarios may need many test parameters.<br />
The same can be achieved with the Quartz library. To do this, just replace <code>CronSequenceGenerator</code> with <code>org.quartz.CronExpression</code>:<br />
<pre class="brush:java; first-line: 27">Date nextExecution = everyday_3am.getNextValidTimeAfter(now);
</pre><pre class="brush:java; first-line: 35">everyday_3am = new CronExpression(Constants.EVERYDAY_3_AM);
</pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-52432198800366029462013-06-09T11:37:00.000+02:002013-11-02T15:06:09.199+01:00CodeGenerationException and proxies<br />
Recently I saw one of our tests fail with the message:<br />
<pre class="brush:text; gutter:false">Exception of type java.lang.IllegalStateException expected but was not thrown.
Instead an exception of type class org.mockito.cglib.core.CodeGenerationException
with message 'java.lang.reflect.InvocationTargetException-->null' was thrown.
</pre>But let's start from the beginning. We have a Spring web application that contains only 2 classes. The first one is a standard session-scoped component:<br />
<pre class="brush:java">@Component
@Scope(value = "session", proxyMode = ScopedProxyMode.TARGET_CLASS)
public class MySessionComponent {
public void doError() {
throw new IllegalStateException();
}
public void doNothing() {}
}
</pre>Nothing new, right? The second component is a standard singleton:<br />
<pre class="brush:java">@Component
public class MySingleton {
@Autowired MySessionComponent mySessionComponent;
public void sampleAction() {
mySessionComponent.doNothing();
}
}
</pre>That's the whole application. Now let's test it.<br />
<pre class="brush:java">import org.junit.Test;
import static com.googlecode.catchexception.apis.CatchExceptionBdd.*;
public class MySessionComponentTest {
@Test
public void test() {
when(new MySessionComponent()).doError();
thenThrown(IllegalStateException.class);
}
}
</pre>I use <a href="http://code.google.com/p/catch-exception/">catch exception</a> v1.0.4 and the test passes. Now, let's do an integration test:<br />
<pre class="brush:java">@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration
@WebAppConfiguration
public class MySingletonTest {
@Configuration @ComponentScan static class TestAppContext {} // just registers 2 components
@Autowired MySingleton myController;
@Test
public void test() {
myController.sampleAction();
}
}
</pre>All tests pass. And now, let's suppose that we need, for whatever reason, to combine those tests:<br />
<pre class="brush:java; first-line:10">@Test
public void test() {
myController.sampleAction();
when(new MySessionComponent()).doError();
thenThrown(IllegalStateException.class);
}
</pre>And we get the exception. WTF? After some time spent with the debugger I found an <code>InvocationTargetException</code> thrown inside <code>org.mockito.cglib.core.AbstractClassGenerator</code>. The problem is that it has neither a cause nor a detailed message, so the real reason is not propagated anywhere and you can't find it in any logs. However, this exception has a <code>target</code> field and there we can find:<br />
<pre class="brush:text; gutter:false">java.lang.LinkageError: loader (instance of sun/misc/Launcher$AppClassLoader):
attempted duplicate class definition for name: "MySessionComponent$$FastClassByCGLIB$$441a78f3"
</pre>First, Spring creates a proxy for <code>MySessionComponent</code> in order to autowire beans with different scopes. Then catch-exception tries to create a proxy for the same class. It seems that both frameworks generate the same name for the proxy class, and two classes with the same name are not allowed within one classloader.<br />
<br />
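The classloader rule itself can be reproduced without Spring or cglib. The following is only a sketch under stated assumptions: <code>Payload</code>, <code>DefiningLoader</code> and <code>DuplicateDefinitionDemo</code> are hypothetical names, the same bytes are simply defined twice instead of being generated by two proxy libraries, and it assumes the compiled <code>Payload.class</code> is readable as a classpath resource.<br />

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the generated proxy class.
class Payload {}

// Exposes the protected defineClass so the same name can be defined more than once.
class DefiningLoader extends ClassLoader {
    DefiningLoader() {
        super(Payload.class.getClassLoader());
    }

    Class<?> define(String name, byte[] bytes) {
        return defineClass(name, bytes, 0, bytes.length);
    }
}

class DuplicateDefinitionDemo {

    // Reads the compiled bytes of Payload from the classpath.
    static byte[] payloadBytes() {
        try (InputStream in = Payload.class.getResourceAsStream("Payload.class")) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Defines the same class name twice in ONE loader; the second attempt is rejected.
    static String defineTwiceInOneLoader() {
        byte[] bytes = payloadBytes();
        String name = Payload.class.getName();
        DefiningLoader loader = new DefiningLoader();
        loader.define(name, bytes);
        try {
            loader.define(name, bytes);
            return "no error";
        } catch (LinkageError e) {
            return e.getClass().getSimpleName();
        }
    }

    // The same name in TWO loaders is fine and yields two distinct classes.
    static boolean twoLoadersGiveDistinctClasses() {
        byte[] bytes = payloadBytes();
        String name = Payload.class.getName();
        Class<?> first = new DefiningLoader().define(name, bytes);
        Class<?> second = new DefiningLoader().define(name, bytes);
        return first != second;
    }
}
```

This is why the clash only shows up when both frameworks generate their proxy in the same classloader, as they do in the combined test.<br />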
When we change the order of method invocations (the order of creating proxies)<br />
<pre class="brush:java; first-line:10">@Test
public void test() {
when(new MySessionComponent()).doError();
thenThrown(IllegalStateException.class);
myController.sampleAction();
}
</pre>spring throws an exception but now you can see the real cause in the stacktrace:<br />
<pre class="brush:text; gutter:false">org.springframework.cglib.core.CodeGenerationException: java.lang.reflect.InvocationTargetException-->null
at org.springframework.cglib.core.AbstractClassGenerator.create(AbstractClassGenerator.java:237)
...
Caused by: java.lang.LinkageError: loader (instance of sun/misc/Launcher$AppClassLoader): attempted duplicate class definition for name: "MySessionComponent$$FastClassByCGLIB$$441a78f3"
</pre>Btw, this behavior is strange because cglib <a href="http://cglib.sourceforge.net/apidocs/net/sf/cglib/core/DefaultNamingPolicy.html">claims</a> it can detect name clashes. Maybe it's because almost every framework repackages cglib?<br />
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5922395749618357877.post-86604953610883041512013-05-29T00:56:00.001+02:002013-06-17T20:11:18.859+02:00Functional language for a java developer<br />
Why should you learn it? IMHO, a very important and underestimated argument is: it makes you a better programmer. What is more, you are probably already using one (JavaScript), and Clojure and Scala get more and more attention, so it may be a good investment. But there is another, less obvious reason: it will help you get a Java job. A lot of interview questions are short algorithmic tasks. You can find one at almost every interview. Just pick the task, say that it would be easier to write it in a functional language and do it. Let's see a few examples. I asked Google for 'programming interview questions' and I landed <a href="http://javarevisited.blogspot.com/2011/06/top-programming-interview-questions.html">here</a>. You can easily pick some tasks:<br />
<br />
Create all permutations of a string. Here, haskell version is really impressive<br />
<pre class="brush:haskell">permutation [] = [[]]
permutation xs = [x:ys | x <- nub (xs), ys <- permutation (delete x xs)]
</pre>
In an array 1-100 numbers are stored, one number is missing how do you find it?
<pre class="brush:haskell">missing = succ . length . takeWhile id . zipWith (==) [1..] . sort
</pre>In an array 1-100 exactly one number is duplicate how do you find it?
<pre class="brush:haskell">duplicated = length . takeWhile id . zipWith (==) [1..] . sort
</pre>or after removing duplication:
<pre class="brush:haskell">increasingPrefixLength = length . takeWhile id . zipWith (==) [1..] . sort
missing = succ . increasingPrefixLength
duplicated = increasingPrefixLength
</pre>just for clarification, let's see how it works:
<pre class="brush:haskell">missing [1,3,2,5,6]
-- 4
duplicated [3,2,1,2,4]
-- 2
</pre>From other source: sum of digits of decimal expansion of 100!
<pre class="brush:clojure">(apply + (map #(Integer/parseInt (str %)) (str (apply *' (range 1 101)))))
</pre>number of zeros in decimal expansion of 100!
<pre class="brush:clojure">(count (filter #(= \0 %) (str (apply *' (range 1 101)))))
</pre>and the same in haskell (with currying and compact syntax for function composition)
<pre class="brush:haskell">sum . map digitToInt . show $ product [1..100]
length . filter (== '0') . show $ product [1..100]
</pre>Why is the functional style useful? Often it's simpler because it has fewer edge cases (yes, you have to know the language and understand FP). Of course, the above tasks aren't much more difficult in a procedural approach and, during the interview, you will probably still have to solve them in Java as well. But I guarantee you: after such an answer you are a few points ahead of your competitors.<br />
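Since the Java version is likely to be requested anyway, here is one possible sketch that mirrors the Haskell and Clojure one-liners above using streams. The class and method names are my own, and it assumes Java 9 or newer for <code>IntStream.takeWhile</code>.<br />

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.stream.IntStream;

class InterviewTasks {

    // Length of the longest prefix of the sorted input that equals 1, 2, 3, ...
    static int increasingPrefixLength(int[] numbers) {
        int[] sorted = numbers.clone();
        Arrays.sort(sorted);
        return (int) IntStream.range(0, sorted.length)
                .takeWhile(i -> sorted[i] == i + 1)
                .count();
    }

    // The missing number is the first value that breaks the 1, 2, 3, ... sequence.
    static int missing(int[] numbers) {
        return increasingPrefixLength(numbers) + 1;
    }

    // With one duplicate, the last value of the intact prefix is the duplicated one.
    static int duplicated(int[] numbers) {
        return increasingPrefixLength(numbers);
    }

    static BigInteger factorial(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(BigInteger::valueOf)
                .reduce(BigInteger.ONE, BigInteger::multiply);
    }

    // Sum of the decimal digits of a non-negative number.
    static int digitSum(BigInteger value) {
        return value.toString().chars().map(c -> c - '0').sum();
    }

    // Number of '0' characters in the decimal expansion.
    static long zeroDigits(BigInteger value) {
        return value.toString().chars().filter(c -> c == '0').count();
    }
}
```

For the inputs used earlier, <code>missing(new int[] {1, 3, 2, 5, 6})</code> gives 4 and <code>duplicated(new int[] {3, 2, 1, 2, 4})</code> gives 2, the same as the Haskell version.<br />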
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5922395749618357877.post-92093629460200363022013-05-26T14:28:00.001+02:002013-05-28T22:36:40.420+02:00Everything is a nail<br />
How many times have you rejected someone's idea to use a new tool in a (new) project? How many times have you done it without knowing the pros and cons of the tool? Just because you weren't familiar with it? Why do you use the language / framework / library you use? Because it's the best? Sufficient? If you really believe it, read the famous <a href="http://www.paulgraham.com/avg.html">Beating the Averages</a> (or at least the paragraph about The Blub Paradox).<br />
<br />
<h3>Quiz</h3><br />
Before you expand the code snippets, think for a while about how you would solve the problem. Just try to estimate the complexity. <h4>Question 1</h4>What's the name of the following method? You know this method. If you don't, you should. <pre class="brush:java">public static boolean xxx(String str) {
int strLen;
if (str == null || (strLen = str.length()) == 0) {
return true;
}
for (int i = 0; i < strLen; i++) {
if ((Character.isWhitespace(str.charAt(i)) == false)) {
return false;
}
}
return true;
}
</pre>How much time did you need to read and understand one of the most common functions?<br />
<h4>Question 2 </h4>Can you do it better? Can you make this code more readable? Pick any tool you want. <br />
<pre class="brush:haskell gutter:false; collapse:true; toolbar: true">isBlank = all isSpace
</pre><h4>Question 3</h4>Search all subdirectories and find all mp3 files greater than 9mb. <br />
<pre class="brush:bash gutter:false; collapse:true; toolbar: true">find -iname "*.mp3" -size +9M
</pre>Tiny academic examples? Maybe. Let's try something bigger and more complex.<br />
<h4>Question 4 </h4>Write a sudoku solver. The following solution is from <a href="http://programmablelife.blogspot.co.at/2012/07/adventures-in-declarative-programming.html">Manuel Rotter's blog</a>. <pre class="brush:swipl; collapse:true; toolbar: true">:- use_module(library(clpfd)).
sudoku(Rows) :-
append(Rows, Vs), Vs ins 1..9,
maplist(all_distinct, Rows),
transpose(Rows, Columns),
maplist(all_distinct, Columns),
Rows = [A,B,C,D,E,F,G,H,I],
blocks(A, B, C), blocks(D, E, F), blocks(G, H, I),
maplist(label, Rows).
blocks([], [], []).
blocks([A,B,C|Bs1], [D,E,F|Bs2], [G,H,I|Bs3]) :-
all_distinct([A,B,C,D,E,F,G,H,I]),
blocks(Bs1, Bs2, Bs3).
</pre>Not real-life examples? No company would ever use such strange languages to make money? Well... <a href="http://stackoverflow.com/a/261360">they</a> and <a href="http://www.haskell.org/haskellwiki/Haskell_in_industry">they</a> do. <h4>Question 5</h4>Something from the corporate world. Typical security for a web application. Only users with the admin role (based on the company's LDAP) can access the application, and only via HTTPS. A second login by the same user should terminate their previous session. Each login should create a new session id to prevent session fixation attacks. CSS files should not be protected, for performance reasons. <pre class="brush:xml; auto-links: false; collapse:true; toolbar: true" ><ldap-server url="ldap://mycompany.com:389/dc=mycompany,dc=com" />
<ldap-authentication-provider
user-dn-pattern="uid={0},ou=people"
group-search-base="ou=groups" />
<http pattern="/css/**" security="none"/>
<http auto-config='true'>
<form-login login-page='/login.jsp'/>
<intercept-url pattern="/login.jsp*" access="IS_AUTHENTICATED_ANONYMOUSLY"
requires-channel="https"/>
<intercept-url pattern="/**" access="ROLE_ADMIN" requires-channel="https"/>
<session-management>
<concurrency-control max-sessions="1" />
</session-management>
</http>
</pre>End of quiz. If you have other examples of a language/tool perfectly suited for a specific task, send them to me! Disclosure: the Prolog and Spring examples haven't been tested. <br />
<br />
Of course, you can't freely mix languages. The cost of integrating languages that were not designed for it is rather high: a complicated build process, nonexchangeable data types, different runtimes, bad IDE support and so on. So, before each project, choose your language wisely. Will it cover most of the requirements? Will it cover the most time-consuming ones? Then you won't add another language just to write <code>isBlank</code>, but at some point it may be worth it. Will you recognize it when you reach that point? Cause when all you have is a hammer...Unknownnoreply@blogger.com0