Mapping Greek and Cyrillic characters to Latin in Solr

Today I created an extensive list that maps Unicode characters to Latin characters for Solr. It contains general accent mappings to the plain letter (e.g. é -> e) as well as mappings for Greek and Cyrillic characters.

If you view the files in your browser, you may need to set the encoding to UTF-8 manually.

Exceptions in Tests

What I often encounter in tests is something like:
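A sketch of the kind of test meant here, assuming JUnit 4 on the classpath. SomeException is the name used in the text; the class name and doSomething() are hypothetical stand-ins for the code under test:

```java
import static org.junit.Assert.fail;

import org.junit.Test;

public class CatchAndFailTest {

    // Hypothetical checked exception, standing in for SomeException from the text.
    static class SomeException extends Exception {
        SomeException(String message) {
            super(message);
        }
    }

    // Hypothetical code under test; it declares the exception but does not throw here.
    static void doSomething() throws SomeException {
    }

    @Test
    public void testDoSomething() {
        try {
            doSomething();
        } catch (SomeException e) {
            // Catching only to call fail() adds nothing: the framework would
            // report the exception anyway, and the stack trace is lost here.
            fail("Unexpected exception: " + e.getMessage());
        }
    }
}
```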

This is unnecessary as JUnit and other test frameworks will fail a test if an exception is thrown. So declare the test method to throw SomeException.
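A minimal sketch of a test that simply declares the exception, again assuming JUnit 4; the class name and doSomething() are hypothetical:

```java
import org.junit.Test;

public class DeclaredExceptionTest {

    // Hypothetical checked exception, standing in for SomeException from the text.
    static class SomeException extends Exception {
        SomeException(String message) {
            super(message);
        }
    }

    // Hypothetical code under test.
    static void doSomething() throws SomeException {
    }

    // No try/catch: if SomeException is thrown, JUnit marks the test as
    // failed and reports the full stack trace.
    @Test
    public void testDoSomething() throws SomeException {
        doSomething();
    }
}
```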

Another often encountered snippet is:
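A sketch of that pattern, written in JUnit 3 style (junit.framework.TestCase is assumed to be on the classpath); the class name and doSomething() are hypothetical:

```java
import junit.framework.TestCase;

public class TryFailCatchTest extends TestCase {

    // Hypothetical checked exception, standing in for SomeException from the text.
    static class SomeException extends Exception {
        SomeException(String message) {
            super(message);
        }
    }

    // Hypothetical code under test; it always throws in this sketch.
    static void doSomething() throws SomeException {
        throw new SomeException("boom");
    }

    public void testDoSomethingThrows() {
        try {
            doSomething();
            // Only reached if no exception was thrown.
            fail("Expected SomeException");
        } catch (SomeException e) {
            // Expected. Properties of the exception can be checked here.
            assertEquals("boom", e.getMessage());
        }
    }
}
```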

This was the JUnit 3 idiom for testing that an exception is thrown. With JUnit 4 the idiom changed to:
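A sketch using the expected parameter of the JUnit 4 @Test annotation; the class name and doSomething() are hypothetical:

```java
import org.junit.Test;

public class ExpectedExceptionTest {

    // Hypothetical checked exception, standing in for SomeException from the text.
    static class SomeException extends Exception {
        SomeException(String message) {
            super(message);
        }
    }

    // Hypothetical code under test; it always throws in this sketch.
    static void doSomething() throws SomeException {
        throw new SomeException("boom");
    }

    // The test passes only if SomeException (or a subclass) is thrown.
    @Test(expected = SomeException.class)
    public void doSomethingThrows() throws SomeException {
        doSomething();
    }
}
```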

This automatically fails the test if the exception is not thrown and, in my opinion, is slightly more readable than the previous example. On the other hand, if you need to check the message or other properties of the exception, you still have to use the JUnit 3 idiom. But if you only care that an exception is thrown, you can use the new syntax.

Reading Resources

In my experience, deciding where to place resources and configuration is surprisingly difficult. Often resources are put in files and placed somewhere that is very hard to reach from the packaged jar file. In general, resources that will not or should not change at runtime belong on the classpath, while configuration files that need to be adapted belong in an easily accessible path. For reading configuration files, consider using something like commons-configuration.

In the following I describe my experiences with reading resources and try to give some advice on how to do it right.

getResourceAsStream vs. getResource

getResourceAsStream reads resources from the classpath. This means that resources contained in a jar file can be read easily. Test files can be put into src/test/resources and read from there via getResourceAsStream. There is no need to concatenate horrible paths that only work in the IDE.
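A minimal sketch of reading a classpath resource this way. The class name ClasspathResources and the helper readText are hypothetical, and the content is assumed to be UTF-8 text:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ClasspathResources {

    /**
     * Reads a classpath resource completely and decodes it as UTF-8.
     * A leading "/" makes the name relative to the classpath root.
     */
    static String readText(String absoluteName) throws IOException {
        try (InputStream in = ClasspathResources.class.getResourceAsStream(absoluteName)) {
            // getResourceAsStream returns null instead of throwing
            // when the resource cannot be found.
            if (in == null) {
                throw new FileNotFoundException("Not on classpath: " + absoluteName);
            }
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With this helper, a file stored as src/main/resources/someResource.txt would be read via readText("/someResource.txt"), both in the IDE and from the packaged jar.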

For example, new FileInputStream(new File("src/main/resources/someResource.txt")) will not work once the project is packaged as a jar, because everything under "resources" is put directly into the jar: src/main/resources/someResource.txt becomes someResource.txt. In Maven projects, compiled classes and the files in src/main/resources are copied to target/classes, so relative paths like the one above will not work. Things like ../../src/main/resources/someResource.txt that work in your IDE will stop working when the project is packaged as a jar. It is therefore advisable to use getResourceAsStream for resources that stay constant while the finished, delivered, packaged program runs, such as icons.

The same applies to getResource. It returns a URL that points to the resource on the classpath. If the resource is inside a jar, the URL looks something like jar:file:/path/to/app.jar!/someResource.txt. Such a URL cannot be turned into a File, so code that tries to do so fails with an exception. It is therefore advisable to use streams instead of URLs and files wherever possible.
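The following sketch shows the failure when a jar URL is treated as a file. The class name JarUrlDemo, the helper convertsToFile and the example paths are hypothetical; only the java.io.File(URI) constructor is real API:

```java
import java.io.File;
import java.net.URI;

class JarUrlDemo {

    /** Returns true if the given URI can be converted to a java.io.File. */
    static boolean convertsToFile(URI uri) {
        try {
            new File(uri);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for jar: URIs, because they are not hierarchical file URIs.
            return false;
        }
    }
}
```

Under these assumptions, convertsToFile(URI.create("jar:file:/opt/app/app.jar!/someResource.txt")) returns false, while the same call with file:/opt/app/someResource.txt returns true. Reading through a stream avoids the distinction altogether.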

Relative vs. absolute paths

If resources that change between runs of the program must be read, place them next to the jar file (i.e. in the project root in Eclipse) and access them via new FileInputStream("someResource.txt"). Absolute paths are forbidden.
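A relative name is resolved against the working directory of the JVM, so "next to the jar" only holds if the program is started from that directory. A small sketch to make this visible; the class name RelativePaths and the helper resolve are hypothetical:

```java
import java.io.File;

class RelativePaths {

    /**
     * Shows where a relative file name actually points: it is resolved
     * against the JVM's working directory (system property user.dir).
     */
    static File resolve(String relativeName) {
        return new File(relativeName).getAbsoluteFile();
    }
}
```

Logging resolve("someResource.txt") at startup is a cheap way to find out why a file that was found in the IDE is missing in production.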

In your API or simple components, try to use InputStream instead of File objects, as streams are easier to test. With a File, the test has to create the file and read it in. That is fine as long as the file is not changed. If it is changed, you need to make sure it is reverted to its original state after the test, or your test will only work once. In addition, there will be interesting encoding problems between Windows, Linux and Mac machines. So use InputStream and mock the input with a ByteArrayInputStream fed from String.getBytes(), ideally with an explicit charset, instead of reading a file.
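A minimal sketch of this approach. LineCounter and its methods are hypothetical and only illustrate a component that accepts an InputStream and is fed from memory; the charset is fixed to UTF-8 so the result does not depend on the platform default:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class LineCounter {

    /** The component only sees a stream; it does not know where the data comes from. */
    static int countNonEmptyLines(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Feeds the component from an in-memory string, as a test would. */
    static int countInString(String content) throws IOException {
        InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        return countNonEmptyLines(in);
    }
}
```

A test can then call countInString("a\n\nb\n") and check the result without touching the file system.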

You shouldn’t care how the InputStream is filled. Somebody (the user of your API) or something (a DI framework) will give you the appropriate values. If not, throw an exception, but don’t try to parse or read a file in your simple component or API. In most cases that is the responsibility of another class, so doing it yourself would violate the single responsibility principle, one of the SOLID principles.