Using Shapeless for Data Cleaning in Apache Spark

When it comes to importing data into a Big Data infrastructure like Hadoop, Apache Spark is one of the most used tools for ETL jobs. Because input data (in this case CSV) often has invalid values, a data cleaning layer is needed. Most data cleaning tasks are very specific and therefore have to be implemented for your particular data, but some tasks can be generalized. In this post, I won't go into Spark, ETL or Big Data in general, but provide one approach to cleaning null / empty values from a data set. [Read More]

Java Libs in Scala - A bit more Functional

Every Java library can be used in Scala, which is, for me, one of the good parts of the JVM world. But Java libs are mostly object-oriented and not functional, therefore full of side effects and sometimes “ugly” to use in Scala. There are, however, some ways to make Java libs (or their interfaces) more functional, so they can be used almost like a Scala lib. Java 8 Type Conversion: Many Java types like Map or List, but also functional types (Java 8) like Optional<T>, have Scala counterparts. [Read More]

Overcoming Checked Exceptions in Java Lambdas

In Java 8, the long-awaited lambda came to life, making it easy(-er) to do FP in Java. One problem I came across is that most Java code throws checked exceptions, which leads to IMHO ugly try/catch blocks in lambdas:

    Function<A, B> fun = (A a) -> {
        try {
            // some function call that throws a checked exception
            return callFn(a);
        } catch (Exception e) {
            // return failure result
        }
    };

The Good, the Bad and the Ugly: A really simple, but also not really nice option is to wrap thrown exceptions in an unchecked one: [Read More]
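To make the wrapping idea concrete, here is a minimal sketch of what it could look like. It is not necessarily the code from the full post: the names `ThrowingFunction`, `unchecked` and `parse` are hypothetical and only serve the illustration.

```java
import java.util.function.Function;

public class UncheckedLambda {

    /** Hypothetical interface: like Function, but allowed to throw checked exceptions. */
    @FunctionalInterface
    interface ThrowingFunction<A, B> {
        B apply(A a) throws Exception;
    }

    /** Hypothetical helper: adapts a throwing function to a plain java.util.function.Function. */
    static <A, B> Function<A, B> unchecked(ThrowingFunction<A, B> fn) {
        return a -> {
            try {
                return fn.apply(a);
            } catch (RuntimeException e) {
                // already unchecked, pass it through unchanged
                throw e;
            } catch (Exception e) {
                // checked exception: wrap it so the lambda compiles as a Function
                throw new RuntimeException(e);
            }
        };
    }

    /** Example function (an assumption for this sketch) that declares a checked exception. */
    static Integer parse(String s) throws Exception {
        if (s.isEmpty()) {
            throw new Exception("empty input");
        }
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        Function<String, Integer> safeParse = unchecked(UncheckedLambda::parse);
        System.out.println(safeParse.apply("42")); // prints 42
    }
}
```

The drawback is visible right away: callers no longer see from the signature that the call can fail, and the original exception is only reachable via `getCause()`.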

Slides: Principles of Object Orientation

Some time ago, in school, there was a somewhat funny situation: The prof was talking about object orientation and patterns. He came up with SRP but left out some other important ones. I asked whether we would talk about all GRASP and SOLID principles, because in my personal opinion, they are part of “the basics”. His answer was not what I was expecting… He told me to prepare a talk about those principles in front of the class. [Read More]

New Blog

You may have noticed that I haven't had much time to blog in the past months… One reason is that we all need money, so I had to work a lot. And as some of you know, I’m also studying in the evenings. Long story short, I’ve restarted blogging with a new engine and a new design once again. The key (apart from using new swag) is to simplify blogging for me, so I can put more time into the articles and less into managing the blog itself. [Read More]

Play Framework Actor Pooling with Guice (Java)

Working with the Play! Framework means working with Akka, intentionally or not. But working with Akka Actors can be tricky, especially when it comes to dependency injection. Play! 2.4 uses Google’s Guice for DI, and of course it can also bind Actors, so an ActorRef can be injected anywhere. Single Actor DI: Binding and injecting a single Actor is simple and well documented. Just bind it in a Module: [Read More]

Scala Compiler Tuning

As my Scala projects go on, I want to share some compiler configuration and tricks that I use on many projects. Some tiny configuration options can greatly improve your code and warn you about things you would probably never discover otherwise. Basically, you can pass compiler options to scalac using console arguments: $ scalac -deprecation -unchecked -Xlint something.scala If you are using SBT, it’s even simpler… You can just use the following configuration snippet in your build. [Read More]

Understanding Stemmers (Natural Language Processing)

I am interested in NLP and already have some experience with Apache Solr. It’s time to dig a little deeper into stemmers. First of all, I was looking for a general definition of what a stemmer is, and I found this one, which IMHO is quite good: “stemmer: an algorithm for removing inflectional and derivational endings in order to reduce word forms to a common stem”. So what a stemmer does is nothing more than converting words to their word stem. [Read More]
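As a rough illustration of that definition, the sketch below strips a handful of English suffixes. It is deliberately naive and is not the algorithm used by Apache Solr or any real stemmer such as Porter's; the class name, the suffix list and the minimum stem length are assumptions made up for this example.

```java
public class NaiveStemmer {

    // Assumed suffix list, checked in order so that longer endings win over shorter ones.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    // Assumed lower bound so that short words like "is" or "red" are left alone.
    private static final int MIN_STEM_LENGTH = 3;

    /** Lowercases the word and removes the first matching suffix, if enough of the word remains. */
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM_LENGTH) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // All three word forms are reduced to the common stem "walk".
        for (String word : new String[] {"walking", "walked", "walks"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

Real stemmers add many more rules (for example for doubled consonants or irregular forms), which is exactly where this simple approach breaks down.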

If pragmatism raises technical debt, call it oversimplification (rant)

The word “pragmatism” or “pragmatic” is, in my personal opinion, the most overrated word in agile development. Many people use it as a buzzword without knowing what it means. I hear people saying “He solved that complex problem in half an hour, he’s so pragmatic!” and think to myself “Yeah, but that ‘solution’ probably causes other devs three times more effort than a sustainable solution would take.” Okay, but that’s only my anger speaking. [Read More]