Java string concatenation can be a nightmare



OK, so you are a keen Java programmer and you think that Java™ rocks. Perhaps you are right. But beware of Java string concatenation, especially when it’s done in a loop with hundreds of thousands or even millions of iterations.

Exercise extra caution when choosing a technique for string concatenation in Java™ programs. Simply using the “+=” operator to concatenate two strings creates a large number of temporary Java objects, since the Java String object is immutable. This can lead to poor performance (higher CPU utilization) since the garbage collector has additional objects to collect. Use the Java StringBuffer object to concatenate strings because it is more efficient.

The above quote comes from an article written in 2001 by an IBM developer.

This is somewhat amusing, because I found code written around 2003 by IBM developers who didn’t care too much about the above advice.

The original code can be found here (as of 2012-09-25):

http://www.jdg2e.com/jdg2e_CD_for_eclipse321/plug-in_development/examples/com.ibm.jdg2e.editor.jfacetext.sql/src-SQLTextEditor/com/ibm/jdg2e/editor/jfacetext/sqleditor/sql/SQLWordStrategy.java

Take a look at the function keyWordsToUpper, where they concatenate two strings inside a huge loop. Multiplied by the number of tokens that can exist in a SQL file, this can make the function return only after 605033 milliseconds (~10 minutes).

I’ve tested that function against a SQL file that has 20263 lines (~1 MB).

OK, so I followed the IBM developer’s advice above: instead of plain string concatenation (i.e. newContent = newContent + token) I used the StringBuffer.append(token) method. The result is astonishing: 115 milliseconds. That is more than 5261 times faster than code that is otherwise identical.
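To make the difference concrete, here is a minimal sketch of the two variants side by side. It is not the original IBM code: the class name ConcatDemo, the method names and the token array are made up for illustration, and only the newContent + token and StringBuffer.append(token) lines mirror what is described above. The timings it prints will vary from machine to machine.

```java
import java.util.Arrays;

// Hypothetical demo class, not taken from SQLWordStrategy.java.
class ConcatDemo {

    // Slow variant: each pass builds a brand new String and copies
    // everything accumulated so far into it.
    static String concatWithPlus(String[] tokens) {
        String newContent = "";
        for (String token : tokens) {
            newContent = newContent + token;
        }
        return newContent;
    }

    // Fast variant: StringBuffer appends into one growable internal buffer.
    static String concatWithBuffer(String[] tokens) {
        StringBuffer newContent = new StringBuffer();
        for (String token : tokens) {
            newContent.append(token);
        }
        return newContent.toString();
    }

    public static void main(String[] args) {
        // Made-up input: 10000 identical tokens.
        String[] tokens = new String[10000];
        Arrays.fill(tokens, "SELECT ");

        long t0 = System.nanoTime();
        String slow = concatWithPlus(tokens);
        long t1 = System.nanoTime();
        String fast = concatWithBuffer(tokens);
        long t2 = System.nanoTime();

        System.out.println("same result:  " + slow.equals(fast));
        System.out.println("+ operator:   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuffer: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods return the same string; only the amount of copying differs. With the + operator the total work grows roughly with the square of the number of tokens, which is why the gap widens so dramatically on large inputs.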

In conclusion: never use plain string concatenation when you know that the code runs in a huge loop and/or the concatenated string can grow without bound (I mean, its size does not depend on you as a programmer but on something known only at runtime, like a file, which can have any length).
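As a rough illustration of that last case, the sketch below collects input of unknown length from a Reader. The class ReadAllDemo and the helper readAll are hypothetical names of my own. It also swaps in StringBuilder, which was added in Java 5 and offers the same append method as StringBuffer without the synchronization; that is my substitution here, not something the quoted IBM article recommends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class for input whose size is only known at runtime.
class ReadAllDemo {

    // Reads every line from the source and joins them with '\n'.
    // The buffer grows as needed, so no String is rebuilt per line.
    static String readAll(Reader source) throws IOException {
        StringBuilder content = new StringBuilder();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            content.append(line).append('\n');
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a real SQL file here.
        String sql = "select * from t;\nselect 1;";
        System.out.print(readAll(new StringReader(sql)));
    }
}
```

In real code the StringReader would be replaced by a FileReader or similar, and the loop body stays the same no matter how long the file turns out to be.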

About Eugen Mihailescu

Always looking to learn more about *nix world, about the fundamental concepts of arithmetic, algebra and geometry. I am also passionate about programming, database and systems administration.