Monday, August 2, 2010

QubLib Strings

Strings have been giving me a little bit of a challenge lately. Programming them hasn't been the problem. The problem has been deciding which type of design I want to use.

I've been told that Java uses a design that I will call "Shared Strings". Essentially, if you create the string "hi there" and then somewhere else you create it again, both string references are pointing at the same "hi there" in memory. This is extremely useful in situations where you have 10 bazillion references to "hi there". There is no need to have that same string 10 bazillion times in memory. However, doing it this way does produce a slight overhead problem. I have a StringManager singleton object that keeps track of all of my strings along with how many times they are referenced, very similar to my smart pointer that I introduced a while back. Everytime you create a string, it takes about a log(n) time chunk to see if the string is already in the StringManager. Then upon deletion, it takes another log(n) time chunk to see if the string has any other references. The deletion time is because in an actual String object I am storing a char*, which points to the actual heap-allocated character array in the StringManager. I suppose I could have a String object point to the char* and the integer value that indicates how many references it has. But keeping it the way it is right now is nice because it keeps things simple. What do you think? Are there any other options that I haven't considered yet but should keep in my head? Or, are there other factors that I'm not thinking about that would probably seal the deal for which route I will end up taking?

1 comment:

  1. I like this implementation, especially if the String objects are immutable like Java's. Also, I've heard that heap allocation is a relatively slow process, so a log(n) lookup might actually be faster if that's the case.

    I really can't think of any practical examples where the lookup wouldn't be a good idea.

    ReplyDelete