Michael Angstadt's Homepage
the works and thoughts of a software developer and artist.

Blog

How this page works

See more posts >>

01/26/2012 11:14 pm

The Collections Framework is a group of classes that allow you to group objects together in various ways and perform operations against those groupings. It is contained within the core Java API, so you don't have to install anything extra to use it. Its features are so basic and powerful that pretty much every piece of software written in Java makes use of this framework in one way or another. The framework has been around since Java 1.2 (1998) and has been improved upon with every subsequent Java release. There are four major collection types: Lists, Sets, Queues, and Maps.

The four basic collection types (List, Set, and Queue all inherit from the Collection interface).

Another type of collection in the Java Collections Framework is the set. A set is a collection of elements where duplicates are not allowed. If the programmer tries to add a duplicate element, the set will see that the element already exists and nothing will be added. Sets also do not allow random access like lists do. The only way to retrieve elements from a set is to iterate over all the elements using an Iterator or a foreach loop (so, for example, you cannot directly get the "fourth" element of a set).

Buckets

Sets are really fast when it comes to determining whether an element exists in the set or not. The way sets do this is by making heavy use of the Object.equals() and Object.hashCode() methods. When an object is added to the set, the set uses the object's hash to find a bucket to store the object in. Later, when the set is asked if it contains a given object, the set uses the hash of the given object to find a bucket, and then iterates over each element in that bucket. It calls the equals method on each element in the bucket to determine if the given object is the same as any of the elements in the bucket. This is a lot faster than a list because a list has to iterate over every single element in order to determine if it contains an object. Sets just have to iterate over the elements in a single bucket, which contains only a fraction of the total number of elements in the set.

Importance of hashCode()

However, the efficiency of the set's search operation relies heavily on the quality of the object's hashCode() implementation. If the implementation is good, then the set will spread all of its elements evenly amongst all buckets. However, if the implementation is not good, then the elements will not be spread evenly amongst all buckets. This will cause some buckets to be larger than others, which means they will take longer to search over and decrease the overall performance of the set.

Probably the worst possible hashCode() implementation you could create is one that returns a hard-coded value. It's a completely valid implementation, but it will cause all elements to be stored in a single bucket, making the set's search performance no better than that of a list.

public class BlogPost {
  private String content;
  private Date created;
  private List<String> comments;

  @Override
  public int hashCode(){
    //the worst-possible implementation
    //DO NOT USE!!
    return 1;
  }
}

What you want to do is create a hashCode() implementation that will generate a wide range of values. Luckily, if you use Eclipse (or probably any other IDE), you can have one generated for you. In Eclipse, right-click on the code and go to "Source > Generate hashCode() and equals()". It will prompt you for what class fields you want to include in the hash calculation and then generate the proper code automatically.

public class BlogPost {
  private String content;
  private Date created;
  private List<String> comments;

  @Override
  public int hashCode() {
    //the implementation generated by Eclipse (much better)
    final int prime = 31;
    int result = 1;
    result = prime * result + ((comments == null) ? 0 : comments.hashCode());
    result = prime * result + ((content == null) ? 0 : content.hashCode());
    result = prime * result + ((created == null) ? 0 : created.hashCode());
    return result;
  }
}

Relationship between hashCode() and equals()

Note that Java expects the equals() and hashCode() methods to follow certain rules (read this three times to let it sink in):

  1. If two objects are equal, then they must have the same hash.
  2. But if two objects have the same hash, they are not necessarily equal.

It is your responsibility as a programmer to follow these rules. If you do not, then sets (as well as other collection types from the Collections Framework) will not function correctly. The code that Eclipse generates complies with these rules.

Example

Here's a short code example of how to use a set in Java.

Set<String> users = new HashSet<String>();
users.add("Mark");
users.add("Steve");
users.add("Mark");
users.add("Kelly");
System.out.println(users.contains("mark"));
System.out.println(users.contains("Mark"));
for (String user : users){
  System.out.println(user);
}

On line 1, I create the Set object. I'm assigning the object to a variable of type Set instead of type HashSet so that I can more easily change implementations if I ever need to in the future.

On lines 2-5, I add elements to the set. On line 4, I try adding a duplicate element, but nothing will be added because sets do not allow duplicate elements. No exception will be thrown or anything--the add() method will just return "false" instead of "true".

Line 6 will print "false" because the String "mark" is not the same as the string "Mark". Line 7 will print "true".

On lines 8-10, I iterate over the entire set, printing each user name. This is the only way to retrieve elements from a set.

Set Implementations

There are three implementations of the Set interface. They differ in the way in which they iterate over their elements.

  • HashSet - The most basic set implementation, iterates over elements in no particular order.
  • LinkedHashSet - Iterates over elements in the order in which they were added to the set.
  • TreeSet - Iterates over elements in their sorted order. The way in which the elements are sorted can be defined by having the elements implement the Comparable interface, or by passing in an implementation of the Comparator interface to the TreeSet constructor.
01/23/2012 8:33 am

The Collections Framework is a group of classes that allow you to group objects together in various ways and perform operations against those groupings. It is contained within the core Java API, so you don't have to install anything extra to use it. Its features are so basic and powerful that pretty much every piece of software written in Java makes use of this framework in one way or another. The framework has been around since Java 1.2 (1998) and has been improved upon with every subsequent Java release. There are four major collection "types": Lists, Sets, Queues, and Maps.

The four basic collection types (List, Set, and Queue all inherit from the Collection interface).

In this blog post, I'll discuss the List type. Lists are defined as being an ordered collection of elements where duplicate elements are allowed. Probably the most used list implementation (and probably the most used class in the entire Collections Framework) is the ArrayList class. This class is meant to behave just like an array, with the added ability of being able to add, insert, and remove elements (which cannot be done with normal arrays).

List<String> names = new ArrayList<String>();
names.add("Mark");
names.add("Steve");
names.add("Jill");
names.remove(1);
names.remove("Jill");
names.add(0, "Zach");
Collections.sort(names);
System.out.println(names);

On line 1, I create a new instance of an ArrayList. Using generics, I specify that the list will only contain String objects (generics support was added to the Collections Framework in Java 1.5). If I try to add anything other than a String to my ArrayList, a compilation error will be thrown.

Also notice how I'm assigning my ArrayList object to a variable of type List instead of type ArrayList. List is an interface that ArrayList implements. The reason for doing this is that there are multiple implementations of the List interface. By assigning my ArrayList to a variable of type List, I'm making my code more flexible so that it's not tied to a specific List implementation. The only thing the code cares about is that the List variable satisfies the definition of what a "list" is supposed to be within the Collections Framework (an ordered collection where duplicates are allowed). It is good practice to apply this technique throughout the Collections Framework to maximize code flexibility.

On lines 2-4, I populate my list by adding some elements to it. The add() method appends the element to the end of the list.

On line 5, I remove an element by specifying the element index (Lists are 0-based just like normal arrays, so "0" corresponds to the first element, "1" to the second, etc). This will remove the "Steve" element. Then, on line 6, I remove an element by passing in a String object. When an element is removed in this way, it iterates over each element in the list, using the object's equals() method to determine which element to remove. If it doesn't find any such element, nothing is removed.

On line 6 of the code sample, I insert an element into the List at a specific index. The index I specify is "0", which means it will be inserted at the beginning of the list.

Then, on line 7, I sort the list so that the names are ordered alphabetically. This method requires that the class of the elements in the list implement the Comparable interface, which defines how two objects of that class are compared against each other when they are sorted. The String class already implements this interface, defining that Strings be sorted alphabetically. If you want to sort the list differently, you can pass your own implementation of the Comparator interface as a second argument of the sort() method. For example, to sort the strings by length, you could write:

Collections.sort(names, new Comparator<String>(){
  @Override
  public int compareTo(String element1, String element2){
    //sort Strings by length
    if (element1.length() < element2.length(){
      return -1; //"element1" should come before "element2"
    } else if (element1.length() > element2.length()){
      return 1; //"element1" should come after "element2"
    } else {
      return 0; //"element1" and "element2" are equal
    }
  }
});

Finally, on line 8, I print the list out so I can see what it looks like. Passing an object into the System.out.println() method will print the return value of that object's toString() method. The list's toString() method generates a string representation of the list by calling the toString() method of every list element. It surrounds the entire list in brackets and separates each element with a comma.


[Mark, Zach]
01/14/2012 1:50 pm

StatCounter is a free service that tracks the number of hits your website receives. It records a wealth of information about your visitors, including the type of browser they use, their screen resolution, and their geographic location. It can also tell you whether the user came to your website from a search engine, showing you the exact search query that the user entered.

Adding the tracking script to your site

After creating a StatCounter account, click the "Add Project" link in the menubar at the top of the screen. Fill in all the information and select "Invisible tracking" for the counter type on the right side of the page. This option will make the counter invisible to visitors of your webpage. I use this option because it makes my website look more professional.

The "Invisible tracking" option will make your counter invisible.

It then brings you to a page that contains a long list of different installation instructions for a variety of platforms, frameworks, blogs, and hosting services. Choose the one that best fits your particular setup. If you have a plain, "vanilla" website that doesn't really use anything extra, then click the "Default Installation Guide" link on the right. That's what I'll be doing.

Use the default installation guide if none of the other guides apply to your setup.

The next page displays the code that you'll need to add to your website. The code inside the "Standard" tab is the best overall choice. If you're a stickler for writing websites in well-formed XHTML, use the code in the "Standard (xhtml)" tab. If you don't want to use Javascript on your website, use the code in the "Basic" tab. And if you want your code to be both XHTML-compliant and Javascript-free, use the code in the "Basic (xhtml)" tab. Note that the "Basic" versions do not record as much information about the visitor. Javascript has to be used in order to get certain information, like screen resolution and geographic location.

The "Standard" option is the recommended choice.

I'm going to use the "Standard" code. Copy and paste the code right before the </body> tag on your website. By placing the code here, the StatCounter script won't run until the entire page is loaded. If it were to be placed "higher up" in the HTML, like in the <head> tag, then the rest of the page wouldn't load until the StatCounter script finishes downloading and executing. Sometimes, this can take a few seconds, so it's best to put the script at the end of the page so it doesn't keep the rest of the page from loading.

Once you've added the code to your website, you can check to make sure it works by clicking the "Check Installation" button at the bottom of the installation guide page.

StatCounter can check to make sure you've installed the code correctly.

Creating a blocking cookie

Lastly, you're going to want to add a blocking cookie to your computer. This cookie will prevent your own visits to your website from being tracked so that the project statistics aren't skewed by your own activity. Go to the main "Projects" page and then click the "Blocking Cookie" link in the menu bar at the top of the screen. Then, click the "Create Blocking Cookie" button. This will create a blocking cookie that applies to all of your StatCounter projects.

A blocking cookie will prevent your own visits from being recorded.

Remember that if you have multiple browsers on your computer, you'll have to do this for each browser, because each browser has its own set of cookies.

01/08/2012 2:18 pm

Often times, websites contain a number of small, icon-sized images. These images don't take long to download individually, but collectively, they can slow down the load time of a webpage because one HTTP request has to be made for every image. Plus, the HTTP protocol states that browsers can only make two simultaneous requests per domain at a time. This means that no matter how fast your Internet connection is, only two images can be downloaded at a time.

CSS image sprites help to get around this bottleneck. An image sprite is a single image that has multiple images inside of it. CSS styling is used to "cut out" an individual image from the image sprite and display it on the page. By consolidating all the images into a single image, the browser only has to make one HTTP request, as opposed to making multiple HTTP requests for each image. This decreases the load time of the page.

An image sprite from Amazon.

GWT gives you the ability to combine your application's icons into an image sprite automatically. In this tutorial, I'm going to show you how this is done by adding some images to my Twitter Search application. I will be adding:

  1. An icon for the "Search" button.
  2. An icon for the "Privacy Policy" button.
  3. An icon to display along with error messages.
The TwitterSearch application with the new images.

Creating an image bundle interface

First, create an interface that extends the ClientBundle interface. This is called an image bundle and it's what we'll use to access the individual images from our image sprite.

package com.acme.twittersearch.client;

import com.google.gwt.resources.client.ClientBundle;
import com.google.gwt.resources.client.ImageResource;

public interface Images extends ClientBundle {
  @Source("search.png")
  ImageResource search();

  @Source("error.png")
  ImageResource error();

  @Source("document.png")
  ImageResource privacyPolicy();

  @Source("loading.gif")
  ImageResource loading();
}

The image bundle contains one method per image. Each method must return a ImageResource object and can be given any name. Each method is also annotated with the path to the image file. This path is relative to the location of the image bundle interface. Since my image bundle is in the com.acme.twittersearch.client package, this means that all my images are there too (the images cannot be in the war directory, they have to be inside the classpath, alongside the source code). The GWT Eclipse plugin will tell you if the path to an image is incorrect by throwing a Java compilation error.

I also add the loading animation image to my image bundle. Normally, animated GIFs cannot be added to image sprites, but GWT still allows you to add them to image bundles. They won't be added to the image sprite that GWT generates, but programmatically, they are treated as if they were part of the image sprite.

Adding the images to the application

Now that I've created my image bundle, I can add the images to my application. First, I'll create an instance of the image bundle via deferred binding:

public class TwitterSearch implements EntryPoint {
  private final Images images = GWT.create(Images.class);
  [...]
}

Instances of ImageResource can be passed into the constructor of the Image class, so that's what I'll do for the error and loading images.

public class TwitterSearch implements EntryPoint {
  [...]
  private Image errorImage;
  private Image loadingImage;
  [...]

  private void createWidgets() {
    [...]
    errorImage = new Image(images.error());
    loadingImage = new Image(images.loading());
    [...]
  }
}

The "search" and "privacy policy" images are a little more complicated because of the way the Button widget works. After creating an Image object for each icon, I generate raw HTML code, which will be passed into the Button constructor (note: calling the toString() method on any widget object will return its HTML code).

public class TwitterSearch implements EntryPoint {
  [...]
  private Button privacyPolicyButton;
  private Button searchButton;
  [...]

  private void createWidgets() {
    [...]

    Image icon = new Image(images.search());
    SafeHtmlBuilder builder = new SafeHtmlBuilder();
    builder.appendHtmlConstant(icon.toString());
    builder.appendEscaped(" Search");
    searchButton = new Button(builder.toSafeHtml());
    searchButton.addClickHandler(new ClickHandler() {
      [...]
    });

    icon = new Image(images.privacyPolicy());
    builder = new SafeHtmlBuilder();
    builder.appendHtmlConstant(icon.toString());
    builder.appendEscaped(" Privacy Policy");
    privacyPolicyButton = new Button(builder.toSafeHtml());
    privacyPolicyButton.addClickHandler(new ClickHandler() {
      [...]
    });

    [...]
  }
}

And that's really all there is to it! GWT will create the actual image sprite for you. To see what the image sprite looks like, compile the GWT application and then navigate to the war/twittersearch folder. The image sprite will be somewhere inside this folder (it will have a crazy name like "F0B5712E038983254B46645D77476DA7.cache.png").

GWT automatically creates an image sprite from the images in our image bundle.

More information on GWT image bundles can be found in the GWT Developer Guide.

The complete source code from this tutorial can be downloaded here.

CORRECTION (1/12/2011): Even though GWT generates an image sprite from an image bundle when compiled, it actually only uses the image sprite for IE 6 and possibly IE 7 (thanks Boris). What GWT really does is encode the images as data URIs. A data URI puts the actual image data directly inside the HTML page, instead of referencing the image via a URL like normal (I wrote a brief blog post on data URIs a while back and also wrote a data URI generation tool). I'm not sure why GWT does this, but I'm sure it has its reasons. You should avoid using data URIs for large images--since the image data is embedded directly inside the page, the browser can't cache it, so the image basically has to be downloaded every time the page is loaded. --Mike

01/02/2012 7:24 pm

Highslide is a Javascript library used for creating "lightboxes", or Javascript-based popups, for your website. The Highslide website has an editor that you can use to customize the look and feel of your lightbox. It's actually pretty powerful, allowing you to customize everything from the popup animation to the font size of the image caption. Once you've customized the lightbox to your liking, you can view the HTML, CSS, and Javascript code required to make it work on your own website.

However, if your lightbox has the "graphic outline" setting enabled (this setting can be accessed by clicking the "Border and outline" button on the "Style" tab), then the hs.outlineType configuration variable in the Javascript code will be set to custom.

The "custom" outline type will not work with your own website.

This is not an outline type that comes with the Highslide installation, so if you try using this on your own website, it won't work. What the Highslide Editor does behind the scenes, is make an AJAX call to a custom.php script, which dynamically generates a border image based on the settings you've defined in the editor. As far as I can tell, the editor doesn't provide a way for you to download this image, so you have to get the URL to the custom.php script yourself, and download the image that way. Here's how to do that using Chrome:

  1. Click the Wrench button, then go to "Tools > Developer Tools". This opens a window that shows additional information about the current browser tab.

    Opening the "Developer Tools" window in Chrome
  2. Click the "Network" button at the top of the screen. This screen displays all HTTP requests that the browser tab makes, including the images that it downloads and the AJAX requests that it makes. When you first open this screen, it will be blank. But if you refresh the Highslide Editor webpage, the Network screen will flood with all the requests the browser has to make in order to load the page.

    The "Network" screen shows all the HTTP requests that were made.
  3. Now, click the sample image in the Highslide Editor so that it expands into a popup. This causes the browser to make an AJAX request to the custom.php script (for some reason, it makes the same request twice).

    An AJAX request to the custom.php script is made when the sample image is clicked.
  4. Right-click on the custom.php request in the Network screen and select "Open Link In New Tab". This will open the border image in a new browser tab, allowing you to save it to your local Highslide installation.

    Load the image in a new tab.
  5. Finally, save the border image to the "graphics/outlines" folder of your Highslide installation. Whatever name you give the file (minus the file extension) will be what you'll use in the hs.outlineType setting. For example:

    //uses the "graphics/outlines/rounded-corners.png" file
    hs.outlineType = 'rounded-corners';
See more blog entries >>

How this page works

Last Updated: 1/3/2012

My blog is actually hosted on blogger.com. The way I'm able to display my blog posts here is by parsing the blog's RSS feed. RSS feeds are used by blogs to help alert their avid readers whenever a new post is created. They are just XML files that contain data on the most recent blog posts. They include things like the title and publish date of each post, as well as the actual blog post text. I can use most of the data from my RSS feed without any trouble, but there are a few things I need to tweak in order to display everything properly.

View the source

Fixing the code samples

One tweak is fixing the code samples I often include in my posts. Blogger replaces all newlines in the blog post with <br /> tags. This is a problem because, due to the syntax highlighting library I use, the <br /> tags themselves show up in the code samples. So, I need to replace all of these tags with newline characters. However, I can't just replace all <br /> tags in the entire blog post because I only want to replace the tags that are within code samples. This means that I have to use something a little more complex than a simple search-and-replace operation:

$content = //the blog post
$contentFixed = preg_replace_callback('~(<pre\\s+class="brush:.*?">)(.*?)(</pre>)~', function($matches){
	$code = $matches[2];
	$code = str_replace('<br />', "\n", $code);
	return $matches[1] . $code . $matches[3];
}, $content);

Here, I'm using the preg_replace_callback PHP function, which will execute a function that I define every time the regular expression finds a match in the subject string. I know that each code sample is wrapped in a <pre> tag and that the tag has a class attribute whose value starts with "brush:", so I use that information to find the code samples. Then, for each match the regular expression finds, it calls my custom function, where I have it replace the <br /> tags with newlines.

Fixing the dates

Because the publish dates of each blog post in the RSS feed are relative to the UTC timezone, I also have to make sure to apply my local timezone to each date. Otherwise, the dates will not be displayed correctly (like saying that I made a post at 2am in the morning).

$dateFromRss = 'Tue, 20 Dec 2011 02:30:00 +0000';
$dateFixed = new DateTime($dateFromRss);
$dateFixed->setTimezone(new DateTimeZone('America/New_York'));

Adding Highslide support to images

One extra feature that I included is adding Highslide support to each image (Highslide is a "lightbox" library which lets you view images in special popup windows). To do this, I load the blog post into a DOM, use XPath to query for all links that have images inside of them, and then add the appropriate attributes to the link tag.

$content = //the blog post

//XML doesn't like "&nbsp;", so replace it with the proper XML equivalent
//see: http://techtrouts.com/webkit-entity-nbsp-not-defined-convert-html-entities-to-xml/
$content = str_replace("&nbsp;", "&#160;", $content);

//load the text into a DOM
//add a root tag incase there isn't one
$xml = simplexml_load_string('<div>' . $content . '</div>');

//if there's a problem loading the XML, skip the highslide stuff
if ($xml !== false){
	//get all links that contain an image
	$links = $xml->xpath('//a[img]');
	
	//add the highslide stuff to each link
	foreach ($links as $link){
		$link->addAttribute('class', 'highslide');
		$link->addAttribute('onclick', 'return hs.expand(this)');
	}

	//marshal XML to a string
	$content = $xml->asXML();
	
	//remove the XML declaration at the top
	$content = preg_replace('~^<\\?xml.*?\\?>~', '', $content);
	
	//trim whitespace
	$content = trim($content);
	
	//remove the root tag that we added
	$content = preg_replace('~(^<div>)|(</div>$)~', '', $content);
}

As you can see, the blog post text has to be awkwardly manipulated in order to be read into a DOM and written back out as a string. That's why I have a lot of comments here--when I have to revisit this code in 6 months, I won't be totally confused.

Caching the RSS file

One last thing to mention is that I cache the RSS file so that my website doesn't have to contact Blogger every time someone loads this page. When the cached file gets to be more than an hour old, a fresh copy of the file is downloaded from Blogger.

Back to top