December 19, 2008

OpenZoom Description Format

When we look at multiscale or multi-resolution imaging in 2008,[1] we’re mostly looking at a pile of image files[2] (called tiles) that make up an image pyramid.[3] Typically, image file formats such as JPEG and PNG have stored properties like width and height inside the file. These formats acted not only as a carrier of image data but also as a container for the metadata associated with it. This is a manifestation of the "The Truth Is in the File" paradigm.

Deprecation Warning

The OpenZoom Description Format was just a proof-of-concept. For real-world applications, I highly recommend using the Microsoft Deep Zoom file format as it has the widest client & tooling support, is well-documented and can be used within collections.

The Truth Is Out There

Taking this paradigm into account, we suddenly encounter a problem with multiscale images. If the original image is exploded into many little pieces, where do we store its metadata? There are different solutions to this problem. For mapping sites such as Google Maps and Yahoo Maps it is probably sufficient to just hard-code the image pyramid properties and how to access the tiles directly inside the client. However, for general multiscale image viewing technologies such as Zoomify, Deep Zoom or OpenZoom this is not an option since we don’t know the properties of the images until run-time. Again, there’s a simple and elegant solution to this: XML description files that carry the image metadata.

Rumble In the Jungle

In the following section I will first quickly present the two dominant multiscale image description formats out there: Zoomify and Microsoft Deep Zoom. After that, I will introduce a new description format I designed, called the OpenZoom description format.

Note: The following examples all describe a 10 megapixel JPEG image with the name bruges.

Zoomify

Example

This Zoomify image has the following structure on the file system:

Descriptor
bruges/ImageProperties.xml
[filename]/ImageProperties.xml

Tiles
bruges/TileGroup0/0-0-0.jpg
bruges/TileGroup0/1-0-0.jpg
bruges/TileGroup0/1-0-1.jpg
bruges/TileGroup0/1-1-0.jpg
bruges/TileGroup0/1-1-1.jpg
bruges/TileGroup0/2-0-0.jpg
…
bruges/TileGroup0/4-15-9.jpg
[filename]/TileGroup[X]/[level]-[column]-[row].jpg
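To make the tile naming above a bit more tangible, here is a minimal Python sketch of how a client might derive a Zoomify tile path. It rests on several assumptions that are not stated in this article: the image is taken to be 4000 × 2500 pixels (the text only says "10 megapixel"), tiles are 256 pixels wide, each level is produced by halving the previous one, and tiles are numbered consecutively from the smallest level upwards with 256 tiles per TileGroup. The function names are hypothetical; the ZoomifyDescriptor class in OpenZoom remains the reference.

```python
def zoomify_levels(width, height, tile_size=256):
    """Return (columns, rows) for every level, smallest level first."""
    sizes = [(width, height)]
    # Assumption: halve the image until it fits into a single tile.
    while sizes[-1][0] > tile_size or sizes[-1][1] > tile_size:
        w, h = sizes[-1]
        sizes.append(((w + 1) // 2, (h + 1) // 2))
    sizes.reverse()  # level 0 is the smallest level
    # -(-a // b) is integer ceiling division.
    return [(-(-w // tile_size), -(-h // tile_size)) for w, h in sizes]


def zoomify_tile_path(name, width, height, level, column, row, tile_size=256):
    """Build the relative path of one tile (hypothetical helper)."""
    levels = zoomify_levels(width, height, tile_size)
    columns, _ = levels[level]
    # Assumption: tiles are counted across all lower levels first,
    # and every TileGroup folder holds 256 consecutive tiles.
    offset = sum(c * r for c, r in levels[:level])
    index = offset + row * columns + column
    return f"{name}/TileGroup{index // 256}/{level}-{column}-{row}.jpg"


# Assumed dimensions; the article does not give the exact size of bruges.
print(zoomify_tile_path("bruges", 4000, 2500, 4, 15, 9))
# prints "bruges/TileGroup0/4-15-9.jpg"
```

With these assumed dimensions the pyramid has 217 tiles in total, which is why every tile in the listing lives in TileGroup0.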

Deep Zoom Image (DZI)

Example


  

This Deep Zoom image (DZI) has the following structure on the file system:

Descriptor
bruges.xml
[filename].[xml|dzi]

Tiles
bruges_files/0/0_0.jpg
bruges_files/1/0_0.jpg
bruges_files/2/0_0.jpg
…
bruges_files/9/0_0.jpg
bruges_files/9/0_1.jpg
bruges_files/9/1_0.jpg
bruges_files/9/1_1.jpg
…
bruges_files/12/15_9.jpg
[filename]_files/[level]/[column]_[row].[extension]
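The level numbers in this listing follow from the image dimensions. The sketch below shows one way to compute them in Python, assuming the usual Deep Zoom convention that level 0 is a single pixel and every level doubles the previous one. The 4000 × 2500 pixel size and the 256 pixel tile size are assumptions for illustration (the article only says "10 megapixel"), and the function names are hypothetical.

```python
def dzi_max_level(width, height):
    """Smallest n with 2**n >= max(width, height)."""
    return (max(width, height) - 1).bit_length()


def dzi_level_grid(width, height, level, tile_size=256):
    """Return (columns, rows) of tiles at the given level."""
    shift = dzi_max_level(width, height) - level
    # Ceiling division by 2**shift gives the level dimensions.
    level_width = max(1, -(-width // (1 << shift)))
    level_height = max(1, -(-height // (1 << shift)))
    # Tile overlap does not change the number of tiles, so it is ignored.
    return -(-level_width // tile_size), -(-level_height // tile_size)


def dzi_tile_path(name, level, column, row, extension="jpg"):
    """Build the relative path of one tile."""
    return f"{name}_files/{level}/{column}_{row}.{extension}"


# Assumed dimensions; the exact size of bruges is not given in the article.
print(dzi_max_level(4000, 2500))          # prints 12
print(dzi_level_grid(4000, 2500, 9))      # prints (2, 2)
print(dzi_tile_path("bruges", 12, 15, 9)) # prints "bruges_files/12/15_9.jpg"
```

Under these assumptions level 9 is the first level with more than one tile, which matches the four level-9 files in the listing.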

OpenZoom Description Format

The following is actually a description of the Deep Zoom image we've looked at previously.

Descriptor
bruges.xml
[filename].xml

Tiles
Wherever you wish…

Example


  
    
      
    
    
      
    
      
    
    
      
    
    
      
    
    
      
    
    
      
    
  

OpenZoom Description Format XML Schema (Draft)

Not Invented Here

Alright, we’ve seen examples of all three description formats for the same image. Before anything else, you might ask yourself: Why the #&$@ another format? Good attitude and glad you asked. Hopefully, I will be able to answer this question for most of you. If not, just leave me a comment, I’d be glad to discuss this further. Now, let’s compare these three formats by looking at where they shine but of course also at their shortcomings.

Conciseness

Obviously, Zoomify and Deep Zoom win big time here. Their description files have a couple of lines vs the 40+ lines of the OpenZoom descriptor which inherently is very verbose — Ed.: Levels 2–7 omitted for esthetic reasons. On the other hand, we should keep in mind that everything we see in the OpenZoom descriptor sample somehow has to be computed by the client for the other two formats. More on that later.

Portability

Not sure if portability is the right term, but let me explain what I mean: How flexible is the format regarding the storage of the descriptor and its image tiles? Deep Zoom is the most extreme case of the three: the descriptor file and the image tiles are strongly coupled through the original file name of your image. That means if you move your descriptor you always have to remember to move the image data folder as well. This could be considered risky as the two are not contained in one folder. Zoomify has the same limitation but at least the image data and its descriptor are both contained in the same folder that carries the name of the original image. OpenZoom is clearly the most portable of the three as it lets you specify the descriptor file independently of the image tiles.

Important Note: Both Microsoft and Zoomify offer an alternative storage method in the form of a single-file format. They are called Zoomify’s Pyramidal File Format (PFF) and DZIZ (a ZIP-based container for DZI) which I’ve seen used by Microsoft Photosynth.

Flexibility

Flexibility apparently was not a design goal of Microsoft or Zoomify. This is fine considering that the design of such a new format requires these kinds of trade-offs. Their assumption is that the descriptor file and the image tiles are strongly coupled and the latter are computed with a well-defined algorithm and stored in a fixed file hierarchy. Flexibility is the area where the OpenZoom description format shines. When I worked on the OpenZoom description format, I obviously followed the Zen of Python, which states "Explicit is better than implicit." Although one drawback is the verbosity of the format, there are many advantages we can get from it.

For example, when I worked on the OpenZoom framework, I wanted to test it with some really large multiscale images that are out there. Well, what is the largest image out there that I know of? A map of the world, of course. The OpenStreetMap Project, for example, features many, many gigapixels of image data. Fine, so how do I test the framework with a map? Hard-code the URLs somewhere? No, no. Let’s create a descriptor for it. So I did. Grab it and play with it with your copy of the OpenZoom framework. Look Ma’, no code!

This example demonstrates one of the advantages of the format, namely your descriptor file does not have to be stored along with your image data. Just put your descriptor wherever you wish and point it to the image tiles.
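To give a rough idea of what such a map descriptor has to spell out, here is a small Python sketch that lists the levels of the OpenStreetMap pyramid explicitly. It assumes the standard OpenStreetMap tile layout (zoom level z has 2^z × 2^z tiles of 256 pixels, served as /z/x/y.png). The dictionary keys `width`, `height` and `url` are hypothetical stand-ins and not the actual element names of the descriptor; `tileWidth`, `tileHeight`, `numColumns`, `numRows` and the `{column}` / `{row}` tokens are the names used in this article.

```python
def osm_levels(max_zoom, tile_size=256):
    """Describe every zoom level of the OpenStreetMap pyramid explicitly."""
    levels = []
    for zoom in range(max_zoom + 1):
        tiles_per_side = 2 ** zoom
        levels.append({
            "width": tiles_per_side * tile_size,    # hypothetical key
            "height": tiles_per_side * tile_size,   # hypothetical key
            "tileWidth": tile_size,
            "tileHeight": tile_size,
            "numColumns": tiles_per_side,
            "numRows": tiles_per_side,
            # {column} and {row} stay as literal tokens for the client.
            "url": f"https://tile.openstreetmap.org/{zoom}/{{column}}/{{row}}.png",
        })
    return levels


for level in osm_levels(2):
    print(level["numColumns"], level["numRows"], level["url"])
```

Nothing here is computed by the client: the producer writes down each level once, and the tiles stay on the OpenStreetMap servers while the description can live anywhere.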

Features: OpenZoom Description Format

The following section gives you a short summary of some of the features in the OpenZoom description format.

Flexible Pyramid Layout

Behind both Zoomify and Deep Zoom, there are well-specified algorithms that create the image pyramid and define its properties. To get an idea of how the formats expand the information you previously saw in their descriptors, feel free to take a look at their implementation in OpenZoom: ZoomifyDescriptor and DZIDescriptor.

The OpenZoom description format doesn’t require a particular layout of the image pyramid. One requirement would be that every level of the pyramid approximately has the same aspect ratio but I’ve even managed to work around that constraint. To give you an idea of how powerful this flexibility is, consider the following couple of facts:

  1. The OpenZoom description format can express both the properties of a Deep Zoom image pyramid and those of one produced by Zoomify. Besides these, it supports the pyramids of OpenStreetMap, Google Maps (road, terrain and satellite) and many more.
  2. Just like in Deep Zoom, you can specify tile overlap[5] in the OpenZoom description format.
  3. Unlike Deep Zoom or Zoomify, the OpenZoom description format also supports non-square image tiles by exposing a tileWidth as well as a tileHeight property. Deep Zoom and Zoomify obviously don't have to support this as they know that their algorithms don't produce non-square tiles. The OpenZoom format, however, has to accommodate legacy multiscale image data that has non-square tiles.
  4. One thing that surprised me most is the fact that even images on Flickr, which are stored in many different dimensions, can be arranged as an image pyramid. The levels of a Flickr image pyramid are quite irregular compared to Deep Zoom and Zoomify as they are bounded by maximum side lengths of 100, 240, 500, 1024 and original. Even though it isn't very efficient since Flickr doesn't support tiles, images from Flickr can be rendered as multiscale images inside OpenZoom.
Important Note: Deep Zoom features a powerful concept called Sparse Images not present in any other format known to me. However, I am considering incorporating this feature into the OpenZoom description format at a later date.

Powerful Addressing Scheme

The description format again makes minimal assumptions about the location of the image tiles. Only the following two conditions have to be met: Tiles have to be addressable by their column and row in a Cartesian or rectangular coordinate system. The client then simply applies string substitution to the reserved tokens {row} and {column} and replaces them with coordinates that have a range of [0, numRows) or [0, numColumns) respectively. The upper bounds numRows and numColumns are specified on the corresponding level element.

It is important to note that, unlike Zoomify, which only seems to support JPEG tiles anyway, and Deep Zoom, where the extension is specified in the descriptor, the OpenZoom description format makes no assumptions about the file extension of the tiles whatsoever. In the days where server-side scripts with extensions like .php or .cfm serve us images, it would be negligent to rely on the file type extension. For the client to decide if it can render the images that are being served, the format features a type property on the pyramid element that specifies the MIME type of the tiles.

Exceptions: Obviously, no matter how powerful a design is, there are always things it can't handle. For the OpenZoom description format this means sources such as Microsoft's Virtual Earth or the GigaPan project, which both feature a quadtree-based addressing scheme. That the OpenZoom description format cannot describe these kinds of sources doesn't mean the OpenZoom framework can't render them. However, doing that involves some amount of code, which in the case of OpenZoom would mean implementing the IMultiScaleImageDescriptor interface. For Silverlight Deep Zoom that would be the abstract MultiScaleTileSource class.

Support for Multiple URLs

As you may know, most current browsers are limited to 2 concurrent requests per domain. Therefore, the OpenZoom description format has support for defining multiple URLs for the same data. A client which supports the format is then able to fetch more than 2 image tiles concurrently. This technique is applied by most large map providers such as Google Maps and Microsoft Virtual Earth.

Example
    
      
      
      
      
    

Ease of Implementation

Since the OpenZoom description format is very explicit (and therefore verbose), implementing a client to read it is very, very simple. Unlike Deep Zoom and Zoomify where the client has to do a considerable amount of work to compute the properties of the image pyramid, with the OpenZoom description format this work basically boils down to mapping the properties of the descriptor into the internal representation of a multiscale image description. In my opinion, the single biggest advantage of the OpenZoom description format is that a client that can read the format does not need to understand the algorithms that created the image pyramids which the format itself describes. This way we can totally decouple the producer of an image pyramid from its ultimate client. If you are interested in getting an idea of how all of this works, I suggest you take a look at the following classes in the OpenZoom source code repository: ZoomifyDescriptor, DZIDescriptor and OpenZoomDescriptor.
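As an illustration of how little a client has to do, here is a Python sketch of an internal level representation that resolves tile URLs by plain string substitution and spreads requests over several URLs. This is not the OpenZoomDescriptor implementation: the `Level` class, its field names, the rule for choosing a URL and the example.com hosts are all hypothetical. Only the `{column}` and `{row}` tokens and the [0, numColumns) / [0, numRows) ranges come from the format as described in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Level:
    """Hypothetical in-memory form of one level of a descriptor."""
    num_columns: int
    num_rows: int
    url_templates: List[str]  # one or more URLs serving the same tiles

    def tile_url(self, column: int, row: int) -> str:
        # Coordinates must lie in [0, num_columns) and [0, num_rows).
        if not (0 <= column < self.num_columns and 0 <= row < self.num_rows):
            raise IndexError("tile lies outside of this level")
        # Assumed policy: pick a URL deterministically so that the same
        # tile is always requested from the same host.
        index = (column + row) % len(self.url_templates)
        template = self.url_templates[index]
        return template.replace("{column}", str(column)).replace("{row}", str(row))


level = Level(
    num_columns=2,
    num_rows=2,
    url_templates=[
        "http://t0.example.com/9/{column}_{row}.jpg",
        "http://t1.example.com/9/{column}_{row}.jpg",
    ],
)
print(level.tile_url(0, 0))  # prints "http://t0.example.com/9/0_0.jpg"
print(level.tile_url(1, 0))  # prints "http://t1.example.com/9/1_0.jpg"
```

The client never needs to know which algorithm produced the pyramid; it only reads bounds and templates and fills in the two tokens.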

Conclusion

I hope this overview of the three multiscale image description formats gave you an idea of what problems each one of them is trying to solve and how well they succeed in doing that. When designing the OpenZoom description format, my intention was certainly not to create yet another description format. It simply tackles the issues of multiscale image formats from a different angle. Doing that, it turned out to be quite powerful in representing all kinds of multiscale images out there, including the two big ones: Deep Zoom and Zoomify. More importantly, the OpenZoom description format offers a way to describe a vast number of the multiscale images out there under a single specification. That being said, the OpenZoom framework itself, which strives to be the most open, most flexible platform for multiscale images and Zoomable User Interfaces out there, obviously supports all of the formats discussed here equally well. I hope you've enjoyed this behind-the-scenes look at the OpenZoom description format. At some point, I will show you an idea I've been working on that involves these multiscale image descriptors. Until then, have a look at the links in the Further Reading section as there are some hidden gems.

Disclaimer: All details of the OpenZoom description format are subject to change. Feature requests and opinions are welcome. As usual, feel free to leave a comment.

Acknowledgement

At this point, I'd like to thank my buddy Boris who, unbeknownst to him, considerably shaped the current form of the OpenZoom description format specification through our many very valuable discussions. Believe me when I say that without him the format would have most certainly been YAMSIDF.

Footnotes

[1 & 3] If you'd like to get some more background on this topic, I wrote an introduction to multiscale imaging and another article about the mathematical properties of an image pyramid using Microsoft's Deep Zoom as an example.

[2] From my own experience, I know that there are unfortunately still people out there who think that there is some magic going on behind multiscale imaging. To set this straight, if you've used any of the following, Google Maps, Yahoo Maps, Microsoft Virtual Earth, Silverlight Deep Zoom, Seadragon AJAX, Seadragon Mobile or Zoomify,[4] you should know that all of them basically work the same, namely with off-the-shelf JPEG or PNG image files. These files are stored either on disk or in a database. Once requested, they are sent to and rendered on the client, which in the previous examples is either the browser, the Flash or Silverlight plugin or the iPhone. But you might ask: What about JPEG 2000? Indeed, there are some possible candidates for image file formats out there which would bring better support for multiscale imaging in the future, two of them being JPEG 2000 and HD Photo. We won't see significant adoption of the first anytime soon because of legal issues such as this one. HD Photo originated at Microsoft and is being considered as a successor to the JPEG standard, dubbed JPEG XR. Again, widespread use won't happen overnight.

[4] By the way, OpenZoom supports most of these out of the box.

[5] In Inside Deep Zoom 2 I've explained the concept of tile overlap.

Further Reading