Visualization objectives - Creating a Visual XML Editor

There are several questions that need to be answered when visualizing an XML file. Because the editor is to be domain independent, few assumptions can be made. The XML file can be of any size and structured in any way, as long as it follows the rules of XML.

Next, the main questions for this thesis are defined. These are the parts that together build up a visualization of XML.

When discussing XML files and trees in general, their structure will be referred to using the terms parent, grandparent, child and sibling. A parent is a node that includes another node. The included node is a child of the parent. A grandparent is a parent higher up in the hierarchy than the parent. Nodes on the same level are siblings.
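These terms can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module; the sample document and the helper names build_parent_map and siblings are hypothetical and used only for illustration. ElementTree elements do not store a link to their parent, so the sketch first builds a child-to-parent map.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative only.
SAMPLE = "<root><a><a1/><a2/></a><b/></root>"

def build_parent_map(root):
    """Map every element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}

def siblings(element, parent_map):
    """Return the other children of the element's parent."""
    parent = parent_map.get(element)
    if parent is None:  # the root has no parent and therefore no siblings
        return []
    return [node for node in parent if node is not element]

root = ET.fromstring(SAMPLE)
parents = build_parent_map(root)
a1 = root.find("a/a1")

print(parents[a1].tag)                         # parent of a1
print(parents[parents[a1]].tag)                # grandparent of a1
print([s.tag for s in siblings(a1, parents)])  # siblings of a1
```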

3.1. Showing structure

XML files are used because of their tree structure. If no structure were needed, a plain text file could be used instead. An XML file is always a tree with one root. Everything within the document is within the root tag (with the exception of the XML declaration and other prolog content), usually within several layers of tags. All tags form their own subtrees. Neither the number of tags in the document or within another tag nor the height of the tree is limited.

In my experience, XML files seldom have more than seven levels of tags, but nothing restricts this. XML trees usually seem to be wider than they are high: what makes an XML file big is a tag having many children on the same level rather than many levels of children. This cannot be taken for granted, however, as the observation is not based on any published research.
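An observation of this kind could be checked for a given file by measuring the tree. The following is a minimal sketch, assuming Python's xml.etree.ElementTree. The function names tree_height and max_fan_out and the sample document are hypothetical. Height is counted in element levels, so a document consisting of a single empty root tag has height 1.

```python
import xml.etree.ElementTree as ET

def tree_height(element):
    """Number of element levels at and below this element (a leaf counts as 1)."""
    return 1 + max((tree_height(child) for child in element), default=0)

def max_fan_out(element):
    """Largest number of direct children of any element in the subtree."""
    return max(len(node) for node in element.iter())

# Hypothetical sample: four children on one level, one of them with a child.
doc = ET.fromstring("<root><item/><item/><item><part/></item><item/></root>")
height = tree_height(doc)
fan_out = max_fan_out(doc)
```

For very deep documents the recursive height function could hit Python's recursion limit; an iterative traversal would avoid that.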

Showing the tree structure can be divided into several subquestions. These will now be examined with discussion on how they can be answered. Answers to some of these questions are found in existing tree visualizations, but other questions are XML specific.

Parent or child

Showing the structure starts from showing that a node is a parent. In a node-link diagram this can be done with a special icon in front of the tag name that indicates that the tag has children. The icon can be a plus or minus sign, or a closed or open folder, to indicate whether the children are currently displayed. Color can also be used to show the difference between a parent and a child.

Structure is often indicated using relative position. For example, in text, indenting a line under another line shows that the node on the upper line is a parent. In Figure 1 the structure of Tag11 is shown this way. One can also use textual notation, such as numbers or parentheses, to show the structure. This is demonstrated in Figure 10.
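Both textual conventions, indentation and parentheses, can be sketched in a few lines. The example assumes Python's xml.etree.ElementTree. The function names outline and parenthesized and the sample tags are hypothetical, and the output format is only one possible choice, not the notation used in the figures.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0, indent="  "):
    """Return one line per element, indented by its depth in the tree."""
    lines = [f"{indent * depth}{element.tag}"]
    for child in element:
        lines.extend(outline(child, depth + 1, indent))
    return lines

def parenthesized(element):
    """Return the subtree as nested parentheses, e.g. a(b(c) d)."""
    children = " ".join(parenthesized(child) for child in element)
    return f"{element.tag}({children})" if children else element.tag

tree = ET.fromstring("<a><b><c/></b><d/></a>")
print("\n".join(outline(tree)))
print(parenthesized(tree))
```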

Node-link diagrams use lines and positioning to show the relation between nodes. For example, a parent can be positioned above its child and connected to it with a line. A tree structure can also be shown with nodes inside or outside each other. Depending on the convention used, a node positioned within another node can thus be either its parent or its child.

Root node

Where the tree starts is often shown implicitly by the structure, but it can also be necessary to differentiate the root from the other nodes explicitly. This is especially important if the visualization used is unfamiliar to the user or if the structure is such that it is impossible to know which node is the root.

Leaf tags (does / does not have children)

In XML files, text is always at the end of the tree structure, in the leaf tags. Alternatively text nodes themselves can be classified as leaf nodes. Some visualizations only show the leaf nodes, whereas others highlight them in some way.

Subtree size (width and height)

With large XML files it can be necessary to show how many children a tag contains. As will be seen in Chapter 4, several existing editors provide a possibility to hide parts of the structure. Especially in this case it might be necessary to show how much content is hidden. In a node-link diagram this can be shown by the number of lines leaving a node.

SpaceTree [Plaisant et al., 2002], discussed in Chapter 5, shows the subtree as a triangle. The number of children is shown by shading, the height of the subtree by the height of the triangle and the width with the base of the triangle. The presentation of the node, for example its size, can also be dependent on the number of nodes in the next level(s). Instead of graphics, this information could also be shown using numbers.
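The numbers behind such a presentation can be computed with one level-by-level pass over the subtree. The sketch below assumes Python's xml.etree.ElementTree. The function name subtree_summary is hypothetical, and "width" is taken here to mean the largest number of nodes on any single level below the tag, which is an assumption and not necessarily the definition SpaceTree uses.

```python
import xml.etree.ElementTree as ET

def subtree_summary(element):
    """Summarize what lies below an element, e.g. for labeling a collapsed node."""
    summary = {"children": len(element), "descendants": 0, "height": 0, "width": 0}
    level = list(element)  # start from the direct children
    while level:
        summary["height"] += 1
        summary["width"] = max(summary["width"], len(level))
        summary["descendants"] += len(level)
        # Move one level down: collect the children of every node on this level.
        level = [grandchild for node in level for grandchild in node]
    return summary

doc = ET.fromstring("<root><a><a1/><a2/><a3/></a><b/></root>")
root_summary = subtree_summary(doc)
leaf_summary = subtree_summary(doc.find("b"))
```

The same dictionary could feed either a graphical cue, such as the size of a triangle, or a plain numeric label next to a hidden subtree.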

Number of siblings

When editing one of several children, it can be advantageous to show where the current child is located relative to its siblings. This is especially important if some of the siblings are hidden. The position can be shown using shading within the siblings, for example, so that the first one is light and the last one dark. It can also be shown with numbers.
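Both the numeric and the shaded variant follow from the index of the child among its siblings. This is a minimal sketch, assuming Python's xml.etree.ElementTree. The function name sibling_position is hypothetical, and the linear mapping of position to a shade between 0.0 (light) and 1.0 (dark) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

def sibling_position(parent, child):
    """Return (1-based position, number of siblings, shade from 0.0 to 1.0)."""
    children = list(parent)
    for index, node in enumerate(children):
        if node is child:
            total = len(children)
            # A single child gets the lightest shade to avoid dividing by zero.
            shade = index / (total - 1) if total > 1 else 0.0
            return index + 1, total, shade
    raise ValueError("child is not a direct child of parent")

items = ET.fromstring("<list><i/><i/><i/><i/><i/></list>")
position, total, shade = sibling_position(items, items[2])
print(f"{position} of {total}, shade {shade}")
```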

Tag level (height)

In text or in a node-link diagram, indentation can show the number of levels. In Nested Treemaps [Johnson and Shneiderman, 1991] (see Chapter 6) this is shown with the number of rectangles around the selected tag. The same information can also be shown using the path to the tag. If the level is not shown directly, the user has to work it out by counting levels.
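The relation between level and path can be made concrete: once the path from the root to a tag is known, the level is simply its length. The sketch below assumes Python's xml.etree.ElementTree. The function name path_to and the sample document are hypothetical, and the root is counted as level 1.

```python
import xml.etree.ElementTree as ET

def path_to(root, target):
    """Return the tag names from root down to target, or None if not found."""
    if root is target:
        return [root.tag]
    for child in root:
        sub_path = path_to(child, target)
        if sub_path is not None:
            return [root.tag] + sub_path
    return None

doc = ET.fromstring("<root><a><a1/><a2/></a><b/></root>")
target = doc[0][1]               # the a2 element
path = path_to(doc, target)
level = len(path)                # the root counts as level 1
print("/".join(path), level)
```

A depth-first search like this visits the whole tree in the worst case. An editor that needs the path often would probably keep parent links instead.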

Path to tag

Working out the path to a tag can be a difficult task. If parts of the path are hidden, one needs to navigate the tree to work out the whole path. As discussed in Chapter 4, some editors, like Oxygen XML editor [Oxygen, 2008], make this easier by showing the path in a separate part of the window. In other editors the path has to be worked out by navigating to the node or to the root.

When showing text, there is one special situation to consider. The structure can be such that a tag has text, then a tag with a subtree, and then text again. This situation is demonstrated in Tag10 in the test XML file in Figure 1. The tag in the middle of the text, Tag10.2, could be a large subtree. How should this be shown? Text editors show the subtree in its entirety, but if the subtree is of minor importance, it could be hidden instead.
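The situation of text, then a tag, then text again can be inspected programmatically. The following is a minimal sketch, assuming Python's xml.etree.ElementTree, where text before the first child is stored in the parent's text field and text after a child in that child's tail field. The function name content_sequence is hypothetical, and the sample only imitates the shape of Tag10 described above, not the actual content of Figure 1.

```python
import xml.etree.ElementTree as ET

def content_sequence(element):
    """List the direct content of an element in document order."""
    parts = []
    if element.text and element.text.strip():
        parts.append(("text", element.text.strip()))
    for child in element:
        parts.append(("element", child.tag))
        # Text that follows a child is stored in that child's tail.
        if child.tail and child.tail.strip():
            parts.append(("text", child.tail.strip()))
    return parts

# Hypothetical stand-in for Tag10: text, a subtree, then text again.
mixed = ET.fromstring("<Tag10>before <Tag10.2><x/></Tag10.2> after</Tag10>")
sequence = content_sequence(mixed)
```

A visualization could use such a sequence to decide where a collapsed placeholder for the middle subtree should appear within the surrounding text.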

Comments are subject to less strict rules than text. A comment can contain any text except for a double hyphen “--”. This is because the characters “-->” are used to end the comment [Bray et al., 1998]. Comments can be free text describing the document, but they can also hold a disabled part of the structure. Processing instructions are instructions for applications.

Next, I will discuss how XML elements and their content can be presented.

Tags

Showing a node in a tree can be done in several ways. This will be presented in Chapter 5 in greater detail. In short, elements can be shown as any shape, such as circles and rectangles. To differentiate elements from each other, one can use a wide range of options, including borders, color, shading, icons in front of names and textual icons.

Text and CDATA

Text can appear either directly as character data or within a CDATA section. Text within a CDATA section has fewer restrictions on permitted characters, because characters such as < and & do not need to be escaped.

Thus, CDATA sections can be shown differently from ordinary text, or identically to it if the entities in ordinary text are decoded.
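That the two forms carry the same content once entities are decoded can be seen in a short sketch, assuming Python's xml.etree.ElementTree. The sample strings are hypothetical. Note that this parser does not record whether the text came from a CDATA section, so an editor that wants to show CDATA differently would need a parser that preserves that information.

```python
import xml.etree.ElementTree as ET

# The same content written once with escaped characters and once as CDATA.
escaped = ET.fromstring("<t>a &lt; b &amp;&amp; c</t>")
cdata = ET.fromstring("<t><![CDATA[a < b && c]]></t>")

# After parsing, both elements hold identical decoded text.
print(escaped.text == cdata.text)
```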

Attributes

Attributes are used to define properties for tags by names and values attached to the names. This resembles leaf tags with text inside them. Attributes can thus be shown using the same means as those described for showing tags with text inside them. In text, as in Figure 1, and in several of the existing editors presented in Chapter 4, attributes are shown as text beside the tag name.
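Both presentations can be derived from the same name and value pairs. This is a minimal sketch, assuming Python's xml.etree.ElementTree. The function names attribute_label and attributes_as_leaf_rows and the sample attributes are hypothetical, and sorting the attributes by name is a choice made here only to keep the output stable.

```python
import xml.etree.ElementTree as ET

def attribute_label(element):
    """Render the attributes as text beside the tag name."""
    pairs = " ".join(f'{name}="{value}"' for name, value in sorted(element.attrib.items()))
    return f"{element.tag} {pairs}" if pairs else element.tag

def attributes_as_leaf_rows(element):
    """Render each attribute like a leaf tag: a name with a text value."""
    return [(name, value) for name, value in sorted(element.attrib.items())]

# Hypothetical element; the attribute names are illustrative only.
tag = ET.fromstring('<Tag11 lang="en" id="7"/>')
label = attribute_label(tag)
rows = attributes_as_leaf_rows(tag)
```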

Comments

Comments are often handled as plain text, but they can also contain a disabled part of the XML document with structure. The choice is thus whether to show them as text (XML code) or to visualize them like tags while differentiating commented tags from normal tags somehow. One option for doing this is to gray out commented tags. However, comments can also contain invalid structures (not permitted by XML). These cannot be shown as tags, but have to be shown as text.
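One way to make this decision automatically is to try to parse the comment body as XML. The sketch below assumes Python's xml.etree.ElementTree. The function name classify_comment and the synthetic wrapper tag are hypothetical. Wrapping the body in a temporary root is a choice made here so that several commented-out sibling tags still parse.

```python
import xml.etree.ElementTree as ET

def classify_comment(comment_text):
    """Return ('structure', tag names) if the comment holds well-formed tags,
    otherwise ('text', [])."""
    try:
        # The wrapper is synthetic; it lets several top-level tags parse together.
        wrapper = ET.fromstring(f"<wrapper>{comment_text}</wrapper>")
    except ET.ParseError:
        return "text", []  # invalid structure: can only be shown as text
    tags = [child.tag for child in wrapper]
    return ("structure", tags) if tags else ("text", [])

print(classify_comment(" <a><b/></a> "))   # disabled, well-formed structure
print(classify_comment(" just a note "))   # free text
print(classify_comment(" <a><b></a> "))    # not permitted by XML
```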

Processing instructions

Processing instructions are similar to tags with attributes but they cannot contain children. They could thus be shown using the same means as tags and attributes.

Next, the questions defined in this chapter are used when evaluating existing XML editors and tree visualizations. In Chapter 4, I will evaluate existing XML editors in order to find out how current editors present XML documents. XML files are trees, and existing tree visualizations can be used at least for showing the structure of XML documents. Thus, in Chapter 5 I will discuss how different tree visualizations can be used in a visual XML editor.
