XML Cheat Sheet
I stumble across XML documents intermittently and every time I need to review the basics again. This is a cheat sheet so that I can review it whenever I need to. This is a summarized form of the XML Tutorial.
For reference this is the XML Specification and the version annotated by Tim Bray.
What is XML
- EXtensible Markup Language (XML) is a markup language designed to describe data. It has no predefined tags.
- XML uses a Document Type Definition (DTD) or an XML Schema to describe the data. An XML document together with its DTD or XML Schema is self-descriptive.
- XML Schema is the successor to DTD because it is richer and more extensible.
- XML uses text files to store data and can be used to create new languages e.g. WML (used by WAP), XHTML, RSS, SOAP etc.
- Because XML documents may contain Unicode characters, they should be saved as Unicode text files. The encoding named in the XML declaration must match the encoding that the text file is actually saved in.
- XML files are completely platform-independent and portable (EBCDIC platforms ?).
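The point about the declared encoding can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree module; the element name `word` and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document holding a non-ASCII character.
text = "caf\u00e9"  # "cafe" with an acute accent on the e
root = ET.Element("word")
root.text = text

# Serialise as ISO-8859-1: the declaration and the bytes agree.
data = ET.tostring(root, encoding="ISO-8859-1")
matching = ET.fromstring(data).text

# Mismatch: the bytes are really UTF-8 but the declaration says ISO-8859-1.
wrong = (b'<?xml version="1.0" encoding="ISO-8859-1"?><word>'
         + text.encode("utf-8") + b"</word>")
mismatched = ET.fromstring(wrong).text

print(matching == text)    # True
print(mismatched == text)  # False: the two UTF-8 bytes were read as two characters
```

The mismatched document still parses because every byte is a legal ISO-8859-1 character. The damage is silent, which is why the declaration has to be kept in step with the file.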
XML Syntax
- A simple XML document :
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note date="12/11/2002">
  <to>Alice</to>
  <from>Bob</from>
  <par>Hi.</par>
  <par>Bye.</par>
</note>
- The first line is an XML declaration which defines the XML version and the character encoding used in the document.
- XML tags are case-sensitive and must have a corresponding closing tag. Empty elements can combine the start and closing tag e.g. <br />.
- XML tags must be properly nested.
- An XML document must have a root element (note in the above). All elements may have child elements.
- XML documents can be extended by adding new element types. Applications should be written so that they ignore element types they do not recognise.
- Whitespace is preserved. CR/LF is converted into just LF (UNIX format line terminators).
- <!-- This is a comment -->
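As a rough illustration of the syntax rules above, the sample note can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree checks well-formedness but does not fetch or apply InternalNote.dtd, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

# The sample note, as bytes so that the declared encoding is honoured.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note date="12/11/2002">
  <to>Alice</to>
  <from>Bob</from>
  <par>Hi.</par>
  <par>Bye.</par>
</note>"""

root = ET.fromstring(NOTE)            # raises ParseError if not well-formed

root_tag = root.tag                   # the single root element
date = root.get("date")               # attribute from the start tag
children = [child.tag for child in root]
paragraphs = [par.text for par in root.findall("par")]

print(root_tag, date)
print(children)
print(paragraphs)
```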
XML Validation
- An XML document which is syntactically correct is described as well-formed.
- An XML parser or application must not try to interpret an XML document that is not well-formed. It must fail if the syntax is incorrect.
- An XML document which is well-formed and conforms to the rules of a DTD or XML Schema is described as valid.
- A DTD or XML Schema defines the document structure with a list of legal elements and attributes.
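A well-formedness check can be sketched in a few lines. This assumes Python's standard xml.etree.ElementTree parser, which rejects documents that are not well-formed but does not validate against a DTD or XML Schema. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<note><to>Alice</to></note>"))   # properly nested
print(is_well_formed("<note><to>Alice</note></to>"))   # badly nested
print(is_well_formed("<note></Note>"))                 # tags are case-sensitive
print(is_well_formed("<a/><b/>"))                      # more than one root element
```

Checking validity as well would need a validating parser, which the standard library module used here is not.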
XML Elements
- An XML element is everything from the start tag to the end tag.
- Elements can have attributes (in their start tag) which must be either single or double quoted e.g. date in the above.
- Elements can have either
- Empty content
- Simple content (text only)
- Element content (child elements)
- Mixed content (child elements and text)
- Elements can be parents, children or siblings of other elements.
- Elements must be closed properly and be properly nested.
- XML element names can contain letters, digits, hyphens, underscores and periods but no spaces. They must start with a letter or an underscore and can't start with xml (in any case). Although some of them are permitted, names should not include . : or - .
- < and & are illegal as literal characters in element content. Avoiding ' " and > is recommended. These should be replaced by the predefined entities i.e. &lt; &gt; &apos; &quot; &amp;
- A CDATA section starts with "<![CDATA[" and ends with "]]>".
- Everything inside a CDATA section except for ]]> is permitted.
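The two ways of carrying special characters, entities and CDATA, can be compared in a small sketch. It assumes Python's standard library (`xml.sax.saxutils.escape` and xml.etree.ElementTree); the `code` element and the sample string are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if (a < b && c > d)"

# Option 1: replace &, < and > with their predefined entities.
escaped = escape(raw)
via_entities = ET.fromstring("<code>" + escaped + "</code>").text

# Option 2: wrap the raw text in a CDATA section (raw must not contain "]]>").
via_cdata = ET.fromstring("<code><![CDATA[" + raw + "]]></code>").text

print(escaped)
print(via_entities == raw, via_cdata == raw)
```

Both forms give the parser the same text, so the choice is mostly about readability of the source file.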
XML Attributes
- XML elements can have attributes in their start tag.
- A singly quoted attribute value cannot contain single-quotes. A doubly quoted attribute value cannot contain double-quotes.
- Although data can be stored either in child elements or in attributes, attributes should really be used for metadata i.e. data about the data which is not part of the data itself. For example an element id is best stored in an attribute.
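The quoting rule for attribute values can be illustrated with `quoteattr` from Python's standard xml.sax.saxutils module, which picks a quote style that does not clash with the value. This is a sketch; the element name `e` and attribute name `v` are placeholders.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# A value containing double quotes is wrapped in single quotes.
double_inside = quoteattr('say "hi"')

# A value containing a single quote is wrapped in double quotes.
single_inside = quoteattr("it's")

# A value containing both: double quotes are escaped as &quot;.
value = '"a" and \'b\''
both_inside = quoteattr(value)

# Round trip: the parser gives back the original value.
round_trip = ET.fromstring("<e v=" + both_inside + "/>").get("v")

print(double_inside)
print(single_inside)
print(both_inside)
print(round_trip == value)
```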
XML Namespaces
- XML Namespaces are covered by a separate W3C recommendation.
- XML namespaces stop element names from different XML vocabularies from conflicting when the same name means something different.
- A simple example of using a namespace
<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
- The xmlns URI is only used as a unique identifier and is never fetched by the parser, but it often points to an informational web page.
- If the form xmlns="namespaceURI" is used instead then the element and all its unprefixed child elements are automatically in that default namespace.
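To see how a parser treats the prefixed and default forms, here is a sketch using Python's standard xml.etree.ElementTree, which expands every element name to `{namespaceURI}localname`. The default-namespace variant of the table is my own rewrite of the example above.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/TR/html4/"

PREFIXED = """<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
</h:table>"""

# Same data using a default namespace instead of the h: prefix.
DEFAULT = """<table xmlns="http://www.w3.org/TR/html4/">
  <tr><td>Apples</td><td>Bananas</td></tr>
</table>"""

prefixed_root = ET.fromstring(PREFIXED)
default_root = ET.fromstring(DEFAULT)

# The prefix used in queries is local to this mapping, not to the document.
ns = {"h": HTML_NS}
cells = [td.text for td in prefixed_root.findall("h:tr/h:td", ns)]
default_cells = [td.text for td in default_root.findall("h:tr/h:td", ns)]

print(prefixed_root.tag)
print(default_root.tag == prefixed_root.tag)
print(cells, default_cells)
```

Both documents yield identical element names, which shows that the prefix itself carries no meaning; only the URI does.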
XML Stylesheets
- Cascading Style Sheets (CSS) are used to display XML by associating styles with element types.
- An XML document is associated with a stylesheet using
<?xml-stylesheet type="text/css" href="simple.css"?>
- CSS can only style the elements as they are; XSL is usually favoured because it can also transform the document.
- eXtensible Stylesheet Language (XSL) is the preferred formatting language for XML. It is more sophisticated and powerful than CSS.
- An XSL stylesheet can be associated with an XML document using
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
Assorted
- An XML data island is XML data embedded into an HTML page.
- There are also XML parsers and loaders in all modern web browsers.
- Web browsers can manipulate the XML document using the Document Object Model (DOM) which treats the XML document as a tree data object. The syntax varies slightly from browser to browser.
- A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet Resource. The most common URI is the Uniform Resource Locator (URL) which identifies an Internet domain address. Another, less common type of URI is the Uniform Resource Name (URN).
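The DOM tree model mentioned above is not specific to browsers. As a sketch, Python's standard xml.dom.minidom module offers a similar DOM-style interface; it stands in here for the browser API, and the shortened note document is only for illustration.

```python
from xml.dom.minidom import parseString

# A shortened version of the sample note, without whitespace between elements.
dom = parseString('<note date="12/11/2002"><to>Alice</to><from>Bob</from></note>')
root = dom.documentElement

# Read from the tree: attribute value and the text node inside <to>.
date = root.getAttribute("date")
to_text = root.getElementsByTagName("to")[0].firstChild.data

# Modify the tree: append a new <par> element with a text node child.
par = dom.createElement("par")
par.appendChild(dom.createTextNode("Hi."))
root.appendChild(par)

child_names = [node.tagName for node in root.childNodes]
serialised = root.toxml()

print(date, to_text)
print(child_names)
print(serialised)
```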