xmlformat Tutorial
Paul DuBois <paul@kitebird.com>

Table of Contents
1. Introduction
2. Formatting a Document
3. Using a Configuration File
4. Discovering "Inherited" Formatting Options
5. Checking for Unconfigured Elements

1. Introduction

This document is a user guide that provides a tutorial introduction to the xmlformat program. Another document, The xmlformat Document Formatter, describes the capabilities of xmlformat in more detail.

2. Formatting a Document

Suppose you have an XML document named doc1.xml that looks like this:

<event>
<description>I bought a new coffee cup!</description>
<date><year>2004</year><month>2</month><day>1</day></date>
</event>

Suppose further that you want it to look like this:

<event>
 <description>I bought a new coffee cup!</description>
 <date>
  <year>2004</year>
  <month>2</month>
  <day>1</day>
 </date>
</event>

By happy coincidence, that happens to be exactly the default output style produced by xmlformat. To reformat your document, all you have to do is run xmlformat with the document filename as the argument, saving the output in another file:

% xmlformat doc1.xml > output

Note: % represents your shell prompt; do not type it as part of the command.

If you are confident that the output style produced by xmlformat will be as you desire, you can be reckless and perform an in-place conversion:

% xmlformat -i doc1.xml

In this case, xmlformat reads the document from the input file, reformats it, and writes it back out to the same file, replacing the file's original contents. If you are not quite so reckless, use -i in conjunction with a -b option to make a backup file that contains the original document. -b takes an argument that specifies the suffix to add to the original filename to create the backup filename. For example, to back up the original file doc1.xml in a file named doc1.xml.bak, use this command:

% xmlformat -i -b .bak doc1.xml

3. Using a Configuration File

In the preceding example, the desired output style for doc1.xml was the same as what xmlformat produces by default. But what if the default style is not what you want? In that case, you must tell xmlformat how to handle your document. This is at once both the weakness and strength of xmlformat. The weakness is that it is extra work to instruct xmlformat how you want it to format a document. The strength is that it's possible to do so. Other XML formatters do not require any extra work, but that's because they are not configurable.

Suppose doc2.xml looks like this:

<example><title>Compiling and Running a Program</title>
<para>To compile and run the program, use the following commands,
where <replaceable>sourcefile</replaceable> is the name of the
source file:</para><screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>
</example>

That's ugly, and you want to rewrite it like this:

<example>

<title>Compiling and Running a Program</title>

<para>
 To compile and run the program, use the following commands,
 where <replaceable>sourcefile</replaceable> is the name of
 the source file:
</para>

<screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>

</example>

The key characteristics of this rewrite are as follows:

• Child elements of the <example> element are separated by blank lines, but not indented within it.
• The text inside the <para> element is reformatted, adjusted to 60 characters per line and indented.
• The contents of the <screen> element are left alone.

Unfortunately, if you run doc2.xml through xmlformat, it comes out like this:

<example>
 <title>Compiling and Running a Program</title>
 <para>To compile and run the program, use the following commands,
where <replaceable>sourcefile</replaceable> is the name of the
source file:</para>
 <screen>
  <userinput>cc</userinput>
  <replaceable>sourcefile</replaceable>
  <userinput>./a.out</userinput>
 </screen>
</example>

This output is unsuitable. Among the offenses committed by xmlformat, two are most notable:

• The text of the <para> element has been left alone, not reformatted.
• The <screen> element content has been reformatted, not left intact.

In these respects, it appears that xmlformat has done exactly the opposite of what was wanted! Furthermore, had you used the -i option to reformat the file in place without using -b to make a backup, at this point you would have a file containing a <screen> element that you'd have to fix up by hand to restore it to its original condition.

What a worthless, worthless program!

The rewriting of the <screen> element points to an important lesson: Before trusting xmlformat with your documents, it's best to run some tests and tune your configuration as necessary to make sure it will produce the results you want. Otherwise, you may produce changes that affect the integrity of your documents. This is particularly true when they contain elements such as <screen> or <programlisting> that should be copied verbatim, without change.

Configuring xmlformat amounts to writing a configuration file that instructs it what to do. For doc2.xml, that means telling xmlformat to leave the <screen> element alone, to normalize the text of the paragraph to fill lines and wrap them to a given length, and to put blank lines around subelements of the <example> element.

Let's begin by creating a very basic configuration file. What should we call it? xmlformat can read configuration settings from a file named on the command line with a -f or --config-file option. This means you can name the file whatever you want. However, if you put the settings in a file named xmlformat.conf in the current directory, xmlformat will read the file automatically. That's an easier approach, because you won't need to use a command-line option to specify the configuration file. So create a file named xmlformat.conf that contains the following two lines:

screen
  format = verbatim

These lines specify that <screen> elements should be formatted as verbatim elements. That is, xmlformat should reproduce their content in the output exactly as it appears in the input, without modification. The first line must begin in column 1 (no preceding spaces or tabs). The second line must begin with at least one space or tab. Presence or absence of leading whitespace is how xmlformat distinguishes the names of elements to be formatted from the instructions that indicate how to format them.
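
The column rule can be sketched in a few lines of Python. This is a minimal illustration, not xmlformat's actual parser: the function name parse_config is hypothetical, and skipping blank lines is an assumption made for convenience. It covers only the two line types described above, namely an element name starting in column 1 and option lines that start with a space or tab.

```python
def parse_config(text):
    """Parse section/option lines into {element_name: {option: value}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        if line[0] in " \t":
            # Leading whitespace: an instruction saying HOW to format.
            if current is None:
                raise ValueError("option line appears before any element name")
            name, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected 'name = value': {line!r}")
            sections[current][name.strip()] = value.strip()
        else:
            # Text in column 1: the name of an element to be formatted.
            current = line.strip()
            sections.setdefault(current, {})
    return sections


if __name__ == "__main__":
    conf = "screen\n  format = verbatim\n"
    print(parse_config(conf))
```

If the second line were written without its leading whitespace, this sketch would read it as another element name rather than as an option, which is the kind of mistake the column rule is meant to make visible.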

After creating xmlformat.conf, run xmlformat again to process doc2.xml. It reads the newly created configuration file and produces this result:

<example>
 <title>Compiling and Running a Program</title>
 <para>To compile and run the program, use the following commands,
where <replaceable>sourcefile</replaceable> is the name of the
source file:</para>
 <screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>
</example>

That's a little better: xmlformat has not destroyed the <screen> element by reformatting it. But problems remain: The paragraph content has not been reformatted, and there are no blank lines between subelements.

Let's take care of the paragraph next. To set up its formatting, add a section to xmlformat.conf for <para> elements:

para
  format = block
  normalize = yes
  wrap-length = 60
  subindent = 1

screen
  format = verbatim

The order of sections in the configuration file doesn't matter. Put them in the order that makes most sense to you. The order of option lines under the initial section line doesn't matter, either.

The first two options in the para section specify that the <para> element is a block element, and that text within it should be normalized. Turning on the normalize option tells xmlformat that it's okay to reformat the text within the element. This means that runs of whitespace within the text are collapsed to single spaces, and that whitespace at the beginning and end of the text can be adjusted (typically to put the text on different lines than the element's opening and closing tags). Enabling normalization also allows you to perform text line-wrapping and indenting. The wrap-length option specifies the maximum number of characters per line, and subindent specifies the indenting of text and subelements, relative to the element's own tags. Note that when xmlformat performs line-wrapping, it includes the currently prevailing indent as part of the line length. (For example, if the prevailing indent is 20 spaces and the wrap-length value is 60, lines will contain at most 40 characters following the indentation.)

After adding the para section to xmlformat.conf, xmlformat produces this result:

<example>
 <title>Compiling and Running a Program</title>
 <para>
  To compile and run the program, use the following
  commands, where
  <replaceable>sourcefile</replaceable>
  is the name of the source file:
 </para>
 <screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>
</example>

The paragraph now is wrapped and indented. However, it doesn't seem to be wrapped quite correctly, because the <replaceable> element actually would fit on the previous line. This happens because no formatting options were specified for <replaceable> in the configuration file. As a result, it is treated as having the default element type of block, using the default behavior that block elements are written out beginning on a new line.

To fix this problem, we should configure <replaceable> as an inline element. That will cause it to be formatted inline with the other text (and thus line-wrapped along with it). Modify the configuration file to include a replaceable section:

para
  format = block
  normalize = yes
  wrap-length = 60
  subindent = 1

replaceable
  format = inline

screen
  format = verbatim

The resulting output after making this change is as follows:

<example>
 <title>Compiling and Running a Program</title>
 <para>
  To compile and run the program, use the following
  commands, where <replaceable>sourcefile</replaceable> is
  the name of the source file:
 </para>
 <screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>
</example>
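
The rule that the prevailing indent counts toward the wrap length can be tried out with Python's standard textwrap module, whose width argument likewise includes the indentation prefix. This is only a sketch of the arithmetic: wrap_text is a hypothetical helper, and nothing here claims to reproduce xmlformat's own wrapping algorithm.

```python
import textwrap


def wrap_text(text, wrap_length, indent):
    """Normalize whitespace, then wrap so that indent + text <= wrap_length."""
    normalized = " ".join(text.split())   # collapse runs of whitespace
    prefix = " " * indent
    return textwrap.wrap(
        normalized,
        width=wrap_length,                # total width, indentation included
        initial_indent=prefix,
        subsequent_indent=prefix,
        break_long_words=False,           # never split a word or tag
        break_on_hyphens=False,
    )


if __name__ == "__main__":
    # Indent of 20 and wrap length of 60: at most 40 characters of text per line.
    for line in wrap_text("word " * 30, wrap_length=60, indent=20):
        print(line)
```

Because the indentation is part of the budget, text nested more deeply gets shorter lines for the same wrap length, which is worth remembering when choosing a value for deeply nested documents.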

We're getting close now. All we need to do is space out the <example> child elements with a blank line in between. Subelement spacing is controlled by three formatting properties:

• entry-break controls spacing after the opening tag of an element (that is, the spacing upon entry into the element's content).
• element-break controls the spacing between subelements.
• exit-break controls spacing before the closing tag of an element (that is, the spacing upon exit from the element's content).

The value for each of these formatting options should be an integer indicating the number of newlines to write. A value of 1 causes one newline, which acts simply to break to the next line. To get a blank line, the break value needs to be 2. Modify the configuration file by adding a section for <example> elements:

example
  format = block
  entry-break = 2
  element-break = 2
  exit-break = 2
  subindent = 0

para
  format = block
  normalize = yes
  wrap-length = 60
  subindent = 1

replaceable
  format = inline

screen
  format = verbatim

The resulting output is:

<example>

<title>Compiling and Running a Program</title>

<para>
 To compile and run the program, use the following commands,
 where <replaceable>sourcefile</replaceable> is the name of
 the source file:
</para>

<screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>

</example>
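
Since a break value is just a count of newlines, the difference between 1 and 2 is easy to see in a small Python sketch. The helper join_with_break is hypothetical and only illustrates the counting rule; it is not how xmlformat is implemented.

```python
def join_with_break(pieces, break_value):
    """Join output pieces, writing `break_value` newlines between them."""
    if break_value < 0:
        raise ValueError("break value must be a non-negative integer")
    return ("\n" * break_value).join(pieces)


if __name__ == "__main__":
    children = ["<title>...</title>", "<para>...</para>", "<screen>...</screen>"]
    print(join_with_break(children, 1))   # children on consecutive lines
    print("-" * 20)
    print(join_with_break(children, 2))   # one blank line between children
```

One newline ends the current line, so a blank line needs a second newline; that is why the example section uses 2 for all three break options.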
We're done!
You may be thinking, "Wow, that's a lot of messing around just to format that tiny little document." That's true. However, the effort of setting up configuration files tends to be "reusable," in the sense that you can use the same file to format multiple documents that all should be written using the same style. Also, if you have different projects requiring different styles, it tends to be easiest to begin setting up the configuration file for one project by beginning with a copy of the file from another project.

4. Discovering "Inherited" Formatting Options

In the final formatting of doc2.xml, note that the paragraph tags appear on separate lines preceding and following the paragraph content. This occurs despite the fact that the configuration file specifies no break values in the para section, because if you omit formatting options for an element, it "inherits" the default properties. In the case of the <para> element, the relevant unspecified properties are the entry-break and exit-break values. For block elements, both have a value of 1 by default (that is, one newline), which causes a line break after the opening tag and before the closing tag.
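
Inheritance of this kind amounts to overlaying the options you wrote on top of the defaults. The Python sketch below shows the idea for a block element; the dictionary and function names are hypothetical, and the default values are the ones this tutorial reports for the *DEFAULT section.

```python
# Defaults for block elements, as listed for the *DEFAULT section
# in this tutorial's configuration display.
DEFAULT_OPTIONS = {
    "format": "block",
    "entry-break": 1,
    "element-break": 1,
    "exit-break": 1,
    "subindent": 1,
    "normalize": "no",
    "wrap-length": 0,
}

# The options actually written in the para section of xmlformat.conf.
PARA_SECTION = {
    "format": "block",
    "normalize": "yes",
    "wrap-length": 60,
    "subindent": 1,
}


def effective_options(configured, defaults=DEFAULT_OPTIONS):
    """Overlay the configured options on the defaults (block elements only)."""
    merged = dict(defaults)      # start from the inherited values
    merged.update(configured)    # explicit settings take precedence
    return merged


if __name__ == "__main__":
    for name, value in effective_options(PARA_SECTION).items():
        print(f"{name} = {value}")
```

The break values never appear in the para section, yet they are present in the merged result, which is why the paragraph tags end up on their own lines.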

If you want to see all the formatting options xmlformat will use, run it with the --show-config option. For example:

% xmlformat --show-config
*DEFAULT
  format = block
  entry-break = 1
  element-break = 1
  exit-break = 1
  subindent = 1
  normalize = no
  wrap-length = 0

*DOCUMENT
  format = block
  entry-break = 0
  element-break = 1
  exit-break = 1
  subindent = 0
  normalize = no
  wrap-length = 0

example
  format = block
  entry-break = 2
  element-break = 2
  exit-break = 2
  subindent = 0
  normalize = no
  wrap-length = 0

para
  format = block
  entry-break = 1
  element-break = 1
  exit-break = 1
  subindent = 1
  normalize = yes
  wrap-length = 60

replaceable
  format = inline

screen
  format = verbatim

No configuration file is specified on the command line, so xmlformat reads the default configuration file, xmlformat.conf. Then it displays the resulting configuration options. You can see that the para section has inherited break values from the *DEFAULT section.

5. Checking for Unconfigured Elements

Any elements appearing in the input document that are not named in the configuration file are formatted using the values of the *DEFAULT section. If the file contains no *DEFAULT section, xmlformat uses built-in default values.

If you want to see whether there are any elements in the document for which you haven't specified any formatting options, run xmlformat with the --show-unconfigured-elements option. For example:

% xmlformat --show-unconfigured-elements doc2.xml
The following document elements were assigned no formatting options:
title

As it happens, the title already formats in the desired fashion, so there is no need to add anything more to the configuration file.
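
The same kind of check can be approximated outside xmlformat with Python's standard xml.etree.ElementTree module. The sketch below walks the document and collects element names that have no section in a configuration dictionary. The function name is hypothetical, and skipping the contents of verbatim elements is an assumption, chosen because it is consistent with the report above listing only title.

```python
import xml.etree.ElementTree as ET

DOC2 = """<example><title>Compiling and Running a Program</title>
<para>To compile and run the program, use the following commands,
where <replaceable>sourcefile</replaceable> is the name of the
source file:</para><screen>
<userinput>cc</userinput> <replaceable>sourcefile</replaceable>
<userinput>./a.out</userinput>
</screen>
</example>"""

# Element names that have a section in the tutorial's xmlformat.conf.
CONFIG = {
    "example": {"format": "block"},
    "para": {"format": "block"},
    "replaceable": {"format": "inline"},
    "screen": {"format": "verbatim"},
}


def unconfigured_elements(xml_text, config):
    """Return element names found in the document but absent from `config`."""
    found = []

    def visit(element):
        options = config.get(element.tag)
        if options is None and element.tag not in found:
            found.append(element.tag)
        if options is not None and options.get("format") == "verbatim":
            return  # assumption: verbatim content is not inspected
        for child in element:
            visit(child)

    visit(ET.fromstring(xml_text))
    return found


if __name__ == "__main__":
    for name in unconfigured_elements(DOC2, CONFIG):
        print(name)
```

A check like this is a convenient habit before an in-place run on a new document type, because any element it reports will be formatted with default block settings.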