summaryrefslogtreecommitdiff
path: root/lib/TUWF/XML.pod
blob: 4685e39279cb26c8c924062de383144f42304a1e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
=head1 NAME

TUWF::XML - Easy XML and HTML generation with Perl

=head1 DESCRIPTION

This module provides an easy and simple method for generating XML (and with
that, HTML) documents in Perl. Unlike most other TUWF modules, this one can be
used separately, outside of the TUWF framework.

The goal of this module is to make HTML generation B<easier>, it is certainly
not a goal to abstract HTML generation behind generalized functions and
objects. Nor is it a goal to ensure the correctness of the generated HTML, that
remains the responsibility of the programmer (although this module can
certainly help). You will still be writing HTML yourself, the only difference
is that you use a more convenient syntax and you won't have to manually escape
everything you output.

The primary aim of this module was to generate XHTML and HTML5, and since both
can be expressed in proper XML, extending it to write XML was a small step. In
fact, this module is basically an XML generator with convenience functions for
HTML.

This module provides two interfaces: a function interface and an object
interface. Both can be used, even at the same time. The object interface is
required in threaded environments or when you want to generate multiple
documents simultaneously, while the function interface is far more
convenient, but has some limitations and contributes to namespace pollution.

The function interface looks like this:

  use TUWF::XML ':html5';
  
  TUWF::XML->new(default => 1);
  html sub {
    head sub {
      title 'Document title!';
    };
  };
  
  # -- or, in more imperative style:
  html;
    head;
      title 'Document title!';
    end;
  end 'html';

And the equivalent, using the object interface:

  # not required when used within a TUWF website
  use TUWF::XML;
  
  my $xml = TUWF::XML->new();
  $xml->html(sub {
    $xml->head(sub {
      $xml->title('Document title!');
    });
  });
  
  # -- or, again in more imperative style:
  $xml->html;
    $xml->head;
      $xml->title('Document title!');
    $xml->end;
  $xml->end('html');

You may also combine the two interfaces by setting the I<default> option in
C<new()> and mixing method calls and function calls, but that is rather
inconsistent and messy.

TUWF automatically calls C<TUWF::XML-E<gt>new(default =E<gt> 1, ..)> at the
start of each request, so you can start generating XML or HTML using the
function interface without having to initialize this module. Of course, if you
wish to generate an other XML document while processing a request, you should
use the object interface for that, otherwise this may cause problems with other
functions within the TUWF framework that assume that the default C<TUWF::XML>
object has been set to output to TUWF.

=head1 FUNCTION-ONLY FUNCTIONS

=head2 new(options)

Creates a new XML generator object, accepts the following options:

=over

=item default

0/1. When set to a true value, the newly created object will be used as the
default object: Any regular function call (that is, without an object) to any
of the functions listed in L<METHODS & FUNCTIONS|/"METHODS & FUNCTIONS"> will
act as if they were called on this object. Until a new object is created with
the I<default> option set, in which case the default object will be overwritten
again. Default: 0.

=item write

Should contain a subroutine reference that accepts a string as argument. This
subroutine will be called whenever there is data to output. If this option is
not specified, a default function that writes to C<stdout> is used.

=item pretty

Set to a positive integer to pretty-print the generated XML, set to 0 to
disable pretty-printing. The integer indicates the number of spaces to use for
each new level of indentation. It is recommended to have pretty-printing
disabled when generating HTML, since white-space around HTML elements tends to
have significance when being rendered, and with pretty-printing you will lose
the control on where to (not) insert whitespace. Default: 0 (disabled).

=back

=head2 mkclass(%classes)

Dynamically constructs a I<class> attribute, which can be passed to C<tag()>
and friends. This function accepts a hash where the keys are the class names
and the value indicates whether the class is enabled or not. This function
returns an empty list if none of the classes are enabled, or returns
C<< (class => $list_of_enabled_classes) >>.

This is convenient when the classes are dependant on other variables, e.g.:

  tag 'div', mkclass(hidden => $is_hidden, warning => $is_warning), 'Text';
  # Output:
  #  !$is_hidden && !$is_warning:  <div>Text</div>
  #  $is_hidden && !$is_warning:   <div class="hidden">Text</div>
  #  $is_hidden && $is_warning:    <div class="hidden warning">Text</div>

Note that the order in which classes are returned may be somewhat random.

=head2 xml_escape(string)

Returns the XML-escaped string. The characters C<&>, C<E<lt>>, C<E<gt>> and
C<"> will be replaced with their XML entity.

=head2 html_escape(string)

Does the same as C<xml_escape()>, but also replaced newlines with
C<E<lt>br /E<gt>> tags.

=head1 METHODS & FUNCTIONS

=head2 lit(string)

Output the given string B<lit>erally, without modification or escaping. This is
equivalent to just calling the I<write> subroutine passed to C<new()>.

=head2 txt(string)

XML-escape the string and then output it, equivalent to
C<lit(xml_escape $string)>.

=head2 xml()

Writes the following XML header:

  <?xml version="1.0" encoding="UTF-8"?>

Since this function does not open a tag, it does not have to be C<end()>'ed.

=head2 html(options)

Writes an (X)HTML doctype and opens an C<E<lt>htmlE<gt>> tag. Accepts the
following options:

=over

=item doctype

Specify the doctype to use. Can be one of the following:

  xhtml1-strict xhtml1-transitional xhtml1-frameset
  xhtml11 xhtml-basic11 xhtml-math-svg html5

These refer to the doctypes found at
L<http://www.w3.org/QA/2002/04/valid-dtd-list.html>. Default: html5.

=item lang

Specifies the (human) language of the generated content. This will generate a
C<lang> (and C<xml:lang> for XHTML) attribute for the html open tag.

=item I<anything else>

All other arguments are passed to C<tag()>.

=back

If you don't pass a I<contents> argument to this function, you should take care
to close the C<< <html> >> tag with an C<end()>.

=head2 tag(name, attribute => value, .., contents)

Generates an XML tag or element. The first argument is the name of the tag,
attributes can be specified after that with key/value pairs and finally the
contents can be specified. If the I<contents> argument is not present, an open
tag will be generated, which should be closed later on using C<end()>. If
I<contents> is present but undef, the generated tag will be self-closing, i.e.
it will end with a C</E<gt>> instead of a regular C<E<gt>>. If I<contents> is a
scalar, it will be used as the contents of the tag, after which the tag will be
closed with a closing tag (C<E<lt>/tagnameE<gt>>). If I<contents> is a CODE
reference, the subroutine will be called in between the start tag and the
closing tag.

The tag name and attribute names are outputted as-is, after some very basic
validation. The attribute values and contents are passed through
C<xml_escape()>.

Some example function calls and their output:

  tag('items');
  # <items>
  end();
  # </items>
  
  tag('link', href => '/', undef);
  # <link href="/" />
  
  tag('a', href => '/?f&c', title => 'Homepage', 'link');
  # <a href="/f&amp;c" title="Homepage">link</a>
  
  tag('summary', type => 'html', 'I can write in <b>bold</b>');
  # <summary type="html">I can write in &lt;b&gt;bold&lt;/b&gt;</summary>
  
  tag qw{content type xhtml xml:base http://example.com/ xml:lang en}, $content;
  # is equivalent to:
  lit '<content type="xhtml" xml:base="http://example.com/" xml:lang="en">';
  txt $content;
  lit '</content>';
  # except tag() can do pretty-printing when requested

  tag 'div', sub {
    tag 'a', href => '/', 'Home';
  };
  # <div><a href="/">Home</a></div>

=head2 end(name)

Closes the last tag opened by C<tag()> or C<html()>. The I<name> argument is
optional, when given, it will be used as validation. If the given I<name> does
not equal the last opened tag, an error is thrown.

Usage of this function is discouraged, as it may not be easy keep track of
which C<end()> belongs to which C<tag()>. An easier and more functional
approach is to not use C<end()> at all, and instead give a CODE reference to
C<tag()>. For example:

  tag 'body';
    tag 'b', 'text';
  end;

Is more safely written as:

  tag 'body', sub {
    tag 'b', 'text';
  };

=head2 <html-tag>(attribute => value, .., contents)

For convenience, all HTML5 and XHTML 1.0 tags have their own function that acts
as a shorthand for calling C<tag()>. For the function naming flavors, see
L</IMPORT OPTIONS> below.

Some tags are I<boolean>, meaning that they should always be self-closing and
not have any contents. To generate these tags with C<tag()>, you have to
specify undef as the I<contents> argument. This is not required when using
these convenience functions, the undef argument is automatically added for the
following tags:

  area base br col command embed hr img input link meta param source

Again, some examples:
  
  br;  # tag 'br', undef;
  div; # tag 'div';

  title 'Page title';
  # tag 'title', 'Page title';

  Link rel => 'shortcut icon', href => '/favicon.ico';
  # tag 'link', rel => 'shortcut icon', href => '/favicon.ico', undef;

  textarea rows => 10, cols => 50, $content;
  # tag 'textarea', rows => 10, cols => 50, $content;


=head1 IMPORT OPTIONS

By default, TUWF::XML does not export anything. You can import any specific
function (except C<new()>) by specifying it on the C<use> line:

  use TUWF::XML 'lit', 'html_escape', 'br';

  # after which you can call those functions as follows:
  lit html_escape $content;
  br;

Or you can import an entire group of functions by adding the C<:xml> group or
any of the HTML flavors to the list:

=over

=item B<:xml>

This group exports the functions C<xml()>, C<lit()>, C<txt()>, C<tag()>, and
C<end()>. All lower-case.

=item B<:html>

This group exports the following functions:

  tag html lit txt end

And the following XHTML 1.0 functions:

  a abbr acronym address area b base bdo big blockquote body br button caption
  cite code col colgroup dd del dfn div dl dt em fieldset form h1 h2 h3 h4 h5
  h6 head i img input ins kbd label legend li Link Map meta noscript object ol
  optgroup option p param pre Q samp script Select small span strong style Sub
  sup table tbody td textarea tfoot th thead title Tr tt ul var

Note that some functions start with an upper-case character. This is to avoid
problems with reserved keywords or overriding Perl core functions with the same
name.

=item B<:html5>

Same as C<:html>, except that instead of XHTML 1.0, this exports all HTML 5
functions:

  a abbr address area article aside audio b base bb bdo blockquote body br
  button canvas caption cite code col colgroup command datagrid datalist dd
  del details dfn dialog div dl dt em embed fieldset figure footer form h1 h2
  h3 h4 h5 h6 head header hr i iframe img input ins kbd label legend li Link
  Map mark menu meta meter nav noscript object ol optgroup option output p
  param pre progress Q rp rt ruby samp script section Select small source
  span strong style Sub sup table tbody td textarea tfoot th thead Time title
  Tr ul var video

=item B<:Html> and B<:Html5>

These are equivalent to C<:html> and C<html5>, respectively, except that all
functions start with an upper case character for consistency.  This flavor
looks like:

  use TUWF:XML ':Html5';
  Html sub {
    Head sub {
      Title 'Document title!';
      Tag 'a', href => '/', 'Home';
      Lit '&nbsp;';
    };
  };

=item B<:html_> and B<:html5_>

These are equivalent to C<:html> and C<html5>, respectively, except that all
functions are lower-case and are suffixed with an underscore. This flavor is
similar to Haskell's L<Lucid|https://hackage.haskell.org/package/lucid>:

  use TUWF::XML ':html5_';
  html_ sub {
    head_ sub {
      title_ 'Document title!';
      tag_ 'a', href => '/', 'Home';
      lit_ '&nbsp;';
    };
  };

=back

When using this module in a TUWF website, you can substitute C<TUWF::XML> with
C<TUWF>. The main TUWF module will then redirect its import arguments to this
module. This saves some typing, and allows you to import functions from other
TUWF modules on the same C<use> line.


=head1 SEE ALSO

L<TUWF>.

This module was inspired by L<XML::Writer|XML::Writer>, which is more powerful
but less convenient.

There's also L<DSL::HTML|DSL::HTML>, a slightly more featureful, heavyweight
and opinionated HTML-templating-inside-Perl module, based on
L<HTML::Tree|HTML::Tree>.

And there's L<HTML::Declare>, which is conceptually simpler than both this and
L<DSL::HTML>, but its syntax isn't quite as nice.

And there's also L<HTML::FromArrayref>, L<HTML::Tiny>, L<HTML::Untidy> and many
more modules on CPAN. In fact I don't know why you should use this module
instead of whatever is available on CPAN.


=head1 COPYRIGHT

Copyright (c) 2008-2017 Yoran Heling.

This module is part of the TUWF framework and is free software available under
the liberal MIT license. See the COPYING file in the TUWF distribution for the
details.


=head1 AUTHOR

Yoran Heling <projects@yorhel.nl>

=cut