=head1 NAME TUWF - The Ultimate Website Framework =head1 DESCRIPTION TUWF is a small framework designed for writing websites. It provides an abstraction layer to various environment-specific tasks and has common functions to ease the creation of both small and large websites. For a gentle introduction to TUWF, see L. =head2 Main features and limitations TUWF may be The Ultimate Website Framework, but it is not the perfect solution to every problem. This section introduces you to some main features and limitations you will want to know about before using TUWF. =over =item TUWF is small. I have seen many frameworks being advertised as "small" or "minimal", yet they either require loads of dependencies or are not small at all. TUWF, on the other hand, B quite small. Its total codebase is significantly smaller than the primary code of CGI.pm, and TUWF requires absolutely no extra dependencies to run. Some optional features, however, do require extra modules: =over =item * L: For the L database handling methods. =item * L: To run TUWF in a FastCGI environment. =item * L: To run the standalone HTTP server. =item * L: If you need to handle requests with a JSON body, or wish to output JSON yourself. =item * L: For output compression. =back =item The generated response is buffered. This allows you to change the response completely while generating an other one, which is extremely useful if your code decides to throw an error while a part of the response has already been generated. In such a case your visitor will properly see your error page and not some messed up page that does not make sense. Thanks to this buffering, you will also be able to set cookies and send other headers B generating the contents of the page. And as an added bonus, your pages will be compressed more efficiently when output compression is enabled. On the other hand, this means that you can't use TUWF for applications that require Websockets or other forms of streaming dynamic content (e.g. a chat application), and you may get into memory issues when sending large files. =item Everything is UTF-8. All TUWF functions (with some exceptions) will only accept and return Unicode strings in Perls native encoding. All incoming data is assumed to be encoded in UTF-8 and all outgoing data will be encoded in UTF-8. This is generally what you want when developing new applications. If, for some very strange reason, you want all I/O with the browser to be in anything other than UTF-8, you won't be able to use TUWF. It B possible to use external resources which use other encodings, but you will have to C that into Perls native encoding before passing it to any TUWF function. =item Designed for FastCGI environments. TUWF is designed and optimized to be run a FastCGI environment, but it can also be used for plain old CGI scripts or run as a standalone HTTP server. Due to the singleton design of TUWF, you should avoid running TUWF websites in persistent environments that allow multiple websites to share the same process, such as mod_perl. =item One (sub)domain is one website. TUWF assumes that the website you are working on resides directly under a (sub)domain. That is, the homepage of your website has a URI like I, and all sub-pages are directly beneath it. (e.g. I would be your "about" page). While it B possible to run a TUWF website in a subdirectory (i.e. the homepage of the site would be I), you will have to prefix all HTML links and registered URIs with the name of the subdirectory. This is neither productive, nor will it be fun when you wish to rename that directory later on. =item One website is one (sub)domain. In the same way as the previous point, TUWF is not made to handle websites that span multiple (sub)domains and have different behaviour for each one. It is possible - quite simple, even - to have a different subdomain affect some configuration parameter while keeping the structure and behaviour of the website the same as for the other domains. An example of this could be a language setting embedded in a subdomain: I could show to the English version of your site, while I will have the German translation. Things will become messy as soon as you want (sub)domains to behave differently. If you want I to host a forum and I to be a wiki, you will want to avoid programming both subdomains in the same TUWF script. A common solution is to write a separate script for each subdomain. It is still possible to share code among both sites by means of modules. =back =head2 General structure of a TUWF website A website written using TUWF consists of a single Perl script, optionally accompanied by several modules. The script is responsible for loading, initializing and running TUWF, and can be used as a CGI script, FastCGI script, or standalone server. For small and simple websites, this script may contain the code for the entire website. Usually, however, the actual implementation of the website is spread among the various modules. The script can load the modules by calling C or C. TUWF configuration variables can be set using C, and routing handlers can be registered through C and friends. These functions can also be called by the loaded modules. In fact, for larger websites it is common for the script to only initialize TUWF and load the modules, while routing handlers are registered in the modules. The framework is based on callbacks: At initialization, your code registers callbacks to the framework and then passes the control to TUWF using C. TUWF will then handle requests and call the appropriate functions you registered. =head2 The TUWF Object While TUWF can not really be called I, it does use B major object, called the I. This object can be accessed using the C function that is exported by default, or by accessing C<$TUWF::OBJ> directly. Even though it is an "instance" of C, you are encouraged to use it as if it is the main object for your website: You can use it to store global configuration settings and other shared data. All modules loaded using C and its recursive counterpart can add their own methods to the TUWF Object. This makes it easy to split the functionality of your website among different functions and files, without having to constantly load and import your utility modules in each file that uses them. Of course, with all these methods being imported into a single namespace, this does call for some method naming conventions to avoid name conflicts and other confusing issues. The main TUWF methods use camelCase and are often prefixed with a short identifier to indicate to which module or section they belong. For example, the L methods all start with C and L with C. It is a good idea to adopt this style when you write your own methods. Be warned that the data in the TUWF object may or may not persist among multiple requests, depending on whether your script is running in CGI, FastCGI or standalone mode. In particular, it is a bad idea to store session data in this object, assuming it to be available on the next request. Storing data specific to a single request in the object is fine, as long as you make sure to reset or re-initialize the data at the beginning of the request. The I hook is useful for such practice. =head2 Utility functions Besides the above mentioned methods, TUWF also provides various handy functions. These functions are implemented in the TUWF submodules (e.g. L) and can be imported manually through these modules. Check out the L below for the list of submodules. An alternative, and more convenient, approach to importing these functions into your code is also available: you can import functions from multiple submodules at once by adding their names and/or tags to the C line. The following two examples are equivalent: # the simple approach use TUWF ':xml', 'uri_escape', 'sqlprint'; # the classic approach use TUWF; use TUWF::XML ':xml'; use TUWF::Misc 'uri_escape'; use TUWF::DB 'sqlprint'; The first C line of the classic approach is not required if all you need is to import the functions. Omitting this line from your main website script, however, will cause the main TUWF code to not be loaded into memory, and the global functions (listed below) will then not be available. The simple approach does not suffer from this problem and is therefore recommended. =head1 EXPORTED FUNCTIONS By default, TUWF exports a single function. =head2 tuwf() Returns the global TUWF object. This allows for convenient DSL-like access to methods in the TUWF object: # Get the client IP: my $remote_ip = tuwf->reqIP; # Send a 404 response: tuwf->resNotFound; =head1 GLOBAL FUNCTIONS The main TUWF namespace contains several functions used to initialize the framework and register the callbacks for your website. =head2 TUWF::any(\@methods, $path, $sub) Register a method and path to a subroutine. I<@methods> is an array of HTTP methods to accept. C<$path> can either be a literal string or a regexp. C<$sub> is called on incoming HTTP requests if the method is in the given array and if the C<$path> fully matches L. It is common to use the C operator to quote the regex, which prevents you from having to escape slashes in the path as would be required with C. If there are multiple route handlers that match a single request, then the one that has been registered first will be called. HTTP methods are case-insensitive. If the C<$path> is a regex, then the pattern groups will be available through L<< tuwf->capture |/capture(key) >>. TUWF::any ['post'], '/', sub { # This code will be called on a "POST /" }; TUWF::any ['get'], '/literal.json', sub { # This code will be called on "GET /literal.json" # But NOT on "GET /literalXjson" }; TUWF::any ['get'], qr{/regex.json}, sub { # This code will be called on "GET /regex.json" # And also on "GET /regexXjson" }; TUWF::any ['get','head'], qr{/user/(?\d+)}, sub { # This code will be called on a "GET /user/123". # The user's ID will be available in # tuwf->capture('uid') # And # tuwf->capture(1); }; =head2 TUWF::del($path, $sub) Register a handler for C. Equivalent to C. =head2 TUWF::get($path, $sub) Register a handler for C and C on C<$path>. Equivalent to C. TUWF::get '/', sub { ... }; TUWF::get qr{/game/(.*)}, sub { my $gameid = tuwf->capture(1); }; =head2 TUWF::hook($hook, $sub) Add a hook. This allows you to run a piece of code before or after a request handler. Hooks are run in the same order as they are registered. The following hooks are supported: =over =item before The subroutine will be called before the request handler. If this handler calls C<< tuwf->done >>, then TUWF will assume that the handler has generated a suitable response, and any subsequent I handlers and request handlers will not be called. This replaces the L setting. =item after Called after the request handler has run, but before the result has been sent to the client. This hook is B called, even if a I hook has called C<< tuwf->done >> or if a route handler threw an exception. (The only time an I hook may not be called is when a preceding I hook threw an exception). This replaces the L setting. =back =head2 TUWF::load(@modules) Loads the listed module names and optionally imports their exported functions to the C namespace (see the L setting). The modules must be available in any subdirectory in C<@INC>. # make sure the website modules are available from @INC use lib 'mylib'; # load mylib/MyWebsite/HomePage.pm TUWF::load('MyWebsite::HomePage'); # load two other modules TUWF::load('MyWebsite::Forum', 'MyUtilities'); Note that your modules must be proper Perl modules. That is, they should return a true value (usually done by adding C<1;> to the end of the file) and they should have the correct namespace definition. =head2 TUWF::load_recursive(@modules) Works the same as C, but this also loads all submodules. # the following will load MyWebsite.pm (if it exists) and all modules below # the MyWebsite/ directory (if any). TUWF::load_recursive('MyWebsite'); Note that all submodules must be in the same parent directory in C<@INC>. =head2 TUWF::options($path, $sub) Register a handler for C. Equivalent to C. =head2 TUWF::patch($path, $sub) Register a handler for C. Equivalent to C. =head2 TUWF::post($path, $sub) Register a handler for C. Equivalent to C. =head2 TUWF::put($path, $sub) Register a handler for C. Equivalent to C. =head2 TUWF::register(regex => subroutine, ..) This is a legacy function, and only exists for backwards compatibility. It is similar to C, with the following differences: =over =item * C supports multiple registrations in a single call. =item * C only matches on 'GET', 'HEAD' and 'POST'. =item * The path argument to C is always interpreted as a regex, literal strings need to be manually escaped. =item * The path argument to C matches C<< tuwf->reqPath >> without the leading slash. =item * Regex captures are passed to the subroutine as additional arguments. =back =head2 TUWF::set(key => value, ..) Get or set TUWF configuration variables. When called with only one argument, will return the configuration variable with that key. Otherwise the number of arguments must be a multiple of 2, setting the configuration parameters. Note that most settings don't like to be changed from within route handlers and other callbacks. You should only need to set configuration variables during initialization. =over =item content_encoding Set the default output encoding. Supported values are I, I, I, I. See L for more information. Default: auto. =item cookie_defaults When set to a hashref, will be used as the default options to L. These options can still be overruled by each individual call to C. This can be useful when globally setting the cookie domain: TUWF::set(cookie_defaults => { domain => '.example.org' }); tuwf->resCookie(foo => 'bar'); # is equivalent to: tuwf->resCookie(foo => 'bar', domain => '.example.org'); # for each call to resCookie() Default: undef (disabled). =item cookie_prefix When set to a non-empty string, its value will be used as prefix to all cookie names used by C and C. C will act as if all cookies not having the configured prefix never existed, and removes the prefix when used in list context. C will simply add the prefix to all outgoing cookies. Default: undef (disabled). =item db_login Sets the login information for the L functions. Can be set to either an arrayref or a subroutine reference. In the case of an arrayref, the array should have three elements, containing the first three arguments to C. Do not include the last options argument, TUWF will set the appropriate options itself. When necessary, however, it is still possible to set options using the DSN string itself, see the L documentation for more information. C will automatically enable the unicode/utf8 flag for C, C and C. When setting this to a subroutine reference, the subroutine will be called when connecting to the database, with the main TUWF object as only argument. The subroutine is expected to return a DBI instance. It is the responsibility of the subroutine to set the correct DBI options. In particular, it is important to have I enabled and I disabled. It is also recommended to enable unicode support if your database driver has such an option. Default: undef (disabled). =item debug Set to a true value to enable debug mode. When debug mode is enabled and I is specified, TUWF will log page generation times for each request. This flag can be easily read through the C method, so you can also use is in your own code. Default: 0 (disabled). =item error_400_handler Similar to I, but is called when something in the request data did not make sense to TUWF. In the current implementation, this only happens when the request data contains non-UTF8-encoded text. A warning is written to the log file when this happens. B The warning of L applies here as well. =item error_404_handler Set this to a subroutine reference if you want to write your own 404 error page. The subroutine will be called with TUWF object as only argument, and is expected to generate a response. =item error_405_handler Similar to I, but is called when the HTTP request method is something other than HEAD, GET or POST. These requests are usually generated by bots or applications which don't actually read the response contents, so overriding the default 405 error page makes little sense in most situations. If you do override it, do not forget to add an C HTTP header to the response, as required by the HTTP standard. B The warning of L applies here as well. =item error_413_handler Similar to I, but is called when the POST body exceeds the configured I. B The warning of L applies here as well. =item error_500_handler Set this to a subroutine reference if you want to write your own 500 error page. The subroutine will be called with the TUWF object as first argument and the error message as second argument. When I is set, a detailed error report will be written to the log. It is recommended to ignore the error message passed to your subroutine and to enable the log file, so you won't risk sending sensitive information to your visitors. B When this handler is called, the database and request objects may not be in a usable state. This handler may also be called before any of the hooks have been run. It's best to keep this handler as simple as possible, and only have it generate a friendly response. =item http_server_port Port to listen on when running the standalone HTTP server. This defaults to the C environment variable, or 3000 if it is not set. =item import_modules This setting controls whether C and C will import all public functions into the C namespace. For example, with C enabled, a module can add a ChtmlFramework()> method as follows: package My::Package; use TUWF; use Exporter 'import'; our @EXPORT = ('htmlFramework'); sub htmlFramework { # .. } The same can still be done with C set to a false value, but it will work slightly differently: package My::Package; use TUWF; sub TUWF::Object::htmlFramework { # .. } # Or: *TUWF::Object::htmlFramework = sub { # .. }; This setting defaults to true, but that's only for historical reasons. The latter style is recommended for new projects. =item logfile To enable logging, set this to a string that indicates the path to your log file. The file must of course be writable by your script. TUWF automatically logs all Perl warnings, and when one of your callbacks throws an exception a full request dump with useful information will be logged, allowing you to easily locate and fix the problem. You can also write information to the log yourself using the C method. Default: undef (disabled). =item log_format Set to a subroutine reference to influence the default log format. The subroutine is passed three arguments: the main TUWF object, the URI of the current request (or '[init]' if C was called outside of a request), and the log message. The subroutine should return the string to be written to the log, including trailing newline. Be warned that your subroutine can be called even when no request is being processed or before some resources have been initialized, so you should avoid using such resources. In paticular, do not call any database functions from this subroutine, as the database connection may not be in a defined state. Any Perl warnings generated by this subroutine will not be logged in order to avoid infinite recursion. =item log_slow_pages Setting this to a number will log all pages that took longer to generate than the time indicated in the number, in milliseconds. The format of the log line is the same as used when the I option is enabled. This option is ignored when I is enabled, since in that case all pages will be logged anyway. Default: 0 (disabled). =item log_queries Setting this to a true value will write all database queries to the log file. Useful when debugging queries, but can generate a lot of data. Default: 0 (disabled). =item mail_from The default value of the C header of mail sent using L. Default: Cnoreply-yawf@blicky.netE>. =item mail_sendmail The path to the C command, used by L. Default: C. =item max_post_body Maximum length of the contents of POST requests, in bytes. This disallows people to upload large files and potentially cause your script to run out of memory. Set to 0 to disable this limit. Default: 10MB. =item mime_types Hash of file extension to MIME type. This is used by L to set the appropriate I header. The default already comes with a few extensions, but you can easily add over override your own using: TUWF::set('mime_types')->{exe} = 'application/x-msdownload'; =item mime_default The default MIME type for extensions not covered in L. =item pre_request_handler (Deprecated) Similar to a I hook, see L. Unlike the I hook, this subroutine should not call C<< tuwf->done >> or C<< tuwf->pass >>, but instead return a false value to send the response and prevent further processing, or return a true value to continue processing further handlers. =item post_request_handler (Deprecated) Equivalent to an I hook, see L. One notable difference is that this callback will B run when a I hook or request handler threw an exception or called C<< tuwf->done() >>. =item validate_templates Hashref, templates for the L function when called using L. The recommended way to add new templates is to call C with a single argument: TUWF::set('validate_templates')->{$key} = \%validate_options; =item xml_pretty Passed to the I option of Cnew()>. See L. Default: 0 (disabled). =back =head2 TUWF::run() After TUWF has been initialized, all modules have been loaded and all URIs have been registered, the last thing that remains is to execute C. This function will start processing requests and calls the appropriate callbacks at the appropriate stages. Whether this function ever returns or not depends on the environment your script is running in; if you're running your script in a CGI environment, C will return as soon as the request has been processed. If, on the other hand, you are running the script as a FastCGI script or standalone server, it will keep waiting for new incoming requests and will therefore never return. It is a bad idea to assume either way, so you want to avoid putting any run-time code after calling C. =head1 BASIC METHODS This section documents the basic TUWF object methods provided by TUWF.pm. The TUWF object provides many other methods as well, which are implemented and documented in the various sub-modules. See the documentation of each sub-module for the methods it provides. =head2 capture(key) Returns the capture group from the route regex. Both positional captures and named captures can be used, for example: TUWF::get qr{/user/(?[1-9][0-9]*)} => sub { # capture(1) is equivalent to $1 from the regex. tuwf->capture(1) == $uid; # capture('uid') is equivalent to $+{uid} from the regex. tuwf->capture('uid') == $uid; }; Note that C is not available; This would be equivalent to C<< tuwf->reqPath >>, save for the leading slash. =head2 debug() Returns the value of the I setting. =head2 done() Calling this method will immediately abort the current handler, run the I hooks, and output the response. When called from a I hook, this will prevent running any further I hooks or request handlers. Calling this method from a request handler is equivalent to a normal return from the handler, but it can still be useful to force a response from a nested function call, e.g.: sub require_admin { if(!user_is_admin()) { # Generate a friendly error page here. # ...and send it to the client: tuwf->done; } } TUWF::get '/admin' => sub { require_admin(); # At this point we can be sure that the user is an administrator, and # continue to generate our page. }; Calling this method from a I hook has no effect other than prematurely aborting that particular hook. This method calls C, so be sure to re-throw the error when run inside an eval block. =head2 log(message) Writes a message to the log file configured with I. When no log file is configured, C will do nothing. The I argument may contain newlines, which will be nicely (re-)formatted before logging, in order to avoid ambiguity with other log entries. By default the log message will be prefixed with the date and URI of the request, but this can be changed with the L setting. This function is not used very often in practice, since it is easier to simply use Perl's C function instead. TUWF automatically writes all warnings to the log file. =head2 pass() Calling this method will immediately abort the current handler and move on to the next handler (if any). Any side effects (e.g. setting response headers, generating output) will remain. If the final handler calls C, then any response data is discarded and a 404 response is generated instead. Calling this method from a I hook has no effect other than prematurely aborting that particular hook. This method calls C, so be sure to re-throw the error when run inside an eval block. =head1 SERVER CONFIGURATION Since a website written using TUWF consists of a single Perl script that acts as the main script for your site, the only thing you have to do is to run the script, or tell your webserver to run it for you. There are generally two things you should take care of: =over =item 1. Make sure you have the right modules installed: CGI is supported out of the box, but for FastCGI you will need L and for the standalone web server, you'll need L. =item 2. You have to make sure all requests to non-existing files are passed to your script, in order for the URI rewriting in TUWF to work. =back The following examples show how to configure your server to run the C script from the TUWF distribution. I assume the TUWF distribution is unpacked in C and the site runs on the hostname C. =head2 Standalone The easiest way to run a TUWF project is to simply... run it. TUWF will automatically start a HTTP server on port 3000, and your site will be accessible on L. This mode is intended for development use only, and it can only process a single request at a time. =head2 Examples for Apache (2.2 or 2.4) CGI mode: ServerName test.example.com DocumentRoot /tuwf/examples AddHandler cgi-script .pl # %{REQUEST_FILENAME} does not seem to always work inside a # But it should be equivalent to "%{DOCUMENT_ROOT}/%{REQUEST_URI}" RewriteEngine On RewriteCond "%{DOCUMENT_ROOT}/%{REQUEST_URI}" !-s RewriteRule ^/ /singlefile.pl It is possible to move the mod_rewrite statements into a .htaccess file, in which case you can remove the C lines in the above example and put the following in your .htaccess file: RewriteEngine On RewriteCond %{REQUEST_FILENAME} !-s RewriteRule ^/ /singlefile.pl FastCGI mode, using C. With this configuration it is possible to have the documentroot point to a different directory than where the TUWF script resides, which could improve security. ServerName test.example.com DocumentRoot /whatever/you/want AddHandler fcgid-script .pl FcgidWrapper /tuwf/examples/singlefile.pl virtual # same as above example, except 'singlefile.pl' can be anything, # as long as it ends with '.pl' RewriteEngine On RewriteCond "%{DOCUMENT_ROOT}/%{REQUEST_URI}" !-s RewriteRule ^/ /singlefile.pl Again, it is possible to move the rewrites into a .htaccess. All of the above examples assume the referenced directories have the appropriate options set using a CDirectoryE> clause. =head2 Examples for lighttpd (1.4) CGI mode: $HTTP["host"] == "test.example.com" { server.document-root = "/tuwf/examples" cgi.assign = ( ".cgi" => "" ) server.error-handler-404 = "/singlefile.pl" } FastCGI: fastcgi.server = ( ".singlefile" => (( "socket" => "/tmp/perl-tuwf-singlefile.socket", "bin-path" => "/tuwf/examples/singlefile.pl", "check-local" => "disable" )) ) $HTTP["host"] == "test.example.com" { server.document-root = "/whatever/you/want" cgi.assign = ( ".cgi" => "" ) server.error-handler-404 = "/something.singlefile" } =head1 SEE ALSO L, L, L, L, L, L. The homepage of TUWF can be found at L. TUWF is available on a git repository at L. =head1 COPYRIGHT Copyright (c) 2008-2018 Yoran Heling. This module is part of the TUWF framework and is free software available under the liberal MIT license. See the COPYING file in the TUWF distribution for the details. =head1 AUTHOR Yoran Heling =cut