17.1 The Internal Rewriter

The iChain internal rewriter is used to accomplish the following:

The following sections describe how the internal rewriter works and how to configure it:

17.1.1 Which URL References Are Rewritten?

The internal rewriter searches and parses Web content that passes through the accelerator and that meets certain criteria (see What Other Criteria Are Considered?) for URL references qualified to be rewritten. URL references are rewritten only under the following conditions:

  • URL references containing DNS names or IP addresses matching those in the accelerator’s Web server address list are rewritten with the accelerator’s DNS name.

  • URL references matching the accelerator's Alternate Host Name field are rewritten with the accelerator's DNS name.

  • URL references matching entries in the [Alias Host Names] section of the rewriter.cfg configuration file are rewritten with the accelerator’s DNS name. Details on the use of this file can be found in Configuring the Internal Rewriter.
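The three conditions above amount to a hostname lookup. The following Python sketch is illustrative only and is not iChain code: the function names and parameters are hypothetical, and port handling is deliberately ignored.

```python
from urllib.parse import urlsplit

def qualifies_for_rewrite(url, web_server_addresses, alternate_host, alias_hosts):
    """Return True if the URL's hostname matches one of the three sources.

    Hypothetical helper: the three arguments stand in for the Web server
    address list, the Alternate Host Name field, and [Alias Host Names].
    """
    host = urlsplit(url).hostname  # None when the reference has no hostname
    if host is None:
        return False
    candidates = {h.lower() for h in web_server_addresses}
    if alternate_host:
        candidates.add(alternate_host.lower())
    candidates.update(h.lower() for h in alias_hosts)
    return host in candidates  # urlsplit already lowercases the hostname

def rewrite_host(url, accelerator_dns_name):
    """Replace the host portion with the accelerator's DNS name.

    Assumption: the port is dropped; path and query string are untouched.
    """
    return urlsplit(url)._replace(netloc=accelerator_dns_name).geturl()
```

For example, with a Web server address of 151.155.1.1, an alternate hostname of www.backend.com, and an alias of home, the reference http://home/index.html qualifies, while a reference to any other host does not.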

17.1.2 What Other Criteria Are Considered?

The following criteria are considered when determining whether URL references should be rewritten:

Query Strings

The internal rewriter does not rewrite URL references contained within query strings. Only the hostname portion of the reference is evaluated for rewriting.

HTTP Headers

The internal rewriter rewrites qualified URL references occurring within certain types of HTTP response headers such as Location and Content-Location. The Location header is used to redirect the browser to where the resource can be found. The Content-Location header is used to provide an alternate location where the resource can be found.
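As a rough illustration of header rewriting, the following Python sketch rewrites only the two headers named above. It is a sketch under stated assumptions, not the iChain implementation: the function name, the dictionary representation of headers, and the set of origin hostnames are all hypothetical.

```python
from urllib.parse import urlsplit

# Only the two headers named in the text; the real list may be longer.
REWRITABLE_HEADERS = {"location", "content-location"}

def rewrite_headers(headers, origin_hosts, accelerator_dns_name):
    """Return a copy of headers with qualified Location values rewritten.

    origin_hosts is a set of lowercase hostnames that qualify for rewriting.
    """
    rewritten = {}
    for name, value in headers.items():
        if name.lower() in REWRITABLE_HEADERS:
            parts = urlsplit(value)
            if parts.hostname in origin_hosts:
                value = parts._replace(netloc=accelerator_dns_name).geturl()
        rewritten[name] = value
    return rewritten
```

A redirect from the origin server to http://www.backend.com/login would therefore reach the browser as a redirect to the accelerator's DNS name, while other headers pass through unchanged.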

JavaScript

Within JavaScript, only absolute references are evaluated for rewriting; relative references and absolute paths are generally not attempted. The exception is an absolute path (/path/file.html), which is evaluated if the file is read from a path-based multi-homing accelerator's origin Web server and the reference follows an HTML tag. For example, the string href='/path/file.html' is rewritten to href='/accelPath/path/file.html'.

HTML Tags

URL references occurring within the following HTML tags are evaluated for rewriting:

  • action

  • archive

  • background

  • base

  • cite

  • code

  • codebase

  • data

  • dynsrc

  • href

  • longdesc

  • lowsrc

  • onclick

  • pluginspage

  • src

  • usemap

  • usemapborderimage

NOTE: Value is not a default tag.

MIME Types

The rewriter parses pages with certain MIME Content-Types regardless of the file extension. By default, the internal rewriter parses pages with the following MIME Content-Types:

  • text/html

  • text/css

  • application/x-javascript

If an HTTP or HTTPS response has a MIME Content-Type set to any of the above types, or if the file has no extension or has the extension html, htm, shtml, jhtml, asp, or jsp, the page is parsed for possible rewriting.
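The parse decision described in this section can be sketched in Python as follows. This is a minimal illustration that uses only the defaults listed in this section; the function name and the way the extension is extracted from the URL are assumptions, not iChain behavior.

```python
import posixpath
from urllib.parse import urlsplit

# Defaults as listed in this section; "" stands for "no extension".
DEFAULT_TYPES = {"text/html", "text/css", "application/x-javascript"}
DEFAULT_EXTENSIONS = {"html", "htm", "shtml", "jhtml", "asp", "jsp", ""}

def should_parse(content_type, url):
    """Return True if the response would be parsed for possible rewriting."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    if mime in DEFAULT_TYPES:
        return True
    extension = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
    return extension in DEFAULT_EXTENSIONS
```

Under these assumptions, a GIF image served as image/gif is skipped, while a resource with no file extension is parsed regardless of its content type.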

Absolute and Relative References

An absolute reference is a reference that has all the information needed to locate a resource, including the hostname, such as http://internal.web.site.com/index.html. The internal rewriter always attempts to rewrite absolute references.

A relative reference is a reference that assumes the host/path and provides only the resource of the URI, such as index.htm. The internal rewriter does not attempt to rewrite a relative reference.

An absolute path is a reference that assumes the host. It provides the complete path, including the resource. The internal rewriter attempts to rewrite an absolute path only when it is defined in a path-based multi-homed accelerator.

Path-Based Multi-Homing

With an accelerator configured for path-based multi-homing, absolute references and absolute paths are evaluated for rewriting. Relative references are not attempted.
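The three reference types and the effect of path-based multi-homing can be summarized in a short Python sketch. The function names are hypothetical, and the classification is a simplification that relies on the standard urlsplit function rather than on iChain's own parser.

```python
from urllib.parse import urlsplit

def classify_reference(ref):
    """Classify a URL reference using the terms defined in this section."""
    parts = urlsplit(ref)
    if parts.scheme and parts.netloc:
        return "absolute reference"   # scheme and hostname are present
    if parts.path.startswith("/"):
        return "absolute path"        # host assumed, full path given
    return "relative reference"       # host and path assumed

def is_evaluated(ref, path_based_multihoming):
    """Return True if the reference is evaluated for rewriting."""
    kind = classify_reference(ref)
    if kind == "absolute reference":
        return True
    if kind == "absolute path":
        return path_based_multihoming
    return False
```

For instance, /path/file.html is evaluated only when the accelerator is configured for path-based multi-homing, and index.htm is never evaluated.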

17.1.3 Configuring the Internal Rewriter

The behavior of the internal rewriter can be controlled through use of the sys:/etc/proxy/rewriter.cfg configuration file. This section provides information on the parameters that can be used within this file.

Several configuration sections can be added to rewriter.cfg, including [Mime Content-Type], [Extension], [Exclude], [Javascript Variables], [Javascript Calls], and [Alias Host Names]. Each section is detailed below.

Remember the following conditions when configuring the internal rewriter:

  • Sections within the file must be separated with two returns or empty lines.

  • If a line begins with a pound sign (#) or semicolon (;), the line is considered a comment line.

  • Changes take effect only after you apply them. For the changes to affect previously cached pages, you must also purge the cache.
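To make the file layout concrete, the following Python sketch reads text in the rewriter.cfg format into a dictionary of sections. It is a reader written for illustration only: the function name is hypothetical, it skips whole-line comments but does not strip trailing comments, and it does not enforce the blank-line rule between sections.

```python
def parse_rewriter_cfg(text):
    """Group the non-comment lines of a rewriter.cfg-style file by section."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment line
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

Each section name maps to the list of entry lines that follow it, in file order.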

[Mime Content-Type]

In addition to files with extensions listed in the [Extension] section below, the rewriter parses pages with certain MIME Content-Types regardless of the file extension. By default, the internal rewriter parses pages with the following MIME Content-Types:

[Mime Content-Type]
text/html
text/css
text/xml
text/javascript
application/javascript
application/x-javascript

The text/plain type is not a default content type.

[Extension]

It is unusual for data to come back from a Web server without a content type. If it does happen, the file extension is compared. If a file doesn't have an extension, it always matches the null case, and the rewriter parses the page. The default extensions are as follows:

[Extension]
html, htm
shtml, jhtml
asp, jsp
js
css

Additional file extensions can be added by using the [Extension] section of rewriter.cfg, as shown below:

[Extension]
home, myNewExtension
anotherLongExtension
a,b,c,d,e,f,g

As shown in this example, additional extensions can be specified on individual lines or by using commas to separate multiple extensions specified on a single line. All of the extensions listed are appended to the default list shown above. You cannot remove the default extensions.
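The append-only behavior can be sketched in Python. The function name is hypothetical; the default list is the one shown in this section, and entries are kept exactly as written.

```python
DEFAULT_EXTENSIONS = ["html", "htm", "shtml", "jhtml", "asp", "jsp", "js", "css"]

def effective_extensions(section_lines):
    """Append the extensions from [Extension] lines to the default list."""
    result = list(DEFAULT_EXTENSIONS)  # defaults are always kept
    for line in section_lines:
        for extension in line.split(","):
            extension = extension.strip()
            if extension and extension not in result:
                result.append(extension)
    return result
```

Applied to the three example lines above, the sketch yields the eight defaults followed by home, myNewExtension, anotherLongExtension, and a through g.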

[Exclude]

The [Exclude] section keeps entity data from being rewritten if the requested URL matches an entry in the exclude list. The match is made against the requested URL, not the URL returned from the Web server. For example:

[Exclude]
http://www.a.com/dont_rewrite             ; Match without ending slash
http://www.a.com/dont_rewrite/            ; Match with ending slash
http://www.a.com/dont_rewrite/index.html  ; Match specific file name
http://www.a.com/dont_rewrite/*           ; Includes all files and
                                          ; subfiles and the three 
                                          ; examples above
http://www.a.com/*                        ; Turn off rewriting for 
                                          ; this accelerator and all 
                                          ; path-based children.
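One possible reading of the matching rules in this example is sketched below in Python. The function name is hypothetical, the patterns are assumed to have had their trailing comments removed already, and the wildcard semantics are inferred from the comments in the example rather than from a formal specification.

```python
def is_excluded(requested_url, patterns):
    """Return True if the requested URL matches an [Exclude] entry."""
    for pattern in patterns:
        if pattern.endswith("/*"):
            base = pattern[:-2]
            # A wildcard covers the directory itself, with or without the
            # ending slash, plus all files and subfiles beneath it.
            if requested_url == base or requested_url.startswith(base + "/"):
                return True
        elif requested_url == pattern:
            return True  # entries without a wildcard match exactly
    return False
```

Under this reading, the entry http://www.a.com/dont_rewrite/* also covers http://www.a.com/dont_rewrite, but not a sibling such as http://www.a.com/dont_rewrite_too.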

[Javascript Variables]

You can specify JavaScript variables whose values are searched for URL references. These variables are added to the HTML variable list (href=, src=, onclick=). The word parser edits the variable to the form [w]variable[ow]=[ow], where [w] represents a space, [ow] represents an optional space, and the URL reference follows.

For example, suppose you use the following JavaScript variables:

[Javascript Variables]
headingGif
plusGifUrl
minusGifUrl
stylesheetUrl

A JavaScript example would be:

var headingGif = "/path/path/file.gif";
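The [w]variable[ow]=[ow] form can be approximated with a regular expression. The following Python sketch is an approximation for illustration, not the word parser itself; the function name is hypothetical, and it assumes the URL reference is a quoted string.

```python
import re

def find_variable_urls(js, variable_names):
    """Find (variable, value) pairs matching [w]variable[ow]=[ow]"value"."""
    names = "|".join(re.escape(name) for name in variable_names)
    # \s is the required space, \s* the optional ones; the value must be
    # enclosed in matching single or double quotes.
    pattern = re.compile(r"\s(" + names + r")\s*=\s*([\"'])(.*?)\2")
    return [(m.group(1), m.group(3)) for m in pattern.finditer(js)]
```

With the variable list above, the example statement yields the pair headingGif and /path/path/file.gif as a candidate for rewriting.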

[Javascript Calls]

A JavaScript call can be made within JavaScript code or in HTML code, such as:

onclick='open("/path/file.html")'

By default, all parameters of a JavaScript call are looked at for rewriting. The first JavaScript call within an HTML tag (href, src, onclick, etc.) has all of its parameters parsed for rewriting.

Entries in this section extend the word parser so that it also parses the parameters of the named JavaScript calls when they appear in JavaScript code rather than within the HTML URL tags.

The word parser edits the call to the form “[w]call([ow][url]...)”, where the URL references are parameters within the call.

For example:

[Javascript Calls]
openWindow
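The “[w]call([ow][url]...)” form can likewise be approximated with regular expressions. This Python sketch is illustrative only: the function name is hypothetical, nested parentheses are not handled, and every quoted parameter is returned as a candidate because all parameters of a call are looked at for rewriting.

```python
import re

def find_call_urls(js, call_names):
    """Return (call, [quoted parameters]) for each listed JavaScript call."""
    names = "|".join(re.escape(name) for name in call_names)
    call_re = re.compile(r"\s(" + names + r")\(([^)]*)\)")
    string_re = re.compile(r"([\"'])(.*?)\1")
    found = []
    for call in call_re.finditer(js):
        params = [s.group(2) for s in string_re.finditer(call.group(2))]
        found.append((call.group(1), params))
    return found
```

Given the openWindow entry above, a statement such as openWindow("http://www.backend.com/a.html", "popup") would have both of its string parameters examined.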

[Alias Host Names]

Sometimes a URL reference specifies a hostname that does not meet default criteria for being rewritten; that is, it does not match the accelerator's alternate hostname or any value in the Web server address list. For example, assume that a URL reference contains the hostname of home (http://home/index.html), and home is not included in the Web server address list because it is not resolvable, nor is it the value of the accelerator’s Alternate Host Name field. By default, rewriting of the URL reference http://home/index.html would not occur. The [Alias Host Names] section of rewriter.cfg can be used to specify additional hostnames to be rewritten. Using alias hostnames only applies for absolute URLs. You cannot use alias hostnames for relative URLs.

The [Alias Host Names] section has the following syntax:

[Alias Host Names]
AcceleratorName=aliasName

where AcceleratorName is the value specified in the accelerator's Name field, and aliasName is the string that is rewritten with the value specified in the accelerator's DNS name field.

For the home example used above, if the accelerator name is accel2 and the URL reference to be rewritten is the hostname home, the correct syntax is:

[Alias Host Names]
accel2=home

NOTE: The alias names are not case sensitive because hostnames should not be case sensitive.

You can use [Alias Host Names] to add and remove the set of hostnames, schemes, ports, and paths that are used to represent the identity of an accelerator's Web server.

The following example illustrates the syntax to use to rewrite these items and then provides examples:

# Add an alias hostname
acceleratorName=aliasHostName
# Add a full host reference
# http://alias
acceleratorName=scheme://aliasHostName
# http://alias:80
acceleratorName=scheme://aliasHostName:port
# Add an additional subpath for a path-based multi-homed accelerator
# that does NOT have the remove subpath option checked
acceleratorName=/additionalPath
# Remove a hostname from being rewritten to this accelerator.
# Use this option when there are multiple accelerators that contain
# the same alternate hostname and Web server port.
acceleratorName!=alternateHostName

[Alias Host Names]
novell=HOME
novell=https://www.backend.com:444
download=/fileDownloads
novell!=testserver
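The entry forms shown above can be told apart mechanically. The following Python sketch classifies a single entry; the function name and the returned dictionary keys are hypothetical, and the classification rules are inferred from the syntax listing rather than taken from iChain itself.

```python
def parse_alias_entry(line):
    """Classify one [Alias Host Names] entry."""
    # Check for "!=" first, because it also contains "=".
    if "!=" in line:
        name, value = line.split("!=", 1)
        action = "remove"
    else:
        name, value = line.split("=", 1)
        action = "add"
    name, value = name.strip(), value.strip()
    if value.startswith("/"):
        kind = "subpath"
    elif "://" in value:
        kind = "host reference"
    else:
        kind = "hostname"
    return {"accelerator": name, "action": action, "kind": kind, "value": value}
```

For the four example entries, this yields an added hostname, an added host reference, an added subpath, and a removed hostname, respectively.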

17.1.4 Sample Rewriter Scenario

Assume that an accelerator is set up with the following information:

Accelerator Name:       novell
DNS Host Name:          www.novell.com
Alternate Host Name:    www.backend.com
Web Server Address:     151.155.1.1
Web Server Port:        80

  • The Web server delivers content through schemes HTTP and HTTPS on ports 80 and 443, respectively. The accelerator is set up to talk to Web server port 80, and to rewrite the https references.

    [Alias Host Names]
    novell=https://www.backend.com
    

    This is a common issue. If you use two accelerators, one on port 80 and the other on port 443, you are forced to have two listeners on the public side also listening to ports 80 and 443. This makes authentication to this site very difficult. The public side listening on port 80 must authenticate on a port other than 443 because two accelerators cannot share the same listening port.

    If you want only an HTTPS (secure) connection to the browser, using an alias works as long as the Web server can serve up the content on port 80.

    The setting of

    [Alias Host Names]
    novell=https://www.backend.com:443
    

    gives the same results because 443 is the default port for HTTPS.

  • On an Oracle Web server, the hostname HOME appears in some URL references (for example, http://HOME/path/file). This hostname does not appear in any DNS table, and it should not be used as the alternate hostname because the Web server has a real hostname in addition to HOME.

    For example:

    [Alias Host Names]
    novell=HOME
    
  • Suppose that an IS department is phasing out a DNS hostname of www.novell.com and prefers that users use the new DNS hostname of thenew.novell.com. This means that there are two accelerators with two different sets of DNS hostnames that are both reading from the same back-end Web server name of www.backend.com. All references for http://www.backend.com are rewritten to the original requesting hostname. This should not be an issue.

    The issue for the rewriter arises when a third Web server has a reference such as http://www.backend.com, which could be rewritten as either http://www.novell.com or http://thenew.novell.com. The following setting removes the back-end name of www.backend.com from the rewriter list for the accelerator that has the public name of www.novell.com. In this example, oldname is the name of the accelerator.

    [Alias Host Names]
    oldname!=www.backend.com
    

    This has a side effect of always rewriting http://www.backend.com to http://thenew.novell.com, even if you are reading from the www.novell.com accelerator.

  • Consider that an accelerator is a path-based multihomed child that does not remove the child sub-path from the URL. To add additional paths to re-route requests to this multihomed child, you would do the following:

    [Alias Host Names]
    downloads=/downloads1
    downloads=/downloads2
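The first scenario above states that specifying port 443 explicitly gives the same results as omitting it. The following Python sketch shows one way to see why: both forms normalize to the same scheme, host, and port. The function name is hypothetical, and the normalization is an illustration rather than a description of iChain internals.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_host_reference(ref):
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(ref)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)
```

With this normalization, https://www.backend.com and https://www.backend.com:443 compare equal, whereas https://www.backend.com:444 remains a distinct host reference.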
    

17.1.5 Disabling the Internal Rewriter

There are three methods you can use to disable the internal rewriter:

Disabling Per Accelerator

By default, the internal rewriter is enabled for all accelerators. The internal rewriter can slow performance because of the parsing overhead. In some cases, a Web site might not have content with URL references that need to be rewritten. The internal rewriter can be disabled on a per-accelerator basis using the SET command on the command line of the iChain machine. The following is an example of how you would use this command:

SET ACCELERATOR <name> DisableRewriter=Yes

where <name> is the name of the accelerator for which you want to disable rewriting. This setting persists across reboots and is exported to the .nas file.

Disabling Per URL

The rewriter.cfg file also allows you to specify a list of URLs which are to be excluded by the rewriter. For example:

[exclude]
http://www.abc.com/xyz/*
http://www.abc.com/donotrewrite.html

As shown in this example, the exclusion causes all pages in the xyz subdirectory and the donotrewrite.html file to be left untouched. The syntax of the URLs requires them to be prefixed by http, and the domain name of the accelerator also must be defined.

Disabling In a Page

In some circumstances, you might find that you need more granularity. There are cases when only part of a page cannot or should not be rewritten. Although this deviates from the premise of iChain that you shouldn’t have to modify the origin server, you might encounter circumstances where it cannot be avoided.

In these cases, you can use the following tags in your origin pages.

For example:

<!--NOVELL_REWRITER_OFF-->
.
.
HTML data not to be rewritten
.
.
<!--NOVELL_REWRITER_ON-->

These tags are seen by browsers as comments and do not show up on the screen (except possibly on older browser versions). The closing tag is optional; if it is omitted, the rest of the page after the initial tag is not rewritten.

NOTE:If the page has been cached before you add the comment to turn off the rewriter, the page must be purged from cache.
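The effect of the two comment tags can be sketched in Python by splitting a page into regions that are, or are not, subject to rewriting. The function name is hypothetical, and the sketch only models the on/off behavior described above, including the optional closing tag.

```python
OFF = "<!--NOVELL_REWRITER_OFF-->"
ON = "<!--NOVELL_REWRITER_ON-->"

def split_regions(page):
    """Split a page into (text, rewritable) regions at the rewriter tags.

    Each tag stays at the start of the region it opens, so joining the
    region texts reproduces the original page.
    """
    regions = []
    position = 0
    rewritable = True
    while True:
        marker = OFF if rewritable else ON
        index = page.find(marker, position)
        if index == -1:
            # No further tag: the current state applies to the rest.
            regions.append((page[position:], rewritable))
            return regions
        regions.append((page[position:index], rewritable))
        position = index
        rewritable = not rewritable
```

If the closing tag is missing, the final region is returned as not rewritable, which matches the behavior described for an omitted last tag.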

17.1.6 Internal Rewriter Summary

  • The internal rewriter is on by default for all accelerators.

  • It reads sys:/etc/proxy/rewriter.cfg for configuration settings.

  • The accelerator’s DNS Name and appropriate port number are always rewritten by the rewriter.

  • The rewriter compares URL references to values in the accelerator’s Alternate Host Name, Web server address fields, and to the [Alias Host Names] section of rewriter.cfg to determine whether a rewrite should occur.

  • It looks for URL references within files that pass the MIME type check.

  • If the MIME type is not found in the packet, the file extension is examined. By default, rewriting will be attempted for files with no extension and for files with the following extensions: html, htm, shtml, jhtml, asp, jsp.

  • Within JavaScript, only absolute references are rewritten.

  • It looks for URL references in HTTP header types content-location and location.

  • It rewrites absolute references and absolute paths from a path-based multi-homing accelerator.

  • It does not support rewriting non-UTF-encoded nested URLs, such as the following:

    <a href="javascript:document.forms.queryPropertyForm.action="url""></a>
    

    The second double quote character triggers iChain to cut off the URL completely, for example:

    "javascript:document.forms.queryPropertyForm.action="
    

    To avoid this problem, use one of the following formats for the nested URL:

    <a href="javascript:document.forms.queryPropertyForm.action='url'">
    <a href='javascript:document.forms.queryPropertyForm.action="url"'>
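The truncation described above is what any parser does if it ends an attribute value at the next occurrence of the opening quote character. The following Python sketch demonstrates that behavior with a deliberately naive extractor; it is not the iChain parser, and the function name is hypothetical.

```python
def attribute_value(tag, attr):
    """Naively extract a quoted attribute value from a tag string.

    The value is assumed to end at the next occurrence of the quote
    character that opened it, which is why nested quotes of the same
    kind cut the value short.
    """
    start = tag.index(attr + "=") + len(attr) + 1
    quote = tag[start]
    end = tag.index(quote, start + 1)
    return tag[start + 1:end]
```

With double quotes nested inside double quotes, the extracted value stops at action= and the URL is lost; with the inner or outer quotes changed to single quotes, the whole value, including the nested URL, is returned.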