The iChain internal rewriter is used to accomplish the following:
To rewrite URL references with the proper scheme (HTTP or HTTPS).
For example, an HTML file being accessed through an iChain accelerator for the Web site mynovell.com might contain a URL reference to http://mynovell.com/file1.html. If the accelerator for mynovell.com is using SSL sessions between the browser and iChain, the URL reference http://mynovell.com/file1.html must be rewritten to https://mynovell.com/file1.html. Otherwise, when the user clicks this link, the browser bounces between HTTP and HTTPS to establish a new SSL session.
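The scheme rewrite described above can be sketched in a few lines of Python. This is only an illustration of the idea, not iChain's implementation; the function name `rewrite_scheme` and its parameters are hypothetical.

```python
import re

def rewrite_scheme(html: str, host: str, use_ssl: bool) -> str:
    """Rewrite http:// or https:// references to `host` so they use one scheme."""
    scheme = "https" if use_ssl else "http"
    # The negative lookahead keeps "mynovell.com" from matching inside a
    # longer hostname such as "mynovell.com.example.org".
    pattern = re.compile(r"https?://" + re.escape(host) + r"(?![\w.-])", re.IGNORECASE)
    return pattern.sub(lambda m: f"{scheme}://{host}", html)

page = '<a href="http://mynovell.com/file1.html">File 1</a>'
print(rewrite_scheme(page, "mynovell.com", use_ssl=True))
```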
To rewrite URL references that contain private IP addresses or private DNS names with the public DNS name of the iChain accelerator.
For example, suppose that a company has an internal Web site, internal.web.site.com, and wants to expose this site to Internet users through an iChain accelerator using a public DNS name of mynovell.com. Many of the HTML pages on this Web site have URL references that contain the private DNS name, such as http://internal.web.site.com/docs/file1.html. Because Internet users are unable to resolve internal.web.site.com, links using this URL reference would return DNS errors in the browser.
The internal rewriter can resolve this issue. The DNS name field in the accelerator configuration is set to mynovell.com, which users can resolve through a public DNS server to the accelerator’s public IP address. The rewriter parses Web content retrieved through the accelerator, and any URL references matching the private DNS name or private IP address listed in the Web server address field of the accelerator are changed (rewritten) with the public DNS name mynovell.com and port number of the accelerator.
Rewriting URL references addresses two issues: 1) URL references that were unreachable because they used private DNS names or IP addresses become accessible, and 2) private IP addresses and DNS names, which might be viewed as sensitive information, are no longer exposed.
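As a rough sketch of this behavior, the following Python function replaces private hostnames or IP addresses with the public DNS name. It is an illustration under stated assumptions, not the rewriter's actual logic: the function name, its parameters, and the private address 10.1.1.1 are hypothetical.

```python
import re

def rewrite_private_hosts(content, private_hosts, public_host,
                          scheme="http", public_port=None):
    """Replace references to private hosts with the accelerator's public name."""
    # Longest names first so a short name never shadows a longer one.
    names = "|".join(re.escape(h) for h in sorted(private_hosts, key=len, reverse=True))
    pattern = re.compile(r"https?://(?:" + names + r")(?::\d+)?(?![\w.-])", re.IGNORECASE)
    target = f"{scheme}://{public_host}"
    if public_port is not None:
        target += f":{public_port}"
    return pattern.sub(lambda m: target, content)

page = ('<a href="http://internal.web.site.com/docs/file1.html">Doc</a>'
        '<img src="http://10.1.1.1:8080/logo.gif">')
print(rewrite_private_hosts(page, ["internal.web.site.com", "10.1.1.1"], "mynovell.com"))
```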
To rewrite the Host header in incoming HTTP packets to the name expected by the internal Web server.
Using the example above, suppose that the internal Web server expects all HTTP or HTTPS requests to have the Host field set to internal.web.site.com. When users send requests using the public DNS name mynovell.com, the Host field of the packets in those requests received by iChain is set to mynovell.com. iChain can be configured to rewrite this public name to the private name expected by the Web server by enabling the Alternate Host Name option, then entering the value internal.web.site.com in the adjacent field. Before iChain forwards packets to the Web server, the Host field is changed (rewritten) from mynovell.com to internal.web.site.com.
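The Host header rewrite can be pictured with the short Python sketch below. It models request headers as a plain dictionary, which is an assumption made for illustration; `rewrite_host_header` is a hypothetical name.

```python
def rewrite_host_header(headers, public_name, alternate_host_name):
    """Return a copy of `headers` with the public Host value replaced."""
    rewritten = dict(headers)
    for name, value in headers.items():
        # Header names and hostnames are compared without regard to case.
        if name.lower() == "host" and value.lower() == public_name.lower():
            rewritten[name] = alternate_host_name
    return rewritten

request = {"Host": "mynovell.com", "Accept": "*/*"}
print(rewrite_host_header(request, "mynovell.com", "internal.web.site.com"))
```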
The following sections describe how the internal rewriter works and how to configure it:
The internal rewriter searches and parses Web content that passes through the accelerator and that meets certain criteria (see What Other Criteria Are Considered?) for URL references qualified to be rewritten. URL references are rewritten only under the following conditions:
URL references containing DNS names or IP addresses matching those in the accelerator’s Web server address list are rewritten with the accelerator’s DNS name.
URL references matching the accelerator's Alternate Host Name field are rewritten with the accelerator's DNS name.
URL references matching entries in the [Alias Host Names] section of the rewriter.cfg configuration file are rewritten with the accelerator’s DNS name. Details on the use of this file can be found in Configuring the Internal Rewriter.
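The three conditions above amount to a hostname membership test. The following Python sketch shows one way to express it; the function and parameter names are hypothetical, and it assumes hostname comparison is not case sensitive.

```python
from urllib.parse import urlsplit

def qualifies_for_rewrite(url, web_server_addresses,
                          alternate_host_name=None, alias_host_names=()):
    """True if the URL's hostname matches one of the accelerator's known names."""
    host = urlsplit(url).hostname  # None for relative references
    if host is None:
        return False
    candidates = {h.lower() for h in web_server_addresses}
    if alternate_host_name:
        candidates.add(alternate_host_name.lower())
    candidates.update(a.lower() for a in alias_host_names)
    return host in candidates

print(qualifies_for_rewrite("http://home/index.html", ["151.155.1.1"],
                            alias_host_names=["HOME"]))
```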
The following criteria are considered when determining whether URL references should be rewritten:
The internal rewriter does not rewrite URL references contained within query strings. Only the hostname portion of the reference is evaluated for rewriting.
The internal rewriter rewrites qualified URL references occurring within certain types of HTTP response headers such as Location and Content-Location. The Location header is used to redirect the browser to where the resource can be found. The Content-Location header is used to provide an alternate location where the resource can be found.
Within JavaScript*, only absolute references are evaluated for rewriting; relative references and absolute paths are normally not attempted. The exception is that absolute paths (/path/file.html) are evaluated if the file is read from a path-based multi-homing accelerator's origin Web server and the reference follows an HTML tag. For example, the string href='/path/file.html' is rewritten to href='/accelPath/path/file.html'.
URL references occurring within the following HTML tags are evaluated for rewriting:
action
archive
background
base
cite
code
codebase
data
dynsrc
href
longdesc
lowsrc
onclick
pluginspage
src
usemap
usemapborderimage
NOTE: Value is not a default tag.
The rewriter parses pages with certain Mime Content-Types regardless of the file extension. By default, the internal rewriter parses pages with the following Mime Content-Types:
text/html
text/css
application/x-javascript
If an HTTP or HTTPS response has a Mime Content-Type set to any of the above types, or if the file extension is html, htm, shtml, jhtml, asp, jsp, or NO EXTENSION, the page is parsed for possible rewriting.
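The parse decision in the preceding paragraph can be sketched as follows. This Python fragment mirrors the rule as stated there (a matching Content-Type, or a matching or missing file extension); the function name `should_parse` is hypothetical, and stripping parameters such as `charset` from the Content-Type value is an assumption.

```python
import posixpath
from urllib.parse import urlsplit

DEFAULT_MIME_TYPES = {"text/html", "text/css", "application/x-javascript"}
# The empty string stands for "no extension".
DEFAULT_EXTENSIONS = {"html", "htm", "shtml", "jhtml", "asp", "jsp", ""}

def should_parse(url, content_type=None):
    """Decide whether a response is parsed for possible rewriting."""
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in DEFAULT_MIME_TYPES:
            return True
    extension = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
    return extension in DEFAULT_EXTENSIONS

print(should_parse("http://mynovell.com/page.jsp"))
print(should_parse("http://mynovell.com/logo.gif", "image/gif"))
```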
An absolute reference is a reference that has all the information needed to locate a resource, including the hostname, such as http://internal.web.site.com/index.html. The internal rewriter always attempts to rewrite absolute references.
A relative reference is a reference that assumes the host/path and provides only the resource of the URI, such as index.htm. The internal rewriter does not attempt to rewrite a relative reference.
An absolute path is a reference that assumes the host. It provides the complete path, including the resource. The internal rewriter attempts to rewrite an absolute path only when it is defined in a path-based multi-homed accelerator.
With an accelerator configured for path-based multi-homing, absolute references and absolute paths are evaluated for rewriting. Relative references are not attempted.
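The three reference types and the multi-homing rule can be summarized in a small Python sketch. The function names are hypothetical, and scheme-relative references (those beginning with //) are not covered by the definitions above, so the sketch does not treat them specially.

```python
from urllib.parse import urlsplit

def classify_reference(ref):
    """Classify a URL reference using the three categories defined above."""
    parts = urlsplit(ref)
    if parts.scheme and parts.netloc:
        return "absolute reference"   # e.g. http://internal.web.site.com/index.html
    if ref.startswith("/"):
        return "absolute path"        # e.g. /path/file.html
    return "relative reference"       # e.g. index.htm

def is_evaluated(ref, path_based_multihoming):
    """True if the reference would be considered for rewriting."""
    kind = classify_reference(ref)
    if kind == "absolute reference":
        return True
    if kind == "absolute path":
        return path_based_multihoming
    return False

for ref in ("http://internal.web.site.com/index.html", "/path/file.html", "index.htm"):
    print(ref, "->", classify_reference(ref), is_evaluated(ref, True))
```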
The behavior of the internal rewriter can be controlled through use of the sys:/etc/proxy/rewriter.cfg configuration file. This section provides information on the parameters that can be used within this file.
Several configuration sections can be added to rewriter.cfg, including [Mime Content-Type], [Extension], [Exclude], [Javascript Variables], [Javascript Calls], and [Alias Host Names]. Each section is detailed below.
Remember the following conditions when configuring the internal rewriter:
Sections within the file must be separated with two returns or empty lines.
If the first part of the line contains a pound sign (#) or semi-colon (;), the line is considered a comment line.
For the changes made to become effective, you must apply the changes. To have the changes apply to previously cached pages, you need to purge the cache.
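To make the file layout concrete, here is a minimal Python sketch of a reader for a file shaped like rewriter.cfg. It is not the parser iChain uses. It assumes that section names are not case sensitive and that a semicolon preceded by white space starts a trailing comment; `parse_rewriter_cfg` is a hypothetical name.

```python
import re

def parse_rewriter_cfg(text):
    """Return {section name (lowercased): [entry lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or whole-line comment
        line = re.split(r"\s+;", line, maxsplit=1)[0]  # drop trailing comment
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1).strip().lower()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections

sample = """# local additions
[Extension]
home, myNewExtension

[Alias Host Names]
accel2=home ; trailing note
; whole-line comment
"""
print(parse_rewriter_cfg(sample))
```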
In addition to files with extensions listed in the [Extension] section below, the rewriter parses pages with certain Mime Content-Types regardless of the file extension. By default, the internal rewriter parses pages with the following Mime Content-Types:
[Mime Content-Type]
text/html
text/css
text/xml
text/javascript
application/javascript
application/x-javascript
Text/plain is not a default content type.
It is unusual for data to come back from a Web server without a content type. If it does happen, the file extension is compared. If a file doesn't have an extension, it always matches the null case, and the rewriter parses the page. The default extensions are as follows:
[Extension]
html, htm
shtml, jhtml
asp, jsp
js
css
Additional file extensions can be added by using the [Extension] section of rewriter.cfg, as shown below:
[Extension]
home, myNewExtension
anotherLongExtension
a,b,c,d,e,f,g
As shown in this example, additional extensions can be specified on individual lines or by using commas to separate multiple extensions specified on a single line. All of the extensions listed are appended to the default list shown above. You cannot remove the default extensions.
Exclude keeps the entity data from being rewritten if the requesting URL has a match in the exclude list. It checks the requested URL and doesn't check the URL returned from the Web server. For example:
[Exclude]
http://www.a.com/dont_rewrite            ; Match without ending slash
http://www.a.com/dont_rewrite/           ; Match with ending slash
http://www.a.com/dont_rewrite/index.html ; Match specific file name
http://www.a.com/dont_rewrite/*          ; Includes all files and
                                         ; subfiles and the three
                                         ; examples above
http://www.a.com/*                       ; Turn off rewriting for
                                         ; this accelerator and all
                                         ; path-based children.
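The matching rules in the comments above can be sketched in Python as follows. This is an illustration of the described behavior rather than the rewriter's code; `is_excluded` is a hypothetical name, and the comparison is assumed to be an exact string match apart from the trailing wildcard.

```python
def is_excluded(requested_url, exclude_list):
    """True if the requested URL matches an entry in the exclude list."""
    for entry in exclude_list:
        if entry.endswith("*"):
            prefix = entry[:-1]  # e.g. http://www.a.com/dont_rewrite/
            # The wildcard also covers the directory itself, with or
            # without its ending slash.
            if requested_url.startswith(prefix) or requested_url == prefix.rstrip("/"):
                return True
        elif requested_url == entry:
            return True
    return False

excludes = ["http://www.a.com/dont_rewrite/*"]
for url in ("http://www.a.com/dont_rewrite",
            "http://www.a.com/dont_rewrite/sub/page.html",
            "http://www.a.com/other.html"):
    print(url, is_excluded(url, excludes))
```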
You can add JavaScript variables to look for URL references. This adds to the HTML variable list (href=, src=, onclick=). The word parser edits the variable to the form [w]variable[ow]=[ow], where [w] represents a space, [ow] represents an optional space, and the URL reference follows.
For example, suppose you use the following JavaScript variables:
[Javascript Variables]
headingGif
plusGifUrl
minusGifUrl
stylesheetUrl
A JavaScript example would be:
var headingGif = "/path/path/file.gif";
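The form [w]variable[ow]=[ow] maps naturally onto a regular expression. The Python sketch below shows one possible reading of it; it is not the word parser itself, it assumes the URL reference is a quoted string, and `find_variable_urls` is a hypothetical name.

```python
import re

def find_variable_urls(script, variable_names):
    """Find quoted values assigned to the listed JavaScript variables."""
    names = "|".join(re.escape(n) for n in variable_names)
    # \s is [w]; \s* is [ow]; the quoted string is the URL reference.
    pattern = re.compile(r"\s(" + names + r""")\s*=\s*(["'])(.*?)\2""")
    return [(m.group(1), m.group(3)) for m in pattern.finditer(script)]

variables = ["headingGif", "plusGifUrl", "minusGifUrl", "stylesheetUrl"]
script = 'var headingGif = "/path/path/file.gif";\nvar other = "/x.gif";'
print(find_variable_urls(script, variables))
```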
A JavaScript call can be made within JavaScript code or in HTML code, such as:
onclick='open("/path/file.html")'
By default, all parameters of a JavaScript call are looked at for rewriting. The first JavaScript call within an HTML tag (href, src, onclick, etc.) has all of its parameters parsed for rewriting.
The [Javascript Calls] section adds calls to the word parser so that the parameters of a JavaScript call are parsed when the call appears in JavaScript code rather than within the HTML URL tags.
The word parser edits the call to the form "[w]call([ow][url]...)", where the URL references are parameters within the call.
For example:
[Javascript Calls]
openWindow
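A rough Python sketch of the "[w]call([ow][url]...)" form follows. It collects every quoted parameter of a listed call, on the assumption that URL references are passed as string literals; it is an illustration only, and `find_call_urls` is a hypothetical name.

```python
import re

def find_call_urls(script, call_names):
    """Collect the string parameters of the listed JavaScript calls."""
    names = "|".join(re.escape(n) for n in call_names)
    call = re.compile(r"(?:^|\s)(" + names + r")\(([^)]*)\)")
    string_literal = re.compile(r"""(["'])(.*?)\1""")
    found = []
    for match in call.finditer(script):
        for literal in string_literal.finditer(match.group(2)):
            found.append((match.group(1), literal.group(2)))
    return found

script = 'function go() { openWindow("/path/file.html", "help"); }'
print(find_call_urls(script, ["openWindow"]))
```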
Sometimes a URL reference specifies a hostname that does not meet default criteria for being rewritten; that is, it does not match the accelerator's alternate hostname or any value in the Web server address list. For example, assume that a URL reference contains the hostname of home (http://home/index.html), and home is not included in the Web server address list because it is not resolvable, nor is it the value of the accelerator’s Alternate Host Name field. By default, rewriting of the URL reference http://home/index.html would not occur. The [Alias Host Names] section of rewriter.cfg can be used to specify additional hostnames to be rewritten. Using alias hostnames only applies for absolute URLs. You cannot use alias hostnames for relative URLs.
The following is an example of how to use the [Alias Host Names] section. It has the following syntax:
[Alias Host Names]
AcceleratorName=aliasName
where AcceleratorName is the value specified in the accelerator's Name field, and aliasName is the string that is rewritten with the value specified in the accelerator's DNS name field.
For the home example used above, if the accelerator name is accel2 and the URL reference to be rewritten is the hostname home, the correct syntax is:
[Alias Host Names]
accel2=home
NOTE: The alias names are not case sensitive because hostnames should not be case sensitive.
You can use [Alias Host Names] to add and remove the set of hostnames, schemes, ports, and paths that are used to represent the identity of an accelerator's Web server.
The following example illustrates the syntax to use to rewrite these items and then provides examples:
#add an alias hostname
acceleratorName=aliasHostName

#add a full host reference
# http://alias
acceleratorName=scheme://aliasHostName
# http://alias:80
acceleratorName=scheme://aliasHostName:port

#add an additional subpath for a path-based multi-homed accelerator
# that does NOT have the remove subpath option checked
acceleratorName=/additionalPath

#Remove a hostname from being rewritten to this accelerator.
# Use this option when there are multiple accelerators that contain
# the same alternate hostname and Web server port.
acceleratorName!=alternateHostName

[Alias Host Names]
novell=HOME
novell=https://www.backend.com:444
download=/fileDownloads
novell!=testserver
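The four entry forms can be told apart mechanically, as the Python sketch below suggests. The kind labels and the function name `parse_alias_entry` are hypothetical and exist only for this illustration.

```python
def parse_alias_entry(line):
    """Split an [Alias Host Names] entry into (accelerator, kind, value)."""
    if "!=" in line:
        name, value = line.split("!=", 1)
        kind = "remove-host"
    else:
        name, value = line.split("=", 1)
        value = value.strip()
        if value.startswith("/"):
            kind = "add-subpath"
        elif "://" in value:
            kind = "add-full-host"
        else:
            kind = "add-host"
    return name.strip(), kind, value.strip()

for entry in ("novell=HOME", "novell=https://www.backend.com:444",
              "download=/fileDownloads", "novell!=testserver"):
    print(parse_alias_entry(entry))
```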
Assume that an accelerator is set up with the following information:
Accelerator Name: novell
DNS Host Name: www.novell.com
Alternate Host Name: www.backend.com
Web Server Address: 151.155.1.1
Web Server Port: 80
The Web server delivers content through schemes HTTP and HTTPS on ports 80 and 443, respectively. The accelerator is set up to talk to Web server port 80, and to rewrite the https references.
[Alias Host Names]
novell=https://www.backend.com
This is a common issue. If you use two accelerators, one on port 80 and the other on port 443, you are forced to have two listeners on the public side also listening to ports 80 and 443. This makes authentication to this site very difficult. The public side listening on port 80 must authenticate on a port other than 443 because two accelerators cannot share the same listening port.
If you want only an HTTPS (secure) connection to the browser, using an alias works as long as the Web server can serve up the content on port 80.
The setting of
[Alias Host Names]
novell=https://www.backend.com:443
gives the same results because 443 is the default port for HTTPS.
On an Oracle Web server, the hostname HOME appears in some URL references (for example, http://HOME/path/file). This hostname does not appear in any DNS table, and it should not be used as the alternate hostname because the Web server has a real hostname in addition to HOME.
For example:
[Alias Host Names]
novell=HOME
Suppose that an IS department is phasing out a DNS hostname of www.novell.com and prefers that users use the new DNS hostname of thenew.novell.com. This means that there are two accelerators with two different sets of DNS hostnames that are both reading from the same back-end Web server name of www.backend.com. All references for http://www.backend.com are rewritten to the original requesting hostname. This should not be an issue.
The issue for the rewriter arises when a third Web server has a reference such as http://www.backend.com. This could be rewritten as http://www.novell.com or http://thenew.novell.com. The following setting removes the back-end name of www.backend.com from the rewriter list for the accelerator that has the public name of www.novell.com. In this example, oldname is the name of the accelerator.
[Alias Host Names]
oldname!=www.backend.com
This has a side effect of always rewriting http://www.backend.com to http://thenew.novell.com, even if you are reading from the www.novell.com accelerator.
Consider that an accelerator is a path-based multihomed child that does not remove the child sub-path from the URL. To add additional paths to re-route requests to this multihomed child, you would do the following:
[Alias Host Names]
downloads=/downloads1
downloads=/downloads2
There are three methods you can use to disable the internal rewriter:
By default, the internal rewriter is enabled for all accelerators. The internal rewriter can slow performance because of the parsing overhead. In some cases, a Web site might not have content with URL references that need to be rewritten. The internal rewriter can be disabled on a per-accelerator basis using the SET command on the command line of the iChain machine. The following is an example of how you would use this command:
SET ACCELERATOR <name> DisableRewriter=Yes
where <name> is the name of the accelerator for which you want to disable rewriting. This setting persists across reboots and is exported to the .nas file.
The rewriter.cfg file also allows you to specify a list of URLs which are to be excluded by the rewriter. For example:
[exclude]
http://www.abc.com/xyz/*
http://www.abc.com/donotrewrite.html
As shown in this example, the exclusion causes all pages in the xyz subdirectory and the donotrewrite.html file to be left untouched. The syntax of the URLs requires them to be prefixed by http, and the domain name of the accelerator also must be defined.
In some circumstances, you might find that you need more granularity. There are cases when only part of a page cannot or should not be rewritten. Although this deviates from the premise of iChain that you shouldn’t have to modify the origin server, you might encounter circumstances where it cannot be avoided.
In these cases, you can use the following tags in your origin pages.
For example:
<!--NOVELL_REWRITER_OFF-->
.
.
HTML data not to be rewritten
.
.
<!--NOVELL_REWRITER_ON-->
Browsers treat these tags as comments, so they do not show up on the screen (except possibly on older browser versions). The closing tag is optional. If it is omitted, nothing on the page after the initial tag is rewritten.
NOTE: If the page has been cached before you add the comment to turn off the rewriter, the page must be purged from cache.
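The effect of the two tags, including the optional closing tag, can be sketched in Python. The `rewrite` argument stands in for whatever rewriting would otherwise be applied; the function name `rewrite_outside_protected` is hypothetical and the code is only a model of the described behavior.

```python
OFF = "<!--NOVELL_REWRITER_OFF-->"
ON = "<!--NOVELL_REWRITER_ON-->"

def rewrite_outside_protected(page, rewrite):
    """Apply `rewrite` only to text outside OFF ... ON regions."""
    out = []
    pos = 0
    while True:
        off = page.find(OFF, pos)
        if off == -1:
            out.append(rewrite(page[pos:]))
            break
        out.append(rewrite(page[pos:off]))
        on = page.find(ON, off)
        if on == -1:
            out.append(page[off:])  # no closing tag: rest of page untouched
            break
        end = on + len(ON)
        out.append(page[off:end])
        pos = end
    return "".join(out)

swap = lambda text: text.replace("http://a/", "http://b/")
page = "http://a/ " + OFF + "http://a/" + ON + " http://a/"
print(rewrite_outside_protected(page, swap))
```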
The internal rewriter is on by default for all accelerators.
It reads sys:/etc/proxy/rewriter.cfg for configuration settings.
The accelerator’s DNS Name and appropriate port number are always rewritten by the rewriter.
The rewriter compares URL references to values in the accelerator’s Alternate Host Name, Web server address fields, and to the [Alias Host Names] section of rewriter.cfg to determine whether a rewrite should occur.
It looks for URL references within files that pass the MIME type check.
If the MIME type is not found in the packet, the file extension is examined. By default, rewriting will be attempted for files with no extension and for files with the following extensions: html, htm, shtml, jhtml, asp, jsp.
Within JavaScript, only absolute references are rewritten.
It looks for URL references in HTTP header types content-location and location.
It rewrites absolute references and absolute paths from a path-based multi-homing accelerator.
It does not support rewriting non-UTF-encoded nested URLs, such as the following:
<a href="javascript:document.forms.queryPropertyForm.action="url""></a>
The second double quote character triggers iChain to cut off the URL completely, for example:
"javascript:document.forms.queryPropertyForm.action="
To avoid this problem, use one of the following formats for the nested URL:
<a href="javascript:document.forms.queryPropertyForm.action='url'">
<a href='javascript:document.forms.queryPropertyForm.action="url"'>
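The cutoff can be reproduced with any parser that reads an attribute value up to the next matching quote. The Python sketch below is an analogy, not iChain's parser: it uses a simple regular expression and a hypothetical helper named `href_value`. It shows why the value ends at the second double quote and why mixing quote styles avoids the problem.

```python
import re

# Read an href value up to the next quote of the same kind that opened it.
ATTRIBUTE = re.compile(r"""href=(["'])(.*?)\1""")

def href_value(tag):
    match = ATTRIBUTE.search(tag)
    return match.group(2) if match else None

broken = '<a href="javascript:document.forms.queryPropertyForm.action="url""></a>'
fixed = """<a href="javascript:document.forms.queryPropertyForm.action='url'">"""

print(href_value(broken))  # stops before the nested URL
print(href_value(fixed))   # keeps the nested URL
```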