CWebProxy    developed by:
Memorial University of Newfoundland

Purpose

CWebProxy allows incorporation of web-based services as channels. It provides mechanisms for connecting to and rendering HTML and XML services. Pages are refreshed when they change. For applications under http, GET and POST parameters are passed through the portal to the application. This allows communication between the browser and the back-end application. Cookies are kept within CWebProxy, allowing dynamic http applications to maintain state.

Versions

CWebProxy has been available since uPortal 2.0 alpha. This documentation covers version 2.0 and its patch releases. A nearly identical version that works with uPortal 1.6 is now available. uPortal 2.1 (available as of Dec. 6th, 2002) features major changes in CWebProxy, but remains backwards-compatible with previous versions. See the CWebProxy 2.1 Documentation for details.

Summary

The key mechanism is "pass-through". It is the means for passing request parameters through the portal to the application. There are currently four levels of pass-through supported: "none", "marked", "application", and "all".

Use "application" if you want references to the original cw_xml to stay in the channel, and other links to go outside the portal framework. Use "all" if you want all links generated by the first URL to stay in the channel. Use "marked" if you want to indicate precisely which links should stay in channel, and which should replace the framework, and "none" if you're not interested in having any of the links stay in-channel.

Note that it is possible to change the pass-through type at any point, so if a link is followed that would best be served by another pass-through type, it is a trivial matter to change it at that time.
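The four pass-through types can be read as a per-link decision: does this link stay in the channel or not? The sketch below illustrates that reading in Python. It is not CWebProxy's implementation; the function name stays_in_channel is hypothetical, and the rule used for "application" (same scheme, host, and path as cw_xml) is an assumption about what "references to the original cw_xml" means.

```python
from urllib.parse import urlsplit, parse_qs


def stays_in_channel(pass_through, link, cw_xml):
    """Return True if `link` would be kept inside the channel (illustrative only)."""
    if pass_through == "all":
        return True
    if pass_through == "none":
        return False
    if pass_through == "marked":
        # A link is marked by carrying the cw_inChannelLink parameter.
        return "cw_inChannelLink" in parse_qs(urlsplit(link).query)
    if pass_through == "application":
        # Assumption: a "reference to cw_xml" shares its scheme, host, and path.
        a, b = urlsplit(link), urlsplit(cw_xml)
        return (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)
    raise ValueError(f"unknown pass-through type: {pass_through!r}")
```

Because the type is just another parameter, a followed link can carry a new cw_passThrough value and the same decision is then made under the new type.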

Static Data and Runtime Data

With the exception of cw_reset, which is a runtime parameter only, parameters are identical for both static and runtime data. The channel state variables are initially set according to static data, or defaults. Runtime data modifies the equivalent channel state variables. All parameters are then passed through to the stylesheets based on the current state. The parameters are:

Note: parameter names have changed from the previous version of CWebProxy. cw_xsl and cw_xslTitle are there for compatibility with CGenericXSLT and might go away.
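The precedence described in this section (defaults, then static data, then runtime data) can be sketched as a pair of dictionary merges. This is only an illustration in Python: the function names are hypothetical, the default value shown in the usage is a placeholder rather than CWebProxy's actual default, and the effect of cw_reset itself is not modelled.

```python
# cw_reset is accepted at runtime only and is never stored as channel state.
RUNTIME_ONLY = {"cw_reset"}


def initial_state(defaults, static_data):
    """Start from defaults, then let static data override them."""
    state = dict(defaults)
    state.update({k: v for k, v in static_data.items() if k not in RUNTIME_ONLY})
    return state


def apply_runtime(state, runtime_data):
    """Return a new state in which runtime data overrides the current values."""
    new_state = dict(state)
    new_state.update({k: v for k, v in runtime_data.items() if k not in RUNTIME_ONLY})
    return new_state


# Usage with placeholder values:
state = initial_state(
    {"cw_passThrough": "none"},
    {"cw_xml": "http://app.example/index.html", "cw_passThrough": "marked"},
)
state = apply_runtime(state, {"cw_passThrough": "all", "cw_reset": "return"})
```

After these two steps the state holds the cw_xml from static data and the cw_passThrough from the runtime request, which is what would be handed on to the stylesheets.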

Portal Events

CWebProxy supports the button events for help, about, and edit. A channel instance can specify URIs for any of these via static or runtime data. A button event will then redirect the channel to the appropriate URI. Note that separate stylesheets for button event URIs are not supported. The URI should return control to the original application via cw_reset=return.

Stylesheets

This stylesheet is set up for typical dynamic applications. You may need to make modifications to suit particular applications.

xhtml.xsl

For use with XHTML or HTML applications.

The base URI is determined from cw_xml, or from the href element of the <base> tag, if one exists in the document head. Note that this is different from the baseActionURL, which is a URI referring to the channel via the portal.

<script> elements are copied from both <head> and <body>. Relative URIs in src attributes are prepended with the base URI.
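The stylesheet does this work in XSLT. Purely as an illustration of the rule, the Python sketch below picks the base URI (the <base> href if present, otherwise cw_xml) and makes a script's src absolute. The function names are hypothetical, and urljoin performs standard relative-reference resolution, which may differ in edge cases from the simple prepending the stylesheet does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _BaseFinder(HTMLParser):
    """Records the href of the first <base> element seen."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def base_uri(document, cw_xml):
    """Use the <base> href if the document has one, else fall back to cw_xml."""
    finder = _BaseFinder()
    finder.feed(document)
    return finder.base_href or cw_xml


def absolute_src(src, base):
    """Resolve a possibly relative src attribute against the base URI."""
    return urljoin(base, src)
```

For example, with cw_xml set to http://app.example/app/index.html and no <base> tag, a src of js/menu.js resolves to http://app.example/app/js/menu.js.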

Children of <body> are copied as is, save for the following special processing:

Session Support

Support is provided for cookies as specified in the original Netscape specification, as well as RFC 2109 and RFC 2965. Only the Cookie, Cookie2, Set-Cookie, and Set-Cookie2 headers are currently processed.

Cookies are not maintained between portal logins. Once you log out of the portal, your cookies are discarded.
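The idea of a cookie store that lives only as long as the portal session can be sketched as follows. This is a simplified Python illustration, not CWebProxy's code: the class name SessionCookies is hypothetical, only the classic Set-Cookie/Cookie pair is handled, and path, domain, and expiry matching are ignored.

```python
from http.cookies import SimpleCookie


class SessionCookies:
    """Holds the back-end application's cookies for one portal session."""

    def __init__(self):
        self._jar = SimpleCookie()

    def store(self, set_cookie_value):
        """Remember a cookie from the value of a Set-Cookie response header."""
        self._jar.load(set_cookie_value)

    def cookie_header(self):
        """Build the value of the Cookie header for the next back-end request."""
        return "; ".join(f"{name}={morsel.value}" for name, morsel in self._jar.items())

    def discard(self):
        """Called at portal logout: nothing survives into the next login."""
        self._jar.clear()
```

The browser never sees these cookies; the proxy replays them to the application on each request, which is how the application's session survives inside the channel.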

Applications maintaining sessions via URL rewriting in http query strings should also work. Other forms of URL rewriting to maintain state probably will not work. Most applications use cookies by preference if available, which they are.

Issues and Limitations

Scripts

Limited support is provided for included scripts, but they may not work exactly as they would when viewed directly. Note in particular that if your script replaces or generates URLs, they will probably need to be absolute URLs, not relative, to work through a portal.

XHTML 1.0 states that an XHTML document must be valid XML, so if embedded JavaScript code contains <, &, ]]> or --, it must be wrapped in a CDATA section element. However, CDATA sections are recognized by XML processors but not by browsers. The W3C states that external scripts should be used if your script uses those character sequences. CWebProxy supports embedded JavaScript containing these characters only if it is wrapped in a CDATA section element and sent through tidy. Note that the output will not be valid XML, and thus not valid XHTML.
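To see why the CDATA wrapper matters to an XML processor, the short Python check below parses the same script body with and without it. The script body and the helper name is_well_formed are made up for the illustration; the parser is Python's standard ElementTree, not the one CWebProxy uses.

```python
import xml.etree.ElementTree as ET

# A script body containing both "<" and "&", which are markup characters in XML.
SCRIPT_BODY = "if (a < b && c) { run(); }"

bare = f"<script>{SCRIPT_BODY}</script>"
wrapped = f"<script><![CDATA[{SCRIPT_BODY}]]></script>"


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

The bare version is rejected by the parser, while the wrapped version parses and yields the script text unchanged.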

Future Directions

We plan to further examine all the issues addressed in the previous section. Adding multithreading and caching are priorities, as is proxying security and IPerson information to the back-end applications.

Recent Changes

Examples

The unmodified Tomcat numguess.jsp and servlet examples are now set up as examples. See the numguess help and servlet help. These examples are available under the Subscription Channel, and can be seen in the Tests tab of the demo user.

A theoretical example: Imagine an application that consists of a tree of web pages. The first page is a static XHTML page containing a few links. Some of these should remain in the channel, others should leave the portal framework. This page should use cw_passThrough="marked". Since it is the initial page of the application, we would indicate this in the static data defining the channel. Links that need to stay in the channel should have "?cw_inChannelLink=1" appended to them. (We will assume the use of the XHTML stylesheets provided with CWebProxy.)

Let's look at possibilities for some of these marked links. Perhaps one points to a set of specially designed static pages that avoid outside links. They are all designed to fit in the channel. We can set cw_passThrough="all" to make writing the pages easier. So when linking to this page from the top-level page, we'd have something like:

<a href="second.html?cw_inChannelLink=1&cw_passThrough=all">

Another second-level page might actually be an application implemented by a CGI or via JSP, PHP, or whatever. Let's assume that this one generates HTML (as opposed to XHTML), and also generates a few links to outside web pages, which we don't want to incorporate into the portal. The link to this application should set the pass-through type to "application". It should also set cw_tidy="on" to convert the HTML to XHTML.
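If the top-level page is generated rather than written by hand, links like these can be assembled programmatically. The Python sketch below appends the marker and any other cw_ parameters to a URL while keeping whatever query string it already has. The helper name channel_link and the path cgi-bin/app.cgi are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def channel_link(url, **cw_params):
    """Mark `url` as an in-channel link and append further cw_ parameters."""
    parts = urlsplit(url)
    # Keep any parameters the application's own link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("cw_inChannelLink", "1"))
    query.extend(cw_params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


# The link to the static pages from the example:
static_link = channel_link("second.html", cw_passThrough="all")

# The link to the HTML-generating application:
app_link = channel_link("cgi-bin/app.cgi", cw_passThrough="application", cw_tidy="on")
```

When the result is written into an href attribute of an XHTML page, the & separators must be escaped as &amp;, for instance with html.escape.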

Software Dependencies

CWebProxy uses the JTidy package to convert HTML to XHTML. It also requires the servlet engine to have Xalan's bsf.jar file in its classpath.

Under some servlet containers, JTidy r6 has a compatibility problem with the classloader. Tomcat 4.0.1 is one example. If CWebProxy isn't working and you see a line in your logfile that says:

Registered an uncaughted exception java.lang.IllegalAccessError: try to access field org.w3c.tidy.ParserImpl._parseHead from class org.w3c.tidy.ParserImpl$ParseHTML

then you have run into this problem. The JTidy developers have noted it and will hopefully fix it in the next release. In the meantime, you can work around it by recompiling JTidy for your platform: get the source distribution, unzip it, build it with "ant jar", and replace your old tidy.jar with build/Tidy.jar. I've put a rebuilt Tidy.jar here, if you'd prefer to try that first.

Authors

Andrew Draskóy <andrew@mun.ca> and
Sarah Arnott <sarnott@mun.ca>
Computing and Communications
Memorial University of Newfoundland

Documentation author Andrew Draskóy.

Last Modified November 23, 2002.