JA-SIG uPortal CWebProxy    developed by:
Memorial University of Newfoundland

uPortal CWebProxy Channel

About This Document

This document reflects the current version of CWebProxy. See the old documentation for versions prior to uPortal 2.1. See Changes for a summary of the major changes. The latest version of this document can be found at the CWebProxy home page. There is also a tutorial on creating channels using CWebProxy.

Purpose

CWebProxy allows incorporation of web-based services as channels, regardless of what technology is used to implement them. It provides mechanisms for connecting to and rendering HTML and XML services. Pages are refreshed and kept in-channel when they change. HTTP standards are followed, allowing communication between the browser and dynamic back-end applications. Mechanisms are provided for passing user-specific information to the back-end application, as well as ways to support local interface technologies (such as encryption, shared secrets, single sign-on, and modification of HTTP request headers) on a per-channel basis.

How It Works

This section describes the functionality of CWebProxy in general terms. Specifics of configuration and use are covered in the sections following.

Web applications are written to interact directly with users through their browsers. When a portal incorporates such an application, it must intercept this communication to tailor it for the portal environment. This is done by rewriting the application's output, in particular by rewriting URLs so that they go through the portal where appropriate, rather than directly to the back-end application or elsewhere. Other mechanisms allow sharing of information between the portal and the application, to aid application functionality. Caching is available to improve performance.

Rewriting Application Output

The key mechanism is "pass-through". It is the means for passing request parameters through the portal to the application. There are currently four levels of pass-through supported:

Note that the pass-through type can be changed at any point, so if a link is followed that would best be served by another pass-through type, it is a trivial matter to change it at that time.

The output from applications is rewritten in four stages. HTML, XHTML, and WML are supported in the code as distributed.

  1. JTidy: If the cw_tidy attribute is on, the application's output is run through JTidy to convert it from HTML to well-formed XHTML.
  2. AbsoluteURLFilter: This converts relative URLs to absolute ones.
  3. CWebProxyURLFilter: Rewrites URLs according to cw_passThrough or cw_download.
  4. XSL/T: The XML is passed through a stylesheet according to the channel parameters and the media type. Static and runtime data parameters are passed to the stylesheet. This feature is not used by the distributed stylesheets, but may prove useful to custom-written stylesheets, particularly those with no URL Filters.
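
As an illustration of the second stage, the following is a minimal sketch of converting a relative URL to an absolute one. It is not the AbsoluteURLFilter code itself; the class name, method name, and example host are hypothetical, and it relies only on the standard java.net.URI class.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the CWebProxy distribution.
final class AbsoluteUrlSketch {
    // Resolve a link found in the application's output against the URI
    // of the page that contained it. Links that are already absolute
    // are returned unchanged.
    static String toAbsolute(String pageUri, String link) {
        return URI.create(pageUri).resolve(link).toString();
    }
}
```

For example, a link to play.jsp?guess=5 on the page http://apps.example.edu/guess/index.jsp would resolve to http://apps.example.edu/guess/play.jsp?guess=5, which a later stage could then rewrite to go through the portal.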

CWebProxy will use the same method (GET or POST) to call the application as was used to call the portal. Since the portal is intercepting HTTP requests aimed at the application, this will result in the correct method being used, according to what the application expects.
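
The idea of reusing the request method can be sketched as follows. This is an assumption-laden illustration using the standard HttpURLConnection class, not the connection code CWebProxy actually uses; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration; not part of the CWebProxy distribution.
final class SameMethodSketch {
    // Prepare (but do not open) a connection to the back-end application
    // that uses the same HTTP method as the request the portal received.
    static HttpURLConnection prepare(String portalRequestMethod, String applicationUrl) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(applicationUrl).toURL().openConnection();
            conn.setRequestMethod(portalRequestMethod); // "GET" or "POST"
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```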

CWebProxy filters out any attribute/value pairs in the query string that are portal-specific, and hands on any others that were in the URL to the application. Note that query strings may also contain keywords without using the attribute/value format. In this case they are also passed on to the application. If this type of query string is mixed with mechanisms that generate attribute/value pairs, CWebProxy passes the keywords by adding a "keywords" attribute with the appropriate value.
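
The filtering described above might look roughly like the sketch below. It is one interpretation, not the CWebProxy source: the class name is hypothetical, and treating the cw_ and upc_ prefixes as the marker of a portal-specific attribute is an assumption based on the parameter names used in this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the CWebProxy distribution.
final class QueryFilterSketch {
    // Assumption: portal-specific attributes are recognised by these prefixes.
    private static final String[] PORTAL_PREFIXES = {"cw_", "upc_"};

    static String filter(String query) {
        List<String> kept = new ArrayList<>();
        List<String> keywords = new ArrayList<>();
        for (String part : query.split("&")) {
            if (part.isEmpty()) {
                continue;
            }
            if (!part.contains("=")) {
                keywords.add(part);          // keyword-style entry, no attribute name
            } else if (!isPortalSpecific(part)) {
                kept.add(part);              // ordinary pair, handed on to the application
            }
        }
        if (!keywords.isEmpty()) {
            if (kept.isEmpty()) {
                return String.join("+", keywords);   // keywords only: pass unchanged
            }
            // Mixed query string: carry the keywords in a "keywords" attribute.
            kept.add("keywords=" + String.join("+", keywords));
        }
        return String.join("&", kept);
    }

    private static boolean isPortalSpecific(String pair) {
        for (String prefix : PORTAL_PREFIXES) {
            if (pair.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, cw_passThrough=all&guess=5 is handed on as guess=5, and guess=5&red+green is handed on as guess=5&keywords=red+green.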

In the case where CWebProxy needs the channel to provide refreshed output, it will call the application with no parameters, as is usual for HTTP.

In some cases, you don't want a link to go through the normal mechanisms, but instead wish it to be handled as a download for an object with its own MIME type. CWebProxy includes a way to indicate this.

Information Sharing

Applications can benefit by getting information from the portal. CWebProxy supports the standard HTTP Cookie mechanism for keeping session and state information between page requests. Additionally, it can pass information on the user from the uPortal IPerson object. There is also support for customizing the communications to fit local policy and technical infrastructure.

Session Support

Support is provided for cookies as specified in the original Netscape specification, as well as RFC 2109 and RFC 2965. Only the Cookie, Cookie2, Set-Cookie, and Set-Cookie2 headers are currently processed.

Cookies are not maintained between portal logins. Once you log out of the portal, your cookies are discarded.

Applications maintaining sessions via URL rewriting in HTTP query strings should also work. Other forms of URL rewriting to maintain state probably will not work. Most applications use cookies by preference if available, which they are.
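
To make the cookie lifetime concrete, here is a minimal sketch of a per-login cookie store. It uses java.net.HttpCookie from current JDKs, which is an assumption for illustration and not what CWebProxy itself uses; the class and method names are hypothetical, and cookie attributes such as path, domain, and expiry are ignored.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the CWebProxy distribution.
final class SessionCookieJarSketch {
    private final Map<String, String> cookies = new LinkedHashMap<>();

    // Record the cookies carried by one Set-Cookie or Set-Cookie2 header line.
    void store(String headerLine) {
        for (HttpCookie cookie : HttpCookie.parse(headerLine)) {
            cookies.put(cookie.getName(), cookie.getValue());
        }
    }

    // Build the value of the Cookie header for the next request to the application.
    String cookieHeader() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Cookies do not survive a portal logout.
    void logout() {
        cookies.clear();
    }
}
```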

IPerson Attribute Passing

uPortal maintains certain user information (called the IPerson object), possibly aggregated from several sources, for each active user session. CWebProxy allows the application to request this information. At publish time, you may configure which information is allowed to be passed to applications, and you may also choose to always pass particular attributes by default. These are sent as additional HTTP request parameters. A uPortal configuration property sets the defaults for channels that do not specify publish-time values.
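
The selection of attributes can be sketched as below. This is not the uPortal IPerson API: the person is modelled as a plain map, all names are hypothetical, and the sketch assumes that attributes chosen as defaults at publish time are sent as-is while attributes requested by the application are checked against the allowed list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the CWebProxy distribution.
final class PersonAttributeSketch {
    static Map<String, String> attributesToSend(Map<String, String> person,
                                                Set<String> allowed,
                                                Set<String> alwaysPassed,
                                                Set<String> requested) {
        Map<String, String> out = new LinkedHashMap<>();
        // Attributes the publisher chose to pass on every request.
        for (String name : alwaysPassed) {
            if (person.containsKey(name)) {
                out.put(name, person.get(name));
            }
        }
        // Attributes the application asked for, limited to the allowed list.
        for (String name : requested) {
            if (allowed.contains(name) && person.containsKey(name)) {
                out.put(name, person.get(name));
            }
        }
        return out;
    }
}
```

The resulting map would then be appended to the request as additional parameters. An attribute that is requested but not on the allowed list is silently dropped.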

Local Connection Context

Connecting to, and communicating with, certain applications may require custom modifications to the communications. For example, you might need to add special encryption, shared secret passing, hooks into authentication mechanisms, addition or filtering of headers or cookies, etc. CWebProxy uses a mechanism called Local Connection Context to simplify and modularize this process on a per-channel basis.
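
The general shape of such a per-channel hook might be sketched as follows. The interface and class here are hypothetical and are not the Local Connection Context API shipped with uPortal; the header name is likewise invented for the example. The sketch only shows how a channel-specific object can adjust an outgoing request before it is sent.

```java
import java.util.Map;

// Hypothetical interface for illustration; not the uPortal API.
interface ConnectionContextSketch {
    // Adjust the headers of a request about to be sent to the application.
    void prepare(Map<String, String> requestHeaders);
}

// Example context that passes a shared secret to the back-end application.
final class SharedSecretContext implements ConnectionContextSketch {
    private final String secret;

    SharedSecretContext(String secret) {
        this.secret = secret;
    }

    @Override
    public void prepare(Map<String, String> requestHeaders) {
        // "X-Shared-Secret" is an invented header name.
        requestHeaders.put("X-Shared-Secret", secret);
    }
}
```

Keeping this logic behind one small interface is what allows each channel to name its own context without changes to the channel code.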

Caching

Channel output may be cached to improve efficiency. The three aspects of caching are:

For each of these, defaults are set in uPortal properties which may be overridden for each channel at publish time. Applications may later change the defaults for a channel instance, or override them for a single page request. Cache scope and mode can only be made more restrictive, not less.

According to the HTML specification, "If the processing of a form is idempotent (i.e. it has no lasting observable effect on the state of the world), then the form method should be GET. Many database searches have no visible side-effects and make ideal applications of query forms... If the service associated with the processing of a form has side effects (for example, modification of a database or subscription to a service), the method should be POST." For this reason, POST requests are not cached.
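
The rule that POST requests bypass the cache can be sketched as below. The class is hypothetical and deliberately ignores cache scope, mode, and timeouts; it only shows that a POST always reaches the application, while other requests for the same URL may be served from a stored copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the CWebProxy distribution.
final class PageCacheSketch {
    private final Map<String, String> pages = new HashMap<>();

    // 'fetch' stands in for the call to the back-end application.
    String render(String method, String url, Function<String, String> fetch) {
        if ("POST".equalsIgnoreCase(method)) {
            // A POST may have side effects, so it is never answered from the cache.
            return fetch.apply(url);
        }
        // Other requests are fetched once and then reused.
        return pages.computeIfAbsent(url, fetch);
    }
}
```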

Configuration

Static Data

Except as noted, parameters are identical for both static and runtime data. The channel state variables are initially set according to static data, or defaults. Runtime data modifies the equivalent channel state variables. All parameters are then passed through to the stylesheets based on the current state. The parameters are:

Properties

CWebProxy has a few properties which act as portal-wide defaults for equivalent static data (channel publish-time parameters). These are set in the properties/portal.properties configuration file.

Controlling CWebProxy Channels

Once it is running, you can control the behaviour of a CWebProxy channel instance in two ways. First, an HTTP application may pass instructions to the channel via portal runtime data, which means passing special attributes in the request query string. Second, the channel reacts to certain portal events. These are generally triggered by user actions, such as clicking on a channel button.

Runtime Data

Most static data parameters can be changed via the equivalent runtime data parameter, and have the same semantics. The exceptions are cw_person_allow and upc_localConnContext. The following are runtime-only parameters:

Portal Events

CWebProxy supports the button events for help, about, and edit. A channel instance can specify URIs for any of these via static or runtime data. A button event will then redirect the channel to the appropriate URI. Note that these URIs are subject to the same filtering and stylesheets as the normal URI. The event URI should return control to the original application via the runtime attribute cw_reset=return.

Issues and Limitations

Scripts

Limited support is provided for included scripts, but they may not work exactly as they would when viewed directly. Note in particular that if your script generates URLs, they will probably need to be absolute URLs, not relative, to work through a portal.

XHTML 1.0 requires that an XHTML document be valid XML, so if embedded JavaScript code contains <, &, > or --, it must be wrapped in a CDATA section. However, CDATA sections are recognized by XML processors but not by browsers. According to XHTML 1.0, external scripts should be used if your script uses those character sequences. CWebProxy supports embedded JavaScript containing these characters only if it is wrapped in a CDATA section and sent through JTidy. Note that the output will not be valid XML, and thus not valid XHTML.

Major Changes since uPortal 2.0.2

Examples

The unmodified Tomcat numguess.jsp and servlet examples are set up as examples in uPortal as distributed. They can be seen in the CWebProxy Examples tab of the demo user, and are available for subscription via Preferences. See the webpages/media/org/jasig/portal/channels/webproxy/examples directory for the info and help files.

Further examples can be found in the tutorial on the CWebProxy home page.

Software Dependencies

CWebProxy uses the JTidy package to convert HTML to XHTML.

Under some servlet containers, JTidy r6 has a compatibility problem with the classloader. This is an issue for some versions of Tomcat, for example. You will know you've got this problem if CWebProxy isn't working and you see a line in your logfile that says:

Registered an uncaughted exception java.lang.IllegalAccessError: try to access field org.w3c.tidy.ParserImpl._parseHead from class org.w3c.tidy.ParserImpl$ParseHTML

The JTidy developers have noted the problem, and hopefully will fix it in the next release. In the meantime, it can be worked around by recompiling for your platform. To do this, get the source distribution, unzip it, build it with "ant jar", and replace your old tidy.jar with build/Tidy.jar. I've put a Tidy.jar rebuilt on Linux here, if you'd prefer to try that first.

Authors

Andrew Draskóy <andrew@mun.ca> and
Sarah Arnott <sarnott@mun.ca>
Computing and Communications
Memorial University of Newfoundland

Documentation author Andrew Draskóy.

Last Modified December 3, 2002.