Purpose
CWebProxy allows incorporation of web-based services as
channels. It provides mechanisms for connecting to and
rendering HTML and XML services. Pages are refreshed when they
change. For applications under http, GET and POST parameters
are passed through the portal to the application. This allows
communication between the browser and the back-end application.
Cookies are kept within CWebProxy, allowing dynamic http
applications to maintain state.
Versions
CWebProxy has been available since uPortal 2.0 alpha. This
documentation covers version 2.0 and its patch releases. A
version that works with uPortal 1.6 is also available; it is
nearly identical to this one. uPortal 2.1 (available as of Dec.
6th, 2002) features major changes in CWebProxy but remains
backwards-compatible with previous versions. See the CWebProxy 2.1 Documentation for details.
Summary
The key mechanism is "pass-through". It is the means for
passing request parameters through the portal to
the application. There are currently four levels of
pass-through supported:
Use "application" if you want references to the original
cw_xml to stay in the channel, and other links to go outside
the portal framework. Use "all" if you want all links generated
by the first URL to stay in the channel. Use "marked" if you
want to indicate precisely which links should stay in channel,
and which should replace the framework, and "none" if you're
not interested in having any of the links stay in-channel.
Note that it is possible to change the pass-through type at
any point, so if a link is followed that would best be served
by another pass-through type, it is a trivial matter to change
it at that time.
Static Data and Runtime Data
With the exception of cw_reset, which is a runtime
parameter only, parameters are identical for both static and
runtime data. The channel state variables are initially set
according to static data or defaults. Runtime data modifies
the equivalent channel state variables. All parameters are then
passed through to the stylesheets based on the current state.
The parameters are:
- cw_xml: a URI representing the source XML or HTML
document.
- cw_ssl: a URI representing the corresponding .ssl
(stylesheet list) file.
- cw_xslTitle: a title representing the stylesheet
(optional). If no title parameter is specified, a default
stylesheet will be chosen according to the media.
- cw_xsl: a URI representing the stylesheet to use.
If cw_xsl is supplied, cw_ssl and
cw_xslTitle will be ignored.
- cw_info: a URI to be called for the info
event.
- cw_help: a URI to be called for the help
event.
- cw_edit: a URI to be called for the edit
event.
- cw_tidy: if set to on, filter the source
document through JTidy, converting HTML to XHTML.
- cw_passThrough: indicates that runtime data is to be
passed through. If cw_passThrough is supplied and not
set to "none", additional runtime data parameters and
values will be passed as request parameters to the
cw_xml.
cw_passThrough values:
- none: (default). Don't do anything.
- marked: If runtime data includes
cw_inChannelLink, pass through other runtime data
as request parameters. Use in conjunction with a
stylesheet that marks appropriate links with
cw_inChannelLink as a request parameter, and replaces the
URI with baseActionURL, possibly with cw_xml and maybe
other parameters.
- all: Instructs the render
routine to pass through all additional runtime data as
http request parameters to cw_xml. Intended to support
XHTML dynamic pages. With JTidy enabled (cw_tidy), it can support
HTML as well. Use in conjunction with a style sheet that
re-routes all links through the portal.
- application: The same as all. Use in
conjunction with a style sheet that re-routes links for
that application through the portal.
- cw_reset: an instruction to reset internal
variables. The value return resets cw_xml to
the value it had before it was changed by button events. The value
reset returns all variables to their static data values
(reset is not implemented yet). Runtime data parameter only.
Note: parameter names have changed from the previous version
of CWebProxy. cw_xsl and cw_xslTitle are there for
compatibility with CGenericXSLT and might go away.
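The pass-through rules above can be sketched in a few lines. The following Python is an illustrative model only, not CWebProxy's Java implementation. The function name build_backend_url is hypothetical, and the sketch assumes that runtime parameters beginning with cw_ are channel controls that are not forwarded to the application.

```python
from urllib.parse import urlencode

CONTROL_PREFIX = "cw_"  # assumption: cw_* parameters steer the channel and are not forwarded


def build_backend_url(cw_xml, pass_through, runtime_data):
    """Return the URL the channel would request for the given pass-through type."""
    if pass_through == "none":
        return cw_xml  # nothing is passed through
    if pass_through == "marked" and "cw_inChannelLink" not in runtime_data:
        return cw_xml  # unmarked request: nothing is passed through
    # "all" and "application" are treated identically at this level.
    extra = {k: v for k, v in runtime_data.items() if not k.startswith(CONTROL_PREFIX)}
    if not extra:
        return cw_xml
    # cw_xml may already carry a query string, so pick the separator accordingly.
    separator = "&" if "?" in cw_xml else "?"
    return cw_xml + separator + urlencode(extra)


print(build_backend_url("http://example.org/app?x=1", "all", {"guess": "5"}))
```

The difference between "all" and "application" does not show up here; according to the change notes it is handled entirely in the XHTML stylesheet.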
Portal Events
CWebProxy supports the button events for help,
about, and edit. A channel instance can specify
URIs for any of these via static or runtime data. A button
event will then redirect the channel to the appropriate URI.
Note that separate stylesheets for button event URIs are not
supported. The URI should return control to the original
application via cw_reset=return.
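The interplay between static data, button events, and cw_reset=return can be modelled roughly as follows. This is a toy Python sketch under stated assumptions, not the channel's actual code: the class ChannelState is hypothetical, the mapping of the "about" button to cw_info is inferred from the parameter list, and the sketch simply remembers the cw_xml value that was in effect before the most recent change.

```python
# Assumption: the "about" button corresponds to the cw_info parameter.
EVENT_PARAMS = {"about": "cw_info", "help": "cw_help", "edit": "cw_edit"}


class ChannelState:
    """Toy model of the channel state variables (hypothetical, for illustration)."""

    def __init__(self, static_data):
        # State starts from static data; unspecified parameters stay None.
        self.params = {"cw_xml": None, "cw_info": None, "cw_help": None, "cw_edit": None}
        self.params.update(static_data)
        self.previous_xml = None

    def _set_xml(self, uri):
        # Remember the value being replaced so cw_reset=return can restore it.
        self.previous_xml = self.params["cw_xml"]
        self.params["cw_xml"] = uri

    def apply_runtime_data(self, runtime_data):
        if runtime_data.get("cw_reset") == "return":
            if self.previous_xml is not None:
                self.params["cw_xml"] = self.previous_xml
            return
        for name, value in runtime_data.items():
            if name == "cw_xml":
                self._set_xml(value)
            elif name in self.params:
                self.params[name] = value

    def button_event(self, event):
        # Redirect the channel only if a URI was configured for this button.
        target = self.params.get(EVENT_PARAMS[event])
        if target:
            self._set_xml(target)


state = ChannelState({"cw_xml": "http://example.org/app",
                      "cw_help": "http://example.org/help.html"})
state.button_event("help")
state.apply_runtime_data({"cw_reset": "return"})
print(state.params["cw_xml"])
```

Because only one previous value is remembered, changing cw_xml again before sending cw_reset=return loses the original application URI, which matches the limitation noted under Issues and Limitations.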
Stylesheets
This stylesheet is set up for typical dynamic applications.
You may need to make modifications to suit particular
applications.
xhtml.xsl
For use with XHTML or HTML applications.
The base URI is determined from cw_xml, or from the href
attribute of the <base> element, if one exists in the
document head. Note that this is different from the
baseActionURL, which is a URI referring to the channel via the
portal.
<script> elements are copied from both
<head> and <body>. Relative URIs in
src attributes are prepended with the base URI.
Children of <body> are copied as is, save for
the following special processing:
- <form>: If the action attribute is a
relative URL, the absolute URL is found using the cw_xml
parameter. Its value is then modified according to the
value of cw_passThrough:
- none (default): action is left as is unless it
is a relative URL, in which case it is prepended with the
base URI.
- marked: if there's an <input>
child of form with a name attribute called
'cw_inChannelLink', the action attribute of the form is
changed to equal the baseActionURL, no matter what the
value was before.
- application: if the action attribute is empty, or
the cw_xml parameter is equivalent to the absolute URL of
the action attribute, or cw_xml equals the base URI of
the action attribute, the action attribute is set to the
baseActionURL.
- all: The action attribute is set to the
baseActionURL. If either the action attribute is empty or
the cw_xml parameter is equivalent to action attribute or
the cw_xml equals the base URI of the action attribute,
nothing else is done. Otherwise a cw_xml parameter is
added to aim the channel at the new URI specified in the
action attribute.
- <a>: If the href attribute is a relative
URL, the absolute URL is found using the cw_xml parameter.
Its value is then modified according to the value of
cw_passThrough:
- none (default): href is left as is unless it is a
relative URL, in which case it is prepended with the base
URI.
- marked: if the href contains a query string
parameter named 'cw_inChannelLink', the href is replaced
with baseActionURL concatenated with the original
querystring. The value of cw_inChannelLink is
ignored.
- application: if either the href attribute is
empty or the cw_xml parameter is equivalent to the
absolute URL of the href attribute, the href is set to
the baseActionURL.
- all: The href is set to the baseActionURL. If
either the href attribute is empty or the cw_xml
parameter is equivalent to the href attribute or cw_xml
equals the base URI of the href attribute, nothing more
is done. Otherwise, a cw_xml parameter is added with a
value equivalent to that of the original href.
- <img>: If the src attribute is
relative, it is prepended with the base URI.
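The href rules for <a> elements can be expressed as a small function. The real logic lives in xhtml.xsl; the Python below is only a sketch of the prose above. The name rewrite_href is hypothetical, cw_xml is used as the base URI (a <base> element is ignored), baseActionURL is assumed to carry no query string of its own, and unmarked links under "marked" are assumed to be treated like "none".

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit


def rewrite_href(href, cw_xml, base_action_url, pass_through="none"):
    """Rewrite one href according to the cw_passThrough rules described above."""
    absolute = urljoin(cw_xml, href)  # relative links resolve against cw_xml
    same_page = href == "" or absolute == cw_xml

    if pass_through == "marked":
        query = urlsplit(href).query
        # Only the presence of cw_inChannelLink matters; its value is ignored.
        if "cw_inChannelLink" in parse_qs(query, keep_blank_values=True):
            return base_action_url + "?" + query
        return absolute
    if pass_through == "application":
        return base_action_url if same_page else absolute
    if pass_through == "all":
        if same_page:
            return base_action_url
        # Aim the channel at the new page by passing it as cw_xml.
        return base_action_url + "?" + urlencode({"cw_xml": absolute})
    return absolute  # "none": just make the link absolute


print(rewrite_href("page2.html",
                   "http://example.org/app/index.html",
                   "http://portal.example.org/channel",
                   "all"))
```

The <form> action rules follow the same pattern, with the marker supplied by an <input> named cw_inChannelLink instead of a query string parameter.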
Session Support
Support is provided for cookies as specified in the original
Netscape specification, as well as RFC 2109 and RFC 2965. Only
the Cookie, Cookie2, Set-Cookie, and
Set-Cookie2 headers are currently processed.
Cookies are not maintained between portal logins. Once you
log out of the portal, your cookies are discarded.
Applications maintaining sessions via URL rewriting in http
query strings should also work. Other forms of URL rewriting to
maintain state probably will not work. Most applications prefer
cookies when they are available, and CWebProxy makes them available.
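The idea of keeping cookies inside the channel on behalf of the browser can be illustrated with Python's standard http.cookies module. This is a deliberately simplified sketch, not CWebProxy's implementation: the class ChannelCookies is hypothetical, and domain, path, and expiry matching are ignored.

```python
from http.cookies import SimpleCookie


class ChannelCookies:
    """Holds back-end cookies for one portal session (simplified illustration)."""

    def __init__(self):
        self._jar = SimpleCookie()

    def store(self, set_cookie_value):
        # Record the cookie carried by one Set-Cookie header value.
        self._jar.load(set_cookie_value)

    def cookie_header(self):
        # Value to send back to the application in a Cookie header.
        return "; ".join(f"{name}={morsel.value}" for name, morsel in self._jar.items())

    def clear(self):
        # Called on portal logout: cookies do not survive between logins.
        self._jar.clear()


jar = ChannelCookies()
jar.store("JSESSIONID=abc123; Path=/app")
jar.store("theme=dark")
print(jar.cookie_header())
```

Since the browser never sees these cookies, the application's session survives only as long as the portal session that owns the jar.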
Issues and Limitations
- JTidy must be recompiled to work with some servlet
containers. See Software Dependencies.
- The GET method is always used to send parameters to the
back-end application, even if the portal got them from the
application via POST.
- Content is not cached. Caching will need to follow http
conventions.
- HTML and XHTML <body> background colours and
images are not reflected in the output. This would require
access to the <td> element generated by the
portal to contain the channel output.
- Any <link> elements from the <head>
of an HTML document, including those that reference CSS stylesheets,
are not reflected in the output. This would require
access to the <head> element generated by the portal.
- URLs that use frames cannot be incorporated as
channels.
- Suppression of JTidy diagnostic output has not been
tested on non-UNIX platforms.
- When using JTidy, the character encoding must be specified
in the HTTP headers, otherwise the default ASCII is assumed.
JTidy does not yet recognize the use of the HTML meta element
for specifying the character encoding.
- The cw_reset=reset runtime command is not implemented
yet.
- Ideally there'd be a reset button alongside help, edit,
etc., to trigger a portal event that could be caught by the
channel.
- If cw_xml is changed before cw_reset=return is called,
you are returned to the last cw_xml used, not necessarily
the one that was in use before a button event.
- For security reasons, it might be a good idea to have a
static data parameter which specifies limitations on what can
be changed via runtime commands.
- There should be a mechanism for passing authentication or
IPerson information to the application.
- The channel should be multithreaded for better
performance.
- The source html cannot specify a namespace if cw_tidy is
on.
- Multivalued http request attributes are not
supported.
Scripts
Limited support is provided for included scripts, but they
may not work exactly as they would when viewed directly. Note in particular
that if your script generates URLs, they will probably
need to be absolute URLs, not relative, to work through a
portal.
XHTML 1.0 states that an XHTML document must be valid XML, so
if embedded JavaScript code contains <, &, ]]> or --,
it must be wrapped in a CDATA section. However, CDATA
sections are recognized by XML processors but not by browsers. The W3C
recommends using external scripts if your script uses
those character sequences. CWebProxy supports embedded
JavaScript containing these characters only if it is wrapped in
a CDATA section and sent through JTidy. Note that the
output will not be valid XML, and thus not valid XHTML.
Future Directions
We plan to further examine all the issues raised in the
Issues and Limitations section. Adding multithreading and caching are
priorities, as is proxying security and IPerson information to the
back-end applications.
Recent Changes
- March 19, 2002. Minor documentation update, fixed
typo.
- February 22, 2002. Minor documentation update.
- February 15, 2002. Documentation updates.
- February 15, 2002. Improved document handling for
non-tidied urls. DOCTYPES are now properly supported.
- February 15, 2002. Bugfixes: cw_xml static data parameter
can now contain a query string. Another case where
querystring parameters were erroneously separated with a "?"
has been fixed.
- January 30, 2002. Minor documentation update.
- November 29, 2001. Documented JTidy problem
workaround.
- November 5, 2001. Documentation update.
- November 1, 2001. Minor changes to xhtml.xsl for
efficiency.
- November 1, 2001. Add relative path munging for script
src attributes.
- August 28, 2001. Documentation update.
- August 24, 2001. XHTML stylesheet performance
improvements. Throw exception on http return codes 403, 500,
204 (as well as 404).
- June 29, 2001. Documentation update.
- June 28, 2001. Enhanced "marked" pass-through: Added
ability to distinguish between links equivalent to cw_xml and
others, and support for connecting to non-cw_xml links.
- June 22, 2001. Improved cookie support.
- June 22, 2001. Added support for
cw_passThrough="application". Its semantics are the same as
"all" used to be. cw_passThrough="all" now sends all
links through the portal. This is handled entirely in the
XHTML stylesheet. The CWebProxy Java code treats "all" and
"application" identically.
- June 22, 2001. Added another Tomcat example.
- June 22, 2001. Added support for pre-parsing paths
containing "../". Tomcat and Internet Explorer were not
always handling these correctly, necessitating rewriting, as
done by Netscape.
Examples
The unmodified Tomcat numguess.jsp and servlet examples are
now set up as examples. See the numguess help and servlet help.
These examples are available under the Subscription
Channel, and can be seen in the Tests tab of the demo
user.
A theoretical example: Imagine an application that consists
of a tree of web pages. The first page is a static XHTML page
containing a few links. Some of these should remain in the
channel, others should leave the portal framework. This page
should use cw_passThrough="marked". Since it is the
initial page of the application, we would indicate this in the
static data defining the channel. Links that need to stay in
the channel should have "?cw_inChannelLink=1" appended
to them. (We will assume the use of the XHTML stylesheets
provided with CWebProxy.)
Let's look at possibilities for some of these marked links.
Perhaps one points to a set of specially designed static pages
that avoid outside links. They are all designed to fit in the
channel. We can set cw_passThrough="all" to make writing
the pages easier. So when linking to this page from the
top-level page, we'd have something like:
<a href="second.html?cw_inChannelLink=1&amp;cw_passThrough=all">
Another second-level page might actually be an application
implemented by a CGI or via JSP, PHP, or whatever. Let's assume
that this one generates HTML (as opposed to XHTML), and also
generates a few links to outside web pages, which we don't want
to incorporate into the portal. The link to this
application should set the pass-through type to
"application". It should also set cw_tidy="on" to
convert the HTML to XHTML.
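The two marked links in this example could be generated along the following lines. This is a hedged Python sketch: the helper names in_channel_link and anchor_tag, and the target app.cgi, are hypothetical and only illustrate how the query strings are assembled and escaped for an XHTML attribute.

```python
import html
from urllib.parse import urlencode


def in_channel_link(target, **channel_params):
    """Build an href that marks a link as in-channel and sets channel parameters."""
    params = {"cw_inChannelLink": "1", **channel_params}
    return target + "?" + urlencode(params)


def anchor_tag(href):
    # In XHTML the ampersands of a query string must be written as &amp;.
    return '<a href="' + html.escape(href) + '">'


# Link to the static pages that stay entirely in the channel.
print(anchor_tag(in_channel_link("second.html", cw_passThrough="all")))

# Link to the HTML-generating application (app.cgi is a made-up name).
print(anchor_tag(in_channel_link("app.cgi", cw_passThrough="application", cw_tidy="on")))
```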
Software Dependencies
CWebProxy uses the JTidy package
to convert HTML to XHTML. It also requires the servlet
engine to have Xalan's bsf.jar file in its classpath.
Under some servlet containers, JTidy r6 has a compatibility
problem with the classloader. Tomcat 4.0.1 is one example. If
CWebProxy isn't working and your logfile contains a line like
the following, you have run into this problem:
Registered an uncaughted exception
java.lang.IllegalAccessError: try to access field
org.w3c.tidy.ParserImpl._parseHead from class
org.w3c.tidy.ParserImpl$ParseHTML
The JTidy developers have
noted the problem, and hopefully will fix it in the next
release. In the meantime, it can be worked around by
recompiling for your platform. To do this, get the
source distribution, unzip it, build it with "ant jar", and
replace your old tidy.jar with build/Tidy.jar. I've put a
rebuilt Tidy.jar here, if you'd prefer
to try that first.
Authors
Andrew Draskóy <andrew@mun.ca> and
Sarah Arnott <sarnott@mun.ca>
Computing and Communications
Memorial University of
Newfoundland
Documentation author Andrew Draskóy.
Last Modified November 23, 2002.