Writing a channel for the uPortal(tm)

by Michael Oltz, Cornell University

SECOND DRAFT -- Minor revision (remove explicit DTD on RSS) 2001/04/26 15:23

I recommend that you start by reading "The Four Paths to Enlightenment," "What is XML and why use it?" and "What and why is XSLT?". These give an overview of what is involved in developing for the JA-Sig uPortal(tm), and will help you decide whether to read the rest of the document, or which parts of it.

The Four Paths to Enlightenment

What is XML and why use it?

Decide what information you want to show

Decide how to represent the information as XML

Writing a DTD

DTD ELEMENTs
DTD ATTLISTs

Writing the Java code

How to get the Userid

How to output your XML

What and why is XSLT?

How XSLT works

Location specifiers for matches and selects

Some of the most useful XSLT commands

What to name your files and where to put them

The Four Paths to Enlightenment

In the JA-Sig uPortal(tm), each separate little window of information that users can show on their pages is called a "channel". uPortal supports four basic kinds of channels:

An ordinary URL (i.e. a normal web page)
An RSS document
An applet
A servlet

This documentation will spend by far the most time discussing the fourth kind, the servlet, because it is the most complicated.

The Channel Database

Each channel available for subscription at a particular uPortal site is described by an entry in a relational database accessible to the uPortal web server. The entry for each channel contains a sequence number, the title of the channel, and an XML fragment that describes what the uPortal needs to know in order to run the channel. The title of the channel, as given in the separate database column, is shown on the "Personalize Channels" (channel subscription) page. For some kinds of channels, that title is also shown in the channel's titlebar; for other channels, the text shown in the titlebar comes from elsewhere, as described in the sections below. Here is an example of the XML fragment:

<channel minimized="false" class="org.jasig.portal.channels.CPageRenderer">
<parameter value="News From Nowhere" name="name"/>
<parameter value="http://domain.com/news.html" name="url"/>
</channel>

All channels must begin and end with the <channel> element. The channel element has two attributes. If "minimized" is "true" then when a user displays a portal page with that channel on it, the channel will initially be displayed as just a titlebar, and the user must click on the appropriate icon to open it. If "minimized" is "false" then the contents of the channel will be shown initially as well.

The "class" attribute gives the name of a Java class which implements the channel. For the first three kinds of uPortal channel we will be discussing (a URL, an RSS document, and an applet), Java classes have been provided with the uPortal software to implement those channels. For the fourth kind (a servlet) you must write the class yourself.

The channel element can contain zero or more <parameter> elements. The parameter elements give more details about what the particular instance of the channel is supposed to do. Each parameter element has two attributes. The "name" is the name of the parameter, and the "value" is the value of it (amazingly appropriate attributes!). The Java class can call a method that will get the value of a parameter of a given name, and can use that value to control what the Java code does.
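To make the name/value idea concrete, here is a minimal sketch that reads the parameters out of a fragment like the one above, using only the standard JAXP DOM parser that ships with Java. The class ChannelFragment and its parameters method are hypothetical names invented for this illustration; inside a real channel you would use the accessor that the uPortal provides rather than parsing the fragment yourself.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of uPortal: reads a <channel> fragment.
final class ChannelFragment {

    // Returns the <parameter> elements as a name -> value map, in document order.
    static Map<String, String> parameters(String channelXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(channelXml)));
            Map<String, String> params = new LinkedHashMap<String, String>();
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("parameter");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                params.put(p.getAttribute("name"), p.getAttribute("value"));
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("channel fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<channel minimized=\"false\" class=\"org.jasig.portal.channels.CPageRenderer\">"
          + "<parameter value=\"News From Nowhere\" name=\"name\"/>"
          + "<parameter value=\"http://domain.com/news.html\" name=\"url\"/>"
          + "</channel>";
        Map<String, String> params = parameters(xml);
        System.out.println("title = " + params.get("name"));
        System.out.println("url   = " + params.get("url"));
    }
}
```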

An Ordinary URL

The simplest kind of channel is a URL. All you have to do is make and maintain a web page, and tell the uPortal what the URL is. If the page is generated by CGI, JSP, etc. on the web server it comes from, then you can specify the parameters to be passed on the initial fetch of the page. The page cannot use user-specific cookies, because the cookies would be coming from the uPortal server rather than the individual user's workstation. Nor can it use a stylesheet, because when it's displayed through the portal it's not a whole page, just a table inside a page. Also, relative links such as "./images/me.gif" will not work, since when the page is inside the portal page, such links are interpreted relative to the URL of the portal itself, rather than relative to the web server your page originates from.
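The relative-link problem can be seen with a few lines of Java. This is only a sketch using the standard java.net.URI class; the portal address is made up for the example, and the class name RelativeLinkDemo is hypothetical.

```java
import java.net.URI;

// Illustrates why relative links break inside a URL channel.
// The portal host name below is invented for the example.
final class RelativeLinkDemo {

    // Resolves a link the way a browser would, relative to the page it believes it is showing.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String origin = "http://domain.com/news.html";                 // where the page really lives
        String portal = "http://portal.example.edu/portal/layout.jsp"; // what the browser is showing

        // Viewed directly, the image is looked up on the originating server.
        System.out.println(resolve(origin, "./images/me.gif"));
        // Viewed through the portal, the same link points at the portal server instead.
        System.out.println(resolve(portal, "./images/me.gif"));
    }
}
```

The practical remedy is to write every link in such a page as an absolute URL.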

Keep the minimum size of the page small, and keep it simple. Nested frames, especially with a form inside, can, on some web browsers, severely slow down the time it takes to format the downloaded page for display.

Use a URL channel when you wish to keep an existing web site in its present form, and have a "toehold" to lead people from the portal to the other site. Or, when you want to display a small amount of non-user-specific but slowly time-varying information, such as a weather report.

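What it looks like in the channel database

The example given under the "Channel Database" topic above is for a URL channel. Here's the example again.

<channel minimized="false" class="org.jasig.portal.channels.CPageRenderer">
<parameter value="News From Nowhere" name="name"/>
<parameter value="http://domain.com/news.html" name="url"/>
</channel>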

The Java class must be org.jasig.portal.channels.CPageRenderer. This class will correctly pass through some of the simpler HTML tags, including forms. The parameter named "name" will be used in the titlebar of the channel, and the "url" parameter gives the URL from where to load the channel's contents.

An RSS document

Rich Site Summary, or RSS, is a standard developed by Netscape for its My Netscape portal. Its purpose is to provide a list of links to a site, each link having a title and a description. Thus, the list serves as a summary of the more interesting information found on the site being described. An RSS document is internally similar in appearance to an HTML web page, but the markup is more specialized to its purpose and less flexible. This document gives technical information on the format of an RSS document.

To create an RSS channel, you create an RSS document (perhaps by hand, perhaps by some program that runs at regular intervals), put the document on a web server, and tell uPortal where the document is. Here is an example of RSS (all the information, including the links, is imaginary):

<?xml version="1.0"?>

<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN">

<rss version="0.91">

<channel>

     <title>This RSS channel</title>
     <description>Here is a lot of information for you, dear
          surfer of the web!</description>
     <link>http://domain.com/</link>
     <language>en</language>

     <item>
       <title>The first item in the list</title>
       <link>http://domain.com/firstone.html</link>
       <description>Our initial thoughts about the Switch Master 5000</description>
     </item>

     <item>
       <title>The second item</title>
       <link>http://domain.com/somethingelse.html</link>
     </item>

</channel>

</rss>

The first four lines (the ?xml tag, the !DOCTYPE tag, the rss tag, and the channel tag) must appear in that order and must have the contents as shown. Of these four, the first two are header lines and do not have end tags. The channel tag and rss tag need to be closed with end tags at the bottom of the file. The !DOCTYPE tag may be wrapped around on your screen, but there should be EXACTLY ONE space instead of any apparent line breaks in that line. Note that you do not have to specify a URL for the DTD of RSS, because the uPortal distribution has that DTD file in it and will use that copy by default.

The next four lines after the channel tag (title, description, link, and language) must be present. The "en" in the language tag stands for "English". If your RSS channel is in another language, the codes for that are shown on the RSS documentation page accessible from the link given above.

Following those four tags, the RSS document may contain from zero to fifteen <item>s. Each item can contain a title, a description, and a link to which the user is taken when they click on the item title.

There are more tags in RSS, which are described in the RSS documentation page.
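If you generate the RSS document with a program, the job amounts to writing out the header lines and then one <item> per entry. Here is a minimal sketch of such a generator. The class RssWriter and its methods are hypothetical, the item data is passed in as plain string arrays for brevity, and the sketch simply drops any items beyond the fifteen allowed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for a minimal RSS 0.91 document like the example above.
final class RssWriter {
    static final int MAX_ITEMS = 15; // the item limit described above

    // Escapes the characters that are significant in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes one simple element with escaped text content.
    static String element(String name, String text) {
        return "<" + name + ">" + escape(text) + "</" + name + ">\n";
    }

    // Each item is {title, link, description}; the description may be null.
    static String write(String title, String description, String link,
                        String language, List<String[]> items) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<!DOCTYPE rss PUBLIC \"-//Netscape Communications//DTD RSS 0.91//EN\">\n");
        sb.append("<rss version=\"0.91\">\n");
        sb.append("<channel>\n");
        sb.append(element("title", title));
        sb.append(element("description", description));
        sb.append(element("link", link));
        sb.append(element("language", language));
        int count = Math.min(items.size(), MAX_ITEMS);
        for (int i = 0; i < count; i++) {
            String[] item = items.get(i);
            sb.append("<item>\n");
            sb.append(element("title", item[0]));
            sb.append(element("link", item[1]));
            if (item[2] != null) {
                sb.append(element("description", item[2]));
            }
            sb.append("</item>\n");
        }
        sb.append("</channel>\n</rss>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> items = new ArrayList<String[]>();
        items.add(new String[] {"The first item in the list",
                "http://domain.com/firstone.html",
                "Our initial thoughts about the Switch Master 5000"});
        items.add(new String[] {"The second item",
                "http://domain.com/somethingelse.html", null});
        System.out.print(write("This RSS channel", "Here is a lot of information",
                "http://domain.com/", "en", items));
    }
}
```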

What it looks like in the channel database

<channel minimized="false" class="org.jasig.portal.channels.CRSSChannel">
<parameter value="http://www.cornell.edu/Calendar/CalendarRSS.php3" name="url"/>
</channel>

The Java class must be org.jasig.portal.channels.CRSSChannel. This kind of channel has only one parameter, the URL. The title used in the titlebar will be the text from the <title> element in the RSS document itself.

An Applet

You may have a channel which is an applet, a Java program which runs in the user's browser. We will not go into detail here on how to write and test an applet; there are many books on the subject. There are no peculiar coding requirements for a uPortal applet, nor are there any uPortal-specific classes or methods provided especially for calling from your applet code. All that the uPortal does is run the applet in a particular window if the user chooses to add it to their layout.

What it looks like in the channel database

<channel minimized="false" class="org.jasig.portal.channels.CApplet">
   <parameter value="ABC News" name="name"/>
   <parameter value="starwave.news.affiliate.Megaticker.class" name="code"/>
   <parameter value="http://webapp.abcnews.com/java/" name="codeBase"/>
   <parameter value="MegaTicker.jar" name="archive"/>
   <parameter value="141" name="width"/>
   <parameter value="409" name="height"/>
   <parameter name="APPLET.cabbase" value="MegaTicker.cab"/>
   <parameter name="APPLET.station" value="KABC"/>
</channel>

The Java class must be org.jasig.portal.channels.CApplet. There are six standard parameters, and one more unusual kind. The value of the "name" parameter is used in the titlebar of the channel. The values of the other five standard parameters, "code", "codeBase", "archive", "width", and "height", are the familiar attributes of the HTML <applet> element. An applet element with these attributes will be put inside the channel window when the user displays a portal page with this channel on it. Parameters whose names begin with "APPLET." specify additional attributes that the particular applet being called needs to see. The name of the attribute appears after the period. When these attributes are added to the <applet> element at runtime, the "APPLET." part is stripped off the parameter name as given in the channel XML.
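The following sketch shows one way the parameter-to-attribute mapping just described could work. It is not the CApplet source; the class AppletTagSketch and its appletTag method are hypothetical, and the sketch assumes that parameter values contain no double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of how channel parameters could become an <applet> tag.
// This only mirrors the behaviour described above; it is not the CApplet code.
final class AppletTagSketch {
    static final String PREFIX = "APPLET.";

    static String appletTag(Map<String, String> params) {
        StringBuilder tag = new StringBuilder("<applet");
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            if (name.equals("name")) {
                continue; // "name" goes in the titlebar, not in the tag
            }
            if (name.startsWith(PREFIX)) {
                name = name.substring(PREFIX.length()); // strip the "APPLET." part
            }
            // Assumption: values never contain a double quote, so no escaping is done.
            tag.append(' ').append(name).append("=\"").append(e.getValue()).append('"');
        }
        return tag.append("></applet>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("name", "ABC News");
        params.put("code", "starwave.news.affiliate.Megaticker.class");
        params.put("codeBase", "http://webapp.abcnews.com/java/");
        params.put("archive", "MegaTicker.jar");
        params.put("width", "141");
        params.put("height", "409");
        params.put("APPLET.cabbase", "MegaTicker.cab");
        params.put("APPLET.station", "KABC");
        System.out.println(appletTag(params));
    }
}
```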

What is XML and why use it?

The official 2.0 release of the JA-Sig uPortal(tm) software will support an XML architecture for channels. This document will discuss how to write an XML channel in the 1.0 software environment, because 2.0 is not sufficiently complete as yet. Much of the information will be the same for 2.0.

XML is a standard for information markup, which describes what the information is, rather than what it looks like on the screen (the latter is what HTML does). The uPortal uses XML because it allows your Java code to postpone, until a later step, worrying about what kind of output display the user has. At that later step, you could, for example, format for a high-resolution desktop computer screen, a low-resolution desktop computer screen, a handheld windowing computer, or even a palmtop.

The example provided with the uPortal code is the org.jasig.portal.xmlchannels.CBookmarks class. The demonstration bookmarks channel stores its data in a relational database table, and the table entry for each user contains an XML tree as a string. The bookmarks channel stores its information as XML, and happens to use DOM to parse that XML, or to alter it. You do not have to store your data that way; XML is required only when you are sending information to be displayed to the user, and you can generate that XML any way you want.

Here are the files related to the CBookmarks class:
org.jasig.portal.xmlchannels.CBookmarks.java
bookmarks.dtd
BookmarksChannel.ssl
edit_regular.xsl
editBookmark_regular.xsl
view_regular.xsl

The steps involved in creating a servlet channel are:

1. Decide what information you want to show in your portal window.

2. Decide how to represent the information as XML.

3. Optionally, write a DTD (Document Type Definition) which describes that XML.

4. Write your Java code. It should extend the GenericPortalBean class, and implement the IXMLChannel interface. Output the information you want to display in your channel window, as XML.

5. Write one or more XSLT stylesheets to transform the XML that your code will output, into HTML to display on the particular display device used by the user. At first you can write just one of these, and customize it for various output devices later.

Now I'll go into more detail for each of these steps.

Decide what information you want to show

The primary thing to keep in mind here, is to keep it simple. A portal channel generally has to share the browser window with several other things, and you may not be allocated very much horizontal space. If the browser column you are in is narrow, and you put too much information in your window, it will elongate vertically and may push part of itself or of other channel windows below the bottom of the screen.

One way around this is to break up your information into several small portions, and switch between them with buttons. The bookmarks channel shows how to do this in its bookmark-editing mode; see the "setRuntimeData" method and the bookmarks.ssl stylesheet.

Decide how to represent the information as XML

XML looks something like HTML, but you get to name the tags yourself -- in XML the tags are called "elements" -- and decide what they mean. You can decide what their attributes are if any (attributes are the equal-sign things inside an element). You can decide what contents an element may enclose, or decide that the element never has any contents. In HTML, <br></br> is an example of an element that never has any contents.

An important difference between XML and HTML is that XML parsers are much fussier about syntax. In an HTML document, if there is a missing end tag, the browser will try to make a guess at what was intended. In XML, the syntax is crucial to indicating the meaning of each morsel of data, so far less latitude can be allowed.

Here is an example of what XML looks like. Suppose you want to write an XML description of an hourly employee's timecards. The resulting XML might look like this. I've left out some details and put in ellipses.

<?xml version="1.0"?>
<!DOCTYPE timecards SYSTEM "http://amachine.somewhere.edu/dtd/timecards.dtd">
<timecards employee="John J. J. Smith">
<timecard payperiod="12" startdate="08/07/2000" enddate="08/18/2000">
<day date="08/07/2000" dow="Monday">
<worktime start="07:52" end="12:03" />
<worktime start="12:47" end="16:30" />
<worktime start="20:19" end="22:48" comment="overtime fixing pipe break" />
</day>
<day date="08/08/2000" dow="Tuesday">
...
</day>
</timecard>
<timecard payperiod="13" ... >
...
</timecard>
</timecards>

The first line says that this document conforms to the XML 1.0 standard; yes, the question marks have to be there. The second line tells what the name of this document type is, and where the DTD is that describes the syntax of the elements in it. As you can see, the syntax of the rest of the document closely resembles HTML. One thing to remember, is that after the two header elements (?xml and !DOCTYPE) XML requires that you have one element that encloses everything else. This is called the document element. In the example above this is the <timecards> element.

Keep in mind that you can put anything inside an element that you want, providing that you describe the syntax of the contents in the DTD. Some elements may contain nothing; others might contain one or more of one other different element (like the <timecard> elements inside the <timecards> element). Some elements may contain a variety of different elements one after the other. And some kinds of element may have contents that are free-form text, rather than any specific XML element.

One question you may have when designing your XML is, when you have some text associated with an element, should it be an attribute of the element, like this:

<orderitem prodnum="12345" quantity="2" comment="Do not backorder" />

or should it be the contents of the element, like this:

<orderitem prodnum="12345" quantity="2">
Do not backorder.
</orderitem>

I suggest this criterion. Use an attribute, as in our first <orderitem> example, when all of the following hold: the datum is an integer, a floating point number, or an always relatively short text string without any internal formatting; there is only one such value, or a small, fixed number of them, associated with a particular element; and the datum always has the same interpretation. If the datum is an arbitrary-length text string, or has formatting markup in it, but the text string always has the same relationship to the element, then make it the contents of the element, as in the second <orderitem> example above. If there is an arbitrary number of sub-data to represent, or if the meaning of their relationship to the parent element can vary, then define a new element to contain them, the way we are using <worktime> elements inside the <day> element in our <timecards> example.

Writing a DTD

HTML, XML, and certain other document formats, are descended from SGML, Standard Generalized Markup Language. It is SGML which specifies the use of angle brackets < >, opening and closing tags, the syntax of incorporating attributes, and other overall aspects of what these other document formats look like. For each kind of document derived from SGML, a Document Type Definition (DTD) is written to describe the particular elements and attributes used for that kind of document. For example, there is a DTD at www.w3.org describing each release level of HTML.

There are two levels of strictness when parsing XML. "Well-formedness" merely asks whether the angle brackets < > and quotes are properly matched, whether each opening tag has a properly nested closing tag, and other non-specific tests. "Validation" parsing compares the XML against the specific DTD which defines its syntax. The uPortal uses a parser that only requires "well-formedness". So, you do not have to write a DTD if you do not want to. The information here is for those who are going to use the same XML format for some other purpose that requires a DTD, or for those who want to write a DTD anyway.
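The difference between the two levels can be tried out with the standard JAXP parser that ships with Java. The sketch below is an illustration only: the "note" document type is made up, it is declared inline so that no external DTD file is needed, and the class and method names are hypothetical. It is not how the uPortal itself invokes its parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Contrasts a well-formedness check with a validating parse.
final class StrictnessDemo {
    // A tiny made-up document type, declared inline.
    static final String DTD =
        "<!DOCTYPE note [\n"
      + "<!ELEMENT note (to, body)>\n"
      + "<!ELEMENT to (#PCDATA)>\n"
      + "<!ELEMENT body (#PCDATA)>\n"
      + "]>\n";

    // Returns true if the document parses without errors at the chosen strictness.
    static boolean parses(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity problems are reported as (recoverable) errors.
                public void error(SAXParseException e) throws SAXException { throw e; }
                // Well-formedness problems are reported as fatal errors.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    static boolean isWellFormed(String xml) { return parses(xml, false); }

    static boolean isValid(String xml) { return parses(xml, true); }

    public static void main(String[] args) {
        String wrongOrder = DTD + "<note><body>hello</body><to>you</to></note>";
        String unclosed = DTD + "<note><to>you</note>";
        System.out.println("wrong order, well-formed? " + isWellFormed(wrongOrder));
        System.out.println("wrong order, valid?       " + isValid(wrongOrder));
        System.out.println("unclosed tag, well-formed? " + isWellFormed(unclosed));
    }
}
```

A document with its elements in the wrong order passes the first test and fails the second; a document with a missing end tag fails both.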

A DTD itself does not have any kind of header information, or enclosing element (like <HTML> or <BODY> on a web page). It simply contains definitions of four different kinds of things, plus any comments and white space you want to add. The four kinds of things are ENTITYs, ELEMENTs, ATTLISTs, and NOTATIONS. You can also conditionally include sections of the DTD using the INCLUDE and IGNORE statements. For the kind of use we will be making of XML, it is likely that all you will need are ELEMENTs and ATTLISTs, so that's all we will describe here.

DTD ELEMENTs

An ELEMENT declaration describes an element that can be included in your XML document. The ELEMENT declaration does not describe the attributes that can be specified in the opening tag of the element; that's what ATTLIST is for, and we'll describe it later. The two simplest kinds of ELEMENT are:

<!ELEMENT whatevername EMPTY>

<!ELEMENT someothername ANY>

Note that to distinguish them from ordinary document tags, DTD declarations begin with an angle bracket and an exclamation point. EMPTY means that there cannot be anything at all enclosed between the opening and closing tags of the element -- but there can be attributes in the opening tag. There are two ways to write an empty element in your XML document:

<whatevername></whatevername>

<whatevername/>

In either case, the opening tag can have attributes; and the slash character must be right next to the angle bracket without any space.

ANY means that the element can enclose any well-formed information between its opening and closing tag; no validation will be performed on the contents.

The more complicated but far more useful ELEMENT type has as its third part a content model. A content model is a thing in parentheses that tells what other elements, and perhaps some text, that the element you are defining can enclose between its opening and closing tags. The most straightforward example is:

<!ELEMENT aplainone (whatevername) >

This means that element aplainone must enclose exactly one instance of the whatevername element, and nothing else. That inner element can itself be defined to have some content. But in this example, I show the inner element as empty:

<aplainone>
<whatevername/>
</aplainone>

Since we may want our enclosed material to be rather more complex, there are some operators that can be added to make more complicated content models.

, (comma) the elements must appear in the exact order shown

| (vertical bar) choose one of the elements listed

? The element may be left out, or appear exactly once

* The element may be left out, or appear one or more times

+ The element must be present, and may appear one or more times

The first two operators, the comma and vertical bar, appear in between a list of elements. The last three operators, ?, *, and +, each appear at the very end of the name of the element, or parenthesized group, to which they apply. If none of the latter three is specified for a particular element in the content model, then that element must appear and it must appear exactly once in the position shown.

<!ELEMENT cuperson (student, (faculty | staff) ) >

This means "a cuperson element encloses exactly one student element followed by either exactly one faculty element or exactly one staff element." (I don't guarantee that these examples make practical sense; they are contrived). So either one of these would be permitted (assuming that some of the elements have attributes):

<cuperson>
<student name="Humbert A. Lerner"/>
<faculty name="Phil O. Soffer"/>
</cuperson>

<cuperson>
<student name="Elaine Change"/>
<staff name="Ingrid Able"/>
</cuperson>

The following example illustrates using the "how many" operators:

<!ELEMENT oddthing (tool?, vehicle*, toy+) >

"An oddthing consists of either no or exactly one tool element, followed by no or any number of vehicle elements, followed by one or more toy elements."

The other kind of thing you can put in an ELEMENT is #PCDATA, which stands for "parsed character data" and means arbitrary text with no child elements of its own. The XML standard requires that if the #PCDATA token is present in the content model, it must always be listed there first. This does not mean that the text to which it refers must appear first inside the XML element in the documents being described. The standard also requires that when #PCDATA is combined with element names, the alternatives be separated by vertical bars and the whole group be followed by an asterisk. Here is an example of that:

<!ELEMENT something (#PCDATA | shape)* >

The something element may enclose any number of things (including none at all), each of which may be either text or a shape element. So the first of the things can be of either kind.
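To experiment with the "how many" operators, you can run candidate documents through a validating parser. Here is a small sketch using the standard JAXP SAX parser with the oddthing content model from above declared inline. The class name ContentModelDemo is hypothetical, and tool, vehicle, and toy are declared EMPTY purely to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Checks candidate documents against the oddthing content model.
final class ContentModelDemo {
    static final String DTD =
        "<!DOCTYPE oddthing [\n"
      + "<!ELEMENT oddthing (tool?, vehicle*, toy+)>\n"
      + "<!ELEMENT tool EMPTY>\n"
      + "<!ELEMENT vehicle EMPTY>\n"
      + "<!ELEMENT toy EMPTY>\n"
      + "]>\n";

    // Returns true if the given <oddthing> element satisfies the content model.
    static boolean isValid(String oddthingXml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(DTD + oddthingXml)),
                new DefaultHandler() {
                    // Validity errors are recoverable by default; make them fail the parse.
                    @Override
                    public void error(SAXParseException e) throws SAXException {
                        throw e;
                    }
                });
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "<oddthing><toy/></oddthing>",                                   // only the required part
            "<oddthing><tool/><vehicle/><vehicle/><toy/><toy/></oddthing>", // everything present
            "<oddthing><tool/></oddthing>",                                  // toy+ is not satisfied
            "<oddthing><toy/><tool/></oddthing>"                             // wrong order
        };
        for (String c : candidates) {
            System.out.println(isValid(c) + "  " + c);
        }
    }
}
```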

DTD ATTLISTs

Now that we can define the elements that can appear in our XML, how do we define the attributes that can be specified inside their opening tags? That is the purpose of the ATTLIST definition. In the style used here, each occurrence of ATTLIST describes one attribute of one element (a single ATTLIST is also allowed to declare several attributes of the same element). Usually an ELEMENT appears first, followed by a series of ATTLISTs for that element. In each ATTLIST specification, the !ATTLIST keyword is followed by the name of the element whose attribute this is; the name of the attribute; the kind of information represented by the attribute's value; and finally a keyword that indicates how "required" the attribute is, plus optionally a default value.

The keywords for the "how required" part are these:

#REQUIRED The attribute always must be specified on the element. If it's missing that's an error.

#IMPLIED The attribute may be missing (and there is no default), or may appear with any value.

#FIXED plus default value If the attribute is specified then it must have the default value given; if it's not there the default will be assumed.

Default value only If the attribute is specified then it can have any value; if it's not given it will assume the default.

Although there are several different kinds of information that can appear in the value of an attribute, the most basic one is CDATA, which stands for "character data" and means arbitrary text. We'll illustrate in a moment how to specify an enumerated value.

<!ATTLIST student name CDATA #REQUIRED>

This says that the student element has an attribute called "name" which has a character value and which must always be specified.

<!ATTLIST student hometown CDATA "none specified">

This says that the student element has an attribute called "hometown" which has a character value. If the hometown attribute actually appears in the element it can have any value; if it does not appear then the value for that student will be "none specified".

An enumerated value for an attribute, looks a little bit like a content model, but the only operator it can have in it is the vertical bar for choice.

<!ATTLIST student college (arts | agr | hotel | law | vet | humec | ilr | engr) #REQUIRED >

This means that the student element has an attribute called "college" which must be present, and whose value must be exactly one of the listed values and nothing else.
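Attribute defaults are easy to observe from Java. In the sketch below, the two student ATTLISTs shown earlier are declared in an inline DTD and the standard JAXP DOM parser fills in the default when the attribute is left out. The class AttributeDefaultDemo and its hometown method are hypothetical names for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Shows a DTD-supplied default attribute value arriving in the parsed document.
final class AttributeDefaultDemo {
    static final String DTD =
        "<!DOCTYPE student [\n"
      + "<!ELEMENT student EMPTY>\n"
      + "<!ATTLIST student name CDATA #REQUIRED>\n"
      + "<!ATTLIST student hometown CDATA \"none specified\">\n"
      + "]>\n";

    // Parses a single <student> element and returns its hometown attribute.
    static String hometown(String studentXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(DTD + studentXml)));
            return doc.getDocumentElement().getAttribute("hometown");
        } catch (Exception e) {
            throw new IllegalArgumentException("student element is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // The attribute is absent, so the default from the DTD is reported.
        System.out.println(hometown("<student name=\"Elaine Change\"/>"));
        // The attribute is present, so its own value is reported.
        System.out.println(hometown("<student name=\"Humbert A. Lerner\" hometown=\"Ithaca\"/>"));
    }
}
```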

Writing the Java code

Your Java class should be declared like this:

public class CTimecards extends GenericPortalBean implements IXMLChannel

The class GenericPortalBean contains only three methods. Generally the base class implementation of them will be fine for your needs. The setPortalBaseDir method stores a string inside the current object that remembers the base directory of the uPortal install tree. The uPortal framework will call that method for you with the correct value, so you can assume it has been set. You can call getPortalBaseDir to get that base directory string. The uPortal software refers to several files in the "source" or "webpages" subdirectories of the installation; that way, there is no need to store these pages in the web server's web pages root, since client machines do not need to access them directly. The third method in GenericPortalBean is debug, which puts a string message to System.out.

The IXMLChannel interface describes the methods you need to implement to be a channel.

getSubscriptionProperties() is a "beanish" kind of call. The uPortal will call this method, and your code should create, initialize, and return an instance of the class ChannelSubscriptionProperties. The latter class contains generic characteristics of your channel: what is the user-visible name of the channel? What buttons should be shown in its title bar (minimize, close, edit)? Does the channel have a help screen? Can the channel be "torn off" as a new browser window (I don't think this is implemented yet)?

The uPortal will call the getRuntimeProperties() method immediately before requesting the channel to render itself. Your channel class should create, initialize and return an object of the class ChannelRuntimeProperties. At present the only thing inside this class is an indicator of whether the channel is presently capable of rendering itself at all. If you set this indicator to false (it is by default true), the uPortal will not bother to ask your channel to render itself.

setStaticData() is used by the uPortal to pass in a reference to a ChannelStaticData object that the uPortal has created for you. The uPortal will call this method when the channel is being created. ChannelStaticData contains a channel ID number that is unique within the user's session, and any channel configuration parameters that come from the channel's properties file.

The uPortal will call setRuntimeData() just before each time it calls renderXML. It will pass in an instance of ChannelRuntimeData. The two important things in that class are the baseActionURL, which is the full filepath to the .jsp file that caused your channel to get called (normally it's "layout.jsp"); and a reference to the HttpServletRequest object for the current request. You can also call the getParameter method of your ChannelRuntimeData object to get at any parameters that are sent in to your channel (from the <parameter> elements in the XML defining the channel, as described earlier).

One of the two methods in which "something actually happens" is receiveEvent. This method is called whenever the user clicks on a control (that the uPortal knows about) that affects your channel; for example the minimize, full-screen/restore, close, and edit buttons in the channel's title bar. An object of the class LayoutEvent is passed in to this method, and you can call getEventNumber or getEventName of that class to find out which control was clicked, then react to it.

The most important method of the IXMLChannel interface in which "something actually happens" is renderXML. The uPortal calls renderXML when it wants your channel to "draw" itself. As mentioned earlier, your Java code should just generate and output some XML containing the information you want to display, and the stylesheet(s) you write will take that XML and turn it into HTML to be sent to the client. If you want to, you can internally keep track of some kind of "state" of your channel, and generate different XML streams depending on the state. This is what the bookmarks channel does.

How to get the Userid

The userid is accessible from the javax.servlet.http.HttpServletRequest object. In a uPortal channel, you can get at that object via ChannelRuntimeData (see the setRuntimeData method above). Assuming you have stored your ChannelRuntimeData object reference in a field called runtimeData, you can do this to get the userid:

String theUserid = runtimeData.getHttpRequest().getRemoteUser();

At Cornell, provided that the web server has been modified to do Sidecar authentication on the uPortal pages, this userid will be the Cornell NetID, or it will be "guest" for people who do not have Sidecar or are not running it. Normally, Cornell channels providing NetID-specific user information are not made available to the "guest" userid by the local uPortal site manager. If you wish to make a demonstration version of your channel available, you could check for that userid in your channel and provide dummy demonstration data or operate in a limited manner.
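A guest check of that kind might look like the following sketch. Everything in it is hypothetical: the class GuestCheck, the timecardsXmlFor method, and the canned demonstration data are invented for illustration, and treating a missing (null) userid as a guest is my own assumption rather than documented uPortal behavior.

```java
// Hypothetical sketch: serve demonstration data to the "guest" userid.
final class GuestCheck {
    static final String GUEST = "guest";

    // Assumption: a missing userid is treated the same way as "guest".
    static boolean isGuest(String userid) {
        return userid == null || GUEST.equals(userid);
    }

    // Returns canned demonstration XML for guests, and a stand-in for real data otherwise.
    static String timecardsXmlFor(String userid) {
        if (isGuest(userid)) {
            return "<timecards employee=\"Demonstration User\"/>";
        }
        // A real channel would look up this user's records here.
        // Assumption: the userid contains no characters that need XML escaping.
        return "<timecards employee=\"" + userid + "\"/>";
    }

    public static void main(String[] args) {
        System.out.println(timecardsXmlFor("guest"));
        System.out.println(timecardsXmlFor("abc123"));
    }
}
```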

How to output your XML

In your renderXML method (see the Writing the Java code section above), you need to get your channel's data, turn it into XML, then hand it over to a method of the XSLTProcessor class to get it formatted. The problem with using the CBookmarks channel as an example, is that it stores its data in XML format already, so when it comes time to render, all it does is run a stylesheet on the XML to turn it into HTML. Since our data is coming from non-XML databases, we must first make some XML.

Since the processing method you must call can (indirectly) accept a byte stream or character stream, the most straightforward way to make your XML is to concatenate it together as a String or StringBuffer. Then convert the String to a byte stream (an InputStream) or a character stream (a Reader), put it into an XSLTInputSource, and call the processor.
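One thing to watch when concatenating XML by hand is that data values may contain characters such as &, <, or quotes, which must be escaped or the result will not be well-formed. Here is a minimal sketch of the string-building approach, using the <day> and <worktime> elements from the timecards example. The class TimecardXml and its methods are hypothetical, and the work times are passed in as plain string arrays for brevity.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build channel XML from non-XML data by concatenation.
final class TimecardXml {

    // Escapes the characters that are significant inside a double-quoted attribute value.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Each worktime is {start, end, comment}; the comment may be null.
    static String dayXml(String date, List<String[]> worktimes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<day date=\"").append(escapeAttr(date)).append("\">");
        for (String[] w : worktimes) {
            sb.append("<worktime start=\"").append(escapeAttr(w[0]))
              .append("\" end=\"").append(escapeAttr(w[1])).append('"');
            if (w[2] != null) {
                sb.append(" comment=\"").append(escapeAttr(w[2])).append('"');
            }
            sb.append("/>");
        }
        return sb.append("</day>").toString();
    }

    // Wraps the finished string as a character stream, ready to hand to the processor.
    static Reader asReader(String xml) {
        return new StringReader(xml);
    }

    public static void main(String[] args) {
        List<String[]> worktimes = new ArrayList<String[]>();
        worktimes.add(new String[] {"07:52", "12:03", null});
        worktimes.add(new String[] {"20:19", "22:48", "overtime fixing pipe break"});
        System.out.println(dayXml("08/07/2000", worktimes));
    }
}
```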

The renderXML method is passed a SAX DocumentHandler object as a parameter. The purpose of that DocumentHandler is to receive the completely formatted HTML to be sent to the browser. When you call the XSLTProcessor, you pass in the DocumentHandler object as the output destination.

Here's an example from the CBookmarks channel, with an added data creation section.

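public void renderXML(DocumentHandler out)
{
    // Build the XML document as a string. In a real channel this data
    // would come from a file or a database.
    String myData;
    myData = "<?xml version=\"1.0\"?>" +
        "<!DOCTYPE bookmarks SYSTEM " +
        "\"http://localhost:8000/my/testportal/dtd/bookmarks.dtd\">" +
        "<bookmarks>" +
        "<bookmark url=\"http://yahoo.com\" name=\"Yahoo\" " +
        "comments=\"Search engine\"/>" +
        "</bookmarks>";

    // Wrap the string in a character stream for XSLTInputSource.
    StringReader myDataSR = new StringReader(myData);

    try {
        if (set != null) {
            // Pick the stylesheet appropriate to this request.
            XSLTInputSource stylesheet = set.getStylesheet(
                runtimeData.getHttpRequest());

            if (stylesheet != null) {
                XSLTProcessor processor =
                    XSLTProcessorFactory.getProcessor();

                // Transform the XML and send the result to "out".
                processor.process(new XSLTInputSource(myDataSR),
                    stylesheet, new XSLTResultTarget(out));
            }
        }
    } catch (Exception e) {
        // Report the failure rather than silently discarding it.
        e.printStackTrace();
    }
}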

The DOCTYPE declaration has as its second token the name of the root element of the XML document; then the word SYSTEM in capitals; then a URL in quotes for the document type definition (DTD) of your XML format. This is followed by the actual XML content of your channel. After making this string, the example wraps it in a StringReader for the benefit of XSLTInputSource (see later in the example).

In the real CBookmarks class, the entire XML document, headers and all, would come from a file or a database, rather than being hard-coded as the value of the myData string. At Cornell, we would be getting information from a database via CUObjects, and turning that into an XML string.

The try clause contains the code that runs your XML through a stylesheet to the output. The clause as shown here does three things. First, it finds the stylesheet you want to use to transform your XML. The "set" object is initialized in the constructor; it keeps a list of the XSLT stylesheet(s) you have created for this channel. Second, the try clause makes an XSLTProcessor via the factory method. Third, it calls the process method of the XSLTProcessor. The three arguments are: (1) an XSLTInputSource, which is just a wrapper for several different kinds of input you can use; this first one wraps the XML input document. (2) Another XSLTInputSource, wrapping your stylesheet. (3) An XSLTResultTarget, wrapping the destination for the output from the XSLT stylesheet. In this case we use as our destination the out object which was passed in as a parameter to the renderXML method.
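If you want to experiment with the same three-part call (input document, stylesheet, output destination) outside the portal, the sketch below may help. Note that it deliberately swaps in the standard javax.xml.transform (JAXP) API that ships with the JDK, not the Xalan XSLTProcessor classes used in the channel code above, and it writes to a string rather than to a DocumentHandler. The class name TransformSketch, the sample data, and the tiny stylesheet are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical stand-alone experiment using JAXP, not the uPortal/Xalan 1 classes.
class TransformSketch {

    // Sample input, shaped like the bookmarks document above (no DOCTYPE, so no DTD is fetched).
    static final String XML =
        "<bookmarks>"
      + "<bookmark url=\"http://yahoo.com\" name=\"Yahoo\" comments=\"Search engine\"/>"
      + "</bookmarks>";

    // A tiny stylesheet that turns each bookmark into an HTML link.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"bookmark\">"
      + "<a href=\"{@url}\"><xsl:value-of select=\"@name\"/></a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // (1) wrap the XML input, (2) wrap the stylesheet, (3) wrap the output destination.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, STYLESHEET));
    }
}
```

The roles line up with the channel code: the first StreamSource plays the part of the XSLTInputSource around the data, the second the XSLTInputSource around the stylesheet, and the StreamResult the XSLTResultTarget.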

What and why is XSLT?

XSLT stands for eXtensible Stylesheet Language: Transformations. Its purpose is to transform, or modify, an XML document. An XSLT stylesheet is itself an XML document; it uses a set of XML elements that belong to the XSLT namespace, conventionally written with the "xsl:" prefix.

A question that may come to mind immediately is -- why not just write a Java program to transform the XML? One reason to have a specific language for XML transformations is that it can be more concise than a hand-coded Java program. Another is that it can be declarative -- instead of being organized as a procedural or object-oriented program as such, you can write a description of the information you want to transform, and the result you want to achieve, and let XSLT take care of the functionality that gets you what you want. A third motivation for creating XSLT, and making it look like XML, was the earlier experience with HTML and CSS -- CSS stylesheets do not use the same syntax as the HTML they are formatting, and this is frustrating.

If one looks at the code for a uPortal channel (in 1.0 anyway), it eventually becomes evident that it is not absolutely necessary to use XSLT. One could theoretically output HTML directly. However, the side effect is that one loses the flexibility to customize the output for whatever display device the end user is using.

Sites that find it too demanding to expect every developer to learn XSLT could specialize the work. Most developers would only have to learn a little about XML (as much as in this document's section on XML), and know enough HTML to define what they want the output to look like. Then the developer has to write the Java code to read their data and output it to the XML they have defined. The developer would give an example of what their XML will be, and of what the HTML needs to be, to a web specialist. The specialist would write the DTD for the XML, and write the required XSLT stylesheet(s). Another way to divide up the work is for one person to do all the "web" part -- XML, XSLT, and HTML -- and provide the XML specification to the Java developer.

How XSLT works

There are several patterns that could be used for writing XSLT. If you are generating the XML for your channel yourself, and can arrange it however you like to suit the needs of the output, then the declarative style mentioned above, in which one describes the inputs one might find and the output one wishes to generate as a result, should meet your needs. This is the only pattern we will describe here, and we will leave out considerable detail. For more information, consult the XSLT recommendation, or the book XSLT Programmer's Reference by Michael Kay (Wrox Press, ISBN 1-861003-12-9). At this writing it is the most comprehensive book about XSLT.

An XSLT program written in the most declarative style consists primarily of a series of templates. Each template describes a pattern of one or more nodes in the source XML document to search for. The template also describes the information to be output when a match is found.

In our earlier section on representing information as XML, we said that in an XML document there must be one element that encloses all other elements, and this is called the document element. When an XML document is being parsed or processed, it is conceptually considered to be a tree data structure, with everything in the tree, including the document element, being underneath an invisible parent node called the document root. So, in a well-formed XML document, the (invisible) document root will have one child, the (visible) document element, and everything else will be hanging off the document element.

In XSLT, after the XML source document has been read in and parsed, and certain preparatory steps have been taken, the actual transformation of the document begins by "processing the document root." How do you "process" a node in the XML tree? It works like this:

1. Search the XSLT templates looking for one that matches the node.

2. If a matching template is found, look at the contents of the template; if there are any XSLT instructions in there, execute them; send out everything else inside the template to the output document. When done with this step, you are done processing the node.

3. If no matching template was found, then run the "built-in template," that is, do the default action. The default action is, "process all the children of this node." And that means, to run this entire algorithm on every immediate child of this node, whether or not any match is found for any one of them.

To rephrase and amplify: If no matching template can be found for a node being processed, then the immediate children of that node will be recursively processed, and the process will continue downward until a match is found along every branch or the leaves are arrived at. If a matching template is found for a node being processed, then the immediate children of that node will not be processed -- unless there is an XSLT instruction inside the matching template that explicitly causes further processing.
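The algorithm above can be imitated in a few lines of Java, which may make the "stop at a match, otherwise descend" rule easier to see. This is only a toy model under stated assumptions: templates are looked up by whole element name in a map, "/" stands for the document root, a template is just a function that returns text, and only element children are visited (a real XSLT processor also has a built-in rule for text nodes, supports much richer match patterns, and is not built this way). The class name ProcessingSketch and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical toy model of the matching algorithm; not a real XSLT processor.
class ProcessingSketch {

    static void process(Node node,
                        Map<String, Function<Node, String>> templates,
                        StringBuilder out) {
        // Step 1: search for a template that matches this node.
        String key = node.getNodeType() == Node.DOCUMENT_NODE ? "/" : node.getNodeName();
        Function<Node, String> template = templates.get(key);

        // Step 2: a match was found, so emit its output and stop; children are NOT visited.
        if (template != null) {
            out.append(template.apply(node));
            return;
        }

        // Step 3: no match, so run the default action on every immediate element child.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                process(child, templates, out);
            }
        }
    }

    // Runs the model with a single template that matches "timecard" elements.
    static String demo(String xml) {
        Map<String, Function<Node, String>> templates = new HashMap<>();
        templates.put("timecard",
                n -> "[" + ((Element) n).getAttribute("payperiod") + "]");
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            process(doc, templates, out);   // begin by processing the document root
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(
            "<timecards><timecard payperiod=\"12\"/><timecard payperiod=\"13\"/></timecards>"));
    }
}
```

In this model, neither the document root nor the timecards element has a template, so processing falls through to the timecard elements, each of which is matched and emitted. A timecard nested inside another timecard would be skipped, because the matching template does not ask for further processing.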

Let me give a few very simple examples.

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<html>
<head>
</head>
<body>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

The first element identifies this document as an XSLT stylesheet. It declares a namespace, giving its prefix as "xsl" and its URI as "http://www.w3.org/1999/XSL/Transform". Its version attribute gives the version of XSLT being used, 1.0 (this is not the XML version). Normally you should just copy this element verbatim at the beginning of your own stylesheet.

The xsl:template element says that it matches the (invisible) document root; that's what the match="/" means. We'll talk more about specifying matches in the next section. The template then proceeds to output a completely empty but syntactically legal HTML page, with no title and no contents. Because there are no XSLT statements inside the template to cause further matching, the XSLT processor does not look any further inside the input XML document. Rather a useless example, but it's a start.

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:template match="/">
<html><head></head><body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>

<xsl:template match="timecard">
... 
</xsl:template>

</xsl:stylesheet>

Let's assume that we are running this stylesheet on the "timecards" example we gave back in the section on representing data as XML. In the template that matches the document root, the above example outputs the beginning information of an HTML page. Then we encounter something new. Remember that if a node is matched, the processing will not descend to any children of the current node unless you tell it to. The xsl:apply-templates element is one way, perhaps the simplest way, to say that you want to go deeper down the tree. Because no additional details are given in this example of it, it will process all the children of the current node, looking for any matching templates. When it is done processing all of them, it will pop back up into this template, and will carry out the rest of this template's contents. In this example, that means it will output two HTML closing tags which close the body of the HTML page, then close the page itself.

Ah, but what happens while the xsl:apply-templates is being obeyed? It will begin by looking at the only child of the document root; that would be the visible document element. In our "timecards" XML document, the document element is <timecards>, but there is no match for that (a node match is a "whole word" match, not a substring match). So it will follow the default advice of the built-in template, and look at every one of the immediate children of <timecards>. Well, in our earlier example they are all instances of <timecard>, and there is a match for that. So, for each one, the XSLT processor will run the contents of the last template in our example just above. I haven't bothered to specify exactly what happens there.

Location specifiers for matches and selects

In XSLT, you need to be able to specify what node or nodes in the XML input tree you want to refer to. This is done with the "match" attribute on xsl:template, and with the "select" attribute on commands such as xsl:value-of.

XSLT uses the XPath specification to define how to make such references. There is both a detailed and an abbreviated way of writing the references, and we will cover only the abbreviated way. These examples are extracted and modified from Section 2.5 "Abbreviated Syntax" of the XPath specification, and cover only a few combinations that you are most likely to need. The phrase "context node" means "the node from the perspective of which you want to 'move' as a starting point." The examples are in the following format.

timecard

is all of the immediate children of the context node which are elements named 'timecard'

This means that if the string "timecard" appears in a match attribute such as
<xsl:template match="timecard">
or a select attribute such as
<xsl:value-of select="timecard"/>
then it would be interpreted as described above.

*

is all of the children of the context node which are elements, no matter which kind of element they are

.

is the context node itself

text()

is all children of the context node which are text nodes rather than element nodes

timecard[1]

is the first 'timecard' child of the context node

timecard[last()]

is the last 'timecard' child of the context node

*/timecard

is all the grandchildren of the context node that are elements named 'timecard'

@payperiod

is the attribute named 'payperiod' of the context node

@*

is all of the attributes of the context node

timecard[@payperiod="12"]

is all 'timecard' children of the context node that have a payperiod attribute with value '12'
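One way to try these location specifiers without writing a stylesheet is the javax.xml.xpath API in the JDK, which evaluates the same XPath syntax that appears in select attributes. The sketch below is an assumption-laden experiment, not part of uPortal: the class name XPathSketch and the three-timecard sample document are invented for illustration, and the context node is taken to be the <timecards> document element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical experiment: evaluates abbreviated XPath expressions against sample data.
class XPathSketch {

    // Invented sample document; the real timecards format may differ.
    static final String XML =
        "<timecards>"
      + "<timecard payperiod=\"11\">a</timecard>"
      + "<timecard payperiod=\"12\">b</timecard>"
      + "<timecard payperiod=\"12\">c</timecard>"
      + "</timecards>";

    // The context node for every expression: the <timecards> element.
    static Node context() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Number of nodes the expression selects, starting from the context node.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, context(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("bad expression: " + expr, e);
        }
    }

    // String value of the first node the expression selects.
    static String text(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, context());
        } catch (Exception e) {
            throw new IllegalArgumentException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("timecard"));                  // all timecard children
        System.out.println(text("timecard[last()]"));           // text of the last one
        System.out.println(count("timecard[@payperiod='12']")); // filtered by attribute
    }
}
```

Because the context node here is <timecards>, expressions such as */timecard and @* select nothing in this sample: it has no timecard grandchildren and no attributes of its own.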

Some of the most useful XSLT elements

<xsl:template match="" name="" priority="" mode="">

This is the primary organizing element of an XSLT stylesheet. The purpose of xsl:template is to describe a particular node, or a particular kind of node, within the XML document being processed. When a match occurs (during the searching process described in the section "How XSLT works" above), the material enclosed by the xsl:template is processed once for each node that is matched. If the enclosed material has any XSLT elements they are evaluated; any HTML tags or raw text are passed on to the XSLT output stream.

What to name your files and where to put them

To be written. $$$ Note to Mike: Need a section on what to name the stylesheet(s), where to put on server, and how to relate that to the code.