The list below describes the third-party components and tools you'll need to run a DSpace server. These are simply recommendations based on our setup at MIT; since DSpace is built on open source, standards-based tools, there are numerous other possibilities and setups.
Also, please note that the configuration and installation guidelines relating to a particular tool below are here for convenience. You should refer to the documentation for each individual component for complete and up-to-date details. Many of the tools are updated on a frequent basis, and the guidelines below may become out of date.
UNIX-like OS (Linux, HP/UX etc)
Java 1.4 or later (standard SDK is fine, you don't need J2EE)
Apache Ant 1.5 or later (Java make-like tool)
PostgreSQL 7.3 or later, an open source relational database, or Oracle 9 or higher.
PostgreSQL
Unicode (specifically UTF-8) support must be enabled. This is enabled by default in 8.0+. For 7.x, be sure to compile with the following options to the 'configure' script:
--enable-multibyte --enable-unicode --with-java
Once installed, you need to enable TCP/IP connections (DSpace uses JDBC). For 7.x, edit postgresql.conf
(usually in /usr/local/pgsql/data
or /var/lib/pgsql/data
), and add this line:
tcpip_socket = true
For 8.0+, in postgresql.conf
uncomment the line starting:
listen_addresses = 'localhost'
Then tighten up security a bit by editing pg_hba.conf
and adding this line:
host dspace dspace 127.0.0.1 255.255.255.255 md5
Then restart PostgreSQL.
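The PostgreSQL settings described above can be sanity-checked before restarting. The following is a minimal sketch, not part of DSpace: the function names and the PGDATA_DIR variable are hypothetical, and the patterns only look for uncommented lines of the form shown in this section.

```shell
# Succeeds if postgresql.conf has an uncommented TCP/IP setting:
# "tcpip_socket = true" (7.x) or any "listen_addresses = ..." line (8.0+).
tcp_enabled() {
    grep -Eq '^[[:space:]]*(tcpip_socket[[:space:]]*=[[:space:]]*true|listen_addresses[[:space:]]*=)' "$1"
}

# Succeeds if pg_hba.conf has an uncommented md5 "host" entry
# for the dspace database and the dspace user.
hba_has_dspace() {
    grep -Eq '^[[:space:]]*host[[:space:]]+dspace[[:space:]]+dspace[[:space:]].*md5' "$1"
}

# Example usage. PGDATA_DIR is an assumed variable; point it at your data directory.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/pgsql/data}"
if [ -r "$PGDATA_DIR/postgresql.conf" ]; then
    tcp_enabled "$PGDATA_DIR/postgresql.conf" || echo "TCP/IP does not appear to be enabled in postgresql.conf"
fi
if [ -r "$PGDATA_DIR/pg_hba.conf" ]; then
    hba_has_dspace "$PGDATA_DIR/pg_hba.conf" || echo "no md5 entry for dspace found in pg_hba.conf"
fi
```

The checks are plain text matches, so they confirm that a line is present, not that PostgreSQL has re-read it; a restart is still needed.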
Oracle
Copy the Oracle JDBC driver into [dspace-source]/lib
.
Create a database for DSpace. Make sure that the character set is one of the Unicode character sets. DSpace uses UTF-8 natively, and it is suggested that the Oracle database use the same character set. Create a user account for DSpace (e.g. dspace), and ensure that it has permissions to add and remove tables in the database.
Edit the config/dspace.cfg file in your source directory for the following settings:
db.name = oracle
db.url = jdbc:oracle:thin:@//host:port/dspace
db.driver = oracle.jdbc.OracleDriver
Go to [dspace-source]/etc/oracle
and copy the contents to their parent directory, overwriting the versions in the parent:
cd [dspace-source]/etc/oracle
cp * ..
You now have Oracle-specific .sql files in your etc directory, and your dspace.cfg is modified to point to your Oracle database. You are ready to continue with a normal DSpace install, skipping the Postgres setup steps.
NOTE: DSpace uses sequences to generate unique object IDs - beware Oracle sequences, which are said to lose their values when doing a database export/import, say restoring from a backup. Be sure to run the script etc/update-sequences.sql
.
ALSO NOTE: Everything is fully functional, although Oracle limits you to 4k of text in text fields such as item metadata or collection descriptions.
For people interested in switching from Postgres to Oracle, I know of no tools that would do this automatically. You will need to recreate the community, collection, and eperson structure in the Oracle system, and then use the item export and import tools to move your content over.
Jakarta Tomcat 4.x/5.x or equivalent, such as Jetty or Caucho Resin.
Note that DSpace will need to run as the same user as Tomcat, so you might want to install and run Tomcat as a user called 'dspace'.
You need to ensure that Tomcat has a) enough memory to run DSpace and b) uses UTF-8 as its default file encoding for international character support. So ensure in your startup scripts (etc) that the following environment variable is set:
JAVA_OPTS="-Xmx512M -Xms64M -Dfile.encoding=UTF-8"
You also need to alter Tomcat's default configuration so that searching and browsing handle multi-byte UTF-8 correctly. Add the following attribute to the <Connector> element in [tomcat]/conf/server.xml:
URIEncoding="UTF-8"
e.g. if you're using the default Tomcat config, it should read:
<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
<Connector port="8080"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           connectionTimeout="20000" disableUploadTimeout="true"
           URIEncoding="UTF-8" />
Jetty and Resin are configured for correct handling of UTF-8 by default.
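For Tomcat, the two UTF-8 settings described above can be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the default CATALINA_HOME path is only a placeholder, and the server.xml check is a plain text match that does not confirm which <Connector> element carries the attribute.

```shell
# Succeeds if the given server.xml mentions URIEncoding="UTF-8" anywhere.
has_uri_encoding() {
    grep -q 'URIEncoding="UTF-8"' "$1"
}

# Succeeds if the given JAVA_OPTS string contains -Dfile.encoding=UTF-8
# as a separate, space-delimited option.
has_utf8_file_encoding() {
    case " $1 " in
        *" -Dfile.encoding=UTF-8 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage. The default path below is an assumption; adjust it to your install.
CATALINA_HOME="${CATALINA_HOME:-/usr/local/tomcat}"
if [ -r "$CATALINA_HOME/conf/server.xml" ]; then
    has_uri_encoding "$CATALINA_HOME/conf/server.xml" || echo "URIEncoding=\"UTF-8\" not found in server.xml"
fi
has_utf8_file_encoding "${JAVA_OPTS:-}" || echo "JAVA_OPTS does not set -Dfile.encoding=UTF-8 in this shell"
```

The JAVA_OPTS check only sees the variable in the current shell, so run it in the same environment that Tomcat's startup script uses.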
But First, a Word on Directories and Path Names
DSpace uses three separate directory trees. Although you don't need to know all the details of them in order to install DSpace, you do need to know they exist and also know how they're referred to in this document:
[dspace-source]
[dspace]
[tomcat]/webapps/dspace (with [tomcat] being wherever you installed Tomcat, also known as $CATALINA_HOME). This directory is generated by the web server when it unpacks dspace.war, and should never be edited.
For details on the contents of these separate directory trees, refer to directories.html. Note that the source directory and install directory should always be separate!
Create the DSpace user. This needs to be the same user that Tomcat (or Jetty etc) will run as. e.g. as root run:
useradd -m dspace
Download the latest DSpace source code release and unpack it:
gunzip -c dspace-source-1.x.tar.gz | tar -xf -
Copy the PostgreSQL JDBC driver (.jar
file) into
[dspace-source]/lib
. If you compiled PostgreSQL yourself, it'll be in postgresql-7.x.x/src/interfaces/jdbc/jars/postgresql.jar
. Alternatively you can download it directly from the PostgreSQL JDBC site. Make sure you get the driver for the version of PostgreSQL you're running and for JDBC2.
Create a dspace
database, owned by the dspace
PostgreSQL user:
createuser -U postgres -d -A -P dspace ; createdb -U dspace -E UNICODE dspace
Enter a password for the DSpace database. (This isn't the same as the dspace
user's UNIX password.)
Edit [dspace-source]/config/dspace.cfg
, in particular you'll need to set these properties:
dspace.url
dspace.hostname
dspace.name
db.password (the password you entered in the previous step)
mail.server
mail.from.address
feedback.recipient
mail.admin
alert.recipient (not essential but very useful!)
Note that if you change dspace.dir
you'll also need to change other properties with values that start with /dspace
, e.g. assetstore.dir
, log.dir
...
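Before building, it can help to confirm that the properties listed in this step actually have values. The sketch below assumes dspace.cfg uses plain "name = value" lines; the function name missing_props and the DSPACE_CFG variable are hypothetical and not part of DSpace.

```shell
# Print each named property that has no uncommented "name = value" line
# with a non-empty value in the given configuration file.
missing_props() {
    cfg="$1"
    shift
    for prop in "$@"; do
        # Escape dots so they match literally in the regular expression.
        pattern=$(printf '%s' "$prop" | sed 's/\./\\./g')
        grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*[^[:space:]]" "$cfg" || echo "$prop"
    done
}

# Example usage. DSPACE_CFG is an assumed variable; set it to
# [dspace-source]/config/dspace.cfg for your checkout.
DSPACE_CFG="${DSPACE_CFG:-config/dspace.cfg}"
if [ -r "$DSPACE_CFG" ]; then
    missing_props "$DSPACE_CFG" dspace.url dspace.hostname dspace.name db.password \
        mail.server mail.from.address feedback.recipient mail.admin alert.recipient
fi
```

An empty result means every listed property has some value; it says nothing about whether the values are correct.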
Create the directory for the DSpace installation. As root, run:
mkdir [dspace] ; chown dspace [dspace]
(Assuming the dspace
UNIX username.)
As the dspace
UNIX user, compile and install DSpace:
cd [dspace-source] ; ant fresh_install
The most likely thing to go wrong here is the database connection. See the common problems section.
Copy the DSpace Web application archives (.war
files) to the appropriate directory in your Tomcat/Jetty/Resin installation. For example:
cp [dspace-source]/build/*.war [tomcat]/webapps
Create an initial administrator account:
[dspace]/bin/create-administrator
Now the moment of truth! Start up (or restart) Tomcat. Visit the base URL of your server, e.g. http://dspace.myu.edu:8080/dspace. You should see the DSpace home page. Congratulations!
In order to set up some communities and collections, you'll need to access the administration UI. To do this, append 'dspace-admin' to your server's URL, e.g. http://dspace.myu.edu:8080/dspace/dspace-admin.
The above installation steps are sufficient to set up a test server to play around with, but there are a few other steps and options you should probably consider before deploying a DSpace production site.
A couple of DSpace features require that a script is run regularly -- the e-mail subscription feature that alerts users of new items being deposited, and the new 'media filter' tool, that generates thumbnails of images and extracts the full-text of documents for indexing.
To set these up, you just need to run the following command as the dspace
UNIX user:
crontab -e
Then add the following lines:
# Send out subscription e-mails at 01:00 every day
0 1 * * * [dspace]/bin/sub-daily
# Run the media filter at 02:00 every day
0 2 * * * [dspace]/bin/filter-media
Naturally you should change the frequencies to suit your environment.
PostgreSQL also benefits from regular 'vacuuming', which optimizes the indices and clears out any deleted data. Become the postgres
UNIX user, run crontab -e
and add (for example):
# Clean up the database nightly at 2.40am
40 2 * * * vacuumdb --analyze dspace > /dev/null 2>&1
In order that statistical reports are generated regularly and thus kept up to date you should set up the following cron jobs:
# Run stat analyses
0 1 * * * [dspace]/bin/stat-general
0 1 * * * [dspace]/bin/stat-monthly
0 2 * * * [dspace]/bin/stat-report-general
0 2 * * * [dspace]/bin/stat-report-monthly
Obviously, you should choose execution times which are most useful to you, and you should ensure that the -report- scripts run a short while after the analysis scripts to give them time to complete (a run of around 8 months' worth of logs can take around 25 seconds to complete). The resulting reports will let you know how long analysis took, and you can adjust your cron times accordingly.
For information on customising the output of this see configuring system statistical reports.
Plain old HTTP is totally insecure, and if your DSpace uses username/password authentication or stores some restricted content, running it over HTTPS (HTTP over a Secure Socket Layer (SSL)) is advisable. There are two options for this: Using Apache HTTPD, or Tomcat/Jetty's in-built HTTPS support.
To use Apache HTTPD: The DSpace source bundle includes a partial Apache configuration, apache13.conf, which contains most of the DSpace-specific configuration required. It assumes you're using mod_webapp, which is deprecated and tricky to compile but a lot easier to configure than mod_jk2, which is the current recommendation from Tomcat. Use of this file is optional; you might just want to use it as an example. To use it directly, in the main Apache httpd.conf, you should:
Ensure that mod_ssl and mod_webapp are configured and loaded.
Include [dspace]/config/httpd.conf. You can decide where the DSpace part will go in your file system; see the configuration section.
To use Tomcat or Jetty's HTTPS support, consult the documentation for the relevant tool.
First a few facts to clear up some common misconceptions:
You don't have to use CNRI's Handle system. At the moment, you need to change the code a little to use something else (e.g. PURLs) but that should change soon.
You'll notice that while you've been playing around with a test server, DSpace has apparently been creating handles for you looking like hdl:123456789/24
and so forth. These aren't really Handles, since the global Handle system doesn't actually know about them, and lots of other DSpace test installs will have created the same IDs.
They're only really Handles once you've registered a prefix with CNRI (see below) and have correctly set up the Handle server included in the DSpace distribution. This Handle server communicates with the rest of the global Handle infrastructure so that anyone that understands Handles can find the Handles your DSpace has created.
If you want to use the Handle system, you'll need to set up a Handle server. This is included with DSpace. Note that this is not required in order to evaluate DSpace; you only need one if you are running a production service. You'll need to obtain a Handle prefix from the central CNRI Handle site.
A Handle server runs as a separate process that receives TCP requests from other Handle servers, and issues resolution requests to a global server or servers if a Handle entered locally does not correspond to some local content. The Handle protocol is based on TCP, so it will need to be installed on a server that can send and receive TCP traffic on port 2641.
The Handle server code is included with the DSpace code in
[dspace-source]/lib/handle.jar
. A script exists to create a simple Handle configuration - simply run [dspace]/bin/make-handle-config
after you've set the appropriate parameters in dspace.cfg
. You can also create a Handle configuration directly by following the installation instructions on handle.net, but with these changes:
Instead of running:
java -cp /hs/bin/handle.jar net.handle.server.SimpleSetup /hs/svr_1
as directed in the Handle Server Administration Guide, you should run:
[dspace]/bin/dsrun net.handle.server.SimpleSetup [dspace]/handle-server
ensuring that [dspace]/handle-server matches whatever you have in dspace.cfg for the handle.dir property.
Edit the resulting [dspace]/handle-server/config.dct file to include the following lines in the "server_config" clause:
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"
This tells the Handle server to get information about individual Handles from the DSpace code.
Whichever approach you take, start the Handle server with [dspace]/bin/start-handle-server
, as the DSpace user. Once the configuration file has been generated, you will need to go to http://hdl.handle.net/4263537/5014 to upload the generated sitebndl.zip file. The upload page will ask you for your contact information. An administrator will then create the naming authority/prefix on the root service (known as the Global Handle Registry), and notify you when this has been completed. You will not be able to continue the handle server installation until you receive further information concerning your naming authority.
Note that since the DSpace code manages individual Handles, administrative operations such as Handle creation and modification aren't supported by DSpace's Handle server.
TODO
In any software project of the scale of DSpace, there will be bugs. Sometimes, a stable version of DSpace includes known bugs. We do not always wait until every known bug is fixed before a release. If the software is sufficiently stable and an improvement on the previous release, and the bugs are minor and have known workarounds, we release it to enable the community to take advantage of those improvements.
The known bugs in a release are documented in the KNOWN_BUGS
file in the source package.
Please see the DSpace bug tracker for further information on current bugs, and to find out if the bug has subsequently been fixed. This is also where you can report any further bugs you find.
In an ideal world everyone would follow the above steps and have a fully functioning DSpace. Of course, in the real world it doesn't always seem to work out that way. This section lists common problems that people encounter when installing DSpace, and likely causes and fixes. This is likely to grow over time as we learn about users' experiences.
ant fresh_install
There are two common errors that occur. If your error looks like this--
[java] 2004-03-25 15:17:07,730 INFO org.dspace.storage.rdbms.InitializeDatabase @ Initializing Database
[java] 2004-03-25 15:17:08,816 FATAL org.dspace.storage.rdbms.InitializeDatabase @ Caught exception:
[java] org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
[java] at org.postgresql.jdbc1.AbstractJdbc1Connection.openConnection(AbstractJdbc1Connection.java:204)
[java] at org.postgresql.Driver.connect(Driver.java:139)
it usually means you haven't yet added the relevant configuration parameter to your PostgreSQL configuration (see above), or perhaps you haven't restarted PostgreSQL after making the change.
Also, make sure that the db.username
and db.password
properties are correctly set in
[dspace-source]/config/dspace.cfg
.
An easy way to check that your DB is working OK over TCP/IP is to try this on the command line:
psql -U dspace -W -h localhost
Enter the dspace
database password, and you should be dropped into the psql tool with a dspace=>
prompt.
Another common error looks like this:
[java] 2004-03-25 16:37:16,757 INFO org.dspace.storage.rdbms.InitializeDatabase @ Initializing Database
[java] 2004-03-25 16:37:17,139 WARN org.dspace.storage.rdbms.DatabaseManager @ Exception initializing DB pool
[java] java.lang.ClassNotFoundException: org.postgresql.Driver
[java] at java.net.URLClassLoader$1.run(URLClassLoader.java:198)
[java] at java.security.AccessController.doPrivileged(Native Method)
[java] at java.net.URLClassLoader.findClass(URLClassLoader.java:186)
This means that the PostgreSQL JDBC driver is not present in [dspace-source]/lib
. See above.
If you're trying to tweak Tomcat's configuration but nothing seems to make a difference to the error you're seeing, you might find that Tomcat hasn't been shutting down properly, perhaps because it's waiting for a stale connection to close gracefully which won't happen. To see if this is the case, try:
ps -ef | grep java
and look for Tomcat's Java processes. If they stay around after running Tomcat's shutdown.sh script, try killing them (with -9 if necessary), then start Tomcat again.
If you find that when you try to access a DSpace Web page and your browser sits there connecting, or if the database connections fail, you might find that a 'zombie' database connection is hanging around preventing normal operation. To see if this is the case, try:
ps -ef | grep postgres
You might see some processes like this
dspace 16325 1997 0 Feb 14 ? 0:00 postgres: dspace dspace 127.0.0.1 idle in transaction
This is normal--DSpace maintains a 'pool' of open database connections, which are re-used to avoid the overhead of constantly opening and closing connections. If they're 'idle' it's OK; they're waiting to be used. However sometimes, if something went wrong, they might be stuck in the middle of a query, which seems to prevent other connections from operating, e.g.:
dspace 16325 1997 0 Feb 14 ? 0:00 postgres: dspace dspace 127.0.0.1 SELECT
This means the connection is in the middle of a SELECT operation, and if you're not using DSpace right that instant, it's probably a 'zombie' connection. If this is the case, try killing the process, and stopping and restarting Tomcat.
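Picking out the suspicious connections by eye can be tedious on a busy server. The following sketch filters ps output down to dspace database backends that are not idle. It assumes the process title format shown in the examples above, and the function name is hypothetical. It treats 'idle in transaction' as idle, matching the description above.

```shell
# Read `ps -ef` output on stdin and print PostgreSQL backends for the
# dspace database and user whose state is not idle.
# The [p] in the pattern stops grep from matching its own command line
# when it appears in the ps listing.
busy_dspace_backends() {
    grep '[p]ostgres: dspace dspace' | grep -v ' idle' || true
}

# Usage: ps -ef | busy_dspace_backends
```

A backend that shows up here while nobody is using DSpace is a candidate for the kill-and-restart procedure described above. One that appears only briefly during normal use is probably just doing its job.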
After you've rebuilt DSpace and copied dspace.war from your [dspace-source]/build directory into your [tomcat]/webapps directory, you must also delete the existing [tomcat]/webapps/dspace directory before re-starting Tomcat. Otherwise Tomcat will continue to use the old code.
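The redeployment steps above are easy to get out of order, so a small helper can be useful. This is a minimal sketch, assuming Tomcat is stopped before it runs and started again afterwards. The function name redeploy_war is hypothetical and not part of DSpace or Tomcat.

```shell
# Replace a deployed web application: remove the unpacked directory so the
# servlet container expands the new archive, then copy the .war into place.
redeploy_war() {
    war="$1"        # e.g. [dspace-source]/build/dspace.war
    webapps="$2"    # e.g. [tomcat]/webapps
    app=$(basename "$war" .war)
    [ -f "$war" ] || { echo "no such archive: $war" >&2; return 1; }
    # ${webapps:?} aborts if the directory argument is empty, so this
    # can never expand to "rm -rf /dspace".
    rm -rf "${webapps:?}/$app"
    cp "$war" "$webapps/"
}

# Usage (after stopping Tomcat):
#   redeploy_war [dspace-source]/build/dspace.war [tomcat]/webapps
```

Checking that the archive exists before deleting anything means a failed build does not leave the server without a working web application.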