This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Modules

richardrodgers edited this page Jul 2, 2012 · 9 revisions

mds = Modular DSpace

Although the 'm' in mds originally signified modernized, it can with almost equal justification mean modular. This page explains what mds modules are, how to create them, and how the mds system handles them.

Functional Definition

In the most general functional terms, an mds (DSpace) instance is a set of two or more modules, one of which must be a special module known as the 'kernel'. The kernel contains the content API and an implementation of that API that can produce concrete objects in the content model, i.e. Items, Bitstreams, Communities, etc. It does not, however, contain any application code that uses the implementation. In other words, the kernel is essentially a library that application code can use to produce real repositories. For a real instance to exist, therefore, at least one other module containing application code must be present. There is no limit to the number of modules an instance may contain and, importantly, modules may be added to an existing instance at any time after it has been created, provided they are compatible with it. Modules already in an instance may also be updated. In addition to the core APIs mentioned, the kernel module contains the tools and code that perform these module operations (additions, upgrades, etc.), so it may also be thought of as a bootstrap system (one that can build itself up). As such, the kernel must be the first module installed when creating a DSpace instance.

Module Maven Character

In technical terms, a module is any maven project that obeys certain conventions and restrictions and follows certain practices. Briefly, these are:

  1. Module projects must produce one of the natural maven artifacts, viz. a jar or a war (or ear, etc), not a pom project.
  2. Module projects must use maven artifact IDs that begin with 'dsm-' (DSpace module). This helps the system distinguish them from other projects, and apply special rules to them.
  3. Module projects must link to and employ a common maven assembly descriptor, which guarantees that the published version (a zip archive containing the module) is uniform. This descriptor is published as an ordinary maven project that can be retrieved from a maven repository.
  4. Module projects must invoke the maven dependency plugin in a prescribed manner to produce a stored list of dependencies that will be used during module installation and upgrade.
  5. Module projects must place additional resources in standardized locations, but otherwise use regular maven conventions. (See detailed description below in Module Layout).
  6. Module projects are strongly encouraged to include the source code (in the standard maven 'src' directory) to permit local customization. Modules without source are considered 'locked', but may still be installed and upgraded. (This feature is intended to accommodate vended or other controlled-source modules.)

Beyond this, there are no particular restrictions on modules. They may use any package names, third party jars, etc. There is, of course, no guarantee that any given version of any module will be compatible with any given instance: this determination is made at installation time, not module build time.
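
Two of the conventions above lend themselves to a quick mechanical check. The following is a minimal sketch, not part of mds itself: the function names 'is_module_artifact_id' and 'module_source_status' are hypothetical, and treating the absence of a 'src' directory as the mark of a 'locked' module is an assumption drawn from convention 6.

```shell
# Hypothetical helpers (not part of mds) illustrating conventions 2 and 6.

# Succeeds when the given maven artifact ID begins with 'dsm-'.
is_module_artifact_id() {
    case "$1" in
        dsm-*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Prints "locked" when the unpacked module directory has no src/ directory,
# and "open" otherwise. Assumption: a missing src/ is what marks a module
# as locked.
module_source_status() {
    if [ -d "$1/src" ]; then
        echo "open"
    else
        echo "locked"
    fi
}
```

For example, 'is_module_artifact_id dsm-oai' succeeds, while 'is_module_artifact_id oai' fails.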

Module Layout

As mentioned above, modules must organize their content in a very specific way. The basic layout is:

[bin/] [conf/] [db/] deps.txt lib/ pom.xml [reg/] src/
         |                                         |
         modules/                                  main/
         emails/                                     |
                                                     java/
                                                     webapp/

In addition to the basic files maven requires (a pom.xml and a src directory), each module contains a file called 'deps.txt', which is generated automatically by the maven dependency plugin. The directory structure is further expanded to include several optional directories, whose names and contents are:

bin : contains OS-specific shell or other executables. Since DSpace encourages the use of the 'script launcher' for most such command-line tools, this directory should rarely be needed.

conf : contains configuration files used by ConfigurationManager/Service. They should obey the same rules as in DSpace; in particular, almost all configuration properties files should reside under 'conf/modules'. Other configuration data, like email templates, follow the same rules.

db : contains DDL files (SQL scripts) needed by the module. Since the DDL for the instance is contained in the kernel module, this directory will usually not be needed.

reg : contains load files for DSpace registries (these are the files that formerly resided in 'config/registries').
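
The layout described above can be checked before installation. This is a minimal sketch under stated assumptions: 'check_module_layout' is a hypothetical name, not an mds tool, and it treats 'deps.txt', 'pom.xml' and 'lib/' as required while leaving 'src/' and the bracketed directories optional (the former because locked modules ship without source).

```shell
# Hypothetical helper (not part of mds): report entries missing from an
# unpacked module directory. Returns 0 when nothing required is missing.
check_module_layout() {
    module_dir=$1
    missing=0

    # Files every module package is expected to carry.
    for f in deps.txt pom.xml; do
        if [ ! -f "$module_dir/$f" ]; then
            echo "missing file: $f"
            missing=1
        fi
    done

    # Directory holding the built artifact and its dependencies.
    if [ ! -d "$module_dir/lib" ]; then
        echo "missing directory: lib/"
        missing=1
    fi

    # bin/, conf/, db/, reg/ and src/ are deliberately not checked:
    # this sketch assumes they are all optional.
    return "$missing"
}
```

Run against an unpacked module directory, the function prints one line per missing entry and returns a non-zero status if anything required is absent.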

Module Lifecycle

Modules can best be seen from two distinct vantage points: that of the producer/author, and that of the consumer/installer. A rough lifecycle:

A module producer begins with an ordinary maven project (just a pom.xml and src files). This project is enhanced by adding the shared assembly descriptor or an equivalent (bound to the package phase), and the dependency plugin call (bound to the same phase). When:

mvn package

is run, the assembly process will create an archive (.zip) consisting of the built artifact (jar or war), all required dependencies placed in the 'lib' directory, the 'deps.txt' file, and any special files (conf, reg, etc.), together with the src files and the pom itself. This zip archive (found in the producer's maven 'target' directory) becomes a distribution package for consumers of the module. They simply need to download/acquire the zip archive and unzip it, and they will see the directory structure described above.

The module consumer obtains the module package (zip archive) and installs it into a local DSpace instance. The exact sequence of installation steps depends on several factors (e.g. whether the consumer wishes to customize the code), but in the simplest (and typical) case, all the consumer has to do is supply local configuration values. This usually means editing one or more properties files in the 'conf' directory of the module. Once the module is configured, the installation process itself is managed by kernel module tools. Specifically, the tools are available in the 'script-launcher' ('dspace') application, so that the steps are reduced to:

edit conf/modules/themodule.cfg
./dspace install themodule

If a module subsequently needs a change (e.g. to a configuration value), the consumer edits the downloaded copy of the relevant module file and simply applies it:

edit conf/modules/themodule.cfg
./dspace update themodule

Note that in these simple cases, mds handles all installation details: neither maven nor ant is required to be present on the consumer's system. If, however, the consumer wishes to modify the module code, they simply make the changes, rebuild, and reinstall:

edit src/main/java/org/dspace/themodule/Foo.java
mvn package
./dspace install themodule

This works because the module as distributed in the zip archive is a valid and complete maven project.

Keeping Things Straight

On the consumer side, it is important to understand how distributed module files relate to resources in the deployed instance. Like DSpace, mds distinguishes three locations related to an instance: the 'installation' directory, the 'source' directory, and the webapp deployment directory. DSpace documentation tags the first two as '[dspace]' (installation) and '[dspace-source]'. We will roughly follow that convention, but call the source directory '[stage]', the installation '[install]' and the webapp location or locations '[deploy]'. This change in emphasis is intended to convey that the staging directory need not be a compilation source (it becomes one only if customization is needed). As will be seen below, it is also important to understand that these three areas constitute three distinct java execution environments (classpaths).

Let us walk through the installation of an mds system from scratch:

a. As noted, the first module to be installed must be a kernel module. The consumer obtains the kernel distribution - dsm-kernel-1.0.zip - and creates the [stage] directory to put it in.

b. Consumer unzips the kernel package in [stage], so that [stage] now looks like:

bin/ conf/ db/ deps.txt lib/ pom.xml reg/ src/    

c. Consumer edits the necessary values in [stage]/conf, including the 'dspace.dir' property in 'kernel.cfg' that defines the location of [install].

d. From [stage]/bin, consumer performs installation:

./dspace install kernel

Note that this command is executing in the [stage] environment, since the [stage]/bin commands use [stage]/lib as the classpath. Since [stage]/lib contains the kernel jar (which has the installer code), the command will work. In essence, the install command will create [install] and copy all resources and jars to it. The [install] directory will now contain:

bin/  conf/  lib/

which together contain all the files needed by the deployed instance (there may also be a log/ directory where installation details are logged).

e. Now that the kernel has been installed, other modules may be installed as well. The consumer should see a directory called 'modules' in [stage], and should create there a directory for each module she wishes to install (any name is OK). We can imagine a few modules, e.g. 'admin' for batch import and 'oai' for an OAI-PMH data provider web app. The consumer then need only drop the module distributions (e.g. dsm-admin-1.0.zip, etc.) in the named directories and unzip them.

f. If any configuration is required, it is done in [stage]/modules/admin/conf/modules/admin.cfg (for example). Then the new modules can be installed (from [stage]/bin as was the kernel, not [stage]/modules/admin/bin):

./dspace install admin

g. The kernel installer will copy/merge all the module files to [install], adding any new jars to lib/, etc. Web apps work almost the same way, except that following the install step, the consumer will have to take the war file in (e.g.) [stage]/modules/oai/lib/dsm-oai-1.0.war and drop it into the [tomcat]/webapps directory. The install of a web app only copies supporting resource files ('oaicat.properties', etc. in this case) to [install], since the execution environment is Tomcat, not [install].

h. A valid instance has now been created at [install], and it is ready for use. The consumer must simply switch execution environments from [stage] to [install]. For example, to get started she might begin to set up the repository and load content:

[install]/bin/dspace create-administrator
[install]/bin/dspace import add -s /mydata/saf -m /mydata/map -e admin@foo.com
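
The [stage] paths and the configuration lookup used in steps c, e and f follow a regular pattern, which the sketch below spells out. The function names are hypothetical (they are not mds commands). Naming the configuration file after the module directory is an assumption based on the 'admin' example, and 'read_property' assumes simple 'key = value' lines rather than the full java properties syntax.

```shell
# Hypothetical helpers (not part of mds) for working inside [stage].

# Prints the staging directory of a module: [stage]/modules/<name>.
module_stage_dir() {
    printf '%s/modules/%s\n' "$1" "$2"
}

# Prints the module's configuration file, assuming it is named after the
# module directory, as in [stage]/modules/admin/conf/modules/admin.cfg.
module_config_file() {
    printf '%s/modules/%s/conf/modules/%s.cfg\n' "$1" "$2" "$2"
}

# Prints the value of a property (e.g. dspace.dir) from a .cfg file.
# Assumption: the file uses plain 'key = value' lines.
read_property() {
    # Escape dots so the key is matched literally by sed.
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}
```

Calling 'read_property' on the consumer's edited 'kernel.cfg' with the key 'dspace.dir' would print the configured location of [install].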