Installation & Configuration

This section explains how to get, install and configure an OLD application. An overview of the process:

  1. Download and install the OLD.
  2. Generate an OLD config file and edit it.
  3. Run the setup command to create the database tables and directory structure.
  4. Serve the application and test that it is working properly.

Note that these installation instructions assume a Unix-like system, e.g., Linux or Mac OS X. If you are using Windows [1], please refer to the Pylons or the virtualenv documentation for instructions on how to create and activate a Python virtual environment and download and install a Pylons application.

QuickStart

For the impatient, here is the quickest way to install, configure and serve an OLD application. Before blindly issuing the following commands, however, it is recommended that you read the detailed instructions in the following sections.

virtualenv --no-site-packages env
source env/bin/activate
easy_install onlinelinguisticdatabase
mkdir xyzold
cd xyzold
paster make-config onlinelinguisticdatabase production.ini
paster setup-app production.ini
paster serve production.ini

Open a new terminal window and run the basic test script to ensure that the OLD application is being served and is operating correctly:

python _requests_tests.py

You should see All requests tests passed. as output. Congratulations.

Download

Pre-packaged eggs of stable OLD releases can be downloaded from the Python Package Index.

The easiest way to get and install the OLD is via the Python command-line program Easy Install. Before issuing the following command, read the Create a virtual Python environment section and consider installing the OLD in a virtual environment. To download and install the OLD with Easy Install, run:

sudo easy_install onlinelinguisticdatabase

For developers, the full source code for the OLD can be found on GitHub. To clone the OLD repository, first install Git and then run:

git clone git://github.com/jrwdunham/old.git

See below for detailed instructions.

Install

Create a virtual Python environment

It is recommended that the OLD be installed in a virtual Python environment. A virtual environment is an isolated Python environment within which you can install the OLD and its dependencies without inadvertently rendering other programs unworkable by, say, upgrading their dependencies in incompatible ways. If you do not want to install the OLD and its dependencies in a virtual environment, skip this section.

Use virtualenv to create a virtual Python environment. First, follow the steps on the aforementioned web site to install virtualenv. If you already have easy_install or pip installed, you can just run one of the following commands at the terminal:

pip install virtualenv
easy_install virtualenv

Otherwise, you can download the virtualenv archive, decompress it, move into the directory and install it manually, i.e.,

cd virtualenv-X.X
python setup.py install

Once virtualenv is installed, create a virtual environment in a directory called env (or any other name) with the following command:

virtualenv --no-site-packages env

The virtual environment set up in env is packaged with a program called easy_install which, as its name suggests, makes it easy to install Python packages and their dependencies. We will use the virtual environment’s version of easy_install to install the OLD and its dependencies into the virtual environment.

There are two ways to do this. The more explicit and verbose way is to specify the path to the executables in the virtual environment directory. That is, to run the virtual environment’s python, easy_install or pip executables, you would run one of the following commands.

/path/to/env/bin/python
/path/to/env/bin/easy_install
/path/to/env/bin/pip

The easier way (on POSIX systems) is to activate the Python virtual environment by running the source command with the path to the activate script in your virtual environment as its first argument. That is, run:

source /path/to/env/bin/activate

If the above command was successful, you should see the name of your virtual environment directory in parentheses to the left of your command prompt, e.g., (env)username@host:~$. Now invoking python, easy_install, paster, pip, etc. will run the relevant executable in your virtual environment.
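The prompt prefix is only a visual cue. As a rough cross-check, the following sketch asks the interpreter itself whether it is running inside a virtual environment. The function name is illustrative and not part of the OLD; the check relies on sys.real_prefix (set by classic virtualenv) and sys.base_prefix (set by newer tools).

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Classic virtualenv sets sys.real_prefix; newer tools instead set
    # sys.base_prefix to a value that differs from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


# Show which executable is running and whether it looks like a virtual one.
print(sys.executable)
print(in_virtual_environment())
```

Run it with the python that your shell resolves after activation; the printed executable path should point into your env directory.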

Install the OLD

The easiest way to install the OLD is via Easy Install, as in the command below. (Note that from this point on I am assuming that you have activated a virtual environment in one of the two ways described above or have elected not to use a virtual environment.)

easy_install onlinelinguisticdatabase

You can also use pip to install it:

pip install onlinelinguisticdatabase

Once the install has completed, you should see Finished processing dependencies for onlinelinguisticdatabase. (If you used pip, you will see something like Successfully installed onlinelinguisticdatabase.) This means that the OLD and all of its dependencies have been successfully installed.

If you have downloaded the OLD source code and need to install the dependencies, then move to the root directory of the source, i.e., the one containing the setup.py file, and run:

python setup.py develop

Configure

Generate the config file

Once the OLD is installed, it is necessary to configure it. This is done by generating a default config file and making any desired changes. When the OLD’s setup script is run, several directories will be created in the same directory as the config file. Therefore, it is a good idea to create the config file in its own directory. I use the convention of naming production systems using the ISO 639-3 three-character id of the object language. To illustrate, I will use the fictitious language id xyz and will name the directory xyzold, the MySQL database xyzold and the MySQL user xyzuser. If following this convention, replace “xyz” with the id of the language your OLD application will be documenting. To make a new directory called xyzold and change to it, issue the following commands.

mkdir xyzold
cd xyzold

The first step in configuring the OLD is creating a config file. To create a config file named production.ini, run:

paster make-config onlinelinguisticdatabase production.ini

By default, the OLD is set to serve at 127.0.0.1 on port 5000, the Pylons interactive debugger is turned off and the database (RDBMS) is set to SQLite (a database called production.db will be created in the current directory). These defaults are good for verifying that everything is working ok. On a production system you will need to change the host and port values in the config file as well as set the database to MySQL. If you want to get up and running with MySQL right away, see the Set up MySQL/MySQLdb section; otherwise, continue on to Edit the config file.

Developers will not need to generate a config file. The test.ini and development.ini config files should already be present in the root directory of the source. See the Developers section for details.

Set up MySQL/MySQLdb

The OLD can be configured to use either MySQL or SQLite as its relational database management system (RDBMS).

While SQLite is easy to install (both the SQLite library and the pysqlite language binding are built into the Python language), it is not recommended for multi-user concurrent production systems. Therefore, a production OLD setup should have MySQL installed. The following instructions assume that you have successfully installed the MySQL server on your system.

First log in to MySQL as root:

mysql -u root -p<root_password>

Then create a database to store your OLD data:

mysql> create database xyzold default character set utf8;

Now create a MySQL user with sufficient access to the above-created database. In the first command, xyzuser is the username and 4R5gvC9x is the password.

mysql> create user 'xyzuser'@'localhost' identified by '4R5gvC9x';
mysql> grant select, insert, update, delete, create, drop on xyzold.* to 'xyzuser'@'localhost';
mysql> quit;

Make sure that the above commands worked:

mysql -u xyzuser -p4R5gvC9x
mysql> use xyzold;
mysql> show tables;

Now MySQL is set up with a database called xyzold (with UTF-8 as its default character set) and a user xyzuser who has access to xyzold. The next step is to make sure that the Python module MySQLdb is installed. Enter a Python prompt (using your virtual environment, if applicable) and check:

python
>>> import MySQLdb

If you see no output, then MySQLdb is installed. If you see ImportError: No module named MySQLdb, then you need to install MySQLdb.
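The same check can be scripted, which is convenient when verifying several virtual environments. This is a minimal sketch; the function name is illustrative and not part of the OLD.

```python
def mysqldb_available():
    """Return True if the MySQLdb module can be imported by this interpreter."""
    try:
        import MySQLdb  # noqa: F401 (imported only to test availability)
    except ImportError:
        return False
    return True


if mysqldb_available():
    print("MySQLdb is installed.")
else:
    print("MySQLdb is missing.")
```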

Installing MySQLdb can be tricky. On some Linux distributions, it is necessary to first install python-dev. On distros with the Advanced Packaging Tool, you can run the following command.

apt-get install python-dev

Once python-dev is installed, run the following to install MySQLdb (remembering to activate the virtual environment, if necessary).

easy_install MySQL-python

Note that it is also possible to use easy_install to install MySQLdb at the same time as you install the OLD. Instead of running easy_install onlinelinguisticdatabase as above, run the following command:

easy_install onlinelinguisticdatabase[MySQL]

Edit the config file

The config file (whose creation was described in Generate the config file) is where an OLD app is configured. Open the config file (e.g., production.ini) and make any desired changes. While the config file is self-documenting, this section supplements that documentation.

(Note that once the OLD is downloaded and installed, it may be used to run several distinct OLD web services, e.g., for different languages. To do this, repeat the configuration steps with different settings. For example, to create two OLD web services, one for language xyz and one for language abc, create two directories, xyzold and abcold, generate a config file in each, and edit each config file appropriately, following these instructions.)

The host and port where the application will be served are configured here. The defaults of 127.0.0.1 (i.e., localhost) and 5000 are fine for initial setup and testing. During deployment and server configuration, the host will certainly need to be changed and the port probably also.
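To confirm which host and port a config file specifies without opening it in an editor, the values can be read programmatically. The sketch below is written for Python 3 and parses a small sample string rather than a real file. It assumes that the host and port live in a [server:main] section, as is usual for Pylons config files; check your own production.ini if the layout differs. The function name is illustrative.

```python
import configparser
import io

# Sample text mimicking the assumed layout of the server section.
SAMPLE_CONFIG = u"""
[server:main]
use = egg:Paste#http
host = 127.0.0.1
port = 5000
"""


def read_server_settings(config_text):
    """Return (host, port) from the [server:main] section of ini-style text."""
    # RawConfigParser leaves values such as %(here)s untouched instead of
    # trying to interpolate them.
    parser = configparser.RawConfigParser()
    parser.read_file(io.StringIO(config_text))
    return parser.get("server:main", "host"), parser.getint("server:main", "port")


host, port = read_server_settings(SAMPLE_CONFIG)
print("Configured to serve at http://%s:%d" % (host, port))
```

To inspect a real config file, pass the file's contents, e.g., read_server_settings(open("production.ini").read()).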

The set debug = false line should be left as is on a production setup. However, for initial testing it is a good idea to comment out this line with a hash mark (i.e., #set debug = false) so that errors can be debugged. When the line is commented out and an error occurs, Pylons will generate a detailed error report with a web interface that can be accessed by navigating to the link printed to the console (i.e., stderr).

The sqlalchemy.url parameter will need to be changed, depending on the relational database setup needed. If SQLite will be used, then the sqlalchemy.url = sqlite:///production.db line should remain uncommented. Change the database name, if desired; i.e., change production.db to, say, mydb.sql.

If MySQL will be used, then the first step is to comment out the SQLite line, and uncomment the two MySQL lines:

#sqlalchemy.url = sqlite:///production.db
sqlalchemy.url = mysql://username:password@localhost:3306/dbname
sqlalchemy.pool_recycle = 3600

Then, change the first MySQL line so that it contains the appropriate values for your MySQL setup. E.g., using the example setup from Set up MySQL/MySQLdb would involve changing it to the following:

sqlalchemy.url = mysql://xyzuser:4R5gvC9x@localhost:3306/xyzold
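A typo in this URL is an easy mistake to make. As a sanity check, the sketch below splits the URL into its parts using Python 3's standard urllib.parse module. This is not SQLAlchemy's own URL parser, and the function name is illustrative, so treat the result as a rough check only.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split an SQLAlchemy-style database URL into its named parts."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # The path component is "/dbname"; drop the leading slash.
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url("mysql://xyzuser:4R5gvC9x@localhost:3306/xyzold")
print(info["username"], info["host"], info["port"], info["database"])
```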

The only other values you may want to change are password_reset_smtp_server, create_reduced_size_file_copies and preferred_lossy_audio_format.

Uncomment the password_reset_smtp_server = smtp.gmail.com line if you want the system to send emails using a Gmail account specified in a separate gmail.ini config file.

Set create_reduced_size_file_copies to 0 if you do not want the system to create copies of images and .wav files with reduced sizes. Note that in order for the reduced-copies functionality to succeed with images and .wav files it is necessary to install the Python Imaging Library (PIL) and FFmpeg, respectively (see the Soft dependencies section below).

Finally, set the preferred_lossy_audio_format to mp3 instead of ogg if you would like to create .mp3 copies of your users’ .wav files instead of .ogg ones. (Note that a default installation of FFmpeg may not be able to convert .wav to .mp3 without installation of some additional libraries.)

Setup

Once the OLD has been installed and a config file has been created and edited, it is time to run the setup command. This will generate the tables in the database, insert some defaults (e.g., some users and useful tags) and create the requisite directory structure. To set up an OLD application, move to the directory containing the config file (e.g., xyzold containing production.ini) and run the paster setup-app command:

cd xyzold
paster setup-app production.ini

If successful, the output should be Running setup_app() from onlinelinguisticdatabase.websetup. By default, the OLD sends logs to application.log so if you run cat application.log you should see something like the following.

Environment loaded.
Retrieving ISO-639-3 languages data.
Creating a default administrator, contributor and viewer.
Tables created.
Creating default home and help pages.
Generating default application settings.
Creating some useful tags and categories.
Adding defaults.
OLD successfully set up.

If you now enter the database and poke around, you will see that the tables have been created and the defaults inserted.

mysql -u xyzuser -p4R5gvC9x
mysql> use xyzold;
mysql> show tables;
mysql> select username from user;

You should also see two new directories (analysis and files), the application log file application.log and the Python script _requests_tests.py.
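The presence of these directories and files can also be checked with a short script. This is a sketch only; the function and constant names are illustrative, and the list of expected names is taken from the paragraph above.

```python
import os

EXPECTED_DIRECTORIES = ("analysis", "files")
EXPECTED_FILES = ("application.log", "_requests_tests.py")


def missing_setup_artifacts(directory):
    """Return the names of expected setup artifacts absent from directory."""
    missing = [name for name in EXPECTED_DIRECTORIES
               if not os.path.isdir(os.path.join(directory, name))]
    missing += [name for name in EXPECTED_FILES
                if not os.path.isfile(os.path.join(directory, name))]
    return missing


# An empty list means that everything expected is present in the
# current directory.
print(missing_setup_artifacts("."))
```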

Serve

To begin serving an OLD application, use Paster’s serve command:

paster serve production.ini

The output should be something like the following.

Starting server in PID 7938.
serving on http://127.0.0.1:5000

If you visit http://127.0.0.1:5000 in a web browser, you should see {"error": "The resource could not be found."} displayed. If you visit http://127.0.0.1:5000/forms in a web browser, you should see {"error": "Authentication is required to access this resource."}. These error responses are to be expected: the first because no resource was specified in the request URL and the second because authentication is required before forms can be read. Congratulations, this means an OLD application has successfully been set up and is being served locally.
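When checking these responses from a script rather than a browser, it helps to extract the error message from the JSON body. The following sketch does only that; the function name is illustrative and the sample body is the one quoted above.

```python
import json


def error_message(body):
    """Return the "error" value of a JSON response body, or None if absent."""
    try:
        data = json.loads(body)
    except ValueError:
        # The body was not valid JSON at all.
        return None
    if isinstance(data, dict):
        return data.get("error")
    return None


print(error_message('{"error": "The resource could not be found."}'))
```

The body can be fetched with any HTTP client. Note that Python 3's urllib.request.urlopen raises urllib.error.HTTPError when the server answers with an error status code; in that case the body is available from the exception's read() method.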

When paster setup-app is run, a Python script called _requests_tests.py is created in the current working directory. This script uses the Python Requests module to test that a live OLD application is working correctly. Assuming that you have run paster serve and an OLD application is being served locally on port 5000, running the following command will run the _requests_tests script:

python _requests_tests.py

If everything is working correctly, you should see All requests tests passed. (Note that if you have changed the config file, i.e., the host or port values, then you will need to change the values of the host and/or port variables in _requests_tests.py to match.)

Soft dependencies

In order to create smaller copies of image files and .wav files, the OLD uses the Python Imaging Library (PIL) and the FFmpeg command-line program. If you would like your OLD application to automatically create reduced-size images and lossy (i.e., .ogg or .mp3) copies of .wav files, then these programs should be downloaded and installed using the instructions on the above-linked pages. I provide brief instructions here.

In order to allow the specification of phonologies as finite-state transducers, the OLD uses the command-line programs foma and flookup. See the linked page for installation instructions.

In order to search OLD treebank corpora, Tgrep2 must be installed.

NLTK may be used for some OLD functionality.

PIL

To install PIL, download and decompress the source. Then move into the root folder and run setup.py install (remembering to use your virtualenv Python executable, if necessary):

cd Imaging-1.1.7
python setup.py install

Note

I experienced difficulties installing PIL in this way such that jpeg functionality was not working. To correctly install PIL, I did:

cd Imaging-1.1.7
~/env/bin/python setup.py build_ext -i
~/env/bin/python selftests.py
~/env/bin/python setup.py install

The OLD accepts .jpg, .png and .gif image file uploads. If you want to test whether the PIL install can resize all of these formats, create a test file of each format and run something like the following. If successful, you will have created a smaller version of each image:

>>> import Image
>>> im = Image.open('large_image.jpg')
>>> im.thumbnail((500, 500), Image.ANTIALIAS)
>>> im.save('small_image.jpg')
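The thumbnail call shrinks an image to fit within the given box while keeping its aspect ratio, and it never enlarges an image. The sketch below approximates the resulting size in plain Python so that the output of the test above can be predicted. The function name is illustrative, and the exact rounding used by a given PIL version may differ by a pixel.

```python
def thumbnail_size(original, box):
    """Approximate the size an image has after being shrunk to fit within box."""
    width, height = original
    max_width, max_height = box
    if width > max_width:
        # Scale the height by the same factor that brings the width down.
        height = max(height * max_width // width, 1)
        width = max_width
    if height > max_height:
        # Still too tall: scale the width down to match.
        width = max(width * max_height // height, 1)
        height = max_height
    return width, height


print(thumbnail_size((2000, 1000), (500, 500)))
```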

FFmpeg

FFmpeg is a command-line tool that can convert .wav files to the lossy formats .ogg and .mp3. It can be somewhat tricky to install FFmpeg properly and some installs will not support .mp3 creation by default. For Debian 6.0 (Squeeze), I can recommend this tutorial.

Once ffmpeg is installed, you can check whether .wav-to-.ogg and .wav-to-.mp3 conversion is working by ensuring you have a file called old_test.wav in the current directory and issuing the following commands:

ffmpeg -i old_test.wav old_test.ogg
ffmpeg -i old_test.wav old_test.mp3

If successful, you will have created a .ogg and a .mp3 version of your .wav file.
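The two conversions can also be driven from Python, e.g., to test several files at once. The following is a sketch under the assumption that the ffmpeg executable is on the PATH; the function names are illustrative and this is not the code that the OLD itself uses for conversion.

```python
import os
import subprocess


def conversion_command(wav_path, audio_format):
    """Build the ffmpeg argument list that converts wav_path to audio_format."""
    if audio_format not in ("ogg", "mp3"):
        raise ValueError("expected 'ogg' or 'mp3', got %r" % (audio_format,))
    base, _extension = os.path.splitext(wav_path)
    return ["ffmpeg", "-i", wav_path, "%s.%s" % (base, audio_format)]


def convert(wav_path, audio_format):
    """Run ffmpeg and return True if it exited without error."""
    return subprocess.call(conversion_command(wav_path, audio_format)) == 0


# Only print the command here; call convert() to actually run ffmpeg.
print(conversion_command("old_test.wav", "ogg"))
```

Calling convert("old_test.wav", "mp3") returns False when ffmpeg reports an error, which is one way to detect an install that lacks .mp3 support. If ffmpeg is not installed at all, subprocess.call raises an OSError instead.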

Deploy

Deploying an OLD application means getting a domain name, serving the application on the world wide web and setting up some admin scripts. There are many possible ways to achieve this. In my production systems I have followed the approach of using Apache to proxy requests to Pylons as described in Chapter 21: Deployment of The Pylons Book and have had success with that. I review that approach here.

Assuming Apache 2, mod_proxy and mod_proxy_http are installed, you first enable the latter two:

sudo a2enmod proxy
sudo a2enmod proxy_http

Then you create a config file such as the one below in /etc/apache2/sites-available/ or in the equivalent location for your platform. I used the config file below for an OLD application deployed for documenting the Okanagan language. The domain name is okaold.org. I saved the file as /etc/apache2/sites-available/okaold.org and created the error logs directory, i.e., /home/old/log. The only configuration necessary for the OLD config file (i.e., the production.ini file whose creation was detailed in Generate the config file) is to ensure that the host variable is set to localhost and the port variable is set to 8081.

NameVirtualHost *
# OKA - Okanagan
<VirtualHost *>
    ServerName okaold.org
    ServerAlias www.okaold.org

    # Logfiles
    ErrorLog /home/old/log/error.log
    CustomLog /home/old/log/access.log combined

    # Proxy
    ProxyPreserveHost On
    ProxyPass / http://localhost:8081/ retry=5
    ProxyPassReverse / http://localhost:8081/
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
</VirtualHost>

Now you can start serving the OLD application with Paster. In order to keep the server running after you exit the shell, you must invoke paster serve in daemon mode, as follows:

paster serve --daemon production.ini start

Now disable the default Apache configuration, enable the virtual host config file just created (in this case okaold.org) and restart Apache:

sudo a2dissite default
sudo a2ensite okaold.org
sudo /etc/init.d/apache2 restart

You might also want the paster serve script to log error messages, which you can do by specifying a file to log to using the --log-file option. You can also use the --pid-file option to store the process ID of the running server in a file so that other tools know which server is running:

paster serve --daemon --pid-file=/home/old/okaold.pid --log-file=/home/old/log/paster-okaold.log production.ini start

As well as specifying start, you can use a similar command with stop or restart to stop or restart the running daemon, respectively.

The Pylons Book also explains how to create init scripts and how to use crontab to restart a paster server that is serving an OLD/Pylons application (should that ever be necessary). See the referenced sections for details.

You may also wish to write admin scripts to monitor an OLD application to ensure that it is functioning properly and to email you if not. I may include a guide for doing that at some future date.

Finally, it is a good idea to make regular backups of the database and the files and analysis directories of your OLD application. In my production systems I have used MySQL database replication to create a mirror of my production database on a second server in a different location. I then use the standard Unix utility rsync to create live copies of the files and analysis directories on that same second server. A Python script is run periodically on the second server to perform a mysqldump of the relevant databases. I will further document my backup setup at a later date.
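Until that documentation exists, here is a rough sketch of what the periodic dump step might look like. It is not my actual backup script: the function name and the /home/old/backups directory are hypothetical, and the credentials are the example values from the Set up MySQL/MySQLdb section. The sketch only builds the mysqldump command; pass the result to subprocess.call to run it.

```python
import datetime
import os


def mysqldump_command(user, password, database, backup_directory, today=None):
    """Build a mysqldump argument list that writes a dated .sql file."""
    today = today or datetime.date.today()
    dump_path = os.path.join(
        backup_directory, "%s-%s.sql" % (database, today.isoformat()))
    return [
        "mysqldump",
        "--user=%s" % user,
        "--password=%s" % password,
        "--result-file=%s" % dump_path,
        database,
    ]


# /home/old/backups is a hypothetical location; substitute your own.
print(mysqldump_command("xyzuser", "4R5gvC9x", "xyzold", "/home/old/backups"))
```

Keep in mind that a password given on the command line may be visible to other users of the machine while the command runs.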

Developers

This section provides an overview of the OLD for developers. It covers (1) how to download the source and install the dependencies, (2) the structure of the source, (3) how to write and compile the documentation to HTML and PDF, (4) the creation of Python version-specific virtual environments and (5) the building of OLD releases as eggs or archives.

For detailed documentation on developing a Pylons application, consult the excellent documentation for the Pylons framework, i.e., The Pylons Book.

Download & dependency installation

This subsection details how to get the OLD source and install its dependencies. To download the most up-to-date source code, make sure you have Git installed and run:

git clone git://github.com/jrwdunham/old.git

To install the dependencies, move to the newly created old directory and run:

python setup.py develop

Directory structure

The onlinelinguisticdatabase directory contains all of the files and directories that will be packaged into the final distribution. Its subdirectories are config, controllers, lib, model, public, and tests. This section gives an overview of the contents of these directories and the websetup.py file.

The websetup.py file controls how an OLD application is set up. That is, when paster setup-app config_file.ini is run (or when nosetests is run), the contents of websetup.py determine what database tables are created, what defaults are entered into them and what directories are created.

The config directory houses the deployment.ini_tmpl and routing.py files. The former is the template used to generate the config file when something like paster make-config onlinelinguisticdatabase production.ini is run. The routing.py module is where the mappings from URL paths to OLD controller actions are specified. When a new controller is created or the interface to an existing controller needs to be changed, the routing.py file must be edited.

The controllers directory holds a module for each OLD controller. For example, the controllers/forms.py module defines a FormsController class; the methods of this class (the controller’s actions) return values which determine the content of particular responses. The index method (action) of the FormsController class, for example, returns a list of all form models in the database; since config/routing.py maps GET /forms to FormsController.index, it is this list of forms that is returned when GET /forms is requested.
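The relationship between a route and a controller action can be illustrated with a toy model. The code below is plain Python, not Pylons or Routes code, and every name in it other than FormsController and index is made up for the illustration. It shows only the idea that a (method, path) pair selects a controller and one of its methods.

```python
class FormsController(object):
    """Toy stand-in for the controller defined in controllers/forms.py."""

    def __init__(self, forms):
        self.forms = forms

    def index(self):
        # Return every form, as the real index action is described as doing.
        return list(self.forms)


# Toy routing table: (HTTP method, path) -> (controller name, action name).
ROUTES = {("GET", "/forms"): ("forms", "index")}


def dispatch(method, path, controllers):
    """Look up the route and call the matching controller action."""
    controller_name, action_name = ROUTES[(method, path)]
    return getattr(controllers[controller_name], action_name)()


controllers = {"forms": FormsController(["form 1", "form 2"])}
print(dispatch("GET", "/forms", controllers))
```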

The lib directory holds modules that define functionality used by multiple controllers. The utils.py module defines a large number of widely-used functions, classes and data structures; these are made available in controllers under the h namespace, e.g., the value of h.markupLanguages is the list of valid markup language string values, as defined in utils.py. The auth.py script holds the decorators that control authentication and authorization. The schemata.py module contains the validators that are applied against user input. The other modules in the lib directory are mentioned in this document where appropriate; consult the docstrings for more information.

The model directory contains a module for each SQLAlchemy model used by the OLD. For example, model/file.py houses the File class which defines the attributes of the file model and their implementation as columns and relations in a relational database. The model/model.py module is special; it defines the Model class from which all of the other models inherit a number of methods. Note that in order to make a model available in the onlinelinguisticdatabase.model namespace, it must be imported in model/__init__.py.

The public directory may contain static files, HTML, CSS and JavaScript. Since the client-side OLD application has not yet been implemented, the public directory contains, at present, only the iso_639_3_languages_data directory, which stores the tab-delimited files containing the ISO-639-3 dataset.

The tests directory contains all of the test modules. When the nosetests command is run, it is the modules here that define the tests. For example, tests/functional/test_forms.py defines a TestFormsController class whose methods test the various actions (or functionalities) of the forms controller. For example, the test_create method of the TestFormsController class simulates POST /forms requests and confirms that the system behaves as expected. When testing new functionality, new tests should be defined in tests/functional or existing tests should be supplemented. Note the _toggle_tests.py script which does not define tests but provides an easy way to turn large numbers of them on or off. For example, ./onlinelinguisticdatabase/tests/functional/_toggle_tests.py on will turn all tests on and ./onlinelinguisticdatabase/tests/functional/_toggle_tests.py off will turn them all off. See its docstrings for further usage instructions. Finally, the tests directory also contains the _requests_tests.py script which defines some simple tests (using the Requests module) which (as described in the Serve section) can be run on a live OLD application to ensure that it is working correctly.

The websetup.py module defines the setup_app function that is called when the OLD is set up, i.e., when paster setup-app config_file.ini is issued. The behaviour of the setup process is determined by the name of the config file. If test.ini is the config file (as is the case when nosetests is run), then test-specific setup will be performed, i.e., all database tables will be dropped and then re-created. Otherwise, only the tables that do not already exist will be created.

Documentation

This section reviews the OLD documentation creation process. The OLD documentation (i.e., this document) is written using Sphinx and the reStructuredText lightweight markup language. In order to edit and build the documentation, Sphinx must be installed:

easy_install sphinx

The reStructuredText source files for the OLD documentation are the .rst-suffixed files in the docs directory. The Sphinx documentation has a good overview of the reStructuredText syntax. Once the source files have been edited, build the documentation HTML (in docs/_build/html) by moving to the docs directory and running:

sphinx-build -b html . ./_build/html

To generate a LaTeX version of the documentation in docs/_build/latex, run (from the docs directory):

sphinx-build -b latex . ./_build/latex

If pdflatex is installed [2], generate a PDF of the documentation by moving to docs/_build/latex and running:

pdflatex -interaction=nonstopmode OLD.tex

Virtualenv & Python distros

In order to test whether the OLD works on different Python versions or to build distributions for those versions, it is necessary to create virtual environments for each such Python distribution.

The pythonbrew utility facilitates the building and installation of different Pythons in a user’s home directory. Install pythonbrew using the instructions on its web site.

Now run pythonbrew install to install the desired Pythons. For example, to install Python 2.4.6, 2.5.6 and 2.7.3, run:

pythonbrew install 2.4.6
pythonbrew install 2.5.6
pythonbrew install 2.7.3

Once complete, new Python executables should be installed in ~/.pythonbrew/pythons/Python-2.4.6, ~/.pythonbrew/pythons/Python-2.5.6, etc. For example, to launch the Python 2.5.6 interactive console, run:

~/.pythonbrew/pythons/Python-2.5.6/bin/python

To create a virtual environment using one of these Pythons, run virtualenv with the -p option followed by the path to the desired Python executable. It is also a good idea to choose a name for the virtual environment that makes it easy to tell what version of Python it uses. For example:

virtualenv -p ~/.pythonbrew/pythons/Python-2.5.6/bin/python env-2.5.6

Make sure that the new virtual environment has the correct python:

~/env-2.5.6/bin/python --version

Note that the OLD works with Python 2.6 and 2.7 but not with 2.4 or 2.5. It has not been tested with Python 3.
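The compatibility note above can be encoded in a small check to run inside each virtual environment. This is a sketch; the function name is illustrative. Versions older than 2.4 are not mentioned above, and treating them as unsupported is an assumption.

```python
import sys


def support_status(version_info):
    """Classify a Python version tuple according to the compatibility note."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) in ((2, 6), (2, 7)):
        return "supported"
    if major >= 3:
        # The OLD has not been tested with Python 3.
        return "untested"
    # 2.4 and 2.5 are known not to work; older versions are assumed not to.
    return "unsupported"


print(support_status(sys.version_info))
```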

Releases

This section explains how to build stable OLD releases and how to upload them to PyPI.

To build an egg or a source distribution of a stable release, run the following two commands, respectively:

python setup.py bdist_egg
python setup.py sdist

Each of these commands will create a new archive in the dist directory.

In order to build an OLD egg distribution and upload it to PyPI in one command, run the following command. (Note that you will need the OLD’s PyPI password in order to be permitted to do this.)

python setup.py bdist_egg register upload

To create and upload the source distribution to PyPI (so that, e.g., Pip can be used to install the OLD), run:

python setup.py sdist register upload

[1] The OLD has not been tested on Windows. Some alterations to the source may be required in order to get it running on a Windows OS. To be clear, this does not mean that users running a Windows OS will not be able to use a production OLD web application. A live OLD application is a web service and users with any operating system should be able to interact with it, assuming an internet connection is available. What this does mean is that the OLD, as is, may not run on a Windows server.
[2] See this page for an overview of how to use the TeX command-line utilities.