Getting started

Installation

PolyglotDB is now available directly via conda-forge. We recommend using Conda for installation, as it ensures compatibility with required system dependencies like Java and makes it easier to manage environments across platforms.

If you don’t have conda installed on your device:

  1. Install Anaconda, Miniconda, or Miniforge (see Conda Installation).

  2. Make sure your conda is up to date: conda update conda

Note

On Windows, it is recommended to use the Anaconda Prompt or Miniforge Prompt to manage and execute conda commands effectively. This is because, by default, installing Anaconda or Miniforge does not add the conda command to your system’s PATH environment variable. However, if you prefer to use the regular Windows Command Prompt or run Python scripts directly from your IDE, you will need to manually add the necessary directories to your PATH. To do so, follow these steps:

  1. Open the Start Menu and search for Environment Variables.

  2. Click on Edit the system environment variables.

  3. In the System Properties window, click on the Environment Variables button.

  4. In the Environment Variables window, find the Path variable in the User variables or System variables section and select it.

  5. Click Edit, then New, and add the following two paths (adjust to your installation):

    1. C:\Users\YourUsername\Anaconda3

    2. C:\Users\YourUsername\Anaconda3\Scripts

After completing these steps, you should be able to use conda in the Windows Command Prompt and configure your IDE accordingly.
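
To confirm that the PATH change took effect, a short check along the following lines may help. This is a minimal sketch that relies only on the Python standard library; the helper name find_conda is hypothetical and not part of PolyglotDB or conda.

```python
import shutil


def find_conda():
    """Return the full path of the conda executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in PATH, the same way
    # the shell does when you type "conda".
    return shutil.which("conda")


conda_path = find_conda()
if conda_path:
    print(f"conda found at: {conda_path}")
else:
    print("conda is not on PATH; use the Anaconda/Miniforge Prompt or add it to PATH")
```

Run it from the same terminal or IDE in which you plan to use conda, since each program may see a different PATH.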

Pathway to installation

There are two mandatory steps to install PolyglotDB:

  1. Either Quick Installation via conda-forge or Installation from source with pip

  2. Setting up a local database

Optionally, you can Configure Your IDE between Step 1 and Step 2.

To install from source (primarily for development)

Note

Skip this step if you have already installed PolyglotDB via conda-forge.

  1. Clone or download the Git repository (https://github.com/MontrealCorpusTools/PolyglotDB).

  2. Navigate to the directory via command line and create the conda environment: conda env create -f environment.yml

  3. Activate the conda environment: conda activate polyglotdb-dev

  4. Install PolyglotDB via pip install -e ., which will install the pgdb utility that can be run inside your conda environment and manages a local database.

(Note that if you installed via conda-forge, the pgdb utility is already installed.)

Configure Your IDE (Optional)

Note

This step is not required for general use of PolyglotDB. You only need to do this if you plan to write/run PolyglotDB scripts within an IDE, such as Visual Studio Code, PyCharm, or similar tools.

If you are using an IDE, you may encounter issues where the IDE’s default Python interpreter is different from the one set up in your Conda environment. This can lead to errors such as missing packages, even if you’ve installed everything correctly in Conda. In such cases, you need to manually set the Python interpreter in your IDE to point to the one used by your Conda environment. If you are on Windows, make sure you have completed the PATH setup described in the Windows note under Installation, so that the Conda environment is accessible from your IDE’s terminal. For Visual Studio Code, follow these steps (a similar process applies to most other IDEs):

  1. Make sure you have the Python extension installed in VSCode.

  2. Open VSCode and open the Command Palette (Ctrl+Shift+P on Windows or Cmd+Shift+P on Mac), then choose Python: Select Interpreter.

  3. Select the interpreter corresponding to your Conda environment (e.g., conda-env:polyglotdb).

  4. Open a new terminal in VSCode. If the environment is not activated automatically, run: conda activate polyglotdb

Now, you can run PolyglotDB commands and scripts directly within VSCode’s integrated terminal.
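
If you are unsure which interpreter your IDE is actually using, a small diagnostic script can help. The sketch below uses only the standard library. The helper name describe_interpreter is hypothetical, and it assumes that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated.

```python
import importlib.util
import os
import sys


def describe_interpreter():
    """Collect facts that show which Python environment is running this script."""
    return {
        # Full path of the running interpreter; it should point inside
        # your conda environment's directory.
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Name of the active conda environment, or None if none is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # True if the polyglotdb package can be imported by this interpreter.
        "polyglotdb_found": importlib.util.find_spec("polyglotdb") is not None,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If polyglotdb_found is False, or executable points outside your conda environment, the IDE is most likely still using a different interpreter.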

Setting up local database

Installing the PolyglotDB package also installs a utility script (pgdb) that is then callable from the command line inside your conda environment. The pgdb command allows for the administration of a single Polyglot database (install/start/stop/uninstall). pgdb install is a separate step that installs the actual local database backend, including Neo4j and InfluxDB. This is necessary to run PolyglotDB locally.

You only need to run pgdb install once. After that, the commands in Managing the local database are all you need to interact with PolyglotDB databases.

Installing the local database

  1. Make sure you are inside the dedicated conda environment just created. If not, activate it via conda activate polyglotdb

  2. Inside your conda environment, run pgdb install /path/to/where/you/want/data/to/be/stored, or pgdb install to save data in the default directory.

Warning

  1. On Windows, make sure you are running as an Administrator (right-click on Anaconda Prompt, Miniforge Prompt, Command Prompt, or your IDE and select “Run as administrator”), as Neo4j will be installed as a Windows service.

  2. Do not use sudo with pgdb install on macOS, as it will lead to permissions issues later on.

Managing the local database

  • To start the database: pgdb start

  • To stop the database: pgdb stop

  • To uninstall the database: pgdb uninstall

To view your conda environments:

conda info -e

To return to your root environment:

conda deactivate

Steps to use PolyglotDB

Now that you have set up the PolyglotDB conda environment and installed local databases, follow these steps each time you use PolyglotDB:

  1. Navigate to your working directory, either in your IDE or via the command line.

  2. Activate the conda environment: conda activate polyglotdb.

  3. Start the local databases: pgdb start.

  4. Put your Python scripts (which use the polyglotdb library) inside this working directory.

  5. Run the scripts using: python your_script.py.

  6. When finished, stop the local databases: pgdb stop.

  7. Deactivate the conda environment: conda deactivate.
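
Steps 3 to 6 above can be sketched as a small Python wrapper, which guarantees that pgdb stop runs even when the script fails. This is an illustrative sketch, not part of PolyglotDB: the function name run_with_database and its run parameter are hypothetical, and it assumes the conda environment is already active so that pgdb is available as a command on PATH.

```python
import subprocess
import sys


def run_with_database(script, run=subprocess.run):
    """Start the local databases, run a script, then always stop the databases."""
    # Step 3: start the local databases.
    run(["pgdb", "start"], check=True)
    try:
        # Step 5: run the script with the same interpreter as this wrapper.
        return run([sys.executable, script], check=True)
    finally:
        # Step 6: stop the databases, even if the script raised an error.
        run(["pgdb", "stop"], check=True)


# Example usage (uncomment inside an activated environment):
# run_with_database("your_script.py")
```

The run parameter exists only so that the command runner can be swapped out, for example in a dry run that records commands instead of executing them.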

Alternative Installation (Using Docker Environment)

Running PolyglotDB in a Docker container is a great way to maintain a consistent environment, isolate dependencies, and streamline your setup process. This section will guide you through setting up and using PolyglotDB within Docker. Note that this method is an alternative to the default installation with conda-forge or pip. If you already installed via conda-forge or pip above, do not re-install with Docker.

Prerequisites

Before starting, ensure that Docker is installed on your system. You can check if Docker is installed by running the following command in your terminal:

docker version
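
The same check can be made from Python, for example at the top of a setup script. The sketch below is a hedged illustration using only the standard library; the function name docker_available and the 30-second timeout are assumptions chosen for this example.

```python
import subprocess


def docker_available(command=("docker", "version")):
    """Return True if the given Docker command runs and exits successfully."""
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers the case where the executable is not installed
        # or not on PATH; TimeoutExpired covers a hung command.
        return False
    # A non-zero exit code may mean Docker is installed but not running.
    return result.returncode == 0


# Example usage:
# if not docker_available():
#     print("Docker does not appear to be installed or running.")
```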

Setting Up the Docker Container

Follow these steps to get your Docker container up and running:

  1. Clone the Repository:

    First, clone the PolyglotDB Docker repository to your local machine:

    git clone https://github.com/MontrealCorpusTools/polyglotdb-docker.git

  2. Start the Docker Container:

    Navigate to the directory you just cloned and start the container:

    docker-compose run polyglotdb

    Note

    If you’re using a Mac, you might need to run docker compose run polyglotdb instead.

    The docker compose run command automatically starts the database servers, so there are no extra steps to set up the databases. This command launches an interactive shell inside the polyglotdb container, allowing you to execute PolyglotDB scripts directly.

  3. Working with the Default Folder Structure:

    Your default folder structure is as follows. Ensure your Python scripts and data are placed within the polyglotdb-docker directory, which is mounted to the Docker container for execution:

    polyglotdb-docker (your default working directory, mounted to /polyglotdb inside the Docker container)
    ├── pgdb
    │   ├── neo4j
    │   │   ├── conf
    │   │   │   └── neo4j.conf
    │   │   ├── data
    │   │   │   └── *
    │   │   └── logs
    │   │       └── *
    │   └── influxdb
    │       ├── conf
    │       │   └── influxdb.conf
    │       ├── data
    │       │   └── *
    │       └── meta
    │           └── *
    └── your scripts and data should go here
    
  4. Editing and Running Your PolyglotDB Scripts

    You can choose to edit your scripts either using an IDE outside of the Docker container or by using command-line text editors within the Docker container. Two text editors, nano and vim, are pre-installed for use inside the container.

    • Using an IDE Outside the Docker Container:

      If you prefer to use an IDE outside the Docker container, ensure that you save your scripts inside your working directory (default: polyglotdb-docker). You can customize this directory by following the instructions in the later section Changing the Default Storage Location. The scripts stored in this directory will be automatically available inside the Docker container under the /polyglotdb directory. You can then execute your scripts using the command: python your_script.py.

    • Using Command-Line Text Editors Inside the Docker Container:

      If you choose to write your scripts inside the Docker container using command-line tools, you can place them anywhere within the container and execute them using the command: python your_script.py. However, if you want to preserve your scripts after shutting down the container, ensure you save them in the directory mounted to your device (default: /polyglotdb).

    • Note when writing your scripts:

      1. It is important to avoid using absolute paths in your scripts when working with Docker. This is because the Docker container has its own internal filesystem, so absolute paths from your host machine (e.g., /home/user/documents/my_corpus) will not be valid inside the container. Instead, always use relative paths based on the current working directory inside the container. Additionally, you must place all files you want to reference (such as corpus folders, Praat scripts, etc.) inside the directory that is mounted to the Docker container, which is the polyglotdb-docker directory by default.

      import os
      corpus_root = './data/my_corpus'
      # Now you can use corpus_root to access files in the my_corpus folder
      
      2. The Docker setup comes with several pre-installed tools inside the polyglotdb container located at /pgdb/tools:

        1. Praat: Installed at /pgdb/tools/praat, environment variable praat. In your script, you can reference it by os.environ.get('praat').

        2. Reaper: Installed at /pgdb/tools/reaper, environment variable reaper. In your script, you can reference it by os.environ.get('reaper').

  5. Stopping the Docker Containers:

    To stop the Docker containers, first exit the polyglotdb shell by running:

    exit
    

    Then, shut down the other containers with:

    docker compose down
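
The two notes on writing scripts in step 4 (relative paths and pre-installed tools) can be combined at the top of a script, roughly as follows. This is a minimal sketch: the helper name tool_path is hypothetical, and using the documented install locations as fallback values is an assumption made for this example, not documented behaviour.

```python
import os
from pathlib import Path


def tool_path(env_name, default):
    """Return a tool's path from an environment variable, or a fallback value."""
    return os.environ.get(env_name, default)


# Inside the container, the variables "praat" and "reaper" are set.
# The fallbacks below are only used when a variable is missing.
praat_path = tool_path("praat", "/pgdb/tools/praat")
reaper_path = tool_path("reaper", "/pgdb/tools/reaper")

# A relative path, resolved against the current working directory,
# so it stays valid inside the mounted directory.
corpus_root = Path("data") / "my_corpus"

print(praat_path, reaper_path, corpus_root)
```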
    

Changing the Default Storage Location

You can modify the default folder structure by editing the docker-compose.yml file. To change the storage location for Neo4j and InfluxDB data:

  1. Move the neo4j and influxdb folders from the polyglotdb-docker/pgdb directory to your desired location.

  2. Update the volume paths in the docker-compose.yml file to reflect the new location. For example:

    neo4j:
       ...
       volumes:
          - /path/to/your/neo4j/conf:/conf
          - /path/to/your/neo4j/data:/data
          - /path/to/your/neo4j/logs:/logs
          - shared_data:/temp
       ...
    
    influxdb:
       ...
       volumes:
          - /path/to/your/influxdb:/var/lib/influxdb
          - /path/to/your/influxdb/conf/influxdb.conf:/etc/influxdb/influxdb.conf
          - shared_data:/temp
       ...
    

You can also change the working directory by modifying the docker-compose.yml file. For instance:

polyglotdb:
   ...
   volumes:
      - shared_data:/temp
      - /path/to/your/working/directory:/polyglotdb

By doing this, the specified directory on your device will be mounted to the Docker container under /polyglotdb. To access PolyglotDB scripts and data within the container, ensure they are placed inside your chosen directory.
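
To see how a file on your device maps to a path inside the container, the mount rule can be written out as a small function. This is an illustrative sketch only: the function name to_container_path is hypothetical, the example paths are made up, and it assumes POSIX-style host paths (Windows paths are not handled).

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_root, container_root="/polyglotdb"):
    """Translate a host path under the mounted directory to its container path."""
    # relative_to raises ValueError if host_path is outside host_root,
    # which means the file would not be visible inside the container.
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_root))
    return str(PurePosixPath(container_root) / relative)


print(to_container_path("/home/user/work/data/my_corpus", "/home/user/work"))
# prints /polyglotdb/data/my_corpus
```

A ValueError from this function is a useful signal that a file needs to be moved into the mounted directory before a script can reach it.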