Interacting with a local Polyglot database

There are two ways to get a local Polyglot instance up and running on your machine. The first is the command line utility pgdb. The other is to connect to a locally running ISCAN server instance.

pgdb utility

This utility provides a basic way to install, start, and stop all of the databases that make up a Polyglot database (see Set up local database for more details on setting up a Polyglot instance this way).
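In practice the workflow is a short sequence of commands; the sketch below shows a typical session, though the exact install options and data directories are covered in Set up local database:

pgdb install   # one-time download and setup of the required Neo4j and InfluxDB installations
pgdb start     # start the local Polyglot databases
pgdb stop      # stop them when you are done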

When using this setup, the following ports are used (and are relevant for later connecting to the corpus):

Port   Protocol   Database
7474   HTTP       Neo4j
7475   HTTPS      Neo4j
7687   Bolt       Neo4j
8086   HTTP       InfluxDB
8087   UDP        InfluxDB

If any of those ports are in use by other programs (they’re also the default ports for the respective database software), then the Polyglot instance will not be able to start.
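If start-up fails, one way to find the culprit is to check whether anything is already listening on those ports. The following is a minimal sketch using only the Python standard library (the port list mirrors the table above; note that 8087 is a UDP port, so the TCP check there is only a rough indicator):

import socket

# Ports used by a default pgdb setup (see the table above)
POLYGLOT_PORTS = [7474, 7475, 7687, 8086, 8087]

def port_in_use(port, host='localhost'):
    # Returns True if something is already listening on this TCP port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

for port in POLYGLOT_PORTS:
    if port_in_use(port):
        print(f'Port {port} is already in use; the Polyglot instance may fail to start.')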

Once pgdb start has executed, the local Neo4j instance can be seen at http://localhost:7474/.
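To confirm from a script that the instance actually came up, a request to that address should succeed; this quick check uses only the standard library and is not part of polyglotdb itself:

from urllib.request import urlopen
from urllib.error import URLError

try:
    with urlopen('http://localhost:7474/', timeout=5) as response:
        print('Neo4j is up (HTTP status {})'.format(response.status))
except URLError as error:
    print('Neo4j does not appear to be running:', error)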

Connecting from a script

When the Polyglot instance is running locally, scripts can connect to the relevant databases through parameters passed to CorpusContext objects (or to CorpusConfig objects, as below):

from polyglotdb import CorpusContext, CorpusConfig

connection_params = {'host': 'localhost',
                     'graph_http_port': 7474,
                     'graph_bolt_port': 7687,
                     'acoustic_http_port': 8086}
config = CorpusConfig('corpus_name', **connection_params)
with CorpusContext(config) as c:
    pass  # replace with some task, e.g., import, enrichment, or query

These port settings are the defaults, so connecting to a vanilla install of the pgdb utility can be done more simply:

from polyglotdb import CorpusContext

with CorpusContext('corpus_name') as c:
    pass  # replace with some task, e.g., import, enrichment, or query

See the tutorial scripts for examples that use this style of connecting to a local pgdb instance.

Local ISCAN server

A locally running ISCAN server is a more fully featured system that can manage multiple Polyglot databases (creating, starting, and stopping them as necessary through a graphical web interface). While ISCAN servers are intended to run on dedicated remote servers, there will often be times when scripts need to connect to a locally running server. For this, there is a utility function, ensure_local_database_running:

from polyglotdb import CorpusContext, CorpusConfig
from polyglotdb.utils import ensure_local_database_running

with ensure_local_database_running('database', port=8080, token='auth_token_from_iscan') as connection_params:
    config = CorpusConfig('corpus_name', **connection_params)
    with CorpusContext(config) as c:
        pass  # replace with some task, e.g., import, enrichment, or query

Important

Replace the database, auth_token_from_iscan, and corpus_name with relevant values. In the use case of one corpus per database, database and corpus_name can be the same name, as in the SPADE analysis repository.

Compared to the example above, the only difference is the use of ensure_local_database_running as a context manager. This function first tries to connect to an ISCAN server running on the local machine. If it connects successfully, it creates a database named "database" if one does not already exist, starts it if it is not already running, and returns the connection parameters as a dictionary that can be used to instantiate the CorpusConfig object. Once all of the work inside the ensure_local_database_running context has completed, the database is stopped.

The token keyword argument should be an authentication token for a user with appropriate permissions on the ISCAN server. This token can be found by going to the token admin page within ISCAN (by default, http://localhost:8080/admin/auth_token/) and choosing an appropriate one. However, make sure this token is never committed or made public in any way, as that would be a security risk. One way to use it in committed code is to save the token in a separate text file that git does not track and load it via a function like the following (here base_dir is assumed to be the directory containing that file):

import os

# Assumption: the untracked auth_token file lives alongside this script
base_dir = os.path.dirname(os.path.abspath(__file__))

def load_token():
    token_path = os.path.join(base_dir, 'auth_token')
    if not os.path.exists(token_path):
        return None
    with open(token_path, 'r') as f:
        token = f.read().strip()
    return token
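The loaded token can then be passed to ensure_local_database_running in place of a hard-coded string; this sketch assumes the load_token function above and the default ISCAN port:

from polyglotdb import CorpusContext, CorpusConfig
from polyglotdb.utils import ensure_local_database_running

token = load_token()  # None if the untracked auth_token file is missing

with ensure_local_database_running('database', port=8080, token=token) as connection_params:
    config = CorpusConfig('corpus_name', **connection_params)
    with CorpusContext(config) as c:
        pass  # replace with some task, e.g., import, enrichment, or query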

Note

The ISCAN server keeps track of all existing databases and ensures that their ports do not overlap, so multiple databases can run simultaneously. The ports are all in the 7400 and 8400 ranges and should not (though they may) conflict with other applications.

This utility is thus best suited to isolated work by a single user, where only they interact with the specified database and the database only needs to be available while the script is running.

You can see examples of connecting to a local ISCAN server in the scripts of the SPADE analysis repository, for instance the basic_queries.py script.