Interacting with a local Polyglot database
There are two ways to get a Polyglot instance up and running on your local machine. The first is the command line utility pgdb. The other is to connect to a locally running ISCAN server instance.
pgdb utility
This utility provides a basic way to install/start/stop all of the required databases in a Polyglot database (see Set up local database for more details on setting up a Polyglot instance this way).
When using this setup, the following ports are used (they are needed later when connecting to the corpus):
Port | Protocol | Database |
---|---|---|
7474 | HTTP | Neo4j |
7475 | HTTPS | Neo4j |
7687 | Bolt | Neo4j |
8086 | HTTP | InfluxDB |
8087 | UDP | InfluxDB |
If any of those ports are in use by other programs (they’re also the default ports for the respective database software), then the Polyglot instance will not be able to start.
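Because a port conflict prevents startup, it can help to check the ports before starting the databases. The following is a minimal sketch using only the Python standard library. The helper names port_is_free and busy_ports are made up for this example and are not part of PolyglotDB or pgdb. It tries to bind each port from the table on the loopback interface, so results may differ slightly between operating systems.

```python
import socket

# Ports from the table above; 8087 is UDP, the others are TCP-based.
PGDB_PORTS = {7474: "tcp", 7475: "tcp", 7687: "tcp", 8086: "tcp", 8087: "udp"}


def port_is_free(port, kind="tcp", host="127.0.0.1"):
    """Return True if the port can be bound, i.e. nothing else is holding it."""
    sock_type = socket.SOCK_STREAM if kind == "tcp" else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, sock_type) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports=PGDB_PORTS):
    """Return the sorted list of ports that could not be bound."""
    return [p for p, kind in sorted(ports.items()) if not port_is_free(p, kind)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("Ports already in use:", taken)
    else:
        print("All pgdb ports appear to be free.")
```

An empty list from busy_ports() suggests that none of the listed ports is currently held by another program.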
Once pgdb start has executed, the local Neo4j instance can be seen at http://localhost:7474/.
Connecting from a script
When the Polyglot instance is running locally, scripts can connect to the relevant databases through the use of parameters passed to CorpusContext objects (or CorpusConfig objects):
from polyglotdb import CorpusContext, CorpusConfig

connection_params = {'host': 'localhost',
                     'graph_http_port': 7474,
                     'graph_bolt_port': 7687,
                     'acoustic_http_port': 8086}
config = CorpusConfig('corpus_name', **connection_params)
with CorpusContext(config) as c:
    pass  # replace with some task, e.g., import, enrichment, or query
These port settings are the defaults, so connecting to a vanilla install of the pgdb utility can be done more simply:
from polyglotdb import CorpusContext

with CorpusContext('corpus_name') as c:
    pass  # replace with some task, e.g., import, enrichment, or query
See the tutorial scripts for examples that use this style of connecting to a local pgdb instance.
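When the ports differ from the defaults, one option is to keep them out of the script and read them from the environment. The sketch below builds the same kind of connection parameter dictionary shown earlier. The function name connection_params_from_env and the PGDB_* variable names are hypothetical, chosen only for this example; only the dictionary keys come from this page.

```python
import os

# Default values for a vanilla pgdb install, as given on this page.
DEFAULT_PARAMS = {
    "host": "localhost",
    "graph_http_port": 7474,
    "graph_bolt_port": 7687,
    "acoustic_http_port": 8086,
}


def connection_params_from_env(env=None, prefix="PGDB_"):
    """Build connection parameters, letting environment variables override defaults.

    The PGDB_* variable names are hypothetical and used for this sketch only.
    """
    env = os.environ if env is None else env
    params = dict(DEFAULT_PARAMS)
    for key, default in DEFAULT_PARAMS.items():
        raw = env.get(prefix + key.upper())
        if raw is None:
            continue
        # Ports are integers; the host stays a string.
        params[key] = int(raw) if isinstance(default, int) else raw
    return params
```

The result can then be passed on in the same way as before, for example CorpusConfig('corpus_name', **connection_params_from_env()).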
Local ISCAN server
A locally running ISCAN server is a more fully functional system that can manage multiple Polyglot databases (creating, starting, and stopping them as necessary through a graphical web interface).
While ISCAN servers are intended to be run on dedicated remote servers, there will often be times when scripts need to connect to a locally running server. For this, there is a utility function ensure_local_database_running:
from polyglotdb import CorpusContext, CorpusConfig
from polyglotdb.utils import ensure_local_database_running

with ensure_local_database_running('database', port=8080, token='auth_token_from_iscan') as connection_params:
    config = CorpusConfig('corpus_name', **connection_params)
    with CorpusContext(config) as c:
        pass  # replace with some task, e.g., import, enrichment, or query
Important
Replace 'database', 'auth_token_from_iscan', and 'corpus_name' with the relevant values. In the use case of one corpus per database, 'database' and 'corpus_name' can be the same name, as in the SPADE analysis repository.
Compared to the example above, the only difference is the use of ensure_local_database_running as a context manager.
This function first tries to connect to an ISCAN server running on the local machine. If it connects successfully, it creates a new database named "database" if one does not already exist, starts it if it is not already running, and then returns the connection parameters as a dictionary that can be used for instantiating the CorpusConfig object. Once all the work inside the context of ensure_local_database_running has been completed, the database will be stopped.
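The create, start, yield, stop sequence described above can be illustrated with a toy context manager. This is not the implementation in polyglotdb: the name toy_local_database, its arguments, and the yielded dictionary are all made up for illustration, and stopping the database even when the body raises is an assumption of this sketch rather than documented behaviour.

```python
from contextlib import contextmanager


@contextmanager
def toy_local_database(name, databases, log):
    """Toy stand-in that mimics the lifecycle described above (NOT polyglotdb code).

    `databases` maps a database name to True (running) or False (stopped).
    `log` collects the actions taken, so the order can be inspected.
    """
    if name not in databases:
        log.append(f"create {name}")
        databases[name] = False
    if not databases[name]:
        log.append(f"start {name}")
        databases[name] = True
    try:
        # Hypothetical parameters; the real function returns those of the
        # database managed by the ISCAN server.
        yield {"host": "localhost"}
    finally:
        # Assumption of this sketch: stop even if the body raised an error.
        log.append(f"stop {name}")
        databases[name] = False


if __name__ == "__main__":
    databases, log = {}, []
    with toy_local_database("database", databases, log) as connection_params:
        pass  # work with the corpus would happen here
    print(log)
```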
The token keyword argument should be an authentication token for a user with appropriate permissions to access the ISCAN server. This token can be found by going to the admin page for tokens within ISCAN (by default, http://localhost:8080/admin/auth_token/) and choosing an appropriate one. However, please ensure that this token is not committed or made public in any way as that would lead to security issues. One way to use this in committed code is to have the token saved in a separate text document that git does not track, and load it via a function like:
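import os

# Assumption: the token file sits next to this script; adjust base_dir as needed.
base_dir = os.path.dirname(os.path.abspath(__file__))

def load_token():
    token_path = os.path.join(base_dir, 'auth_token')
    if not os.path.exists(token_path):
        return None
    with open(token_path, 'r') as f:
        token = f.read().strip()
    return token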
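A variation on the same idea, sketched under stated assumptions, is to fall back to an environment variable when the untracked file is missing, which can be convenient on machines where the token is provided some other way. The function name load_token_with_fallback and the variable name ISCAN_AUTH_TOKEN are hypothetical and not something ISCAN or PolyglotDB defines.

```python
import os


def load_token_with_fallback(token_path, env_var="ISCAN_AUTH_TOKEN"):
    """Return the token from an untracked file, else from the environment, else None.

    The environment variable name is a hypothetical choice for this sketch.
    """
    if os.path.exists(token_path):
        with open(token_path, "r") as f:
            token = f.read().strip()
        if token:
            return token
    return os.environ.get(env_var)
```

Whichever approach is used, the token file should be listed in .gitignore so that git does not track it.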
Note
The ISCAN server keeps track of all existing databases and ensures that the ports do not overlap, so multiple databases can be run simultaneously. The ports are all in the 7400 and 8400 range, and should not (but may) conflict with other applications.
This utility is thus best for isolated work by a single user, where only they will be interacting with the particular database specified and the database only needs to be available during the running of the script.
You can see an example of connecting to a local ISCAN server in the scripts of the SPADE analysis repository, for instance the basic_queries.py script.