.. _SPADE analysis repository: https://github.com/MontrealCorpusTools/SPADE

.. _admin section of the ISCAN server: https://iscan.readthedocs.io/en/latest/administration.html

.. _basic_queries.py: https://github.com/MontrealCorpusTools/SPADE/blob/master/basic_queries.py

.. _local:

Interacting with a local Polyglot database
==========================================

There are two potential ways to have a local Polyglot instance up and running on your local machine.  The first is a
command line utility :code:`pgdb`.  The other option is to connect to
a locally running ISCAN server instance.


pgdb utility
------------

This utility provides a basic way to install/start/stop all of the required databases in a Polyglot database (see
:ref:`local_setup` for more details on setting up a Polyglot instance this way).

When using this set up the following ports are used (and are relevant for later connecting with the corpus):

+-------+----------+----------+
|  Port | Protocol | Database |
+=======+==========+==========+
| 7474  | HTTP     | Neo4j    |
+-------+----------+----------+
| 7475  | HTTPS    | Neo4j    |
+-------+----------+----------+
| 7687  | Bolt     | Neo4j    |
+-------+----------+----------+
| 8086  | HTTP     | InfluxDB |
+-------+----------+----------+
| 8087  | UDP      | InfluxDB |
+-------+----------+----------+

If any of those ports are in use by other programs (they're also the default ports for the respective database software),
then the Polyglot instance will not be able to start.

Once :code:`pgdb start` has executed, the local Neo4j instance can be seen at :code:`http://localhost:7474/`.

Connecting from a script
````````````````````````

When the Polyglot instance is running locally, scripts can connect to the relevant databases through the use of parameters passed to
CorpusContext objects (or CorpusConfig objects):


.. code-block:: python

    from polyglotdb import CorpusContext, CorpusConfig

    connection_params = {'host': 'localhost',
                        'graph_http_port': 7474,
                        'graph_bolt_port': 7687,
                        'acoustic_http_port': 8086}
    config = CorpusConfig('corpus_name', **connection_params)
    with CorpusContext(config) as c:
        pass # replace with some task, i.e., import, enrichment, or query


These port settings are used by default and so connecting to a vanilla install of the ``pgdb`` utility can be done more simply
through the following:

.. code-block:: python

    from polyglotdb import CorpusContext

    with CorpusContext('corpus_name') as c:
        pass # replace with some task, i.e., import, enrichment, or query


See the tutorial scripts for examples that use this style of connecting to a local ``pgdb`` instance.

.. _local_iscan_server:

Local ISCAN server
------------------

A locally running ISCAN server is a more fully functional system that can manage multiple Polyglot databases (creating, starting and stopping
as necessary through a graphical web interface).
While ISCAN servers are intended to be run on dedicated remote servers, there will often be times where scripts
will need to connect a locally running server.  For this, there is a utility function :code:`ensure_local_database_running`:

.. code-block:: python

    from polyglotdb import CorpusContext, CorpusConfig
    from polyglotdb.utils import ensure_local_database_running

    with ensure_local_database_running('database', port=8080, token='auth_token_from_iscan') as connection_params:
        config = CorpusConfig('corpus_name', **connection_params)
        with CorpusContext(config) as c:
            pass # replace with some task, i.e., import, enrichment, or query

.. important::

   Replace the ``database``, ``auth_token_from_iscan``, and ``corpus_name`` with relevant values.  In the use case of one
   corpus per database,
   ``database`` and ``corpus_name`` can be the same name, as in the `SPADE analysis repository`_.

As compared to the example above, the only difference is the context manager use of :code:`ensure_local_database_running`.
What this function does is first try to connect to a ISCAN server running on the local machine.
If it successfully connects, then it creates a new database named :code:`"database"` if it does not already exist, starts it if
it is not already running, and then returns the connection parameters as a dictionary that can be used for instantiating
the :code:`CorpusConfig` object.  Once all the work inside the context of :code:`ensure_local_database_running` has been completed, the
database will be stopped.

The token keyword argument should be an authentication token for a user with appropriate permissions to access the ISCAN
server.  This token can be found by going to the admin page for tokens within ISCAN (by default, http://localhost:8080/admin/auth_token/)
and choosing an appropriate one.  However, please ensure that this token is not committed or made public in any way as
that would lead to security issues.  One way to use this in committed code is to have the token saved in a separate text
document that git does not track, and load it via a function like:

.. code-block:: python

    def load_token():
        token_path = os.path.join(base_dir, 'auth_token')
        if not os.path.exists(token_path):
            return None
        with open(token_path, 'r') as f:
            token = f.read().strip()
        return token

.. note::

   The ISCAN server keeps track of all existing databases and ensures that the ports do not overlap, so multiple databases
   can be run simultaneously.  The ports are all in the 7400 and 8400 range, and should not (but may) conflict with other applications.

This utility is thus best for isolated work by a single user, where only they will be interacting
with the particular database specified and the database only needs to be available during the running of the script.

You can see an example of connecting to local ISCAN server used in the scripts for the `SPADE analysis repository`_,
for instance the `basic_queries.py`_ script.