
Migrate Graph Service Implementation to Elasticsearch

We currently support either Elasticsearch or Neo4j as the backend implementation for the graph service. We recommend Elasticsearch if you want a lighter deployment or do not want to manage a Neo4j database. If you started with Neo4j as your graph service backend, here is how to migrate to Elasticsearch.

Docker-compose

If you run your instance locally through Docker, start your DataHub instance with Elasticsearch as the graph backend. On a clean start, this happens by default. However, if you have already written data to Neo4j, you need to explicitly tell DataHub to start in Elasticsearch mode:

datahub docker quickstart --graph-service-impl=elasticsearch

Next, run the following command from the root of the DataHub repository to rebuild your graph index:

./docker/datahub-upgrade/datahub-upgrade.sh -u RestoreIndices
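If you prefer to run both steps as one script, here is a minimal sketch that runs them in order and stops at the first failure. It assumes you run it from the root of the DataHub repository. The `run` helper, the `migrate_docker` function, and the `DRY_RUN` variable are hypothetical conveniences, not part of DataHub: by default the script only prints the commands, and it executes them only when `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: run the two Docker migration steps in order, stopping at the first failure.
# Assumption: executed from the root of the DataHub repository.
# Hypothetical switch: DRY_RUN=1 (default) prints commands, DRY_RUN=0 runs them.
set -euo pipefail

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

migrate_docker() {
  # 1. Start DataHub with Elasticsearch as the graph service backend.
  run datahub docker quickstart --graph-service-impl=elasticsearch
  # 2. Rebuild the graph index.
  run ./docker/datahub-upgrade/datahub-upgrade.sh -u RestoreIndices
}

migrate_docker
```

For example, if you saved the sketch as `migrate.sh`, `DRY_RUN=0 bash migrate.sh` performs the migration for real.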

After this command finishes, the migration should be done. Open the DataHub UI and verify that your relationships are visible.
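Besides checking the UI, you can confirm from the command line that the graph index is populated. This sketch queries the Elasticsearch `_count` API. The URL `http://localhost:9200` and the index name `graph_service_v1` are assumptions about a default quickstart setup, not values stated in this guide, so override `ES_URL` and `GRAPH_INDEX` if yours differ (for example, if you use an index prefix). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that the rebuilt graph index contains documents.
# Assumptions: Elasticsearch at ES_URL and a graph index named GRAPH_INDEX;
# both defaults below are guesses for a local quickstart.
ES_URL="${ES_URL:-http://localhost:9200}"
GRAPH_INDEX="${GRAPH_INDEX:-graph_service_v1}"

# Read an Elasticsearch _count response on stdin and print its "count" value.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in the graph index (empty if the request fails).
graph_doc_count() {
  curl -fsS "$ES_URL/$GRAPH_INDEX/_count" | parse_count
}

# Succeed only if the index is reachable and non-empty.
check_graph_index() {
  local count
  count="$(graph_doc_count)"
  if [ -n "$count" ] && [ "$count" -gt 0 ]; then
    echo "graph index $GRAPH_INDEX has $count documents"
  else
    echo "graph index $GRAPH_INDEX looks empty or unreachable" >&2
    return 1
  fi
}
```

Source the script and run `check_graph_index`; a non-zero exit status means the index looks empty or could not be reached.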

Once you confirm that the migration succeeded, you must remove your Neo4j volume by running:

docker volume rm datahub_neo4jdata

This prevents your DataHub instance from starting in Neo4j mode in the future.
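Docker refuses to remove a volume that is still referenced by a container, even a stopped one, so `docker volume rm` may report that the volume is in use. The sketch below first removes any containers that mount the Neo4j volume and then removes the volume itself. The function names and the `DRY_RUN` switch are hypothetical; review the container IDs it prints before running it with `DRY_RUN=0`, because a removed volume cannot be recovered.

```shell
#!/usr/bin/env bash
# Sketch: remove the Neo4j volume, first removing containers that still reference it.
# Hypothetical switch: DRY_RUN=1 (default) prints commands, DRY_RUN=0 runs them.
NEO4J_VOLUME="${NEO4J_VOLUME:-datahub_neo4jdata}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Print IDs of all containers (running or stopped) that mount the volume.
containers_using_volume() {
  docker ps -a -q --filter "volume=$NEO4J_VOLUME"
}

remove_neo4j_volume() {
  local ids
  ids="$(containers_using_volume)"
  if [ -n "$ids" ]; then
    # $ids is intentionally unquoted so each container ID becomes its own argument.
    run docker rm $ids
  fi
  run docker volume rm "$NEO4J_VOLUME"
}
```

Source the script, call `remove_neo4j_volume` to see what would happen, then call `DRY_RUN=0 remove_neo4j_volume` once the output looks right. If the old Neo4j container is still running, `docker rm` fails; stop it first.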

Helm

First, adjust your Helm values to turn off Neo4j and set graph_service_impl to elasticsearch.

To turn off Neo4j, set the enabled property of neo4j-community to false in the values.yaml of your prerequisites chart.

Then, set graph_service_impl to elasticsearch in the values.yaml of the DataHub chart.
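If you prefer not to edit the values files directly, the same two changes can be sketched with `helm upgrade --reuse-values --set`. The release names (`prerequisites`, `datahub`), chart names (`datahub/datahub-prerequisites`, `datahub/datahub`), and the `global.graph_service_impl` key path are assumptions about a typical setup; check them against your own releases and values.yaml files. The sketch switches DataHub to Elasticsearch before disabling Neo4j, so the graph service is never left pointing at a database that has been removed.

```shell
#!/usr/bin/env bash
# Sketch: apply the two Helm value changes from the command line.
# Assumed release/chart names and key paths; verify against your deployment.
# Hypothetical switch: DRY_RUN=1 (default) prints commands, DRY_RUN=0 runs them.
set -euo pipefail

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

switch_helm_to_elasticsearch() {
  # Point DataHub's graph service at Elasticsearch, keeping all other values.
  run helm upgrade datahub datahub/datahub \
    --reuse-values --set global.graph_service_impl=elasticsearch
  # Then turn off Neo4j in the prerequisites release.
  run helm upgrade prerequisites datahub/datahub-prerequisites \
    --reuse-values --set neo4j-community.enabled=false
}

switch_helm_to_elasticsearch
```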

See the Helm deployment guide for more details on setting up your Helm deployment.

Finally, follow the restore-indices Helm guide to rebuild your graph index.
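The restore-indices Helm guide remains the reference for this step. As a rough sketch, assuming the DataHub chart installed a CronJob template named `<release>-datahub-restore-indices-job-template`, you can start a one-off job from it with `kubectl create job --from` and wait for it to finish. The release name `datahub`, the job name `datahub-restore-indices-adhoc`, and the 60-minute timeout are assumptions; adjust them to your deployment.

```shell
#!/usr/bin/env bash
# Sketch: trigger the restore-indices job once and wait for it to complete.
# Assumed CronJob name: "${RELEASE}-datahub-restore-indices-job-template".
# Hypothetical switch: DRY_RUN=1 (default) prints commands, DRY_RUN=0 runs them.
set -euo pipefail

RELEASE="${RELEASE:-datahub}"
JOB_NAME="${JOB_NAME:-datahub-restore-indices-adhoc}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

restore_indices() {
  # Create a one-off Job from the CronJob template.
  run kubectl create job "$JOB_NAME" \
    --from="cronjob/${RELEASE}-datahub-restore-indices-job-template"
  # Block until the Job reports completion (or the timeout expires).
  run kubectl wait --for=condition=complete "job/$JOB_NAME" --timeout=60m
}

restore_indices
```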

Once the job completes, your data will be migrated.