Hosting a Triplestore with Oxigraph

Last updated 2023.03.02

Data can be persisted in an RDF Triplestore, using HTTP and SPARQL as a query language to insert and retrieve data.

SPARQL, pronounced “sparkle” (✨), is a query language for RDF graphs. The name is a recursive acronym for “SPARQL Protocol and RDF Query Language”. The query language is based around the RDF concept of the “subject, predicate, object” triple. SPARQL can select items from the graph, construct new graphs of new shapes, or summarize the data within a graph.
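To make the query forms SPARQL offers a little more concrete, here is a rough sketch of what each might look like, held as Python strings the way a client program would typically carry them. The `http://example.org/` vocabulary and the `ex:` names are placeholders invented for illustration and do not come from any real dataset.

```python
# Illustrative SPARQL strings; the ex: vocabulary is a made-up placeholder.
PREFIX = "PREFIX ex: <http://example.org/>\n"

# SELECT: return variable bindings for every triple matching the pattern.
SELECT_QUERY = PREFIX + """
SELECT ?person ?name
WHERE { ?person ex:name ?name }
"""

# CONSTRUCT: build a new graph with a different shape from the matched data.
CONSTRUCT_QUERY = PREFIX + """
CONSTRUCT { ?person ex:label ?name }
WHERE { ?person ex:name ?name }
"""

# Aggregate: summarize the graph, here by counting the matches.
COUNT_QUERY = PREFIX + """
SELECT (COUNT(?person) AS ?total)
WHERE { ?person ex:name ?name }
"""

# INSERT DATA is a SPARQL Update operation rather than a query.
INSERT_UPDATE = PREFIX + """
INSERT DATA { ex:alice ex:name "Alice" }
"""
```

Each pattern inside the braces is a triple of subject, predicate, and object, with `?variables` standing in for the parts to be matched.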

A performant, open source option for hosting your own triplestore is Oxigraph, though other options certainly exist, such as Amazon Neptune, GraphDB, and Triply.

Stucco Software's infrastructure is built using Oxigraph Server on Fly.io. Fly.io uses WireGuard to allow for private networking between nodes, so we also use NGINX as a reverse proxy to add authentication to Oxigraph, which as of this writing doesn’t implement its own basic authentication layer.

Oxigraph

To host Oxigraph on Fly.io, follow the documentation for fly launch. The directory structure for this deployment is simple:

📁 fly-oxigraph
  - 📄 Dockerfile
  - 📄 fly.toml

The toml configuration file looks like this:

# fly.toml

app = "oxigraph"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

Our Dockerfile is also pretty simple:

FROM oxigraph/oxigraph
CMD [ "--location", "/data", "serve", "--bind", "[::]:8080" ]

Persistent Storage

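The last step is an important one: putting the “store” in “datastore”. Follow the Fly docs to attach a Volume to the Oxigraph application at the /data directory, which is the path passed to --location in the Dockerfile. In practice, this means creating the volume with fly volumes create and adding a [mounts] section to fly.toml whose destination is /data.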

NGINX

We also deploy a separate NGINX application that exposes public ports to the open internet and proxies requests through to the internal, private Oxigraph port with a layer of basic authentication.

The directory structure has a little bit more going on:

📁 fly-nginx
  - 📄 Dockerfile
  - 📄 nginx.conf
  - 📄 fly.toml
  - 📄 .htpasswd

The Dockerfile is straightforward:

FROM nginx
COPY nginx.conf /etc/nginx/conf.d/nginx.conf
COPY .htpasswd /etc/nginx/.htpasswd

The nginx.conf holds the logic for the authentication and the proxy:

server {
  listen 8080;
  listen [::]:8080;

  server_name _;
  rewrite ^/(.*) /$1 break;

  proxy_ignore_client_abort on;
  proxy_set_header  X-Real-IP  $remote_addr;
  proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header  Host $http_host;
  # CORS is a response header, so it is set with add_header rather than proxy_set_header.
  add_header Access-Control-Allow-Origin "*";

  location / {
    auth_basic "Oxigraph";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://oxigraph.internal:8080;
    proxy_pass_request_headers on;
    proxy_ssl_protocols TLSv1.2;
    proxy_ssl_server_name on;
  }

  location ~ ^(/|/query)$ {
    auth_basic "Oxigraph";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://oxigraph.internal:8080;
    proxy_pass_request_headers on;
    proxy_ssl_protocols TLSv1.2;
    proxy_ssl_server_name on;
  }

  location ~ ^(/update|/store)$ {
    auth_basic "Oxigraph";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://oxigraph.internal:8080;
    proxy_pass_request_headers on;
    proxy_ssl_protocols TLSv1.2;
    proxy_ssl_server_name on;
  }
}

The .htpasswd file holds one username:hashed-password pair per line. Add entries using the htpasswd CLI:

$ htpasswd -Bbn <username> <password> >> .htpasswd

Deploy the proxy to Fly with a fly.toml file like this:

# fly.toml

app = "oxigraph-proxy"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

Conclusion

If all went well, you should now have a functioning RDF Triplestore, gated behind your defined usernames and passwords. You can start using SPARQL to query this store right away.
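As a starting point, a client might talk to the store through the proxy along these lines. This is a minimal sketch using only the Python standard library, not a definitive client: the hostname assumes the proxy app is reachable at Fly's default `<app>.fly.dev` address, and the username and password are hypothetical values standing in for an entry in your own .htpasswd file. It relies on Oxigraph's /query and /update endpoints, which the NGINX configuration proxies.

```python
import base64
import urllib.request

# Hypothetical values: substitute your own proxy hostname and credentials.
BASE_URL = "https://oxigraph-proxy.fly.dev"
USERNAME = "alice"
PASSWORD = "s3cret"


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_query_request(sparql: str) -> urllib.request.Request:
    """Prepare a SPARQL query as a POST to the /query endpoint."""
    return urllib.request.Request(
        BASE_URL + "/query",
        data=sparql.encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(USERNAME, PASSWORD),
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def build_update_request(sparql: str) -> urllib.request.Request:
    """Prepare a SPARQL update as a POST to the /update endpoint."""
    return urllib.request.Request(
        BASE_URL + "/update",
        data=sparql.encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(USERNAME, PASSWORD),
            "Content-Type": "application/sparql-update",
        },
        method="POST",
    )


# To actually send a request (requires a deployed proxy):
# request = build_query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch easy to inspect before anything goes over the network. A request without valid credentials should be rejected by NGINX with a 401 before it ever reaches Oxigraph.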

References

  1. https://www.w3.org/RDF/
  2. https://www.w3.org/2001/sw/wiki/SPARQL
  3. https://github.com/oxigraph/oxigraph
  4. https://crates.io/crates/oxigraph_server
  5. https://fly.io/docs/reference/private-networking/
  6. https://fly.io/docs/reference/fly-launch/
  7. https://fly.io/docs/reference/volumes/