Virtuoso

We run instances of virtuoso for our graph databases. Virtuoso is remarkable software and runs some really large databases, including UniProt. It can sometimes feel old and clunky, but we still prefer it to the shiny newer alternatives because it is the only large graph database not written in Java. Java packages are almost impossible to package in Guix.

Running virtuoso

Running virtuoso in a guix system container

We have a Guix virtuoso service in the guix-bioinformatics channel. The easiest way to run virtuoso is to use this service to run it in a guix system container. The only downside of this method is that guix system containers require root privileges to start up, so you will need root privileges on the machine you run this on.

Here is a basic guix system configuration that runs virtuoso listening on port 8891, and with its HTTP server listening on port 8892. Among other things, the HTTP server provides a SPARQL endpoint to interact with.

(use-modules (gnu)
             (gn services databases))

(operating-system
  (host-name "virtuoso")
  (timezone "UTC")
  (locale "en_US.utf8")
  (bootloader (bootloader-configuration
               (bootloader grub-bootloader)
               ;; It doesn't matter what "/dev/sdX" is
               (targets (list "/dev/sdX"))))
  (file-systems %base-file-systems)
  (users %base-user-accounts)
  (packages %base-packages)
  (services (cons (service virtuoso-service-type
                           (virtuoso-configuration
                            (server-port 8891)
                            (http-server-port 8892)))
                  %base-services)))

You can write the above configuration to a file, say virtuoso-os.scm, build a container with it, and run it with the command below. Everything inside the container is ephemeral and vanishes when the container is stopped. In order to persist the database, we mount a host directory /tmp/virtuoso-state at /var/lib/virtuoso in the container. /var/lib/virtuoso is the default state directory used by the Guix virtuoso service.

sudo $(guix system container --network --share=/tmp/virtuoso-state=/var/lib/virtuoso virtuoso-os.scm)

When running the above command, you will be given the container's PID. Should you want to inspect the container, you can run:

sudo nsenter -at PID /run/current-system/profile/bin/bash

If you have only one shepherd process running on your system, you may use the following quick hack to get the PID.

sudo nsenter -at $(pgrep shepherd) /run/current-system/profile/bin/bash

Note that, in this setup, the Conductor web interface is not supported by the Guix service that is part of guix-bioinformatics. Conductor is not required for using virtuoso as a SPARQL server and only adds confusion.

Running virtuoso by invoking it on the command line

You may also choose to run virtuoso the traditional way by invoking it on the command line. Managing long-running instances started from the command line is messy. So, this method works best for temporary instances.

First, we create a new directory for virtuoso and change into it. We will run virtuoso from this directory, and virtuoso will store all its state in this directory.

mkdir virtuoso
cd virtuoso

Then, we create a configuration file, virtuoso.ini. A basic configuration need only specify the ports to listen on. Here we specify port 8891 for the virtuoso server and port 8892 for the HTTP server that includes the SPARQL endpoint.

[Parameters]
ServerPort = localhost:8891

[HTTPServer]
ServerPort = localhost:8892

Finally, we start virtuoso.

virtuoso-t +foreground +configfile virtuoso.ini
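
The steps above can be collected into a small shell helper. This is only a sketch: write_virtuoso_ini is a hypothetical function name, not something shipped with virtuoso, and the default ports are simply the ones used in this document.

```shell
# write_virtuoso_ini DIR [SERVER_PORT] [HTTP_PORT]
# Hypothetical helper: create DIR if needed and write a minimal
# virtuoso.ini into it. Ports default to the ones used in this document.
write_virtuoso_ini() {
    dir=$1
    server_port=${2:-8891}
    http_port=${3:-8892}
    mkdir -p "$dir"
    cat > "$dir/virtuoso.ini" <<EOF
[Parameters]
ServerPort = localhost:$server_port

[HTTPServer]
ServerPort = localhost:$http_port
EOF
}

# Usage (the second line starts virtuoso in the foreground):
#   write_virtuoso_ini virtuoso 8891 8892
#   cd virtuoso && virtuoso-t +foreground +configfile virtuoso.ini
```

Keeping the configuration in a function makes it easy to spin up several temporary instances on different ports, each with its own state directory.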

Detailed documentation of the virtuoso configuration file format is at

In particular, consider setting NumberOfBuffers and MaxDirtyBuffers as described at

For a working configuration file, you can also look at /export/virtuoso/var/lib/virtuoso/db/virtuoso.ini on penguin2.

Running SPARQL Queries using isql

The straightforward way to run SPARQL queries is through the web interface:

To use a command-line tool instead, run isql:

guix shell virtuoso-ose -- isql <server-port>

Queries within isql look like:

SQL> SPARQL SELECT * WHERE {?s ?p ?o};
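
For scripting, it can be handy to run a single query without entering the interactive prompt. The sketch below assumes that isql is on PATH (for example inside `guix shell virtuoso-ose`), that the user is `dba`, and that isql accepts a statement through its exec= argument. run_sparql is a hypothetical helper name.

```shell
# run_sparql PORT PASSWORD QUERY
# Hypothetical helper: run one SPARQL query through isql non-interactively.
# Assumes the dba user and that isql is available on PATH.
run_sparql() {
    port=$1
    password=$2
    query=$3
    # Inside isql, SPARQL statements are prefixed with the SPARQL keyword
    # and terminated with a semicolon, as in the interactive example.
    isql "$port" dba "$password" exec="SPARQL $query;"
}

# Usage:
#   run_sparql 8891 dba 'SELECT * WHERE {?s ?p ?o} LIMIT 5'
```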

Set passwords for virtuoso users

After running virtuoso, you will want to change the default password of the `dba` user. The default password of the `dba` user is `dba`. You can change passwords using the isql command-line client. See
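
As a rough sketch, the password change can be scripted by piping one SQL statement into isql. This assumes virtuoso's USER_SET_PASSWORD procedure and a server port of 8891; please check it against the virtuoso documentation before relying on it. set_password_sql is a hypothetical helper name.

```shell
# set_password_sql USER NEW_PASSWORD
# Hypothetical helper: print the SQL statement that sets USER's password.
# Assumption: virtuoso's DB.DBA.USER_SET_PASSWORD procedure.
# Passwords containing a single quote are not handled by this sketch.
set_password_sql() {
    user=$1
    new_password=$2
    printf "DB.DBA.USER_SET_PASSWORD('%s', '%s');\n" "$user" "$new_password"
}

# Usage, logging in with the default dba password:
#   set_password_sql dba 'new-password' | isql 8891 dba dba
```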

Loading data into virtuoso

Virtuoso supports at least three different ways to load RDF.

Bulk loading using the isql command-line client

Bulk loading using the isql command-line client is usually the fastest. However, the server reads the files from its own file system, so this method requires correct handling of file system permissions and cannot be used with remote servers.
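
A minimal sketch of a bulk load follows. It assumes virtuoso's bulk loader procedures ld_dir and rdf_loader_run, that the data directory is readable by the virtuoso process, and that the directory is listed under DirsAllowed in virtuoso.ini. bulk_load_sql is a hypothetical helper name, and the paths in the usage line are placeholders.

```shell
# bulk_load_sql DIR PATTERN GRAPH
# Hypothetical helper: print the SQL statements for a bulk load of all files
# in DIR matching PATTERN into the graph named GRAPH.
bulk_load_sql() {
    dir=$1
    pattern=$2
    graph=$3
    cat <<EOF
ld_dir('$dir', '$pattern', '$graph');
rdf_loader_run();
checkpoint;
EOF
}

# Usage, assuming virtuoso listens on port 8891:
#   bulk_load_sql /path/to/data '*.ttl' http://example.org | isql 8891 dba password
```

The final checkpoint asks virtuoso to write the loaded data to disk, so that it survives a restart.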

SPARQL 1.1 Update

The standard SPARQL protocol also allows RDF to be updated.
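
As an illustration, an update can be sent with cURL as a form-encoded POST, which is how the SPARQL 1.1 Protocol transports updates. The sketch below makes two assumptions that are not confirmed by this page: that the authenticated endpoint is available at /sparql-auth with digest authentication, and that the server accepts the standard update parameter. sparql_update is a hypothetical helper name.

```shell
# sparql_update PORT USER:PASSWORD UPDATE
# Hypothetical helper: POST a SPARQL 1.1 Update request.
# Assumptions: endpoint path /sparql-auth, digest authentication,
# and support for the standard "update" form parameter.
sparql_update() {
    port=$1
    credentials=$2
    update=$3
    # --data-urlencode makes curl send a form-encoded POST request;
    # --fail makes curl exit non-zero on HTTP error responses.
    curl --fail --digest -u "$credentials" \
         --data-urlencode "update=$update" \
         "http://localhost:$port/sparql-auth"
}

# Usage:
#   sparql_update 8892 dba:password \
#     'INSERT DATA { GRAPH <http://example.org> { <http://example.org/s> <http://example.org/p> "o" } }'
```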

SPARQL 1.1 Graph Store HTTP Protocol

For ease of implementation, SPARQL 1.1 also specifies an additional REST-like API to update data.

The virtuoso documentation shows examples of using this protocol with cURL.

We recap the same here.

When uploading data, the virtuoso server often does not report errors properly. It simply freezes up. So, it is very helpful to validate your RDF before uploading. For this, use rapper from the raptor2 package. To validate data.ttl, a turtle file, run

rapper --input turtle --count data.ttl

Then, upload it to a virtuoso SPARQL endpoint running at port 8892

curl -v -X PUT --digest -u dba:password -T data.ttl -G http://localhost:8892/sparql-graph-crud-auth --data-urlencode graph=http://example.org

where http://example.org is the name of the graph.
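
The two steps can be chained so that a file that fails validation never reaches the server. This is a sketch that reuses the rapper and curl invocations shown above and assumes rapper exits with a non-zero status on parse errors. validate_and_upload is a hypothetical helper name.

```shell
# validate_and_upload FILE GRAPH PORT USER:PASSWORD
# Hypothetical helper: validate a turtle file with rapper and upload it with
# the Graph Store HTTP Protocol only if validation succeeds.
validate_and_upload() {
    file=$1
    graph=$2
    port=$3
    credentials=$4
    # Stop here if rapper reports a problem with the file.
    rapper --input turtle --count "$file" || return 1
    # --fail makes curl exit non-zero on HTTP error responses.
    curl --fail -X PUT --digest -u "$credentials" -T "$file" \
         -G "http://localhost:$port/sparql-graph-crud-auth" \
         --data-urlencode "graph=$graph"
}

# Usage:
#   validate_and_upload data.ttl http://example.org 8892 dba:password
```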

The PUT method deletes the existing data in the graph before loading the new one. So, there is no need to manually delete old data before loading new data. However, virtuoso is slow at deleting millions of triples, resulting in an apparent freeze-up. So, it is preferable to handle such deletes manually using a lower-level SQL statement issued via the isql client.

$ isql <server-port>
SQL> DELETE FROM rdf_quad WHERE g = iri_to_id('http://example.org');

When virtuoso has just been started up with a clean state (that is, the virtuoso state directory was empty before virtuoso started), uploading large amounts of data using the SPARQL 1.1 Graph Store HTTP Protocol fails the first time. It succeeds only the second time. It is not clear why. I can only recommend retrying as in this commit:
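
Independently of that commit, a generic retry wrapper could look like the sketch below. retry is a hypothetical helper name, and the one-second pause between attempts is an arbitrary choice.

```shell
# retry MAX COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, at most MAX times.
# Returns 0 on the first success and 1 once MAX attempts have failed.
retry() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
}

# Usage (curl needs --fail so that HTTP errors count as failures):
#   retry 2 curl --fail -X PUT --digest -u dba:password -T data.ttl \
#     -G http://localhost:8892/sparql-graph-crud-auth \
#     --data-urlencode graph=http://example.org
```

Without --fail, curl exits with status 0 even when the server answers with an HTTP error, so the wrapper would never retry.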

