
Virtuoso: Shutdown Clears Data

Tags

  • type: bug
  • assigned: fredm
  • priority: critical
  • status: closed, completed
  • interested: bonfacem, pjotrp, zsloan
  • keywords: production, container, tux04, virtuoso

Description

It seems that virtuoso has the bad habit of clearing data whenever it is stopped/restarted.

This issue will track the work necessary to get the service behaving correctly.

According to the documentation on

The bulk loader also disables checkpointing and the scheduler, which also need to be re-enabled post bulk load

That needs to be handled.
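One way this could be handled is to issue the re-enabling statements through `isql` once the bulk load has finished. The following is only a sketch, not the deployed fix: the helper name `reenable_checkpointing` is hypothetical, the interval values (60 and 10) are assumed defaults that may need tuning, and the credentials are placeholders. It also assumes the client binary is installed as `isql`; some distributions ship it under a different name.

```shell
# Hypothetical helper: re-enable checkpointing and the scheduler after a bulk load.
# Arguments: <host:port> <user> <password>
reenable_checkpointing() {
    # Run a checkpoint first, then restore periodic checkpoints and the scheduler.
    # The interval values below are assumptions, not values taken from our service.
    isql "$1" "$2" "$3" \
        exec="checkpoint; checkpoint_interval(60); scheduler_interval(10);"
}

# Example call (placeholder credentials; the port is the SQL ServerPort of the instance):
# reenable_checkpointing localhost:8981 dba "$VIRTUOSO_PASSWORD"
```

Wrapping the call in a function keeps the password out of the script body and makes it straightforward to run as the last step of a load job.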

Notes

After having a look at

it occurs to me that the reason virtuoso appears to clear the data is that the `DatabaseFile` value is not set, so the server ends up with a new database file every time it is restarted (see also the `Striping` setting).
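To check this hypothesis against a given configuration file, a small helper like the one below can report whether `DatabaseFile` is set. This is a minimal sketch: the function name `has_database_file` is hypothetical, and it assumes the setting lives in a `[Database]` section written with exactly that capitalisation.

```shell
# Hypothetical helper: succeed if the ini file sets DatabaseFile under [Database].
# Argument: <path to virtuoso.ini>
has_database_file() {
    awk '
        # On every section header, remember whether we are entering [Database].
        /^[ \t]*\[/ { in_db = ($0 ~ /^[ \t]*\[Database\][ \t]*$/); next }
        # Only a DatabaseFile key inside [Database] counts.
        in_db && /^[ \t]*DatabaseFile[ \t]*=/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example:
# has_database_file /path/to/virtuoso.ini || echo "DatabaseFile is not set"
```

A configuration that contains only `[Parameters]` and `[HTTPServer]` sections would be reported as not setting the value.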

Troubleshooting

Reproduce locally:

We begin by looking at the settings for the remote virtuoso:

$ ssh tux04
fredm@tux04:~$ cat /gnu/store/bg6i4x96nm32gjp4qhphqmxqc5vggk3h-virtuoso.ini
[Parameters]
ServerPort = localhost:8981
DirsAllowed = /var/lib/data
NumberOfBuffers = 4000000
MaxDirtyBuffers = 3000000
[HTTPServer]
ServerPort = localhost:8982

Copy these into a file locally, and adjust `NumberOfBuffers` and `MaxDirtyBuffers` for a smaller local dev environment. Also update `DirsAllowed`.

We end up with our local configuration in `~/tmp/virtuoso/etc/virtuoso.ini` with the content:

[Parameters]
ServerPort = localhost:8981
DirsAllowed = /var/lib/data
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000
[HTTPServer]
ServerPort = localhost:8982

Run virtuoso!

$ cd ~/tmp/virtuoso/var/lib/virtuoso/
$ ls
$ ~/opt/virtuoso/bin/virtuoso-t +foreground +configfile ~/tmp/virtuoso/etc/virtuoso.ini

Here we start by changing into the `~/tmp/virtuoso/var/lib/virtuoso/` directory, which is where virtuoso will put its state. Now, in a different terminal, list the files created in the state directory:

$ ls ~/tmp/virtuoso/var/lib/virtuoso
virtuoso.db  virtuoso.lck  virtuoso.log  virtuoso.pxa  virtuoso.tdb  virtuoso.trx

That creates the database file (and other files) with the documented default values, i.e. `virtuoso.*`.

We cannot quite reproduce the issue locally, since every restart reuses exactly the same file names locally.

Checking the state directory for virtuoso on tux04, however:

fredm@tux04:~$ sudo ls -al /export2/guix-containers/genenetwork/var/lib/virtuoso/ | grep '\.db$'
-rw-r--r-- 1  986  980 3787456512 Oct 28 14:16 js1b7qjpimdhfj870kg5b2dml640hryx-virtuoso.db
-rw-r--r-- 1  986  980 4152360960 Oct 28 17:11 rf8v0c6m6kn5yhf00zlrklhp5lmgpr4x-virtuoso.db

We see that there are multiple db files, each created when virtuoso was restarted. There is an extra (possibly) random string prepended to the `virtuoso.db` part. This happens for our service if we do not explicitly provide the `DatabaseFile` setting.
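The same symptom can be detected without reading the listing by eye. The sketch below counts the `.db` files in a state directory; the helper name `count_db_files` is hypothetical, and it assumes a `find` that supports `-maxdepth` (GNU and BSD `find` both do).

```shell
# Hypothetical helper: count the *.db files directly inside a state directory.
# Argument: <state directory>
count_db_files() {
    find "$1" -maxdepth 1 -type f -name '*.db' | wc -l | tr -d '[:space:]'
}

# Example: warn when more than one database file is present.
# [ "$(count_db_files ~/tmp/virtuoso/var/lib/virtuoso)" -gt 1 ] && echo "multiple database files found"
```

A count greater than one suggests that the server has started with a different database file name at least once.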

Fixes
