Database locked error preventing startup [time sensitive]

Hi,

I am in the process of deploying Kolibri on a RACHEL PI at a school in Iquitos, Peru. We have a technical person on site for another week so I would appreciate any help you can provide in addressing the issue below as quickly as possible. Thank you.

The RACHEL PI is running on a RPi 3B+. Kolibri 0.12.5 was installed on the Pi using apt. Kolibri works about half the time. The other half the time Kolibri starts, but fails to start accepting connections. That is, the server process is running but any attempt to connect to the server using http will fail. When Kolibri fails to start, there is a python exception in daemon.log that indicates that an operation failed because the database was locked. The daemon.txt file is attached below for your reference. There does not seem to be any reliable way to work around this error other than rebooting and hoping for the best. I have had to reboot the server as many as 4 or 5 times to get it working.

The startup for Kolibri is quite taxing on the RPi 3B+. When it works, it takes 3 minutes for the vacuum to complete and 7 minutes before the server is up and running. During the 7 minutes it takes to start, the RPi is very slow and seems unresponsive at times. I suspect this may be related to the fact that we have large Kolibri installation with 28 modules loaded including both the English and Spanish versions of Khan Academy.

I have started to read through the code a bit but I am at a disadvantage as I am not familiar with Django or sqlite. My guess as to the problem is that the database Vacuum thread is running in parallel with the update_channel_metadata thread. Normally the vacuum thread completes before the update_channel_metadata thread needs to update the database. Other times, the vacuum thread is still running when update_channel_metadata needs to write to the database and results in a database locked condition. Note, from my understanding, sqlite is not designed to allow updates in multiple threads. In the attached log file, notice that the database locked message appears before the vacuum has completed.

I would very much appreciate if you could provide a code patch or a workaround that I could deploy next week to improve the reliability of the startup process. Thank you so much for your help.

  • Rich

PS: I would recommend adding timestamps to the log messages in daemon.txt.

daemon.txt
---------------
INFO     Attempting a ping.
INFO     Started new JOBCHECKER thread ID#60ff8aefc0844637948a04725a8ce2db
INFO     Updating num_coach_content on existing channels
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/kolibri", line 9, in <module>
    load_entry_point('kolibri==0.12.5', 'console_scripts', 'kolibri')()
  File "/usr/lib/python3/dist-packages/kolibri/utils/cli.py", line 716, in main
    start(port, daemon=daemon)
  File "/usr/lib/python3/dist-packages/kolibri/utils/cli.py", line 338, in start
    server.start(port=port, run_cherrypy=run_cherrypy)
  File "/usr/lib/python3/dist-packages/kolibri/utils/server.py", line 113, in start
    update_channel_metadata()
  File "/usr/lib/python3/dist-packages/kolibri/core/content/utils/annotation.py", line 73, in update_channel_metadata
    update_num_coach_contents()
  File "/usr/lib/python3/dist-packages/kolibri/core/content/utils/annotation.py", line 167, in update_num_coach_contents
    .values(num_coach_contents=cast(ContentNodeTable.c.coach_content, Integer()))
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked [SQL: 'UPDATE content_contentnode SET num_coach_contents=CAST(content_contentnode.coach_content AS INTEGER) WHERE content_contentnode.kind != ?'] [parameters: ('topic',)] (Background on this error at: http://sqlalche.me/e/e3q8)
INFO     Sqlite database Vacuum finished.
INFO     Next Vacuum at 3AM local server time (in 964.65 minutes).
INFO     Ping succeeded! (response: {'messages': [], 'id': 297913})
INFO     Sleeping for 1440.0 minutes.

Hi Rich,

Sorry to hear you are experiencing difficulties.

I fear you may be right about the content metadata process and the vacuum racing each other, this is not ideal. I am planning to prevent the content DB step from happening in the future, and require a manual scan when content is copied in, which would prevent this happening at startup repeatedly.

One possible workaround for now is to try to perform the vacuum step outside of the startup cycle. You can do this like this:

  • Run kolibri manage vacuumsqlite
  • Wait until it gives the message Next Vacuum at 3AM local server time
  • Then press ctrl-c to break the loop

The vacuum should now have been performed, so the next time Kolibri starts up, the vacuum step should be almost instantaneous.

Let me know if this helps at all, if not, I’ll see if I can come up with some other advice to help.

Kind Regards,
Richard

Thank you for your quick response.

The RACHEL Pi’s are going to be used by teachers with little computer experience. I cannot ask them to run anything manually. The school also does not have internet so that means they will not be updating the content themselves. Do you think it would be safe to comment out line 113 in kolibri/utils/server.py?

server.py:113:  # update_channel_metadata()

If that works, then great. It may also speedup the startup process which is a plus. If not, does it make sense to put the content update into a protected block?

server.py:112: with vacuum_db_lock:
server.py:113:    update_channel_metadata()

– Rich

If you can manually edit the source, commenting out update_channel_metadata should be fine. Also, this is only an issue if content is copied in outside of the Kolibri interface (either manually, or using the RACHEL module system).

That seems like the quickest fix for you, so I would recommend this.

Perfect. I will generate a patch file and have my onsite tech install it. Thanks!!

Also, I see that the startup code is running the vacuum twice. It is run directly (outside the vacuum thread) on kolibri/utils/cli.py:667 when “kolibri start” is first called. Later, kolibri/utils/server.py:80 starts the vacuum thread. The first thing the vacuum thread does is run the vacuum a second time. Is it safe to comment out line 667 of kolibri/utils/cli.py so that vacuum is only performed once?

cli.py:666:        # Defragment the db
cli.py:667:        # call_command("vacuumsqlite")

The vacuum thread takes two minutes to complete on the RPi so the time savings can be meaningful.

– Rich

Yeah, that should also be fine! Thanks for the patience here, hopefully we can do a spring clean on these start up tasks.