Hi,
I am in the process of deploying Kolibri on a RACHEL PI at a school in Iquitos, Peru. We have a technical person on site for another week so I would appreciate any help you can provide in addressing the issue below as quickly as possible. Thank you.
The RACHEL PI is running on a RPi 3B+. Kolibri 0.12.5 was installed on the Pi using apt. Kolibri works about half the time. The other half the time Kolibri starts, but fails to start accepting connections. That is, the server process is running but any attempt to connect to the server using http will fail. When Kolibri fails to start, there is a python exception in daemon.log that indicates that an operation failed because the database was locked. The daemon.txt file is attached below for your reference. There does not seem to be any reliable way to work around this error other than rebooting and hoping for the best. I have had to reboot the server as many as 4 or 5 times to get it working.
The startup for Kolibri is quite taxing on the RPi 3B+. When it works, it takes 3 minutes for the vacuum to complete and 7 minutes before the server is up and running. During the 7 minutes it takes to start, the RPi is very slow and seems unresponsive at times. I suspect this may be related to the fact that we have large Kolibri installation with 28 modules loaded including both the English and Spanish versions of Khan Academy.
I have started to read through the code a bit but I am at a disadvantage as I am not familiar with Django or sqlite. My guess as to the problem is that the database Vacuum thread is running in parallel with the update_channel_metadata thread. Normally the vacuum thread completes before the update_channel_metadata thread needs to update the database. Other times, the vacuum thread is still running when update_channel_metadata needs to write to the database and results in a database locked condition. Note, from my understanding, sqlite is not designed to allow updates in multiple threads. In the attached log file, notice that the database locked message appears before the vacuum has completed.
I would very much appreciate if you could provide a code patch or a workaround that I could deploy next week to improve the reliability of the startup process. Thank you so much for your help.
- Rich
PS: I would recommend adding timestamps to the log messages in daemon.txt.
daemon.txt
---------------
INFO     Attempting a ping.
INFO     Started new JOBCHECKER thread ID#60ff8aefc0844637948a04725a8ce2db
INFO     Updating num_coach_content on existing channels
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/bin/kolibri", line 9, in <module>
    load_entry_point('kolibri==0.12.5', 'console_scripts', 'kolibri')()
  File "/usr/lib/python3/dist-packages/kolibri/utils/cli.py", line 716, in main
    start(port, daemon=daemon)
  File "/usr/lib/python3/dist-packages/kolibri/utils/cli.py", line 338, in start
    server.start(port=port, run_cherrypy=run_cherrypy)
  File "/usr/lib/python3/dist-packages/kolibri/utils/server.py", line 113, in start
    update_channel_metadata()
  File "/usr/lib/python3/dist-packages/kolibri/core/content/utils/annotation.py", line 73, in update_channel_metadata
    update_num_coach_contents()
  File "/usr/lib/python3/dist-packages/kolibri/core/content/utils/annotation.py", line 167, in update_num_coach_contents
    .values(num_coach_contents=cast(ContentNodeTable.c.coach_content, Integer()))
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/kolibri/dist/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked [SQL: 'UPDATE content_contentnode SET num_coach_contents=CAST(content_contentnode.coach_content AS INTEGER) WHERE content_contentnode.kind != ?'] [parameters: ('topic',)] (Background on this error at: http://sqlalche.me/e/e3q8)
INFO     Sqlite database Vacuum finished.
INFO     Next Vacuum at 3AM local server time (in 964.65 minutes).
INFO     Ping succeeded! (response: {'messages': [], 'id': 297913})
INFO     Sleeping for 1440.0 minutes.