author Benedikt Peetz <benedikt.peetz@b-peetz.de> 2025-04-25 12:09:21 +0200
committer Benedikt Peetz <benedikt.peetz@b-peetz.de> 2025-04-25 12:09:21 +0200
commit 6acf4ab874c58ee14f35da671029e56972745ce6
tree bc6dfe4f3661332e8c1fc6ff4ca657185db488e4 /src/content/blog/distributed-hooks.md
parent fix(flake): Ensure that the `dead-trees` directory exists
feat(treewide): Migrate to zola
Diffstat (limited to 'src/content/blog/distributed-hooks.md')
-rw-r--r-- src/content/blog/distributed-hooks.md 105
1 files changed, 105 insertions, 0 deletions
diff --git a/src/content/blog/distributed-hooks.md b/src/content/blog/distributed-hooks.md
new file mode 100644
index 0000000..b32c743
--- /dev/null
+++ b/src/content/blog/distributed-hooks.md
@@ -0,0 +1,105 @@
++++
+title = "A collection of my thoughts regarding hooks in a fully distributed system"
+date = 2025-04-25
++++
+<!-- LTeX: language=en-GB -->
+
+## Hooks in a distributed system
+
+We assume that our distributed system (system for short) contains exactly
+one task set. This task set is synced across multiple replicas and,
+most importantly, no single replica owns it. As such, a replica needs to
+synchronize with every other replica before it can claim to hold
+the full task set.
+
+We simply assume that each of the replicas can synchronize itself
+perfectly.
+
+What if a user wanted to run a hook on every new input to this task set
+(i.e., on every new task)?
+
+Where would the hook execution happen?
+
+
+### 1. The naive way (e.g. Taskwarrior or git)
+
+You could simply run the hook in the client that adds/modifies the task to the task set.
+At first, this approach might look very promising:
+It is easy, gives the user direct feedback about the hook's return status (and thus allows the hook to be used as a filter) and, most importantly, is completely transparent to the user.
+
+Unfortunately, the problem with this approach is not fixable.
+Take, for example, the following setup:
+
+```
+ | Desktop |---------------------------- | Smartphone |
+      |                                         |
+      +--------------| Laptop |-----------------+
+```
+
+Assume that all of them have access to a client that can add tasks to the task set.
+Now assume that I have a hook that connects to a server and starts a timer there.
+
+If I were to start a task on my desktop, the hook would fire and tell the server to start time tracking.
+If I later stop the task on my laptop, the hook would fire again and tell the server to stop tracking time.
+This works flawlessly, as the server was already tracking time and as such can stop doing so.
+
+Let's imagine a different scenario:
+I start the task on my smartphone, which has a client that is not able to run hooks directly, as my smartphone lacks a full Linux system.
+Now, trying to stop the task on my laptop raises an error on the server, as time tracking for the task was never started there.
+
+In this case, we would need a way to track that the smartphone has not yet started the timer on the server,
+and that this should happen once the replica on the laptop gets hold of the task (i.e., once it has synchronized itself with the replica on the smartphone).
+
+### 2. Centralized approach (e.g. Git on servers)
+
+The “easy” way out is simply promoting one of the replicas to be our point of centralization.
+This replica could then run the hook for every new input it receives via synchronization.
+
+After a full synchronization with every other replica out there, we know that the hook was run exactly once for each task.
+
+The problem with this approach is quite apparent:
+We need to promote one of the replicas.
+This means that the hook can only be run _after_ this central replica has synchronized itself.
+As such, filter hooks that prevent certain tasks from being inserted into the task set are only run after the fact.
+This also makes it necessary that the other replicas wait for this central replica to advance before they advance themselves.
+
+This approach is quite similar to git's branches.
+Our central replica would be the main branch, and all the other replicas would then rebase themselves regularly on the main branch.
+
+### 3. Distributed Tracking
+
+Having now explained why both running the hook directly on the client and running it on a centralized replica have downsides, I would like to present my third idea.
+
+What if we combine these approaches?
+
+When a client adds a task to its replica, it marks the task with the hooks that it has already executed (in our example above, the server timer start/stop hook).
+If a replica synchronizes itself and receives a task for which hook execution has not yet been recorded, it executes the hooks in place of the original client.
+
+Have you noticed the problem with this approach?
+
+Exactly! Hooks now need to be idempotent and can possibly be executed at an arbitrary time _after_ the original task was added:
+
+My smartphone client marks the new task as not having run any hooks; as such, _both_ my desktop and my laptop will run the server time tracking hook.
+This will then fail on the second run, as the server cannot start time tracking for an already started task.
+Additionally, the second hook run could also happen _after_ the task was already stopped (marking it as started again)!
+
+As such, we have exchanged having no hook execution for hook execution that is prone to race conditions.
+
+
+### 4. Hook execution on every client
+
+Having seen that working around client-side hook execution does not really work (cf. approaches 2 and 3), we could also go the other way and give all clients the possibility to execute hooks.
+
+This would require two things:
+1. Hooks need to be synchronized between clients somehow (you cannot expect someone to manually sync a hook script with a mobile client).
+2. Hooks can no longer be arbitrary executable blobs. They need to be restricted to a subset of executables (e.g., to Lua/Python/WebAssembly).
+
+With these two requirements in place, a client could ship a Lua/Python/WebAssembly runtime and thus guarantee that it can execute all possible hooks.
+
+There is a big problem with this approach.
+It probably breaks most existing hooks, because they are either written in a language that is not included, such as POSIX shell (which cannot be included, because shell scripts usually hard-depend on external binaries, GNU `coreutils` being the most prevalent, and these dependencies cannot be introspected from the outside), or written in one of the included languages but reliant on external dependencies (many Python hooks, for example, execute `task` to perform further task operations).
+
+In general, this approach would probably require a sandbox of some sort for hooks, so that hook authors know that their hook will also work on other platforms.
+If we limit hooks to a subset of possible options, we should also enforce this limit on the platforms with more possibilities, so that hook authors can be confident that their hook actually works everywhere.
+
+