Add special keywords for atomic updates of state dict values
The new hook-based design for implementing recsystems (!6 (merged)) gives contestants a simple scratch space called the "state" where they can store all runtime state information for their recsystem. It is a very simple data store that is passed around the recsystem each time hooks are called.
(Extensive use of this state store is optional; more sophisticated contestants can supersede it with their own storage backend, e.g. a database.)
Each hook function can return a "state update" which specifies how the state should be updated at the end of the hook function. The new state will be passed to future hook function calls.
Currently the state update is very simple: It is just a dict that is merged into the previous version of the state. It does not provide any guarantees of atomicity or consistency. For example, two hook functions can be called one after the other and provided the same value of the state, but the final version of the state will depend on which hook function completes last. Some hooks may clobber each other's state updates. This is by design; it is an intentionally poor design in order to keep things simple (it is not intended to be a robust storage solution). It also does not use any form of locking so that hook functions can run concurrently without blocking each other.
However, there are some likely common cases where this design is perhaps too flawed. The most common case I can think of is for values that are accumulated.
For example, say a recsystem (like keywords) counts the number of times a user has clicked on an article containing some keyword:
{
  "user_keywords": {
    "some_user_id": {
      "keyword1": 2,
      "keyword3": 4
    }
  }
}
Now say that the user quickly clicks on two articles one after the other. Each article_interaction hook receives the above state. If each article contains "keyword3", each hook produces a state update like:
{
  "user_keywords": {
    "some_user_id": {
      "keyword3": 5
    }
  }
}
This is obviously insufficient since the count for "keyword3" should be 6.
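The lost update can be reproduced with a minimal sketch. The names merge_state and article_interaction are hypothetical, and the recursive merge is an assumption about how the state update is merged (the example above only makes sense if nested dicts are merged rather than replaced); the real implementation may differ.

```python
import copy


def merge_state(state, update):
    """Recursively merge ``update`` into a copy of ``state`` (assumed semantics)."""
    merged = copy.deepcopy(state)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_state(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def article_interaction(state, user_id, keywords):
    """Hypothetical hook: returns a state update holding absolute counts."""
    counts = state.get("user_keywords", {}).get(user_id, {})
    new_counts = {kw: counts.get(kw, 0) + 1 for kw in keywords}
    return {"user_keywords": {user_id: new_counts}}


state = {"user_keywords": {"some_user_id": {"keyword1": 2, "keyword3": 4}}}

# Both hooks are handed the same snapshot of the state.
update_a = article_interaction(state, "some_user_id", ["keyword3"])
update_b = article_interaction(state, "some_user_id", ["keyword3"])

# The updates are merged one after the other; the second overwrites the first.
state = merge_state(state, update_a)
state = merge_state(state, update_b)

print(state["user_keywords"]["some_user_id"]["keyword3"])  # 5, one click is lost
```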
This can be mitigated by providing a small library of MongoDB or Redis-like update commands as "values" in the state update. I don't want to go wild here and completely implement a new database; again contestants with more complicated needs can use their own database. But it would be easy to support a few things.
For starters I would suggest a few commands with Python-inspired naming. All commands start with a "$" sign.
Each command is provided as the value associated with a given key, but instead of setting the value of that key to the command, it executes the command on the previous value of the key. Commands are specified as a dict containing a single key/value pair: the name of the command and the argument to the command (currently only one for each of the commands I will propose).
For example:
{
  "user_keywords": {
    "some_user_id": {
      "keyword3": {"$int.add": 1}
    }
  }
}
Means that user_keywords.some_user_id.keyword3 is incremented by 1. This allows multiple hook function calls to increment or decrement that value without losing information.
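As a rough sketch of how such an update might be applied, the hypothetical apply_update function below walks the update and treats any single-key dict whose key is "$int.add" as a command rather than a value. Only this one command is recognised here, and the in-place mutation of the state is an assumption made for brevity.

```python
def apply_update(state, update):
    """Apply ``update`` to ``state`` in place, recognising only ``$int.add``."""
    for key, value in update.items():
        if isinstance(value, dict) and set(value) == {"$int.add"}:
            # Command: act on the previous value (zero if the key is absent).
            state[key] = state.get(key, 0) + value["$int.add"]
        elif isinstance(value, dict):
            # Plain nested dict: descend, creating the level if necessary.
            if not isinstance(state.get(key), dict):
                state[key] = {}
            apply_update(state[key], value)
        else:
            # Plain value: overwrite, as in the current merge behaviour.
            state[key] = value
    return state


state = {"user_keywords": {"some_user_id": {"keyword1": 2, "keyword3": 4}}}
update = {"user_keywords": {"some_user_id": {"keyword3": {"$int.add": 1}}}}

# Two hooks that saw the same state each return the same relative update.
apply_update(state, update)
apply_update(state, update)

print(state["user_keywords"]["some_user_id"]["keyword3"])  # 6
```

Because each update says "add 1" instead of "set to 5", the order in which the two hooks finish no longer matters.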
The complete list of commands I propose to start with is:
- $int.add: increment an integer (or float) by some integer value; the value must be an integer. If the value does not already exist it defaults to zero.
- $int.sub: same as $int.add but for subtraction; $int.add with a negative integer is equivalent to $int.sub with a positive integer
- $float.add: increment an integer (or float) by some floating point value, returning a new float
- $float.sub: equivalent of $int.sub for floats
- $list.append: append a value to the end of a list (if the list does not exist it is created)
- $list.remove: same semantics as Python's list.remove method
- $list.pop: same semantics as Python's list.pop method
- $set.add: add a value to a set (if the set does not exist it is created); as a special case if it is called on a list, a uniqueness check is performed and the value is appended if it is unique at the time (but sets should preferentially be used if this is the case)
- $set.remove: same semantics as Python's set.remove on a set; special case if it is called on a list, all values in the list matching the removed value are removed (the list is treated as though it's a set)
All of these are trivial to implement and should cover a large range of cases for basic recsystems.
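To give an idea of the size of the implementation, here is a minimal sketch of the command table. Everything in it is hypothetical: the MISSING sentinel, the helper names, and the COMMANDS/apply_command interface are made up for illustration. It also assumes that the argument to $list.pop is the index to pop, that the $float commands default to zero like $int.add, and that $set.remove on a list with no matching value is not an error; none of these are settled above.

```python
MISSING = object()  # sentinel for "key not present in the state"


def _number(old):
    """Existing numeric value, or zero when the key is absent."""
    if old is MISSING:
        return 0
    if isinstance(old, bool) or not isinstance(old, (int, float)):
        raise TypeError(f"expected a number, got {type(old).__name__}")
    return old


def _integer(arg):
    """The $int.* commands only accept integer arguments."""
    if isinstance(arg, bool) or not isinstance(arg, int):
        raise TypeError(f"expected an integer, got {type(arg).__name__}")
    return arg


def _existing(old, kind):
    """Existing value of the given type; anything else is a failure."""
    if not isinstance(old, kind):
        raise TypeError(f"expected an existing {kind.__name__}")
    return old


def _list_append(old, arg):
    return ([] if old is MISSING else _existing(old, list)) + [arg]


def _list_remove(old, arg):
    items = list(_existing(old, list))
    items.remove(arg)  # ValueError if arg is not in the list
    return items


def _list_pop(old, arg):
    items = list(_existing(old, list))
    items.pop(arg)  # assumption: the argument is the index to pop
    return items


def _set_add(old, arg):
    if old is MISSING:
        return {arg}
    if isinstance(old, list):  # special case: list treated as a set
        return old if arg in old else old + [arg]
    return _existing(old, set) | {arg}


def _set_remove(old, arg):
    if isinstance(old, list):  # special case: drop every matching item
        return [item for item in old if item != arg]
    items = set(_existing(old, set))
    items.remove(arg)  # KeyError if arg is not in the set
    return items


COMMANDS = {
    "$int.add": lambda old, arg: _number(old) + _integer(arg),
    "$int.sub": lambda old, arg: _number(old) - _integer(arg),
    "$float.add": lambda old, arg: float(_number(old) + arg),
    "$float.sub": lambda old, arg: float(_number(old) - arg),
    "$list.append": _list_append,
    "$list.remove": _list_remove,
    "$list.pop": _list_pop,
    "$set.add": _set_add,
    "$set.remove": _set_remove,
}


def apply_command(old, command):
    """Run a single-key command dict such as {"$int.add": 1} on ``old``."""
    (name, arg), = command.items()
    return COMMANDS[name](old, arg)


print(apply_command(4, {"$int.add": 1}))                    # 5
print(apply_command(MISSING, {"$list.append": "a"}))        # ['a']
print(apply_command(["a", "b", "a"], {"$set.remove": "a"}))  # ['b']
```

Each command returns a new value instead of mutating the old one, which keeps the functions easy to test in isolation; whether the real state store should copy or mutate is a separate decision.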
Some other notes:
- A command can fail for a number of reasons:
  - It is called on an existing value of the wrong type (e.g. $int.add on a list)
  - The necessary data is missing (e.g. $list.remove for a value that is not found in the list in the first place)
- If a single command in a state update fails, the rest of the state update still succeeds. For example, you can successfully increment an integer value while failing to remove a non-existent value from a list. However, all update command failures are logged as warnings.
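The failure behaviour described in these notes could look roughly like the following sketch. The logger name, the MISSING sentinel, the apply_update function and its failure-count return value are all hypothetical, and only two of the proposed commands are included to keep the example short.

```python
import logging

logger = logging.getLogger("state_update")  # hypothetical logger name
MISSING = object()  # sentinel for "key not present in the state"


def int_add(old, arg):
    old = 0 if old is MISSING else old
    if isinstance(old, bool) or not isinstance(old, (int, float)):
        raise TypeError(f"$int.add called on a {type(old).__name__}")
    return old + arg


def list_remove(old, arg):
    if not isinstance(old, list):
        raise TypeError("$list.remove called on a non-list value")
    items = list(old)
    items.remove(arg)  # raises ValueError when arg is not in the list
    return items


COMMANDS = {"$int.add": int_add, "$list.remove": list_remove}


def apply_update(state, update, path=""):
    """Apply ``update`` in place; return the number of commands that failed."""
    failures = 0
    for key, value in update.items():
        where = f"{path}.{key}" if path else key
        is_command = (
            isinstance(value, dict)
            and len(value) == 1
            and next(iter(value)) in COMMANDS
        )
        if is_command:
            (name, arg), = value.items()
            try:
                state[key] = COMMANDS[name](state.get(key, MISSING), arg)
            except (TypeError, ValueError, KeyError, IndexError) as exc:
                # A failed command is logged and skipped; the rest still runs.
                logger.warning("%s failed at %s: %s", name, where, exc)
                failures += 1
        elif isinstance(value, dict):
            if not isinstance(state.get(key), dict):
                state[key] = {}
            failures += apply_update(state[key], value, where)
        else:
            state[key] = value
    return failures


state = {"clicks": 4, "recent": ["a1", "a2"]}
update = {"clicks": {"$int.add": 1}, "recent": {"$list.remove": "a9"}}
failed = apply_update(state, update)

print(state)   # {'clicks': 5, 'recent': ['a1', 'a2']}
print(failed)  # 1
```

The increment succeeds even though the removal fails, and the failure shows up only as a logged warning.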
Any comments or critiques are very welcome.