Bug in handling article interactions
When a recsystem receives an article interaction event (a user clicked an article, was recommended one, etc.), it tries to make sure that the article is cached in its local collection of articles, so that recsystems can process the interaction without having to download the article manually.
Unfortunately this seems to fail sometimes, and you'll find tracebacks in the log like:
group3_1 | 2021-10-26 20:34:21 b9554570c37e group3.py[6] WARNING article 953131 not in my collection for some reason; fetching it from the backend
group3_1 | 2021-10-26 20:34:22 b9554570c37e group3.py[6] INFO successfully fetched article 953131 and added it to my collection
group3_1 | 2021-10-26 20:34:22 b9554570c37e root[6] ERROR 953131
group3_1 | File "/usr/src/app/renewal_recsystem/articles.py", line 156, in __getitem__
group3_1 | return self.articles[article_id]
group3_1 | KeyError: 953131
group3_1 |
group3_1 | During handling of the above exception, another exception occurred:
group3_1 |
group3_1 | Traceback (most recent call last):
group3_1 | File "/usr/local/lib/python3.7/site-packages/jsonrpcserver/dispatcher.py", line 123, in handle_exceptions
group3_1 | yield handler
group3_1 | File "/usr/local/lib/python3.7/site-packages/jsonrpcserver/async_dispatcher.py", line 40, in safe_call
group3_1 | lookup(methods, request.method), *request.args, **request.kwargs
group3_1 | File "/usr/local/lib/python3.7/site-packages/jsonrpcserver/async_dispatcher.py", line 34, in call
group3_1 | return await validate_args(method, *args, **kwargs)(*args, **kwargs)
group3_1 | File "/usr/src/app/renewal_recsystem/basic.py", line 201, in article_interaction
group3_1 | metrics = self.articles[article_id].setdefault('metrics',
group3_1 | File "/usr/src/app/renewal_recsystem/articles.py", line 158, in __getitem__
group3_1 | raise IndexError(article_id)
group3_1 | IndexError: 953131
group3_1 | 2021-10-26 20:34:22 b9554570c37e group3.py[6] INFO article_interaction(interaction={'user_id': 'bzQ4o2uudoYcs5Zs6cX1KSHoOkD2', 'article_id': 953131, 'timestamp': '2021-10-26T20:34:16.756951', 'recommended': True})
The logs show that the article was successfully fetched and added to the collection, but the recsystem then gets an IndexError on that article anyway when it tries to continue processing.
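I believe this is a problem with how ArticleCollection.push is implemented. ArticleCollection is a size-limited cache of articles, and by default it drops the articles with the lowest article_id. If the cache is full and the newly fetched article has a lower article_id than everything already cached, it would presumably be dropped again immediately after being added.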
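The suspected failure mode can be sketched in isolation. This is a hypothetical stand-in, not the actual ArticleCollection code: the class name LowestIdEvictingCache and its max_size parameter are made up for illustration, and the eviction rule is only assumed to match what push does.

```python
class LowestIdEvictingCache:
    """Hypothetical stand-in for ArticleCollection (not the real code)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.articles = {}

    def push(self, article):
        self.articles[article['article_id']] = article
        # Assumed eviction rule: drop the lowest article_id until the
        # cache is back within its size limit.
        while len(self.articles) > self.max_size:
            del self.articles[min(self.articles)]

    def __getitem__(self, article_id):
        try:
            return self.articles[article_id]
        except KeyError:
            # Mirrors the IndexError seen in the traceback.
            raise IndexError(article_id)


cache = LowestIdEvictingCache(max_size=2)
cache.push({'article_id': 953200})
cache.push({'article_id': 953201})

# An interaction arrives for an older article, which is fetched and pushed.
cache.push({'article_id': 953131})

# The push "succeeds", but the article was the lowest ID and so was
# evicted right away; looking it up raises IndexError.
older_article_cached = 953131 in cache.articles
```

If the real push behaves like this, the "successfully fetched" log line followed by an IndexError is exactly what one would expect to see.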
I think basing cache eviction on article_id is a mistake, because there's nothing stopping a recsystem from recommending slightly older articles, or users from interacting with older articles still displayed in their app. Really, nothing special should be assumed about article_id when deciding whether or not to keep an article cached.
Instead it should simply be an LRU cache. I'm not sure why I didn't write it that way in the first place, though it may be an artifact of older ideas for how article recommendation should work...
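A minimal sketch of what an LRU version might look like, built on collections.OrderedDict from the standard library. The name LRUArticleCache and the max_size parameter are assumptions for illustration; the real ArticleCollection has more to it than this, and this sketch lets a plain KeyError propagate on a miss rather than converting it to IndexError.

```python
from collections import OrderedDict


class LRUArticleCache:
    """Hypothetical LRU replacement for the article cache (a sketch only)."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError('max_size must be at least 1')
        self.max_size = max_size
        # Ordered from least recently used (front) to most recent (back).
        self.articles = OrderedDict()

    def push(self, article):
        article_id = article['article_id']
        self.articles[article_id] = article
        # A pushed article counts as the most recently used one.
        self.articles.move_to_end(article_id)
        while len(self.articles) > self.max_size:
            # Evict the least recently used entry, whatever its ID is.
            self.articles.popitem(last=False)

    def __getitem__(self, article_id):
        article = self.articles[article_id]  # KeyError if not cached
        # Reading an article also marks it as recently used.
        self.articles.move_to_end(article_id)
        return article


lru = LRUArticleCache(max_size=2)
lru.push({'article_id': 953200})
lru.push({'article_id': 953201})
lru.push({'article_id': 953131})  # older ID, but most recently used
```

With this policy the freshly pushed article 953131 stays cached and 953200, the least recently used entry, is the one that gets dropped.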