Making the implementation of a recsystem much simpler
We have to do the groundwork for the competitors (in French we would say "mâcher le travail").
You may think this is too simplistic; in my opinion it is not, it is just an easy way to do recommendations. Later it can be made much smarter (in terms of latency, computational efficiency, ...), for example by properly using streams and API calls as you proposed.
The idea is to provide students (competitors) with a single instance of something that does everything.
Here is a fake Python script that does recommendations:

```python
import random

from renewal.recsystools import RecsysTools

rectools = RecsysTools(token="XXX")  # Init of the RecsysTools instance

while True:
    # Wait for bucket consumption from assigned users
    users = rectools.wait_for_not_served_users(min_pending_reclists=1)
    for user in users:
        candidates = rectools.get_candidates(user)  # Get data using the API behind
        history = rectools.get_history(user)
        reclist = random.sample(candidates, 30)  # Competitor algorithm (random)
        rectools.put_reclist(user, reclist)
```
Init of the RecsysTools instance
Here we do everything to initialize the streams and the API connection, plus load the assigned users, cache the candidate news, and so on. These data are kept up to date through the streaming queues.
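To make this concrete, here is a minimal sketch of what the init could look like. Everything here is an assumption (the class internals, the `_load_assigned_users` helper, and the user names are made up for illustration), not the actual implementation:

```python
from collections import deque


class RecsysTools:
    """Hypothetical sketch of the all-in-one client (internals are assumptions).

    On init it would authenticate against the API with the token, load the
    list of assigned users, and prepare per-user state; a background thread
    consuming the streaming queues would keep the candidate cache fresh.
    """

    def __init__(self, token):
        self.token = token
        # Assumed state: assigned users, a per-user candidate cache,
        # and a per-user queue of pending (not yet consumed) reclists.
        self.assigned_users = self._load_assigned_users()
        self.candidate_cache = {user: [] for user in self.assigned_users}
        self.pending_reclists = {user: deque() for user in self.assigned_users}
        # A background consumer of the streaming queue would be started here,
        # e.g. with threading.Thread(target=..., daemon=True).start()

    def _load_assigned_users(self):
        # Placeholder: in reality this would call the competition API.
        return ["user_a", "user_b"]
```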
We wait for bucket consumption from assigned users
Behind wait_for_not_served_users, there is a reclist queue for each assigned user.
I originally specified that this queue must be on the server side (it was a big debate last December), but for now it can only be on the client side; we'll see about that later.
So, for example, if the min_pending_reclists parameter is set to 2, each queue is kept topped up to a length of 2 by adding new reclists.
Here min_pending_reclists is set to 1, so as soon as a user has no buckets left, we add a reclist directly.
The more reclists we push into the queues, the more quickly the competitor's recsys can answer rec requests...
To better understand this mechanism, take the case where a user refreshes their home page 3 times: if min_pending_reclists is set to 1, the competitor's recsys won't be able to answer all these requests, because there are not enough reclists in the local queue...
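The threshold logic described above can be sketched in a few lines. This is only an illustration of the semantics (the `users_needing_reclists` helper and the user names are assumptions, not part of the proposed API):

```python
from collections import deque


def users_needing_reclists(pending, min_pending_reclists):
    """Return users whose pending-reclist queue fell below the threshold.

    `pending` maps each user to a deque of not-yet-consumed reclists;
    this is an assumed representation of the client-side queue.
    """
    return [u for u, q in pending.items() if len(q) < min_pending_reclists]


pending = {
    "alice": deque([["news1"] * 30]),  # one reclist still buffered
    "bob": deque(),                    # queue drained by page refreshes
}

# With a threshold of 1, only bob needs a new reclist:
assert users_needing_reclists(pending, 1) == ["bob"]
# With a threshold of 2, even alice's single buffered reclist is not enough:
assert users_needing_reclists(pending, 2) == ["alice", "bob"]
```

This shows the latency trade-off: a higher min_pending_reclists means more pre-computed reclists buffered per user, so repeated refreshes can be served immediately.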
We get data using the API behind it
It will be easy to get data quickly thanks to a caching mechanism behind it...
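As one possible shape for that caching mechanism, here is a minimal TTL cache sketch. It is an assumption, not the actual design: entries are re-fetched from the API when they are older than the TTL (a streaming update could also invalidate them):

```python
import time


class CandidateCache:
    """Minimal TTL cache sketch for candidate news (illustrative only)."""

    def __init__(self, fetch, ttl_seconds=60.0):
        self.fetch = fetch          # callable(user) -> list of candidates
        self.ttl = ttl_seconds
        self._store = {}            # user -> (timestamp, candidates)

    def get_candidates(self, user):
        now = time.monotonic()
        entry = self._store.get(user)
        if entry is None or now - entry[0] > self.ttl:
            # Stale or missing: fetch from the API and refresh the entry.
            self._store[user] = (now, self.fetch(user))
        return self._store[user][1]


calls = []

def fake_fetch(user):
    calls.append(user)  # record each simulated API call
    return [f"{user}-news-{i}" for i in range(3)]

cache = CandidateCache(fake_fetch, ttl_seconds=60.0)
cache.get_candidates("alice")
cache.get_candidates("alice")  # served from cache, no second API call
assert calls == ["alice"]
```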
Conclusion
There are a lot of things to "challenge" in this proposal, and I have a lot to say about the mechanisms behind it (in terms of caching, ...), so this issue will let us discuss it; but don't hesitate to talk with me on Discord / BBB.
As a note: I don't have as much knowledge as you do about websockets and asyncio, so you'll tell me whether it's possible to do this... I think so, but I don't know if it's easy.
Anyway, this point is crucial for the CentraleSupelec competition.