Summaries/Cloud/3 Recommendation Systems.md


2022-08-09 21:04:44 +02:00
---
title: 3 Recommendation Systems
updated: 2021-09-07 18:33:42Z
created: 2021-09-06 16:26:21Z
latitude: 52.09370000
longitude: 6.72510000
altitude: 0.0000
---
# Cloud SQL and Cloud Dataproc
- Cloud SQL: managed relational database
- Cloud Dataproc: managed environment on which you can run Apache Spark
### Why move from on-premises to cloud
- utilizing and tuning on-premises clusters is difficult
- it also means moving from dedicated (on-cluster) storage to off-cluster storage
A core aspect of a **recommendation system** is that you need to train and serve it at scale.
### What is managed?
................
### Recommendation Systems
The core pieces are:
- data
- the model
- infrastructure
to train and serve recommendations to users.
A core tenet of machine learning is to let the model learn for itself what
the relationship is between the data that you have, like user preferences (labeled data), and the data that you don't have. A history of good labeled data is important.
Machine learning scales much better because it doesn't require hard-coded rules. It's all automated. Learning from data in an automated way, that's what machine learning is.
A machine learning recommendation model is essentially asking two questions. First, "Who is this user like?" Second, "Is this an item that people tend to rate highly?" The predicted rating is a combination of both these factors.
All things considered, the predicted rating of an item for a particular user is the average of the ratings given by users similar to this user, calibrated by the quality of the item itself.
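
The combination of "users like this user" and "quality of the item" can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular product: the toy `ratings` data, the cosine similarity over shared items, and the blend weight `alpha` are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "d": 4},
    "cat": {"a": 1, "c": 5, "d": 2},
}

def similarity(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = ratings[u].keys() & ratings[v].keys()
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def item_quality(item):
    """Mean rating of the item across all users who rated it."""
    scores = [r[item] for r in ratings.values() if item in r]
    return sum(scores) / len(scores)

def predict(user, item, alpha=0.5):
    """Blend the similar-user average with the item's overall quality.

    alpha is an assumed blend weight: 1.0 trusts only similar users,
    0.0 trusts only the item's overall mean rating.
    """
    weighted, total = 0.0, 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        w = similarity(user, other)
        weighted += w * their[item]
        total += w
    quality = item_quality(item)
    if total == 0:
        return quality  # no similar users rated it: fall back to item quality
    return alpha * (weighted / total) + (1 - alpha) * quality

print(predict("ann", "d"))
```

Here `ann` has not rated item `d`, so the prediction leans on `bob` (very similar to `ann`) more than on `cat`, and is then pulled toward the item's overall mean.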
Updating the data can be done in **batch**, because the ratings of items don't change on, e.g., a daily basis. On the other hand, there is a lot of data that has to be updated in a **fault-tolerant way** that can scale to **large datasets** ==> Apache Hadoop.
When a user logs on, we want to show that user the recommendations that we precomputed specifically for them. So we need a transactional way to store the predictions, so that we can update the predictions table while the user is reading from it. E.g. 1 million users with 5 predictions each = 5 million rows. A MySQL database is sufficient.
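
A rough sketch of such a predictions table follows. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs without a database server; the notes call for MySQL (Cloud SQL), and the table name, column names, and helper functions here are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL / Cloud SQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE predictions (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        predicted_rating REAL NOT NULL,
        PRIMARY KEY (user_id, item_id)
    )
    """
)

def replace_predictions(conn, user_id, new_rows):
    """Swap one user's predictions for a freshly computed batch, atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM predictions WHERE user_id = ?", (user_id,))
        conn.executemany(
            "INSERT INTO predictions (user_id, item_id, predicted_rating) "
            "VALUES (?, ?, ?)",
            [(user_id, item_id, rating) for item_id, rating in new_rows],
        )

def top_predictions(conn, user_id, limit=5):
    """Read the precomputed recommendations shown when the user logs on."""
    rows = conn.execute(
        "SELECT item_id, predicted_rating FROM predictions "
        "WHERE user_id = ? ORDER BY predicted_rating DESC LIMIT ?",
        (user_id, limit),
    )
    return rows.fetchall()

# First batch run, then a later batch run that overwrites it.
replace_predictions(conn, 1, [(10, 4.5), (11, 3.9), (12, 4.8)])
replace_predictions(conn, 1, [(10, 4.1), (13, 4.9)])
print(top_predictions(conn, 1))
```

Wrapping the delete and the inserts in one transaction means a reader sees either the old set of predictions or the new one, never a half-updated mix.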