---
title: 3 Recommendation Systems
updated: 2021-09-07 18:33:42Z
created: 2021-09-06 16:26:21Z
latitude: 52.09370000
longitude: 6.72510000
altitude: 0.0000
---

# Cloud SQL and Cloud Dataproc

- Cloud SQL: managed relational database
- Cloud Dataproc: managed environment on which you can run Apache Spark

### Why move from on-premises to the cloud

- utilizing and tuning on-premises clusters is difficult
- migrating also means moving from dedicated storage to off-cluster storage

A core aspect of a **recommendation system** is that you need to train and serve it at scale.

### What is managed?

................

### Recommendation Systems

The core pieces needed to train and serve recommendations to users are:

- data
- the model
- infrastructure

A core tenet of machine learning is to let the model learn for itself what the relationship is between the data that you have, like user preferences (labeled data), and the data that you don't have. A history of good labeled data is important.

Machine learning scales much better than hand-written logic because it doesn't require hard-coded rules. The model learns from data in an automated way; that is what machine learning is.

A machine learning recommendation model essentially asks two questions. First, "Who is this user like?" Second, is this subjectively an item that people tend to rate highly? The predicted rating is a combination of both factors.
All things considered, the rating of an item for a particular user will be the average of the ratings given by users like this user, calibrated by the quality of the item itself.

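
The two factors above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `RATINGS` data, the inverse-distance `similarity` function, and the 50/50 `blend` weight are all hypothetical choices, not something the course material prescribes.

```python
# Toy ratings: user -> {item: rating on a 1-5 scale}. Hypothetical data.
RATINGS = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "d": 4},
    "cat": {"a": 1, "b": 2, "c": 5, "d": 2},
    "dan": {"a": 4, "b": 4},
}


def similarity(u, v, ratings=RATINGS):
    """'Who is this user like?' Assumed measure: 1 / (1 + mean absolute
    rating difference) over the items both users rated; 0.0 if none."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    mean_diff = sum(abs(ratings[u][i] - ratings[v][i]) for i in common) / len(common)
    return 1.0 / (1.0 + mean_diff)


def predict(user, item, ratings=RATINGS, blend=0.5):
    """Blend the similarity-weighted average rating of other users with the
    item's overall average rating (the 'quality of the item' factor).
    Returns None when nobody has rated the item."""
    raters = [v for v in ratings if v != user and item in ratings[v]]
    if not raters:
        return None
    item_avg = sum(ratings[v][item] for v in raters) / len(raters)
    weights = {v: similarity(user, v, ratings) for v in raters}
    total = sum(weights.values())
    if total == 0:
        return item_avg  # no comparable users: fall back to item quality
    neighbour_avg = sum(weights[v] * ratings[v][item] for v in raters) / total
    return blend * neighbour_avg + (1 - blend) * item_avg


prediction = predict("dan", "d")
```

Here `dan` rates more like `bob` than like `cat`, so the prediction for item `d` is pulled from the item's plain average towards `bob`'s rating.
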

Updating the data can be done in **batch**, because the ratings of the items don't change on, e.g., a daily basis. On the other hand, there is a lot of data, and it has to be updated in a **fault-tolerant way** that can scale to **large datasets** ==> Apache Hadoop.

When the user logs on, we want to show that user the recommendations that we precomputed specifically for them. So we need a transactional way to store the predictions, so that we can update the predictions table while the user is reading from it. E.g., 1 million users with 5 predictions each = 5 million rows. A MySQL database is sufficient.

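
A minimal sketch of such a transactional predictions table. Python's built-in `sqlite3` module stands in for MySQL here so the example is self-contained; the table layout and the helper names `replace_predictions` and `top_predictions` are assumptions made for illustration.

```python
import sqlite3

# Sizing from the notes: 1 million users x 5 predictions each.
USERS = 1_000_000
PREDICTIONS_PER_USER = 5
TOTAL_ROWS = USERS * PREDICTIONS_PER_USER  # 5 million rows

# In-memory SQLite database as a stand-in for a MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE predictions (
           user_id INTEGER,
           item_id INTEGER,
           predicted_rating REAL,
           PRIMARY KEY (user_id, item_id)
       )"""
)


def replace_predictions(conn, user_id, predictions):
    """Swap in a user's new batch of (item_id, rating) pairs in one
    transaction: either all rows are replaced or none are."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM predictions WHERE user_id = ?", (user_id,))
        conn.executemany(
            "INSERT INTO predictions VALUES (?, ?, ?)",
            [(user_id, item_id, rating) for item_id, rating in predictions],
        )


def top_predictions(conn, user_id, limit=PREDICTIONS_PER_USER):
    """Read the precomputed recommendations shown when the user logs on."""
    cursor = conn.execute(
        "SELECT item_id, predicted_rating FROM predictions "
        "WHERE user_id = ? ORDER BY predicted_rating DESC LIMIT ?",
        (user_id, limit),
    )
    return cursor.fetchall()


replace_predictions(conn, 1, [(10, 4.5), (11, 3.0)])
```

If a batch update fails halfway (for example on a duplicate key), the rollback leaves the previous predictions in place, so a user never sees a half-written list.
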