title: 3 Recommendation Systems
updated: 2021-09-07 18:33:42Z
created: 2021-09-06 16:26:21Z
latitude: 52.09370000
longitude: 6.72510000
altitude: 0.0000

Cloud SQL and Cloud Dataproc

  • Cloud SQL: managed relational database
  • Cloud Dataproc: managed environment on which you can run Apache Spark and Apache Hadoop

Why move from on-premises to the cloud?

  • utilizing and tuning on-premises clusters is difficult
  • moving from dedicated on-cluster storage to off-cluster storage lets storage and compute scale independently

A core aspect of a recommendation system is that you need to train and serve it at scale.

What is managed?

................

Recommendation Systems

The core pieces are:

  • data
  • the model
  • infrastructure to train and serve recommendations to users.

A core tenet of machine learning is to let the model learn for itself the relationship between the data you have, such as known user preferences (labeled data), and the data you don't have, such as how a user would rate items they have not rated yet. A history of good labeled data is therefore important.

Machine learning scales much better than a rule-based approach because it does not require hard-coded rules: the model learns from the data automatically. Learning from data in an automated way is essentially what machine learning is.

A machine learning recommendation model essentially asks two questions. First, who is this user like? Second, is this an item that people tend to rate highly? The predicted rating is a combination of both factors: the rating of an item for a particular user is roughly the average rating given by users similar to this user, calibrated by the quality of the item itself.

Updating the data can be done in batch (e.g. once a day), because the ratings of items do not change that quickly. On the other hand, a lot of data has to be updated in a fault-tolerant way that scales to large datasets, which is what Apache Hadoop is built for.

When the user logs on, we want to show them the recommendations that we precomputed specifically for them. So the predictions must be stored transactionally, so that we can update the predictions table while the user is reading from it. For example, 1 million users with 5 predictions each is 5 million rows, for which a MySQL database (e.g. on Cloud SQL) is sufficient.
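The prediction described above combines "who is this user like?" with "is this an item people rate highly?". A minimal sketch of one way to combine them, assuming toy in-memory ratings: it uses cosine similarity between users as a simple neighbourhood-based heuristic (not necessarily the model used in the course), and `alpha` is a hypothetical mixing weight, not a value from these notes. All user and item names are made up.

```python
from math import sqrt

# Hypothetical toy data: the ratings we *have* (labeled data).
# Missing (user, item) pairs are the ratings we want to predict.
RATINGS = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob":   {"item1": 4, "item2": 3, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """'Who is this user like?': cosine similarity over co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def item_mean(item: str, ratings: dict) -> float:
    """'Is this an item people rate highly?': mean rating across all users."""
    scores = [r[item] for r in ratings.values() if item in r]
    return sum(scores) / len(scores)


def predict(user: str, item: str, ratings: dict, alpha: float = 0.5) -> float:
    """Blend the similarity-weighted average of other users' ratings
    with the item's overall mean (alpha is a hypothetical weight)."""
    quality = item_mean(item, ratings)
    weighted, total = 0.0, 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        weighted += sim * their[item]
        total += sim
    if total == 0:
        return quality  # nobody similar rated it: fall back to item quality
    neighbour_avg = weighted / total
    return alpha * neighbour_avg + (1 - alpha) * quality


print(round(predict("alice", "item4", RATINGS), 2))
```

Here alice is much closer to bob than to carol, so bob's rating of `item4` pulls her prediction slightly above the item's plain average. Raw cosine similarity ignores that users rate on different scales; mean-centred similarity or matrix factorization are common refinements.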
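The transactional predictions table can be sketched as follows. Python's built-in `sqlite3` stands in for MySQL/Cloud SQL only so the example is self-contained; the `recommendations` table, its columns and the `write_batch`/`top_n` helpers are hypothetical. The idea is that the batch job replaces the precomputed rows inside a single transaction, so a user reading their recommendations sees either the old set or the new one, never a half-written batch.

```python
import sqlite3

# In-memory SQLite as a stand-in for a MySQL database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE recommendations (
           user_id    TEXT NOT NULL,
           item_id    TEXT NOT NULL,
           prediction REAL NOT NULL,
           PRIMARY KEY (user_id, item_id)
       )"""
)


def write_batch(conn, predictions):
    """Replace all precomputed predictions in one transaction.

    The connection context manager commits on success and rolls back
    if anything raises, so other connections only see committed data.
    """
    with conn:
        conn.execute("DELETE FROM recommendations")
        conn.executemany(
            "INSERT INTO recommendations (user_id, item_id, prediction) "
            "VALUES (?, ?, ?)",
            predictions,
        )


def top_n(conn, user_id, n=5):
    """Query run at log-on: the user's highest predicted items."""
    rows = conn.execute(
        "SELECT item_id, prediction FROM recommendations "
        "WHERE user_id = ? ORDER BY prediction DESC LIMIT ?",
        (user_id, n),
    )
    return rows.fetchall()


# Usage: the nightly batch job writes, the serving layer reads.
write_batch(conn, [("alice", "item4", 3.09), ("alice", "item5", 2.5), ("bob", "item2", 4.1)])
print(top_n(conn, "alice"))  # [('item4', 3.09), ('item5', 2.5)]
```

With 5 predictions per user, `top_n` returns at most 5 rows, and the table grows linearly with the number of users (1 million users gives 5 million rows), a size a single relational database handles comfortably.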