---
title: WK 2 Data warehouse
updated: 2021-09-20 11:29:16Z
created: 2021-09-20 09:08:49Z
latitude: 52.09370000
longitude: 6.72510000
altitude: 0.0000
---
![4c3b142e5e73a6ac715600d0331150c2.png](../_resources/4c3b142e5e73a6ac715600d0331150c2.png)
![d00c7365a714367186d44f165573aa33.png](../_resources/d00c7365a714367186d44f165573aa33.png)
# BigQuery
BigQuery organizes data tables into units called datasets.
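A table therefore lives inside a dataset, which lives inside a project, and is addressed as `project.dataset.table`. A minimal sketch (plain Python, not the BigQuery API; the example table ID is a public sample) that splits such an ID into its parts:

```python
def parse_table_id(table_id: str) -> dict:
    """Split a fully qualified BigQuery table ID of the form
    'project.dataset.table' into its three components."""
    parts = table_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    project, dataset, table = parts
    return {"project": project, "dataset": dataset, "table": table}

print(parse_table_id("bigquery-public-data.samples.shakespeare"))
```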
![3393db98a463262eb4c2453a6140b984.png](../_resources/3393db98a463262eb4c2453a6140b984.png)
The project is what billing is associated with.
To run a query, you need to be logged into the GCP console. You run the query in your own GCP project, and the query charges are billed to that project.
To run a query in a project, you need Cloud IAM permission to submit a job.
Access control is handled through Cloud IAM at the dataset level and applies to all tables in the dataset. BigQuery provides predefined roles for controlling access to resources. Authorized views and row-level permissions let you give different users different views of the same data.
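Conceptually, a row-level permission filters which rows each user can see in the same table. A toy illustration of that effect (plain Python, not the BigQuery API; roles and rows are made-up examples):

```python
# Each role maps to a predicate deciding which rows it may see.
ROW_ACCESS = {
    "analyst_eu": lambda row: row["region"] == "EU",
    "analyst_us": lambda row: row["region"] == "US",
    "admin": lambda row: True,  # admins see everything
}

def visible_rows(rows, role):
    """Return only the rows the given role is allowed to see."""
    allow = ROW_ACCESS[role]
    return [r for r in rows if allow(r)]

sales = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 20}]
```

Same underlying table, different result set per role.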
BigQuery datasets can be regional or multi-regional.
![3a3ffb85892084804f4ad8c23681d1b2.png](../_resources/3a3ffb85892084804f4ad8c23681d1b2.png)
BigQuery logs are immutable and can be exported to Stackdriver.
# Loading data into BigQuery
EL (extract, load), ELT (extract, load, transform), ETL (extract, transform, load)
![9c2b3ab61d6331389b844684b029f52f.png](../_resources/9c2b3ab61d6331389b844684b029f52f.png)
If your data is in Avro format, which is self-describing, BigQuery can determine the schema directly. If the data is in JSON or CSV format, BigQuery can auto-detect the schema, but manual verification is recommended.
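Auto-detection works by sampling rows and guessing a type per column, widening the type when rows disagree. A toy version of that idea (plain Python `csv` module, not BigQuery's actual sampler; type names mimic BigQuery's):

```python
import csv
import io

def autodetect_schema(csv_text: str) -> dict:
    """Guess INT64, FLOAT64 or STRING for each CSV column from its
    values -- a simplified sketch of schema auto-detection."""
    reader = csv.DictReader(io.StringIO(csv_text))
    schema = {}
    for row in reader:
        for col, value in row.items():
            guess = "STRING"
            try:
                int(value)
                guess = "INT64"
            except ValueError:
                try:
                    float(value)
                    guess = "FLOAT64"
                except ValueError:
                    pass
            prev = schema.get(col)
            if prev is None or prev == guess:
                schema[col] = guess
            elif {prev, guess} == {"INT64", "FLOAT64"}:
                # Ints mixed with floats widen to FLOAT64.
                schema[col] = "FLOAT64"
            else:
                # Anything mixed with text falls back to STRING.
                schema[col] = "STRING"
    return schema
```

This is also why manual verification matters: a column of zip codes like `01234` would be guessed as an integer and lose its leading zero.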
**Backfilling data** means adding missing past data to make a dataset complete, with no gaps, so that all analytic processes keep working as expected.
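The first step of a backfill is finding the gaps. A minimal sketch, assuming daily partitions (plain Python; the loaded dates are made-up examples):

```python
from datetime import date, timedelta

def missing_dates(existing, start, end):
    """Return the dates in [start, end] absent from `existing` --
    the daily partitions a backfill job would need to (re)load."""
    have = set(existing)
    gaps = []
    d = start
    while d <= end:
        if d not in have:
            gaps.append(d)
        d += timedelta(days=1)
    return gaps

loaded = [date(2021, 9, 1), date(2021, 9, 2), date(2021, 9, 4)]
```

Each returned date would then be loaded with the same pipeline as regular daily data, so downstream queries see a gap-free history.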