---
title: Statistics
updated: 2022-04-02 15:10:58Z
created: 2021-05-04 14:58:11Z
---

# Statistics

## Data Types

- **Categorical**
  - **Nominal Variables**
    No intrinsic order of the labels:
    - Country of birth (Argentina, England, Germany)
    - Postcode
    - Vehicle make (Citroen, Peugeot, ...)
  - **Ordinal Variables**
    Categories that can be meaningfully ordered:
    - Student's grade in an exam (A, B, C or Fail)
    - Days of the week (Monday = 1 and Sunday = 7)
- **Numerical**
  - **Discrete**
    How many cards in a game? Integers.
  - **Continuous**
    Height of a room. Floating point numbers.

What are proportions? A proportion is an aggregation of nominal data that provides a numerical figure, e.g. the percentage of observations that fall in one nominal category.

## Mixed Variables

- Observations show either numbers or categories among their values
  - Number of credit accounts (1-100, U, T, M), where U = unknown, T = unverified, M = unmatched
- Observations show both numbers and categories in their values
  - Cabin (Titanic) (A15, B18, ...)

## Distributions

![48751b057b60e03ec51f64e3235fa1b3.png](../../_resources/48751b057b60e03ec51f64e3235fa1b3.png)

Values in the middle of the x-axis have a higher probability of being selected than the rarer values toward the edges.

Bell curve of the Normal Distribution

![be8b17237548f72ecd8013f80df036dc.png](../../_resources/be8b17237548f72ecd8013f80df036dc.png)

Bimodal distribution

![b7f8b2f785a9637ea5a22abe2877bca5.png](../../_resources/b7f8b2f785a9637ea5a22abe2877bca5.png)

Skewed distribution

Sample Distribution

![34262aff59c5f5dd9a413b2b3d74629a.png](../../_resources/34262aff59c5f5dd9a413b2b3d74629a.png)

$$
\overline{X} = \text{sample mean; it differs from sample to sample, so it has a variance of its own}
$$

## Sampling and Estimation

E.g. a best guess: the number of successes divided by the number of samples gives an estimate, 10 / 3 ≈ 3.3333

$$
\Theta = \text{estimate, with some variance around it, that makes a good guess from the sample}
$$

![846c953521751f708bd680556dc9ae0b.png](../../_resources/846c953521751f708bd680556dc9ae0b.png)

So, given a sample, we are 95% confident that the true value lies in this interval around our sample estimate.

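
As a rough illustration of such an interval, here is a minimal sketch that assumes the estimate is a proportion (successes divided by sample size) and uses the normal-approximation (Wald) formula with z = 1.96 for roughly 95% confidence. The function name and the sample counts are hypothetical and are not taken from the notes above.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    theta_hat = successes / n  # point estimate of the proportion
    standard_error = math.sqrt(theta_hat * (1 - theta_hat) / n)
    margin = z * standard_error
    return theta_hat - margin, theta_hat + margin


# Hypothetical samples: same estimate (0.3), different sample sizes.
small_low, small_high = proportion_confidence_interval(successes=3, n=10)
large_low, large_high = proportion_confidence_interval(successes=300, n=1000)

print(f"n=10:   [{small_low:.3f}, {small_high:.3f}]")
print(f"n=1000: [{large_low:.3f}, {large_high:.3f}]")
```

Both samples give the same point estimate, so any difference between the two printed intervals comes from the sample size alone.
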
The less sure we are of this $\Theta$, the larger the confidence interval, e.g. because $n$ is much smaller.

![d575f021de579d10e3855c763198e7bc.png](../../_resources/d575f021de579d10e3855c763198e7bc.png)

## Hypothesis Testing

![981ea34418a595b422aab0b0df23f4b6.png](../../_resources/981ea34418a595b422aab0b0df23f4b6.png)

In hypothesis testing we never:

- prove anything
- accept the null hypothesis (we either reject it or fail to reject it)

## P-values

Consider a null hypothesis: a hypothesis test assesses whether our sample is extreme enough to reject the null. The p-value then measures how extreme our sample is.

![6af1399567c87fcda04a6414efbe18bf.png](../../_resources/6af1399567c87fcda04a6414efbe18bf.png)

## P-hacking

![ace369b638966681b9558c42e25dd0b4.png](../../_resources/ace369b638966681b9558c42e25dd0b4.png)
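
One common form of p-hacking is running many tests and reporting only the ones that happen to come out significant. The sketch below simulates that under stated assumptions: every null hypothesis is true (the data are pure noise), each test is a two-sided z-test with a known standard deviation of 1, and each "study" tries 20 tests at a 0.05 threshold. All names and settings are hypothetical choices for illustration.

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def study_finds_something(rng, n_tests=20, n_obs=30, alpha=0.05):
    """Run n_tests independent tests on pure noise; True if any p-value is below alpha."""
    for _ in range(n_tests):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if z_test_p_value(sample) < alpha:
            return True
    return False


rng = random.Random(0)  # fixed seed so the run is repeatable
n_studies = 1000
hits = sum(study_finds_something(rng) for _ in range(n_studies))
false_positive_rate = hits / n_studies

# With 20 independent tests of true nulls, the chance of at least one
# p-value below 0.05 is 1 - 0.95**20, which is about 0.64.
expected_rate = 1 - (1 - 0.05) ** 20

print(f"simulated:   {false_positive_rate:.2f}")
print(f"theoretical: {expected_rate:.2f}")
```

Although every individual test keeps its 5% false-positive rate, a study that is free to pick the best of 20 tests finds a "significant" result most of the time, even though no real effect exists.
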