title | updated | created |
---|---|---|
Statistics | 2022-04-02 15:10:58Z | 2021-05-04 14:58:11Z |
Statistics
Data Types
- Categorical
  - Nominal variables: the labels have no intrinsic order. Examples: country of birth (Argentina, England, Germany), postcode, vehicle make (Citroen, Peugeot, ...)
  - Ordinal variables: the labels can be meaningfully ordered. Examples: a student's grade in an exam (A, B, C or Fail), days of the week (Monday = 1 to Sunday = 7)
- Numerical
  - Discrete: countable values, stored as integers. Example: how many cards are in a game?
  - Continuous: any value within a range, stored as floating-point numbers. Example: the height of a room
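A minimal Python sketch of how these data types might be represented in code. All variable names and values (including the 52 cards and the 2.75 m room) are hypothetical; the day encoding follows the Monday = 1 to Sunday = 7 example. The point is which operations make sense: only equality for nominal labels, ordering for ordinal ones.

```python
from enum import IntEnum

# Nominal: labels with no intrinsic order; only equality checks make sense.
countries = ["Argentina", "England", "Germany", "England"]

# Ordinal: labels with a meaningful order, encoded as integers (Monday = 1 ... Sunday = 7).
class Day(IntEnum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

# Discrete: countable values, stored as int (hypothetical value).
cards_in_game = 52
# Continuous: any value in a range, stored as float (hypothetical value, metres).
room_height_m = 2.75

print(countries[1] == countries[3])  # True: equality is meaningful for nominal data
print(Day.MONDAY < Day.SUNDAY)       # True: order is meaningful for ordinal data
print(type(cards_in_game).__name__, type(room_height_m).__name__)  # int float
```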
What are proportions? A proportion aggregates nominal data into a single numerical figure, e.g. the percentage of observations that fall into one category of a nominal variable.
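A small sketch of turning nominal data into proportions with `collections.Counter`. The sample of vehicle makes is hypothetical data chosen for illustration.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (vehicle make).
makes = ["Citroen", "Peugeot", "Citroen", "Renault", "Citroen"]

counts = Counter(makes)
# Proportion = count in one category / total number of observations.
proportions = {make: n / len(makes) for make, n in counts.items()}

print(proportions["Citroen"])          # 0.6
print(f"{proportions['Citroen']:.0%}")  # 60%
```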
Mixed Variables
- Observations show either numbers or categories among their values
  - Number of credit accounts: 1-100, or U (unknown), T (unverified), M (unmatched)
- Observations show both numbers and categories in their values
  - Cabin on the Titanic: A15, B18, ...
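One common way to handle mixed variables is to split each value into a numerical part and a categorical part. A minimal sketch, assuming single-code cabin values like `A15`; the helper names `split_credit_accounts` and `split_cabin` are hypothetical.

```python
import re

def split_credit_accounts(value: str):
    """Return (number, category): numbers for 1-100, a label for U/T/M."""
    if value.isdigit():
        return int(value), None
    return None, value  # U = unknown, T = unverified, M = unmatched

def split_cabin(value: str):
    """Split a cabin code such as 'A15' into its letter and its number."""
    match = re.fullmatch(r"([A-Z])(\d+)", value)
    if match is None:
        return value, None  # leave unexpected formats untouched
    return match.group(1), int(match.group(2))

print([split_credit_accounts(v) for v in ["3", "U", "100", "M"]])
# [(3, None), (None, 'U'), (100, None), (None, 'M')]
print([split_cabin(v) for v in ["A15", "B18"]])
# [('A', 15), ('B', 18)]
```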
Distributions
In a normal distribution (the bell curve), values near the middle of the x-axis (the mean) are more likely than values toward the edges (the tails): the probability density is highest at the mean and falls off on both sides.
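The bell shape can be checked numerically with the standard library's `statistics.NormalDist`. This sketch uses the standard normal (mean 0, standard deviation 1) as an example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0, sigma=1)

# Density is highest at the mean and decreases toward the tails.
for x in [0, 1, 2, 3]:
    print(x, round(dist.pdf(x), 4))

# Probability of landing within one standard deviation of the mean.
print(round(dist.cdf(1) - dist.cdf(-1), 4))  # 0.6827
```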
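$\overline{X}$ = the sample mean (not the variance); the variance is written $\sigma^2$ for a population and $s^2$ for a sample.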
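A short sketch contrasting the sample mean with the two variance formulas in Python's `statistics` module. The data values are hypothetical.

```python
from statistics import mean, pvariance, variance

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical data

x_bar = mean(sample)         # sample mean, X-bar
s2 = variance(sample)        # sample variance s^2: divides by n - 1
sigma2 = pvariance(sample)   # population variance sigma^2: divides by n
print(x_bar, s2, sigma2)     # 5.0 4.8 4.0
```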
Sampling and Estimation
E.g. dividing a sample total by the sample size gives an estimate: 10 / 3 ≈ 3.3333. Dividing a number of successes by the sample size likewise gives an estimated proportion of successes.
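A minimal sketch of both kinds of point estimate. The helper names and the data (values 2, 3, 5 summing to 10; 3 successes in 10 trials) are hypothetical.

```python
def estimate_mean(values):
    """Point estimate of the population mean: sample total / sample size."""
    return sum(values) / len(values)

def estimate_proportion(successes: int, n: int) -> float:
    """Point estimate of a success probability: successes / sample size."""
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n

print(round(estimate_mean([2, 3, 5]), 4))          # 3.3333
print(estimate_proportion(successes=3, n=10))      # 0.3
```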
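$\hat{\theta}$ = the estimate of an unknown parameter $\theta$, computed from the sample. It varies from sample to sample around the true value, so it is a best guess rather than an exact answer.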
Given a sample, a 95% confidence interval is a range around the estimate, constructed so that over repeated samples 95% of such intervals contain the true $\theta$. The less certain the estimate, the wider the interval, e.g. because the sample size $n$ is small.
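A sketch of a 95% confidence interval for a mean, showing the interval narrowing as $n$ grows. It assumes the normal (z) approximation for simplicity; for small samples a t-distribution critical value would be more accurate. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (z approximation)."""
    n = len(sample)
    x_bar = mean(sample)
    # Critical value: about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Margin of error = z * standard error of the mean.
    margin = z * stdev(sample) / sqrt(n)
    return x_bar - margin, x_bar + margin

small = [4, 6, 5, 7, 3]  # hypothetical data, n = 5
large = small * 10       # same values repeated, n = 50

for sample in (small, large):
    low, high = mean_confidence_interval(sample)
    print(len(sample), round(low, 2), round(high, 2), "width:", round(high - low, 2))
```

The margin shrinks roughly with $1/\sqrt{n}$, so the larger sample gives a noticeably narrower interval around the same estimate.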
Hypothesis Testing
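- A test does not prove anything; it only measures the evidence against the null hypothesis
- Never accept the null hypothesis: either reject it or fail to reject it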
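The two possible outcomes of a test can be written as a tiny decision rule. This is a sketch assuming a significance level of 0.05, a common but arbitrary choice; the function name `decide` is hypothetical.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision. There is deliberately no 'accept H0' outcome."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```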
P-values
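Consider a null hypothesis. A hypothesis test assesses whether our sample is extreme enough to reject the null. The p-value measures how extreme our sample is: it is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in our sample.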
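As an illustration, a two-sided one-sample z-test computes this p-value directly. This sketch assumes the population standard deviation is known; the numbers (sample mean 103, hypothesised mean 100, sigma 15, n = 100) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Probability under H0 of a z statistic at least this far from 0, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = z_test_p_value(x_bar=103, mu0=100, sigma=15, n=100)
print(round(p, 4))  # 0.0455

# A more extreme sample mean gives a smaller p-value.
print(z_test_p_value(x_bar=106, mu0=100, sigma=15, n=100) < p)  # True
```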