Updated: 2025-09-10 07:03:27 · Views: 68

📘 Statistics Basics

1. Measures of Central Tendency

  • Mean (x̄) = (Sum of all values) ÷ (Number of values)
  • Median = Middle value when data is sorted (average of the two middle values if the count is even)
  • Mode = Value that occurs most often
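
As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The `values` list is made-up data chosen only so the numbers are easy to check by hand.

```python
import statistics

# Hypothetical data, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # even count: average of the two middle values (3 and 5)
mode = statistics.mode(values)      # the value that occurs most often

print(mean, median, mode)
```

Note that the median here is not one of the data points, because the count is even.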

---

2. Measures of Dispersion

Variance (σ^2 or s^2)

Population: σ^2 = ( Σ(xi - μ)^2 ) ÷ N
Sample: s^2 = ( Σ(xi - x̄)^2 ) ÷ (n - 1)

Standard Deviation (σ or s)

Population: σ = √( Σ(xi - μ)^2 ÷ N )
Sample: s = √( Σ(xi - x̄)^2 ÷ (n - 1) )

👉 Meaning: Tells how far values typically spread around the mean. Standard deviation is in the same units as the data; variance is in squared units.
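
The population and sample formulas can be compared side by side. This is a small sketch using the standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n - 1. The data is hypothetical.

```python
import statistics

# Hypothetical data: mean is 5, sum of squared deviations is 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(values)    # 32 / 8
sample_var = statistics.variance(values)  # 32 / 7
pop_sd = statistics.pstdev(values)        # square root of pop_var
sample_sd = statistics.stdev(values)      # square root of sample_var

print(pop_var, sample_var, pop_sd, sample_sd)
```

The sample versions always come out at least as large as the population versions for the same data, because the divisor is smaller.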

---

3. Quartiles & Interquartile Range (IQR)

  • Q1 (Lower Quartile) = value at the 25% position of the sorted data
  • Q2 (Median) = value at the 50% position
  • Q3 (Upper Quartile) = value at the 75% position

IQR = Q3 - Q1
👉 Spread of the middle 50% of the data (ignores extreme values).
👉 More stable than range, less affected by outliers.
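
A minimal sketch of computing quartiles and the IQR with `statistics.quantiles` (Python 3.8+). One assumption to flag: there are several conventions for locating the "25% position", and different tools can give slightly different quartiles for the same data. The `"inclusive"` method is used here only because it is easy to verify by hand on this made-up list.

```python
import statistics

# Hypothetical data with 9 sorted values.
values = [1, 3, 5, 7, 9, 11, 13, 15, 17]

# n=4 splits the data into four parts and returns the three cut points.
# method="inclusive" is one convention among several (an assumption here).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```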

---

4. Outliers using IQR

Outlier Rule:

  • Lower Bound = Q1 - (1.5 × IQR)
  • Upper Bound = Q3 + (1.5 × IQR)

👉 Any value outside this range = Outlier (too far from the bulk of data).
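
The rule above can be sketched as a small function. The name `iqr_outliers` and the data are hypothetical, and the quartiles use the `"inclusive"` convention of `statistics.quantiles`, which is an assumption; other quartile conventions may move the bounds slightly.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical data with one suspiciously large value.
data = [10, 12, 12, 13, 14, 15, 16, 18, 50]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

Here Q1 = 12 and Q3 = 16, so IQR = 4 and the bounds are 6 and 22; only 50 falls outside.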

---

5. Min, Max, Range

  • Minimum (min) = Smallest value
  • Maximum (max) = Largest value
  • Range = max - min
👉 Quick measure, but sensitive to outliers.
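
To see how sensitive the range is, this short sketch compares the range of some made-up data with and without a single extreme value.

```python
# Hypothetical data.
data = [10, 12, 12, 13, 14, 15, 16, 18]
data_range = max(data) - min(data)

# Adding one extreme value changes the range dramatically.
with_outlier = data + [50]
range_with_outlier = max(with_outlier) - min(with_outlier)

print(data_range, range_with_outlier)
```

One added point takes the range from 8 to 40, even though the bulk of the data did not change.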

---

Part 2

---

📘 Visualizing Data Notes

1. Scatter Plots

  • Show relationship between 2 variables (x, y).
  • Each point = one observation.
👉 Useful for finding patterns, trends, outliers, correlation.

---

2. Line Plots

  • Points connected with lines (usually time on x-axis).
👉 Great for trends over time (stock price, temperature, etc.).

---

3. Distribution Plots – Histograms

  • Show how data values are spread across intervals (bins).
  • x-axis = value ranges, y-axis = frequency/count.
👉 Helps see skew, shape, spread.
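
What a histogram counts can be sketched without any plotting library. The helper `histogram_counts`, the bin width, and the `ages` data below are all hypothetical; a real plotting tool would also choose the bins for you and would draw empty bins, which this sketch simply omits.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count values per bin [start + k*w, start + (k+1)*w); empty bins are omitted."""
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }

# Hypothetical data.
ages = [21, 22, 25, 31, 34, 35, 38, 47, 52, 58]
bins = histogram_counts(ages, bin_width=10, start=20)

# Text rendering: one '#' per observation in the bin.
for (low, high), count in bins.items():
    print(f"{low}-{high}: {'#' * count}")
```

Changing `bin_width` changes the apparent shape, which is why bin choice matters when reading a histogram.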

---

4. Categorical Plots – Bar Plots

  • Categories on x-axis, bar height = value/frequency.
👉 Used for comparing groups or categories.

---

5. Categorical/Distribution Plots – Box & Whisker Plots

  • Show median, quartiles, IQR, outliers.
  • Box = Q1 to Q3, line = median, whiskers = smallest and largest values that are not outliers (commonly, within 1.5 × IQR of the box); outliers are drawn as individual points.
👉 Best for comparing distributions between groups.
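
The numbers a box plot draws can be computed directly. This is a sketch under two assumptions: whiskers stop at the most extreme data points within 1.5 × IQR of the box (a common convention, not the only one), and quartiles use the `"inclusive"` method. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the box, whisker ends, and outliers for one group of data."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at actual data points, not at the fences themselves.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

summary = box_plot_summary([10, 12, 12, 13, 14, 15, 16, 18, 50])
print(summary)
```

Computing one such summary per group gives the values that side-by-side box plots compare.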

---

6. Other Plot Types

  • Violin Plot → combo of boxplot + density curve (shows distribution shape).
  • KDE Plot (Kernel Density Estimation) → smooth curve estimating the probability density.
👉 Both show distribution shape in more detail than a boxplot and more smoothly than a histogram; the curve depends on the chosen smoothing bandwidth.

---

7. Common Plot Pitfalls

  • Wrong scale (zooming in or truncating axes can mislead).
  • Too many categories → bar/line chart becomes messy.
  • Cherry-picking → showing only part of the data.
  • Overplotting → too many points on a scatter plot hide patterns.

---

❓ Why variance denominator is n-1 (not n)

This is the part that confuses many people, so let’s break it super simple.

Step 1: Population vs Sample

  • Population variance → divide by N (you have all the data).
  • Sample variance → divide by n-1 (you have only part of the data).

---

Step 2: The problem with just dividing by n

When you use a sample, you already used the sample mean (x̄) to calculate deviations. This mean is computed from the sample itself, so it sits closer to your sample data than the real population mean (μ) does: the sum of squared deviations is smallest when measured from x̄.

👉 Result: Variance calculated with n underestimates the true spread on average. It looks smaller than reality.

---

Step 3: Fixing the bias

To correct this "shrinkage", statisticians use n-1 instead of n (Bessel's correction). This makes the variance a little bigger → an unbiased estimate of the true population variance.
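
The bias can be checked exactly on a tiny example. The sketch below lists every possible sample of size 2 from a made-up three-value population, then averages the variance estimate over all of them. It assumes sampling with replacement (independent draws), which is the setting where dividing by n-1 is exactly unbiased. The names `population` and `avg_estimate` are hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N  # true variance, divides by N

def avg_estimate(divisor_offset, n=2):
    """Average the variance estimate over every sample of size n
    (drawn with replacement), dividing by n - divisor_offset."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        squared_devs = sum((x - xbar) ** 2 for x in sample)
        total += squared_devs / (n - divisor_offset)
    return total / len(samples)

biased = avg_estimate(0)    # divide by n
unbiased = avg_estimate(1)  # divide by n-1

print(pop_var, biased, unbiased)
```

On average, dividing by n gives 4/3, only half of the true variance 8/3, while dividing by n-1 gives 8/3 exactly.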

---

Step 4: Easy way to remember

  • Divide by N if you have the whole population.
  • Divide by n-1 if you only have a sample.
👉 That "-1" reflects the degrees of freedom: one piece of information is used up when you estimate the mean from the sample.

---

✅ In short:

  • Population variance: ÷N
  • Sample variance: ÷(n-1) → avoids underestimating true spread.