R Programming Notes — Vectors (From Basics)
Vectors are the most basic data structure in R: even a single value such as 5 is a vector of length 1. A vector is a collection of values of the same type stored together.
Think of a vector as a row of boxes, each holding one value.
---
Section 1: Creating Vectors and Assignment
What is a Vector?
A vector stores a sequence of values of one kind, for example:
- Marks of students
- Prices of items
- Daily temperatures
---
Creating a Numeric Vector
```r
numbers <- c(10, 20, 30, 40)
```
Memory tip: c() means combine. The arrow <- assigns the result to the name on its left, here numbers.
Creating a Character Vector
```r
names <- c("Kiran", "Ravi", "Asha")
```
Creating a Logical Vector
```r
status <- c(TRUE, FALSE, TRUE)
```
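To check which type a vector holds, base R provides class(). A short sketch that re-creates the three vectors above and inspects them:

```r
# Re-create the three vectors from this section
numbers <- c(10, 20, 30, 40)
names   <- c("Kiran", "Ravi", "Asha")
status  <- c(TRUE, FALSE, TRUE)

# class() reports the type of data stored in each vector
class(numbers)   # "numeric"
class(names)     # "character"
class(status)    # "logical"

# length() counts the elements
length(numbers)  # 4
```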
Important Rule
All elements in a vector must be the same data type. If you mix types, R automatically converts everything to one common type:
```r
x <- c(1, "A", TRUE)
```
Here every element becomes character, so x holds "1" "A" "TRUE".
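R picks the most flexible type among the elements, roughly in the order logical, then numeric, then character. A minimal sketch contrasting two mixes (the vector names are only illustrative):

```r
# Mixing numbers and logicals: TRUE becomes 1, FALSE becomes 0
num_logic <- c(5, TRUE, FALSE)
num_logic          # 5 1 0
class(num_logic)   # "numeric"

# Adding any text forces every element to character
with_text <- c(5, TRUE, "A")
with_text          # "5" "TRUE" "A"
class(with_text)   # "character"
```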
Section 2: Vectorized Operations
R applies an operation to every element of a vector at once, so no loop is needed.
```r
marks <- c(50, 60, 70)
marks + 5
```
Output: 55 65 75
This adds 5 to every element.
Multiply All Elements
```r
marks * 2
```
Output: 100 120 140
Real-world idea: Like applying a formula to a whole Excel column.
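The same element-by-element rule applies when both sides are vectors. A small sketch with illustrative vectors a and b, plus R's recycling rule for a shorter vector:

```r
# Two vectors of the same length are combined position by position
a <- c(1, 2, 3)
b <- c(10, 20, 30)
a + b          # 11 22 33
a * b          # 10 40 90

# If one vector is shorter, R recycles (repeats) it to fit
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```

Recycling is convenient but can hide mistakes; R gives a warning when the longer length is not a multiple of the shorter one, for example c(1, 2, 3) + c(10, 20).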
Section 3: Basic Functions on Vectors
R provides many built-in functions to quickly understand data inside a vector.
Let’s use this example vector:
```r
x <- c(4, 7, 1, 9, 2)
```
Mean (Average)
```r
mean(x)
```
Output: 4.6
Meaning: Adds all numbers and divides by the count.
Use case: Finding average marks of students.
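Median (Middle Value)
```r
median(x)
```
Output: 4
Meaning: The middle number after sorting the values (1, 2, 4, 7, 9). With an even number of values, R returns the average of the two middle ones.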
Use case: Finding the middle income in a group (less affected by extreme values).
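To see why the median is "less affected by extreme values", here is a sketch with a hypothetical income vector (in thousands) where one value is far larger than the rest:

```r
# Hypothetical incomes (in thousands); the last one is an extreme value
income <- c(30, 35, 40, 1000)

mean(income)    # 276.25: pulled up by the outlier
median(income)  # 37.5: average of the two middle values, 35 and 40
```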
Sum
```r
sum(x)
```
Output: 23
Meaning: Adds all elements together.
Use case: Total sales, total marks, total expenses.
Length
```r
length(x)
```
Output: 5
Meaning: Counts the number of elements in the vector.
Use case: Number of students, number of observations.
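sum() and length() together give the mean. A minimal check, comparing with all.equal() to allow for tiny floating-point differences:

```r
x <- c(4, 7, 1, 9, 2)

# The mean is the sum divided by the number of elements
manual_mean <- sum(x) / length(x)
manual_mean                      # 4.6
all.equal(manual_mean, mean(x))  # TRUE
```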
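Standard Deviation (sd)
```r
sd(x)
```
Output: 3.361547
Meaning: Measures how spread out the values are around the mean. sd() computes the sample standard deviation, which uses n - 1 (one less than the number of values) in the denominator.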
Small sd → values are close to average
Large sd → values are spread out
Use case: Checking consistency of marks or measurements.
Variance (var)
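```r
var(x)
```
Output: 11.3
Meaning: The square of the standard deviation (equivalently, sd is the square root of var). It also measures spread, but in squared units.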
Use case: Used in statistics and machine learning.
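The relationship between var() and sd() can be checked by computing both by hand. A minimal sketch of the sample formulas (dividing by n - 1), compared with the built-ins using all.equal():

```r
x <- c(4, 7, 1, 9, 2)
n <- length(x)

# Sample variance: squared distances from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_var                      # 11.3

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(x))   # TRUE
all.equal(manual_sd, sd(x))     # TRUE
```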
Summary
```r
summary(x)
```
Meaning: Gives a quick overview of six numbers:
Minimum, 1st Quartile, Median, Mean, 3rd Quartile, Maximum
For this x they are 1, 2, 4, 4.6, 7, and 9.
Use case: Quick report of dataset.
Missing Data and na.rm
Sometimes data contains missing values, represented as NA.
```r
x <- c(4, 7, NA, 9, 2)
```
If we try:
```r
mean(x)
```
Output: NA
Any calculation involving an unknown value is itself unknown, so R returns NA instead of guessing.
Removing NA Values
Use na.rm = TRUE to remove missing values before calculating:
```r
mean(x, na.rm = TRUE)   # 5.5
sum(x, na.rm = TRUE)    # 22
sd(x, na.rm = TRUE)
```
Meaning: na.rm = TRUE means "remove NA values before the calculation".
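Before removing missing values, it often helps to find and count them. A short sketch using base R's is.na(); the ! operator means "not", so !is.na(x) keeps the values that are present:

```r
x <- c(4, 7, NA, 9, 2)

# is.na() marks each missing value with TRUE
is.na(x)            # FALSE FALSE TRUE FALSE FALSE

# Count the missing values (each TRUE counts as 1)
sum(is.na(x))       # 1

# length() still counts the NA as an element
length(x)           # 5

# Keep only the non-missing values
x[!is.na(x)]        # 4 7 9 2
```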
| Function | Purpose |
| -------- | ------- |
| mean() | Average value |
| median() | Middle value |
| sum() | Total of all values |
| length() | Number of elements |
| sd() | Spread of data |
| var() | Variance (spread squared) |
| summary() | Quick statistics report |
| na.rm = TRUE | Ignore missing values |
Section 4: Subsetting Vectors
Subsetting means selecting specific elements from a vector.
This is heavily used in:
- Data Science (filtering data)
- Machine Learning (selecting features)
- Deep Learning (processing inputs)
---
Example Vector
```r
x <- c(10, 20, 30, 40, 50)
```
Subsetting by Position
Select an element using its position (index). R counts positions from 1, so x[1] is the first element.
```r
x[2]
```
Output: 20
Use case: Accessing a specific observation.
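Because positions start at 1, a few edge cases behave differently from 0-based languages. A small sketch on the same example vector:

```r
x <- c(10, 20, 30, 40, 50)

x[1]          # 10: the first element (R starts counting at 1)
x[length(x)]  # 50: the last element
x[6]          # NA: position 6 does not exist in a vector of length 5
x[0]          # numeric(0): an empty vector, since there is no position 0
```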
Subsetting Multiple Elements using c()
```r
x[c(1, 3, 5)]
```
Output: 10 30 50
Meaning: Select positions 1, 3, and 5.
Use case: Selecting specific features from a dataset.
Subsetting a Range using :
```r
x[2:4]
```
Output: 20 30 40
Meaning: 2:4 creates the sequence 2, 3, 4, so this selects the elements at positions 2 through 4.
Use case: Selecting a block of rows in data.
Removing Elements using Negative Index
```r
x[-2]
```
Output: 10 30 40 50
Meaning: Remove the element at position 2.
Remove multiple elements:
```r
x[-c(1, 5)]
```
Output: 20 30 40
Use case: Removing unwanted features or data points.
Logical (Boolean) Subsetting
```r
x[x > 25]
```
Output: 30 40 50
Meaning: Select elements that satisfy a condition.
Use case: Filtering dataset (e.g., age > 18).
Subsetting with Logical Vector
```r
condition <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
x[condition]
```
Output: 10 30 50
Meaning: TRUE keeps the value, FALSE removes it.
Use case: Masking data in ML preprocessing.
Subsetting by Excluding Condition
```r
x[x <= 30]
```
Output: 10 20 30
Meaning: Keep values less than or equal to 30, which excludes every value above 30.
Finding Index Positions using which()
```r
which(x > 25)
```
Output: 3 4 5
Meaning: Returns the positions (not the values) where the condition is TRUE.
Use case: Locating outliers or specific samples.
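which() and logical subsetting usually give the same values, but they differ when the data contains NA. A small sketch with a hypothetical vector y:

```r
y <- c(10, NA, 30)

# Logical subsetting keeps an NA where the comparison is NA
y[y > 25]          # NA 30

# which() skips NA comparisons, so only real matches remain
which(y > 25)      # 3
y[which(y > 25)]   # 30
```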
Subsetting with Names
```r
scores <- c(math = 90, science = 85, english = 88)
scores["science"]
```
Output: 85, printed together with its name, science.
Use case: Accessing labeled features.
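A few more ways to work with a named vector, sketched on the same scores example: listing the names, selecting several labels at once, and using double brackets to get the bare value without its name.

```r
scores <- c(math = 90, science = 85, english = 88)

# names() lists the labels
names(scores)                    # "math" "science" "english"

# Select several labeled elements at once
scores[c("math", "english")]     # math 90, english 88

# Double brackets return the bare value without its name
scores[["science"]]              # 85
```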
| Method | Purpose in ML |
| -------------- | -------------------------- |
| x[index] | Select specific samples |
| x[c(...)] | Select important features |
| x[start:end] | Slice dataset |
| x[-index] | Remove noisy data |
| x[condition] | Filter dataset |
| which() | Find positions of patterns |
Section 5: Booleans (Logical Values)
Booleans represent True or False values.
In R, Boolean values are written as:
- TRUE
- FALSE
They are used for comparisons and filtering data.
---
Comparison Operators
These operators compare values and return TRUE or FALSE.
| Operator | Meaning | Example | Result |
|----------|---------|---------|--------|
| == | Equal to | 5 == 5 | TRUE |
| > | Greater than | 7 > 3 | TRUE |
| < | Less than | 2 < 1 | FALSE |
| >= | Greater than or equal to | 5 >= 5 | TRUE |
| <= | Less than or equal to | 4 <= 6 | TRUE |
| != | Not equal to | 5 != 3 | TRUE |
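Comparisons can be combined with R's logical operators & (and), | (or), and ! (not). A minimal sketch using a hypothetical age vector:

```r
age <- c(12, 25, 40, 17, 65)

# & (AND): both conditions must be TRUE
age >= 18 & age < 60    # FALSE TRUE TRUE FALSE FALSE

# | (OR): at least one condition must be TRUE
age < 18 | age >= 60    # TRUE FALSE FALSE TRUE TRUE

# ! (NOT): flips TRUE and FALSE
!(age >= 18)            # TRUE FALSE FALSE TRUE FALSE
```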
---
Using Booleans with Vectors
Booleans work element by element in vectors.
Example Vector
```r
x <- c(10, 20, 30, 40)
```
Compare Values
```r
x > 25
```
Output: FALSE FALSE TRUE TRUE
Each element is checked separately.
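Because R treats TRUE as 1 and FALSE as 0 in arithmetic, a logical vector can also be summarized directly. A short sketch on the same x, using base R's sum(), mean(), any(), and all():

```r
x <- c(10, 20, 30, 40)

# TRUE counts as 1 and FALSE as 0 in arithmetic
sum(x > 25)    # 2: how many values are greater than 25
mean(x > 25)   # 0.5: the proportion of values greater than 25
any(x > 35)    # TRUE: at least one value is above 35
all(x > 5)     # TRUE: every value is above 5
```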
Subsetting a Vector with a Boolean Condition
This is one of the most important concepts in Data Science.
```r
x[x > 25]
```
Output: 30 40
Meaning: Keep only values greater than 25.
Another Example
```r
x[x <= 20]
```
Output: 10 20
Boolean with Equality
```r
x[x == 30]
```
Output: 30
Real-World Use Cases
| Task | Example |
| ---------------------------- | ------------------------ |
| Filter students who passed | marks[marks >= 40] |
| Select high-income customers | income[income > 50000] |
| Remove invalid values | data[data > 0] |
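The first row of the table can be tried end to end. A minimal sketch with hypothetical marks for five students, assuming 40 is the pass mark:

```r
# Hypothetical marks for five students; 40 is assumed to be the pass mark
marks <- c(35, 72, 48, 20, 90)

passed <- marks[marks >= 40]
passed                 # 72 48 90

# Count how many students passed
length(passed)         # 3
```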
Important Notes
A comparison on a vector returns a logical vector of TRUE/FALSE values, one per element.
TRUE keeps the value, FALSE removes it during subsetting.
This is called logical indexing.
Summary
Booleans help R answer questions like:
Is this value bigger?
Is this value equal?
Should we keep this data?
They are the foundation of filtering data in Machine Learning and Data Science.