Updated: 2026-02-16 16:14:04

R Programming Notes — Vectors (From Basics)

Vectors are the most important data structure in R. A vector is a collection of values of the same type stored together.

Think of a vector like a row of boxes holding data.

---

Section 1: Creating Vectors and Assignment

What is a Vector?

A vector holds an ordered set of values of one kind, such as:
Marks of students
Prices of items
Daily temperatures

---

Creating a Numeric Vector

numbers <- c(10, 20, 30, 40)

Memory Tip: c means combine

Creating a Character Vector

names <- c("Kiran", "Ravi", "Asha")

Creating a Logical Vector

status <- c(TRUE, FALSE, TRUE)

Important Rule

All elements in a vector must be the same data type. If you mix types, R converts (coerces) every element to one common type:

x <- c(1, "A", TRUE)

Here R converts everything to character automatically.
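To make the rule concrete, here is a small sketch that checks the type after mixing values. The names mixed_all and mixed_num are only illustrative and are not part of the notes above.

```r
# Mixing types forces R to pick one common type for the whole vector.
mixed_all <- c(1, "A", TRUE)    # number + text + logical
mixed_num <- c(1, TRUE, FALSE)  # number + logical, no text

print(mixed_all)         # every element is now a string: "1" "A" "TRUE"
print(class(mixed_all))  # "character"

print(mixed_num)         # TRUE became 1 and FALSE became 0
print(class(mixed_num))  # "numeric"
```

When no text is involved, logical values are turned into numbers instead, so the result stays numeric.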

Section 2: Vectorized Operations

R performs operations on all elements at once.

marks <- c(50, 60, 70)

marks + 5

Output: 55 65 75

This adds 5 to every element.

Multiply All Elements

marks * 2

Output:

100 120 140

Real-world idea: Like applying a formula to a whole Excel column.
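As a rough illustration of what "all elements at once" means, the sketch below compares a vectorized addition with the same work written as a loop. The bonus vector is a made-up example value, not data from these notes.

```r
marks <- c(50, 60, 70)
bonus <- c(5, 0, 10)  # hypothetical per-student bonus

# Vector + single number: 5 is applied to every element.
curved <- marks + 5

# Vector + vector of the same length: elements are paired by position.
with_bonus <- marks + bonus

# The same result written as an explicit loop, for comparison.
looped <- numeric(length(marks))
for (i in seq_along(marks)) {
  looped[i] <- marks[i] + bonus[i]
}

print(curved)                         # 55 65 75
print(with_bonus)                     # 55 60 80
print(identical(with_bonus, looped))  # TRUE
```

The vectorized form is shorter and is the style usually preferred in R.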

Section 3: Basic Functions on Vectors

R provides many built-in functions to quickly understand data inside a vector.

Let’s use this example vector:

x <- c(4, 7, 1, 9, 2)

Mean (Average)

mean(x)

Meaning: Adds all numbers and divides by the count.

Use case: Finding average marks of students.

Median (Middle Value)

median(x)

Meaning: The middle number after sorting the values.

Use case: Finding the middle income in a group (less affected by extreme values).

Sum

sum(x)

Meaning: Adds all elements together.

Use case: Total sales, total marks, total expenses.

Length

length(x)

Meaning: Counts the number of elements in the vector.

Use case: Number of students, number of observations.

Standard Deviation (sd)

sd(x)

Meaning: Measures how spread out the values are from the mean.

Small sd → values are close to average

Large sd → values are spread out

Use case: Checking consistency of marks or measurements.

Variance (var)

var(x)

Meaning: Square of standard deviation. Also measures spread.

Use case: Used in statistics and machine learning.
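The relationships described above can be checked by hand. This is only a sketch: manual_mean and manual_var are illustrative names, and the manual variance assumes the sample formula (dividing by n - 1), which is what var() and sd() use in R.

```r
x <- c(4, 7, 1, 9, 2)

# Mean by hand: total divided by count (23 / 5).
manual_mean <- sum(x) / length(x)

# Sample variance by hand: squared distances from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)

print(manual_mean)                    # 4.6, same as mean(x)
print(median(x))                      # sorted: 1 2 4 7 9, so the middle is 4
print(all.equal(var(x), manual_var))  # TRUE
print(all.equal(var(x), sd(x)^2))     # TRUE: variance is sd squared
```

all.equal() is used instead of == because results of decimal arithmetic can differ by tiny rounding errors.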

Summary

summary(x)

Meaning: Gives a quick overview:

Minimum, 1st Quartile, Median, Mean, 3rd Quartile, Maximum

Use case: Quick report of dataset.

Missing Data and na.rm

Sometimes data contains missing values, represented as NA.

x <- c(4, 7, NA, 9, 2)

If we try:

mean(x)

Output: NA

This happens because any calculation that includes a missing value has an unknown result, so R returns NA.

Removing NA Values

Use na.rm = TRUE to remove missing values while calculating.

mean(x, na.rm = TRUE)

sum(x, na.rm = TRUE)

sd(x, na.rm = TRUE)

Meaning: na.rm = TRUE tells the function to remove NA values before calculating.
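A short sketch of the behaviour described above. It also uses is.na(), a base R function not covered in these notes, to show where the missing value sits.

```r
x <- c(4, 7, NA, 9, 2)

print(mean(x))                # NA: one missing value makes the result unknown
print(is.na(x))               # FALSE FALSE TRUE FALSE FALSE
print(sum(is.na(x)))          # 1 missing value in total

print(mean(x, na.rm = TRUE))  # 5.5, the mean of 4, 7, 9, 2
print(sum(x, na.rm = TRUE))   # 22
print(length(x))              # 5: length() still counts the NA
```

Note that na.rm = TRUE only affects the calculation; the NA stays in x.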

| Function or argument | Meaning |
| -------------------- | ------- |
| mean() | Average value |
| median() | Middle value |
| sum() | Total of all values |
| length() | Number of elements |
| sd() | Spread of data |
| var() | Variance (spread squared) |
| summary() | Quick statistics report |
| na.rm = TRUE | Ignore missing values |

Section 4: Subsetting Vectors

Subsetting means selecting specific elements from a vector.

This is heavily used in:

Data Science (filtering data)

Machine Learning (selecting features)

Deep Learning (processing inputs)

---

Example Vector

x <- c(10, 20, 30, 40, 50)

Subsetting by Position

Select an element using its index. R counts positions from 1, so x[2] is the second element.

x[2]

Output: 20

Use case: Accessing a specific observation.

Subsetting Multiple Elements using c()

x[c(1, 3, 5)]

Output: 10 30 50

Meaning: Select positions 1, 3, and 5.

Use case: Selecting specific features from a dataset.

Subsetting a Range using :

x[2:4]

Output: 20 30 40

Meaning: Select all elements from index 2 to 4.

Use case: Selecting a block of rows in data.

Removing Elements using Negative Index

x[-2]

Output:

10 30 40 50

Meaning: Return every element except the one at position 2. The original x is not changed.

Remove multiple elements:

x[-c(1, 5)]

Output: 20 30 40

Use case: Removing unwanted features or data points.
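One detail worth seeing in action: a negative index returns a new vector and leaves the original alone. The sketch below uses illustrative names (without_second, without_ends) that are not from the notes.

```r
x <- c(10, 20, 30, 40, 50)

without_second <- x[-2]      # drop position 2
without_ends <- x[-c(1, 5)]  # drop positions 1 and 5

print(without_second)  # 10 30 40 50
print(without_ends)    # 20 30 40
print(x)               # 10 20 30 40 50: x itself is unchanged

# To make the removal permanent, assign the result back to x.
x <- x[-2]
print(length(x))       # 4
```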

Logical (Boolean) Subsetting

x[x > 25]

Output: 30 40 50

Meaning: Select elements that satisfy a condition.

Use case: Filtering dataset (e.g., age > 18).

Subsetting with Logical Vector

condition <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

x[condition]

Output: 10 30 50

Meaning: TRUE keeps the value, FALSE removes it.

Use case: Masking data in ML preprocessing.
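The two logical-subsetting forms above are really the same thing: a condition such as x > 25 produces a logical vector, which can be stored and reused. A minimal sketch, where mask is just an illustrative name:

```r
x <- c(10, 20, 30, 40, 50)

# The comparison builds the TRUE/FALSE vector for us.
mask <- x > 25

print(mask)       # FALSE FALSE TRUE TRUE TRUE
print(x[mask])    # 30 40 50, same as x[x > 25]

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
print(sum(mask))  # 3
```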

Subsetting by Excluding Condition

x[x <= 30]

Output: 10 20 30

Meaning: Select values less than or equal to 30, which excludes everything above 30.

Finding Index Positions using which()

which(x > 25)

Output: 3 4 5

Meaning: Returns positions where condition is TRUE.

Use case: Locating outliers or specific samples.
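which() and logical subsetting answer two related questions: "where are the matches?" and "what are the matches?". The sketch below shows how they connect; positions is an illustrative name.

```r
x <- c(10, 20, 30, 40, 50)

# which() gives the positions where the condition is TRUE.
positions <- which(x > 25)
print(positions)     # 3 4 5

# Those positions can be used as an ordinary index.
print(x[positions])  # 30 40 50

# Both routes select the same values.
print(identical(x[positions], x[x > 25]))  # TRUE
```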

Subsetting with Names

scores <- c(math = 90, science = 85, english = 88)

scores["science"]

Use case: Accessing labeled features.
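A small sketch of working with names. It uses names() and unname(), two base R functions that are not introduced in the notes above.

```r
scores <- c(math = 90, science = 85, english = 88)

print(names(scores))                 # "math" "science" "english"
print(scores["science"])             # the value 85, printed with its label
print(scores[c("math", "english")])  # several names at once
print(unname(scores["science"]))     # 85 without the label

# Names are kept when filtering by condition.
print(scores[scores >= 88])          # math and english
```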

| Method | Purpose in ML |
| -------------- | -------------------------- |
| x[index] | Select specific samples |
| x[c(...)] | Select important features |
| x[start:end] | Slice dataset |
| x[-index] | Remove noisy data |
| x[condition] | Filter dataset |
| which() | Find positions of patterns |

Section 5: Booleans (Logical Values)

Booleans represent True or False values.

In R, Boolean values are written as:

TRUE
FALSE

They are used for comparisons and filtering data.

---

Comparison Operators

These operators compare values and return TRUE or FALSE.

| Operator | Meaning | Example | Result |
|----------|---------|---------|--------|
| == | Equal to | 5 == 5 | TRUE |
| > | Greater than | 7 > 3 | TRUE |
| < | Less than | 2 < 1 | FALSE |
| >= | Greater than or equal to | 5 >= 5 | TRUE |
| <= | Less than or equal to | 4 <= 6 | TRUE |

---

Using Booleans with Vectors

Booleans work element by element in vectors.

Example Vector

x <- c(10, 20, 30, 40)

Compare Values

x > 25

Output: FALSE FALSE TRUE TRUE

Each element is checked separately.

Subsetting a Vector with a Boolean Condition

This is one of the most important concepts in Data Science.

x[x > 25]

Output: 30 40

Meaning: Keep only values greater than 25.

Another Example

x[x <= 20]

Output: 10 20

Boolean with Equality

x[x == 30]

Output: 30

Real-World Use Cases

| Task | Example |
| ---------------------------- | ------------------------ |
| Filter students who passed | marks[marks >= 40] |
| Select high-income customers | income[income > 50000] |
| Remove invalid values | data[data > 0] |
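The first use case in the table could look like this in practice. The marks below are invented sample values, and the pass mark of 40 is taken from the table.

```r
marks <- c(35, 72, 40, 18, 90)  # hypothetical marks for five students

# Step 1: ask the question for every element.
passed <- marks >= 40
print(passed)         # FALSE TRUE TRUE FALSE TRUE

# Step 2: keep only the elements where the answer is TRUE.
print(marks[passed])  # 72 40 90

# Count how many passed, and average only their marks.
print(sum(passed))    # 3
print(mean(marks[passed]))
```

A mark of exactly 40 is kept because >= includes equality.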

Important Notes

Boolean operations return a vector of TRUE/FALSE.

TRUE keeps the value, FALSE removes it during subsetting.

This is called logical indexing.

Summary

Booleans help R answer questions like:

Is this value bigger?

Is this value equal?

Should we keep this data?

They are the foundation of filtering data in Machine Learning and Data Science.
