R Programming (NEP & CBCS) Questions with Answers
2 Marks (Important)
1. What is R Programming?
R Programming is a programming language and free software environment primarily used for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques and is highly extensible.
2. List the data types in R.
- Numeric: Represents numeric values (e.g., 1.5, -3.2).
- Integer: Represents integer values (e.g., 1L, -3L).
- Logical: Represents Boolean values (TRUE or FALSE).
- Character: Represents text or string values.
- Complex: Represents complex numbers with real and imaginary parts.
- Raw: Represents raw byte values.
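Example: a minimal sketch that checks each of the types above with class(). The sample values are illustrative assumptions, not taken from any dataset.

```r
# One sample value per basic type; class() reports the type R assigns to it.
values <- list(1.5, 3L, TRUE, "text", 2 + 3i, as.raw(255))
types <- sapply(values, class)
print(types)
```

The L suffix is what makes 3L an integer; a plain 3 would be stored as numeric.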
3. What is the use of the tapply() and get() functions?
- tapply(): Used to apply a function over subsets of a vector, split by one or more factors.
Example: Applying the mean function to subsets based on a factor
tapply(mtcars$mpg, mtcars$cyl, mean)
- get(): Used to retrieve the value of an object with a specified name.
Example: Retrieving the value of a variable with the name 'x'
x <- 5
get("x")
4. What is the use of the readline() function?
Used to read a line from the console input. It is often used to interactively input data from the user. The result is always a character string; in non-interactive (batch) use it returns an empty string without waiting.
Example: Reading a line of text from the user
user_input <- readline(prompt = "Enter your name: ")
5. Give an example for the lowess() function.
Computes a locally weighted scatterplot smoothing (LOWESS) fit.
Example: Generating a LOWESS smooth curve
x <- 1:50
y <- rnorm(50)
smooth_curve <- lowess(x, y)
plot(x, y)
lines(smooth_curve)
6. Define correlation function.
Measures the degree of association between two variables. In R, cor() is commonly used.
Example: Calculating correlation between two vectors
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 1, 3, 5)
correlation <- cor(x, y)
7. What is multiple regression?
A statistical technique that models the relationship between multiple independent variables and a dependent variable.
Example: Multiple regression with lm() function
# mydata is a placeholder for a data frame with columns y, x1, and x2
model <- lm(y ~ x1 + x2, data = mydata)
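Example: a runnable sketch of the same idea using the built-in mtcars dataset. The choice of mpg as the dependent variable and wt and hp as predictors is an assumption made only for illustration.

```r
# mtcars ships with R; mpg is modelled from weight (wt) and horsepower (hp).
model <- lm(mpg ~ wt + hp, data = mtcars)

# coef() returns the intercept plus one slope per independent variable.
coefs <- coef(model)
print(coefs)
```

Calling summary(model) additionally reports standard errors, p-values, and R-squared.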
8. What is interactive mode?
Refers to running R code interactively, where commands are entered one at a time, and the results are immediately displayed.
9. What is batch mode?
Refers to running R code as a script or from a file without user interaction. The entire script is executed without user intervention.
10. Give example for cumulative sums and products.
cumsum() and cumprod() functions are used for cumulative sums and products, respectively.
Example: Cumulative sums and products
x <- c(1, 2, 3, 4)
cum_sum <- cumsum(x)   # 1 3 6 10
cum_prod <- cumprod(x) # 1 2 6 24
11. What is the use of point() function?
Base R has no function named point(). The question most likely refers to points(), which adds points to an existing plot at the given coordinates, e.g., points(x, y, col = "red") after a call to plot().
12. Define Poisson distribution.
A discrete probability distribution that describes the number of events occurring within a fixed interval of time or space, given a constant average rate. In R, dpois() gives the probability mass function, ppois() the cumulative distribution function, qpois() the quantile function, and rpois() generates random samples.
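Example: a minimal sketch of the Poisson functions in base R. The rate lambda = 3 is an assumed value chosen for illustration.

```r
lambda <- 3                        # assumed average number of events per interval

p_two <- dpois(2, lambda)          # P(X = 2)
p_up_to_two <- ppois(2, lambda)    # P(X <= 2)

set.seed(1)                        # make the random draws reproducible
draws <- rpois(5, lambda)          # five simulated event counts
print(c(p_two, p_up_to_two))
```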
13. min() vs pmin() function.
min() returns the minimum value among a set of values.
pmin() returns the element-wise minimum of several vectors.
Example:
min_value <- min(3, 7, 1, 9)                # 1
pmin_values <- pmin(c(1, 5, 3), c(2, 4, 6)) # 1 4 3
14. Define binomial distribution.
A discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials with the same success probability. In R, dbinom() gives the probability mass function, pbinom() the cumulative distribution function, qbinom() the quantile function, and rbinom() generates random samples.
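Example: a minimal sketch of the binomial functions in base R. The values n = 10 trials and p = 0.5 are assumptions chosen for illustration.

```r
n <- 10    # assumed number of trials
p <- 0.5   # assumed probability of success in each trial

p_three <- dbinom(3, size = n, prob = p)        # P(X = 3)
p_up_to_three <- pbinom(3, size = n, prob = p)  # P(X <= 3)
print(c(p_three, p_up_to_three))
```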
15. What is spline?
A piecewise polynomial function whose pieces join smoothly at points called knots, used for interpolation or smoothing. In R, the spline() function can be used to generate spline fits.
Example:
x <- 1:10
y <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
spline_fit <- spline(x, y, n = 100, method = "natural")
plot(x, y)
lines(spline_fit, col = "red")
16. What is R, and what are its main characteristics?
R is a programming language and environment widely used for solving data science problems and particularly designed for statistical computing and data visualization.
Its main characteristics include:
- Open source
- Interpreted (i.e., code is executed directly, without a separate compilation step)
- Multi-paradigm (i.e., it supports both functional and object-oriented programming)
- Highly extensible due to its large collection of data science packages
- Functional and flexible
- Compatible with many operating systems
- Can be easily integrated with other programming languages and frameworks
- Allows powerful statistical computing
- Offers a variety of data visualization tools for creating publication-quality charts
- Equipped with the command-line interface
- Supported by a strong online community
17. What are some disadvantages of using R?
- Non-intuitive syntax and hence a steep learning curve, especially for beginners in programming
- Relatively slow
- Inefficient memory usage
- Inconsistent and often hard-to-read documentation of packages
- Some packages are of low quality or poorly-maintained
- Potential security concerns due to its open-source nature
18. List and define some basic data types in R.
- Numeric: decimal numbers.
- Integer: whole numbers.
- Character: a letter, number, or symbol, or any combination of them, enclosed in double or single quotation marks.
- Factor: categories from a predefined set of possible values, often with an intrinsic order.
- Logical: the Boolean values TRUE and FALSE, which are treated as 1 and 0, respectively, in arithmetic.
19. List and define some basic data structures in R.
- Vector: a one-dimensional data structure used for storing values of the same data type.
- List: a generic, possibly nested data structure used for storing values of any data type and/or other data structures.
- Matrix: a two-dimensional data structure used for storing values of the same data type.
- Data frame: a two-dimensional data structure used for storing values of any data type, but each column must store values of the same data type.
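Example: a minimal sketch that creates one object of each structure listed above. All names and values are illustrative assumptions.

```r
v <- c(1, 2, 3)                                   # vector: one type only
l <- list(1, "a", TRUE, c(4, 5))                  # list: mixed types, can nest
m <- matrix(1:6, nrow = 2)                        # matrix: 2 rows x 3 columns
df <- data.frame(id = 1:3,                        # data frame: columns may differ
                 name = c("a", "b", "c"))         # in type from one another

str(df)                                           # compact summary of the structure
```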
20. How to import data in R?
Base R provides essential functions for importing data:
- read.table(): the most general base R function for importing data; takes in tabular data with any kind of field separator, including specific ones, such as |.
- read.csv(): comma-separated values (CSV) files with . as the decimal separator.
- read.csv2(): semicolon-separated values files with , as the decimal separator.
- read.delim(): tab-separated values (TSV) files with . as the decimal separator.
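Example: a minimal sketch of importing a CSV file with base R. The file is a temporary one written by the example itself, and its contents are made up for illustration.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,9.5", "Bob,7.25"), csv_path)

# read.csv() assumes a header row, comma separators, and . as the decimal mark.
scores <- read.csv(csv_path)

# The same file read with the general function and explicit settings.
same_scores <- read.table(csv_path, header = TRUE, sep = ",")
print(scores)
```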
21. What is a package in R?
An R package is a collection of functions, code, data, and documentation, representing an extension of the R programming language and designed for solving specific kinds of tasks.
22. What is a factor in R?
A factor in R is a specific data type that accepts categories (aka levels) from a predefined set of possible values. These categories look like characters, but under the hood, they are stored as integers. Often, such categories have an intrinsic order.
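Example: a minimal sketch of an ordered factor, showing the levels and the integer codes stored under the hood. The size categories are an assumption chosen for illustration.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

levels(sizes)       # the predefined set of categories, in order
as.integer(sizes)   # the underlying integer codes: 1 3 2 1
sizes[1] < sizes[2] # comparisons work because the factor is ordered
```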
23. What is RStudio?
RStudio is an open-source IDE (integrated development environment) that is widely used as a graphical front-end for working with the R programming language starting from version 3.0.1.
24. How to create a user-defined function in R?
function_name <- function(parameters)
{
  function body
}
- Function name: the name of the function object that will be used for calling the function after its definition.
- Function parameters: the variables separated with a comma and placed inside the parentheses that will be set to actual argument values each time we call the function.
- Function body: a chunk of code in the curly brackets containing the operations to be performed in a predefined order on the input arguments each time we call the function.
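Example: a minimal sketch of the template above. The function name rectangle_area and its parameters are hypothetical, chosen only for illustration.

```r
# height has a default value, so it can be omitted when calling the function.
rectangle_area <- function(width, height = 1) {
  width * height   # the value of the last expression is returned
}

area <- rectangle_area(3, 4)
print(area)
```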
25. List some popular data visualization packages in R.
- ggplot2: the most popular R data visualization package, allowing the creation of a wide variety of plots.
- lattice: for displaying multivariate data as a tiled panel (trellis) of several plots.
- plotly: for creating interactive, publication-quality charts.
- highcharter: for easy dynamic plotting; offers many flexible features, plugins, and themes, and allows charting different R objects with one function.
- leaflet: for creating interactive maps.
26. How to assign a value to a variable in R?
- Using the assignment operator <-, e.g., my_var <- 1: the most common way of assigning a value to a variable in R.
- Using the equal operator =, e.g., my_var = 1: mainly used for assigning values to arguments inside a function definition or call.
- Using the rightward assignment operator ->, e.g., 1 -> my_var: can be used at the end of a pipe.
- Using the global assignment operators, either leftward (<<-) or rightward (->>), e.g., my_var <<- 1: for creating or modifying a global variable inside a function definition.
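Example: a minimal sketch of the assignment operators. All variable and function names here are hypothetical.

```r
a <- 1          # leftward assignment, the usual style
b = 2           # equal sign also assigns at the top level
3 -> c_val      # rightward assignment: the value is on the left

counter <- 0
increment <- function() {
  # <<- modifies counter in the enclosing environment instead of
  # creating a new local variable inside the function.
  counter <<- counter + 1
}
increment()
increment()
print(counter)
```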
27. What are the requirements for naming variables in R?
- A variable name can be a combination of letters, digits, dots, and underscores. It can't contain any other symbols, including white spaces.
- A variable name must start with a letter or a dot.
- If a variable name starts with a dot, this dot can't be followed by a digit.
- Reserved words in R (TRUE, for, NULL, etc.) can't be used as variable names.
- Variable names are case-sensitive.
28. What types of loops exist in R, and what is the syntax of each type?
- For loop: iterates over a sequence the number of times equal to its length (unless the statements break and/or next are used) and performs the same set of operations on each item of that sequence.
for (variable in sequence)
{
operations
}
- While loop: performs the same set of operations for as long as a predefined logical condition (or several logical conditions) remains true, unless the statements break and/or next are used.
while (logical condition)
{
operations
variable update
}
- Repeat loop: repeatedly performs the same set of operations until a predefined break condition (or several break conditions) is met.
repeat
{
operations
if(break condition)
{
break
}
}
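Example: a minimal sketch that computes the same sum (1 + 2 + 3 + 4 + 5) with each of the three loop types. The variable names are illustrative assumptions.

```r
# for: iterates once per element of the sequence
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while: runs as long as the condition stays TRUE
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1               # variable update, otherwise the loop never ends
}

# repeat: runs until break is reached
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) {
    break
  }
}
print(c(total_for, total_while, total_repeat))
```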
29. How to aggregate data in R?
To aggregate data in R, we use the aggregate() function. This function has the following essential parameters, in this order:
- x: the data frame to aggregate.
- by: a list of the factors to group by.
- FUN: an aggregate function to compute the summary statistics for each group (e.g., mean, max, min, length, sum).
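Example: a minimal sketch of aggregate() with the three parameters above, using the built-in mtcars dataset. Grouping mpg by cyl is an assumption chosen for illustration.

```r
# Mean miles per gallon for each number of cylinders.
mpg_by_cyl <- aggregate(x = mtcars["mpg"],
                        by = list(cyl = mtcars$cyl),
                        FUN = mean)
print(mpg_by_cyl)
```

The result is a data frame with one row per group: the grouping column (cyl) followed by the aggregated column (mpg).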
30. What is the difference between the functions apply(), lapply(), sapply(), and tapply()?
While all these functions allow iterating over a data structure without using loops and perform the same operation on each element of it, they are different in terms of the type of input and output and the function they perform.
- apply(): takes in a data frame, a matrix, or an array and returns a vector, a list, a matrix, or an array. This function can be applied row-wise, column-wise, or both.
- lapply(): takes in a vector, a list, or a data frame and always returns a list. In the case of a data frame as an input, this function is applied only column-wise.
- sapply(): takes in a vector, a list, or a data frame and returns the most simplified data structure possible, typically a vector or a matrix, falling back to a list when the results cannot be simplified.
- tapply(): calculates summary statistics for different factors (i.e., categorical data).
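Example: a minimal sketch that calls each of the four functions once. The input values are made up for illustration.

```r
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)                     # 1 = row-wise, 2 = column-wise
squares_list <- lapply(1:3, function(v) v^2)     # always a list
squares_vec <- sapply(1:3, function(v) v^2)      # simplified to a vector

# Mean of the values within each group label.
mean_by_group <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), mean)
print(mean_by_group)
```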
31. List and define the control statements in R.
There are three groups of control statements in R: conditional statements, loop statements, and jump statements.
Conditional statements:
- if: tests whether a given condition is true and provides operations to perform if it's true.
- if-else: tests whether a given condition is true, provides operations to perform if it's so and another set of operations to perform in the opposite case.
- if... else if... else: tests a series of conditions one by one, provides operations to perform for each condition if it's true, and a fallback set of operations to perform if none of those conditions is true.
- switch: evaluates an expression against the items of a list and returns a value from the list based on the results of this evaluation.
Loop statements:
- for: in for loops, iterates over a sequence.
- while: in while loops, checks if a predefined logical condition (or several logical conditions) is met at the current iteration.
- repeat: in repeat loops, continues performing the same set of operations until a predefined break condition (or several break conditions) is met.
Jump statements:
- next: skips a particular iteration of a loop and jumps to the next one if a certain condition is met.
- break: stops and exits the loop at a particular iteration if a certain condition is met.
- return: exits a function and returns the result.
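Example: a minimal sketch combining conditional, loop, and jump statements. The function name classify and the labels used are hypothetical.

```r
# if... else if... else with return
classify <- function(x) {
  if (x < 0) {
    return("negative")
  } else if (x == 0) {
    return("zero")
  } else {
    return("positive")
  }
}

# switch: an empty item falls through to the next one; the unnamed
# last item is the default when nothing matches.
day_type <- switch("sat", sat = , sun = "weekend", "weekday")

# next skips even numbers; break stops the loop once i exceeds 7.
odd_total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next
  if (i > 7) break
  odd_total <- odd_total + i
}
print(c(classify(-2), day_type, odd_total))
```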
32. What are correlation and covariance, and how do you calculate them in R?
Covariance measures the degree to which two variables change together, while correlation is a standardized measure of covariance that ranges from -1 to 1, indicating the strength and direction of the relationship. In R, cov(x, y) computes the covariance and cor(x, y) computes the correlation (Pearson by default).
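Example: a minimal sketch using the base R functions cov() and cor(), with two small vectors made up for illustration. The last line checks that correlation is covariance divided by the product of the standard deviations.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 1, 3, 5)

covariance <- cov(x, y)    # sample covariance
correlation <- cor(x, y)   # Pearson correlation by default

# Standardizing the covariance by both standard deviations gives the correlation.
rescaled <- covariance / (sd(x) * sd(y))
print(c(covariance, correlation, rescaled))
```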