Descriptive Biostatistics in RStudio: Complete Guide with Biological Dataset and R Code

Introduction

Descriptive biostatistics is a fundamental component of data analysis in biological and health sciences. It involves summarizing, organizing, and interpreting data to understand patterns and trends. In modern research, tools like RStudio have become essential for performing statistical analysis efficiently and accurately.

This article provides a complete guide to descriptive biostatistics in RStudio, including a biological dataset (hemoglobin values), detailed explanations of statistical measures, and ready-to-use R scripts. This guide is especially useful for students, researchers, and beginners in biostatistics.

Biological Dataset (Hemoglobin Values)

Below is the dataset used for analysis:

Observation   Hemoglobin (g/dL)
1             12.5
2             13.2
3             11.8
4             14.0
5             13.5
6             14.2
7             11.5
8             14.8
9             12.9
10            14.1

👉 This dataset represents hemoglobin levels of individuals, a common biological parameter used in medical studies.

What is Descriptive Biostatistics?

Descriptive biostatistics refers to statistical methods used to summarize and describe the main features of biological data. It includes:

  • Measures of central tendency
  • Measures of dispersion
  • Measures of position
  • Measures of shape

These measures help researchers understand the distribution and variability of biological data.

Measures of Central Tendency

These measures indicate the central value of the dataset.

Mean

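The arithmetic average: the sum of all hemoglobin values divided by the number of observations. For this dataset, 132.5 / 10 = 13.25 g/dL.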

Median

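The middle value when the data are arranged in order. With an even number of observations, as here, it is the average of the two middle values: (13.2 + 13.5) / 2 = 13.35 g/dL.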

Mode

The most frequently occurring value.
👉 In this dataset, no mode exists (amodal distribution).
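
The three measures above can also be worked out step by step. The following is a minimal sketch, assuming the ten hemoglobin values from the dataset; the variable names (manual_mean, manual_median, has_mode) are chosen here for illustration only.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Mean: sum of the values divided by the number of values
manual_mean <- sum(x) / length(x)

# Median: with an even n, the average of the two middle sorted values
sorted_x <- sort(x)
n <- length(x)
manual_median <- (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2

# Mode: count how often each value occurs;
# a maximum count of 1 means every value is unique (no mode)
counts <- table(x)
has_mode <- max(counts) > 1

manual_mean
manual_median
has_mode
```

The manual results should agree with the built-in mean(x) and median(x), which is a quick way to confirm the definitions.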

Measures of Dispersion

These show how spread out the data is.

  • Range
  • Variance
  • Standard Deviation
  • Interquartile Range (IQR)

👉 A higher standard deviation indicates more variability in hemoglobin levels.
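
To make the four dispersion measures concrete, here is a hedged sketch that computes each one from its definition, assuming the same ten hemoglobin values. The variable names are illustrative, and the variance shown is the sample variance (divisor n - 1), which is what R's var() computes.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)
n <- length(x)

# Range: largest value minus smallest value
value_range <- max(x) - min(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation: square root of the variance (same units as the data)
sample_sd <- sqrt(sample_var)

# Interquartile range: third quartile minus first quartile
iqr_value <- unname(diff(quantile(x, probs = c(0.25, 0.75))))

value_range
sample_var
sample_sd
iqr_value
```

Because the standard deviation is in g/dL, it is usually easier to interpret than the variance, which is in squared units.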

Measures of Position

These divide the dataset into equal parts:

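  • Quartiles (Q1, Q2, Q3): three cut points that split the sorted data into four equal parts; Q2 is the median
  • Percentiles: 99 cut points that split the sorted data into 100 equal parts
  • Deciles: nine cut points that split the sorted data into ten equal parts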
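
With only ten observations, a quartile rarely falls exactly on a data point, so software interpolates between neighbors. The sketch below reproduces the interpolation used by R's default quantile method (type 7); the helper name quantile_type7 is hypothetical and exists only for this illustration.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Hypothetical helper: linear interpolation between the two nearest
# sorted values, at position h = 1 + (n - 1) * p
quantile_type7 <- function(values, p) {
  sorted_values <- sort(values)
  h <- 1 + (length(values) - 1) * p
  lower <- floor(h)
  upper <- ceiling(h)
  sorted_values[lower] +
    (h - lower) * (sorted_values[upper] - sorted_values[lower])
}

q1 <- quantile_type7(x, 0.25)
q2 <- quantile_type7(x, 0.50)
q3 <- quantile_type7(x, 0.75)

c(Q1 = q1, Q2 = q2, Q3 = q3)
```

Other quantile definitions exist (the type argument of quantile() selects among them), so quartiles reported by different software can differ slightly for small samples.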

Measures of Shape

These describe data distribution:

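  • Skewness → asymmetry of the distribution (0 for a perfectly symmetric distribution; negative when the left tail is longer)
  • Kurtosis → heaviness of the tails, often loosely described as peakedness (a normal distribution has a kurtosis of 3)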
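
Both shape measures can be computed in base R from central moments. This is a sketch under stated assumptions: it uses the moment-based (population) formulas, which to my understanding are the same definitions the moments package applies, and the helper name central_moment is hypothetical.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Hypothetical helper: k-th central moment, the mean of (x - mean)^k
central_moment <- function(values, k) {
  mean((values - mean(values))^k)
}

m2 <- central_moment(x, 2)
m3 <- central_moment(x, 3)
m4 <- central_moment(x, 4)

# Skewness: third central moment scaled by m2^(3/2)
skew <- m3 / m2^(3 / 2)

# Kurtosis (Pearson): fourth central moment scaled by m2^2;
# subtract 3 to obtain excess kurtosis
kurt <- m4 / m2^2

skew
kurt
```

For this dataset the skewness comes out slightly negative and the kurtosis below 3, though with only ten observations both estimates are too noisy to support strong conclusions about the underlying distribution.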

R Script for Descriptive Statistics

Below is the complete R code for this dataset:

# Create dataset
data <- data.frame(
  Hemoglobin = c(12.5,13.2,11.8,14.0,13.5,14.2,11.5,14.8,12.9,14.1)
)

x <- data$Hemoglobin

# -----------------------------
# Basic Descriptive Statistics
# -----------------------------

mean(x)
median(x)

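# Mode function: returns the most frequent value(s),
# or NA when every value occurs equally often (no mode)
mode_func <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  if (length(ux) > 1 && all(counts == counts[1])) return(NA)
  ux[counts == max(counts)]
}
mode_func(x)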

min(x)
max(x)
range(x)        # returns c(min, max), not a single number
diff(range(x))  # range as a single value (max - min)
sum(x)
length(x)

# -----------------------------
# Dispersion
# -----------------------------

var(x)
sd(x)
IQR(x)
quantile(x)

# -----------------------------
# Position Measures
# -----------------------------

quantile(x, probs = c(0.25, 0.5, 0.75))
quantile(x, probs = seq(0.1, 0.9, by = 0.1))

# -----------------------------
# Shape Measures
# -----------------------------

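# Install the moments package only if it is not already available
if (!requireNamespace("moments", quietly = TRUE)) install.packages("moments")
library(moments)

skewness(x)
kurtosis(x)  # Pearson's kurtosis: a normal distribution gives 3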

Output Summary Table (Example)

Statistic            Value (Approx.)
Mean                 13.25
Median               13.35
Mode                 No mode
Minimum              11.5
Maximum              14.8
Range                3.3
Variance             1.17
Standard Deviation   1.08
IQR                  ~1.48
Sample Size          10

Interpretation of Results

  • The mean hemoglobin level (13.25 g/dL) summarizes the typical level in the group; on its own it does not describe any individual's health status.
  • The median (13.35 g/dL) is close to the mean, suggesting a roughly symmetric distribution; the median being slightly higher hints at a mild left skew.
  • No mode exists because all ten values are unique.
  • The standard deviation (~1.08 g/dL, about 8% of the mean) shows moderate variability.
  • The dataset shows modest variation and no extreme outliers: all values lie between 11.5 and 14.8 g/dL.
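
The statement about outliers can be checked explicitly. The sketch below uses Tukey's 1.5 × IQR rule, which is one common convention and an assumption here, since the article does not say which outlier criterion it has in mind.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# First and third quartiles (R's default quantile method)
q <- quantile(x, probs = c(0.25, 0.75))
iqr_value <- IQR(x)

# Tukey fences: values beyond 1.5 * IQR from the quartiles are flagged
lower_fence <- q[[1]] - 1.5 * iqr_value
upper_fence <- q[[2]] + 1.5 * iqr_value

outliers <- x[x < lower_fence | x > upper_fence]

c(lower = lower_fence, upper = upper_fence)
length(outliers)  # number of flagged values
```

Under this rule no observation is flagged, which is consistent with the interpretation above. A boxplot (boxplot(x)) gives a similar visual check, although it computes its fences from hinges rather than from the default quartiles.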

Importance in Biological Research

Descriptive statistics are widely used in:

  • Clinical trials
  • Epidemiological studies
  • Laboratory experiments
  • Public health research

They help in:

  • Summarizing large datasets
  • Identifying trends
  • Supporting further inferential analysis

Advantages of Using RStudio

  • Free and open-source
  • Powerful statistical functions
  • Easy data visualization
  • Widely used in research and academia

Full R Script Download File

descriptive_biostatistics.txt

Conclusion

Descriptive biostatistics provides essential tools for summarizing and understanding biological data. Using RStudio, researchers can efficiently compute statistical measures such as mean, median, variance, and skewness. This guide demonstrated how to analyze hemoglobin data using R, making it a practical reference for students and professionals.
