Introduction

Descriptive biostatistics is a fundamental component of data analysis in the biological and health sciences. It involves summarizing, organizing, and interpreting data to reveal patterns and trends. In modern research, the R language, typically used through the RStudio IDE, has become a standard tool for performing statistical analysis efficiently and accurately.

This article provides a complete guide to descriptive biostatistics in RStudio, including a biological dataset (hemoglobin values), detailed explanations of statistical measures, and ready-to-use R scripts. This guide is especially useful for students, researchers, and beginners in biostatistics.

Biological Dataset (Hemoglobin Values)

Below is the dataset used for analysis:

Observation	Hemoglobin (g/dL)
1	12.5
2	13.2
3	11.8
4	14.0
5	13.5
6	14.2
7	11.5
8	14.8
9	12.9
10	14.1

👉 This dataset represents hemoglobin levels of individuals, a common biological parameter used in medical studies.

What is Descriptive Biostatistics?

Descriptive biostatistics refers to statistical methods used to summarize and describe the main features of biological data. It includes:

Measures of central tendency
Measures of dispersion
Measures of position
Measures of shape

These measures help researchers understand the distribution and variability of biological data.

Measures of Central Tendency

These measures indicate the central value of the dataset.

Mean

The arithmetic average of the hemoglobin values: their sum divided by the number of observations.
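
As a quick check of the definition, the mean can be computed by hand and compared with R's built-in function. This is a minimal sketch; the vector name x and the helper name manual_mean are chosen here for illustration.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Mean by definition: sum of the observations divided by the sample size
manual_mean <- sum(x) / length(x)

# The hand-computed value should agree with the built-in mean()
all.equal(manual_mean, mean(x))
manual_mean
```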

Median

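The middle value when the data are arranged in order. With an even number of observations, as here (n = 10), it is the average of the two middle values.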
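
The same idea can be sketched in R by sorting the values and picking the middle position(s). The names x_sorted and manual_median are illustrative, not part of the original script.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

x_sorted <- sort(x)
n <- length(x_sorted)

# Even n: average the two middle values; odd n: take the single middle value
manual_median <- if (n %% 2 == 0) {
  mean(x_sorted[c(n / 2, n / 2 + 1)])
} else {
  x_sorted[(n + 1) / 2]
}

# Should agree with the built-in median()
all.equal(manual_median, median(x))
manual_median
```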

Mode

The most frequently occurring value.
👉 In this dataset, no mode exists because every value occurs exactly once.
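
One way to confirm this is to tabulate how often each value occurs. The sketch below treats a dataset as having no mode when the highest frequency is 1; that convention, and the variable names freq and modes, are assumptions made for this example.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Frequency of each distinct value
freq <- table(x)
max(freq)

# Report the most frequent value(s) only if some value repeats;
# otherwise return an empty vector to signal "no mode"
modes <- if (max(freq) > 1) {
  as.numeric(names(freq)[freq == max(freq)])
} else {
  numeric(0)
}
length(modes)
```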

Measures of Dispersion

These show how spread out the data is.

Range
Variance
Standard Deviation
Interquartile Range (IQR)

👉 A higher standard deviation indicates more variability in hemoglobin levels.
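
To make these measures concrete, the sketch below computes the range, sample variance, and standard deviation step by step. It assumes the sample (n - 1) denominator, which is what R's var() and sd() use; the variable names are illustrative.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

n <- length(x)
dev <- x - mean(x)                   # deviation of each value from the mean

value_range <- max(x) - min(x)       # range as a single number
manual_var <- sum(dev^2) / (n - 1)   # sample variance (n - 1 denominator)
manual_sd <- sqrt(manual_var)        # standard deviation, in g/dL

# Both comparisons should return TRUE
all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
```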

Measures of Position

These divide the dataset into equal parts:

Quartiles (Q1, Q2, Q3)
Percentiles
Deciles
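
Quartiles can be defined in several ways, and software packages do not always agree on small samples. The sketch below reproduces R's default rule (type 7, linear interpolation between order statistics) by hand; quantile_type7 is a hypothetical helper written for this example, not a built-in function.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Type 7: position h = (n - 1) * p + 1 in the sorted data,
# then linear interpolation between the neighbouring values
quantile_type7 <- function(values, p) {
  xs <- sort(values)
  h <- (length(xs) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  xs[lo] + (h - lo) * (xs[hi] - xs[lo])
}

q1 <- quantile_type7(x, 0.25)
q3 <- quantile_type7(x, 0.75)
c(Q1 = q1, Q3 = q3, IQR = q3 - q1)

# A different definition (here type 6) gives slightly different quartiles
quantile(x, probs = c(0.25, 0.75), type = 6)
```

When reporting quartiles or the IQR, it helps to state which definition was used so that results can be reproduced in other software.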

Measures of Shape

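These describe the shape of the distribution:

Skewness → asymmetry of the data (negative = longer left tail, positive = longer right tail)
Kurtosis → heaviness of the tails, often loosely described as peakedness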
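
Both measures can be computed in base R from the central moments. The sketch below assumes the moment-based definitions (third and fourth central moments divided by powers of the second, all with an n denominator); other packages may apply bias corrections and give slightly different numbers.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

n <- length(x)
dev <- x - mean(x)

# Central moments with an n denominator
m2 <- sum(dev^2) / n
m3 <- sum(dev^3) / n
m4 <- sum(dev^4) / n

skew <- m3 / m2^1.5        # 0 for a perfectly symmetric sample
kurt <- m4 / m2^2          # 3 for a normal distribution
excess_kurt <- kurt - 3    # "excess" kurtosis relative to the normal

c(skewness = skew, kurtosis = kurt, excess = excess_kurt)
```

For this dataset the skewness is slightly negative and the kurtosis is below 3, which points to a mildly left-skewed, light-tailed sample. With only ten observations, both estimates are very imprecise.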

R Script for Descriptive Statistics

Below is the complete R code for this dataset:

# Create dataset
data <- data.frame(
  Hemoglobin = c(12.5,13.2,11.8,14.0,13.5,14.2,11.5,14.8,12.9,14.1)
)

x <- data$Hemoglobin

# -----------------------------
# Basic Descriptive Statistics
# -----------------------------

mean(x)
median(x)

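# Mode function: returns the most frequent value(s),
# or NA when every value occurs only once (no mode)
mode_func <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  if (max(counts) == 1) return(NA)
  ux[counts == max(counts)]
}
mode_func(x)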

min(x)
max(x)
range(x)        # returns c(min, max), not a single number
diff(range(x))  # range as a single value: max - min
sum(x)
length(x)

# -----------------------------
# Dispersion
# -----------------------------

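var(x)       # sample variance (n - 1 denominator)
sd(x)        # sample standard deviation
IQR(x)       # Q3 - Q1, using R's default quantile definition (type 7)
quantile(x)  # minimum, Q1, median, Q3, maximum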

# -----------------------------
# Position Measures
# -----------------------------

quantile(x, probs = c(0.25, 0.5, 0.75))
quantile(x, probs = seq(0.1, 0.9, by = 0.1))

# -----------------------------
# Shape Measures
# -----------------------------

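# Install the moments package only if it is not already available
if (!requireNamespace("moments", quietly = TRUE)) install.packages("moments")
library(moments)

skewness(x)
kurtosis(x)  # moment-based kurtosis (normal distribution = 3), not excess kurtosis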

Output Summary Table (Example)

Statistic	Value (Approx.)
Mean	13.25
Median	13.35
Mode	No mode
Minimum	11.5
Maximum	14.8
Range	3.3
Variance	1.17
Standard Deviation	1.08
IQR	1.475
Sample Size	10
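
A summary table like this can be generated directly in R rather than typed by hand, which avoids transcription errors. A minimal sketch follows; the data frame name summary_table and the rounding to three decimals are choices made for this example.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Collect the statistics in one data frame, rounded for display
summary_table <- data.frame(
  Statistic = c("Mean", "Median", "Minimum", "Maximum", "Range",
                "Variance", "Standard Deviation", "IQR", "Sample Size"),
  Value = round(c(mean(x), median(x), min(x), max(x), diff(range(x)),
                  var(x), sd(x), IQR(x), length(x)), 3)
)

print(summary_table, row.names = FALSE)
```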

Interpretation of Results

The mean hemoglobin level (13.25 g/dL) is the average value for the group.
The median (13.35 g/dL) is close to the mean, suggesting a roughly symmetric distribution; the median being slightly higher than the mean hints at a mild left skew.
No mode exists because every value occurs only once.
The standard deviation (about 1.08 g/dL) shows moderate variability around the mean.
All values lie between 11.5 and 14.8 g/dL, and none is an extreme outlier.
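
The statement about outliers can be checked with the common 1.5 × IQR rule (Tukey's fences). This rule is one convention among several and is used here as an assumption, not as the only valid definition of an outlier.

```r
# Hemoglobin values (g/dL) from the dataset above
x <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Quartiles and IQR using R's default quantile definition
q <- quantile(x, probs = c(0.25, 0.75))
iqr <- IQR(x)

# Values beyond these fences are flagged as potential outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

outliers <- x[x < lower_fence | x > upper_fence]
length(outliers)
```

An empty outliers vector means no observation falls outside the fences.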

Importance in Biological Research

Descriptive statistics are widely used in:

Clinical trials
Epidemiological studies
Laboratory experiments
Public health research

They help in:

Summarizing large datasets
Identifying trends
Supporting further inferential analysis

Advantages of Using RStudio

Free and open-source
Powerful statistical functions
Easy data visualization
Widely used in research and academia

Conclusion

Descriptive biostatistics provides essential tools for summarizing and understanding biological data. Using RStudio, researchers can efficiently compute statistical measures such as mean, median, variance, and skewness. This guide demonstrated how to analyze hemoglobin data using R, making it a practical reference for students and professionals.

Descriptive Biostatistics in RStudio: Complete Guide with Biological Dataset and R Code