Introduction
Descriptive biostatistics is a fundamental component of data analysis in the biological and health sciences. It involves summarizing, organizing, and interpreting data to understand patterns and trends. In modern research, R and the RStudio environment built around it have become widely used tools for performing statistical analysis efficiently and reproducibly.
This article provides a complete guide to descriptive biostatistics in RStudio, including a biological dataset (hemoglobin values), detailed explanations of statistical measures, and ready-to-use R scripts. This guide is especially useful for students, researchers, and beginners in biostatistics.
Biological Dataset (Hemoglobin Values)
Below is the dataset used for analysis:
| Observation | Hemoglobin (g/dL) |
|---|---|
| 1 | 12.5 |
| 2 | 13.2 |
| 3 | 11.8 |
| 4 | 14.0 |
| 5 | 13.5 |
| 6 | 14.2 |
| 7 | 11.5 |
| 8 | 14.8 |
| 9 | 12.9 |
| 10 | 14.1 |
👉 This dataset represents hemoglobin levels of individuals, a common biological parameter used in medical studies.
What is Descriptive Biostatistics?
Descriptive biostatistics refers to statistical methods used to summarize and describe the main features of biological data. It includes:
- Measures of central tendency
- Measures of dispersion
- Measures of position
- Measures of shape
These measures help researchers understand the distribution and variability of biological data.
Measures of Central Tendency
These measures indicate the central value of the dataset.
Mean
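The arithmetic average of the hemoglobin values: their sum divided by the number of observations.
Median
The middle value when the data are sorted. With an even number of observations, as in this dataset, it is the average of the two middle values.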
Mode
The most frequently occurring value.
👉 In this dataset, no mode exists (amodal distribution).
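As a minimal sketch of what these three measures compute, the snippet below derives the mean and median by hand in base R and tabulates value frequencies to check for a mode. The variable names (`hb`, `manual_mean`, `manual_median`, `has_mode`) are illustrative and not part of the original script.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Mean: sum of the values divided by the number of values
manual_mean <- sum(hb) / length(hb)

# Median: with an even n, the average of the two middle sorted values
sorted_hb <- sort(hb)
n <- length(hb)
manual_median <- (sorted_hb[n / 2] + sorted_hb[n / 2 + 1]) / 2

# Mode: count how often each value occurs; a maximum count of 1 means no mode
freq <- table(hb)
has_mode <- max(freq) > 1

manual_mean     # agrees with mean(hb)
manual_median   # agrees with median(hb)
has_mode        # FALSE for this dataset
```

The hand-computed median only covers the even-`n` case shown here; the built-in `median()` handles both even and odd sample sizes.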
Measures of Dispersion
These show how spread out the data is.
- Range
- Variance
- Standard Deviation
- Interquartile Range (IQR)
👉 A higher standard deviation indicates more variability in hemoglobin levels.
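To make the definitions concrete, here is a small base R sketch that computes each dispersion measure from its formula. It assumes the sample (n - 1) definition of variance, which is what R's `var()` and `sd()` use, and R's default quantile method for the IQR. The variable names are illustrative.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)
n <- length(hb)

# Range as a single number: largest value minus smallest value
hb_range <- max(hb) - min(hb)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((hb - mean(hb))^2) / (n - 1)

# Standard deviation: square root of the variance, back in g/dL
manual_sd <- sqrt(manual_var)

# IQR: distance between the third and first quartiles
q <- quantile(hb, probs = c(0.25, 0.75), names = FALSE)
manual_iqr <- q[2] - q[1]

round(c(range = hb_range, variance = manual_var, sd = manual_sd, iqr = manual_iqr), 3)
```

For this dataset the results are a range of 3.3, a variance of about 1.167, a standard deviation of about 1.080, and an IQR of 1.475.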
Measures of Position
These locate the values that divide the sorted data into equal-sized parts:
- Quartiles (Q1, Q2, Q3): four parts
- Percentiles: one hundred parts
- Deciles: ten parts
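Quartiles, deciles, and percentiles usually fall between two observed values, so software has to interpolate. The sketch below implements one common rule, linear interpolation between neighboring sorted values, which corresponds to R's default `quantile(type = 7)`. The helper name `interp_quantile` is hypothetical, and other quantile types can give slightly different answers on small samples.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Quantile by linear interpolation between order statistics
# (hypothetical helper mirroring R's default quantile type 7)
interp_quantile <- function(x, p) {
  s <- sort(x)
  h <- (length(s) - 1) * p + 1   # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  s[lo] + (h - lo) * (s[hi] - s[lo])
}

q1 <- interp_quantile(hb, 0.25)  # first quartile
q2 <- interp_quantile(hb, 0.50)  # second quartile = median
q3 <- interp_quantile(hb, 0.75)  # third quartile
d9 <- interp_quantile(hb, 0.90)  # ninth decile = 90th percentile

c(Q1 = q1, Q2 = q2, Q3 = q3, D9 = d9)
```

With ten observations, Q1 sits at position 3.25 in the sorted data, a quarter of the way from 12.5 to 12.9, which gives 12.6.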
Measures of Shape
These describe data distribution:
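- Skewness → asymmetry of the distribution (0 for a symmetric distribution; negative when the left tail is longer, positive when the right tail is longer)
- Kurtosis → heaviness of the tails relative to a normal distribution (often loosely described as peakedness)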
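Both shape measures can be computed in base R from central moments, which shows what a package function is doing behind the scenes. This sketch assumes the moment-based definitions with an n denominator, which should match what the `moments` package reports; other software may apply small-sample corrections and return slightly different values. The helper `central_moment` is illustrative.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# k-th central moment with an n denominator (illustrative helper)
central_moment <- function(x, k) mean((x - mean(x))^k)

m2 <- central_moment(hb, 2)
m3 <- central_moment(hb, 3)
m4 <- central_moment(hb, 4)

# Skewness: third central moment scaled by m2^(3/2); 0 for a symmetric distribution
skew <- m3 / m2^(3 / 2)

# Pearson kurtosis: fourth central moment scaled by m2^2; 3 for a normal distribution
kurt <- m4 / m2^2
excess_kurt <- kurt - 3   # excess kurtosis: 0 for a normal distribution

round(c(skewness = skew, kurtosis = kurt, excess = excess_kurt), 2)
```

For the hemoglobin data this gives a skewness of about -0.30 (a mild left skew) and a Pearson kurtosis of about 1.95, below the normal-distribution reference of 3. With only ten observations, both estimates are imprecise.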
R Script for Descriptive Statistics
Below is the complete R code for your dataset:
# Create dataset
data <- data.frame(
Hemoglobin = c(12.5,13.2,11.8,14.0,13.5,14.2,11.5,14.8,12.9,14.1)
)
x <- data$Hemoglobin
# -----------------------------
# Basic Descriptive Statistics
# -----------------------------
mean(x)
median(x)
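# Mode function: returns the most frequent value(s), or NA when no value repeats
mode_func <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  if (max(counts) == 1) return(NA)  # every value occurs once, so there is no mode
  ux[counts == max(counts)]
}
mode_func(x)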
min(x)
max(x)
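range(x)        # returns the minimum and maximum as a pair
diff(range(x))  # the range as a single number (max - min)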
sum(x)
length(x)
# -----------------------------
# Dispersion
# -----------------------------
var(x)
sd(x)
IQR(x)
quantile(x)
# -----------------------------
# Position Measures
# -----------------------------
quantile(x, probs = c(0.25, 0.5, 0.75))
quantile(x, probs = seq(0.1, 0.9, by = 0.1))
# -----------------------------
# Shape Measures
# -----------------------------
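# Install 'moments' only if it is not already available
if (!requireNamespace("moments", quietly = TRUE)) install.packages("moments")
library(moments)
skewness(x)
kurtosis(x)  # Pearson kurtosis (normal distribution = 3), not excess kurtosis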
Output Summary Table (Example)
| Statistic | Value (Approx.) |
|---|---|
| Mean | 13.25 |
| Median | 13.35 |
| Mode | No mode |
| Minimum | 11.5 |
| Maximum | 14.8 |
| Range | 3.3 |
| Variance | 1.17 |
| Standard Deviation | 1.08 |
| IQR | 1.475 |
| Sample Size | 10 |
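A summary table like this can also be generated directly from the data, which avoids copying numbers by hand. The following is a minimal sketch in base R; the object name `summary_table` and the rounding to three decimals are choices made for this example.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Collect the summary statistics in one data frame, rounded for reporting
summary_table <- data.frame(
  Statistic = c("Mean", "Median", "Minimum", "Maximum", "Range",
                "Variance", "Standard Deviation", "IQR", "Sample Size"),
  Value = round(c(mean(hb), median(hb), min(hb), max(hb), diff(range(hb)),
                  var(hb), sd(hb), IQR(hb), length(hb)), 3)
)

print(summary_table, row.names = FALSE)
```

The mode is left out because it is not numeric here: every value occurs exactly once.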
Interpretation of Results
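- The mean hemoglobin level (13.25 g/dL) summarizes the typical value in the group; on its own it does not describe the health status of any individual.
- The median (13.35 g/dL) is close to the mean, suggesting a roughly symmetric distribution; the mean falling slightly below the median hints at a mild left skew.
- No mode exists because all ten values are unique.
- The standard deviation (about 1.08 g/dL) shows moderate variability relative to the mean.
- No value lies more than 1.5 × IQR beyond the quartiles, so the dataset has no outliers by that common rule.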
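The statements about variability and outliers can be checked numerically. The sketch below applies one common convention, Tukey's 1.5 × IQR rule, and computes the coefficient of variation; both are illustrative choices rather than the only valid criteria, and the variable names are made up for this example.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Tukey's rule: flag values more than 1.5 * IQR beyond the quartiles
q <- quantile(hb, probs = c(0.25, 0.75), names = FALSE)
fence_width <- 1.5 * (q[2] - q[1])
lower_fence <- q[1] - fence_width
upper_fence <- q[2] + fence_width
outliers <- hb[hb < lower_fence | hb > upper_fence]

# Coefficient of variation: standard deviation as a percentage of the mean
cv_percent <- 100 * sd(hb) / mean(hb)

length(outliers)       # 0 here: no value falls outside the fences
round(cv_percent, 1)   # relative variability in percent
```

Here the fences fall at roughly 10.39 and 16.29 g/dL, so all ten observations lie inside them, and the coefficient of variation is a little over 8%.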
Importance in Biological Research
Descriptive statistics are widely used in:
- Clinical trials
- Epidemiological studies
- Laboratory experiments
- Public health research
They help in:
- Summarizing large datasets
- Identifying trends
- Supporting further inferential analysis
Advantages of Using RStudio
- Free and open-source
- Powerful statistical functions
- Easy data visualization
- Widely used in research and academia
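As a brief illustration of the visualization point, the hemoglobin data can be plotted with two base R functions. This is a sketch using only base graphics; the titles, labels, and color are arbitrary choices for this example.

```r
# Hemoglobin values (g/dL) from the dataset table
hb <- c(12.5, 13.2, 11.8, 14.0, 13.5, 14.2, 11.5, 14.8, 12.9, 14.1)

# Histogram: shows the overall shape of the distribution
hist(hb,
     main = "Hemoglobin distribution",
     xlab = "Hemoglobin (g/dL)",
     col = "lightblue")

# Boxplot: shows the median, the box hinges, and any points beyond the whiskers
bp <- boxplot(hb,
              main = "Hemoglobin boxplot",
              ylab = "Hemoglobin (g/dL)")

bp$stats  # the five numbers the box and whiskers are drawn from
bp$out    # points plotted individually as potential outliers (none here)
```

Note that `boxplot()` draws the box from hinges, which can differ slightly from the quartiles returned by `quantile()` on small samples.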
Conclusion
Descriptive biostatistics provides essential tools for summarizing and understanding biological data. Using RStudio, researchers can efficiently compute statistical measures such as mean, median, variance, and skewness. This guide demonstrated how to analyze hemoglobin data using R, making it a practical reference for students and professionals.