Cluster sampling is a probability sampling method in
which the population is divided into separate groups, or clusters, and a simple random sample
of those clusters is then selected. Unlike stratified sampling, where a sample is
taken from every stratum, cluster sampling involves selecting only some of the clusters for
the sample.
In cluster sampling, units within a cluster can vary considerably, and the
clusters need not be as internally homogeneous as the strata in stratified sampling. Cluster
sampling can be implemented in two ways: single-stage and multistage. In single-stage cluster
sampling, every unit in each selected cluster is included in the sample, while in multistage
cluster sampling, random samples are drawn within the chosen clusters in one or more further stages.
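The difference between the two variants can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function names, the village population, and the sample sizes are all hypothetical, and clusters are drawn with equal probability.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, rng):
    """Select n_clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def two_stage_cluster_sample(clusters, n_clusters, n_units, rng):
    """Select n_clusters at random, then draw up to n_units from each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_units, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: 6 villages (clusters) with 10 households each.
population = {
    f"village_{i}": [f"v{i}_household_{j}" for j in range(10)]
    for i in range(6)
}

rng = random.Random(42)  # fixed seed so the draw is repeatable
single = single_stage_cluster_sample(population, n_clusters=2, rng=rng)
double = two_stage_cluster_sample(population, n_clusters=2, n_units=3, rng=rng)
```

Here `single` holds all 10 households from each of 2 villages, whereas `double` holds only 3 randomly chosen households from each of 2 villages.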
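Compared to stratified sampling, cluster sampling tends to increase sampling
error for the same sample size, because units within a cluster often resemble one another and
so contribute less independent information. In other words, cluster sampling requires a larger
sample size to achieve the same level of precision.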
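One common way to put a rough number on this loss of precision is the design effect, approximated as 1 + (m - 1) * ICC, where m is the number of units sampled per cluster and ICC is the intraclass correlation. Note that this approximation compares a cluster sample with a simple random sample of the same size (not with a stratified sample) and assumes equal-sized clusters. The sketch below uses hypothetical survey figures.

```python
def design_effect(units_per_cluster, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (units_per_cluster - 1) * icc

def effective_sample_size(n_total, units_per_cluster, icc):
    """Size of a simple random sample giving roughly the same precision."""
    return n_total / design_effect(units_per_cluster, icc)

# Hypothetical survey: 50 clusters of 20 respondents each, ICC of 0.05.
deff = design_effect(units_per_cluster=20, icc=0.05)
n_eff = effective_sample_size(n_total=1000, units_per_cluster=20, icc=0.05)
```

Under these assumed values the design effect is about 1.95, so the 1,000 clustered interviews carry roughly the information of about 513 interviews from a simple random sample.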
However, cluster sampling offers cost savings, particularly when travel costs
between clusters are high. Because fieldwork is concentrated in fewer locations, the cost per
respondent falls, and the overall cost of the study may be lower even with a larger sample.
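The cost argument can be illustrated with a simple linear cost model, in which each cluster visited incurs a fixed travel cost and each interview incurs a fixed fieldwork cost. The model and all figures below are assumptions made purely for illustration.

```python
def survey_cost(n_clusters, units_per_cluster, cost_per_cluster, cost_per_unit):
    """Total cost under an assumed linear model: travel plus interviews."""
    travel = n_clusters * cost_per_cluster
    interviews = n_clusters * units_per_cluster * cost_per_unit
    return travel + interviews

# Two hypothetical designs with the same 1,000 interviews.
# Assumed costs: 500 per cluster visited, 10 per interview.
dispersed = survey_cost(n_clusters=200, units_per_cluster=5,
                        cost_per_cluster=500, cost_per_unit=10)
concentrated = survey_cost(n_clusters=50, units_per_cluster=20,
                           cost_per_cluster=500, cost_per_unit=10)

cost_per_respondent_dispersed = dispersed / 1000
cost_per_respondent_concentrated = concentrated / 1000
```

With these assumed figures the dispersed design costs 110,000 (110 per respondent) and the concentrated design costs 35,000 (35 per respondent), which leaves room to enlarge the concentrated sample to offset its higher sampling error.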
Ideally, in stratified sampling, the variation within each stratum should
be small (homogeneous), while in cluster sampling, the variation within each cluster should
be large (heterogeneous), so that each cluster resembles the population as a whole. In practice,
however, the variation within clusters is often beyond the researcher's control.
In practice, cluster sampling is often combined with stratification: the population
is divided into clusters (e.g., towns or cities), the clusters are grouped into strata, and a
sample of clusters is taken from each stratum. This approach ensures that the selected clusters
represent the different strata within the population.
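A minimal sketch of this stratify-then-cluster step follows. The town names, the size strata, and the choice of two clusters per stratum are hypothetical, and clusters are drawn with equal probability within each stratum.

```python
import random
from collections import defaultdict

def stratified_cluster_sample(cluster_strata, clusters_per_stratum, rng):
    """Group clusters by stratum, then draw a random sample of clusters in each."""
    by_stratum = defaultdict(list)
    for cluster, stratum in cluster_strata.items():
        by_stratum[stratum].append(cluster)
    return {
        stratum: rng.sample(sorted(members),
                            min(clusters_per_stratum, len(members)))
        for stratum, members in sorted(by_stratum.items())
    }

# Hypothetical towns (clusters), each labelled with a size stratum.
towns = {
    "town_a": "small", "town_b": "small", "town_c": "small",
    "town_d": "medium", "town_e": "medium", "town_f": "medium",
    "town_g": "large", "town_h": "large",
}

rng = random.Random(1)  # fixed seed so the draw is repeatable
selected = stratified_cluster_sample(towns, clusters_per_stratum=2, rng=rng)
```

Because clusters are drawn separately within each stratum, every stratum is guaranteed to appear in `selected`, which a plain random draw of six towns would not ensure.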
To see how cluster sampling works, consider the example of the urban India
household panel, which used to be the largest consumer panel in the world. Set up by Hindustan
Lever, the panel was configured by splitting all Indian cities and towns (clusters) into groups
based on size and geographical location. About 20 clusters were selected, covering
small, medium, and large urban centres across the north, south, east, and west of India. The
selected towns and cities were then further divided into blocks, and these blocks were stratified
based on variables such as household income and household size. The panel was formed by
randomly selecting homes from the chosen urban blocks to adequately represent all strata. This
multistage sampling process involved selecting cities, then blocks, and finally homes.
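The cities, blocks, homes sequence can be sketched as a recursive draw over a nested sampling frame. This is an illustration only: the frame, the stage sizes, and the function name are hypothetical, and the sketch leaves out the stratification by city size, region, household income, and household size that the actual panel used.

```python
import random

def multistage_sample(node, sizes, rng):
    """Recursively sample a nested frame.

    node  -- dict mapping a unit name to its sub-units, or a list of final units
    sizes -- number of units to draw at each stage, outermost stage first
    """
    take = min(sizes[0], len(node))
    if isinstance(node, list):
        # Final stage: draw the ultimate sampling units (homes).
        return rng.sample(node, take)
    chosen = rng.sample(sorted(node), take)
    return {name: multistage_sample(node[name], sizes[1:], rng) for name in chosen}

# Hypothetical frame: 4 cities, each with 5 blocks of 8 homes.
frame = {
    f"city_{c}": {
        f"block_{c}_{b}": [f"home_{c}_{b}_{h}" for h in range(8)]
        for b in range(5)
    }
    for c in range(4)
}

rng = random.Random(7)  # fixed seed so the draw is repeatable
panel = multistage_sample(frame, sizes=[2, 3, 4], rng=rng)
```

With stage sizes of 2 cities, 3 blocks per city, and 4 homes per block, `panel` contains 24 homes, all of which can be reached by visiting just 6 blocks.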
A similar approach, involving clustering and stratification, is followed when
conducting national surveys of individuals or households in countries across the globe.
In summary, cluster sampling is a probability sampling method that involves
dividing the population into clusters, selecting a random sample of clusters, and then either
surveying every unit in those clusters or sampling within them. While cluster sampling requires
larger sample sizes than stratified sampling to achieve the same precision, it offers cost savings
and can be an effective approach, especially when there are geographical or logistical constraints
in data collection.