# A Math Primer

## Basic Concepts of Statistics

**Note:** This section is not intended to provide full coverage of statistics; a formal book on statistical methods and applications is more appropriate for that. Instead, this section intends to provide a quick overview of simple statistical approaches used to establish relationships between data and how these can be used in solving some environmental problems.

**Introduction: What is Statistics?**

Statistics is the discipline concerned with the collection, organization, and interpretation of numerical data, especially as it relates to the analysis of population characteristics by inference from sampling. It addresses all elements of numerical analysis, from study planning to the presentation of final results. Statistics, therefore, is more than a compilation of computational techniques. It is a means of learning from data, a way of viewing information, and a servant of all science.

In a simplistic way, we can say that Statistics boils down to two approaches: exploration and adjudication. The purpose of exploration is to uncover patterns and clues within data sets. Adjudication, on the other hand, serves to determine whether the uncovered patterns are valid and can be generalized. Both approaches are equally important, and neither can be minimized in the statistical process of data analysis.

Statistics is a great quantitative tool to help make any method of enquiry more meaningful and, in particular, as objective as possible. However, one must avoid falling into the trap of the “black hole of empiricism”, whereby data are analyzed in the hope of discovering the fundamental “laws” responsible for observed outcomes. One must first establish an explanatory protocol of what these laws/processes might be and then use Statistics (among other tools) to test the appropriateness, and sometimes exactness, of such explanations. This pre-formulation of plausible explanations is at the core of the “scientific method” and is called “hypothesis formulation”. Hypotheses are established as educated hunches to explain observed facts or findings and should be constructed in ways that lead to anticipatory deductions (also called predictions). Such predictions should, of course, be verifiable through data collection and analysis. This is probably where Statistics comes in most handy, helping to judge the extent to which the collected data agree with the established predictions (although Statistics also contributes substantially to the formulation of test protocols and how data might be collected to verify hypotheses).

Statistics thus seeks to make each process of the scientific method (observation, hypothesis formulation, prediction, verification) more **objective** (so that things are observed as they are, without falsification according to some preconceived view) and **reproducible** (so that we might judge things in terms of the degree to which observations might be repeated).

It is not within the scope of this short introduction to go over the full range of possible statistical analyses. In fact, this text explores only selected issues related to statistics, leaving room for a true course in statistics (applied or theoretical) to develop all concepts more fully. Below we will talk succinctly about variables, summary statistics, and the evaluation of linear relationships between two variables.

**A. Measurement**

To perform statistical operations we need an object of analysis. For this, numbers (or codes) are used as the quantitative representation of any specific observation. The assignment of numbers or codes to describe a pre-set subject is called **measurement**. Measurements that can be expressed by more than one value during a study are called **variables**. Examples of variables are AGE of individuals, WEIGHT of objects, or NAME of species. Variables only represent the subject of the measurement, not any intrinsic value or code.

Variables can be classified according to the way in which they are encoded (i.e. numeric, text, date) or according to the scale on which they are measured. Although there exist many ways to classify measurement scales, three will be considered here:

- Nominal (qualitative, categorical)
- Ordinal (semi-quantitative, “ranked”)
- Scale (quantitative, “continuous”, interval/ratio)

**Nominal variables** are categorical attributes that have no inherent order. For example, SEX (male or female) is a nominal variable, as are NAME and EYECOLOR.

**Ordinal variables** are rank-ordered characteristics and responses, for example an opinion graded on a 1-5 scale (5 = strongly agree; 4 = agree; 3 = undecided; 2 = disagree; 1 = strongly disagree). Although the categories can be put in ascending (or descending) order, distances (“differences”) between possible responses are uneven (i.e. the distance between “strongly agree” and “agree” is not the same as the distance between “agree” and “undecided”). This makes the measurement ordinal, not scaled.

**Scale variables** represent quantitative measurements in which differences between possible responses are uniform (or continuous). For example, LENGTH (measured in centimeters) is a scale measurement. No matter how far you cut the measurement down into smaller fractions (i.e. a tenth of a centimeter), the difference between one measurement and the next remains the same (i.e. the difference between 3 centimeters and 2 centimeters, or 3 millimeters and 2 millimeters, is the same as that between 2 cm and 1 cm, or 2 mm and 1 mm).

Notice that each step up the measurement scale hierarchy takes on the assumptions of the step below it and then adds another restriction. That is, nominal variables are named categories. Ordinal variables are named categories that can be put into logical order. Scale variables are ordinal variables that have equal distances between possible responses.
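
As a rough illustration of this hierarchy, the short Python sketch below (all variable names and values are invented for illustration) shows which operations each scale supports: equality checks for nominal, ordering for ordinal, and meaningful arithmetic differences only for scale.

```python
# Nominal: categories support only equality/identity checks; no order is implied.
eye_color_a, eye_color_b = "brown", "blue"
same_color = (eye_color_a == eye_color_b)

# Ordinal: categories can be ranked, but the gaps between ranks are uneven,
# so only comparisons (not differences) are meaningful.
opinion_rank = {"strongly disagree": 1, "disagree": 2, "undecided": 3,
                "agree": 4, "strongly agree": 5}
agree_above_undecided = opinion_rank["agree"] > opinion_rank["undecided"]

# Scale: equal distances between possible responses, so differences are meaningful.
lengths_cm = [1.0, 2.0, 3.0]
uniform_spacing = (lengths_cm[2] - lengths_cm[1]) == (lengths_cm[1] - lengths_cm[0])

print(same_color, agree_above_undecided, uniform_spacing)  # False True True
```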

**Data Quality**

Something must be said about the quality of the data used. A statistical analysis is only as good as its data, and interpretative limitations may be imposed by the quality of the data rather than by the analysis. In addressing data quality, we must make a distinction between **measurement error** and **processing error**. Measurement error is represented by differences between the “true” quality of the object observed (i.e. the true length of a fish) and what appears during data collection (the actual scale measurement collected during the study). Processing errors are errors that occur during data handling (i.e. wrong data reporting, erroneous rounding or transformation). One must realize that errors are inherent to any measurement and that trying to avoid them entirely is virtually impossible. What must be done is to characterize these errors and try to minimize them in the best way possible.
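
A minimal simulation can make the distinction concrete. The sketch below (Python; the fish length and noise level are invented for illustration) generates readings with random measurement error, introduces a processing error via coarse rounding, and then characterizes both errors rather than pretending they do not exist.

```python
import random
import statistics

random.seed(42)  # reproducible illustration

TRUE_LENGTH_CM = 30.0  # hypothetical "true" length of a fish

# Measurement error: each reading differs from the true value by random noise.
readings = [TRUE_LENGTH_CM + random.gauss(0, 0.5) for _ in range(100)]

# Processing error: an erroneous rounding step applied during data handling.
processed = [round(x) for x in readings]  # discards sub-centimeter information

# Characterize the errors instead of trying (in vain) to avoid them:
print(f"mean reading:       {statistics.mean(readings):.2f} cm")
print(f"reading spread:     {statistics.stdev(readings):.2f} cm")
print(f"max rounding error: {max(abs(p - r) for p, r in zip(processed, readings)):.2f} cm")
```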

**Population and Sample**

Most statistical analyses are done to learn about a specific **population** (the total number of trout in a specific river, the concentration of a contaminant in a lake’s total sediment bed). The population is thus the universe of all possible measurements in a defined unit. When the population is real, it is sometimes possible to obtain information on the entire population. This type of study is called a **census**. However, performing a census is usually impractical, expensive, and time-consuming, if not downright impossible. Therefore, nearly all statistical studies are based on a subset of the population, which is called a **sample**. Whenever possible, a probability sample should be used. A probability sample is a sample in which a) every population member (item) has a known probability of being sampled, b) the sample is drawn by some method of chance consistent with these probabilities, and c) selection probabilities are considered when making estimates from the sample.
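
Conditions (a)-(c) can be sketched in a few lines of Python. The population, selection probabilities, and sample size below are all invented for illustration; the estimate weights each sampled unit by the inverse of its selection probability (a Horvitz-Thompson-style idea) so that condition (c) is respected.

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical population of tagged trout.
population = ["trout_%02d" % i for i in range(1, 11)]
# (a) every population member has a known probability of being sampled
weights = [0.05, 0.05, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.15, 0.15]

# (b) the sample is drawn by a chance mechanism consistent with those
# probabilities (here: three independent draws, with replacement)
sample = random.choices(population, weights=weights, k=3)

# (c) selection probabilities are considered when making estimates: weighting
# each draw by 1/probability gives an unbiased estimate of the population size
estimate = sum(1.0 / weights[population.index(unit)] for unit in sample) / len(sample)
print(sample)
print(f"estimated population size: {estimate:.1f}")
```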