 
Displaying Freq. Dists. using R
Drawing a histogram
R can draw a histogram of a variable, y, using these instructions:
Gave us:
Note:
 You can suggest the number of class intervals using the hist function's breaks argument (or some shortened form thereof, such as br). For example, hist(y, breaks=1) asks for very few (equal-width) intervals, whereas br=100 asks for about 100. A single number is treated only as a suggestion, because R rounds the breakpoints to 'pretty' values; to fix the intervals exactly, supply a vector of breakpoints instead. Setting intervals of unequal widths is also possible, albeit generally unwise.
 Since histograms group values into arbitrary class-intervals, you may find it useful to indicate the actual values of y as 'tickmarks' along one axis; this is known as a 'rugplot'. The instruction rug(y) adds a rugplot to an existing plot.
 By default, a rugplot is added to the (lower) x-axis.
 Rugplots can usefully be added to some other types of plot.
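
As a minimal sketch of the points above (not necessarily the instructions originally used), the code below draws a histogram, adds a rugplot, and then fixes the class intervals with an explicit vector of breakpoints. The values in y are made up for illustration.

```r
# Made-up sample for illustration (an assumption, not the original data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)

# A single number for 'breaks' is only a suggested number of intervals
h <- hist(y, breaks = 8, main = "Histogram of y", xlab = "y")

# Add the observed values as tick marks along the lower x-axis
rug(y)

# A vector of breakpoints fixes the intervals exactly (here 8 of width 20)
h2 <- hist(y, breaks = seq(420, 580, by = 20), main = "Fixed intervals", xlab = "y")
rug(y)
```

The object returned by hist holds the breakpoints and counts (h$breaks and h$counts), which is a convenient way to check which intervals R actually chose.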

A frequency polygon
R can draw a frequency polygon of one variable, y, using these instructions:
Gave us:
Note:
 Frequency polygons can be useful in comparing several distributions (say of the values in variables y & z) by plotting them within the same (single) set of graph axes. Of course, if the distributions of y & z differ much you need to set the plot limits so they both fit; and, if the numbers of values are very different, it is best to use relative frequencies, as we do below:
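
One way to do this, sketched under the assumption that y and z are numeric vectors (the values here are made up), is to let hist compute the class frequencies without plotting, convert them to relative frequencies, and join the class midpoints with lines. The shared breakpoints in brk are our own choice.

```r
# Made-up samples of different sizes (assumptions, not the original data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)
z <- c(455, 470, 478, 490, 495, 510, 515, 530, 548, 570)

# Use the same class intervals for both variables
brk <- seq(400, 600, by = 20)
hy <- hist(y, breaks = brk, plot = FALSE)
hz <- hist(z, breaks = brk, plot = FALSE)

# Relative frequencies make samples of different sizes comparable
ry <- hy$counts / sum(hy$counts)
rz <- hz$counts / sum(hz$counts)

# Set the plot limits so both polygons fit on the same axes
plot(hy$mids, ry, type = "l", ylim = c(0, max(ry, rz)),
     xlab = "value", ylab = "relative frequency")
lines(hz$mids, rz, lty = 2)
```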

A stem and leaf plot
R can produce a stem and leaf plot of a variable, y, using these instructions:
Gave us:
The decimal point is 1 digit(s) to the right of the |

  42 | 000
  44 | 50
  46 | 005
  48 | 05055
  50 | 050
  52 | 00000555
  54 | 0555
  56 | 00
  
Note:
 Unlike R's graphics functions, stem only prints its output to the console. The command x=stem(y) would merely assign a NULL value to x.
 The parameter 'scale' is used to set the scale of the plot. We set scale=2 to scale it the same way as the other plots. If scale were not specified, R would use the default scale=1 which would pool adjacent classes.
 If you used stem(y/10000) the result would be identical, aside from the message at the top being: The decimal point is 4 digit(s) to the left of the |
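
The notes above can be tried out with a few lines. This is only a sketch using made-up values for y, so its output will not match the plot shown above.

```r
# Made-up sample for illustration (an assumption, not the original data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)

# Print a stem and leaf plot to the console, using a longer scale
stem(y, scale = 2)

# The default (scale = 1) gives a more compact plot
# stem() returns NULL, so x holds nothing useful
x <- stem(y)
is.null(x)
```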

A jittered dot plot
R can produce a jittered dot plot of a variable, y, using these instructions:
Gave us:
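
A jittered dot plot can be sketched in at least two ways; neither is necessarily the method originally used here. The first uses R's stripchart function, the second adds random vertical noise with jitter. The values in y, the amount of jitter, and the plotting limits are all assumptions chosen for illustration.

```r
# Made-up sample for illustration (an assumption, not the original data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)

set.seed(1)  # make the random jitter repeatable

# Option 1: stripchart adds the jitter for us
stripchart(y, method = "jitter", jitter = 0.2, pch = 1, xlab = "y")

# Option 2: plot y against a constant plus a little random noise
j <- jitter(rep(1, length(y)), amount = 0.1)
plot(y, j, ylim = c(0.5, 1.5), yaxt = "n", xlab = "y", ylab = "")
```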

Rank scatterplots
There are many ways of plotting rank (or relative rank) on value; in other words, producing a rank scatterplot. For example:
Note:
 You can obtain the same sort of thing by sorting y into ascending order, and plotting the result against that order:
 Notice this code assumes there are no non-available (NA) values, because the sort function automatically removes them, so the sorted values would no longer match the ranks in number.
 If y does have some NA values, the following code would work, although it is somewhat inefficient because y must be sorted twice.
 A better way is to sort y, and create a new variable (r), assuming it is OK to reorder y and create r.
 If you prefer to use relative rank, and y has no NA values, these instructions would work:
 But these instructions may be better:
 Or you could use these instructions:
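
The variants described in these notes can be sketched as follows. This is our own reconstruction under stated assumptions, not the original listings: y is a made-up vector containing one NA, and relative rank is taken to be rank divided by the number of non-missing values.

```r
# Made-up sample containing one NA (an assumption, not the original data)
y <- c(502, 431, NA, 489, 445, 561, 423, 520, 467, 533)

# sort() drops NA values by default, so sort once and build ranks to match
ys <- sort(y)
r <- seq_along(ys)   # sequential rank 1..n of the non-missing values
plot(ys, r, xlab = "value", ylab = "rank")

# Relative rank, scaled to lie between 1/n and 1
p <- r / length(ys)
plot(ys, p, xlab = "value", ylab = "relative rank")

# Ranks without reordering y; NA values stay NA, ties share an average rank
r2 <- rank(y, na.last = "keep")
plot(y, r2, xlab = "value", ylab = "rank")
```

Using rank keeps y in its original order, which matters if y is paired with other variables, whereas sorting is simpler when y stands alone.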

Frequency of each value
R can plot the frequency of each value in variable y, without using class intervals, using these instructions:
Gave us:
 If you prefer to plot the distribution as a histogram-type line diagram, use these instructions instead:
 Or you could sort y into ascending order, then find how many of each value there are (using the run-length-encoding function), then plot frequency against value:
 The rle (run length encoding) function produces two sets of values:
 Run lengths: that is, the frequency of (neighbouring) identical elements.
 Run values: that is, the value of each group of identical elements.
 The results of the rle function are assigned to a (list-type) variable called tmp, as two (hidden) vectors called 'lengths' and 'values'. The next line instructs R to plot the frequency of each value in variable y against its value. Notice that, because the length and value of each run are held by vectors within tmp, we need to address them as tmp$values and tmp$lengths.
 If y is a continuous variable whose values are neither rounded nor truncated, the result will nearly always be equivalent to a rugplot. This is often also true of small samples of discrete data, such as the distribution of eggs per gram of faeces, as shown in the 'line-plot' and the barplot below.
 Last but not least, if you prefer to plot the distribution as a (traditional) non-jittered dot-plot you could use these instructions:
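
The approaches in this section can be sketched as below. The values in y are made-up counts, and the particular plotting calls are our assumptions about one reasonable way to do each step, not the original listings.

```r
# Made-up discrete sample (an assumption, not the original data)
y <- c(0, 0, 1, 3, 0, 2, 1, 0, 5, 1, 0, 2)

# Tabulate the frequency of each distinct value, then plot it as vertical lines
f <- table(y)
plot(f, xlab = "value", ylab = "frequency")

# The same frequencies via run length encoding of the sorted values
tmp <- rle(sort(y))
plot(tmp$values, tmp$lengths, type = "h", xlab = "value", ylab = "frequency")

# A barplot of the tabulated frequencies
barplot(f, xlab = "value", ylab = "frequency")

# A traditional (stacked, non-jittered) dot plot
stripchart(y, method = "stack", pch = 16, xlab = "value")
```

Note that table and barplot place the distinct values at equal spacing, so a value with zero frequency (4 in this sample) leaves no gap, whereas plotting tmp$lengths against tmp$values keeps the true numeric scale.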

Empirical cumulative distribution function (ECDF)
An ECDF is simply a plot of (sequential) rank on value, shown as a step plot, but often with the values overlaid as points. Since step plots must be plotted in ascending order, the simplest way is to sort the data. For example:
Gave us:
Note:
 If you prefer to use R's ecdf function, you could simply enter plot(ecdf(y)). By default it does not draw the verticals (current versions of R accept verticals=TRUE, and a col argument, if you want them), so we prefer our version.
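
A minimal sketch of the sort-then-step approach is given below, alongside R's own ecdf function for comparison. The data are made up, and the choice of relative rank (rank divided by n) for the vertical axis is an assumption.

```r
# Made-up sample for illustration (an assumption, not the original data)
y <- c(502, 431, 489, 445, 561, 423, 520, 467, 533, 548)

# Sort the values and compute their relative ranks
ys <- sort(y)
n <- length(ys)
p <- seq_len(n) / n

# Step plot of relative rank on value, with the values overlaid as points
plot(ys, p, type = "s", col = "blue", xlab = "value", ylab = "relative rank")
points(ys, p)

# R's built-in version, asking for the vertical lines
Fn <- ecdf(y)
plot(Fn, verticals = TRUE)
```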

P-value plot
P-value plots are a useful way to examine and compare frequency distributions. At their simplest they are just a (cumulative) line-plot of (p=) relative rank on (y=) value. But since it is harder to assess symmetry given an S-shaped plot than to assess it given a /\ shaped plot, or an X shaped plot, p is calculated as corrected relative rank and values above the median are plotted against 1-p.
The code below gives a P-value plot for 1 variable (y).
Gave us:
Note:
 The median lies within the graph's apex, but you could make this explicit using abline(v=median(y1))
 If y does not contain any non-available (NA) values, and you do not wish to sort y, this code would work equally well:
 Or you could simply plot p, then 1-p, but only show the lower half of the graph.
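
The folding described above can be sketched as follows. We assume here that 'corrected relative rank' means (rank - 0.5)/n; that correction, the made-up data, and the variable names ys, p and pf are our assumptions, not taken from the original code.

```r
# Made-up sample for illustration (an assumption, not the original data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)

# Sorted values and their corrected relative ranks (assumed correction)
ys <- sort(y)
n <- length(ys)
p <- (seq_len(n) - 0.5) / n

# Fold the upper half: ranks above 0.5 are replaced by 1 - p
pf <- pmin(p, 1 - p)

# The folded plot peaks near the median, giving a /\ shape
plot(ys, pf, type = "l", ylim = c(0, 0.5), xlab = "value", ylab = "p")
abline(v = median(y), lty = 3)

# Alternative: plot p, then overlay 1 - p, showing only the lower half
plot(ys, p, type = "l", ylim = c(0, 0.5), xlab = "value", ylab = "p")
lines(ys, 1 - p)
```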

Multiple P-value plots
P-value plots are a useful way to compare distributions because they are easier to interpret than quantile-quantile plots, and P-value plots enable you to compare more than 2 distributions at once.
For example, we did P-value plots for the two sets of cattle weight data using these instructions:
Gave us:
Note:
 You could also plot these as scatterplots, and limit their upper range to p = 0.5, but line-plots are often easier to inspect.
 We highlighted their medians using dotted vertical lines. This makes it easier to judge how symmetrical each distribution is, albeit at the expense of a more cluttered graph.
 See how readily you can identify their quartiles, 10% & 90% quantiles, or any outlying values.
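
Overlaying two P-value plots can be sketched like this. The helper function folded_p is hypothetical (our own name), the correction (rank - 0.5)/n is assumed, and the two samples are made up rather than being the cattle weight data.

```r
# Made-up samples (assumptions, not the cattle weight data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)
z <- c(455, 470, 478, 490, 495, 510, 515, 530, 548, 570)

# Hypothetical helper: sorted values and their folded, corrected relative ranks
folded_p <- function(v) {
  v <- sort(v)                      # also drops any NA values
  n <- length(v)
  p <- (seq_len(n) - 0.5) / n       # assumed correction
  list(x = v, p = pmin(p, 1 - p))   # use 1 - p above the median
}

a <- folded_p(y)
b <- folded_p(z)

# Plot the first distribution with limits wide enough for both, then add the second
plot(a$x, a$p, type = "l", xlim = range(a$x, b$x), ylim = c(0, 0.5),
     xlab = "value", ylab = "p")
lines(b$x, b$p, lty = 2)

# Highlight each median with a dotted vertical line
abline(v = median(y), lty = 3)
abline(v = median(z), lty = 3)
```

Because each curve is scaled by its own sample size, samples of different sizes can share the same axes without further adjustment.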

Smoothed distribution function
Since the frequency distribution of a sample is unavoidably discrete, it is sometimes useful to smooth it before plotting, provided this can be done without introducing too many arbitrary assumptions. We consider the reasoning behind this in Unit 3. R provides this facility via its density function. For example:
Gave us:
Note:
 For small samples it is often useful to add a rug plot.
 Formally speaking, by default this density function uses normal (or Gaussian) smoothing.
 By overplotting, several smoothed distributions can be compared. For example:
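
A minimal sketch of smoothing, adding a rug plot, and overplotting a second smoothed distribution is shown below. Both samples are made up, and the bandwidth is left at R's default.

```r
# Made-up samples (assumptions, not the original data)
y <- c(423, 431, 445, 448, 460, 462, 467, 481, 485, 489, 502,
       505, 509, 520, 521, 524, 528, 533, 541, 546, 552, 561)
z <- c(455, 470, 478, 490, 495, 510, 515, 530, 548, 570)

# Smooth one sample (Gaussian kernel by default) and mark the actual values
d <- density(y)
plot(d, main = "Smoothed distribution of y")
rug(y)

# Overplot a second smoothed distribution, with limits that fit both curves
dz <- density(z)
plot(d, xlim = range(d$x, dz$x), ylim = c(0, max(d$y, dz$y)),
     main = "Smoothed distributions of y and z")
lines(dz, lty = 2)
```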

