[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45. descriptive


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.1 Introduction to descriptive

Package descriptive contains a set of functions for making descriptive statistical computations and graphing. Together with the source code there are three data sets in your Maxima tree: pidigits.data, wind.data and biomed.data.

Any statistics manual can be used as a reference to the functions in package descriptive.

For comments, bugs or suggestions, please contact me at 'mario AT edu DOT xunta DOT es'.

Here is a simple example on how the descriptive functions in descriptive do they work, depending on the nature of their arguments, lists or matrices,

(%i1) load (descriptive)$
(%i2) /* univariate sample */   mean ([a, b, c]);
                            c + b + a
(%o2)                       ---------
                                3
(%i3) matrix ([a, b], [c, d], [e, f]);
                            [ a  b ]
                            [      ]
(%o3)                       [ c  d ]
                            [      ]
                            [ e  f ]
(%i4) /* multivariate sample */ mean (%);
                      e + c + a  f + d + b
(%o4)                [---------, ---------]
                          3          3

Note that in multivariate samples the mean is calculated for each column.

In case of several samples with possible different sizes, the Maxima function map can be used to get the desired results for each sample,

(%i1) load (descriptive)$
(%i2) map (mean, [[a, b, c], [d, e]]);
                        c + b + a  e + d
(%o2)                  [---------, -----]
                            3        2

In this case, two samples of sizes 3 and 2 were stored into a list.

Univariate samples must be stored in lists like

(%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
(%o1)           [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

and multivariate samples in matrices as in

(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
             [10.58, 6.63], [13.33, 13.25], [13.21,  8.12]);
                        [ 13.17  9.29  ]
                        [              ]
                        [ 14.71  16.88 ]
                        [              ]
                        [ 18.5   16.88 ]
(%o1)                   [              ]
                        [ 10.58  6.63  ]
                        [              ]
                        [ 13.33  13.25 ]
                        [              ]
                        [ 13.21  8.12  ]

In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.

Data can be introduced by hand, but big samples are usually stored in plain text files. For example, file pidigits.data contains the first 100 digits of number %pi:

      3
      1
      4
      1
      5
      9
      2
      6
      5
      3 ...

In order to load these digits in Maxima,

(%i1) s1 : read_list (file_search ("pidigits.data"))$
(%i2) length (s1);
(%o2)                          100

On the other hand, file wind.data contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland (This is part of a data set taken at 12 meteorological stations. The original file is freely downloadable from the StatLib Data Repository and its analysis is discused in Haslett, J., Raftery, A. E. (1989) Space-time Modelling with Long-memory Dependence: Assessing Ireland's Wind Power Resource, with Discussion. Applied Statistics 38, 1-50). This loads the data:

(%i1) s2 : read_matrix (file_search ("wind.data"))$
(%i2) length (s2);
(%o2)                          100
(%i3) s2 [%]; /* last record */
(%o3)            [3.58, 6.0, 4.58, 7.62, 11.25]

Some samples contain non numeric data. As an example, file biomed.data (which is part of another bigger one downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, A and B, of different ages,

(%i1) s3 : read_matrix (file_search ("biomed.data"))$
(%i2) length (s3);
(%o2)                          100
(%i3) s3 [1]; /* first record */
(%o3)            [A, 30, 167.0, 89.0, 25.6, 364]

The first individual belongs to group A, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.

One must take care when working with categorical data. In the next example, symbol a is asigned a value in some previous moment and then a sample with categorical value a is taken,

(%i1) a : 1$
(%i2) matrix ([a, 3], [b, 5]);
                            [ 1  3 ]
(%o2)                       [      ]
                            [ b  5 ]

Categories:  Descriptive statistics Share packages Package descriptive


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.2 Functions and Variables for data manipulation

Function: build_sample (list)
Function: build_sample (matrix)

Builds a sample from a table of absolute frequencies. The input table can be a matrix or a list of lists, all of them of equal size. The number of columns or the length of the lists must be greater than 1. The last element of each row or list is interpreted as the absolute frequency. The output is always a sample in matrix form.

Examples:

Univariate frequency table.

(%i1) load (descriptive)$
(%i2) sam1: build_sample([[6,1], [j,2], [2,1]]);
                       [ 6 ]
                       [   ]
                       [ j ]
(%o2)                  [   ]
                       [ j ]
                       [   ]
                       [ 2 ]
(%i3) mean(sam1);
                      2 j + 8
(%o3)                [-------]
                         4
(%i4) barsplot(sam1) $

Multivariate frequency table.

(%i1) load (descriptive)$
(%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
                           [ 6  3 ]
                           [      ]
                           [ 5  6 ]
                           [      ]
                           [ 5  6 ]
(%o2)                      [      ]
                           [ u  2 ]
                           [      ]
                           [ 6  8 ]
                           [      ]
                           [ 6  8 ]
(%i3) cov(sam2);
       [   2                 2                            ]
       [  u  + 158   (u + 28)     2 u + 174   11 (u + 28) ]
       [  -------- - ---------    --------- - ----------- ]
(%o3)  [     6          36            6           12      ]
       [                                                  ]
       [ 2 u + 174   11 (u + 28)            21            ]
       [ --------- - -----------            --            ]
       [     6           12                 4             ]
(%i4) barsplot(sam2, grouping=stacked) $

Categories:  Package descriptive

Function: continuous_freq (list)
Function: continuous_freq (list, m)

The argument of continuous_freq must be a list of numbers. Divides the range in intervals and counts how many values are inside them. The second argument is optional and either equals the number of classes we want, 10 by default, or equals a list containing the class limits and the number of classes we want, or a list containing only the limits. Argument list must be a list of (2 or 3) real numbers. If sample values are all equal, this function returns only one class of amplitude 2.

Examples:

Optional argument indicates the number of classes we want. The first list in the output contains the interval limits, and the second the corresponding counts: there are 16 digits inside the interval [0, 1.8], 24 digits in (1.8, 3.6], and so on.

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, 5);
(%o3) [[0, 1.8, 3.6, 5.4, 7.2, 9.0], [16, 24, 18, 17, 25]]

Optional argument indicates we want 7 classes with limits -2 and 12:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, [-2,12,7]);
(%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]

Optional argument indicates we want the default number of classes with limits -2 and 12:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, [-2,12]);
                3  4  11  18     32  39  46  53
(%o3)  [[- 2, - -, -, --, --, 5, --, --, --, --, 12], 
                5  5  5   5      5   5   5   5
               [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]

Categories:  Package descriptive

Function: discrete_freq (list)

Counts absolute frequencies in discrete samples, both numeric and categorical. Its unique argument is a list,

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) discrete_freq (s1);
(%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
                             [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]

The first list gives the sample values and the second their absolute frequencies. Commands ? col and ? transpose should help you to understand the last input.

Categories:  Package descriptive

Function: standardize (list)
Function: standardize (matrix)

Subtracts to each element of the list the sample mean and divides the result by the standard deviation. When the input is a matrix, standardize subtracts to each row the multivariate mean, and then divides each component by the corresponding standard deviation.

Categories:  Package descriptive

Function: subsample (data_matrix, predicate_function)
Function: subsample (data_matrix, predicate_function, col_num1, col_num2, ...)

This is a sort of variant of the Maxima submatrix function. The first argument is the data matrix, the second is a predicate function and optional additional arguments are the numbers of the columns to be taken. Its behaviour is better understood with examples.

These are multivariate records in which the wind speed in the first meteorological station were greater than 18. See that in the lambda expression the i-th component is refered to as v[i].

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) subsample (s2, lambda([v], v[1] > 18));
              [ 19.38  15.37  15.12  23.09  25.25 ]
              [                                   ]
              [ 18.29  18.66  19.08  26.08  27.63 ]
(%o3)         [                                   ]
              [ 20.25  21.46  19.95  27.71  23.38 ]
              [                                   ]
              [ 18.79  18.96  14.46  26.38  21.84 ]

In the following example, we request only the first, second and fifth components of those records with wind speeds greater or equal than 16 in station number 1 and less than 25 knots in station number 4. The sample contains only data from stations 1, 2 and 5. In this case, the predicate function is defined as an ordinary Maxima function.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) g(x):= x[1] >= 16 and x[4] < 25$
(%i4) subsample (s2, g, 1, 2, 5);
                     [ 19.38  15.37  25.25 ]
                     [                     ]
                     [ 17.33  14.67  19.58 ]
(%o4)                [                     ]
                     [ 16.92  13.21  21.21 ]
                     [                     ]
                     [ 17.25  18.46  23.87 ]

Here is an example with the categorical variables of biomed.data. We want the records corresponding to those patients in group B who are older than 38 years.

(%i1) load (descriptive)$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) h(u):= u[1] = B and u[2] > 38 $
(%i4) subsample (s3, h);
                [ B  39  28.0  102.3  17.1  146 ]
                [                               ]
                [ B  39  21.0  92.4   10.3  197 ]
                [                               ]
                [ B  39  23.0  111.5  10.0  133 ]
                [                               ]
                [ B  39  26.0  92.6   12.3  196 ]
(%o4)           [                               ]
                [ B  39  25.0  98.7   10.0  174 ]
                [                               ]
                [ B  39  21.0  93.2   5.9   181 ]
                [                               ]
                [ B  39  18.0  95.0   11.3  66  ]
                [                               ]
                [ B  39  39.0  88.5   7.6   168 ]

Probably, the statistical analysis will involve only the blood measures,

(%i1) load (descriptive)$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38),
                 3, 4, 5, 6);
                   [ 28.0  102.3  17.1  146 ]
                   [                        ]
                   [ 21.0  92.4   10.3  197 ]
                   [                        ]
                   [ 23.0  111.5  10.0  133 ]
                   [                        ]
                   [ 26.0  92.6   12.3  196 ]
(%o3)              [                        ]
                   [ 25.0  98.7   10.0  174 ]
                   [                        ]
                   [ 21.0  93.2   5.9   181 ]
                   [                        ]
                   [ 18.0  95.0   11.3  66  ]
                   [                        ]
                   [ 39.0  88.5   7.6   168 ]

This is the multivariate mean of s3,

(%i1) load (descriptive)$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) mean (s3);
       65 B + 35 A  317          6 NA + 8144.999999999999
(%o3) [-----------, ---, 87.178, ------------------------, 
           100      10                     100
                                                    3 NA + 19587
                                            18.123, ------------]
                                                        100

Here, the first component is meaningless, since A and B are categorical, the second component is the mean age of individuals in rational form, and the fourth and last values exhibit some strange behaviour. This is because symbol NA is used here to indicate non available data, and the two means are nonsense. A possible solution would be to take out from the matrix those rows with NA symbols, although this deserves some loss of information.

(%i1) load (descriptive)$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) g(v):= v[4] # NA and v[6] # NA $
(%i4) mean (subsample (s3, g, 3, 4, 5, 6));
(%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813, 
                                                            2514
                                                            ----]
                                                             13

Categories:  Package descriptive

Function: tranform_sample (matrix, varlist, exprlist)

Transforms the sample matrix, where each column is called according to varlist, following expressions in exprlist.

Examples:

The second argument assigns names to the three columns. With these names, a list of expressions define the transformation of the sample.

(%i1) load (descriptive)$
(%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
(%i3) tranform_sample(data, [a,b,c], [c, a*b, log(a)]);
                               [ 7  6   log(3) ]
                               [               ]
                               [ 2  21  log(3) ]
(%o3)                          [               ]
                               [ 4  16  log(8) ]
                               [               ]
                               [ 4  10  log(5) ]

Add a constant column and remove the third variable.

(%i1) load (descriptive)$
(%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
(%i3) tranform_sample(data, [a,b,c], [makelist(1,k,length(data)),a,b]);
                                  [ 1  3  2 ]
                                  [         ]
                                  [ 1  3  7 ]
(%o3)                             [         ]
                                  [ 1  8  2 ]
                                  [         ]
                                  [ 1  5  2 ]

Categories:  Package descriptive


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.3 Functions and Variables for descriptive statistics

Function: mean (list)
Function: mean (matrix)

This is the sample mean, defined as

                       n
                     ====
             _   1   \
             x = -    >    x
                 n   /      i
                     ====
                     i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean (s1);
                               471
(%o3)                          ---
                               100
(%i4) %, numer;
(%o4)                         4.71
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mean (s2);
(%o6)     [9.9485, 10.1607, 10.8685, 15.7166, 14.8441]

Categories:  Package descriptive

Function: var (list)
Function: var (matrix)

This is the sample variance, defined as

                     n
                   ====
           2   1   \          _ 2
          s  = -    >    (x - x)
               n   /       i
                   ====
                   i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var (s1), numer;
(%o3)                   8.425899999999999

See also function var1.

Categories:  Package descriptive

Function: var1 (list)
Function: var1 (matrix)

This is the sample variance, defined as

                     n
                   ====
               1   \          _ 2
              ---   >    (x - x)
              n-1  /       i
                   ====
                   i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var1 (s1), numer;
(%o3)                    8.5110101010101
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) var1 (s2);
(%o5) [17.39586540404041, 15.13912778787879, 15.63204924242424, 
                            32.50152569696971, 24.66977392929294]

See also function var.

Categories:  Package descriptive

Function: std (list)
Function: std (matrix)

This is the the square root of function var, the variance with denominator n.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std (s1), numer;
(%o3)                   2.902740084816414
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std (s2);
(%o5) [4.149928523480858, 3.871399812729241, 3.933920277534866, 
                            5.672434260526957, 4.941970881136392]

See also functions var and std1.

Categories:  Package descriptive

Function: std1 (list)
Function: std1 (matrix)

This is the the square root of function var1, the variance with denominator n-1.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std1 (s1), numer;
(%o3)                   2.917363553109228
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std1 (s2);
(%o5) [4.170835096721089, 3.89090320978032, 3.953738641137555, 
                            5.701010936401517, 4.966867617451963]

See also functions var1 and std.

Categories:  Package descriptive

Function: noncentral_moment (list, k)
Function: noncentral_moment (matrix, k)

The non central moment of order k, defined as

                       n
                     ====
                 1   \      k
                 -    >    x
                 n   /      i
                     ====
                     i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) noncentral_moment (s1, 1), numer; /* the mean */
(%o3)                         4.71
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) noncentral_moment (s2, 5);
(%o6) [319793.8724761505, 320532.1923892463,
      391249.5621381556, 2502278.205988911, 1691881.797742255]

See also function central_moment.

Categories:  Package descriptive

Function: central_moment (list, k)
Function: central_moment (matrix, k)

The central moment of order k, defined as

                    n
                  ====
              1   \          _ k
              -    >    (x - x)
              n   /       i
                  ====
                  i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) central_moment (s1, 2), numer; /* the variance */
(%o3)                   8.425899999999999
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) central_moment (s2, 3);
(%o6) [11.29584771375004, 16.97988248298583, 5.626661952750102,
                             37.5986572057918, 25.85981904394192]

See also functions central_moment and mean.

Categories:  Package descriptive

Function: cv (list)
Function: cv (matrix)

The variation coefficient is the quotient between the sample standard deviation (std) and the mean,

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) cv (s1), numer;
(%o3)                   .6193977819764815
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) cv (s2);
(%o5) [.4192426091090204, .3829365309260502, 0.363779605385983, 
                            .3627381836021478, .3346021393989506]

See also functions std and mean.

Categories:  Package descriptive

Function: smin (list)
Function: smin (matrix)

This is the minimum value of the sample list. When the argument is a matrix, smin returns a list containing the minimum values of the columns, which are associated to statistical variables.

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) smin (s1);
(%o3)                           0
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) smin (s2);
(%o5)             [0.58, 0.5, 2.67, 5.25, 5.17]

See also function smax.

Categories:  Package descriptive

Function: smax (list)
Function: smax (matrix)

This is the maximum value of the sample list. When the argument is a matrix, smax returns a list containing the maximum values of the columns, which are associated to statistical variables.

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) smax (s1);
(%o3)                           9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) smax (s2);
(%o5)          [20.25, 21.46, 20.04, 29.63, 27.63]

See also function smin.

Categories:  Package descriptive

Function: range (list)
Function: range (matrix)

The range is the difference between the extreme values.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) range (s1);
(%o3)                           9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) range (s2);
(%o5)          [19.67, 20.96, 17.37, 24.38, 22.46]

Categories:  Package descriptive

Function: quantile (list, p)
Function: quantile (matrix, p)

This is the p-quantile, with p a number in [0, 1], of the sample list. Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package descriptive.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) /* 1st and 3rd quartiles */
         [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
(%o3)                      [2.0, 7.25]
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quantile (s2, 1/4);
(%o5)    [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]

Categories:  Package descriptive

Function: median (list)
Function: median (matrix)

Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median (s1);
                                9
(%o3)                           -
                                2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median (s2);
(%o5)         [10.06, 9.855, 10.73, 15.48, 14.105]

The median is the 1/2-quantile.

See also function quantile.

Categories:  Package descriptive

Function: qrange (list)
Function: qrange (matrix)

The interquartilic range is the difference between the third and first quartiles, quantile(list,3/4) - quantile(list,1/4),

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) qrange (s1);
                               21
(%o3)                          --
                               4
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) qrange (s2);
(%o5) [5.385, 5.572499999999998, 6.022500000000001, 
                            8.729999999999999, 6.649999999999999]

See also function quantile.

Categories:  Package descriptive

Function: mean_deviation (list)
Function: mean_deviation (matrix)

The mean deviation, defined as

                     n
                   ====
               1   \          _
               -    >    |x - x|
               n   /       i
                   ====
                   i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean_deviation (s1);
                               51
(%o3)                          --
                               20
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) mean_deviation (s2);
(%o5) [3.287959999999999, 3.075342, 3.23907, 4.715664000000001, 
                                               4.028546000000002]

See also function mean.

Categories:  Package descriptive

Function: median_deviation (list)
Function: median_deviation (matrix)

The median deviation, defined as

                 n
               ====
           1   \
           -    >    |x - med|
           n   /       i
               ====
               i = 1

where med is the median of list.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median_deviation (s1);
                                5
(%o3)                           -
                                2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median_deviation (s2);
(%o5)           [2.75, 2.755, 3.08, 4.315, 3.31]

See also function mean.

Categories:  Package descriptive

Function: harmonic_mean (list)
Function: harmonic_mean (matrix)

The harmonic mean, defined as

                  n
               --------
                n
               ====
               \     1
                >    --
               /     x
               ====   i
               i = 1

Example:

(%i1) load (descriptive)$
(%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) harmonic_mean (y), numer;
(%o3)                   3.901858027632205
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) harmonic_mean (s2);
(%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745, 
                            13.44199028193692, 13.01439145898509]

See also functions mean and geometric_mean.

Categories:  Package descriptive

Function: geometric_mean (list)
Function: geometric_mean (matrix)

The geometric mean, defined as

                 /  n      \ 1/n
                 | /===\   |
                 |  ! !    |
                 |  ! !  x |
                 |  ! !   i|
                 | i = 1   |
                 \         /

Example:

(%i1) load (descriptive)$
(%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) geometric_mean (y), numer;
(%o3)                   4.454845412337012
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) geometric_mean (s2);
(%o5) [8.82476274347979, 9.22652604739361, 10.0442675714889, 
                            14.61274126349021, 13.96184163444275]

See also functions mean and harmonic_mean.

Categories:  Package descriptive

Function: kurtosis (list)
Function: kurtosis (matrix)

The kurtosis coefficient, defined as

                    n
                  ====
            1     \          _ 4
           ----    >    (x - x)  - 3
              4   /       i
           n s    ====
                  i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) kurtosis (s1), numer;
(%o3)                  - 1.273247946514421
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) kurtosis (s2);
(%o5) [- .2715445622195385, 0.119998784429451, 
     - .4275233490482861, - .6405361979019522, - .4952382132352935]

See also functions mean, var and skewness.

Categories:  Package descriptive

Function: skewness (list)
Function: skewness (matrix)

The skewness coefficient, defined as

                    n
                  ====
            1     \          _ 3
           ----    >    (x - x)
              3   /       i
           n s    ====
                  i = 1

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) skewness (s1), numer;
(%o3)                  .009196180476450424
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) skewness (s2);
(%o5) [.1580509020000978, .2926379232061854, .09242174416107717, 
                            .2059984348148687, .2142520248890831]

See also functions mean, var and kurtosis.

Categories:  Package descriptive

Function: pearson_skewness (list)
Function: pearson_skewness (matrix)

Pearson's skewness coefficient, defined as

                _
             3 (x - med)
             -----------
                  s

where med is the median of list.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) pearson_skewness (s1), numer;
(%o3)                   .2159484029093895
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) pearson_skewness (s2);
(%o5) [- .08019976629211892, .2357036272952649, 
         .1050904062491204, .1245042340592368, .4464181795804519]

See also functions mean, var and median.

Categories:  Package descriptive

Function: quartile_skewness (list)
Function: quartile_skewness (matrix)

The quartile skewness coefficient, defined as

               c    - 2 c    + c
                3/4      1/2    1/4
               --------------------
                   c    - c
                    3/4    1/4

where c_p is the p-quantile of sample list.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) quartile_skewness (s1), numer;
(%o3)                  .04761904761904762
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quartile_skewness (s2);
(%o5) [- 0.0408542246982353, .1467025572005382, 
       0.0336239103362392, .03780068728522298, .2105263157894735]

See also function quantile.

Categories:  Package descriptive

Function: cov (matrix)

The covariance matrix of the multivariate sample, defined as

              n
             ====
          1  \           _        _
      S = -   >    (X  - X) (X  - X)'
          n  /       j        j
             ====
             j = 1

where X_j is the j-th row of the sample matrix.

Example:

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) fpprintprec : 7$  /* change precision for pretty output */
(%i4) cov (s2);
      [ 17.22191  13.61811  14.37217  19.39624  15.42162 ]
      [                                                  ]
      [ 13.61811  14.98774  13.30448  15.15834  14.9711  ]
      [                                                  ]
(%o4) [ 14.37217  13.30448  15.47573  17.32544  16.18171 ]
      [                                                  ]
      [ 19.39624  15.15834  17.32544  32.17651  20.44685 ]
      [                                                  ]
      [ 15.42162  14.9711   16.18171  20.44685  24.42308 ]

See also function cov1.

Categories:  Package descriptive

Function: cov1 (matrix)

The covariance matrix of the multivariate sample, defined as

              n
             ====
         1   \           _        _
   S  = ---   >    (X  - X) (X  - X)'
    1   n-1  /       j        j
             ====
             j = 1

where X_j is the j-th row of the sample matrix.

Example:

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) fpprintprec : 7$ /* change precision for pretty output */
(%i4) cov1 (s2);
      [ 17.39587  13.75567  14.51734  19.59216  15.5774  ]
      [                                                  ]
      [ 13.75567  15.13913  13.43887  15.31145  15.12232 ]
      [                                                  ]
(%o4) [ 14.51734  13.43887  15.63205  17.50044  16.34516 ]
      [                                                  ]
      [ 19.59216  15.31145  17.50044  32.50153  20.65338 ]
      [                                                  ]
      [ 15.5774   15.12232  16.34516  20.65338  24.66977 ]

See also function cov.

Categories:  Package descriptive

Function: global_variances (matrix)
Function: global_variances (matrix, options ...)

Function global_variances returns a list of global variance measures:

where p is the dimension of the multivariate random variable and S_1 the covariance matrix returned by cov1.

Option:

Example:

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) global_variances (s2);
(%o3) [105.338342060606, 21.06766841212119, 12874.34690469686, 
         113.4651792608501, 6.636590811800795, 2.576158149609762]

Calculate the global_variances from the covariance matrix.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) s : cov1 (s2)$
(%i4) global_variances (s, data=false);
(%o4) [105.338342060606, 21.06766841212119, 12874.34690469686, 
         113.4651792608501, 6.636590811800795, 2.576158149609762]

See also cov and cov1.

Categories:  Package descriptive

Function: cor (matrix)
Function: cor (matrix, logical_value)

The correlation matrix of the multivariate sample.

Option:

Example:

(%i1) load (descriptive)$
(%i2) fpprintprec : 7 $
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) cor (s2);
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o4) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]

Calculate de correlation matrix from the covariance matrix.

(%i1) load (descriptive)$
(%i2) fpprintprec : 7 $
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) s : cov1 (s2)$
(%i5) cor (s, data=false); /* this is faster */
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o5) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]

See also cov and cov1.

Categories:  Package descriptive

Function: list_correlations (matrix)
Function: list_correlations (matrix, options ...)

Function list_correlations returns a list of correlation measures:

Option:

Example:

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) z : list_correlations (s2)$
(%i4) fpprintprec : 5$ /* for pretty output */
(%i5) z[1];  /* precision matrix */
      [  .38486   - .13856   - .15626   - .10239    .031179  ]
      [                                                      ]
      [ - .13856   .34107    - .15233    .038447   - .052842 ]
      [                                                      ]
(%o5) [ - .15626  - .15233    .47296    - .024816  - .10054  ]
      [                                                      ]
      [ - .10239   .038447   - .024816   .10937    - .034033 ]
      [                                                      ]
      [ .031179   - .052842  - .10054   - .034033   .14834   ]
(%i6) z[2];  /* multiple correlation vector */
(%o6)      [.85063, .80634, .86474, .71867, .72675]
(%i7) z[3];  /* partial correlation matrix */
      [  - 1.0     .38244   .36627   .49908   - .13049 ]
      [                                                ]
      [  .38244    - 1.0    .37927  - .19907   .23492  ]
      [                                                ]
(%o7) [  .36627    .37927   - 1.0    .10911    .37956  ]
      [                                                ]
      [  .49908   - .19907  .10911   - 1.0     .26719  ]
      [                                                ]
      [ - .13049   .23492   .37956   .26719    - 1.0   ]

See also cov and cov1.

Categories:  Package descriptive

Function: principal_components (matrix)
Function: principal_components (matrix, options ...)

Calculates the principal componentes of a multivariate sample. Principal components are used in multivariate statistical analysis to reduce the dimensionality of the sample.

Option:

The output of function principal_components is a list with the following results:

Examples:

In this sample, the first component explains 83.13 per cent of total variance.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) fpprintprec:4 $
(%i4) res: principal_components(s2);
0 errors, 0 warnings
(%o4) [[87.57, 8.753, 5.515, 1.889, 1.613], 
[83.13, 8.31, 5.235, 1.793, 1.531], 
[ .4149  .03379   - .4757  - 0.581   - .5126 ]
[                                            ]
[ 0.369  - .3657  - .4298   .7237    - .1469 ]
[                                            ]
[ .3959  - .2178  - .2181  - .2749    .8201  ]]
[                                            ]
[ .5548   .7744    .1857    .2319    .06498  ]
[                                            ]
[ .4765  - .4669   0.712   - .09605  - .1969 ]
(%i5) /* accumulated percentages  */
    block([ap: copy(res[2])],
      for k:2 thru length(ap) do ap[k]: ap[k]+ap[k-1],
      ap);
(%o5)                 [83.13, 91.44, 96.68, 98.47, 100.0]
(%i6) /* sample dimension */
      p: length(first(res));
(%o6)                                  5
(%i7) /* plot percentages to select number of
         principal components for further work */
     draw2d(
        fill_density = 0.2,
        apply(bars, makelist([k, res[2][k], 1/2], k, p)),
        points_joined = true,
        point_type    = filled_circle,
        point_size    = 3,
        points(makelist([k, res[2][k]], k, p)),
        xlabel = "Variances",
        ylabel = "Percentages",
        xtics  = setify(makelist([concat("PC",k),k], k, p))) $

In case de covariance matrix is known, it can be passed to the function, but option data=false must be used.

(%i1) load (descriptive)$
(%i2) S: matrix([1,-2,0],[-2,5,0],[0,0,2]);
                                [  1   - 2  0 ]
                                [             ]
(%o2)                           [ - 2   5   0 ]
                                [             ]
                                [  0    0   2 ]
(%i3) fpprintprec:4 $
(%i4) /* the argumment is a covariance matrix */
      res: principal_components(S, data=false);
0 errors, 0 warnings
                                                  [ - .3827  0.0  .9239 ]
                                                  [                     ]
(%o4) [[5.828, 2.0, .1716], [72.86, 25.0, 2.145], [  .9239   0.0  .3827 ]]
                                                  [                     ]
                                                  [   0.0    1.0   0.0  ]
(%i5) /* transformation to get the principal components
         from original records */
      matrix([a1,b2,c3],[a2,b2,c2]).last(res);
             [ .9239 b2 - .3827 a1  1.0 c3  .3827 b2 + .9239 a1 ]
(%o5)        [                                                  ]
             [ .9239 b2 - .3827 a2  1.0 c2  .3827 b2 + .9239 a2 ]

Categories:  Package descriptive


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.4 Functions and Variables for statistical graphs

Function: barsplot (data1, data2, …, option_1, option_2, …)
Function: barsplot_description (…)

Plots bars diagrams for discrete statistical variables, both for one or multiple samples.

data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.

Available options are:

Function barsplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxbarsplot for creating embedded histograms in interfaces wxMaxima and iMaxima.

Examples:

Univariate sample in matrix form. Absolute frequencies.

(%i1) load (descriptive)$
(%i2) m : read_matrix (file_search ("biomed.data"))$
(%i3) barsplot(
        col(m,2),
        title        = "Ages",
        xlabel       = "years",
        box_width    = 1/2,
        fill_density = 3/4)$

Two samples of different sizes, with relative frequencies and user declared colors.

(%i1) load (descriptive)$
(%i2) l1:makelist(random(10),k,1,50)$
(%i3) l2:makelist(random(10),k,1,100)$
(%i4) barsplot(
        l1,l2,
        box_width    = 1,
        fill_density = 1,
        bars_colors  = [black, grey],
        frequency = relative,
        sample_keys = ["A", "B"])$

Four non numeric samples of equal size.

(%i1) load (descriptive)$
(%i2) barsplot(
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        title  = "Asking for something to four groups",
        ylabel = "# of individuals",
        groups_gap   = 3,
        fill_density = 0.5,
        ordering     = ordergreatp)$

Stacked bars.

(%i1) load (descriptive)$
(%i2) barsplot(
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        title  = "Asking for something to four groups",
        ylabel = "# of individuals",
        grouping     = stacked,,
        fill_density = 0.5,
        ordering     = ordergreatp)$

barsplot in a multiplot context.

(%i1) load (descriptive)$
(%i2) l1:makelist(random(10),k,1,50)$
(%i3) l2:makelist(random(10),k,1,100)$
(%i4) bp1 : 
        barsplot_description(
         l1,
         box_width = 1,
         fill_density = 0.5,
         bars_colors = [blue],
         frequency = relative)$
(%i5) bp2 : 
        barsplot_description(
         l2,
         box_width = 1,
         fill_density = 0.5,
         bars_colors = [red],
         frequency = relative)$
(%i6) draw(gr2d(bp1), gr2d(bp2))$

For bars diagrams related options, see bars of package draw. See also functions histogram and piechart.

Categories:  Package descriptive Plotting

Function: boxplot (data)
Function: boxplot (data, option_1, option_2, …)
Function: boxplot_description (…)

This function plots box-and-whishker diagrams. Argument data can be a list, which is not of great interest, since these diagrams are mainly used for comparing different samples, or a matrix, so it is possible to compare two or more components of a multivariate statistical variable. But it is also allowed data to be a list of samples with possible different sample sizes, in fact this is the only function in package descriptive that admits this type of data structure.

Available options are:

Function boxplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxboxplot for creating embedded histograms in interfaces wxMaxima and iMaxima.

Examples:

Box-and-whishker diagram from a multivariate sample.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix(file_search("wind.data"))$
(%i3) boxplot(s2,
        box_width  = 0.2,
        title      = "Windspeed in knots",
        xlabel     = "Stations",
        color      = red,
        line_width = 2)$

Box-and-whishker diagram from three samples of different sizes.

(%i1) load (descriptive)$
(%i2) A :
       [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
        [8, 10, 7, 9, 12, 8, 10],
        [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
(%i3) boxplot (A, box_orientation = horizontal)$

Categories:  Package descriptive Plotting

Function: histogram (list)
Function: histogram (list, option_1, option_2, …)
Function: histogram (one_column_matrix)
Function: histogram (one_column_matrix, option_1, option_2, …)
Function: histogram (one_row_matrix)
Function: histogram (one_row_matrix, option_1, option_2, …)
Function: histogram_description (…)

This function plots an histogram from a continuous sample. Sample data must be stored in a list of numbers or a one dimensional matrix.

Available options are:

Function histogram_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxhistogram for creating embedded histograms in interfaces wxMaxima and iMaxima.

Examples:

A simple with eight classes:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) histogram (
           s1,
           nclasses     = 8,
           title        = "pi digits",
           xlabel       = "digits",
           ylabel       = "Absolute frequency",
           fill_color   = grey,
           fill_density = 0.6)$

Setting the limits of the histogram to -2 and 12, with 3 classes. Also, we introduce predefined tics:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) histogram (
           s1,
           nclasses     = [-2,12,3],
           htics        = ["A", "B", "C"],
           terminal     = png,
           fill_color   = "#23afa0",
           fill_density = 0.6)$

We make use of histogram_description for setting the xrange and adding an explicit curve into the scene:

(%i1) load (descriptive)$
(%i2) ( load("distrib"),
        m: 14, s: 2,
        s2: random_normal(m, s, 1000) ) $
(%i3) draw2d(
        grid   = true,
        xrange = [5, 25],
        histogram_description(
          s2,
          nclasses     = 9,
          frequency    = density,
          fill_density = 0.5),
        explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s))$

Categories:  Package descriptive Plotting

Function: piechart (list)
Function: piechart (list, option_1, option_2, …)
Function: piechart (one_column_matrix)
Function: piechart (one_column_matrix, option_1, option_2, …)
Function: piechart (one_row_matrix)
Function: piechart (one_row_matrix, option_1, option_2, …)
Function: piechart_description (…)

Similar to barsplot, but plots sectors instead of rectangles.

Available options are:

Function piechart_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxpiechart for creating embedded histograms in interfaces wxMaxima and iMaxima.

Example:

(%i1) load (descriptive)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) piechart(
        s1,
        xrange  = [-1.1, 1.3],
        yrange  = [-1.1, 1.1],
        title   = "Digit frequencies in pi")$

See also function barsplot.

Categories:  Package descriptive Plotting

Function: scatterplot (list)
Function: scatterplot (list, option_1, option_2, …)
Function: scatterplot (matrix)
Function: scatterplot (matrix, option_1, option_2, …)
Function: scatterplot_description (…)

Plots scatter diagrams both for univariate (list) and multivariate (matrix) samples.

Available options are the same admitted by histogram.

Function scatterplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxscatterplot for creating embedded histograms in interfaces wxMaxima and iMaxima.

Examples:

Univariate scatter diagram from a simulated Gaussian sample.

(%i1) load (descriptive)$
(%i2) load (distrib)$
(%i3) scatterplot(
        random_normal(0,1,200),
        xaxis      = true,
        point_size = 2,
        dimensions = [600,150])$

Two dimensional scatter plot.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot(
       submatrix(s2, 1,2,3),
       title      = "Data from stations #4 and #5",
       point_type = diamant,
       point_size = 2,
       color      = blue)$

Three dimensional scatter plot.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot(submatrix (s2, 1,2), nclasses=4)$

Five dimensional scatter plot, with five classes histograms.

(%i1) load (descriptive)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot(
        s2,
        nclasses     = 5,
        frequency    = relative,
        fill_color   = blue,
        fill_density = 0.3,
        xtics        = 5)$

For plotting isolated or line-joined points in two and three dimensions, see points. See also histogram.

Categories:  Package descriptive Plotting

Function: starplot (data1, data2, …, option_1, option_2, …)
Function: starplot_description (…)

Plots star diagrams for discrete statistical variables, both for one or multiple samples.

data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.

Available options are:

Function starplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxstarplot for creating embedded histograms in interfaces wxMaxima and iMaxima.

Example:

Plot based on absolute frequencies. Location and radius defined by the user.

(%i1) load (descriptive)$
(%i2) l1: makelist(random(10),k,1,50)$
(%i3) l2: makelist(random(10),k,1,200)$
(%i4) starplot(
        l1, l2,
        stars_colors = [blue,red],
        sample_keys = ["1st sample", "2nd sample"],
        star_center = [1,2],
        star_radius = 4,
        proportional_axes = xy,
        line_width = 2 ) $ 

Categories:  Package descriptive Plotting

Function: stemplot (data)
Function: stemplot (data, option)

Plots stem and leaf diagrams.

Unique available option is:

Example:

(%i1) load (descriptive)$
(%i2) load(distrib)$
(%i3) stemplot(
        random_normal(15, 6, 100),
        leaf_unit = 0.1);
-5|4
 0|37
 1|7
 3|6
 4|4
 5|4
 6|57
 7|0149
 8|3
 9|1334588
10|07888
11|01144467789
12|12566889
13|24778
14|047
15|223458
16|4
17|11557
18|000247
19|4467799
20|00
21|1
22|2335
23|01457
24|12356
25|455
27|79
key: 6|3 =  6.3
(%o3)                  done

Categories:  Package descriptive Plotting


[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Charlie & on July, 6 2015 using texi2html 1.76.