Chapter 6. Performance

 

Scilab—The fastest thing from France since Django Reinhardt.

Table of Contents
High-Level Operations
Extending Scilab
Building an Optimized Scilab

In this chapter we discuss how expressions can be written so that they execute more quickly while computing the same thing. Scilab is powerful and flexible, so there are plenty of things one can do to speed up function execution. On the downside, there are just as many things that can be done the wrong way, slowing execution down to a crawl.

In the first part we focus on high-level operations that execute quickly. The most important class are the vectorized operations; another class are the functions that construct or manipulate vectors or matrices as a whole. The second part of this chapter deals with extending Scilab with compiled functions for the sake of increased execution speed.

High-Level Operations

Not using vectorized operations is the main source of slow Scilab code. Here we present performance comparisons between Scilab constructs that are semantically equivalent.

Vectorized Operations

The key to achieving high speed with Scilab is to avoid the interpreter and instead make use of the built-in vectorized operations. Let us explain that with a simple example.

Say we want to calculate the standard scalar product s of two vectors a and b which have the same length n. Naive as we are, we start with

    s = 0                               // line 1
    i = 1                               // line 2
    while i <= n                        // line 3
        s = s + a(i) * b(i)             // line 4
        i = i + 1                       // line 5
    end                                 // line 6

Here Scilab re-interprets lines 3 to 5 on every pass through the loop, n times in total, which results in slow execution. The example uses no vectorization at all. On the other hand it needs only very little memory, as no additional vectors have to be stored.

The first step to get some vectorization is to replace the while with a for loop.

    s = 0                               // line 1
    for i = 1:n                         // line 2
        s = s + a(i) * b(i)             // line 3
    end                                 // line 4

Line 2 is interpreted only once: the vector 1:n is set up and the loop body, line 3, is threaded over it. So only line 3 is re-evaluated on each pass.

OK, it is time for a really fast vector operation. In the previous examples the expression in the loop body was left unchanged, but now we replace it with the element-wise multiplication operator .* and replace the loop with the built-in sum function. (See also the section called Functions Operating on a Matrix as a Whole.)

    s = sum(a .* b) 

One obvious advantage is that we now have a one-liner. Is that as good as it can get? No: the standard scalar product is not only a built-in function, it is also an operator:

    s = a * b'

We summarize the timing results from a PII/330 Linux system in Table 6-1.

Table 6-1. Comparison of various vectorization levels

construct        MFLOPS
while            0.005
for              0.008
.* and sum       1.7
*                2.8

In other words the speed ratio is 1:1.6:330:550. Of course the numbers vary from system to system, but the general trend is clear.
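
To reproduce such measurements on your own machine, a rough timing sketch along the following lines can be used; the vector length n and the use of timer() to measure CPU time are our choices and not part of the original benchmark:

    n = 10000                        // arbitrary problem size
    a = rand(1, n)
    b = rand(1, n)

    timer()                          // reset the CPU-time counter
    s = 0
    for i = 1:n
        s = s + a(i) * b(i)
    end
    t_loop = timer()                 // seconds spent in the interpreted loop

    timer()
    s = sum(a .* b)                  // vectorized variant
    t_vec = timer()

    mprintf("for loop: %g s,  vectorized: %g s\n", t_loop, t_vec)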

Avoiding Indexing

Accessing a vector or matrix element via indexing is slow. Sometimes indexing cannot be avoided, but there are cases where it can. Compare

    for i = 1:n
        v(i) = i
    end

and

    v = []
    for i = 1:n
        v = [v, i]
    end

The second snippet is not only faster, but in some circumstances also clearer. Again there is a built-in operator that does the same job at lightning speed: the colon :, which is described in detail in the section called Vector Generation.

    v = 1:n

The speed ratio is approximately 1:1.5:5000.

In the next example, Example 6-1, the functions actually try to do something useful: they mirror a matrix along its columns or rows. We show different implementations of mirrorN that all do the same job, but utilize more and more of Scilab's vector power with increasing function index N.

Example 6-1. Variants of a matrix mirror function

        function b = mirror1(a, dir)
        // mirror matrix a along its
        // rows, dir = 'r' (horizontal)
        // or along its columns, dir = 'c' (vertical)

        [rows, cols] = size(a)
        select dir
        case 'r' then
            for j = 1 : cols
                for i = 1 : rows
                    b(i, j) = a(rows - i + 1, j)
                end
            end
        case 'c' then
            for j = 1 : cols
                for i = 1 : rows
                    b(i, j) = a(i, cols - j + 1)
                end
            end
        else
            error("dir must be ''r'' or ''c''")
        end


        function b = mirror2(a, dir)
        // same as mirror1

        [rows, cols] = size(a)
        b = []
        select dir
        case 'r' then
            for i = rows : -1 : 1
                b = [b; a(i, :)]
            end
        case 'c' then
            for i = cols : -1 : 1
                b = [b, a(:, i)]
            end
        else
            error("dir must be ''r'' or ''c''")
        end


        function b = mirror3(a, dir)
        // same as mirror1

        [rows, cols] = size(a)
        select dir
        case 'r' then
            i = rows : -1 : 1
            b = a(i, :)
        case 'c' then
            i = cols : -1 : 1
            b = a(:, i)
        else
            error("dir must be ''r'' or ''c''")
        end


        function b = mirror4(a, dir)
        // same as mirror1

        select dir
        case 'r' then
            b = a($:-1:1, :)
        case 'c' then
            b = a(:, $:-1:1)
        else
            error("dir must be ''r'' or ''c''");
        end
    

Besides the performance issue discussed here, the functions in Example 6-1 demonstrate how expressive Scilab is. The solutions look quite different, though they yield the same results. The benchmark results for all functions are plotted in Figure 6-1 and discussed in the section called Comparison of the Link Overhead. In brief, the functions get faster from top to bottom: mirror1 is the slowest, mirror4 the fastest.

$-Constant

The last of the examples, mirror4, introduces a new symbol: $, the highest index along a given dimension. The dollar sign is only defined inside the index expression of a matrix. Just as 1 always is the lowest index, $ always is the highest. Note that the dollar represents a constant, but this constant varies across the expression: more precisely, it varies with each matrix dimension. Let us make things clear with an example.

    -->m = [ 11 12 13; 21 22 23 ];

    -->m(2, $)
     ans  =
        23.

    -->m($, $)
     ans  =
        23.

    -->m(:, $/2 + 1)
     ans  =
    !   12. !
    !   22. !

Flattened Matrix Representation

The $ sign leads us to the flattened, or vector-like, representation of a matrix. Using the matrix m from the example above, we can write

    -->m(1:$)[1]
     ans  =
    !   11. !
    !   21. !
    !   12. !
    !   22. !
    !   13. !
    !   23. !

In general an nxm matrix mat can be accessed in three ways:

  • as a unit by saying mat,

  • by referencing its elements according to their row and column with mat(i, j), or

  • via indexing into the flattened form mat(i).

The following equivalence holds: mat(i, j) == mat(i + (j - 1)*n), where n is the number of rows. Like Fortran, Scilab stores matrices in column-major order. See also the discussion of the function matrix in the section called Functions Operating on a Matrix as a Whole.
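
For instance, with the 2x3 matrix m from the $ example above, n equals 2, so element (1, 3) has the flattened index 1 + (3 - 1)*2 = 5:

    -->m = [ 11 12 13; 21 22 23 ];

    -->m(1, 3)
     ans  =
        13.

    -->m(5)
     ans  =
        13.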

Built-In Vector-/Matrix-Functions

Functions discussed in this section:

  • Colon Operator ":"

  • linspace

  • logspace

  • zeros

  • ones

  • eye

  • diag

  • rand

  • find

  • max

  • min

  • and

  • or

  • sum

  • prod

  • sort

  • size

  • matrix

There are many built-in functions that work on vectors or matrices. Knowing what functions are available is important to avoid coding the same functionality with slow iterative expressions.

For further information about contemporary techniques of processing matrices with computers, the classical work "Matrix Computations" [golub:1996] is recommended.

Vector Generation

There are two built-in functions and one operator to generate a row-vector of numbers.

Vector Generation Functions

Colon Operator ":"

The syntax of the colon operator is

                initial [: increment] : final
            

with a default increment of +1. A roughly equivalent piece of Scilab code is

                x = initial
                v = [ x ]
                while x <= final - increment
                    x = x + increment
                    v = [v, x]
                end
            

where v holds the result. Note that the last element of the result is always smaller than or equal to final. (Unlike this sketch, the real colon operator returns the empty matrix [] when initial lies beyond final.)
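
For example, with an increment of 3 the sequence stops at or before final:

                -->2 : 3 : 11
                 ans  =
                !   2.    5.    8.    11. !

                -->2 : 3 : 10
                 ans  =
                !   2.    5.    8. !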

linspace

The syntax of linspace is

                linspace(initial, final [, length])
            

using a default of 100 for length. linspace returns a row-vector with length entries, which divide the closed interval [initial, final] into length - 1 equal sub-intervals. Both endpoints, i.e. initial and final, are always included.
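
For example, five equally spaced points covering the interval [0, 1]:

                -->linspace(0, 1, 5)
                 ans  =
                !   0.    0.25    0.5    0.75    1. !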

logspace

logspace works much like linspace, and the following relation holds

                logspace(initial, final) == 10 .^ linspace(initial, final)
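
For example:

                -->logspace(0, 2, 3)
                 ans  =
                !   1.    10.    100. !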
            

After having discussed the most important vector generation functions, we now turn to functions that build a whole matrix at once.

Whole Matrix Construction

All of the functions shown in this section can produce matrices of arbitrary shape, including the boundary cases of row- and column-vectors.

Matrix Generation Functions

zeros

As the name suggests this function produces a matrix filled with zeros. The two possible instantiations are with two scalar arguments

                n = 2
                m = 5
                mat = zeros(n, m)
            

or with one matrix argument

                mat1 = [ 4 2; ..
                         4 5; ..
                         3 5 ]
                mat2 = zeros(mat1)
            

The first form produces the n times m matrix mat filled with zeros, whereas the second builds the matrix mat2, which has the same shape as mat1 and also consists of zeros.

Caution: Single scalar argument to zeros
 

In the case of a single scalar argument zeros returns a 1x1 matrix, the sole element being a zero.

Furthermore, note that

                zeros()
            

is not allowed.

ones

The ones command works exactly like zeros, except that it returns a matrix filled with 1.0 instead of 0.0. The only other difference from the caller's point of view is a third form that is permitted for ones: calling the function without any arguments:

                -->ones()
                 ans  =
                    1.  
            
eye

The eye function produces a generalized identity matrix, i.e. a matrix with elements a(i, j) == 0.0 for i ~= j and a(i, j) == 1.0 for i == j. The calling forms are the same as for zeros. The only extension is the use without any argument, in which case the result automatically takes on the dimensions of the matrix in the subexpression in which it is used.

                -->a=[2 3 4 3; 4 2 6 7; 8 2 7 4]
                 a  =
                !   2.    3.    4.    3. !
                !   4.    2.    6.    7. !
                !   8.    2.    7.    4. !

                -->a - 2*eye()
                 ans  =
                !   0.    3.    4.    3. !
                !   4.    0.    6.    7. !
                !   8.    2.    5.    4. !
            
diag

Function diag constructs a diagonal matrix mat from the vector v, with v being mat's main diagonal.

                -->diag(2:2:8)
                 ans  =
                !   2.    0.    0.    0. !
                !   0.    4.    0.    0. !
                !   0.    0.    6.    0. !
                !   0.    0.    0.    8. !
            

The second form of the diag function

                diag(v, k)
            

constructs a matrix whose diagonal lies k positions away from the main diagonal and again is made up of v. Therefore, diag(v) is the special case diag(v, 0). A positive k denotes diagonals above, a negative k diagonals below the main diagonal.

                -->diag([1 1 1 1]) + diag([2 2 2], 1) + diag([-2 -2 -2], -1)
                 ans  =
                !   1.    2.    0.    0. !
                ! - 2.    1.    2.    0. !
                !   0.  - 2.    1.    2. !
                !   0.    0.  - 2.    1. !
            
rand

The rand function generates pseudo-random scalars and matrices. Again, the function shares its two fundamental forms with zeros. Moreover, the distribution of the numbers can be chosen as either 'uniform', the default, or 'normal'. The generator's seed is set and queried with

                rand('seed', new_seed)
            

and

                current_seed = rand('seed')
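
A minimal usage sketch (the seed value and the 2x3 shape are arbitrary choices):

                rand('seed', 42)            // make the sequence reproducible
                u = rand(2, 3)              // 2x3 matrix, uniform on (0, 1)
                g = rand(2, 3, 'normal')    // 2x3 matrix, standard normal deviates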
            

Functions Operating on a Matrix as a Whole

find

In our opinion one of the most useful functions in the group of whole-matrix functions is find. It takes a boolean matrix expression (i.e. an expression that evaluates to a boolean matrix) as its argument, and in the form

                index = find(expr)
            

returns, in a vector, the (flattened) indices of the elements that evaluate to true, i.e. %t. See also the section called Flattened Matrix Representation.

In the form

                [rowidx, colidx] = find(expr)
            

it returns the row- and column-index vectors separately. Here is a complete example.

                -->a = [ 1 -4  3; 6  2 10 ]
                 a  =
                !   1.  - 4.    3.  !
                !   6.    2.    10. !

                -->index = find( a < 5 )
                 index  =
                !   1.    3.    4.    5. !

                -->a(index)
                 ans  =
                !   1. !
                ! - 4. !
                !   2. !
                !   3. !

                -->[rowidx, colidx] = find( a < 5 )
                 colidx  =
                !   1.    2.    2.    3. !
                 rowidx  =
                !   1.    1.    2.    1. !
            

The expression expr can be arbitrarily complex, and it is not limited to a single matrix.

                -->b = [1 2 3; 4 5 6]
                 b  =
                !   1.    2.    3. !
                !   4.    5.    6. !

                -->a < 5
                 ans  =
                ! T T T !
                ! F T F !

                -->abs(b) >= 4
                 ans  =
                ! F F F !
                ! T T T !

                -->a < 5 & abs(b) >= 4
                 ans  =
                ! F F F !
                ! F T F !

                -->find( a < 5 & abs(b) >= 4 )
                 ans  =
                    4.  
            

Last but not least, find is perfectly OK on the left-hand side of an assignment. So replacing all odd elements of a with 0 is simply

                -->a( find(modulo(a, 2) == 1) ) = 0
                 a  =
                !   0.  - 4.    0.  !
                !   6.    2.    10. !
            

To get the number of elements that match a criterion, just apply size(idxvec, '*') to the index vector idxvec of the find operation.
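
Continuing the example above, the number of elements of a that are smaller than 5 is obtained with

                -->size( find(a < 5), '*' )
                 ans  =
                    4.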

max, min

Searching for the smallest or the largest entry in a matrix is so common that Scilab has separate functions for these tasks. We discuss only max, as min behaves analogously.

To get the largest value, saying

                max_val = max(a)
            

is enough. The alternate form

                -->[max_val, index] = max(a)
                 index  =
                !   2.    3. !
                 max_val  =
                    10.  
            

also returns the position of the maximum element. The form of the index vector is the same as for size, i.e. [row-index, column-index]. Speaking of size, max also has the forms max(mat, 'r') and max(mat, 'c').

                -->[max_val, rowidx] = max(b, 'r')
                 rowidx  =
                !   2.    2.    2. !
                 max_val  =
                !   4.    5.    6. !

                -->[max_val, colidx] = max(b, 'c')
                 colidx  =
                !   3. !
                !   3. !
                 max_val  =
                !   3. !
                !   6. !
            

With 'r' these forms return the maximum of each column (the search runs along the rows) together with the row indices of those elements; with 'c' they return the maximum of each row together with the column indices.

The third way of using max is with more than one matrix or scalar argument. All matrices must have compatible dimensions; scalars are expanded to the full matrix size, as if scalmat = scal * ones(mat) had been passed. The returned matrix holds the element-wise maximum over all arguments.

                -->max(a, b, 3)
                 ans  =
                !   3.    3.    3.  !
                !   6.    5.    10. !
            
and, or

FIXME: write it!
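
Until this section is written up, a brief sketch: and and or reduce a boolean matrix, either as a whole or along one dimension ('r' reduces along the rows, 'c' along the columns); the matrix t below is just an example:

                t = [ %t %f; %t %t ]
                and(t)              // F : not all elements are true
                or(t)               // T : at least one element is true
                and(t, 'r')         // [T F] : AND down each column
                or(t, 'c')          // [T; T] : OR across each row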

sum, prod

FIXME: write it!
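
Until this section is written up, a brief sketch: sum and prod add up or multiply the elements of a matrix, again either as a whole or along one dimension; the matrix x below is just an example:

                x = [ 1 2 3; 4 5 6 ]
                sum(x)              // 21 : sum of all elements
                sum(x, 'r')         // [5 7 9] : column sums as a row vector
                sum(x, 'c')         // [6; 15] : row sums as a column vector
                prod(x)             // 720 : product of all elements
                prod(x, 'c')        // [6; 120] : row products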

sort

FIXME: write it!

size

The size function handles all shape inquiries. It comes in four different guises. Assuming that mat is a scalar or matrix, size can be used as all-info-at-once function as in

                [rows, cols] = size(mat)
            

as row-only, or column-only function

                rows = size(mat, 'r')
                cols = size(mat, 'c')
            

and finally as totaling function

                elements = size(mat, '*')
            
matrix

A (hyper-)matrix can be reshaped with the matrix command. To keep things simple we demonstrate matrix with a 2x6 matrix.

                -->a = [1:6; 7:12]
                 a  =
                !   1.    2.    3.    4.     5.     6.  !
                !   7.    8.    9.    10.    11.    12. !

                -->matrix(a, 3, 4)
                 ans  =
                !   1.    8.    4.     11. !
                !   7.    3.    10.    6.  !
                !   2.    9.    5.     12. !

                -->matrix(a, 4, 3)
                 ans  =
                !   1.    3.     5.  !
                !   7.    9.     11. !
                !   2.    4.     6.  !
                !   8.    10.    12. !
            

In contrast to the Fortran-9x function RESHAPE, matrix allows neither padding nor truncation of the reshaped matrix. Put another way, for an m times n matrix a, the new dimensions p and q must obey m * n = p * q.

matrix works by column-wise "filling" the contents of the original matrix a into an empty template of a p times q matrix. (See also the section called Flattened Matrix Representation.) If this is too hard to imagine, the second way to think of it is to regard a as a column vector of dimensions (m * n) times 1 that is broken down column by column into a p times q matrix. In fact this is more than imagination, since the identity a(i, j) == a(i + m*(j - 1)) holds, with m being the number of rows of a.

                -->a(2,4)
                 ans  =
                    10.  

                -->a(8)
                 ans  =
                    10.  
            

Moreover, the usual vector subscripting can be applied to a matrix.

                -->a(:)
                 ans  =
                !   1.  !
                !   7.  !
                !   2.  !
                !   8.  !
                !   3.  !
                !   9.  !
                !   4.  !
                !   10. !
                !   5.  !
                !   11. !
                !   6.  !
                !   12. !
            

Evaluation Of Polynomials

Once upon a time there was a little Scilab newbie who coded an interface to the optim routine to make polynomial approximations easier. Along the way an evaluation function for polynomials had to be written. The author was very proud of herself because she knew the Right Thing(tm) to do in this case, namely the Horner algorithm. In fact she came up with two implementations!

Example 6-2. Naive functions to evaluate a polynomial

        function yv = peval1(cv, xv)
        // Evaluate the polynomial given by the vector of its coefficients cv
        // in ascending order, i.e. cv = [p q r]  -->  p + q*x + r*x^2
        // at all points listed in vector xv and return the
        // resulting vector.

        yv = cv(1) * ones(xv)
        px = xv
        for c = cv(2 : $)
            yv = yv + c * px
            px = px .* xv
        end


        function yv = peval2(cv, xv)
        // same as peval1

        yv = cv($);
        for i = length(cv)-1 : -1 : 1
            yv = yv .* xv + cv(i)
        end
    

So what is wrong with that? This code looks OK, and it does the job. But from a performance viewpoint it is not optimal: the fact that Scilab offers a separate type for polynomials has been ignored. Even if we are forced to supply an interface with the coefficients stored in vectors, the built-in function freq is preferable.

Example 6-3. Less naive functions to evaluate a polynomial

        function yv = peval3(cv, xv)
        // same as peval1, using horner()

        p = poly(cv, 't', 'coeff')
        yv = horner(p, xv)


        function yv = peval4(cv, xv)
        // same as peval1, using freq()
        // The return value yv _always_ is a row-vector.

        p = poly(cv, 't', 'coeff')
        unity = poly(1, 't', 'coeff')
        yv = freq(p, unity, xv)
    

Table 6-2 shows the speed ratios we measured on a P5/166 Linux system for a polynomial of degree 4; each row is normalized separately.

Table 6-2. Performance comparison of different polynomial evaluation routines

evaluations    peval1    peval2    peval3    peval4
5              3.5       4.2       1         7.0
1000           1.4       2.5       1         2.5

If we now decide to change our interface to take Scilab's built-in polynomial type, the evaluation with freq can again be accelerated by a factor of more than 3.
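
As a sketch of what that interface change amounts to (the name peval5 is ours, not part of the original example), the caller builds the polynomial once and only the evaluation remains:

        function yv = peval5(p, xv)
        // same as peval4, but p already is a Scilab polynomial,
        // e.g. p = poly(cv, 't', 'coeff') built once by the caller,
        // so the per-call poly() construction disappears
        yv = freq(p, poly(1, 't', 'coeff'), xv)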

Notes

[1]

Remember that the colon operator returns a row-vector.