Scilab—The fastest thing from France since Django Reinhardt. | |
cls |
In this chapter we discuss how expressions can be written to execute more quickly while doing the same thing. Scilab is powerful and flexible, therefore there are plenty of things one can do to speed up function execution. On the downside there are a lot of things the can be done the wrong way, slowing down the execution to a crawl.
In the first part we focus on high-level operations that are executed fast. The main class to name here are vectorized operations. Another class are all functions that are constructing or manipulating vectors or matrices as a whole. The second part of this chapter deals with the extension of Scilab through compiled functions for the sake of increased execution speed.
Not using vectorized operations in Scilab is the main source for suffering from a slow code. Here we present performance comparisons between different Scilab constructs that are semantically equivalent.
The key to achieve a high speed with Scilab is to avoid the interpreter and instead make use of the built in vectorized operations. Let us explain that with a simple example.
Say we want to calculate the standard scalar product s of two vectors a and b which have the same length n. Naive as we are, we start with
s = 0 // line 1 i = 1 // line 2 while i <= n // line 3 s = s + a(i) * b(i) // line 4 i = i + 1 // line 5 end // line 6
Here Scilab re-interprets lines 3 to 5 in every round-trip, which in total is n times. This results in slow execution. The example utilizes no vectorization at all. On the other hand it uses only very little memory memory as no vectors have to be stored.
The first step to get some vectorization is to replace the while with a for loop.
s = 0 // line 1 for i = 1:n // line 2 s = s + a(i) * b(i) // line 3 end // line 4
Line 2 is only interpreted once; the vector i = 1:n is set up and the loop body, line 3 is threaded over it. So, only line 3 is re-evaluated in each round trip.
OK, it is time for a really fast vector operation. In the previous examples the expression in the loop body has not been modified, but we can replace it with the element wise multiplication operator .* and replace the loop with the built-in sum function. (See also the section called Functions Operating on a Matrix as a Whole.)
s = sum(a .* b)
One obvious advantage is that we have a one-liner now. Is that as good as it can get? No, the standard scalar product is not only a built-in function it is also an operator:
s = a * b'
We summarize the timing results of a PII/330 Linux-system in Table 6-1.
Table 6-1. Comparison of various vectorization levels
construct | MFLOPS |
---|---|
while | 0.005 |
for | 0.008 |
.* and sum | 1.7 |
* | 2.8 |
In other words the speed ratio is 1:1.6:330:550. Of course the numbers vary from system to system, but the general trend is clear.
Accessing a vector- or matrix-element via indexing is slow. Sometimes the index cannot be avoided, but there are cases where it can. Compare
for i = 1:n v(i) = i end
and
v = [] for i = 1:n v = [v, i] end
The second snippet is not only faster, but in some circumstances may be clearer. Again there is a built-in operator that does the same job at lightning speed, the colon :, which is described in detail in the section called Vector Generation.
v = 1:n
The speed ratio is approximately 1:1.5:5000.
In the next example, Example 6-1, the functions actually try to do something useful: they mirror a matrix along its columns or rows. We show different implementations of mirrorN that all do the same job, but utilize more and more of Scilab's vector power with increasing function index N.
Example 6-1. Variants of a matrix mirror function
function b = mirror1(a, dir) // mirror matrix a along its // rows, dir = 'r' (horizontal) // or along its columns, dir = 'c' (vertical) [rows, cols] = size(a) select dir case 'r' then for j = 1 : cols for i = 1 : rows b(i, j) = a(rows - i + 1, j) end end case 'c' then for j = 1 : cols for i = 1 : rows b(i, j) = a(i, cols - j + 1) end end else error("dir must be ''r'' or ''c''") end function b = mirror2(a, dir) // same as mirror 1 [rows, cols] = size(a) b = [] select dir case 'r' then for i = rows : -1 : 1 b = [b; a(i, :)] end case 'c' then for i = cols : -1 : 1 b = [b, a(:, i)] end else error("dir must be ''r'' or ''c''") end function b = mirror3(a, dir) // same as mirror 1 [rows, cols] = size(a) select dir case 'r' then i = rows : -1 : 1 b = a(i, :) case 'c' then i = cols : -1 : 1 b = a(:, i) else error("dir must be ''r'' or ''c''") end function b = mirror4(a, dir) // same as mirror 1 select dir case 'r' then b = a($:-1:1, :) case 'c' then b = a(:, $:-1:1) else error("dir must be ''r'' or ''c''"); end
Besides the performance issue discussed here the functions in Example 6-1 demonstrate how much expressiveness Scilab has got. The solutions look quite different, though they yield the same results. The benchmark results of all functions are plotted in Figure 6-1, and the discussion is found in the section called Comparison of the Link Overhead. In brief the functions get faster from top to bottom, function mirror1 is the slowest, mirror4 the fastest.
The last of the examples, mirror4, introduces a new symbol, the "highest index", $ along a given direction. The dollar sign is only defined in the index expression of a matrix. As 1 always is the lowest index, $ always is the highest. Please note that the dollar represents a constant, but that this constant varies across the expression! More precisely it varies with each matrix dimension. Let us make things clear by stating an example.
-->m = [ 11 12 13; 21 22 23 ]; -->m(2, $) ans = 23. -->m($, $) ans = 23. -->m(:, $/2 + 1) ans = ! 12. ! ! 22. !
The $ sign leads us to the flattened or vector-like representation of a matrix, if we rewrite the third line of the above example to
-->m(1:$)[1] ans = ! 11. ! ! 21. ! ! 12. ! ! 22. ! ! 13. ! ! 23. !
In general a nxm matrix mat can be accessed in three ways:
as a unit by saying mat,
by referencing its elements according to their row and column with mat(i, j), or
via indexing into the flattened form mat(i).
The following equivalence holds: mat(i, j) == mat(i + (j - 1)*n). Scilab follows Fortran in its way to store matrices in column-major form. See also the discussion of the function matrix in the section called Functions Operating on a Matrix as a Whole.
Functions discussed in this section:
Colon Operator ":"
linspace
logspace
zeros
ones
eye
diag
rand
find
max
min
and
or
sum
prod
sort
size
matrix
There are many built-in functions that work on vectors or matrices. Knowing what functions are available is important to avoid coding the same functionality with slow iterative expressions.
For further information about contemporary techniques of processing matrices with computers, the classical work "Matrix Computations" [golub:1996] is recommended.
There are two built-in functions and one operator to generate a row-vector of numbers.
Vector Generation Functions
This syntax of the colon operator is
initial [: increment] : final
with a default increment of +1. To produce the equivalent piece of Scilab code, we write
x = initial v = [ x ] while x <= final - increment x = x + increment v = [v, x] end
where v is the result. Note that the last element of the result always will be smaller or equal to the value final.
The syntax of linspace is
linspace(initial, final [, length])
using a default of 100 for length. linspace returns a row-vector with length entries, which divide the interval (initial, final) in equal-length sub-intervals. Both endpoints, i.e. initial and final are always included.
logspace works much like linspace, and the following relation holds
logspace(initial, final) == 10^linspace(initial, final)
After having discussed the most important vector generation functions, we now turn to functions that build a whole matrix at once.
All of the functions shown in this section are capable to produce arbitrary matrices including the boundary cases of row- and column-vectors.
Matrix Generation Functions
As the name suggests this function produces a matrix filled with zeros. The two possible instantiations are with two scalar arguments
n = 2 m = 5 mat = zeros(n, m)
or with one matrix argument
mat1 = [ 4 2; .. 4 5; .. 3 5 ] mat2 = zeros(mat1)
The first form produces the n times m matrix mat made up from zeros, whereas the second builds the matrix mat2 which has the same shape as mat1, and is also consisting of zeros.
Single scalar argument to zeros | |
---|---|
In the case of a single scalar argument zeros returns a 1x1 matrix, the sole element being a zero. |
Furthermore, note that
zeros()
is not allowed.
The command is functionally equivalent to zeros. Instead of returning a matrix filled with 0.0 as zeros does, ones returns a matrix filled with 1.0. The only difference from the caller's point of view is a third form which is permitted for ones, and that is calling the function without any arguments:
-->ones() ans = 1.
The eye function produces a generalized identity matrix, i.e. a matrix with all elements a(i, j) == 0.0 for i ~= j, and 1.0 for i == j. This command is functionally equivalent to zeros. The only extension is the usage without any argument, where the result automatically takes over the dimensions of the matrix in the subexpression it is used.
-->a=[2 3 4 3; 4 2 6 7; 8 2 7 4] a = ! 2. 3. 4. 3. ! ! 4. 2. 6. 7. ! ! 8. 2. 7. 4. ! -->a - 2*eye() ans = ! 0. 3. 4. 3. ! ! 4. 0. 6. 7. ! ! 8. 2. 5. 4. !
Function diag constructs a diagonal matrix mat from the vector v, with v being mat's main diagonal.
-->diag(2:2:8) ans = ! 2. 0. 0. 0. ! ! 0. 4. 0. 0. ! ! 0. 0. 6. 0. ! ! 0. 0. 0. 8. !
The second form of the diag function
diag(v, k)
constructs a matrix that has its diagonal k positions away from the main diagonal, the diagonal being made up from v again. Therefore, diag(v) is the special case of diag(v, 0). A positive k denotes diagonals above, a negative k diagonals below the main diagonal.
-->diag([1 1 1 1]) + diag([2 2 2], 1) + diag([-2 -2 -2], -1) ans = ! 1. 2. 0. 0. ! ! - 2. 1. 2. 0. ! ! 0. - 2. 1. 2. ! ! 0. 0. - 2. 1. !
The rand function generates pseudo-random scalars and matrices. Again the function shares its two fundamental forms with zeros. Moreover, the distribution of the numbers can be chosen from 'uniform' which is the default, and 'normal'. The generator's seed is set and queried with
rand('seed', new_seed)
and
current_seed = rand('seed')
In our opinion one of the most useful functions in the group of whole matrix functions is find. It takes a boolean expression of matrices (i.e. an expression which evaluates to a boolean matrix) as argument, and in form
index = find(expr)
returns the indices of the array elements that evaluate to true, i.e. %t in a vector. See also the section called Flattened Matrix Representation.
In the form
[rowidx, colidx] = find(expr)
it returns the row- and column-index vectors separately. Here is a complete example.
-->a = [ 1 -4 3; 6 2 10 ] a = ! 1. - 4. 3. ! ! 6. 2. 10. ! -->index = find( a < 5 ) index = ! 1. 3. 4. 5. ! -->a(index) ans = ! 1. ! ! - 4. ! ! 2. ! ! 3. ! -->[rowidx, colidx] = find( a < 5 ) colidx = ! 1. 2. 2. 3. ! rowidx = ! 1. 1. 2. 1. !
The expressions expr can be arbitrarily complex, and they are not limited to a single matrix.
-->b = [1 2 3; 4 5 6] b = ! 1. 2. 3. ! ! 4. 5. 6. ! -->a < 5 ans = ! T T T ! ! F T F ! -->abs(b) >= 4 ans = ! F F F ! ! T T T ! -->a < 5 & abs(b) >= 4 ans = ! F F F ! ! F T F ! -->find( a < 5 & abs(b) >= 4 ) ans = 4.
Last but not least find is perfectly OK on the left-hand side of an assignment. So, replacing all odd elements in a with 0 simply is
-->a( find(modulo(a, 2) == 1) ) = 0 a = ! 0. - 4. 0. ! ! 6. 2. 10. !
To get the number of elements that match a criterion, just apply size(idxvec, '*') to the index vector idxvec of the find operation.
Searching the smallest or the largest entry in a matrix are so common that Scilab has separate functions for these tasks. We discuss max only as min behaves similarly.
To get the largest value saying
max_val = max(a)
is enough. The alternate form
-->[max_val, index] = max(a) index = ! 2. 3. ! max_val = 10.
returns the position of the maximum element, too. The form of the index vector is the same as for size, i.e. [row-index, column-index]. Speaking of size, max has the forms max(mat, 'r'), and max(mat, 'c'), too.
-->[max_val, rowidx] = max(b, 'r') rowidx = ! 2. 2. 2. ! max_val = ! 4. 5. 6. ! -->[max_val, colidx] = max(b, 'c') colidx = ! 3. ! ! 3. ! max_val = ! 3. ! ! 6. !
These forms return the maximum values of each row or column along with the respective indices of the elements' rows or columns.
The third way of using max is with more than one matrix or scalar as arguments. All the matrices must be compatible, scalars are expanded to the full matrix size, like scalmat = scal * ones(mat). The return matrix holds the largest elements from all argument matrices.
-->max(a, b, 3) ans = ! 3. 3. 3. ! ! 6. 5. 10. !
FIXME: write it!
FIXME: write it!
FIXME: write it!
The size function handles all shape inquiries. It comes in four different guises. Assuming that mat is a scalar or matrix, size can be used as all-info-at-once function as in
[rows, cols] = size(mat)
as row-only, or column-only function
rows = size(mat, 'r') cols = size(mat, 'c')
and finally as totaling function
elements = size(mat, '*')
A (hyper-)matrix can be reshaped with the matrix command. To keep things simple we demonstrate matrix with a 6x2-matrix.
-->a = [1:6; 7:12] a = ! 1. 2. 3. 4. 5. 6. ! ! 7. 8. 9. 10. 11. 12. ! -->matrix(a, 3, 4) ans = ! 1. 8. 4. 11. ! ! 7. 3. 10. 6. ! ! 2. 9. 5. 12. ! -->matrix(a, 4, 3) ans = ! 1. 3. 5. ! ! 7. 9. 11. ! ! 2. 4. 6. ! ! 8. 10. 12. !
In contrary to the Fortran-9x function RESHAPE, matrix neither allows padding, nor truncation of the reshaped matrix. Put another way, for a m times n matrix a the reshaped dimensions p, and q must obey m * n = p * q.
matrix works by columnwise "filling" the contents of the original matrix a into an empty template of a p times q matrix. (See also the section called Flattened Matrix Representation.) If this a too hard to imagine, the second way to think of it is imagining a as a column vector of dimensions (m * n) times 1 that is broken down column by column into a p times q matrix. In fact this is not pure imagination as in many situations there is the identity a(i, j) == a(i + n*(j - 1)) holds.
-->a(2,4) ans = 10. -->a(8) ans = 10.
Moreover, the usual vector subscripting can be used to a matrix.
-->a(:) ans = ! 1. ! ! 7. ! ! 2. ! ! 8. ! ! 3. ! ! 9. ! ! 4. ! ! 10. ! ! 5. ! ! 11. ! ! 6. ! ! 12. !
Once upon a time there was a little Scilab newbie who coded an interface to the optim routine to make polynomial approximations easier. On the way an evaluate function for polynomials had to be written. The author was very proud of herself because she knew the Right Thing(tm) to do in this case namely the Horner algorithm. Actually she has come up with two implementations!
Example 6-2. Naive functions to evaluate a polynomial
function yv = peval1(cv, xv) // Evaluate polynomial given by the vector its coefficients cv // in ascending order, i.e. cv = [p q r] --> p + q*x + r*x^2 // at all points listed in vector xv and return the // resulting vector. yv = cv(1) * ones(xv) px = xv for c = cv(2 : $) yv = yv + c * px px = px .* xv end function yv = peval2(cv, xv) // same as peval1 yv = cv($); for i = length(cv)-1 : -1 : 1 yv = yv .* xv + cv(i) end
So what is wrong with that? This code looks OK and it does the job. But from the performance viewpoint it is not optimal! The fact that Scilab offers a separate type for polynomials has been ignored. Even if we are forced to supply an interface with the coefficients stored in vectors the built-in function freq is preferable.
Example 6-3. Less naive functions to evaluate a polynomial
function yv = peval3(cv, xv) // same as peval1, using horner() p = poly(cv, 't', 'coeff') yv = horner(p, xv) function yv = peval4(cv, xv) // same as peval1, using freq() // The return value yv _always_ is a row-vector. p = poly(cv, 't', 'coeff') unity = poly(1, 't', 'coeff') yv = freq(p, unity, xv)
Table 6-2 shows the speed ratios (each line is normalized separately) for a polynomial of degree 4 that we got on a P5/166 Linux system.
Table 6-2. Performance comparison of different polynomial evaluation routines
evaluations | peval1 | peval2 | peval3 | peval4 |
---|---|---|---|---|
5 | 3.5 | 4.2 | 1 | 7.0 |
1000 | 1.4 | 2.5 | 1 | 2.5 |
If we now decide to change our interface to take Scilab's built-in polynomial type the evaluation with freq can again be accelerated by a factor of more than 3.
[1] | Remember that the colon operator returns a row-vector. |