VectorLib
VectorLib is the vector functions part of OptiVec. This file describes the basic principles of the OptiVec libraries and gives an overview of VectorLib. The new object-oriented interface, VecObj, is described in chapter 3. MatrixLib and CMATH are described separately.
This is the English version. Translation of the first three chapters into Portuguese by Artur Weber for https://www.homeyou.com/~edu/.
Contents
1. Introduction
2. The Elements of OptiVec Routines
3. C++ only: VecObj, the Object-Oriented Interface for VectorLib
4. VectorLib Functions and Routines: A Short Overview
5. Error Handling
6. Trouble-Shooting
7. The Include-Files and Units of OptiVec
1. Introduction
OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available for C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management of conventional compilers – which proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.
In contrast to integrated packages like MatLab or others, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Both C++ and Fortran do already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and field functions (Fortran90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it to the usual inefficient code. The same is true for most implementations of the popular BLAS (Basic Linear Algebra Subroutine) libraries.
In comparison to these approaches, OptiVec is superior mainly with respect to execution speed – on the average by a factor of 3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!
There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, OptiVec was, in 1996, the first product on the market offering a comprehensive vectorized-functions library realized in a true Assembler implementation.
- All operators and mathematical functions of C/C++ are implemented in vectorized form; additionally many more mathematical functions are included which normally would have to be calculated by more or less complicated combinations of existing functions. Not only the execution speed, but also the accuracy of the results is greatly improved.
- Building blocks for statistical data analysis are supplied.
- Derivatives, integrals, interpolation schemes are included.
- Fast Fourier Transform techniques allow for efficient convolutions, correlation analyses, spectral filtering, and so on.
- Graphical representation of data offers a convenient way of monitoring the results of vectorized calculations.
- A wide range of optimized matrix functions like matrix arithmetics, algebra, decompositions, data fitting, etc. is offered by MatrixLib. TensorLib is planned as a future extension of these concepts for general multidimensional arrays.
- Each function exists for every data type for which this is reasonable. The data type is signalled by the prefix of the function name. No implicit name mangling or other specific C++ features are used, which makes OptiVec usable in plain-C as well as in C++ programs. Moreover, the names and the syntax of nearly all functions are the same in C/C++ and Pascal/Delphi languages.
- The input and output vectors/matrices of VectorLib and MatrixLib routines may be of variable size and it is possible to process only a part (e.g., the first 100 elements, or every 10th element) of a vector, which is another important advantage over other approaches, where only whole arrays are processed.
- A new object-oriented interface for C++, named VecObj, encapsulates all vector functions, offering even easier use and increased memory safety.
- Using OptiVec routines instead of loops can make your source code much more compact and far more readable.
The wide range of routines and functions covered by OptiVec, the high numerical efficiency and increased ease of programming make this package a powerful programming tool for scientific and data analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded in your favourite programming language.
1.1 Why Vectorized Programming Pays Off on the PC
To process one-dimensional data arrays or "vectors", a programmer would normally write a loop over all vector elements. Similarly, two- or higher-dimensional arrays ("matrices" or "tensors") are usually processed through nested loops over the indices in all dimensions. The alternative to this classic style of programming are vector and matrix functions.
Vector functions act on whole arrays/vectors instead of single scalar arguments. They are the most consistent form of "vectorization", i.e., the organisation of program code (by clever compilers or by the programmer) in such a way as to optimize vector treatment.
Vectorization has always been the magic formula for supercomputers with their multi-processor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. The so-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then find the most efficient way to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of pre-defined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.
Obviously, the massive parallel processing of, say, a Cray is not possible even on modern PCs with their modest 2 or 4-processor core configurations, let alone on the classical single-processor PC. Consequently, at first sight, it might seem difficult to apply the principle of vectorized programming to the PC. Actually, however, there are many vector-specific optimizations possible, even for computers with only one CPU. Most of these optimizations are not available to present compilers. Rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three, on the average. This means that vectorization, properly done, is indeed worth the effort, also for PC programs.
1.1.1 General OptiVec Optimization Strategies
Here are the most important optimization strategies, employed in OptiVec to boost the performance on any PC (regardless of the number of processor cores):
Preload of constants Floating-point as well as integer constants, employed in the evaluation of mathematical functions, are loaded into registers outside of the actual loop and stay as long as they are needed. This saves a large amount of loading/unloading operations which are necessary if a mathematical function is called for each element of a vector separately.
Full XMM and FPU stack usage Where necessary, all eight (64-bit: all sixteen) XMM registers and/or all eight coprocessor registers are employed.
Prefetch of chunks of vector elements Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With these commands, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.
Use of SIMD commands You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions", introduced with the Pentium III and improved with every new processor generation, provide explicit support for vectorized programming with floating-point data in float / single or double precision. At first sight, therefore, they should revolutionize vector programming. Given the usual relation between processor and data bus speeds, however, many of the simple arithmetic operations are limited by data transfer, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) it could make otherwise. In many cases, the advantage of using a SIMD instruction instead of separate FPU instructions shrinks to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range available with traditional FPU commands (with their internal extended accuracy) allow the algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible. Please note, however, that the SIMD-employing library versions (P8, P9 etc.) generally sacrifice 1-2 digits of accuracy in order to attain the described speed gain. If this is not acceptable for your specific task, please stay with the P4 libraries.
Loop-unrolling Where SIMD instructions cannot be used and where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This makes it possible to fully exploit the parallel execution pipes. Moreover, the relative amount of time spent on loop management is significantly reduced. In connection with data-prefetching, described above, the depth of the unrolled loops is most often adapted to the cache line size.
Simplified addressing The addressing of vector elements is still a major source of inefficiency with present compilers. Switching back and forth between input and output vectors, a large number of redundant addressing operations is performed. The strict (and easy!) definitions of all OptiVec functions allow these operations to be reduced to a minimum.
Replacement of floating-point by integer commands For any operations with floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.
Strict precision control C compilers convert a float into a double – Borland Pascal/Delphi even into extended – before passing it to a mathematical function. This approach was useful at times when disk memory was too great a problem to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e. single) precision, wasting no time for the calculation of more digits than necessary – which would be discarded anyway. There is also a brute-force approach to precision-control: You can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. Thereby, execution can be slightly sped up from Pentium CPUs on. Be, however, prepared to accept even lower-than-single accuracy of your end results, if you elect this option. For further details and precautions, see V_setFPAccuracy.
All-inline coding All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time necessary for the "call / ret" pairs and for loading the parameters onto the stack.
Cache-line matching of local variables
The Level-1 cache of modern processors uses 64-byte lines. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). 32-bit compilers align the stack on 4-byte boundaries, which means there is a certain chance that the 8 bytes of a double or the 10 bytes of an extended, stored on the stack, will cross a cache-line boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue, use special procedures to align their local variables on 8-byte (for doubles), 16-byte (for extendeds), or 64-byte boundaries (for XMM and YMM values).
Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, where you have the choice between the fully protected variant with error handling and another, unprotected variant without. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range from -2π to +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed always has to be balanced against the increased risk, though: If any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.
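As a rough illustration of two of the strategies listed above (preloading of constants and loop unrolling), the following C sketch adds a constant to every element of a vector, four elements per iteration. It is not OptiVec source code, which is written in Assembler; the function name add_const_unrolled is hypothetical, and a compiled C loop like this one only hints at what hand-written machine code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not part of OptiVec:
   Y[i] = X[i] + c for i = 0 .. size-1, four elements per iteration. */
void add_const_unrolled(float *Y, const float *X, size_t size, float c)
{
    size_t i = 0;
    size_t unrolled = size - (size % 4);   /* largest multiple of 4 <= size */

    assert(size == 0 || (X != NULL && Y != NULL));

    /* c is loop-invariant, so it can stay in a register for the whole loop */
    for (; i < unrolled; i += 4) {
        Y[i]     = X[i]     + c;
        Y[i + 1] = X[i + 1] + c;
        Y[i + 2] = X[i + 2] + c;
        Y[i + 3] = X[i + 3] + c;
    }

    /* the remaining 0 to 3 elements */
    for (; i < size; i++)
        Y[i] = X[i] + c;
}
```

The loop counter is tested once per four elements instead of once per element, which is the reduction in loop-management time described under "Loop-unrolling".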
1.1.2 Multi-Processor Optimization
Multithread support
Modern multi-core processors allow the operating system to distribute threads among the available processors, scaling the overall performance with the number of available processor cores. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate system settings, and the non-linear data-fitting functions), all other OptiVec functions are reentrant and may run in parallel.
When designing your multi-thread application, you have two options: functional parallelism and data parallelism.
Functional Parallelism
If different threads are performing different tasks – they are functionally different – one speaks of functional parallelism. As an example, consider one thread handling user input / output, while another one performs background calculations. Even on a single-core CPU, this kind of multi-threading may offer advantages (e.g., the user interface does not block during extensive background calculations, but still takes input). On a multi-core computer, the two (or more) threads can actually run simultaneously on the different processor cores. In general, however, the load balance between the processor cores is far from perfect: often, one processor is running at maximum load, while another one is sitting idle, waiting for input. Still, functional multithreading is the best option whenever your numerical tasks involve vectors and matrices of only small-to-moderate size.
Data Parallelism
In order to improve the load balance between the available processor cores, thereby maximizing throughput, it is possible to employ classical parallel processing: the data to be processed is split up into several chunks, each thread getting one of these chunks. This is aptly called data parallelism. The usefulness of this approach is limited by the overhead involved in the data distribution and in the thread-to-thread communication. Moreover, there are always parts of the code which need to be processed sequentially and cannot be parallelized. Therefore, data parallelism pays off only for larger vectors and matrices. Typical break-even sizes range from about 100 elements (for the calculation of transcendental functions of complex input values) to several tens of thousands of elements (as in the simple arithmetic functions). Only when your vectors and matrices are considerably larger than that threshold is the performance actually improved over a functional-parallelism approach. The boost then quickly approaches (but never exactly reaches) the theoretical limit of a factor equal to the number of processor cores available.
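The splitting step of data parallelism can be sketched in a few lines of C. The helper below is hypothetical (it is not an OptiVec function and says nothing about how the OptiVec thread engine is implemented); it only shows how a vector of a given size might be divided into nearly equal chunks, and how a break-even size, passed in as an assumed parameter, keeps small vectors in a single thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OptiVec function.
   Splits `size` elements into `nThreads` nearly equal chunks and reports
   the start index and length of chunk number k (0-based).
   Below `breakEven` elements, chunk 0 receives the whole vector and all
   other chunks are empty, i.e. the work stays in one thread. */
void chunk_range(size_t size, unsigned nThreads, unsigned k,
                 size_t breakEven, size_t *start, size_t *len)
{
    size_t base, rest;

    assert(nThreads > 0 && k < nThreads);
    assert(start != NULL && len != NULL);

    if (size < breakEven || nThreads == 1) {
        *start = 0;
        *len   = (k == 0) ? size : 0;
        return;
    }

    base = size / nThreads;
    rest = size % nThreads;   /* the first `rest` chunks get one extra element */
    *start = (size_t)k * base + (k < rest ? k : rest);
    *len   = base + (k < rest ? 1 : 0);
}
```

Each worker thread would then process the elements start .. start+len-1 of the input and output vectors. The sequential parts of the code and the thread start-up cost are not modelled here; they are the reason the break-even size exists at all.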
Choosing the right OptiVec Library
Whenever you want your application to run on a wide range of supported platforms, and when your vectors and matrices are only of small-to-moderate size, we recommend using the general-purpose libraries, OVVC4.LIB (for MS Visual C++), VCF4W.LIB (for Embarcadero/Borland C++), or the units in OPTIVEC\LIB4 (for Delphi). These libraries combine good performance with back-compatibility to older hardware, down to 486DX, Pentium, old models of Athlon. They are all multi-thread safe and support functional parallelism. If you do not need full floating-point accuracy and that amount of back-compatibility, you can get higher performance by switching to the P6, P7, or P8 libraries (marked by the respective number in the library name).
Finally, for large vectors/matrices on multi-core machines, our multi-core optimized libraries actively distribute the work load over the available processor cores for data parallel execution. These libraries are marked by the letter "M", as in OVVC7M.LIB (for MS Visual C++, using SSE2), VCF4M.LIB (for Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for AMD 64 x2, Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level.
The "M" libraries will still run on single-core machines, but – due to the thread-management overhead – somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium to large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread-engine automatically executes a function in a single thread, if the vector size is too small for parallel execution to earn back the cost involved in the thread-to-thread communication.
If you use the "M" libraries, your programme must call V_initMT before any of the vector functions.
1.1.3 CUDA Device Support
Modern graphics cards are equipped with powerful multiprocessor capacity of up to several hundred processor kernels running in parallel. In recent years, interfaces have been developed that make it possible to exploit this processing capacity not only for graphics rendering, but also for general calculations. One of these approaches is the CUDA concept by NVIDIA. Practically all current NVIDIA graphics cards support CUDA. Additionally, dedicated CUDA hardware is being offered by NVIDIA with the "Tesla" and "Fermi" board family. With the "C" libraries (e.g., OVVC8C.LIB), OptiVec offers a simple way to use a CUDA device for vector / matrix calculations without the hassles of actually programming in CUDA. There are a number of points to be considered:
- Obviously, the "C" libraries can be used only with a CUDA-enabled device installed. This means, only NVIDIA products are supported.
- Out of the compilers supported by OptiVec, currently, NVIDIA provides CUDA support only for MS Visual C++. This means there are presently no CUDA OptiVec libraries for the Embarcadero / Borland compilers available.
- It is necessary to have the latest display driver installed. Even brand-new computers most often do not have the latest drivers. They must be selected and downloaded from NVIDIA's web-site, www.nvidia.com.
- Even a graphics card costing less than $100 can boost the performance of certain functions on a computer with a medium-range CPU by a factor of 10, dedicated hardware by much more. However, the combination of a high-end CPU with a low-end graphics card (as is often found in laptop computers) will, at best, only marginally benefit from the "C" libraries.
- The cost of swapping data back and forth between main-board memory and graphics memory is so high that it can be "earned" back only for quite large vectors and matrices. E.g., for mathematical functions like the sine or exponential functions, CUDA pays off from 100,000 vector elements on. For matrix multiplication, payback occurs in the region of 200x200 elements. All OptiVec functions check if using the CUDA device makes sense and decide accordingly whether to hand the processing over to the graphics processor or to stay on the CPU.
- Using CUDA with OptiVec is as easy as simply linking with the "C" library and with the cudaOptiVec import library. No modifications of your source code are necessary. On the other hand, by eliminating the repeated data transfers for each function, programming directly for CUDA devices with NVIDIA's CUDA SDK can lead to considerably higher performance than is possible with the use of the OptiVec "C" libraries.
- NVIDIA might at any time change the licence terms for their CUDA libraries, so that we might at some point no longer be able to include them in our distributions and/or to support CUDA at all.
1.1.4 Choosing the right OptiVec Library
Whenever you want your application to run on a wide range of supported platforms, and when your vectors and matrices are only of small-to-moderate size, we recommend using the general-purpose libraries, OVVC4.LIB (for MS Visual C++), VCF4W.LIB (Embarcadero/Borland C++ compiler series), or the units in OPTIVEC\LIB4 (for Delphi). These libraries combine good performance with back-compatibility to older hardware, even down to 486DX, Pentium, Athlon. They are all multi-thread safe and support functional parallelism. If you do not need full floating-point accuracy and that amount of back-compatibility, you can get higher performance by switching to the P8 or P9 libraries (marked by the respective number in the library name).
For large vectors/matrices on multi-core machines, multi-core optimized libraries actively distribute the work load over the available processor cores for data parallel execution. These libraries are marked by the letter "M", as in OVVC8M.LIB (for MS Visual C++, using SSE3), VCF4M.LIB (for Embarcadero/Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for AMD 64 x2, Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level. The CUDA libraries are based on the "M" libraries and are marked by the letter "C", as, e.g., in OVVC8C.LIB.
The "M" and "C" libraries will still run on single-core machines, but – due to the thread-management overhead – somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium to large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread-engine automatically executes a function in a single thread, if the vector size is too small for parallel execution to earn back the cost involved in the thread-to-thread communication.
If you use the "M" or "C" libraries, your programme must call V_initMT( nAvailProcCores ) before any of the vector functions.
2. Elements of OptiVec Routines
2.1 Synonyms for Some Data Types
To increase the versatility and completeness of OptiVec, additional data types are defined in <VecLib.h> or the unit VecLib:
a) C/C++ only:
The data type ui (short for "unsigned index") is used for the indexing of vectors and is defined as "unsigned int".
The 64-bit integer data type (__int64 in BC++ Builder and MS Visual C++, Int64 in Delphi) is called quad (for "quadword integer") in OptiVec.
In 32-bit, the type quad is always signed. Functions for unsigned 64-bit integers are available only in the 64-bit versions of OptiVec.
The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As Visual C++ does not support 80-bit reals, we define extended as "double" in the OptiVec versions for that compiler.
b) Delphi only:
The data type Float, which is familiar to C/C++ programmers, is defined as a synonym for Single. We prefer to have the letters defining the real-number data types in alphabetical proximity: "D" for Double, "E" for Extended, and "F" for Float. The letters "G" and "H" are already reserved for Great (128-bit real) and Half (16-bit real).
For historical reasons (dating back to the development of Turbo Pascal), the various integer data types have a somewhat confusing nomenclature in Delphi. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define a number of synonyms, as described in the following table:
type | Delphi name | synonym | derived prefix |
8 bit signed | ShortInt | ByteInt | VBI_ |
8 bit unsigned | Byte | UByte | VUB_ |
16 bit signed | SmallInt | | VSI_ |
16 bit unsigned | Word | USmall | VUS_ |
32 bit signed | LongInt | | VLI_ |
32 bit unsigned | | ULong | VUL_ |
64 bit signed | Int64 | QuadInt | VQI_ |
64 bit unsigned (x64 version only!) | UInt64 | UQuad | VUQ_ |
16/32 bit signed | Integer | | VI_ |
16/32 bit unsigned | Cardinal | UInt | VU_ |
To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.
2.2 Complex Numbers
As described in greater detail for CMATH, OptiVec supports complex numbers both in cartesian and polar format.
If you use only the vectorized complex functions (but not the scalar functions of CMATH), you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;
The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;
If, for example, a complex number z is declared as "fComplex z;", the real and imaginary parts of z are available as z.Re and z.Im, resp. Complex numbers are initialized either by setting the constituent parts separately to the desired value, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;
(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );
Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );
For double-precision complex numbers, use dcplx and dpolr; for extended-precision complex numbers, use ecplx and epolr.
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.
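To make the cartesian layout concrete, here is a small C sketch. It repeats the fComplex definition quoted above (and the pointer type cfVector) locally, so that it compiles without <VecLib.h>. The two functions make_fcplx and cf_split are hypothetical stand-ins written for this illustration; they are not the OptiVec functions fcplx or any VCF_ routine.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the definitions quoted in the text, so that this
   sketch compiles without <VecLib.h>. */
typedef struct { float Re, Im; } fComplex;
typedef fComplex *cfVector;   /* pointer to cartesian complex numbers */

/* Hypothetical stand-in for fcplx: build a complex number from its parts. */
fComplex make_fcplx(float re, float im)
{
    fComplex z;
    z.Re = re;
    z.Im = im;
    return z;
}

/* Hypothetical helper: copy the real and imaginary parts of the first
   `size` elements of X into two separate real vectors. */
void cf_split(cfVector X, size_t size, float *Re, float *Im)
{
    assert(size == 0 || (X != NULL && Re != NULL && Im != NULL));
    for (size_t i = 0; i < size; i++) {
        Re[i] = X[i].Re;   /* each element carries its own Re ... */
        Im[i] = X[i].Im;   /* ... and Im member */
    }
}
```

A complex vector is thus simply an array of such structs, addressed through a pointer of type cfVector, in the same way as a real vector is addressed through a pointer to float.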
2.3 Vector Data Types
We define, as usual, a "vector" as a one-dimensional array of data containing, at least, one element, with all elements being of the same data type. Using a more mathematical definition, a vector is a rank-one tensor. A two-dimensional array (i.e. a rank-two tensor) is denoted as a "matrix", and higher dimensions are always referred to as "tensors".
In contrast to other approaches, VectorLib does not allow zero-size vectors!
The basis of all VectorLib routines is formed by the various vector data types given below and declared in <VecLib.h> or the unit VecLib. In contrast to fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the VectorLib types. Here they are:
C/C++
typedef | float * | fVector |
typedef | double * | dVector |
typedef | extended * | eVector |
typedef | fComplex * | cfVector |
typedef | dComplex * | cdVector |
typedef | eComplex * | ceVector |
typedef | fPolar * | pfVector |
typedef | dPolar * | pdVector |
typedef | ePolar * | peVector |
typedef | int * | iVector |
typedef | byte * | biVector |
typedef | short * | siVector |
typedef | long * | liVector |
typedef | quad * | qiVector |
typedef | unsigned * | uVector |
typedef | unsigned byte * | ubVector |
typedef | unsigned short * | usVector |
typedef | unsigned long * | ulVector |
typedef | uquad * | uqVector |
typedef | ui * | uiVector |
Pascal/Delphi
type | fVector | = ^Float; |
type | dVector | = ^Double; |
type | eVector | = ^Extended; |
type | cfVector | = ^fComplex; |
type | cdVector | = ^dComplex; |
type | ceVector | = ^eComplex; |
type | pfVector | = ^fPolar; |
type | pdVector | = ^dPolar; |
type | peVector | = ^ePolar; |
type | iVector | = ^Integer; |
type | biVector | = ^ByteInt; |
type | siVector | = ^SmallInt; |
type | liVector | = ^LongInt; |
type | qiVector | = ^QuadInt; |
type | uVector | = ^UInt; |
type | ubVector | = ^UByte; |
type | usVector | = ^USmall; |
type | ulVector | = ^ULong; |
type | uqVector | = ^UQuad; |
Internally, a data type like fVector means "pointer to float", but you may think of a variable declared as fVector rather in terms of a "vector of floats".
Note: in connection with Windows programs, often the letter "l" or "L" is used to denote "long int" variables. In order to prevent confusion, however, the data type "long int" is signalled by "li" or "LI", and the data type "unsigned long" is signalled by "ul" or "UL". Conflicts with prefixes for "long double" vectors are avoided by deriving these from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following.
C/C++ specific:
Vector elements can be accessed either with the [] operator, like VA[375] = 1.234; or by the type-specific functions VF_element (returns the value of the desired vector element, but cannot be used to overwrite the element) and VF_Pelement (returns the pointer to a vector element).
Especially for some older Borland C versions (which have a bug in the pointer arithmetic), VF_Pelement has to be used instead of the syntax X+n.
In your programs, you may mix these vector types with the static arrays of classic C style.
For example:
float a[100]; /* classic static array */
fVector b=VF_vector(100); /* VectorLib vector */
VF_equ1( a, 100 ); /* set the first 100 elements of a equal to 1.0 */
VF_equC( b, 100, 3.7 ); /* set the first 100 elements of b equal to 3.7 */
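The reason static arrays and VectorLib vectors can be mixed is that both are passed as a pointer to the first element plus a size. The following self-contained sketch shows this in plain C, including the X+n syntax for a sub-vector. It does not link against OptiVec: fill_const is a hypothetical stand-in for VF_equC, malloc and free take the place of VF_vector and V_free, and fVector is re-declared locally.

```c
#include <assert.h>
#include <stdlib.h>

typedef float *fVector;   /* same meaning as the VectorLib typedef */

/* Hypothetical stand-in for VF_equC: set the first `size` elements to c. */
void fill_const(fVector X, size_t size, float c)
{
    assert(size == 0 || X != NULL);
    for (size_t i = 0; i < size; i++)
        X[i] = c;
}

/* Returns 1 if all elements ended up as expected, 0 otherwise. */
int mixed_vectors_demo(void)
{
    float   a[100];                        /* classic static array */
    fVector b = malloc(100 * sizeof *b);   /* dynamically allocated vector */
    int     ok;

    if (b == NULL)
        return 0;

    fill_const(a, 100, 1.0f);       /* whole static array */
    fill_const(b, 100, 3.7f);       /* whole dynamic vector */
    fill_const(b + 10, 50, 0.0f);   /* sub-vector: elements 10..59 only */

    ok = (a[99] == 1.0f) && (b[9] == 3.7f) && (b[10] == 0.0f)
         && (b[59] == 0.0f) && (b[60] == 3.7f);

    free(b);
    return ok;
}
```

With the older Borland compilers mentioned above, the pointer returned by VF_Pelement would take the place of the expression b + 10.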
Pascal/Delphi specific:
As in C/C++, you may mix these vector types with the static arrays of classic Pascal style. Static arrays have to be passed to OptiVec functions with the "address of" operator. Here, the above example reads:
a: array[0..99] of Single; (* classic static array *)
b: fVector;(* VectorLib vector *)
b := VF_vector(100);
VF_equ1( @a, 100 ); (* set first 100 elements of a = 1.0 *)
VF_equC( b, 100, 3.7 ); (* set first 100 elements of b = 3.7 *)
Delphi also offers dynamically-allocated arrays, which may also be used as arguments for OptiVec functions. The following table compares the pointer-based vectors of VectorLib with the array types of Pascal/Delphi:
property | OptiVec vectors | Pascal/Delphi static/dynamic arrays |
alignment of first element | on 32-byte boundary for optimum cache-line matching | 2 or 4-byte boundary (may cause line-break penalty for double, QuadInt) |
alignment of following elements | packed (i.e., no dummy bytes between elements, even for 8, 10, and 16-bit types) | arrays must be declared as "packed" for Delphi 4+ to be compatible with OptiVec |
index range checking | none | automatic with built-in size information |
dynamic allocation | function VF_vector, VF_vector0 | procedure SetLength (Delphi 4+ only) |
initialization with 0 | optional by calling VF_vector0 | always (Delphi 4+ only) |
de-allocation | function V_free, V_freeAll | procedure Finalize (Delphi 4+ only) |
reading single elements | function VF_element: a := VF_element(X,5); Delphi 4+ only: typecast into array also possible: a := fArray(X)[5]; | index in brackets: a := X[5]; |
setting single elements | function VF_Pelement: VF_Pelement(X,5)^ := a; Delphi 4+ only: typecast into array also possible: fArray(X)[5] := a; | index in brackets: X[5] := a; |
passing to OptiVec function | directly: VF_equ1( X, sz ); | address-of operator: VF_equ1( @X, sz ); |
passing sub-vector to OptiVec function | function VF_Pelement: VF_equC( VF_Pelement(X,10), sz-10, 3.7); | address-of operator: VF_equC( @X[10], sz-10, 3.7 ); |
Summarizing the properties of OptiVec vectors and of Pascal/Delphi arrays, the latter are somewhat more convenient and, due to the index range checking, safer, whereas the pointer-based OptiVec vectors are processed faster (due to the better alignment and to the absence of checking routines).
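The two alignment rows of the table can be checked with a few lines of standalone C++. This is only a sketch under stated assumptions: demo_is_aligned, demo_stride and demo_check are hypothetical helper names, the buffer is a local array requested with alignas(32) rather than memory obtained from VF_vector, and how OptiVec itself achieves its alignment is not described here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not part of OptiVec.

// True if p lies on a multiple of `boundary` bytes.
inline bool demo_is_aligned(const void* p, std::size_t boundary)
{
    return reinterpret_cast<std::uintptr_t>(p) % boundary == 0;
}

// Byte distance between two neighbouring elements of X.
template <typename T>
std::size_t demo_stride(const T* X)
{
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(X + 1) - reinterpret_cast<const char*>(X));
}

// Checks both properties for a buffer of doubles.
inline bool demo_check()
{
    alignas(32) double X[16] = {};   // request a 32-byte boundary for the first element
    return demo_is_aligned(X, 32)                // first element on a 32-byte boundary
        && demo_stride(X) == sizeof(double);     // packed: no dummy bytes in between
}
```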
2.4 Vector Function Prefixes
In the plain-C, Pascal and Delphi versions, every OptiVec function has a prefix denoting the data-type on which it acts. (The overloaded C++ functions of VecObj are described in chapter 3.)
Prefix | Arguments and return value |
VF_ | fVector and float |
VD_ | dVector and double |
VE_ | eVector and extended (long double) |
VCF_ | cfVector and fComplex |
VCD_ | cdVector and dComplex |
VCE_ | ceVector and eComplex |
VPF_ | pfVector and fPolar |
VPD_ | pdVector and dPolar |
VPE_ | peVector and ePolar |
VI_ | iVector and int / Integer |
VBI_ | biVector and byte / ByteInt |
VSI_ | siVector and short int / SmallInt |
VLI_ | liVector and long int / LongInt |
VQI_ | qiVector and quad / QuadInt |
VU_ | uVector and unsigned / UInt |
VUB_ | ubVector and unsigned char / UByte |
VUS_ | usVector and unsigned short / USmall |
VUL_ | ulVector and unsigned long / ULong |
VUQ_ | uqVector and uquad / UQuad (for Win64 only!) |
VUI_ | uiVector and ui |
V_ | (data-type conversions like V_FtoD, data-type independent functions like V_initPlot) |
3. VecObj, the Object-Oriented Interface for VectorLib
VecObj, the object-oriented C++ interface to the OptiVec vector functions, was written by Brian Dale, Case Western Reserve University.
Among the advantages it offers are the following:
- automatic allocation and deallocation of memory
- simplified vector handling
- greatly reduced risk of memory leaks
- increased memory access safety
- intuitive overloaded operators
- simpler function calls
There are a few drawbacks, though, which you should be aware of:
- increased compiler load
- larger overhead (as for any encapsulated C++ code!), leading to
  - increased code size
  - decreased computational efficiency
- vectors can be processed only as a whole, not in parts
VecObj is contained in the include-files <VecObj.h>, <fVecObj.h>, <dVecObj.h> etc., with one include-file for each of the data-types supported in OptiVec.
To get the whole interface (for all data types at once),
#include <OptiVec.h>.
For access to any of the vector graphics functions, always include <OptiVec.h>.
MS Visual C++ and Embarcadero / Borland C++ Builder (but not previous Borland C++ versions): Programmers should put the directive
"using namespace OptiVec;"
either in the body of any function that uses tVecObj, or in the global declaration part of the program. Placing the directive in the function body is safer, as it avoids potential namespace conflicts in other functions.
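The effect of placing a using-directive inside a function body can be sketched without OptiVec. In the example below, DemoLib is a hypothetical namespace standing in for a library namespace, and both twice functions are invented for the illustration.

```cpp
// Hypothetical namespace standing in for a library namespace such as OptiVec.
namespace DemoLib {
    inline int twice(int x) { return 2 * x; }
}

// Unrelated global function that happens to share the name.
inline int twice(long x) { return static_cast<int>(x) + 1; }

inline int demo_with_directive()
{
    using namespace DemoLib;   // in effect only until the end of this function body
    return twice(10);          // both overloads are visible; twice(int) is the exact match
}

inline int demo_without_directive()
{
    return twice(10);          // only the global twice(long) is visible here
}
```

The same call resolves differently in the two functions, which is the kind of conflict a global directive would spread to every function in the file.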
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and size.
For easier use, these classes have the alias names fVecObj, dVecObj, and so on, with the data-type signalled by the first one or two letters of the class name, in the same way as the vector types described above.
All functions defined in VectorLib for a specific vector data-type are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( vector<T> init ); // creates a copy of the vector "init"
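A minimal sketch of a class offering these four constructor forms is given below. DemoVec is a hypothetical class written for this illustration and is not the OptiVec vector<T>; it uses std::size_t in place of ui, and its copy form takes a const reference, as standard C++ requires for a copy constructor.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical class for illustration only; not the OptiVec vector<T>.
template <typename T>
class DemoVec {
    T*          data_;
    std::size_t size_;
public:
    // No memory allocated, size set to 0.
    DemoVec() : data_(nullptr), size_(0) {}

    // Vector of `size` elements allocated; contents are left uninitialized.
    explicit DemoVec(std::size_t size)
        : data_(size ? new T[size] : nullptr), size_(size) {}

    // As before, but every element is initialized with `fill`.
    DemoVec(std::size_t size, T fill) : DemoVec(size)
    {
        std::fill(data_, data_ + size_, fill);
    }

    // Creates an independent copy of `init`.
    DemoVec(const DemoVec& init) : DemoVec(init.size_)
    {
        std::copy(init.data_, init.data_ + init.size_, data_);
    }

    DemoVec& operator=(const DemoVec&) = delete;   // assignment is left out of this sketch

    // Memory is released automatically when the object goes out of scope.
    ~DemoVec() { delete[] data_; }

    std::size_t getSize() const { return size_; }
    const T& operator[](std::size_t i) const { return data_[i]; }
};
```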
For all vector classes, the arithmetic operators
+ - * / += -= *= /=
are defined, with the exception of the polar-complex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
instead of the operator syntax. The operator * refers to element-wise multiplication, not to the scalar product of two vectors.
All other arithmetic and math functions can only be called as member functions of the respective output vector as, for example, Y.exp(X). Although it would certainly be more logical to have these functions defined in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for efficiency considerations: The only way to implement the second variant is to store the result of the exponential function of X first in a temporary vector, which is then copied into Y, thus considerably increasing the work-load and memory demands.
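The extra temporary described above can be made visible with a small counting class. CountingVec and the two demo functions are hypothetical and written only for this sketch; they are not the OptiVec implementation, and the real cost difference depends on the compiler and on the vector size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class for illustration only; not the OptiVec implementation.
class CountingVec {
    std::vector<float> data_;
public:
    // Counts how many vectors were constructed with an explicit size.
    static int& constructed() { static int n = 0; return n; }

    explicit CountingVec(std::size_t n, float fill = 0.0f) : data_(n, fill) { ++constructed(); }

    // Member-function style: the sum is written straight into this vector.
    void addV(const CountingVec& X, const CountingVec& Y)
    {
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] = X.data_[i] + Y.data_[i];
    }

    // Operator style: a temporary holds the sum before it reaches the caller.
    friend CountingVec operator+(const CountingVec& X, const CountingVec& Y)
    {
        CountingVec tmp(X.data_.size());
        tmp.addV(X, Y);
        return tmp;
    }

    float operator[](std::size_t i) const { return data_[i]; }
};

// Number of extra vectors constructed by "Z = X + Y" for an existing Z.
inline int demo_extra_vectors_operator()
{
    CountingVec X(4, 1.0f), Y(4, 2.0f), Z(4);
    const int before = CountingVec::constructed();
    Z = X + Y;
    return CountingVec::constructed() - before;
}

// Number of extra vectors constructed by "Z.addV( X, Y )".
inline int demo_extra_vectors_member()
{
    CountingVec X(4, 1.0f), Y(4, 2.0f), Z(4);
    const int before = CountingVec::constructed();
    Z.addV(X, Y);
    return CountingVec::constructed() - before;
}
```

In this sketch the operator form constructs one additional vector, while the member-function form constructs none.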
While most VecObj functions are member functions of the output vector, there are a number of functions which do not have an output vector. In these cases, the functions are member functions of an input vector.
Example: s = X.mean();.
If you ever need to process a VecObj vector in a "classic" plain-C VectorLib function (for example, to process only some part of it), you may use the member functions
getSize() to retrieve its size,
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.
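How these three accessors let a plain pointer-based routine work on part of a vector object can be sketched as follows. DemoObj, demo_equC and demo_fill_tail are hypothetical names written for this illustration; only the member names getSize, getVector and Pelement are taken from the text above, and the class is not the OptiVec one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain-C-style routine: sets the first n elements of X to C.
inline void demo_equC(float* X, std::size_t n, float C)
{
    for (std::size_t i = 0; i < n; ++i)
        X[i] = C;
}

// Hypothetical vector object exposing its size, its pointer, and element addresses.
class DemoObj {
    std::vector<float> data_;
public:
    DemoObj(std::size_t n, float fill) : data_(n, fill) {}
    std::size_t getSize() const { return data_.size(); }
    float* getVector() { return data_.data(); }
    float* Pelement(std::size_t n) { return data_.data() + n; }
};

// Processes only the elements from index `start` to the end of X.
inline void demo_fill_tail(DemoObj& X, std::size_t start, float C)
{
    demo_equC(X.Pelement(start), X.getSize() - start, C);
}
```

The caller is responsible for keeping the start index and the element count within the size reported by getSize(), since the pointer-based routine performs no range checking.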
Copyright © 1996-2022 OptiCode – Dr. Martin Sander Software Development