Improve Performance of Small Matrix Problems on the GPU Using pagefun - MATLAB & Simulink Example - MathWorks Deutschland (2024)

Open Live Script

This example shows how to use pagefun to improve the performance of independent operations applied to multiple matrices arranged in a multidimensional array.

Multidimensional arrays are an extension of 2-D matrices and use additional subscripts for indexing. A 3-D array, for example uses three subscripts. The first two dimensions represent a matrix and the third represents pages of elements (sometimes referred to as slices). For more information, see Multidimensional Arrays.

Improve Performance of Small Matrix Problems on the GPU Using pagefun- MATLAB & Simulink Example- MathWorks Deutschland (1)

While GPUs can effectively apply small independent operations to large matrices, performance is suboptimal when these operations are applied in serial, for example when the operations are applied in a for-loop. In order to avoid serial processing, the arrayfun function applies a scalar operation to each element of an array in parallel on the GPU. Similarly, the pagefun function applies a function to each page of a multidimensional GPU array.

The pagefun function supports applying most element-wise functions and a number of matrix operations that support GPU array input. MATLAB® also provides a number of dedicated page-wise functions, including pagemtimes, pagemldivide, pagemrdivide, pagetranspose, pagectranspose, pageinv, pagenorm, and pagesvd. Depending on the task, these functions might simplify your code or provide better performance than using pagefun.

In this example, a robot is navigating a known map containing a large number of features that the robot can identify using its sensors. The robot locates itself in the map by measuring the relative position and orientation of those features and comparing them to the map locations. Assuming the robot is not completely lost, it can use any difference between the two to correct its position, for instance by using a Kalman Filter. This example shows an efficient way to compute the feature positions relative to the robot.

Set up the Map

Define the dimensions of a room containing a number of features.

roomDimensions = [50 50 5];

The supporting function randomTransforms is provided at the end of this example and initializes N transforms with random values, providing a structure as output. It represents positions and orientations using 3-by-1 vectors T and 3-by-3 rotation matrices R. The N translations are packed into a 3-by-N matrix and the rotations are packed into a 3-by-3-by-N array.

Use the randomTransforms function to set up a map of 1000 features, and a start location for the robot.

numFeatures = 1000;Map = randomTransforms(numFeatures,roomDimensions);Robot = randomTransforms(1,roomDimensions);

The plotRobot function is provided as a supporting file with this example and plots a top-down view of the room, and a close up view of the robot and nearby features. The robot is represented by a blue box with wheels and the features are represented by red circles with accompanying lines representing their orientation. To use this function, open the example as a livescript.

Call the plotRobot function.

plotRobot(Robot,Map)

Improve Performance of Small Matrix Problems on the GPU Using pagefun- MATLAB & Simulink Example- MathWorks Deutschland (2)

Define the Equations

To correctly identify the features in the map, the robot needs to transform the map to put its sensors at the origin. Then it can find map features by comparing what it sees with what it expects to see.

For a map feature i we can find its position relative to the robot Trel(i) and orientation Rrel(i) by transforming its global map location:

Rrel(i)=RbotRmap(i)Trel(i)=Rbot(Tmap(i)-Tbot)

where Tbot and Rbot are the position and orientation of the robot, and Tmap(i) and Rmap(i) represent the map data. The equivalent MATLAB code looks like this:

Rrel(:,:,i) = Rbot' * Rmap(:,:,i)Trel(:,i) = Rbot' * (Tmap(:,i) - Tbot)

Perform Matrix Transforms on the CPU using a for-loop

The supporting function loopingTransform is provided at the end of this example and loops over all the transforms in turn, transforming each feature to its location relative to the robot. Note the like name-value argument for zeros function which makes the function return an array of zeros of the same data type as a prototype array. For example, if the prototype array is a gpuArray, then zeros returns a gpuArray. This allows you to use the same code on the GPU in the next section.

Time the calculations using the timeit function. The timeit function times the execution of loopingTransform multiple times and returns the median of the measurements. Since timeit requires a function with no arguments, use the @() syntax to create an anonymous function of the right form.

cpuTime = timeit(@()loopingTransform(Robot,Map,numFeatures))
cpuTime = 0.0042

Perform Matrix Transforms on the GPU using a for-loop

To run the same code on the GPU, simply pass the input data to the function as a gpuArray. A gpuArray represents an array stored in GPU memory. Many functions in MATLAB and in other toolboxes support gpuArray objects, allowing you to run your code on GPUs with minimal changes to the code. For more information, see Run MATLAB Functions on a GPU.

Ensure that your desired GPU is available and selected.

gpu = gpuDevice;disp(gpu.Name + " GPU selected.")

Create GPU arrays containing the position and orientation of the robot and the features in the map.

gMap.R = gpuArray(Map.R);gMap.T = gpuArray(Map.T);gRobot.R = gpuArray(Robot.R);gRobot.T = gpuArray(Robot.T);

Time the calculations using the gputimeit function. The gputimeit function is the equivalent of timeit for code that includes GPU computation. It makes sure all GPU operations have finished before recording the time.

gpuTime = gputimeit(@()loopingTransform(gRobot,gMap,numFeatures))
gpuTime = 0.1905

Perform Matrix Transforms on the GPU using pagefun

The GPU version is very slow because, although all calculations were independent, they ran in series. Using pagefun we can run all the computations in parallel.

The supporting function pagefunTransform is provided at the end of this example and applies the same transforms as the loopingTransform function using pagefun instead of a for-loop. The first computation is the calculation of the rotations. This involves a matrix multiply, which translates to the function mtimes (*). Pass this to pagefun along with the two sets of rotations to be multiplied:

Rel.R = pagefun(@mtimes,Robot.R',Map.R);

Robot.R' is a 3-by-3 matrix, and Map.R is a 3-by-3-by-N array. The pagefun function matches each independent matrix from the map to the same robot rotation, and gives us the required 3-by-3-by-N output.

The translation calculation also involves a matrix multiply, but the normal rules of matrix multiplication allow this to come outside the loop without any changes:

Rel.T = Robot.R' * (Map.T - Robot.T);

Time the calculations using the gputimeit function.

gpuPagefunTime = gputimeit(@()pagefunTransform(gRobot,gMap))
gpuPagefunTime = 3.3066e-04

Compare Results

Plot the timing results.

figurelabels = categorical(["CPU Execution","GPU Execution","GPU Execution with \fontname{consolas}pagefun"]);bar(labels,[cpuTime,gpuTime,gpuPagefunTime])ylabel("Execution Time (s)")set(gca,YScale="log")

Improve Performance of Small Matrix Problems on the GPU Using pagefun- MATLAB & Simulink Example- MathWorks Deutschland (3)

Calculate how much faster the execution using pagefun is than CPU and simple GPU execution.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than on the CPU.\n", ... cpuTime/gpuPagefunTime);
Executing the transforms on the GPU using pagefun is 12.65 times faster than on the CPU.
fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than using for-loops on the GPU.\n", ... gpuTime/gpuPagefunTime);
Executing the transforms on the GPU using pagefun is 576.17 times faster than using for-loops on the GPU.

Locate a Lost Robot Using Multiple Possible Robot Positions

If the robot is in an unknown part of the map, it can use a global search algorithm to locate itself. The algorithm tests a number of possible locations by carrying out the above computation and looking for good correspondence between the features seen by the robot's sensors and what it would expect to see at that position.

Now there are multiple possible robot positions as well as multiple features. N features and M robots requires N*M transforms. To distinguish 'robot space' from 'feature space', use the 4th dimension for rotations and the 3rd for translations. That means that the robot rotations will be 3-by-3-by-1-by-M, and the translations will be 3-by-1-by-M.

Initialize the search with ten random robot locations. A good search algorithm would use topological or other clues to seed the search more intelligently.

numRobots = 10;Robot = randomTransforms(numRobots,roomDimensions);Robot.R = reshape(Robot.R,3,3,1,[]); % Spread along the 4th dimensionRobot.T = reshape(Robot.T,3,1,[]); % Spread along the 3rd dimension

A supporting function loopingTransform2 is defined at the end of this example and performs a looping transform using two nested loops, to loop over the robots as well as over the features.

Time the calculations using timeit.

cpuTime = timeit(@()loopingTransform2(Robot,Map,numFeatures,numRobots))
cpuTime = 0.0759

Create GPU arrays containing the robot rotations and translations.

gRobot.R = gpuArray(Robot.R);gRobot.T = gpuArray(Robot.T);

Time the calculations on the GPU using gputimeit.

gpuTime = gputimeit(@() loopingTransform2(gRobot,gMap,numFeatures,numRobots))
gpuTime = 2.1059

As before, the looping version runs much slower on the GPU because it is not doing calculations in parallel.

A supporting function pagefunTransform2 is provided at the end of this example and applies the same transforms as the loopingTransform2 function using two pagefun calls instead of nested for-loops. This function needs to incorporate the transpose operator as well as mtimes into a call to pagefun. The function also applies the squeeze function to the transposed robot orientations to put the spread over robots into the 3rd dimension, to match the translations. Despite this, the resulting code is considerably more compact.

The pagefun function expands dimensions appropriately so where we multiply 3-by-3-by-1-by-M matrix Rt with 3-by-3-by-N-by-1 matrix Map.R, we get a 3-by-3-by-N-by-M matrix out.

Time the calculations on the GPU using gputimeit.

gpuPagefunTime = gputimeit(@()pagefunTransform2(gRobot,gMap))
gpuPagefunTime = 0.0014

Compare Results

Plot the timing results.

labels = categorical(["CPU Execution","GPU Execution","GPU Execution with \fontname{consolas}pagefun"]);bar(labels,[cpuTime,gpuTime,gpuPagefunTime])ylabel("Execution Time (s)")set(gca,YScale="log")

Improve Performance of Small Matrix Problems on the GPU Using pagefun- MATLAB & Simulink Example- MathWorks Deutschland (4)

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than on the CPU.\n", ... cpuTime/gpuPagefunTime);
Executing the transforms on the GPU using pagefun is 55.85 times faster than on the CPU.
fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than using nested for-loops on the GPU.\n", ... gpuTime/gpuPagefunTime);
Executing the transforms on the GPU using pagefun is 1549.19 times faster than using nested for-loops on the GPU.

Conclusion

The pagefun function supports a number of 2-D operations, as well as most of the scalar operations supported by arrayfun. Together, these functions allow you to vectorize a range of computations involving matrix algebra and array manipulation, eliminating the need for loops and making huge performance gains.

Wherever you are doing small calculations on GPU data in a loop, you should consider converting to a vectorized implementation in this way. This can also be an opportunity to make use of the GPU to improve performance where previously it gave no performance gains.

Supporting Functions

Random Transform Function

The randomTransforms function creates matrices defining N random transforms in a room of specified dimensions. Each transform comprises a random translation T and a random rotation R. The function can be used to set up a map of features in a room and the starting position and orientation of a robot.

function Tform = randomTransforms(N,roomDimensions)% Preallocate matrices.Tform.T = zeros(3,N);Tform.R = zeros(3,3,N);for i = 1:N % Create random translation. Tform.T(:,i) = rand(3,1) .* roomDimensions'; % Create random rotation by extracting an orthonormal % basis from a random 3-by-3 matrix. Tform.R(:,:,i) = orth(rand(3,3));endend

Looping Transform Function

The loopingTransform function transforms every feature to its location relative to the robot by looping over the transforms in turn.

function Rel = loopingTransform(Robot,Map,numFeatures)% Preallocate matrices.Rel.R = zeros(size(Map.R),like=Map.R);Rel.T = zeros(size(Map.T),like=Map.T);for i = 1:numFeatures % Find orientation of map feature relative to the robot. Rel.R(:,:,i) = Robot.R' * Map.R(:,:,i); % Find position of map feature relative to the robot. Rel.T(:,i) = Robot.R' * (Map.T(:,i) - Robot.T);endend

pagefun Transform Function

The pagefunTransform function transforms every feature to its location relative to the robot by applying the transforms using the pagefun function.

function Rel = pagefunTransform(Robot,Map)% Find orientation of map feature relative to the robot.Rel.R = pagefun(@mtimes,Robot.R', Map.R);% Apply translation.Rel.T = Robot.R' * (Map.T - Robot.T);end

Nested Looping Transform Function

The loopingTransform2 function performs a looping transform using two nested loops, to loop over the robots as well as over the features. The transforms map every feature to its location relative to every robot.

function Rel = loopingTransform2(Robot,Map,numFeatures,numRobots)% Preallocate matrices.Rel.R = zeros(3,3,numFeatures,numRobots,like=Map.R);Rel.T = zeros(3,numFeatures,numRobots,like=Map.T);for i = 1:numFeatures for j = 1:numRobots % Find orientation of map feature relative to the robot. Rel.R(:,:,i,j) = Robot.R(:,:,1,j)' * Map.R(:,:,i); % Find position of map feature relative to the robot. Rel.T(:,i,j) = ... Robot.R(:,:,1,j)' * (Map.T(:,i) - Robot.T(:,1,j)); endendend

Two-call pagefun Transform Function

The pagefunTransform2 function performs transforms to map every feature to its location relative to every robot using two calls to the pagefun function.

function Rel = pagefunTransform2(Robot,Map)% Find orientation of map feature relative to the robot.Rt = pagefun(@transpose,Robot.R);Rel.R = pagefun(@mtimes,Rt,Map.R);% Find position of map feature relative to the robot.Rel.T = pagefun(@mtimes,squeeze(Rt), ... (Map.T - Robot.T));end

See Also

pagefun | gpuArray | arrayfun | gputimeit

Related Topics

  • Measure and Improve GPU Performance
  • Improve Performance of Element-Wise MATLAB Functions on the GPU Using arrayfun
  • Using GPU arrayfun for Monte-Carlo Simulations
  • Illustrating Three Approaches to GPU Computing: The Mandelbrot Set

MATLAB-Befehl

Sie haben auf einen Link geklickt, der diesem MATLAB-Befehl entspricht:

 

Führen Sie den Befehl durch Eingabe in das MATLAB-Befehlsfenster aus. Webbrowser unterstützen keine MATLAB-Befehle.

Improve Performance of Small Matrix Problems on the GPU Using pagefun- MATLAB & Simulink Example- MathWorks Deutschland (5)

Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list:

Americas

  • América Latina (Español)
  • Canada (English)
  • United States (English)

Europe

  • Belgium (English)
  • Denmark (English)
  • Deutschland (Deutsch)
  • España (Español)
  • Finland (English)
  • France (Français)
  • Ireland (English)
  • Italia (Italiano)
  • Luxembourg (English)
  • Netherlands (English)
  • Norway (English)
  • Österreich (Deutsch)
  • Portugal (English)
  • Sweden (English)
  • Switzerland
    • Deutsch
    • English
    • Français
  • United Kingdom (English)

Asia Pacific

Contact your local office

Improve Performance of Small Matrix Problems on the GPU Using pagefun
- MATLAB & Simulink Example
- MathWorks Deutschland (2024)

FAQs

How do you speed up GPU in MATLAB? ›

To speed up your code, you can try using your computer's GPU. If all the functions that you want to use are supported on the GPU, you can simply use the gpuArray function to transfer input data to the GPU, and call the gather function to retrieve the output data from the GPU.

Does MATLAB benefit from GPU? ›

MATLAB enables you to use NVIDIA® GPUs to accelerate AI, deep learning, and other computationally intensive analytics without having to be a CUDA® programmer. Using MATLAB and Parallel Computing Toolbox, you can: Use NVIDIA GPUs directly from MATLAB with over 1000 built-in functions.

How to let MATLAB use GPU? ›

Use MATLAB Functions with the GPU

To transfer it to the GPU and create a gpuArray object, use the gpuArray function. To operate with gpuArray objects, use any gpuArray -enabled MATLAB function. MATLAB automatically runs calculations on the GPU.

Can you use CUDA in MATLAB? ›

If running MATLAB® functions on the GPU does not sufficiently speed up your code, or if you need to use advanced GPU CUDA features, you can write your own CUDA code and run it in MATLAB by generating an executable MEX file using mexcuda or an executable kernel using parallel.

How do I maximize my GPU performance? ›

Ways to boost the GPU performance
  1. Keep the dust off. ...
  2. Ensure your laptop is plugged in. ...
  3. Update your drivers. ...
  4. Using AI-based algorithms. ...
  5. Enabling synchronized monitor refresh rate. ...
  6. GPU Overclocking.
Nov 30, 2023

Does Simulink use GPU? ›

GPU-accelerated simulation requires a CUDA®-enabled NVIDIA® GPU. When you enable GPU-accelerated simulation, the software partitions and generates CUDA MATLAB® executable (MEX) code from the MATLAB Function blocks and dynamically links the generated code to Simulink during simulation.

What is the best GPU for MATLAB? ›

One option to consider would be the Nvidia Quadro RTX 6000. It has 4608 CUDA cores and 24GB of RAM, which is more than the P4000, and has support for Matlab's GPU computing capabilities. Another option to consider would be the Nvidia GeForce RTX 3080, which has 8704 CUDA cores and 10GB of RAM.

Does MATLAB need dedicated GPU? ›

Graphics. No specific graphics card is required, but a hardware accelerated graphics card supporting OpenGL 3.3 with 1GB GPU memory is recommended. GPU acceleration using Parallel Computing Toolbox requires a GPU with a specific range of compute capability.

Can MATLAB use multiple GPUs? ›

MATLAB® supports training a single deep neural network using multiple GPUs in parallel. By using parallel workers with GPUs, you can train with multiple GPUs on your local machine, on a cluster, or on the cloud. Using multiple GPUs can speed up training significantly.

How do you create a GPU array in MATLAB? ›

You can create GPU arrays by transferring existing arrays from host memory to the GPU. Transfer an array from host memory to the GPU using the gpuArray function. The gpuArray function supports only numeric input arguments, for example, single , double , int8 , or logical input arguments.

How to check if array is on GPU MATLAB? ›

Determine if Array is gpuArray

Y = gpuArray(X); Use the function isgpuArray to verify that Y is a gpuArray .

Does MATLAB need CPU or GPU? ›

Graphics Processing Unit (GPU) for display

MATLAB Graphics are rendered using OpenGL technology, so a graphics card with superior OpenGL support can outperform a lesser card. Up-to-date drivers are recommended for the best visual appearance and robustness.

How to run Simulink simulation from MATLAB? ›

A prefered way to run a simulation from MATLAB is to use the SimulationInput object.
  1. simIn = Simulink.SimulationInput("Model_Name"); %create object.
  2. simIn = simIn.setVariable("my_Parameter",2); %example of setting a parameter override.
  3. out = sim(simIn); %run simulation, all results returned in "out"
Mar 16, 2017

Does CUDA help performance? ›

Due to their parallel architecture, CUDA cores can achieve high performance in tasks that can be parallelized, such as image processing, scientific simulations, and machine learning. However, they may not be as efficient in tasks that require complex branching or decision-making, which are better suited for CPU cores.

How to increase processing speed in MATLAB? ›

You can increase the execution speed of the generated code by implementing compiler and processor specific optimizations, specifying buffer reuse, and removing code you might not need.

Is MATLAB GPU accelerated? ›

Settings. Enables simulation acceleration by using GPU Coder. When this option is on, the software generates CUDA MATLAB® executable (MEX) code from the block and dynamically links the generated code to Simulink® during simulation.

How can I increase my GPU clock speed? ›

How to safely overclock your GPU? To safely overclock your GPU, increase the clock rate at increments of 10 MHz, and test your system for stability and performance after each adjustment. If your system struggles, lower your overclock by 10 or 20 MHz, then test again.

How to increase speed of MATLAB code? ›

Two effective programming techniques to accelerate your MATLAB code are preallocation and vectorization. With preallocation, you initialize an array using the final size required for that array. Preallocation helps you avoid dynamically resizing arrays, particularly when code contains for and while loops.

Top Articles
Latest Posts
Article information

Author: Carmelo Roob

Last Updated:

Views: 6070

Rating: 4.4 / 5 (45 voted)

Reviews: 92% of readers found this page helpful

Author information

Name: Carmelo Roob

Birthday: 1995-01-09

Address: Apt. 915 481 Sipes Cliff, New Gonzalobury, CO 80176

Phone: +6773780339780

Job: Sales Executive

Hobby: Gaming, Jogging, Rugby, Video gaming, Handball, Ice skating, Web surfing

Introduction: My name is Carmelo Roob, I am a modern, handsome, delightful, comfortable, attractive, vast, good person who loves writing and wants to share my knowledge and understanding with you.