SPO600 Lab 6 Vectorization Lab

Hi all, I was assigned a lab on vectorization, and here are the specifications.

1. Write a short program that creates two 1000-element integer arrays and fills them with random numbers, then sums those two arrays to a third array, and finally sums the third array to a long int and prints the result.

Here is my lab6q1.cpp code:

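#include <iostream>
#include <stdio.h>
#include <stdlib.h>
using namespace std;

int main(){
    int first[1000];
    int second[1000];
    int sum[1000];
    int i;

    // Fill both arrays with random values in the range 0..99
    cout<<"Please enter the value in 1st array\n";
    for (i = 0; i < 1000; i++)
        first[i] = rand() %100;
    cout<<"Please enter the value in 2nd array\n";
    for (i = 0; i < 1000; i++)
        second[i] = rand() %100;

    // Print both input arrays
    cout<<"\n1st array values:\n";
    for (i = 0; i < 1000; i++)
        cout<<first[i]<<" ";
    cout<<"\n2nd array values:\n";
    for (i = 0; i < 1000; i++)
        cout<<second[i]<<" ";

    // Add the arrays element by element (the loop the compiler vectorizes)
    for (i = 0; i < 1000; i++)
        sum[i] = first[i] + second[i];
    cout<<"\nSum of two arrays:\n";
    for (i = 0; i < 1000; i++)
        cout << sum[i]<<" ";
    return 0;
}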
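The specification also asks for the third array to be summed into a long int and for that single result to be printed, a step the listing above does not include. A minimal sketch of how that could look, assuming helper names (`fill_random`, `add_arrays`, `total`) that are made up for illustration and are not part of my lab code:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: fill an array with random values in the range 0..99.
void fill_random(int* a, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        a[i] = std::rand() % 100;
}

// Hypothetical helper: element-wise sum of two arrays into a third.
// A simple loop like this is what the compiler can auto-vectorize.
void add_arrays(const int* a, const int* b, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

// Hypothetical helper: reduce an array to a single long.
long total(const int* a, std::size_t n) {
    long t = 0;
    for (std::size_t i = 0; i < n; i++)
        t += a[i];
    return t;
}
```

In main, calling fill_random on the first two arrays, add_arrays into the third, and then printing total(sum, 1000) with cout would give the one number the specification asks for.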

2. Compile this program on aarchie in such a way that the code is auto-vectorized. Annotate the emitted code (i.e., obtain a disassembly via objdump -d and add comments to the instructions in <main> explaining what the code does).

I compiled the program on Betty (AArch64) with g++ -O3 lab6q1.cpp; at -O3, GCC turns on its auto-vectorizer (-ftree-vectorize). The overall flow of the assembly code closely follows the C++ source. I used the “cout” calls as markers to separate the different parts of the disassembly so it is easier to read.

00000000004008c8 <main>:
 4008c8: d1400bff sub sp, sp, #0x2, lsl #12 // set up: reserve 0x2000 bytes of stack
 4008cc: d13b83ff sub sp, sp, #0xee0 // reserve 0xee0 more (12000 bytes in total for the three arrays)
 4008d0: a9bb7bfd stp x29, x30, [sp,#-80]! // save frame pointer and link register
 4008d4: 910003fd mov x29, sp // new frame pointer
 4008d8: a90363f7 stp x23, x24, [sp,#48] // save callee-saved registers
 4008dc: 90000001 adrp x1, 400000 <_init-0x7b8> // load x1 with the page address of the string constants
 4008e0: b0000098 adrp x24, 411000 <_DYNAMIC+0x80> // load x24 with the page address of cout
 4008e4: 91082300 add x0, x24, #0x208 // x0 = address of cout
 4008e8: 9132c021 add x1, x1, #0xcb0 // x1 = address of the first message
 4008ec: a90153f3 stp x19, x20, [sp,#16]
 4008f0: a9025bf5 stp x21, x22, [sp,#32]
 4008f4: f90023f9 str x25, [sp,#64] // store

 //================First array input =========
 4008f8: 97ffffd6 bl 400850 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> // print msg
 4008fc: 910143a0 add x0, x29, #0x50 // x0 = start of first[]
 400900: 913fc3b5 add x21, x29, #0xff0 // x21 = end of first[] (0xfa0 = 4000 bytes later)
 400904: aa0003f3 mov x19, x0 // x19 = write pointer
 400908: 52800c94 mov w20, #0x64 // #100
 40090c: 97ffffd9 bl 400870 <rand@plt> // random function call
 400910: 1ad40c01 sdiv w1, w0, w20 // w1 = rand() / 100
 400914: 1b148020 msub w0, w1, w20, w0 // w0 = rand() - w1 * 100, i.e. rand() % 100
 400918: b8004660 str w0, [x19],#4 // store the value, then advance the pointer by 4 bytes
 40091c: eb15027f cmp x19, x21
 400920: 54ffff61 b.ne 40090c <main+0x44> // branch to label 40090c if not equal
 400924: 90000001 adrp x1, 400000 <_init-0x7b8>
 400928: 91336021 add x1, x1, #0xcd8
 40092c: 91082300 add x0, x24, #0x208

 //================Second array input =========
 400930: 97ffffc8 bl 400850 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> // print msg
 400934: 913fc3a1 add x1, x29, #0xff0 // x1 = start of second[]
 400938: 913e8034 add x20, x1, #0xfa0 // x20 = end of second[]
 40093c: aa0103f3 mov x19, x1 // x19 = write pointer
 400940: 52800c96 mov w22, #0x64 // #100
 400944: 97ffffcb bl 400870 <rand@plt> // random function call
 400948: 1ad60c01 sdiv w1, w0, w22 // w1 = rand() / 100
 40094c: 1b168020 msub w0, w1, w22, w0 // w0 = rand() % 100
 400950: b8004660 str w0, [x19],#4 // store the value, then advance the pointer by 4 bytes
 400954: eb14027f cmp x19, x20
 400958: 54ffff61 b.ne 400944 <main+0x7c> // branch to label 400944 if not equal
 40095c: 91082316 add x22, x24, #0x208 // x22 = address of cout
 400960: 90000001 adrp x1, 400000 <_init-0x7b8>
 400964: aa1603e0 mov x0, x22
 400968: 91340021 add x1, x1, #0xd00
 40096c: 90000019 adrp x25, 400000 <_init-0x7b8>

 //================First array output =========
 400970: 97ffffb8 bl 400850 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> // print msg
 400974: 910143b3 add x19, x29, #0x50 // x19 = start of first[]
 400978: 91346337 add x23, x25, #0xd18 // x23 = address of the " " separator string
 40097c: b8404661 ldr w1, [x19],#4 // load the next element, then advance the pointer
 400980: aa1603e0 mov x0, x22 // x0 = cout
 400984: 97ffff9b bl 4007f0 <_ZNSolsEi@plt> // print first array elements
 400988: aa1703e1 mov x1, x23
 40098c: d2800022 mov x2, #0x1 // #1
 400990: 97ffffbc bl 400880 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt> // print the " " separator (1 character)
 400994: eb15027f cmp x19, x21
 400998: 54ffff21 b.ne 40097c <main+0xb4> // branch to label 40097c if not equal
 40099c: 90000001 adrp x1, 400000 <_init-0x7b8>
 4009a0: aa1603e0 mov x0, x22
 4009a4: 91348021 add x1, x1, #0xd20

 //================Second array output =========
 4009a8: 97ffffaa bl 400850 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> // print msg
 4009ac: 913fc3b3 add x19, x29, #0xff0 // x19 = start of second[]
 4009b0: b8404661 ldr w1, [x19],#4 // load the next element, then advance the pointer
 4009b4: aa1603e0 mov x0, x22 // x0 = cout
 4009b8: 97ffff8e bl 4007f0 <_ZNSolsEi@plt> // print second array elements
 4009bc: aa1703e1 mov x1, x23
 4009c0: d2800022 mov x2, #0x1 // #1
 4009c4: 97ffffaf bl 400880 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt> // print the " " separator (1 character)
 4009c8: eb14027f cmp x19, x20
 4009cc: 54ffff21 b.ne 4009b0 <main+0xe8> // branch to label 4009b0 if not equal

 //================Sum of the two arrays (vectorized) =========
 4009d0: d2800000 mov x0, #0x0 // #0, byte offset into the arrays
 4009d4: 910143a3 add x3, x29, #0x50 // x3 = start of first[]
 4009d8: 8b000062 add x2, x3, x0 // x2 = first[] + offset
 4009dc: 913fc3a3 add x3, x29, #0xff0 // x3 = start of second[]
 4009e0: 8b000061 add x1, x3, x0 // x1 = second[] + offset
 4009e4: 4c407841 ld1 {v1.4s}, [x2] // load 4 ints from first[] into v1
 4009e8: d283f202 mov x2, #0x1f90 // #8080
 4009ec: 4c407820 ld1 {v0.4s}, [x1] // load 4 ints from second[] into v0
 4009f0: 8b1d0042 add x2, x2, x29 // x2 = start of sum[]
 4009f4: 8b000041 add x1, x2, x0 // x1 = sum[] + offset
 4009f8: 4ea08420 add v0.4s, v1.4s, v0.4s // add the 4 pairs of ints in one instruction
 4009fc: 91004000 add x0, x0, #0x10 // offset += 16 bytes (4 ints)
 400a00: 4c007820 st1 {v0.4s}, [x1] // store the 4 sums into sum[]
 400a04: f13e801f cmp x0, #0xfa0 // done after 4000 bytes (1000 ints)?
 400a08: 54fffe61 b.ne 4009d4 <main+0x10c> // branch to label 4009d4 if not equal
 400a0c: 91082315 add x21, x24, #0x208 // x21 = address of cout
 400a10: 90000001 adrp x1, 400000 <_init-0x7b8>
 400a14: aa1503e0 mov x0, x21
 400a18: 9134e021 add x1, x1, #0xd38
 400a1c: d283f213 mov x19, #0x1f90 // #8080
 400a20: d285e616 mov x22, #0x2f30 // #12080

 //================Sum array output =========
 400a24: 97ffff8b bl 400850 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> // print msg
 400a28: 8b1d0273 add x19, x19, x29 // x19 = start of sum[]
 400a2c: 8b1d02d6 add x22, x22, x29 // x22 = end of sum[]
 400a30: 91346334 add x20, x25, #0xd18 // x20 = address of the " " separator string
 400a34: b8404661 ldr w1, [x19],#4 // load the next element, then advance the pointer
 400a38: aa1503e0 mov x0, x21 // x0 = cout
 400a3c: 97ffff6d bl 4007f0 <_ZNSolsEi@plt> // print sum elements
 400a40: aa1403e1 mov x1, x20
 400a44: d2800022 mov x2, #0x1 // #1
 400a48: 97ffff8e bl 400880 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt> // print the " " separator (1 character)
 400a4c: eb16027f cmp x19, x22
 400a50: 54ffff21 b.ne 400a34 <main+0x16c> // branch to label 400a34 if not equal
 400a54: a94153f3 ldp x19, x20, [sp,#16] // restore callee-saved registers
 400a58: a9425bf5 ldp x21, x22, [sp,#32]
 400a5c: a94363f7 ldp x23, x24, [sp,#48]
 400a60: f94023f9 ldr x25, [sp,#64]
 400a64: a8c57bfd ldp x29, x30, [sp],#80 // restore frame pointer and link register
 400a68: 52800000 mov w0, #0x0 // #0, return value of main
 400a6c: 913b83ff add sp, sp, #0xee0 // release the stack space
 400a70: 91400bff add sp, sp, #0x2, lsl #12
 400a74: d65f03c0 ret
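
The vectorized part of the disassembly (the ld1 / add v0.4s / st1 instructions) is easier to follow next to a scalar model. The sketch below is only an illustration in plain C++: the names `Vec4`, `load4`, `add4`, `store4` and `vector_sum` are made up for this post and are not compiler or NEON APIs. Each iteration handles four 32-bit ints at once, so 1000 elements take 250 iterations instead of 1000.

```cpp
#include <cstddef>

// Hypothetical stand-in for one 128-bit vector register with four 32-bit lanes.
struct Vec4 { int lane[4]; };

// Models ld1 {v.4s}, [x]: read four consecutive ints from memory.
Vec4 load4(const int* p) {
    Vec4 v;
    for (int k = 0; k < 4; k++) v.lane[k] = p[k];
    return v;
}

// Models add v0.4s, v1.4s, v0.4s: add the four lanes pairwise.
Vec4 add4(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int k = 0; k < 4; k++) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

// Models st1 {v.4s}, [x]: write four consecutive ints to memory.
void store4(int* p, const Vec4& v) {
    for (int k = 0; k < 4; k++) p[k] = v.lane[k];
}

// Adds two arrays four elements at a time and returns the iteration count.
// Assumes n is a multiple of 4, which holds for the 1000-element arrays here.
std::size_t vector_sum(const int* first, const int* second, int* sum, std::size_t n) {
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < n; i += 4) {  // the assembly advances a byte offset by 16
        store4(sum + i, add4(load4(first + i), load4(second + i)));
        iterations++;
    }
    return iterations;
}
```

On real hardware each of the three helpers is a single instruction, which is where the speedup over the one-element-at-a-time loop comes from.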

3. Review the vector instructions for AArch64. Find a way to scale an array of sound samples (see Lab 5) by a factor between 0.000-1.000 using SIMD. (Note: you may need to convert some data types). You DO NOT need to code this solution (but feel free if you want to!).

SIMD (Single Instruction Multiple Data) extensions simplify development of application software by offering a single tool-chain and processing device, when compared to architectures with separate programmable DSPs or accelerators. The single tool-chain environment speeds time-to-market as software plays an increasingly important role in product development. The SIMD extensions are completely transparent to the operating system (OS), allowing existing OS ports to be used. New applications running on the OS can be written to explicitly use the SIMD extensions, providing an additional power/performance advantage. (Source: https://www.arm.com/products/processors/technologies/dsp-simd.php)
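
One possible way to approach the scaling question, as a sketch under assumptions rather than a tested solution: I am assuming the sound samples are signed 16-bit integers, which the lab text quoted here does not state. The idea is to convert the 0.000-1.000 factor once into a fixed-point integer (Q15, where 32768 stands for 1.0), then multiply each sample by it and shift right by 15, so the inner loop uses only integer arithmetic. The function names `to_q15` and `scale_samples` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert a factor in [0.0, 1.0] to Q15 fixed point.
// 1.0 would be 32768, which does not fit in int16_t, so it is clamped to 32767.
int16_t to_q15(float factor) {
    if (factor <= 0.0f) return 0;
    if (factor >= 1.0f) return 32767;
    return static_cast<int16_t>(factor * 32768.0f);
}

// Hypothetical helper: scale n samples by the factor using integer math only.
// The loop body has no branches or calls, so it is a candidate for
// auto-vectorization at -O3 (not guaranteed for every compiler).
void scale_samples(const int16_t* in, int16_t* out, std::size_t n, float factor) {
    const int32_t f = to_q15(factor);
    for (std::size_t i = 0; i < n; i++) {
        // The 32-bit product cannot overflow: |in[i]| <= 32768 and f <= 32767.
        // The right shift of a negative value is an arithmetic shift on GCC and Clang.
        out[i] = static_cast<int16_t>((in[i] * f) >> 15);
    }
}
```

The data type conversion mentioned in the question is the float-to-Q15 step: it happens once outside the loop, so the per-sample work stays in 16-bit lanes and eight samples fit in one 128-bit register. AArch64 also has the SQDMULH instruction, which doubles a 16-bit product and keeps the high half, giving much the same result as the multiply-and-shift above; whether a compiler picks it for this loop is something to check in the disassembly.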



Author: dannydat2005

Hello, Welcome to my blog site on programming and related development topics. At this point in time I’m relatively new to the professional developer world, but am getting my foothold into the arena. I’ve benefited greatly already by the contributions and documentation that others have provided on the web for jobs that have been asked of me, and this shall represent the beginnings of my contributions in return. My interests in programming that you may find topics on within this site include open source development.
