SPO Project Week 9

Hi all, I was assigned to work on an Open source project. Here is the outline:

Examine the glibc code to identify a function that is a candidate for optimization.

Project Steps

  • Identify a possible approach to the problem
  • Test your approach.
    • You may do this by testing an alternative library containing your implementation of the function.
  • Modify glibc. Use git to track your changes.
  • Submit your change to the glibc community.
    • Make sure that your change does not cause regressions on other platforms, especially x86_64

Getting Started:

I am using strspn.c from the glibc as part of my project.


According to cplusplus.com

size_t strspn ( const char * str1, const char * str2 );
Get span of character set in string
Returns the length of the initial portion of str1 which consists only of characters that are part of str2.

The search does not include the terminating null-characters of either strings, but ends there.

I have to test the code on betty (aarch64 ) to see how it works and what is the algorithm that it is using. here is my first demo code.

#include <stdio.h>
#include "timer.h"
#include <string.h>
int main(void){
 Timer t;
 int i, y=0, text,set;
 double rtime;
 char strtext[] = "1aaaaabeiou";
 char cset[] = "123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz123456789abcdefghojklmnopqrstuvwxyz";
 for (; y <= 100000; y++){
 i = strspn(strtext, cset);
 text = sizeof(strtext);
 set = sizeof(cset);
 rtime = t.currtime();
 printf("strtext length: %d\n", text);
 printf("cset length: %d\n", set);
 printf("The initial number has %d digits.\n", i);
 printf("It took %lf second\n", rtime);
 return 0;

The result of the demo is very simple. it works very fast since the data is not large enough to slow down the function call. So I try a different approach.

Searching for Vectorization:

I start to read more about how to apply the Vectorization concept in C++ code. It turns out that the compiler will do most of the optimization including Vectorize and basic compiler optimization functions. Because when we using a compile option -O3 which in modern gcc turns on auto-vectorization for us. But there is still some step I can do to optimize the code. So I start to research and edit the strspn function base on the result.

  1. adding  __builtin_assume_aligned
    • instead of regular declaration const char *a = str
    • we can allow the compiler to assume that the returned pointer is at least align bytes aligned
    • here is the result const char *a = __builtin_assume_aligned(str, 16);
  2. Inlining
    • is a manual or compiler optimization that replaces a function call site with the body of the called function.
    • this optimization is used in the memset part where we use multiple small memsets to enable inlining on most targets.
  3. Switch case
    • instead of using multiple if statement to check for the result of accepted string. we can replace it with switch case in order to increase the speed of check the conditions.
  4. size_t
    • we can replace the integer type with size_t because the variables will not become negative number.

Here is the edited code

#include <string.h>
#include <stdint.h>
#include 	<libc-internal.h>

#undef strspn
#ifndef STRSPN
# define STRSPN strspn

/* Return the length of the maximum initial segment
of S which contains only characters in ACCEPT. */
STRSPN (const char *str, const char *accept)
if (accept[0] == '\0')
return 0;
if (__glibc_unlikely (accept[1] == '\0'))
//---const char *a = str;
const char *a = __builtin_assume_aligned(str, 16);// (+) allows the compiler to assume that\
the returned pointer is at least align bytes aligned
for (; *str == *accept; str++);
return str - a;

/* Use multiple small memsets to enable inlining on most targets. */
//Sets the first num bytes of the block of memory pointed by ptr to the specified value (interpreted as an unsigned char).
unsigned char table[256];
unsigned char *p = memset (table, 0, 64);
// (+) Inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function.
memset (p + 64, 0, 64);
memset (p + 128, 0, 64);
memset (p + 192, 0, 64);

unsigned char *s = (unsigned char*) accept;
/* Different from strcspn it does not add the NULL on the table
so can avoid check if str[i] is NULL, since table['\0'] will
be 0 and thus stopping the loop check. */
p[*s++] = 1;
while (*s);

s = (unsigned char*) str; // chars which are only 1 byte (8 bits long)\
A char has a range -127 to 127 while an unsigned char has a range of 0 to 255.
if (!p[s[0]]) return 0;
if (!p[s[1]]) return 1;
if (!p[s[2]]) return 2;
if (!p[s[3]]) return 3;
//(+) replace if case with with switch case
//* * * * * * * * * * Added
case !p[s[0]]:
return 0;
case !p[s[1]]:
return 1;
case !p[s[2]]:
return 2;
case !p[s[3]]:
return 3;
//* * * * * * * * * *
s = (unsigned char *) PTR_ALIGN_DOWN (s, 4);

//---unsigned int c0, c1, c2, c3;
unsigned size_t c0, c1, c2, c3; // (+) Because the variables will not be negative number
do {
s += 4;
c0 = p[s[0]];
c1 = p[s[1]];
c2 = p[s[2]];
c3 = p[s[3]];
} while ((c0 & c1 & c2 & c3) != 0);

size_t count = s - (unsigned char *) str;
return (c0 & c1) == 0 ? count + c0 : count + c2 + 2;
// (+) break down the conditional statement
//* * * * * * * * * * Added
if (c0 & c1 == 0){
return count + c0;
return count + c2 + 2 ;
//* * * * * * * * * *
libc_hidden_builtin_def (strspn)

The next step is to test this code base on the large amount of data which will provide the different results when run in other to compare the result and improve the function.










Author: dannydat2005

Hello, Welcome to my blog site on programming and related development topics. At this point in time I’m relatively new to the professional developer world, but am getting my foothold into the arena. I’ve benefited greatly already by the contributions and documentation that others have provided on the web for jobs that have been asked of me, and this shall represent the beginnings of my contributions in return. My interests in programming that you may find topics on within this site include open source development.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s