Here is a comparison of how string handling is programmed in various languages (LoadRunner C, Java, .NET, VB, PL/SQL, MS T-SQL).
The C language was designed (during the early 1970's) to use a "null" character (binary zero represented by escape character "\0") to mark the end of each string.
This design can lead to "off-by-one" errors when an extra byte is not allocated to hold that unseen terminator. A 4-character string requires an allocation of 5 bytes. If a 4-character string is copied to a variable declared with only 4 bytes, the invisible null character spills into the adjacent variable in memory. That adjacent variable will then appear to be blank, because a binary zero within a string truncates that string.
Such "null termination" errors are so common that discussion boards are filled with the acronym DFTTZ for "Don't Forget The Terminating Zero".
C's unbounded overlay of memory allows programs to "smash the stack" of memory variables.
These errors are notoriously difficult to find because most C compilers do not catch them and because the adjacent variable is not always the next variable defined in the code. Such errors can lie dormant in deployed code until a particular set of inputs causes a failure. This is a common exploit used by hackers.
This is why Microsoft Visual Studio 2005 deprecated common C string functions such as strcpy(), strcat(), gets(), streadd(), strecpy(), and strtrns().
Those who must work with traditional C functions statically allocate a large string size (such as 4000) to make enough room. This approach bloats the program's memory footprint.
The workaround is to test the length of the input using strlen() and either put out an error message or dynamically allocate the string size needed.
This, unfortunately, also has the potential of creating memory leaks over time if memory is not deallocated.
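The length test described above can be sketched as follows. This is only an illustration: the helper name copy_checked and its return codes are assumptions made for this example, not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, terminator
   included. Returns 0 on success, -1 if src is too long (dst is left
   untouched so the caller can report an error). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);          /* length without the '\0' */
    if (len + 1 > dst_size) {
        return -1;                     /* would overflow the destination */
    }
    memcpy(dst, src, len + 1);         /* copies the '\0' as well */
    return 0;
}
```

With a 5-byte buffer, copy_checked(buf, sizeof buf, "abcd") succeeds, while the same call against a 4-byte buffer is rejected instead of spilling the terminator into neighboring memory.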
BTW, all this is actually an improvement on how strings are handled in the Pascal language, which uses the first byte to store the length of the string. Since a byte holds 8 bits, that length byte can count no higher than 255, so such Pascal strings are limited to 255 characters.
LPCSTR (Long Pointer to Const String) is a pointer to a null (zero) terminated sequence of ANSI bytes.
The System.String class in the Microsoft C# language is defined as a sealed class, which prevents it from being inherited, for security reasons.
Peter van der Linden, in his Expert C Programming: Deep Secrets (SunSoft/Prentice Hall, 1994, ISBN 0131774298) notes that strlen() doesn't include the null character because a correction in an early version of the ANSI C standard didn't get carried over.
Robert Seacord, in "Secure Coding in C and C++: Strings" (Addison-Wesley Professional, December 1, 2005, ISBN 0321335724), provides example code.
Allocating Memory for Strings
In C, to dynamically allocate the string size needed, use the C malloc() function:
The expression sizeof(char) yields the number of bytes per character, which the C standard defines as 1. Remember to allocate one extra character for the null terminator.
See also Michael C. Daconta, C Pointers and Dynamic Memory Management (Wellesley, MA: QED Publishing, 1993), ISBN 0894354736.
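A minimal sketch of such an allocation, assuming the goal is a heap copy of an existing string. The function name duplicate_string is hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src, or NULL if
   the allocation fails. The caller owns the result and must free() it. */
char *duplicate_string(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *copy = malloc((len + 1) * sizeof(char));   /* +1 for the '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len + 1);                      /* includes the '\0' */
    return copy;
}
```

Every successful call needs a matching free() once the copy is no longer used; otherwise the program leaks memory over time.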
String Length Calculation
In C, to return the length of a string (not counting the null terminator):
These two statements do the same thing. In C, a string is stored as a sequence of characters, and the name of a character array evaluates to the address of its first character. That address identifies the whole string; an asterisk in front of a pointer (as in *s) dereferences it, yielding the value of the character stored at that address.
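To illustrate how a length is derived from the address of the first character, here is a hedged sketch that walks a pointer to the terminator. It is meant to behave like the standard strlen(); the name count_chars is an assumption for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts characters up to, but not including, the
   terminating '\0'. Written out to show the pointer walk. */
size_t count_chars(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0') {   /* *p is the character p currently points to */
        p++;
    }
    return (size_t)(p - s);
}
```

For an array declared as char a[] = "abcd", sizeof a is 5 (the terminator is stored) while count_chars(a) and strlen(a) both report 4.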
In C, to return the length of the leading characters in a string that are contained in a specified string:
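A minimal sketch using the standard strspn() function from <string.h>. The wrapper name leading_digits and the choice of a digit set are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the run of decimal digits at the very
   start of s. strspn() stops at the first character not in the set. */
size_t leading_digits(const char *s)
{
    assert(s != NULL);
    return strspn(s, "0123456789");
}
```

Note that strspn() measures only the leading run: for "2005-12-01" the result is 4, because the first hyphen ends the span even though more digits follow.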
In Oracle PL/SQL, to return a number indicating the number of characters in column x:
Websites today have visitors from all over the world, so strings in applications now need to accommodate different languages (a process called localization and internationalization) by using functions that recognize Unicode instead of assuming the small ANSI character set.
To lexicographically compare the case-sensitive alphabetic order of two entire strings:
To determine the case-insensitive alphabetic order of two entire strings:
These functions return a positive integer if str1 is greater than (sorts after) str2.
These functions return a negative integer if str1 is less than (sorts before) str2, and zero if the two strings are equal.
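In C, the case-sensitive comparison is the standard strcmp(). Standard C has no case-insensitive counterpart (strcasecmp is POSIX and _stricmp is Microsoft-specific), so the sketch below assumes a hand-written helper, compare_ignore_case, that follows the same sign convention as strcmp(). It compares single bytes only and is not Unicode-aware.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper with strcmp-style results:
   < 0 if a sorts before b, 0 if equal ignoring case, > 0 if a sorts after b. */
int compare_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0') {
            return ca - cb;   /* difference, or 0 when both strings end */
        }
    }
}
```

For example, strcmp("apple", "banana") is negative because "apple" sorts first, and compare_ignore_case("Hello", "hELLO") returns 0.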
The C language has these additional functions:
Like a regular expression [ro], to return the length of the leading characters in a string that are contained in a specified string:
4 of 13 characters are an o or an r
To compare localized strings based on Thread.CurrentCulture settings (now the default behavior), use
or when case matters (such as in passwords):
This is needed especially for the French and Swedish languages, which use a different sort order than English.
To compare strings byte-by-byte without linguistic interpretation (as C strcmp does), use
or when case matters (such as in passwords):
To get the position of a substring within a larger string,
positionInt = InStrRev( StringToBeSearched, StringToBeFound )
C# offers the culture-sensitive IndexOf method of the CompareInfo class:
int position = comparer.IndexOf( StringToBeSearched, StringToBeFound );
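For comparison, a byte-wise C counterpart can be sketched with the standard strstr() function. The helper names find_first and find_last, and the convention of returning -1 when nothing is found, are assumptions for this example. C has no standard function for the last occurrence, so the second helper simply repeats the search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero-based index of the first occurrence of needle
   in haystack, or -1 if it does not occur. */
long find_first(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit == NULL ? -1 : (long)(hit - haystack);
}

/* Hypothetical helper: zero-based index of the last occurrence, or -1. */
long find_last(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    long last = -1;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        last = (long)(p - haystack);
        if (*p == '\0') {
            break;         /* matched at the terminator (empty needle) */
        }
        p++;               /* resume the search one character later */
    }
    return last;
}
```

Unlike the culture-sensitive searches, these helpers compare raw bytes, so they suit ASCII data rather than localized text.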
Insert between the first 4 characters and last character:
In C, to concatenate two strings (such as to create a full path from folder and file name with a backslash in between):
In C, to concatenate onto the end of str1 the first n characters of string str2:
strncat( str1, str2, nChars );
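The folder-and-file case can be sketched with strcpy() and strcat() once the required size has been checked. The function name build_path and its return codes are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes folder, a backslash, and file into out.
   Returns 0 on success, -1 if the result (with its '\0') does not fit. */
int build_path(char *out, size_t out_size, const char *folder, const char *file)
{
    assert(out != NULL && folder != NULL && file != NULL);
    /* one byte for the backslash and one for the terminating '\0' */
    size_t needed = strlen(folder) + 1 + strlen(file) + 1;
    if (needed > out_size) {
        return -1;
    }
    strcpy(out, folder);
    strcat(out, "\\");
    strcat(out, file);
    return 0;
}
```

Checking the total length first is what makes the unbounded strcpy() and strcat() calls safe here. Note that strncat() always appends a terminating null after the copied characters, so the destination must have room for n + 1 more bytes.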
However, strings can be manipulated without reallocation within the VB.NET StringBuilder class:
If the someText.Length actually used exceeds the someText.Capacity defined for the object, the capacity of that object is automatically doubled.
The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuffer class and its append method. String conversions are implemented through the method toString, defined by Object and inherited by all classes in Java.
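The StringBuffer class's append method concatenates strings together. Java does not support operator overloading, so the + operator cannot be used to concatenate StringBuffer objects:
StringBuffer s1 = new StringBuffer("String 1");
StringBuffer s2 = s1; // legal: s2 refers to the same object as s1
// StringBuffer s3 = s1 + s2; // illegal! + is not defined for StringBuffer
StringBuffer s4 = new StringBuffer();
s4.append(s1).append(s2);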
Java provides two classes that encapsulate string values: String and StringBuffer. Both hold collections of 16-bit Unicode characters, which allow them to support multiple languages.
Finding Strings in Strings
To get an index to the beginning of the last occurrence of string2 within string1:
These Java functions return a -1 if string2 does not exist.
Java: To obtain a string-based representation of information stored in a Java object:
ASP: To display the HTML tags without processing:
To encode each space as +, each & as %26, each = as %3D, and so on:
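In C there is no built-in function for this encoding, so the sketch below assumes a hand-written helper. The name url_encode, the set of characters passed through unchanged (letters, digits, and - _ . ~), and the return codes are all assumptions for this example rather than a definitive implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: form-style URL encoding. A space becomes '+',
   unreserved characters are copied, everything else becomes %XX.
   Returns 0 on success, -1 if out is too small. */
int url_encode(char *out, size_t out_size, const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    assert(out != NULL && in != NULL);
    size_t o = 0;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            if (o + 1 >= out_size) return -1;   /* keep room for '\0' */
            out[o++] = (char)c;
        } else if (c == ' ') {
            if (o + 1 >= out_size) return -1;
            out[o++] = '+';
        } else {
            if (o + 3 >= out_size) return -1;
            out[o++] = '%';
            out[o++] = hex[c >> 4];             /* high nibble */
            out[o++] = hex[c & 0x0F];           /* low nibble */
        }
    }
    if (o >= out_size) return -1;
    out[o] = '\0';
    return 0;
}
```

Encoding "a b&c=d" with this helper produces "a+b%26c%3Dd".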
Trimming Spaces in Strings
Ruby has a String#chomp method to remove a trailing line ending (such as the newline \n character that the Ruby gets function leaves at the end of its input).
Java and ASP VBScript provide a function to remove spaces from both left and right:
string = trim( string );
With Microsoft C#, Trim is an instance method of the string type (a reference type, not a value type):
string name = " something "; name = name.Trim();
In Oracle PL/SQL, to remove a character other than a space (here '_') from both the left and right, specify it explicitly:
TRIM( both '_' FROM col1 )
C doesn't have a predefined trim() function, so use one of these custom functions:
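One possible custom function is sketched here. The name trim_in_place is an assumption for this example, and "space" is taken to mean anything the standard isspace() reports as whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: removes leading and trailing whitespace from s
   in place and returns s. The buffer must be writable (not a literal). */
char *trim_in_place(char *s)
{
    assert(s != NULL);
    char *start = s;
    while (isspace((unsigned char)*start)) {
        start++;                               /* skip leading whitespace */
    }
    size_t len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1])) {
        len--;                                 /* drop trailing whitespace */
    }
    memmove(s, start, len);                    /* regions may overlap */
    s[len] = '\0';
    return s;
}
```

Because the trimmed text is never longer than the original, the operation needs no extra allocation; memmove() is used rather than memcpy() because source and destination overlap.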
LPAD( x ,y [,z] )
RPAD( x ,y [,z] )
In ASP VBScript, to chop a string at a starting position for a fixed number of characters:
Mid( string , start ,length )
ASP VBScript overloads the Mid function to remove all characters before a startposition:
Mid( string , start )
In Oracle PL/SQL, to return a substring of string x, starting at the character in position number y and, optionally, limited to a length of z characters (if z is omitted, the substring runs to the end of x):
SUBSTR( x ,y [,z] )
In ASP VBScript, to find the numeric position where a searchstring is found within a given string:
Instr( string , searchstring )