ASCII non-printing characters

I had a question this morning from a junior working in another department for whom I have become the de-facto mentor. He’s learning SQL and dabbling in the dark arts of Crystal Reports and the benevolent power of SSRS.

He asked me this:

“I’m wondering if there is any way that a field holds line breaks and if I can utilize these within a report, example [TABLE].[FIELD] in [DATABASE] . See the screenshot where the general format of the email has been kept and then what I’m seeing in SSMS”

The screenshot shows a well formatted email:
Dear Person

We are writing to you to inform you….

I could have just replied “yes” but I can’t help myself but to try and add some more depth of understanding to the topic so here’s my reply. I think this is the sort of succinct response I would have appreciated reading when I was learning about these things.

In the ASCII table, there are 256 possible characters. 0-127 are universal. 128 – 255 are called ‘code pages’ and can contain different language characters depending on the code page chosen.

The Americans came up with ASCII in the 60s, being very English-language -centric, but it’s since been replaced by the Unicode standard which contains all languages (1 – 4 bytes per character) where ASCII has only 1 byte per char. Even still, Unicode hasn’t fully taken over yet, but certainly use it where you need multi-language support (beyond Western European languages). Thai and the other East Asian languages like Korean, Japanese etc are probably the most complicated multi-byte character languages.

In the image below, look at the “Dec” column from char(0) to char(32). These are called non-printing characters. A lot of them relate to old type-writer functions. Edit, Tele-Type functions. It makes sense that you would only need these codes for an automated type-writer like device, not a manual typewriter.

Most software still uses the CR carriage return – char(13) – and the LF line feed – char(10) – characters in combination to signify the end of a line.

Imagine a type-writer returning the “carriage” to the beginning of the line and then feeding the paper up one line. Some software has dropped one of those two chars and just uses one, which can lead to strange end of line stuff happening when you transfer data from one system to another. In those cases you sometimes need to parse the text and replace instances of the single character with an instance of both in combination.

When you save an ASCII file (as opposed to a binary file) to disk/memory, it is “serialized” meaning that it’s just one long string. That’s an oversimplification but serves to illustrate the point. The end-of-lines within the long string are marked by what might look like spaces, but it’s actually these non-printing characters.

If you pull that string field into an SSRS report, it will correctly display the text as entered, i.e. it will interpret all the non-printing characters correctly and your paragraph formatting will be maintained.

If you want to add in your own set of line breaks you can build a string like this:

=”This is the first line” + chr(10) + chr(13) + “this is the second line” etc etc

In some languages it will be written as chr(10) and in some it might be char(10). The plusses “+” might also be ampersands “&”

Obviously syntax changes but the methodology remains the same.

asciifull

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *