Unicode Strings and Memory Buffers

One of the Newton 2.x OS Q&As
Copyright © 1997 Newton, Inc. All Rights Reserved. Newton, Newton Technology, Newton Works, the Newton, Inc. logo, the Newton Technology logo, the Light Bulb logo and MessagePad are trademarks of Newton, Inc. and may be registered in the U.S.A. and other countries. Windows is a registered trademark of Microsoft Corp. All other trademarks and company names are the intellectual property of their respective owners.


For the most recent version of the Q&As on the World Wide Web, check the URL: http://www.newton-inc.com/dev/techinfo/qa/qa.htm
This document was exported on 7/23/97.


Unicode Strings and Memory Buffers (8/26/96)

Q: Sometimes when I use the DILs to get a string, some memory gets corrupted even though I'm sure I've allocated more memory than I have characters in the string. What's going on?

A: One common cause is that strings arriving from a Newton device are in Unicode, which takes two bytes per character. If you've allocated only one byte per character, you risk memory corruption: the data is converted to the one-byte form only after the whole buffer has arrived, by which point the incoming data may already have overrun the end of the buffer. You therefore need to allocate enough space for the Unicode version of the string.

For example, if you're expecting strings of up to 50 characters, you must allocate at least 100 bytes (50 characters at 2 bytes each) of memory in your buffer.
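
The sizing rule above can be sketched in C roughly as follows. This is only an illustration, not DIL code: unicode_buffer_size and alloc_unicode_buffer are hypothetical helper names, and reserving one extra two-byte slot for a string terminator is an assumption on top of the 100-byte minimum stated above. Pass the resulting buffer and its size to whichever DIL call you use to receive the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Strings arrive from the Newton device as two-byte Unicode characters. */
#define BYTES_PER_UNICODE_CHAR 2u

/*
 * Hypothetical helper (not part of the DILs): number of bytes needed to
 * receive a string of up to max_chars Unicode characters.
 * Assumption: one extra two-byte slot is reserved for a terminator.
 * Returns 0 if the size would overflow size_t.
 */
size_t unicode_buffer_size(size_t max_chars)
{
    if (max_chars > (SIZE_MAX / BYTES_PER_UNICODE_CHAR) - 1) {
        return 0;
    }
    return (max_chars + 1) * BYTES_PER_UNICODE_CHAR;
}

/*
 * Hypothetical helper: allocate a zero-filled receive buffer sized for the
 * Unicode form of the string, and report its size in bytes via size_out.
 * Returns NULL on overflow or allocation failure. The caller frees it.
 */
unsigned char *alloc_unicode_buffer(size_t max_chars, size_t *size_out)
{
    size_t size;
    unsigned char *buffer;

    assert(size_out != NULL);

    size = unicode_buffer_size(max_chars);
    if (size == 0) {
        return NULL;
    }

    buffer = calloc(1, size);
    if (buffer != NULL) {
        *size_out = size;
    }
    return buffer;
}
```

With these helpers, a request for 50 characters yields a 102-byte buffer: 100 bytes for the characters plus 2 bytes for the assumed terminator. Sizing the buffer by the Unicode form, rather than by the one-byte form you eventually want, is what keeps the incoming data inside the buffer before any conversion takes place.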