Description: Unicode code point/UTF-8 support lib.
Developer/Porter: Joerg van de Loo
Readme:
Short: Unicode code point/UTF-8 support lib
Author: joergloo@aol.com (Joerg van de Loo)
Uploader: joergloo aol com (Joerg van de Loo)
Type: util/libs
Version: 10.21
Replaces: util/libs/UniLibDev.lha
Architecture: m68k-amigaos >= 1.2.0; ppc-amigaos; ppc-morphos; i386-aros

Foreword:
---------
With the ongoing development of MorphOS and AmigaOS4, and in the hope that
UTF-8 will no longer be treated as a stepchild, I expect these systems to
eventually render this library useless (no, I'm not kidding).
That means you should first check whether your OS supports a given task,
and only if it does not should you fall back on the functions provided by
Uni library.


Introduction:
-------------
Uni library is a support library for Unicode code points in the range 0 to
1'114'111 (0x10FFFF), so it is not limited to the Basic Multilingual Plane
(range 0 to 65'535, i.e. 0xFFFF).
You can determine code point attributes (UPPERCASE_LETTER,
LOWERCASE_LETTER, TITLECASE_LETTER etc.) and you can also change these
attributes for a code point by mapping it to its counterpart (for example,
an uppercase letter to its lowercase form).
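To put the range into numbers: the code points 0 to 1'114'111 are split into 17 planes of 65'536 code points each, and plane 0 is the BMP. The following is only a minimal C sketch (the function names are hypothetical and not part of Uni library's API) of how a code point's plane and its validity as a Unicode scalar value can be determined; the surrogate range U+D800 to U+DFFF is excluded because it is reserved for UTF-16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not Uni library's API. */
#define MAX_CODE_POINT 0x10FFFFu /* 1'114'111 */
#define BMP_LAST       0xFFFFu   /* 65'535 */

/* Returns the Unicode plane (0..16) a code point belongs to. */
unsigned code_point_plane(uint32_t cp)
{
    assert(cp <= MAX_CODE_POINT);
    return (unsigned)(cp >> 16); /* each plane spans 0x10000 code points */
}

/* True if the code point lies in the Basic Multilingual Plane (plane 0). */
bool code_point_in_bmp(uint32_t cp)
{
    return cp <= BMP_LAST;
}

/* True for Unicode scalar values: in range and not a UTF-16 surrogate. */
bool code_point_is_scalar(uint32_t cp)
{
    return cp <= MAX_CODE_POINT && !(cp >= 0xD800u && cp <= 0xDFFFu);
}
```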

Because I couldn't find a shared library providing support functions for
UTF-8 strings, I built them into Uni library as well, for example
UTF8StrCmp().
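Why UTF-8 strings need their own functions: one code point occupies one to four bytes, so byte-oriented C functions such as strlen() count bytes rather than characters. Below is a minimal sketch, assuming a NUL-terminated and well-formed UTF-8 string, that counts code points by skipping continuation bytes (bit pattern 10xxxxxx). The function name is hypothetical and this is not the implementation of any Uni library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Uni library function.
   Counts the code points in a NUL-terminated, well-formed UTF-8 string:
   every byte that is not a continuation byte (10xxxxxx) starts a new
   code point. */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        if ((*p & 0xC0u) != 0x80u)
            ++n;
    return n;
}
```

For "\xC3\xA4" "bc" ("äbc"), strlen() returns 4 while the sketch returns 3.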

Furthermore, transcoding strings from one format to another is implemented,
for example through UTF16ToUTF8().
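Transcoding UTF-16 to UTF-8 mainly means combining surrogate pairs into one code point and then emitting one to four UTF-8 bytes. A minimal C sketch under these assumptions: the input is a UTF-16 array in host byte order, unpaired surrogates are replaced with U+FFFD (a design choice of this sketch), and all names are hypothetical. It is not Uni library's UTF16ToUTF8(), whose actual signature is declared in the library's own headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 form of cp to out (room for at
   least 4 bytes) and returns the number of bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp < 0x10000u) {
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    out[0] = (unsigned char)(0xF0u | (cp >> 18));
    out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
    out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
    out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
    return 4;
}

/* Hypothetical transcoder (not UTF16ToUTF8()): converts len UTF-16 units
   to UTF-8. out must hold 3 * len bytes. Unpaired surrogates become
   U+FFFD. Returns the number of bytes written (no terminating NUL). */
size_t utf16_to_utf8(const uint16_t *in, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;
    assert(in != NULL || len == 0);
    while (i < len) {
        uint32_t cp = in[i++];
        if (cp >= 0xD800u && cp <= 0xDBFFu && i < len &&
            in[i] >= 0xDC00u && in[i] <= 0xDFFFu) {
            /* high + low surrogate -> supplementary code point */
            cp = 0x10000u + ((cp - 0xD800u) << 10) + (in[i++] - 0xDC00u);
        } else if (cp >= 0xD800u && cp <= 0xDFFFu) {
            cp = 0xFFFDu; /* lone surrogate -> replacement character */
        }
        n += put_utf8(cp, out + n);
    }
    return n;
}
```

The buffer size 3 * len is sufficient because a single UTF-16 unit never yields more than three UTF-8 bytes, and a surrogate pair (two units) yields four.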


Thus, it's a shared library for three tasks:
Determining code point attributes / mapping code points.
Handling of UTF-8 multibyte sequences.
Transcoding strings.


The enclosed documentation was written in HTML, and I spent a lot of time
clarifying some misleading terms that are frequently used by people who do
not fully understand what Unicode and its related terms stand for. Okay,
I'm not an expert myself; however, please read the documentation I provided
before you study the API of this library. It will be to your benefit.


Changes version 10 (release 7):
-------------------------------
This new version of Uni library was upgraded to adopt the character
encoding scheme of the Unicode Standard, Version 10.0.0, as published by the
Unicode Consortium, as far as my limited implementation can support it.

Changed a lot of code to gain speed when composing UTF-8 code point
values, which was necessary for plain 68000 CPUs.
A plain 68000 version of this library is enclosed.


Five new functions:
UTF8NumCodePoints()
UTF8RepSeq()
UTF8RepSeqArg()
UTF8CodePoint()
UTF8AnyCodePoint()

UniCheckEncoding() may now return 0, meaning that a broken UTF-8 encoding
was encountered.
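What counts as "broken" UTF-8 is defined by the table of well-formed byte sequences in chapter 3 of the Unicode Standard. A minimal C sketch of such a check follows; the function name is hypothetical, and this is not the code behind UniCheckEncoding(), whose actual return values are those documented with the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator (not UniCheckEncoding()): returns 1 if the first
   len bytes of s are well-formed UTF-8, else 0. Overlong forms,
   surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected. */
int utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        size_t need = 0;                     /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (b < 0x80) {                      /* ASCII */
            i++;
            continue;
        }
        if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;        /* reject overlong 3-byte forms */
            if (b == 0xED) hi = 0x9F;        /* reject surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;        /* reject overlong 4-byte forms */
            if (b == 0xF4) hi = 0x8F;        /* reject values above U+10FFFF */
        } else {
            return 0;                        /* 0x80..0xC1 or 0xF5..0xFF */
        }

        if (len - i <= need)                 /* truncated sequence */
            return 0;
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (size_t k = 2; k <= need; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return 0;
        i += need + 1;
    }
    return 1;
}
```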

Changed the header file (uni.h).
Updated the C proto and clib files to reflect the new functions.

UniCodeChart() now supports 327 code charts; please see the notes in
"uni.h" regarding buffer size.

Garbled UTF-8 sequences can be caught as valid UTF-8 code points via
UTF8AnyCodePoint().

Added Kickstart v33 fallback functions so that source code written for
OS2+ only can also be compiled for OS1.2+ (tools/ks33).


Versions 8 and 9 never left my hard disk.
-----------------------------------------

Versions 6 and 7 could only be obtained via my homepage.
--------------------------------------------------------

Changes 5.20 to 5.21 (release 6):
---------------------------------
Fixed a minor bug in UniCodeChart().


Changes 5.15 to 5.20 (release 5):
---------------------------------
This new version of Uni library was upgraded to adopt the character
encoding scheme of the Unicode Standard, Version 5.2.0, as published by the
Unicode Consortium, as far as my limited implementation can support it.

UniCodeChart() supports 27 new code charts, bringing the total to 228
code charts.


Changes 5.14 to 5.15 (release 4):
---------------------------------
Fixed: code point 128 could be handled incorrectly during transcoding.
Finally managed to create the AROS version by writing a tool (Weaver) that
automatically creates the library frame and all files the compiler needs
from a given SFD file.
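For illustration, assuming ISO-8859-1 (Latin-1) as the source encoding: bytes 0 to 127 map to themselves in UTF-8, while 128 (U+0080) is the first value that needs a two-byte sequence (0xC2 0x80), which makes it a typical boundary case. Below is a minimal C sketch with a hypothetical function name; it is not LatinToUTF8(), and this readme does not say what caused the original bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ISO-8859-1 -> UTF-8 transcoder (not LatinToUTF8()).
   out must hold 2 * len bytes; returns the number of bytes written.
   Bytes 0..127 are copied unchanged; 128..255 (128 included) become
   two-byte sequences. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {                 /* boundary: 0x80 is NOT ASCII */
            out[n++] = c;
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```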


Changes 5.12 to 5.14 (release 3):
---------------------------------
This new version of Uni library was upgraded to adopt the character
encoding scheme of the Unicode Standard, Version 5.1.0, as published by the
Unicode Consortium, as far as my limited implementation can support it.

In addition, this new version fixes a bug that surfaced when a
UTF-32/UTF-16 string was transcoded to UTF-8: the UTF-8 string buffer had to
be at least four bytes bigger than required (ouch...).
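One way to get the UTF-8 buffer size right is to compute it exactly in a first pass by summing the UTF-8 length of every code point. A minimal C sketch for 0-terminated UTF-32 input follows; the names are hypothetical, and utf8_len_of() only mirrors the idea behind UTF32CharAsUTF8Len(), whose real signature and behaviour may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-8 bytes needed for one code point,
   or 0 if it cannot be encoded (surrogate or above U+10FFFF). */
size_t utf8_len_of(uint32_t cp)
{
    if (cp < 0x80u)
        return 1;
    if (cp < 0x800u)
        return 2;
    if (cp >= 0xD800u && cp <= 0xDFFFu)
        return 0;
    if (cp < 0x10000u)
        return 3;
    if (cp <= 0x10FFFFu)
        return 4;
    return 0;
}

/* Hypothetical helper: exact buffer size for transcoding a 0-terminated
   UTF-32 string to UTF-8, including the terminating NUL byte.
   Returns 0 if the input contains an unencodable value. */
size_t utf32_to_utf8_size(const uint32_t *s)
{
    size_t total = 1; /* room for the terminating NUL */
    assert(s != NULL);
    for (; *s != 0; ++s) {
        size_t n = utf8_len_of(*s);
        if (n == 0)
            return 0;
        total += n;
    }
    return total;
}
```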

UniCodeChart() supports 22 new code charts and with that it supports 201
code charts in total.


Notes on transcoding singlebyte character encoding schemes:
-----------------------------------------------------------
I released an additional archive (UniSupport v1.1) that should make it
easier for you to transcode strings by using IANA IDs, which are also used
by the operating system's Locale library from version 50 onwards (MorphOS,
AmigaOS4, AROS).


Functions:
----------
The API provides these functions:

Code Points Attribute Information

UniIsAlpha()
UniIsAttr()
UniIsCon()
UniIsDigit()
UniIsLower()
UniIsNSM()
UniIsPrint()
UniIsPunct()
UniIsSpace()
UniIsTitle()
UniIsUpper()

UniToLower()
UniToTitle()
UniToUpper()

UniCodeChart()

UTF-8 String Information

UTF8IsLegal()
UTF8LegalStart()
UTF8NextChar()
UTF8PrevChar()
UTF8CharAtIndex()

UTF8StrInfo()
UTF8StrLen()
UTF8StrOfSize()
UTF8StrVisibleLen()

UTF8NumCodePoints() (*NEW V10*)

UTF-8 String Comparison / Modifiers

UTF8StrCat()
UTF8StrCmp()
UTF8StrCmpI()
UTF8StrCpy()
UTF8StrFind()
UTF8StrMatch()
UTF8StrNCat()
UTF8StrNCmp()
UTF8StrNCmpI()
UTF8StrNCpy()
UTF8StrPaste()
UTF8StrReplace()
UTF8StrTerminate()
UTF8StrToken()
UTF8StrToLower()
UTF8StrToTitle()
UTF8StrToUpper()

UTF8RepSeq() (*NEW V10*)
UTF8RepSeqArg() (*NEW V10*)

Miscellaneous (Wide Char) String Functions

UTF16StrLen()
UTF32StrLen()

UTF16CharAsUTF8Len()
UTF32CharAsUTF8Len()

Transcodings

LatinToUTF8()
UTF8ToLatin()

UTF16ToUTF8()
UTF32ToUTF8()

UTF8ToUTF16()
UTF8ToUTF16Char()
UniResultIsSurrogate()
UTF8ToUTF32()
UTF8ToUTF32Char()

UTF8CodePoint() (*NEW V10*)
UTF8AnyCodePoint() (*NEW V10*)

Encodings

UniCheckEncoding()
UniBomHasSize()
UniSwitchEncoding()
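Judging by its name, UniBomHasSize() relates to byte order marks (BOMs). As a generic illustration only, here is a minimal C sketch of BOM detection; the enum and function names are hypothetical and unrelated to the definitions in uni.h, and the semantics of UniCheckEncoding() and UniBomHasSize() may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags for this sketch only (not from uni.h). */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16BE, BOM_UTF16LE,
                BOM_UTF32BE, BOM_UTF32LE };

/* Hypothetical detector: identifies a BOM at the start of buf and stores
   its size in *bom_size (0 if none). UTF-32LE is tested before UTF-16LE
   because FF FE 00 00 also begins with the UTF-16LE mark FF FE. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_size)
{
    assert(bom_size != NULL);
    *bom_size = 0;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_size = 4;
        return BOM_UTF32BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_size = 4;
        return BOM_UTF32LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_size = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_size = 2;
        return BOM_UTF16BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_size = 2;
        return BOM_UTF16LE;
    }
    return BOM_NONE;
}
```

Note the order of the checks: FF FE 00 00 is taken as a UTF-32LE mark, although it could also be a UTF-16LE mark followed by U+0000; the BOM alone cannot resolve this ambiguity.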


Upload Date: Jul 31 2018
Category: Dependencies/Library/Misc
Download: UniLibDev_10.21.lha
Md5: 0077185fee0def9a2d92001dd85aba64
Size: 415 KB
Downloads: 198