Class Charset
Provides support for handling different kinds of character sets: a basic set of string functions, conversion methods, and character constants.
Instances, Names and Aliases
Character set instances are treated as singletons and automatically buffered for reduced memory usage.
The name of a character set is based on its IANA-assigned name, using the
preferred MIME name if available. Because a wide variety of names exist, often
differing only slightly in typography, several aliases of supported character
sets are defined. On each request, aliases are resolved to the correct
IANA-assigned name of the character set, on which the actual character set
driver is based. Aliases can either be assigned programmatically via the
Core\Charset base class or defined as modules that consist only of a single
alias registration call:
// Example of alias module ANSI_x3.4-1968 for character set ASCII.
namespace Core\Charset\Alias;

use Core\Charset;
use Core\Loader;

Loader::registerModule('1.0');
Charset::registerAlias('ansi_x3.4-1968', 'ascii');
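The alias resolution and instance buffering described above can be pictured with a small stand-alone sketch. It is not the Core\Charset implementation: the class `CharsetRegistrySketch`, its `resolve()` helper, the case-insensitive matching, and the use of a plain `stdClass` in place of a real driver are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the behaviour described above;
// not the actual Core\Charset code.
final class CharsetRegistrySketch
{
    // alias => normalised name
    private static $aliasMap = array();

    // normalised name => cached instance
    private static $instances = array();

    public static function registerAlias($alias, $targetName)
    {
        // Assumption: names are matched case-insensitively.
        self::$aliasMap[strtolower($alias)] = strtolower($targetName);
    }

    public static function resolve($name)
    {
        $name = strtolower($name);
        // Follow the alias if one is registered; otherwise keep the name.
        return isset(self::$aliasMap[$name]) ? self::$aliasMap[$name] : $name;
    }

    public static function get($name)
    {
        $resolved = self::resolve($name);
        if (!isset(self::$instances[$resolved])) {
            // A real driver would be loaded here; stdClass stands in for it.
            $instance = new stdClass();
            $instance->name = $resolved;
            self::$instances[$resolved] = $instance;
        }
        return self::$instances[$resolved];
    }
}

CharsetRegistrySketch::registerAlias('ansi_x3.4-1968', 'ascii');

$a = CharsetRegistrySketch::get('ANSI_X3.4-1968');
$b = CharsetRegistrySketch::get('ascii');
var_dump($a === $b); // bool(true): both names yield the same cached instance
```

Resolving the alias before the cache lookup is what keeps a single instance per character set, no matter which of its names a caller uses.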
Character Conversion
Character conversion is based on UTF-8 as an Internet standard and on UCS-2,
which allows fast conversion of characters in the Basic Multilingual Plane (BMP)
and represents a single character as a fixed 16-bit integer.
Almost all traditional character sets have their code points mapped within
the Unicode BMP and can provide conversions to and from Unicode using lookup
tables to the corresponding UCS-2 character, which is then easily encoded
in or decoded from UTF-8.
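The conversion path described above (legacy byte, lookup table, UCS-2 value, UTF-8 bytes) might look roughly like the following stand-alone sketch. The function names are hypothetical and are not the library's `convertUCS2ToUTF8()` / `convertUTF8ToUCS2()`. Returning `null` for invalid input or for characters outside the BMP is an assumption, and the decoder does not reject overlong sequences or surrogates. The table lists the eight positions where ISO-8859-15 differs from ISO-8859-1.

```php
<?php
// Encode one 16-bit value (0x0000-0xFFFF) as a UTF-8 byte sequence.
function ucs2ToUtf8Sketch($wchar)
{
    if ($wchar < 0x80) {
        return chr($wchar);                                   // 1 byte
    }
    if ($wchar < 0x800) {
        return chr(0xC0 | ($wchar >> 6))                      // 2 bytes
             . chr(0x80 | ($wchar & 0x3F));
    }
    return chr(0xE0 | ($wchar >> 12))                         // 3 bytes
         . chr(0x80 | (($wchar >> 6) & 0x3F))
         . chr(0x80 | ($wchar & 0x3F));
}

// Decode the first UTF-8 character of $character to a 16-bit value.
// $byteLength receives the number of bytes consumed (0 on failure).
function utf8ToUcs2Sketch($character, &$byteLength = null)
{
    $byteLength = 0;
    if ($character === '') {
        return null;
    }
    $lead = ord($character[0]);
    if ($lead < 0x80) {
        $byteLength = 1;
        return $lead;
    }
    if (($lead & 0xE0) === 0xC0) {
        $trail = 1;
        $code = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {
        $trail = 2;
        $code = $lead & 0x0F;
    } else {
        return null; // stray continuation byte, or a sequence outside the BMP
    }
    if (strlen($character) < $trail + 1) {
        return null; // truncated sequence
    }
    for ($i = 1; $i <= $trail; $i++) {
        $byte = ord($character[$i]);
        if (($byte & 0xC0) !== 0x80) {
            return null; // not a continuation byte
        }
        $code = ($code << 6) | ($byte & 0x3F);
    }
    $byteLength = $trail + 1;
    return $code;
}

// Table-driven conversion of an ISO-8859-15 string to UTF-8.
function iso885915ToUtf8Sketch($string)
{
    // Bytes that differ from ISO-8859-1; all other bytes map to themselves.
    static $table = array(
        0xA4 => 0x20AC, 0xA6 => 0x0160, 0xA8 => 0x0161, 0xB4 => 0x017D,
        0xB8 => 0x017E, 0xBC => 0x0152, 0xBD => 0x0153, 0xBE => 0x0178,
    );
    $out = '';
    for ($i = 0, $n = strlen($string); $i < $n; $i++) {
        $byte = ord($string[$i]);
        $out .= ucs2ToUtf8Sketch(isset($table[$byte]) ? $table[$byte] : $byte);
    }
    return $out;
}

echo bin2hex(ucs2ToUtf8Sketch(0x20AC)), "\n";       // e282ac (euro sign)
echo bin2hex(iso885915ToUtf8Sketch("\xA4")), "\n"; // e282ac
```

Because every value fits in 16 bits, the encoder never needs more than three bytes per character, which is what keeps the intermediate representation cheap.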
- Core\Object implements Core\Chainable
- Core\Set implements Core\Accessor
- Core\Charset
Direct known subclasses
Indirect known subclasses
Core\Charset\Driver\iso88591, Core\Charset\Driver\iso885915, Core\Charset\Driver\utf8, Core\Charset\Driver\macroman, Core\Charset\Driver\sjis, Core\Charset\Driver\utf16, Core\Charset\Driver\utf16be, Core\Charset\Driver\utf16le, Core\Charset\Driver\utf32, Core\Charset\Driver\utf32be, Core\Charset\Driver\utf32le
Namespace: Core
Package: Core\Charset
See: https://en.wikipedia.org/wiki/Character_encoding
See: http://www.iana.org/assignments/character-sets/character-sets.xhtml
Since: 3.0
Requires: PHP 5.3.2
Version: 1.4.1
Located at Charset.inc.php
Methods summary
public static
|
|
public static
|
|
public static
string
|
|
public static
string[]
|
#
all( )
Loads all character set drivers and returns a list of their normalised names (the IANA registered name). The method automatically skips all drivers that are not supported on the current platform, e.g. due to missing extensions. |
public static
string
|
#
convertUCS2ToUTF8( integer $wchar )
Converts a single UCS-2 (16-bit integer) character to the corresponding UTF-8 byte sequence. |
public static
integer|null
|
#
convertUTF8ToUCS2( string $character, integer & $byteLength = null )
Converts a single UTF-8 character byte sequence to the corresponding UCS-2 (16-bit integer) character. |
public static
|
#
get( string $name )
Loads and returns a character set instance by name, performing alias expansion as necessary. Instances are automatically buffered; subsequent queries for the same character set return the same instance. |
public static
|
|
public static
|
#
registerAlias( string $alias, string $targetName )
Defines a new character set alias. Aliases may be assigned manually or implicitly through character set alias modules. |
public
string
|
#
convertFrom(
Converts a string encoded in a given character set to the character set of this instance. |
public
string
|
#
convertFromActive( string $string )
Converts a string encoded in the globally active character set to the character set of this instance. |
abstract public
string
|
#
convertFromUTF8( string $string )
Converts a string encoded in UTF-8 to the current character set. |
public
string
|
#
convertTo(
Converts a string encoded in the character set of this instance into a given character set. |
public
string
|
#
convertToActive( string $string )
Converts a string encoded in the character set of this instance into the globally active character set. |
abstract public
string
|
#
convertToUTF8( string $string )
Converts a string encoded in the character set of this instance to UTF-8. |
public
string
|
#
format( string $format, mixed $_ = null )
Character-set-specific variant of the standard sprintf() function. |
public
string
|
|
abstract public
|
|
abstract public
string
|
#
pad( string $string, integer $length, string $padding, integer $flags, string $truncationMarker = null )
Fits a string to a desired length by padding or truncation. Padding uses characters from an optional padding string. |
public
integer
|
|
abstract public
string
|
#
sub( string $string, integer $from, integer $length = null )
Returns a substring of a given string. |
abstract public
string
|
#
toLower( string $string )
Transforms a given string to lowercase based on Unicode character properties with respect to this character set. |
abstract public
string
|
#
toUpper( string $string )
Transforms a given string to uppercase based on Unicode character properties with respect to this character set. |
abstract public
string
|
#
truncate( string $string, integer $start, integer $width, string $marker = null )
Truncates a string based on the desired visual width of the string. |
abstract public
integer
|
|
public
string
|
#
vformat( string $format, array $argv )
Character-set-specific variant of the standard vsprintf() function. |
abstract public
string
|
#
xmlSpecials( $string, integer $quotes = null )
Encodes all special XML characters in the provided string with respect to this character set instance. |
protected
|
#
_restoreEnvironment( )
Called when the currently active charset is replaced. The active charset should restore the environment to its state prior to setup. |
protected
|
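String functions such as sub() are abstract because their behaviour depends on the encoding: in a multi-byte character set, positions and lengths count characters, not bytes. The following stand-alone sketch illustrates the difference for UTF-8. The function `utf8SubSketch()` is hypothetical and only mirrors the parameter order of sub(). It relies on PCRE's `u` modifier rather than on any driver code, and returning an empty string for invalid UTF-8 is an assumption.

```php
<?php
// Character-based substring for UTF-8 input (illustrative only).
function utf8SubSketch($string, $from, $length = null)
{
    // Split into individual characters; fails (false) on invalid UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }
    // array_slice() takes everything up to the end when $length is null.
    return implode('', array_slice($chars, $from, $length));
}

// "Grüße" written with explicit UTF-8 bytes: G r ü(C3 BC) ß(C3 9F) e
$text = "Gr\xC3\xBC\xC3\x9Fe";

echo bin2hex(utf8SubSketch($text, 2, 2)), "\n"; // c3bcc39f (two characters)
echo bin2hex(substr($text, 2, 2)), "\n";        // c3bc (two bytes, one character)
```

A single-byte driver could delegate straight to substr(), which is why each driver supplies its own implementation.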
Methods inherited from Core\Object
__autocreateFactory()
,
__call()
,
__processParameters()
,
__toString()
,
attachMethod()
,
chain()
,
getValueForKey()
,
getValueForKeyPath()
,
getValueForUndefinedKey()
,
issetValueForKey()
,
setValueForKey()
,
setValueForKeyPath()
,
setValueForUndefinedKey()
,
uuid()
Constants summary
string |
NUL
NUL character. |
#
"\x00"
|
string |
SOH
Start of heading character. |
#
"\x01"
|
string |
STX
Start of text character. |
#
"\x02"
|
string |
ETX
End of text character. |
#
"\x03"
|
string |
EOT
End of transmission character. |
#
"\x04"
|
string |
ENQ
Enquiry character. |
#
"\x05"
|
string |
ACK
Acknowledge character. |
#
"\x06"
|
string |
BEL
Bell character. |
#
"\x07"
|
string |
BS
Backspace character. |
#
"\x08"
|
string |
TAB
Horizontal tabulator character. |
#
"\x09"
|
string |
LF
Line feed (new line) character. |
#
"\x0a"
|
string |
VT
Vertical tabulator character. |
#
"\x0b"
|
string |
FF
Form feed (new page) character. |
#
"\x0c"
|
string |
CR
Carriage return character. |
#
"\x0d"
|
string |
SO
Shift out character. |
#
"\x0e"
|
string |
SI
Shift in character. |
#
"\x0f"
|
string |
DLE
Data link escape character. |
#
"\x10"
|
string |
NAK
Negative acknowledge character. |
#
"\x15"
|
string |
SYN
Synchronous idle character. |
#
"\x16"
|
string |
ETB
End of transmission block character. |
#
"\x17"
|
string |
CAN
Cancel character. |
#
"\x18"
|
string |
EM
End of medium character. |
#
"\x19"
|
string |
SUB
Substitute character. |
#
"\x1a"
|
string |
ESC
Escape character. |
#
"\x1b"
|
string |
FS
File separator character. |
#
"\x1c"
|
string |
GS
Group separator character. |
#
"\x1d"
|
string |
RS
Record separator character. |
#
"\x1e"
|
string |
US
Unit separator character. |
#
"\x1f"
|
string |
DEL
Delete character. |
#
"\x7f"
|
string |
CRLF
Carriage return and line feed (CRLF) character sequence. |
#
"\x0d\x0a"
|
integer |
NoQuotesFlag
Disables encoding of quotation marks. |
#
0
|
integer |
SingleQuotesFlag
Encodes single quotation marks. |
#
1
|
integer |
DoubleQuotesFlag
Encodes double quotation marks. |
#
2
|
integer |
LeftPadFlag
Pad string from the left. |
#
1
|
integer |
RightPadFlag
Pad string from the right. |
#
2
|
integer |
TruncatePadFlag
Allow truncation of string. |
#
4
|
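The quote flags and pad flags above are small integers meant to be combined with bitwise OR. Their values overlap between the two groups, so each group is only meaningful for its own method. The sketch below shows how such flags might be interpreted. It is a byte-oriented illustration, not the library's xmlSpecials() or pad(): the function and constant names are hypothetical, the default of encoding both quote types, centring when both pad flags are set, right padding when neither is set, and the omission of a truncation marker are all assumptions.

```php
<?php
// Values copied from the constants summary; names are local to this sketch.
const NO_QUOTES = 0;
const SINGLE_QUOTES = 1;
const DOUBLE_QUOTES = 2;

const LEFT_PAD = 1;
const RIGHT_PAD = 2;
const TRUNCATE_PAD = 4;

function xmlSpecialsSketch($string, $quotes = null)
{
    if ($quotes === null) {
        $quotes = SINGLE_QUOTES | DOUBLE_QUOTES; // assumed default
    }
    $map = array('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    if ($quotes & DOUBLE_QUOTES) {
        $map['"'] = '&quot;';
    }
    if ($quotes & SINGLE_QUOTES) {
        $map["'"] = '&apos;';
    }
    // strtr() never rescans its own output, so "&amp;" is not encoded twice.
    return strtr($string, $map);
}

function padSketch($string, $length, $padding, $flags)
{
    if (strlen($string) > $length) {
        // Only cut the string when truncation is explicitly allowed.
        return ($flags & TRUNCATE_PAD) ? substr($string, 0, $length) : $string;
    }
    $left = ($flags & LEFT_PAD) !== 0;
    $right = ($flags & RIGHT_PAD) !== 0;
    if ($left && $right) {
        $type = STR_PAD_BOTH;  // assumption: both flags centre the string
    } elseif ($left) {
        $type = STR_PAD_LEFT;  // padding is added on the left
    } else {
        $type = STR_PAD_RIGHT; // padding is added on the right (also the fallback)
    }
    return str_pad($string, $length, $padding, $type);
}

echo xmlSpecialsSketch('a < "b" & \'c\'', DOUBLE_QUOTES), "\n"; // a &lt; &quot;b&quot; &amp; 'c'
echo padSketch('42', 5, '0', LEFT_PAD), "\n";                   // 00042
echo padSketch('abcdefgh', 5, ' ', RIGHT_PAD | TRUNCATE_PAD), "\n"; // abcde
```

Testing each bit separately, rather than comparing the whole flag value, is what lets callers combine options such as padding direction and truncation in one argument.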
Constants inherited from Core\Object
AnyParameterType
,
AutochainParameterType
,
AutocreateParameterType
,
BooleanParameterType
,
CharParameterType
,
EnumParameterType
,
IntegerParameterType
,
RealParameterType
,
StringParameterType
,
UserParameterType
Properties summary
protected static
|
$_activeInstance
Reference to globally active instance. |
|
protected static
array
|
$_aliasMap
Mapping of aliases to normalised driver names. |
|
protected static
array
|
$_instances
Cache of initialised character set instances. |
|
protected
string
|
$_v_displayName
Display name of character set. Must be set by each driver. |
|
protected
string
|
$_v_name
Fully resolved character set name, based on IANA assignment. |
Magic properties
public read-only
string
|
$displayName
Display name of the character set instance. The display name itself is not a valid character set name or character set alias name. Its sole purpose is to provide a well-known display name for the character set to the user. |
public read-only
string
|
$name
Fully resolved name of the character set instance. Character set names are based on the default IANA-assigned names and do not rely on the underlying operating system, whose names may differ. |