xplo.re Medusa Core Framework 3.1

Class Charset

Provides support for handling different kinds of character sets: a basic set of string functions, conversion methods and character constants.

Instances, Names and Aliases

Character set instances are treated as singletons and automatically buffered for reduced memory usage.

The name of a character set is its IANA-assigned name, using the preferred MIME name if one is available. Because many variant names exist, often differing only slightly in typography, several aliases are defined for the supported character sets. On each request, an alias is resolved to the IANA-assigned name of the character set, on which the actual character set driver is based. Aliases can either be registered programmatically via the Core\Charset base class or defined as modules that consist of a single alias registration call:

// Example alias module mapping ANSI_X3.4-1968 to the ASCII character set.
namespace Core\Charset\Alias;

use Core\Charset;
use Core\Loader;

Loader::registerModule('1.0');

Charset::registerAlias('ansi_x3.4-1968', 'ascii');
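The alias resolution and instance buffering described above can be illustrated with a small self-contained sketch. It is not the Core\Charset implementation: the class name CharsetRegistrySketch is hypothetical, a plain stdClass object stands in for a character set driver, and names are assumed to be matched case-insensitively. Unlike the documented alias(), the sketch returns unknown names unchanged instead of null, because it has no list of installed drivers.

```php
<?php
// Hypothetical sketch of alias resolution and buffered instances;
// not the Core\Charset source.
final class CharsetRegistrySketch
{
    /** Lower-case alias => lower-case target name. */
    private static $aliasMap = array();

    /** Canonical name => shared instance. */
    private static $instances = array();

    public static function registerAlias($alias, $targetName)
    {
        // Later registrations overwrite earlier ones.
        self::$aliasMap[strtolower($alias)] = strtolower($targetName);
    }

    public static function alias($name)
    {
        $name = strtolower($name);
        $seen = array();
        // Follow alias chains; $seen guards against cycles.
        while (isset(self::$aliasMap[$name]) && !isset($seen[$name])) {
            $seen[$name] = true;
            $name = self::$aliasMap[$name];
        }
        return $name;
    }

    public static function get($name)
    {
        $canonical = self::alias($name);
        if (!isset(self::$instances[$canonical])) {
            // A real implementation would presumably load the driver class here.
            $instance = new stdClass();
            $instance->name = $canonical;
            self::$instances[$canonical] = $instance;
        }
        return self::$instances[$canonical];
    }
}

CharsetRegistrySketch::registerAlias('ANSI_X3.4-1968', 'ascii');
CharsetRegistrySketch::registerAlias('US-ASCII', 'ascii');

$a = CharsetRegistrySketch::get('ansi_x3.4-1968');
$b = CharsetRegistrySketch::get('US-ASCII');
var_dump($a === $b); // bool(true)
echo $a->name, "\n"; // ascii
```

Both alias spellings resolve to the same buffered object, which is the singleton behaviour the section describes.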

Character Conversion

Character conversion is based on UTF-8, the Internet standard encoding, and on UCS-2, which represents each character of the Basic Multilingual Plane (BMP) as a fixed 16-bit integer and thereby allows fast conversion.

Almost all traditional character sets have their code points mapped within the Unicode BMP. They can therefore convert to and from Unicode using lookup tables that map each code point to its corresponding UCS-2 character, which is then easily encoded in or decoded from UTF-8.
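As a hedged illustration of the lookup-table approach, the following sketch maps ISO-8859-15 bytes to UCS-2 code points. The function name iso885915ToUcs2 is hypothetical and the framework's own driver tables may be organised differently. The eight table entries are the positions where ISO-8859-15 differs from ISO-8859-1, whose bytes otherwise coincide with U+0000 to U+00FF. Encoding the resulting code points as UTF-8 is the separate step performed by convertUCS2ToUTF8().

```php
<?php
// Hypothetical lookup-table step for ISO-8859-15 -> UCS-2.
function iso885915ToUcs2($string)
{
    // Only the eight positions where ISO-8859-15 differs from ISO-8859-1
    // need an entry; every other byte maps 1:1 onto U+0000..U+00FF.
    static $table = array(
        0xA4 => 0x20AC, // EURO SIGN
        0xA6 => 0x0160, // LATIN CAPITAL LETTER S WITH CARON
        0xA8 => 0x0161, // LATIN SMALL LETTER S WITH CARON
        0xB4 => 0x017D, // LATIN CAPITAL LETTER Z WITH CARON
        0xB8 => 0x017E, // LATIN SMALL LETTER Z WITH CARON
        0xBC => 0x0152, // LATIN CAPITAL LIGATURE OE
        0xBD => 0x0153, // LATIN SMALL LIGATURE OE
        0xBE => 0x0178, // LATIN CAPITAL LETTER Y WITH DIAERESIS
    );
    $codePoints = array();
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($string[$i]);
        $codePoints[] = isset($table[$byte]) ? $table[$byte] : $byte;
    }
    return $codePoints;
}

$codePoints = iso885915ToUcs2("5 \xA4");
echo implode(' ', array_map(function ($cp) {
    return sprintf('U+%04X', $cp);
}, $codePoints)), "\n"; // U+0035 U+0020 U+20AC
```

The reverse direction would use the inverted table, with unmapped code points reported as not representable.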

Core\Object implements Core\Chainable
Extended by Core\Set implements Core\Accessor
Extended by Core\Charset

Direct known subclasses

Core\Charset\Driver\ascii, Core\Charset\MultiByte

Indirect known subclasses

Core\Charset\Driver\iso88591, Core\Charset\Driver\iso885915, Core\Charset\Driver\utf8, Core\Charset\Driver\macroman, Core\Charset\Driver\sjis, Core\Charset\Driver\utf16, Core\Charset\Driver\utf16be, Core\Charset\Driver\utf16le, Core\Charset\Driver\utf32, Core\Charset\Driver\utf32be, Core\Charset\Driver\utf32le

Abstract
Namespace: Core
Package: Core\Charset
See: https://en.wikipedia.org/wiki/Character_encoding
See: http://www.iana.org/assignments/character-sets/character-sets.xhtml
Since: 3.0
Requires: PHP 5.3.2
Version: 1.4.1
Located at Charset.inc.php

Methods summary

public static Core\Charset activate( string|Core\Charset $charset )

Loads and activates a named character set.

The character set name activated by this function may differ from the requested character set name due to alias mappings. To retrieve the fully resolved name, call Core\Charset::active() to get the active character set and query the name property.

Parameters

$charset

Name of the character set to load and activate, or a Core\Charset instance to activate.

Returns

Core\Charset
Reference to activated character set instance.

See

Core\Charset::active()

Since

3.0

public static Core\Charset active( )

Returns the currently active global Core\Charset instance.

Returns

Core\Charset
Reference to currently active global Core\Charset instance.

See

Core\Charset::activate()

Since

3.0

public static string alias( string $name )

Returns the normalised character set name of an alias.

To perform alias expansion this method may load alias modules or character set drivers to retrieve the normalised name.

Parameters

$name
Name of character set alias to return normalised name for.

Returns

string

Normalised character set name, or null if the character set alias is unknown. If the provided name is not an alias, that name is returned.

See

Core\Charset::activate()
load()

Since

3.0

public static string[] all( )

Loads all character set drivers and returns a list of their normalised names (the IANA-registered names). Drivers that are not supported on the current platform, e.g. due to missing extensions, are skipped automatically.

Because a complete list requires loading every character set driver, calling this method may put extra load on the server. Applications that need the list frequently should cache the return value between application instances. Within one instance the return value is cached in memory, so repeated calls to this method are safe.

Returns

string[]
Array of all available character set names.

Since

3.0

public static string convertUCS2ToUTF8( integer $wchar )

Converts a single UCS-2 (16-bit integer) character to the corresponding UTF-8 byte sequence.

Parameters

$wchar
Unsigned 16-bit integer UCS-2 character to convert.

Returns

string
UTF-8 byte sequence of the converted UCS-2 character.

Since

3.1
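The encoding rule behind convertUCS2ToUTF8() follows RFC 3629. A minimal sketch, assuming a plain integer input: the function name ucs2ToUtf8 is hypothetical, it returns null for values outside 0 to 0xFFFF (the documented method is declared to return a string), and it does not special-case the surrogate range U+D800 to U+DFFF.

```php
<?php
// Sketch of the UCS-2 -> UTF-8 encoding rule; not the framework's code.
function ucs2ToUtf8($wchar)
{
    if ($wchar < 0 || $wchar > 0xFFFF) {
        return null; // not a UCS-2 value
    }
    if ($wchar < 0x80) {            // U+0000..U+007F: 0xxxxxxx
        return chr($wchar);
    }
    if ($wchar < 0x800) {           // U+0080..U+07FF: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($wchar >> 6))
            . chr(0x80 | ($wchar & 0x3F));
    }
    // U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xE0 | ($wchar >> 12))
        . chr(0x80 | (($wchar >> 6) & 0x3F))
        . chr(0x80 | ($wchar & 0x3F));
}

echo bin2hex(ucs2ToUtf8(0x41)), "\n";   // 41
echo bin2hex(ucs2ToUtf8(0xE9)), "\n";   // c3a9
echo bin2hex(ucs2ToUtf8(0x20AC)), "\n"; // e282ac
```

Three bytes always suffice for a BMP character, which matches the remark on maximum UTF-8 length further below.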

public static integer|null convertUTF8ToUCS2( string $character, integer &$byteLength = null )

Converts a single UTF-8 character byte sequence to the corresponding UCS-2 (16-bit integer) character.

If more than one UTF-8 character is provided, conversion stops after the first character; additional characters are ignored.

UCS-2 only covers the Unicode BMP. Characters outside the BMP cannot be represented in UCS-2 and therefore cannot be converted.

Parameters

$character
Single UTF-8 character byte sequence to convert.
$byteLength

Optional reference to a variable that receives the length in bytes of the converted UTF-8 character. This is useful for obtaining the byte increment required to iterate through a UTF-8 string for full UCS-2 conversion. The length is always set, even if conversion to UCS-2 is not possible.

Returns

integer|null

16-bit integer UCS-2 character on success; null if the provided byte sequence is empty, is not a valid UTF-8 byte sequence, or encodes a character outside the Unicode BMP.

Remark

The maximum length of a Unicode BMP character encoded in UTF-8 is 3 bytes. Full Unicode coverage, up to the last code point U+10FFFF, requires 4 bytes. Originally, UTF-8 allowed encoding characters with code points up to U+7FFFFFFF using up to six bytes, but RFC 3629 restricted it in 2003 to U+10FFFF to match the constraints of the UTF-16 character encoding.

See

https://tools.ietf.org/html/rfc3629

Since

3.1
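The decoding direction, including the byte length used to iterate through a string, can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: utf8ToUcs2 is a hypothetical name, 4-byte sequences are only skipped (their continuation bytes are not validated), and an invalid continuation byte sets the length to the offset of the offending byte so that iteration resynchronises there.

```php
<?php
// Hypothetical sketch: decode the first UTF-8 character to a UCS-2 value.
function utf8ToUcs2($character, &$byteLength = null)
{
    $byteLength = 0;
    if ($character === '') {
        return null;
    }
    $b0 = ord($character[0]);
    if ($b0 < 0x80) {
        $byteLength = 1;
        return $b0;
    }
    if ($b0 >= 0xC2 && $b0 <= 0xDF) {
        $byteLength = 2; $min = 0x80;  $cp = $b0 & 0x1F;
    } elseif ($b0 >= 0xE0 && $b0 <= 0xEF) {
        $byteLength = 3; $min = 0x800; $cp = $b0 & 0x0F;
    } elseif ($b0 >= 0xF0 && $b0 <= 0xF4) {
        $byteLength = 4; // valid lead byte, but the character is outside the BMP
        return null;
    } else {
        $byteLength = 1; // stray continuation byte or invalid lead byte
        return null;
    }
    if (strlen($character) < $byteLength) {
        return null; // truncated sequence
    }
    for ($i = 1; $i < $byteLength; $i++) {
        $b = ord($character[$i]);
        if (($b & 0xC0) !== 0x80) {
            $byteLength = $i; // resynchronise at the offending byte
            return null;
        }
        $cp = ($cp << 6) | ($b & 0x3F);
    }
    if ($cp < $min || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null; // overlong form or UTF-16 surrogate
    }
    return $cp;
}

// Iterate through a UTF-8 string using the reported byte length.
$string = "A\xC3\xA9\xE2\x82\xAC"; // "Aé€" in UTF-8
$offset = 0;
$out = array();
while ($offset < strlen($string)) {
    $cp = utf8ToUcs2(substr($string, $offset, 4), $length);
    $out[] = $cp === null ? '?' : sprintf('U+%04X', $cp);
    $offset += $length;
}
echo implode(' ', $out), "\n"; // U+0041 U+00E9 U+20AC
```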

public static Core\Charset get( string $name )

Loads and returns a character set instance by name, performing alias expansion as necessary. Instances are automatically buffered; subsequent queries for the same character set return the same instance.

Parameters

$name
Name or alias name of character set.

Returns

Core\Charset
Instance of requested character set.

Throws

Core\Exception\InvalidArgument
The provided character set name or alias name is invalid.

See

Core\Charset::activate()
Core\Charset::active()
Core\Charset::alias()

Since

3.0

public static Core\Charset\Iterator iterator( )

Returns a new iterator over all available character sets.

Returns

Core\Charset\Iterator
Character set iterator instance.

See

Core\Charset::all()

Since

3.0

public static registerAlias( string $alias, string $targetName )

Defines a new character set alias. Aliases may be assigned manually or implicitly through character set alias modules.

Overwriting existing aliases is allowed. However, defining an alias whose name is a fully resolved character set name is discouraged and may cause problems with external libraries: several system functions depend on correct character set names and fail if an invalid character set name is supplied instead.

Parameters

$alias
Character set alias name to define.
$targetName
Name of target character set.

Since

3.0

public string convertFrom( Core\Charset $charset, string $string )

Converts a string encoded in a given character set to the character set of this instance.

Parameters

$charset
Character set of the source string.
$string
String encoded in provided character set.

Returns

string
Converted string in character set of this instance.

Since

3.1

public string convertFromActive( string $string )

Converts a string encoded in the globally active character set to the character set of this instance.

Parameters

$string
String encoded in globally active character set.

Returns

string
Converted string in character set of this instance.

Since

3.1

abstract public string convertFromUTF8( string $string )

Converts a string encoded in UTF-8 to the current character set.

Parameters

$string
UTF-8 encoded source string.

Returns

string
Converted string in character set of current instance.

Since

3.1

public string convertTo( Core\Charset $charset, string $string )

Converts a string encoded in the character set of this instance into a given character set.

Parameters

$charset
Character set to convert source string to.
$string
String encoded in this character set.

Returns

string
String converted from this character set to provided character set.

Since

3.1

public string convertToActive( string $string )

Converts a string encoded in the character set of this instance into the globally active character set.

Parameters

$string
String encoded in this character set.

Returns

string

String converted from this character set to globally active character set.

Since

3.1

abstract public string convertToUTF8( string $string )

Converts a string encoded in the character set of this instance to UTF-8.

Parameters

$string
String encoded in this character set.

Returns

string
String converted from this character set to UTF-8.

Since

3.1

public string format( string $format, mixed $_ = null )

Character set specific variant of the standard sprintf() function.

Parameters

$format
Format string.
$_
Optional. Additional arguments according to format string.

Returns

string
Formatted string.

See

Core\Charset::vformat()
http://php.net/sprintf

Since

3.1
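One likely reason for a character set specific format() is that PHP's own sprintf() applies field widths in bytes rather than characters. The snippet below demonstrates this with a UTF-8 string and adds a hypothetical helper, utf8PadLeft, that pads by characters for the single case of right alignment; it does not reproduce the framework's format() implementation.

```php
<?php
// Illustration: sprintf() applies field widths to bytes, not characters.
$e = "\xC3\xA9";                      // "é" in UTF-8: 1 character, 2 bytes
$padded = sprintf('%5s', $e);         // padded to 5 bytes
echo strlen($padded), " bytes, ",
    preg_match_all('/./us', $padded, $m), " characters\n"; // 5 bytes, 4 characters

// Hypothetical character-aware replacement for the '%5s' case only.
function utf8PadLeft($string, $width)
{
    // Count UTF-8 characters with a /u pattern instead of using strlen().
    $missing = $width - preg_match_all('/./us', $string, $matches);
    return ($missing > 0 ? str_repeat(' ', $missing) : '') . $string;
}

echo '[', utf8PadLeft($e, 5), "]\n";  // [    é]
```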

public string hash( )

Returns a hash value that uniquely identifies the character set.

Returns

string
Hash value of character set.

Since

3.0

Overrides

Core\Object::hash

abstract public len( $string )

Calculates the length in characters of a given string.

Parameters

$string
String to calculate the length in characters of.

Since

3.0

abstract public string pad( string $string, integer $length, string $padding, integer $flags, string $truncationMarker = null )

Fits a string to a desired length by padding or truncation. Padding uses characters from an optional padding string.

Parameters

$string
Source string to fit desired length.
$length

Desired length in characters. If negative or lower than the length of the source string, the string is truncated; if equal to the length of the source string, it is returned unchanged.

$padding

String used as padding. If required padding is shorter than this string, only the leading part will be used.

$flags
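Padding flags: a combination of Core\Charset::LeftPadFlag, Core\Charset::RightPadFlag and Core\Charset::TruncatePadFlag.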
$truncationMarker

Optional marker string appended to the source string if truncated. If not defined or null, no truncation marker is used.

Returns

string
Fitted (padded or truncated) variant of the source string.

Since

3.1

public integer sizeof( string $string )

Returns the size of a string in octets (not characters).

Parameters

$string
Binary string to return length in octets for.

Returns

integer
Length in octets of source string.

Since

3.0

abstract public string sub( string $string, integer $from, integer $length = null )

Returns a substring of a given string.

Parameters

$string
Source string to return substring from.
$from

Zero-based start position of the substring to return. If negative, the start position is calculated from the end of the string. If the start position equals or exceeds the length of the string, false is returned instead.

$length

Optional maximum length of the substring to return. If negative, that many characters are omitted from the end of the string (i.e. the substring ends the given number of characters before the end of the string). If omitted or null, the remainder of the string is returned. A value of false or 0 returns an empty string.

Returns

string
Substring extracted from the source string, or false if the start position is out of range.

Since

3.0
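The rules for $from and $length can be checked against a small UTF-8 sketch. The function utf8Sub is hypothetical and splits the string into characters with a PCRE /u pattern; it assumes valid UTF-8 input and relies on PHP's array_slice(), which treats a negative length and a null length the same way the parameters above are documented.

```php
<?php
// Hypothetical sketch of character-based substring rules for UTF-8.
function utf8Sub($string, $from, $length = null)
{
    // The /u modifier splits between UTF-8 characters; false on invalid UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false;
    }
    $count = count($chars);
    if ($from < 0) {
        $from = max(0, $count + $from); // count from the end
    }
    if ($from >= $count) {
        return false; // start position out of range
    }
    if ($length === false || $length === 0) {
        return '';
    }
    // array_slice(): negative length stops that many elements before the
    // end, null length means "up to the end".
    return implode('', array_slice($chars, $from, $length));
}

$s = "Gr\xC3\xBC\xC3\x9Fe";     // "Grüße": 5 characters, 7 bytes
echo utf8Sub($s, 2), "\n";      // üße
echo utf8Sub($s, -3, 2), "\n";  // üß
echo utf8Sub($s, 1, -1), "\n";  // rüß
var_dump(utf8Sub($s, 5));       // bool(false)
```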

abstract public string toLower( string $string )

Transforms a given string to lowercase based on Unicode character properties with respect to this character set.

Parameters

$string
String to transform to lowercase.

Returns

string
Transformed lowercase variant of source string.

Since

3.1

abstract public string toUpper( string $string )

Transforms a given string to uppercase based on Unicode character properties with respect to this character set.

Parameters

$string
String to transform to uppercase.

Returns

string
Transformed uppercase variant of source string.

Since

3.1

abstract public string truncate( string $string, integer $start, integer $width, string $marker = null )

Truncates a string based on the desired visual width of the string.

Parameters

$string
String to truncate.
$start
Number of leading characters that are never truncated.
$width
The desired (visible) width of the remainder after the start position.
$marker
Optional marker string appended to the string on truncation.

Returns

string
String with at most start offset + width visible characters.

Since

3.1

abstract public integer width( string $string )

Returns the visible length of a given string.

The width of a character is a fixed Unicode property: some characters, such as CJK ideographs, are twice as wide as standard-width characters, whereas control characters have no width.

Parameters

$string
String to return width of.

Returns

integer
Visible width of the string, in multiples of the standard character width.

Since

3.1
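As a rough sketch of a width calculation, the following function assigns width 0 to control characters, 2 to a subset of the East Asian wide and fullwidth ranges used by Markus Kuhn's public-domain wcwidth(), and 1 to everything else. The names codePointWidth and ucs2Width are hypothetical, the input is assumed to be a list of UCS-2 code points (for example obtained with convertUTF8ToUCS2()), and combining marks are not treated as zero-width as a complete implementation would.

```php
<?php
// Hypothetical sketch: visible width of UCS-2 code points.
function codePointWidth($cp)
{
    if ($cp < 0x20 || ($cp >= 0x7F && $cp < 0xA0)) {
        return 0; // C0 controls, DEL and C1 controls take no space
    }
    static $wide = array(
        array(0x1100, 0x115F), // Hangul Jamo leading consonants
        array(0x2E80, 0x303E), // CJK radicals .. CJK symbols and punctuation
        array(0x3040, 0xA4CF), // Hiragana .. Yi radicals (incl. CJK ideographs)
        array(0xAC00, 0xD7A3), // Hangul syllables
        array(0xF900, 0xFAFF), // CJK compatibility ideographs
        array(0xFF00, 0xFF60), // Fullwidth forms
        array(0xFFE0, 0xFFE6), // Fullwidth signs
    );
    foreach ($wide as $range) {
        if ($cp >= $range[0] && $cp <= $range[1]) {
            return 2;
        }
    }
    return 1;
}

function ucs2Width(array $codePoints)
{
    return array_sum(array_map('codePointWidth', $codePoints));
}

// "A", two CJK ideographs and a line feed: 1 + 2 + 2 + 0
echo ucs2Width(array(0x41, 0x65E5, 0x672C, 0x0A)), "\n"; // 5
```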

public string vformat( string $format, array $argv )

Character set specific variant of the standard vsprintf() function.

Parameters

$format
Format string.
$argv
Array of arguments according to format string.

Returns

string
Formatted string.

See

Core\Charset::format()
http://php.net/vsprintf

Since

3.1

abstract public string xmlSpecials( $string, integer $quotes = null )

Encodes all special XML characters in the provided string with respect to this character set instance.

Parameters

$string
String to encode.
$quotes

Optional bit mask to control encoding of quotation marks. Allows a combination of the following flags:

  • Core\Charset::NoQuotesFlag

    Default. Quotation marks are not encoded. Ignored if combined with other flags.

  • Core\Charset::SingleQuotesFlag

    Encodes single quotation marks as their respective XML entities.

  • Core\Charset::DoubleQuotesFlag

    Encodes double quotation marks as their respective XML entities.

Returns

string

String where all special XML characters were replaced by their corresponding XML entity.

Since

3.1
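A byte-level sketch of the quote flags follows, assuming the entities &apos; and &quot; for single and double quotation marks. The function xmlSpecialsSketch and the PHP constants are hypothetical stand-ins for the method and the Core\Charset flag constants (same values as in the constants summary). Byte replacement like this is only safe for ASCII-compatible encodings such as UTF-8; in UTF-16, for instance, '<' is stored as two bytes, which is presumably one reason the method is implemented per character set.

```php
<?php
// Flag values copied from the constants summary on this page.
const NO_QUOTES_FLAG = 0;
const SINGLE_QUOTES_FLAG = 1;
const DOUBLE_QUOTES_FLAG = 2;

// Hypothetical byte-level encoder; only safe for ASCII-compatible encodings.
function xmlSpecialsSketch($string, $quotes = null)
{
    $quotes = (int) $quotes; // null behaves like NO_QUOTES_FLAG
    $map = array('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    if ($quotes & SINGLE_QUOTES_FLAG) {
        $map["'"] = '&apos;';
    }
    if ($quotes & DOUBLE_QUOTES_FLAG) {
        $map['"'] = '&quot;';
    }
    // strtr() replaces in a single pass, so inserted entities are never re-encoded.
    return strtr($string, $map);
}

$input = 'a < "b" & \'c\'';
echo xmlSpecialsSketch($input, NO_QUOTES_FLAG), "\n";
// a &lt; "b" &amp; 'c'
echo xmlSpecialsSketch($input, SINGLE_QUOTES_FLAG | DOUBLE_QUOTES_FLAG), "\n";
// a &lt; &quot;b&quot; &amp; &apos;c&apos;
```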

protected _restoreEnvironment( )

Called when the currently active character set is replaced. The active character set should restore the environment to the state it was in before its own setup.

protected _setupEnvironment( )

Called when the character set is activated.
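The hand-over between _restoreEnvironment() and _setupEnvironment() during activation might look like the following sketch. CharsetSketch is a hypothetical class that merely records the calls; the order shown (restore the outgoing character set, then set up the incoming one, and do nothing when re-activating the active set) is an assumption derived from the descriptions above, not taken from the framework source.

```php
<?php
// Hypothetical sketch of the activation hand-over.
class CharsetSketch
{
    public static $events = array();
    private static $active = null;
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public static function activate(CharsetSketch $charset)
    {
        if (self::$active !== $charset) {
            if (self::$active !== null) {
                // The outgoing character set undoes its changes first.
                self::$active->restoreEnvironment();
            }
            self::$active = $charset;
            $charset->setupEnvironment();
        }
        return $charset;
    }

    public static function active()
    {
        return self::$active;
    }

    protected function setupEnvironment()
    {
        self::$events[] = 'setup ' . $this->name;
    }

    protected function restoreEnvironment()
    {
        self::$events[] = 'restore ' . $this->name;
    }
}

$ascii = new CharsetSketch('ascii');
$utf8 = new CharsetSketch('utf-8');
CharsetSketch::activate($ascii);
CharsetSketch::activate($utf8);
CharsetSketch::activate($utf8); // already active: no further calls
echo implode(', ', CharsetSketch::$events), "\n";
// setup ascii, restore ascii, setup utf-8
```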

Methods inherited from Core\Set

__get(), __isset()

Methods inherited from Core\Object

__autocreateFactory(), __call(), __processParameters(), __toString(), attachMethod(), chain(), getValueForKey(), getValueForKeyPath(), getValueForUndefinedKey(), issetValueForKey(), setValueForKey(), setValueForKeyPath(), setValueForUndefinedKey(), uuid()

Constants summary

string NUL = "\x00"

NUL character.

Since

3.0

string SOH = "\x01"

Start of heading character.

Since

3.0

string STX = "\x02"

Start of text character.

Since

3.0

string ETX = "\x03"

End of text character.

Since

3.0

string EOT = "\x04"

End of transmission character.

Since

3.0

string ENQ = "\x05"

Enquiry character.

Since

3.0

string ACK = "\x06"

Acknowledge character.

Since

3.0

string BEL = "\x07"

Bell character.

Since

3.0

string BS = "\x08"

Backspace character.

Since

3.0

string TAB = "\x09"

Horizontal tabulator character.

Since

3.0

string LF = "\x0a"

Line feed (new line) character.

Since

3.0

string VT = "\x0b"

Vertical tabulator character.

Since

3.0

string FF = "\x0c"

Form feed (new page) character.

Since

3.0

string CR = "\x0d"

Carriage return character.

Since

3.0

string SO = "\x0e"

Shift out character.

Since

3.0

string SI = "\x0f"

Shift in character.

Since

3.0

string DLE = "\x10"

Data link escape character.

Since

3.0

string NAK = "\x15"

Negative acknowledge character.

Since

3.0

string SYN = "\x16"

Synchronous idle character.

Since

3.0

string ETB = "\x17"

End of transmission block character.

Since

3.0

string CAN = "\x18"

Cancel character.

Since

3.0

string EM = "\x19"

End of medium character.

Since

3.0

string SUB = "\x1a"

Substitute character.

Since

3.0

string ESC = "\x1b"

Escape character.

Since

3.0

string FS = "\x1c"

File separator character.

Since

3.0

string GS = "\x1d"

Group separator character.

Since

3.0

string RS = "\x1e"

Record separator character.

Since

3.0

string US = "\x1f"

Unit separator character.

Since

3.0

string DEL = "\x7f"

Delete character.

Since

3.0

string CRLF = "\x0d\x0a"

Carriage return and line feed (CRLF) character sequence.

Since

3.0

integer NoQuotesFlag = 0

Disables encoding of quotation marks.

Since

3.1

integer SingleQuotesFlag = 1

Encodes single quotation marks.

Since

3.1

integer DoubleQuotesFlag = 2

Encodes double quotation marks.

Since

3.1

integer LeftPadFlag = 1

Pad string from the left.

Since

3.1

integer RightPadFlag = 2

Pad string from the right.

Since

3.1

integer TruncatePadFlag = 4

Allow truncation of string.

Since

3.1

Constants inherited from Core\Object

AnyParameterType, AutochainParameterType, AutocreateParameterType, BooleanParameterType, CharParameterType, EnumParameterType, IntegerParameterType, RealParameterType, StringParameterType, UserParameterType

Properties summary

protected static Core\Charset $_activeInstance

Reference to the globally active instance.

Since

3.1

protected static array $_aliasMap

Mapping of aliases to normalised driver names.

Since

3.1

protected static array $_instances

Cache of initialised character set instances.

Since

3.1

protected string $_v_displayName

Display name of the character set. Must be set by each driver.

Since

3.1

protected string $_v_name

Fully resolved character set name, based on IANA assignment.

Since

3.0

Magic properties

public read-only string $displayName

Display name of the character set instance. The display name is neither a valid character set name nor a valid alias name; its sole purpose is to present a well-known name for the character set to the user.

public read-only string $name

Fully resolved name of the character set instance. Character set names are based on the default IANA-assigned names and do not depend on the underlying operating system, whose names may differ.

Magic properties inherited from Core\Object

$hash, $uuid

xplo.re Medusa Core Framework 3.1 API documentation generated by ApiGen