Strings
A purely static String Helper to handle more advanced utf8 string manipulations.
Prerequisites
Just like Utf8
, Strings
requires mb_string, ext-intl is auto detected and used when available for UTF8 Normalization.
Methods
Signature | Description |
---|---|
filter(string $string):string |
Drops Zero Width white chars, normalizes EOL and Normalize UTF8 if ext-intl is available |
singleWsIze(string $string, bool $normalize = false, bool $includeTabs = true):string |
Replace repeated white-spaces to a single one, preserve original white-spaces unless normalized (every white-spaces to ' '), with or without tabs (\t) |
singleLineIze(string $string):string |
Make string fit in one line by replacing EOLs and white-spaces to normalized single white-spaces |
dropZwWs(string $string):string |
Remove Zero Width white-spaces |
normalizeWs(string $string, bool $includeTabs = true, int $maxConsecutive = null):string |
Normalize white-spaces to a single ' ' by default, include tabs by default |
normalizeEol($string, $maxConsecutive = null, $eol = null):string |
Normalize EOLs to a single LF by default |
normalizeText(string $text):string |
Return trim 'd and filter 'd $text |
normalizeTitle(string $title):string |
Return singleLineIze 'd and normalizeText 'd $title |
normalizeName(string $name):string |
Return ucword 'd and normalizeTitle 'd $name ("john \n\t doe " -> "John Doe" ) |
escape(string $string, int $flag = ENT_COMPAT, bool $hardEscape = true):string |
htmlspecialchars() wrapper with UTF8 set as encoding |
softEscape(string $string, int $flag = ENT_COMPAT):string |
Shortcut for escape(string $string, $flag, true) |
unEscape(string $string, int $quoteStyle = ENT_COMPAT):string |
htmlspecialchars_decode() wrapper |
convert(string $string, string $from = null, string $to = self::ENCODING):string |
Convert encoding to UTF8 by default. Basic $from encoding detection using Strings::detectEncoding() |
detectEncoding(string $string):string/null |
Detect encoding by checking Utf8::isUf8() , then trying with BOMs and ultimately fall back to mb_detect_encoding() with limited charsets first, then more internally in mb_convert_encoding() |
secureCompare(string $test, string $reference):bool |
Perform a Timing Attack safe string comparison (Truly constant operations comparison) |
contentHash(string $content):string |
Return a sha256 hash of the $content prefixed with $content length. Indented to quickly and reliably detect $content updates. |
White-spaces
White-spaces is a not so trivial matter, Strings
defines to classes of white-spaces :
- Zero width white-spaces:
/**
* U+200B zero width space
* U+FEFF zero width no-break space
*/
const ZERO_WIDTH_WS_CLASS = '\x{200B}\x{FEFF}';
- Non standard white-spaces:
/**
* U+00A0 no-break space
* U+2000 en quad
* U+2001 em quad
* U+2002 en space
* U+2003 em space
* U+2004 three-per-em space
* U+2005 four-per-em space
* U+2006 six-per-em space
* U+2007 figure space
* U+2008 punctuation space
* U+2009 thin space
* U+200A hair space
* U+202F narrow no-break space
* U+3000 ideographic space
*/
const NON_STANDARD_WS_CLASS = '\x{00A0}\x{2000}-\x{200A}\x{202F}\x{3000}';
Zero width white-spaces do not include Joiners because the idea is to remove text formatting, not to transform input text. Non standard white-spaces are also pretty specific to just match actual white-spaces and nothing more when removing / normalizing white-spaces.