Package groovy.util
Class CharsetToolkit
java.lang.Object
    groovy.util.CharsetToolkit
public class CharsetToolkit
extends java.lang.Object

Utility class to guess the encoding of a given text file.

Unicode files encoded in UTF-16 (little or big endian), or UTF-8 files with a byte order mark (BOM), are correctly detected. For UTF-8 files with no BOM, the charset should also be detected if the buffer is large enough. A 4 KB byte buffer is used to guess the encoding.

Usage:

    CharsetToolkit toolkit = new CharsetToolkit(file);

    // guess the encoding
    Charset guessedCharset = toolkit.getCharset();

    // create a reader with the correct charset
    BufferedReader reader = toolkit.getReader();

    // read the file content
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
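The BOM-less UTF-8 case mentioned in the class description can be approximated by checking that every high-bit byte in the buffer belongs to a well-formed multi-byte sequence. The following is a minimal sketch of that idea under stated assumptions, not the actual CharsetToolkit implementation: the class Utf8GuessSketch and the method looksLikeUtf8 are hypothetical names, and the check is purely structural (it does not reject overlong encodings or out-of-range code points).

```java
// Hypothetical helper, not part of groovy.util.CharsetToolkit.
final class Utf8GuessSketch {

    /**
     * Returns true if the first 'length' bytes contain at least one multi-byte
     * sequence and every complete multi-byte sequence is structurally valid UTF-8.
     * A sequence cut off by the end of the buffer is ignored.
     */
    static boolean looksLikeUtf8(byte[] buffer, int length) {
        boolean sawMultiByte = false;
        int i = 0;
        while (i < length) {
            int b = buffer[i] & 0xFF;
            int continuationBytes;
            if (b < 0x80) {                   // 0xxxxxxx: plain 7-bit character
                i++;
                continue;
            } else if ((b & 0xE0) == 0xC0) {  // 110xxxxx: lead byte of a 2-byte sequence
                continuationBytes = 1;
            } else if ((b & 0xF0) == 0xE0) {  // 1110xxxx: lead byte of a 3-byte sequence
                continuationBytes = 2;
            } else if ((b & 0xF8) == 0xF0) {  // 11110xxx: lead byte of a 4-byte sequence
                continuationBytes = 3;
            } else {
                return false;                 // stray continuation byte or invalid lead byte
            }
            if (i + continuationBytes >= length) {
                break;                        // sequence truncated by the end of the buffer
            }
            for (int k = 1; k <= continuationBytes; k++) {
                if ((buffer[i + k] & 0xC0) != 0x80) {
                    return false;             // expected a 10xxxxxx continuation byte
                }
            }
            sawMultiByte = true;
            i += continuationBytes + 1;
        }
        return sawMultiByte;
    }
}
```

A buffer made only of 7-bit bytes yields false here, because nothing distinguishes it from US-ASCII; that case is what the enforce8Bit flag addresses.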
Constructor Summary

- CharsetToolkit(java.io.File file)
  Constructor of the CharsetToolkit utility class.
Method Summary

- static java.nio.charset.Charset[] getAvailableCharsets()
  Retrieves all the available Charsets on the platform, including the default charset.
- java.nio.charset.Charset getCharset()
- java.nio.charset.Charset getDefaultCharset()
  Retrieves the default Charset.
- static java.nio.charset.Charset getDefaultSystemCharset()
  Retrieves the default charset of the system.
- boolean getEnforce8Bit()
  Gets the enforce8Bit flag, which is set when a US-ASCII encoding should never be returned.
- java.io.BufferedReader getReader()
  Gets a BufferedReader (in fact a LineNumberReader) for the File passed to the CharsetToolkit constructor, using the detected charset, or the default charset if an 8-bit Charset is encountered.
- boolean hasUTF16BEBom()
  Returns whether the buffer has a byte order mark (BOM) for UTF-16 Big Endian (utf-16 and ucs-2).
- boolean hasUTF16LEBom()
  Returns whether the buffer has a byte order mark (BOM) for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
- boolean hasUTF8Bom()
  Returns whether the buffer has a byte order mark (BOM) for UTF-8 (used by Microsoft's Notepad and other editors).
- void setDefaultCharset(java.nio.charset.Charset defaultCharset)
  Defines the default Charset used in case the buffer represents an 8-bit Charset.
- void setEnforce8Bit(boolean enforce)
  If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII.
 
Method Detail

setDefaultCharset

public void setDefaultCharset(java.nio.charset.Charset defaultCharset)

Defines the default Charset used in case the buffer represents an 8-bit Charset.

Parameters:
- defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.
 
 - 
getCharset

public java.nio.charset.Charset getCharset()
 - 
setEnforce8Bit

public void setEnforce8Bit(boolean enforce)

If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII. The file might contain no special characters in the range 128-255 and still be, or later become, a file encoded with the default charset rather than US-ASCII.

Parameters:
- enforce - true to return the default charset instead of US-ASCII; false to allow US-ASCII.
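The effect of the flag can be pictured with a small stand-alone sketch. It models only the 7-bit versus 8-bit decision described above and ignores BOM and UTF-8 detection entirely; the class Enforce8BitSketch and the method choose are hypothetical names, not CharsetToolkit code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the enforce8Bit flag; not the CharsetToolkit source.
final class Enforce8BitSketch {

    /** Picks a charset for a buffer that is assumed not to be UTF-8 or UTF-16. */
    static Charset choose(byte[] buffer, boolean enforce8Bit, Charset defaultCharset) {
        for (byte b : buffer) {
            if ((b & 0x80) != 0) {
                // A byte in the range 128-255: the content cannot be US-ASCII.
                return defaultCharset;
            }
        }
        // Only 7-bit bytes were seen: US-ASCII unless the caller enforces 8-bit.
        return enforce8Bit ? defaultCharset : StandardCharsets.US_ASCII;
    }
}
```

With enforce8Bit set to true, a file that currently holds only 7-bit characters is still reported with the default charset, so later edits that introduce accented characters do not change the reported encoding.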
 
 - 
getEnforce8Bit

public boolean getEnforce8Bit()

Gets the enforce8Bit flag, which is set when a US-ASCII encoding should never be returned.

Returns:
- true if the default charset is returned in place of US-ASCII, false otherwise.
 
 - 
getDefaultCharset

public java.nio.charset.Charset getDefaultCharset()

Retrieves the default Charset.
 - 
getDefaultSystemCharset

public static java.nio.charset.Charset getDefaultSystemCharset()

Retrieves the default charset of the system.

Returns:
- the default Charset.
 
 - 
hasUTF8Bom

public boolean hasUTF8Bom()

Returns whether the buffer has a byte order mark (BOM) for UTF-8 (used by Microsoft's Notepad and other editors).

Returns:
- true if the buffer has a BOM for UTF-8.
 
 - 
hasUTF16LEBom

public boolean hasUTF16LEBom()

Returns whether the buffer has a byte order mark (BOM) for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).

Returns:
- true if the buffer has a BOM for UTF-16 Little Endian.
 
 - 
hasUTF16BEBom

public boolean hasUTF16BEBom()

Returns whether the buffer has a byte order mark (BOM) for UTF-16 Big Endian (utf-16 and ucs-2).

Returns:
- true if the buffer has a BOM for UTF-16 Big Endian.
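The three BOM checks come down to comparing the first bytes of the buffer with fixed signatures: EF BB BF for UTF-8, FF FE for UTF-16 Little Endian, and FE FF for UTF-16 Big Endian. The sketch below shows that comparison in isolation. The byte values are the standard Unicode signatures; the class BomSketch and its static methods are hypothetical and take the buffer as a parameter, whereas the CharsetToolkit methods operate on the toolkit's own buffer.

```java
// Hypothetical helper; only the BOM byte values are standard.
final class BomSketch {

    /** UTF-8 signature: EF BB BF. */
    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    /** UTF-16 Little Endian signature: FF FE. */
    static boolean hasUtf16LeBom(byte[] b) {
        return b.length >= 2
                && (b[0] & 0xFF) == 0xFF
                && (b[1] & 0xFF) == 0xFE;
    }

    /** UTF-16 Big Endian signature: FE FF. */
    static boolean hasUtf16BeBom(byte[] b) {
        return b.length >= 2
                && (b[0] & 0xFF) == 0xFE
                && (b[1] & 0xFF) == 0xFF;
    }
}
```

The masking with 0xFF is needed because Java bytes are signed; without it, a comparison such as b[0] == 0xEF would never be true.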
 
 - 
getReader

public java.io.BufferedReader getReader()
                               throws java.io.FileNotFoundException

Gets a BufferedReader (in fact a LineNumberReader) for the File passed to the CharsetToolkit constructor, using the detected charset, or the default charset if an 8-bit Charset is encountered.

Returns:
- a BufferedReader

Throws:
- java.io.FileNotFoundException - if the file is not found.
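For readers who want to see what such a reader looks like with plain JDK classes, here is a minimal sketch that wraps a file in a LineNumberReader for a given charset and skips a leading BOM character if one is decoded. It is an illustration under assumptions, not the CharsetToolkit source: ReaderSketch and open are hypothetical names, the charset is passed in rather than detected, and whether getReader() itself skips the BOM is not stated in this page.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
final class ReaderSketch {

    /** Opens 'file' with 'charset' and positions the reader after a leading U+FEFF, if any. */
    static BufferedReader open(File file, Charset charset) throws IOException {
        LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(new FileInputStream(file), charset));
        reader.mark(1);                 // remember the start so one char can be un-read
        int first = reader.read();
        if (first != 0xFEFF) {
            reader.reset();             // not a BOM (or empty file): rewind to the start
        }
        return reader;
    }
}
```

The skip is useful because Java's UTF-8 and UTF-16LE/BE decoders pass the BOM through as the character U+FEFF, which would otherwise appear at the start of the first line.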
 
 - 
getAvailableCharsets

public static java.nio.charset.Charset[] getAvailableCharsets()

Retrieves all the available Charsets on the platform, including the default charset.

Returns:
- an array of Charsets.
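An equivalent array can be built directly from the JDK, which may help when comparing results. This is a sketch with a hypothetical class name (AvailableCharsetsSketch); it uses only java.nio.charset.Charset.availableCharsets() and makes no claim about how CharsetToolkit builds its array.

```java
import java.nio.charset.Charset;
import java.util.Collection;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
final class AvailableCharsetsSketch {

    /** Returns every charset the running JVM supports, in canonical-name order. */
    static Charset[] availableCharsets() {
        // availableCharsets() returns a sorted map from canonical name to Charset.
        Collection<Charset> all = Charset.availableCharsets().values();
        return all.toArray(new Charset[0]);
    }
}
```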
 
 