PythonToolsKit.DebugEncoding (version 0.0.1)
This tool helps you debug encoding errors.
I wrote this tool because I had problems with the output of Windows
commands launched with PowerShell on a remote host using WinRM.
The remote host encodes its command output as cp437.
My PowerShell decodes the command output as cp1252.
To find out which encodings are involved, I run this tool with this command:
~# python3 DebugEncoding.py éêâ --bad-values "‚ˆƒ" # In the command output I see 'é' replaced by '‚', 'ê' by 'ˆ' and 'â' by 'ƒ'.
...
Encoding: 'cp437', Decoding: 'cp1252', Output: '‚ˆƒ'
...
~#
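The mismatch described above can be reproduced with nothing but the standard codecs. This is a minimal sketch that assumes the remote side really writes cp437 bytes and the local side really reads them as cp1252; the variable names are only illustrative.

```python
# Reproduce the mojibake: text written as cp437 bytes, then read as cp1252.
original = "éêâ"
raw = original.encode("cp437")      # bytes as sent by the remote host
garbled = raw.decode("cp1252")      # what the local PowerShell displays
print(garbled)

# Undo it: go back to the raw bytes, then decode with the right code page.
repaired = garbled.encode("cp1252").decode("cp437")
print(repaired)
```

The repair only works when every garbled character can be encoded back to cp1252, which holds for these three characters.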
Problem and solution using Python:
>>> from string import printable
>>> from os import popen
>>> output1 = popen("schtasks").read()
>>> output2 = output1.encode("cp1252").decode("cp437")
>>> assert "tâche" not in output1
>>> assert "tâche" in output2
>>> assert "Prêt" not in output1
>>> assert "Prêt" in output2
>>> assert "Désactivé" not in output1
>>> assert "Désactivé" in output2
>>> matchs, works = debug_encoding('éêâ', '‚ˆƒ')
>>> ("cp437", "cp1252") in [(match.encoding, match.decoding) for match in matchs]
True
>>> ("‚ˆƒ", "cp437", "cp1252") in [(work.decoded_values, work.encoding, work.decoding) for x in works.values() for work in x if work.encoding.startswith('cp') and work.decoding.startswith('cp')]
True
>>>
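The search itself can be pictured as a brute-force loop over (encoding, decoding) pairs. The sketch below is not the module's implementation: `find_encoding_pairs` and `CANDIDATES` are hypothetical names, and the candidate list is deliberately tiny, whereas the tool output shows that many more codecs are tried.

```python
# Hypothetical helper: list the (encoding, decoding) pairs that turn
# the expected text into the observed garbled text.
CANDIDATES = ("cp437", "cp850", "cp1252", "latin_1", "utf_8")  # assumed subset


def find_encoding_pairs(good, bad, codecs=CANDIDATES):
    matches = []
    for encoding in codecs:
        try:
            raw = good.encode(encoding)
        except (UnicodeError, LookupError):
            continue  # text not representable, or codec unavailable
        for decoding in codecs:
            try:
                decoded = raw.decode(decoding)
            except (UnicodeError, LookupError):
                continue  # bytes are invalid for this codec
            if decoded == bad:
                matches.append((encoding, decoding))
    return matches


pairs = find_encoding_pairs("éêâ", "‚ˆƒ")
print(pairs)
```

Several pairs usually match, because many code pages agree on these three characters, so the result narrows the possibilities rather than naming a single culprit.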
Solution using PowerShell:
PS C:\Windows> $data = [Text.Encoding]::GetEncoding(1252).GetBytes($(schtasks))
PS C:\Windows> $command_output = [Text.Encoding]::GetEncoding(437).GetString($data)
~# python3 DebugEncoding.py éêâ --bad-values "‚ˆƒ"
Encoding: 'cp858', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp858', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp858', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp858', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp858', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp858', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp857', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp857', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp857', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp857', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp857', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp857', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp865', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp865', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp865', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp865', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp865', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp865', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp861', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp861', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp861', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp861', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp861', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp861', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp850', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp850', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp850', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp850', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp850', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp850', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp860', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp860', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp860', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp860', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp860', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp860', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp437', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp437', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp437', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp437', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp437', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp437', Decoding: 'mbcs', Output: '‚ˆƒ'
Encoding: 'cp863', Decoding: 'cp1254', Output: '‚ˆƒ'
Encoding: 'cp863', Decoding: 'cp1258', Output: '‚ˆƒ'
Encoding: 'cp863', Decoding: 'cp1252', Output: '‚ˆƒ'
Encoding: 'cp863', Decoding: 'cp1256', Output: '‚ˆƒ'
Encoding: 'cp863', Decoding: 'cp1255', Output: '‚ˆƒ'
Encoding: 'cp863', Decoding: 'mbcs', Output: '‚ˆƒ'
~# python3 DebugEncoding.py éêâ --decoding cp1252 --bad-values "‚ˆƒ" --json
[
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp861"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp857"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp863"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp437"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp858"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp860"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp865"
},
{
"bad_values": "‚ˆƒ",
"decoded_values": "‚ˆƒ",
"decoding": "cp1252",
"encoding": "cp850"
}
]
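The JSON list above is easy to post-process. A small sketch, assuming the output has been captured into a string; `sample` holds two entries copied from the listing, and the field names are the ones shown there.

```python
import json

# Two entries copied from the --json output above.
sample = """
[
  {"bad_values": "‚ˆƒ", "decoded_values": "‚ˆƒ", "decoding": "cp1252", "encoding": "cp861"},
  {"bad_values": "‚ˆƒ", "decoded_values": "‚ˆƒ", "decoding": "cp1252", "encoding": "cp437"}
]
"""

entries = json.loads(sample)
# Keep only the candidate source encodings, sorted for stable display.
candidates = sorted(entry["encoding"] for entry in entries)
print(candidates)
```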
~# python3 DebugEncoding.py éêâ --encoding cp1252 --json
{
"ΘΩΓ": {
"bad_values": null,
"decoded_values": "ΘΩΓ",
"decoding": "cp437",
"encoding": "cp1252"
},
"ικβ": {
"bad_values": null,
"decoded_values": "ικβ",
"decoding": "iso8859_7",
"encoding": "cp1252"
},
"éêâ": {
"bad_values": null,
"decoded_values": "éêâ",
"decoding": "iso8859_15",
"encoding": "cp1252"
},
"οπθ": {
"bad_values": null,
"decoded_values": "οπθ",
"decoding": "cp869",
"encoding": "cp1252"
},
"ÈÍ‚": {
"bad_values": null,
"decoded_values": "ÈÍ‚",
"decoding": "mac_iceland",
"encoding": "cp1252"
},
"йкв": {
"bad_values": null,
"decoded_values": "йкв",
"decoding": "cp1251",
"encoding": "cp1252"
},
"יךג": {
"bad_values": null,
"decoded_values": "יךג",
"decoding": "cp1255",
"encoding": "cp1252"
},
"ιξβ": {
"bad_values": null,
"decoded_values": "ιξβ",
"decoding": "mac_greek",
"encoding": "cp1252"
},
"жЖР": {
"bad_values": null,
"decoded_values": "жЖР",
"decoding": "cp855",
"encoding": "cp1252"
},
"ÚŕÔ": {
"bad_values": null,
"decoded_values": "ÚŕÔ",
"decoding": "cp852",
"encoding": "cp1252"
},
"éęâ": {
"bad_values": null,
"decoded_values": "éęâ",
"decoding": "iso8859_10",
"encoding": "cp1252"
},
"ﻯﻳﻗ": {
"bad_values": null,
"decoded_values": "ﻯﻳﻗ",
"decoding": "cp864",
"encoding": "cp1252"
},
"Õõã": {
"bad_values": null,
"decoded_values": "Õõã",
"decoding": "hp_roman8",
"encoding": "cp1252"
},
"éźā": {
"bad_values": null,
"decoded_values": "éźā",
"decoding": "iso8859_13",
"encoding": "cp1252"
},
"ķĻŌ": {
"bad_values": null,
"decoded_values": "ķĻŌ",
"decoding": "cp775",
"encoding": "cp1252"
},
"Z²S": {
"bad_values": null,
"decoded_values": "Z²S",
"decoding": "cp273",
"encoding": "cp1252"
},
"ÚÛÔ": {
"bad_values": null,
"decoded_values": "ÚÛÔ",
"decoding": "cp857",
"encoding": "cp1252"
},
"ИЙБ": {
"bad_values": null,
"decoded_values": "ИЙБ",
"decoding": "koi8_r",
"encoding": "cp1252"
},
"щът": {
"bad_values": null,
"decoded_values": "щът",
"decoding": "cp866",
"encoding": "cp1252"
},
"้๊โ": {
"bad_values": null,
"decoded_values": "้๊โ",
"decoding": "tis_620",
"encoding": "cp1252"
},
"ىيق": {
"bad_values": null,
"decoded_values": "ىيق",
"decoding": "iso8859_6",
"encoding": "cp1252"
},
"ťÍ‚": {
"bad_values": null,
"decoded_values": "ťÍ‚",
"decoding": "mac_latin2",
"encoding": "cp1252"
}
}
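Each entry above comes from the same three bytes read through a different code page. The sketch below checks a few of them with the standard codecs; the choice of four codecs is arbitrary.

```python
# 'éêâ' encoded as cp1252 gives the bytes E9 EA E2.
raw = "éêâ".encode("cp1252")

# The same bytes viewed through several decodings from the listing above.
views = {
    codec: raw.decode(codec)
    for codec in ("cp437", "iso8859_7", "cp1251", "iso8859_15")
}
for codec, text in views.items():
    print(f"{codec}: {text}")
```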
~# python3 DebugEncoding.py éêâ --decoding cp1252
Output: '…~…€…w':
Encoding: 'shift_jis_2004', Decoding: 'cp1252'
Output: '…~…€…w':
Encoding: 'shift_jisx0213', Decoding: 'cp1252'
Output: 'é ê â ':
Encoding: 'utf_32_le', Decoding: 'cp1252'
Output: '+AOkA6gDi-':
Encoding: 'utf_7', Decoding: 'cp1252'
Output: 'QRB':
Encoding: 'cp500', Decoding: 'cp1252'
Output: 'QRB':
Encoding: 'cp1140', Decoding: 'cp1252'
Output: 'QRB':
Encoding: 'cp273', Decoding: 'cp1252'
Output: 'QRB':
Encoding: 'cp1026', Decoding: 'cp1252'
Output: 'QRB':
Encoding: 'cp037', Decoding: 'cp1252'
Output: 'ÿþé ê â ':
Encoding: 'utf_16', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'cp1254', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'latin_1', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'mbcs', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'iso8859_14', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'iso8859_9', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'iso8859_3', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'cp1258', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'cp1256', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'iso8859_16', Decoding: 'cp1252'
Output: 'éêâ':
Encoding: 'iso8859_15', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp860', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp865', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp863', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp861', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp858', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp850', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp857', Decoding: 'cp1252'
Output: '‚ˆƒ':
Encoding: 'cp437', Decoding: 'cp1252'
Output: '$(D+1+4+$(B':
Encoding: 'iso2022_jp_2', Decoding: 'cp1252'
Output: '$(D+1+4+$(B':
Encoding: 'iso2022_jp_1', Decoding: 'cp1252'
Output: '$(D+1+4+$(B':
Encoding: 'iso2022_jp_ext', Decoding: 'cp1252'
Output: 'ÿþ é ê â ':
Encoding: 'utf_32', Decoding: 'cp1252'
Output: 'Ã©ÃªÃ¢':
Encoding: 'utf_8', Decoding: 'cp1252'
Output: 'é ê â ':
Encoding: 'utf_16_le', Decoding: 'cp1252'
Output: '©ß©à©Ø':
Encoding: 'euc_jisx0213', Decoding: 'cp1252'
Output: '©ß©à©Ø':
Encoding: 'euc_jis_2004', Decoding: 'cp1252'
Output: '$(Q)_)`)X(B':
Encoding: 'iso2022_jp_2004', Decoding: 'cp1252'
Output: 'ÅÁÀ':
Encoding: 'hp_roman8', Decoding: 'cp1252'
Output: ' é ê â':
Encoding: 'utf_16_be', Decoding: 'cp1252'
Output: ' é ê â':
Encoding: 'utf_32_be', Decoding: 'cp1252'
Output: '$(O)_)`)X(B':
Encoding: 'iso2022_jp_3', Decoding: 'cp1252'
~#
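The listing above groups source encodings by the text they produce once decoded as cp1252. A minimal sketch of that grouping follows; `group_by_output` is a hypothetical name rather than part of the module, and only five encodings are tried.

```python
from collections import defaultdict


def group_by_output(text, decoding, encodings):
    """Map each decoded result to the encodings that produce it."""
    groups = defaultdict(list)
    for encoding in encodings:
        try:
            shown = text.encode(encoding).decode(decoding)
        except (UnicodeError, LookupError):
            continue  # skip codecs that cannot handle this text or bytes
        groups[shown].append(encoding)
    return dict(groups)


groups = group_by_output(
    "éêâ", "cp1252", ["cp437", "cp850", "latin_1", "iso8859_15", "hp_roman8"]
)
for shown, encodings in groups.items():
    print(f"Output: {shown!r}: {', '.join(encodings)}")
```

Grouping by output makes it easy to match what is seen on screen against a short list of possible source encodings.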
Tests:
~# python3 -m doctest -v DebugEncoding.py
13 tests in 8 items.
13 passed and 0 failed.
Test passed.
~#
Functions

Data
__all__ = ['debug_encoding']
__author_email__ = 'mauricelambert434@gmail.com'
__copyright__ = '\nDebugEncoding Copyright (C) 2023 Maurice Lamb...ome to redistribute it\nunder certain conditions.\n'
__description__ = 'This tool helps you to debug encodings errors.'
__license__ = 'GPL-3.0 License'
__maintainer__ = 'Maurice Lambert'
__maintainer_email__ = 'mauricelambert434@gmail.com'
__url__ = 'https://github.com/mauricelambert/PythonToolsKit/'

Author
Maurice Lambert