Closed
Labels: bug (Something isn't working)
Description
Minimal reproduction code and steps
$ http -b --pretty=none https://raw.githubusercontent.com/Ousret/charset_normalizer/master/data/sample-chinese.txt
Current result
����j��]Wikipedia�^�̡A������¦�F���ѤU���B�|�����B���H�ӡA�Ѧʬ�j�C�l�@�̡A����C�����|�]�C
��F�G�����~�Q�G��ܤ@�A�Τv���~�G��Q�E�A���U��y���G�ʤ��Q�A�X�O�C�ʸU�ءF�G�Q�j�����K���A��^�����L�G�ʸU�C�x��D�ѤU���Ӧ@���Ӧ��F���N�U���A�X�����B�Hġ�@�A�j��_�j�C
����@���A�X�j�ոܺ���C�m���l�����n��J�u���Aô�]�v�A�m����n��J�u��A��l�]�v�A��H�X���A���Nô�����l�]�C��´�����A�H���ڥ��A�����]�C
�j��o���A�P�D�ݵo�F�p�s�D�B�Х��B�����B�ӾǡB�y���B��w���C����ئh�i�w�\�ש��s�������F�\�����֡A�ѱo�Pġ��A����_��ɨ�a��H�A�M�h��Ҹ��չ�A���K���G�C
�Z�����A�Ҿڪ̡A�ꭲ���ۥѤ��ɳ\�i��ij�A�G�i�ۥѼs�ǤѤU�աC
�娥����l������~�C�i�A�����o��G�d�@�ʤK�Q�E�C
commons:����
�n�H�M�T�A��������@�ɡJ����j��C
Expected result
維基大典(Wikipedia)者,網路為礎;集天下知、四海言、眾人志,書百科焉。始作者,維基媒體基金會也。
典肇乎庚辰年十二月廿一,及己丑年二月十九,收各方語言二百五十,合逾七百萬目;二十大卷佔八成,單英文卷亦過二百萬。悉文乃天下有志共筆而成;有意助之,幾網路、隨纂作,大典茁焉。
維基一詞,出焉白話維基。《墨子閒詁》曰︰「維,繫也」,《說文》曰︰「基,牆始也」,其人合之,取意繫物之始也。維織網綱,以載根本,亦維基也。
大典得幸,同道兼發;如新聞、教本、爾雅、太學、語錄、文庫等。其卷目多可逕閱修於瀏覽之器;蓋眾不論誰,俱得與纂綴,弗制于其時其地其人,然則典所載謬實,未免爭辯。
凡維基之策,所據者,曰革奴自由文檔許可協議,故可自由廣傳天下耳。
文言維基始於丙戌年七夕,迄今得文二千一百八十九。
commons:卷首
聲象映響,具錄於維基共享︰維基大典。
Debug output
$ http --debug --pretty=none https://raw.githubusercontent.com/Ousret/charset_normalizer/master/data/sample-chinese.txt
HTTPie 2.6.0.dev0
Requests 2.26.0
Pygments 2.10.0
Python 3.9.7+ (heads/3.9:09390c837a, Sep 22 2021, 11:36:19)
[GCC 10.3.0]
/home/tiger-222/projects/httpie/venv39/bin/python
Linux 5.10.0-8-amd64
<Environment {'colors': 256,
'config': {'default_options': []},
'config_dir': PosixPath('/home/tiger-222/.config/httpie'),
'devnull': <property object at 0x7f55691a66d0>,
'is_windows': False,
'log_error': <function Environment.log_error at 0x7f55691ad3a0>,
'program_name': '__main__.py',
'stderr': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>,
'stderr_isatty': True,
'stdin': <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>,
'stdin_encoding': 'utf-8',
'stdin_isatty': True,
'stdout': <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>,
'stdout_encoding': 'utf-8',
'stdout_isatty': True}>
<PluginManager {'adapters': [],
'auth': [<class 'httpie.plugins.builtin.BasicAuthPlugin'>,
<class 'httpie.plugins.builtin.DigestAuthPlugin'>],
'converters': [],
'formatters': [<class 'httpie.output.formatters.headers.HeadersFormatter'>,
<class 'httpie.output.formatters.json.JSONFormatter'>,
<class 'httpie.output.formatters.xml.XMLFormatter'>,
<class 'httpie.output.formatters.colors.ColorFormatter'>]}>
>>> requests.request(**{'auth': None,
'data': RequestJSONDataDict(),
'headers': {'User-Agent': b'HTTPie/2.6.0.dev0'},
'method': 'get',
'params': <generator object MultiValueOrderedDict.items at 0x7f5569058900>,
'url': 'https://raw.githubusercontent.com/Ousret/charset_normalizer/master/data/sample-chinese.txt'})
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 626
Cache-Control: max-age=300
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Content-Type: text/plain; charset=utf-8
ETag: W/"afe2305da1ba095a7b6cfc792fb066356cc5f3f571f6dfcdea533a41a9b5daa1"
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
X-GitHub-Request-Id: 38A4:EAF4:D2C6A:ED8C9:6154225D
Content-Encoding: gzip
Accept-Ranges: bytes
Date: Wed, 29 Sep 2021 08:42:31 GMT
Via: 1.1 varnish
X-Served-By: cache-cdg20732-CDG
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1632904951.350979,VS0,VE0
Vary: Authorization,Accept-Encoding,Origin
Access-Control-Allow-Origin: *
X-Fastly-Request-ID: cb5e11b4a5af2375903564d55226b463701ef104
Expires: Wed, 29 Sep 2021 08:47:31 GMT
Source-Age: 74
����j��]Wikipedia�^�̡A������¦�F���ѤU���B�|�����B���H�ӡA�Ѧʬ�j�C�l�@�̡A����C�����|�]�C
��F�G�����~�Q�G��ܤ@�A�Τv���~�G��Q�E�A���U��y���G�ʤ��Q�A�X�O�C�ʸU�ءF�G�Q�j�����K���A��^�����L�G�ʸU�C�x��D�ѤU���Ӧ@���Ӧ��F���N�U���A�X�����B�Hġ�@�A�j��_�j�C
����@���A�X�j�ոܺ���C�m���l�����n��J�u���Aô�]�v�A�m����n��J�u��A��l�]�v�A��H�X���A���Nô�����l�]�C��´�����A�H���ڥ��A�����]�C
�j��o���A�P�D�ݵo�F�p�s�D�B�Х��B�����B�ӾǡB�y���B��w���C����ئh�i�w�\�ש��s�������F�\�����֡A�ѱo�Pġ��A����_��ɨ�a��H�A�M�h��Ҹ��չ�A���K���G�C
�Z�����A�Ҿڪ̡A�ꭲ���ۥѤ��ɳ\�i��ij�A�G�i�ۥѼs�ǤѤU�աC
�娥����l������~�C�i�A�����o��G�d�@�ʤK�Q�E�C
commons:����
�n�H�M�T�A��������@�ɡJ����j��C
Additional information, screenshots, or code examples
The issue occurs because the Content-Type header declares charset=utf-8, but the body is actually Big5-encoded.
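The mismatch can be reproduced offline with a minimal sketch. It assumes that the first words of the expected output ("維基大典"), re-encoded as Big5, are representative of the served file. A strict decode with the declared charset fails at position 0, matching the pre-#1110 traceback. A lenient decode yields U+FFFD replacement characters like the garbled "Current result". Decoding as Big5 recovers the text.

```python
# Assumption: mimic the server by Big5-encoding the opening words of the
# expected output, while the header claims UTF-8.
expected = "維基大典"
body = expected.encode("big5")
declared_charset = "utf-8"  # from "Content-Type: text/plain; charset=utf-8"

# Strict decoding with the declared charset raises UnicodeDecodeError
# on the very first byte, like the traceback quoted in this issue.
try:
    body.decode(declared_charset)
except UnicodeDecodeError as exc:
    print("strict:", exc)

# Lenient decoding does not crash, but every invalid byte sequence
# becomes U+FFFD, which is what the garbled output looks like.
lenient = body.decode(declared_charset, errors="replace")
print("lenient:", lenient)

# Decoding with the body's real encoding restores the original text.
print("big5:", body.decode("big5"))
```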
Before #1110, the error was more obvious:
# headers (...)
__main__.py: error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
Traceback (most recent call last):
File "/.../Lib/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/.../Lib/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "__main__.py", line 19, in <module>
sys.exit(main())
File "__main__.py", line 9, in main
exit_status = main()
File "core.py", line 70, in main
exit_status = program(
File "core.py", line 190, in program
write_message(requests_message=message, env=env, args=args, with_headers=with_headers,
File "output/writer.py", line 44, in write_message
write_stream(**write_stream_kwargs)
File "output/writer.py", line 66, in write_stream
for chunk in stream:
File "output/writer.py", line 108, in build_output_stream_for_message
yield from stream_class(
File "output/streams.py", line 69, in __iter__
for chunk in self.iter_body():
File "output/streams.py", line 118, in iter_body
yield line.decode(self.msg.encoding) \
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
Solution
A potential fix would be to broaden the scope of the --response-as option so that it applies to every encoded response, not only to prettified ones.
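The proposed behaviour could look roughly like the sketch below. `decode_body` is a hypothetical helper, not HTTPie's actual code. The `override` argument stands for an encoding supplied by the user, for example through --response-as. The exact syntax of that option is not assumed here. The important point is that the override is applied whether or not the output is prettified.

```python
from typing import Optional


def decode_body(body: bytes, declared: str, override: Optional[str] = None) -> str:
    """Hypothetical helper sketching the proposed fix (not HTTPie's code).

    declared: charset taken from the Content-Type header.
    override: user-supplied charset, applied to every response body,
              prettified or not.
    """
    # 1. An explicit user override always wins over the declared charset.
    if override is not None:
        return body.decode(override, errors="replace")
    # 2. Otherwise trust the declared charset when the body is valid for it.
    try:
        return body.decode(declared)
    except UnicodeDecodeError:
        # 3. Assumed fallback: keep streaming with replacement characters
        #    instead of aborting with a traceback.
        return body.decode(declared, errors="replace")
```

Passing `override="big5"` for the sample file would presumably produce the "Expected result" text. Without an override, the output degrades to replacement characters rather than crashing.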