Fix encoding error with non-prettified encoded responses #1168
Merged: BoboTiG merged 19 commits into master from mickael/oss-73-fix-encoding-error-with-non-prettified on Oct 6, 2021.
Commits (19):

- 0b5f4d6 — Fix encoding error with non-prettified encoded responses (BoboTiG)
- 491188d — Encoding refactoring (jkbrzt)
- 252fe02 — `test_unicode.py` → `test_encoding.py` (jkbrzt)
- d52a483 — Drop sequence length check (BoboTiG)
- a3bcd09 — Clean-up tests (BoboTiG)
- 8a031e3 — [skip ci] Tweaks (BoboTiG)
- 49ab59b — Use the compatible release clause for `charset_normalizer` requirement (BoboTiG)
- fd5be7d — Clean-up (BoboTiG)
- 93590b1 — Partially revert d52a4833e461e1b16b7961a112ea5c53e93cd643 (BoboTiG)
- fb92865 — Changelog (BoboTiG)
- 7390f84 — Tweak tests (BoboTiG)
- f4f132c — [skip ci] Better test name (BoboTiG)
- 115aef9 — Cleanup tests and add request body charset detection (jkbrzt)
- 6cbe2b2 — More test suite cleanups (jkbrzt)
- a655659 — Cleanup (jkbrzt)
- fb6a61f — Fix code style in test (BoboTiG)
- 5d37226 — Improve detect_encoding() docstring (BoboTiG)
- ea7a36c — Uniformize pytest.mark.parametrize() calls (BoboTiG)
- 743fc89 — [skip ci] Comment out TODOs (will be tackled in a specific PR) (BoboTiG)
New file (@@ -0,0 +1,50 @@):

```python
from typing import Union

from charset_normalizer import from_bytes
from charset_normalizer.constant import TOO_SMALL_SEQUENCE

UTF8 = 'utf-8'

ContentBytes = Union[bytearray, bytes]


def detect_encoding(content: ContentBytes) -> str:
    """
    We default to utf8 if text too short, because the detection
    can return a random encoding leading to confusing results:

    >>> too_short = ']"foo"'
    >>> detected = from_bytes(too_short.encode()).best().encoding
    >>> detected
    'utf_16_be'
    >>> too_short.encode().decode(detected)
    '崢景漢'

    """
    encoding = UTF8
    if len(content) > TOO_SMALL_SEQUENCE:
        match = from_bytes(bytes(content)).best()
        if match:
            encoding = match.encoding
    return encoding


def smart_decode(content: ContentBytes, encoding: str) -> str:
    """Decode `content` using the given `encoding`.
    If no `encoding` is provided, the best effort is to guess it from `content`.

    Unicode errors are replaced.

    """
    if not encoding:
        encoding = detect_encoding(content)
    return content.decode(encoding, 'replace')


def smart_encode(content: str, encoding: str) -> bytes:
    """Encode `content` using the given `encoding`.

    Unicode errors are replaced.

    """
    return content.encode(encoding, 'replace')
```
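To illustrate the decode behavior this file introduces, here is a stdlib-only sketch: it inlines the UTF-8 fallback instead of calling `charset_normalizer`, so the detection step is deliberately simplified (an assumption made to keep the sketch dependency-free).

```python
# Stdlib-only sketch of the smart_decode() behavior from the diff.
# The real detect_encoding() delegates to charset_normalizer; here we
# simply fall back to UTF-8 when no encoding is given.
def smart_decode(content: bytes, encoding: str) -> str:
    if not encoding:
        encoding = 'utf-8'
    # 'replace' turns undecodable bytes into U+FFFD instead of raising
    # UnicodeDecodeError.
    return content.decode(encoding, 'replace')


print(smart_decode(b'caf\xe9', 'latin-1'))  # café
print(smart_decode(b'caf\xe9', ''))         # caf\ufffd (replacement char)
```

The key design point is the `'replace'` error handler: a wrongly guessed encoding degrades to visible replacement characters rather than crashing the response-printing pipeline.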
Good to know! What has changed?
That’s exactly what we needed. Is there still some length threshold below which it’s unreasonable to rely on the detected encoding? Or a way to get some sort of confidence interval for the best match?
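The misdetection risk being discussed can be reproduced with the stdlib alone, using the short string from the new docstring: a 6-byte ASCII payload happens to be a valid UTF-16-BE byte sequence, so a detector can plausibly pick that encoding and produce CJK garbage.

```python
# Reproduces the docstring example: each pair of ASCII bytes forms one
# valid UTF-16-BE code unit, so the "detected" decoding succeeds silently.
too_short = ']"foo"'
decoded = too_short.encode().decode('utf_16_be')
print(decoded)  # 崢景漢
```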
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could require `charset_normalizer>=2.0.5` and drop our own `TOO_SMALL_SEQUENCE` check? I am not sure that `charset_normalizer>=2.0.5` is available on all OSes for our package, though.
I committed d52a483 just to see how it goes.
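For reference, commit 49ab59b later switched the requirement to the compatible release clause. Per standard PEP 440 semantics (the exact version pinned is not shown in this excerpt; `2.0.x` here is an assumption for illustration), such a requirement expands as:

```
charset_normalizer~=2.0.5
# equivalent to: charset_normalizer >= 2.0.5, == 2.0.*
```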
I'll revert d52a483 as it seems too new.