LiteralString overload for __getitem__ (#12714)
In PEP 675, Graham Bleaney and I had specified a list of `LiteralString`-preserving [overloads](https://peps.python.org/pep-0675/#appendix-c-str-methods-that-preserve-literalstring) for `str`. However, we didn't specify an overload for `__getitem__` and didn't give any rationale either. IIRC this was an edge case we didn't want to take a strong decision on unless users wanted it. Carl Meyer brought this up yesterday, so I think it's worth discussing. Pro: `my_literal_string[i]` or `my_literal_string[i:j]` should technically be compatible with `LiteralString`, since it is a substring of a literal-derived string. Con: The main downside is that an attacker might control the indexes and try to access a specific substring from a literal string in the code. For example, they might narrow down the string to `rm foo` or `SELECT *`. It's true that `join` and other methods could also construct dangerous strings from `LiteralString`s, and we even call that out as an accepted tradeoff in the PEP: > 4. Trivial functions could be constructed to convert a str to a LiteralString: > > def make_literal(s: str) -> LiteralString: > letters: Dict[str, LiteralString] = { > "A": "A", > "B": "B", > ... > } > output: List[LiteralString] = [letters[c] for c in s] > return "".join(output) > > We could mitigate the above using linting, code review, etc., but ultimately a clever, malicious developer attempting to circumvent the protections offered by LiteralString will always succeed. The important thing to remember is that LiteralString is not intended to protect against malicious developers; it is meant to protect against benign developers accidentally using sensitive APIs in a dangerous way (without getting in their way otherwise). > > Without LiteralString, the best enforcement tool API authors have is documentation, which is easily ignored and often not seen. With LiteralString, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice. > > -- [PEP 675 - Appendix B: Limitations](https://peps.python.org/pep-0675/#appendix-b-limitations) `__getitem__`, however, seems a bit different, because it (and `split`, `zfill`, etc.) accept an index or width that could be used to construct a dangerous query or a humongous string. So, we need to clarify the intent a little. What was the intent of these overloads? We wanted to forbid "arbitrary user-supplied strings" while allowing methods that preserved literal strings. We were not trying to prevent every possible exploit on the string. Since `__getitem__` forbids arbitrary user-supplied strings and preserves literal strings, I think we should add an overload for it.
typeshed
About
Typeshed contains external type annotations for the Python standard library and Python builtins, as well as third party packages as contributed by people external to those projects.
This data can e.g. be used for static analysis, type checking, type inference, and autocompletion.
For information on how to use typeshed, read below. Information for contributors can be found in CONTRIBUTING.md. Please read it before submitting pull requests; do not report issues with annotations to the project the stubs are for, but instead report them here to typeshed.
Further documentation on stub files, typeshed, and Python's typing system in general, can also be found at https://typing.readthedocs.io/en/latest/.
Typeshed supports Python versions 3.8 and up.
Using
If you're just using a type checker (mypy,
pyright,
pytype, PyCharm, ...), as opposed to
developing it, you don't need to interact with the typeshed repo at
all: a copy of standard library part of typeshed is bundled with type checkers.
And type stubs for third party packages and modules you are using can
be installed from PyPI. For example, if you are using html5lib and requests,
you can install the type stubs using
$ pip install types-html5lib types-requests
These PyPI packages follow PEP 561 and are automatically released (up to once a day) by typeshed internal machinery.
Type checkers should be able to use these stub packages when installed. For more details, see the documentation for your type checker.
Package versioning for third-party stubs
Version numbers of third-party stub packages consist of at least four parts.
All parts of the stub version, except for the last part, correspond to the
version of the runtime package being stubbed. For example, if the types-foo
package has version 1.2.0.20240309, this guarantees that the types-foo package
contains stubs targeted against foo==1.2.* and tested against the latest
version of foo matching that specifier. In this example, the final element
of the version number (20240309) indicates that the stub package was pushed on
March 9, 2024.
At typeshed, we try to keep breaking changes to a minimum. However, due to the nature of stubs, any version bump can introduce changes that might make your code fail to type check.
There are several strategies available for specifying the version of a stubs package you're using, each with its own tradeoffs:
-
Use the same bounds that you use for the package being stubbed. For example, if you use
requests>=2.30.0,<2.32, you can usetypes-requests>=2.30.0,<2.32. This ensures that the stubs are compatible with the package you are using, but it carries a small risk of breaking type checking due to changes in the stubs.Another risk of this strategy is that stubs often lag behind the package being stubbed. You might want to force the package being stubbed to a certain minimum version because it fixes a critical bug, but if correspondingly updated stubs have not been released, your type checking results may not be fully accurate.
-
Pin the stubs to a known good version and update the pin from time to time (either manually, or using a tool such as dependabot or renovate).
For example, if you use
types-requests==2.31.0.1, you can have confidence that upgrading dependencies will not break type checking. However, you will miss out on improvements in the stubs that could potentially improve type checking until you update the pin. This strategy also has the risk that the stubs you are using might become incompatible with the package being stubbed. -
Don't pin the stubs. This is the option that demands the least work from you when it comes to updating version pins, and has the advantage that you will automatically benefit from improved stubs whenever a new version of the stubs package is released. However, it carries the risk that the stubs become incompatible with the package being stubbed.
For example, if a new major version of the package is released, there's a chance the stubs might be updated to reflect the new version of the runtime package before you update the package being stubbed.
You can also switch between the different strategies as needed. For example, you could default to strategy (1), but fall back to strategy (2) when a problem arises that can't easily be fixed.
The _typeshed package
typeshed includes a package _typeshed as part of the standard library.
This package and its submodules contain utility types, but are not
available at runtime. For more information about how to use this package,
see the stdlib/_typeshed directory.
Discussion
If you've run into behavior in the type checker that suggests the type stubs for a given library are incorrect or incomplete, we want to hear from you!
Our main forum for discussion is the project's GitHub issue tracker. This is the right place to start a discussion of any of the above or most any other topic concerning the project.
If you have general questions about typing with Python, or you need a review of your type annotations or stubs outside of typeshed, head over to our discussion forum. For less formal discussion, try the typing chat room on gitter.im. Some typeshed maintainers are almost always present; feel free to find us there and we're happy to chat. Substantive technical discussion will be directed to the issue tracker.