Coordinating type annotations

@luk-f-a and I have been adding type annotations to code in numba/core. I want to make sure we don’t duplicate work on type annotations (good to hear that’s largely been the case so far).

I have a PR currently up that covers parts of core/types (mypy level 3), all of core/typeconv (mypy level 2), and adds typing_extensions as a dependency. https://github.com/numba/numba/pull/6222

@luk-f-a If you’ve completed the annotations for types/abstract, have you also done something for typing contexts? I was thinking that core/typing would be the next thing to tackle, and context.py would probably be one of the first files.

More generally, what do people think would be a good sequence of directories to annotate?

I haven’t done anything outside PR 6241. I want to limit each PR to only one file, to make reviews more manageable. In PR 6241 I focused on abstract.py and imported BaseContext but did not change anything about BaseContext.

I think working on types and then typing makes a lot of sense because we would be following the dependency chain. I wonder if core devs would agree or they see other modules as more fundamental.

I guess I worry about opening a separate PR for each file given the time, review, and CI resources that each PR requires. There definitely are some files that are large enough on their own, but others are rather straight forward to annotate.

I guess it could vary case by case. Bringing a file to level 3 only takes a few lines, but bringing it to level 2 likely requires many changes.
Personally I would decide putting myself in the reviewer’s shoes: can I review this in one sitting (about 15 minutes max)?
From a reviewer’s point of view, two small PRs are better than one large PR. There’s barely any overhead added by a second PR, and there’s a big context-switching cost in a PR that modifies 20 files over 2 unrelated topics. If the 20 files have related changes then it’s easy to make the connection, but if they are unrelated then it forces context-switching. This can be mitigated if the PR creator takes care to group the changes in separate commits, such that the reviewer can filter by individual commits and only review related changes.

HI Ethan and Lucio, I’d like to help you guys with this.

Maybe something that we could do is to follow what PyTorch is doing. They have a mypy.ini file at the root of the project ignoring all the files that still need annotation. This way, work can be done incrementally by annotating a small part of the project at each PR. What do you guys think?

Regards

I think that’s similar to what we do, but the other way around. Our mypy.ini explicitly marks the files that are annotated. Maybe one day the list of not-annotated will be shorted, but for now it’s easier to mark the annotated ones. Also, we have 3 different levels of annotations, depending on the strictness setting, and each file can be assigned to one of those.

Hi @guilherme, and cool! I’m waiting on the PR linked at the top of the thread to get reviewed before doing further work. If you want to build off of that and annotate other parts of the codebase that would be awesome. If you do please comment with which parts you’re working on so we don’t duplicate work.

Since folks here are more knowledgeable on type annotations then I am, I want to invite you all to the public meeting next Tue to talk about some of the concerns that Numba devs are having. Please see Public Numba Dev Meeting, Tuesday October 13 2020

Hi!

Many thanks to all for your input on Type Annotations both here and at the last public meeting. Here’s an update from the core developers…

At the last Numba core developer meeting (Meeting minutes), in light of the discussions in the October public meeting, decisions surrounding the addition of type annotations to the Numba code base were discussed and made. A summary of the conclusions:

  1. Type Annotations will be accepted into the Numba code base.
  2. Requirements ahead of accepting PRs to add type annotations, in priority order:
    1. Community members interested in this should come forward as “concept” owners (by expressing interest and commitment on this post/thread), this to include overseeing the work schedule and assisting with design, review and maintenance.
    2. The addition of typeguard as an option with CI config set up to enforce.
    3. Documentation added for a brief style guide, should cross reference the discourse thread and start with something minimal.
    4. Add the Sphinx type annotation extension such that documentation is automatically produced with type information added.

Once again, thanks to everyone who contributed to this discussion!

@stuartarchibald thanks for the update. For items 2 and 4, do you think the current type annotations are sufficient to exercise this? The only mypy related PR that has merged (@luk-f-a’s PR to add CI) didn’t actually add any type annotations to any functions. I think we’d want to have some files annotated first to be able to actually use typeguard and the Sphinx extension.

Also, for the third item, I believe that PR already added some documentation about type annotations. Besides the cross-reference to this thread, could you clarify what else you have in mind?

hi Stuart,

I think it’s great that we can go ahead with annotations. About the requirements

  1. I’m willing to help with design, review and maintenance.
  2. I assume typeguard would be enabled only for tests, is that correct?
  3. Agreed, some of that was included in the original PR, and it can be expanded as needed.
  4. it makes sense, although I suspect there won’t be that many user-facing functions annotated in the first rounds. But it does not hurt to enable the Sphinx extension sooner rather than later.

Hi @EPronovost,

@stuartarchibald thanks for the update. For items 2 and 4, do you think the current type annotations are sufficient to exercise this? The only mypy related PR that has merged (@luk-f-a’s PR to add CI) didn’t actually add any type annotations to any functions. I think we’d want to have some files annotated first to be able to actually use typeguard and the Sphinx extension.

I think it’d be good to have the minimal amount annotated to exercise typeguard and the sphinx extension, and then to have them wired in to CI and in general working (this can literally be just one annotated file if that is sufficient). This way they are ready to go, integrated and providing benefit instantly.

Also, for the third item, I believe that PR already added some documentation about type annotations. Besides the cross-reference to this thread, could you clarify what else you have in mind?

Indeed, that PR added some useful information about starting work on this. I think it’d be good to note this thread such that other users/contributors can join in the conversation. Further, I think it would be very helpful to add information about the typeguard and sphinx set up in CI/docs and a note in the developer docs comprising instructions on how to use these tools (mypy and typeguard especially).

Thanks :slight_smile:

Hi @luk-f-a,

Brilliant, thanks for volunteering and confirming! Much appreciated.

Yes, I think we’d probably need to run this as a testing mode.

Great.

I suspect similar, but as noted above, would be good to get it in early to start feeling the benefit sooner.

Thanks :slight_smile:

I started to experiment with typeguard basing on https://github.com/numba/numba/pull/6222. I am quite happy to see that it is able to catch type annotations that does not match the actual usage.

This is the commit to enable typeguard in testing: https://github.com/sklam/numba/commit/38edc89e65d62e2357628c1d21e4f6d4a4aeeb6a.
Here’s some of the adjustments I made to make typeguard happy for a few test files: https://github.com/sklam/numba/commit/436a9c41cde77e64101a562ae0aa8d3d977e4e9e. Note, the work is incomplete. There are still many errors reported by typeguard.

These were some challenges of using typeguard and mostly because of the way it uses TypeError message and how numba rewrites error messages. It makes it a lot harder to interpret the message. I ended up modifying typeguard a bit in the way it reports error.

I also spent a lot of time just understanding what can be done in type annotations and how to spell certain things. I did notice that type annotation is not very namedtuple friendly. At least not in the way numba creates dynamic namedtuple. And, I don’t think there is a way to say Type[NamedTuple].

Overall, I liked how typeguard enables us to ensure accurate typing annotations even though the code base is only partially annotated.

Cool! How do you want to coordinate here? Will you put up a typeguard PR and/or push your type annotation changes to my branch?

I did notice that type annotation is not very namedtuple friendly. At least not in the way numba creates dynamic namedtuple. And, I don’t think there is a way to say Type[NamedTuple] .

I believe you can do T_NamedTuple = TypeVar("T_NamedTuple", bound=NamedTuple), but that may not be useful in all scenarios. From the perspective of static analysis, a dynamically created namedtuple seems pretty similar to an Any.

I will start by making a typeguard PR that is CI tested even if it’s just running a subset of tests. My branch is still WIP and is no where near passing CI with typeguard enabled, so I don’t know if it’s a good idea to merge into your PR yet.

Yup. And I don’t think collections.namedtuple() is creating things that are treated as instance of NamedTuple. The docs say this about `typing.NamedTuple:

These are not used in annotations.

I’m adding typeguard into the CI in https://github.com/numba/numba/pull/6458