Open-sourcing software follows Thinking Machines’ values of being a builder and a team player, only this time within a larger community. Open source lets us participate in the larger practice of peer production, and gives us the opportunity to understand our users and co-developers well. To that end, there are two main open-source guidelines that we should follow:
We must strive to maintain a baseline quality on our software projects, and this applies even outside our open-source work. A well-maintained project says a lot about the company, so we need to be rigorous in keeping our codebase healthy.
Documentation should be given equal priority to a program's business logic. Remember that when we write documentation, we write for the person next to us.
README file. This file should contain a description of the project and basic information about it. On most Git hosting services, such as GitHub and GitLab, the README file is the first point of contact a developer has with our code.
API Documentation. A well-documented codebase eases the onboarding of outside contributors into our project. If possible, all modules, classes, and methods should be documented. Type signatures should be defined, and sample usage must be written. Your favorite language should have its preferred way of generating documentation: for Python it’s Sphinx, for Java it’s Javadoc, etc.
Contributing Guide. Outside contributors may not be aware of how we set up our development environment and workflows. All of these should be described in a Contributing Guide.
Releases and CHANGELOG. All releases should be documented in a file called CHANGELOG.md. It should narrate how the codebase evolved throughout patch, minor, and major releases. Versions should be listed in reverse chronological order, be written for humans, and follow a sensible format.
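As an illustration of the API Documentation item above, here is a minimal sketch of a documented Python function. The function name `parse_version` and its behavior are hypothetical and chosen only for this example; the docstring uses the reStructuredText field style that Sphinx can render, and it includes a type signature and a sample usage.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a ``MAJOR.MINOR.PATCH`` version string into integers.

    :param version: Version string such as ``"1.4.2"``.
    :returns: A ``(major, minor, patch)`` tuple of integers.
    :raises ValueError: If ``version`` does not have exactly three
        dot-separated integer parts.

    Sample usage:

    >>> parse_version("1.4.2")
    (1, 4, 2)
    """
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    # int() raises ValueError on non-numeric parts, matching the docstring.
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch
```

Because the sample usage is written in doctest form, it can also be checked automatically with Python's standard `doctest` module, which helps keep the documentation from drifting away from the code.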
Code is the core of our project, and if left unkempt (especially with multiple contributors from different backgrounds and coding styles), it might grow into an unmaintainable mess. Open-source software therefore demands a certain rigor in writing code.
Write idiomatic code that adheres to the community style guide. External contributors may also work on our project, so we need to adhere to a style guide set by the community (e.g. for Python, it is PEP 8). Your favorite language should have one.
Automated testing. As our codebase grows, we need to ensure that we are scaling responsibly. This entails shipping robust and well-tested code. A good set of test cases gives us more confidence when updating several parts of the codebase (or during refactors). In this regard, it is important that every project we open-source is accompanied by some form of test suite. Your favorite language should have its own testing framework (for Python, unittest ships with the standard library), so you should definitely check that out.
Continuous Integration/Deployment (CI/CD). Perhaps one of the most crucial aspects of open-source software is continuous integration and deployment. Because multiple collaborators (internal and external) are now working on the project, we have to ensure that every commit is integrated cleanly and verified automatically.
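To make the Automated testing item concrete, below is a small sketch of a test suite written with Python's standard `unittest` module. The helper `is_newest_first` is a hypothetical function invented for this example (it checks that a list of version strings is in newest-first order, as a CHANGELOG would list them); it is not part of any existing project.

```python
import unittest
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_newest_first(versions: List[str]) -> bool:
    """Return True if versions are listed from newest to oldest."""
    keys = [version_key(v) for v in versions]
    return keys == sorted(keys, reverse=True)


class IsNewestFirstTest(unittest.TestCase):
    def test_accepts_newest_first(self):
        self.assertTrue(is_newest_first(["1.10.0", "1.9.3", "0.1.0"]))

    def test_rejects_oldest_first(self):
        self.assertFalse(is_newest_first(["0.1.0", "1.0.0"]))

    def test_compares_numerically_not_alphabetically(self):
        # As strings, "1.9.0" sorts after "1.10.0"; numerically 1.10.0 is newer.
        self.assertFalse(is_newest_first(["1.9.0", "1.10.0"]))
```

Saved under a filename such as `test_changelog.py` (an assumed name), the suite can be run locally with `python -m unittest`, and a CI service can run the same command on every commit so that contributions are verified automatically.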
Once a project is open-sourced, we should expect Issues and Pull Requests to come from the community. As representatives of the company, we should act as professionally as possible. Below are guidelines that codify how we relate to our co-creators.
Add a Code of Conduct. Explicitly stating a Code of Conduct in every repository assures external contributors that we are working with good intentions. At the same time, a Code of Conduct helps us protect ourselves. Contributors can interact with us in a variety of ways, and some might behave unproductively toward the project. A Code of Conduct demonstrates that we are serious about taking action when needed, and that all processes will be fair and transparent.
A CODE_OF_CONDUCT.md file should be found in every repository. We can adopt common templates such as the Contributor Covenant, but it may be better to write our own.
Communicate our expectations via a Contributing Guide. Internally, we have our own system of naming branches and wording commits. If we want a consistent codebase, we should be able to relay our expectations to contributors. At the same time, contributors should be informed of what they can expect from us: that we will get to their PRs within X number of days, that we will give thorough and helpful code reviews, etc.