Last week, my former teammates at IRS Direct File released the code base as open source software on GitHub. They pulled off an incredible feat, developing new OSS policies and pushing through bureaucracy to get this out into the public domain. After leading technical implementation in the pilot year, I left the Direct File team late last summer to work with the Dept of Ed on the FAFSA year 2 rollout, and I resigned from Federal service this past February.
Now that the code is public, I want to share how our team approached this work with care and a deep commitment to making the taxpayer experience better. Direct File was an optimistic labor of love from a caring team: we wanted to take an arduous and costly interaction that people have with the Federal government each year, and transform it into a simple and positive experience. I’m writing this with a bunch of links to code and technical documents in the repository so that a technical audience can follow along and fact-check me along the way. You don’t have to take my word for it; now, I can show you what we did.
During the course of Direct File, some people argued that the IRS couldn’t be trusted to calculate your taxes — as if it were a conflict of interest and the IRS might try to cheat taxpayers. That couldn’t be further from the truth. We worked to accurately represent the tax code and make it accessible to every taxpayer, ensuring they paid what they owed — no more, no less — and received every credit and deduction to which they were legally entitled.
Chris Given has already written a bit about how the core of the application is a Fact Graph, which he started writing in February 2022. I’m going to concentrate a bit more on how that fact graph was used, and what it looked like to represent the tax code as code.
A view of all the screens
Before Direct File can do any tax calculations, it needs to collect data about a taxpayer — where they lived, who lived with them, if they got married, and what other situations applied to their life during the year. We need to ask taxpayers a lot of questions (far beyond what shows up on income forms). While the everyday user of Direct File responds to one question at a time in a survey-like experience, the codebase contains an alternate renderer that shows all of Direct File on a single page.
I’m starting with this page because it reflects our effort to make the tax code more accessible, and because it allows you to see the entire application of Direct File without creating an absurd number of tax returns. Every question, phrase, help text, and other piece of written content in Direct File was written by a team of content designers and reviewed by IRS counsel for legal accuracy and IRS interpretation. Originally, our designers managed the content in a huge design document named “the source of truth”, but it risked becoming out of sync with what we’d actually display to users. Thankfully, this was one of the areas that we could improve with a technical solution.
In November 2023, two months before go-live, we built the All-Screens page. Because we wrote both business logic and UI declaratively, we could generate an alternate view of the app that showed every screen in one place. This started as a way to enable better collaboration between designers, lawyers, and engineers, but it also grew into a valuable resource for our translations team and customer support representatives, as they could easily see all of the content in the app.
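To make the "declarative UI, two renderers" idea concrete, here is a minimal Python sketch. It is not the Direct File implementation (the client app lives in `df-client-app`); the `Screen` shape, the routes, and the fact paths are all hypothetical. The point is only that when screens are plain data, a survey-style renderer and an all-screens renderer can share the same definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Screen:
    """One declaratively defined screen (hypothetical shape, not Direct File's)."""
    route: str
    heading: str
    fact_paths: Tuple[str, ...]  # facts this screen's questions would write to


# Hypothetical flow: this data is the single description of the UI.
FLOW = (
    Screen("/about-you/date-of-birth", "What is your date of birth?", ("/example/dateOfBirth",)),
    Screen("/about-you/residence", "Where did you live this year?", ("/example/residenceState",)),
)


def render_one(screen: Screen) -> str:
    """Survey-style renderer: shows a single screen."""
    return f"{screen.heading}\n  affects: {', '.join(screen.fact_paths)}"


def render_all(flow: Tuple[Screen, ...]) -> str:
    """All-screens renderer: the same definitions, laid out on one page."""
    return "\n\n".join(f"[{s.route}]\n{render_one(s)}" for s in flow)


print(render_all(FLOW))
```

Because both renderers read the same data, the content shown to reviewers cannot drift from what users see, which is the property the old design document lacked.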
The old source of truth design document lives on only as my shower curtain.
I think that the all-screens page may be one of the most valuable artifacts of open source Direct File. Each screen contains plain language that was approved by IRS counsel, and the screen shows which tax tests the question will affect. Looking at the questions on this page, you can find a set of questions that are both comprehensive in their tax accuracy (the goal of the lawyers) and easy for a taxpayer to understand (the goal of our designers). If you’re more curious about the design process that went into Direct File, my teammates Suzanne Chapman and Katie Aloisi gave a presentation at the Code For America summit that you can see here. It’s a great presentation and worth a look.
If your computer is set up with node and you want to see all screens, you can run this locally and it’s fully functional. Just clone the repository, go to the `df-client-app` folder, install dependencies, and run `npm run start` — the page will be at http://localhost:3000/df/file/all-screens/index.html.
Understanding and improving the tax graph
All of the tax logic in the application is declaratively written as facts that depend upon each other and build up to a final refund or payment. Rendering these fact definitions into an actual graph allowed us to better analyze their complexity, and allowed us to write an architectural decision record (ADR) about how we needed to split our fact graph into various, better-tested modules that would allow us to separate concerns. This module-breaking enabled a testing strategy prior to our go-live to catch any bugs and ensure that we had tested the most important facts. It also provided us a bunch of cool visualizations and excuses to print on big pieces of paper. At the time of that ADR (11/27/2023) there were about 700 facts in the application. If my memory is right, we went live with around 1400 facts in the first tax year, and the repository currently shows 3030 facts. The tax code is complicated!
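As a rough illustration of "facts that depend upon each other and build up to a final refund or payment", here is a toy evaluator in Python. It is a sketch under loud assumptions: the real Fact Graph is a separate library with its own definition format, and every fact path, formula, and number below is invented for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Each derived fact names the facts it depends on plus a pure function of them.
# All paths, formulas, and numbers are invented; this is not real tax logic.
DERIVED: Dict[str, Tuple[Tuple[str, ...], Callable[..., int]]] = {
    "/taxableIncome": (("/wages", "/deduction"), lambda w, d: max(w - d, 0)),
    "/tax": (("/taxableIncome",), lambda ti: ti // 10),  # toy flat 10% rate
    "/refundOrOwed": (("/withholding", "/tax"), lambda wh, t: wh - t),
}


def evaluate(path: str, inputs: Dict[str, int], cache: Optional[Dict[str, int]] = None) -> int:
    """Resolve a fact by recursively resolving the facts it depends on."""
    cache = {} if cache is None else cache
    if path in inputs:          # a value the taxpayer entered directly
        return inputs[path]
    if path not in cache:       # a derived fact: compute once, then reuse
        deps, fn = DERIVED[path]
        cache[path] = fn(*(evaluate(d, inputs, cache) for d in deps))
    return cache[path]


inputs = {"/wages": 50_000, "/deduction": 10_000, "/withholding": 4_500}
print(evaluate("/refundOrOwed", inputs))  # prints 500
```

The dependency tuples are exactly the edges you would draw when rendering the definitions as a graph, which is why declarative facts lend themselves to the kind of complexity analysis described above.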
I’ll write more about some of our testing below, but I want to note a few things about the module ADR:
- We analyzed the bugs found early in our development to look for patterns.
- We built tooling to analyze those patterns.
- We then modified our frameworks so that we could avoid entire categories of bugs in the future.
These are practices I’m proud of (honest bug assessment is hard in any environment, let alone a government one), and I’m glad there’s now a public record of how we used thoughtful engineering to deliver better outcomes for taxpayers.
If you’re a developer and want to play around with the graphs, you can download the codebase, and from the `df-client-app` directory, run `npm run generate-dependency-graph` to receive a .dot file of all the tax facts. You can then use graphviz or other tools to visualize all the facts, though it can be a little overwhelming. For something a bit simpler, `npm run generate-module-graph` will show the various modules that were built, and the better-tested connections between modules.
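If you have not worked with .dot files before, the format is just a text list of edges. The Python sketch below shows the general idea of turning a dependency map into DOT. It makes no claim about what the repository's script emits; the function name `to_dot` and the example fact paths are hypothetical.

```python
from typing import Dict, List


def to_dot(dependencies: Dict[str, List[str]]) -> str:
    """Emit a Graphviz digraph with one edge per (dependency -> fact) pair."""
    lines = ["digraph facts {"]
    for fact, deps in sorted(dependencies.items()):
        for dep in deps:
            # Quote node names because fact paths contain slashes.
            lines.append(f'  "{dep}" -> "{fact}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical facts, for illustration only.
example = {
    "/tax": ["/taxableIncome"],
    "/taxableIncome": ["/wages", "/deduction"],
}
print(to_dot(example))
```

Saved to a file (say `facts.dot`, a name chosen here for illustration), output like this can be rendered with Graphviz using `dot -Tsvg facts.dot -o facts.svg`.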
Representing edge cases
The tax code contains some pretty crazy edge cases that required a good amount of effort to build and test. Two of my favorite tax rules are the different ways to calculate whether the taxpayer was 65 or older during the tax year: people who are 65 or older receive a greater standard deduction, but they also phase out of eligibility for EITC if they don’t have qualifying children. Confusingly, these rules use different definitions of what it means to be 65 during the tax year. For the standard deduction, the calculation treats the day before their birthday as their birthday (see: IRS topic 551). For the EITC age phaseout, the calculation uses their actual birthday (see: pub 596, rule 11). This means that if a person turned 65 on January 1, 2025, they would count as age 65 for the tax year 2024 standard deduction, but age 64 for EITC (a difference which serves to benefit the filer).
You can see these facts in the filers module:
<Fact path="/filers/*/age65OrOlder">
  <Name>Age 65 or older</Name>
  <Description>
    Whether the filer is 65 or older. Pub 554 states "You are considered age 65 at the end of the year if your 65th birthday is on or before January 1 of the following year" so we include the January 1 exception here.
    Do NOT use this fact for EITC calculations -- That age 65 requirement does _not_ have the January 1 exception.
  </Description>
  <Export downstreamFacts="true" mef="true" />
  <Derived>
    <GreaterThanOrEqual>
      <Left>
        <Dependency path="../ageCalculatedDayBeforeDOB" />
      </Left>
      <Right>
        <Int>65</Int>
      </Right>
    </GreaterThanOrEqual>
  </Derived>
</Fact>
...
<Fact path="/filers/*/ageForSelfOnlyEitcLimits">
  <Description>
    The fact that should be listed as someone's age on Form 8862 for EITC. Uses convoluted logic because the age for the 25 year old minimum age uses the day before the person's birthday, and the age for the 65 year old maximum uses the day of the person's birthday.
  </Description>
  <Derived>
    ...
  </Derived>
</Fact>
And further, because this got pretty complicated (especially when taking into account potentially deceased spouses), you can find unit tests for all of these age conditions in the filers test module.
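The two age conventions described above are easy to get wrong, so here is a small Python sketch of the arithmetic. It is only an illustration of the rule as stated in this post, not a port of the fact definitions; the function names are mine, and the handling of deceased filers is deliberately left out.

```python
from datetime import date, timedelta


def age_on(as_of: date, dob: date) -> int:
    """Completed years of age on `as_of`, using the actual birthday."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def age_actual_birthday(tax_year: int, dob: date) -> int:
    """Age at the end of the tax year, counting from the real birthday."""
    return age_on(date(tax_year, 12, 31), dob)


def age_day_before_dob(tax_year: int, dob: date) -> int:
    """Age at the end of the tax year when an age is attained the day before
    the birthday. That is the same as the ordinary age one day later (January 1)."""
    return age_on(date(tax_year, 12, 31) + timedelta(days=1), dob)


dob = date(1960, 1, 1)  # turns 65 on January 1, 2025
print(age_day_before_dob(2024, dob))   # prints 65 (standard deduction convention)
print(age_actual_birthday(2024, dob))  # prints 64 (EITC upper age limit convention)
```

For every birthday other than January 1 the two functions agree, which is exactly why a bug here would be easy to miss without targeted unit tests.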
I wish that these edge cases were rare — but they’re all over the tax code (children who are not children, married couples filing jointly who don’t meet the Joint Return Test, people who can be claimed as a dependent who aren’t, etc.). I feel confident that we represented these cases and gave the taxpayer every possible shot to file an accurate, beneficial return.
Scenario testing
Every software engineer knows that you haven’t built the product until you’ve built a suite of tests to prove that the product works. While I showed some unit tests above, we also developed a set of scenario tests to ensure the end-to-end running of the application. You can see 161 full tax returns in the scenarios folder that cover a series of different filing statuses, family situations, income situations, credits and deductions. We tested that these were submittable, produced expected results, and used continuous integration to ensure that these did not change in unexpected ways.
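For readers who have not seen this style of testing, the sketch below shows the general shape of a scenario (snapshot) check in Python: run a saved return through the calculation and report any field that differs from the stored expectation. Everything here is hypothetical, including the JSON fields and the stand-in `calculate_return`; the repository's scenario format and test harness are different.

```python
import json
from typing import List


def calculate_return(scenario: dict) -> dict:
    """Stand-in for the real calculation; toy numbers, not tax logic."""
    tax = max(scenario["wages"] - scenario["deduction"], 0) // 10
    return {"tax": tax, "refund": scenario["withholding"] - tax}


def check_scenario(scenario_json: str, expected_json: str) -> List[str]:
    """Return one readable line per field where actual and expected differ."""
    actual = calculate_return(json.loads(scenario_json))
    expected = json.loads(expected_json)
    return [
        f"{key}: expected {expected.get(key)!r}, got {actual.get(key)!r}"
        for key in sorted(set(actual) | set(expected))
        if actual.get(key) != expected.get(key)
    ]


scenario = '{"wages": 50000, "deduction": 10000, "withholding": 4500}'
snapshot = '{"tax": 4000, "refund": 500}'
print(check_scenario(scenario, snapshot))  # prints [] when nothing changed
```

Run in continuous integration over every saved scenario, a check like this turns an unintended change in any return's outcome into a failing build.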
Rather than just auto-generating data or having a limited set of people creating returns, the original set of ~60 scenario tests was developed by asking everyone on the team to manually build a tax return. This was intentional: I wanted to create messy human data. We split our team in fun ways, with designers creating returns as head of household, engineers as married, people with birthdays in October taking student loan deductions, and so on. I also offered an extra incentive to the team to step away from their already-busy jobs and develop test cases: once we reached 60, I promised to dump a bucket of ice water on my head over Zoom. Turns out, that really motivated people, and we developed most test cases in the course of one morning.
Here’s a photo of the ice bucket challenge result, which is oddly the second time my shower appears in this blog post:
Asking a diverse group of people to create test data revealed new bugs that we hadn’t previously found. In particular, this is how we discovered that we couldn’t support diacritical marks in a name (e.g. José). I had continually used the names “Alex” or “Test E Testerface” for testing in our scenarios (you can type that name with only your left hand on a QWERTY keyboard), but other members of our team had diacritical marks in their names, and this led us to these bugs. If we had a team where everyone looked like me and shared my background, we would have launched with more bugs.
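This post does not say where in the stack the diacritics bug lived, so the following Python sketch is only a generic illustration of how this class of bug tends to arise: an ASCII-only name check quietly rejects valid names, while a Unicode-aware check accepts them. Both helper functions and the allowed-punctuation set are assumptions for the example.

```python
import re
import unicodedata

# A pattern that looks reasonable if every test name is "Alex".
ASCII_ONLY = re.compile(r"[A-Za-z' -]+")


def ascii_only_ok(name: str) -> bool:
    """Accepts only unaccented Latin letters, spaces, apostrophes, hyphens."""
    return ASCII_ONLY.fullmatch(name) is not None


def unicode_ok(name: str) -> bool:
    """Accepts any Unicode letter, after normalizing combining marks (NFC)."""
    name = unicodedata.normalize("NFC", name)
    return bool(name) and all(ch.isalpha() or ch in " '-" for ch in name)


for name in ["Test E Testerface", "José", "Jose\u0301"]:
    print(repr(name), ascii_only_ok(name), unicode_ok(name))
```

The third name is "José" typed with a separate combining accent; normalizing first means both spellings are treated the same way.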
Epilogue
I could go on and on showing off more features of Direct File, and I’m excited for my former coworkers to show me the parts of the codebase I haven’t seen yet. There are also a million and a half things I would change about the codebase (I’ve never met a software engineer who’s gotten a codebase to prod without having a bunch of desired refactors). We continually improved it, and the team continued to improve it long after I left. You might find some messy pieces in the code. If Direct File lived on, they’d probably get cleaned up at some point. One of our team mottos, still immortalized on my laptop sticker, was that “The next version of Direct File is the best version of Direct File”.
It makes me sad that this is probably the last, and thus best, version of Direct File. Even with the code open source, Direct File’s power was in its presence on IRS.gov — an official, working tool that proved government can better serve the public. I’m proud to have spent most of my USDS tour contributing to that mission.