How to get up to speed on a legacy codebase

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the workflows category.

Last Updated: 2025-01-18

I'll demonstrate with examples from a JavaScript (TypeScript) codebase.

Ask to go into their office for a day and get a guided tour

When I started working on Project M, I had no idea where to start and wasted a lot of time. I should have gone to their office instead and asked them to show me around the code and answer my questions in real time.

Read their documentation (obviously)

When there's a README or wiki in place, great. But READMEs are often nonexistent because the knowledge is implicit, living only in the team's heads.

Look at the scripts for working with the entire codebase

In a JavaScript project, look at the scripts section of package.json:

"scripts": {
  "build": "rimraf ./lib && tsc",
  "postbuild": "pkg-ok",
  "docs": "typedoc",
  "lint": "tslint src/**/*.ts",
  "release": "semantic-release",
  "test": "yarn test:unit && yarn test:integration",
  "test:unit": "jest -c ./jest.unit.config.js",
  "test:integration": "jest -c ./jest.integration.config.js --runInBand",
  "seed": "ts-node src/seeders/index.js"
},

From this, we now know how to seed the data, run the integration tests, and lint. I also saw some commands I was unfamiliar with, like typeorm. Their presence here suggests they may be important parts of the architecture.

In other projects, a /bin or /scripts folder might offer something similar.
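One detail worth spotting in the scripts section is postbuild: npm (and Yarn 1) automatically run a script named pre<name> or post<name> around <name>, so yarn build also runs pkg-ok even though nothing calls it explicitly. The sketch below lists such implicit hooks from a parsed scripts object; findLifecycleHooks is a hypothetical helper, not something from the project.

```typescript
type Timing = "pre" | "post";

interface LifecycleHook {
  hook: string;   // e.g. "postbuild"
  target: string; // e.g. "build"
  timing: Timing;
}

// Hypothetical helper: find scripts named pre<name>/post<name> whose <name>
// also exists, i.e. scripts the package manager runs implicitly.
function findLifecycleHooks(scripts: Record<string, string>): LifecycleHook[] {
  const hooks: LifecycleHook[] = [];
  const timings: Timing[] = ["pre", "post"];
  for (const scriptName of Object.keys(scripts)) {
    for (const timing of timings) {
      const target = scriptName.slice(timing.length);
      const hasTarget = Object.prototype.hasOwnProperty.call(scripts, target);
      if (scriptName.startsWith(timing) && target !== "" && hasTarget) {
        hooks.push({ hook: scriptName, target, timing });
      }
    }
  }
  return hooks;
}

// Subset of the scripts section above.
const scriptsSection = {
  build: "rimraf ./lib && tsc",
  postbuild: "pkg-ok",
  docs: "typedoc",
  test: "yarn test:unit && yarn test:integration",
};
console.log(findLifecycleHooks(scriptsSection)); // one hook: postbuild runs after build
```

Checking ownership with hasOwnProperty (rather than the in operator) avoids false matches against inherited names such as constructor.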

Dependencies written by the same author

Look at the dependencies in package.json to spot any homegrown libraries they use:

  "dependencies": {
    "@middy/core": "^1.0.0-alpha.22",
    "@middy/http-cors": "^1.0.0-alpha.22",
    "@middy/ssm": "^1.0.0-beta.6",
    "auth-helper": "bitbucket:Project M/auth-service#1.1.0",
    "aws-sdk": "^2.532.0",
    "axios": "^0.18.0",
    "backend-logger": "bitbucket:Project M/logger-service#4.1.0",
    ...
  }

Here we see that auth-helper and backend-logger are installed straight from private Bitbucket repos (auth-service and logger-service) belonging to the client, Project M.
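To spot homegrown packages quickly in a long dependency list, you can filter for version specs that point at a git host rather than the npm registry. This is a rough heuristic sketch: findGitHostedDeps and its prefix list are assumptions, and it would miss other forms such as bare user/repo GitHub shorthand or tarball URLs.

```typescript
// Heuristic prefixes for dependency specs that point at a git host rather
// than the npm registry. "bitbucket:" is the form this project uses.
const GIT_SPEC_PREFIXES = ["bitbucket:", "github:", "gitlab:", "git+", "git://"];

// Hypothetical helper: names of dependencies installed from git hosts, sorted.
function findGitHostedDeps(deps: Record<string, string>): string[] {
  return Object.keys(deps)
    .filter((pkg) => GIT_SPEC_PREFIXES.some((prefix) => deps[pkg].startsWith(prefix)))
    .sort();
}

// Subset of the dependencies section above.
const dependencies = {
  "@middy/core": "^1.0.0-alpha.22",
  "auth-helper": "bitbucket:Project M/auth-service#1.1.0",
  "aws-sdk": "^2.532.0",
  "backend-logger": "bitbucket:Project M/logger-service#4.1.0",
};
console.log(findGitHostedDeps(dependencies)); // [ 'auth-helper', 'backend-logger' ]
```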

src and lib folders

I saw both a lib and a src folder with almost identical code. What purposes do they serve? Grepping through the top-level files led me to tsconfig.json, which sets lib as the outDir for the compiled output. So src must hold the code we write, and lib the generated code, which fits the naming. The build script above (rimraf ./lib && tsc) confirms it: lib is wiped and regenerated on every build.

{
  "compilerOptions": {
    ...
    "outDir": "lib"
  },
  "exclude": [
    "node_modules",
    "**/*.sdk.ts",
    "**/sdk.ts",
    "./lib"
  ]
}
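The src to lib relationship can be made concrete with a small mapping from a source file to its compiled output. This sketch assumes rootDir resolves to src (the excerpt only shows outDir) and ignores options such as declaration output or jsx; emittedPath is a hypothetical helper, not part of tsc.

```typescript
// Hypothetical helper: where tsc would emit a .ts source file, assuming
// rootDir = "src" (an assumption) and outDir = "lib" (from tsconfig.json).
function emittedPath(srcFile: string, rootDir = "src", outDir = "lib"): string {
  if (!srcFile.startsWith(rootDir + "/")) {
    throw new Error(`${srcFile} is not under ${rootDir}/`);
  }
  // Declaration files (.d.ts) produce no JavaScript output.
  if (!srcFile.endsWith(".ts") || srcFile.endsWith(".d.ts")) {
    throw new Error(`${srcFile} is not a compiled .ts source file`);
  }
  // Drop the "src/" prefix and the ".ts" extension, then re-root under outDir.
  const relative = srcFile.slice(rootDir.length + 1, -".ts".length);
  return `${outDir}/${relative}.js`;
}

console.log(emittedPath("src/entities/index.ts")); // lib/entities/index.js
```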

Find the outermost layer in their source code and go down that rabbit hole

In this TypeScript project, it was src/index.ts, a file that exports entities to the world outside the package.

From this starting point:

export * from './Database' // Me: "Now I need to look at what's in ./Database/index.ts"
export * from './entities'

So here is ./entities/index.ts:

export { Consultation } from './Consultations/Consultation'
export { ConsultationMessage } from './Consultations/ConsultationMessage'
export { ConsultationTopic } from './Consultations/ConsultationTopic'
export { CustomerMessage } from './Consultations/CustomerMessage'
...
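When an entry file is mostly re-exports, listing its module paths gives you a reading queue for the rabbit hole. The sketch below (reExportTargets is hypothetical) uses a regex, which only catches the simple export * from and export { ... } from forms; a thorough tool would use the TypeScript compiler API instead.

```typescript
// Hypothetical helper: collect the module specifiers of simple re-exports
// (`export * from '...'` and `export { A, B } from '...'`), in file order.
function reExportTargets(source: string): string[] {
  const pattern = /^\s*export\s+(?:\*|\{[^}]*\})\s+from\s+['"]([^'"]+)['"]/gm;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    targets.push(match[1]); // capture group 1 is the module path
  }
  return targets;
}

// The contents of src/index.ts shown above.
const srcIndex = `
export * from './Database'
export * from './entities'
`;
console.log(reExportTargets(srcIndex)); // [ './Database', './entities' ]
```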

Look for any seed/factory data in development/test environments

This gives you a good idea of the shape of actual data as well as clues as to the importance of different entities:

export const seedPurchases: Array<
  RelationAsId<Purchase, 'customer' | 'accountant' | 'product'>
> = [
  {
    autoRenewal: true,
    createdAt: new Date('2018-10-02T11:00:00.000Z'),
    customer: { id: 1 },
    id: 1,
    ipAddress: '1.1.1.1',
    accountant: { id: 1 },
    paid: true,
    paymentMethod: PaymentMethod.PayPal,
    product: { id: 1 },
    updatedAt: new Date('2018-10-02T11:00:00.000Z')
  },
  {
    autoRenewal: true,
    createdAt: new Date('2018-10-03T11:00:00.000Z'),
    customer: { id: 2 },
    id: 2,
    ipAddress: '1.1.1.1',
    accountant: { id: 1 },
    paid: true,
    paymentMethod: PaymentMethod.PayPal,
    product: { id: 2 },
    updatedAt: new Date('2018-10-03T11:00:00.000Z')
  },
  // ...
]
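In the two records above, both purchases point at accountant 1 while customers and products differ. Tallying references like this across a full seed file is a quick way to see which entities everything else hangs off. This is a sketch with simplified local types standing in for the project's Purchase and RelationAsId; referenceCounts is a hypothetical helper.

```typescript
// Simplified stand-ins for the project's Purchase/RelationAsId types.
type Ref = { id: number };
interface SeedPurchase {
  id: number;
  customer: Ref;
  accountant: Ref;
  product: Ref;
}
type Relation = "customer" | "accountant" | "product";

// Hypothetical helper: how many seed rows reference each related id.
function referenceCounts(rows: SeedPurchase[], relation: Relation): Map<number, number> {
  const counts = new Map<number, number>();
  for (const row of rows) {
    const id = row[relation].id;
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return counts;
}

// The relation ids from the two seed purchases above.
const seedRows: SeedPurchase[] = [
  { id: 1, customer: { id: 1 }, accountant: { id: 1 }, product: { id: 1 } },
  { id: 2, customer: { id: 2 }, accountant: { id: 1 }, product: { id: 2 } },
];
console.log(referenceCounts(seedRows, "accountant").get(1)); // 2
console.log(referenceCounts(seedRows, "customer").size);     // 2
```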

Read the tests - especially end-to-end & integration tests

import { getRepository, Purchase } from '../..'
import { setupDatabase } from '../../test/setupDatabase'

describe('Purchase', () => {
  setupDatabase()

  it('reads the correct count of seed data', async () => {
    expect(await getRepository(Purchase).count()).toEqual(4)
  })

  it('reads the first entry of seed data', async () => {
    expect(await getRepository(Purchase).findOne(1)).toEqual({
      autoRenewal: true,
      createdAt: expect.any(Date),
      id: 1,
      ipAddress: '1.1.1.1',
      paid: true,
      paymentMethod: 'PayPal',
      referredBy: null,
      updatedAt: expect.any(Date)
    })
  })
})

This shows that getRepository takes an entity class such as Purchase and returns a repository offering methods like count() and findOne().
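To make that inferred interface concrete, here is a hypothetical in-memory stand-in: getRepository looks up a repository by entity class and returns an object with count() and findOne(id). It mirrors only the calls seen in the test and is not how the project's getRepository (or any ORM) is actually implemented; the Purchase class here is a simplified placeholder.

```typescript
interface Entity {
  id: number;
}

// Hypothetical in-memory repository exposing only the two calls the test uses.
class InMemoryRepository<T extends Entity> {
  constructor(private readonly rows: T[]) {}

  async count(): Promise<number> {
    return this.rows.length;
  }

  async findOne(id: number): Promise<T | undefined> {
    return this.rows.find((row) => row.id === id);
  }
}

// Simplified placeholder entity, not the project's Purchase.
class Purchase implements Entity {
  constructor(public id: number, public paid: boolean) {}
}

// Registry keyed by entity class, so callers pass the class itself.
const registry = new Map<Function, InMemoryRepository<any>>([
  [Purchase, new InMemoryRepository([new Purchase(1, true), new Purchase(2, false)])],
]);

function getRepository<T extends Entity>(entity: new (...args: any[]) => T): InMemoryRepository<T> {
  const repo = registry.get(entity);
  if (!repo) throw new Error(`No repository registered for ${entity.name}`);
  return repo;
}

async function demo(): Promise<void> {
  const purchases = getRepository(Purchase);
  console.log(await purchases.count());            // 2
  console.log((await purchases.findOne(1))?.paid); // true
}
demo();
```

Keying the registry by class is what lets the test write getRepository(Purchase) without any string names.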

Setup and teardown helpers give clear hints about how the test environment gets prepared. Here we can see how seeding is switched on:

import { DatabaseEnvironment } from './databaseEnvironment'

export function setupDatabase (runSeeders?: boolean) {
  let environment: DatabaseEnvironment
  beforeAll(async () => {
    environment = new DatabaseEnvironment({ runSeeders })
    await environment.setup()
  })

  afterAll(async () => {
    await environment.teardown()
  })
}
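The same setup/teardown pairing can be sketched without Jest. FakeDatabaseEnvironment and withEnvironment below are hypothetical: the fake only records what it would do, and its runSeeders option mimics the one passed to DatabaseEnvironment. The try/finally plays the role of afterAll, guaranteeing teardown even when the body fails.

```typescript
interface Environment {
  setup(): Promise<void>;
  teardown(): Promise<void>;
}

// Hypothetical fake with the same shape as DatabaseEnvironment; it only logs.
class FakeDatabaseEnvironment implements Environment {
  readonly log: string[] = [];
  constructor(private readonly options: { runSeeders?: boolean }) {}

  async setup(): Promise<void> {
    this.log.push("connect");
    if (this.options.runSeeders) this.log.push("seed");
  }

  async teardown(): Promise<void> {
    this.log.push("disconnect");
  }
}

// Runner-agnostic version of the beforeAll/afterAll pairing: set up, run the
// body, and always tear down, even if the body throws.
async function withEnvironment<T>(env: Environment, body: () => Promise<T>): Promise<T> {
  await env.setup();
  try {
    return await body();
  } finally {
    await env.teardown();
  }
}

const seededEnv = new FakeDatabaseEnvironment({ runSeeders: true });
withEnvironment(seededEnv, async () => seededEnv.log.push("test")).then(() => {
  console.log(seededEnv.log.join(" -> ")); // connect -> seed -> test -> disconnect
});
```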

If there are no high-level tests like these, at least read the regular unit tests.

Check for environment dependencies (as opposed to pure library dependencies)

In this project, the docker-compose.yml file specifies a lot of this.

Here we see that Postgres is needed, and we also get the credentials required to get going. That's a good start.

version: "3"
services:
  test-microservice-db:
    image: postgres:11.5
    volumes:
      - ./db.tmp:/var/lib/pgsql/data:Z  # official postgres images keep data in /var/lib/postgresql/data
    ports:
      - "5432:5432"
    restart: always
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: G~UxaX7E
      POSTGRES_DB: database

The CircleCI configuration for the CI server reveals a lot too.

There was a huge amount of info here, but one part reveals an implicit expectation that $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY are set in whatever environment runs this job:

  createDocs:
    docker:
    - image: circleci/python:2.7
    working_directory: ~/repo/
    steps:
    - attach_workspace:
        at: .
    - run:
        name: Install AWS CLI
        command: |
          sudo pip install awscli
    - run:
        name: Create AWS credentials manually
        command: |
          mkdir ~/.aws
          touch ~/.aws/config
          chmod 600 ~/.aws/config
          echo "[profile eb-cli]" > ~/.aws/config
          echo "aws_access_key_id=$AWS_ACCESS_KEY_ID" >> ~/.aws/config
          echo "aws_secret_access_key=$AWS_SECRET_ACCESS_KEY" >> ~/.aws/config
    - run:
        name: Copy documentation to S3
        command: |
          aws s3 sync ~/repo/docs s3://Project M-dev-docs/$CIRCLE_PROJECT_REPONAME --delete
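Since the job quietly depends on these variables, a quick preflight check can save confusion if you try the same steps locally. CIRCLE_PROJECT_REPONAME is provided by CircleCI itself, so only the AWS pair has to be supplied. missingEnv is a hypothetical helper; in Node you would pass process.env as the second argument.

```typescript
// Hypothetical helper: names of required variables that are unset or empty.
function missingEnv(required: string[], env: Record<string, string | undefined>): string[] {
  return required.filter((varName) => !env[varName]);
}

// A plain object keeps the example self-contained; use process.env in Node.
const awsVars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"];
console.log(missingEnv(awsVars, { AWS_ACCESS_KEY_ID: "example" })); // [ 'AWS_SECRET_ACCESS_KEY' ]
```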