How to get up to speed on a legacy codebase

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the workflows category.

Last Updated: 2025-07-18

I'm going to demonstrate with an example from a JavaScript codebase.

Ask to go into their office for a day and get a guided tour

When I started working on Project M, I had no idea where to start and wasted a lot of time. Instead, I should have gone to their office and asked them to show me around the code and answer questions in real time.

(obviously) Read their Documentation

When there's a README or wiki in place, great. But often a README will be nonexistent because the knowledge is implicit.

Look at the scripts for working with the entire codebase

In a JavaScript project, look at the scripts section of package.json:

"scripts": {
  "build": "rimraf ./lib && tsc",
  "postbuild": "pkg-ok",
  "docs": "typedoc",
  "lint": "tslint src/**/*.ts",
  "release": "semantic-release",
  "test": "yarn test:unit && yarn test:integration",
  "test:unit": "jest -c ./jest.unit.config.js",
  "test:integration": "jest -c ./jest.integration.config.js --runInBand",
  "seed": "ts-node src/seeders/index.js"
},

From this, we now know how to seed the data, how to run integration tests, and how to lint. I also saw some commands I was unfamiliar with, like typeorm. Their presence here suggests they may be important parts of the architecture.

In other projects, there might be a /bin or /scripts folder that offers something similar.
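When the scripts section is long, it can help to print a summary that also shows which scripts call other scripts. The following is only a sketch: the function names are my own, and it assumes scripts chain to each other via `yarn <name>` or `npm run <name>`, as the `test` script above does.

```typescript
// Shape of the only package.json field this sketch cares about.
interface PackageJson {
  scripts?: Record<string, string>;
}

// Names of other known scripts that a command invokes via `yarn <name>` or `npm run <name>`.
function referencedScripts(command: string, known: string[]): string[] {
  const pattern = /(?:yarn(?:\s+run)?|npm\s+run)\s+([\w:.-]+)/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(command)) !== null) {
    const name = match[1];
    if (known.indexOf(name) !== -1 && found.indexOf(name) === -1) {
      found.push(name);
    }
  }
  return found;
}

// One line per script, sorted by name, noting any scripts it delegates to.
function summarizeScripts(pkg: PackageJson): string[] {
  const scripts = pkg.scripts ?? {};
  const names = Object.keys(scripts).sort();
  return names.map((name) => {
    const deps = referencedScripts(scripts[name], names);
    const suffix = deps.length > 0 ? ` (runs: ${deps.join(", ")})` : "";
    return `${name}: ${scripts[name]}${suffix}`;
  });
}

// A trimmed copy of the scripts section shown earlier.
const examplePackage: PackageJson = {
  scripts: {
    test: "yarn test:unit && yarn test:integration",
    "test:unit": "jest -c ./jest.unit.config.js",
    "test:integration": "jest -c ./jest.integration.config.js --runInBand",
  },
};

const scriptSummary = summarizeScripts(examplePackage);
```

In a real project you would parse the actual package.json and pass the result in, rather than an inline object.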

Look for dependencies written by the same author

Look at dependencies in package.json to see any homegrown libraries they use.

  "dependencies": {
    "@middy/core": "^1.0.0-alpha.22",
    "@middy/http-cors": "^1.0.0-alpha.22",
    "@middy/ssm": "^1.0.0-beta.6",
    "auth-helper": "bitbucket:Project M/auth-service#1.1.0",
    "aws-sdk": "^2.532.0",
    "axios": "^0.18.0",
    "backend-logger": "bitbucket:Project M/logger-service#4.1.0",
    ...
  }

Here we see that auth-helper and backend-logger both point to private Bitbucket repos (auth-service and logger-service) belonging to the client, Project M.
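With a long dependency list, you can filter for entries whose version spec points at a Git host instead of the npm registry. This is a rough sketch with names of my own choosing. It only checks for the prefixes I know npm accepts for Git dependencies (`bitbucket:`, `github:`, `gitlab:`, `git+...` and `git://`), so it will miss other forms, such as the bare `user/repo` GitHub shorthand.

```typescript
// Version specs that start with one of these point at a Git host, not the registry.
const GIT_SPEC = /^(?:bitbucket:|github:|gitlab:|git\+|git:\/\/)/;

// Return the dependencies that are pulled straight from a Git repository.
function findGitHostedDependencies(
  dependencies: Record<string, string>
): Array<{ name: string; source: string }> {
  return Object.keys(dependencies)
    .filter((name) => GIT_SPEC.test(dependencies[name]))
    .map((name) => ({ name, source: dependencies[name] }));
}

// A trimmed copy of the dependencies shown above.
const sampleDependencies: Record<string, string> = {
  "@middy/core": "^1.0.0-alpha.22",
  "auth-helper": "bitbucket:Project M/auth-service#1.1.0",
  "aws-sdk": "^2.532.0",
  "backend-logger": "bitbucket:Project M/logger-service#4.1.0",
};

// Only auth-helper and backend-logger come from a Git host.
const homegrown = findGitHostedDependencies(sampleDependencies);
```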

`src` folder

I saw both a lib and a src folder with almost identical code. What purpose does each serve? Grepping through the top-level files in the codebase led me to a tsconfig.json, which says that lib is the outDir for the generated code. Thus src must be the code we write. That makes sense given the names.

{
  "compilerOptions": {
    ...
    "outDir": "lib"
  },
  "exclude": [
    "node_modules",
    "**/*.sdk.ts",
    "**/sdk.ts",
    "./lib"
  ]
}
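Once you know which folder is generated, you can skip it when searching or reading. Here is a small sketch of that check. The function names are mine, and it assumes the tsconfig has already been parsed into an object (plain `JSON.parse` may choke on a tsconfig that contains comments).

```typescript
// The parts of tsconfig.json this sketch looks at.
interface TsConfig {
  compilerOptions?: { outDir?: string };
  exclude?: string[];
}

// Strip a leading "./" and any trailing "/" so "./lib" and "lib/" compare equal.
function normalizePath(path: string): string {
  return path.replace(/^\.\//, "").replace(/\/+$/, "");
}

// True when filePath lives inside the compiler's output directory.
function isGeneratedFile(filePath: string, config: TsConfig): boolean {
  const outDir = config.compilerOptions?.outDir;
  if (!outDir) return false;
  const dir = normalizePath(outDir);
  const file = normalizePath(filePath);
  return file === dir || file.startsWith(dir + "/");
}

// Mirrors the tsconfig.json excerpt above.
const projectConfig: TsConfig = {
  compilerOptions: { outDir: "lib" },
  exclude: ["node_modules", "**/*.sdk.ts", "**/sdk.ts", "./lib"],
};

const libIsGenerated = isGeneratedFile("./lib/entities/index.js", projectConfig);
const srcIsGenerated = isGeneratedFile("src/index.ts", projectConfig);
```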

Find the outermost layer in their source code and go down that rabbit hole

In this JavaScript project, it was src/index.ts, a file which exports entities to the world outside the package.

From this starting point:

export * from './Database' // Me: "Now I need to look at what is in the ./Database/index.js file"
export * from './entities'

So here is ./entities/index.js

export { Consultation } from './Consultations/Consultation'
export { ConsultationMessage } from './Consultations/ConsultationMessage'
export { ConsultationTopic } from './Consultations/ConsultationTopic'
export { CustomerMessage } from './Consultations/CustomerMessage'
...
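Following re-exports by hand gets tedious, so here is a sketch of how you might list the files an index file points to. It uses a regular expression rather than a real parser, so treat it as a quick aid only. The function name is my own, and it handles just the `export * from` and `export { ... } from` forms seen above.

```typescript
// Collect the module paths that an index ("barrel") file re-exports from.
function reExportTargets(source: string): string[] {
  const pattern = /export\s+(?:\*|\{[^}]*\})\s+from\s+['"]([^'"]+)['"]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Skip duplicates so each file is only listed once.
    if (targets.indexOf(match[1]) === -1) targets.push(match[1]);
  }
  return targets;
}

// The contents of src/index.ts as shown above.
const indexSource = [
  "export * from './Database'",
  "export * from './entities'",
].join("\n");

// The next files to read.
const nextFiles = reExportTargets(indexSource);
```

Running the same function on each file it returns lets you walk down from the entry point one layer at a time.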

Look for any seed/factory data in development/test environments

This gives you a good idea of the shape of actual data as well as clues as to the importance of different entities:

export const seedPurchases: Array<
  RelationAsId<Purchase, 'customer' | 'accountant' | 'product'>
> = [
  {
    autoRenewal: true,
    createdAt: new Date('2018-10-02T11:00:00.000Z'),
    customer: { id: 1 },
    id: 1,
    ipAddress: '1.1.1.1',
    accountant: { id: 1 },
    paid: true,
    paymentMethod: PaymentMethod.PayPal,
    product: { id: 1 },
    updatedAt: new Date('2018-10-02T11:00:00.000Z')
  },
  {
    autoRenewal: true,
    createdAt: new Date('2018-10-03T11:00:00.000Z'),
    customer: { id: 2 },
    id: 2,
    ipAddress: '1.1.1.1',
    accountant: { id: 1 },
    paid: true,
    paymentMethod: PaymentMethod.PayPal,
    product: { id: 2 },
    updatedAt: new Date('2018-10-03T11:00:00.000Z')
  },
  ...
]
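One rough way to get those clues about importance is to count how often each related record is referenced in the seed data. The sketch below is my own illustration, not code from the project. It assumes relations are written as `{ id: number }` objects, as in the seed rows above, and uses a trimmed copy of those rows.

```typescript
// A seed row is treated as a loose bag of fields for this sketch.
type SeedRow = { [field: string]: unknown };

// Count how many rows reference each id of the given relation.
function countReferences(rows: SeedRow[], relation: string): Map<number, number> {
  const counts = new Map<number, number>();
  for (const row of rows) {
    const ref = row[relation] as { id?: number } | undefined;
    if (ref && typeof ref.id === "number") {
      counts.set(ref.id, (counts.get(ref.id) ?? 0) + 1);
    }
  }
  return counts;
}

// Trimmed versions of the two seed purchases shown above.
const sampleRows: SeedRow[] = [
  { id: 1, customer: { id: 1 }, accountant: { id: 1 }, product: { id: 1 } },
  { id: 2, customer: { id: 2 }, accountant: { id: 1 }, product: { id: 2 } },
];

// Both purchases point at accountant 1, while each customer appears once.
const accountantCounts = countReferences(sampleRows, "accountant");
const customerCounts = countReferences(sampleRows, "customer");
```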

Read the tests - especially end-to-end & integration tests

import { getRepository, Purchase } from '../..'
import { setupDatabase } from '../../test/setupDatabase'

describe('Purchase', () => {
  setupDatabase()

  it('reads the correct count of seed data', async () => {
    expect(await getRepository(Purchase).count()).toEqual(4)
  })

  it('reads the first entry of seed data', async () => {
    expect(await getRepository(Purchase).findOne(1)).toEqual({
      autoRenewal: true,
      createdAt: expect.any(Date),
      id: 1,
      ipAddress: '1.1.1.1',
      paid: true,
      paymentMethod: 'PayPal',
      referredBy: null,
      updatedAt: expect.any(Date)
    })
  })
})

This shows us that the getRepository function takes an entity such as Purchase and returns a repository with methods such as count() and findOne().

Setup and teardown methods give clear hints about what is needed to get the environment set up. We see how to seed here!

import { DatabaseEnvironment } from './databaseEnvironment'

export function setupDatabase (runSeeders?: boolean) {
  let environment: DatabaseEnvironment
  beforeAll(async () => {
    environment = new DatabaseEnvironment({ runSeeders })
    await environment.setup()
  })

  afterAll(async () => {
    await environment.teardown()
  })
}

If there are no high-level tests like this, then at least read the regular unit tests.

Check for environment dependencies (as opposed to the purely library dependencies)

In this project, the docker-compose.yml file tells you a lot.

Here we see that Postgres is needed, and we also get the credentials required to get going. That's a good start.

version: "3"
services:
  test-microservice-db:
    image: postgres:11.5
    volumes:
      - ./db.tmp:/var/lib/pgsql/data:Z
    ports:
      - "5432:5432"
    restart: always
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: G~UxaX7E
      POSTGRES_DB: database
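From those values you can put together a connection URL to try with a database client. This is a sketch under assumptions: the interface and function names are mine, and the host is assumed to be localhost because the compose file maps port 5432 to the host machine.

```typescript
// Settings read off the docker-compose.yml environment and ports sections.
interface PostgresSettings {
  user: string;
  password: string;
  database: string;
  host: string;
  port: number;
}

// Build a postgres:// URL, percent-encoding parts that may contain special characters.
function postgresUrl(s: PostgresSettings): string {
  const user = encodeURIComponent(s.user);
  const password = encodeURIComponent(s.password);
  const database = encodeURIComponent(s.database);
  return `postgres://${user}:${password}@${s.host}:${s.port}/${database}`;
}

const localDbUrl = postgresUrl({
  user: "user",
  password: "G~UxaX7E",
  database: "database",
  host: "localhost", // assumption: the container's port is published on the host
  port: 5432,
});
```

Encoding matters when a password contains characters such as `@` or `/`, which would otherwise break the URL.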

The CircleCI configuration for the CI server reveals a lot too.

There was a huge amount of information here, but one part reveals an implicit expectation that the developer's environment has $AWS_ACCESS_KEY_ID set:

  createDocs:
    docker:
    - image: circleci/python:2.7
    working_directory: ~/repo/
    steps:
    - attach_workspace:
        at: .
    - run:
        name: Install AWS CLI
        command: |
          sudo pip install awscli
    - run:
        name: Create AWS credentials manually
        command: |
          mkdir ~/.aws
          touch ~/.aws/config
          chmod 600 ~/.aws/config
          echo "[profile eb-cli]" > ~/.aws/config
          echo "aws_access_key_id=$AWS_ACCESS_KEY_ID" >> ~/.aws/config
          echo "aws_secret_access_key=$AWS_SECRET_ACCESS_KEY" >> ~/.aws/config
    - run:
        name: Copy documentation to S3
        command: |
          aws s3 sync ~/repo/docs s3://Project M-dev-docs/$CIRCLE_PROJECT_REPONAME --delete
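To catch expectations like this more systematically, you can scan a config file for `$VARIABLE` references and compare them with what is set in your environment. The sketch below uses function names of my own and a simple regular expression for uppercase shell-style variables. Some hits will be variables the CI server provides itself, such as `$CIRCLE_PROJECT_REPONAME`.

```typescript
// Find uppercase shell-style variable references such as $FOO or ${FOO}.
function referencedEnvVars(configText: string): string[] {
  const pattern = /\$\{?([A-Z_][A-Z0-9_]*)\}?/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(configText)) !== null) {
    if (names.indexOf(match[1]) === -1) names.push(match[1]);
  }
  return names;
}

// Return the names that are unset or empty in the given environment.
function missingEnvVars(
  names: string[],
  env: Record<string, string | undefined>
): string[] {
  return names.filter((name) => !env[name]);
}

// A few lines from the CircleCI excerpt above.
const ciConfigText = [
  'echo "aws_access_key_id=$AWS_ACCESS_KEY_ID" >> ~/.aws/config',
  'echo "aws_secret_access_key=$AWS_SECRET_ACCESS_KEY" >> ~/.aws/config',
  "aws s3 sync ~/repo/docs s3://Project M-dev-docs/$CIRCLE_PROJECT_REPONAME --delete",
].join("\n");

const expectedVars = referencedEnvVars(ciConfigText);

// Hypothetical environment in which only the access key id is set.
const stillMissing = missingEnvVars(expectedVars, { AWS_ACCESS_KEY_ID: "example" });
```

In Node you would pass `process.env` as the second argument instead of the hypothetical object used here.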
