Migrating a large Excel use case to our platform

Context

A pretty cool moment for our platform was securing a partnership with a major Dutch government agency in the infrastructure sector. The task was large: migrate highly complex compliance data for over 150 of their clients from intricate Excel spreadsheets into our live web application. Each client's data was provided by multiple assessors, included numerous evidence files, and was critical to their operations.

To be honest, this was not just a "technical task"; it was a high-stakes project to prove our platform's capability, solidify a key partnership, and save hundreds of hours of manual work. A successful migration would put both our team and our partner at ease, and it would finally let them use our platform with their own data, as intended.

For confidentiality, I’ll refer to the client as the "Agency," but the core challenge is a common one in the enterprise world: bridging the gap between legacy data formats (like Excel) and modern web platforms, at scale.

What I Built

I developed an end-to-end, automated data migration pipeline that could:

  1. Parse Complex Excel Files: Ingest and interpret multi-sheet .xlsm files containing nested data structures and varied formatting.
  2. Programmatically Upload Files: Reverse-engineer our platform's S3 file upload mechanism to handle evidence files (images, PDFs, etc.) just like a real user would.
  3. Automate Form Submissions: Use our platform's GraphQL API to create workflows, fill out complex forms, and associate the uploaded evidence.
  4. Run in Multiple Environments: Operate seamlessly across testing and production environments with different configurations and IDs.
  5. Benchmark Performance: Track and report on execution time to estimate the total duration for the full-scale migration.
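
To give a sense of how these five pieces fit together, here is a minimal sketch of the orchestration loop. The helper names (parseClientFolder, uploadEvidence, submitAssessorForms) are hypothetical stand-ins, not the actual functions in the migration script:

import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-ins for the real modules; Team is the interface from the data-model section below.
declare function parseClientFolder(dir: string): Team;
declare function uploadEvidence(team: Team, tenantId: string): Promise<string[]>;
declare function submitAssessorForms(team: Team, evidenceIds: string[], tenantId: string): Promise<void>;

async function migrateAll(rootDir: string, tenantId: string): Promise<void> {
  // One folder per client ("team"), exactly as delivered by the Agency
  const clientDirs = fs
    .readdirSync(rootDir)
    .filter((entry) => fs.statSync(path.join(rootDir, entry)).isDirectory());

  for (const dir of clientDirs) {
    const team = parseClientFolder(path.join(rootDir, dir));  // 1. parse the .xlsm + evidence folder
    const evidenceIds = await uploadEvidence(team, tenantId); // 2. push evidence files to S3
    await submitAssessorForms(team, evidenceIds, tenantId);   // 3. fill forms via the GraphQL API
  }
}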

Deep Dive: Parsing and Mapping Complex Excel Data

The Real-World Data Model

The Agency provided a directory of folders, one per client (or "team").

Inside each folder was the data for that client's three assessors ("Undertaking", "Initial Approval", "Oversight"), each with their own set of answers and evidence files. The data model I built to represent a client looked like this:

interface OrganizationData {
  organizationSlug: string;
  teams: Team[];
}

interface Team {
  name: string;
  uniqueId: string;
  type_of_organization: string;
  small: boolean;
  assessors: Assessor[];
}

interface Assessor {
  name: string;
  evidence: string[];
  backdatedCreatedAt?: string;
  formData: any[];
}

Excel Parsing: Edge Cases and Lessons

Parsing the Excel files was far from trivial. Here are some of the real-world challenges I faced:

Example: Dutch Date Parsing

function parseDutchDate(dateStr: string): Date | null {
  // Handles formats like "12-mrt-24" and "12/13-mrt-24"
  // ... see code for full implementation
}
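
The full implementation lives in the migration script, but a rough sketch of the approach looks like this. The month table and the choice to keep the last day of a range like "12/13-mrt-24" are my assumptions for illustration, not the production logic:

// Sketch only: month abbreviations and range handling are assumptions, not the exact production code.
const DUTCH_MONTHS: Record<string, number> = {
  jan: 0, feb: 1, mrt: 2, apr: 3, mei: 4, jun: 5,
  jul: 6, aug: 7, sep: 8, okt: 9, nov: 10, dec: 11,
};

function parseDutchDateSketch(dateStr: string): Date | null {
  // "12-mrt-24" or "12/13-mrt-24" -> take the last day mentioned before the month
  const match = dateStr.trim().match(/^(\d{1,2})(?:\/(\d{1,2}))?-([a-z]{3})-(\d{2})$/i);
  if (!match) return null;
  const day = Number(match[2] ?? match[1]);
  const month: number | undefined = DUTCH_MONTHS[match[3].toLowerCase()];
  if (month === undefined) return null;
  return new Date(2000 + Number(match[4]), month, day);
}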

Example: Multi-Assessor Data Extraction

const assessors: Assessor[] = [
  {
    name: "Undertaking",
    evidence: evidenceFiles,
    formData: createAssessorFormData(
      dataSheet,
      importSheet,
      "Undertaking",
      referenceNumbers,
    ),
  },
  // ... Initial Approval, Oversight
];
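
Most of the mapping work happens inside createAssessorFormData. Below is a heavily simplified sketch of the idea, assuming the sheet has already been read into rows and that each assessor's answers live in their own column; the column indices and the output shape are illustrative, not the real layout:

// Sketch: assumes one answer column per assessor and reference numbers in column 0;
// the real sheet layout and field mapping are considerably more involved.
type SheetRow = Array<string | number | null>;

const ASSESSOR_COLUMNS: Record<string, number> = {
  Undertaking: 3, // hypothetical column indices
  "Initial Approval": 4,
  Oversight: 5,
};

function createAssessorFormDataSketch(
  rows: SheetRow[],
  assessorName: string,
  referenceNumbers: string[],
): Array<{ reference: string; answer: string }> {
  const column = ASSESSOR_COLUMNS[assessorName];
  return rows
    .filter((row) => referenceNumbers.includes(String(row[0])))
    .map((row) => ({
      reference: String(row[0]),
      answer: String(row[column] ?? ""),
    }));
}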

Example: Evidence File Discovery

const evidenceFiles = fs
  .readdirSync(documentsPath)
  .filter((file) =>
    [
      ".pdf",
      ".xlsx",
      ".xls",
      ".doc",
      ".docx",
      ".webp",
      ".jpg",
      ".jpeg",
      ".gif",
      ".csv",
      ".txt",
    ].some((ext) => file.endsWith(ext)),
  )
  .map((file) => path.join(teamDir, "Documents", file));

Automating File Uploads: S3, MIME Types, and Error Handling

Reverse-Engineering the Upload Flow

The platform used a three-step S3 upload process, but there was no documentation for it. I used browser DevTools to trace the network requests the web app made during a manual upload, then replicated that exact sequence from the script.
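
For orientation, the sketch below shows a typical presigned-upload pattern of the kind I reconstructed; the mutation names, payloads, and helper functions here are illustrative placeholders, not the platform's actual API:

import * as fs from "node:fs";
import * as path from "node:path";

// graphqlRequest is a hypothetical wrapper around the platform's GraphQL endpoint;
// getMimeType is the helper shown in the next section.
declare function graphqlRequest(operation: string, variables: Record<string, unknown>): Promise<any>;
declare function getMimeType(filePath: string): string;

async function uploadEvidenceFile(filePath: string, tenantId: string): Promise<string> {
  const fileName = path.basename(filePath);
  const contentType = getMimeType(filePath);

  // Step 1: ask the backend for a presigned S3 URL (illustrative mutation name)
  const { uploadUrl, fileId } = await graphqlRequest("requestPresignedUpload", {
    tenantId,
    fileName,
    contentType,
  });

  // Step 2: PUT the raw bytes straight to S3 using the presigned URL
  await fetch(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": contentType },
    body: await fs.promises.readFile(filePath),
  });

  // Step 3: confirm the upload so the platform associates the file with the record
  await graphqlRequest("confirmUpload", { tenantId, fileId });

  return fileId;
}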

Handling MIME Types and File Validation

Uploading files at scale meant handling a wide variety of file types and edge cases:

Example: MIME Type Detection

function getMimeType(filePath: string): string {
  const extension = path.extname(filePath).toLowerCase();
  // Extension-to-MIME map (abridged); unknown extensions fall back to a generic binary type
  const mimeTypes: Record<string, string> = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    // ... remaining accepted extensions
  };
  return mimeTypes[extension] || "application/octet-stream";
}

Example: Throttled Uploads

async function throttledUpload(
  file: string,
  delayMs: number = 10,
): Promise<any> {
  // ... upload logic
  await new Promise((resolve) => setTimeout(resolve, delayMs));
}
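
Applied across a client's evidence folder, the throttle amounts to a sequential loop with a short pause between uploads, trading raw speed for predictable load on the platform. A small usage sketch (uploadEvidenceFile is the hypothetical helper from the earlier S3 sketch):

// Sketch: upload evidence one file at a time so the platform never sees a burst
// of concurrent requests; the 10 ms pause mirrors the default delay above.
async function uploadEvidenceSequentially(files: string[], tenantId: string): Promise<string[]> {
  const fileIds: string[] = [];
  for (const file of files) {
    fileIds.push(await uploadEvidenceFile(file, tenantId));
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
  return fileIds;
}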

Example: Error Handling

try {
  await fs.promises.access(filePath, fs.constants.R_OK);
} catch (error) {
  throw new Error(`File not accessible: ${filePath}. Error: ${error.message}`);
}

Dynamic, Multi-Environment Automation

Test vs. Production: No Hardcoding Allowed

The migration had to run on both a test account and the real production environment. Each had different tenant IDs, workflow IDs, and team IDs. To make the script portable, nothing environment-specific could be hardcoded; instead, IDs were resolved at runtime, as in the workflow lookup below.

Example: Dynamic Workflow Lookup

const availableWorkflows = await availableWorkflowsForUser(tenantId);
const workflowCollectionId = availableWorkflows[0].workflowCollectionId;
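
In practice that meant resolving everything environment-specific at startup instead of baking it in. A sketch of the pattern, with illustrative environment names, endpoints, and variable names (the real configuration differs):

// Sketch: pick the target environment via an environment variable; the endpoints
// and variable names here are placeholders, not the real configuration.
interface EnvironmentConfig {
  apiUrl: string;
  tenantId: string;
}

const ENVIRONMENTS: Record<string, EnvironmentConfig> = {
  test: { apiUrl: "https://test.example.com/graphql", tenantId: process.env.TEST_TENANT_ID ?? "" },
  production: { apiUrl: "https://app.example.com/graphql", tenantId: process.env.PROD_TENANT_ID ?? "" },
};

const config = ENVIRONMENTS[process.env.MIGRATION_ENV ?? "test"];
if (!config) {
  throw new Error(`Unknown environment: ${process.env.MIGRATION_ENV}`);
}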

Performance Benchmarking and Reporting

Why It Mattered

With hundreds of clients and thousands of files, knowing how long the migration would take was critical for planning and client communication. I built a timing metrics module to track per-client execution time and estimate the total duration of the full-scale migration.

Example: Timing Metrics

const metrics = {
  startTime: Date.now(),
  clientsProcessed: 0,
  clientTimings: [],
  endTime: null,
};
// ... see code for full implementation
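
From those fields the script can report progress and extrapolate the remaining time from the running average. A simplified sketch, assuming clientTimings holds per-client durations in milliseconds:

// Sketch: extrapolate the remaining migration time from the average time per client so far.
interface MigrationMetrics {
  startTime: number;
  clientsProcessed: number;
  clientTimings: number[]; // per-client duration in ms (an assumption about the field's shape)
  endTime: number | null;
}

function estimateRemainingMs(metrics: MigrationMetrics, totalClients: number): number {
  const elapsed = metrics.clientTimings.reduce((sum, t) => sum + t, 0);
  const averagePerClient = elapsed / Math.max(metrics.clientsProcessed, 1);
  return averagePerClient * (totalClients - metrics.clientsProcessed);
}

// e.g. 10 clients in 20 minutes -> ~2 minutes per client -> ~280 minutes for the remaining 140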

Lessons Learned: Requirements, Communication, and Scope

The Hidden Complexity of "Obvious" Requirements

One of the biggest surprises was how much detail was hidden in the "obvious" parts of the requirements: which file types counted as evidence, how Dutch dates (and date ranges) should be interpreted, and how answers mapped to each of the three assessors.

The Importance of Asking (and Re-asking) Questions

I learned to ask clarifying questions early, and to keep re-asking them whenever the data surfaced a new edge case.

Scope Creep and Change Management

Even after the initial migration logic was built, new requirements kept emerging.

I built the script to be as flexible as possible, but also learned to communicate clearly about what changes would require rework.


Real-World Impact and Next Steps

What This Enabled

What I'd Improve Next


Key Takeaways

This project taught me a handful of realities about enterprise-scale data migration. It took a mix of meticulous planning, technical reverse-engineering, and constant communication. It showed me that automation, even in the form of a purpose-built script, can provide immense value, strengthen partnerships, and serve as a powerful sales and retention tool. And it showed me that sometimes the most complex technical challenges are best solved by first asking the right human questions.