[Migration Hell] Complete Record of 6-Month GCP Migration Disaster: From 20-Year Physical Network to Cloud


Prologue: The Naive View of Cloud Migration

Wednesday, January 15, 2025, 10:00 AM

“Migrating from physical network to cloud? That’s easy. Create a VPC, set up subnets, spin up VM instances. Three months is more than enough time.”

I confidently declared this in front of the executive team. Little did I know that I would witness six months of hell following that statement.

This is the blood, sweat, and tears record of a disastrous attempt to migrate a 20-year legacy physical network to GCP.

Chapter 1: “20 Years of Technical Debt” - The Reality

Migration Target: Unimaginably Complex Legacy Environment

What we were attempting to migrate was a physical network that had been continuously expanded over 20 years since the company’s founding.

Head Office Configuration (280 employees):

Internet (Multiple Lines)
[Ancient Cisco Router] 192.168.1.1 (Manufactured in 2005)
[L2 Switch Cascade]
 ├─ Sales Dept: 192.168.10.0/24 (50 devices)
 ├─ Development Dept: 192.168.20.0/24 (80 devices)  
 ├─ Accounting Dept: 192.168.30.0/24 (30 devices)
 ├─ HR Dept: 192.168.40.0/24 (40 devices)
 ├─ Servers: 192.168.100.0/24 (15 devices)
 └─ Printers/Others: 192.168.200.0/24 (35 devices)

3 Branch Offices:

  • Osaka Branch (50 employees)
  • Fukuoka Branch (30 employees)
  • Sapporo Branch (25 employees)

2 Manufacturing Plants:

  • Chiba Plant (Production Line Control Systems)
  • Gunma Plant (Quality Management Systems)

“Just move this to the cloud, right? Piece of cake!”

January 16: First Shock

When we actually started detailed investigation of the existing environment, I was horrified.

Discovered Problem Categories:

1. Chaotic Network Design

# Actual routing table
ip route show
192.168.10.0/24 via 192.168.1.10  # Sales (Why different gateway?)
192.168.20.0/24 via 192.168.1.1   # Development (Standard)
192.168.30.0/24 via 192.168.1.15  # Accounting (Different again)
192.168.40.0/24 via 192.168.1.1   # HR (Standard)
192.168.100.0/24 via 192.168.1.20 # Servers (Different again)

“Why does each department have a different gateway…?”

2. Legacy System Dependencies

Production Line Control System (Chiba Plant):

  • OS: Windows Server 2003 (End-of-life)
  • Application: Custom software written in Visual Basic 6.0
  • Communication: Hardcoded to specific IP address (192.168.100.5)
  • Vendor: Bankrupt 10 years ago

Quality Management System (Gunma Plant):

  • OS: Red Hat Enterprise Linux 4 (Ancient)
  • Database: Oracle 9i (Also ancient)
  • Network: Only works with fixed IP addresses
  • Documentation: None

“Can we even migrate this…?”

3. Mysterious Custom Systems

Over twenty years, a succession of since-retired IT staff had created numerous “secret sauce” systems scattered throughout the infrastructure.

# Discovered mysterious cron jobs
crontab -l
0 2 * * * /home/legacy/mystery_sync.sh >> /dev/null 2>&1
30 3 * * * /usr/local/bin/old_backup.pl 192.168.100.10

Contents of mystery_sync.sh:

#!/bin/bash
# Creator unknown, creation date unknown
# Don't know what it does, but something breaks if we stop it

rsync -av 192.168.100.5:/mysterious/data/ /backup/  # runs on 192.168.100.10 (rsync can't copy remote-to-remote)
if [ $? -eq 0 ]; then
    echo "Success" | mail -s "Daily Sync" someone@company.com
else
    # Do nothing on failure (why?)
    exit 0
fi

“Who created this…? What is this script for…?”

January 20: Submission of Migration Plan

Report to supervisor:

“After investigating the target systems, they’re more complex than expected. I’d like to extend the migration period to 6 months.”

Supervisor: “What are you talking about? You promised 3 months. We need to start the new fiscal year with the new system on April 1st.”

“But the legacy systems are…”

Supervisor: “Stop making excuses. Figure out how to make it work.”

My desperate decision under pressure: a “forklift” migration — everything moved at once, as-is.

Chapter 2: February Optimism - The Small-Scale Trap

Migration Strategy: Phased Approach

Determined to attempt at least a phased migration, I drew up the following plan:

Phase 1 (February): Sapporo Branch (25 people, minimal configuration)
Phase 2 (March): Osaka & Fukuoka Branches (80 people)
Phase 3 (Late March): Head Office (280 people)
Phase 4 (April): Plant Systems (Ultimate challenge)

“Starting small should mean small problems.”

February 5: Sapporo Branch Migration Begins

Sapporo Branch Existing Configuration:

[Router] 192.168.1.1
└─[Switch] 
   ├─ PCs: 192.168.1.100-125 (25 devices)
   ├─ Server: 192.168.1.10 (File server)
   └─ Printers: 192.168.1.200-202 (3 devices)

GCP Post-Migration Design:

Project: company-sapporo
VPC: sapporo-vpc (10.1.0.0/16)
Subnets:
  - office-subnet: 10.1.1.0/24
    - Compute Engine: 10.1.1.10-50
  - server-subnet: 10.1.2.0/24  
    - File Server VM: 10.1.2.10

“Simple and clean design. This should be quick.”

February 10: First Landmine

On day 5 of the migration work, the first major problem struck.

9 AM emergency call from Sapporo Branch:

“We can’t access the file server! We can’t work!”

Investigation revealed the problem:

1. Active Directory Domain Dependencies

The Sapporo file server was dependent on the Tokyo head office Active Directory domain controller.

Tokyo HQ DC: 192.168.100.10 (Physical)
    ↓ (Domain authentication via VPN)
Sapporo File Server: 192.168.1.10 → 10.1.2.10 (Migrated to GCP)

Problems:

  • GCP file server cannot connect to physical DC (192.168.100.10)
  • VPN configuration doesn’t support Active Directory authentication
  • No authentication = all users cannot access files

2. Fixed IP Address Dependencies

# Shared drives configured on each PC in Sapporo branch
net use Z: \\192.168.1.10\shared /persistent:yes

Problems:

  • All PCs have shared drives hardcoded to 192.168.1.10
  • Need to change to new IP (10.1.2.10) on 25 devices manually
  • No DNS in place, so names could not be resolved (see the sketch just below)
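
In hindsight, a private Cloud DNS zone would have removed most of the hardcoded-IP pain. A minimal sketch, assuming a hypothetical zone name company-internal serving the company.local domain:

# Private zone visible only from the Sapporo VPC
gcloud dns managed-zones create company-internal \
    --dns-name="company.local." \
    --visibility=private \
    --networks=sapporo-vpc \
    --description="Internal names for migrated servers"

# Point fileserver.company.local at the migrated file server
gcloud dns record-sets create fileserver.company.local. \
    --zone=company-internal \
    --type=A --ttl=300 \
    --rrdatas=10.1.2.10

With that in place, each PC maps the drive by name (net use Z: \\fileserver.company.local\shared), and any later IP change becomes a one-line DNS update instead of 25 manual visits.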

3. Bizarrely Complex Printer Dependencies

# Mysterious printer management system
cat /etc/cups/printers.conf

Incredibly, the Sapporo printer configuration was managed by a printer management server in Tokyo (192.168.100.15).

Dependency chain:

Sapporo PC → printer configuration request
    ↓
Tokyo Printer Management Server (192.168.100.15)
    ↓
Configuration downloaded → Sapporo PC
    ↓
Print executed on Sapporo Printer (192.168.1.200)

“Why do Sapporo printers depend on a Tokyo server…?”

February 12: Emergency Recovery Work

Weekend emergency response:

# 1. Emergency VPN setup
gcloud compute vpn-gateways create sapporo-to-tokyo \
    --region=asia-northeast1 \
    --network=sapporo-vpc

# 2. Temporary DNS configuration
echo "10.1.2.10 fileserver.company.local" >> /etc/hosts

# 3. Change settings on all PCs manually (25 devices)
# Run on each PC:
net use Z: /delete
net use Z: \\10.1.2.10\shared
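
One note for anyone reading along: creating the VPN gateway alone does not establish connectivity. An HA VPN also needs a Cloud Router, a definition of the peer (on-premises) side, and the tunnel itself. A rough sketch of the remaining steps, where the Tokyo router's public IP (203.0.113.10) and the ASN are placeholders:

# Cloud Router to run BGP over the tunnel
gcloud compute routers create sapporo-router \
    --region=asia-northeast1 \
    --network=sapporo-vpc \
    --asn=65001

# Describe the on-premises (Tokyo) endpoint
gcloud compute external-vpn-gateways create tokyo-gw \
    --interfaces=0=203.0.113.10

# The tunnel itself (BGP session configuration still follows)
gcloud compute vpn-tunnels create sapporo-tokyo-tunnel0 \
    --region=asia-northeast1 \
    --vpn-gateway=sapporo-to-tokyo \
    --peer-external-gateway=tokyo-gw \
    --peer-external-gateway-interface=0 \
    --interface=0 \
    --router=sapporo-router \
    --ike-version=2 \
    --shared-secret=CHANGE_ME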

Results:

  • 2 days of all-night work
  • Sapporo branch resumed normal operations Monday morning
  • However, printer problems remain unsolved

“Even a small branch office is this difficult…”

Chapter 3: March Collapse - Explosion of Complexity

March 1: Simultaneous Migration of Osaka & Fukuoka Branches

Learning from the Sapporo experience, this time we conducted a thorough advance investigation.

Complexity discovered at the Osaka Branch (50 people):

1. Complex Inter-Department VLAN Configuration

# Osaka branch VLAN configuration
vlan 10: Sales (192.168.10.0/24) - 30 devices
vlan 20: Technical (192.168.20.0/24) - 15 devices  
vlan 30: Management (192.168.30.0/24) - 5 devices

# Inter-department communication restrictions
# Sales → Technical: HTTP/HTTPS only
# Technical → Sales: All denied
# Management → All departments: All allowed
# All departments → Management: Specific ports only

Problem: Need to reproduce this complex VLAN inter-communication control with GCP VPC firewall rules.
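
To give a feel for what that reproduction involves: VPC firewall rules are stateful and keyed on network tags rather than VLAN membership, so each VLAN ACL becomes one or more tag-based rules. A sketch assuming hypothetical instance tags sales, technical, and management:

# Sales → Technical: HTTP/HTTPS only
gcloud compute firewall-rules create allow-sales-to-tech-web \
    --network=osaka-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80,tcp:443 \
    --source-tags=sales \
    --target-tags=technical

# Technical → Sales: all denied (lower priority number wins)
gcloud compute firewall-rules create deny-tech-to-sales \
    --network=osaka-vpc \
    --direction=INGRESS \
    --action=DENY \
    --rules=all \
    --priority=900 \
    --source-tags=technical \
    --target-tags=sales

# Management → all departments: all allowed
gcloud compute firewall-rules create allow-mgmt-to-all \
    --network=osaka-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=all \
    --source-tags=management

The asymmetry is worth noting: because VPC firewall rules are stateful, "Technical → Sales: all denied" only blocks connections initiated by the technical side; replies to sales-initiated sessions still flow.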

2. Mysterious Business Systems

Osaka-specific Customer Management System:

  • Development language: Visual Basic 6.0
  • Database: Microsoft Access (.mdb format)
  • Network: Specific shared folder (\\192.168.10.5\database)
  • Creator: Veteran employee who retired 10 years ago
' Actual VB6 code (partial)
Private Sub ConnectToDatabase()
    Dim dbPath As String
    dbPath = "\\192.168.10.5\database\customer.mdb"  ' Hardcoded!
    
    Set db = OpenDatabase(dbPath)
    If db Is Nothing Then
        MsgBox "Cannot connect to database", vbCritical
        End
    End If
End Sub

“IP addresses hardcoded… and it’s an Access file…”

3. Fukuoka Branch “Independent Kingdom” Problem

On March 5, investigation of the Fukuoka branch revealed the truth:

Fukuoka branch had a completely independent network design from other locations.

Fukuoka Branch Independent System:
- Independent domain: fukuoka.local (separate from HQ)
- Independent Active Directory: fukuoka-dc.fukuoka.local
- Independent mail server: mail.fukuoka.local
- Independent file server: files.fukuoka.local
- Independent business system: sales.fukuoka.local

Why did this happen?

In 2018, when Fukuoka branch became an independent profit center, the branch manager decided they “didn’t want to depend on HQ systems” and built a completely independent system.

Problems:

  • Fukuoka branch requires completely different migration method
  • No consistency with integrated GCP project design
  • User management and permission management also independent

“This isn’t migration, it’s new construction…”

March 15: Multi-Site Migration Nightmare

The results of forcing it through with only one month of preparation:

Migration Day Timeline

9:00 AM: Migration begins

# Create Osaka branch VPC
gcloud compute networks create osaka-vpc --subnet-mode=custom

# Create Fukuoka branch VPC  
gcloud compute networks create fukuoka-vpc --subnet-mode=custom
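
One detail worth flagging: --subnet-mode=custom creates a VPC with no subnets at all, so each office still needs explicit subnets before any instance can be launched. A sketch with a hypothetical Osaka range:

# Custom-mode VPCs start empty; create the office subnet explicitly
gcloud compute networks subnets create osaka-office \
    --network=osaka-vpc \
    --region=asia-northeast1 \
    --range=10.2.1.0/24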

11:00 AM: First trouble. A mistake in the Osaka branch inter-VLAN communication rules left sales department traffic to the technical department completely blocked.

1:00 PM: Data migration errors multiply. The data migration of the Osaka customer management system (VB6 + Access) hit one character-encoding error after another.

ERROR: Invalid character encoding in customer.mdb
ERROR: Record corruption detected in table [Customer Master]
ERROR: Primary key violation in [Sales History]

3:00 PM: Fukuoka branch authentication system completely down. A configuration error in the independent Active Directory migration left all users unable to log in.

5:00 PM: Emergency rollback decision

Though we wanted to “restore everything today,” we had to accept reality.

# Emergency rollback
# Return Osaka branch to physical environment
gcloud compute instances stop osaka-fileserver
gcloud compute instances stop osaka-dc

# Return Fukuoka branch to physical environment
gcloud compute instances stop fukuoka-systems --zone=asia-northeast1-a

March 16: Total Attack from Stakeholders

Emergency board meeting called

Executives: “Why did the migration fail?”

Me: “The existing systems were more complex than expected…”

Sales Manager: “Some customer data in Osaka branch was corrupted. How will you take responsibility?”

Technical Manager: “Didn’t you know about Fukuoka’s independent system beforehand?”

General Affairs Manager: “Migration costs have doubled the budget. Explain this.”

Additional problems discovered:

  • 230 customer records corrupted during data migration
  • Fukuoka branch email system stopped for 12 hours
  • Osaka branch payroll system unable to process month-end

“I’m completely screwed…”

Chapter 4: April Despair - Head Office Migration Catastrophe

April 1: Forced Head Office Migration at New Fiscal Year Start

Executive ultimatum: “Complete migration by new fiscal year start. No more delays allowed.”

Head Office (280 people) Migration Target Systems:

Department Networks:
├─ Sales: 192.168.10.0/24 (50 devices)
├─ Development: 192.168.20.0/24 (80 devices)  
├─ Accounting: 192.168.30.0/24 (30 devices)
├─ HR: 192.168.40.0/24 (40 devices)
├─ Executive: 192.168.50.0/24 (10 devices)
└─ Servers: 192.168.100.0/24 (30 devices)

Critical Systems:
├─ Active Directory (Windows Server 2012)
├─ Exchange Server (Email)
├─ SAP ERP (Core business system)
├─ Accounting System (Yayoi Accounting Server Edition)
├─ HR System (Custom developed)
└─ File Server Cluster (5 devices)

April 1, 9:00 AM: Migration Begins

Team Structure:

  • Me (Project Leader)
  • Network Engineers: 2 people
  • System Engineers: 3 people
  • External contractors: 5 people

Migration Schedule:

  • 09:00-12:00: System shutdown & data backup
  • 12:00-15:00: GCP instance creation & configuration
  • 15:00-18:00: Data migration & system configuration
  • 18:00-20:00: Operation verification & adjustment
  • 20:00: New system goes live

“This time we’ll succeed!”

April 1, 11:00 AM: First Fatal Problem

Unexpected situation during SAP system migration:

# Error during SAP migration work
SAP System Copy Error:
Cannot connect to database instance
Network path not found: 192.168.100.20
License validation failed: Hardware fingerprint changed

Discovered problems:

1. SAP License Strict Restrictions

The SAP system performed license authentication based on server hardware-specific information (CPU ID, MAC address, etc.).

Physical Server:

  • CPU ID: Intel-Xeon-E5-2680-v3-12C-24T
  • MAC Address: 00:1B:21:A4:32:F8
  • License Key: Unique key generated from this information

GCP Compute Engine:

  • CPU ID: Google-Custom-CPU (virtual)
  • MAC Address: 42:01:0A:80:00:XX (dynamic)
  • License Key: Completely different from physical server

“SAP license reacquisition needed… but the process takes 2 weeks…”

2. Mysterious Accounting System Specifications

Yayoi Accounting Server Edition hidden dependencies:

; Configuration file (hidden file)
[Network]
ServerIP=192.168.100.25
BackupPath=\\192.168.100.30\backup\accounting\
TempPath=C:\Windows\Temp\Yayoi\
PrinterServer=192.168.100.35

[Security]  
AllowedClients=192.168.30.0/24
DenyOtherNetworks=TRUE
HardwareValidation=ENABLED

Problems:

  • IP addresses hardcoded in numerous locations
  • Only connections from the accounting department subnet (192.168.30.0/24) allowed (one possible workaround is sketched after this list)
  • Hardware authentication also enabled
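
One possible workaround, sketched below with a hypothetical hq-vpc: GCP places no restriction on reusing on-premises RFC 1918 ranges as subnet CIDRs, so recreating 192.168.30.0/24 as the accounting subnet would let the hardcoded AllowedClients check keep passing. It does nothing about the hardware validation, though.

# Recreate the original accounting CIDR inside the VPC
gcloud compute networks subnets create accounting-legacy \
    --network=hq-vpc \
    --region=asia-northeast1 \
    --range=192.168.30.0/24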

April 1, 2:00 PM: Cascading System Failures

Fatal flaw discovered in HR system:

# Part of HR system (Python 2.7)
import os
import psycopg2

def get_employee_data():
    # Database connection with the IP address written directly in code
    conn = psycopg2.connect(
        host="192.168.100.40",  # PostgreSQL Server
        port=5432,
        database="hr_system",
        user="hr_user",
        password="hr_pass_2019"  # Old password
    )
    return conn

def print_salary_report():
    # Direct output to printer server (string concatenation: no f-strings in 2.7)
    printer_server = "\\\\192.168.100.50\\HP_LaserJet_5000"
    os.system("copy salary_report.pdf " + printer_server)

Serious problems discovered:

  1. Python 2.7: End-of-life, doesn’t exist in GCP images
  2. PostgreSQL 8.4: Too old for GCP managed services
  3. Direct printer connection: Direct communication with network printer
  4. Hardcoding everywhere: Values written directly in source code, not configuration files

April 1, 4:00 PM: Company-wide System Outage

Error cascade:

16:00 - SAP ERP stops → Sales & accounting operations halt
16:15 - Accounting system inaccessible → Accounting department unable to work
16:30 - HR system errors → Payroll & attendance management impossible
16:45 - Exchange Server migration fails → Company-wide email stops
17:00 - File server authentication errors → Shared folder access impossible

Situation in the company:

  • Sales: “Can’t handle customers! Quote system unusable!”
  • Accounting: “Month-end processing due today, accounting system down!”
  • HR: “Payroll calculation system stopped! Today is salary payment day!”
  • Development: “Can’t access source code management server!”

April 1, 8:00 PM: State of Emergency Declaration

CEO’s direct emergency order:

“Restore all systems by business start tomorrow (9 AM). If not possible, cancel project and discipline those responsible.”

Emergency response from 8:00 PM:

# Emergency stop of all GCP instances ("stop" requires explicit names, so list them first)
gcloud compute instances list --format="value(name,zone.basename())" | \
while read -r name zone; do
    gcloud compute instances stop "$name" --zone="$zone" --quiet
done

# Emergency recovery of physical systems
for server in $(cat physical_servers.txt); do
    ssh root@$server "systemctl start all_services"
done

# Network configuration recovery
./restore_physical_network.sh

All-night recovery work:

  • 9:00 PM-3:00 AM: Emergency physical system recovery
  • 3:00 AM-6:00 AM: Data consistency check
  • 6:00 AM-8:00 AM: Operation verification

April 2, 9:00 AM: Barely Resume Operations

Recovery results:

  • Core systems: 95% recovered
  • Email system: 100% recovered
  • File servers: 90% recovered (some data loss)

However, remaining problems:

  • Some SAP ERP data corrupted (50 order records)
  • Accounting system March closing process incomplete
  • Some HR system attendance data lost

“At least the minimum is working…”

Chapter 5: Plant System Migration Abandonment and Policy Change

April 15: Plant System Migration Cancellation Decision

Following the head office migration chaos, the highest-risk plant system migration was cancelled.

Chiba Plant Production Line Control System:

  • Windows Server 2003 (20 years in operation)
  • VB6 control software (source code lost)
  • Proprietary communication protocol with manufacturing equipment
  • Shutdown = Complete production line halt = ¥30 million daily loss

Gunma Plant Quality Management System:

  • Red Hat Enterprise Linux 4 (17 years in operation)
  • Oracle 9i (Vendor support ended)
  • Direct communication with measuring equipment (RS-232C, Ethernet)
  • Shutdown = Quality inspection impossible = Shipping halt

“Migrating this would be suicide…”

April 20: Fundamental Project Review

New Policy: Phased Hybridization

Abandoning complete migration, we switched to the following approach:

Phase 1: New System Cloud-First (Start immediately)

  • New systems deployed GCP-native
  • Minimal integration with existing systems

Phase 2: Individual Migration of Migrable Systems (6-month plan)

  • Low-risk systems with clear migration benefits
  • Ensure sufficient validation period

Phase 3: Legacy System Status Quo (Continue for now)

  • Plant systems etc. with too high risk
  • Consider gradual modernization

Phase 4: Hybrid Infrastructure Optimization (1-year plan)

  • Appropriate integration of on-premises and cloud
  • Balance security and performance

Chapter 6: Learning and Improvement - Correct Migration Strategy

May-October: Phased Improvement Implementation

Applying the lessons from our failures, we put a proper migration strategy into practice.

Success Case 1: New Web Application Cloudification

June: Customer Portal Site New Development

# Correct cloud-native design
Project: customer-portal
Architecture:
  Frontend: React (Cloud Run)
  Backend: Node.js API (Cloud Run)
  Database: Cloud SQL (PostgreSQL)
  Storage: Cloud Storage
  CDN: Cloud CDN
  Monitoring: Cloud Operations Suite
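
For a sense of scale, standing up this kind of stack is only a handful of commands. The service name, image path, and machine sizing below are hypothetical:

# Deploy the backend API to Cloud Run
gcloud run deploy customer-portal-api \
    --image=asia-northeast1-docker.pkg.dev/customer-portal/api/backend:latest \
    --region=asia-northeast1 \
    --allow-unauthenticated

# Managed PostgreSQL for the portal
gcloud sql instances create portal-db \
    --database-version=POSTGRES_15 \
    --tier=db-custom-2-8192 \
    --region=asia-northeast1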

Results:

  • Development period: 3 months (on schedule)
  • Operating costs: 60% reduction from conventional
  • Performance: 50% improvement in response time
  • Availability: 99.95% achieved

Success Case 2: Phased Migration of Sales Support System

July-September: Sales Force Automation (SFA) Migration

Stage 1 (July): Start data replication
  - Physical DB → Cloud SQL sync setup
  - Existing system continues operation
  - Data consistency validation

Stage 2 (August): Application migration
  - Web UI migrated to Cloud Run
  - Backend API divided into Cloud Functions
  - Gradual load balancing

Stage 3 (September): Complete migration
  - Physical system shutdown
  - DNS switchover
  - Enhanced monitoring
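
Stage 2's "gradual load balancing" is where Cloud Run's revision traffic splitting helps: shift a small percentage of users to the new revision and watch error rates before going further. A sketch, with hypothetical service and revision names:

# Send 10% of traffic to the new revision, keep 90% on the old one
gcloud run services update-traffic sfa-web \
    --region=asia-northeast1 \
    --to-revisions=sfa-web-00002-new=10,sfa-web-00001-old=90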

Migration results:

  • Downtime: 2 hours (within plan)
  • Data loss: 0 records
  • User satisfaction: Improved (better response time)

Success Case 3: File Server Cloud Migration

August-October: File Server Integration

Migration strategy:
  Old: 5 physical file servers
  New: Cloud Storage + Cloud Filestore
  
Phase 1: Data analysis & classification
  - Access frequency analysis
  - Data classification (Hot/Cold/Archive)
  - Permission matrix organization

Phase 2: Phased migration
  - Archive data → Cloud Storage (Coldline)
  - Cold data → Cloud Storage (Nearline) 
  - Hot data → Cloud Filestore (NFS)

Phase 3: Access method optimization
  - Direct access: Cloud Filestore
  - Web access: Cloud Storage + CDN
  - Mobile: Cloud Storage APIs
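
Phase 2's tiering maps almost one-to-one onto a Cloud Storage lifecycle policy plus a Filestore share. A sketch with hypothetical bucket and instance names:

# lifecycle.json: move aging data to Coldline, delete after ~7 years
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 365}},
    {"action": {"type": "Delete"}, "condition": {"age": 2555}}
  ]
}
EOF
gcloud storage buckets update gs://company-archive --lifecycle-file=lifecycle.json

# NFS share for hot data (1 TB is the Basic HDD minimum)
gcloud filestore instances create hot-files \
    --zone=asia-northeast1-a \
    --tier=BASIC_HDD \
    --file-share=name=shared,capacity=1TB \
    --network=name=hq-vpc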

Correct Migration Strategy Lessons

1. Importance of Advance Investigation

Failed investigation:

  • Surface-level system configuration only
  • Insufficient understanding of dependencies
  • Inadequate stakeholder interviews

Improved investigation:

## System Migration Advance Investigation Checklist

### Technical Investigation
- [ ] Network configuration diagram (physical & logical)
- [ ] System dependency map
- [ ] Data flow diagram
- [ ] API & communication protocol list
- [ ] License & authentication method verification
- [ ] Performance requirements definition

### Business Investigation  
- [ ] User departments & users identification
- [ ] Business flow & usage patterns
- [ ] Peak times & processing volumes
- [ ] SLA & availability requirements
- [ ] Compliance requirements

### Organizational Investigation
- [ ] Stakeholder identification
- [ ] Decision-making process
- [ ] Budget & resource confirmation
- [ ] Risk tolerance
- [ ] Success metrics definition

2. Phased Migration Principles

Failed approach:

  • Forklift migration (bulk migration)
  • Large scope execution at once
  • Insufficient backup strategy

Improved approach:

Phased migration strategy:
  Phase 0: Investigation & planning (1-2 months)
  Phase 1: Non-critical system early migration (1 month)
  Phase 2: Medium importance system migration (2 months)  
  Phase 3: Critical system migration (3-4 months)
  Phase 4: Optimization & integration (1-2 months)

Each Phase principles:
  - Small scope
  - Sufficient validation period
  - Rollback strategy
  - Stakeholder approval

3. Risk Management Framework

Migration Risk Matrix:

| System | Migration Difficulty | Business Impact | Priority | Strategy |
|---|---|---|---|---|
| New Systems | Low | Medium | High | Cloud Native |
| Web Apps | Low | High | High | Phased Migration |
| File Servers | Medium | Medium | Medium | Hybrid |
| Core Systems | High | High | Low | Status Quo |
| Plant Systems | Extremely High | Extremely High | Excluded | Consider Modernization |

Chapter 7: One Year Later - Reflection and Results

January 2026: Project Completion Report

Final migration results (1 year):

Migration Complete Systems

  • New systems: 100% cloud-native
  • Web applications: 85% migration complete
  • File & storage: 70% migration complete
  • Development & test environments: 100% cloud-based

Status Quo Systems

  • Core systems (SAP/Accounting): Continue on-premises
  • Plant control systems: Continue on-premises
  • Legacy business systems: Gradual modernization in progress

Hybrid Environment

  • On-premises ⟷ GCP: Stable VPN connection
  • Integrated monitoring: Cloud Operations Suite
  • Integrated ID management: Google Cloud Identity

Results by Numbers

Cost effectiveness:

Previous (Physical environment only):
- Infrastructure cost: ¥15M/year
- Operations personnel cost: ¥25M/year
- Maintenance cost: ¥8M/year
- Power & facility cost: ¥5M/year
Total: ¥53M/year

Current (Hybrid environment):
- GCP cost: ¥8M/year
- Physical infrastructure cost: ¥6M/year (reduced)
- Operations personnel cost: ¥18M/year (efficiency)
- Maintenance cost: ¥3M/year (reduced)
- Power & facility cost: ¥2M/year (reduced)
Total: ¥37M/year

Annual reduction: ¥16M (30% reduction)

Performance improvements:

System availability:
- Before migration: 97.8% (approximately 16 hours downtime monthly)
- After migration: 99.5% (approximately 4 hours downtime monthly)

Web application response time:
- Before migration: Average 2.3 seconds
- After migration: Average 0.8 seconds (65% improvement)

Development & deployment time:
- Before migration: Average 3 weeks to new feature release
- After migration: Average 1 week to new feature release

Organizational improvements:

IT department work allocation:
- Before migration: Operations & maintenance 70%, New development 30%
- After migration: Operations & maintenance 40%, New development 60%

Engineer satisfaction:
- Before migration: 6.2/10 (Legacy operation burden)
- After migration: 8.1/10 (Modern technology development)

Business department satisfaction:
- Before migration: 7.0/10 (Stable but slow improvement)
- After migration: 8.5/10 (Rapid new feature delivery)

Most Valuable Learning

1. Practical Hybrid Over Perfect Migration

Old thinking: “Migration must be complete to be meaningful”

Truth learned: “Placing appropriate systems in appropriate locations is important”

2. Organizational Challenges Harder Than Technical

Technical problems:

  • IP address changes
  • Inter-system integration
  • Data migration

Organizational problems (more difficult):

  • Inter-department interest coordination
  • Resistance to change
  • Ambiguous responsibility boundaries

3. Small Successes Build Trust

The trust lost from initial major failures was recovered through accumulation of small successes.

Trust recovery process:

Month 1-3: Major failure (Trust level 2/10)
Month 4-6: Small successes x3 (Trust level 4/10)
Month 7-9: Medium successes x2 (Trust level 6/10)
Month 10-12: Major success x1 (Trust level 8/10)

Summary: What ¥30 Million and 6 Months of Hell Taught Us

Project Total Cost

Direct costs:

  • Migration work expenses: ¥12M
  • System recovery expenses: ¥8M
  • External consultant fees: ¥6M
  • GCP usage fees (failed portion): ¥2M

Indirect costs:

  • Business downtime opportunity loss: ¥15M
  • Data recovery work: ¥3M
  • Additional personnel costs (overtime/weekend work): ¥4M

Total cost: ¥50M
Wasted cost: ¥30M

Most Important Lessons

1. “It’s Easy” is the Greatest Trap for Engineers

Migrating legacy systems accumulated over 20 years is never easy.

  • Invisible dependencies
  • Lost knowledge (creator retirement)
  • Hardcoded configurations
  • Tight coupling with business flows

2. Phased Approach is Tedious But Ultimately Fastest

Bulk migration temptation:

  • “Doing it all at once is more efficient”
  • “Considering rollback costs…”
  • “Want to shorten project duration”

Reality: Considering bulk migration failure risks, phased migration is most reliable and ultimately fastest.

3. Organizational Solutions More Important Than Technical

Success factors:

  • Technical accuracy: 30%
  • Organizational coordination: 70%

Stakeholder coordination, expectation management, and change management are keys to migration success.

Recommendations for Other Organizations

Pre-Cloud Migration Checklist

## Legacy System Migration Essential Checklist

### Level 1: Migration Feasibility Assessment
- [ ] Confirm system creation year & technology stack
- [ ] Confirm creator/maintenance personnel employment  
- [ ] Confirm source code/design document availability
- [ ] Confirm license/contract conditions

### Level 2: Dependency Investigation
- [ ] Complete mapping of inter-system communication
- [ ] Create data flow diagrams
- [ ] Identify network dependencies
- [ ] Confirm external system/service integration

### Level 3: Business Impact Assessment
- [ ] Identify business impact scope of downtime
- [ ] Confirm SLA/availability requirements
- [ ] Understand peak times/processing volumes
- [ ] Confirm regulatory/compliance requirements

### Level 4: Risk Assessment & Countermeasures
- [ ] Calculate migration failure impact
- [ ] Develop rollback strategy
- [ ] Establish emergency contact system
- [ ] Implement backup/recovery testing

Green Field (New Development) Priority Approach:

Priority 1: New systems
  - 100% cloud-native
  - Latest architecture & technology choices
  - Minimal integration with existing systems

Priority 2: Web frontends
  - Relatively low migration risk
  - High user experience improvement effect
  - Phased migration possible

Priority 3: Data & storage
  - High backup & DR effect
  - Phased migration possible
  - Can coexist with existing systems

Priority 4: Core systems
  - Consider modernization
  - Long-term plan (3-5 years)
  - Status quo also an option

Finally: The Value of Learning from Failure

This six months of hell was certainly a painful experience, but ultimately made our IT organization grow significantly.

What we gained:

  • Deep understanding of legacy systems
  • Recognition of risk management importance
  • Phased approach skills
  • Stakeholder management capability
  • Improved team cohesion

Most important learning: The value of learning from failures and continuing improvement, rather than avoiding challenges out of fear of failure.

I hope organizations undertaking similar migration projects won’t repeat the same failures we experienced.

And I believe that if you do fail, by continuing to improve without giving up, you will surely reach success.



Note: This article is based on actual migration project experience, but company names and system details have been anonymized and generalized.
