[Migration Hell] Complete Record of 6-Month GCP Migration Disaster: From 20-Year Physical Network to Cloud
Prologue: The Naive View of Cloud Migration
Monday, January 15, 2025, 10:00 AM
“Migrating from physical network to cloud? That’s easy. Create a VPC, set up subnets, spin up VM instances. Three months is more than enough time.”
I confidently declared this in front of the executive team. Little did I know that I would witness six months of hell following that statement.
This is the blood, sweat, and tears record of a disastrous attempt to migrate a 20-year legacy physical network to GCP.
Chapter 1: “20 Years of Technical Debt” - The Reality
Migration Target: Unimaginably Complex Legacy Environment
What we were attempting to migrate was a physical network that had been continuously expanded over 20 years since the company’s founding.
Head Office Configuration (280 employees):
3 Branch Offices:
- Osaka Branch (50 employees)
- Fukuoka Branch (30 employees)
- Sapporo Branch (25 employees)
2 Manufacturing Plants:
- Chiba Plant (Production Line Control Systems)
- Gunma Plant (Quality Management Systems)
“Just move this to the cloud, right? Piece of cake!”
January 16: First Shock
When we actually began a detailed investigation of the existing environment, I was horrified.
Discovered Problem Categories:
1. Chaotic Network Design
“Why does each department have a different gateway…?”
2. Legacy System Dependencies
Production Line Control System (Chiba Plant):
- OS: Windows Server 2003 (End-of-life)
- Application: Custom software written in Visual Basic 6.0
- Communication: Hardcoded to specific IP address (192.168.100.5)
- Vendor: Bankrupt 10 years ago
Quality Management System (Gunma Plant):
- OS: Red Hat Enterprise Linux 4 (Ancient)
- Database: Oracle 9i (Also ancient)
- Network: Only works with fixed IP addresses
- Documentation: None
“Can we even migrate this…?”
3. Mysterious Custom Systems
Over twenty years, a succession of since-departed IT staff had created numerous “secret sauce” systems scattered throughout the infrastructure.
Contents of mystery_sync.sh:
“Who created this…? What is this script for…?”
January 20: Submission of Migration Plan
Report to supervisor:
“After investigating the target systems, they’re more complex than expected. I’d like to extend the migration period to 6 months.”
Supervisor: “What are you talking about? You promised 3 months. We need to start the new fiscal year with the new system on April 1st.”
“But the legacy systems are…”
Supervisor: “Stop making excuses. Figure out how to make it work.”
My desperate decision under pressure: a “forklift” (lift-and-shift) migration
Chapter 2: February Optimism - The Small-Scale Trap
Migration Strategy: Phased Approach
Determined to at least attempt a phased migration, I drew up the following plan:
Phase 1 (February): Sapporo Branch (25 people, minimal configuration)
Phase 2 (March): Osaka & Fukuoka Branches (80 people)
Phase 3 (Late March): Head Office (280 people)
Phase 4 (April): Plant Systems (the ultimate challenge)
“Starting small should mean small problems.”
February 5: Sapporo Branch Migration Begins
Sapporo Branch Existing Configuration:
GCP Post-Migration Design:
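As a minimal sketch of the intended design, assuming a custom-mode VPC (the network and subnet names are hypothetical; the 10.1.2.0/24 range matches the new file-server address used later):

```bash
# Custom-mode VPC plus a dedicated subnet for the Sapporo branch workloads
# (names, region, and range are assumptions, not the project's actual values)
gcloud compute networks create corp-vpc --subnet-mode=custom

gcloud compute networks subnets create sapporo-subnet \
  --network=corp-vpc \
  --region=asia-northeast1 \
  --range=10.1.2.0/24
```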
“Simple and clean design. This should be quick.”
February 10: First Landmine
On day 5 of the migration work, the first major problem occurred.
9 AM emergency call from Sapporo Branch:
“We can’t access the file server! We can’t work!”
Investigation revealed the problem:
1. Active Directory Domain Dependencies
The Sapporo file server was dependent on the Tokyo head office Active Directory domain controller.
Problems:
- GCP file server cannot connect to physical DC (192.168.100.10)
- VPN configuration doesn’t support Active Directory authentication
- No authentication = all users cannot access files
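Making domain authentication work across the VPN means both sides must pass the standard Active Directory port set. A hedged sketch of the GCP side only (the network, rule name, and the 192.168.0.0/16 aggregate standing in for the on-premises ranges are assumptions):

```bash
# Ingress: allow on-prem clients and the Tokyo DC to reach the GCP file server over the VPN
gcloud compute firewall-rules create allow-onprem-to-fileserver \
  --network=corp-vpc \
  --direction=INGRESS \
  --source-ranges=192.168.0.0/16 \
  --allow=tcp:445,tcp:139,tcp:135

# The on-prem firewall must also allow 10.1.2.0/24 to reach the DC (192.168.100.10)
# on the usual AD ports: 53, 88, 135, 389, 445, 636 (TCP, plus UDP where applicable).
```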
2. Fixed IP Address Dependencies
Problems:
- All PCs have shared drives hardcoded to 192.168.1.10
- Need to change to new IP (10.1.2.10) on 25 devices manually
- No DNS setup, can’t resolve names
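The longer-term fix for hardcoded server addresses is name resolution: a private Cloud DNS zone lets every client find the file server by name, so the next renumbering doesn't mean touching 25 PCs by hand again. A minimal sketch (the zone and record names are hypothetical):

```bash
# Private zone visible only inside the VPC
gcloud dns managed-zones create corp-internal \
  --dns-name=corp.internal. \
  --visibility=private \
  --networks=corp-vpc \
  --description="Internal names for migrated services"

# Point a stable name at the new file server instead of hardcoding 10.1.2.10
gcloud dns record-sets create fileserver.corp.internal. \
  --zone=corp-internal \
  --type=A \
  --ttl=300 \
  --rrdatas=10.1.2.10
```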
3. Bizarrely Complex Printer Dependencies
Incredibly, the Sapporo printer configuration was managed by a printer management server in Tokyo (192.168.100.15).
Dependency chain:
“Why do Sapporo printers depend on a Tokyo server…?”
February 12: Emergency Recovery Work
Weekend emergency response:
Results:
- 2 days of all-night work
- Sapporo branch resumed normal operations Monday morning
- However, printer problems remain unsolved
“Even a small branch office is this difficult…”
Chapter 3: March Collapse - Explosion of Complexity
March 1: Simultaneous Migration of Osaka & Fukuoka Branches
Learning from the Sapporo experience, this time we conducted a thorough advance investigation.
Complexity discovered at the Osaka Branch (50 people):
1. Complex Inter-Department VLAN Configuration
Problem: Need to reproduce this complex VLAN inter-communication control with GCP VPC firewall rules.
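GCP has no VLANs; the usual way to reproduce per-department isolation is network tags plus priority-ordered firewall rules. A hedged sketch of one such rule pair (tag names, rule names, ports, and priorities are assumptions, not the actual Osaka design):

```bash
# Block the sales segment from reaching the engineering segment by default...
gcloud compute firewall-rules create deny-sales-to-engineering \
  --network=corp-vpc \
  --direction=INGRESS \
  --action=DENY \
  --rules=all \
  --source-tags=dept-sales \
  --target-tags=dept-engineering \
  --priority=900

# ...then punch through only the traffic that is actually required (e.g. a shared web system)
gcloud compute firewall-rules create allow-sales-to-crm \
  --network=corp-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:443 \
  --source-tags=dept-sales \
  --target-tags=crm-server \
  --priority=800
```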
2. Mysterious Business Systems
Osaka-specific Customer Management System:
- Development language: Visual Basic 6.0
- Database: Microsoft Access (.mdb format)
- Network: Specific shared folder (\\192.168.10.5\database)
- Creator: Veteran employee who retired 10 years ago
“IP addresses hardcoded… and it’s an Access file…”
3. Fukuoka Branch “Independent Kingdom” Problem
On March 5, an investigation of the Fukuoka branch revealed the truth:
Fukuoka branch had a completely independent network design from other locations.
Why did this happen?
In 2018, when Fukuoka branch became an independent profit center, the branch manager decided they “didn’t want to depend on HQ systems” and built a completely independent system.
Problems:
- Fukuoka branch requires completely different migration method
- No consistency with integrated GCP project design
- User management and permission management also independent
“This isn’t migration, it’s new construction…”
March 15: Multi-Site Migration Nightmare
The results of forcing it through with only one month of preparation:
Migration Day Timeline
9:00 AM: Migration begins
11:00 AM: First trouble. A mistake in the Osaka branch VLAN inter-communication control rules completely blocked communication from the sales department to the technical department.
1:00 PM: Data migration errors multiply. The data migration of the Osaka customer management system (VB6 + Access) hits repeated character-encoding errors.
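Errors like these are typical when Japanese data leaves a Windows-only stack: the Access exports are in CP932 (Shift_JIS) while the new environment expects UTF-8. A minimal conversion sketch, assuming the data had been exported to CSV first (file names are hypothetical):

```bash
# Convert an exported CSV from CP932 (Windows Japanese) to UTF-8 before loading it into the new system
iconv -f CP932 -t UTF-8 customers_export.csv > customers_utf8.csv
```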
3:00 PM: Fukuoka branch authentication system completely down. A configuration error in the independent Active Directory migration leaves all users unable to log in.
5:00 PM: Emergency rollback decision
Though we wanted to “restore everything today,” we had to accept reality.
March 16: Total Attack from Stakeholders
Emergency board meeting called
Executives: “Why did the migration fail?”
Me: “The existing systems were more complex than expected…”
Sales Manager: “Some customer data in Osaka branch was corrupted. How will you take responsibility?”
Technical Manager: “Didn’t you know about Fukuoka’s independent system beforehand?”
General Affairs Manager: “Migration costs have doubled the budget. Explain this.”
Additional problems discovered:
- 230 customer records corrupted during data migration
- Fukuoka branch email system stopped for 12 hours
- Osaka branch payroll system unable to process month-end
“I’m completely screwed…”
Chapter 4: April Despair - Head Office Migration Catastrophe
April 1: Forced Head Office Migration at New Fiscal Year Start
Executive ultimatum: “Complete migration by new fiscal year start. No more delays allowed.”
Head Office (280 people) Migration Target Systems:
April 1, 9:00 AM: Migration Begins
Team Structure:
- Me (Project Leader)
- Network Engineers: 2 people
- System Engineers: 3 people
- External contractors: 5 people
Migration Schedule:
- 09:00-12:00: System shutdown & data backup
- 12:00-15:00: GCP instance creation & configuration
- 15:00-18:00: Data migration & system configuration
- 18:00-20:00: Operation verification & adjustment
- 20:00: New system goes live
“This time we’ll succeed!”
April 1, 11:00 AM: First Fatal Problem
Unexpected situation during SAP system migration:
Discovered problems:
1. SAP License Strict Restrictions
The SAP system performed license authentication based on server hardware-specific information (CPU ID, MAC address, etc.).
Physical Server:
- CPU ID: Intel-Xeon-E5-2680-v3-12C-24T
- MAC Address: 00:1B:21:A4:32:F8
- License Key: Unique key generated from this information
GCP Compute Engine:
- CPU ID: Google-Custom-CPU (virtual)
- MAC Address: 42:01:0A:80:00:XX (dynamic)
- License Key: Completely different from physical server
“SAP license reacquisition needed… but the process takes 2 weeks…”
2. Mysterious Accounting System Specifications
Yayoi Accounting Server Edition hidden dependencies:
Problems:
- IP addresses hardcoded in numerous locations
- Only connections from accounting department subnet (192.168.30.0/24) allowed
- Hardware authentication also enabled
April 1, 2:00 PM: Cascading System Failures
Fatal flaw discovered in HR system:
Serious problems discovered:
- Python 2.7: End-of-life, and not available in standard GCP images
- PostgreSQL 8.4: Too old for GCP managed database services
- Direct printer connection: The application talks to a network printer directly by IP
- Hardcoded everywhere: Settings written directly into the source code, not into configuration files
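One cheap pre-migration check that would have caught most of this is simply grepping the source tree for literal IPv4 addresses. A minimal sketch (the path and file pattern are hypothetical):

```bash
# List every hardcoded IPv4 address in the HR system's Python sources, with file and line number
grep -rEn --include='*.py' '([0-9]{1,3}\.){3}[0-9]{1,3}' /opt/hr-system/src/
```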
April 1, 4:00 PM: Company-wide System Outage
Error cascade:
Situation in the company:
- Sales: “Can’t handle customers! Quote system unusable!”
- Accounting: “Month-end processing due today, accounting system down!”
- HR: “Payroll calculation system stopped! Today is salary payment day!”
- Development: “Can’t access source code management server!”
April 1, 8:00 PM: State of Emergency Declaration
CEO’s direct emergency order:
“Restore all systems by business start tomorrow (9 AM). If not possible, cancel project and discipline those responsible.”
Emergency response from 8:00 PM:
All-night recovery work:
- 9:00 PM~3:00 AM: Emergency physical system recovery
- 3:00 AM~6:00 AM: Data consistency check
- 6:00 AM~8:00 AM: Operation verification
April 2, 9:00 AM: Barely Resume Operations
Recovery results:
- Core systems: 95% recovered
- Email system: 100% recovered
- File servers: 90% recovered (some data loss)
However, remaining problems:
- Some SAP ERP data corrupted (50 order records)
- Accounting system March closing process incomplete
- Some HR system attendance data lost
“At least the minimum is working…”
Chapter 5: Plant System Migration Abandonment and Policy Change
April 15: Plant System Migration Cancellation Decision
Following the head office migration chaos, the highest-risk plant system migration was cancelled.
Chiba Plant Production Line Control System:
- Windows Server 2003 (20 years in operation)
- VB6 control software (source code lost)
- Proprietary communication protocol with manufacturing equipment
- Shutdown = Complete production line halt = ¥30 million daily loss
Gunma Plant Quality Management System:
- Red Hat Enterprise Linux 4 (17 years in operation)
- Oracle 9i (Vendor support ended)
- Direct communication with measuring equipment (RS-232C, Ethernet)
- Shutdown = Quality inspection impossible = Shipping halt
“Migrating this would be suicide…”
April 20: Fundamental Project Review
New Policy: Phased Hybridization
Abandoning complete migration, we switched to the following approach:
Phase 1: New System Cloud-First (Start immediately)
- New systems deployed GCP-native
- Minimal integration with existing systems
Phase 2: Individual Migration of Migratable Systems (6-month plan)
- Low-risk systems with clear migration benefits
- Ensure sufficient validation period
Phase 3: Legacy System Status Quo (Continue for now)
- Plant systems and other systems where the risk is too high
- Consider gradual modernization
Phase 4: Hybrid Infrastructure Optimization (1-year plan)
- Appropriate integration of on-premises and cloud
- Balance security and performance
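Phase 4 stands or falls on the link between on-premises and GCP. As one hedged sketch only (all names, addresses, and ranges are hypothetical, and Google now generally steers new setups toward HA VPN rather than the Classic VPN shown here), a static-route Classic VPN tunnel can be brought up roughly like this:

```bash
# Gateway, public IP, and the three forwarding rules IPsec needs (ESP, IKE on UDP 500/4500)
gcloud compute target-vpn-gateways create onprem-gw --network=corp-vpc --region=asia-northeast1
gcloud compute addresses create vpn-ip --region=asia-northeast1
gcloud compute forwarding-rules create fr-esp --region=asia-northeast1 \
  --ip-protocol=ESP --address=vpn-ip --target-vpn-gateway=onprem-gw
gcloud compute forwarding-rules create fr-udp500 --region=asia-northeast1 \
  --ip-protocol=UDP --ports=500 --address=vpn-ip --target-vpn-gateway=onprem-gw
gcloud compute forwarding-rules create fr-udp4500 --region=asia-northeast1 \
  --ip-protocol=UDP --ports=4500 --address=vpn-ip --target-vpn-gateway=onprem-gw

# The tunnel itself: 203.0.113.10 stands in for the on-prem firewall's public IP
gcloud compute vpn-tunnels create onprem-tunnel --region=asia-northeast1 \
  --target-vpn-gateway=onprem-gw --peer-address=203.0.113.10 \
  --shared-secret=REPLACE_ME --ike-version=2 \
  --local-traffic-selector=10.1.0.0/16 --remote-traffic-selector=192.168.0.0/16

# Static route so VPC traffic for the on-prem ranges goes through the tunnel
gcloud compute routes create route-to-onprem --network=corp-vpc \
  --destination-range=192.168.0.0/16 \
  --next-hop-vpn-tunnel=onprem-tunnel --next-hop-vpn-tunnel-region=asia-northeast1
```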
Chapter 6: Learning and Improvement - Correct Migration Strategy
May~October: Phased Improvement Implementation
Applying the lessons learned from our failures, we put a proper migration strategy into practice.
Success Case 1: New Web Application Cloudification
June: Customer Portal Site New Development
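The article doesn't spell out which GCP services the portal used, but as one hedged example of a GCP-native deployment, a containerized portal can go live with a single command against Cloud Run (the service name, image path, and region are hypothetical):

```bash
# Deploy the portal container to Cloud Run and expose it publicly
gcloud run deploy customer-portal \
  --image=asia-northeast1-docker.pkg.dev/PROJECT_ID/web/portal:latest \
  --region=asia-northeast1 \
  --allow-unauthenticated
```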
Results:
- Development period: 3 months (on schedule)
- Operating costs: 60% reduction from conventional
- Performance: 50% improvement in response time
- Availability: 99.95% achieved
Success Case 2: Phased Migration of Sales Support System
July~September: Sales Force Automation (SFA) Migration
Migration results:
- Downtime: 2 hours (within plan)
- Data loss: 0 records
- User satisfaction: Improved (better response time)
Success Case 3: File Server Cloud Migration
August~October: File Server Integration
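As a hedged sketch of the kind of bulk copy involved, assuming the shares were archived into Cloud Storage (the bucket and paths are hypothetical; NFS-style shares could alternatively land on Filestore):

```bash
# Mirror a departmental share into a Cloud Storage bucket, in parallel, preserving the directory tree
gsutil -m rsync -r /srv/shares/sales gs://corp-fileserver-archive/sales
```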
Correct Migration Strategy Lessons
1. Importance of Advance Investigation
Failed investigation:
- Surface-level system configuration only
- Insufficient understanding of dependencies
- Inadequate stakeholder interviews
Improved investigation:
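Concretely, part of the improved investigation was mapping live connections on each legacy host before touching it, so dependencies like the Tokyo print server show up in data rather than in outages. A minimal sketch of the commands involved (run on the legacy servers themselves):

```bash
# List established TCP connections with owning processes; every remote address is a dependency to document
ss -tnp state established

# Older hosts (e.g. RHEL 4) have no ss; fall back to netstat
netstat -an | grep ESTABLISHED
```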
2. Phased Migration Principles
Failed approach:
- Forklift migration (bulk migration)
- Large scope execution at once
- Insufficient backup strategy
Improved approach:
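Concretely, the improved approach gave every wave an explicit rollback path: snapshot first, migrate, verify, and only then decommission. A minimal sketch (the disk name, zone, and snapshot name are hypothetical):

```bash
# Snapshot the data disk immediately before a migration wave so rollback is minutes, not an all-nighter
gcloud compute disks snapshot file-server-data \
  --zone=asia-northeast1-a \
  --snapshot-names=file-server-pre-migration-$(date +%Y%m%d)
```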
3. Risk Management Framework
Migration Risk Matrix:
| System | Migration Difficulty | Business Impact | Priority | Strategy |
|---|---|---|---|---|
| New Systems | Low | Medium | High | Cloud Native |
| Web Apps | Low | High | High | Phased Migration |
| File Servers | Medium | Medium | Medium | Hybrid |
| Core Systems | High | High | Low | Status Quo |
| Plant Systems | Extremely High | Extremely High | Excluded | Consider Modernization |
Chapter 7: One Year Later - Reflection and Results
January 2026: Project Completion Report
Final migration results (1 year):
Migration Complete Systems
- New systems: 100% cloud-native
- Web applications: 85% migration complete
- File & storage: 70% migration complete
- Development & test environments: 100% cloud-based
Status Quo Systems
- Core systems (SAP/Accounting): Continue on-premises
- Plant control systems: Continue on-premises
- Legacy business systems: Gradual modernization in progress
Hybrid Environment
- On-premises ⟷ GCP: Stable VPN connection
- Integrated monitoring: Cloud Operations Suite
- Integrated ID management: Google Cloud Identity
Results by Numbers
Cost effectiveness:
Performance improvements:
Organizational improvements:
Most Valuable Learning
1. Practical Hybrid Over Perfect Migration
Old thinking: “Migration must be complete to be meaningful”
Truth learned: “Placing appropriate systems in appropriate locations is important”
2. Organizational Challenges Harder Than Technical
Technical problems:
- IP address changes
- Inter-system integration
- Data migration
Organizational problems (more difficult):
- Inter-department interest coordination
- Resistance to change
- Ambiguous responsibility boundaries
3. Small Successes Build Trust
The trust lost in the initial major failures was recovered through an accumulation of small successes.
Trust recovery process:
Summary: What ¥30 Million and 6 Months of Hell Taught Us
Project Total Cost
Direct costs:
- Migration work expenses: ¥12M
- System recovery expenses: ¥8M
- External consultant fees: ¥6M
- GCP usage fees (failed portion): ¥2M
Indirect costs:
- Business downtime opportunity loss: ¥15M
- Data recovery work: ¥3M
- Additional personnel costs (overtime/weekend work): ¥4M
Total cost: ¥50M
Wasted costs: ¥30M
Most Important Lessons
1. “It’s Easy” is the Greatest Trap for Engineers
Migrating legacy systems accumulated over 20 years is never easy.
- Invisible dependencies
- Lost knowledge (creator retirement)
- Hardcoded configurations
- Tight coupling with business flows
2. Phased Approach is Tedious But Ultimately Fastest
Bulk migration temptation:
- “Doing it all at once is more efficient”
- “Considering rollback costs…”
- “Want to shorten project duration”
Reality: once you account for the failure risk of bulk migration, a phased migration is the most reliable approach and, in the end, the fastest.
3. Organizational Solutions More Important Than Technical
Success factors:
- Technical accuracy: 30%
- Organizational coordination: 70%
Stakeholder coordination, expectation management, and change management are keys to migration success.
Recommendations for Other Organizations
Pre-Cloud Migration Checklist
Recommended Migration Strategy
Green Field (New Development) Priority Approach:
Finally: The Value of Learning from Failure
These six months of hell were certainly a painful experience, but they ultimately made our IT organization grow significantly.
What we gained:
- Deep understanding of legacy systems
- Recognition of risk management importance
- Phased approach skills
- Stakeholder management capability
- Improved team cohesion
Most important learning: The value of learning from failures and continuing improvement, rather than avoiding challenges out of fear of failure.
I hope organizations undertaking similar migration projects won’t repeat the same failures we experienced.
And I believe that if you do fail, by continuing to improve without giving up, you will surely reach success.
Related articles:
- Cloud Migration Phased Approach Practice Guide
- Legacy System Modernization Strategy
- Hybrid Cloud Operations Best Practices
Note: This article is based on actual migration project experience, but company names and system details have been anonymized and generalized.