Zhike(Kyle) Chen | Email: zk.chen007@gmail.com | GitHub: kk17
Summary
Background in backend engineering, data engineering, and machine learning engineering. Contributor to open-source projects including Apache Hudi, Apache Airflow, and Apache Kyuubi.
- Big Data Stack: Spark, Flink, Kafka, Airflow, Presto, Apache Hudi, Apache Kyuubi, DBT/SQLMesh, etc.
- Cloud Platforms: AWS, GCP, and Alicloud.
- DevOps: GitLab CI/CD, GitHub Actions, Docker, Kubernetes, Terraform/Terragrunt, etc.
- MLOps: Ray, MLflow, JupyterHub, etc.
- Programming Languages: Python, Java, Bash shell, Scala, Go, etc.
Work Experience
GoTo Financial, LENDING DATA TEAM
Data Engineer Manager Jan. 2023 - Present
- Manage a streamlined, high-performing team of 10+ Data Engineers across 5 main areas: Data Ingestion, Stream Processing, Data Warehousing, Business Intelligence, and Machine Learning Engineering; foster an open, agile team culture and support team members' growth.
- Lead cross-team collaboration, supporting multiple product lines and data requirements (synchronization, analytics, reporting) for over 10 business teams.
- Plan and execute data platform migrations from AWS to GCP and from GCP to Alicloud; design equivalent data architectures on each new platform while ensuring timely, high-quality completion.
- Identify potential system bottlenecks and implement optimization strategies to improve performance, reduce costs, and increase efficiency; build an internal data portal and self-service data capabilities to improve team productivity.
BYBIT SINGAPORE, DATA TEAM
Principal Data Engineer Sep. 2020 - Dec. 2022 (2 years and 4 months)
As a founding member of the data team, helped design and build the company's data platform and machine learning platform from scratch, continuously evolving the tech stack to improve performance and stability.
- Designed and maintained infrastructure services like Canal, Debezium, AWS EMR, Presto, and Airflow, leveraging GitOps and containerization to enhance productivity and standardization.
- Built a large-scale, near-real-time pipeline using CDC (Canal/Debezium) and data lake (Apache Hudi) technologies, enabling efficient data delivery with support for update/delete operations.
- Migrated Airflow from 1.x to 2.x, improving task scheduling for ~40,000 daily tasks, and developed custom Airflow Operators and APIs for seamless integration with internal tools.
- Developed a unified SQL query middleware for engines like Kyuubi, Presto, and Hive, simplifying data access and enhancing user experience through integration with the data portal.
- Designed and implemented a machine learning platform using JupyterHub, Ray, MLflow, and Kubernetes, addressing MLOps challenges and supporting model training, tracking, and deployment.
ATOME/ADVANCEAI SINGAPORE, DATA ENGINEER TEAM
Senior Data Engineer Apr. 2018 - Sep. 2020 (2 years and 6 months)
As a Senior Data Engineer on the Data Engineering team for the finance business line, helped design, build, and maintain the company's data platform.
- Migrated ETL pipelines from Jenkins to Airflow to improve maintainability, performance, and stability; developed an ad-hoc task feature for Airflow.
- Implemented ETL pipelines to process data from various upstream servers and sources; optimized these pipelines to improve efficiency and reduce overall execution time.
- Deployed and maintained Spark Thrift Server and Kyuubi Server, implementing custom authentication and authorization; made data analysis significantly easier without compromising security.
- Analyzed data in the warehouse and created dashboards for business insights using PySpark, JupyterHub, and Superset.
NETEASE, HANGZHOU / GUANGZHOU, CHINA
Senior Backend Developer Jul. 2012 - Mar. 2018 (5 years and 9 months)
Initially worked on NetEase Cloud Music before moving to the e-commerce department to develop and maintain backend microservices.
- Designed and developed critical backend APIs for multi-platform applications, including user account management, payment processing, and integration with third-party systems such as Sonos.
- Built and maintained microservices infrastructure on the Spring Boot/Spring Cloud ecosystem, implementing SKU management, vendor management, and event messaging services.
- Developed an automated CI/CD pipeline with GitLab CI, Docker, and custom scripting, significantly reducing deployment time and improving release reliability.
- Refactored legacy applications into a modern microservices architecture, improving maintainability and enabling rapid iteration.
- Created custom testing and monitoring tools, including a Python-based API testing framework that improved testing efficiency and reliability.
Education Background
- Singapore Management University SINGAPORE 2020 - 2022
Master of IT in Business (AI Track)
- Jinan University GUANGZHOU, CHINA 2008 - 2012
Bachelor of Computer Science and Technology