Hi! I’m Ziyang Xu (徐子扬, StatXzy7), a Ph.D. student in Mathematics at The Chinese University of Hong Kong (CUHK) since August 2024, advised by Prof. Tieyong Zeng and Prof. Liu Liu.

My research interests include Large Language Models and Image Processing…

You can find my CV here: Ziyang Xu’s Curriculum Vitae.

Feel free to reach out if you want to chat or collaborate!

📖 Education

  • 2024.08 - 2028.07 (Expected) The Chinese University of Hong Kong, Ph.D. in Mathematics, Hong Kong SAR, China (Supervisors: Tieyong Zeng, Liu Liu)
  • 2020.09 - 2024.06 Lanzhou University, B.S. in Statistics, Lanzhou, China (Supervisors: Zhouping Li, XingXing Jia)
  • 2017.09 - 2020.07 High School Affiliated to Nanjing Normal University, Nanjing, China

💻 Short Visits & Internships

  • 2024.10 - 2025.10 Theory Lab, Huawei Hong Kong Research Institute, Research Intern on the project “Mathematical-Model-Based Image Algorithm Research”, Hong Kong SAR, China
  • 2024.07 - 2024.08 Peking University, BICMR, AI for Mathematics Formalization and Theorem Proving Seminar, Beijing, China
  • 2023.07 - 2023.10 Western University, Schulich School of Medicine & Dentistry, Mitacs Globalink Research Intern (Supervisor: Pingzhao Hu), London, Canada

🎖️ Honors and Awards

  • 2024.06 Brilliant Graduate of Lanzhou University, Academic Type (出彩毕业生-学术深造型), 2024 (top 5 university-wide)
  • 2024.03 CUHK Vice-Chancellor’s PhD Scholarship
  • 2024.03 Outstanding Graduate of Gansu Province (甘肃省优秀毕业生)
  • 2023.05 Chun-Tsung Scholar (䇹政学者), 25th cohort
  • 2023.04 Mitacs Globalink Research Intern Scholarship, 2023
  • 2022.12 National Scholarship (国家奖学金), Rank 1/117
  • 2022.06 Merit Student of Gansu Province (甘肃省三好学生, top 0.6%)
  • 2021.12 National Scholarship (国家奖学金), Rank 1/157

📝 Publications

💻 Large Language Models

``EMNLP 2025``

REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing
Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan

  • Abstract: Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it is contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editing. In the first phase, we use tailored stimuli to extract latent factual representations and apply Principal Component Analysis with a simple learnable linear transformation to compute a directional “belief shift” vector for each instance. In the second phase, we apply controllable perturbations to hidden states, using the obtained vector scaled by a magnitude scalar and gated by a pre-trained classifier that permits edits only when contextually necessary. Experiments on the EVOKE benchmark demonstrate that REACT significantly reduces overfitting across nearly all evaluation metrics, and experiments on COUNTERFACT and MQuAKE show that our method preserves balanced basic editing performance (reliability, locality, and generality) under diverse editing scenarios.
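
A minimal PyTorch sketch of the gated “belief shift” mechanism the abstract describes; this is not the released REACT code, and the hidden size, number of PCA components, and 0.5 gate threshold are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class BeliefShift(nn.Module):
    """Phase 1 (sketch): turn an extracted factual representation into a
    unit 'belief shift' direction via a fixed PCA basis plus a simple
    learnable linear map. Sizes are assumptions for exposition."""

    def __init__(self, hidden=768, n_components=32):
        super().__init__()
        # Stand-in for a PCA basis fitted offline on stimulus representations.
        self.register_buffer("pca_basis", torch.randn(n_components, hidden))
        self.linear = nn.Linear(n_components, hidden, bias=False)

    def forward(self, rep):                      # rep: (batch, hidden)
        coords = rep @ self.pca_basis.T          # project onto PCA components
        v = self.linear(coords)                  # learnable map -> shift vector
        return v / v.norm(dim=-1, keepdim=True)  # unit direction per instance

def edit_hidden(h, shift_dir, magnitude, gate_prob, threshold=0.5):
    """Phase 2 (sketch): perturb hidden states along the shift direction,
    but only when the pre-trained gate classifier deems the edit
    contextually necessary (the threshold is an assumption)."""
    gate = (gate_prob > threshold).float().unsqueeze(-1)  # (batch, 1)
    return h + gate * magnitude * shift_dir

# Toy usage: edit four hidden states; the gate switches two of them off.
shift = BeliefShift()(torch.randn(4, 768))
edited = edit_hidden(torch.randn(4, 768), shift, magnitude=2.0,
                     gate_prob=torch.tensor([0.9, 0.2, 0.7, 0.1]))
```

When the classifier probability stays below the threshold, the hidden state passes through unchanged, which is what keeps edits from firing in contextually inappropriate places.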
``EMNLP 2025``

Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
Haonan He, Yuchen Ren, Yining Tang, Ziyang Xu, Junxian Li, Minghao Yang, Di Zhang, Yuan Dong, Tao Chen, Shufei Zhang, Yuqiang Li, Nanqing Dong, Wanli Ouyang, Dongzhan Zhou, Peng Ye

  • Abstract: Large language models (LLMs) have shown remarkable capabilities in general domains, but their application to multi-omics biology remains underexplored. To address this gap, we introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences, covering DNA, RNA, proteins, and multi-molecules. This dataset bridges LLMs and complex biological sequence-related tasks, enhancing their versatility and reasoning while maintaining conversational fluency. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training. To overcome this, we propose ChatMultiOmics, a strong baseline with a novel three-stage training pipeline that demonstrates superior biological understanding through Biology-Instructions. Both resources are publicly available, paving the way for better integration of LLMs in multi-omics analysis. Biology-Instructions is publicly available at: https://github.com/hhnqqq/Biology-Instructions.
  • Code: https://github.com/hhnqqq/Biology-Instructions
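
To make the dataset’s role concrete, here is a hypothetical example of what a single multi-omics instruction record and its chat-format conversion could look like; the field names and the sequence are illustrative assumptions, and the actual schema is defined in the repository linked above.

```python
# Hypothetical instruction-tuning record for a DNA task; field names are
# assumptions, not the dataset's actual schema.
record = {
    "instruction": "Predict whether the following DNA sequence contains a "
                   "promoter region. Answer yes or no.",
    "input": "TATAAAAGGCGCGTACGATCGATCGTAGCTAGCTAACGT",
    "output": "Yes, the sequence contains a TATA-box-like motif consistent "
              "with a promoter region.",
}

def to_chat(example):
    # Flatten an instruction record into a conversational exchange, the
    # usual shape for instruction-tuning a chat LLM.
    prompt = f"{example['instruction']}\n\nSequence: {example['input']}"
    return [{"role": "user", "content": prompt},
            {"role": "assistant", "content": example["output"]}]

print(to_chat(record))
```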

🧬 AI for Science

``IEEE J-BHI``

PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings
Ziyang Xu, Haitian Zhong, Bingrui He, Xueying Wang, Tianchi Lu

  • Work: We present PTransIPs, a deep learning framework for identifying phosphorylation sites in host cells infected with SARS-CoV-2. It combines protein pre-trained language model (PLM) embeddings with a Transformer architecture to make the final prediction, using a transductive information maximization (TIM) loss to better estimate the error (a minimal sketch of the architecture idea appears after this list). PTransIPs also serves as a general framework for other peptide bioactivity tasks.
  • Performance: Compared with five existing phosphorylation site prediction tools, PTransIPs achieves the best performance on all five evaluation metrics (ACC, SEN, SPEC, MCC, AUC) for both S/T and Y sites.
  • Impact: We hope that PTransIPs will deepen the understanding of SARS-CoV-2 phosphorylation sites, and we plan to further develop it into a more powerful tool for the scientific community.
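
A minimal sketch of the architecture idea, not the released PTransIPs code: residue-level PLM embeddings for a peptide window pass through a Transformer encoder, and a classifier head predicts whether the central S/T or Y residue is phosphorylated. The embedding dimension, window length, and layer sizes are assumptions, and the TIM loss is omitted for brevity.

```python
import torch
import torch.nn as nn

class PhosphoSiteClassifier(nn.Module):
    def __init__(self, plm_dim=1024, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(plm_dim, d_model)   # map PLM embeddings down
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)          # site / non-site logits

    def forward(self, plm_emb):
        # plm_emb: (batch, window_len, plm_dim) residue-level PLM embeddings
        x = self.encoder(self.proj(plm_emb))
        return self.head(x.mean(dim=1))            # pool over the window

model = PhosphoSiteClassifier()
logits = model(torch.randn(8, 33, 1024))           # e.g. a 33-residue window
print(logits.shape)                                # torch.Size([8, 2])
```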


Last updated on 2025/11/12