Erica Yang

Senior Researcher, Advanced Data Analysis

Dr Erica Yang is a senior computer scientist in the Data Division, Scientific Computing Department (SCD). She is also the national labs services liaison officer of the SCD, responsible for service development liaison with the national laboratory directorate of STFC in three areas: data, systems, and visualisation.

Dr Erica Yang has broad interests in advanced data analysis methods and infrastructure technologies. In recent years, she has been extensively involved in projects designing, building and delivering operational software and infrastructure for next-generation high-volume, high-throughput image processing, analysis and multi-dimensional visualisation technologies for scientific experiments at the Rutherford Appleton Laboratory (RAL). RAL hosts many cutting-edge scientific instruments as part of the national laboratory directorate of STFC. These instruments produce not only unprecedented volumes of data but also a wide range of complex data analysis challenges throughout the lifecycle of scientific experiment, simulation, and downstream data-driven discovery. They constantly push the boundaries of modern computing, demanding a highly cross-disciplinary approach to building effective data analysis solutions. To that end, Dr Yang and her teams work with a spectrum of technologies, including:

  • High throughput systems: CPU clusters, GPU clusters and, more recently, Hadoop clusters, streaming and in-memory cluster computing technologies
  • Data management: data modelling and cataloguing
  • High dimensional visualisation: parallel 3D volumetric data rendering and visualisation
  • Semantic analytics: text clustering and correlation

News

  1. Co-chaired session on "Streaming Analytics for Effective Data Exploitation", ColoRS workshop on Collaboration for Resilience and Security, Washington DC, USA, November 2014.
  2. Awarded funding for a workshop on "Urban Analytical Science (provisional title)", in collaboration with Warwick Institute for the Science of Cities, Warwick University and the Department of Sociology, Essex University, funded by the STFC Global Challenge Exploration Fund, December 2014.
  3. 3-min presentation slides for Horizon 2020 ICT-16 Big Data networking day, Jan. 2015

Professional Career

Previously, as a research fellow at Leeds University, she worked on workflow analysis and development for a distributed aircraft engine diagnosis computing project with Rolls-Royce. As technical lead for the Oxford-Google books project at Oxford University, she architected and led the development of a data streaming and processing system that downloaded and processed almost half a million digitised books, with associated metadata records, for the Bodleian Library. Through parallelism and distributed processing, this work shortened the time needed to obtain the data from 10 months to about a week.

In 2007, she moved to RAL, where she initially focussed on building large-scale distributed (operating) systems in long-term R&D projects funded by the European Commission Framework 7 programme, including the XtreemOS, EchoGrid, and GridTrust projects. She developed next-generation operating system level services to provide native OS support for distributed computation and data management.

Building on this experience, she started to apply her expertise to large science laboratories in 2009. She has since been managing or leading research and development activities that are firmly rooted in real-world science problems in large laboratories, including ISIS (the UK national neutron facility), Diamond Light Source (the UK national synchrotron facility), ILL (a European neutron facility), and a large number of major world-class physical and life science laboratories in Europe.

Expertise

Core Capabilities

She has extensive experience of data technologies in large science laboratories, in particular of building software tools and systems to improve the throughput, efficiency, and quality of scientific experiments and of downstream data analysis and exploration activities. Her specialities include data management, semantic technologies, and high-throughput systems based on distributed computing technologies, e.g. High Performance Computing (HPC) and High Throughput Computing (HTC) technologies. She has been involved in the background research and development of the ICAT data cataloguing system, in particular extending the underpinning metadata model to accommodate analysed data and to link science data with publications.

She has been involved in the data provenance and controlled vocabulary developments of the PanData-ODI project. This has led to her work on knowledge management systems for large neutron and photon facilities, namely PaNKOS, and to a collaboration with the ILL facility, the European neutron source in Grenoble, France, on using text analytics methods and the PaNKOS ontology to match, cross-link, and recommend scientific data and publications. With her background in large-scale distributed systems and service-oriented system development, her current interests have extended into semantic indexing, analytics and linking for a better understanding of large-scale science conducted at a European scale. She is particularly interested in their practical applications: discovering trends and patterns in experimental and computational sciences, keeping track of science developments over time, and using new knowledge from analytics to find innovative ways to explore and add value to science data.

In recent years, she has also developed an interest in high dimensional data and information visualisation, primarily in the field of tomographic imaging applications using X-rays and neutrons in large facilities. This interest has led to extensive interactions with the CCPi consortium, the UK's leading experts in tomographic imaging algorithm development, and with experimental imaging experts on the Harwell campus and at university lab-based imaging facilities. She is leading the IMAT computing project to develop and pilot an in-experiment image reconstruction pipeline for the IMAT instrument, the first 3D tomography-driven diffraction neutron instrument on the campus. Notably, this project has secured support and contributions from four divisions of SCD, via an internal facility development fund for cluster development and visualisation and a small EPSRC SLA grant for the evaluation of tomography reconstruction software. It has also attracted funding from the Harwell Imaging Partnership (HIP) for the development and deployment of an HPC-based image reconstruction pipeline using the ULTRA platform developed by SCD, and from the ISIS Mantid team for the development of a FITS to Nexus converter and the integration of Mantid with ULTRA.

Recent project activities

I2S2 (2009 - 2010)

SRF (2011 - 2012)

  • My contributions: e-lab notebook design, development and integration with the laboratory data cataloguing and archiving system; scientific research data management; data reduction, analysis and aggregation workflow study of Small-Angle Neutron Scattering (SANS) for the SANS2D instrument

  • Science case studies: nanoscience using the SANS2D instrument/ISIS TS2 with Dr Cameron Neylon (ISIS Instrument Scientist)

PaNData-ODI (2012 - 2014)

Accelerator (2013)

  • My contributions: semantics-driven, user-friendly tools for experiment data analysis; data analysis workflow study of stress rig experiments for the ENGIN-X instrument; project management

  • Science case study: engineering science using the ENGIN-X instrument/ISIS TS1 with Dr Shu Yan Zhan/ISIS and Dr Joe Kelleher/ISIS

ULTRA (2014 - )

  • Areas: a high-throughput data processing platform for data analysis and exploration for diverse scientific experiments and computations. This is a suite of technologies built for the IMAT computing project, a collaboration between SCD, ISIS, and DLS.

  • My contributions: benchmarking and evaluation framework for tomographic imaging reconstruction algorithms, in particular empirical studies of the Astra toolbox and CCPi CGLS implementations; project management

  • Science case study: material and engineering sciences with the IMAT instrument, with Dr Winfried Kockelmann and Dr Genoveva Burca

Qualifications and Achievements

  1. Ph.D. in computer science, specialising in distributed systems and security from Durham University
  2. MSc in distributed systems and high speed networking from Oxford Brookes University
  3. BSc in applied mathematics from South China University of Technology
  4. PRINCE2 foundation and practitioner certificate
  5. Excellent track record in bringing multidisciplinary teams together to tackle shared data and compute challenges
  6. Successful technical leadership in delivering practical software solutions to address science challenges

Selected Talks

  1. "PaNSIG (PaNData), and the interactions between SG-IG (the RDA Structural Biology Interest Group) and PaNSIG (the research data needs of the Photon and Neutron Science community Interest Group)", at the Global Research Data Alliance Structural Biology Interest Group, Dublin, Ireland, 2014.
  2. "Managing research data for diverse scientific experiments", IUCr Crystallographic Information and Data Management Symposium, Warwick University, August 2013.
  3. "Linking raw experimental data with scientific workflow and software repository: some early experience in the PanData-ODI project", Workshop on Diffraction Data Deposition (DDD), Bergen, Norway, 2012.
  4. "Poking Through Research Lifecycles: Towards Leveraging Web Archives for Sustaining Digital Research Assets", International Workshop on Big-Data Analytics for the Temporal Web, Paris, France, 15 November 2011.

Other interests

She is interested in working with innovative and ambitious SMEs to exploit technological advances and create practical business solutions in a competitive marketplace, for example via InnovateUK or H2020 funding programmes.



© 2014 Science and Technology Facilities Council - All Rights Reserved.