Dr Erica Yang is senior computer scientist of the Data Division, Scientific Computing Department (SCD). She is also national labs services liaison officer of the SCD, responsible for service development liaison with the national laboratory directorate of STFC in three areas: data, systems, and visualisation.
Dr Erica Yang has broad interests in advanced data analysis methods and infrastructure technologies. In recent years, she has been extensively involved in projects to design, build and deliver operational software and infrastructure for next-generation high-volume, high-throughput image processing, analysis and multi-dimensional visualisation technologies for scientific experiments at the Rutherford Appleton Laboratory (RAL). As part of the national laboratory directorate of STFC, RAL hosts many cutting-edge scientific instruments that produce not only unprecedented volumes of data but also a wide range of complex data analysis challenges throughout the lifecycle of scientific experiment, simulation, and downstream data-driven discovery. They constantly push the boundaries of modern computing, demanding a highly cross-disciplinary approach to building effective data analysis solutions. To that end, Dr Yang and her teams work with a spectrum of technologies, including
Previously, as a research fellow at Leeds University, she worked on workflow analysis and development for a distributed aircraft engine diagnosis computing project with Rolls-Royce. As technical lead for the Oxford-Google books project at Oxford University, she architected and led the development of a data streaming and processing system that has downloaded and processed almost half a million digitised books with associated metadata records for the Bodleian Library. Through parallelism and distributed processing, this work significantly shortened the time needed to obtain the data, from 10 months to about a week.
In 2007, she moved to RAL, where she initially focussed on building large-scale distributed (operating) systems in long-term R&D projects funded by the European Commission Framework 7 programme, including the XtreemOS, EchoGrid, and GridTrust projects. She developed next-generation operating-system-level services to provide native OS support for distributed computation and data management.
Building on this experience, she began applying her expertise to large science laboratories in 2009. She has since managed or led research and development activities firmly rooted in real-world science problems in large laboratories, including ISIS (the UK national neutron facility), Diamond Light Source (the UK national synchrotron facility), ILL (the European neutron facility), and a large number of major world-class physical and life science laboratories in Europe.
She has extensive experience of data technologies in large science laboratories, in particular building software tools and systems to improve the throughput, efficiency, and quality of scientific experiments and of downstream data analysis and exploration activities. Her specialities include data management, semantic technologies, and high-throughput systems based on distributed computing technologies, e.g. High Performance Computing (HPC) and High Throughput Computing (HTC). She has been involved in the background research and development of the ICAT data cataloguing system, in particular extending the underpinning metadata model to accommodate analysed data and to link science data with publications.
She has been involved in the data provenance and controlled vocabulary developments of the PaNData-ODI project. This led to her work on PaNKOS, a knowledge management system for large neutron and photon facilities, and to a collaboration with ILL, the European neutron source in Grenoble, France, on using text analytics methods and the PaNKOS ontology to match, cross-link, and recommend scientific data and publications. Building on her background in large-scale distributed systems and service-oriented system development, her current interests extend to semantic indexing, analytics and linking for a better understanding of large-scale science conducted at a European scale. She is particularly interested in their practical applications: discovering trends and patterns in experimental and computational sciences, keeping track of science developments over time, and using new knowledge from analytics to find innovative ways to explore and add value to science data.
In recent years, she has also developed an interest in high-dimensional data and information visualisation, primarily in the field of tomographic imaging applications using X-rays and neutrons in large facilities. This interest has led to extensive interactions with the CCPi consortium, which brings together the UK's leading experts in tomographic imaging algorithm development, and with experimental imaging experts on the Harwell campus and at university lab-based imaging facilities. She is leading the IMAT computing project to develop and pilot an in-experiment image reconstruction pipeline for the IMAT instrument, the first 3D tomography driven diffraction neutron instrument on the campus. Notably, this project has secured support and contributions from four divisions of SCD, via an internal facility development fund for cluster development and visualisation, and a small EPSRC SLA grant for the evaluation of tomography reconstruction software. It has also attracted funding from the Harwell Imaging Partnership (HIP) for the development and deployment of an HPC-based image reconstruction pipeline using the ULTRA platform developed by SCD, and from the ISIS Mantid team for the development of a FITS to Nexus converter and the integration of Mantid with ULTRA.
I2S2 (2009 - 2010)
My contributions: metadata capturing, cataloguing and archiving for experiment data analysis workflows, cross-organisational data sharing, common scientific metadata models, data analysis workflow study of neutron diffraction data analysis for GEM instrument and the pilot design and implementation
SRF (2011 - 2012)
My contributions: e-lab notebook design, development and integration with laboratory data cataloguing and archiving system, scientific research data management, data reduction, analysis and aggregation workflow study of Small-Angle Neutron Scattering (SANS) for SANS2D instrument
Science case studies: nanoscience using the SANS2D instrument at ISIS TS2 (with Dr. Cameron Neylon, ISIS Instrument Scientist)
PaNData-ODI (2012 - 2014)
My contributions: data provenance, ontology, semantic technologies, knowledge management systems, semantic analytics, data cataloguing and archiving workflows of DLS and integrated data analysis workflow study for SANS2D (with support for SANSView)
My contributions: semantics-driven, user-friendly tools for experiment data analysis, data analysis workflow study of stress rig experiments for ENGIN-X instrument, project management
ULTRA (2014 - )
Areas: a high-throughput data processing platform for data analysis and exploration for diverse scientific experiments and computations (a suite of technologies built for the IMAT computing project, a collaboration between SCD, ISIS, and DLS; more later)
My contributions: benchmarking and evaluation framework for tomographic imaging reconstruction algorithms, in particular empirical studies of the Astra toolbox and CCPi CGLS implementations, project management
She is interested in working with innovative and ambitious SMEs to exploit technological advances and create practical business solutions in a competitive marketplace, for example via InnovateUK or H2020 funding programmes.
(incomplete list of publications)