Universities Unite to Advance AI-Driven Workflow Management on Modern Cyberinfrastructure

New $5M NSF Award Will Power PegasusAI, an AI-Driven Workflow System to Accelerate Science from the Edge to the Cloud and High-Performance Computing

A collaborative team from the University of Southern California’s Information Sciences Institute (ISI), the University of Tennessee, the University of Massachusetts Amherst, and the University of North Carolina at Chapel Hill has received a $5 million award from the U.S. National Science Foundation (Award No. 2513101) to develop PegasusAI. Building on the proven Pegasus Workflow Management System, PegasusAI will integrate advanced AI to automate resource provisioning, predict performance, detect anomalies, and guide scientists through human-in-the-loop adaptation. This next-generation framework will enable researchers to seamlessly execute and manage complex workflows across the entire computing continuum—from edge devices to exascale systems.

The project, titled “CSSI: Frameworks: Applying Artificial Intelligence Advances to the Next Generation of Workflow Management on Modern Cyberinfrastructure,” will create PegasusAI, a modular, intelligent extension of the widely adopted Pegasus workflow management system. PegasusAI will harness recent advances in AI to bring greater automation, adaptability, and insight to data-intensive scientific workflows that operate across high-performance computing (HPC), cloud, and edge systems.
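
For readers unfamiliar with Pegasus, workflows are already described declaratively through its Python API. The sketch below, using the Pegasus 5 Pegasus.api package with illustrative file and job names, shows the kind of abstract workflow that PegasusAI's AI services would provision, monitor, and adapt at execution time.

```python
# A minimal abstract workflow in the existing Pegasus 5 Python API.
# File names and the "preprocess" transformation are illustrative;
# PegasusAI's own AI-driven interfaces are still being designed.
from Pegasus.api import Workflow, Job, File

wf = Workflow("diamond")

f_a = File("f.a")   # input, resolved via the replica catalog at planning time
f_b = File("f.b")   # intermediate output

preprocess = (
    Job("preprocess")                 # refers to a cataloged transformation
    .add_args("-i", f_a, "-o", f_b)
    .add_inputs(f_a)
    .add_outputs(f_b)
)

wf.add_jobs(preprocess)
wf.write("workflow.yml")              # abstract workflow, ready for planning
```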

By enhancing the reliability and usability of distributed systems, PegasusAI will help scientists tackle some of today’s most pressing challenges—from modeling black holes and predicting hurricanes, to advancing cancer research and monitoring space debris. The system will also contribute foundational technologies to the broader NSF cyberinfrastructure ecosystem, enabling researchers across the U.S. to leverage NSF-supported computing platforms and services with greater efficiency and ease. 

The project brings together experts in workflow systems, AI, performance modeling, real-time analytics, distributed computing, and cyberinfrastructure across the four institutions:

  • Ewa Deelman, Ph.D., University of Southern California (Lead PI)
  • Anirban Mandal, Ph.D., University of North Carolina at Chapel Hill
  • Sai Swaminathan, Ph.D., University of Tennessee
  • Michela Taufer, Ph.D., University of Tennessee
  • Michael Zink, Ph.D., University of Massachusetts Amherst

“PegasusAI will bring unprecedented automation, adaptability, and resilience to data-intensive scientific workflows. By embedding AI directly into resource provisioning, performance prediction, anomaly detection, and user interfaces, we aim to create a system that is both powerful and easy to use—capable of meeting the needs of today’s most demanding science while lowering barriers for new users,” said Ewa Deelman, Ph.D., Principal Investigator of PegasusAI and Research Professor at the University of Southern California and ISI Research Director.

“As scientific workflows grow more complex and span across cloud, HPC, and edge systems, there is a critical need to move from static execution to intelligent, adaptive workflow management,” said Michela Taufer, Dongarra Professor at the University of Tennessee.  “PegasusAI will apply AI to optimize resource usage, predict performance, detect anomalies, and guide users through human-in-the-loop adaptation.”

“We want to make building and managing scientific workflows 100 times easier,” said Sai Swaminathan, Assistant Professor at the University of Tennessee. “Our work focuses on designing intelligent, human-centered interfaces that help researchers—regardless of their scientific background—compose, adapt, and steer complex workflows in a more user-friendly manner. We’re also exploring how PegasusAI can better understand human intent and decision-making in real-time, so that scientific discovery becomes more interactive, adaptive, and inclusive.”

“In conjunction with the development of an AI-driven resource ecosystem, PegasusAI seeks to provide training for domain scientists in the design and implementation of scientific workflows,” said Michael Zink, Paros Professor of Geophysical Sensing Systems at the University of Massachusetts Amherst. “This training represents a critical component of the project, aimed at enabling users with varying levels of technological proficiency to effectively utilize next-generation workflow systems.”

“With the resource ecosystem for executing scientific workflows getting increasingly complex and spanning the edge-to-core continuum, provisioning and automatically tailoring resources for workflows is becoming a significant challenge that scientists have to navigate,” said Anirban Mandal, Director for Network Research and Infrastructure at RENCI, UNC Chapel Hill. 

This award is part of NSF’s Cyberinfrastructure for Sustained Scientific Innovation (CSSI) program, which supports the development of sustainable and extensible software frameworks that serve scientific communities and support future innovation.

To learn more, visit the NSF award page: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2513101

New Tool Unlocks Biomedical Discovery with Launch of Biomedical Data Translator

A groundbreaking new platform is set to transform the way scientists and clinicians access and analyze biomedical information. The Biomedical Data Translator Consortium today announced the initial public release of the Biomedical Data Translator, a powerful open-source knowledge graph-based system designed to integrate and harmonize vast, complex biomedical datasets to accelerate translational science and patient care.

Published in the journal Clinical and Translational Science, the release highlights the Translator’s architecture, user interface, and ability to support novel insights across genomics, pharmacology, clinical research, and more. The system enables users – from researchers to healthcare providers – to explore relationships across diverse data types without requiring deep technical expertise.

“Translator bridges the gap between scattered biomedical data and actionable knowledge,” said Karamarie Fecho, PhD, lead author, CEO of Copperline Professional Solutions, LLC, and Research Affiliate at the Renaissance Computing Institute. “By integrating and harmonizing data and surfacing evidence with transparency and traceability, Translator empowers users to ask meaningful questions and generate new hypotheses at the point of need.”

Translator leverages a scalable, federated knowledge graph that enables seamless querying across multiple data sources, ranging from clinical trial results and drug-target interactions to disease ontologies and model organism studies. Its intuitive interface reveals connections and evidence step by step, making complex data navigable and insightful.
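
The Consortium's services communicate through the Translator Reasoner API (TRAPI), in which a question is posed as a small query graph. The sketch below shows the general shape of a one-hop TRAPI query in Python; the endpoint URL is a placeholder, and current service addresses should be taken from the Translator documentation.

```python
# One-hop TRAPI query: which chemicals are asserted to treat a disease?
# The endpoint URL is a placeholder, not a real Translator service address.
import requests

query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"ids": ["MONDO:0005148"],          # type 2 diabetes
                       "categories": ["biolink:Disease"]},
                "n1": {"categories": ["biolink:ChemicalEntity"]},
            },
            "edges": {
                "e01": {"subject": "n1", "object": "n0",
                        "predicates": ["biolink:treats"]},
            },
        }
    }
}

response = requests.post("https://translator.example.org/query", json=query)
response.raise_for_status()
for result in response.json()["message"].get("results", []):
    print(result["node_bindings"]["n1"])    # candidate chemicals
```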

“Translator’s strength lies in its integration of diverse knowledge sources, standardized semantics, and multiple reasoning methods,” said Gwênlyn Glusman, PhD, co-lead author, and Principal Scientist at the Institute for Systems Biology. “It is designed to support hypothesis generation and exploration of existing knowledge, saving significant time and effort figuring out what is already known, and what could be established next.”

To demonstrate Translator’s capabilities, the publication showcases real-world applications, including:

  • Identifying therapeutic options for patients with rare diseases
  • Explaining mechanisms of action for investigational drugs
  • Screening drug candidates in model organisms

Funded by the National Center for Advancing Translational Sciences (NCATS), the Translator initiative represents a milestone in biomedical informatics, offering a new paradigm for data-driven discovery.

Read the full publication: https://ascpt.onlinelibrary.wiley.com/doi/10.1111/cts.70284.

The Translator system is for research purposes and is not meant to be used by clinical service providers in the course of treating patients. 

RENCI and NOAA Collaborate to Advance Coastal Safety

The University of North Carolina at Chapel Hill’s (UNC-CH) Renaissance Computing Institute (RENCI) and Coastal Resilience Center (CRC) are making waves in Earth data science through their collaboration with the National Oceanic and Atmospheric Administration (NOAA). This partnership improves methods to assess and address coastal community risks by incorporating observations into the ADCIRC storm surge model to more accurately predict historical water levels across the Atlantic, Gulf of Mexico, and Caribbean coasts, increasing safety for the individuals living there. Recent acknowledgements highlight their pivotal role in advancing coastal safety and community resilience:

Coastal Ocean Reanalysis (CORA)

At the heart of this collaboration is the Coastal Ocean Reanalysis (CORA) dataset, a joint effort between teams from NOAA, the University of Hawaii, and UNC-CH. RENCI contributed its computational expertise to generate the new dataset. CORA provides predicted hourly historic water level and wave information at 500-meter increments along the coast, using observations dating back more than 40 years. Significantly, the model predictions, guided by observations, cover many areas where pre-existing data was sparse, capturing water level variability that will inform future assessments and needs in coastal regions.

Essentially, the data bridges gaps in existing coastal data, offering both precision and breadth to equip decision-makers with critical insights for flood risk and resilience planning and assessment.
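
As a rough illustration of how such a reanalysis can be used, the sketch below pulls a multi-decade water-level series for one location with Python's xarray. The file, variable, and coordinate names are hypothetical; the real CORA layout should be taken from NOAA's Open Data Dissemination documentation.

```python
# Illustrative only: file, variable, and coordinate names are hypothetical.
import xarray as xr

ds = xr.open_dataset("cora_subset.nc")        # a locally downloaded extract

# Hourly water level at the grid point nearest a coastal town of interest.
point = ds["water_level"].sel(lat=34.72, lon=-76.67, method="nearest")

# Yearly maxima over the 40+ year record, a common input to flood
# risk and resilience assessments.
annual_max = point.groupby("time.year").max()
print(annual_max.to_series().tail())
```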

CORA Gains Validation and Recognition

In June 2024, a NOAA article about CORA noted that researchers were able to provide preliminary validation for the dataset. A follow-up article in January 2025 noted that the dataset, publicly available via NOAA’s Open Data Dissemination platform, is already being used to address gaps in available data and improve coastal community flood risk assessment and planning. The article also noted that future uses of the dataset will enhance NOAA tools, like the Sea Level Calculator and High Tide Flooding Outlooks, as well as the National Water Model for comprehensive coastal flood mapping. Additionally, NOAA is looking to expand the dataset to include flood risk assessments for the U.S. West Coast, Hawaii, and Alaska by 2026.

In addition to these online accolades, during the 2024 American Geophysical Union (AGU) meeting, Dr. Rick Spinrad, Under Secretary of Commerce for Oceans and Atmosphere and NOAA Administrator, delivered a keynote address celebrating NOAA’s many achievements. Among the highlights was the improved safety and awareness afforded to coastal communities thanks to CORA, a sentiment that was also published in NOAA’s 2024 report on how the agency is working to build a Climate-Ready Nation.

Moving Forward

As NOAA continues to refine and expand CORA, the already vast applications will increase, improving coastal floodplain evaluations, supporting resilience planning, and offering decision-makers the ability to better protect their communities. By providing actionable insights into sea level rise and coastal inundation, RENCI and others involved in CORA are setting a new standard for improved climate resilience.

UNC researchers awarded up to $10M to leverage data science to accelerate cancer diagnosis and optimize delivery of precision oncology

A team of UNC-Chapel Hill researchers has been awarded up to $10 million in Advanced Research Projects Agency for Health (ARPA-H) funding to develop the Cancer Identification and Precision Oncology Center (CIPOC). The project is designed to improve cancer diagnosis and support personalized treatments by quickly aggregating and analyzing a wide range of health data, including electronic health records, histopathological and radiological images, insurance claims and geographic information.

Specifically, CIPOC will facilitate the development of an oncology learning health system that utilizes AI-ready data to generate real-time identification of new cancer cases, support patient recruitment for research, recommend precision cancer care, and help improve cancer care equity and quality. It also will create an accessible, adaptable system for health providers across diverse locations and resource levels.

The project is led by four principal investigators across Carolina:

  • Ashok Krishnamurthy, PhD, director of the Renaissance Computing Institute (RENCI) and data science core lead.
  • Jennifer Elston Lafata, PhD, professor in the Division of Pharmaceutical Outcomes and Policy at the UNC Eshelman School of Pharmacy and innovation and optimization partners lead.
  • Caroline Thompson, PhD, MPH, associate professor of epidemiology at UNC Gillings School of Global Public Health and rapid identification core lead.
  • Melissa Troester, PhD, MPH, professor of epidemiology at UNC Gillings and precision oncology core lead.

“CIPOC is a multi-disciplinary project that will significantly advance not just rapid cancer identification and precision oncology but also health data science and informatics,” said Krishnamurthy, a research professor of computer science at UNC-Chapel Hill. “The approaches we are developing can be used in other areas of health care, which is possible because CIPOC brings together diverse expertise across a number of fields to work together on a common goal.”

The project will organize and facilitate collaborative research conducted by faculty, staff and trainees from more than 12 schools, centers, departments and programs at UNC-Chapel Hill, with a shared vision to create cutting-edge data tools that researchers and practitioners can use at UNC – and in time across North Carolina and the United States – to improve the diagnosis and treatment of cancer.

“While precision oncology has made major advances in recent years, translation of these innovations to practice has lagged behind as has our ability to monitor, track, and therefore understand and plan for needed cancer-related services,” said Thompson, a UNC Lineberger Comprehensive Cancer Center member. “By accelerating the identification of cancer cases and developing innovative informatics tools to make improved, precision recommendations for care, this project can advance the provision of equitable care services and delivery.”

The three-year project will focus on building an oncology learning health system at UNC Health, with the potential to expand across North Carolina and nationally. A learning health system integrates scientific evidence, data and culture into daily care with a commitment to continuous improvement and innovation. The goal is to produce high-quality and high-value care that is equitable across diverse populations.

“As part of our efforts, we are forming a panel of nationally recognized experts and advisors. This panel will provide our team with ongoing feedback and serve as an independent sounding board. Their input is crucial to ensuring the usability and acceptability of our processes and products,” said Lafata, co-lead of UNC Lineberger’s Cancer Care Quality Initiative. “This step is essential given our focus on accelerating academic discovery, optimizing cancer care delivery and supporting public health reporting. Additionally, these advisors will help us minimize any inherent biases in our work.”

CIPOC will utilize AI tools, including large language modeling, to quickly standardize, harmonize and link structured and unstructured data from multiple sources, enabling more precise tracking and treatment for different cancer types.
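
As a hypothetical illustration (not CIPOC's actual pipeline), harmonizing a free-text pathology note with coded EHR data might look something like the following, where a large language model is asked to emit a fixed JSON schema:

```python
# Hypothetical sketch: use an LLM to turn a free-text pathology note into
# structured fields that can be linked with coded EHR data. The model name,
# prompt, and schema are illustrative, not CIPOC's actual pipeline.
import json
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract these fields from the pathology note as JSON: primary_site, "
    "histology, ajcc_stage, biomarkers. Use null for anything not stated.\n\n"
    "Note:\n{note}"
)

def extract_fields(note: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(note=note)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

print(extract_fields(
    "Invasive ductal carcinoma, left breast, ER+/PR+, HER2-negative, pT2 N0 M0."
))
```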

It also will develop an AI-driven virtual multidisciplinary tumor board to support the provision of precision oncology care. Studies have shown multidisciplinary tumor boards, in which a group of experts in different specialties review and discuss patients’ medical conditions and treatment options, can improve cancer outcomes. The board will use AI-ready datasets, including electronic health record-derived clinical data and histopathological and radiological images, to help inform prediction of risk and tumor progression as well as treatment decision making.

“We want to make precision oncology more widely available to North Carolinians. This project aims to develop tools that will use common medical record data to define care that responds to each patient’s unique tumor biology, reducing the need for additional, costly testing,” said Troester, co-leader of the UNC Lineberger Cancer Epidemiology Program.

CIPOC will make its data tools open source, allowing them to be scaled and adapted by health systems of any size, thus improving the use of clinical data for research and cancer care across a broad spectrum of communities. This innovation aligns with ARPA-H’s national goals to strengthen health care system resilience and equity.

The development and submission of the ARPA-H proposal was supported by the UNC Office of Research Development, with oversight by Nathan Blouin, MBC, CRA, assistant vice chancellor for research development, and Nate Warren, PhD, research development manager.

GRAU DATA joins the iRODS Consortium

GRAU DATA, a software company headquartered in Schwaebisch-Gmuend, Germany, has joined the iRODS Consortium, the membership-based organization that leads development and support of the integrated Rule-Oriented Data System (iRODS).

Since 2007, GRAU DATA has developed software products that simplify the management and protection of data for companies, research institutions, and government agencies. With specialized solutions in data archiving, data protection, and metadata-driven search, the company focuses on security, scalability, and user-friendliness in its software products. 

iRODS is an open-source software used to store, manage, and share large amounts of data and metadata. By providing mechanisms for defining rules for data storage, processing, and distribution, iRODS supports interoperability and scalability of data infrastructures. The iRODS Consortium guides iRODS development priorities and facilitates support, education, and collaboration opportunities for iRODS users.
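
In practice, much of this is exercised through iRODS client libraries. The sketch below uses the python-irodsclient, with placeholder connection details and paths, to store a file, tag it with metadata, and rediscover it by that metadata:

```python
# Minimal python-irodsclient sketch; host, credentials, and paths are
# placeholders for a real iRODS zone.
from irods.session import iRODSSession
from irods.models import Collection, DataObject, DataObjectMeta
from irods.column import Criterion

with iRODSSession(host="irods.example.org", port=1247, user="alice",
                  password="secret", zone="tempZone") as session:
    path = "/tempZone/home/alice/results.csv"
    session.data_objects.put("results.csv", path)     # store the file

    obj = session.data_objects.get(path)
    obj.metadata.add("experiment", "run42")           # attach AVU metadata

    # Metadata-driven discovery: which data objects belong to run42?
    query = session.query(Collection.name, DataObject.name).filter(
        Criterion("=", DataObjectMeta.name, "experiment"),
        Criterion("=", DataObjectMeta.value, "run42"))
    for row in query:
        print(row[Collection.name] + "/" + row[DataObject.name])
```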

David Cerf, Chief Data Evangelist at GRAU DATA, highlights how GRAU DATA’s products dovetail with iRODS to enhance the services and functionalities available to the worldwide iRODS user community.

“Our solutions help iRODS users cut storage costs by approximately 50% and make better use of unstructured data for AI and analytics,” said Cerf. For example, a GRAU DATA product called MetadataHub helps iRODS users turn unstructured data and its embedded metadata into valuable insights for analytics. “Getting the most out of unstructured data is crucial for improving data quality and speeding up results,” Cerf noted. “MetadataHub automates data preparation to make it AI-ready, improves training models, and sets up downstream applications while maintaining data lineage and governance.”

Through its reporting feature, MetadataHub also gives users a comprehensive view of their data landscape to better manage storage resources, reduce costs, and save time on storage management. “Ultimately, these solutions provide iRODS users with a significant advantage in efficiency, insight, and strategic value,” said Cerf. “We’re excited to become part of the iRODS Consortium.”

Terrell Russell, Executive Director of the iRODS Consortium, expressed enthusiasm in welcoming the Consortium’s newest member. “GRAU DATA clearly has a deep well of expertise in understanding the challenges organizations face in handling, using, and protecting large data collections,” said Russell. “We look forward to further enhancing our collaboration to help organizations effectively leverage all of the strengths that iRODS has to offer.”

The iRODS software has been deployed at thousands of locations worldwide for long-term management of data in various industries such as the oil and gas industry, biosciences, physical sciences, archives, and media and entertainment. The development team of the iRODS Consortium is based at the Renaissance Computing Institute (RENCI), which is affiliated with the University of North Carolina at Chapel Hill, USA. To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about GRAU DATA, please visit graudata.com.

Leading data science expert joins RENCI as deputy director

Rebecca Boyles, MSPH, currently the founding director of the Center for Data Modernization Solutions at RTI International, will join the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill as deputy director on June 24, RENCI Director Ashok Krishnamurthy, PhD, announced today.

Boyles’ leadership of the Center for Data Modernization Solutions at RTI International focuses on bridging the gap between research and information technology by applying a data ecosystem perspective that enables researchers to maximize the value of their data assets. Boyles has also worked closely with RENCI as a partner, in particular as a leader on both NHLBI BioData Catalyst and the NIH HEAL Data Stewardship Group, two important projects that help researchers harness the power of data.

As RENCI’s deputy director, Boyles will take responsibility for RENCI’s research division by managing and enhancing research partnerships with faculty at UNC-Chapel Hill, Duke University, and North Carolina State University; building relationships between RENCI and Triangle area businesses; and leading efforts to bring new federal research funding to RENCI and its partner institutions. She will also apply her trademark skills in developing fit-for-purpose solutions that enable researchers to use data for the public good. 

“Rebecca is an exceptional leader with deep expertise in building data science teams and executing on innovative and impactful projects,” said Krishnamurthy. “We have worked with her on a number of joint projects, and this history shows us that she will be able to make significant strategic contributions at RENCI and in partnership with UNC and our broader research community.” 

In addition to her passion for data science, research, and information technology, Boyles has also enabled strong strategic growth at organizations throughout her career. While a data scientist at the National Institute of Environmental Health Sciences, Boyles clarified the strategic vision for the environmental health science data ecosystem, leveraging existing data assets to respond to timely public health issues. She identified opportunities to catalyze scientific advancements in chemical safety and public health through interactions with broad stakeholder groups. She also liaised with NIH leadership and served as science officer on the Big Data to Knowledge (BD2K) program, including the Data Discovery Index, Frameworks for Community-Based Standards, and the Center for Predictive Computational Phenotyping.

“I am thrilled to join RENCI’s efforts to tackle intractable, long-standing problems by driving the future of scientific computing in collaboration with their partner institutions,” said Boyles. “I look forward to bringing my background in environmental health and biomedical research, along with my experience partnering with diverse groups, to contribute to the pursuit of novel and effective solutions.” 

Boyles holds an MSPH in Environmental Science and Engineering from the Gillings School of Global Public Health at UNC-Chapel Hill, along with a BA in Biology from UNC-Chapel Hill. Her areas of expertise include data modernization, FAIR data principles, data and modeling applications, data analysis and data management, data integration, and data strategy and implementation.


What to expect at the iRODS 2024 User Group Meeting

The worldwide iRODS community will gather in Amsterdam, NL from May 28-31

Members of the iRODS user community will meet at the Amsterdam Science Park in Amsterdam, NL for the 16th Annual iRODS User Group Meeting to participate in four days of learning, sharing use cases, and discussing new capabilities that have been added to iRODS in the last year.

The event, sponsored by SURF, RENCI, Globus, and Hays, will provide in-person and virtual options for attendance. An audience of over 100 participants representing dozens of academic, government, and commercial institutions is expected to join.

“We are excited to connect with our user community to learn more about the impact and utility of iRODS on a global scale in fields such as public health, materials science, biotechnology, and more,” said Terrell Russell, executive director of the iRODS Consortium. “In addition to learning from one another’s deployments and use cases, the 2024 iRODS User Group Meeting will provide opportunities to network with users around the world and sow the seeds for future collaboration.”

In May, the iRODS Consortium and RENCI announced the release of iRODS 4.3.2. Along with preparation for work on 5.0.0 and important bug fixes for the 4.3 series, notable updates include the new GenQuery2 parser allowing for richer metadata queries into the catalog, fixes for keyword combinations and bad inputs, a number of documentation additions, and a few new deprecation declarations. 
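
As a rough illustration, GenQuery2's SQL-like syntax permits constructs such as ordering and limits that the original parser could not express. The query string below is a hedged sketch; the exact invocation varies by client and release.

```python
# Hedged sketch of a GenQuery2 query string; how it is submitted
# (CLI, API, or client library) depends on the tool and release.
query = (
    "select COLL_NAME, DATA_NAME, DATA_SIZE "
    "where META_DATA_ATTR_NAME = 'experiment' "
    "and META_DATA_ATTR_VALUE = 'run42' "
    "order by DATA_SIZE desc limit 10"
)
```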

Another new feature is the S3 API v0.2.0. Many software libraries, tools, and applications now read and write the S3 protocol directly. Last year, the iRODS Consortium announced that the then-new iRODS S3 API could present iRODS via the S3 protocol, and shared details about the requirements, design, and initial implementation. This year, users will hear about the first two releases, the implementation of various endpoints, and the state of multipart transfers.
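
Because the S3 API speaks the standard protocol, existing S3 tooling can address an iRODS zone directly. Below is a minimal boto3 sketch against a hypothetical deployment whose administrator has mapped a bucket onto an iRODS collection:

```python
# Hypothetical iRODS S3 API deployment: the endpoint, bucket mapping, and
# credentials shown here are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://irods-s3.example.org",
    aws_access_key_id="ALICE_KEY",
    aws_secret_access_key="ALICE_SECRET",
)

# Ordinary S3 calls now operate on iRODS data objects.
s3.upload_file("results.csv", "home", "alice/results.csv")
for entry in s3.list_objects_v2(Bucket="home", Prefix="alice/").get("Contents", []):
    print(entry["Key"], entry["Size"])
```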

During last year’s UGM, users were presented with an overview and demonstration of exploratory work on further authentication services such as OAuth 2.0 and OpenID Connect, and on the iRODS HTTP API. At this year’s event, the iRODS Consortium will share updates through the first three releases of the HTTP API, including optimizations and setting up the iRODS server as an OpenID Connect Protected Resource.
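
In outline, a client authenticates once for a bearer token and then calls ordinary HTTP endpoints. The sketch below follows the HTTP API's documented style, but the exact routes and parameters are assumptions to be checked against the release notes.

```python
# Assumed routes and parameters, modeled on the iRODS HTTP API's style;
# verify against the release notes for your version.
import requests

BASE = "https://irods.example.org:9000/irods-http-api/0.3.0"

# Authenticate with iRODS credentials; the server returns a bearer token.
token = requests.post(f"{BASE}/authenticate", auth=("alice", "secret")).text

# Use the token on subsequent requests, e.g. to list a collection.
resp = requests.get(f"{BASE}/collections",
                    params={"op": "list", "lpath": "/tempZone/home/alice"},
                    headers={"Authorization": f"Bearer {token}"})
print(resp.json())
```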

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature over 15 talks from users around the world. Among the use cases and deployments to be featured are:

  • iRODS Security Challenges Within an Enterprise Environment. Dow. Dow’s focus on data security necessitates a tailored approach for their internal users, leading to the development of the Scientific Data Management System (SDMS) Query Tool (SQT) — a user-friendly tool designed to facilitate secure access to specific datasets. The current gap with Metalnx is that it grants general users too much control over modifying data and collections. Additionally, it is difficult to synchronize iRODS users with our existing Azure security groups for permission management. This talk outlines the development of a querying tool utilizing the iRODS C++ API as a backend to communicate with iRODS. The talk will highlight the need for robust security architecture for enterprise-scale applications and where we hope to take the project in the future.
  • Sharing data in a multi-system multi-role environment centered on iRODS. SURF and Erasmus University Rotterdam. SURF, the cooperative association of Dutch educational and research institutions, offers data infrastructure and services to research communities. Some of its services are based on iRODS and are often used as building blocks for data platforms. One increasingly common architectural component in those platforms is a web portal where researchers can discover data using project-specific queries. Once the data are found, they are made available to the researcher either directly, for example with a download link, or indirectly, by triggering a copy to a computing environment where they are analyzed. Implementing such a workflow is time consuming. Its long-term maintenance is often jeopardized by the limited support available within the project, and design choices tailored too narrowly to one use case make adoption by other organizations difficult. We think it is possible to model that workflow in a generic way, as a reusable modular component flexible enough to support even the more stringent requirements associated with sensitive data. The component relies on iRODS and links together multiple web portals and repositories through an API layer based on FastAPI. We present here a proof of concept developed within the GUTS project, in collaboration with the project’s data management team and research support staff.
  • Integration of iRODS in a Federated IT Service through HTTP and Python API. CC-IN2P3. The Federated IT Service (FITS) project, a collaborative endeavor between the IN2P3 Computing Centre and IDRIS, the French national HPC center, addresses the challenge of managing the escalating data volumes generated by research infrastructures. The project aims to consolidate computing and storage resources while maintaining control over hosting expenses and minimizing the ecological footprint of digital technologies. Within the FITS project, iRODS was selected as the storage pooling solution, leveraging its established use within the IN2P3 Computing Centre. This implementation enables project users to seamlessly access their data without being aware of its physical location.
  • iRODS-based system turbocharged next-gen sequencing analysis during pandemic and beyond. National Institute for Public Health and the Environment (RIVM). The Dutch National Institute for Public Health and the Environment (RIVM) has numerous projects in various scientific domains that generate next generation sequencing data. Bioinformatics plays an important role in analyzing and interpreting this sequencing data. To support these analyses, we developed a platform that consists of a High Performance Compute (HPC) cluster, a Linux Scientific Workspace for software development and a Data Management System (DMS) based on iRODS. On top of this DMS, we also created a Job Engine: a tightly integrated process automation tool that manages the automated analyses of sequencing data on the HPC.

Bookending this year’s UGM are two in-person events for those who hope to learn more about iRODS. On May 28, the Consortium is offering beginner and advanced training sessions. After the conference, on May 31, users have the chance to register for a troubleshooting session, devoted to providing one-on-one help with an existing or planned iRODS installation or integration.

Registration for both physical and virtual attendance will remain open until the beginning of the event. Learn more about this year’s UGM at irods.org/ugm2024.

About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

UNC Advances Hurricane-driven Flood Prediction Capabilities for Coastal Communities

On September 14, 2018, Hurricane Florence made landfall in the Wrightsville Beach area of coastal North Carolina. While the storm was only a Category 1, it caused catastrophic flooding throughout much of the state. Record rainfall from the system fell on already saturated soil. Rivers overflowed their banks, storm surge inundated coastal areas, and the water had nowhere to go. It was a rare compound flooding scenario that will be studied and remembered for a long time.

Modeling the impacts of compound flooding – the interaction of fluvial (river), pluvial (surface flooding unrelated to rivers), and oceanic storm surge flooding – is difficult, yet communities in the path of tropical and extratropical storm systems face this scenario annually. Unfortunately, the difficulty of modeling and understanding these events impedes already difficult hurricane decision-making, leaving countless communities at increased risk, and there is evidence that compound flooding events may occur more frequently in the future (e.g., Wahl, T., et al. “Increasing risk of compound flooding from storm surge and rainfall for major US cities.” Nature Climate Change 5.12 (2015): 1093-1097.). But a new modeling approach for river representation in the widely used coastal model ADCIRC may help change that, providing predictions and insights to the decision-makers working to keep their communities safe during storm-related flood events.

The Renaissance Computing Institute (RENCI), the University of North Carolina (UNC) Center for Natural Hazards Resilience, and the Institute of Marine Sciences (IMS) at UNC-Chapel Hill combined efforts under a grant from the National Oceanic and Atmospheric Administration (NOAA) to develop a better modeling approach for the compound flooding caused by these interconnected water systems. The resulting model advancement will help scientists represent river channel size variations and provide better insights into interactions between river channels and floodplains.

Current Models

Several models are used to understand and predict coastal inundation scenarios, but two are primarily relied on to understand flooding:

  1. ADCIRC is developed by a consortium of researchers in academia, government, and industry, with activities centered and coordinated at both UNC-Chapel Hill and Notre Dame. It is the most widely used storm surge modeling and analysis platform; FEMA uses the model for coastal flood insurance studies, defining storm surge levels for coastal insurance rates. However, the standard trapezoidal river channel representation used in ADCIRC only resolves features down to 30 m, and smaller features (small rivers, man-made waterways, inlets, estuaries, etc.) require a far more burdensome computation. This creates inaccuracies when modeling compound flood events.
  2. HEC-RAS, a fluvial modeling system developed by the Army Corps of Engineers, accurately models river systems and has been the primary system used for real-time prediction of river flow and stage by the NOAA River Forecast Centers. It was originally developed as a model for inland river systems, where coastal waters do not reach. 

As a result, we currently have two unique and independently accurate models – one for storm surge flooding, and one for fluvial systems – but neither adequately accounts for the impacts captured by the other. This means communities that fall into both flood risk zones are beyond our current ability to model and understand their unique circumstances.

Modeling Compound Flooding

The team’s new riverine feature in ADCIRC, led by Dr. Shintaro Bunya (a research scientist with UNC-Chapel Hill’s IMS and DHS-funded Coastal Resilience Center) and Prof. Rick Luettich (Earth, Marine, and Environmental Sciences (EMES) faculty member, Director of UNC-Chapel Hill’s IMS, and principal investigator of the Coastal Resilience Center), represents fluvial channels and man-made waterways using elongated, one-dimensional elements in the channel direction. The depth of the river and the height of the river bank are then specified at the same location. Previously not possible in ADCIRC, this “discontinuous” elevation permits a more accurate simulation of water flow and more easily accounts for smaller structures. The new river feature seamlessly fits into existing two-dimensional ADCIRC models and is as accurate at modeling fluvial flooding as HEC-RAS. The technique details and applications were recently published here.
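
Conceptually, the key change is that a single horizontal location can now carry two elevations, the channel bed and the bank crest. The simplified sketch below illustrates the idea; it is conceptual Python, not ADCIRC code.

```python
# Simplified illustration of "discontinuous" elevation at a channel node;
# conceptual only, not ADCIRC code.
from dataclasses import dataclass

@dataclass
class ChannelNode:
    x: float
    y: float
    bed_elevation: float    # bottom of the 1D river channel element
    bank_elevation: float   # adjoining 2D floodplain surface, same location

    def overtops(self, water_level: float) -> bool:
        """Channel water spills onto the floodplain above the bank crest."""
        return water_level > self.bank_elevation

node = ChannelNode(x=-77.0, y=35.1, bed_elevation=-4.0, bank_elevation=1.5)
print(node.overtops(water_level=2.1))   # True: compound flooding begins
```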

Already, the model has proven its worth. The new river feature was demonstrated in a real-world application (see the figure below) using a large, ocean-scale ADCIRC grid for detailed simulations along the North Carolina coastal region. The coastal river network, with about 200 m along-channel resolution in the Neuse River, is represented by the narrow elements detailed in insets A and B. The entire ADCIRC grid is shown in inset C. The orange-red colors show the predicted maximum water level contour in a Hurricane Florence (2018) simulation, and the plot in the upper right compares observed versus predicted high water marks along the Neuse River. The agreement between observations and predictions is very high, indicating that this new approach to river channel representation in ADCIRC will be highly beneficial in predicting future riverine flow conditions and their impacts on coastal flooding.

Figure. Real-world example of the new channel network feature in ADCIRC. 

This new model has the potential to provide better predictions for communities where evacuation decisions can be the hardest to make, in the hope that North Carolina and other coastal states are less likely to be caught off guard by the flood risks in these compound flooding events.

IT4Innovations National Supercomputing Center joins the iRODS Consortium

IT4Innovations National Supercomputing Center at VSB – Technical University of Ostrava, which is based in the Czech Republic, has become the newest member of the iRODS Consortium. The consortium brings together businesses, research organizations, universities, and government agencies from around the world to ensure the sustainability of the iRODS software as a solution for distributed storage, transfer, and management of data. Members work with the consortium to guide further development and innovation, expand its user and developer communities, and provide adequate support and educational resources.

IT4Innovations is the leading research, development, and innovation center active in the fields of High-Performance Computing (HPC), High-Performance Data Analysis (HPDA), Quantum Computing (QC), and Artificial Intelligence (AI) and their application to other scientific fields, industry, and society. Since 2013, IT4Innovations has operated the most powerful supercomputing systems in the Czech Republic, which are provided to Czech and foreign research teams from academia and industry.

The integrated Rule-Oriented Data System (iRODS) is open-source software used by research, commercial, and government organizations around the world. iRODS allows users to store, manage, and share large amounts of data, including associated metadata, across different organizations and platforms, and provides a mechanism for defining rules for data storage, processing, and distribution. iRODS is designed to support collaboration, interoperability, and scalability of data infrastructures.

Martin Golasowski, senior researcher at IT4Innovations, summarizes the benefits of membership in the iRODS Consortium: “The demand for a comprehensive solution for fast and efficient data transfer between locations is increasing across the European scientific community. Membership in the iRODS Consortium will enable us to communicate directly with the solution’s development team and will give us access to the latest features and support as we provide these tools to the scientific community and beyond.”

“iRODS provides a virtual file system for various types of data storage, metadata management, and, last but not least, a mechanism for federating geographically distant locations for data transfer. These features are used in the LEXIS Platform, which simplifies the use of powerful supercomputers to run complex computational tasks through a unified graphical interface or using a specialized application interface. The transfer of large volumes of data between supercomputers and data storage is then performed automatically and transparently for those using iRODS and other data management technologies,” adds Martin Golasowski.

“We are very excited to have our friends in the Czech Republic join the Consortium,” said Terrell Russell, Executive Director of the iRODS Consortium. “Their expertise and collaborative insights have already made iRODS better for everyone. We look forward to continued progress working alongside IT4Innovations.”

The iRODS software has been deployed at thousands of locations worldwide for long-term management of petabytes of data in various industries such as oil and gas, biosciences, physical sciences, archives, and media and entertainment. The development team of the iRODS Consortium is based at the Renaissance Computing Institute (RENCI), which is affiliated with the University of North Carolina at Chapel Hill, USA. To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about IT4Innovations National Supercomputing Center, please visit www.it4i.cz/en.

Exploring the power of distributed intelligence for resilient scientific workflows

New project led by USC Information Sciences Institute seeks to ensure resilience in workflow management systems

Image generated by the author using DALL-E.

Future computational workflows will span distributed research infrastructures that include multiple instruments, resources, and facilities to support and accelerate scientific discovery. However, the diversity and distributed nature of these resources makes harnessing their full potential difficult. To address this challenge, a team of researchers from the University of Southern California (USC), the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, and Oak Ridge, Lawrence Berkeley, and Argonne National Laboratories has received a grant from the U.S. Department of Energy (DOE) to develop the fundamentals of a computational platform that is fault tolerant, robust to various environmental conditions, and adaptive to workloads and resource availability. The grant is planned for five years and includes $8.75 million of funding.

“Researchers are faced with challenges at all levels of current distributed systems, including application code failures, authentication errors, network problems, workflow system failures, filesystem and storage failures and hardware malfunctions,” said Ewa Deelman, research professor, research director at the USC Information Sciences Institute and the project PI. “Making the computational platform performant and resilient is essential for empowering DOE researchers to achieve their scientific pursuits in an efficient and productive manner.”

A variety of real-world DOE scientific workflows will drive the research – from instrument workflows involving telescope and light source data to domain simulation workflows that perform molecular dynamics simulations.  “Of particular interest are edge and instrument-in-the-loop computing workflows,” said co-PI Anirban Mandal, assistant director for network research and infrastructure at RENCI. “We expect a growing role for automation of these workflows executing on the DOE Integrated Research Infrastructure (IRI). With these essential tools, DOE scientists will be more productive and the time to discovery will be decreased.”

Fig. 1: SWARM research program elements.

Swarm intelligence

Key to the project is swarm intelligence, a term derived from the behavior of social animals (e.g., ants) that collectively achieve success by working in groups. Swarm Intelligence, or SI, in computing refers to a class of artificial intelligence (AI) methods used to design and develop distributed systems that emulate the desirable features of these social animals – flexibility, robustness and scalability.
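
A classic member of this family is particle swarm optimization, in which simple agents share their best finds and converge on good solutions without central control. The toy sketch below is background on the general technique, not SWARM's algorithm.

```python
# Toy particle swarm optimization: background on classic SI, not SWARM's
# algorithm. Particles share a global best and converge without a central
# controller dictating their moves.
import random

def objective(x):
    return sum(v * v for v in x)        # minimum at the origin

DIM, N, STEPS = 2, 20, 100
pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(N)]
vel = [[0.0] * DIM for _ in range(N)]
pbest = [p[:] for p in pos]             # each particle's best-known position
gbest = min(pbest, key=objective)[:]    # swarm's best-known position

for _ in range(STEPS):
    for i in range(N):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            # inertia + pull toward personal best + pull toward global best
            vel[i][d] = (0.7 * vel[i][d]
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if objective(pos[i]) < objective(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=objective)[:]

print("best point found:", gbest)
```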

“In Swarm Intelligence, agents currently have limited computing and communication capabilities and can suffer from slow convergence and suboptimal decisions,” said Prasanna Balaprakash, director of AI programs and distinguished R&D staff scientist at Oak Ridge, and co-PI of the newly funded project.  “Our aim is to enhance traditional SI-based control and autonomy methods by exploiting advancements in AI techniques and in high-performance computing.”

The enhanced metasystem, called SWARM (Scientific Workflow Applications on Resilient Metasystem), will enable robust execution of DOE-relevant scientific workflows such as astronomy, genomics, molecular dynamics and weather modeling across a continuum of resources – from edge devices near sensors and instruments through wide-area networks to leadership-class systems.

Distributed workflows and challenges

The project develops a distributed approach to workflow development and profiling. The research team will develop an experimental platform where DOE scientists will submit jobs and workflows to a distributed workload pool. Once a set of workflows becomes available in the workflow pool, the agents will need to estimate each task’s characteristics and resource requirements with continual learning capability. “Such methods enhance the capabilities of the agents. The research will include mathematically rigorous performance modeling and online continual learning methods,” remarked Krishnan Raghavan, an assistant computer scientist in Argonne’s Mathematics and Computer Science division and a co-PI of SWARM.
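
As a toy illustration of the continual-learning idea (SWARM's actual approach involves mathematically rigorous performance models), an agent might keep a running per-task-type runtime estimate that it updates as executions complete:

```python
# Toy continual-learning runtime estimator, illustrative only: an
# exponentially weighted estimate per task type, updated online.
from collections import defaultdict

class RuntimeEstimator:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.estimates = defaultdict(lambda: None)

    def update(self, task_type: str, observed: float) -> None:
        prev = self.estimates[task_type]
        self.estimates[task_type] = (observed if prev is None else
                                     (1 - self.alpha) * prev
                                     + self.alpha * observed)

    def predict(self, task_type: str, default: float = 60.0) -> float:
        est = self.estimates[task_type]
        return default if est is None else est

est = RuntimeEstimator()
for runtime in (118.0, 130.0, 121.0):   # three completed alignment tasks
    est.update("genome-align", runtime)
print(est.predict("genome-align"))      # informs the agent's next decision
```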

In SWARM there is no central controller: the agents must reach a consensus on the best resource allocation. “In imitation of biological swarms, we will investigate how coalitions can adapt to various fault tolerance strategies and can reassign tasks, if necessary,” said Argonne senior computer scientist Franck Cappello, who is leading the development efforts on fault recovery and adaptation algorithms. Here the agents will coordinate decision-making for optimal resource allocation while minimizing communication between agents, for example through the formation of hierarchies and the adoption of adaptive communication strategies.

Evaluation

To demonstrate the efficacy of the swarm intelligence-inspired approach, the team will evaluate the method through swarm simulations and through emulation and prototyping on testbeds. “We will re-imagine how workflows can be managed to improve both compute and networking at micro and macro levels,” said Mariam Kiran, Group Leader for Quantum Communications and Networking at ORNL.

This article was written in collaboration with USC ISI, RENCI, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, and Argonne National Laboratory.