Dr Carolyn Hogg has more than 25 years’ experience in species conservation, a career inspired by visits to South Africa’s Kruger National Park during her upbringing.
She has worked on conservation projects for animals including koalas, humpback whales, and orange-bellied parrots, in places ranging from Australia to Hawaii and Alaska. Carolyn is Senior Research Manager for the Australasian Wildlife Genomics Group in the Faculty of Science at the University of Sydney, where her mission is to help protect rare species, many of them unique to Australia.
Something to brighten your morning! Getting out of a trap backwards....after being rudely woken up by the scientists 😂 @KathyBelov @Sydney_Science pic.twitter.com/YsgqGVH5Uq
— Carolyn Hogg (@HoggCarolyn) March 15, 2020
“Australia separated from the other continents more than 95 million years ago,” she explains. “More than 87% of our mammals, 93% of our reptiles, 94% of our frogs and 45% of our birds can only be found here in Australia. They are vital for biodiversity and we have a responsibility to ensure their survival.”
For the past decade Carolyn has been focusing on genomics research for the cross-governmental Save the Tasmanian Devil Program (STDP).
The Tasmanian devil is a rare marsupial, found only on the island state of Tasmania. It is threatened not only by man-made changes – from extreme climate events like the recent devastating bushfires, to habitat destruction and vehicle collisions – but also by a contagious cancer that causes facial tumours and has reduced its population by more than 80 per cent.
By using the genome of Tasmanian devils, Carolyn and her team can provide on-the-ground conservation managers with detailed scientific information and insights on how best to protect the species.
Their work has accelerated since the AWS Cloud trial at the University of Sydney in 2019, which has enabled them not only to speed up their research but also to manage their funding more carefully and effectively.
Accelerating research
Carolyn says that assembling and annotating genomes – which means identifying the location of specific genes in a genome and determining what they do – used to be a “laborious and resource intensive process.”
“Imagine I gave you a 5,000-piece jigsaw, but I didn’t give you the picture to work from,” she explains. “How do you solve it? Firstly, you spread all the pieces out – taking up a huge amount of room – then you find the edges. Slowly you start to slot bits together and you contract the space used by the other pieces.”
“We’re often working with more than a billion pieces of jigsaw and no guide. AWS tools and computing power coupled with the RONIN interface – a simple web application that allows anyone to launch complex compute resources – help us to process, analyse, and categorise all of that data, to build the complete picture.”
Bushfire impact
Now, Carolyn’s team are starting a new project to assemble and annotate the genomes of some of Australia’s most threatened species. They will be sharing this genome data on the AWS Public Dataset Program, an initiative designed to give researchers anywhere in the world fast, on-demand access to scientifically valuable, publicly available datasets, with the aim of accelerating scientific discovery.
By making the genomic data more easily available to researchers worldwide, they can help to intensify work to protect some of Earth’s most endangered animals in Australia and beyond.
“Australia has the worst mammal extinction rate in the world. We have lost more than 29 species over the past 200 years – that’s 35% of all modern mammal extinctions,” says Carolyn. “The recent Australian bushfires have been catastrophic for our wildlife, pushing even more species to the brink of extinction.”
“The types of issues Tasmanian devils face are often very similar to those faced by other threatened species, such as the koala, orange-bellied parrot, or the woylie – an extremely rare, small marsupial.”
“These range from low genetic diversity, to a high rate of infectious disease, and a fragmented landscape. As the devil population drops, genetic diversity also decreases. This leads to inbreeding, weaker immune systems, and a vicious cycle of more disease.”
“In turn, that has serious knock-on effects for other wildlife in the area, because devils are at the top of the food chain.”
Democratising data
According to Carolyn, the only way forward for large-scale genome projects is to “share ideas, data, techniques and tools”. By democratising access to data, more people can work with it and get to the answers sooner.
“When you’re dealing with delicately balanced ecosystems, it’s vital to understand the best course of action, based on evidence,” says Carolyn. “When trying to tackle disease, for example, does improving genetic diversity by intervening and moving specific groups of animals improve the situation, or does it actually make the disease more virulent?”
“Even with fragmented information, you can start to establish patterns. And often these patterns are true in other species. They can be used to draw certain conclusions or rule out other ideas.”
“In terms of scaling this up to benefit other species, the ultimate goal would be to create a universal genomic library and tools that other researchers and conservation managers can access in order to make science-based decisions.”
“That’s why the AWS Public Dataset Program can be so valuable in so many fields of study. These tools help us to bring the worlds of academia and conservation management closer together than ever before. And when we prove an idea works – it can go global.”
AWS Public Dataset Program
The AWS Public Dataset Program covers the cost of storage for some of the world’s most scientifically valuable, publicly available datasets, including those from NASA, the Hubble Space Telescope, the UK Meteorological Office, and the Allen Institute for Brain Science. By making these datasets available in the cloud, the program enables researchers to work with them without needing to download and store their own copies. This allows users to analyse massive amounts of data in minutes, regardless of where they are in the world, or how much local storage space or computing capacity they can access.
The AWS Public Dataset Program already hosts a number of human health datasets, including two of the world’s largest cancer genome datasets, as well as animal genome datasets from Genome Ark. Genome Ark is part of the Vertebrate Genomes Project, which is working to provide high-quality genome sequences for diverse, endangered animal species. Genomes it has recently made available on the AWS Public Dataset Program include that of the New Zealand kakapo – a large, flightless, nocturnal, ground-dwelling parrot.
AWS works with data providers who seek to democratise access to scientific data by making it available for analysis on AWS, developing new cloud-native techniques, formats, and tools that lower the cost of working with data, and encouraging the development of communities that benefit from access to shared datasets.
Find out more about the AWS Public Dataset Program.