Computer program solves mysteries of fossilized leaves

Scientists have been classifying leaves for hundreds of years. Since the 1800s, they have described leaves using criteria such as leaf teeth, margins, lobes, vein patterns and more. While this approach has allowed botanists to classify living plant species, it hasn't been good enough to crack the "leaf code" and place in the plant tree of life the extinct species that left behind only fossils, without the corresponding seeds, fruits or flowers that are the most important characteristics for classification.

Peter Wilf, a paleobotanist and professor of geosciences at Penn State University, became determined to crack that code, and he knew it would have to involve computers. Wilf read about a study in which a computer vision program could determine whether or not a photo contained an animal, and he immediately realized that this type of software could work.

"A bell rang in my head," said Wilf. "Instead of an animal, tell me if the image is of an oak leaf or not, or pick among several categories."

He got in touch with the lead author of that study, Thomas Serre, a professor of psychological science at Brown University, and they began applying the program to leaves. After nine years of experiments and algorithm building, they have published their own paper in the Proceedings of the National Academy of Sciences describing just how well it worked.

"Paleobotanists have collected many millions of fossil leaves and placed them in the world's museums," said Wilf. "They represent one of the most underused resources for understanding plant evolution. Variation in leaf shape and venation, whether living or fossil, is far too complex for conventional botanical terminology to capture. Computers, on the other hand, have no such limitation."

Beech leaves © Shengping Zhang / Penn State University

The Penn State research team created a database of 7,500 images of cleared leaves, most of which came from the Smithsonian Institution's National Museum of Natural History. Cleared leaves, the kind housed in museum collections, have been chemically bleached, stained and mounted on slides to reveal their vein patterns.

The team built a computer vision program in which the computer learns from examples which leaf features distinguish one plant group from another. The researchers supplied identifications for half of the photos so that the computer could build a dictionary of distinctive features, such as vein intersections, bumps and asymmetries, that are important for classifying leaves, while also learning to ignore imperfections like insect bites and slide-mounting defects.

The researchers ran the experiment 10 times, each time giving the computer a different random selection of training and test photos, and the results were consistent, differing by only about one percent between runs.
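For readers curious what repeating a random training/test split 10 times looks like in practice, here is a minimal Python sketch of that kind of evaluation loop. It is not the study's method: the leaf "features" are made-up two-dimensional points, the family names only label hypothetical toy clusters, and the nearest-centroid classifier is a simple stand-in for the team's far more sophisticated software. Only the protocol (train on half the labeled examples, test on the other half, repeat with fresh shuffles) mirrors what the article describes.

```python
import random
import statistics


def nearest_centroid_fit(samples):
    """Average the feature vectors of each class (a hypothetical stand-in classifier)."""
    sums, counts = {}, {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            total[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in total] for label, total in sums.items()}


def nearest_centroid_predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist2)


def repeated_random_splits(dataset, runs=10, train_fraction=0.5, seed=0):
    """Shuffle, split into training and test halves, score, and repeat `runs` times."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        shuffled = list(dataset)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        centroids = nearest_centroid_fit(train)
        correct = sum(nearest_centroid_predict(centroids, f) == y for f, y in test)
        accuracies.append(correct / len(test))
    return accuracies


def make_toy_leaves(n_per_family=100, seed=1):
    """Hypothetical data: 2-D points scattered around one made-up center per family."""
    rng = random.Random(seed)
    centers = {"Fagaceae": (0.0, 0.0), "Rosaceae": (3.0, 0.0), "Rubiaceae": (0.0, 3.0)}
    data = []
    for family, (cx, cy) in centers.items():
        for _ in range(n_per_family):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), family))
    return data


if __name__ == "__main__":
    accs = repeated_random_splits(make_toy_leaves())
    # Report the average accuracy and the gap between the best and worst runs.
    print(f"mean accuracy {statistics.mean(accs):.3f}, "
          f"spread {max(accs) - min(accs):.3f}")
```

Reporting the spread between the best and worst runs alongside the mean is one simple way to check that a result does not hinge on a single lucky split of the photos.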

"The success of our computer vision approach suggests that this may be one of those tasks that are comparatively easier for computers because of computers' ability to process and analyze large numbers of specimens, to discover novel visual features that may have phylogenetic significance," said Serre.

The computer overlays a heat map on the leaf image that highlights and rates the areas most important for identifying it. This reveals much more information than the human eye can discern, and it is also much faster: a trained person may take hours to describe a single leaf following the traditional protocol, while the computer works thousands of times faster and produces a map of diagnostic characteristics.

So far, the researchers have achieved a 72 percent accuracy rate in sorting leaves among 19 plant families. Placing fossils into families is the first target for paleobotanists trying to classify them.

The computer program was even able to pick up characteristics of well-known species that scientists hadn't noticed before, such as previously unrecognized features of the tips of rose leaves. It also easily distinguished leaves in the coffee family, which are usually very hard to identify without their twigs attached.

The researchers believe this computer vision program will generate a vast amount of new botanical knowledge and perhaps finally give them the map to the tree of life for plants.
