Every dog is an individual!

CoLabo
May 9, 2022

An interactive visualization of dogs’ behavioral and physical traits.

By Yinan Dong —

Hello! My name is Yinan Dong and I joined the Colubri lab at the University of Massachusetts Chan Medical School in June 2021, as a User Experience (UX) and information designer. The lab does research on computational modeling of infectious disease, including outbreak reconstruction methods, machine learning predictions of risk and outcomes, and interactive visualization of clinical and genomic datasets. Another important area of work in the lab is education and outreach, and in this context, we are creating simulation tools and mobile apps for students to learn about infectious disease. The first project I worked on after joining the lab was the Darwin’s Dogs visualization project, a collaboration between the Karlsson and Colubri labs. I worked as the data visualization designer and developer for an interactive web visualization that we created to complement the scientific publications and make the results more accessible to a general audience.

Darwin’s Dogs is one of the Darwin’s Ark projects from the Karlsson lab that studies dogs’ genetics and behaviors to advance the understanding of complex diseases in dogs and humans. A paper describing a very rich dataset from the Darwin’s Dogs project and a detailed analysis of this data was just published in Science magazine after 5 years of hard work. You can read more about it here. Congratulations to the team!

The scientists in the Darwin’s Ark project wanted to share their study and findings with general audiences that might not be familiar with the complex statistical techniques used in the research. The public probably also has a great interest in learning more about dogs in general, and their own pet dogs in particular. People often talk about dogs’ behaviors as being strongly associated with their breed or genetics. For example, “The Ultimate Data Dog” chart below shows dogs’ popularity and overall scores by breed. The Labrador Retriever is a popular breed among pet owners, commonly thought to have a highly sociable, easy-to-train, and friendly personality.

The Ultimate Data Dog. Sources: American Kennel Club, Canine Inherited Disorders Database. Credits: concept & design by David McCandless; research by Miriam Quick; dog artwork by Andrew Park.

But are those stereotypes really true? Scientists from the Darwin’s Dogs project have found, looking at the dogs’ behavior survey data, that variation within each breed is very broad and so those behavioral stereotypes are far less pronounced than people typically think.

“Every dog is a (unique) individual!”

Therefore, we felt that there was a need for visualizations that make the scientists’ discovery easily accessible to experts and the general public alike. We aimed to develop a visualization tool that could accompany the research paper and describe the statistical linkage between dog breeds and their behaviors and physical traits in a more visual way. This interactive visualization would let users explore the probability of matching a dog breed with the particular behavioral features or physical traits they are interested in. After an almost year-long process of design and development, we arrived at an interactive web visualization tool, or dashboard, that anybody can view here.

A screenshot of the web tool, available at https://darwinsark.org/muttomics

The design process

1. Initial conceptualization

We discussed the Darwin’s Dogs data, the goals of the visualization, and early ideas about a web-based interactive tool during an initial meeting back in June 2021 (which turned out to be our first in-person work meeting since the beginning of the pandemic).

Selfie, taken during the first in-person meeting of the team

An early sketch of the interactive tool that resulted from this meeting, shown in the next diagram, had the following “user flow”: users select the dog “personalities” they want from a predetermined list, and the best-matched dog breed(s) are then highlighted in the available pool.

An early conceptual sketch from Andrés

2. Parsing the data

a) The scientists collected behavioral survey data on 18,371 dogs, along with DNA samples for a subset of these dogs, from dog owners all over the country. The data is fully available in this Dryad data repository.

b) The final dataset categorized dogs into 24 breeds, including 23 pure breeds and one mixed breed. The behavioral surveys were analyzed to determine the probability that each of these 24 breeds presents any combination of up to eight personality traits and eight physical traits.
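To make the idea of a "probability per breed for a combination of traits" concrete, here is a minimal sketch in Python. It assumes the simplest possible reading, namely the fraction of surveyed dogs within a breed that show every selected trait. The actual analysis in the paper may well differ, and the record layout, breed labels, and trait names below are made up for illustration.

```python
from collections import defaultdict

def breed_match_probability(dogs, selected_traits):
    """Per breed, the fraction of dogs that show every selected trait.

    `dogs` is a list of dicts with a "breed" string and a "traits" set.
    This record layout is a hypothetical stand-in for the survey data.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for dog in dogs:
        totals[dog["breed"]] += 1
        # A dog matches only if the selection is a subset of its traits.
        if selected_traits <= dog["traits"]:
            matches[dog["breed"]] += 1
    return {breed: matches[breed] / totals[breed] for breed in totals}

# Toy records, invented for this example (not from the Darwin's Dogs dataset).
dogs = [
    {"breed": "Breed A", "traits": {"sociable", "easy to train"}},
    {"breed": "Breed A", "traits": {"sociable"}},
    {"breed": "Breed A", "traits": set()},
    {"breed": "Breed B", "traits": {"easy to train"}},
    {"breed": "Breed B", "traits": {"sociable", "easy to train"}},
]

probs = breed_match_probability(dogs, {"sociable", "easy to train"})
for breed, p in sorted(probs.items()):
    print(f"{breed}: {p:.2f}")
```

Note how the probability drops as more traits are selected, since a dog must show all of them to count as a match.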

3. Discussing different types of charts with the group

We were looking for an intuitive way to represent the data, one where the audiences could quickly grasp the data value and its implications. After analyzing a sample dataset, I came up with different chart types and mockups of the webpage to show how users could explore the data in various ways.

The following images summarize all the chart types that I proposed and discussed with the group. They include the traditional bar chart, the progress bar chart, the area square chart, the waffle chart, and the “paw” chart:

The traditional bar chart
Different chart types we considered for the visualization: the square pie chart (top-left), the paw chart (top-right), the progress bar chart (bottom-left), and the square area chart (bottom-right)

The progress bar chart shows the probability of matching the dog you want, based on your selection of dog personalities.

The area square chart uses the area or size of the square to represent the data value, in this case, a probability. A treemap is another chart that uses the proportion of the square size to represent the data value. The big rectangle is divided into many small rectangles to describe the data value of its sub-categories.

The treemap sketch

The other type of chart has a more unusual or whimsical aesthetic, using the dog paw shape as the chart base to map the data, which could be more fun to look at. Using visual metaphors and symbols can attract viewers and make for a more memorable visualization.

Using a dog’s paw shape to create an unusual-looking chart

Most importantly, throughout our discussions, we tried to determine whether using the size/area of the square or a discrete number of area units would lead to a more intuitive visualization of the probability of a data selection.

A sketch illustrating the discussion about the two alternative visualization methods: square area and count

We also realized that our data has many extremely small probability values, below 1%, for which the square would have been too small to read and interpret. Furthermore, we thought that it would be harder for viewers to visually quantify a probability from the total area of a square than by simply counting a number of unit squares, especially for those who are less comfortable with numbers.
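The size problem can be illustrated with a little arithmetic. In an area-encoded square, the area is proportional to the probability, so the side length scales with its square root. The sketch below is in Python, and the 200-pixel side for a probability of 1 is an arbitrary assumption chosen for illustration, not a value from the actual tool.

```python
import math

def square_side(probability, full_side_px=200.0):
    """Side of a square whose area is proportional to `probability`.

    `full_side_px` is the side drawn for a probability of 1; it is a
    hypothetical value chosen only for this example.
    """
    return full_side_px * math.sqrt(probability)

for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p = {p:<5} -> side of about {square_side(p):.1f} px")
```

With these assumptions, a 1% probability shrinks to a square with a tenth of the full side, and smaller values quickly become a few pixels wide, which is hard to compare by eye.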

In the end, we decided to use an animated waffle chart, where the total area is represented by a number of elementary area units that are introduced with an animation effect to facilitate counting. We concluded that this chart could be a more intuitive way for the general public to visualize the probability of each breed matching their selection, and obtain a direct understanding of how these probabilities are not as large or uniform as one might expect:

The animated waffle chart
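The counting idea behind the waffle chart can be sketched in a few lines of Python. This is only an illustration of the technique, not the code behind the web tool: the 20 by 50 grid (1,000 units) is an assumption, picked so that probabilities below 1% still fill at least a few units, and the text rendering stands in for the real animated graphics.

```python
def waffle_grid(probability, rows=20, cols=50):
    """Return a rows x cols grid of booleans; True marks a filled unit.

    The grid size is a hypothetical choice. Units are filled row by row,
    which is also the order an animation could reveal them in.
    """
    total_units = rows * cols
    filled = round(probability * total_units)
    return [[r * cols + c < filled for c in range(cols)] for r in range(rows)]

def render(grid):
    """Draw the grid as text: '#' for a filled unit, '.' for an empty one."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)

grid = waffle_grid(0.004)          # a probability of 0.4%
print(render(grid[:2]))            # show only the first two rows
print(sum(cell for row in grid for cell in row), "of", 20 * 50, "units filled")
```

Because every unit has the same size, a reader can compare two breeds by counting filled units instead of estimating areas.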

I also coded alternative versions of the animated waffle chart in different shapes, using a dog’s body and a dog bone. I think that these shapes could make the visualization very fun for kids! These alternative versions could help engage young audiences with the data exploration:

Charts shaped as a bone (top) and a dog’s body (bottom)
The alternative version for young children, which could be displayed on a touchscreen device

4. The visual style and storytelling

First of all, to keep visual consistency with the Darwin’s Ark project, I chose the following color palette and typeface, and designed the UI elements and some mockups:

The visual style guide
One of the earlier web tool mockups

Secondly, we had to develop a visual language and storytelling approach that is accessible to the general public, and to make the whole story more attractive by using titles and headlines, extra text explanations, web interactions, plain language, etc.

One problem I faced during the development stage was that the initial UI used technical terms from the research paper to label the buttons that filter the data by dog behavior. However, after I ran an internal test with the team, the feedback was that people, even dog owners, would find it hard to understand the meaning of each button.

Together with Kathleen Morrill, the first author of the paper and also a very talented artist (she drew the cute breed icons you see in the dashboard), and the rest of the team, we settled on the following solutions, which I had proposed for the problem noted above:

a) Added a mouse hover interaction to all the filter buttons to explain each term in plain language

b) Added a tag function to all buttons. When a user selects a trait button, tags appear with the more common terms that people usually use when talking about dogs’ traits on social media. This is what the button tagging function looks like:

The tags explain the user selections

c) Added some subtitles to guide users step by step to explore the data on the web

d) Implemented scrolling text in an introductory header page, which suggests some questions and introduces the concept and the data, to stimulate people’s interest in exploring the visualization below.

5. Limitations

Visualization is a way to represent data and tell a data story; it is by no means a way to pass judgment or to claim that it shows the only truth. Every dataset has some level of uncertainty or bias. You are free to have your own thoughts and ways of interpreting the data. We are just happy to share our discovery! :)

We cannot draw a conclusion about how effective the animated waffle chart is at representing probability data compared with other chart types, because this project lacks a user study evaluating the different visualization methods. This could be a future study.

Learning more about the science behind the visualization:

The paper was finally published in the scientific journal Science on April 28, 2022, and it took the world by storm, confirming that this is a topic of great interest to people outside the dog research field!

The paper as a cover story in Science magazine; read it at https://www.science.org/doi/10.1126/science.abk0639

The paper is fully accessible free of charge from this link, and numerous news articles were written about it, including in The New York Times, The Atlantic, and The Wall Street Journal. If you want to hear Elinor Karlsson, the lead investigator of this project, talk about the work, please check out her interview on NBC News. With all this interest in the topic, I hope that our visualization helps inform the discussion and also provides a fun experience for users.

🧡 Special thanks

Thanks to Elinor K. Karlsson, Andrés Colubri, Kathleen Morrill, Diane P. Genereux, and Marjie Alonso. This was a team effort! We discussed everything together to come up with ideas and solutions. And as noted before, Kathleen created all the cute dog breed images and GIFs. Thanks to Andrés Colubri for taking the time and effort to revise this article.

CoLabo

Colubri Lab at the University of Massachusetts Medical School