Canadian Bioinformatics and Computational Biology Strategic Framework

Introduction

Data-intensive scientific research lays the groundwork for the development of the revolutionary approaches to problem solving and decision-making that are the primary drivers of innovation.


[Figure: Big Data requires special expertise to analyze large data sets and convert them into user-friendly formats.]

Major technological advances during the last decade have accelerated the production of datasets of unprecedented size and complexity that in turn have driven the development of the new technologies and infrastructures needed to collect, store, manage, mine and integrate all the information. Many countries, including Canada, are struggling to extract meaningful information from all these big data, to derive maximum value from the billions of dollars invested in research and to realize the promised benefits to society.

In the life sciences, data are generated from a variety of sources, including biological research, data-intensive technologies such as quantitative imaging, environmental biodiversity studies, large population cohorts, plant and livestock breeding programs, multi-national clinical trials, and e-health initiatives. In recent years, the increase in data production has been particularly dramatic in the “omics” technologies, specifically in the area of genome sequencing. The Human Genome Project, completed in 2003, required hundreds of sequencing machines and cost over $1 billion over a 10-15 year period. Today, it is possible to sequence an individual’s genome in 2-3 days for little more than $1,000, and store that genome on a single USB stick. These numbers are not static and some estimates suggest that, by 2020, data will be generated at up to one million times the current rate, which is orders of magnitude faster than Moore’s law (i.e., doubling of computing power every two years). The analysis of human genomes, transcriptomes, proteomes, interactomes, metabolomes and microbiomes will provide the basic knowledge necessary to diagnose, understand and cure many diseases, with associated impact on the mitigation of social health costs. Parallel research in the agriculture, energy, environment, fisheries, forestry, and mining sectors is providing important information to guide pest management strategies, sustainable farming practices, natural resource management, crop developments and environmental monitoring in the face of climate change.
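
As a rough illustration of the comparison with Moore's law made above, the sketch below works out how many doublings a million-fold increase implies and how long that would take at one doubling every two years. The function names and the six-year horizon in the usage lines are illustrative assumptions, not figures from this framework.

```python
import math


def doublings_needed(growth_factor: float) -> float:
    """Number of doublings required to grow by the given factor."""
    return math.log2(growth_factor)


def years_at_doubling_pace(growth_factor: float,
                           doubling_period_years: float = 2.0) -> float:
    """Years needed to reach growth_factor at one doubling per period."""
    return doublings_needed(growth_factor) * doubling_period_years


def growth_after(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor reached after `years` at one doubling per period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    factor = 1_000_000  # the "one million times" estimate quoted in the text
    print(f"Doublings for a {factor:,}x increase: {doublings_needed(factor):.1f}")
    print(f"Years at a two-year doubling pace: {years_at_doubling_pace(factor):.1f}")
    # Hypothetical six-year horizon, chosen only for illustration.
    print(f"Growth over 6 years at that pace: {growth_after(6):.0f}x")
```

A million-fold increase is about 20 doublings, which would take roughly 40 years at a two-year doubling pace. Over six years, that pace yields only an eightfold gain.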

Big data are of little value to society unless they can be analyzed, interpreted and applied. Raw data are seldom useful to end-user groups whether they be economists, environmentalists, natural resource managers, or health-care professionals. Special expertise is required to analyze, interpret and integrate large datasets and convert them into a user-friendly format, readily accessible to the people wanting to use the information. The two intersecting disciplines that address this need are bioinformatics and computational biology (B/CB). As a multidisciplinary science that combines aspects of computer science, chemistry, biochemistry, statistics, mathematics, engineering, physics and medical sciences, B/CB plays a crucial role in the analysis of complex biological data and processes. B/CB integrates biological themes together with the help of computer tools and biological databases, providing new knowledge of the systems under study. New innovative and cutting-edge algorithms and computational and statistical techniques generated by B/CB professionals are vital in efforts to effectively mine, rapidly access, and efficiently analyze vast quantities of data as well as integrating across related datasets to harvest and apply the information contained within them.

The changing dynamic of big data production in all life science sectors has also created new ethical, legal and social challenges that impact on the timely development and implementation of policies and guidelines. In response, several federal and provincial initiatives have recently been launched to provide sound data management policies for Canada and promote open access to information and data, maximizing the availability of research data to researchers and the private sector. In the health sector, for example, any research data that ultimately interface with personal and health information are legally protected by an array of rules rooted in a previous, non-digital information age, but new applications in B/CB and the associated data change the equation for ethical, legal and privacy matters. It will therefore be important to rebalance the current political, legal and regulatory precepts to realize the benefits of data-intensive science while continuing to preserve the rights of the patient.

Direction Setting: Harnessing the potential of big data together

Canada now stands at the edge of an information and communications revolution of transformative social, economic and cultural impact, involving deep conceptual changes in the research environment that have been increasingly enabled, accelerated and influenced by dynamic new technologies.

In 2014, Canada’s research funding agencies, the Social Sciences and Humanities Research Council (SSHRC), the Natural Sciences and Engineering Research Council (NSERC), the Canadian Institutes of Health Research (CIHR) and the Canada Foundation for Innovation (CFI), in collaboration with Genome Canada (GC), sought feedback from the community on a proposed realignment of agency funding policies regarding management of data. The Capitalizing on Big Data: Toward a Policy Framework for Advancing Digital Scholarship in Canada consultation represents the first phase of a collective effort among Canadian research funders designed to encourage the collection and sharing of data through enhanced and coordinated data policies.

B/CB is recognized as a shared priority among research funders as it will be the key to extracting meaning from increasingly complex large data sets. Under the leadership of GC and CIHR, research funders, infrastructure providers, researchers and other stakeholders have been working closely together to: (1) obtain a clear picture of the B/CB landscape in Canada and abroad; and (2) reach consensus on Canada's strengths, achievements and needs in B/CB. The proposed B/CB framework represents an effort to co-develop a national approach to harness the potential of big data.

CANADIAN B/CB STRATEGIC FRAMEWORK

The vision underscoring the Canadian B/CB Strategic Framework is to build fully integrated B/CB capacity across the life sciences to ensure that Canadians derive maximum economic, health and social benefits from big data. It will also further position Canada as a leader in international initiatives to derive full value from the billions of dollars invested in research globally. To achieve this vision, concerted and coordinated efforts among all stakeholders are required in the following areas:

Strengthening the B/CB Research Enterprise: Resourcing to meet current and future demands

Due to extensive changes in research, technologies and methods, the bottleneck in scientific productivity has shifted from data production to data management, communication and interpretation. Without strong B/CB capabilities, crucial insights and discoveries would stay buried in the data, providing minimal return on the substantial investments in the life sciences and generating little benefit. The new tools and algorithms generated by the B/CB research community are, and will continue to be, the key to data interpretation and integration, bridging the gap between knowledge generation and application.

[Figure: Resourcing to meet current and future demands.]

There is significant commonality in the B/CB needs of different sectors, with B/CB tools and algorithms developed in health, for example, being valuable in agriculture or the environment. This illustrates the integrating role of B/CB across the range of life sciences. While this situation helps foster research collaboration across sectors, securing research funding through traditional/investigator-initiated mechanisms has been challenging as this field falls at the intersection of the mandates of federally funded agencies but not squarely within the core purview of any.

Canadian research funding agencies have launched several strategic research initiatives that either specifically target B/CB (e.g., Genome Canada/CIHR Bioinformatics and Computational Biology competition) or promote the integration of B/CB in large-scale genomics projects (e.g., Genome Canada/CIHR Large-Scale Applied Research Project Competition in Genomics and Personalized Health). Of special note is the Discovery Frontiers Program: Advancing Big Data Science in Genomics Research (NSERC, Genome Canada, CIHR and CFI), which is supporting the flagship “Cancer Genome Collaboratory”. However, core B/CB research programs cannot be sustained through strategic funding alone, which is currently the situation for many of them.

Although B/CB holds the key to data analysis, it is the availability and effective management of the appropriate hardware that makes this feasible. To address this need, a computing needs assessment and a corresponding hardware plan are required to ensure that the necessary infrastructure resources are available to support B/CB research endeavors. Funding opportunities that require the integration of B/CB professionals and hardware providers in the project design, as well as confirmation of adequate computing support, will be vital for a strong and sustainable B/CB research community.

Immediate Actions

  • Coordinated action plan among life science research and infrastructure funders to promote alignment of B/CB funding opportunities, including:
    • Research initiatives designed to harness the potential of big data, recognizing the importance of funding for both short-term demonstration projects and longer-term, large-scale projects
    • Open programs to provide sustained and predictable support for investigator-initiated B/CB research
    • Tools to promote the integration of B/CB and computing infrastructure into the planning of large-scale life sciences projects

For the B/CB research community to thrive and respond to the demand for innovative tools and algorithms, a coordinated action plan among Canadian life sciences research and infrastructure funders is required to ensure the appropriate alignment of funding opportunities. Through this increased coordination, Canada will be better positioned to maximize and leverage the impact of previous federal investments in research.

Transdisciplinary Capacity Building: The key to sustaining and growing the B/CB enterprise

Core informatics and computational modeling skills, coupled with an appreciation of the life sciences, are essential for the effective use and translation of high throughput data by the scientific community. As a result, B/CB expertise has become a skill set in high demand within both the private and public sectors, and B/CB professionals are highly sought after across the life sciences.

[Figure: Transdisciplinary approaches to training are required to provide the necessary skills to tackle the challenges of complex data sets.]

Transdisciplinary approaches to training are required to provide the necessary skills to tackle the challenge of increasingly complex large data sets and the rapid pace of technological progress. B/CB crosses many academic departments, including molecular biology, biochemistry and genetics, and requires integration with statistics, mathematics, computer sciences and engineering. The innovative bioinformatics graduate program developed at the University of British Columbia and Simon Fraser University is an excellent example of a novel training environment that links bioinformatics with basic biology to further the current research excellence in other life science sectors. Training in both biological and computational methodologies is made possible by integrating academic centres in computer science, statistics, molecular biology, and biotechnology, with translational research groups at hospitals and at the clinical interface.

This approach also needs to include a strategy to attract and retain B/CB leaders to oversee and be involved in these programs. In addition, to keep pace with advancing technologies and the latest computational biology approaches for dealing with new data, short-term focused courses (such as the Canadian Bioinformatics Workshop Series) are required to ensure the continuous training and skills development of experts and users. If we are to achieve the broader vision of the national B/CB strategy and address the big data challenges of tomorrow, all stakeholders with a vested interest in training need to commit to this action.

Immediate Actions

  • Coordinated action plan championed by academic institutions and research funders, in partnership with other stakeholders requiring professional B/CB expertise that will:
    • Determine approaches and mechanisms required to foster the creation of sustainable transdisciplinary B/CB graduate training programs
    • Develop continuous training approaches and processes to address the big data challenges of tomorrow

Academic institutions and funding agencies, working with infrastructure providers and other stakeholders (such as industry), need to develop a coordinated approach to inspire the evolution of transdisciplinary B/CB graduate training programs across Canada that are also linked nationally and internationally.

Formation of a National B/CB Network: National coordination for international impact

Coordination and collaboration have become the norm in data production efforts where Canadian B/CB researchers are playing a prominent role, such as the International Cancer Genome Consortium, the International Rare Diseases Research Network, the Global Microbial Identifier, the International Human Epigenome Consortium, the International Wheat Sequencing Consortium and the International Cooperation to Sequence the Atlantic Salmon Genome, to name a few. In response to the enormous amounts of data produced by these efforts, new international alliances, such as the Global Alliance for Genomics and Health, have recently been formed to facilitate data sharing and promote the development of standard strategies and procedures for data management.

[Figure: National coordination is required for international impact.]

Similarly, coordinated, collaborative approaches are emerging among the infrastructure funders and providers who are a vital resource for the B/CB community, such as the Canada Foundation for Innovation, Compute Canada and Canada’s Advanced Research and Innovation Network (CANARIE). Stronger linkages between these organizations and the data scientists and tool developers will better align computing infrastructure and its management with the needs of B/CB professionals, reducing duplication and optimizing data storage and analysis capacity. Furthermore, CANARIE, which manages an ultrahigh-speed network and associated funding programs, is developing and implementing next-generation technologies in collaboration with the private sector. All these interactions are essential to meeting the needs of the research community.

Moreover, the changing dynamic of big data production has also created new ethical, legal and social challenges that impact the timely development and implementation of policies and guidelines. In response, several federal and provincial initiatives have recently been launched to provide sound data management and stewardship policies for Canada and promote open access to information and data, maximizing the availability of research data to researchers and the private sector. A mechanism that involves the B/CB research and user community needs to be put in place to continuously inform policy and guideline development, which in turn will better position these communities to fulfill their advocacy role for the field.

Immediate Actions

  • Establish a Canadian B/CB conference to connect the research community and stakeholders and build an integrated community that is linked internationally
  • Bring all stakeholders together to develop an action plan for the formation of a national network, including how it needs to be structured and funded to ensure its sustainability

Demand from the life sciences community for innovative and increasingly complex algorithms, tools and databases is surpassing existing researcher capacity. Along with a robust transdisciplinary research capacity action plan, we need to coordinate and integrate what is currently being done in Canada to meet the demands of the user community. The establishment of a national conference would be a first step in coordinating efforts within the B/CB community, together with research funders, infrastructure providers, users and other stakeholders. In particular, the first conference would provide a forum for discussions as to how best to bring all stakeholders together as part of a sustainable national network.

The Way Forward: Connect, Coordinate and Train

The rate of data and new knowledge acquisition continues to accelerate, in turn escalating data storage and analysis challenges in the face of opportunities for the bio-economy and health sector. Scientists in Canada and around the world are struggling to understand and manage this vast amount of new data to ensure that the full value of past investments is realized and that critical information is not lost or forgotten.

This strategic framework is a national effort to rally all B/CB stakeholders as we strive to build fully integrated B/CB capacity across the life sciences. By connecting, coordinating and training, we will ensure that Canadians derive maximum economic, health and social benefits from big data.
