Informatics has changed the way lab data is collected, stored, shared and analyzed. However, it is widely agreed that the full potential of laboratory data, including metadata, aggregated data and even the experimental results themselves, has yet to be realized, particularly the application of these types of data to scientific challenges such as productivity, reproducibility and large-scale integration. By reimagining the traditional paradigms for lab data, companies are using novel approaches to increase the value of laboratory data, the experimental process as a whole and scientific discovery in general.
BioBright utilizes hardware and software to easily record lab data and integrate it for analysis. The company’s focus is biological applications, especially the pharmaceutical industry’s needs from discovery through clinical testing and manufacturing. Partners include the Sanger Institute, the US Department of Defense’s DARPA (Defense Advanced Research Projects Agency) and major pharmaceutical companies.
Tools for Insight
Darwin consists of three main tools. “One is a Dropbox-like tool that automatically collects data from equipment, computers and other infrastructure that is deployed in typical pharmaceutical and research workflows,” explained Charles Fracchia, founder and CEO of BioBright. “The second is a voice assistant, very similar to Siri or Alexa but the difference is we have the knowledge of all the custom vocabulary that goes on in the laboratory.” The third, Darwin Terminal, is a touchscreen dashboard for data visualization and analysis.
As Mr. Fracchia told IBO, “One of the big tensions is that the current paradigm for doing work in laboratory science requires a lot of manual labor and a lot of physical dexterity. The issue is doing the experiment and documenting the experiment are diametrically opposed in terms of activities.” The result is lost insight and experimental knowledge. “What happens today is that scientists are doing an experiment and they try to remember all the minutiae and variation that may have happened, and only write them down in their lab notebook later on,” he said.
But Darwin Speech, the voice assistant, enables the data to be collected in real time with context. “For example, one can make a note by just saying, ‘Darwin, sample 3 looks cloudy,’ or ‘Darwin, I think the yield on this step is going to be low,’” explained Mr. Fracchia. “We know who said it, we know at what time, in what context, which instruments were used, which samples were being used, and all that information is basically available to the user after the fact.”
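As an illustration of the kind of record such a system might produce, here is a minimal sketch in Python. The `VoiceNote` structure and `record_note` helper are hypothetical, not BioBright’s actual schema; they only show how a transcribed note can be stamped with speaker, time, instrument and sample context at the moment it is spoken.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VoiceNote:
    """A transcribed lab voice note joined with its experimental context."""
    speaker: str
    text: str
    timestamp: datetime
    instruments: list = field(default_factory=list)  # instruments in use when spoken
    samples: list = field(default_factory=list)      # samples being handled

def record_note(speaker, text, instruments, samples):
    """Attach context automatically at the moment the note is spoken."""
    return VoiceNote(speaker=speaker,
                     text=text,
                     timestamp=datetime.now(timezone.utc),
                     instruments=instruments,
                     samples=samples)

# "Darwin, sample 3 looks cloudy" becomes a contextualized record:
note = record_note("c.fracchia", "sample 3 looks cloudy",
                   instruments=["plate_reader_1"], samples=["sample_3"])
```

Because the context is captured at recording time, the note can later be queried by sample, instrument or speaker rather than reconstructed from memory.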
Maximizing Available Information
Previously, much of this contextual data was collected rarely, if at all, and seldom systematically. “There’s a lot of unrecorded information. There’s a lot of institutional information. There’s a lot of interaction information that is really crucial to how a company operates,” noted Mr. Fracchia. Such information comes from both inside and outside a particular lab, such as from scientific presentations or even another lab in the same company that has completed the same experiment. “Our system, because it collects all the context from the data, from the presentation, from the equipment information, from the voice notes that people give, and also integrates with electronic lab notebooks, can get all the historical information.” But, as he emphasized, this solution does not require a change in workflow. “Our tools are designed to be completely transparent to the workflow, the way it is done today. We’re not asking anybody to really change their workflow.”
The collection, aggregation and integration of such data can provide new insights, such as the source of experimental error or the discovery of best practices. “That’s the reason we collect information that may seem only tangentially related, but [in one example] we were able to find distributions of dispensing operations, and find in this case that there was an unusual distribution due to a human factor,” noted Mr. Fracchia. “What we’ve found is most useful are metrics that help scientists create a baseline of parameters that they know or that they suspect are playing an important role. Our system helps them hone their skills, their intuition and their knowledge about a particular workflow—something that today a machine cannot do,” he noted. He emphasized that Darwin augments what scientists do, rather than replacing scientists with automation.
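The dispensing-distribution example can be sketched with a simple baseline check. The volumes and the `flag_outliers` helper below are hypothetical, not BioBright’s method; they only illustrate flagging values that fall far from a baseline a scientist has established.

```python
from statistics import mean, stdev

# Hypothetical dispensing volumes (uL) logged by a liquid handler.
volumes = [100.1, 99.8, 100.3, 100.0, 92.4, 100.2, 99.9]

def flag_outliers(values, z=2.0):
    """Flag values more than z standard deviations from the baseline mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > z * s]

outliers = flag_outliers(volumes)  # the short 92.4 uL dispense stands out
```

A real deployment would build the baseline from historical runs rather than a single batch, but the principle is the same: a distribution makes the anomaly visible.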
The range of data available allows key analyses and discoveries to be made. “So everywhere from log files of dispensers, to calibration information of a plate reader, to historical data for the same compounds, we bring in, so you can ask those questions and then display it on a ‘mission control’ view.” As he put it, “You cannot improve what you cannot measure.”
Adding Instrument Data
As a data integrator, the company is very different from an ELN provider, emphasized Mr. Fracchia. “We are the glue that collects all of the information and then makes it available through APIs to any other vendor that wants to integrate with it.” This includes instrument data. “In fact, we see our role as being really key in interacting with those existing established players to really augment everybody’s capabilities… We are there to provide this interoperability layer for everybody who wants it.”
Mr. Fracchia also spoke to the future potential of BioBright’s technology. “[O]ur goal is squarely aimed at providing information and insights that will augment the human scientist in that workflow. You design things very differently when you can do that,” he explained. “Now, all of a sudden, you can start asking questions like, ‘Darwin, show me all images with a particular cell line in them,’ or ‘Darwin, show me the distribution of the drugs to the organisms that we’ve tried, and put that together with the results.’” This influences how a scientist works. “Our goal if we are successful is that scientists, when they are doing their work, never leave the scientific plane of thinking.”
As with BioBright, Riffyn seeks to address lab data in new ways. The company is targeting a range of industries, including pharmaceutical, chemical and agricultural companies. Partners include Novozymes. Siemens Venture Capital is among its investors, with the company having raised $4.6 million in funding as of July 2016.
Building a Process
Riffyn’s cloud-based Scientific Design Environment (SDE) consists of several components: Riffyn Map, which enables a scientist to record experimental procedures as process flow diagrams; Riffyn Track, whose Measure mode joins procedural data with results data and links related data across steps in the process; Riffyn Discover, which automates the joining of data into a comprehensive, downloadable data table; Riffyn Share; and Riffyn Bridge. Features include visual process design, retention of past versions, and standard and customizable ontologies.
As Timothy Gardner, founder and CEO of Riffyn, told IBO, the solution addresses the needs of the organization as a whole by transforming the data to enable new analyses and discoveries. “What is really happening is the expansion of science beyond the boundaries of a single person, or a couple of people. Answering questions is no longer about getting your own personal data set,” he explained. “Scientific questions are no longer answered by one person’s experiment. To do science today, you need to connect with data beyond your own personal boundaries.”
Mr. Gardner breaks the scientific workflow into three parts—the human workflow, the data workflow and the process workflow. He described the process workflow as “the architecture of your experiments in the lab, the parametrization of those experiments, and the flow of material from step to step and how it gets transformed,” and calls it “the core of Riffyn.”
The problem now, as Mr. Gardner stated, is “[y]ou have a lot of data in a database and nobody can make any sense of it because you didn’t actually articulate what was the experiment and the process by which you generated all that data.” Riffyn takes a different approach. “We started with that process as the primary workflow object and all the data is wrapped around that process, so as soon as it goes into Riffyn it is already annotated with all of its metadata and all of the relationships between experimental steps and processing,” said Mr. Gardner. “So that you can take that information, and you can automatically reshape the data into an analytical form. With a click of a button, everything you did across multiple processing steps, even multiple procedures and groups, can be transformed into a statistical data form that’s ready for machine learning, ready for computation, ready for visualization, ready for analysis.”
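The reshaping Mr. Gardner describes can be illustrated with a toy example. The step names, measurements and `reshape` function below are hypothetical, not Riffyn’s implementation; they only show how records annotated with a sample ID across process steps can be joined into one flat table ready for analysis.

```python
# Each process step emits records annotated with the sample they acted on.
step_records = {
    "culture": [{"sample": "s1", "od600": 0.82}, {"sample": "s2", "od600": 0.40}],
    "assay":   [{"sample": "s1", "yield_mg": 12.1}, {"sample": "s2", "yield_mg": 5.3}],
}

def reshape(step_records):
    """Join per-step records on sample ID into one flat analytical table."""
    rows = {}
    for step, records in step_records.items():
        for rec in records:
            row = rows.setdefault(rec["sample"], {"sample": rec["sample"]})
            for key, value in rec.items():
                if key != "sample":
                    row[f"{step}.{key}"] = value  # column named step.measurement
    return list(rows.values())

table = reshape(step_records)  # one row per sample, one column per measurement
```

Because every record already carries its sample and step annotations, the join requires no manual bookkeeping; the same idea scales to joining data across procedures and groups.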
Such data analyses can ultimately yield new discoveries. “[There is] discovery when you can pull relationships from data for which no specific experiment was conducted. That happens when you can aggregate dozens of high-quality data sets—the relationships are emergent, not detectable in the one data set, but clearly there in the many,” stated Mr. Gardner. However, current data practices do not meet these needs. “But you can’t normally do that kind of aggregation today because data is typically siloed, inadequately annotated, and of unknown or low quality.”
Data for the Whole Organization
As an example, Mr. Gardner described how the SDE integrated with an organization’s use of SharePoint and how it changed the work process. “But Riffyn takes the SharePoint request to a new level of usefulness. Their requests flow into Riffyn and it automatically sets up an experiment. They can then execute the experiment in Riffyn. This analysis can be done by a company’s existing platforms,” explained Mr. Gardner. “Then the data is reshaped and transported downstream to another group where it gets analyzed with another third-party tool. Riffyn becomes this glue for defining your experimental workflows and processes, annotating your data and structuring it for analysis,” he added. Data analyses that can be performed include visual/interactive plotting, variance analysis and multivariate regression.
Among the tools that build on Riffyn’s output is SAS’ JMP platform for statistical data analysis and visualization. “Our recommendation and favorite is JMP from SAS, which a lot of people already have and are not making as good use of as they could because they’re not shaping the data into a form that’s ready for JMP,” said Mr. Gardner. “Riffyn does that shaping for them, and then it drops it right into JMP. Riffyn is also compatible with other analytical tools besides JMP, including Tableau, Spotfire, R and Python. We also have an SDK [Software Development Kit] supporting more than a dozen programming languages.”
Riffyn’s solutions do not integrate directly with analytical instrument data, said Mr. Gardner, due to the practically infinite number of drivers that would have to be created. However, the company provides an API for those that want to write their own drivers. In addition, integration with other lab systems is possible. “In most cases, we can make the link. For example, we have a component—called a data agent—which can connect to relational databases and pull data out,” he noted. “If your data is being collected by a robot which stores the data into an onboard MySQL database, we can pull the data out of there.”
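A data agent of this sort can be sketched in a few lines. The example below is hypothetical and uses Python’s built-in sqlite3 module as a stand-in for a robot’s onboard MySQL database; a real agent would use a MySQL client library and the instrument’s actual schema.

```python
import sqlite3

# Stand-in for the robot's onboard database (sqlite3 substitutes for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sample TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("s1", 1.25), ("s2", 0.98)])

def pull_readings(conn):
    """A minimal 'data agent': query the instrument DB and return plain dicts."""
    cur = conn.execute("SELECT sample, value FROM readings")
    return [{"sample": s, "value": v} for s, v in cur.fetchall()]

data = pull_readings(conn)  # records ready to be annotated and joined upstream
```

Once pulled into this neutral form, the records can be attached to the process step that produced them, like any other data entering the system.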
Founded in 2016, Synthace has created the Antha open-source operating system for biology labs. Antha is designed to address experimental protocols and methods, rethinking experimental processes, and planning for scalability and traceability. Current customers include Merck & Co. and Dow AgroSciences. The company recently raised $9.6 million in Series A funding (see IBO 10/15/17).
Automating the Lab
As Synthace CEO Tim Fell told IBO, most lab work is currently executed manually, including planning. “As a result, the scale of experiment which can be easily planned and executed is limited to the capacity, skill and diligence of the investigator, rather than the true complexity of the systems being investigated.” This has other limitations as well. “It also means that work is conducted, and data is recorded after the fact, often in either a paper or electronic lab notebook, where again every piece of information collected has a cost in time to enter, and introduces errors from mistakes in entry.”
In addition, metadata to provide context, such as environmental data or consumables usage, is not collected, decreasing traceability. “And that lack of traceability makes it even harder to find the sources of error and noise in experimental methods, distinct from the noise in the underlying complex organisms and systems being investigated,” added Mr. Fell.
Antha flips this way of working “on its head,” according to Mr. Fell. “Instead of executing a process, and then recording that into some medium, such as paper or an electronic lab notebook, you define what will be done up front, augmented by a suite of design and simulation tools to help with that design process.” This enables scalability. As he explained, “Because all the tacit knowledge required to perform a working practice is captured in the Antha language, and can be translated into execution using hardware from multiple manufacturers, it becomes possible to easily scale the scope of an experiment to properly address the underlying complex system, rather than being limited to the experiment that you can personally execute.”
Likewise, Antha can enhance reproducibility. “It also enables people to shift the defaults in how wet lab operations work: all work is repeatable, and working practices can be composed with [ease], enabling higher levels of abstraction and productivity to be achieved as a result,” stated Mr. Fell. “It also sets out the necessary preconditions for reproducibility, as removing experimenter error from the pipeline, and providing better visibility into the sources of noise helps improve the background quality of lab work.”
Mr. Fell describes Antha as consisting of three pieces. “The first is the underlying Elements which describe the reusable working practices, such as construct assembly, or transformation, or assays, which make up the workflow for the experiment.” A visual web-based experimental workflow is created. “Generally, these workflows describe the flow of actions (both purely computational and physical) which make up the experiment for a single sample or observation,” he explained.
Describing the second and third pieces, Mr. Fell said, “In turn, a set of parameters, either physical inputs or data values, is passed into the workflow, enabling the systematic investigation of dozens of experiments at once by varying the input parameters for each run of the experiment. Finally, this combination of parameters and workflow is scheduled against a concrete collection of connected hardware, enabling the system to produce all the low-level details needed to conduct the experiment, from what consumables will be needed, to the optimal layouts of plates and liquid handler decks to enable taking advantage of all the various capabilities of different platforms, such as multi-channel liquid transfers.”
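The parameter-expansion step can be illustrated independently of Antha. The parameter names below are hypothetical; the sketch only shows how one run per combination of input values can be enumerated, which is the first half of what Mr. Fell describes (the scheduling against concrete hardware is not modeled here).

```python
from itertools import product

# Hypothetical workflow parameters to sweep across runs.
parameter_space = {
    "temperature_c": [25, 30, 37],
    "inducer_mM": [0.1, 0.5, 1.0],
}

def expand_runs(parameter_space):
    """Enumerate one run per combination of input parameter values."""
    names = list(parameter_space)
    return [dict(zip(names, values))
            for values in product(*(parameter_space[n] for n in names))]

runs = expand_runs(parameter_space)  # 3 temperatures x 3 inducer levels = 9 runs
```

Each resulting dictionary parameterizes one execution of the same workflow, so widening the investigation means adding values to the space rather than hand-planning each run.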
Built using the Go programming language, Antha extends the language to address biological testing. “Antha also simplifies many portions of the syntax and grammar, introduces a more powerful type system which understands scientific units and common lab objects, as well as providing types for physical objects used today,” noted Mr. Fell. “[It also] introduces a process description model that guarantees all the tacit knowledge of a working practice is properly captured within Antha elements.”
Lab Equipment Integration
Antha can directly interact with lab equipment and instruments. “[T]he Antha representation of the process is in turn the executable representation of the experiment, with drivers directly talking to the various pieces of lab equipment, including generating methods for a variety of pieces of lab hardware. That in turn means that data flows automatically from the devices in a structured context, with no cost in time to the investigator for that provenance.” He added, “Sensors such as environmental monitors can also be easily overlaid on these experimental execution logs, enabling a full picture of the context of experimental results to be presented.”
Specifically, Antha can record output files from analytical instruments, interface with vendor control software of other instruments such as liquid handlers (for example, Gilson and CyBio systems), and provide direct instrument control of basic lab equipment (currently, incubators and shakers). This also makes automation more accessible. “Lastly, because Antha workflows can be easily automated on liquid handling platforms, they make it easy to engage with automation in a flexible way, rather than the current situation where automation programming is normally restricted to a specialist automation engineer rather than being accessible to every bench scientist,” said Mr. Fell.