Release v2008Q2 (October 2008)
Produced by the
ACToR (Aggregated Computational Toxicology Resource) is a collection of databases collated or developed by the US EPA National Center for Computational Toxicology (NCCT). More than 200 sources of publicly available data on environmental chemicals have been brought together and made searchable by chemical name and other identifiers, and by chemical structure. Data includes chemical structure, physico-chemical values, in vitro assay data, exposure data, and in vivo toxicology data. Chemicals include, but are not limited to, high and medium production volume industrial chemicals, pesticides (active and inert ingredients), and potential ground and drinking water contaminants.
At present, chemical toxicity data resides in a variety of specialized databases, in many different and incompatible formats and in many different locations. Up to now, in order to compile all information on a given chemical, one needed to search multiple databases and then manually compile the resulting data. While this is possible to do for specific chemicals, it is very difficult to compile comprehensive data sets on chemically-similar sets of compounds using structure searching tools. By bringing together data from a large number of sources and making the data structure-searchable, ACToR will facilitate searches that transcend available data and chemical number. As such, it will be an important tool for the advancement of computational toxicology, which requires evaluation of information across broad scales of chemical class, use, structure and biological activity.
The ACToR project is compiling data (both quantitative and qualitative) from a large number of sources (called data collections), including EPA databases, PubChem, other NIH, USDA and FDA databases, state and other national sources, and from academic groups. One novel data collection is ToxRefDB (Toxicology Reference Database), which includes detailed information on in vivo guideline study results for pesticides and other potentially toxic chemicals that has been assembled by the National Center for Computational Toxicology. ACToR is also the primary repository of data being produced by the EPA ToxCast chemical prioritization program.
The majority of chemicals in ACToR have chemical structures, which will facilitate studies of structure-function relationships in sets of environmental chemicals. The DSSTox Program in the NCCT is responsible for structure annotation in ACToR.
Adding new data into ACToR is straightforward. We are always interested in obtaining other data collections that could be incorporated into the system.
ACToR is organized into a series of domains, linked together by chemical.
Structure, names and other basic chemical information
Main ACToR Database
Quantitative and other tabular data on chemicals
Main ACToR Database
In vivo study data from multiple domains
ACToR / ToxRefDB Database
Genomics Microarray Data
Full microarray data sets, in both original and transformed versions
Biological Reference Data
Information on genes, proteins and pathways, downloaded from public sources
Main ACToR Database
Detailed data from the ToxCast and ToxRefDB programs - used for ToxCast analyses.
Separate ToxMiner database - linked to ACToR by chemical ID
Chemicals are organized into three main classes (substances, compounds and generic chemicals), the first two of which are modeled closely after the corresponding PubChem data model.
Assays are composed of a set of assay components. These can be quantitative measurements, annotations, or URLs to other sources.
All data is initially compiled as part of a set of Data Collections. A data collection is at minimum a set of substances with corresponding CAS registry numbers and names. Additional information may include chemical structures and assays. As mentioned above, a generic chemical links together data from many data collections on all substances that share a common CAS registry number.
All data within ACToR is organized by Data Collections. A data collection contains substances and, optionally, chemical structures and assays. The entire list of data collections in ACToR can be seen by selecting Data Collections in the left-hand navigation bar. For each collection, the following data are presented: the name, a description, the institutional source, the type, and the numbers of substances, generic chemicals and assay results. An assay result is one data value from one assay, for one substance. The number of generic chemicals may be less than the number of substances if some substances do not have CAS numbers or if there are multiple substances with the same CAS number. To view the list of chemicals in a data collection, select the details link at the left. This will take you to the Data Collection View. To navigate to the external site from which the data was taken, select the Link Out hyperlink at the far right of the data collections table.
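The difference between the substance count and the generic chemical count can be illustrated with a small sketch. The records, field names and the `group_by_casrn` helper below are hypothetical and are not taken from ACToR; the sketch only assumes that substances are aggregated by CAS number as described above.

```python
from collections import defaultdict

# Hypothetical substance records from one data collection.
# A CASRN of None stands for a substance with no CAS number.
substances = [
    {"sid": "S1", "name": "Formaldehyde", "casrn": "50-00-0"},
    {"sid": "S2", "name": "Methanal", "casrn": "50-00-0"},
    {"sid": "S3", "name": "Water", "casrn": "7732-18-5"},
    {"sid": "S4", "name": "Unidentified mixture", "casrn": None},
]

def group_by_casrn(records):
    """Group substance IDs that share a CAS number; skip those without one."""
    groups = defaultdict(list)
    for record in records:
        if record["casrn"] is not None:
            groups[record["casrn"]].append(record["sid"])
    return dict(groups)

generic_chemicals = group_by_casrn(substances)
print(len(substances))         # number of substances
print(len(generic_chemicals))  # number of generic chemicals
```

Here four substances collapse to two generic chemicals: two substances share one CAS number, and one substance has none.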
The data collection page shows all the information within a single data collection. Here, the information is divided into three parts: overview (the information within the box), chemical table, and assay table.
The top chart provides a brief overview. This includes:
Name - name of the data collection
Link out - provides a direct path to the data source
Description - description of the data collection
ID - the internal id number of the data collection
Institutional Source - the name of the institution that provided the data
Source Type - shows if there is assay data or just a chemical list
Number of Substances - the number of substances in the data collection
Number of Generic Chemicals - the number of generic chemicals in the database. This number may be less than the number of substances if some substances do not have CAS numbers or if there are multiple substances with the same CAS number.
To see the data, click on “Show Chemical Table”.
Structure - contains a diagram of the chemical
CASRN - The CAS registry number
Name - the name of the chemical
Generic - a link that will take you to the generic chemical view
Phenotype Summaries – The following fields indicate whether or not there is any information in the ACToR database for the current chemical for a series of broad toxicity phenotypes. These are general chemical hazard (typically indicating that acute toxicity studies are available), carcinogenicity, genotoxicity, developmental toxicity, reproductive toxicity, chronic toxicity and food safety. This last covers a variety of food safety studies, for instance whether the chemical has been allowed or banned from contact with food, whether it is considered safe, and if so at what level. A red box under one of the phenotypes simply indicates that data is available, and not that the chemical is recognized to cause that particular type of toxicity.
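As a sketch of how such availability flags could be derived, the following assumes a hypothetical list of (CAS number, phenotype) pairs, one per assay result on file. The phenotype labels, the records and the `phenotype_summary` function are illustrative only and do not come from ACToR. The point is that a flag is set by the presence of data alone, whatever the data say.

```python
# Short labels for the phenotype columns described above (illustrative names).
PHENOTYPES = [
    "hazard", "carcinogenicity", "genotoxicity", "developmental",
    "reproductive", "chronic", "food safety",
]

# Hypothetical (CASRN, phenotype) pairs, one per assay result on file.
records = [
    ("50-00-0", "carcinogenicity"),
    ("50-00-0", "genotoxicity"),
    ("50-00-0", "carcinogenicity"),
]

def phenotype_summary(casrn, records):
    """Return {phenotype: bool}; True means data exist, not that toxicity was found."""
    available = {phenotype for c, phenotype in records if c == casrn}
    return {phenotype: phenotype in available for phenotype in PHENOTYPES}

summary = phenotype_summary("50-00-0", records)
```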
After “show assay” has been clicked on, a chart will appear. This lists the name of the assay associated with the current data collection along with the number of chemical substances and assay components associated with the assay.
Clicking on the link in the first column will take you to the Assay View page.
This page presents all the information on a specific generic chemical. Data has been aggregated from all substances with a specific CASRN from all data collections.
CASRN - the CAS (Chemical Abstracts Service) Registry Number
Formula - the chemical formula
MW - molecular weight
SMILES - Simplified Molecular Input Line Entry System, a line notation used for representing molecules
InChI - the IUPAC International Chemical Identifier (pronounced "INchee"), an alphanumeric identifier that encodes information about a molecule in a standard way
Show Substances - provides a list of all the data collections from which information on this chemical was derived. Recall that chemicals are aggregated by CASRN.
Show Synonyms - provides alternate names for this chemical
Data By Toxicology Phenotype - selecting one of these links allows one to view the detailed data for this chemical for each of the major phenotypes.
Data by Toxicology Data Category - selecting one of these links allows one to see the data by the assay data category rather than by phenotype. In particular, this separately displays tabular, quantitative data vs. summary calls of toxicity vs. URL links to external data sources.
Non-Toxicology Data – Allows the user to see a variety of specific non-toxicology data on the chemical.
Data is organized into assays and assays into assay components. One can think of an assay as a spreadsheet where there is one row per chemical. The columns are the “assay components”. In practice, a given chemical can have more than one row in the data table, and each of these is termed a “result group”.
In the tree that is displayed when one of the top level links is selected, the first level of information is the assay, showing the name and providing a link to the assay definition page. Down one level in the tree are one or more result groups (rows in the assay table) showing the name of the assay component and the value.
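The assay, assay component and result group relationship described above can be sketched as a small data structure. All class names, field names and values here are invented placeholders (not ACToR's schema and not real toxicity data); the sketch only assumes that a result group is one row holding values for some of the assay's components.

```python
from dataclasses import dataclass, field

@dataclass
class AssayComponent:
    name: str        # column name
    units: str = ""  # optional units

@dataclass
class ResultGroup:
    chemical_id: str
    values: dict     # component name -> value; absent keys are empty cells

@dataclass
class Assay:
    name: str
    components: list
    result_groups: list = field(default_factory=list)

    def rows_for(self, chemical_id):
        """All result groups (rows) for one chemical."""
        return [g for g in self.result_groups if g.chemical_id == chemical_id]

def render_tree(assay, chemical_id):
    """Assay name first, then one indented block per result group."""
    lines = [assay.name]
    for number, group in enumerate(assay.rows_for(chemical_id), start=1):
        lines.append(f"  result group {number}")
        for component in assay.components:
            if component.name in group.values:
                value = group.values[component.name]
                unit = f" {component.units}" if component.units else ""
                lines.append(f"    {component.name}: {value}{unit}")
    return lines

assay = Assay(
    name="Example assay",
    components=[AssayComponent("dose", "mg/kg"), AssayComponent("species")],
    result_groups=[
        ResultGroup("chem-1", {"dose": 10, "species": "rat"}),
        ResultGroup("chem-1", {"species": "mouse"}),
        ResultGroup("chem-2", {"dose": 5}),
    ],
)
tree = render_tree(assay, "chem-1")
print("\n".join(tree))
```

The second result group for `chem-1` has no dose value, which mirrors an empty cell in the assay table.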
ACToR frequently uses expanding lists to organize data. An expanding list has a green triangle next to it. To expand the list, click on “Show ________”. To collapse the list, click on “Hide ____”. After clicking on “Show ____”, “Collapse All” and “Expand All” buttons may appear (depending on how long the list is). Clicking on the “Collapse All” button will show only assay names, which are both green and underlined. Assay names are direct links to the Assay View page. Clicking on the “Expand All” button will show the results from all of the studies. Many of these charts are so large that they contain multiple “pages”. See Moving through charts for more information.
The Substance View provides detailed information on the chemical substance from a specific data collection. In particular, details of the corresponding database IDs and substance-specific parameters are provided.
Some of the types of information provided are:
CASRN – CAS number
Data Collection – the name of the source where the data collection came from
Mixture – indicates whether the substance is pure or a mixture (not currently implemented)
Synonyms – alternative chemical names
Parameters – a variety of name-value pairs for the substance as provided by the data collection.
To search for a chemical, type in the full or partial name in the text box. Select either “exact match” or “any match”. “Exact match” will find the chemical whose name matches what you typed in. “Any match” will find matches that are similar to what you typed in. The search is performed against all of the synonyms that have been compiled for each generic chemical. Click on search, and a standard chemical list chart appears with the results. Note that the search by name program does not accept SMILES or InChI notation. To use SMILES or InChI, see the Search by Structure page.
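The difference between the two match modes can be sketched as follows. This is not ACToR's search code: the synonym index and the `search_by_name` function are hypothetical, and two assumptions are made that the text does not confirm, namely that matching is case-insensitive and that “any match” means a substring match.

```python
# Hypothetical synonym index: generic chemical (by CASRN) -> known names.
synonyms = {
    "50-00-0": ["Formaldehyde", "Methanal"],
    "67-56-1": ["Methanol", "Methyl alcohol"],
}

def search_by_name(query, index, exact=True):
    """Return CASRNs with at least one synonym matching the query.

    Assumptions: case-insensitive comparison; "any match" is a substring test.
    """
    q = query.strip().lower()
    hits = []
    for casrn, names in index.items():
        lowered = [name.lower() for name in names]
        if exact:
            matched = q in lowered
        else:
            matched = any(q in name for name in lowered)
        if matched:
            hits.append(casrn)
    return hits

print(search_by_name("Methanal", synonyms, exact=True))
print(search_by_name("methan", synonyms, exact=False))
```

Because the search runs over all synonyms, a partial name can return several generic chemicals, while an exact name returns only the chemical that carries that synonym.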
Using CAS numbers is another way of locating and identifying chemicals. Type in one or more CAS numbers in the text box, separated by either commas or new lines. After search has been clicked, a standard chemical chart will appear.
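A minimal sketch of how such input could be split and checked before searching is shown below. The function names are hypothetical, and whether ACToR itself validates its input is not stated here. The check uses the standard CAS check digit rule: the last digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

def parse_cas_input(text):
    """Split text on commas or line breaks and drop empty entries."""
    return [part.strip() for part in re.split(r"[,\n]", text) if part.strip()]

def is_valid_cas(casrn):
    """Check the CAS format and the check digit."""
    match = CAS_PATTERN.fullmatch(casrn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))

entries = parse_cas_input("50-00-0, 7732-18-5\n67-56-1,\n50-00-1")
valid = [entry for entry in entries if is_valid_cas(entry)]
```

In this example the last entry is rejected because its check digit does not match.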
There are two major ways to construct a molecule:
1. The typical way to construct a molecule is to select a template (see arrow 1), bonds (arrow 2), and atoms (arrows 3 and 4). First, select a template, then click on the canvas (see 1). Then, click on button 2 and select the bond type. To attach a bond, place the cursor over the molecule until a purple circle appears. If there is a need to connect two molecules together, click and hold the left mouse button and drag the other end of the bond to the other molecule until another purple circle appears before letting go of the left mouse button. To add atoms, either click on one of the “quick add” buttons (3) or select button 4. Button 4 causes a small window to appear with the periodic table on it. Select an element and then click close. Clicking on the “Query” tab gives some more options that do not appear on the periodic table.
For a more in-depth tutorial for this program, click on button 6 on the upper right hand corner. This takes you to the ChemAxon Help Site.
Sometimes a data collection contains an assay. An assay is usually information within a data collection and comes from a single source. It is usually arranged in a table, where the rows represent substances and the columns are assay components. Each cell within this chart is called an assay result. This page contains two sets of expandable/collapsing lists of assays: phenotypes and categories.
An assay may contain multiple phenotypes. In other words, an assay may contain information about both genotoxicity and neurotoxicity. To the right of each phenotype name is the number of assays that contain that phenotype. This chart contains the headings: details, name, category, data collection and substance. See Moving through the chart to learn more about this chart.
An assay may have only one category, which describes the assay in a broad sense. The number next to the category name is the number of assays that fall under that category. The chart contains the headings: assay id, source name aid, category, data collection id, substance count, and component count. See Moving through the chart to learn more about this chart.
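The two kinds of counts can be sketched with hypothetical assay records (the names and field layout are illustrative, not ACToR's). Since an assay has exactly one category but any number of phenotypes, the category counts add up to the number of assays, while the phenotype counts need not.

```python
from collections import Counter

# Hypothetical assays: one category each, any number of phenotypes.
assays = [
    {"name": "A1", "category": "In vivo toxicology (summary calls)",
     "phenotypes": ["Acute Toxicity", "Chronic Toxicity"]},
    {"name": "A2", "category": "In vivo toxicology (summary calls)",
     "phenotypes": ["Chronic Toxicity"]},
    {"name": "A3", "category": "Chemical Category", "phenotypes": []},
]

# Number shown next to each category name.
category_counts = Counter(a["category"] for a in assays)

# Number shown next to each phenotype name.
phenotype_counts = Counter(p for a in assays for p in a["phenotypes"])
```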
If the chart contains more than 10 rows, then a list box and a “Next 10” link will appear in the upper right-hand corner of the chart. Clicking on the “Next 10” link will show the next set of rows in the chart (if the number of rows in the next set is less than 10, then that number will replace the 10). To see all the data at once, select the “Show all” option in the list box. To “jump” to another set of rows, select one of the list box options under “Show all”. The list box shows which set is displayed and the total number of rows.
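The paging behavior can be sketched in a few lines. The function names are hypothetical and the page size of 10 is taken from the description above; this illustrates the arithmetic only, not ACToR's implementation.

```python
PAGE_SIZE = 10

def page_ranges(total_rows, page_size=PAGE_SIZE):
    """Return (first, last) 1-based row ranges, one per page (list box entries)."""
    return [(start + 1, min(start + page_size, total_rows))
            for start in range(0, total_rows, page_size)]

def next_link_label(total_rows, current_page, page_size=PAGE_SIZE):
    """Label of the 'next' link on a 0-based page, or None on the last page."""
    remaining = total_rows - (current_page + 1) * page_size
    if remaining <= 0:
        return None
    return f"Next {min(page_size, remaining)}"

ranges = page_ranges(25)
labels = [next_link_label(25, page) for page in range(len(ranges))]
```

For a chart with 25 rows, the second page offers a link for only the 5 rows that remain, and the last page offers none.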
Assay data is where most of the in-depth information about chemicals is kept in the ACToR database. The Assay View has two parts – the overview table and the Assay Component table.
The assay overview table contains the headings: Assay ID, Data Collection ID and link, Name, Description, Short Name, URL link to the data source, Assay Category and number of components.
The component table contains the headings: Source ID, component name, component description, units, value type, and component type. To see the assay data, scroll down and click on “Show Assay Data Page”. A new page containing a chart will appear. In this chart, the column headers are the assay components and the rows are the substances. The cells of this table are the assay results.
An active ingredient is one that prevents, destroys, repels or mitigates a pest, or is a plant regulator, defoliant, desiccant or nitrogen stabilizer.
ACToR (Aggregated Computational Toxicology Resource) is a collection of databases collated or developed by the US EPA National Center for Computational Toxicology (NCCT).
An assay is a collection of data for substances from one data collection. Currently, an assay can be thought of as a simple table with rows being chemicals and columns being assay components. An assay falls into one data type category but may have multiple phenotypes. The data can have more than one row or entry for the same substance, and elements in the data matrix can be empty.
Assays are organized into a number of categories that describe the broad type of data presented. Several of these categories describe the level of biological organization being probed, while others describe the class of information being presented. The current set of categories is:
· In vivo toxicology (tabular primary)
· In vivo toxicology (study listing primary)
· In vivo toxicology (tabular secondary)
· In vivo toxicology (summary calls)
· In vivo toxicology (summary report via URL)
· General Descriptive information
· Chemical Category
· Chemical Summary URL
· Chemical Use Level
· Pesticidal mode of action (MOA)
An assay component defines one column or element of an assay. A component has a unique ID, a name, a description, a data type, and optionally units.
Some assays are characterized by toxicology phenotypes. This allows one to organize the data in ACToR into broad toxicity areas. The current set of phenotypes is:
· General chemical hazard
· Acute Toxicity
· Subchronic Toxicity
· Chronic Toxicity
· Developmental Toxicity
· Reproductive Toxicity
· Dermal Toxicity
· Respiratory Toxicity
· Endocrine-Related effects
· Food Safety
· Toxicity (Other)
An assay result is one data point for a single substance and a single assay component.
CAS (Chemical Abstracts Service) Registry Number
Some examples of numbers in CAS format are:
A chemical is defined by a unique chemical ID in the database and can be either a substance or a compound.
A chemical structure is a diagram of a chemical. It can be used to search for information about chemicals.
A compound is an entity with a chemical ID and chemical structure information, which may be a 2- or 3-dimensional molfile or a string representation. The string representation can be SMILES or InChI.
A data collection is a collection of chemical and assay data from a single source.
A generic chemical aggregates all data from all data collections for substances with a single given CAS number. It will have links to one or more substances and all of their related assay data, as well as all synonyms derived from the substances.
The IUPAC International Chemical Identifier (InChI™) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources. It was developed under IUPAC Project 2000-025-1-800 during the period 2000-2004.
An inert ingredient means any substance (or group of structurally similar substances if designated by the Agency), other than an active ingredient, which is intentionally included in a pesticide product.
An in vitro experiment is one that is performed outside of a living organism (for example, in test tubes).
In vivo experimentation is done on or inside living organisms, otherwise known as animal testing.
SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions.
A substance is an entity with a chemical ID, one or more names (including a CAS number) and potentially a URL pointing to primary data. One special name for the substance is the “source name sid” which is a unique alphanumeric label from the source, which allows a unique link back to the source.