The Australian Node contains information on genetic variants that have been discovered during genetic testing for both diagnostic and research purposes, along with their effects (both known and predicted) on human health.
The data within the Australian Node repositories comes directly from participating laboratories in Australia.
Data upload to the Australian Node repository is initiated by laboratory staff at their discretion. Once an upload is initiated, the HVP Exporter connects to the laboratory's internal Laboratory Information Management System (LIMS) and extracts a defined set of data elements from the records added to the LIMS since the last upload cycle:
Variant Name | Gene Name |
Variant Class | Reference Sequence ID |
Variant Location | Test method |
Pathogenicity determination | Sample tissue |
Date of pathogenicity determination | Sample Source |
Age of patient at time of test | Whether lab still has sample left |
Justification for pathogenicity determination | Whether organisation has pedigree data |
PubMed/DOI ids for supporting evidence | Whether pedigree was considered during diagnosis of pathogenicity |
Whether variant is recorded in disease specific or gene specific database | Whether histology is stored |
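The incremental extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the table name `lims_results`, its columns, the `created_at` timestamp and the function `extract_new_records` are all hypothetical, an in-memory SQLite database stands in for a real LIMS, and the second demo row is invented. The HVP Exporter's actual implementation is not described here.

```python
import sqlite3
from datetime import datetime

# Hypothetical column names, loosely based on the data elements listed above.
FIELDS = ["variant_name", "gene_name", "pathogenicity", "test_method"]


def extract_new_records(conn, last_upload):
    """Return records added to the (assumed) lims_results table after last_upload."""
    query = (
        f"SELECT {', '.join(FIELDS)} FROM lims_results "
        "WHERE created_at > ? ORDER BY created_at"
    )
    rows = conn.execute(query, (last_upload.isoformat(),)).fetchall()
    return [dict(zip(FIELDS, row)) for row in rows]


# Stand-in LIMS with two demo rows (timestamps stored as ISO 8601 text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lims_results (variant_name TEXT, gene_name TEXT, "
    "pathogenicity TEXT, test_method TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO lims_results VALUES (?, ?, ?, ?, ?)",
    [
        ("c.4987-68A>G", "BRCA1", "Class 1", None, "2012-12-01T09:00:00"),
        ("c.1A>G", "DEMO1", "Class 3", None, "2013-01-15T09:00:00"),  # invented row
    ],
)

# Only records created after the previous upload cycle are picked up.
new_rows = extract_new_records(conn, datetime(2013, 1, 1))
```

Filtering on a timestamp keeps each upload cycle limited to records the repository has not yet seen, which matches the "since the last upload cycle" behaviour described above.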
The data elements required for a submission to the HVPA Node are based primarily on the minimum reporting requirements specified in the National Pathology Accreditation Advisory Council guidelines, “Requirements for the Medical Testing of Human Nucleic Acids”.
For each variant to be uploaded to the HVPA Node, a “Linkage Key” is generated. A linkage key is a coded identifier generated from a subset of personally identifying information that has been non-reversibly encrypted. This process allows data within the HVPA Node to be linked at the patient level to records in other datasets without using personally identifying information, and thus protects the privacy and confidentiality of patients.
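The idea of a linkage key can be illustrated with a keyed one-way hash. The sketch below is an assumption, not the Node's actual algorithm: the choice of input fields, the normalisation rules, the use of HMAC-SHA256 and the function names `normalise` and `linkage_key` are all hypothetical.

```python
import base64
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """Strip accents, drop non-alphanumeric characters and upper-case the rest,
    so that trivial spelling differences do not change the key."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def linkage_key(surname, given_name, date_of_birth, sex, secret: bytes) -> str:
    """Derive a one-way coded identifier from (assumed) identifying fields."""
    message = "|".join(
        normalise(part) for part in (surname, given_name, date_of_birth, sex)
    )
    digest = hmac.new(secret, message.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# The same person yields the same key despite formatting differences.
secret = b"example-only-secret"  # placeholder; a real key would be kept private
key_a = linkage_key("O'Brien", "Mary", "1970-01-31", "F", secret)
key_b = linkage_key("OBRIEN", "mary ", "1970-01-31", "f", secret)
same_person = key_a == key_b
```

Because the hash cannot be inverted, two datasets that derive keys in the same way can match records on the key alone without exchanging names or dates of birth.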
Example Record
ID | 193 |
---|---|
HashCode | YWwQFMTwdL4THvE5s09z8jBEdO5lo7aK== |
Variant_id | 3 |
InstanceDate | 2013-01-01 |
PatientAge | NULL |
TestMethod_id | NULL |
SampleTissue_id | NULL |
SampleSource_id | NULL |
Pathogenicity_id | Class 1 - Certainly not pathogenic |
Justification | NULL |
PubMed | NULL |
RecordedInDatabase | NULL |
SampleStored | NULL |
PedigreeAvailable | NULL |
VariantSegregatesWithDisease | NULL |
HistologyStored | NULL |
Patient_id | Lar0uXraIRIHI5RO2lmK== |
Organisation_id | 46dvEpZCg4BAqJgAqEkH |

ID | 3 |
---|---|
Gene_id | 1721 |
cDNA | c.4987-68A>G |
mRNA | NULL |
Genomic | g.-38473306A>G |
Protein | NULL |
VariantClass_id | Genomic |
Location | NULL |
Comments | NULL |
Pathogenicity_id | NULL |

HashCode | Lar0uXraIRIHI5RO2lmK== |
---|---|
Ethnicity_id | NULL |

HashCode | 46dvEpZCg4BAqJgAqEkH |
---|---|

ID | 1721 |
---|---|
GeneName | BRCA1 |
GeneDescription | breast cancer 1, early onset |
RefSeqName | NM_007294 |
RefSeqVer | 2 |
RefSeqValidStart | NULL |
RefSeqValidEnd | NULL |
HGNC_ID | HGNC:1100 |
AlternateSymbols | RNF53, BRCC1 |
AlternateNames | BRCA1/BRCA2-containing complex, subunit 1 |
Chromosome | 17q21-q24 |
PreviousSymbols | |
PreviousNames | |
GenBankName | NG_005905 |
GenBankVer | 1 |
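The tables in the example record refer to one another through their identifiers (`Variant_id`, `Gene_id`, `Patient_id`, `Organisation_id`). As a rough sketch of how those references can be followed, the snippet below holds a few of the example values in plain dictionaries. The collection names `instances`, `variants` and `genes` and the function `describe_instance` are hypothetical, since the repository's real table names are not given here.

```python
# A subset of the example record, keyed by each table's identifier.
instances = {
    193: {
        "Variant_id": 3,
        "InstanceDate": "2013-01-01",
        "Pathogenicity_id": "Class 1 - Certainly not pathogenic",
        "Patient_id": "Lar0uXraIRIHI5RO2lmK==",
        "Organisation_id": "46dvEpZCg4BAqJgAqEkH",
    }
}
variants = {3: {"Gene_id": 1721, "cDNA": "c.4987-68A>G"}}
genes = {1721: {"GeneName": "BRCA1", "RefSeqName": "NM_007294", "RefSeqVer": 2}}


def describe_instance(instance_id):
    """Follow instance -> variant -> gene and build a readable variant description."""
    instance = instances[instance_id]
    variant = variants[instance["Variant_id"]]
    gene = genes[variant["Gene_id"]]
    reference = f'{gene["RefSeqName"]}.{gene["RefSeqVer"]}'
    return f'{reference}({gene["GeneName"]}):{variant["cDNA"]}'


description = describe_instance(193)
```

Keeping gene and reference sequence details in their own table means a variant instance only needs to store identifiers, and the full description can be assembled on demand.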
Entity Relationship Diagram
Data Quality
Data accuracy, defined as ensuring that the data within the repository is described correctly (e.g. variants are named according to the Human Genome Variation Society nomenclature system and data is internally consistent), is maintained by automated means at the time of submission.
Data accuracy checks are incorporated into the HVPA Exporter tool and issues are flagged to users before submission takes place. At the repository side, the HVPA Importer tool maps incoming data elements to common reference sequences to ensure internal consistency of naming.
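A pre-submission naming check of the kind described above might look like the following. It is a deliberately simplified sketch: the pattern covers only single-base substitutions such as `c.4987-68A>G`, whereas the full Human Genome Variation Society nomenclature includes many other variant types, and the names `SUBSTITUTION` and `check_variant_name` are hypothetical rather than part of the Exporter tool.

```python
import re

# Simplified pattern: coding (c.) or genomic (g.) single-base substitutions,
# with an optional leading "-" or "*" and an optional intronic offset.
SUBSTITUTION = re.compile(
    r"^(?P<prefix>[cg])\."
    r"(?P<position>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def check_variant_name(name):
    """Return a list of problems found in a variant name (empty if none)."""
    problems = []
    match = SUBSTITUTION.match(name.strip())
    if match is None:
        problems.append("not a recognised simple substitution")
    elif match["ref"] == match["alt"]:
        problems.append("reference and alternate base are identical")
    return problems


# Names with problems would be flagged to the user before submission.
flagged = {
    name: check_variant_name(name)
    for name in ["c.4987-68A>G", "4987-68A->G", "c.100A>A"]
}
```

Returning a list of problems rather than raising an error lets a tool show every issue to the user at once, in keeping with flagging issues before submission takes place.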
Diagnostic Data
For data submitted by diagnostic laboratories (diagnostic data), we assume that the incoming data is of high quality, which we define as data generated in a manner free from errors such as sequencing artefacts or incorrect calling of bases and variants, because of the regulatory requirements that diagnostic laboratories must meet when generating this data. We purposely collect data from diagnostic laboratories only after they have reported results to the requesting clinician, so that we capture the data past the point where changes can be made. If a laboratory subsequently finds an error in its report, it issues a new report to the clinician, and the corrected information is submitted to the Node during the next data upload phase.
Research Data
Data contributed to the Node by laboratories that are not accredited for diagnostic purposes (research data) undergoes the same accuracy checks as diagnostic data, regardless of whether it is submitted through the HVP Exporter or via a bulk, manual upload process. Because research data can be generated in many disparate ways, however, assessing its quality in a standardised fashion is difficult.
There are currently no recognised standards for accrediting research data generation practices that can be leveraged to assess the quality of data submitted to the Node. To address this, the Node is working to develop a national data quality standard. Until such a standard exists, the Node will continue to differentiate the sources of the data it contains, so that users can clearly identify data that has been quality assessed to a recognised standard.