Time flies: about four months have passed since I started working on this project as a GSoC student. Fortunately, all three goals of the project have now been accomplished:
- Implement the variant normalizer
The normalizer in hgvs has been extensively tested on all kinds of variants, including variants in extreme contexts such as those located at an exon-intron boundary. It is also flexible: it can be configured to shuffle variants in either the 3’ or the 5’ direction, and users can choose whether shuffling is allowed to cross an exon-intron boundary.
- Support parsing and manipulating complex variants
Substitutions, indels, insertions, deletions and duplications were already supported in hgvs. Now hgvs also supports parsing and manipulating complex variants, including compound, mosaic and chimeric variants, which are composed of multiple simple sequence variants.
- Add a REST interface to the UTA database
The Universal Transcript Archive (UTA) database stores rich transcript-related information, including sequences, exon structures and reference-transcript alignments. It is used by the hgvs package when mapping, validating and normalizing variants, and it can also benefit research on human transcripts more broadly. The REST interface makes it much easier for users to access data from this database, and it also simplifies the installation of the hgvs package.
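To make the shuffling behaviour of the normalizer more concrete, here is a minimal sketch of the idea for a simple deletion. It is not the code used in hgvs: the function name `shuffle_deletion`, its parameters, and the `bounds` argument (standing in for an exon-intron boundary that must not be crossed) are all assumptions made for illustration, and positions are 0-based, half-open indices into a plain string.

```python
def shuffle_deletion(ref, start, end, direction=3, bounds=None):
    """Shift the deleted interval ref[start:end] as far as possible.

    Hypothetical helper for illustration only (not part of hgvs).
    direction=3 shifts toward the 3' end, direction=5 toward the 5' end.
    bounds=(lo, hi) optionally limits shuffling, e.g. to a single exon.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("deleted interval must be non-empty and inside ref")
    lo, hi = bounds if bounds is not None else (0, len(ref))

    if direction == 3:
        # Sliding right by one keeps the resulting sequence unchanged
        # only if the base leaving the interval equals the base entering it.
        while end < hi and ref[start] == ref[end]:
            start += 1
            end += 1
    elif direction == 5:
        # Mirror image of the 3' case.
        while start > lo and ref[start - 1] == ref[end - 1]:
            start -= 1
            end -= 1
    else:
        raise ValueError("direction must be 3 or 5")
    return start, end


ref = "GATTTTCA"
print(shuffle_deletion(ref, 2, 3, direction=3))                 # (5, 6)
print(shuffle_deletion(ref, 5, 6, direction=5))                 # (2, 3)
print(shuffle_deletion(ref, 2, 3, direction=3, bounds=(0, 5)))  # (4, 5)
```

Deleting any single T from the run of four gives the same sequence, so the same change can be written at several positions; shuffling picks one canonical position. With `bounds=(0, 5)` the interval stops at the assumed boundary instead of moving to the last T.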
Here I list the main features of the extended hgvs package:
- It supports parsing and manipulating all kinds of HGVS variants specified by the nomenclature, including sub, delins, del, ins, dup, inv, con, compound, mosaic and chimeric variants; only translocation variants are not supported.
- It provides a full set of variant manipulation operations, including mapping among genome, transcript and protein sequences, internal and external validation of variants, and normalization of all kinds of genomic and transcript variants.
- It facilitates batch processing of large numbers of variants.
I am very proud to be a contributor to this useful project, and I am pleased to see hgvs becoming an increasingly powerful and comprehensive package for parsing and manipulating all kinds of HGVS variants. I hope it will benefit the genomic variant research community.
Here I’d like to express my sincere thanks to Dr. Reece Hart, who gave me a great deal of guidance and many suggestions during this project. We met online every Tuesday and Friday from the beginning of the project to discuss my questions and what to do next. Without his kind help, I could not have completed this project successfully. I would also like to thank Dr. Kevin Jacobs for his helpful discussions and suggestions on implementing the normalizer.
It has been a great experience to participate in GSoC and contribute to an open source project. Besides the technical skills, I learned a lot during development, including how to work on a collaborative project over the Internet and how to work with branches. This has been a wonderful and memorable journey in my life.