Gene Prediction
This section includes links to gene prediction programs for both eukaryotic and prokaryotic organisms. Resources that evaluate the available gene prediction programs are also included.
AGenDA: AGenDA is a web tool that compares genomic sequences from evolutionarily related organisms in order to make gene predictions. It takes pairs of genomic sequences as input, aligns them, and makes predictions based on splice signals, start and stop codons, and areas of conserved sequence.
AMIGene: Annotation of MIcrobial Genes (AMIGene) is a gene prediction server that can identify coding sequences in microbes.
AUGUSTUS: AUGUSTUS is a eukaryotic gene prediction tool that models intron length distribution and searches for motifs and multiple splice variants. It is particularly effective with larger sequences. It can be run through a web interface, or downloaded and run locally.
BAGEL: BActeriocin GEnome mining tooL (BAGEL) identifies putative bacteriocin ORFs (antimicrobial peptides) based on a database containing information about known bacteriocins and adjacent genes involved in bacteriocin activity.
BLAST: Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.
CPC: CPC (Coding Potential Calculator) distinguishes protein-coding from non-coding RNAs based on the sequence features of the input transcripts.
DGSF: Dragon Gene Start Finder (DGSF) predicts promoters and transcription start sites (TSS) within CpG islands for mammalian DNA sequences.
DNAtools: DNAtools includes tools for predicting DNA curvature; plotting physicochemical, statistical, or locally computed parameters along DNA sequences; producing a 3-D model of a DNA sequence; and searching an intron database.
EUGENE'HOM: EUGENE'HOM is gene prediction software for eukaryotic organisms based on comparative analysis. It is currently tuned for plant sequences of up to 400 kb.
FrameD: FrameD is a program that predicts coding regions in prokaryotic and eukaryotic sequences that may contain frameshifts.
GeneAlign: GeneAlign is a gene prediction tool that uses conservation of gene structures and sequence homologies between protein-coding regions to increase prediction accuracy. GeneAlign is currently configured to align human and mouse sequences for gene prediction.
GeneFizz: GeneFizz is a tool for identifying genes by using the physical characteristics of helix-to-coil transitions in DNA.
GeneMark: The GeneMark family of programs employs Markov models and is specifically tuned for gene prediction in sequences from prokaryotes, viral genomes, and eukaryotes.
GeneSeqer: GeneSeqer is a method to identify exon/intron structure by splice site prediction and spliced alignment in plant genomes.
GenomeScan: Incorporates protein similarity information when predicting genes; based in part on GENSCAN.
GENSCAN: Identification of complete gene structures in genomic DNA.
Glimmer: Gene Locator and Interpolated Markov Modeler; this prokaryotic gene-finding tool is the primary microbial gene finder used at TIGR; free (including source code) with registration for non-commercial use.
Grail: Grail is a suite of tools that recognizes sequence features such as promoters, exon candidates, simple repeats, and complex repetitive elements. It also models genes based on the exon candidates.
HHomp: HHomp is a web server for prediction and classification of outer membrane proteins (OMPs). Beginning with sequence similarity of a protein to known OMPs, HHomp builds a hidden Markov model (HMM) and compares the input sequence to a database of OMPs by pairwise HMM comparison. The OMP database contains profile HMMs for over 20,000 putative OMP sequences.
HMMgene: Prediction of vertebrate and C. elegans genes.
IBM Bioinformatics and Pattern Discovery Group: Extensive server offering a wide range of tools for pattern discovery in DNA and protein sequences as well as in text. Tools for multiple sequence alignment, gene discovery, protein annotation, and other applications are also available on this server. A detailed help page is provided for all tools.
mGene.web: mGene.web is a web server for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences based on pre-computed models of gene structures. Users may also compute their own model using their own data.
ORF Finder: Finds all open reading frames in a sequence.
OrfPredictor: OrfPredictor is designed for prediction of Open Reading Frames (ORFs) and coding regions in a batch of EST or cDNA sequences.
Orphelia: Orphelia is a machine learning program for predicting genes in short DNA sequences from metagenomic sequencing projects. The program includes fragment-length-specific prediction models for chain termination sequencing and pyrosequencing.
SLAM: SLAM is a comparative annotation and alignment tool for syntenic genomic sequences that performs gene finding and alignment simultaneously. SLAM also predicts CNSs (conserved non-coding sequences).
TACT: Transcriptome Auto-annotation Conducting Tool (TACT) is an automated tool for functional annotation of transcripts that integrates sequence similarity searches and functional motif predictions. TACT was originally developed as an automatic annotation pipeline for the Human Full-length cDNA Annotation Invitational (H-Inv) project.
TargetIdentifier: TargetIdentifier is designed for identifying full-length EST cDNAs and functionally annotating EST cDNAs.
Taverna: Taverna is a tool for creating and running bioinformatics workflows.
TiCo: TIS Correction (TiCo) is a tool for improving predictions of prokaryotic Translation Initiation Sites (TIS). TiCo can be used to analyze and reannotate predictions obtained by the program GLIMMER.
TIGR Software Tools: A list of open-source software packages available for free from The Institute for Genomic Research (TIGR).
TRACTS: TRACTS calculates the frequencies, locations, and lengths of all binary tracts (pairs of nucleotide combinations) in DNA sequences. Annotated output includes exon/intron regions and tract information.
Twinscan: Twinscan is a system for predicting gene structure in eukaryotic genomic sequences. To make its predictions, Twinscan combines the information from predicted coding regions and splice sites with conservation measurements between the target sequence and sequences from a closely related genome. Currently, sequences from mammalian genomes and those of Arabidopsis thaliana, C. elegans, C. briggsae, and strains JEC21 and H99 of Cryptococcus neoformans can be processed using Twinscan.
Virtual Ribosome: The Virtual Ribosome is a DNA translation tool with a built-in ORF finder that allows the use of alternative start codons, the IUPAC degenerate DNA alphabet, and all translation tables defined by the NCBI taxonomy group. Using information found in the feature table of GenBank flatfiles, the tool can also highlight the intron/exon structure of genes in the final translation results.
WU BLAST: Washington University Basic Local Alignment Search Tool.
xREI: xREI is a web interface that allows users to explore a range of predefined phylo-grammars or create their own phylo-grammar. Grammars are visualized via state transition graphs and substitution matrices. xREI is based on xrate, a flexible software tool for modeling structural and phylogenetic variation in multiple sequence alignments.
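
Several of the tools listed above (ORF Finder, OrfPredictor, Virtual Ribosome) are built around locating open reading frames. As a rough illustration of that idea only, the following Python sketch scans all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. It is not the algorithm used by any of the listed tools; the function names, the `min_codons` parameter, and the restriction to the standard start and stop codons are assumptions made for this example.

```python
# Illustrative six-frame ORF scan. This is a simplified sketch and NOT the
# implementation used by NCBI ORF Finder or any other tool in this list.
# Assumptions: plain A/C/G/T input, ATG as the only start codon, and the
# three standard stop codons.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """Return a list of (strand, frame, start, end, orf_sequence) tuples.

    start and end are 0-based, end-exclusive offsets on the strand that was
    scanned (for '-' they refer to the reverse-complemented sequence).
    min_codons is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC"):
        print(orf)
```

Real gene finders go well beyond this: as the descriptions above note, they may add Markov models, splice-site signals, alternative start codons, or cross-species conservation on top of a basic frame scan.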