FREQUENTLY ASKED QUESTIONS

 1. What can I use SXOligoSearch for?
 2. What are the benefits of SXOligoSearch?
 3. What is Illumina Pipeline?
 4. How does SXOligoSearch fit into the pipeline?
 5. What is an Illumina Quality file?
 6. What is the advantage of submitting the "prb.txt" files?
 7. Does SXOligoSearch make use of base quality information to prioritise results?
 8. What other file formats does SXOligoSearch support?
 9. Is it possible to submit a FASTA sequence file to SXOligoSearch without any other additional input?
 10. What is the maximum number of sequences that the SXOligoSearch trial server accepts?
 11. Upon submitting my request/query, how soon can I expect the results?
 
 1.What can I use SXOligoSearch for?
  You can use it whenever you need a global alignment of short oligonucleotide sequences. This includes:
  1.Large scale genome or transcriptome sequencing projects.
  2.Signal searches, for example searching for TATAA boxes using a log odds scoring matrix.
  3.Microarray probe validation.
  4.Finding miRNA candidates.
 2.What are the benefits of SXOligoSearch?
  1.Hundreds of times faster on Eukaryote-sized genomes.
  2.More reads aligned to unique locations.
  3.Gapped alignments.
  4.Allows for more mismatches per read.
  5.Reporting of alignments to repeats improves read density analysis and identification of large deletion polymorphisms.
  6.No read length limit, although it is most suitable for oligonucleotides shorter than 60 bp.
 3.What is Illumina Pipeline?
  The Illumina Pipeline is the software that processes the output of an Illumina sequencer. Its main modules are:
  (a)The image analysis module (Firecrest)
  (b)The base-caller (Bustard)
  (c)The sequence aligner. There are two different modules that can be used:
   
  • PhageAlign does an exhaustive alignment (all possible alignments up to arbitrary edit distances are explored), but is reportedly slow.
  • Eland is faster than PhageAlign and allows up to two mismatches per read when aligning against a reference genome.
 4.How does SXOligoSearch fit into the pipeline?
  SXOligoSearch can be used in place of Eland or PhageAlign. It takes an Illumina Quality file as input and aligns each read against the target sequences.
 5.What is an Illumina Quality file?
  An Illumina base calling program, Bustard, produces four output files:
  1.Sequence file s_lane_tile_seq.txt
  2.Illumina base quality (probability) file s_lane_tile_prb.txt
  3.Two files with intensity information.
  
These files are usually located in the Bustard folder under the image analysis folder.

The Illumina quality file differs from a Phred quality file in that it holds a quality value for each of the four possible bases (A, C, G and T) at every position of the read, rather than a single value for the called base. It therefore carries more information than a FASTA or FASTQ representation of the reads, and so produces better alignments than those formats.

The Bustard base caller produces "prb.txt" files that express estimates of sequencing error probabilities in a convenient form. The four-value-per-base scheme also encodes information on the most likely base call.
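
The four-value-per-base scheme can be illustrated with a short Python sketch. It is not taken from SXOligoSearch. It assumes that each line of a "prb.txt" file holds one read, that positions are separated by tabs, that each position carries four integer scores in the order A, C, G, T, and that a score Q relates to a probability p by Q = 10 * log10(p / (1 - p)). Check these assumptions against your own pipeline version. The function names are hypothetical.

```python
BASES = "ACGT"  # assumed column order within each position


def scores_to_probs(scores):
    """Convert four log-odds scores to probabilities that sum to 1.

    Assumes Q = 10 * log10(p / (1 - p)), so p = 1 / (1 + 10 ** (-Q / 10)).
    """
    probs = [1.0 / (1.0 + 10.0 ** (-q / 10.0)) for q in scores]
    total = sum(probs)
    # Integer rounding of the scores means the four values rarely sum
    # to exactly 1, so renormalise.
    return [p / total for p in probs]


def parse_prb_line(line):
    """Parse one read: tab-separated positions, four integers per position."""
    read = []
    for field in line.rstrip("\n").split("\t"):
        scores = [int(x) for x in field.split()]
        if len(scores) != 4:
            raise ValueError("expected four scores per position, got %r" % field)
        read.append(scores_to_probs(scores))
    return read


def most_likely_bases(read):
    """Return the base call implied by the highest probability at each position."""
    return "".join(BASES[max(range(4), key=lambda i: probs[i])] for probs in read)


# A three-position read; the last position is an exact tie between A and C.
example = parse_prb_line("40 -40 -40 -40\t-40 40 -40 -40\t0 0 -40 -40")
call = most_likely_bases(example)
```

In this example the tie at the last position is resolved to 'A' only because 'A' comes first in the column order. The probabilities themselves remain equal for A and C, and that equality is lost once the read is reduced to a single base call.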
 6.What is the advantage of submitting the "prb.txt" files?
  Illumina quality files have base probabilities for each base at every position. Probability combinations such as P(A) = 0.499, P(C) = 0.499, P(G) = 0.001, P(T) = 0.001 are present in "prb.txt" files but cannot be represented in other file formats.

In the Gerald FASTA format this base position would be recorded as an 'A'. Using Eland, an alignment to an 'A' would then be treated as a match, while an alignment to a 'C' would be treated as a mismatch. This is inaccurate, because A and C are equally probable at that position.
 7.Does SXOligoSearch make use of base quality information to prioritise results?
  Yes. SXOligoSearch scores are calculated as the log base 2 of the probability that the aligned sequence was the sequence read. Results are ranked so that the most probable sequences are reported first.
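
  One possible reading of this scoring rule is sketched below in Python. The exact formula used by SXOligoSearch is not given here, so the sketch makes assumptions: the alignment is ungapped, the positions are independent, and the score is the sum of log2 P(base) over the read. The function names and the probability floor are hypothetical. The first position reuses the probabilities from question 6.

```python
import math

BASES = "ACGT"
FLOOR = 1e-12  # hypothetical floor so that log2 is never taken of zero


def alignment_score(read_probs, target):
    """Sum log2 of the probability of each target base, position by position.

    Scores are at most 0; values closer to 0 mean a more probable alignment.
    """
    if len(read_probs) != len(target):
        raise ValueError("read and target must have the same length")
    score = 0.0
    for probs, base in zip(read_probs, target):
        score += math.log2(max(probs[BASES.index(base)], FLOOR))
    return score


def rank_candidates(read_probs, candidates):
    """Order candidate target sequences with the most probable first."""
    return sorted(candidates,
                  key=lambda t: alignment_score(read_probs, t),
                  reverse=True)


# Position 1: P(A) = P(C) = 0.499. Position 2: almost certainly G.
read_probs = [
    [0.499, 0.499, 0.001, 0.001],
    [0.001, 0.001, 0.997, 0.001],
]
ranked = rank_candidates(read_probs, ["GG", "CG", "AG"])
```

  Under these assumptions the targets "AG" and "CG" receive identical scores, while "GG" is ranked last. A match/mismatch count on the called bases would instead favour "AG" over "CG".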
 8.What other file formats does SXOligoSearch support?
  SXOligoSearch can process FASTA and Illumina-formatted FASTQ files that are produced from the analysis stage of the Illumina pipeline. Gerald produces a file, s_l_sequence.txt, containing all the sequences from one lane of a chip. Depending on the configuration of Gerald, this file will be in one of several formats. The following formats are supported by SXOligoSearch:
  (a)FASTA format. Parameter settings:
SEQUENCE_FORMAT -- FASTA
  (b)FASTQ with symbolic quality values. Parameter settings:
SEQUENCE_FORMAT -- FASTQ
QUALITY_FORMAT -- symbolic
  
Note: The FASTQ format is currently not available in this trial version.
 9.Is it possible to submit a FASTA sequence file to SXOligoSearch without any other additional input?
  Yes, just enter the FASTA file location in place of the quality file.
 10.What is the maximum number of sequences that the SXOligoSearch trial server accepts?
  The limit is on file size rather than on the number of sequences: the largest file that can be uploaded is 2 Mbytes. Each file generated by Bustard covers a single tile, and a "prb.txt" file compressed with gzip is approximately 1 Mbyte, so a full tile can be processed.

When using FASTA files (s_l_sequence.txt) from Gerald, each file corresponds to one lane and is approximately 200 Mbytes. Such a file has to be split and submitted in parts of no more than 2 Mbytes each after compression (up to 6 Mbytes before compression).
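
Splitting a lane file into parts can be sketched in Python as follows. This is an illustration, not a tool shipped with SXOligoSearch. The function names are hypothetical, and the default limit of 6 * 1024 * 1024 bytes is an assumption about how "6 Mbytes" is counted. Records are kept whole, so a single record larger than the limit ends up in a part of its own.

```python
import gzip


def fasta_records(lines):
    """Group FASTA lines into records, each starting with a '>' header line."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def split_fasta(lines, max_bytes=6 * 1024 * 1024):
    """Yield lists of lines holding whole records, each at most max_bytes
    uncompressed (unless one record alone exceeds the limit)."""
    chunk, size = [], 0
    for record in fasta_records(lines):
        rec_size = sum(len(line.encode("utf-8")) for line in record)
        if chunk and size + rec_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.extend(record)
        size += rec_size
    if chunk:
        yield chunk


def gzip_size(chunk):
    """Return the gzip-compressed size of a part in bytes."""
    return len(gzip.compress("".join(chunk).encode("utf-8")))


# Tiny demonstration with an artificially small limit of 18 bytes.
demo = [">r1\n", "ACGT\n", ">r2\n", "GGCC\n", ">r3\n", "TTAA\n"]
parts = list(split_fasta(demo, max_bytes=18))
```

For a real lane file, the lines can be read from an open file handle, each part written to its own file and compressed with gzip, and gzip_size used to confirm that the part is within the 2 Mbyte upload limit before submission.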
 11.Upon submitting my request/query, how soon can I expect the results?
  It ranges from a few minutes to 24 hours, depending on the workload schedule of our servers.