How to upload and download FASTQ files to BaseSpace with BaseSpace CLI
© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html
BaseSpace Sequence Hub (BSSH) is the Illumina cloud-based platform for data management, storage, and analysis. The BaseSpace Command Line Interface (CLI) tool can be used alongside BSSH to upload and download run and analysis files. For example, locally generated sample data in the form of demultiplexed FASTQs can be imported to a project for downstream data analysis applications, as long as the FASTQ files meet the file upload requirements.
The BSSH web uploader allows for files to be uploaded with a maximum size of 250 GB and a maximum number of 16 files per upload.
To upload multiple samples at once or to upload larger files, BaseSpace CLI is required; it communicates directly through the BSSH API.
Note that using BaseSpace CLI requires familiarity with operating in a command line environment.
This article addresses how to upload and download data directly to or from an existing project with BaseSpace CLI.
Installation
The bs executable can be manually downloaded using the operating system-specific direct download links in the Install section of the CLI Overview page.
Obtaining the project ID
To upload files to or download files from a project, the project ID number is required. It can be obtained in two different ways.
Obtain the project ID from the BaseSpace webpage.
The project ID can be found in the web address bar of the project page on the BaseSpace website.
For example, if the project link is https://basespace.illumina.com/projects/212049842, the project ID = 212049842
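The URL-to-ID step above can be scripted. The following is a minimal sketch, assuming the project URL contains a /projects/<digits> segment as in the example; project_id_from_url is a hypothetical helper name and is not part of BaseSpace CLI.

```shell
# Hypothetical helper (not part of BaseSpace CLI): print the numeric ID that
# follows "/projects/" in a BaseSpace project URL. Prints nothing when the
# URL does not contain such a segment.
project_id_from_url() {
    printf '%s\n' "$1" | sed -n 's#.*/projects/\([0-9][0-9]*\).*#\1#p'
}

# Example with the URL from this article; prints 212049842
project_id_from_url "https://basespace.illumina.com/projects/212049842"
```

The printed value can then be passed to the -p or -i options used in the commands later in this article.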
Obtain the project ID through BaseSpace CLI
Run the command bs list projects to obtain the project ID.
Instructions for Uploading FASTQ files via BaseSpace CLI
For all platforms, the basic FASTQ upload command is:
bs dataset upload -p [ProjectIDNumber] --recursive [PathToFiles]
The path to the files can contain multiple folders. When uploading from within the folder that contains all of the FASTQs to be uploaded, use a period (meaning "this folder") as the path. An example command is:
bs dataset upload -p 53489437 --recursive .
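A small wrapper can catch simple mistakes before the upload starts. The sketch below only checks that the project ID is numeric and that the path is an existing folder, then runs the upload command shown above. The function name upload_fastqs and the BS_DRY_RUN variable are illustrative assumptions, not BaseSpace CLI features.

```shell
# Hypothetical wrapper around the upload command from this article.
# Usage: upload_fastqs <ProjectIDNumber> [PathToFiles]
# Set BS_DRY_RUN=1 (an illustrative variable) to print the command only.
upload_fastqs() (
    project_id="$1"
    fastq_dir="${2:-.}"   # default to "this folder"

    # Reject empty or non-numeric project IDs.
    case "$project_id" in
        ''|*[!0-9]*)
            echo "project ID must be numeric: '$project_id'" >&2
            exit 2 ;;
    esac

    # The upload path must be an existing folder.
    [ -d "$fastq_dir" ] || { echo "not a directory: $fastq_dir" >&2; exit 2; }

    if [ -n "${BS_DRY_RUN:-}" ]; then
        echo "bs dataset upload -p $project_id --recursive $fastq_dir"
    else
        bs dataset upload -p "$project_id" --recursive "$fastq_dir"
    fi
)

# Dry run with the example project ID; prints the command without running it.
BS_DRY_RUN=1 upload_fastqs 53489437 .
```

Running the wrapper without BS_DRY_RUN set performs the actual upload, so it requires an installed and authenticated bs executable.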
Notes:
Merged lane FASTQs produced by bcl2fastq/BCL Convert are not immediately compatible for upload. To upload merged lane files, change the file name to include a lane number so that the file matches the format SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz, and add the --allow-invalid-readnames option to the CLI upload command. Merged lane files cannot be uploaded with the BaseSpace Sequence Hub web importer.
If there is an error about files not matching the Illumina naming convention, make sure that the file names are formatted exactly as in the example above. Extra underscores in the SampleName will cause an upload failure as underscores are the delimiter for the rest of the file name.
CLI upload defaults to FASTQ as the file type. Other file types such as BAMs, VCFs, and BEDs can also be uploaded with the CLI by adding --type common.files to the command line. Note that FASTQs cannot be uploaded with the common.files option, so they must be uploaded separately from other file types.
Other special upload scenarios are discussed on the CLI Examples page.
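The naming rule from the notes above can be checked locally before an upload is attempted. The sketch below assumes the usual concrete form of the convention, for example Sample1_S1_L001_R1_001.fastq.gz: a sample name without underscores, S plus a number, L plus a three-digit lane, R1/R2/I1/I2, and 001. That pattern, the helper names, and the choice of L001 as the lane inserted into merged-lane names are all assumptions made for illustration; adjust them to match the actual files.

```shell
# Hypothetical helpers (not part of BaseSpace CLI).

# Return success if the file name matches the assumed pattern
# SampleName_S<n>_L<3 digits>_<R1|R2|I1|I2>_001.fastq.gz
is_valid_fastq_name() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9-]+_S[0-9]+_L[0-9]{3}_[RI][12]_001\.fastq\.gz$'
}

# Print the name with a lane field inserted when it is missing, as for
# merged lane files. L001 is an assumed placeholder lane number.
# Names that already contain a lane field are printed unchanged.
add_lane_to_name() {
    printf '%s\n' "$1" |
        sed -E 's/^(.+_S[0-9]+)_([RI][12]_001\.fastq\.gz)$/\1_L001_\2/'
}

# Report every FASTQ in a folder whose name does not match the pattern.
report_invalid_names() (
    for path in "$1"/*.fastq.gz; do
        [ -e "$path" ] || continue   # no matching files in the folder
        name="${path##*/}"
        is_valid_fastq_name "$name" || echo "invalid name: $name"
    done
)
```

For example, add_lane_to_name "Sample1_S1_R1_001.fastq.gz" prints Sample1_S1_L001_R1_001.fastq.gz, which can then be used as the target of a rename before uploading with the --allow-invalid-readnames option described above.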
Instructions for Downloading FASTQ files via BaseSpace CLI
Watch this video, which describes how to download FASTQs.
Obtain the project ID as described above.
Run the following command:
bs download project -i [ProjectID] -o [Destination] --extension fastq.gz
This will download all FASTQs stored in the specified project to the local destination folder.
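After the download finishes, a quick count of the FASTQ files in the destination folder can be compared with the number expected for the project. The following is a minimal sketch; count_fastqs is a hypothetical helper name, and the search is recursive on the assumption that files may be placed in subfolders of the destination.

```shell
# Hypothetical helper: count the .fastq.gz files under a folder, searching
# subfolders as well. tr strips the padding some wc implementations add.
count_fastqs() {
    find "$1" -type f -name '*.fastq.gz' | wc -l | tr -d ' '
}

# Usage, where [Destination] is the folder passed to -o in the command above:
#   count_fastqs [Destination]
```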
Filtering analyses prior to downloading FASTQ files via BaseSpace CLI
In certain cases, it may be beneficial to list all analyses (FASTQ Generation or BCL Convert appsessions) performed on a specific run before downloading.
To obtain the list of analyses associated with a run, use the following command:
bs appsession list --input-run=[input Run ID]
This can then be used in combination with the bs download appsession command to download the FASTQ files output by an analysis, as follows:
bs appsession list --input-run=[input Run ID] --terse | xargs -I @ bs download appsession -i @ -o [Local Output folder] --extension=fastq.gz
Note: If there are multiple appsessions associated with a run, fields such as ExecutionStatus or DateCreated can be used to filter them, as follows:
bs appsession list --input-run=[input Run ID] --terse --exec-status=Complete | xargs -I @ bs download appsession -i @ -o [Local Output folder] --extension=fastq.gz
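The xargs pipeline above can also be written as a loop, which makes it easier to add per-appsession handling. The sketch below reads appsession IDs from standard input, one per line, as the --terse output is used in the commands above. The function name download_appsessions, the BS_DRY_RUN variable, and the choice to give each appsession its own subfolder are assumptions made for illustration.

```shell
# Hypothetical helper: read appsession IDs from standard input and download
# the FASTQs of each one into <output folder>/<appsession ID>.
# Set BS_DRY_RUN=1 (an illustrative variable) to print the commands only.
download_appsessions() (
    out_dir="$1"
    while IFS= read -r session_id; do
        [ -n "$session_id" ] || continue   # skip blank lines
        if [ -n "${BS_DRY_RUN:-}" ]; then
            echo "bs download appsession -i $session_id -o $out_dir/$session_id --extension=fastq.gz"
        else
            # </dev/null keeps bs from consuming the remaining IDs on stdin.
            bs download appsession -i "$session_id" \
                -o "$out_dir/$session_id" --extension=fastq.gz </dev/null
        fi
    done
)

# Usage, with the same placeholders as the commands above:
#   bs appsession list --input-run=[input Run ID] --terse --exec-status=Complete |
#       download_appsessions [Local Output folder]
```

Writing each appsession to a separate subfolder is a design choice that keeps outputs from different analyses of the same run apart; pass the same folder for every ID instead if a single flat destination is preferred.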
For additional information, see the BaseSpace CLI Examples page.
For any feedback or questions regarding this article (Illumina Knowledge Article #1138), contact Illumina Technical Support at techsupport@illumina.com.