Difference between revisions of "FlyBase:Using FTP Archives"

From FlyBase Wiki
Line 1: Line 1:
FlyBase is no longer supporting the archive servers, which replicated the website as it existed at the time of archiving. However, all of the data behind every release is available at our FTP archives, which you can reach on the [https://flybase.org/downloads/archivedata Archived Releases] page, from the Downloads tab of the menu bar. From there, scroll down to the Main Data Archives section:
+
FlyBase is no longer supporting the archive servers, which replicated the website as it existed at the time of archiving. However, all of the data behind every release is available at our FTP archives, which you can reach on the [https://flybase.org/downloads/archivedata Archived Data] page, from the Downloads tab of the menu bar. From there, scroll down to the Main Data Archives section:
  
  
Line 12: Line 12:
 
The '''[http://ftp.flybase.net/genomes/ FTP genomes archive]''' link holds genomic sequence data from many Drosophilid species, organized first by species and then by release.  If you are interested in a particular non-melanogaster Drosophilid, this is the easiest way to find all data specific to that species.  
 
The '''[http://ftp.flybase.net/genomes/ FTP genomes archive]''' link holds genomic sequence data from many Drosophilid species, organized first by species and then by release.  If you are interested in a particular non-melanogaster Drosophilid, this is the easiest way to find all data specific to that species.  
  
Data includes [https://useast.ensembl.org/info/website/upload/gff.html GFF, GTF], and [https://www.ncbi.nlm.nih.gov/WebSub/html/help/fasta.html FASTA] files formats, as well as the [[FlyBase:Downloads_Overview#Postgres_Chado_Database_Dump|Chado-XML]] database files for that species and release. The '''dna''' folders contain unprocessed sequences as .raw scaffold files.
+
Data includes sequences in [https://useast.ensembl.org/info/website/upload/gff.html GFF, GTF], and [https://www.ncbi.nlm.nih.gov/WebSub/html/help/fasta.html FASTA] files formats, as well as the [[FlyBase:Downloads_Overview#Postgres_Chado_Database_Dump|Chado-XML]] database files for that species and release. The '''dna''' folders contain unprocessed sequences as .raw scaffold files.
  
 
The fullname (e.g. Drosophila grimshawi) and four-letter FlyBase species abbreviation (e.g. Dgri) are different folders, but contain the same files within them.
 
The fullname (e.g. Drosophila grimshawi) and four-letter FlyBase species abbreviation (e.g. Dgri) are different folders, but contain the same files within them.
  
 
Precomputed files were never made for non-melanogaster species, so they are not available here.
 
Precomputed files were never made for non-melanogaster species, so they are not available here.
 
  
  
Line 33: Line 32:
 
The  '''[http://ftp.flybase.net/releases/ FTP releases archive]''' link holds data organized by release, for both D. melanogaster and non-melanogaster Drosophilids. These folders include:
 
The  '''[http://ftp.flybase.net/releases/ FTP releases archive]''' link holds data organized by release, for both D. melanogaster and non-melanogaster Drosophilids. These folders include:
  
* a chado-XML fodler, containing the [[FlyBase:Downloads_Overview#Postgres_Chado_Database_Dump|Chado-XML]] database files
+
* a '''chado-XML/''' folder, containing the [[FlyBase:Downloads_Overview#Postgres_Chado_Database_Dump|Chado-XML]] database files
* a collaborators folder, containing packages of data that we provide to other biological databases
+
* a '''collaborators/''' folder, containing packages of data that we provide to other biological databases
* a folder for each species containing genomic data (same files as above in the Genomes archive folder)
+
* a folder for each species (e.g. '''dgri.../''') containing genomic data (same files as above in the Genomes archive folder)
* a precomputed files folder, containing [[FlyBase:Downloads_Overview#Bulk_data_files|precomputed files]]
+
* a '''precomputed_files/''' folder, containing [[FlyBase:Downloads_Overview#Bulk_data_files|precomputed files]]
* a psql folder, continaing compressed SQL files of the Chado database
+
* a '''psql/''' folder, continaing compressed SQL files of the Chado database

Revision as of 19:42, 21 February 2024

FlyBase is no longer supporting the archive servers, which replicated the website as it existed at the time of archiving. However, all of the data behind every release is available at our FTP archives, which you can reach on the Archived Data page, from the Downloads tab of the menu bar. From there, scroll down to the Main Data Archives section:


The Main Data Archives section of the Archived Data page

Any data that you previously accessed through the website by QuickSearch or Bulk Download is based on the files below. This page aims to help users become more familiar with these files, so they can work out how to reproduce searches done on the full website using the files alone. There are multiple redundant ways to reach the same file, so if two files in different folders have exactly the same name, they are the same file.
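If you have downloaded two same-named files from different folders and want to confirm that they really are identical, a checksum comparison is one way to do it. The following is a minimal sketch, not a FlyBase tool; the function names `file_digest` and `same_content` are made up for this illustration, and the choice of SHA-256 is an assumption rather than anything FlyBase prescribes.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large archive files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both local files contain identical bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

Call `same_content()` with the two local paths of the copies you downloaded; it compares contents only, so the result does not depend on the file names or folders.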


Genomes archive

Index of the FTP genomes archive.
Index of the D. grimshawi, FB2017_01 archive.

The FTP genomes archive link holds genomic sequence data from many Drosophilid species, organized first by species and then by release. If you are interested in a particular non-melanogaster Drosophilid, this is the easiest way to find all data specific to that species.

Data includes sequences in GFF, GTF, and FASTA file formats, as well as the Chado-XML database files for that species and release. The dna folders contain unprocessed sequences as .raw scaffold files.

Each species has two folders, one named with its full name (e.g. Drosophila grimshawi) and one named with its four-letter FlyBase species abbreviation (e.g. Dgri). Both folders contain the same files.

Precomputed files were never made for non-melanogaster species, so they are not available here.
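Because the genomes archive is organized by species and then by release, one way to explore it from a script is to read the subfolder names out of a directory index page. The sketch below assumes that the archive serves an ordinary HTML index in which each subfolder is a link ending in "/"; that layout is an assumption, and `IndexLinkParser` and `list_subfolders` are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class IndexLinkParser(HTMLParser):
    """Collect links that look like subfolders in an HTML directory index."""

    def __init__(self):
        super().__init__()
        self.folders = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or not href.endswith("/"):
            return  # not a folder link
        # Skip parent-directory, absolute, and external links.
        if href.startswith(("/", "..")) or "://" in href:
            return
        self.folders.append(href.rstrip("/"))


def list_subfolders(index_html):
    """Return the subfolder names found in the text of an index page."""
    parser = IndexLinkParser()
    parser.feed(index_html)
    return parser.folders
```

The function takes the HTML text of an index page, however you obtained it (for example by saving the page from a browser). Run it once on the genomes index to get species folders, then again on a species page to get that species' release folders.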





Release archive

Index of the FTP release archive.

The FTP releases archive link holds data organized by release, for both D. melanogaster and non-melanogaster Drosophilids. These folders include:

  • a chado-XML/ folder, containing the Chado-XML database files
  • a collaborators/ folder, containing packages of data that we provide to other biological databases
  • a folder for each species (e.g. dgri.../) containing genomic data (same files as above in the Genomes archive folder)
  • a precomputed_files/ folder, containing precomputed files
  • a psql/ folder, containing compressed SQL files of the Chado database
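The folder layout listed above can be turned into URLs with a few lines of Python. This is a sketch under stated assumptions: it assumes each release folder sits directly under the releases archive and is named by its release ID (e.g. FB2017_01), which this page does not state explicitly, and `release_folder_urls` is a hypothetical helper. Per-species folders are left out because their full names are not given here.

```python
from urllib.parse import urljoin

# Root of the FTP releases archive, as linked from this page.
RELEASES_ROOT = "http://ftp.flybase.net/releases/"

# Fixed-name subfolders described in the list above.
RELEASE_SUBFOLDERS = {
    "chado-XML": "Chado-XML database files",
    "collaborators": "data packages provided to other biological databases",
    "precomputed_files": "precomputed files",
    "psql": "compressed SQL files of the Chado database",
}


def release_folder_urls(release):
    """Map each fixed subfolder name to its URL for one release.

    Assumes the release folder is named by its release ID.
    """
    base = urljoin(RELEASES_ROOT, release.strip("/") + "/")
    return {name: urljoin(base, name + "/") for name in RELEASE_SUBFOLDERS}
```

For example, `release_folder_urls("FB2017_01")["psql"]` yields the URL of the psql/ folder for that release, provided the naming assumption holds.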