BioMirror perl module to update Bio-Mirror data and SRS indices
Normally only out-of-date indices are updated.
Errors are logged to /tmp/biomirror22195.log.

Usage (one way):
  perl -MBioMirror -e 'BioMirror::main;' -- [arguments] datasection(s)

Arguments:
  -h                == help (brief usage).
  -doc              == documentation (detailed usage).
  -f                == force update now.
  -v                == print out commands without performing them.
  -i                == print SRS index info for selected data sets.
  -D                == turn on debugging.
  -E key=val        == set environ key=value
  -[no]logout       == do/don't log to /tmp/biomirror22195.log
  -[no]srssection   == do/don't srssection
  -release          == set SRS release from data (but don't srs-index)
  -srsfiles         == print srsfiles for data sets
  -srsservers       == fetch and print srs server list
  -srsalldbs [http://srs-server/] [doc|servers] [specific databanks]
                    == fetch and print all public srs databank info
                       optionally set server of DATABANKS, limit to doc or
                       servers info, and limit to specific SRS DBs
  -geticarus http://srs-server/ GenBank i
                    == fetch icarus (i, is, it) file for DB data from srs-server
  -makesrsdb[=outfile]
                    == make srsdb.i from available databanks
  -editdbi[=outfile]
                    == edit databank.i file (put in current file list)
  -archinfo[=outfile|UPDATE]
                    == print archive databank summary;
                       UPDATE writes to $zpath/docs/about-databanks.txt
  -mirror=AFFRC_JP  == set mirror server
  -mirror=Japan     == set mirror server by country
  -domirrordb       == do mirror selected databank(s)
                       (use mirror server if set, else use source server)
  -pack[=ftp://bio-mirror.jp.apan.net/]
                    == print databank packages for mirror.pl
                       if -mirror is set use that, else if =ftp parameter
                       is set, use that, else use source url

Valid data sections are
  all BIOCAT BLASTDB BLASTDBNEW BLOCKS BLOCKSPLUS CELEGANS DDBJ DDBJNEW EMBL
  EMBLNEW ENZYME FBAB FBAL FBGN FBTI FBPP FBRF FBTR FBTP GBGENOMES GENBANK
  GENBANKNEW INTERPRO LOCUSLINK MGD NRL3D PIR PFAM PFAMA PFAMB PFAMHMM
  PFAMSEED PRINTS PROSITE PROSITEDOC REBASE DATABANKS SWISSPROT SWISSNEW
  TAIR TAXONOMY TAXONOMY_NCBI
  TREMBL TREMBLNEW TFCELL TFCLASS TFFACTOR TFGENE TFMATRIX TFSITE UNIGENE
  DRUNIGENE MMUNIGENE RNUNIGENE ZFIN

Available BioMirror servers are
  ACSys_AU AFFRC_JP IMCAS_CN IUBio_USA KAIST_KR NUS_SG

Try 'perldoc BioMirror.pm' for more documentation.

Important paths:
  Data=/bio/data
  Bio-mirror archive=/c3/iubio/biomir-pub/biomirror
  SRS software=/bio/mb/srs

Data_class = Name : SRS db : Data files
BioMirror::BiosoftCatalog = Bio-Catalog : BIOCAT : biocatal.txt
BioMirror::BlastDB = BlastDB : BLASTDB :
BioMirror::BlastDBDaily = BlastDB Daily : BLASTDBNEW :
BioMirror::Blocks = Blocks : BLOCKS : blocks.dat
BioMirror::BlocksPlus = Blocks Plus : BLOCKSPLUS :
BioMirror::CElegans = C elegans AceDB : CELEGANS :
BioMirror::DDBJ = DDBJ : DDBJ :
BioMirror::DDBJNew = DDBJ Daily updates : DDBJNEW :
BioMirror::EMBL = EMBL : EMBL :
BioMirror::EMBLNew = EMBL Daily updates : EMBLNEW :
BioMirror::Enzyme = Enzyme : ENZYME : enzyme.dat
BioMirror::FlybaseAbs = FlyBase Aberrations : FBAB : FBab.acode
BioMirror::FlybaseAlleles = FlyBase Alleles : FBAL : FBgn.acode
BioMirror::FlybaseGenes = FlyBase Genes : FBGN : FBgn.acode
BioMirror::FlybaseInsertions = FlyBase Insertions : FBTI : FBti.acode
BioMirror::FlybaseProteins = FlyBase Proteins : FBPP : FBpp.acode
BioMirror::FlybaseRefs = FlyBase References : FBRF : FBrf.acode
BioMirror::FlybaseTranscripts = FlyBase Transcripts : FBTR : FBtr.acode
BioMirror::FlybaseTransposons = FlyBase Transposons : FBTP : FBtp.acode
BioMirror::GBGenomes = GenBank Genomes : GBGENOMES :
BioMirror::GenBank = GenBank : GENBANK : (\.seq)
BioMirror::GenBankNew = GenBankNew : GENBANKNEW : (\.flat)=(\.seq)
BioMirror::InterPro = InterPro : INTERPRO : (\.xml), (\.dtd)
BioMirror::LocusLink = LocusLink Human Genome : LOCUSLINK : LL_tmpl
BioMirror::MGD = MGD Mouse Genome : MGD :
BioMirror::NRL3D = NRL3D : NRL3D : nrl3d.dat
BioMirror::PIR = PIR : PIR : pir1.dat, pir2.dat, pir3.dat, pir4.dat
BioMirror::Pfam = Pfam : PFAM : (^Pfam)
BioMirror::PfamA = PfamA : PFAMA : (^Pfam)
BioMirror::PfamB = PfamB : PFAMB : (^Pfam)
BioMirror::PfamHMM = PfamHMM : PFAMHMM : (^Pfam)
BioMirror::PfamSeed = PfamSeed : PFAMSEED : (^Pfam)
BioMirror::Prints = Prints : PRINTS : prints.dat
BioMirror::Prosite = Prosite : PROSITE : prosite.dat
BioMirror::PrositeDoc = PrositeDoc : PROSITEDOC : prosite.doc
BioMirror::Rebase = Rebase : REBASE : allenz=rebase.dat
BioMirror::SRSDatabanks = SRS Databanks : DATABANKS : databanks.dat
BioMirror::SwissProt = SwissProt : SWISSPROT : (sprot[0-9]+.dat)=seq.dat
BioMirror::SwissProtNew = SwissProt New data : SWISSNEW : new_seq.dat
BioMirror::TAIR = TAIR Arabidopsis Genome : TAIR :
BioMirror::Taxonomy_EBI = Taxonomy data from EBI : TAXONOMY : taxonomy.dat
BioMirror::Taxonomy_NCBI = Taxonomy data from NCBI : TAXONOMY_NCBI :
BioMirror::TrEMBL = TrEMBL : TREMBL : trembl.dat
BioMirror::TrEMBLNew = TrEMBL new data : TREMBLNEW : trembl_new.dat
BioMirror::TransfacCell = Transfac Cell : TFCELL : cell.dat
BioMirror::TransfacClass = Transfac Class : TFCLASS : class.dat
BioMirror::TransfacFactor = Transfac Factor : TFFACTOR : factor.dat
BioMirror::TransfacGene = Transfac Gene : TFGENE : gene.dat
BioMirror::TransfacMatrix = Transfac Matrix : TFMATRIX : matrix.dat
BioMirror::TransfacSite = Transfac Site : TFSITE : site.dat
BioMirror::UniGene = UniGene : UNIGENE : (\.data)
BioMirror::UniGeneDR = UniGeneDR : DRUNIGENE : (\.data)
BioMirror::UniGeneMM = UniGeneMM : MMUNIGENE : (\.data)
BioMirror::UniGeneRN = UniGeneRN : RNUNIGENE : (\.data)
BioMirror::ZFIN = ZFIN Zebrafish Genome : ZFIN :

Mirror_class = Name : FTP : Web
BioMirror::ACSys_AU = Bio-Mirror Australia at ACSys : ftp://bio-mirror.au.apan.net/biomirrors/ :
BioMirror::AFFRC_JP = Bio-Mirror Japan at AFFRC : ftp://bio-mirror.jp.apan.net/pub/biomirror/ : http://bio-mirror.jp.apan.net/
BioMirror::IMCAS_CN = Bio-Mirror China at IMCAS : ftp://bio-mirror.cn.apan.net/ : http://bio-mirror.cn.apan.net/
BioMirror::IUBio_USA = Bio-Mirror USA at IUBio : ftp://iubio:iubio@bio-mirror.net/ : http://www.bio-mirror.net/
BioMirror::KAIST_KR = Bio-Mirror South Korea at KAIST : ftp://bio-mirror.kr.apan.net/pub/biomirror : http://bio-mirror.kr.apan.net/
BioMirror::NUS_SG = Bio-Mirror Singapore at NUS : ftp://bio-mirror.sg.apan.net/biomirrors/ : http://bio-mirror.sg.apan.net

Perl documents of BioMirror modules
( BioMirror.pm Data.pm DataBanks.pm Mirror.pm Mirrors.pm SRS.pm DDBJ.pm EBI.pm
  Flybase.pm GenBankNewDaily.pm local-env.pm NCBI.pm Pfam.pm Swiss.pm
  Transfac.pm MeowGenes.pm )

[ BioMirror.pm ]----------------------------------
NAME
    BioMirror - manage biology databanks for Bio-Mirror use

DESCRIPTION
    Manages biology databanks, including transport from Internet source or
    Bio-Mirror archive, and databank search indexing via calls to SRS
    software. See {http,ftp}://bio-mirror.net/ for BioMirror datasets.

    Uses these modules
        BioMirror::Data;
        BioMirror::DataBanks;
        BioMirror::Mirror;
        BioMirror::Mirrors;
        BioMirror::SRS;
        URI::URL;

    Local configuration
        BioMirror packages are loaded by loadPackages(). Put your BioMirror
        package files in ~/biomirror or ~/.biomirror to have them auto-loaded.

AUTHOR
    D.G. Gilbert, Nov. 1999, software@bio.indiana.edu

SOURCE
    http://bio-mirror.net/biomirror/software/biomirror/

METHODS
  BioMirror::loadPackages()
    Utility method to load user-defined BioMirror packages. Loads (via
    require) any .pm or .pl file found in a 'biomirror' directory. Call
    without arguments to initialize from the @INC directory list.

  BioMirror::BEGIN()
    Default settings and unix system commands. Locally configurable via the
    BioMirror/local-env.pm file.

  BioMirror::main()
    Main entry for updating databanks. Call from other scripts, or for
    usage help run
      `perl -MBioMirror -e 'BioMirror::main;''
    Run with arguments as
      `perl -MBioMirror -e 'BioMirror::main;' -- -D -v blocks'

  BioMirror::fetchSrsIcarus($srsurl,$srsdb,$ifile,$outpath)
    Fetch icarus files from SRS server for given databank.
      $srsurl  - SRS server url
      $srsdb   - databank or 'all'
      $ifile   - icarus file (i, it, is) or 'all'
      $outpath - put in this folder
    [ add methods to manage archive of these - diff to compare? ]

  BioMirror::printAllSrsDbInfo()
    Fetch and print information from the SRS databank of databanks about
    each public SRS databank, including documentation, available servers
    and their data release date.

  BioMirror::perldoc()
    Print perl documentation of BioMirror:: modules,
      `perl -MBioMirror -e 'BioMirror::perldoc;''
    or try 'perldoc modulefile'.

  BioMirror::makeSrsdb($outfile)
    Make SRS srsdb.i configuration file for available databanks
      `perl -MBioMirror -e 'BioMirror::makeSrsdb("srsdb-new.i");''
      $outfile -- optional output file (default is SRSDB:srsdb.i)

  BioMirror::printMirrorPacks()
    Print package file for use with mirror.pl. With no argument, use source
    of data, or specify an ftp url. No path on url implies a Bio-Mirror path.
      `perl -MBioMirror -e 'BioMirror::printMirrorPacks;''
      `perl -MBioMirror -e 'BioMirror::printMirrorPacks("ftp://bio-mirror.net/");''
    or call from main
      `perl -MBioMirror -e 'BioMirror::main;' -- -p'

  BioMirror::mirrorDatabank($dbname,$mirrorurl,$dotest,$packfile)
    Mirror a databank package. See BioMirror::DataBanks and
    `BioMirror::main() -p' output packages.
    Requires mirror.pl be installed, and located by $FtpMirror variable.
    Install from ftp://sunsite.org.uk/packages/mirror/mirror.tar.gz
      $dbname    -- name of databank (in BioMirror::DataBanks.pm)
      $mirrorurl -- ftp url for mirroring (default will mirror from
                    databank source)
      $dotest    -- test only (C)
      $packfile  -- optional package file to get $dbname package
    or call from main as
      `perl -MBioMirror -e 'BioMirror::main;' -- -mirror=USA -domirror PIR'
    To test, viewing what would be mirrored:
      `perl -MBioMirror -e 'BioMirror::main;' -- -view -mirror=USA -domirror GenBank'

  BioMirror::printData()
    Print databank class info
      `perl -MBioMirror -e 'BioMirror::printData;''

  BioMirror::printArchiveDbInfo($outfile,$dohtml)
    Print pretty list summarizing local archive of data, including total
    size, update date, and a summary line for each databank.

      Section             Size(Mb)  Update       Databank source
      blast                   2159  29-Nov-1999  Blast DB from NCBI
      blocks/data-blocks         5  25-Nov-1999  BLOCKS from NCBI

      $outfile - optional output file, or UPDATE to update $ArchiveDbInfo

  BioMirror::printMirrors()
    Print mirror class info
      `perl -MBioMirror -e 'BioMirror::printMirrors;''

  BioMirror::getDocFromUrl($surl,$striphtml)
    Fetch and return a document from url. Returns doc as $string.
    HTTP and FTP urls are supported.
    Requires LWP::UserAgent, HTTP::Request.

[ /bio/biomir-work/BioMirror/Data.pm ]----------------------------------
NAME
    BioMirror::Data - databank base class

DESCRIPTION
    Defines common variables and methods for BioMirror databanks. Most
    variables are set by subclasses of this package/class. See package
    BioMirror::DataBanks:: for subclasses.

[ /bio/biomir-work/BioMirror/DataBanks.pm ]----------------------------------
NAME
    BioMirror::DataBanks -- package file to include BioMirror::Data subclasses.

DESCRIPTION
    The package itself does nothing; its file includes all BioMirror::Data
    subclasses. These data classes are loaded by BioMirror::Data:: methods
    (see `sub BioMirror::Data::getSubclasses;' )

    This can be customized locally and included separately at runtime.
    For example,
      `perl -I/pathto/biomirror/configs -MBioMirror -e 'BioMirror::main;''

    See also EBI.pm, NCBI.pm, Pfam.pm, Flybase.pm, Swiss.pm, Transfac.pm and
    other BioMirror::DataBanks perl modules.

    -- drop this in favor of BioMirror::loadPackages() method?

[ /bio/biomir-work/BioMirror/Mirror.pm ]----------------------------------
NAME
    BioMirror::Mirror - mirror server base class

DESCRIPTION
    Defines common variables and methods for BioMirror mirror servers. Most
    variables are set by subclasses of this package/class. See package
    BioMirror::Mirrors:: for subclasses.

[ /bio/biomir-work/BioMirror/Mirrors.pm ]----------------------------------
NAME
    BioMirror::Mirrors -- package file to include BioMirror::Mirror subclasses.

DESCRIPTION
    The package itself does nothing; its file includes all BioMirror::Mirror
    subclasses. These mirror classes are loaded by BioMirror::Mirror::
    methods (see `sub BioMirror::Mirror::getSubclasses;' )

    This can be customized locally and included separately at runtime.
    For example,
      `perl -I/pathto/biomirror/configs -MBioMirror -e 'BioMirror::main;''

[ /bio/biomir-work/BioMirror/SRS.pm ]----------------------------------
NAME
    BioMirror::SRS - SRS Sequence Retrieval System, using perl calls via
    system().

  BioMirror::SRS::init($SRSRoot)
    Initialize SRS environment.
      $SRSRoot is path to SRS installation (level of index/ etc/ bin/ ...)
      $ENV{SRSROOT} overrides $SRSRoot value

  BioMirror::SRS::install()
    install - information for installing SRS software

  BioMirror::SRS::section()
    call srssection - compile SRS icarus (indexing language) files

  BioMirror::SRS::setRelease($dbname, $release)
    set data release number for SRS dataset
      $dbname  - SRS dataset
      $release - value to set

  BioMirror::SRS::info($dbname)
    get info from SRS dataset

  BioMirror::SRS::libraries()
    get SRS data library list

  BioMirror::SRS::allInfo($srsserver,$flags,$refdbarray)
    Get info for all known SRS databanks, including databank name,
    documentation, and available SRS servers with their data release dates.
      $srsserver  - optional remote SRS server with DATABANKS databank;
                    default will try getz query of local DATABANKS databank
      $flags      - | $SRS::kServers  - get only server info
                  - | $SRS::kDocument - get only documentation
      $refdbarray - optional list of SRS DB names to fetch info, as array ref
    Returns hash of hashes, keyed on DB name.
    See also SRS::getSrsServers(), SRS::fetchIcarusDoc()

  BioMirror::SRS::forceindex($turnon)
    Force indexing (true/false) of the SRS index, even when index dates
    don't require it.

  BioMirror::SRS::check($dbname, $dpath, @files)
    Call srscheck, and srsbuild if indicated by srscheck. If data is newer
    than data index, new index will be built. Returns 1 if did update,
    0 otherwise.
      $dbname - srs databank name
      $dpath  - path to data files
      @files  - data files

  BioMirror::SRS::makeSrsdb( $outfile, @dataclasses)
    Make SRSDB:srsdb.i file to reflect available databank.i files.
    Note that appropriate databank.i and .is files should exist in SRSDB
    folder for each @dataclasses element.
      $outfile     -- optional output file (default is SRSDB:srsdb.i)
      @dataclasses -- list of BioMirror::Data classes to include

  BioMirror::SRS::updateDbFileList( $srsdb, $reffilelist, $outfile )
    Edit databank.i file to reflect available data files
      $srsdb       -- name of databank to edit
      $reffilelist -- ref of @list of $srsdb data files to include
      $outfile     -- optional output file (default is SRSDB:$srsdb.i)

  BioMirror::SRS::fetchIcarusDoc($srsurl, $srsdb, $ifile, $outfile)
    Fetch and print Icarus document from SRS server.
    Requires LWP packages.
      $srsurl  -- srs server, such as http://srs.ebi.ac.uk/
      $srsdb   -- srs db name, such as EMBL
      $ifile   -- icarus file = ( i, is, it, db, gen)
      $outfile -- optional output (default to stdout)

[ /bio/biomir-work/BioMirror/DDBJ.pm ]----------------------------------
NAME
    BioMirror::DDBJ -- BioMirror::Data packages for DDBJ databanks

[ /bio/biomir-work/BioMirror/EBI.pm ]----------------------------------
NAME
    BioMirror::EBI -- BioMirror::Data packages for EMBL/EBI databanks

[ /bio/biomir-work/BioMirror/Flybase.pm ]----------------------------------
NAME
    BioMirror::Flybase -- BioMirror::Data packages for Flybase databank

[ /bio/biomir-work/BioMirror/GenBankNewDaily.pm ]----------------------------------
NAME
    BioMirror::GenBankNewDaily -- genbank daily update methods.

DESCRIPTION
    This perl script reads in GenBank nc-daily entries and rebuilds the
    cumulative GenBank update.

AUTHOR
    Tim Cutts, 12th January 1999
    adapted by d.gilbert for use with BioMirror packages

[ /bio/biomir-work/BioMirror/local-env.pm ]----------------------------------
    BioMirror/local-env.pm
    Set local paths and other configuration information for BioMirror
    package here.

[ /bio/biomir-work/BioMirror/NCBI.pm ]----------------------------------
NAME
    BioMirror::NCBI -- BioMirror::Data packages for NCBI databanks

[ /bio/biomir-work/BioMirror/Pfam.pm ]----------------------------------
NAME
    BioMirror::Pfam -- BioMirror::Data packages for Pfam databank

[ /bio/biomir-work/BioMirror/Swiss.pm ]----------------------------------
NAME
    BioMirror::Swiss -- BioMirror::Data packages for SwissProt/ExPASy databanks

[ /bio/biomir-work/BioMirror/Transfac.pm ]----------------------------------
NAME
    BioMirror::Transfac -- BioMirror::Data packages for Transfac databank

[ /bio/biomir-work/BioMirror/MeowGenes.pm ]----------------------------------
NAME
    BioMirror/MeowGenes.pm

DESCRIPTION
    A collection of BioMirror::Data packages for building MeowGenes databank.
    Includes
      BioMirror::TAIR      -- for TAIR Arabidopsis databank
      BioMirror::CElegans  -- AceDB for C. elegans databank
      BioMirror::MGD       -- for MGD Mouse databank
      BioMirror::LocusLink -- for NCBI LocusLink Human databank
    See also BioMirror::FlybaseGenes
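
EXAMPLE (illustrative sketch, not part of the BioMirror distribution)
    The rule given for BioMirror::SRS::check, that a new index is built only
    when the data is newer than its index, can be pictured with a plain
    timestamp comparison. The sketch below is only an assumption about what
    such a test might look like: the real decision is made by srscheck, whose
    criteria are not described here, and the helper name index_is_stale and
    its arguments are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioMirror or SRS function): report whether an
# index file is older than any of its data files, judged by modification time.
# Returns 1 if a rebuild would be indicated, 0 otherwise.
sub index_is_stale {
    my ($index_file, @data_files) = @_;

    # A missing index always needs building.
    return 1 unless -e $index_file;

    # Element 9 of stat() is the last-modification time in epoch seconds.
    my $index_mtime = (stat($index_file))[9];

    for my $data_file (@data_files) {
        next unless -e $data_file;    # ignore data files that are absent
        return 1 if (stat($data_file))[9] > $index_mtime;
    }
    return 0;
}

# Command-line use: first argument is the index file, the rest are data files.
if (@ARGV) {
    my ($index_file, @data_files) = @ARGV;
    print index_is_stale($index_file, @data_files)
        ? "update needed\n"
        : "index is current\n";
}
```

    Saved under any file name and run with an index file followed by one or
    more data files as arguments, the script prints whether an update would
    be indicated. The 1/0 return value mirrors the convention documented for
    BioMirror::SRS::check; the -f option described under Arguments would
    bypass a test of this kind.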