Columbia Genome Center
Department of Medical Informatics
Department of Computer Science



GeneWays Architecture

The general architecture of GeneWays includes eleven major components that are being developed at Columbia University. (A twelfth component, Bio/Spice, is being developed independently by a team led by Dr. Adam Arkin at the University of California, Berkeley.) The Columbia University-developed modules are: a document-collection module, a document-sorting module, a preprocessing/tagging module, a disambiguation module, a synonym/homonym resolution module, a parsing module (GENIES), a relationship-learning module, an interpreter module, an AI curator module, a visualization module, and a statistical data integration module. All modules except the relationship-learning module and the simulation module are linked dynamically into a pipeline for processing HTML documents. The two remaining modules do not participate in the dynamic processing of a document at production time: the relationship-learning module is used for statistical learning of terms and relationships, with the aim of improving the other natural language processing modules, and the simulation module is used standalone.
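The split between pipeline modules and offline modules can be illustrated with a minimal sketch. This is not the GeneWays implementation. The names `Document`, `make_stage`, and `build_pipeline` are hypothetical, each stage is a placeholder that only records that it ran, and the stage order simply follows the order in which the modules are listed above, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container for one HTML document moving through the pipeline."""
    html: str
    trace: List[str] = field(default_factory=list)  # names of stages already applied


Stage = Callable[[Document], Document]

# Modules linked into the production pipeline.
# Assumption: the order mirrors the order of the list in the text.
PIPELINE_MODULES = [
    "document-collection",
    "document-sorting",
    "preprocessing/tagging",
    "disambiguation",
    "synonym/homonym resolution",
    "parsing (GENIES)",
    "interpreter",
    "AI curator",
    "visualization",
    "statistical data integration",
]

# Modules that do not take part in production-time processing of a document.
OFFLINE_MODULES = ["relationship-learning", "simulation"]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that only records its own name."""
    def stage(doc: Document) -> Document:
        doc.trace.append(name)
        return doc
    return stage


def build_pipeline(names: List[str]) -> Stage:
    """Chain the named stages so each one receives the previous stage's output."""
    stages = [make_stage(n) for n in names]

    def run(doc: Document) -> Document:
        for stage in stages:
            doc = stage(doc)
        return doc

    return run


run_pipeline = build_pipeline(PIPELINE_MODULES)
result = run_pipeline(Document(html="<html><body>sample article</body></html>"))
print(result.trace)
```

Building the pipeline from a list of names keeps the chain easy to reconfigure, which is one plausible reading of "linked dynamically"; the offline modules are simply never added to the chain.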