The general architecture of GeneWays includes
eleven major components that are being developed
at Columbia University. (The twelfth component,
Bio/Spice, is being independently developed
by a team led by Dr. Adam Arkin at the University
of California, Berkeley.) The Columbia
University-developed modules include: a document-collection
module, a document-sorting module, a preprocessing/tagging
module, a disambiguation module, a synonym/homonym
resolution module, a parsing module (GENIES),
a relationship-learning module, an
interpreter module, an AI curator module,
a visualization module, and a statistical
data integration module. All modules except
the relationship-learning module and the simulation
module are linked dynamically into a pipeline
for processing HTML documents. These two remaining
modules do not participate in the dynamic processing
of a document at production time; rather,
the relationship-learning module is used for
statistical learning of terms and relationships,
with the aim of improving the other natural
language processing modules, while the simulation
module is used standalone.
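
The split between pipeline modules and offline modules can be
illustrated with a minimal Python sketch. It is not the GeneWays
implementation: the names Document, make_stage, and run_pipeline are
hypothetical, each stage is a placeholder that only records that it
ran, and the execution order is assumed to follow the order in which
the modules are listed above, which the text does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container for one HTML document moving through the pipeline."""
    html: str
    trace: List[str] = field(default_factory=list)  # names of stages that have run


Stage = Callable[[Document], Document]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage; a real module would transform the document."""
    def stage(doc: Document) -> Document:
        doc.trace.append(name)
        return doc
    return stage


# Modules that the text says are linked dynamically into the pipeline.
# ASSUMPTION: the order is simply the order of the list in the text.
PIPELINE_STAGE_NAMES = [
    "document-collection",
    "document-sorting",
    "preprocessing/tagging",
    "disambiguation",
    "synonym/homonym resolution",
    "parsing (GENIES)",
    "interpreter",
    "AI curator",
    "visualization",
    "statistical data integration",
]

# Modules that the text says do not take part in production-time processing.
OFFLINE_MODULES = ["relationship-learning", "simulation"]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in sequence."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    stages = [make_stage(name) for name in PIPELINE_STAGE_NAMES]
    result = run_pipeline(Document(html="<html></html>"), stages)
    print(result.trace)
```

In this sketch the two offline modules appear only as a separate
list and are never turned into stages, mirroring the statement that
they do not process documents at production time.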