URL: http://github.com/gsautter/idaho-core

License: BSD derivative

Dependencies:

- mail.jar (by Sun/Oracle)

- servlet-api.jar (by Sun/Oracle)

Builds:

- StringUtils.jar (utilities for working with character strings; Unicode-to-ASCII conversion (accent stripping); CSV handling; generation and editing of regular expression patterns; basic Information Retrieval (IR) scoring functionality)

- HtmlXmlUtil.jar (HTML and XML parser with error correction facilities, configurable via custom grammar objects; XPath implementation with an extensible function library; template-based HTML page generation; programmatic sending of multipart HTTP POST requests (including file uploads); thread-safe XSLT with transformer chaining facilities)

- EasyIO.jar (lightweight, cross-DBMS, differential relational data model management; simplified representation of SQL query results; streaming parser for SQL dumps; Linux/Apache-style text-based configuration file handling; infrastructure for modular help in desktop applications; look-ahead byte and character streams; JSON parser; basic infrastructure for Servlet-based web applications, including centralized, extensible authentication facilities, a disc-cache-enabled receiver for multipart HTTP POST requests (including file uploads), hot Servlet re-initialization, registry-based direct Servlet-to-Servlet communication, and reCAPTCHA-based web-bot protection)

- Gamta.jar (editable token stream and XPointer-based XML document representation for Natural Language Processing (NLP) and IR applications (GAMTA), including a default / reference implementation; XPath engine working on GAMTA documents, with an extensible function library; markup scripting language for GAMTA documents; basic abstract Java classes for convenient implementation of Java-coded document analyses, including the corresponding class loading facilities; a multitude of basic tagging / data extraction facilities for GAMTA documents, including gazetteers, regular expression patterns, and patterns over existing markup; Java Swing components for visualization; GAMTA document wrappers translating to other data representations used in NLP)
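The accent stripping listed for StringUtils.jar can be approximated with the JDK alone. The following is a minimal sketch, not the StringUtils API: the class name AccentStripper and the method stripAccents are hypothetical. It converts text to Unicode canonical decomposition (NFD) with java.text.Normalizer and then deletes the resulting combining marks (general category Mn).

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of accent stripping (not the StringUtils API).
 * Assumes that removing combining marks after NFD decomposition is sufficient;
 * characters without a decomposition (e.g. U+00DF, U+00F8) are left as-is.
 */
final class AccentStripper {

    // Unicode general category Mn: non-spacing combining marks such as U+0301 (combining acute accent)
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{Mn}+");

    private AccentStripper() {
    }

    static String stripAccents(String text) {
        // NFD splits a precomposed letter like U+00E9 into 'e' followed by U+0301
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Deleting the combining marks leaves only the base letters
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Accented input written with Unicode escapes so the source file stays ASCII-only
        String input = "Cr\u00e8me br\u00fbl\u00e9e \u00e0 S\u00e3o Paulo";
        System.out.println(stripAccents(input)); // prints "Creme brulee a Sao Paulo"
    }
}
```

Characters without a canonical decomposition, such as the German sharp s or the Scandinavian slashed o, pass through this sketch unchanged, so a complete Unicode-to-ASCII conversion additionally needs an explicit mapping table for such characters.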
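The thread-safe XSLT with transformer chaining listed for HtmlXmlUtil.jar can rest on a standard JAXP guarantee: a compiled javax.xml.transform.Templates object may be shared across threads, whereas a Transformer instance must not be used by several threads concurrently. The sketch below assumes nothing about the actual HtmlXmlUtil classes (SharedStylesheet and chain are hypothetical names). It compiles each stylesheet once, creates a fresh Transformer per call, and chains stages by feeding one stage's serialized output into the next.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Hypothetical sketch (not the HtmlXmlUtil API) of thread-safe XSLT:
 * the compiled Templates object is shared, Transformers are created per call.
 */
final class SharedStylesheet {

    private final Templates templates;

    SharedStylesheet(String xslt) {
        try {
            // Compile once; JAXP specifies Templates as safe for concurrent use
            templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Invalid stylesheet", e);
        }
    }

    /** Safe to call from several threads at once: each call uses its own Transformer. */
    String transform(String xml) {
        StringWriter out = new StringWriter();
        try {
            templates.newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
        return out.toString();
    }

    /** Chains stages by re-parsing each stage's serialized output as the next stage's input. */
    static String chain(String xml, SharedStylesheet... stages) {
        String current = xml;
        for (SharedStylesheet stage : stages) {
            current = stage.transform(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";
        // Stage 1: rename <person> to <name>
        SharedStylesheet rename = new SharedStylesheet(xsl
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + "<xsl:template match=\"/person\"><name><xsl:value-of select=\".\"/></name></xsl:template>"
                + "</xsl:stylesheet>");
        // Stage 2: render a greeting as plain text
        SharedStylesheet greet = new SharedStylesheet(xsl
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">Hello, <xsl:value-of select=\"/name\"/>!</xsl:template>"
                + "</xsl:stylesheet>");
        // Both compiled stylesheets are shared by the worker threads of a parallel stream;
        // the three greetings may print in any order
        List.of("Ada", "Alan", "Grace").parallelStream()
                .map(n -> chain("<person>" + n + "</person>", rename, greet))
                .forEach(System.out::println);
    }
}
```

Serializing between stages keeps the sketch short; a SAX pipeline built from TransformerHandler instances (obtained via SAXTransformerFactory.newTransformerHandler) would avoid re-parsing the intermediate results.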