Features Extraction and Analysis of Time-Series - People at Gna!: View a Job

 
 

Developer wanted for Features Extraction and Analysis of Time-Series

Submitted By: bernardh
Date: Fri 24 Sep 2004 09:04:38 AM UTC
Status: Open

FEATS intends to be a state-of-the-art implementation of time-series data-mining algorithms in C++. The planned work, in order, is:
- Segmentation
- Symbolic Representation
- Clustering
- Indexing
Bindings for other languages (Python and R) are also planned.
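To make the first item concrete, here is a minimal sketch of one simple segmentation technique, Piecewise Aggregate Approximation (PAA), which replaces each equal-length frame of a series by its mean. The posting does not say which segmentation algorithms FEATS will implement; PAA and the function name `paa` are illustrative assumptions, and the sketch assumes the series length is a multiple of the number of segments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of FEATS.
// Piecewise Aggregate Approximation: split `series` into `segments`
// equal-length frames and replace each frame by its mean.
// Simplifying assumption: series.size() is a multiple of `segments`.
std::vector<double> paa(const std::vector<double>& series, std::size_t segments) {
    assert(segments > 0 && series.size() >= segments && series.size() % segments == 0);
    const std::size_t frame = series.size() / segments;  // points per segment
    std::vector<double> out;
    out.reserve(segments);
    for (std::size_t s = 0; s < segments; ++s) {
        double sum = 0.0;
        for (std::size_t i = s * frame; i < (s + 1) * frame; ++i) {
            sum += series[i];
        }
        out.push_back(sum / static_cast<double>(frame));  // mean of this frame
    }
    return out;
}
```

For example, `paa({1.0, 3.0, 2.0, 4.0, 5.0, 7.0}, 3)` yields the frame means 2, 3 and 6. PAA is also the first step of SAX, a well-known symbolic representation.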

It is intended as a contribution to the UCR Time Series Data Mining Archive [1]. I will try to test the library on the archive's data sets and to implement the algorithms described there (after implementing my own algorithms, of course :-) ).

The major goals of FEATS are:
- flexibility
- efficiency

Flexibility, because it must be a useful foundation for research scientists in time-series data mining.
Efficiency, because it must also be useful on real-world data sets. For now, the goal is to handle time series that fit in main memory (1 GB). Database connectivity for truly huge data sets will be considered later. For research scientists, it should be usable as a benchmarking tool (hence the need for open source code, to fight implementation bias).

The means to these ends are:
1) static polymorphism (aka template metaprogramming)
Following the principles of "Generative Programming", we will make all compile-time knowledge available to the compiler. For instance, heavy use of Boost::mpl or Boost::fusion is to be expected.
Coding conventions (such as functions taking a single argument) will be used to enable better metaprogramming (i.e., forwarding, code weaving).

2) relevant encapsulation
Computations can often be eliminated by reusing intermediate results, for example when computing a segmentation with n+1 segments after one with n segments. This will be achieved by using functors instead of functions, so that the relevant data can be stored and reused.
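Point 1) can be illustrated without Boost by policy-based design, a common form of static polymorphism: the distance metric is a template parameter, so it is resolved at compile time and its calls can be inlined, with no virtual dispatch. This is a minimal sketch only; the policy names `Euclidean` and `Manhattan` and the `series_distance` template are hypothetical, not part of FEATS, and a real implementation would likely lean on Boost::mpl or Boost::fusion as described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical distance policies: each exposes static accumulate/finish
// functions, so the metric is chosen at compile time (no virtual calls).
struct Euclidean {
    static double accumulate(double acc, double a, double b) { return acc + (a - b) * (a - b); }
    static double finish(double acc) { return std::sqrt(acc); }
};

struct Manhattan {
    static double accumulate(double acc, double a, double b) { return acc + std::fabs(a - b); }
    static double finish(double acc) { return acc; }
};

// The metric is a template parameter: the compiler sees the concrete policy
// and can inline its calls into the loop.
template <typename Metric>
double series_distance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc = Metric::accumulate(acc, x[i], y[i]);
    }
    return Metric::finish(acc);
}
```

`series_distance<Euclidean>(x, y)` and `series_distance<Manhattan>(x, y)` are compiled into two separate, specialized loops; adding a new metric only requires a new policy struct.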
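Point 2) can be sketched with a functor that precomputes prefix sums of the series and of its squares, so that the squared error of approximating any range by its mean is computed in constant time. A greedy top-down (binary splitting) step then goes from n to n+1 segments by trying every split inside every existing segment, reusing the same stored sums instead of rescanning the data. The names `SegmentCost` and `add_segment`, and the choice of a greedy rather than an optimal segmentation, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor (not FEATS code): stores prefix sums of x and x^2,
// built once in O(N), so the cost of any range [b, e) is O(1).
class SegmentCost {
public:
    explicit SegmentCost(const std::vector<double>& x)
        : s_(x.size() + 1, 0.0), s2_(x.size() + 1, 0.0) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            s_[i + 1] = s_[i] + x[i];
            s2_[i + 1] = s2_[i] + x[i] * x[i];
        }
    }
    // Sum of squared deviations from the mean over [b, e).
    double operator()(std::size_t b, std::size_t e) const {
        assert(b < e && e < s_.size());
        const double n = static_cast<double>(e - b);
        const double sum = s_[e] - s_[b];
        return (s2_[e] - s2_[b]) - sum * sum / n;
    }
    std::size_t size() const { return s_.size() - 1; }

private:
    std::vector<double> s_, s2_;  // prefix sums of x and x^2
};

// Greedy top-down step: given the boundaries {0, ..., N} of an n-segment
// segmentation, insert the single split that most reduces the total error,
// giving n+1 segments. Returns the boundaries unchanged if no split helps.
std::vector<std::size_t> add_segment(const SegmentCost& cost, std::vector<std::size_t> bounds) {
    double best_gain = 0.0;
    std::size_t best_pos = 0, best_idx = 0;
    for (std::size_t k = 0; k + 1 < bounds.size(); ++k) {
        const std::size_t b = bounds[k], e = bounds[k + 1];
        const double whole = cost(b, e);
        for (std::size_t m = b + 1; m < e; ++m) {
            const double gain = whole - cost(b, m) - cost(m, e);
            if (gain > best_gain) {
                best_gain = gain;
                best_pos = m;
                best_idx = k + 1;
            }
        }
    }
    if (best_pos != 0) {
        bounds.insert(bounds.begin() + static_cast<std::ptrdiff_t>(best_idx), best_pos);
    }
    return bounds;
}
```

Starting from the boundaries `{0, N}` and calling `add_segment` repeatedly yields segmentations with 2, 3, ... segments, all served by one `SegmentCost` object built once.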

References:
[1] Keogh, E. & Folias, T. (2002). The UCR Time Series Data Mining Archive. Riverside, CA: University of California, Computer Science & Engineering Department. http://www.cs.ucr.edu/~eamonn/TSDMA/index.html

License: GNU General Public License v2 or later
Development Status: 4 - Beta

Details (job description, contact ...):

Implementing state-of-the-art segmentation and clustering algorithms in C++.
Using generative programming with the Boost::MPL and forthcoming Boost::fusion libraries to create benchmark-quality implementations.
Experience in profiling and/or numerical code is also welcome!
Use the mailing lists to contact me.

Required Skills:

Skill   Level    Experience
C++     Master   < 6 months