OSDM'09: The first Open Source in Data Mining workshop
Bangkok, Thailand, 27 April 2009
To be held with the 13th Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD'09).
Time and venue:
The Imperial Queen's Park Hotel, Bangkok,
Room #3, 13:00 - 18:30.
Registration: Please use the
PAKDD'09 Registration System.
Links: Call for Papers - Submission System - Available Tools
Proceedings: Complete proceedings are available as
one PDF file (1.2 MBytes).
Open source software is
becoming increasingly accepted in public and private sector
organisations in many countries. There is a variety of open source
data mining tools available to both researchers and practitioners,
some being simple research prototypes while others are fully developed
software tools.
This workshop aims to bring together data mining practitioners,
researchers and educators, with the objectives of presenting open source
data mining tools, discussing experiences and lessons learned using such
tools, and exchanging ideas on how to promote the use of open source
tools in the field of data mining.
Workshop Program
- 13:00-13:10: Opening and Welcome.
- 13:10-14:10: Invited keynote presentation by
Dr Mark Hall,
Pentaho, New Zealand, one
of the original core developers of the
WEKA
open source data mining system.
Title: The WEKA open source data mining system
Abstract:
More than twelve years have elapsed since the first public
release of WEKA. In that time, the software has been re-written
entirely from scratch, evolved substantially and now
accompanies a text on data mining. These days, WEKA enjoys
widespread acceptance in both academia and business, has an
active community, and has been downloaded more than 1.4 million
times since being placed on SourceForge in April 2000. In this
talk, I'll review the history of WEKA, and, in light of the
recent 3.6.0 stable release, briefly discuss what has been
added since the last stable version in 2004.
Biography:
Mark Hall is one of the original core developers of the WEKA
data mining software and is responsible for leading Pentaho's
data mining solutions. He has 15 years of experience as an
academic researcher in computer science and has published
extensively in machine learning and data mining conferences
and journals. Prior to joining Pentaho, Mark held teaching and
postdoctoral fellowship positions at the University of Waikato
in New Zealand.
- 14:10-15:10: Technical papers (part 1)
- OpenSubspace: An Open Source Framework for Evaluation
and Exploration of Subspace Clustering Algorithms in
WEKA.
Emmanuel Mueller, Ira Assent, Stephan Guennemann, Timm
Jansen, Thomas Seidl.
Software is available online.
- The open source library iZi for pattern mining
problems.
Frederic Flouvat, Fabien De Marchi, Jean-Marc Petit.
- 15:10-15:30: Afternoon tea/coffee break
- 15:30-16:30: Technical papers (part 2)
- The Konstanz Information Miner 2.0.
Thorsten Meinl, Nicolas Cebron, Thomas Gabriel, Kilian
Thiel, Bernd Wiswedel, Michael Berthold, Fabian Dill,
Peter Ohl, Tobias Koetter.
- Cougar2: An Open Source Machine Learning and Data
Mining Development Platform.
Abraham Bagherjeiran, Oner Celepcikay, Rachsuda
Jiamthapthaksin, Chunsheng Chen, Vadeerat Rinsurongkawong,
Seungchan Lee, Justin Thomas, Christoph Eick.
- 16:30-18:00: Demonstration session of open
source data mining tools
The following tools will be demonstrated (30 minutes each):
- KNIME
- WEKA
- Rattle
- 18:00-18:30: Panel discussion of open source
developers
The topic will be one (or both) of:
- Why open source for data mining research and education?
- Why publish your data mining tool as open source software?
This will be an informal session where workshop participants
are encouraged to participate.
Confirmed panel participants:
Important Dates
| Submission of full papers: | 9 January 2009 (passed) |
| Notification of Authors: | 30 January 2009 (passed) |
| Camera-ready version: | 16 February 2009 (passed) |
| OSDM'09 workshop date: | 27 April 2009, 13:00-18:30 |
Workshop Chairs
Program Committee
- Dr Rohan Baxter, The Australian Taxation Office, Australia
- Prof Michael Berthold, University of Konstanz, Germany
- Dr Christian Borgelt, European Center for Soft Computing, Spain
- Assist Prof Janez Demsar, University of Ljubljana, Slovenia
- Assoc Prof Eibe Frank, University of Waikato, New Zealand
- Dr Mark Hall, Pentaho, New Zealand
- Prof Joshua Huang, The University of Hong Kong, Hong Kong
- Assoc Prof Bernhard Pfahringer, University of Waikato, New Zealand
- Assoc Prof Blaz Zupan, University of Ljubljana, Slovenia
- Dr Yunming Ye, Harbin Institute of Technology, China
For further information please contact the organisers through:
OSDM09@togaware.com.
Open Source Data Mining Tools
- KNIME: A modular data
exploration and mining platform.
- Orange: A
Python-based data mining toolkit.
- Rattle: A data mining GUI written in the statistical software
environment R and available as a plugin for the WebFocus BI product.
- WEKA: A
comprehensive data mining system written in Java and sponsored by
the open-source BI software company Pentaho.
- OpenSubspace:
A subspace clustering extension to WEKA.
- Febrl
(Freely Extensible Biomedical Record Linkage): A data cleaning,
deduplication and data linkage system.
Last modified: Wed May 6 11:13:06 EST 2009