geistlogistic

Information require attention

Posts Tagged ‘Machine learning’

Mendeley is recruiting a Marie Curie Senior Research Fellow

Mendeley is recruiting a Marie Curie Senior Research Fellow. Your primary responsibility will be to ensure that Mendeley’s research catalogue (i.e. collection of articles) is of high quality. Mendeley has crowdsourced the world’s largest research catalogue with over 50 million unique articles contributed by almost two million users over a period of four years. With your expert knowledge in data technologies and algorithms, you will take ownership of this catalogue, and work on innovative techniques for improving its quality. Your work should result in a cleaner, better structured and more scalable catalogue.

This position is part of the TEAM project (http://team-project.tugraz.at). You will spend 1 year in Mendeley’s London office before spending 1 year at TU Graz, the Knowledge Management Institute (http://kmi.tugraz.at/), Austria, collaborating with a top-class team. You will be passionate about working with large scale data collections and take pride in producing high quality data.

Description

Responsibilities

Ensure that the research catalogue is of high quality
Understand, maintain and help develop the current crowdsourcing system
Disseminate results from your work both internally and externally

What you’ll be doing

Crowdsourcing a homogeneous catalogue from heterogeneous data sources, using modern data techniques
Identifying data sources, judging their appropriateness and working with data engineers to import them into the catalogue
Working with Data Engineers and the Platform Team to build reliable, scalable systems
Working with the Data Architect to ensure coherent data mapping, ontologies and schemas
Working with Mendeley’s Chief Scientist in contributing to solving data problems outside of the scope of catalogue crowdsourcing
Working 1 year from Mendeley’s London office, followed by 1 year in TU Graz before returning to London, with regular travel between both locations

What you should bring

PhD in the field of Computer Science or 4-10 years of full-time research (following first publication)
Expert knowledge of text and document processing, with strong machine learning background
Experience working with large-scale catalogues
Database integration experience
2+ years of Java programming; can independently prototype solutions to problems
Experience with big data technologies (e.g. Hadoop, MapReduce, NoSQL)
Unix skills, preferably Linux
Fluent spoken and written English
Strong presentation skills in communicating with experts and novices

What we offer

Salary of £50k per annum + stock options
No out-of-hours support expected
25 days’ holiday
Company benefits such as: cycle to work scheme, childcare vouchers, BUPA (private healthcare), Friday beer o’clocks (snacks and drinks on the house), free breakfast, monthly team nights out, annual events (Christmas party and summer barbecue)
Working in a great environment in a central London office with roof terrace

TEAM-specific restrictions

Nationality: The researcher may be a national of a Member State of the Union, of an Associated Country or of any other third country
Mobility: At the time of selection, the researcher must not have resided or carried out his/her main activity in the country of the beneficiary home organisation for more than 12 months in the 3 years immediately prior to his/her selection under the project.
International European interest organisations or international organisations: The appointed researcher must not have spent more than 12 months in the 3 years immediately prior to the selection by the home organisation in the same appointing organisation.

If you are interested, send your CV and cover letter to jobs [at] mendeley [dot] com. If you are selected for an interview, we will let you know within two weeks.

Training sets, Test set, Measurements, Classifier, Models and how to get a clue out of it …

Currently, I’m attending a Data Mining course at the University of Maastricht. I’m quite surprised by how much the techniques applied in Data Mining overlap with those used in Recommender System research. For instance, Data Mining classifiers are also evaluated on measures such as accuracy, precision, and recall, which are computed from a confusion matrix, and the results are finally plotted as a ROC curve. During the course we are using the open source software Weka to train various classifiers on different training sets. In the next step, these trained classifiers are applied to a test set to compare their accuracy. The most reliable classifier can then be further optimized.
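To make the evaluation step concrete, here is a minimal sketch in plain Python (not Weka) of how accuracy, precision, and recall fall out of a confusion matrix, and how two classifiers can be compared on the same test set. The labels, the predictions, and the names `classifier_a` and `classifier_b` are made up for illustration; they are not results from the course.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive):
    """Return (TP, FP, FN, TN) counts for one positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]

def evaluate(actual, predicted, positive):
    """Derive accuracy, precision and recall from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up test-set labels and the predictions of two hypothetical classifiers.
actual       = ["yes", "yes", "yes", "no", "no", "no",  "no", "yes"]
classifier_a = ["yes", "yes", "no",  "no", "no", "yes", "no", "yes"]
classifier_b = ["yes", "yes", "no",  "no", "no", "no",  "no", "yes"]

results = {
    "A": evaluate(actual, classifier_a, positive="yes"),
    "B": evaluate(actual, classifier_b, positive="yes"),
}

# Pick the classifier with the highest accuracy on the test set.
best = max(results, key=lambda name: results[name]["accuracy"])

for name, scores in results.items():
    print(name, scores)
print("most accurate:", best)
```

Accuracy alone can be misleading when one class is rare, which is one reason precision and recall are usually reported alongside it.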

Data mining techniques have become more and more popular in Technology Enhanced Learning since 2004. This field, also called Educational Data Mining, can be applied to student grouping, task analysis, and supporting teachers or learners. Marco and I thought about training a classifier on behavioral data from Moodle courses from 2007, with a specific focus on students who dropped out. The most accurate classifiers could then be used to monitor courses in 2009. If the classifier identifies similar patterns of behavior among students in the present course, it could inform the teacher. Maybe the students’ dropout can be prevented by giving them special attention or by explicitly connecting them with students who perform more successfully.
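The idea of learning from a past course and flagging students in a current one could look roughly like the sketch below. It uses a simple k-nearest-neighbour vote in plain Python as a stand-in for whichever classifier turns out to be most accurate. Everything here is hypothetical: the three features (logins per week, forum posts, assignments submitted), the student names, and all the numbers are invented, not real Moodle data.

```python
import math

def nearest_label(train, features, k=3):
    """Majority vote among the k nearest training examples (Euclidean distance)."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], features))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical 2007 training data:
# (logins per week, forum posts, assignments submitted) -> outcome
course_2007 = [
    ((5.0, 4, 6), "completed"),
    ((4.0, 2, 5), "completed"),
    ((6.0, 7, 6), "completed"),
    ((1.0, 0, 1), "dropped"),
    ((0.5, 0, 0), "dropped"),
    ((1.5, 1, 2), "dropped"),
]

# Hypothetical behaviour of students in the current (2009) course.
course_2009 = {
    "student_a": (4.5, 3, 5),
    "student_b": (1.0, 0, 2),
}

# Students whose behaviour resembles that of earlier drop-outs.
at_risk = [
    name
    for name, features in course_2009.items()
    if nearest_label(course_2007, features) == "dropped"
]
print("inform the teacher about:", at_risk)
```

In a real setting the features would need to be normalized so that no single one dominates the distance, and the classifier would first be validated on held-out 2007 data before being trusted to monitor a live course.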

The following figure describes the typical Knowledge Discovery Process for Data Mining as defined by Usama Fayyad in his 1996 article “From Data Mining to Knowledge Discovery in Databases”.

Knowledge Discovery Process
