University of Trento
Machine Learning and Intelligent Optimization
From big data to knowledge to actionable insight
Realizing the vision of extreme automation.
On the shoulders of Galileo Galilei and Leonardo Da Vinci.
For more effective and affordable healthcare.
From consultancy to turnkey applications.
Interactive visualization for personal and intelligent choice of medical treatment.
The number needed to treat (NNT) is an epidemiological measure used to communicate the effectiveness of a health care intervention, typically a treatment with medication. The NNT is the average number of patients who need to be treated to prevent one additional bad outcome (e.g., the number of patients that need to be treated for one of them to benefit compared with a control in a clinical trial). The higher the NNT, the less effective the treatment.
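The NNT is the reciprocal of the absolute risk reduction (ARR), i.e. the difference between the bad-outcome rate in the control group and in the treated group. The minimal sketch below shows the arithmetic; the trial figures are hypothetical and chosen only for illustration.

```python
def number_needed_to_treat(control_event_rate: float, treated_event_rate: float) -> float:
    """NNT = 1 / ARR, where ARR is the absolute risk reduction."""
    arr = control_event_rate - treated_event_rate  # absolute risk reduction
    if arr <= 0:
        raise ValueError("the treatment does not reduce risk: NNT is not defined here")
    return 1.0 / arr

# Hypothetical trial: 20% of control patients and 15% of treated patients
# experience the bad outcome.
nnt = number_needed_to_treat(0.20, 0.15)
print(f"NNT = {nnt:.1f}")  # prints "NNT = 20.0"
```

An ARR of 5 percentage points means that, on average, 20 patients must be treated for one of them to benefit.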
We demonstrate that a Machine Learning approach is superior to conventional statistical methods for the detection, monitoring and management of Parkinson's disease. Despite the very sparse data available for this specific Parkinson's diagnosis problem, we can predict incidence and monitor progression of the disease with approximately 100% accuracy on the competition data. In addition to producing accurate detection, a machine learning approach paves the way for disruptive innovation in the monitoring and management of the disease.
Winning submission to the Parkinson Data Challenge (PDF). A demo with cellular phones can be organized on demand.
One starts from data about patients who may or may not have heart disease (disease yes or no is the categorical output variable) and wants to predict the probability that a new patient has the disease. A logistic function is used to transform the output of a linear predictor into a value between zero and one, which can be interpreted as a probability.
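A minimal sketch of this idea, assuming numeric patient features that are already scaled: a linear predictor is passed through the logistic function, and the weights are fitted by plain batch gradient descent on the log-loss. The toy data, the feature meanings and the learning-rate settings are hypothetical; a real study would use a tested library and proper validation.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Probability of disease = sigmoid of a linear predictor."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss w.r.t. the linear predictor for one patient.
            err = predict_probability(xi, weights, bias) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: two scaled features per patient (e.g. age and cholesterol);
# 1 = heart disease, 0 = no heart disease.
X = [[0.2, 0.1], [0.3, 0.4], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
weights, bias = fit_logistic(X, y)
print(predict_probability([0.85, 0.8], weights, bias))   # new patient resembling the positive cases
print(predict_probability([0.25, 0.25], weights, bias))  # new patient resembling the negative cases
```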
Huge amounts of data are produced during business operations. The LIONlab develops methods and tools to mine this treasure and extract actionable insight. Applications are far-reaching, ranging from marketing and e-commerce to bioinformatics, healthcare and social networks. Once models are available through "learning from data" methods, automated improvement engines (optimization) can be run to obtain better and better solutions.
Reactive Search advocates the integration of sub-symbolic machine learning techniques into local search heuristics for solving complex optimization problems. The word reactive hints at a ready response to events during the search through an internal online feedback loop for the self-tuning of critical parameters. Methodologies of interest for Reactive Search include machine learning and statistics, in particular reinforcement learning, active or query learning, neural networks, and meta-heuristics (although the boundary signalled by the "meta" prefix is not always clear).
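The feedback loop can be illustrated with a simplified sketch in the spirit of reactive tabu search: a local search over bit strings whose critical parameter, the prohibition period, grows when a configuration is revisited (a sign of cycling) and shrinks after a period without repetitions. The specific reaction rules (+1/-1 steps, a window of 2·n_bits iterations) and the toy objective are illustrative assumptions, not the published Reactive Search rules.

```python
import random
from typing import Callable, List, Tuple

def reactive_tabu_search(f: Callable[[List[int]], float], n_bits: int,
                         max_iters: int = 300, seed: int = 0) -> Tuple[List[int], float]:
    """Minimize f over bit strings of length n_bits (n_bits >= 3) with a
    prohibition period that is tuned online (simplified illustration)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    best, best_val = x[:], f(x)
    prohibition = 1                    # the critical parameter, self-tuned during search
    last_flip = [-n_bits] * n_bits     # iteration at which each bit was last flipped
    seen = set()                       # configurations visited so far
    last_reaction = 0                  # iteration of the last parameter change
    for t in range(max_iters):
        key = tuple(x)
        if key in seen:
            # Feedback: a repeated configuration suggests cycling, so prohibit more.
            prohibition = min(n_bits - 2, prohibition + 1)
            last_reaction = t
        elif t - last_reaction > 2 * n_bits:
            # No repetitions for a while: relax the prohibition again.
            prohibition = max(1, prohibition - 1)
            last_reaction = t
        seen.add(key)
        # A bit may be flipped only if it was not flipped in the last `prohibition` steps;
        # since prohibition <= n_bits - 2, at least two moves are always allowed.
        allowed = [i for i in range(n_bits) if t - last_flip[i] > prohibition]

        def value_after_flip(i: int) -> float:
            x[i] ^= 1
            v = f(x)
            x[i] ^= 1
            return v

        # Take the best allowed move, even if it worsens f (helps escape local minima).
        move = min(allowed, key=value_after_flip)
        x[move] ^= 1
        last_flip[move] = t
        val = f(x)
        if val < best_val:
            best, best_val = x[:], val
    return best, best_val

# Hypothetical toy objective: prefer bit strings with exactly three ones.
def toy_objective(bits: List[int]) -> int:
    return (sum(bits) - 3) ** 2

best, best_val = reactive_tabu_search(toy_objective, n_bits=8)
print(best_val)  # prints 0
```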
The increasing availability of huge amounts of data in machine readable format from sources as diverse as databases of chemical compounds, DNA and protein sequences and structures, tagged bookmarks, digital libraries, images, web pages and blogs represents an unprecedented opportunity as well as a formidable challenge for machine learning systems. Such a complex body of information calls for the most recent advances in machine learning research in order to scale to large datasets, deal with complex structured data both in input and output, and jointly solve multiple related tasks, as well as learn models able to transfer knowledge among similar tasks. Models able to provide interpretable explanations for their decisions are especially appealing to domain experts. Our research is mainly focused on kernel machine algorithms for structured data, multitask learning and statistical relational methods.
Computational molecular biology is a hot research area and a continuous source of relevant and challenging problems for machine learning. Structural bioinformatics aims at predicting the three-dimensional structure of macromolecules such as proteins and RNA, given their sequence of residues or nucleotides. Given its intrinsic complexity, the problem has been addressed by tackling a number of related sub-tasks, such as secondary structure, contact map or disulphide bridge prediction. Being able to effectively solve such sub-tasks and combine their outputs into a reliable 3D structure predictor is one of the greatest challenges in bioinformatics. The activity of living cells involves a huge number of interactions among their components, which can be represented as regulatory, metabolic and interaction networks whose structure is mostly unknown. Machine learning techniques need to combine heterogeneous and noisy sources of information from evolutionary, similarity and experimental data in order to help discover such relational structures.
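To make the contact map sub-task concrete: a contact map is a binary, symmetric matrix marking which pairs of residues are spatially close in the folded structure. The sketch below computes the target map from known coordinates, assuming the common convention of C-alpha atoms and an 8 Å distance threshold; both are assumptions here, and practical definitions often also ignore residues that are very close in sequence. A predictor would learn to estimate this matrix from sequence alone.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def contact_map(ca_coords: Sequence[Coord], threshold: float = 8.0) -> List[List[int]]:
    """Binary, symmetric contact map: 1 if residues i != j are closer than `threshold` Å."""
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical C-alpha coordinates of four residues on a straight line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(coords):
    print(row)
```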
Start date: Oct 2011; end date: Oct 2013
The field of machine learning is in the midst of a "relational revolution." After many decades of focusing on independent and identically distributed (iid) examples, some researchers are now studying problems in which the examples are linked together into a complex network. The World Wide Web, research papers with their citations, and relational databases are notable examples of such networks, but in principle any learning problem can be formulated in the framework of networked data, since the relationships amongst the examples can be induced by unsupervised learning. Whereas it is quite straightforward to formulate learning in networked data when symbolic variables are involved, most approaches to learning in the continuum setting do not consider relationships amongst the examples. Our basic assumption is that problems like the classification of a pattern can benefit from a collective computation that takes into account either explicit or implicit relationships with related patterns. The overall process therefore takes place on networked data rather than on single instances, opening the door to new computational schemes.
The different methodological contributions from the project partners will lead to a unified mathematical formalism for understanding functions and learning protocols on networked data. This study is expected to introduce a framework for understanding generalization in semi-supervised and transductive settings, so as to develop a solid methodology for evaluating the results. Functions on networked data take values that depend on the information in both the vertices and the edges; it is therefore important to understand how machine learning models can effectively induce concepts over such domains. The study will also focus on active learning schemes that propose the vertices to be labeled mainly on the basis of the links amongst the examples. We will address semi-supervised and transductive learning using different models, including probabilistic inductive logic programming (PILP), neural networks and kernel machines, and we will investigate the extension of classic PCA techniques to this new graphical framework. We also expect to shed light on related models that are unified by the common mechanism of propagation throughout networked data, including the case of dynamically changing data.
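The common mechanism of propagation throughout networked data can be illustrated with classic iterative label propagation, used here as a swapped-in textbook technique rather than as one of the project's models: labelled vertices keep their known score, and every unlabelled vertex repeatedly takes the average score of its neighbours. The graph, the seed labels and the 0.5 starting value are hypothetical.

```python
from typing import Dict, List

def label_propagation(adjacency: Dict[int, List[int]], seeds: Dict[int, float],
                      iterations: int = 100) -> Dict[int, float]:
    """Propagate scores in [0, 1] from labelled seed vertices to unlabelled ones.

    Each unlabelled vertex repeatedly takes the average score of its neighbours;
    labelled vertices are clamped to their known label.
    """
    scores = {v: seeds.get(v, 0.5) for v in adjacency}  # 0.5 = "unknown" starting value
    for _ in range(iterations):
        updated = {}
        for v, neighbours in adjacency.items():
            if v in seeds:
                updated[v] = seeds[v]
            elif neighbours:
                updated[v] = sum(scores[u] for u in neighbours) / len(neighbours)
            else:
                updated[v] = scores[v]  # isolated vertex: nothing to propagate
        scores = updated
    return scores

# Hypothetical chain of five linked documents 0-1-2-3-4; only the two ends are labelled.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = label_propagation(chain, seeds={0: 1.0, 4: 0.0})
print({v: round(s, 3) for v, s in scores.items()})
# prints {0: 1.0, 1: 0.75, 2: 0.5, 3: 0.25, 4: 0.0}
```

The unlabelled vertices end up with scores that interpolate between the labelled ones along the links, which is the transductive behaviour described above.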
The application of the proposed theory will be investigated mainly in the fields of pattern recognition and data mining. In particular, we will carry out a systematic study of appropriate representations of pattern recognition problems within the setting of learning in networked data. Although the machine learning community has already gained significant experience in formulating problems in the framework of inductive logic programming by expressing relationships amongst the examples, in most cases the learning process involves collections of examples whose relationships are neither made explicit nor used implicitly during learning. Special attention will be devoted to the case in which the strength of the relationships amongst the patterns is learned by unsupervised learning and subsequently refined by exploiting a few labelled examples. This project is expected to open a new direction in the whole area of pattern recognition by proposing a general framework for collective classification. However, in order to show the impact on real-world applications, we will also focus on a specific problem of document analysis and recognition, namely the recognition of chemical drawings in low-quality documents. In addition, we plan to extend multirelational data mining methods to improve scalability on spatial data and to introduce relational approaches to transductive learning for predictive tasks where only a small portion of the available data is labelled.
Interestingly, the research carried out in this project is expected to have an impact on network science, where graph dynamics are captured by explicitly modeling the growth process. The adoption of learning schemes is expected to provide a better basis for dealing with complex evolution processes, where simple assumptions such as preferential attachment might not yield the desired approximation. Another significant impact is expected in the field of optimization, where we plan to apply the methods developed in relational learning to stochastic local search techniques based on reactive search principles. The idea is to regard the search space as a graph whose edges represent neighborhood relationships between the solutions to be explored.
ScienScan: an efficient visualization and browsing tool for academic search
Exploratory project for road tunnel monitoring and remote control (see the DISI TRITon web site).
TRITon is a research and innovation project funded by the project members and the Autonomous Province of Trento (Provincia Autonoma di Trento, PAT) aimed at advancing the state of the art in the management of road tunnels, specifically to improve safety and reduce energy costs.
To achieve these goals, TRITon will merge research on state-of-the-art technology into the established practices of road tunnel infrastructures, supported by project members that include local research centers and companies working in the field.
An example application, central in TRITon, is adaptive lighting. In current deployments, the light intensity inside the tunnel is typically regulated based on design parameters and the current date and time, regardless of the actual environmental conditions. As anyone who has driven through a tunnel that is too bright or too dark can attest, this can waste energy and create a safety hazard. In TRITon, the light intensity inside the tunnel will instead be regulated through a wireless sensor network (WSN), which relays sensed light information to the control station. The control station exploits this information for fine-grained adaptation to environmental conditions, significantly reducing costs and improving safety. A dedicated laboratory has been established to support TRITon's research and development activities.
However, to bring state-of-the-art research and technology such as WSNs into road tunnel management, traditional lab-centered research is not sufficient. Indeed, TRITon will transfer its results to real test sites: four operational tunnels on road SS 45bis near Trento. This will provide not only the ultimate test for the project outcomes, but also a direct and measurable benefit to the local population.
Project for scientific and technological cooperation funded by the Italian Ministry of Education, University and Research in the framework of international collaboration between Italy and the USA.
The research activities of the DAMASCO project are centered on "Intelligent Transport Systems" (ITS). The internetworking and communication frameworks are based on solutions suitable for ad hoc (multi-hop or peer-to-peer) and sensor networks, so that new solutions to monitor and collect context information can be easily deployed. These data are processed to provide services for:
The deployment of ITS services for car networking is limited by the small number of vehicles equipped with sufficient hardware and software to act as network nodes. Thus, in the initial phase of the project, an infrastructure will be used to provide services to networked cars.
ROBERTO G. CASCELLA
The "Value" of Reputation in Peer-to-Peer Networks.
In Proceedings of the Fifth IEEE Consumer Communications & Networking Conference (CCNC 2008), Las Vegas, Nevada, USA, January 10-12, 2008.
ROBERTO G. CASCELLA
Costs and Benefits of Reputation Management Systems.
In Proceedings of the 9th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2008), Newport Beach, CA, USA, June 23-27, 2008.
MARIO GERLA, ROBERTO G. CASCELLA, ZHEN CAO, BRUNO CRISPO, ROBERTO BATTITI
An efficient weak secrecy scheme for network coding data dissemination in VANET - Invited Paper.
In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2008), Cannes, France, September 15-18, 2008.
ROBERTO G. CASCELLA, ZHEN CAO, MARIO GERLA, BRUNO CRISPO, ROBERTO BATTITI
Weak Data Secrecy via Obfuscation in Network Coding Based Content Distribution.
Accepted for publication at the IFIP Wireless Days Conference, Dubai, United Arab Emirates, November 24-28, 2008.
Type Extension Trees for feature construction and learning in relational domains
Type Extension Trees (TETs) are a powerful representation language for "counts-of-counts" features characterizing the combinatorial structure of neighborhoods of entities in relational domains. TETs can be used as a feature discovery instrument in relational domains, and a metric on TET features can be constructed in order to effectively exploit their expressive power in terms of "counts-of-counts". Experiments on bibliographic data (e.g., for the prediction of the future h-index of an author) show the potential of such features.
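As a simplified illustration of a counts-of-counts feature (not the TET language itself), consider an author in a bibliographic database: for each of the author's papers we count its citations, and then count how many papers have each citation count. The dictionaries below are hypothetical stand-ins for a relational database. The h-index mentioned above is one function that can be read off this feature.

```python
from collections import Counter
from typing import Dict, List

def counts_of_counts(author: str, papers_by_author: Dict[str, List[str]],
                     citations: Dict[str, List[str]]) -> Counter:
    """Map k -> number of the author's papers that are cited exactly k times."""
    per_paper = [len(citations.get(p, [])) for p in papers_by_author.get(author, [])]
    return Counter(per_paper)

def h_index(coc: Counter) -> int:
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(coc.elements(), reverse=True)  # expand back to per-paper counts
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical data: alice wrote p1..p4; each list holds the papers citing that paper.
papers_by_author = {"alice": ["p1", "p2", "p3", "p4"]}
citations = {"p1": ["q1", "q2", "q3"], "p2": ["q1", "q4"], "p3": ["q2"]}
coc = counts_of_counts("alice", papers_by_author, citations)
print(h_index(coc))  # prints 2
```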
Preliminary web site: BioNets - mating in the computer world. The University of Trento is participating as an associate member of the Create-Net association.
Brain-Computer Evolutionary Multi-Objective Optimization (BC-EMO): a genetic algorithm adapting to the decision maker. See the BC-EMO paper and the BC-EMO software.
GRID COMPUTING - FIRB-CNIT Project "Enabling platforms for high-performance computational grids oriented scalable virtual organizations (GRID.IT)"
The specific objectives of the entire FIRB research project are Grid Oriented optical switching paradigms and High Performance Photonic Tests.
The project is sponsored by the Province of Trento (Provincia Autonoma di Trento) and involves ITC/irst (Istituto per la Ricerca Scientifica e Tecnologica), the Department of Computer Science and Telecommunications of the University of Trento and Alpikom S.p.A.
The general objective of E-NEXT is to reinforce European scientific and technological excellence in the networking area through a progressive and lasting integration of research capacities existing in the European Research Area (ERA). The detailed objectives of the Network are divided into two dimensions: the Research dimension and the Integration dimension. This reflects the two complementary lines of action of the Network of Excellence.
CatANalyst: a web server for predicting catalytic residues in proteins from sequence and structure
Networking technologies and telecommunication services are experiencing fast and significant developments, both methodological and applicative. In the field of networking technologies, for instance, both backbone networks and access networks are undergoing important innovations. As for the former, the advent of optical technologies not only provides unprecedented transport capacities in the backbone, but also opens up the opportunity for optical packet switching, with enormous gains in terms of reliability and efficiency. As for access networks, on the other hand, the successful deployment of xDSL technologies, the ever-increasing popularity of 802.11x technologies, and the availability of 3G infrastructures based on UMTS are creating new interoperability problems but also new opportunities.
Analogous considerations hold for the technologies used in network terminals. In parallel with the natural progress of PC-class systems, we have witnessed the growth in processing power of Personal Digital Assistants (PDAs) on the one hand and, on the other, the production of new mobile handsets with moderate processing capabilities, equipped with open operating systems and capable of running novel applications and services.
The availability of such a great variety of technologies seems to speed up the diffusion of innovative telematic applications and services, such as those requiring multimedia content delivery. And, in fact, this is happening for a large number of innovative services in technically homogeneous systems and networks, for instance the video-phone and video-delivery services over UMTS networks or IP telephony services over wired IP networks.
Unfortunately, in more complex but realistic scenarios, such as large-scale hybrid wired and wireless networks, the heterogeneity of networking technologies and computing systems makes both QoS management and application interoperability hard, if not impossible, unless complex interworking solutions are devised.
Therefore, the main focus of this proposal will be the study and development of technologies and methodologies for providing communication services with controllable quality in highly heterogeneous distributed systems, in terms of available networking infrastructures, user terminal characteristics and types of services and applications.
In particular, the project activity will aim at pursuing the following macro objectives:
While devising proper solutions for the above objectives, particular attention will be devoted to the scalability of the solutions and the manageability of the infrastructures, in order to allow dynamic control in scenarios that are extremely dynamic due to both user mobility and variations in system status.
Institutions Involved in QuaSAR Project
Responsible for UNITN: Mikalai Sabel.
The project aims to propose an evolutionary methodology for the design and management of optical networks, in order to support the migration from the current static assignment and routing techniques to adaptive, dynamic techniques based on IP-centric control as soon as new-generation equipment becomes available.
Address: DISI - University of Trento,
Via Sommarive 5, 38123 Trento