LIONway tutorial #2: Learning requires a method

Machine learning paradigms are very similar to the basic principles of human learning. Real learning means extracting compact representations from the raw data, condensing the regularities of the task to be learned into a model. The model works when it generalizes properly to cases not seen during training (but drawn from the same probability distribution).

Generalization capabilities distinguish real learning from trivial memorization(*) of the input examples (akin to the “learning by heart” of weak students).

Our latest tutorial movie explains the basic methodology of machine learning in simple and human terms.

(*) Emphasis on “trivial”: memorizing all examples is not always a bad move, it depends on how you are going to use them later — see also our previous tutorial.
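The distinction can be made concrete with a tiny sketch; the task, the data, and the through-the-origin model are all invented for illustration. A trivial memorizer stores every training example and answers with the nearest one, while a compact model condenses the same examples into a single parameter that generalizes to unseen inputs:

```python
import random

random.seed(0)

# Toy task (illustrative): learn y = 2*x from noisy training examples.
train = [(x, 2 * x + random.gauss(0, 0.1)) for x in range(10)]
test = [(x + 0.5, 2 * (x + 0.5)) for x in range(10)]  # unseen inputs, same distribution

def memorizer(x):
    """Trivial memorization: answer with the stored output of the nearest seen input."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def fitted_slope():
    """Compact model: least-squares slope through the origin."""
    num = sum(x * y for x, y in train)
    den = sum(x * x for x, _ in train)
    return num / den

SLOPE = fitted_slope()

def model(x):
    return SLOPE * x

def mean_error(predict):
    return sum(abs(predict(x) - y) for x, y in test) / len(test)

# The compact model generalizes to unseen inputs; memorization does not.
print(mean_error(memorizer))  # roughly 1.0: off by the gap to the nearest stored example
print(mean_error(model))      # much smaller
```

Under these made-up assumptions, the memorizer's test error stays near the spacing of the training grid, while the fitted slope predicts correctly between grid points.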


The lean startup and the LION way: elective affinity


While reading “The lean startup” book by Eric Ries, I had a strong déjà vu impression, a sensation that what I was reading was part of my previous experience.
The Lean Startup principle advocates a pragmatic, data- and experiment-driven method to create successful startups (“organizations dedicated to creating something new under conditions of extreme uncertainty”).
Summarizing the content of this fresh and inspiring book: forget about boring, speculative business plans with growth figures projected three years out; start with a business idea, quickly build a “minimum viable product” (MVP), define relevant metrics and take the measured data seriously (avoid “vanity metrics allowing entrepreneurs to form false conclusions”). If the data do not show clear signs of growth, learn what went wrong, pick a new plan (“pivot”) and repeat the whole process. The faster the build–measure–learn feedback loop, the less money and time are wasted. The Lean Startup method is ultimately an answer to the question “How can we learn more quickly what works, and discard what doesn't?” (Tim O'Reilly).

The first elective affinity (the “already seen”) is, in my opinion, with the classic method guiding experimental science, where build–measure–learn is called the “scientific method” and is implemented as follows:

  1. Design an experiment — scientific discoveries do not happen by chance; in spite of popular stories of apples falling on somebody’s head, experiments need to be designed with strategy and intention;
  2. Measure relevant quantities and treat the measured data with maximum attention and respect;
  3. Build (“learn”) a pragmatic model —no philosophy involved— of how the measured quantities are related.
At this point, the model provides better insight on the experiment design, and the process can start over with a new experiment — and an improved model if new measured data cannot be explained by the previous one.

As Galileo Galilei marked an important step towards the eventual separation of science from both philosophy and religion in 1600, in a way, Eric marks a step towards the separation of startup methods from the philosophy of giant established companies and the religion of standard accounting practices.

Galileo was born too early to have access to computers and he could not ride the horse of the exponential Moore’s law.

The second elective affinity (“already seen”) is a more recent one, with the “LION way” approach. The LION way connects two topics which are in most cases kept separate: machine learning (the creation of flexible models from data) and intelligent optimization (the automated creation and selection of improving solutions).

In particular:

  1. Experiments can be designed by software (Design of Experiments — DOE) in a strategic manner, to derive the maximum amount of useful information in the shortest time;
  2. Measurements can be online and super-fast: think of an e-commerce service measuring all visitors' behavior in real time to quickly adapt to user preferences;
  3. Models can be learned by machine learning starting only from —possibly abundant— data, no costly and slow philosopher involved!
Last but not least, the whole iteration of the three steps can be implemented by (semi-)automated schemes to design better and better solutions (intelligent optimization).
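The three steps above can be sketched as a single automated loop. The following is a minimal illustration, not the LION software itself: the hidden objective, the crude grid-based design, and all the numbers are invented for the example; a real system would use proper DOE and a learned model.

```python
import random

random.seed(2)

def measure(x):
    """Stand-in for a real experiment: a noisy measurement of an unknown objective."""
    return -(x - 3.0) ** 2 + random.gauss(0, 0.05)

# Build-measure-learn loop: design a small batch of experiments, measure them,
# and let the next design intensify around the best measured point.
center, width = 0.0, 4.0
for _ in range(12):
    # 1. Design: a five-point grid around the current best guess (a crude DOE).
    candidates = [center + width * k / 2 for k in (-2, -1, 0, 1, 2)]
    # 2. Measure each designed experiment.
    results = [(x, measure(x)) for x in candidates]
    # 3. Learn: adopt the best measurement and narrow the next design.
    center = max(results, key=lambda r: r[1])[0]
    width *= 0.5

print(center)  # approaches the unknown optimum near x = 3
```

The faster this loop turns, the fewer experiments (and the less time) are wasted, which is exactly the build–measure–learn argument restated in optimization terms.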

It is precisely when models are fed to optimizers (learning and intelligent optimization) that real innovation and continuous learning begin.

More and more innovative and bold (LIONhearted?) people can now master the power of LION techniques to solve problems, improve businesses, and create new applications.
More about LION at the LION way website.


Data-driven optimism


Figure: Matt Ridley during TEDGlobal 2010, July 2010 in Oxford, England. Credit: James Duncan Davidson / TED

Bad news is far more popular than good news: it sells more newspapers, gets more attention, and spreads faster as a cultural meme. While reading the book "The Rational Optimist: How Prosperity Evolves" by British journalist Matt Ridley, I could not help thinking about the countless predictions of doom I have encountered in my life.

As an example, for people of my age, "acid rain" was a big issue in public opinion and, of course, at school. What could be better for showing the evil nature of man against nature?

Citing from the book: In 1984, acid rain was the environmental scare of the day. As the science correspondent of The Economist, Matt wrote: 'Forests are beginning to die at a catastrophic rate. One year ago, West Germany estimated that 8% of its trees were in trouble. Now 34% are...that forests are in trouble is now indisputable.' Experts predicted that all of Germany's conifers would be gone by 1990, and the Federal Ministry of the Interior predicted all forests would be gone by 2002.

Bunk. Acid rain (though a real phenomenon) did not kill forests. It did not even damage them. Scientists eventually admitted that forests thrived in Germany, Scandinavia and North America during the 1980s and 1990s, despite acid rain. The conventional wisdom fed by those with vested interests in alarm was 100% wrong.

I am happy to recommend this book to people interested in supporting discussions and decisions with measured data. Wrong predictions of doom are not only a cultural problem and a waste of time, but they can lead to sub-optimal decisions with huge costs for the community.
A take-home homework for the reader: Which are the most popular predictions of doom in 2014? How many of them will be demonstrated to be wrong by 2024?

According to the author, "The Rational Optimist" is a counterblast to the prevailing pessimism of our age, and proves, however much we like to think to the contrary, that things are getting better. Over 10,000 years ago there were fewer than 10 million people on the planet. Today there are more than 6 billion, 99 per cent of whom are better fed, better sheltered, better entertained and better protected against disease than their Stone Age ancestors. The availability of almost everything a person could want or need has been going erratically upwards for 10,000 years and has rapidly accelerated over the last 200 years: calories; vitamins; clean water; machines; privacy; the means to travel faster than we can run, and the ability to communicate over longer distances than we can shout. Yet, bizarrely, however much things improve from the way they were before, people still cling to the belief that the future will be nothing but disastrous. In this original, optimistic book, Matt Ridley puts forward his surprisingly simple answer to how humans progress, arguing that we progress when we trade and we only really trade productively when we trust each other.

My personal take-home message: do not be afraid of being a data-driven optimist. In most cases from the past, it is the pessimists who should feel ashamed of how they misinterpreted the data.

  • BOOK: The Rational Optimist: How Prosperity Evolves
    Published: May 2010.
  • Biography of Matt Ridley (source: wikipedia)
    Ridley is best known for his writings on science, the environment, and economics. He has written several science books including The Red Queen (1994), Genome (1999) and The Rational Optimist: How Prosperity Evolves (2010). In 2011, he won the Hayek Prize, which "honors the book published within the past two years that best reflects Hayek’s vision of economic and individual liberty." Matt Ridley's books have been shortlisted for six literary awards, including the Los Angeles Times Book Prize. His most recent book, The Agile Gene: How Nature Turns on Nurture, won the award for the best science book published in 2003 from the National Academies of Science.


Trading off cost versus quality for picking your favorite hospital


It is hard to imagine a more difficult personal choice than picking the hospital to be treated in after receiving a diagnosis. One experiences first-hand the concepts of multiple-objective optimization. Ideally, one would like a very cheap, top-quality hospital, if possible in one's own neighborhood... ideally. In practice, one is dealing with tradeoffs.

The recent RWJF Hospital Price Transparency Challenge aims at increasing understanding and use of recently released hospital price data. The visualization category encourages submissions that allow users to better understand aspects of the data.

ClinicOptimizer is an interactive visualization that lets a customer select the best clinic based on his/her condition and preferences (described as interactive tradeoffs between clinic cost and quality). Additional insights about the pricing structure are offered. For sure, ClinicOptimizer will not solve all the issues involved in this critical decision, but it gives a person more quantitative information on which to base the decision.
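The core of such an interactive tradeoff can be sketched in a few lines. Everything here is invented for illustration: the clinic names, costs, and quality scores, and the single preference weight that stands in for ClinicOptimizer's interactive controls.

```python
# Made-up clinic data: each clinic has a cost (lower is better)
# and a quality score in [0, 1] (higher is better).
clinics = {
    "Clinic A": {"cost": 40_000, "quality": 0.95},
    "Clinic B": {"cost": 15_000, "quality": 0.80},
    "Clinic C": {"cost": 8_000,  "quality": 0.60},
}
MAX_COST = max(c["cost"] for c in clinics.values())

def best_clinic(w):
    """Rank by w*quality + (1-w)*cheapness; w=0 cares only about cost, w=1 only quality."""
    def score(name):
        cheapness = 1 - clinics[name]["cost"] / MAX_COST
        return w * clinics[name]["quality"] + (1 - w) * cheapness
    return max(clinics, key=score)

print(best_clinic(0.1))  # cost matters most -> Clinic C
print(best_clinic(0.9))  # quality matters most -> Clinic A
```

Moving the weight `w` and watching the recommendation change is the one-dimensional analogue of the interactive tradeoff exploration the visualization offers.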

The more public data about healthcare becomes available (in suitably privacy-preserving ways), the better off individual patients are going to be, and the more empowered with respect to more organized and profit-seeking entities like insurance companies and hospitals.


Machine Learning Systems to Merge Human Intelligence and Computers.


Most decisions are guided by more than one desirable outcome. Think about health care, with doctors deciding about treatments for cancer by taking into account:

  • health benefit for the patient
  • toxicity (negative side-effects)
  • cost of the treatment
  • ...

Some information is present (the desirable outcomes: high benefit, small cost, low toxicity, ...), but some information is missing: the doctor may not have a complete and repeatable way to combine the different desirable outcomes in order to choose the best possible treatment. And the choice can be among hundreds of different treatments, some of them introduced recently, with new results about cure effectiveness being produced by the medical community on a daily basis. Indeed, a daunting task for a human being.

Many real-world problems like picking the best cancer treatment have a natural formulation as Multiobjective Optimization Problems (MOPs), in which multiple conflicting objectives need to be simultaneously optimized.

Because the most desirable combination of objectives is not known, the final decision maker must be kept in the loop: the additional information to guide the decision has to be extracted... from her brain! The system interacts with the Decision Maker (DM) during optimization, progressively focusing on her preferred area in the decision space.

After some time, the system can learn about the preferences, so that it gradually becomes more and more automated. When human intelligence is coupled with big data, massive amounts of memory and computing power, much better decisions can be reached.
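The starting point of such a system is the set of non-dominated options, among which only the DM's preferences can decide. Here is a minimal sketch of that first step; the treatments and their (benefit, toxicity, cost) numbers are invented for illustration, and the interactive preference learning itself is not shown.

```python
# Each (made-up) treatment has (benefit, toxicity, cost):
# benefit is maximized, toxicity and cost are minimized.
treatments = {
    "A": (0.9, 0.7, 50_000),
    "B": (0.8, 0.3, 30_000),
    "C": (0.6, 0.2, 10_000),
    "D": (0.5, 0.6, 40_000),  # dominated: worse than B on every objective
}

def dominates(p, q):
    """True if p is at least as good as q on all objectives and strictly better on one."""
    (b1, t1, c1), (b2, t2, c2) = p, q
    at_least_as_good = b1 >= b2 and t1 <= t2 and c1 <= c2
    strictly_better = b1 > b2 or t1 < t2 or c1 < c2
    return at_least_as_good and strictly_better

# The Pareto front: treatments no other treatment dominates.
pareto = [name for name, p in treatments.items()
          if not any(dominates(q, p) for other, q in treatments.items() if other != name)]
print(sorted(pareto))  # ['A', 'B', 'C']
```

Treatment D is filtered out automatically because B is at least as good on every objective; the remaining tradeoffs among A, B, and C are exactly where the DM's brain has to be queried.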

A recent paper on the topic:
Learning to diversify in complex interactive Multiobjective Optimization
by Roberto Battiti and his co-authors Dinara Mukhlisullina and Andrea Passerini,
received the "best paper award" at MIC 2013: The X Metaheuristics International Conference, Singapore, Aug 5-8, 2013. The motivation of the award was "for a groundbreaking contribution in this area."


Advocatus diaboli: diversification helps.


I am intrigued by the abundant resonances between optimization schemes, in particular what we call LION schemes (learning from data plus optimization), and real-world issues related to management, decision making, startups, etc. So much so that I am almost getting bored while reading some recent management books, because they read like déjà vu "applied stochastic local search" to me. I will therefore focus my next blog posts on these issues.

Let’s start from tradition. Experts in heuristics for optimization are familiar with diversification, as opposed to intensification. Diversification implies looking at many diverse solutions, exploring uncharted territories, avoiding being trapped in locally optimal solutions and decisions.
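In local-search terms, the contrast can be shown on a toy landscape (the heights and the restart scheme are invented for illustration): pure intensification climbs to the nearest peak and stops, while a diversification step, here simple restarts from different starting points, escapes the local optimum.

```python
# Toy search landscape (illustrative): higher values are better.
# Greedy climbing from the left gets trapped at the local peak of height 3.
landscape = [0, 1, 3, 2, 1, 2, 5, 4]

def hill_climb(start):
    """Pure intensification: move to the best neighbor until no neighbor improves."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        best = max(neighbors, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return i
        i = best

# Intensification only: trapped at the local optimum.
print(landscape[hill_climb(0)])  # 3

# Diversification by restarts: explore from every starting point, keep the best.
print(max(landscape[hill_climb(s)] for s in range(len(landscape))))  # 5
```

Restarting is the crudest form of diversification; tabu lists and perturbation schemes play the same Devil's-advocate role of forcing the search away from the solution it is currently in love with.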

During the canonization process of the Roman Catholic Church, the Devil's advocate (Latin: advocatus diaboli), established in 1587 during the reign of Pope Sixtus V, was a lawyer appointed by Church authorities to argue against the canonization of a candidate. It was this person’s job to take a skeptical view, to look for holes in the evidence, to argue that any miracles attributed to the candidate were fraudulent, and so on. The Devil's advocate opposed God's advocate (Latin: advocatus Dei), whose task was to make the argument in favor of canonization.

God’s advocate is intensifying, searching for evidence in agreement with the current “locally-optimal” solution of making somebody a saint. Devil's advocate is trying to change that decision by actively searching for information leading to a very different – in fact contrary - decision.

A Devil’s advocate is the best insurance against groupthink, a phenomenon that occurs within a group of people in which the desire for harmony or conformity results in an irrational or dysfunctional decision-making outcome. Group members try to minimize conflict and reach a consensus decision without critical evaluation of alternative ideas or viewpoints, and by isolating themselves from outside influences.

In your business, you should probably hire a devil’s advocate if you do not have one already. And you should always value collaborators providing sound critical feedback as opposed to "yes men".

BTW: the Devil’s advocate was abolished by Pope John Paul II in 1983. If I may argue with a Pope’s decision, in spite of my being Italian, I am not sure abolishing it was wise for the quality of future decisions.

Info from wikipedia, image from DUMC


Even Radiologists Can Miss A Gorilla


Notice anything unusual about this lung scan? Harvard researchers Trafton Drew and Jeremy Wolfe found that 83 percent of radiologists didn't notice the gorilla in the top right portion of this image. Can you spot the gorilla?

The striking fact is that even professionals can miss novel information because of their attention bias. They can be blind to the obvious, but also blind to their blindness (D. Kahneman, "Thinking, Fast and Slow").

If you have never heard of the "invisible gorilla" experiment, here is a summary. When testing for inattentional blindness, researchers ask participants to complete a primary task while an unexpected stimulus is presented. Afterwards, researchers ask participants if they saw anything unusual during the primary task.

In the invisible gorilla test, conducted by Daniel Simons and Christopher Chabris, subjects were asked to watch a short video of two groups of people passing a basketball around. The subjects are told to count the number of passes made by one of the teams. A gorilla walks through the scene. After watching the video, the subjects are asked if they saw anything out of the ordinary take place. In most groups, 50% of the subjects did not report seeing the gorilla. The failure to perceive the gorilla is attributed to the failure to attend to it while engaged in the difficult task of counting the passes of the ball.

These results indicate that the relationship between what is in one's visual field and perception is based much more on attention than was previously thought. These findings are of interest for psychology and also cast an alarming shadow when critical decisions depend on spotting novel and unexpected patterns, like in some medical diagnosis tasks.

In the cited radiology experiment, the gorilla was not detected because the radiologists were looking for cancer nodules, not gorillas, so "they look right at it, but because they're not looking for a gorilla, they don't see that it's a gorilla."

In other words, what we're thinking about — what we're focused on — filters the world around us so aggressively that it literally shapes what we see. We need to think carefully about the instructions we give to professional searchers like radiologists, because what we tell them to look for will in part determine what they see and don't see.

Proper classifiers trained by machine learning do not suffer from inattentional blindness, and they can learn from millions of cases, far more than a single expert can see in an entire professional life. Appropriate "novelty detector" filters can in fact spot "gorillas" even if no gorilla was ever encountered during training. Expect more automated classifiers helping doctors in the near future.
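A minimal sketch of the novelty-detection idea, with made-up two-dimensional "scan features" and an arbitrary threshold: the detector has never seen a gorilla, yet it flags anything lying far from all the normal cases seen during training.

```python
# Made-up feature vectors of "normal" training cases (illustrative numbers).
training_scans = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.9), (1.1, 1.2)]
THRESHOLD = 1.0  # arbitrary distance cutoff for this toy example

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def is_novel(x):
    """Novel if no training example lies within THRESHOLD of x."""
    return min(distance(x, t) for t in training_scans) > THRESHOLD

print(is_novel((1.0, 1.0)))  # False: close to known normal cases
print(is_novel((5.0, 5.0)))  # True: a "gorilla", unlike anything seen before
```

Unlike a classifier trained only to find nodules, this filter does not need gorilla examples: distance from everything familiar is itself the alarm signal.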

Additional info: wikipedia


Bazaars do not always guarantee quality products.


A thought-provoking article in Communications of the ACM [1] argues that the new way of producing software advocated in Eric Raymond's book The Cathedral and the Bazaar (O'Reilly Media, 2001) may not always produce top-quality results. In the Bazaar model, the code is developed over the Internet, in view of the public. "Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone." As usual, the hype around "new ages" of software development did not always live up to expectations.

It is difficult to deny that serious "enterprise level" software requires a level of responsibility and design which is difficult to achieve by a self-regulating Bazaar-like collaboration. Let me cite a passage from the article:

"Getting hooked on computers is easy—almost anybody can make a program work, just as almost anybody can nail two pieces of wood together in a few tries. The trouble is that the market for two pieces of wood nailed together—inexpertly—is fairly small outside of the "proud grandfather" segment, and getting from there to a decent set of chairs or fitted cupboards takes talent, practice, and education. The extra 9,900 percent had neither practice nor education when they arrived in our trade, and before they ever had the chance to acquire it, the party was over and most of them were out of a job. I will charitably assume that those who managed to hang on were the most talented and most skilled, but even then there is no escaping that as IT professionals they mostly sucked because of their lack of ballast.

The bazaar meme advocated by Raymond, "Just hack it," as opposed to the carefully designed cathedrals of the pre-dot-com years, unfortunately did not die with the dot-com madness, and today Unix is rapidly sinking under its weight."


Life is short: Automate the drudgery, keep the creative part


Image of the Large Hadron Collider (LHC).

I risked ending up doing experimental physics in my previous life, when I got a summer job at a Caltech experimental group in the late eighties. I was paid to help in setting up a huge experimental apparatus to measure neutrinos. Neutrinos are very elusive, and the piece I was handling was an enormous container of mineral oil, with detectors to measure photons.

To make a long story short, I connected the powerful pump to transfer mineral oil from one container to another. After a couple of minutes the tube collapsed and I ended up covered by a refreshing fountain of mineral oil. So when I talk about the drudgery of experimental science, I know what I am talking about.

Now most of the experimental work is done in front of computers, but still the emotional and creative part risks being overshadowed by many daily chores, like data integration, data cleaning, developing complex visualizations, designing "triggers" (predictors of interest for a specific experimental event).

This was our main motivation in developing our LION software: automate the daily chores to use our brain for the intelligent work of designing and interpreting new experiments.


Hype does not breed breakthroughs; it breeds more hype


Image from zapatopi.net/labs

After some years of experience with submitting projects for different funding agencies, and, on the other side of the boundary, reviewing projects submitted by other colleagues, I find the comments by Bertrand Meyer about incremental research versus paradigm-shift mania particularly inspiring and refreshing.

Being told that they have to be Darwin or nothing, researchers learn the game and promise the moon; they also get the part about "risk" and emphasize how uncertain the whole thing is and how high the likelihood it will fail.

By itself this is mostly entertainment, as no one believes the hyped promises. The real harm, however, is to honest scientists who work in the normal way, proposing to bring an important contribution to the solution of an important problem. They risk being dismissed as small-timers with no vision.

Some funding agencies have kept their heads cool. How refreshing, after the above quotes, to read the general description of funding by the Swiss National Science Foundation: "The central criteria for evaluation are the scientific quality, originality, and project methodology as well as qualifications and track record of the applicants. Grants are awarded on a competitive basis."

In a few words, it says all there is to say. Quality, originality, methodology, and track record. Will the research be "groundbreaking" or "incremental"? We'll find out when it's done.

Long live incremental research! Long live serious researchers with their feet on solid ground.


Big Data or LION?


"Big Data" has always sounded very American to me.

It reminds me of movies about immigrants coming to the USA, enticed by pictures of giant vegetables growing in the fields.

More than the size of the data (we have known for at least thirty years how to use parallel computing to deal with big data), it is a matter of applying more and more automation, what we call Learning and Intelligent OptimizatioN (LION).

Going beyond big data? ... big intelligence? :)
