Web mining and the network of USA politicians.


Importing data into LIONoso through web-mining.

The web contains abundant data about almost any subject. One can easily connect LIONoso to the web and have data automatically extracted from web pages and loaded into the LION workbench for additional analysis and visualization.

In this demo a LIONoso-compatible Python script mines a public website (see References) to create a list of representatives of the 112th US Congress. The script reads the web pages, parses the format of their content, and extracts the relevant information. Using the script takes two steps: load it into the workbench and click run to start it.
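The extraction step can be illustrated with a minimal sketch that uses only the Python standard library. It is not the demo script itself: the HTML fragment, the class names ("name", "party", "state"), and the names MemberTableParser and extract_members are hypothetical, chosen only to show how a page is read and turned into table rows.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the markup of the real site is an assumption.
SAMPLE_HTML = """
<table id="members">
  <tr><td class="name">Jane Doe</td><td class="party">D</td><td class="state">CA</td></tr>
  <tr><td class="name">John Roe</td><td class="party">R</td><td class="state">TX</td></tr>
</table>
"""


class MemberTableParser(HTMLParser):
    """Collect one dict per table row, keyed by the class of each cell."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._row = None     # row currently being read, or None
        self._field = None   # class of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a classified cell of a row.
        if self._row is not None and self._field is not None:
            self._row[self._field] = self._row.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_members(html):
    """Return the list of row dicts found in the given HTML text."""
    parser = MemberTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


members = extract_members(SAMPLE_HTML)
```

In a real run the HTML text would come from the downloaded pages, and the resulting rows would be written in a format the workbench can load.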

Analyzing the social network of representatives of the 112th US Congress.

After the script downloads the data from the web, the social network of representatives is analyzed by deriving the PageRank, Authority, and Hubness scores of each representative, as well as the connectivity of the social network, all based on bill sponsoring and co-sponsoring.

The exact definitions of the different measures give the complete details. In brief, the social-network rank ("PageRank") of a representative is high when that representative sponsors many bills which are in turn co-sponsored by representatives of high rank. A high "Authority" level means that the representative has many relevant bills co-sponsored by colleagues, while a high "Hubness" value means that the representative co-sponsors many relevant bills. The raw production of each representative is also included in the data table.
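As a rough illustration of these scores, the sketch below computes PageRank and hub/authority values by plain power iteration. It rests on several assumptions that are not stated in the demo: each co-sponsorship is modeled as a directed edge from the co-sponsor to the sponsor, Authority and Hubness are taken to be the standard HITS scores, and the damping factor 0.85, the iteration count, and the function names are illustrative choices.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a list of (co_sponsor, sponsor) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    size = len(nodes)
    rank = {n: 1.0 / size for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1.0 - damping) / size + damping * dangling / size for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def hits(edges, iterations=100):
    """Hub and authority scores (HITS), each normalized to sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of those who co-sponsor you.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        total = sum(auth.values()) or 1.0
        auth = {n: v / total for n, v in auth.items()}
        # Hubness: sum of the authority scores of those you co-sponsor.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        total = sum(hub.values()) or 1.0
        hub = {n: v / total for n, v in hub.items()}
    return hub, auth


# Toy data: B and C co-sponsor a bill of A, and C co-sponsors a bill of B.
toy_edges = [("B", "A"), ("C", "A"), ("C", "B")]
toy_rank = pagerank(toy_edges)
toy_hub, toy_auth = hits(toy_edges)
```

On the toy data, A (co-sponsored by both colleagues) receives the highest rank and authority, while C (who co-sponsors the most) receives the highest hub score.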

Mapping politics to geography.

Each representative is associated with geographical coordinates related to the district in which they were elected. Dragging the geographical coordinates into the LION dashboard opens a geographical map in which each person is represented by a bubble on the corresponding state. As usual in LION, more information can be added to the visualization by dragging suitable values to change the bubble color, size, and shape. The first figure below shows representatives colored according to party; the second figure shows bubbles with a size proportional to the PageRank and a color related to the Authority measure. Names can be obtained by clicking on each bubble, and additional data by opening different visualizations.

Similarity map: clustering similar representatives together.

Very popular bills, which tend to be signed by many representatives, are filtered out: they typically have little relevance for distinguishing different political orientations, and we are more interested in local interactions here. Two representatives share an edge in their social network if their lists of (co-)sponsored bills overlap by more than 'connection_threshold' bills, not counting those that were (co-)sponsored by a fraction of representatives larger than 'popularity_threshold'. In the similarity map shown in the figure, connection_threshold = 10 and popularity_threshold = 0.1.
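The filtering and edge-building rule can be sketched as follows. The data layout (a dict mapping each bill to the set of its sponsors and co-sponsors) and the function names are assumptions made for illustration. The strict inequality on the overlap follows the dissimilarity definition given in the text, where a pair with an overlap equal to the threshold is not connected.

```python
from itertools import combinations


def shared_bill_counts(bills, n_representatives, popularity_threshold=0.1):
    """Count, for each pair of representatives, the non-popular bills they share.

    bills maps a bill id to the set of representatives who (co-)sponsored it.
    """
    counts = {}
    for signers in bills.values():
        # Skip bills signed by too large a fraction of all representatives.
        if len(signers) / n_representatives > popularity_threshold:
            continue
        for pair in combinations(sorted(signers), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def build_edges(counts, connection_threshold=10):
    """Keep the pairs whose overlap exceeds the connection threshold."""
    return {pair for pair, n in counts.items() if n > connection_threshold}


# Toy data with small thresholds so the effect is visible.
toy_bills = {
    "b1": {"A", "B"},
    "b2": {"A", "B"},
    "b3": {"A", "B", "C", "D"},  # signed by 4 of 10 representatives: filtered out
    "b4": {"B", "C"},
}
toy_counts = shared_bill_counts(toy_bills, n_representatives=10, popularity_threshold=0.3)
toy_network = build_edges(toy_counts, connection_threshold=1)
```

With these toy thresholds only A and B remain connected, because the bill signed by four representatives is discarded before the overlaps are counted.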

A measure of dissimilarity is then obtained for each pair of representatives as follows:
Let nij be the number of bills (excluding the very popular ones) signed by both representatives i and j. If nij ≤ 10, i and j are not connected. Otherwise the connection exists and the dissimilarity is 1 / sqrt(nij - 10). The more bills two representatives co-sign, the more similar their political orientation and the smaller the corresponding dissimilarity.
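The rule above translates directly into a small function. This is a minimal sketch: the function name and the use of None to mark an unconnected pair are illustrative choices, and the threshold of 10 is the value quoted for the similarity map.

```python
import math


def dissimilarity(n_ij, connection_threshold=10):
    """Return 1 / sqrt(n_ij - threshold), or None if the pair is not connected."""
    if n_ij <= connection_threshold:
        return None  # too few co-signed bills: no edge between i and j
    return 1.0 / math.sqrt(n_ij - connection_threshold)


# A few sample values: more co-signed bills give a smaller dissimilarity.
samples = {n: dissimilarity(n) for n in (10, 11, 14, 110)}
```

For example, 11 co-signed bills give a dissimilarity of 1, 14 give 0.5, and 110 give 0.1, while 10 or fewer leave the pair unconnected.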

Using the focus-and-context navigation, one can then analyze individual networks and see how strongly every representative interacts with "near" colleagues, nearness being related to the above similarity measure.

Please contact us if you are interested in this use case and we will be happy to send you more details. The web mining and network analytics tools of LIONoso can be used for any situation with interacting entities, such as products, customers, employees, or biological systems.

References:, a project of the Participatory Politics Foundation and the Sunlight Foundation.
Download the LIONoso-ready data file: web_mining_politics.lion