I Want Hue
by Mathieu Jacomy
Colors for data scientists. Generate and refine palettes of optimally distinct colors.
iWantHue allows you to generate palettes of colors. It lets you control the properties of a palette by setting ranges of Hue, Chroma (unbiased saturation) and Lightness. You can generate palettes of any size or just get the generator for a JavaScript project. The algorithm optimizes the perceptual distance between colors within the chosen color subspace, ensuring optimal readability.
How it works
- K-means or force vector repulsion algorithms ensure an even distribution of colors
- The CIE Lab color space is used for computation, since it fits human perception
- The Hue/Chroma/Lightness color space is used to set constraints, since it is user-friendly
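The three points above can be illustrated with a minimal Python sketch of the k-means variant. It is not iWantHue's own code: the function names, the default ranges and the sRGB sampling grid are assumptions made for illustration, and Hue/Chroma are taken here as the polar form of the Lab a/b axes.

```python
import math
import random

def srgb_to_lab(r, g, b):
    """Convert sRGB components in [0, 1] to CIE Lab (D65 white point)."""
    def linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = linear(r), linear(g), linear(b)
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def in_range(lab, hue, chroma, lightness):
    """Check a Lab color against hue (degrees), chroma and lightness ranges."""
    l, a, b = lab
    h = math.degrees(math.atan2(b, a)) % 360
    c = math.hypot(a, b)
    if hue[0] <= hue[1]:
        hue_ok = hue[0] <= h <= hue[1]
    else:  # range wrapping around 360 degrees, e.g. (300, 60)
        hue_ok = h >= hue[0] or h <= hue[1]
    return hue_ok and chroma[0] <= c <= chroma[1] and lightness[0] <= l <= lightness[1]

def generate_palette(k, hue=(0, 360), chroma=(30, 80), lightness=(35, 80),
                     steps=12, iterations=20, seed=0):
    """Pick k colors spread evenly (k-means in Lab) inside the constraints."""
    # Candidate colors: a regular sRGB grid, filtered by the user constraints.
    grid = [i / (steps - 1) for i in range(steps)]
    samples = []
    for r in grid:
        for g in grid:
            for b in grid:
                lab = srgb_to_lab(r, g, b)
                if in_range(lab, hue, chroma, lightness):
                    samples.append((lab, (r, g, b)))
    if len(samples) < k:
        raise ValueError("constraints too tight for the requested palette size")
    rng = random.Random(seed)
    centers = [lab for lab, _ in rng.sample(samples, k)]
    for _ in range(iterations):
        # Assign every candidate to its nearest center, then move the centers.
        clusters = [[] for _ in range(k)]
        for lab, _ in samples:
            nearest = min(range(k), key=lambda i: math.dist(lab, centers[i]))
            clusters[nearest].append(lab)
        centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    # Snap each center to the closest valid candidate so it is displayable.
    palette = []
    for center in centers:
        _, rgb = min(samples, key=lambda s: math.dist(s[0], center))
        palette.append("#%02x%02x%02x" % tuple(round(v * 255) for v in rgb))
    return palette

print(generate_palette(5))
```

Distances are measured in Lab while the constraints are expressed as hue, chroma and lightness ranges, which mirrors the split described above between the computation space and the user-facing space.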
Examples and a tutorial
Idea
The idea behind iWantHue is to distribute colors evenly, in a perceptually coherent space,
constrained by user-friendly settings, to generate high-quality custom palettes.
Explanations and an experiment on color theory
ScienceScape
by Mathieu Jacomy
Helpers for scientometrics. Convert files, get networks, visualize stuff from Scopus or Web of Science.
ScienceScape allows you to extract the DOIs (Digital Object Identifiers) from the list of references of a paper. It adds a column to your table of papers containing the list of cited DOIs. You then have the DOI of each paper and the DOIs of the papers it cites, so you can build the citation network in another tool (Table2Net).
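As a rough illustration of this kind of extraction, here is a minimal Python sketch. It is not ScienceScape's code: the regular expression is a commonly used approximation of DOI syntax, and the column names "References" and "Cited DOIs" are hypothetical.

```python
import re

# Approximate DOI syntax: "10.", a 4-9 digit registrant code, "/", then a suffix.
# Assumption: references are separated by ";" or "," and contain no quoted DOIs.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s;,"]+')

def extract_dois(references):
    """Return the distinct DOIs found in a references string, in order."""
    found = []
    for match in DOI_PATTERN.findall(references):
        # Simplification: strip trailing punctuation, lower-case for comparison.
        doi = match.rstrip(".)]").lower()
        if doi not in found:
            found.append(doi)
    return found

def add_cited_dois(rows, references_column="References", output_column="Cited DOIs"):
    """Add a column listing the DOIs cited by each paper (one dict per row)."""
    for row in rows:
        row[output_column] = "; ".join(extract_dois(row.get(references_column, "")))
    return rows

papers = [{
    "Title": "Some paper",
    "References": "Smith J., 2010, Nature, DOI 10.1038/nature09000; "
                  "Doe A., 2012, doi:10.1000/xyz123.",
}]
print(add_cited_dois(papers)[0]["Cited DOIs"])
```

Stripping trailing punctuation is a simplification: a DOI may legitimately end with a closing parenthesis, so a real tool would need a more careful rule.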
HeatGraph
by Mathieu Jacomy
Visualize densities on spatialized networks. Get a global heatmap or a heatmap of the neighborhood of a given node.
Heatgraph allows you to upload a GEXF graph from Table2Net or Gephi. Choose settings and produce a heat map of your network.
Table 2 Net
by Mathieu Jacomy
Extract a network from a table. Set a column for nodes and a column for edges. It deals with multiple items per cell.
Table2Net allows you to get a network from a table. You will get a GEXF file that you can visualize and analyze with Gephi. You may extract different types of networks from a table. It depends on how you use columns to build the nodes and the edges:
- Normal: if you want a single type of nodes, for instance authors. They will be linked when they share a value in another column, for instance papers.
- Bipartite: if you want two types of nodes, for instance authors and papers. They will be linked when they appear in the same row of the table.
- Citation: if you have a column containing references to another one, for instance paper title and cited papers (title).
- No link: a single type of nodes, without links.
You can add columns as metadata for the nodes and/or edges, set a separator if you have multiple items per cell, and set a column as time if you want a dynamic network.
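The "normal" and "bipartite" modes can be sketched in a few lines of Python. This is an illustration under assumptions, not Table2Net's implementation: the function names, the ";" separator and the sample table are hypothetical, and the GEXF export, metadata and time column are left out.

```python
from itertools import combinations

def split_cell(cell, separator):
    """Split a cell holding multiple items and drop empty entries."""
    return [item.strip() for item in cell.split(separator) if item.strip()]

def bipartite_edges(rows, col_a, col_b, separator=";"):
    """Link every item of col_a to every item of col_b found in the same row."""
    edges = set()
    for row in rows:
        for a in split_cell(row[col_a], separator):
            for b in split_cell(row[col_b], separator):
                edges.add((a, b))
    return edges

def normal_edges(rows, node_col, link_col, separator=";"):
    """Link two node_col items when they share at least one link_col value."""
    members = {}
    for row in rows:
        nodes = split_cell(row[node_col], separator)
        for link in split_cell(row[link_col], separator):
            members.setdefault(link, set()).update(nodes)
    edges = set()
    for nodes in members.values():
        # Sorting gives each undirected pair a single canonical orientation.
        edges.update(combinations(sorted(nodes), 2))
    return edges

table = [
    {"paper": "P1", "authors": "Alice; Bob"},
    {"paper": "P2", "authors": "Bob; Carol"},
]
print(sorted(bipartite_edges(table, "authors", "paper")))
print(sorted(normal_edges(table, "authors", "paper")))
```

In the sample table, Alice and Carol are not linked in the normal mode because they never share a paper, while Bob is linked to both.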
ANTA, actor-network text analyzer
by Daniele Guido, Paul Girard
ANTA, or Actor Network Text Analyzer, is a piece of software developed by the Sciences Po médialab to analyse medium-size text corpora by extracting the expressions contained in a set of texts and drawing a network of the occurrences of these expressions in the texts.
goals
Anta is a web platform based on Zend Framework which serves two main goals:
- simplify as much as possible the researchers' workflow in text analysis
- build a graph from the set of documents, for co-word analysis
Using ANTA is very easy. We will introduce its usage by reading the image displayed on the login page of the software.

the 5-step workflow
The researchers' workflow has been subdivided into 5 main steps, from the creation of the corpus to the extraction of words (the so-called entities).
As soon as users have uploaded their texts, and while they are tagging them, the system works in the background, analyzing the texts to extract the expressions occurring in them. To do so, Anta draws on Alchemy (Orchestr8 text API). Thanks to Alchemy, Anta identifies the n-grams (expressions of n words) recurring in the text and is even capable of recognizing 'named entities' as such. It knows, for example, that Alice is a person's name, that France is a country and Paris a city. The expressions identified by Anta are called entities.
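To make the notion of recurring n-grams concrete, here is a minimal Python sketch. It is only an illustration: Anta delegates this work to Alchemy, whose method is not described here, and the sketch does no named-entity recognition. The function names, the tokenization rule and the thresholds are assumptions.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Return all expressions of n consecutive words found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def recurring_ngrams(text, max_n=3, min_count=2):
    """Count the n-grams of 1 to max_n words and keep those that recur."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(text, n))
    return {gram: count for gram, count in counts.items() if count >= min_count}

sample = "Actor network theory meets text analysis. Actor network theory again."
print(recurring_ngrams(sample))
```

Note that this simple tokenizer ignores sentence boundaries, so an n-gram may span two sentences.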
- Step one. Include documents, via upload or import.
First of all, users are asked to upload the texts composing their corpus to the system. ANTA can read txt and pdf documents, and it supports doc files via catdoc.
- manual upload
- import Google results (up to 100 documents per query)
- import via JSON api
- Step two. Tag documents and select subsets.
After having uploaded all their texts, users are asked to categorize them according to the classification that best suits their research interests. Documents, for example, can be tagged by author, by type, by subject and any other taxonomy used by the researcher.
- tag based selection to focus on small subsets
- Google spreadsheet import / export of the list of documents (limited features)
- Step three. Include entities.
Once the system has completed its extraction, users are asked to choose which entities they want to include in their analysis and which ones to exclude. Even for relatively small corpora, the number of extracted entities is often surprisingly large, too large for manual filtering. ANTA offers a semi-automatic filtering system that helps users reduce the number of entities they will analyze.
- entities selection through visual TF/IDF measures
- Step four. Tag and merge entities.
To facilitate the analysis, users can tag the included entities according to the classification that best suits their research interests. It is also possible to merge entities that are synonymous within the scope of the research.
- Step five. Export the graph.
The last step consists in exporting the graph of tagged documents and tagged entities. ANTA's export system delivers a GEXF file containing a bipartite network of documents and entities. A document node and an entity node are connected if the entity occurs in the document.
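Steps three and five can be sketched together in Python: score entities with a textbook TF-IDF, keep the informative ones, and write a bipartite document/entity graph as GEXF. This is a sketch under assumptions, not ANTA's code: the exact measure ANTA uses is not specified here, the input structure and function names are hypothetical, and only a minimal subset of GEXF is produced.

```python
import math
import xml.etree.ElementTree as ET

def tf_idf(doc_entities):
    """Score each (document, entity) pair; doc_entities maps a document id
    to a dict of {entity: occurrence count}."""
    n_docs = len(doc_entities)
    doc_freq = {}
    for counts in doc_entities.values():
        for entity in counts:
            doc_freq[entity] = doc_freq.get(entity, 0) + 1
    scores = {}
    for doc, counts in doc_entities.items():
        total = sum(counts.values())
        for entity, count in counts.items():
            idf = math.log(n_docs / doc_freq[entity])
            scores[(doc, entity)] = (count / total) * idf
    return scores

def to_gexf(doc_entities, included):
    """Serialize documents and included entities as a bipartite GEXF graph."""
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")
    for doc in doc_entities:
        ET.SubElement(nodes, "node", id="doc:" + doc, label=doc)
    for entity in sorted(included):
        ET.SubElement(nodes, "node", id="ent:" + entity, label=entity)
    edge_id = 0
    for doc, counts in doc_entities.items():
        for entity in counts:
            if entity in included:
                ET.SubElement(edges, "edge", id=str(edge_id),
                              source="doc:" + doc, target="ent:" + entity)
                edge_id += 1
    return ET.tostring(gexf, encoding="unicode")

corpus = {
    "a.txt": {"france": 2, "paris": 1},
    "b.txt": {"france": 1, "alice": 3},
}
scores = tf_idf(corpus)
# Keep entities that are informative in at least one document.
included = {entity for (doc, entity), score in scores.items() if score > 0}
print(to_gexf(corpus, included))
```

In this toy corpus "france" appears in every document, so its score is zero and it is filtered out, which is the kind of reduction a TF/IDF-based selection is meant to support.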
features
For each document of the corpus, Anta executes a script, the "distiller", which performs a sequence of processes chosen by the user.
- chainable plugin structure
- one click analysis start and smart logging
- JSON api interfaces:
  - standard api, providing basic methods to upload, tag
  - a "frog" api, relationships between documents and entities, lite api driven search engine
  - a "squid" api, that implements the entities tf/idf visualization
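The idea of a chainable plugin structure with logging can be pictured with a short Python sketch. Everything in it is hypothetical (the plugin names, the document dictionary and the log format); it only illustrates how a per-document script might run a user-chosen sequence of processes, not how Anta's distiller is actually written.

```python
def lowercase(document):
    """Hypothetical plugin: normalize the text to lower case."""
    document["text"] = document["text"].lower()
    return document

def tokenize(document):
    """Hypothetical plugin: split the text into whitespace-separated tokens."""
    document["tokens"] = document["text"].split()
    return document

def count_tokens(document):
    """Hypothetical plugin: store the number of tokens."""
    document["token_count"] = len(document["tokens"])
    return document

def distill(document, plugins, log):
    """Run each plugin in order, feeding its output to the next one."""
    for plugin in plugins:
        document = plugin(document)
        log.append("%s: %s done" % (document["name"], plugin.__name__))
    return document

log = []
doc = {"name": "a.txt", "text": "Paris is a City"}
doc = distill(doc, [lowercase, tokenize, count_tokens], log)
print(doc["token_count"], log)
```

Because every plugin takes a document and returns a document, the user can reorder, add or drop steps without changing the runner.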
external dependencies
Anta makes use of external text analysis services, like Alchemy Api from Orchestr8 http://www.alchemyapi.com/
- PHP libraries:
  - zend
- Python libraries:
  - nltk for stemming features and sentence tokenizer, cf. http://www.nltk.org/
  - beautifulsoup, cf. http://www.crummy.com/software/BeautifulSoup/
  - networkx, cf. http://networkx.lanl.gov/
issue 2 navicrawler
by Paul Girard
webcorpus
A Python library that converts an issuecrawler XML file into the navicrawler wxsf (XML) file format.
issuecrawler
A crawler for social scientists, designed and hosted by the Digital Methods Initiative.
Give the tool a list of URLs (or a web page containing URLs) and let it crawl.
You can set the crawler to snowball or co-link analysis.
For more info: http://www.issuecrawler.net
navicrawler
A Firefox plugin that helps social scientists build web corpora.
It was designed by Mathieu Jacomy.
With this tool you crawl the web by browsing, building your corpus little by little.
For more info: see the web atlas website
use it online