I am interested in exploring alternatives to self-report questionnaires for collecting quantifiable data on the phenomenology of specific global states of consciousness. In 2016, I started to use Natural Language Processing (NLP) algorithms to extract meaningful information from the largest available curated corpus of reports of drug-induced states (retrieved from Erowid.org). While most laboratory studies of psychoactive drugs have a sample size of 10 to 20 participants, this corpus contains more than 15,000 unique reports for hundreds of different psychoactive drugs. Although these reports may be biased or confabulatory, and do not originate from controlled experiments, I hypothesised that the sheer size of the corpus, combined with the power of NLP algorithms, would allow me to extract the signal from the noise and obtain valuable data for future work in philosophy of mind and psychology.

After presenting my first preliminary results focusing on reports of drug-induced ego dissolution (see my 2016 presentation), I teamed up with Hannes Kettner (Imperial College London) and Yishu Miao (University of Oxford). Together, we have kept working on this project, and are now preparing a paper presenting our results. Our semantic model is able to quantify the similarity or dissimilarity of the subjective effects of every psychoactive drug in the corpus based on reports, as well as provide information on specific categories of effects (such as bodily effects, visual effects or effects on cognition).
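
To give a concrete sense of what quantifying the similarity of substances from report text can look like, here is a minimal sketch in plain Python. It is not our semantic model: it swaps in a much simpler stand-in technique (TF-IDF weighted bag-of-words vectors compared by cosine similarity), and the substance names, the toy reports, and all function names are hypothetical, invented only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy reports, invented for illustration only (not from the corpus).
REPORTS = {
    "substance_a": [
        "colors became vivid and walls were breathing",
        "vivid patterns with closed eyes",
    ],
    "substance_b": ["vivid colors and geometric patterns everywhere"],
    "substance_c": ["felt sleepy and heavy then slept for hours"],
}


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


def tfidf_vectors(docs):
    """Map each name to a sparse {term: weight} TF-IDF vector.

    `docs` maps a substance name to the token list of all its reports.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per substance

    vectors = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(reports):
    """Pool the reports of each substance and compare every pair."""
    docs = {name: tokenize(" ".join(texts)) for name, texts in reports.items()}
    vectors = tfidf_vectors(docs)
    return {(a, b): cosine(vectors[a], vectors[b]) for a in vectors for b in vectors}


if __name__ == "__main__":
    for pair, score in sorted(similarity_matrix(REPORTS).items()):
        print(pair, round(score, 3))
```

In this toy example, the two substances whose reports share visual vocabulary come out as more similar to each other than either is to the sedative-like one. A bag-of-words comparison of this kind only captures shared vocabulary, so it should be read as an illustration of the general idea rather than of the method itself.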

Beyond drug research, this project is relevant to philosophy and psychology for three reasons. First, it is a proof-of-concept study introducing a new method to gather data about phenomenology from text corpora. This is relevant for any philosophical or scientific project that relies on the analysis of first-person reports (e.g. research on dreaming). NLP algorithms blur the boundary between qualitative and quantitative analysis, and are less susceptible to bias than traditional qualitative analysis. Second, our method allows us to map out for the first time the full spectrum of subjective effects induced by hundreds of psychoactive molecules across all neuropharmacological classes (stimulants, sedatives, dissociative anaesthetics, cannabinoids, opioids, serotonergic psychedelics, empathogens, etc.). Third, this project may reveal which drugs could be used in controlled studies to investigate the neural correlates of specific conscious contents. To take one example, our model confirms that a poorly known drug called Diisopropyltryptamine has distinctive effects that bear on sound perception rather than visual perception; this kind of information could be used to inform future studies.

The main NLP techniques I have been using since 2016 for this project are topic modelling (specifically with Latent Dirichlet Allocation), word embedding (word2vec), and more recently document embedding (doc2vec). Here is a visualization of an early LDA model from 2016, with 15 topics, of all narrative reports for the top 20 substances on Erowid (also available on a separate page):

[Note: this page will be updated with recent results after the paper is submitted]