Ethnographic Data Science and eHRAF: inferring cultural patterns from ethnographic writing

Word2Vec Network

What can artificial intelligence and machine learning contribute to anthropology, the most human of disciplines? Ethnography combines elements of science and humanities, with ethnographic writing mediating exposition and science. Data science techniques such as Natural Language Processing (NLP) and statistical analysis can lead to a better understanding of the patterns that emerge from ethnographers' accounts of their observations and experiences within or across societies and cultures.

iKLEWS (Infrastructure for Knowledge Linkages from Ethnography of World Societies) is an NSF-funded HRAF project that uses data science to create digital semantic infrastructure and associated computer services supporting research based on HRAF’s growing ethnographic database, eHRAF World Cultures. A basic goal of iKLEWS is to greatly expand the support that eHRAF World Cultures offers for scientific, scholarly and applied research.

Demonstration at the AAA/CASCA Annual Meeting in Toronto, Canada (November 16-18)

On Friday, November 17 from 1:30-2:30pm, there will be a demonstration at the HRAF booth in the exhibit hall at Metro Toronto Convention Centre to illustrate how the developments produced via iKLEWS can facilitate and enhance research using the eHRAF World Cultures database. The demonstration by Prof. Michael Fischer, HRAF Vice President, will showcase a web app featuring prototype tools and analysis for HRAF data including visualizations such as concept maps and 3D Principal Component Analysis (PCA) graphs.

Network of words seeded from Kinship

In NLP, the Word2Vec algorithm uses a neural network model trained on a large corpus of text to reconstruct the linguistic contexts of words. Because each word in the corpus is assigned a “vector” in a shared space, with words used in similar contexts placed close together, the model is able to learn word associations, detect synonyms, and suggest other related terms. PCA plots offer another way to visualize these vectors by projecting them into a 3D field. As the demonstration will show, a few concepts entered as keywords can be expanded into lists of related terms drawn from the eHRAF corpus. Networks of these terms and their interconnections, together with relevant excerpts from ethnographic texts, can lead the researcher to further related material.
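The keyword expansion described above can be illustrated with a minimal sketch. It does not train a Word2Vec model and is not the iKLEWS implementation. It assumes word vectors already exist and shows only the step of ranking terms by cosine similarity to a seed and linking closely related terms into a network. The vectors, the function names (expand, build_network) and the 0.95 threshold are all invented for illustration, not drawn from the eHRAF corpus.

```python
import math

# Toy 3-dimensional "embeddings". These values are invented for illustration
# and are NOT taken from eHRAF or from any trained Word2Vec model.
VECTORS = {
    "kinship":  [0.90, 0.10, 0.00],
    "descent":  [0.80, 0.20, 0.10],
    "lineage":  [0.85, 0.15, 0.05],
    "marriage": [0.60, 0.50, 0.10],
    "harvest":  [0.00, 0.20, 0.90],
    "planting": [0.10, 0.10, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand(seed, vectors, top_n=3):
    """Return the top_n (term, similarity) pairs closest to the seed, best first."""
    scores = [
        (term, cosine(vectors[seed], vec))
        for term, vec in vectors.items()
        if term != seed
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


def build_network(seed, vectors, top_n=3, threshold=0.95):
    """Link the seed and its neighbours wherever similarity meets the threshold."""
    terms = [seed] + [term for term, _ in expand(seed, vectors, top_n)]
    edges = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                edges.append((a, b, round(sim, 3)))
    return edges


if __name__ == "__main__":
    for term, sim in expand("kinship", VECTORS):
        print(f"{term}: {sim:.3f}")
    for edge in build_network("kinship", VECTORS):
        print(edge)
```

With a trained model, the toy dictionary would be replaced by the model's learned vectors. The ranking and linking steps would stay the same in outline, though a production tool would likely use a vectorized similarity search instead of nested loops.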

PCA plot of kinship

Learn more about how to find HRAF at the AAA/CASCA 2023 Annual Meeting.