What is this?

Language Explorer is a proof-of-concept data analysis tool that helps identify language groups in particular need of Bible translation, drawing on various data sources and linguistic databases. This instance focuses on the languages of Aboriginal Australia. I love our Aboriginal people and want to see them have access to the Bible, preferably in their native tongue.

Where to start

Background

There are navigation links and fields at the bottom of each page. In the sections below, I refer to them collectively as the navigation section. When a language's name and ISO 639 code are displayed in the tool, their colouring and formatting convey meaning according to the following legend:

Legend

Translation State: Whole Bible, New Testament, Portions, One book, No Scripture, Record Absent

L1 Speakers (from Joshua Project): None, 0-9, 10-99, 100+, Unknown

ISO Retirement state: Retired, Active

For a specific language group

The tool aims to have a page for each Aboriginal language group. If a group has current speakers, it should be accessible in the tool. If it has no known speakers, it may be listed (many are, but I know many are not). The page for an individual group presents data from the available sources along with a map showing the approximate location of the speakers. Where a group is concentrated in a small area the map is helpful, but where it is spread over a large area the map can be misleading. Finally, the page links to the available data sources, including those whose data are not included in the tool. To look for a specific language group:

Use the Search field in the navigation section. You can search by name or by ISO 639 language code. Anglicised spelling is a bit hit-and-miss, so if you can't find what you're looking for, try an alternative spelling.
Follow the All languages link in the navigation section. This gives you a list sorted by ISO 639 language code.
Follow the Map link in the navigation section. You can pan and zoom to explore language groups, positioned according to the Tindale and WALS data sources.
Go straight to one of the examples: awk (Awabakal), bvr (Burarra) or wrm (Warumungu).

For Data Analysis

The tool provides a way to analyse key data for language groups, to help assess the state of Bible translation and the ability of a language group to fall back on an English translation of scripture when the preferred option, a translation in their heart language, is not available. For this kind of analysis, follow the Language Table link in the navigation section. It shows a table of all language groups known to the tool; each column can be filtered, and the table can be sorted by a single column. When using the table:

Columns can be sorted, though a sort can only be active on one column at a time. Click the column header to toggle between ascending and descending order; the arrow shows the direction.
Columns can be filtered, and filters can be active on multiple columns simultaneously. Enter the text to match in the Search (column name) field above the column header. To match any number, enter [0-9]. Tech note: the field accepts Perl regular expressions.
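The table's filter-and-sort behaviour can be modelled with a short Python sketch. It assumes (this is not confirmed from the tool's code) that each column filter is a regular expression searched anywhere in the cell text, that filters on different columns are combined with AND, and that numeric columns sort by value. Python's `re` module stands in for Perl regex here; `apply_table_query`, `_sort_key` and the sample rows are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical row shape: column name -> cell text as displayed.
Row = Dict[str, str]


def _sort_key(text: str):
    """Sort numeric cells by value and everything else as text."""
    try:
        # Strip thousands separators so "1,200" sorts as 1200.
        return (0, float(text.replace(",", "")), "")
    except ValueError:
        # Non-numeric (including empty) cells sort after numbers in ascending order.
        return (1, 0.0, text)


def apply_table_query(
    rows: List[Row],
    filters: Dict[str, str],
    sort_column: Optional[str] = None,
    descending: bool = False,
) -> List[Row]:
    """Keep rows matching every column filter, then sort by one column."""
    # Assumption: each filter is a regex searched anywhere in the cell text.
    compiled = {col: re.compile(pattern) for col, pattern in filters.items()}
    matched = [
        row
        for row in rows
        if all(rx.search(row.get(col, "")) for col, rx in compiled.items())
    ]
    if sort_column is not None:
        # Only one sort column, as in the Language Table.
        matched.sort(key=lambda row: _sort_key(row.get(sort_column, "")), reverse=descending)
    return matched


if __name__ == "__main__":
    rows = [
        {"Name": "Language A", "Cannot Read English Bible Count": "35"},
        {"Name": "Language B", "Cannot Read English Bible Count": ""},
        {"Name": "Language C", "Cannot Read English Bible Count": "1,200"},
    ]
    # "[0-9]" keeps any row with a number in the column, dropping blanks.
    result = apply_table_query(
        rows,
        {"Cannot Read English Bible Count": "[0-9]"},
        sort_column="Cannot Read English Bible Count",
        descending=True,
    )
    print([row["Name"] for row in result])  # ['Language C', 'Language A']
```

Sorting on a parsed value rather than the raw text avoids "35" sorting above "1,200", which a plain string sort would do.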

Examples:

Language groups with the most people who cannot read an English Bible, where census data maps cleanly to a language group. Filter Translations for No (matches "No scripture"), then sort by Cannot Read English Bible Count, descending.
Language groups of at least 100 people where most have good English but there's no scripture in their heart language. Filter Translations for No and Speakers (Joshua Project) for [0-9][0-9][0-9] (matches any value containing three consecutive digits, i.e. 100 or more), then sort by Potential English Bible Users (%), descending.
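To see what the example filter patterns match, this snippet runs them against the Translation State values from the legend and a few made-up speaker counts. It assumes the cells contain exactly this text and that matching is a case-sensitive search anywhere in the cell, which is what Python's `re.search` does; the tool's own matching may differ.

```python
import re

# Translation State values from the legend; exact cell text is an assumption.
translation_states = [
    "Whole Bible", "New Testament", "Portions",
    "One book", "No scripture", "Record Absent",
]

# Example 1: "No" found anywhere in the cell (case-sensitive here).
no_matches = [s for s in translation_states if re.search("No", s)]

# Example 2: three consecutive digits anywhere in the cell.
three_digits = re.compile("[0-9][0-9][0-9]")
speaker_counts = ["7", "42", "100", "999", "1,500"]  # made-up values
big_groups = [c for c in speaker_counts if three_digits.search(c)]

print(no_matches)  # ['No scripture']
print(big_groups)  # ['100', '999', '1,500']
```

Because the pattern only needs three consecutive digits somewhere in the cell, comma-formatted counts such as "1,500" still match, so the filter selects counts of 100 or more.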

Data sources

Joshua Project - Import of Harvest database
WALS - Import of WALS 2013 database
Australian Census 2011 - Import of custom table data
SIL - Import of ISO 639-3 retired code element mappings
Find A Bible - Web scrape of relevant Australian Aboriginal Bible translation data
Austlang - Web scrape of Australian Aboriginal language data
Tindale's Catalogue of Australian Aboriginal Tribes - Web scrape

Some more information is available in the DataSources.md file.

Disclaimer on data quality

As this is a proof of concept, the focus has been on integrating discrete data sources. While care has been taken to identify errors during the data import, this tool should not be used to drive decision-making without a review of the aggregation process and scrutiny of the data itself. This deployment is intended only for casual review of the aggregation engine's output and to show what is possible with this style of aggregation and analysis.

Licensing

I have not attempted to obtain licences for the data that is imported into the database and displayed in the tool. If you are from an organisation whose data I am using, and I am violating your licence agreement by displaying it in this proof-of-concept instance, please contact me.

This software itself is licensed under the terms of the Licence file in the GitHub repository.

Deployment

Should you wish to deploy this yourself, be warned that it's not a smooth process. It is repeatable for me, and there is some documentation in the DataSources.md and install.md documents in the "docs" directory. Contact me if you have any problems; I'm happy to help. All the source code is available in my GitHub repository.

-- Edwin Steele (edwin@wordspeak.org).