Hopping Cities
==============
I have plotted the given flight data, along with other relevant data,
as a data-driven dance of airplanes.

You can download it `here <http://big.dada.pink/hopping-cities.mkv>`_.

Legend
------
This plot contains about 20 variables.

Because the challenge focused on this particular dataset, I used the
different sensory media to distinguish between the given data and the
other data I acquired. The given flight data are presented only in the
video, and few other variables are presented in the video; the music
includes none of the given flight data.

Time dimension
^^^^^^^^^^^^^^
The time dimension is the key by which the video and music variables are
joined. Time is divided into months from January 2007 to June 2012
(66 months).  Each month corresponds to 16/9 (1.778) seconds, eight
beats of music, and eight frames of video. The whole song is thus
117 seconds long.
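
The timing arithmetic can be checked with a few lines of R. This is only
a sketch; the variable names are illustrative and are not taken from the
source code. ::

    # Months from January 2007 to June 2012, inclusive
    months <- length(seq(as.Date('2007-01-01'), as.Date('2012-06-01'),
                         by = 'month'))            # 66

    seconds.per.month <- 16 / 9
    beats.per.month <- 8
    frames.per.month <- 8

    # Length of the whole song, about 117.3 seconds
    song.seconds <- months * seconds.per.month

    # Implied tempo (270 beats per minute) and frame rate (4.5 per second)
    beats.per.minute <- 60 * beats.per.month / seconds.per.month
    frames.per.second <- frames.per.month / seconds.per.month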

To compose the video I specify two sets of animations. The first set
describes the outgoing trips for the present month, and the second set
describes incoming trips. Each set is four frames long, so I get eight
frames by playing them in series.

To compose the music, I generate four beats of music for each month.
Then I play this twice, producing the eight beats per month. The music
contains mostly year-level and quarter-level data, so adjacent months
tend to sound similar to each other.

Video
^^^^^
Airplanes are arranged in dancing circles, with dancing circles
positioned on a scatterplot.

Each dancing circle corresponds to a city other than Budapest,
to and from which flights travel.

Circle ~ city
    One dancing circle is one city
Airplanes ~ flights
    One ✈ = 10 flights
Airplane orientation ~ Flight direction
    Airplanes face into the circle for incoming flights and away from
    the circle for outgoing flights.
Circle radius ~ number of flights
    Larger circles mean more flights.
Front-back jump distance ~ passengers per flight
    If there are more passengers per flight, the circle will move in and
    out more.
Left-right jump distance ~ cargo per flight
    If there is more cargo per flight (by mass), the planes in the
    circle move side-to-side more. Most routes do not convey any cargo,
    and these planes thus do not move side-to-side.
Hue ~ Schengen
    Blue plane = flight within the Schengen Zone
Brightness ~ Schedule
    Airplanes in darker color correspond to 10 scheduled flights,
    and airplanes in lighter color correspond to 10 non-scheduled
    flights. Most flights are scheduled, and partial airplanes are not
    included, so I always round in favor of non-scheduled flights.
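
The following sketch shows one possible reading of the rounding rule for
plane counts. It is an assumption, not necessarily what the source code
does: the helper ``count.planes``, its arguments, and the choice to round
the total up are all hypothetical. ::

    # Hypothetical helper: split a circle's planes into scheduled (dark)
    # and non-scheduled (light) ones, with one plane per 10 flights.
    count.planes <- function(scheduled, non.scheduled, flights.per.plane = 10) {
      # Assumption: the total number of planes is rounded up.
      total <- ceiling((scheduled + non.scheduled) / flights.per.plane)
      # Assumption: any partial plane is given to the non-scheduled flights.
      light <- min(total, ceiling(non.scheduled / flights.per.plane))
      c(scheduled = total - light, non.scheduled = light)
    }

    # 57 scheduled and 4 non-scheduled flights give 6 dark planes and
    # 1 light plane under these assumptions.
    count.planes(scheduled = 57, non.scheduled = 4)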

Any particular frame displays data from a particular direction of travel
(incoming or outgoing) and month. Each circle thus corresponds to a
route-month, and the number of planes in the circle corresponds to the
number of flights on that route in that month.

In order to apply the jump aesthetics, I in fact made four frames for
each route-month. The planes are all the way in on the first of these
frames, all the way out on the third, and in the middle on the second
and fourth.
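
The four-frame cycle can be sketched as a vector of jump phases. The
helper ``frame.radii`` and its arguments are hypothetical names used only
for illustration. ::

    # Phase of the jump in each of the four frames:
    # all the way in, middle, all the way out, middle.
    jump.phase <- c(-1, 0, 1, 0)

    # Hypothetical helper: distance of a plane from the center of its
    # circle in each frame, given a resting radius and a jump distance.
    frame.radii <- function(radius, jump.distance) {
      radius + jump.distance * jump.phase
    }

    frame.radii(radius = 5, jump.distance = 2)
    # [1] 3 5 7 5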

The circle is positioned based on country-level data for its city.
Cities in the same country have the same country-level data and thus form
concentric circles.

x ~ distance
    Distance between the other country and Hungary, measured as the
    haversine distance between the two country centroids.
y ~ log(population)
    Logarithm of population of the other country
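
For reference, a minimal haversine function might look like the
following. The function name, its argument order, and the mean Earth
radius of 6371 km are assumptions; the source code may compute the
distance differently. ::

    # Great-circle distance in kilometers between two points whose
    # latitude and longitude are given in degrees.
    haversine <- function(lat1, lon1, lat2, lon2, radius = 6371) {
      to.radians <- pi / 180
      phi1 <- lat1 * to.radians
      phi2 <- lat2 * to.radians
      dphi <- phi2 - phi1
      dlambda <- (lon2 - lon1) * to.radians
      a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
      # pmin guards against rounding errors pushing a slightly above 1
      2 * radius * asin(sqrt(pmin(1, a)))
    }

    # A quarter of the way around the equator, about 10008 km
    haversine(0, 0, 0, 90)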

Music
^^^^^
Recall that the same four beats of music are played twice per month.
The main tune starts with a baseline note for one half-beat, and the
remaining seven half-beats have pitches corresponding to proportional
changes in statistics.

1. Baseline note
2. Proportional change in Hungary retail sales since last year
3. Proportional change in Hungary imports since last year
4. Proportional change in Hungary exports since last year
5. Proportional change in Hungary hotel beds since last year
6. Proportional change in Hungary other beds since last year
7. Proportional change in Budapest population since last year
8. Proportional change in Budapest dwellings since last year

Pitches are chosen along a just-tuned major seventh chord.
For each 1% change, the note moves up or down by one step along the
chord. (This linear approximation of percent change is appropriate only
because the change each year is small.) For example, if imports go up
2% one year, the third note in the chord is played. If there is no
change since the previous year, the percent change is 0%, and the
baseline note is thus played.
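
The mapping from percent change to pitch can be sketched as follows. The
chord ratios (the common 8:10:12:15 tuning), the 220 Hz baseline, the
octave wrapping, and the helper ``pitch`` are all assumptions made for
illustration. ::

    # Assumed frequency ratios for a just-tuned major seventh chord
    chord <- c(1, 5/4, 3/2, 15/8)

    # Hypothetical helper: frequency for a given percent change, moving
    # one chord step per 1% and wrapping into the next octave as needed.
    pitch <- function(percent.change, baseline = 220) {
      step <- round(percent.change)
      octave <- step %/% length(chord)
      baseline * 2^octave * chord[step %% length(chord) + 1]
    }

    pitch(0)   # 220, the baseline note
    pitch(2)   # 330, the third note in the chord
    pitch(-1)  # 206.25, one step below the baseline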

Percussion ~ Summer
    A low drum plays on every downbeat. During the summer
    (high travel season), a high clap is added on every second upbeat.
Volume ~ Gross domestic product
    Volume of everything except the percussion is based on the change in
    GDP since the previous year (measured quarterly). It is high for
    large increases, low for large decreases, and in between when there
    is little change.
Number of grace notes ~ Unemployment rate
    Half-beat grace notes can be added to every beat.
    A grace note is added for a particular beat if there is a successful
    result of a Bernoulli trial with probability equal to the normalized
    unemployment rate for the particular month.
Pitch of grace notes ~ Direction of change in unemployment rate
    If a grace note is added to a beat, the pitch is chosen to be one
    step up or down (along the aforementioned chord) from the main pitch
    of the beat. It is one step up if the unemployment rate is going
    down, one step down if the unemployment rate is going up.
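
A sketch of the grace-note rule, under stated assumptions: "normalized"
is taken to mean min-max scaling to the range 0 to 1, an unchanged rate
is treated like a rising one, and the helper ``grace.notes`` and its
arguments are hypothetical. ::

    # Hypothetical helper: for each of the four beats, return 1 for a
    # grace note one step up, -1 for one step down, and 0 for none.
    grace.notes <- function(unemployment, previous.unemployment,
                            min.rate, max.rate, beats = 4) {
      # Assumption: normalize the rate to a probability by min-max scaling
      p <- (unemployment - min.rate) / (max.rate - min.rate)
      # One Bernoulli trial per beat
      added <- rbinom(beats, size = 1, prob = p) == 1
      # One step up if unemployment is falling, one step down otherwise
      direction <- if (unemployment < previous.unemployment) 1 else -1
      ifelse(added, direction, 0)
    }

    set.seed(1)
    grace.notes(unemployment = 10, previous.unemployment = 11,
                min.rate = 7, max.rate = 12)

Because the trials are random, each rendering of the music can place the
grace notes on different beats unless the seed is fixed.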

The given flight dataset
--------------------------------------
In the given flight dataset, each record is a route in a given month. A
route might have multiple flights per month. The variables are:

* Other city
* Other country
* Incoming or outgoing
* Scheduled or not
* Month
* Number of passengers (total across all flights)
* Cargo weight (total cargo weight across all flights)
* Number of flights (flight on the day)
* Seat capacity (single airplane seat capacity times number of flights)

I used all of these variables in my analysis except for the seat
capacity, as I thought it was unfair to cargo that there was a seat
capacity but not a cargo capacity.

I suspect that each row in fact corresponds to an airport rather than to
a city because large cities (with multiple airports) have multiple records
for a particular month-scheduling-direction. ::

    flights.sub <- subset(flights, month=='2009-08-01'&
                          scheduled=='Scheduled'&direction=='Outgoing')
    # Show the 20 cities with the most records
    print(tail(sort(table(flights.sub$other.city.id)), 20))
    # Timisoara      Tirana Tirgu Mures     Treviso    Uzhgorod     Valetta
    #         1           1           1           1           1           1
    #     Varna      Venice      Warsaw        Wien      Zagreb      Zürich
    #         1           1           1           1           1           1
    #    Berlin    Göteborg       Milan      Moscow        Oslo       Paris
    #         2           2           2           2           2           2
    # Stockholm      London
    #         2           3

Dependencies
------------
One should be able to run the present analysis on a modern operating
system with the following extra packages.

* ffmpeg (for converting audio file formats)
* sox (for playing audio files)
* R (main composition language)

  * grid library

You will need the following packages to build the data files.

* wget (for downloading data files)
* unzip
* Gnumeric (ssconvert for making CSVs)

I have saved the built data files, so it is possible to run the analysis
without the second set of packages. (Run ``make clean`` in the ``data``
directory to delete them.)

I ran the analysis on OpenBSD 5.9.

Data sources
------------
Aside from the given flight data, most of the other data come from the
`Hungarian Central Statistical Office <http://www.ksh.hu/>`_.
Full details are in the
`source code <http://src.thomaslevine.com/satrday-budapest-rdataviz/artifact/000e60dc58722d03>`_.

With the aforementioned dependencies, the present analysis should be
fully reproducible from the files in the
`repository <http://src.thomaslevine.com/satrday-budapest-rdataviz/dir?ci=tip&type=tree>`_.
Run ``make``, and then open ``/tmp/ugros.mkv`` in a video player.

References
----------

* `SatRdays data visualization challenge <http://budapest.satrdays.org/#datavizcompo>`_
* `Data music </!/data-music/>`_