SMU Research Data Repository (RDR)

File(s) stored somewhere else

Please note: Linked content is NOT stored on SMU Research Data Repository (RDR), and we cannot guarantee its availability, quality, or security, or accept any liability for it.

BuzzCity mobile advertisement dataset

posted on 2014-01-01, 00:00 authored by Living Analytics Research Centre

This competition involves advertisement data provided by BuzzCity Pte. Ltd. BuzzCity is a global mobile advertising network that reaches millions of consumers around the world on mobile phones and devices. In Q1 2012, over 45 billion ad banners were delivered across the BuzzCity network, which consists of more than 10,000 publisher sites that reach an average of over 300 million unique users per month. The number of smartphones active on the network has also grown significantly. Smartphones now account for more than 32% of the phones that are served advertisements across the BuzzCity network.

The "raw" data used in this competition comes in two parts, a publisher database and a click database, both provided in CSV format. The publisher database records each publisher's (also known as partner's) profile and comprises the following fields:

  • publisherid - Unique identifier of a publisher
  • bankaccount - Bank account associated with a publisher (may be empty)
  • address - Mailing address of a publisher (obfuscated; may be empty)
  • status - Label of a publisher, which can be one of the following:
      • "OK" - Publishers whom BuzzCity deems to have healthy traffic (or who slipped past its detection mechanisms)
      • "Observation" - Publishers who may have just started receiving traffic, or whose traffic statistics deviate from the system-wide average. BuzzCity has not yet taken a conclusive stance on these publishers
      • "Fraud" - Publishers who are deemed fraudulent with clear proof. BuzzCity suspends their accounts and their earnings will not be paid
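
The publisher fields above could be read into typed records along the following lines. This is a minimal sketch and not part of the dataset documentation: it assumes the CSV has a header row whose column names match the field names listed above (compared case-insensitively) and that empty fields appear as empty strings. The names Publisher, Status and read_publishers are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Optional


class Status(Enum):
    """The three publisher labels described in the field list."""
    OK = "OK"
    OBSERVATION = "Observation"
    FRAUD = "Fraud"


@dataclass(frozen=True)
class Publisher:
    publisherid: str
    bankaccount: Optional[str]  # None when the field is empty
    address: Optional[str]      # obfuscated; None when the field is empty
    status: Status


def read_publishers(lines: Iterable[str]) -> Iterator[Publisher]:
    """Yield Publisher records from CSV text lines that start with a header row."""
    for raw in csv.DictReader(lines):
        # Normalise header case and surrounding whitespace (an assumption
        # about how the real file may vary, not a documented property).
        row = {
            key.strip().lower(): (value or "").strip()
            for key, value in raw.items()
            if key is not None
        }
        yield Publisher(
            publisherid=row["publisherid"],
            bankaccount=row["bankaccount"] or None,
            address=row["address"] or None,
            status=Status(row["status"]),  # raises ValueError on an unknown label
        )


# Made-up sample rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "publisherid,bankaccount,address,status\n"
    "p001,acct9,addr7,OK\n"
    "p002,,,Fraud\n"
)
publishers = list(read_publishers(sample))
```

Mapping status onto an enumeration makes an unexpected label fail loudly at load time instead of passing through as an arbitrary string.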

The click database, in turn, records click traffic and has the following fields:

  • id - Unique identifier of a particular click
  • numericip - Public IP address of a clicker/visitor
  • deviceua - Phone model used by a clicker/visitor
  • publisherid - Unique identifier of a publisher
  • adscampaignid - Unique identifier of a given advertisement campaign
  • usercountry - Country of the clicker/visitor
  • clicktime - Timestamp of a given click (in YYYY-MM-DD format)
  • publisherchannel - Publisher's channel type, which can be one of the following:
      • ad - Adult sites
      • co - Community
      • es - Entertainment and lifestyle
      • gd - Glamour and dating
      • in - Information
      • mc - Mobile content
      • pp - Premium portal
      • se - Search, portal, services
  • referredurl - URL where the ad banners were clicked (obfuscated; may be empty). More details about the HTTP Referer header can be found in this article.
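
As a rough illustration of how the click fields might be used, the sketch below counts clicks per publisher and expands a channel code into its description. It rests on assumptions not stated in the dataset description: that the CSV has a header row using the field names above, and that the columns appear in the listed order. The names CHANNELS, clicks_per_publisher and channel_name are hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable

# Channel codes and descriptions as given in the field list above.
CHANNELS: Dict[str, str] = {
    "ad": "Adult sites",
    "co": "Community",
    "es": "Entertainment and lifestyle",
    "gd": "Glamour and dating",
    "in": "Information",
    "mc": "Mobile content",
    "pp": "Premium portal",
    "se": "Search, portal, services",
}


def clicks_per_publisher(lines: Iterable[str]) -> Counter:
    """Count click rows per publisherid in CSV text that starts with a header row."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[row["publisherid"]] += 1
    return counts


def channel_name(code: str) -> str:
    """Return the description for a channel code, or the code itself if unknown."""
    return CHANNELS.get(code.strip().lower(), code)


# Made-up sample rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "id,numericip,deviceua,publisherid,adscampaignid,usercountry,"
    "clicktime,publisherchannel,referredurl\n"
    "1,167772161,DeviceA,p001,c10,sg,2012-02-09,mc,\n"
    "2,167772162,DeviceB,p001,c11,id,2012-02-09,mc,ref1\n"
    "3,167772163,DeviceA,p002,c10,sg,2012-02-10,es,\n"
)
counts = clicks_per_publisher(sample)
```

Because publisherid appears in both databases, per-publisher counts like these can be matched against the publisher status label when exploring the data.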

Related Publication: R. J. Oentaryo, E.-P. Lim, M. Finegold, D. Lo, F.-D. Zhu, C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, M. N. Nguyen, K. Perera, B. Neupane, M. Faisal, Z.-Y. Aung, W. L. Woon, W. Chen, D. Patel, and D. Berrar. (2014). Detecting click fraud in online advertising: A data mining approach, Journal of Machine Learning Research, 15, 99-140.

