USDataverse's Wikipedia Hourly Page Views Time Series Data records hourly view counts for globally popular Wikipedia pages across the whole of January 2024. The dataset is structured as a time series: each record gives a page title and its unique view count for one full hour, within a given language version (domain). It supports in-depth analysis of Wikipedia page view behavior, shifts in user interest, and the access trends of specific pages.
Data Uniqueness
- High Temporal Resolution & Complete Monthly Coverage: This dataset provides hourly page view data covering the entire calendar month of January 2024. The fine-grained time series lets analysts track intra-day fluctuations, identify exact peak traffic times (such as hour-by-hour reactions following specific news releases), and perform precise cycle analyses (e.g., daily and weekly cycles). Compared with public datasets that provide only daily or monthly aggregates, the hourly resolution makes it possible to study micro-level behavior and detect short-lived trends.
- Focus on Popular Pages, High Data Density: The data is filtered to include only pages with at least 10 views per day, averaging 5 to 6 million records per day. Each record represents a topic, person, or event that attracted significant public attention that month. This makes the dataset valuable for studying social hot topics, cultural trends, and the global dissemination and impact of major news events. Because low-traffic pages are excluded, the remaining records have a high signal-to-noise ratio, which improves research efficiency and depth.
- Standardized Structure Across Language/Regional Dimensions: Each record carries a domain_code field that identifies the Wikipedia sub-project the page belongs to. This standardized structure makes cross-language and cross-cultural comparisons straightforward. For example, researchers can analyze temporal and spatial differences in public attention to the same international event across different language user groups, or measure how active specific cultural topics are within their primary language communities.
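The hourly cycle analysis and cross-language comparison described above could be sketched roughly as follows. This is a minimal illustration, not the dataset's official tooling: only `domain_code` is a field name taken from the description, while the timestamp format, the tuple layout, the helper names, and all sample values are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (timestamp, domain_code, page_title, view_count).
# Only domain_code is a field name confirmed by the dataset description;
# the other field names and every value below are invented for illustration.
SAMPLE_ROWS = [
    ("2024-01-01 00:00", "en", "Example_Page", 120),
    ("2024-01-01 01:00", "en", "Example_Page", 300),
    ("2024-01-02 01:00", "en", "Example_Page", 280),
    ("2024-01-01 00:00", "de", "Example_Page", 90),
    ("2024-01-01 13:00", "de", "Example_Page", 40),
    ("2024-01-02 00:00", "de", "Example_Page", 110),
]


def hourly_profile(rows, page_title):
    """Return {domain_code: {hour_of_day: mean views}} for one page title."""
    totals = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(lambda: defaultdict(int))
    for ts, domain, title, views in rows:
        if title != page_title:
            continue
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
        totals[domain][hour] += views
        counts[domain][hour] += 1
    return {
        domain: {h: totals[domain][h] / counts[domain][h] for h in hours}
        for domain, hours in totals.items()
    }


def peak_hour(profile):
    """Return {domain_code: hour of day with the highest mean views}."""
    return {domain: max(hours, key=hours.get) for domain, hours in profile.items()}


profile = hourly_profile(SAMPLE_ROWS, "Example_Page")
peaks = peak_hour(profile)
print(peaks)
```

Averaging by hour of day across the month gives a daily cycle per language version; comparing the peak hours between domains is one simple way to look at timing differences in attention.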
Data Application Value
- Social Trends & Public Attention Research: Researchers can use this data to quantitatively analyze the collective focus of global internet users during January 2024. By tracking the view count time series of specific page titles, they can empirically study the formation, evolution, and decay of public attention. This provides valuable data support for research in communication studies, sociology, and public policy.
- Web Traffic Prediction & Platform Operations Analysis: Internet companies and Wikipedia’s operational team can use this data to build and validate web traffic prediction models. Hourly sequence data is ideal for training machine learning models to predict future traffic peaks, optimize server resource allocation, and devise content recommendation strategies. The periodic patterns embedded in the data are crucial for enhancing prediction accuracy.
- Digital Humanities & Computational Social Science Exploration: This dataset offers rich empirical material for digital humanities and computational social science. Scholars can combine page titles with corresponding knowledge entities to study the online influence of cultural phenomena, the contemporary online attention shifts of historical figures or events, and even analyze cultural biases or regional differences in knowledge consumption and dissemination through cross-language data comparisons.
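As a rough illustration of how the periodic patterns mentioned above can be used when validating a traffic prediction model, the sketch below implements a seasonal-naive baseline: each hour is predicted to equal the value observed 24 hours earlier. This is a common reference point for judging more elaborate models, not a method prescribed by the dataset. The function names are hypothetical and the series is synthetic.

```python
def seasonal_naive_forecast(series, season=24):
    """Predict each value as the one observed `season` steps earlier.

    The returned list lines up with series[season:].
    """
    if len(series) <= season:
        raise ValueError("series must be longer than one season")
    return series[:-season]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly view counts for two days (made up for illustration):
# the second day repeats the first day's shape, shifted up by 5 views.
day1 = [100 + 10 * hour for hour in range(24)]
day2 = [views + 5 for views in day1]
series = day1 + day2

predicted = seasonal_naive_forecast(series, season=24)
actual = series[24:]
mae = mean_absolute_error(actual, predicted)
print(mae)
```

With a full month of hourly data, `season=168` would give the weekly equivalent. A trained model is only worth its complexity if it beats baselines like these on held-out hours.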
With its hourly resolution, focus on popular pages, and standardized cross-language structure, this dataset serves both academia and industry. Whether the goal is to uncover the micro-dynamics of public attention, optimize web infrastructure, or support cross-cultural digital research, it offers a solid, detailed foundation for extracting insight from large-scale online behavioral data.
Field Demonstration
Sample Data
Data Update Frequency
Updated irregularly
