Organization and exploration of heterogeneous personal data collected in daily life
© Teraoka; licensee Springer. 2012
Received: 9 September 2011
Accepted: 24 January 2012
Published: 24 January 2012
This paper describes a study on the organization and the exploration of heterogeneous personal data that are collected from mobile devices and web services in daily use. Although large amounts of personal data can be collected, it is not easy to find effective methods of reusing these data. With regard to collecting personal data, most lifelog research has focused on the capture of personal logs and personal data archives. Our research focuses on helping users recall and reminisce about past experiences by using an interactive system that enables them to explore personal data from several viewpoints. An organizing structure and a zooming user interface are proposed for an effective exploration of personal data. We also illustrate a digest view that includes a summary of personal data and landmarks that trigger memory recall. A prototype system is introduced for exploring a variety of personal data including photographs, Global Positioning System histories, Tweets, health data, and the number of steps walked per day.
Keywords: personal data, lifelog, recall, user interfaces, exploration
Many research topics, such as lifelogging and personal information management, focus on the collection and the management of personal data. Extensive research on lifelogging has recently been carried out to collect vast amounts of personal data [1–3]. The personal data include email messages, schedules, Web sites visited, credit card payments, and photographs taken. They also include images, videos, sounds, and bio-sensor data. Most conventional research on lifelogging has been primarily concerned with the capture of personal data. It has also focused on building personal data archives.
Various personal data are stored in a variety of distributed sources, such as email messages, photographs on the WWW (World Wide Web), SMS (Short Message Service) messages on mobile phones, and perambulatory histories monitored by using the GPS (Global Positioning System) receivers embedded in mobile phones. There are also weight scales that connect to the Internet to store a user's weight on the WWW. Smart meters that monitor home energy use by way of the WWW, such as the Google PowerMeter, are expected to come into wide use. A variety of these personal data can be collected in the near future even if users do not constantly wear special devices with embedded cameras, microphones, and various sensors.
This paper focuses on reusing personal data for recall and helping users find various personal data and related information. This paper also describes methods of organizing and interacting with personal data. Personal data are heterogeneous. In other words, they contain a variety of media, formats, and granularities. Hence, it would be better to organize them by effective viewpoints in order to explore interactively rather than use the usual keyword searches. Moreover, various landmarks that trigger different personal data and related information are reported.
First, some viewpoints and views for organizing personal data are explained. Second, summaries and landmarks of data are introduced. Third, a visual user interface for the exploration of personal data is proposed. Finally, a prototype system is explained, followed by a discussion on related work and our conclusions.
Organization and exploration of personal data
Personal data in this paper include emails, photographs, telephone call histories, GPS histories, and health data such as body weight and the number of steps people walk. They also include Tweets on Twitter, blogs, and schedules, as well as home energy use and costs.
It is necessary to study four main items to manage and organize personal data.
Common metadata to manage heterogeneous data from a variety of data sources
Management of data permission and user authorization
Unified user interfaces to explore data
User assistance to recall memories from a mixture of heterogeneous data
This paper especially focuses on the latter two. Several viewpoints and corresponding views are studied taking into account the design of unified user interfaces. Summaries and landmarks are proposed to assist users to recall noteworthy experiences.
Viewpoints and scale
Heterogeneous personal data need to be organized along some of their attributes before they can be visualized and explored. For example, data with location attributes can be displayed on a map and data with timestamps can be displayed on a calendar or a timeline list. The 5W1H questions (Who, What, Where, When, Why, and How) are the most popular concept used to organize information. LATCH is another concept that includes 'Location', 'Alphabetic', 'Time', 'Category', and 'Hierarchy'.
These kinds of axes are called viewpoints in this paper, and we studied three viewpoints: time, location, and people. Time is a major viewpoint because all personal data have timestamps.
All personal logs have timestamps. However, there are various points of view even within time. For example, some activities extend over a certain period of time. Moreover, personal logs include time series, such as GPS histories and monitored pulses. In addition, home energy costs, including electricity and gas bills, are totaled every month.
The change in scale for time corresponds to the change in the period, such as the year, month, and day.
Most personal logs have location attributes. Some of them carry the latitudes and longitudes of locations. Other logs, such as entries in a schedule or on a calendar, carry place attributes. These are specified by place names, addresses, or the names of shops. Occasionally, places are given as homes, offices, stations, or schools, which is information that depends on individual users.
The change in scale at locations corresponds to the change in the geographical region.
All personal data are related to people. In other words, all data have owner attributes. Personal data are usually related to people other than the owner, such as senders of emails, colleagues at meetings, and families in photographs.
The changes in scale for humans correspond to changes in groups of people.
Category is a supplementary axis that enables personal data to be selected. A text tag is one item of information in a category. It is also useful for filtering large amounts of data selected with the above viewpoint.
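The viewpoints above (time, location, people, and the supplementary category axis) can be read as a small set of attributes shared by every item. The following is a minimal sketch of such a record, not the data model of the prototype; the class name `PersonalItem`, its field names, and the helper `filter_by_tag` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class PersonalItem:
    """One piece of personal data described along the viewpoints in the text.

    All names here are illustrative assumptions, not the prototype's schema.
    """
    source: str                        # e.g. "photo", "tweet", "gps"
    timestamp: datetime                # time viewpoint: every item has one
    owner: str                         # people viewpoint: every item has an owner
    latitude: Optional[float] = None   # location viewpoint, when available
    longitude: Optional[float] = None
    place_name: Optional[str] = None   # e.g. "home", "office", a shop name
    people: List[str] = field(default_factory=list)  # people other than the owner
    tags: Set[str] = field(default_factory=set)      # category axis


def filter_by_tag(items, tag):
    """Use the category axis to narrow a selection made with another viewpoint."""
    return [item for item in items if tag in item.tags]
```

Location fields are optional because, as noted above, only some logs carry a latitude and longitude, while every item has a timestamp and an owner.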
Views that correspond to viewpoints are explained. A variety of visualizations is available such as calendars and timelines even in a temporal viewpoint.
Views that feature temporal information
The most popular view that features temporal information is a calendar. A calendar view usually provides daily, weekly, monthly, and yearly forms. The amount of data to be displayed generally increases substantially as the time interval expands. Therefore, only some representative data are displayed on the screen. Another view that features time is a timeline visualization such as AllofMe.
A kind of zooming user interface is proposed in this paper to enable interaction from the temporal viewpoint. A zooming user interface (ZUI) is a graphical user interface that provides a visual scaling function [8–10]. Users can continuously change the size of the view to see more or less detail with the interface.
There are various methods of display that feature temporal information. For home energy costs, monthly usage and cost are displayed as figures on a monthly view. On a yearly scale, a bar chart is displayed in which 12 bars represent monthly use.
As previously described, visualization changes depending on the temporal scale and characteristics of the data. For instance, location data are usually measured every few minutes or seconds and it would be worthless to display all data on a yearly scale.
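The dependence of the display on the temporal scale can be sketched as an aggregation step that runs before drawing. This is only an illustration under assumed names (`scale_key`, `aggregate`) and an assumed input of (date, value) pairs; it is not taken from the prototype.

```python
from collections import defaultdict
from datetime import date


def scale_key(day: date, scale: str):
    """Map a date to the bucket it belongs to at the given temporal scale."""
    if scale == "year":
        return (day.year,)
    if scale == "month":
        return (day.year, day.month)
    if scale == "day":
        return (day.year, day.month, day.day)
    raise ValueError(f"unknown scale: {scale}")


def aggregate(samples, scale):
    """Sum (date, value) samples per bucket, e.g. daily energy use per month."""
    totals = defaultdict(float)
    for day, value in samples:
        totals[scale_key(day, scale)] += value
    return dict(sorted(totals.items()))
```

With daily energy readings, `aggregate(samples, "month")` would give the 12 values behind a yearly bar chart, whereas fine-grained data such as GPS fixes would more plausibly be summarized by landmarks than summed.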
Three user interfaces are considered to feature temporal information.
A temporal zooming interface that enables users to zoom the time hierarchy as shown in Figure 2.
Also, it is possible to use three views: a text label to display characters, a chart (e.g., bar and line charts), and an animation of time series data.
Views that feature locations
The most natural view that features locations is a map. Although location data are easy to monitor using GPS, detailed names of places cannot be understood solely from the latitude and longitude monitored by GPS. However, users occasionally write the names of places where they have been on Twitter. Also, location information such as 'homes', 'offices', and 'stations' is used on calendars. This means we use various levels of locational information in daily life.
To visualize data on a map, each item needs a latitude and longitude. Therefore, in this research, personal data that originally had no location information were assigned a latitude and longitude by matching their timestamps to the timestamps of GPS histories.
Personal data can be classified by related individuals from the viewpoint of people. The classified data are displayed in a list or in a graph structure that can represent the relationships among people.
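The timestamp matching described above can be sketched as a nearest-neighbor lookup in a time-sorted GPS track. The function name `locate` and the `max_gap` cutoff are assumptions made for this example; the paper does not state how large a time difference was tolerated.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def locate(timestamp, gps_track, max_gap=timedelta(minutes=10)):
    """Return (lat, lon) of the GPS fix nearest in time to `timestamp`.

    gps_track is a list of (datetime, lat, lon) tuples sorted by time.
    max_gap is an assumed cutoff: if the nearest fix is further away in
    time than this, the item is left without a location (None).
    """
    if not gps_track:
        return None
    times = [fix[0] for fix in gps_track]
    i = bisect_left(times, timestamp)
    # The nearest fix is either just before or just after the insertion point.
    candidates = gps_track[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda fix: abs(fix[0] - timestamp))
    if abs(nearest[0] - timestamp) > max_gap:
        return None
    return nearest[1], nearest[2]
```

A Tweet posted at 10:04 would take the position of a fix logged at 10:00 rather than one logged at 10:30, and an item with no fix nearby stays unplaced instead of being pinned to a misleading point.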
A category viewpoint is usually used for filtering information. A tag cloud user interface for this kind of view has recently become very popular on the WWW.
Summaries and landmarks
An effective navigation system is essential to enable interaction with large amounts of personal data. Furthermore, summaries of information and special landmarks are useful for recalling experiences by navigating personal data. Summaries are essentially digests of daily life. Landmarks represent important events, such as parties, ceremonies, travel, and important meetings. They provide information as cues for recalling memories and exploring related information and events. A summary contains several landmarks. Of course, summaries and landmarks change depending on viewpoints and their scale.
This paper proposes six main landmarks.
Landmark user-generated data (e.g., photographs, videos, blogs, mail messages)
Landmark values (e.g., outliers)
A variety of methods for clustering photographs have been proposed [12, 13]. A simple method of clustering using only the creation time was applied to photographs in our prototype. The photograph closest to the center of each cluster is considered to be the representative photo and is displayed as a temporal landmark.
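The idea of clustering on creation time and picking the photograph nearest each cluster center can be sketched as follows. Note that the grouping step here is a plain time-gap split, used as a stand-in for brevity; it is not the clustering algorithm of the prototype, and the function names and the two-hour `max_gap` are assumptions.

```python
from datetime import datetime, timedelta


def cluster_by_time(times, max_gap=timedelta(hours=2)):
    """Group creation times; a gap longer than max_gap starts a new cluster.

    Gap-based splitting is an assumed stand-in, not the prototype's method.
    """
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def representative(cluster):
    """Return the creation time closest to the cluster's mean time."""
    start = cluster[0]
    mean_offset = sum((t - start for t in cluster), timedelta()) / len(cluster)
    center = start + mean_offset
    return min(cluster, key=lambda t: abs(t - center))
```

Each representative creation time identifies the photograph that would be shown as a temporal landmark for its cluster.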
GPS histories are divided with a clustering algorithm using only latitude and longitude. Each center of the clusters is considered to be a location landmark. Also, daily living areas and others can be distinguished by the frequency of appearance of each cluster. Other landmarks are places where people have rarely gone in daily life. Here, we used a simple expectation maximization (EM) algorithm implemented in WEKA  to cluster photographs and GPS histories.
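Once GPS fixes have been assigned to clusters (by EM in WEKA in the prototype, or by any other clusterer), the step from clusters to location landmarks can be sketched as below. The function name, the "daily"/"rare" labels, and the `daily_share` threshold are assumptions for illustration; the paper does not give the frequency cutoff it used.

```python
from collections import defaultdict


def location_landmarks(points, labels, daily_share=0.2):
    """Derive a landmark per cluster from already-clustered GPS fixes.

    points: list of (lat, lon); labels: cluster id for each point.
    A cluster holding at least daily_share of all fixes is treated as a
    daily living area; the others are treated as rarely visited places.
    daily_share is an assumed threshold.
    """
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)

    total = len(points)
    landmarks = {}
    for label, pts in members.items():
        center = (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
        )
        kind = "daily" if len(pts) / total >= daily_share else "rare"
        landmarks[label] = (center, kind)
    return landmarks
```

Averaging latitude and longitude directly is adequate for clusters that span a small area, which is the case for places such as a home or an office.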
Other candidates for landmarks are human landmarks. These include family members who frequently appear in photographs, colleagues who frequently communicate, old friends who meet after a long time, and pop stars whose songs are very often listened to.
Landmarks of tags are defined by the frequency of tags that are assigned to each item of personal data. A tag that has been in heavy use during a period of time is a candidate for a landmark. A tag that has rarely been used during a long period of time is also a candidate for a landmark.
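Both kinds of tag landmark reduce to counting tag occurrences over a window. The sketch below assumes the tags of all items in the window have been flattened into one list; the function names and the count thresholds are hypothetical.

```python
from collections import Counter


def heavy_tags(tags_in_period, min_count=3):
    """Tags used at least min_count times within a (short) period."""
    counts = Counter(tags_in_period)
    return {tag for tag, count in counts.items() if count >= min_count}


def rare_tags(tags_in_long_period, max_count=1):
    """Tags used at most max_count times over a long period."""
    counts = Counter(tags_in_long_period)
    return {tag for tag, count in counts.items() if count <= max_count}
```

For instance, a tag attached to many items in one week would surface as a heavy-use landmark for that week, while a tag that appears once in a year would surface as a rare-use landmark on the yearly scale.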
Outliers are candidates for landmarks in time-series data, such as home energy use, the number of steps walked, and histories of body weight. Data that exceed pre-defined or user-defined thresholds are also candidates. For example, days on which we walk more steps than usual are often days on which we went out, so such landmarks help us find special events.
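The two criteria above, statistical outliers and threshold crossings, can be combined in a few lines. The paper does not specify how outliers were detected, so the z-score test here is an assumed, commonly used choice; the function and parameter names are likewise illustrative.

```python
from statistics import mean, pstdev


def outlier_days(values_by_day, z_threshold=2.0, fixed_threshold=None):
    """Return days whose value is an outlier or exceeds a fixed threshold.

    values_by_day: dict mapping a day label to a number (e.g. steps walked).
    The z-score criterion is an assumption made for this sketch.
    """
    if not values_by_day:
        return []
    values = list(values_by_day.values())
    mu, sigma = mean(values), pstdev(values)

    days = []
    for day, value in values_by_day.items():
        is_outlier = sigma > 0 and abs(value - mu) / sigma >= z_threshold
        over_limit = fixed_threshold is not None and value > fixed_threshold
        if is_outlier or over_limit:
            days.append(day)
    return days
```

The optional `fixed_threshold` corresponds to the pre-defined or user-defined thresholds mentioned in the text, such as a personal daily step goal.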
Other landmarks are public landmarks, which include shocking public news, bestsellers, blockbuster films, and annual rankings of top Web-search words. We can recall our own experiences on those days from these landmarks.
The concepts and user interface we propose were implemented in a prototype system. It was applied to nine types of personal data: photographs, GPS history, microblogging, schedules, web mail and SMS text messages, telephone call history on smart phones, numbers of steps walked per day as measured with a pedometer, body weight measured every day, and home energy cost and use.
Parts of these personal data were collected from Web services, such as Flickr, Twitter, Gmail, and Google Calendar. The other data were obtained from mobile devices and entered manually. We used an iPhone and an iPad as client mobile devices. The iPhone was mainly used to collect personal data including GPS histories. The iPad was mainly used to explore personal data with native-application user interfaces.
News topics (e.g., those from Yahoo! News) were used as one of the public landmarks.
This prototype was implemented mainly to demonstrate the feasibility of visualizing and interacting with heterogeneous personal data. It used over 15,000 pieces of personal data from a test user and a large amount of public data from sources including Flickr and Twitter. The data were collected over more than a year.
Public landmarks that are related to the period of time are displayed on the right of the main view. News topics during the period of time are displayed in this example.
Study and future work
Access control was not extensively studied in our current research. We need to safely manage permissions for metadata and information on authorization. Since personal data are collected from diverse services, the permissions that govern the use of each item depend on its original source and are complicated. Therefore, important work for the future is to study the management of permissions and authorizations, including research on OpenID and OAuth.
Further study on summaries and landmarks is another important area for future work. Clustering algorithms for the content of data and attributes other than timestamps and locations should be studied. A search function is also necessary to enable explicit memories to be quickly and easily found. Taking into consideration the studies reported in this paper, we have to study several search functions, such as temporal, geographical, keyword, and spatio-temporal searches.
Social functions including synchronous and asynchronous communications are also important. Asynchronous communications using personal data acquired previously have vast potential in the future.
It is difficult to compare our approach with existing approaches, because most of them are closed systems that we cannot try out and evaluate. Only limited information can be obtained through a single medium, such as photographs. Data collected only from a single device, such as a PC, lack a variety of user experiences. In our research, not only photographs but also time series data, text messages, news topics, tags, and combinations of them facilitate memory recall. Also, diverse perspectives are provided to users from a variety of media, different viewpoints (i.e., time, location, and people), and several landmarks.
Of course, detailed evaluation by users is very significant future work. While developing and trying out the prototype, we found that the more types of data were aggregated, the more anxiety about information leaks and invasions of privacy increased. We will conduct a wide range of user tests while giving careful consideration to privacy issues.
MyLifeBits is a system for storing lifetime data on a database. It stores data from personal computers and photos taken by SenseCam, which is a mobile device that has a camera module, digital light sensor, temperature sensor, and passive infrared sensor.
Eagle et al. proposed a 'reality mining' system that measured information access and use within different contexts, recognized social patterns in daily user activities, and inferred relationships. They used standard Bluetooth-enabled mobile phones. These researchers focused on collecting data with special devices.
Several user interfaces for memory aids have been proposed. iCLIPS provides a search user interface for logs from personal computers and photos taken by SenseCam. Visual Augmented Memory (VAM) tried to show cues that were who (face), where (room), when (timestamp), and what (any visible action) through facial recognition from images stored on a mobile computer. Autoalbum automatically generated a photo album by clustering photos based on the time they were created and the order in which they were taken.
Experience Explorer is a personal computer client that represents user data in a time-oriented manner. All user activities (e.g., content generated, phone calls, tracks, music that was listened to) on mobile phones appear linearly arranged under user names. Lines of the user's friends appear next.
MemoryLens Browser employs inferences about landmarks in visualizations for browsing files and appointments. Landmarks were predicted from a user's calendar data and computed as atypical organizers, atypical attendees, and atypical locations. To compute the value of locations atypical for events, they computed the number of times each location had appeared in a user's calendar over a fixed period of time.
The Stuff I've Seen (SIS) interface provides an integrated view of files on personal computers [24, 25]. The files were filtered with five fields of document titles, dates, ranks, authors, and mailtos.
Ringel et al. explained personal landmarks and public landmarks. Personal landmarks were important calendar events and first photos taken on given days. Public landmarks were national holidays and important news events. Only files on personal computers were used and the time scale for landmarks was fixed in their research. In our research, several landmarks were suggested and appropriate landmarks were presented with changing time scales.
PERSONE is a Web-based life log media browser in which videos and audio are gathered by a special gadget. It can be browsed with a conventional timeline view and a map view.
Zheng et al. proposed a recommendation system that recommends activities using GPS histories and also recommends locations using user activities. This seems useful only in limited situations, because only small parts of personal data are used to analyze user activity patterns.
These studies tried to develop special devices, several visualizations, and some activity analyses for the challenging and ambitious objective of collecting and storing all data on a person's life and experiences. Since mobile communications and sensor networks such as the IoT (Internet of Things) are becoming popular in our daily life, we have to study more natural and easier ways to collect and use personal logs.
Our approach extends previous research and places importance on helping users recall and reminisce about past events by integrating a variety of personal data from daily life, collected with non-specialized devices and in natural ways.
A study of the exploration of personal data was explained in this paper. A variety of viewpoints, views, and a temporal zooming user interface were described. Summaries and landmarks for memory cues were also presented. These included representative photographs, outliers of time-series data, and locations. The methods we proposed enable users to recall and reminisce about their memories and experiences.
Also, a prototype system in which our concepts were implemented was presented. It could be used to explore personal data including photographs, email messages, GPS histories, Tweets, histories of body weight, and home energy use. Further, a wider variety of personal data has to be integrated into the prototype in order to study other views. Detailed user evaluations and studies of other types of summaries and landmarks will be an important focus in future work.
List of abbreviations
- GPS: Global Positioning System
- WWW: World Wide Web
- SMS: Short Message Service
- ZUI: Zooming User Interface
- Doherty AR, Gurrin C, Jones GJF, Smeaton AF (Eds): In Proceedings of the ECIR 2010 workshop on information access for personal media archives: 28 March 2010; Milton Keynes, UK. Online: doras.dcu.ie/15373/; 2010.
- Sellen A, Whittaker S: Beyond Total Capture: A Constructive Critique of Lifelogging. Communications of the ACM 2010, 53(5):70–77. 10.1145/1735223.1735243
- Bell G, Gemmell J: Total Recall. New York: Dutton; 2009.
- Gemmell J, Bell G, Lueder R: MyLifeBits: A Personal Database for Everything. Communications of the ACM 2006, 49:88–95.
- Google PowerMeter [http://www.google.com/powermeter/about/]
- Wurman RS: Information Anxiety 2. Indianapolis: Que; 2000.
- Cockburn A, Karlson A, Bederson B: A Review of Overview+Detail, Zooming, and Focus+Context Interfaces. ACM Computing Surveys 2008, 41.
- Perlin K, Fox D: Pad: An Alternative Approach to the Computer Interface. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques: 2–6 August 1993; Anaheim, CA, USA. ACM; 1993:57–64.
- Bederson BB, Hollan JD: Pad++: A Zooming Graphical Interface for Exploring Alternate Interface Physics. In Proceedings of the 7th Annual ACM Symposium on User Interface and Software Technology: 2–4 November 1994; Marina del Rey, CA, USA. ACM; 1994:17–26.
- Horvitz E, Dumais S, Koch P: Learning Predictive Models of Memory Landmarks. In Proceedings of the 26th Annual Meeting of the Cognitive Science Society: 5–7 August 2004; Chicago. Cognitive Science Society; 2004:583–588.
- Platt JC: AutoAlbum: Clustering Digital Photographs using Probabilistic Model Merging. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries: 16 June 2000; South Carolina, USA. IEEE; 2000:96–100.
- Graham A, Garcia-Molina H, Paepcke A, Winograd T: Time as Essence for Photo Browsing Through Personal Digital Libraries. In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries: 13–17 July 2002; Portland, USA. ACM; 2002:326–335.
- Weka - Data Mining with Open Source Machine Learning Software in Java [http://www.cs.waikato.ac.nz/ml/weka/]
- Gemmell J, Williams L, Wood K, Lueder R, Bell G: Passive Capture and Ensuing Issues for a Personal Lifetime Store. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences: 15 Oct 2004; New York. ACM; 2004:48–55.
- Eagle N, Pentland A: Reality mining: sensing complex social systems. Personal and Ubiquitous Computing 2006, 10(4):255–268. 10.1007/s00779-005-0046-3
- Chen Y, Jones GJF: Augmenting Human Memory using Personal Lifelogs. In Proceedings of the 1st Augmented Human International Conference: 2–4 April 2010; Megeve, France. ACM; 2010.
- Farringdon J, Oni V: Visual Augmented Memory (VAM). In Proceedings of the 4th International Symposium on Wearable Computers: 18–21 Oct 2000; Atlanta. IEEE; 2000:167–168.
- Belimpasakis P, Roimela K, You Y: Experience Explorer: a Life-Logging Platform Based on Mobile Context Collection. In Proceedings of the 3rd International Conference on Next Generation Mobile Applications, Services and Technologies: 15–18 Sep 2009; Cardiff, UK. IEEE; 2009:77–82.
- Dumais S, Cutrell E, Cadiz J, Jancke G, Sarin R, Robbins DC: Stuff I've Seen: A System for Personal Information Retrieval and Re-Use. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: 28 July - 1 Aug 2003; Toronto. ACM; 2003:72–79.
- Cutrell E, Dumais ST, Teevan J: Searching to Eliminate Personal Information Management. Communications of the ACM 2006, 49:58–64.
- Ringel M, Cutrell E, Dumais S, Horvitz E: Milestones in Time. In Proceedings of the 9th IFIP TC13 International Conference on Human-Computer Interaction: 1–5 Sep 2003; Zurich, Switzerland. IFIP; 2003:184–191.
- Kim IJ, Ahn SC, Ko H, Kim HG: PERSONE: Personalized Experience Recoding and Searching On Networked Environment. In Proceedings of the 3rd ACM Workshop on Continuous Archival and Retrieval of Personal Experiences: 27 Oct 2006; Santa Barbara, USA. ACM; 2006:49–54.
- Zheng VW, Zheng Y, Xie X, Yang Q: Collaborative Location and Activity Recommendations with GPS History Data. In Proceedings of the 19th International World Wide Web Conference: 26–30 April 2010; Raleigh, NC, USA. ACM; 2010:1029–1038.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.