Organization and exploration of heterogeneous personal data collected in daily life
Human-centric Computing and Information Sciences volume 2, Article number: 1 (2012)
This paper describes a study on the organization and exploration of heterogeneous personal data that are collected from mobile devices and Web services in daily use. Although large amounts of personal data can be collected, it is not easy to find effective methods of reusing these data. With regard to collecting personal data, most lifelog research has focused on capturing personal logs and building personal data archives. Our research focuses on helping users recall and reminisce about past experiences by using an interactive system that enables them to explore personal data from several viewpoints. An organizing structure and a zooming user interface are proposed for effective exploration of personal data. We also illustrate a digest view that includes a summary of personal data and landmarks that trigger memory recall. A prototype system is introduced for exploring a variety of personal data, including photographs, Global Positioning System histories, Tweets, health data, and the number of steps walked per day.
Many research topics, such as lifelogging and personal information management, focus on the collection and management of personal data. Extensive research on lifelogging has recently been carried out to collect vast amounts of personal data [1–3]. These personal data include email messages, schedules, visited Web sites, credit card payments, and photographs, as well as images, videos, sounds, and bio-sensor data. Most conventional research on lifelogging has been primarily concerned with capturing personal data and building personal data archives.
Various personal data are stored in a variety of distributed sources, such as email messages, photographs on the World Wide Web (WWW), Short Message Service (SMS) messages on mobile phones, and movement histories recorded by Global Positioning System (GPS) receivers embedded in mobile phones. There are also weight scales that connect to the Internet and store a user's weight on the WWW. Smart meters that monitor home energy use via the WWW, such as the Google PowerMeter, are also expected to come into wide use. A variety of such personal data will be collectable in the near future without users always having to wear special devices with embedded cameras, microphones, and other sensors.
This paper focuses on reusing personal data to support recall and on helping users find various personal data and related information. It also describes methods of organizing and interacting with personal data. Personal data are heterogeneous; that is, they span a variety of media, formats, and granularities. Hence, rather than relying on the usual keyword searches, it is better to organize them by effective viewpoints so that they can be explored interactively. Moreover, various landmarks that trigger the recall of personal data and related information are reported.
First, some viewpoints and views for organizing personal data are explained. Second, summaries and landmarks of data are introduced. Third, a visual user interface for exploring personal data is proposed. Finally, a prototype system is explained, followed by a discussion of related work and our conclusions.
Organization and exploration of personal data
Personal data in this paper include emails, photographs, telephone call histories, GPS histories, and health data such as body weight and the number of steps walked. The data also include Tweets on Twitter, blog posts, and schedules, as well as home energy use and costs.
Four main issues need to be studied to manage and organize personal data:
Common metadata to manage heterogeneous data from a variety of data sources
Management of data permission and user authorization
Unified user interfaces to explore data
User assistance to recall memories from a mixture of heterogeneous data
This paper focuses especially on the latter two. Several viewpoints and corresponding views are studied with the design of unified user interfaces in mind. Summaries and landmarks are proposed to assist users in recalling noteworthy experiences.
Viewpoints and scale
Before heterogeneous personal data are explored, they need to be visualized by organizing them according to some of their attributes. For example, data with location attributes can be displayed on a map, and data with timestamps can be displayed on a calendar or a timeline list. The 5W1H questions (Who, What, Where, When, Why, and How) are the most popular concept used to organize information. LATCH is another concept, comprising 'Location', 'Alphabetic', 'Time', 'Category', and 'Hierarchy'.
In this paper, such axes are called viewpoints, and we studied three of them: time, location, and people. Time is a major viewpoint because all personal data have timestamps.
Scales were also considered for all viewpoints, as shown in Figure 1. To be visualized properly, data should be displayed differently depending on the scale of the viewpoint. For example, when the location viewpoint is shown on a map at the scale of a country, it is not necessary to display every GPS history; it is better to display representative trajectories. Similarly, from the temporal viewpoint, it is rarely necessary to display all WWW browsing histories for an entire year. Conversely, because home energy costs are usually calculated per month, accurate charges per day cannot be obtained.
All personal logs have timestamps. However, even time can be viewed in various ways. For example, some activities extend over a certain period. Personal logs also include time series, such as GPS histories and monitored pulse rates. Moreover, home energy costs, including electricity and gas bills, are totaled every month.
The change in scale for time corresponds to the change in the period, such as the year, month, and day.
Most personal logs have location attributes. Some of them have the latitudes and longitudes of locations. Other logs have place attributes in schedules and calendars, given as place names, addresses, or names of shops. Places sometimes indicate homes, offices, stations, or schools, and such information depends on individual users.
The change in scale for locations corresponds to the change in the geographical region.
All personal data are related to people; in other words, all data have owner attributes. Personal data are often also related to people other than the owner, such as the senders of emails, colleagues at meetings, and family members in photographs.
The change in scale for people corresponds to changes in groups of people.
Category is a supplementary axis for selecting personal data. A text tag is one kind of category information. Categories are also useful for filtering large amounts of data selected with the above viewpoints.
Next, views that correspond to these viewpoints are explained. Even for the temporal viewpoint alone, a variety of visualizations, such as calendars and timelines, are available.
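As a minimal sketch of the common metadata listed as the first item above, one record type could carry all four viewpoints (time, location, people, and category), with the temporal scale becoming a grouping key. The class `PersonalRecord` and the functions `time_key` and `organize` are hypothetical names for illustration, not part of the prototype described in this paper.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class PersonalRecord:
    """One item of personal data described along four viewpoints (hypothetical schema)."""
    source: str                     # e.g. "flickr", "twitter", "pedometer"
    timestamp: datetime             # time viewpoint: every record has one
    lat: Optional[float] = None     # location viewpoint: may be missing
    lon: Optional[float] = None
    people: Set[str] = field(default_factory=set)  # people viewpoint (owner, senders, ...)
    tags: Set[str] = field(default_factory=set)    # category viewpoint (text tags)


def time_key(record: PersonalRecord, scale: str) -> tuple:
    """Map a record to the interval containing it at the given temporal scale."""
    t = record.timestamp
    if scale == "year":
        return (t.year,)
    if scale == "month":
        return (t.year, t.month)
    if scale == "day":
        return (t.year, t.month, t.day)
    raise ValueError(f"unknown scale: {scale}")


def organize(records: List[PersonalRecord],
             key: Callable[[PersonalRecord], Hashable]) -> Dict[Hashable, List[PersonalRecord]]:
    """Group records by an arbitrary viewpoint key, keeping input order within groups."""
    groups: Dict[Hashable, List[PersonalRecord]] = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return dict(groups)
```

Changing the scale passed to `time_key` (for example `organize(records, lambda r: time_key(r, "month"))`) is one simple way to model the year/month/day scale change described for the temporal viewpoint.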
Views that feature temporal information
The most popular view that features temporal information is a calendar, which usually provides daily, weekly, monthly, and yearly forms. The amount of data to be displayed generally increases substantially as the time interval expands, so only some representative data are displayed on the screen. Another view that features time is timeline visualization, such as AllofMe.
A type of zooming user interface is proposed in this paper to enable interaction from the temporal viewpoint. A zooming user interface (ZUI) is a graphical user interface that provides a visual scaling function [8–10]; with it, users can continuously change the scale of the view to see more or less detail.
Figure 2 shows an overview of temporal zooming. A later section explains it in more detail.
There are various methods of displaying temporal information. For home energy costs, monthly usage and cost are displayed as numbers in a monthly view, and a bar chart with 12 bars representing monthly use is displayed at a yearly scale.
As previously described, the visualization changes depending on the temporal scale and the characteristics of the data. For instance, location data are usually measured every few seconds or minutes, so it would be pointless to display all of them at a yearly scale.
Three user interfaces are considered to feature temporal information.
A temporal zooming interface that enables users to zoom the time hierarchy as shown in Figure 2.
Also, it is possible to use three views: a text label to display characters, a chart (e.g., bar and line charts), and an animation of time series data.
Views that feature locations
The most natural view that features locations is a map. Although location data are easy to monitor using GPS, the detailed names of places cannot be determined solely from the latitudes and longitudes that GPS records. However, users occasionally write the names of places they have visited on Twitter, and location information such as 'homes', 'offices', and 'stations' is used on calendars. This means that we use various levels of location information in daily life.
To visualize location data, data are usually placed on a map by their latitude and longitude. Therefore, in this research, personal data that originally lacked location data were assigned a latitude and longitude by matching their timestamps to those of the GPS histories.
From the viewpoint of people, personal data can be classified by related individuals. The classified data are displayed in a list or as a graph structure that represents the relationships among people.
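The timestamp matching described above might be sketched as a nearest-neighbour lookup in a time-sorted GPS history. The function name `locate_by_time` and the 30-minute tolerance are assumptions; the text does not say how a match is chosen or when a time gap is too large to trust.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

GpsFix = Tuple[datetime, float, float]  # (time, latitude, longitude)


def locate_by_time(t: datetime, fixes: List[GpsFix],
                   max_gap: timedelta = timedelta(minutes=30)) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the GPS fix nearest in time to t, or None.

    `fixes` must be sorted by time. None is returned when there are no fixes
    or when the nearest fix is farther from t than `max_gap` (assumed value).
    """
    if not fixes:
        return None
    times = [f[0] for f in fixes]  # rebuilt per call; cache it for bulk use
    i = bisect.bisect_left(times, t)
    # The nearest fix is either the last one before t or the first one at/after t.
    candidates = [fixes[j] for j in (i - 1, i) if 0 <= j < len(fixes)]
    best = min(candidates, key=lambda f: abs(f[0] - t))
    if abs(best[0] - t) > max_gap:
        return None
    return best[1], best[2]
```

Interpolating between the two neighbouring fixes would be a natural refinement; the nearest-fix rule is kept here only for brevity.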
A category viewpoint is usually used for filtering information. A tag cloud user interface for this kind of view has recently become very popular on the WWW.
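Tag clouds commonly scale each tag's font size by the logarithm of its frequency so that a few very frequent tags do not dominate the display. A minimal sketch of that convention follows; `tag_cloud_sizes` and the range from 10 to 32 points are illustrative assumptions, not details of the prototype.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tag_cloud_sizes(tags: Iterable[str], min_pt: float = 10.0,
                    max_pt: float = 32.0) -> Dict[str, float]:
    """Font size per tag, interpolated on the logarithm of the tag's frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = hi - lo
    sizes = {}
    for tag, c in counts.items():
        # When every tag has the same count, all tags get the middle size.
        frac = (math.log(c) - lo) / span if span else 0.5
        sizes[tag] = min_pt + frac * (max_pt - min_pt)
    return sizes
```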
Summaries and Landmarks
An effective navigation system is essential for interacting with large amounts of personal data. Furthermore, summaries of information and special landmarks are useful for recalling experiences while navigating personal data. Summaries are essentially digests of daily life. Landmarks represent important events, such as parties, ceremonies, travel, and important meetings; they provide cues for recalling memories and exploring related information and events. A summary contains several landmarks. Of course, summaries and landmarks change depending on the viewpoint and its scale.
This paper proposes six main landmarks.
Landmark user-generated data (e.g., photographs, videos, blogs, mail messages)
Landmark values (e.g., outliers)
A variety of methods for clustering photographs have been proposed [12, 13]. In our prototype, a simple clustering method that uses only the creation time was applied to photographs. The photograph closest to the center of each cluster is considered the representative photo of that cluster and is displayed as a temporal landmark.
GPS histories are divided into clusters with a clustering algorithm that uses only latitude and longitude, and the center of each cluster is considered a location landmark. Daily living areas can also be distinguished from other areas by how frequently each cluster appears; places where people have rarely gone in daily life are further landmarks. Here, we used the simple expectation maximization (EM) algorithm implemented in WEKA to cluster photographs and GPS histories.
Other candidates for landmarks are human landmarks. These include family members who frequently appear in photographs, colleagues with whom the user frequently communicates, old friends met again after a long time, and pop stars whose songs are listened to very often.
Landmarks of tags are defined by the frequency of the tags assigned to each item of personal data. A tag that has been heavily used during a period of time is a candidate for a landmark, as is a tag that has rarely been used over a long period of time.
Outliers in time-series data, such as home energy use, the number of steps walked, and body-weight histories, are candidates for landmarks, as are data that exceed pre-defined or user-defined thresholds. For example, we often go out on days when we walk more steps than usual, so such landmarks help us find special events.
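A minimal sketch of time-only clustering with representative selection, assuming photos are reduced to their creation times. For simplicity it splits clusters at gaps longer than two hours instead of running the EM algorithm the prototype used (see the next paragraph); the gap value and all function names are hypothetical. Each landmark is returned together with the photos it stands for, which is the data a click on the landmark would reveal.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def cluster_by_time(times: List[datetime],
                    gap: timedelta = timedelta(hours=2)) -> List[List[datetime]]:
    """Split creation times into clusters wherever consecutive photos are
    more than `gap` apart (a simple stand-in for EM clustering)."""
    clusters: List[List[datetime]] = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def representative(cluster: List[datetime]) -> datetime:
    """The photo whose creation time is closest to the cluster's mean time."""
    origin = cluster[0]
    mean = origin + sum((t - origin for t in cluster), timedelta()) / len(cluster)
    return min(cluster, key=lambda t: abs(t - mean))


def temporal_landmarks(times: List[datetime]) -> List[Tuple[datetime, List[datetime]]]:
    """(landmark photo, all photos it represents) for each time cluster."""
    return [(representative(c), c) for c in cluster_by_time(times)]
```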
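Once each GPS fix carries a cluster label (from EM, as above, or from any other clustering), cluster centres and a rough daily/rare distinction can be derived by counting the distinct days on which each cluster appears. The sketch below assumes that input shape; the 30% daily-area ratio, the one-day rarity rule, and the name `location_landmarks` are illustrative assumptions. Averaging latitude and longitude directly is adequate for small clusters but not for clusters spanning large distances.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# (cluster label, latitude, longitude, day) for each GPS fix.
LabeledFix = Tuple[int, float, float, date]


def location_landmarks(fixes: List[LabeledFix],
                       daily_ratio: float = 0.3) -> Dict[int, dict]:
    """Centre of each cluster plus a rough 'daily' / 'occasional' / 'rare' label.

    A cluster seen on at least `daily_ratio` of all observed days is treated as
    part of the daily living area; a cluster seen on only one day is treated as
    a rarely visited place (both thresholds are assumptions).
    """
    points = defaultdict(list)
    days = defaultdict(set)
    for label, lat, lon, day in fixes:
        points[label].append((lat, lon))
        days[label].add(day)
    n_days = len({f[3] for f in fixes})
    result = {}
    for label, pts in points.items():
        centre = (sum(p[0] for p in pts) / len(pts),
                  sum(p[1] for p in pts) / len(pts))
        seen = len(days[label])
        if seen >= daily_ratio * n_days:
            kind = "daily"
        elif seen == 1:
            kind = "rare"
        else:
            kind = "occasional"
        result[label] = {"centre": centre, "days": seen, "kind": kind}
    return result
```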
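Two of the human landmarks above reduce to simple counting over interaction records: frequent contacts, and old friends met again after a long gap. This hypothetical `human_landmarks` function assumes each photo appearance, email, or call has already been reduced to a (person, date) pair; `top_k` and the one-year gap are illustrative values.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List, Tuple

Interaction = Tuple[str, date]  # (person, day of a photo, email, call, ...)


def human_landmarks(interactions: List[Interaction], top_k: int = 3,
                    reunion_gap: timedelta = timedelta(days=365)) -> Dict[str, list]:
    """Most frequent contacts, and (person, day) pairs where someone reappears
    after at least `reunion_gap` without any interaction (thresholds assumed)."""
    counts = Counter(person for person, _ in interactions)
    frequent = [p for p, _ in counts.most_common(top_k)]
    last_seen: Dict[str, date] = {}
    reunions = []
    for person, day in sorted(interactions, key=lambda x: x[1]):
        prev = last_seen.get(person)
        if prev is not None and day - prev >= reunion_gap:
            reunions.append((person, day))
        last_seen[person] = day
    return {"frequent": frequent, "reunions": reunions}
```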
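The two tag rules can be sketched by comparing a tag's share of occurrences in the viewed period with its share over a long history. The ratio of 3, the minimum count of 2 for heavy use, the "at most one use in the whole history" rarity rule, and the name `tag_landmarks` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def tag_landmarks(period_tags: List[str], history_tags: List[str],
                  burst_ratio: float = 3.0, min_count: int = 2,
                  rare_max: int = 1) -> Dict[str, List[str]]:
    """Tags used heavily in a period, and tags used rarely over a long history.

    period_tags:  every tag occurrence inside the period being viewed
    history_tags: every tag occurrence over the long reference period,
                  expected to include the period itself
    """
    period = Counter(period_tags)
    history = Counter(history_tags)
    n_period = max(len(period_tags), 1)
    n_history = max(len(history_tags), 1)
    heavy = []
    for tag, c in period.items():
        share_now = c / n_period
        share_long = history[tag] / n_history
        # Heavy use: clearly over-represented in this period compared with the long run.
        if c >= min_count and share_long > 0 and share_now >= burst_ratio * share_long:
            heavy.append(tag)
    # Rare: appears in this period but almost never over the whole history.
    rare = [tag for tag in period if history[tag] <= rare_max]
    return {"heavy": sorted(heavy), "rare": sorted(rare)}
```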
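A minimal sketch of both rules for daily values such as step counts, using the z-score as one common outlier test (the text does not name the test the prototype used); the cut-off of 2 and the function name `outlier_days` are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Optional


def outlier_days(values: List[float], z: float = 2.0,
                 threshold: Optional[float] = None) -> List[int]:
    """Indices of days whose value is a statistical outlier (|z-score| >= z)
    or exceeds a pre-defined / user-defined threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    hits = []
    for i, v in enumerate(values):
        is_outlier = sigma > 0 and abs(v - mu) / sigma >= z
        over = threshold is not None and v > threshold
        if is_outlier or over:
            hits.append(i)
    return hits
```

With nine days of 5,000 steps and one day of 20,000, only the last day is flagged, which matches the kind of picnic day described for Figure 12.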
Other landmarks are public landmarks, which include shocking public news, bestsellers, blockbuster films, and annual rankings of top Web-search words. We can recall our own experiences on those days from these landmarks.
Figure 3 outlines exploration using the temporal zooming user interface we propose, which is a kind of ZUI [8–10]. Users control the scale of the view to change the time interval: zooming in shortens it and zooming out extends it. Users can also scroll right and left to move to the previous and next time intervals. Summaries, landmarks, and visual forms (e.g., text labels and charts) change appropriately with the temporal scale or interval.
Landmarks represent data within a period of time. When users click on a landmark, the related personal data appear. In Figure 4, because landmark 'M2' represents data 'M21'–'M27', these data appear when 'M2' is clicked.
The concepts and user interface we propose were implemented in a prototype system, which was applied to nine types of personal data: photographs, GPS histories, microblog posts, schedules, Web mail and SMS text messages, telephone call histories on smartphones, the number of steps walked per day as measured with a pedometer, body weight measured every day, and home energy costs and use.
Some of these personal data were collected from Web services, such as Flickr, Twitter, Gmail, and Google Calendar. The other data were obtained from mobile devices or entered manually. We used an iPhone and an iPad as client mobile devices: the iPhone was mainly used to collect personal data, including GPS histories, and the iPad was mainly used to explore personal data through native-application user interfaces.
News topics (e.g., those from Yahoo! News) were used as one of the public landmarks.
This prototype was implemented mainly to demonstrate the feasibility of visualizing and interacting with heterogeneous personal data. It used over 15,000 items of personal data from a test user and a large amount of public data from sources including Flickr and Twitter, collected over more than a year.
Figure 5 shows a system overview. Server-side modules were implemented using Java™ and the MySQL database. The 'personal data collection' module collected personal data from various Web services. The 'data mining' module executed data clustering and calculated outliers, which are described later. The 'request handler' module handled client requests and generated responses by retrieving personal data. Client applications, including the 'data explorer' and 'GPS data collection' modules for the iPhone and iPad, were native applications implemented in Objective-C. The prototype provided several views for exploring personal data, such as map views, calendar views, and digest views.
The map view displays personal data according to their locations, as shown in Figure 6. Unfortunately, only a few types of data came with location information automatically. Therefore, the location (i.e., longitude and latitude) of personal data was approximated by matching timestamps to GPS histories. On the initial display, only representative locations were shown, based on the result of clustering by latitude and longitude. A representative location was defined as the center of a cluster and served as one type of landmark information. GPS histories gradually appeared as the user zoomed in on an area of the map, and personal data related to that area were displayed.
The calendar view provides a familiar layout, like that of a schedule book. Users can switch among yearly, monthly, and daily views. Figure 7 is a screenshot of the monthly calendar view. The area for each day displays some of the personal data for that day. When a user clicks one of the days, the personal data for the selected day are listed on the right of the screen.
Digest views were implemented in the temporal zooming user interface we propose. Figure 8 shows a screenshot of a digest view. Photographs, visual charts, representative locations, and home energy costs are displayed at the top of the view, which is the main view. The photograph with the highlighted border is a landmark. Text tags that characterize the personal data during the period are displayed at the bottom of the view as a tag cloud; these tags are also landmarks. When users click a tag, the related data in the main view are highlighted.
A digest view initially displays a summary of personal data for a given date and time scale; Figure 9 shows the hierarchy of a digest view in this prototype. Other data appear as the user interacts with the digest view. For example, related photographs appear when a landmark photograph is clicked, as previously explained. In Figure 10, the screenshot on the left is the initial view, a summary for May 2010, and the one on the right is another view for May 2010 after related data have appeared.
Public landmarks related to the period are displayed on the right of the main view; in this example, news topics from the period are displayed.
Figure 11 shows screenshots of zooming operations. When users zoom in on a monthly view, a daily view appears for the date at the center of the monthly view. When they zoom out of a monthly view, a yearly view appears for the year of the monthly view. Flicking the view to the right displays the previous month, and flicking it to the left displays the next month. Users can also move to a daily view by clicking data in a monthly view; the daily view then shows the date of the clicked personal data.
Another type of landmark is an outlier value in time-series data, such as the number of steps walked, home energy use, and body weight. Figure 12 is a screenshot of an outlier in the number of steps walked in a month. When users click the highlighted bar that indicates an outlier on the chart, the daily view for the corresponding date appears. Because the user in this example went on a picnic, the number of steps was greater than on other days. Landmarks can also be created for values that exceed a threshold, to track records such as body weight, blood pressure, and savings.
Figure 13 shows a variety of views for time-series data such as the number of steps walked. As previously described, the view is determined by the time scale. In the figure, (1) shows the number of steps walked on a day as a text label, (2) shows the number walked on each day of a month as a bar chart, and (3) shows the average number walked per day over a year as a text label. Automatically selecting an appropriate view according to the personal data and the time scale is future work.
In Figure 14, when a user selects a photograph in a daily view, photographs that other people took on the same day at the same place are shown. Users may find new facts or reminisce about the past through other people's personal data. Only the sharing of photos is shown here as an example; sharing other data could create opportunities for people to communicate with one another and facilitate the recall of fond memories.
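The zooming and flicking rules above can be sketched as transitions on a small view state. The text specifies the month-level behaviour; treating zoom-in from a yearly view, and flicks at the yearly and daily scales, the same way is an assumption, as are the names `TemporalView`, `zoom_in`, `zoom_out`, and `flick`.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalView:
    scale: str    # "year", "month" or "day"
    anchor: date  # a date inside the displayed interval


def _shift_month(d: date, delta: int) -> date:
    """Same day in the month `delta` months away, clamped to that month's length."""
    m = d.month - 1 + delta
    year, month = d.year + m // 12, m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def zoom_in(view: TemporalView, centre: date) -> TemporalView:
    """Shorten the interval; `centre` is the date at the centre of the screen."""
    if view.scale == "year":
        return TemporalView("month", centre)
    if view.scale == "month":
        return TemporalView("day", centre)
    return view  # already at the finest scale


def zoom_out(view: TemporalView) -> TemporalView:
    """Extend the interval, keeping the anchor so the year/month still match."""
    if view.scale == "day":
        return TemporalView("month", view.anchor)
    if view.scale == "month":
        return TemporalView("year", view.anchor)
    return view  # already at the coarsest scale


def flick(view: TemporalView, direction: str) -> TemporalView:
    """Flicking right shows the previous interval; flicking left shows the next one."""
    if direction not in ("left", "right"):
        raise ValueError(f"unknown direction: {direction}")
    step = -1 if direction == "right" else 1
    if view.scale == "month":
        return TemporalView("month", _shift_month(view.anchor, step))
    if view.scale == "year":
        return TemporalView("year", _shift_month(view.anchor, 12 * step))
    return TemporalView("day", date.fromordinal(view.anchor.toordinal() + step))
```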
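The scale-dependent forms in Figure 13 suggest a simple rule-based selector, which is one possible starting point for the automatic selection mentioned above. In this hypothetical `steps_view`, missing days are drawn as zero-height bars and the yearly average is taken over recorded days only; both choices are assumptions.

```python
from calendar import monthrange
from datetime import date
from typing import Dict, List, Tuple, Union

View = Tuple[str, Union[int, float, List[int]]]


def steps_view(daily: Dict[date, int], scale: str, anchor: date) -> View:
    """Choose the visual form and its data for step counts at a given scale.

    day   -> ("text", steps on that day, 0 if unrecorded)
    month -> ("bar_chart", steps for each day of the month, 0 if unrecorded)
    year  -> ("text", average steps per recorded day in that year)
    """
    if scale == "day":
        return ("text", daily.get(anchor, 0))
    if scale == "month":
        n = monthrange(anchor.year, anchor.month)[1]
        bars = [daily.get(date(anchor.year, anchor.month, d), 0) for d in range(1, n + 1)]
        return ("bar_chart", bars)
    if scale == "year":
        vals = [v for d, v in daily.items() if d.year == anchor.year]
        return ("text", sum(vals) / len(vals) if vals else 0.0)
    raise ValueError(f"unknown scale: {scale}")
```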
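Selecting other people's photos taken on the same day at the same place can be sketched as a date match plus a great-circle distance test. The `Photo` record, the haversine formula with a mean Earth radius of 6371 km, and the 1 km radius are assumptions; the text does not define how "same place" is determined.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Photo:
    owner: str
    taken: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def others_same_day_and_place(selected: Photo, pool: List[Photo],
                              radius_km: float = 1.0) -> List[Photo]:
    """Photos by other people taken on the same date within `radius_km`."""
    return [p for p in pool
            if p.owner != selected.owner
            and p.taken.date() == selected.taken.date()
            and haversine_km(selected.lat, selected.lon, p.lat, p.lon) <= radius_km]
```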
Study and future work
Access control was not studied extensively in our current research. Permissions for metadata and authorization information need to be managed safely. Because personal data are collected from diverse services, the permissions for using the data differ from source to source and are complicated. Therefore, an important area for future work is the management of permissions and authorization, including research on OpenID and OAuth.
Further study of summaries and landmarks is another important area for future work. Clustering algorithms that use the content of data and attributes other than timestamps and locations should be studied. A search function is also necessary so that memories users are explicitly looking for can be found quickly and easily. Building on the studies reported in this paper, we need to study several search functions, such as temporal, geographical, keyword, and spatio-temporal searches.
Social functions, including synchronous and asynchronous communication, are also important. Asynchronous communication using previously acquired personal data has great potential.
It is difficult to compare our approach with existing ones, because most of them are closed systems that we cannot evaluate. Only limited information can be obtained from a single medium, such as photographs, and data collected from a single device, such as a PC, lack the variety of user experiences. In our research, not only photographs but also time-series data, text messages, news topics, tags, and their combinations facilitate memory recall. Diverse perspectives are also provided to users through a variety of media, different viewpoints (i.e., time, location, and people), and several landmarks.
Of course, detailed user evaluations are important future work. While developing and trying out the prototype, we found that the more types of data were aggregated, the greater the anxiety about information leaks and invasions of privacy became. We will conduct a wide range of user tests with careful consideration of privacy issues.
MyLifeBits is a system for storing lifetime data in a database. It stores data from personal computers and photos taken by SenseCam, a mobile device with a camera module, a digital light sensor, a temperature sensor, and a passive infrared sensor.
Eagle et al. proposed a 'reality mining' system that measured information access and use within different contexts, recognized social patterns in daily user activities, and inferred relationships. They used standard Bluetooth-enabled mobile phones. These researchers focused on collecting data with special devices.
Several user interfaces for memory aids have been proposed. iCLIPS provides a search user interface for logs from personal computers and photos taken by SenseCam. Visual Augmented Memory (VAM) tried to show cues for who (face), where (room), when (timestamp), and what (any visible action) through facial recognition on images stored on a mobile computer. AutoAlbum automatically generated a photo album by clustering photos based on the time they were created and the order in which they were taken.
Experience Explorer is a personal computer client that represents user data in a time-oriented manner. All user activities on mobile phones (e.g., generated content, phone calls, tracks, and music listened to) appear linearly arranged under the user's name, followed by lines for the user's friends.
MemoryLens Browser employs inferences about landmarks in visualizations for browsing files and appointments. Landmarks were predicted from a user's calendar data and computed as atypical organizers, atypical attendees, and atypical locations. To compute how atypical a location was for an event, they counted the number of times each location had appeared in the user's calendar over a fixed period.
The Stuff I've Seen (SIS) interface provides an integrated view of files on personal computers [24, 25]. The files could be filtered by five fields: document title, date, rank, author, and mailto.
Ringel et al. described personal landmarks and public landmarks. Personal landmarks were important calendar events and the first photos taken on given days; public landmarks were national holidays and important news events. Only files on personal computers were used, and the time scale for landmarks was fixed in their research. In our research, several types of landmarks were suggested, and appropriate landmarks were presented as the time scale changed.
PERSONE is a Web-based lifelog media browser in which videos and audio are gathered by a special gadget and can be browsed with a conventional timeline view and a map view.
Zheng et al. proposed a system that recommends activities using GPS histories and recommends locations using user activities. This seems useful only in limited situations, because only a small part of personal data is used to analyze user activity patterns.
These studies developed special devices, several visualizations, and activity analyses toward the challenging and ambitious objective of collecting and storing all data on life and experiences. Since mobile communications and sensor networks such as the Internet of Things (IoT) are becoming common in daily life, more natural and easier ways to collect and use personal logs need to be studied.
Our approach extends previous research and emphasizes helping users recall and reminisce about past events by integrating a variety of personal data from daily life, collected with non-specialized devices in natural ways.
This paper described a study of the exploration of personal data. A variety of viewpoints and views and a temporal zooming user interface were described. Summaries and landmarks for memory cues, such as representative photographs, outliers in time-series data, and locations, were also presented. The methods we proposed enable users to recall and reminisce about their memories and experiences.
A prototype system implementing our concepts was also presented. It can be used to explore personal data including photographs, email messages, GPS histories, Tweets, body-weight histories, and home energy use. Further types of personal data need to be integrated so that other views can be studied in the prototype. Detailed user evaluations and studies of other types of summaries and landmarks will be an important focus of future work.
- GPS: Global Positioning System
- WWW: World Wide Web
- SMS: Short Message Service
- ZUI: Zooming User Interface
Doherty AR, Gurrin C, Jones GJF, Smeaton AF (Eds): In Proceedings of the ECIR 2010 workshop on information access for personal media archives: 28 March 2010; Milton Keynes, UK. online doras.dcu.ie/15373/2010
Sellen A, Whittaker S: Beyond Total Capture: A Constructive Critique of Lifelogging. Communications of the ACM 2010,53(5):70–77. 10.1145/1735223.1735243
Bell G, Gemmell J: Total Recall. New York: DUTTON; 2009.
Gemmell J, Bell G, Lueder R: MyLifeBits: A Personal Database for Everything. Communications of the ACM 2006, 49: 88–95.
Wurman RS: Information Anxiety 2. Indianapolis: Que; 2000.
Cockburn A, Karlson A, Bederson B: A Review of Overview+Detail, Zooming, and Focus+Context Interfaces. ACM Computing Surveys 2008., 41:
Perlin K, Fox D: Pad: An Alternative Approach to the Computer Interface. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques: 2–6 August 1993; Anaheim, CA, USA. ACM; 1993:57–64.
Bederson BB, Hollan JD: Pad++: A Zooming Graphical Interface for Exploring Alternate Interface Physics. In Proceedings of the 7th Annual ACM Symposium on User Interface and Software Technology: 2–4 November 1994; Marina del Rey, CA, USA. ACM; 1994:17–26.
Horvitz E, Dumais S, Koch P: Learning Predictive Models of Memory Landmarks. In Proceedings of the 26th Annual Meeting of the Cognitive Science Society: 5–7 August 2004; Chicago. Cognitive Science Society; 2004:583–588.
Platt JC: AutoAlbum: Clustering Digital Photographs using Probabilistic Model Merging. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries: 16 June 2000; South Carolina, USA. IEEE; 2000:96–100.
Graham A, Garcia-Molina H, Paepcke A, Winograd T: Time as Essence for Photo Browsing Through Personal Digital Libraries. In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries: 13–17 July 2002; Portland, USA. ACM; 2002:326–335.
Weka: Data Mining with Open Source Machine Learning Software in Java [http://www.cs.waikato.ac.nz/ml/weka/]
Gemmell J, Williams L, Wood K, Lueder R, Bell G: Passive Capture and Ensuing Issues for a Personal Lifetime Store. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences: 15 Oct 2004; New York. ACM; 2004:48–55.
Eagle N, Pentland A: Reality mining: sensing complex social systems. Personal and Ubiquitous Computing 2006,10(4):255–268. 10.1007/s00779-005-0046-3
Chen Y, Jones GJF: Augmenting Human Memory using Personal Lifelogs. In Proceedings of the 1st Augmented Human International Conference: 2–4 April 2010; Megeve, France. ACM; 2010.
Farringdon J, Oni V: Visual Augmented Memory (VAM). In Proceedings of the 4th International Symposium on Wearable Computers: 18–21 Oct 2000; Atlanta. IEEE; 2000:167–168.
Belimpasakis P, Roimela K, You Y: Experience Explorer: a Life-Logging Platform Based on Mobile Context Collection. In Proceedings of the 3rd International Conference on Next Generation Mobile Applications, Services and Technologies: 15–18 Sep 2009; Cardiff, UK. IEEE; 2009:77–82.
Dumais S, Cutrell E, Cadiz J, Jancke G, Sarin R, Robbins DC: Stuff I've Seen: A System for Personal Information Retrieval and Re-Use. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: 28 July - 1 Aug 2003; Toronto. ACM; 2003:72–79.
Cutrell E, Dumais ST, Teevan J: Searching to Eliminate Personal Information Management. Communications of the ACM 2006, 49: 58–64.
Ringel M, Cutrell E, Dumais S, Horvitz E: Milestones in Time. In Proceedings of the 9th IFIP TC13 International Conference on Human-Computer Interaction: 1–5 Sep 2003; Zurich, Switzerland. IFIP; 2003:184–191.
Kim IJ, Ahn SC, Ko H, Kim HG: PERSONE: Personalized Experience Recoding and Searching On Networked Environment. In Proceedings of the 3rd ACM Workshop on Continuous Archival and Retrieval of Personal Experiences: 27 Oct 2006; Santa Barbara, USA. ACM; 2006:49–54.
Zheng VW, Zheng Y, Xie X, Yang Q: Collaborative Location and Activity Recommendations with GPS History Data. In Proceedings of the 19th International World Wide Web Conference: 26–30 April 2010; Raleigh, NC, USA. ACM; 2010:1029–1038.
The author declares that they have no competing interests.
TT carried out all of the work for this paper.