- Twitter’s openness to public consumption, its broad popularity, and its well-documented API make it an ideal starting point for social web mining.
- Tweets are easy to write, so it is not surprising that many of them reflect events in near real time. This lets us quickly glean useful information about important events as they happen.
- Whenever we collect data for analysis, it is important to avoid collecting a biased sample, and in this respect Twitter is a strong choice. Twitter’s following mechanism links people in a variety of ways, ranging from short conversational dialogues to interest graphs that connect people with the topics they care about. Together these represent a broad cross-section of society at an international level.
- Sites like Facebook and LinkedIn require both users to accept a connection, while Twitter’s following mechanism lets you keep up with any user or topic without being followed back. The former produces an undirected social graph; the latter produces a directed one.
- The 140-character limit means that each tweet carries concise information. Compared to more verbose forms of social media such as blogs, this makes it easier to infer the context of the words when doing tasks like sentiment analysis.
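The contrast between mutual connections and one-way following in the list above can be sketched as two small graph classes. This is only an illustration: `FollowGraph`, `FriendGraph`, and the user names are hypothetical and do not correspond to any Twitter, Facebook, or LinkedIn API.

```python
from collections import defaultdict


class FollowGraph:
    """Directed graph: an edge a -> b means "a follows b" (Twitter-style)."""

    def __init__(self):
        self.following = defaultdict(set)

    def follow(self, follower, followee):
        # Only one direction is recorded; the followee does nothing.
        self.following[follower].add(followee)

    def follows(self, a, b):
        return b in self.following.get(a, set())


class FriendGraph:
    """Undirected graph: a connection always exists in both directions."""

    def __init__(self):
        self.friends = defaultdict(set)

    def connect(self, a, b):
        # Models a connection that both sides have accepted.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def connected(self, a, b):
        return b in self.friends.get(a, set())


# Hypothetical users, for illustration only.
follow_graph = FollowGraph()
follow_graph.follow("alice", "bob")

friend_graph = FriendGraph()
friend_graph.connect("alice", "bob")
```

In the directed case `follow_graph.follows("bob", "alice")` is false even though alice follows bob, while in the undirected case the connection holds both ways.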
The interest graph can be described as a way of modelling connections between people and their interests. It lets us measure correlations between things in order to make intelligent recommendations, such as who to follow or what to purchase.
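As a rough illustration of how an interest graph can drive recommendations, the sketch below stores each person's interests as a set and suggests topics held by people with overlapping interests. The scoring rule (weighting each suggestion by the number of shared interests) is an assumption chosen for simplicity, not how Twitter computes recommendations, and the names and data are made up.

```python
from collections import Counter

# Hypothetical interest graph: person -> set of topics they care about.
interests = {
    "alice": {"python", "data mining"},
    "bob": {"python", "data mining", "graph theory"},
    "carol": {"python", "cycling"},
    "dave": {"cooking"},
}


def recommend(graph, person, top_n=3):
    """Suggest topics that people with overlapping interests care about."""
    mine = graph[person]
    scores = Counter()
    for other, theirs in graph.items():
        if other == person:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared interests, so no evidence to use
        for topic in theirs - mine:
            # Weight each new topic by how similar the other person is.
            scores[topic] += overlap
    return [topic for topic, _ in scores.most_common(top_n)]
```

For example, `recommend(interests, "alice")` ranks "graph theory" ahead of "cycling", because bob shares two interests with alice and carol shares only one.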
What exactly is Twitter and what is a tweet?
Twitter is essentially a real-time, highly social microblogging service that allows users to post short status updates, called tweets, that appear on timelines.
Tweets are limited to 140 characters of content. They can contain one or more entities and can reference one or more places that map to locations in the real world.
To be more specific, tweets come bundled with two notable pieces of metadata: entities and places.
Entities: user mentions, hashtags, URLs, and media attached to the tweet.
Places: locations in the real world that may be attached to a tweet. A place can be the actual location from which the tweet was posted or a reference to a place described in the tweet.
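To make the two kinds of metadata concrete, here is a minimal sketch that pulls hashtags, URLs, and the place name out of a tweet represented as a Python dictionary. The field names (`entities`, `hashtags`, `urls`, `expanded_url`, `place`, `full_name`) are modelled on the tweet JSON returned by version 1.1 of the REST API; treat the exact layout as an assumption and check it against the API version you use. The sample tweet is made up.

```python
# Hypothetical tweet; the values are invented for illustration.
sample_tweet = {
    "text": "Learning to mine the social web #python #datamining",
    "entities": {
        "hashtags": [{"text": "python"}, {"text": "datamining"}],
        "urls": [{"expanded_url": "http://example.com/mining"}],
        "user_mentions": [],
    },
    "place": {"full_name": "Singapore", "country": "Singapore"},
}


def summarize_tweet(tweet):
    """Extract hashtags, URLs, and the place name from a tweet dict."""
    entities = tweet.get("entities", {})
    place = tweet.get("place")  # None when no place is attached
    return {
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "place": place["full_name"] if place else None,
    }
```

The function falls back to empty lists and `None`, since a tweet does not have to carry any entities or a place.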
Timelines are chronologically sorted collections of tweets. They may display only tweets that are deemed noteworthy according to certain metrics (such as the number of retweets).
When you log into Twitter, the home timeline is what you see, and it displays tweets from the users you are following.
A user timeline, however, displays only the tweets from that particular user.
To find out what a particular user’s home timeline looks like, we can simply append a /following suffix to that user’s profile URL.
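The difference between a home timeline and a user timeline can be sketched with an in-memory list of tweets. The `Tweet` class, its fields, and the newest-first ordering are assumptions made for this illustration; this is not how Twitter stores or serves timelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    author: str
    text: str
    posted_at: int  # hypothetical timestamp, seconds since the epoch


def user_timeline(tweets, user):
    """Tweets written by a single user, newest first."""
    own = (t for t in tweets if t.author == user)
    return sorted(own, key=lambda t: t.posted_at, reverse=True)


def home_timeline(tweets, following):
    """Tweets written by anyone in `following`, newest first."""
    followed = (t for t in tweets if t.author in following)
    return sorted(followed, key=lambda t: t.posted_at, reverse=True)


# Made-up data for illustration.
all_tweets = [
    Tweet("alice", "a1", 100),
    Tweet("bob", "b1", 200),
    Tweet("alice", "a2", 300),
    Tweet("carol", "c1", 250),
]
```

Here `user_timeline(all_tweets, "alice")` returns only alice's tweets, while `home_timeline(all_tweets, {"bob", "carol"})` is what a user following bob and carol would see.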
Timelines are essentially collections of tweets with relatively low velocity, while streams are samples of public tweets flowing through Twitter in real time. The public firehose of all tweets has been known to peak at hundreds of thousands of tweets per minute during events of particularly wide interest.
Twitter provides a way to obtain a small random sample of the public timeline. This gives API developers filterable access to enough public data to build useful and powerful applications.
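The idea of a filterable random sample can be illustrated without touching the network. The sketch below simulates it with Python generators over an in-memory list; `sample_stream`, `track`, and the sampling rate are hypothetical names and parameters, and this is not the Twitter Streaming API itself.

```python
import random


def sample_stream(tweets, rate, rng=None):
    """Yield each tweet with probability `rate` (0.0 to 1.0)."""
    rng = rng or random.Random()
    for tweet in tweets:
        # rng.random() is in [0.0, 1.0), so rate=1.0 keeps everything
        # and rate=0.0 keeps nothing.
        if rng.random() < rate:
            yield tweet


def track(stream, keyword):
    """Keep only tweets whose text mentions `keyword` (case-insensitive)."""
    needle = keyword.lower()
    for tweet in stream:
        if needle in tweet.lower():
            yield tweet


# Made-up tweet texts standing in for the public firehose.
firehose = ["Python rocks", "Lunch time", "I love #python", "Good morning"]
```

Because both functions are generators, they can be chained, for example `track(sample_stream(firehose, 0.01), "python")`, and tweets are processed one at a time rather than held in memory all at once.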