Welcome to part 5 of our sentiment analysis application with Dash and Python tutorial. Up to this point, we've got the application tracking sentiment live, but what we'd like to be able to do is use the user interface to type in whatever word(s) we'd like to track.
In order to do this, we need to first add an input field in our layout:
dcc.Input(id='sentiment_term', value='olympic', type='text'),
Inside of:
app.layout = html.Div(
    [   html.H2('Live Twitter Sentiment'),
        dcc.Input(id='sentiment_term', value='olympic', type='text'),
        dcc.Graph(id='live-graph', animate=False),
        dcc.Interval(
            id='graph-update',
            interval=1*1000
        ),
    ]
)
Our previous callback was just for the update interval and the output to the live graph. Now we want to include the input:
@app.callback(Output('live-graph', 'figure'),
              [Input(component_id='sentiment_term', component_property='value')],
              events=[Event('graph-update', 'interval')])
Now we pass that input into the wrapped function:
def update_graph_scatter(sentiment_term):
Next, we want to make our query use the term typed into the search box. That said, we need to watch out for SQL injection. Never trust your users: even if 99.99% of them are well-intentioned, it only takes one person to drop your table. SQLite supports parameterized queries for exactly this, and since we're reading with Pandas, it's lucky that pandas' read_sql supports parameters too. Here's an example, using our variable:
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                 conn, params=('%' + sentiment_term + '%',))
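To see the parameterized query in action without the live database, here's a minimal sketch against an in-memory SQLite table with the same columns as ours (the sample rows are made up). The ? placeholder lets SQLite escape the value for us, which plain string formatting would not:

```python
import sqlite3
import pandas as pd

# Tiny in-memory table mirroring the tutorial's schema (hypothetical sample data).
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sentiment (unix INTEGER, tweet TEXT, sentiment REAL)")
rows = [
    (1000, "loving the olympic games", 0.8),
    (2000, "traffic is terrible", -0.5),
    (3000, "olympic opening ceremony was great", 0.9),
]
conn.executemany("INSERT INTO sentiment VALUES (?, ?, ?)", rows)
conn.commit()

term = "olympic"  # imagine this came straight from the input box
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                 conn, params=('%' + term + '%',))
print(len(df))  # → 2 (only the two rows containing "olympic")
```

Note the trailing comma in `params=(...,)` — pandas expects a sequence, so a single parameter still needs to be a one-element tuple.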
Combining all of this:
import dash
from dash.dependencies import Output, Event, Input
import dash_core_components as dcc
import dash_html_components as html
import plotly
import random
import plotly.graph_objs as go
from collections import deque
import sqlite3
import pandas as pd
import time

#popular topics: google, olympics, trump, gun, usa

app = dash.Dash(__name__)
app.layout = html.Div(
    [   html.H2('Live Twitter Sentiment'),
        dcc.Input(id='sentiment_term', value='olympic', type='text'),
        dcc.Graph(id='live-graph', animate=False),
        dcc.Interval(
            id='graph-update',
            interval=1*1000
        ),
    ]
)

@app.callback(Output('live-graph', 'figure'),
              [Input(component_id='sentiment_term', component_property='value')],
              events=[Event('graph-update', 'interval')])
def update_graph_scatter(sentiment_term):
    try:
        conn = sqlite3.connect('twitter.db')
        c = conn.cursor()
        df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                         conn, params=('%' + sentiment_term + '%',))
        df.sort_values('unix', inplace=True)
        df['sentiment_smoothed'] = df['sentiment'].rolling(int(len(df)/5)).mean()
        df.dropna(inplace=True)

        X = df.unix.values[-100:]
        Y = df.sentiment_smoothed.values[-100:]

        data = plotly.graph_objs.Scatter(
                x=X,
                y=Y,
                name='Scatter',
                mode='lines+markers'
                )

        return {'data': [data],
                'layout': go.Layout(xaxis=dict(range=[min(X), max(X)]),
                                    yaxis=dict(range=[min(Y), max(Y)]),
                                    title='Term: {}'.format(sentiment_term))}

    except Exception as e:
        with open('errors.txt', 'a') as f:
            f.write(str(e))
            f.write('\n')

if __name__ == '__main__':
    app.run_server(debug=True)
Okay, at this point, we're off to a pretty decent start. At least for me, this runs pretty well. For live sentiment, going back ~1,000 datapoints should be more than enough. Right now, we're pulling 1,000 datapoints, smoothing with a 200-period moving average (the window is len(df)/5, so 200 when there are 1,000 datapoints), and then showing only the latest 100. Most of that work is wasted, so to keep the query as small as possible, we could instead do:
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 200",
                 conn, params=('%' + sentiment_term + '%',))
At least for now, this makes more sense. Our unix timestamps also make for unattractive axis labels. We can fix that up by converting them to proper datetimes and using them as the index:
df['date'] = pd.to_datetime(df['unix'], unit='ms')
df.set_index('date', inplace=True)
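In case the unit='ms' part isn't obvious: our streamer stores unix time in milliseconds, and to_datetime needs to know that to convert correctly. A quick sketch with hypothetical timestamps:

```python
import pandas as pd

# Hypothetical unix timestamps in milliseconds, like those stored by the streamer.
df = pd.DataFrame({'unix': [1518825600000, 1518825601000],
                   'sentiment': [0.5, -0.2]})
df['date'] = pd.to_datetime(df['unix'], unit='ms')
df.set_index('date', inplace=True)
print(df.index[0])  # → 2018-02-17 00:00:00
```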
Now, we can instead do:
conn = sqlite3.connect('twitter.db')
c = conn.cursor()
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 200",
                 conn, params=('%' + sentiment_term + '%',))
df.sort_values('unix', inplace=True)
df['sentiment_smoothed'] = df['sentiment'].rolling(int(len(df)/2)).mean()
df['date'] = pd.to_datetime(df['unix'], unit='ms')
df.set_index('date', inplace=True)
df.dropna(inplace=True)

X = df.index
Y = df.sentiment_smoothed
Giving us:
What if we wanted to go back further? Say we wanted to show 10,000 datapoints, either because the term is high volume or because we want more history.
We really shouldn't plot more than a few hundred datapoints on a live graph; rendering more than that significantly impacts performance. One option we have is Pandas' resample, which bins the data by time and aggregates each bin. We could resample to one-second bins with:
df = df.resample('1s').mean()
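To illustrate what resample does here, a small sketch with made-up per-tweet sentiment scores, several falling within the same second; each one-second bin collapses to the mean of its rows:

```python
import pandas as pd

# Hypothetical per-tweet sentiment scores with millisecond timestamps.
idx = pd.to_datetime([1000, 1200, 1900, 2500, 2600], unit='ms')
df = pd.DataFrame({'sentiment': [0.2, 0.4, 0.6, 1.0, 0.0]}, index=idx)

# Three tweets land in second 1, two in second 2; each bin becomes one mean value.
df = df.resample('1s').mean()
print([round(v, 2) for v in df['sentiment']])  # → [0.4, 0.5]
```

This is why it tames a 10,000-point query: however many tweets arrive, the plotted series only has one point per bin.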
Full function something like:
@app.callback(Output('live-graph', 'figure'),
              [Input(component_id='sentiment_term', component_property='value')],
              events=[Event('graph-update', 'interval')])
def update_graph_scatter(sentiment_term):
    try:
        conn = sqlite3.connect('twitter.db')
        c = conn.cursor()
        df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                         conn, params=('%' + sentiment_term + '%',))
        df.sort_values('unix', inplace=True)
        df['sentiment_smoothed'] = df['sentiment'].rolling(int(len(df)/2)).mean()
        df['date'] = pd.to_datetime(df['unix'], unit='ms')
        df.set_index('date', inplace=True)
        df = df.resample('1min').mean()
        df.dropna(inplace=True)

        X = df.index
        Y = df.sentiment_smoothed

        data = plotly.graph_objs.Scatter(
                x=X,
                y=Y,
                name='Scatter',
                mode='lines+markers'
                )

        return {'data': [data],
                'layout': go.Layout(xaxis=dict(range=[min(X), max(X)]),
                                    yaxis=dict(range=[min(Y), max(Y)]),
                                    title='Term: {}'.format(sentiment_term))}

    except Exception as e:
        with open('errors.txt', 'a') as f:
            f.write(str(e))
            f.write('\n')
Without the resample:
With the resample: