Welcome to part 5 of our sentiment analysis application with Dash and Python tutorial. Up to this point, we've got the application tracking sentiment live, but what we'd like to be able to do is use the user interface to type in whatever word(s) we'd like to track.
In order to do this, we need to first add an input field in our layout:
dcc.Input(id='sentiment_term', value='olympic', type='text'),
Inside of:
app.layout = html.Div(
    [   html.H2('Live Twitter Sentiment'),
        dcc.Input(id='sentiment_term', value='olympic', type='text'),
        dcc.Graph(id='live-graph', animate=False),
        dcc.Interval(
            id='graph-update',
            interval=1*1000
        ),
    ]
)
Our previous callback was just for the update interval and the output to the live graph. Now we want to include the input:
@app.callback(Output('live-graph', 'figure'),
              [Input(component_id='sentiment_term', component_property='value')],
              events=[Event('graph-update', 'interval')])
Now we pass that input into the wrapped function:
def update_graph_scatter(sentiment_term):
Next, we want to make our query use the term typed into the search box. That said, we need to watch out for SQL injection. Never trust your users: even if 99.99% of them are well-intentioned, it only takes one person to drop your table. SQLite supports parameterized queries for exactly this, and since we're reading with Pandas, it's lucky that pandas' read_sql supports parameters too. Here's an example, using our variable:
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                 conn, params=('%' + sentiment_term + '%',))
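To see the parameterized query in action without the live database, here's a minimal sketch against an in-memory SQLite table with the same columns as ours (the sample rows are made up). The ? placeholder lets SQLite escape the value for us, which plain string formatting would not:

```python
import sqlite3
import pandas as pd

# Tiny in-memory table mirroring the tutorial's schema (hypothetical sample data).
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sentiment (unix INTEGER, tweet TEXT, sentiment REAL)")
rows = [
    (1000, "loving the olympic games", 0.8),
    (2000, "traffic is terrible", -0.5),
    (3000, "olympic opening ceremony was great", 0.9),
]
conn.executemany("INSERT INTO sentiment VALUES (?, ?, ?)", rows)
conn.commit()

term = "olympic"  # imagine this came straight from the input box
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                 conn, params=('%' + term + '%',))
print(len(df))  # → 2 (only the two rows containing "olympic")
```

Note the trailing comma in `params=(...,)` — pandas expects a sequence, so a single parameter still needs to be a one-element tuple.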
Combining all of this:
import dash
from dash.dependencies import Output, Event, Input
import dash_core_components as dcc
import dash_html_components as html
import plotly
import random
import plotly.graph_objs as go
from collections import deque
import sqlite3
import pandas as pd
import time

#popular topics: google, olympics, trump, gun, usa

app = dash.Dash(__name__)
app.layout = html.Div(
    [   html.H2('Live Twitter Sentiment'),
        dcc.Input(id='sentiment_term', value='olympic', type='text'),
        dcc.Graph(id='live-graph', animate=False),
        dcc.Interval(
            id='graph-update',
            interval=1*1000
        ),
    ]
)

@app.callback(Output('live-graph', 'figure'),
              [Input(component_id='sentiment_term', component_property='value')],
              events=[Event('graph-update', 'interval')])
def update_graph_scatter(sentiment_term):
    try:
        conn = sqlite3.connect('twitter.db')
        c = conn.cursor()
        df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                         conn, params=('%' + sentiment_term + '%',))
        df.sort_values('unix', inplace=True)
        df['sentiment_smoothed'] = df['sentiment'].rolling(int(len(df)/5)).mean()
        df.dropna(inplace=True)

        X = df.unix.values[-100:]
        Y = df.sentiment_smoothed.values[-100:]

        data = plotly.graph_objs.Scatter(
                x=X,
                y=Y,
                name='Scatter',
                mode='lines+markers'
                )

        return {'data': [data],
                'layout': go.Layout(xaxis=dict(range=[min(X), max(X)]),
                                    yaxis=dict(range=[min(Y), max(Y)]),
                                    title='Term: {}'.format(sentiment_term))}

    except Exception as e:
        with open('errors.txt', 'a') as f:
            f.write(str(e))
            f.write('\n')

if __name__ == '__main__':
    app.run_server(debug=True)
Okay, at this point, we're off to a pretty decent start. At least for me, this runs pretty well. For live sentiment, going back ~1,000 datapoints should be more than enough. Right now, we're pulling 1,000 datapoints, smoothing with a 200-period moving average (the window is len(df)/5, so 200 when there are 1,000 datapoints), and then showing only the latest 100. Most of that work is wasted, so to keep the query as small as possible, we could instead do:
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 200",
                 conn, params=('%' + sentiment_term + '%',))
At least for now, this makes more sense. Our unix timestamps also make for unattractive axis labels. We can fix that up by converting them to proper datetimes and using them as the index:
df['date'] = pd.to_datetime(df['unix'], unit='ms')
df.set_index('date', inplace=True)
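In case the unit='ms' part isn't obvious: our streamer stores unix time in milliseconds, and to_datetime needs to know that to convert correctly. A quick sketch with hypothetical timestamps:

```python
import pandas as pd

# Hypothetical unix timestamps in milliseconds, like those stored by the streamer.
df = pd.DataFrame({'unix': [1518825600000, 1518825601000],
                   'sentiment': [0.5, -0.2]})
df['date'] = pd.to_datetime(df['unix'], unit='ms')
df.set_index('date', inplace=True)
print(df.index[0])  # → 2018-02-17 00:00:00
```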
Now, we can instead do:
conn = sqlite3.connect('twitter.db')
c = conn.cursor()
df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 200",
                 conn, params=('%' + sentiment_term + '%',))
df.sort_values('unix', inplace=True)
df['sentiment_smoothed'] = df['sentiment'].rolling(int(len(df)/2)).mean()
df['date'] = pd.to_datetime(df['unix'], unit='ms')
df.set_index('date', inplace=True)
df.dropna(inplace=True)

X = df.index
Y = df.sentiment_smoothed
Giving us:
What if we wanted to go back further? Say we wanted to show 10,000 datapoints, either because the term is high volume or because we want more history.
We really shouldn't plot more than a few hundred datapoints on a live graph; rendering more than that significantly impacts performance. One option we have is Pandas' resample, which bins the data by time and aggregates each bin. We could resample to one-second bins with:
df = df.resample('1s').mean()
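To illustrate what resample does here, a small sketch with made-up per-tweet sentiment scores, several falling within the same second; each one-second bin collapses to the mean of its rows:

```python
import pandas as pd

# Hypothetical per-tweet sentiment scores with millisecond timestamps.
idx = pd.to_datetime([1000, 1200, 1900, 2500, 2600], unit='ms')
df = pd.DataFrame({'sentiment': [0.2, 0.4, 0.6, 1.0, 0.0]}, index=idx)

# Three tweets land in second 1, two in second 2; each bin becomes one mean value.
df = df.resample('1s').mean()
print([round(v, 2) for v in df['sentiment']])  # → [0.4, 0.5]
```

This is why it tames a 10,000-point query: however many tweets arrive, the plotted series only has one point per bin.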
Full function something like:
@app.callback(Output('live-graph', 'figure'),
              [Input(component_id='sentiment_term', component_property='value')],
              events=[Event('graph-update', 'interval')])
def update_graph_scatter(sentiment_term):
    try:
        conn = sqlite3.connect('twitter.db')
        c = conn.cursor()
        df = pd.read_sql("SELECT * FROM sentiment WHERE tweet LIKE ? ORDER BY unix DESC LIMIT 1000",
                         conn, params=('%' + sentiment_term + '%',))
        df.sort_values('unix', inplace=True)
        df['sentiment_smoothed'] = df['sentiment'].rolling(int(len(df)/2)).mean()
        df['date'] = pd.to_datetime(df['unix'], unit='ms')
        df.set_index('date', inplace=True)
        df = df.resample('1min').mean()
        df.dropna(inplace=True)

        X = df.index
        Y = df.sentiment_smoothed

        data = plotly.graph_objs.Scatter(
                x=X,
                y=Y,
                name='Scatter',
                mode='lines+markers'
                )

        return {'data': [data],
                'layout': go.Layout(xaxis=dict(range=[min(X), max(X)]),
                                    yaxis=dict(range=[min(Y), max(Y)]),
                                    title='Term: {}'.format(sentiment_term))}

    except Exception as e:
        with open('errors.txt', 'a') as f:
            f.write(str(e))
            f.write('\n')
Without the resample:
With the resample: