final-draft

This commit is contained in:
Shivan Sivakumaran 2021-05-24 16:16:42 +12:00
parent 456315a6f5
commit f133567e76
2 changed files with 69 additions and 48 deletions

BIN
dataframe.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 749 KiB

View File

@ -115,6 +115,7 @@
- Access API
- Channel upload playlist
- Video statistics
- `pandas` dataframe
</section>
<section data-markdown>
### 4. Get YouTube video statistics
@ -132,14 +133,15 @@
```
</section>
<section>
<pre><code data-line-numbers="3|5-16|17-18|18-29|30-32"># tubestates/youtube_api.py
<section data-markdown>
```python [|3|5-16|17-18|20-29|30-32]
# tubestates/youtube_api.py
upload_playlist_ID = channel_data['upload_playlist_ID']
upload_playlist_ID = channel_data['upload_playlist_ID']
video_response = []
next_page_token = None
while True:
video_response = []
next_page_token = None
while True:
# obtaining video ID + titles
playlist_request = self.youtube.playlistItems().list(
part='snippet,contentDetails',
@ -163,9 +165,12 @@ while True:
if next_page_token is None:
break
df = pd.json_normalize(video_response, 'items')
return df
</code></pre>
df = pd.json_normalize(video_response, 'items')
return df
</section>
<section data-markdown>
### Video statistics
![](dataframe.png)
</section>
<section data-markdown>
## How does TubeStats work?
@ -202,7 +207,7 @@ return df
</section>
<section data-markdown>
## 6. Testing
```python [|16-20]
```python [|15-20]
# tests/tests_youtube_api.py
from tubestats.youtube_api import create_api, YouTubeAPI
from tests.test_settings import set_channel_ID_test_case
@ -344,8 +349,13 @@ return df
</section>
<section data-markdown>
## Somethings I would like to discuss
- DataFrame and memory
- Error handling
- Async?
</section>
<section data-markdown>
### DataFrame immutability and memory?
```python []
df = self.df
df = df[['snippet.publishedAt',
'snippet.title',
@ -355,16 +365,27 @@ return df
df = df.fillna(0)
# changing dtypes
df = df.astype({'statistics.viewCount': 'int',
...
'statistics.commentCount': 'int',})
# applying natural log to view count as data is tail heavy
df['statistics.viewCount_NLOG'] = df['statistics.viewCount'].apply(lambda x : np.log(x))
df = df.sort_values(by='snippet.publishedAt_REFORMATED', ascending=True)
return DataFrame)
</section>
<section data-markdown>
## What did I learn
- Project based learning
- 'minimal viable product'
</section>
<section data-markdown>
## Conclusion
- Analysing consistency
- YouTube Data API --> Heroku
- Share your work!
</section>
<section data-markdown>
## Acknowledgements
- Menno
</div>
</div>