final-draft

Shivan Sivakumaran 2021-05-24 16:16:42 +12:00
parent 456315a6f5
commit f133567e76
2 changed files with 69 additions and 48 deletions

BIN dataframe.png (new binary file, 749 KiB, not shown)

@ -115,6 +115,7 @@
- Access API - Access API
- Channel upload playlist - Channel upload playlist
- Video statistics - Video statistics
- `pandas` dataframe
</section> </section>
<section data-markdown> <section data-markdown>
### 4. Get YouTube video statistics ### 4. Get YouTube video statistics
@ -132,14 +133,15 @@
``` ```
</section> </section>
<section> <section data-markdown>
<pre><code data-line-numbers="3|5-16|17-18|18-29|30-32"># tubestates/youtube_api.py ```python [|3|5-16|17-18|20-29|30-32]
# tubestats/youtube_api.py
upload_playlist_ID = channel_data['upload_playlist_ID'] upload_playlist_ID = channel_data['upload_playlist_ID']
video_response = [] video_response = []
next_page_token = None next_page_token = None
while True: while True:
# obtaining video ID + titles # obtaining video ID + titles
playlist_request = self.youtube.playlistItems().list( playlist_request = self.youtube.playlistItems().list(
part='snippet,contentDetails', part='snippet,contentDetails',
@ -163,9 +165,12 @@ while True:
if next_page_token is None: if next_page_token is None:
break break
df = pd.json_normalize(video_response, 'items') df = pd.json_normalize(video_response, 'items')
return df return df
</code></pre> </section>
<section data-markdown>
### Video statistics
![](dataframe.png)
</section> </section>
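The loop on this slide follows a common page-token pattern: request a page, keep the response, and stop once no next-page token comes back. The sketch below illustrates that pattern and the final `pd.json_normalize(video_response, 'items')` call without touching the real API. `FAKE_PAGES`, `fetch_page` and `collect_pages` are hypothetical names invented for this example, and the response shape is only assumed to resemble what the YouTube Data API returns.

```python
import pandas as pd

# Hypothetical stand-in for the API: maps a page token to a response dict.
# The field names are assumptions chosen to mirror the slide's column names.
FAKE_PAGES = {
    None: {
        "items": [
            {"id": "a", "snippet": {"title": "First"},
             "statistics": {"viewCount": "10"}},
        ],
        "nextPageToken": "p2",
    },
    "p2": {
        "items": [
            {"id": "b", "snippet": {"title": "Second"},
             "statistics": {"viewCount": "20"}},
            {"id": "c", "snippet": {"title": "Third"},
             "statistics": {"viewCount": "30"}},
        ],
        # No "nextPageToken" key here: this is the last page.
    },
}


def fetch_page(page_token=None):
    """Return the fake response for the given page token."""
    return FAKE_PAGES[page_token]


def collect_pages(fetch):
    """Call fetch repeatedly until a response carries no next-page token."""
    responses = []
    token = None
    while True:
        response = fetch(token)
        responses.append(response)
        token = response.get("nextPageToken")
        if token is None:
            break
    return responses


video_response = collect_pages(fetch_page)

# One row per entry in each response's "items" list; nested dicts become
# dotted column names such as "snippet.title".
df = pd.json_normalize(video_response, "items")
print(df[["id", "snippet.title", "statistics.viewCount"]])
```

Keeping whole responses in a list and flattening once at the end means the DataFrame is built in a single step rather than grown page by page.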
<section data-markdown> <section data-markdown>
## How does TubeStats work? ## How does TubeStats work?
@ -202,7 +207,7 @@ return df
</section> </section>
<section data-markdown> <section data-markdown>
## 6. Testing ## 6. Testing
```python [|16-20] ```python [|15-20]
# tests/tests_youtube_api.py # tests/tests_youtube_api.py
from tubestats.youtube_api import create_api, YouTubeAPI from tubestats.youtube_api import create_api, YouTubeAPI
from tests.test_settings import set_channel_ID_test_case from tests.test_settings import set_channel_ID_test_case
@ -344,8 +349,13 @@ return df
</section> </section>
<section data-markdown> <section data-markdown>
## Some things I would like to discuss ## Some things I would like to discuss
- DataFrame and memory
- Error handling
- Async?
</section> </section>
<section data-markdown> <section data-markdown>
### DataFrame immutability and memory?
```python []
df = self.df df = self.df
df = df[['snippet.publishedAt', df = df[['snippet.publishedAt',
'snippet.title', 'snippet.title',
@ -355,16 +365,27 @@ return df
df = df.fillna(0) df = df.fillna(0)
# changing dtypes
df = df.astype({'statistics.viewCount': 'int', df = df.astype({'statistics.viewCount': 'int',
... ...
'statistics.commentCount': 'int',}) 'statistics.commentCount': 'int',})
# applying natural log to view count as data is tail heavy
df['statistics.viewCount_NLOG'] = df['statistics.viewCount'].apply(lambda x : np.log(x)) df['statistics.viewCount_NLOG'] = df['statistics.viewCount'].apply(lambda x : np.log(x))
df = df.sort_values(by='snippet.publishedAt_REFORMATED', ascending=True) df = df.sort_values(by='snippet.publishedAt_REFORMATED', ascending=True)
return df
</section> </section>
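On the memory question raised by this slide, a minimal sketch with made-up data may help. `df = self.df` only binds a second name to the same object, while column selection with a list, `fillna` and `astype` each return a new DataFrame, so the steps that follow do not modify the original. The sketch also uses `np.log1p` in place of the slide's `np.log`; that is a swapped-in alternative, not the author's method, shown because `fillna(0)` can leave zero view counts and `np.log(0)` evaluates to `-inf`. The sample values and the name `original` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data; counts are strings with a missing value, as an
# API response might plausibly contain (an assumption for this sketch).
original = pd.DataFrame({
    "snippet.title": ["a", "b", "c"],
    "statistics.viewCount": ["100", None, "10"],
})

df = original                 # same object, nothing is copied here
same_object = df is original  # True

df = df[["snippet.title", "statistics.viewCount"]]  # new DataFrame
df = df.fillna(0)                                    # new DataFrame
df = df.astype({"statistics.viewCount": "int"})      # new DataFrame

# log1p(x) = log(1 + x) stays finite at x == 0, unlike log(x).
# This replaces the slide's np.log and is labelled here as a substitution.
df["statistics.viewCount_NLOG"] = np.log1p(df["statistics.viewCount"])

print(original.columns.tolist())  # the original has no _NLOG column
print(df)
```

Each reassignment drops the reference to the previous intermediate frame, so those intermediates can be garbage-collected; only `original` and the final `df` stay alive.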
<section data-markdown>
## What did I learn?
- Project-based learning
- 'Minimum viable product'
</section>
<section data-markdown>
## Conclusion
- Analysing consistency
- YouTube Data API --> Heroku
- Share your work!
</section>
<section data-markdown>
## Acknowledgements
- Menno
</div> </div>
</div> </div>