final-draft

2021-05-24 16:16:42 +12:00 · f133567e76
parent 456315a6f5
commit f133567e76
2 changed files with 69 additions and 48 deletions
--- a/dataframe.png
+++ b/dataframe.png
--- a/index.html
+++ b/index.html
@@ -115,6 +115,7 @@
 					- Access API
 					- Channel upload playlist
 					- Video statistics
 					- `pandas` dataframe
 				</section>
 				<section data-markdown>
 					### 4. Get YouTube video statistics
@@ -132,14 +133,15 @@
 					```
 				</section>
-				<section>
+				<section data-markdown>
-					<pre><code data-line-numbers="3|5-16|17-18|18-29|30-32"># tubestates/youtube_api.py
+					```python [|3|5-16|17-18|20-29|30-32]
 					# tubestates/youtube_api.py
-upload_playlist_ID = channel_data['upload_playlist_ID']
+					upload_playlist_ID = channel_data['upload_playlist_ID']
-video_response = []
+					video_response = []
-next_page_token = None
+					next_page_token = None
-while True:
+					while True:
 					    # obtaining video ID + titles
 					    playlist_request = self.youtube.playlistItems().list(
 						    part='snippet,contentDetails',
@@ -163,9 +165,12 @@ while True:
 					    if next_page_token is None:
 						break
-df = pd.json_normalize(video_response, 'items')
+					df = pd.json_normalize(video_response, 'items')
-return df
+					return df
-					</code></pre>
+				</section>
 				<section data-markdown>
 					### Video statistics
 					![](dataframe.png)
 				</section>
 				<section data-markdown>
 					## How does TubeStats work?
@@ -202,7 +207,7 @@ return df
 				</section>	
 				<section data-markdown>
 					## 6. Testing
-					```python [|16-20]
+					```python [|15-20]
 					# tests/tests_youtube_api.py
 					from tubestats.youtube_api import create_api, YouTubeAPI
 					from tests.test_settings import set_channel_ID_test_case
@@ -344,8 +349,13 @@ return df
 				</section>
 				<section data-markdown>
 					## Some things I would like to discuss
 					- DataFrame and memory
 					- Error handling
 					- Async?
 				</section>
 				<section data-markdown>
 					### DataFrame immutability and memory?
 					```python []
 					df = self.df
 					df = df[['snippet.publishedAt',
 					    'snippet.title',
@@ -355,16 +365,27 @@ return df
 					df = df.fillna(0)
        # changing dtypes
 					df = df.astype({'statistics.viewCount': 'int',
 						...
 					    'statistics.commentCount': 'int',})
        # applying natural log to view count as data is tail heavy
 					df['statistics.viewCount_NLOG'] = df['statistics.viewCount'].apply(lambda x : np.log(x))
 					df = df.sort_values(by='snippet.publishedAt_REFORMATED', ascending=True) 
        return df
 				</section>
 				<section data-markdown>
 					## What did I learn
 					- Project based learning
 					- 'minimal viable product'
 				</section>
 				<section data-markdown>
 					## Conclusion
 					- Analysing consistency
 					- YouTube Data API --> Heroku
 					- Share your work!
 				</section>
 				<section data-markdown>
 					## Acknowledgements
 					- Menno
 				</section>
 			</div>
 		</div>