<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Machine Learning Archive - Ramon Domingos Blog</title>
	<atom:link href="https://ramondomingos.com.br/tag/aprendizagem-de-maquina/feed/" rel="self" type="application/rss+xml" />
	<link>https://ramondomingos.com.br/tag/aprendizagem-de-maquina/</link>
	<description>Content about technology and software engineering.</description>
	<lastBuildDate>Tue, 17 Oct 2023 14:32:56 +0000</lastBuildDate>
	<language>pt-BR</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://ramondomingos.com.br/wp-content/uploads/2023/09/cropped-Logotipo_bold_minimalista_amarelo_para_blog__1_-removebg-preview-32x32.png</url>
	<title>Machine Learning Archive - Ramon Domingos Blog</title>
	<link>https://ramondomingos.com.br/tag/aprendizagem-de-maquina/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Applying Machine Learning to the Heart Disease Dataset</title>
		<link>https://ramondomingos.com.br/aplicando-machine-learning-no-dataset-sobre-doencas-cardiacas/</link>
					<comments>https://ramondomingos.com.br/aplicando-machine-learning-no-dataset-sobre-doencas-cardiacas/#respond</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Mon, 16 Oct 2023 17:13:38 +0000</pubDate>
				<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=191</guid>

					<description><![CDATA[<p>Myocardial infarction, or heart attack, is the death of cells in a region of the heart muscle caused by a clot that suddenly and severely interrupts blood flow. Source: ALVES, B. / O. / O.-M. Ataque cardíaco (infarto) &#124; Biblioteca Virtual em Saúde MS. Available at:&#160;https://bvsms.saude.gov.br/ataque-cardiaco-infarto/#:~:text=O%20infarto%20do%20mioc%C3%A1rdio%2C%20ou. Predicting&#8230;</p>
<p>The post <a href="https://ramondomingos.com.br/aplicando-machine-learning-no-dataset-sobre-doencas-cardiacas/">Applying Machine Learning to the Heart Disease Dataset</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Myocardial infarction, or heart attack, is the death of cells in a region of the heart muscle caused by a clot that suddenly and severely interrupts blood flow.</p>



<p>Source: ALVES, B. / O. / O.-M. Ataque cardíaco (infarto) | Biblioteca Virtual em Saúde MS. Available at:&nbsp;<a href="https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fbvsms.saude.gov.br%2Fataque-cardiaco-infarto%2F%23%3A%7E%3Atext%3DO%2520infarto%2520do%2520mioc%25C3%25A1rdio%252C%2520ou" target="_blank" rel="noreferrer noopener">https://bvsms.saude.gov.br/ataque-cardiaco-infarto/#:~:text=O%20infarto%20do%20mioc%C3%A1rdio%2C%20ou</a>.</p>
</blockquote>



<p>Predicting a possible heart disease from a patient's history helps that person take care of themselves before symptoms appear, or before they fall ill with lasting consequences. Analyzing health data is a delicate task: we must not expose patients in any way, and a specialist is sometimes needed to help interpret the data properly.</p>



<p>As usual, the examples from this post are available on <a href="https://drive.google.com/file/d/1sBJ6w-Sege6ryUUm2swsyifOUn5BiFVM/view?usp=sharing">Colab</a>.</p>



<p>In this post we will train the following algorithms: <strong>Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT)</strong>. Some algorithms were run with different parameters to find a configuration with good accuracy.</p>



<h2 class="wp-block-heading">About the dataset</h2>



<p>The dataset used in this approach is available at https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset and contains the following columns:</p>



<figure class="wp-block-table"><table><tbody><tr><td><strong>Column</strong></td><td><strong>Description</strong></td><td><strong>Values</strong></td></tr><tr><td>age</td><td>Age</td><td>22 to 77 years</td></tr><tr><td>sex</td><td>Sex</td><td>1: male, 0: female</td></tr><tr><td>cp</td><td>Chest pain type</td><td>1 to 4</td></tr><tr><td>trestbps</td><td>Resting blood pressure in mm Hg on admission to the hospital</td><td>94 to 200</td></tr><tr><td>chol</td><td>Cholesterol in mg/dl</td><td>126 to 564</td></tr><tr><td>fbs</td><td>Fasting blood sugar greater than 120 mg/dl</td><td>1: true, 0: false</td></tr><tr><td>restecg</td><td>Resting electrocardiographic results</td><td>0 to 2</td></tr><tr><td>thalach</td><td>Maximum heart rate achieved</td><td>71 to 202</td></tr><tr><td>exang</td><td>Exercise-induced angina</td><td>1: yes, 0: no</td></tr><tr><td>oldpeak</td><td>ST depression induced by exercise relative to rest</td><td>0 to 6.2</td></tr><tr><td>slope</td><td>Slope of the peak exercise ST segment</td><td>1 to 3</td></tr><tr><td>ca</td><td>Number of major vessels colored by fluoroscopy</td><td>0 to 3</td></tr><tr><td>thal</td><td>Chest pain or difficulty breathing</td><td>1: normal<br>2: fixed defect<br>3: reversible defect</td></tr><tr><td>target</td><td>Indicates whether the patient has heart disease</td><td>1: yes, 0: no</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">Preprocessing</h2>



<p><strong>Removing duplicates</strong></p>



<p>The dataset has 1025 instances. After running the <em>profile-report</em> library, several repeated instances were identified. Repeated instances can bias the algorithm, since it would not be predicting but rather replicating data it has already seen. They were removed with the pandas function <em>drop_duplicates()</em>.</p>
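
<p>As a rough illustration of this step, the sketch below counts and removes repeated rows with pandas. The tiny DataFrame is hypothetical toy data standing in for the real CSV, and the names <em>n_duplicates</em> and <em>df_unique</em> are assumptions made for this example only.</p>

```python
import pandas as pd

# Hypothetical toy data standing in for the heart disease CSV.
df = pd.DataFrame(
    {
        "age": [52, 52, 61, 45],
        "chol": [212, 212, 203, 180],
        "target": [0, 0, 1, 1],
    }
)

# Count fully repeated rows before removing them.
n_duplicates = int(df.duplicated().sum())

# Keep the first occurrence of each row and rebuild the index.
df_unique = df.drop_duplicates().reset_index(drop=True)

print("duplicated rows:", n_duplicates)
print("rows kept:", len(df_unique))
```

<p>Resetting the index is optional; it only keeps the row labels contiguous after the removal.</p>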



<p><strong>Removing outliers</strong></p>



<p>A box plot shows that the data contains outliers, and the Interquartile Range (IQR) was used to remove them. This technique was discussed in another post; see <a href="https://ramondomingos.com.br/removendo-outliers-de-uma-base-de-dados/">here</a>.</p>



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="629" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-1024x629.png" alt="" class="wp-image-192" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-1024x629.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-300x184.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-768x472.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-1536x944.png 1536w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image.png 1572w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
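
<p>The IQR filter can be sketched roughly as follows. This is a minimal example under stated assumptions: the values are hypothetical, only one column is filtered, and the 1.5 multiplier is the usual box plot convention rather than a value confirmed by this post.</p>

```python
import pandas as pd

# Hypothetical cholesterol readings; 564 plays the role of an outlier.
df = pd.DataFrame({"chol": [200, 210, 220, 230, 240, 250, 564]})

# First and third quartiles of the column.
q1 = df["chol"].quantile(0.25)
q3 = df["chol"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3 (an assumption here).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows whose value lies inside the fences.
df_clean = df[df["chol"].between(lower, upper)]

print(lower, upper)
print(df_clean["chol"].tolist())
```

<p>For several numeric columns, the same filter would be applied column by column, keeping in mind that each pass removes rows.</p>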



<h2 class="wp-block-heading">Training the models</h2>



<p><strong>Test set:</strong></p>



<p>It is very important to split the data into training and test sets, so that no record used for training also appears in the test set. scikit-learn provides a function that does this:</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="y = df[&quot;target&quot;]
X = df.drop('target',axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state = 0)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">y </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">]</span></span>
<span class="line"><span style="color: #D8DEE9FF">X </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">drop</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">axis</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> X_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_test </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">train_test_split</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">test_size</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">0.20</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">random_state</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #B48EAD">0</span><span style="color: #ECEFF4">)</span></span></code></pre></div>
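
<p>For readers who want to run the split in isolation, here is a self-contained sketch with the import it needs. The toy DataFrame is hypothetical; only the <em>target</em> column name and the <em>train_test_split</em> arguments come from the snippet above. The final check is an addition that confirms the two partitions share no rows.</p>

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame reusing the "target" column name from the post.
df = pd.DataFrame(
    {
        "age": list(range(40, 60)),
        "chol": list(range(200, 220)),
        "target": [0, 1] * 10,
    }
)

y = df["target"]
X = df.drop("target", axis=1)

# 80% of the rows go to training and 20% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# No row index should appear in both partitions.
overlap = set(X_train.index).intersection(X_test.index)

print(len(X_train), len(X_test), len(overlap))
```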



<p><strong>Decision Tree:</strong></p>



<p>This algorithm was already covered in another post (see <a href="https://ramondomingos.com.br/aplicando-arvore-de-decisao-no-dataset-iris/">here</a>). Basically, each branch point, called a node, is a decision, and decisions are made one after another until a leaf is reached, which is the final decision.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="rf = RandomForestClassifier(n_estimators=20, random_state=12,max_depth=5)
rf.fit(X_train,y_train)
rf_predicted = rf.predict(X_test)
rf_conf_matrix = confusion_matrix(y_test, rf_predicted)
rf_acc_score = accuracy_score(y_test, rf_predicted)
print(&quot;confusion matrix&quot;)
print(rf_conf_matrix)
print(&quot;\n&quot;)
print(&quot;Accuracy of Random Forest:&quot;,rf_acc_score*100,'%\n')
print(classification_report(y_test,rf_predicted))" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">rf </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">RandomForestClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">n_estimators</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">20</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">12</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">max_depth</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">5</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf_predicted </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> rf</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf_conf_matrix </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">confusion_matrix</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> rf_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf_acc_score </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> rf_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">confussion matrix</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">rf_conf_matrix</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">Accuracy of Random Forest:</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">rf_acc_score</span><span style="color: #81A1C1">*</span><span style="color: #B48EAD">100</span><span style="color: #ECEFF4">,</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">%</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #88C0D0">classification_report</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">rf_predicted</span><span style="color: #ECEFF4">))</span></span></code></pre></div>



<p>Accuracy of Random Forest: 84.78260869565217 %</p>
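
<p>The listing in this section instantiates <em>RandomForestClassifier</em>. For comparison, a single decision tree could be trained along the lines below. This is only a sketch: the data is synthetic (generated with <em>make_classification</em>, not the heart disease CSV), and <em>max_depth=5</em> is an illustrative choice rather than a value taken from this post, so the accuracy it prints is not comparable to the figure reported above.</p>

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 13 features; not the real heart disease CSV.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# max_depth=5 is an illustrative assumption, not the post's configuration.
dt = DecisionTreeClassifier(random_state=0, max_depth=5)
dt.fit(X_train, y_train)
dt_predicted = dt.predict(X_test)

# Same evaluation steps as the listing above, applied to a single tree.
dt_conf_matrix = confusion_matrix(y_test, dt_predicted)
dt_acc_score = accuracy_score(y_test, dt_predicted)

print("confusion matrix")
print(dt_conf_matrix)
print("Accuracy of Decision Tree:", dt_acc_score * 100, "%")
```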



<p><strong>Random Forest<br></strong>It closely resembles the Decision Tree; the difference is that several trees are built automatically, forming a forest. It is a great technique when there is a large amount of data and features.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="rf = RandomForestClassifier(n_estimators=20, random_state=12,max_depth=5)
rf.fit(X_train,y_train)
rf_predicted = rf.predict(X_test)
rf_conf_matrix = confusion_matrix(y_test, rf_predicted)
rf_acc_score = accuracy_score(y_test, rf_predicted)
print(&quot;confusion matrix&quot;)
print(rf_conf_matrix)
print(&quot;\n&quot;)
print(&quot;Accuracy of Random Forest:&quot;,rf_acc_score*100,'%\n')
print(classification_report(y_test,rf_predicted))" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">rf </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">RandomForestClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">n_estimators</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">20</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">12</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">max_depth</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">5</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf_predicted </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> rf</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf_conf_matrix </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">confusion_matrix</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> rf_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">rf_acc_score </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> rf_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">confussion matrix</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">rf_conf_matrix</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">Accuracy of Random Forest:</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">rf_acc_score</span><span style="color: #81A1C1">*</span><span style="color: #B48EAD">100</span><span style="color: #ECEFF4">,</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">%</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #88C0D0">classification_report</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">rf_predicted</span><span style="color: #ECEFF4">))</span></span></code></pre></div>



<p>Accuracy of Random Forest: 84.78260869565217 %</p>



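<p>It is worth noting that this is the same value reported in the Decision Tree section, whose listing also trains a <em>RandomForestClassifier</em> with identical parameters.</p>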



<p>I then decided to try variations of the decision trees, mainly in the split criterion and the maximum depth.</p>



<p>The <strong>gini</strong> criterion chooses splits by measuring how often an instance in a node would be misclassified if it were labeled according to that node's class distribution, while <strong>entropy</strong> measures the disorder of that class distribution.</p>
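
<p>To make the two criteria concrete, the sketch below computes both impurity measures by hand for a list of class labels, using the standard textbook formulas. The helper names <em>gini</em> and <em>entropy</em> and the sample labels are assumptions for illustration; scikit-learn computes these quantities internally.</p>

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the classes."""
    return sum(-p * log2(p) for p in class_proportions(labels))


pure = [1, 1, 1, 1]   # every sample in the node has the same class
mixed = [0, 0, 1, 1]  # the node is split evenly between two classes

print("pure: ", gini(pure), entropy(pure))
print("mixed:", gini(mixed), entropy(mixed))
```

<p>A pure node scores 0 under both measures, while an even two-class split reaches the two-class maximum of each: 0.5 for gini and 1 bit for entropy. A tree prefers the split that reduces the chosen measure the most.</p>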



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="k_range = range(1,11)
scores = {}

for k in k_range:
  dtFor = DecisionTreeClassifier(criterion = 'entropy',random_state=0,max_depth = k)
  dtFor.fit(X_train, y_train)
  y_pred = dtFor.predict(X_test)
  scores[k] = accuracy_score(y_test,y_pred)
plt.plot(k_range,list(scores.values()), label='entropy')
for k in k_range:
  dtFor = DecisionTreeClassifier(criterion = 'gini',random_state=0,max_depth = k)
  dtFor.fit(X_train, y_train)
  y_pred = dtFor.predict(X_test)
  scores[k] = accuracy_score(y_test,y_pred)
plt.plot(k_range,list(scores.values()), label='gini')
plt.xlabel('Profundidade da Árvore')
plt.ylabel('% de Acurácia')
plt.legend()" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">k_range </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">range</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">11</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">scores </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">{}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">for</span><span style="color: #D8DEE9FF"> k </span><span style="color: #81A1C1">in</span><span style="color: #D8DEE9FF"> k_range</span><span style="color: #ECEFF4">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">  dtFor </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">DecisionTreeClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">criterion</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">entropy</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">0</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">max_depth</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> k</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  dtFor</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  y_pred </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> dtFor</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  scores</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">k</span><span style="color: #ECEFF4">]</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">y_pred</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">k_range</span><span style="color: #ECEFF4">,</span><span style="color: #88C0D0">list</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">scores</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">values</span><span style="color: #ECEFF4">()),</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">label</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">entropy</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #81A1C1">for</span><span style="color: #D8DEE9FF"> k </span><span style="color: #81A1C1">in</span><span style="color: #D8DEE9FF"> k_range</span><span style="color: #ECEFF4">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">  dtFor </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">DecisionTreeClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">criterion</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">gini</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">0</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9">max_depth</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> k</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  dtFor</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  y_pred </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> dtFor</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  scores</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">k</span><span style="color: #ECEFF4">]</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">y_pred</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">k_range</span><span style="color: #ECEFF4">,</span><span style="color: #88C0D0">list</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">scores</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">values</span><span style="color: #ECEFF4">()),</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">label</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">gini</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">xlabel</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">Profundidade da Árvore</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">ylabel</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&#39;</span><span style="color: #EBCB8B">% d</span><span style="color: #A3BE8C">e Acurácia</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">legend</span><span style="color: #ECEFF4">()</span></span></code></pre></div>



<p>The resulting plot shows that accuracy is already very good at the shallowest depth:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="576" height="432" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-1.png" alt="" class="wp-image-194" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-1.png 576w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-1-300x225.png 300w" sizes="(max-width: 576px) 100vw, 576px" /></figure>
</div>


<p>When we plot the tree with only 1 level of depth, we see that it looks at a single feature, <strong>thal</strong>. In the usual heart disease dataset columns, thal is the thallium stress test result (chest pain type is the separate <strong>cp</strong> column). Relying on one feature alone is a very narrow basis for a prediction, so ideally the model should consider other features as well.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img decoding="async" width="515" height="389" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-2.png" alt="" class="wp-image-195" style="aspect-ratio:1.3239074550128536;width:479px;height:auto" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-2.png 515w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-2-300x227.png 300w" sizes="(max-width: 515px) 100vw, 515px" /></figure>
</div>





<p>The second depth value with good accuracy is 3; plotting that tree shows that it takes other features into account.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="515" height="389" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-3.png" alt="" class="wp-image-196" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-3.png 515w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-3-300x227.png 300w" sizes="(max-width: 515px) 100vw, 515px" /></figure>
</div>
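<p>To see why a tree of depth 1 can only look at one feature, here is a minimal sketch of how a single split is chosen by Gini impurity. It is plain Python, not the scikit-learn implementation, and the tiny dataset and the column meanings (a thal code and an age) are made up for illustration.</p>

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_stump(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical rows: (thal code, age). Labels: 1 = disease, 0 = no disease.
rows = [(3, 40), (3, 62), (3, 55), (7, 45), (7, 60), (6, 58)]
labels = [0, 0, 0, 1, 1, 1]

feature, threshold, score = best_stump(rows, labels)
print(f"split on feature {feature} at {threshold} (weighted Gini {score})")
```

<p>With more levels of depth, the same search is repeated inside each branch, which is how a depth-3 tree ends up using additional features.</p>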


<p><strong>K-NeighborsClassifier</strong></p>



<p>This algorithm looks at the nearest neighbors of a sample to decide which class it belongs to. It supports several distance metrics, and the number of neighbors considered can be varied. In this study we used the Euclidean and Manhattan metrics and varied the number of neighbors from 1 to 4, obtaining the following accuracy levels.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="576" height="432" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-4.png" alt="" class="wp-image-198" style="aspect-ratio:1.3333333333333333;width:376px;height:auto" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-4.png 576w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-4-300x225.png 300w" sizes="(max-width: 576px) 100vw, 576px" /></figure>
</div>
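<p>As a rough illustration of what the two metrics and the neighbor count do, here is a minimal k-nearest-neighbors sketch in plain Python. It is not how scikit-learn implements the classifier, and the points and labels are invented for the example.</p>

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, k, distance):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups of points.
train_rows = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
train_labels = [0, 0, 0, 1, 1, 1]

for metric in (euclidean, manhattan):
    prediction = knn_predict(train_rows, train_labels, (2.0, 2.0), k=3, distance=metric)
    print(metric.__name__, prediction)
```

<p>On real data the two metrics can rank neighbors differently, which is why it is worth comparing them for each value of k.</p>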


<p>With 3 neighbors and the Manhattan metric, we obtain about 71.7% accuracy.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="knn = KNeighborsClassifier(n_neighbors=3, metric='manhattan')
knn.fit(X_train, y_train)
knn_predicted = knn.predict(X_test)
knn_conf_matrix = confusion_matrix(y_test, knn_predicted)
knn_acc_score_1_neighbors = accuracy_score(y_test, knn_predicted)
print(&quot;confusion matrix&quot;)
print(knn_conf_matrix)
print(&quot;\n&quot;)
print(&quot;Accuracy of K-NeighborsClassifier:&quot;,knn_acc_score_1_neighbors*100,'%\n')
print(classification_report(y_test,knn_predicted))" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">knn </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">KNeighborsClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">n_neighbors</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">3</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">metric</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">manhattan</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">knn</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">knn_predicted </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> knn</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">knn_conf_matrix </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">confusion_matrix</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> knn_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">knn_acc_score_1_neighbors </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> knn_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">confusion matrix</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">knn_conf_matrix</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">Accuracy of K-NeighborsClassifier:</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">knn_acc_score_1_neighbors</span><span style="color: #81A1C1">*</span><span style="color: #B48EAD">100</span><span style="color: #ECEFF4">,</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">%</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #88C0D0">classification_report</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">knn_predicted</span><span style="color: #ECEFF4">))</span></span></code></pre></div>



<p>Accuracy of K-NeighborsClassifier: 71.73913043478261 %</p>



<p><strong>Support Vector Classifier</strong></p>






<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="svc =  SVC(kernel='rbf', C=2)
svc.fit(X_train, y_train)
svc_predicted = svc.predict(X_test)
svc_conf_matrix = confusion_matrix(y_test, svc_predicted)
svc_acc_score = accuracy_score(y_test, svc_predicted)
print(&quot;confusion matrix&quot;)
print(svc_conf_matrix)
print(&quot;\n&quot;)
print(&quot;Accuracy of Support Vector Classifier:&quot;,svc_acc_score*100,'%\n')
print(classification_report(y_test,svc_predicted))" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">svc </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF">  </span><span style="color: #88C0D0">SVC</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">kernel</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">rbf</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">C</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">2</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">svc</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">svc_predicted </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> svc</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">svc_conf_matrix </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">confusion_matrix</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> svc_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">svc_acc_score </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> svc_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">confusion matrix</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">svc_conf_matrix</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">Accuracy of Support Vector Classifier:</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">svc_acc_score</span><span style="color: #81A1C1">*</span><span style="color: #B48EAD">100</span><span style="color: #ECEFF4">,</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">%</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #88C0D0">classification_report</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">svc_predicted</span><span style="color: #ECEFF4">))</span></span></code></pre></div>



<p>Accuracy of Support Vector Classifier: 71.73913043478261 %</p>



<p><strong>Logistic Regression</strong></p>






<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from sklearn.linear_model import  LogisticRegression
reg = LogisticRegression( )
reg.fit(X_train, y_train)
reg_predicted = reg.predict(X_test)
reg_conf_matrix = confusion_matrix(y_test, reg_predicted)
reg_acc_score = accuracy_score(y_test, reg_predicted)
print(&quot;confusion matrix&quot;)
print(reg_conf_matrix)
print(&quot;\n&quot;)
print(&quot;Accuracy of Logistic Regression:&quot;,reg_acc_score*100,'%\n')
print(classification_report(y_test,reg_predicted))" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> sklearn</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">linear_model </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF">  LogisticRegression</span></span>
<span class="line"><span style="color: #D8DEE9FF">reg </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">LogisticRegression</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">reg</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">reg_predicted </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> reg</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">predict</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">X_test</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">reg_conf_matrix </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">confusion_matrix</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> reg_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">reg_acc_score </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">accuracy_score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> reg_predicted</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">confusion matrix</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">reg_conf_matrix</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">Accuracy of Logistic Regression:</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">reg_acc_score</span><span style="color: #81A1C1">*</span><span style="color: #B48EAD">100</span><span style="color: #ECEFF4">,</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">%</span><span style="color: #EBCB8B">\n</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #88C0D0">print</span><span style="color: #ECEFF4">(</span><span style="color: #88C0D0">classification_report</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">y_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF">reg_predicted</span><span style="color: #ECEFF4">))</span></span></code></pre></div>



<p>Accuracy of Logistic Regression: 91.30434782608695 %</p>



<h2 class="wp-block-heading">Comparison of results</h2>



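<figure class="wp-block-table"><table><tbody><tr><td>Model</td><td>Accuracy</td></tr><tr><td>Random Forest</td><td>84.7826091%</td></tr><tr><td>K-Nearest Neighbour (10)</td><td>60.8695652%</td></tr><tr><td>K-Nearest Neighbour (3)</td><td>71.7391303%</td></tr><tr><td>Decision Tree</td><td>84.7826094%</td></tr><tr><td>Support Vector Machine</td><td>71.7391305%</td></tr><tr><td>Logistic Regression</td><td>91.304348%</td></tr></tbody></table></figure>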



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="363" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-5-1024x363.png" alt="" class="wp-image-199" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/image-5-1024x363.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-5-300x106.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-5-768x272.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/10/image-5.png 1209w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>Given the nature of this problem, logistic regression achieved the best result, at about 91.3% accuracy.</p>
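<p>For convenience, the comparison can be reproduced as a small ranking script. This is only a sketch: the numbers are the accuracies reported in this post, typed in by hand rather than recomputed from the models.</p>

```python
# Accuracies (in %) as reported in the comparison of results.
results = {
    "Random Forest": 84.7826091,
    "K-Nearest Neighbour (10)": 60.8695652,
    "K-Nearest Neighbour (3)": 71.7391303,
    "Decision Tree": 84.7826094,
    "Support Vector Machine": 71.7391305,
    "Logistic Regression": 91.304348,
}

# Sort models from highest to lowest accuracy.
ranking = sorted(results.items(), key=lambda item: item[1], reverse=True)

for name, accuracy in ranking:
    print(f"{name:26s} {accuracy:6.2f}%")

best_model, best_accuracy = ranking[0]
```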



<h2 class="wp-block-heading">Notes on the study</h2>



<p>This work was presented in the Machine Learning course and written up as a paper, together with my colleague <a href="https://www.linkedin.com/in/gerfesson/">Gerfesson</a>. We received the highest grade.</p>



<p>We also drew on several other studies as references, but the main one was the following, which we recommend reading:</p>



<p>K. Rashid, M. A. Islam, R. A. Tanzin, M. L. Labib, and M. Khan, “Heart disease prediction using interquartile range preprocessing and hypertuned machine learning,” in <em>2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA)</em>, IEEE, Sept. 2022.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/aplicando-machine-learning-no-dataset-sobre-doencas-cardiacas/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Removing outliers from a dataset</title>
		<link>https://ramondomingos.com.br/removendo-outliers-de-uma-base-de-dados/</link>
					<comments>https://ramondomingos.com.br/removendo-outliers-de-uma-base-de-dados/#comments</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Sat, 14 Oct 2023 18:12:39 +0000</pubDate>
				<category><![CDATA[Sem categoria]]></category>
		<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=187</guid>

					<description><![CDATA[<p>Identifying outliers with a boxplot</p>
<p>The post <a href="https://ramondomingos.com.br/removendo-outliers-de-uma-base-de-dados/">Removing outliers from a dataset</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Outliers in a dataset are values that lie far from the majority of the data. They can pull the mean far away from what is typical, so it is always important to inspect the data at the start of the process. Looking at the median and plotting a boxplot are ways to understand the distribution of your dataset.</p>



<p>Some concepts:</p>



<ul class="wp-block-list">
<li>Mean: the sum of all values divided by the number of elements.</li>



<li>Median: the middle value of the sorted data; with an even number of elements, it is the mean of the two middle values.</li>



<li>Outlier: a point that falls outside the distribution of most of the data.</li>
</ul>



<p>Consider the following hypothetical scenario: a dataset of hospital admissions with the following records:</p>



<figure class="wp-block-table"><table><tbody><tr><td>Id</td><td>Age</td><td>Days hospitalized</td><td>Number of admissions</td></tr><tr><td>1</td><td>21</td><td>1</td><td>1</td></tr><tr><td>2</td><td>20</td><td>1</td><td>1</td></tr><tr><td>3</td><td>19</td><td>2</td><td>1</td></tr><tr><td>4</td><td>45</td><td>7</td><td>4</td></tr></tbody></table><figcaption class="wp-element-caption">Example dataset with outliers.</figcaption></figure>



<p>Summary statistics:</p>



<ul class="wp-block-list">
<li>Days hospitalized:
<ul class="wp-block-list">
<li>Mean: 2.75 days.</li>



<li>Median: 1.5 days.</li>
</ul>
</li>



<li>Age:
<ul class="wp-block-list">
<li>Mean: 26.25 years.</li>



<li>Median: 20.5 years.</li>
</ul>
</li>
</ul>
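<p>These figures can be checked with Python's standard <code>statistics</code> module. The snippet below is a small sketch using the four records from the example table; the variable names are chosen only for this illustration.</p>

```python
from statistics import mean, median

# Values taken from the example table.
days_hospitalized = [1, 1, 2, 7]
ages = [21, 20, 19, 45]

print("Days hospitalized: mean", mean(days_hospitalized), "median", median(days_hospitalized))
print("Age: mean", mean(ages), "median", median(ages))

# Dropping the fourth record shows how much a single row moved the mean.
print("Mean days without record 4:", round(mean(days_hospitalized[:3]), 2))
```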



<p>A single isolated record raised the mean stay to almost 3 days, even though most stays lasted 1 or 2 days, and raised the mean age to about 26, even though most patients were close to 20. In the same way, a dataset with such outlying values can make a machine learning algorithm predict less accurately, because it takes these exceptions into account.</p>



<p>A widely used technique is to identify these values using the interquartile range, and the boxplot is an excellent way to do so.</p>



<h2 class="wp-block-heading">The boxplot</h2>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://ramondomingos.com.br/wp-content/uploads/2023/10/Outlier-interquartis-1024x1024.png" alt="" class="wp-image-188" title="boxplot" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/10/Outlier-interquartis-1024x1024.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/10/Outlier-interquartis-300x300.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/10/Outlier-interquartis-150x150.png 150w, https://ramondomingos.com.br/wp-content/uploads/2023/10/Outlier-interquartis-768x768.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/10/Outlier-interquartis.png 1080w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>In this visual representation, the first quartile (Q1) marks the 25th percentile and the third quartile (Q3) the 75th. Subtracting Q1 from Q3 gives the <strong>interquartile range</strong> (IQR). This value is the reference for the lower and upper limits. Any value beyond them is considered an outlier, and removing such values is a common practice. The <strong>lower</strong> limit is <em>Q1 - 1.5*IQR</em> and the <strong>upper</strong> limit is <em>Q3 + 1.5*IQR</em>.</p>
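<p>As a worked example, the limits can be computed for the "days hospitalized" column of the hypothetical table. This sketch uses only the standard library; the <code>inclusive</code> method interpolates linearly between data points, which should give the same quartiles as the default of the pandas <code>quantile</code> method. The helper name <code>iqr_bounds</code> is made up for this illustration.</p>

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return the (lower, upper) limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


days_hospitalized = [1, 1, 2, 7]

lower, upper = iqr_bounds(days_hospitalized)
outliers = [v for v in days_hospitalized if v < lower or v > upper]

print("limits:", lower, upper)
print("outliers:", outliers)
```

<p>Here Q1 is 1.0 and Q3 is 3.25, so the IQR is 2.25 and the upper limit is 6.625, which flags the 7-day stay as an outlier.</p>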






<p>This preprocessing step can be very important for improving an algorithm's accuracy. However, the data values themselves should never be altered; the ideal is to remove the outlying records from the training set. This <a href="https://drive.google.com/file/d/1sBJ6w-Sege6ryUUm2swsyifOUn5BiFVM/view?usp=sharing">colab</a> includes the outlier analysis and removal step.</p>






<p>After identifying the outliers, the next step is to remove them.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="for key in df.columns.values.tolist():
  Q1 = df[key].quantile(0.25)
  Q3 = df[key].quantile(0.75)
  IQR = Q3 - Q1 #IQR is interquartile range.

  filter = (df[key] &gt;= Q1 - 1.5 * IQR) &amp; (df[key] &lt;= Q3 + 1.5 *IQR)
  df = df.loc[filter]" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">for</span><span style="color: #D8DEE9FF"> key </span><span style="color: #81A1C1">in</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">columns</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">values</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">tolist</span><span style="color: #ECEFF4">():</span></span>
<span class="line"><span style="color: #D8DEE9FF">  Q1 </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">key</span><span style="color: #ECEFF4">].</span><span style="color: #88C0D0">quantile</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">0.25</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  Q3 </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">key</span><span style="color: #ECEFF4">].</span><span style="color: #88C0D0">quantile</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">0.75</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  IQR </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> Q3 </span><span style="color: #81A1C1">-</span><span style="color: #D8DEE9FF"> Q1 </span><span style="color: #616E88">#IQR is interquartile range.</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">  </span><span style="color: #88C0D0">filter</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">df</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">key</span><span style="color: #ECEFF4">]</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">&gt;=</span><span style="color: #D8DEE9FF"> Q1 </span><span style="color: #81A1C1">-</span><span style="color: #D8DEE9FF"> </span><span style="color: #B48EAD">1.5</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">*</span><span style="color: #D8DEE9FF"> IQR</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">&amp;</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">df</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">key</span><span style="color: #ECEFF4">]</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">&lt;=</span><span style="color: #D8DEE9FF"> Q3 </span><span style="color: #81A1C1">+</span><span style="color: #D8DEE9FF"> </span><span style="color: #B48EAD">1.5</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">*</span><span style="color: #D8DEE9FF">IQR</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">  df </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">loc</span><span style="color: #ECEFF4">[</span><span style="color: #88C0D0">filter</span><span style="color: #ECEFF4">]</span></span></code></pre></div>



<p>This snippet walks through every feature and removes the rows that fall outside those limits. It is very important to inspect the data that will be removed: in some scenarios the out-of-pattern values are real and genuinely matter. So the classic answer &#8220;<strong>IT DEPENDS</strong>&#8221; applies perfectly when the question is whether to remove outliers.</p>
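<p>One way to inspect what would be removed before actually dropping anything is to build a boolean mask first. The sketch below is an assumption-laden variation, not the post's exact method: the helper <code>iqr_outlier_mask</code>, the <code>sample</code> frame and its column names are hypothetical, only numeric columns are checked, and the limits are computed once on the full data instead of being recomputed after each column is filtered.</p>

```python
import pandas as pd


def iqr_outlier_mask(df, k=1.5):
    """Return a boolean frame marking values outside Q1 - k*IQR .. Q3 + k*IQR."""
    numeric = df.select_dtypes(include="number")  # skip text columns
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    return numeric.lt(q1 - k * iqr) | numeric.gt(q3 + k * iqr)


# Hypothetical data, for illustration only.
sample = pd.DataFrame({
    "age": [29, 41, 35, 38, 33, 120],
    "chol": [180, 210, 195, 600, 205, 190],
    "sex": ["M", "F", "F", "M", "M", "F"],
})

mask = iqr_outlier_mask(sample)
flagged_per_column = mask.sum()            # how many values each column flags
flagged_rows = sample[mask.any(axis=1)]    # rows to review before deciding
cleaned = sample[~mask.any(axis=1)]        # rows kept if you do remove them

print(flagged_per_column)
print(flagged_rows)
```

<p>Looking at <code>flagged_rows</code> first makes it easier to decide, case by case, whether a value is a measurement error or a real observation worth keeping.</p>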
<p>The post <a href="https://ramondomingos.com.br/removendo-outliers-de-uma-base-de-dados/">Removendo outliers de uma base de dados</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/removendo-outliers-de-uma-base-de-dados/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Aplicando Árvore de decisão no dataset Íris</title>
		<link>https://ramondomingos.com.br/aplicando-arvore-de-decisao-no-dataset-iris/</link>
					<comments>https://ramondomingos.com.br/aplicando-arvore-de-decisao-no-dataset-iris/#comments</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Wed, 06 Sep 2023 21:41:31 +0000</pubDate>
				<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=179</guid>

					<description><![CDATA[<p>Decision chart</p>
<p>The post <a href="https://ramondomingos.com.br/aplicando-arvore-de-decisao-no-dataset-iris/">Aplicando Árvore de decisão no dataset Íris</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In the <a href="https://ramondomingos.com.br/conceito-da-arvore-de-decisao-aprendizado-de-maquina/">previous post</a>, we saw a simple application of the decision tree algorithm: deciding whether or not to go to university on a given day. Our training data had only a few rows and there were few possible decisions, just GO or DON'T GO. As the set of possible decisions grows, the amount of data needed to validate the model tends to grow too.</p>



<p>As usual, all the examples are in the <a href="https://drive.google.com/file/d/1E76nyf4BcAuUy2NNbWOPdPvj6T5IPSox/view?usp=sharing">colab</a>.</p>



<p>Let's start by importing our libraries, loading the Iris toy dataset and turning it into a pandas DataFrame.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="import pandas as pd
from sklearn.datasets import load_iris
data = load_iris()
iris = pd.DataFrame(data.data)
iris.columns = data.feature_names
iris['target'] = data.target
iris.head()" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> pandas </span><span style="color: #81A1C1">as</span><span style="color: #D8DEE9FF"> pd</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> sklearn</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">datasets </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> load_iris</span></span>
<span class="line"><span style="color: #D8DEE9FF">data </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">load_iris</span><span style="color: #ECEFF4">()</span></span>
<span class="line"><span style="color: #D8DEE9FF">iris </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> pd</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">DataFrame</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">data</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">data</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">iris</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">columns </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> data</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">feature_names</span></span>
<span class="line"><span style="color: #D8DEE9FF">iris</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">]</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> data</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">target</span></span>
<span class="line"><span style="color: #D8DEE9FF">iris</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">head</span><span style="color: #ECEFF4">()</span></span></code></pre></div>
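<p>The <code>target</code> column holds integer codes, so it can help to see which species each code stands for. The sketch below assumes a scikit-learn version that supports <code>as_frame=True</code>; the names <code>species</code> and <code>counts</code> are only for illustration.</p>

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects; .frame holds the features plus 'target'.
data = load_iris(as_frame=True)
iris = data.frame.copy()

# Map each integer code (0, 1, 2) to its species name.
code_to_name = {code: str(name) for code, name in enumerate(data.target_names)}
species = iris["target"].map(code_to_name)

# Count how many rows each species has.
counts = species.value_counts().to_dict()
print(code_to_name)
print(counts)
```

<p>The dataset has 150 rows, 50 for each of the three species (setosa, versicolor and virginica).</p>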



<p>To keep things didactic and easier to follow, we'll start the study with only <strong>2 features</strong>, both related to the petals, so that we can visualize the data on a Cartesian plane. The snippet below also keeps only the rows whose target is 1 or 2. Later we add all the fields.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="irisCopy = iris.loc[iris.target.isin([1,2]), ['petal length (cm)','petal width (cm)' , 'target']]
# separa em x e y
x = irisCopy.drop( 'target', axis=1)
y = irisCopy.target" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">irisCopy </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> iris</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">loc</span><span style="color: #ECEFF4">[</span><span style="color: #D8DEE9FF">iris</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">target</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">isin</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">2</span><span style="color: #ECEFF4">]),</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">petal length (cm)</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">petal width (cm)</span><span style="color: #ECEFF4">&#39;</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: 
#A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">]]</span></span>
<span class="line"><span style="color: #616E88"># separa em x e y</span></span>
<span class="line"><span style="color: #D8DEE9FF">x </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> irisCopy</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">drop</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">axis</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">y </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> irisCopy</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">target</span></span></code></pre></div>



<p>Because this dataset is considerably larger than the one in the previous post, we can split it into two sets, training and test. We'll do that with <code>train_test_split</code>.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from sklearn.model_selection import train_test_split
x_train, x_teste, y_train, y_test = train_test_split( x, y , test_size=0.30, random_state=22)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> sklearn</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">model_selection </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> train_test_split</span></span>
<span class="line"><span style="color: #D8DEE9FF">x_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> x_teste</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_test </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">train_test_split</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> x</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y </span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">test_size</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">0.30</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">22</span><span style="color: #ECEFF4">)</span></span></code></pre></div>
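<p>As a quick sanity check on the split, the sketch below rebuilds the two-feature subset and prints the resulting sizes. Two things here are my own assumptions and not part of the post's code: the variable is named <code>x_test</code>, and <code>stratify=y</code> is passed so both classes keep the same proportion in each set.</p>

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True).frame

# Same subset as in the post: two petal features, classes 1 and 2 only.
subset = iris.loc[
    iris["target"].isin([1, 2]),
    ["petal length (cm)", "petal width (cm)", "target"],
]
x = subset.drop("target", axis=1)
y = subset["target"]

# stratify=y is an optional addition: it keeps the class balance in both sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=22, stratify=y
)

print(len(x_train), len(x_test))
print(y_test.value_counts().to_dict())
```

<p>With 100 rows and <code>test_size=0.30</code>, 70 rows go to training and 30 to test.</p>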



<p>We now have our test and training sets, so let's create our classifier using the training set.<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f3cb-1f3fd.png" alt="🏋🏽" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from sklearn import tree
import matplotlib.pyplot as plt

clf =  tree.DecisionTreeClassifier(random_state=22)
clf = clf.fit(x_train, y_train)
fig, ax = plt.subplots(figsize=(10,8))

tree.plot_tree(clf)
plt.show()" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> sklearn </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> tree</span></span>
<span class="line"><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> matplotlib</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">pyplot </span><span style="color: #81A1C1">as</span><span style="color: #D8DEE9FF"> plt</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">clf </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF">  tree</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">DecisionTreeClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">22</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">clf </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> clf</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">x_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">fig</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> ax </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">subplots</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">figsize</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">10</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">8</span><span style="color: #ECEFF4">))</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">tree</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot_tree</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">clf</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">show</span><span style="color: #ECEFF4">()</span></span></code></pre></div>



<p>This gives us the following tree:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="794" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-8-1024x794.png" alt="Decision tree" class="wp-image-182" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-8-1024x794.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-8-300x233.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-8-768x596.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-8.png 1338w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
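<p>If the plotted tree is hard to read, the same rules can be printed as plain text with <code>tree.export_text</code>. This is a self-contained sketch that retrains a classifier with the same parameters used above; the variable names <code>cols</code> and <code>rules</code> are mine.</p>

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True).frame
cols = ["petal length (cm)", "petal width (cm)"]
subset = iris[iris["target"].isin([1, 2])]

x_train, x_test, y_train, y_test = train_test_split(
    subset[cols], subset["target"], test_size=0.30, random_state=22
)
clf = tree.DecisionTreeClassifier(random_state=22).fit(x_train, y_train)

# One line per node, using the column names instead of x[0] and x[1].
rules = tree.export_text(clf, feature_names=cols)
print(rules)
```

<p>Passing <code>feature_names</code> replaces the generic <code>x[0]</code> and <code>x[1]</code> labels with the column names, which makes each split easier to relate to the data.</p>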



<p>Now let's look at each node and the decision it tests and, based on that, draw lines on a chart to see how each decision is made. Here x[0] is the petal length and x[1] is the petal width:</p>



<ul class="wp-block-list">
<li>x[0] &lt; 4.75</li>



<li>x[0] &lt; 5.05</li>



<li>x[1] &lt; 1.65 (here x[1] is the value on the y-axis)</li>



<li>x[1] &lt; 1.6</li>



<li>x[0] &lt; 4.85</li>
</ul>
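<p>Instead of copying the thresholds from the picture by hand, they can also be read from the fitted classifier. The sketch below is one possible way to do it, assuming the same subset and parameters as above; <code>splits</code> is an illustrative name. In scikit-learn, a sample goes to the left child when its feature value is &lt;= the threshold, and leaf nodes carry a negative feature index.</p>

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True).frame
cols = ["petal length (cm)", "petal width (cm)"]
subset = iris[iris["target"].isin([1, 2])]

x_train, x_test, y_train, y_test = train_test_split(
    subset[cols], subset["target"], test_size=0.30, random_state=22
)
clf = tree.DecisionTreeClassifier(random_state=22).fit(x_train, y_train)

# tree_.feature and tree_.threshold hold one entry per node;
# internal (decision) nodes have feature >= 0, leaves do not.
splits = [
    (cols[feature], round(float(threshold), 2))
    for feature, threshold in zip(clf.tree_.feature, clf.tree_.threshold)
    if feature >= 0
]

for name, threshold in splits:
    print(f"{name} <= {threshold}")
```

<p>Each pair in <code>splits</code> corresponds to one dashed line on the chart: a vertical line for a petal length threshold and a horizontal line for a petal width threshold.</p>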



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="
fig, ax = plt.subplots()
ax.scatter(
    x_train['petal length (cm)'],
    x_train['petal width (cm)'],
    c=y_train
)

ax.plot([4.75,4.75], [0,3], '--r') # primeiro nó
ax.plot([2,4.75],[1.65,1.65], '--r') # segundo nó
ax.plot([5.05,5.05], [3,0], '--r') # terceiro nó
ax.plot([4.75,5.05],[1.6,1.6], '--r') # quarto nó
ax.plot([4.75,5.05],[1.75,1.75], '--r') # quinto nó
ax.plot([4.85,4.85], [1.75,3], '--r') # sexto nó

ax.set( xlim=(3, 7), xticks=[2,3,4,5,6,7], ylim=(0.9,2.7), yticks=[1,1.5,2,2.5])
plt.show()" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">fig</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> ax </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">subplots</span><span style="color: #ECEFF4">()</span></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">scatter</span><span style="color: #ECEFF4">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    x_train</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">petal length (cm)</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">],</span></span>
<span class="line"><span style="color: #D8DEE9FF">    x_train</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">petal width (cm)</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">],</span></span>
<span class="line"><span style="color: #D8DEE9FF">    </span><span style="color: #D8DEE9">c</span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF">y_train</span></span>
<span class="line"><span style="color: #ECEFF4">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">4.75</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">4.75</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #B48EAD">0</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">3</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">--r</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #616E88"># primeiro nó</span></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">2</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">4.75</span><span style="color: #ECEFF4">],[</span><span style="color: #B48EAD">1.65</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">1.65</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">--r</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #616E88"># segundo nó</span></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">5.05</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">5.05</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #B48EAD">3</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">0</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">--r</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #616E88"># third node</span></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">4.75</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">5.05</span><span style="color: #ECEFF4">],[</span><span style="color: #B48EAD">1.6</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">1.6</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">--r</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #616E88"># fourth node</span></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">4.75</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">5.05</span><span style="color: #ECEFF4">],[</span><span style="color: #B48EAD">1.75</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">1.75</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">--r</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #616E88"># fifth node</span></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot</span><span style="color: #ECEFF4">([</span><span style="color: #B48EAD">4.85</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">4.85</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #B48EAD">1.75</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">3</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">--r</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">)</span><span style="color: #D8DEE9FF"> </span><span style="color: #616E88"># sixth node</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">ax</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">set</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">xlim</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">3</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #B48EAD">7</span><span style="color: #ECEFF4">),</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">xticks</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">[</span><span style="color: #B48EAD">2</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">3</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">4</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">5</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">6</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">7</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">ylim</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">0.9</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">2.7</span><span style="color: #ECEFF4">),</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">yticks</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">[</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">1.5</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">2</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">2.5</span><span style="color: #ECEFF4">])</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">show</span><span style="color: #ECEFF4">()</span></span></code></pre></div>



<p>The resulting plot shows the following lines:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="904" height="704" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-9.png" alt="Grafico de decisões" class="wp-image-183" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-9.png 904w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-9-300x234.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-9-768x598.png 768w" sizes="(max-width: 904px) 100vw, 904px" /></figure>



<p>This way we can see which decisions the algorithm made. Now we can go a step further: instead of restricting the tree to just 2 choices, we let the algorithm train on all the available ones and see an even larger tree.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="
x_train, x_test, y_train, y_test = train_test_split(iris.drop('target', axis=1), iris.target, test_size=0.20, random_state=10)

clf2 =  tree.DecisionTreeClassifier(random_state=22).fit(x_train, y_train)

fig, ax = plt.subplots(figsize=(10,8))

tree.plot_tree(clf2)
plt.show()" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">x_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> x_test</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_test </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">train_test_split</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> iris</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">drop</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">axis</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">),</span><span style="color: #D8DEE9FF"> iris</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">target </span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">test_size</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">0.20</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">10</span><span style="color: #ECEFF4">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">clf2 </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF">  tree</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">DecisionTreeClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">22</span><span style="color: #ECEFF4">).</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">x_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">fig</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> ax </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">subplots</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">figsize</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">(</span><span style="color: #B48EAD">10</span><span style="color: #ECEFF4">,</span><span style="color: #B48EAD">8</span><span style="color: #ECEFF4">))</span></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">tree</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot_tree</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">clf2</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">plt</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">show</span><span style="color: #ECEFF4">()</span></span></code></pre></div>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-7-1024x799.png" alt="árvore de decisões" class="wp-image-181" style="width:840px;height:656px" width="840" height="656" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-7-1024x799.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-7-300x234.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-7-768x599.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-7.png 1302w" sizes="(max-width: 840px) 100vw, 840px" /></figure>
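
<p>One way to quantify how much the tree grew is to ask the fitted classifier for its depth and number of leaves. The snippet below is a minimal sketch: it assumes the iris data is loaded through <code>load_iris(as_frame=True)</code> (the loading step is an assumption, since it is not shown in this part of the post) and reuses the same split and <code>random_state</code> values as above.</p>

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris is a DataFrame with the four features plus a 'target' column,
# mirroring the `iris` variable used in the post.
iris = load_iris(as_frame=True).frame

x_train, x_test, y_train, y_test = train_test_split(
    iris.drop("target", axis=1), iris.target, test_size=0.20, random_state=10
)

clf2 = tree.DecisionTreeClassifier(random_state=22).fit(x_train, y_train)

depth = clf2.get_depth()        # longest path from the root node to a leaf
n_leaves = clf2.get_n_leaves()  # number of leaf nodes (final answers)
print(f"depth={depth}, leaves={n_leaves}")
```

<p>Both numbers grow as the tree needs more questions to separate the classes, which is what makes the plot harder to read.</p>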



<p>Now let's evaluate our model and check its score:</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="clf2.score(x_train, y_train)
# 1" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9FF">clf2</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">score</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">x_train</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> y_train</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #616E88"># 1</span></span></code></pre></div>



<p>A perfect score of 1. Keep in mind that it was measured on the same data the tree was trained on, so it shows how well the model fit the training set rather than how it handles unseen samples. This is also not the only way to evaluate a model; there are other metrics, which we will cover in another post.</p>
<p>The post <a href="https://ramondomingos.com.br/aplicando-arvore-de-decisao-no-dataset-iris/">Aplicando Árvore de decisão no dataset Íris</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
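
<p>Because the score above is computed on <code>x_train</code>, a complementary check is to score the 20% of samples that were held out by <code>train_test_split</code>. The following is only a sketch, assuming the iris data is loaded with <code>load_iris(as_frame=True)</code>; the exact test accuracy depends on the split, so no expected value is shown.</p>

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: same data layout as the `iris` DataFrame used in the post.
iris = load_iris(as_frame=True).frame

x_train, x_test, y_train, y_test = train_test_split(
    iris.drop("target", axis=1), iris.target, test_size=0.20, random_state=10
)

clf2 = tree.DecisionTreeClassifier(random_state=22).fit(x_train, y_train)

train_acc = clf2.score(x_train, y_train)  # accuracy on data the tree has seen
test_acc = clf2.score(x_test, y_test)     # accuracy on held-out data
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

<p>A large gap between the two values would suggest that the tree memorized the training set instead of learning rules that generalize.</p>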
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/aplicando-arvore-de-decisao-no-dataset-iris/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Conceito da Árvore de decisão &#8211; Aprendizado de máquina</title>
		<link>https://ramondomingos.com.br/conceito-da-arvore-de-decisao-aprendizado-de-maquina/</link>
					<comments>https://ramondomingos.com.br/conceito-da-arvore-de-decisao-aprendizado-de-maquina/#comments</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Wed, 06 Sep 2023 17:26:52 +0000</pubDate>
				<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=168</guid>

					<description><![CDATA[<p>The decision tree algorithm is quite popular and comes with graphical representations of how it makes its decisions. This helps us understand the operations it performs and anticipate possible failures in more critical cases, so that we can add more scenarios of that kind to the training data. In this post we will use a&#8230;</p>
<p>The post <a href="https://ramondomingos.com.br/conceito-da-arvore-de-decisao-aprendizado-de-maquina/">Conceito da Árvore de decisão &#8211; Aprendizado de máquina</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>The decision tree algorithm is quite popular and comes with graphical representations of how it makes its decisions. This helps us understand the operations it performs and anticipate possible failures in more critical cases, so that we can add more scenarios of that kind to the training data.</p>



<p>In this post we will use a simple scenario with few nodes, to understand how the algorithm works and in which situations it is a good choice. In the next post we will use larger datasets, with more decisions than just Yes/No.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/Arvore-de-decisao-1-1024x1024.png" alt="" class="wp-image-171" style="width:639px;height:639px" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/Arvore-de-decisao-1-1024x1024.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Arvore-de-decisao-1-300x300.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Arvore-de-decisao-1-150x150.png 150w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Arvore-de-decisao-1-768x768.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Arvore-de-decisao-1.png 1080w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>


<p>In general, this algorithm tries either to classify a record (classification problems) or to estimate a value (regression problems). As the image shows, each question is a <strong>decision node</strong> that we answer with YES or NO. The first question, the starting node, is the <strong>root node</strong>, and the last one, which holds the answer, is the <strong>leaf node</strong>. In Portuguese these are called nó de decisão, nó raiz, and nó folha.</p>
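
<p>To make the node vocabulary concrete, a decision tree can be written by hand as a chain of <code>if</code> statements. The sketch below is hypothetical: the function name <code>decide</code> and the order of the questions are assumptions, based on the columns of the table built later in this post (Tenho aula?, É Remoto, Vou de Carro, Vou de ônibus).</p>

```python
def decide(has_class: bool, is_remote: bool, by_car: bool, by_bus: bool) -> str:
    """Hand-written decision tree; every `if` is a decision node."""
    if not has_class:    # root node: "Tenho aula?"
        return "Não vou"  # leaf node
    if is_remote:        # decision node: "É Remoto"
        return "Não vou"  # leaf node
    if by_car:           # decision node: "Vou de Carro"
        return "Vou"      # leaf node
    if by_bus:           # decision node: "Vou de ônibus"
        return "Vou"      # leaf node
    return "Não vou"      # leaf node: class in person, but no transport


# The five scenarios from the table used later in the post.
scenarios = [
    (True, True, False, False),
    (False, False, False, False),
    (True, False, True, False),
    (True, False, False, True),
    (True, False, False, False),
]
answers = [decide(*row) for row in scenarios]
print(answers)
```

<p>Training a model replaces this manual step: instead of writing the questions ourselves, the algorithm picks them from the data.</p>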



<p>But how do we go from a simple visual diagram to a model?</p>



<p><em><strong>scikit-learn</strong></em> handles this training and can also display a visual representation of the decisions, like this one:</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="725" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-4-1024x725.png" alt="" class="wp-image-172" style="width:565px;height:400px" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-4-1024x725.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-4-300x213.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-4-768x544.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-4-1536x1088.png 1536w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-4.png 1646w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>


<p>I prepared a <a href="https://colab.research.google.com/drive/1D_qsU6QAtFJTeiKncr6UosOD2fsN_bdf?usp=sharing">Colab notebook</a> with the examples used in this post.</p>



<p>First, I built an array with NumPy based on this scenario, converted it to a pandas DataFrame, and displayed the table:</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="import pandas as pd
import numpy as np
# Build an array of outcomes (one row per scenario)
numpy_array = np.array([
[True,True,False,False,False], [False,False,False,False,False],
[True,False,True,False,True], [True,False,False,True,True], 
[True,False,False,False,False]])
# Convert it to a pandas DataFrame
df = pd.DataFrame(numpy_array, columns=['Tenho aula?', 'É Remoto', 'Vou de Carro', 'Vou de ônibus', 'target'])
df[&quot;target&quot;] = df[&quot;target&quot;].astype(int)
df['target_names']= pd.Categorical.from_codes (df[&quot;target&quot;], ['Não vou', 'Vou'])
# Display the table
df.head()" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> pandas </span><span style="color: #81A1C1">as</span><span style="color: #D8DEE9FF"> pd</span></span>
<span class="line"><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> numpy </span><span style="color: #81A1C1">as</span><span style="color: #D8DEE9FF"> np</span></span>
<span class="line"><span style="color: #616E88"># Build an array of outcomes (one row per scenario)</span></span>
<span class="line"><span style="color: #D8DEE9FF">numpy_array </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> np</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">array</span><span style="color: #ECEFF4">([</span></span>
<span class="line"><span style="color: #ECEFF4">[</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">],</span></span>
<span class="line"><span style="color: #ECEFF4">[</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span></span>
<span class="line"><span style="color: #ECEFF4">[</span><span style="color: #81A1C1">True</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">,</span><span style="color: #81A1C1">False</span><span style="color: #ECEFF4">]])</span></span>
<span class="line"><span style="color: #616E88"># Convert it to a pandas DataFrame</span></span>
<span class="line"><span style="color: #D8DEE9FF">df </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> pd</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">DataFrame</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">numpy_array</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">columns</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">Tenho aula?</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">É Remoto</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">Vou de Carro</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">Vou de ônibus</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">])</span></span>
<span class="line"><span style="color: #D8DEE9FF">df</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">]</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">].</span><span style="color: #88C0D0">astype</span><span style="color: #ECEFF4">(</span><span style="color: #88C0D0">int</span><span style="color: #ECEFF4">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">df</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target_names</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">]</span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> pd</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">Categorical</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">from_codes</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">df</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">],</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">Não vou</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">Vou</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">])</span></span>
<span class="line"><span style="color: #616E88"># Display the table</span></span>
<span class="line"><span style="color: #D8DEE9FF">df</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">head</span><span style="color: #ECEFF4">()</span></span></code></pre></div>



<p>It looks like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="328" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-5-1024x328.png" alt="" class="wp-image-175" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-5-1024x328.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-5-300x96.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-5-768x246.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-5.png 1316w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, I use scikit-learn to create a classifier, train the model, and build the decision tree, and then plot the graphical representation shown at the beginning.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from sklearn import tree
clf = tree.DecisionTreeClassifier( random_state=42)
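# Features: every column except the labels (assumed definition of dados, not shown in the post)
dados = df.drop(columns=['target', 'target_names'])
clf = clf.fit(dados, df.target)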
tree.plot_tree(clf)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> sklearn </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> tree</span></span>
<span class="line"><span style="color: #D8DEE9FF">clf </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> tree</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">DecisionTreeClassifier</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">random_state</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">42</span><span style="color: #ECEFF4">)</span></span>
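<span class="line"><span style="color: #616E88"># Features: every column except the labels (assumed definition of dados, not shown in the post)</span></span>
<span class="line"><span style="color: #D8DEE9FF">dados </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">drop</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9">columns</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">[</span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #ECEFF4">&#39;</span><span style="color: #A3BE8C">target_names</span><span style="color: #ECEFF4">&#39;</span><span style="color: #ECEFF4">])</span></span>
<span class="line"><span style="color: #D8DEE9FF">clf </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> clf</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">fit</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">dados</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> df</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9FF">target</span><span style="color: #ECEFF4">)</span></span>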
<span class="line"><span style="color: #D8DEE9FF">tree</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">plot_tree</span><span style="color: #ECEFF4">(</span><span style="color: #D8DEE9FF">clf</span><span style="color: #ECEFF4">)</span></span></code></pre></div>
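
<p>Besides the plot, the learned rules can be printed as text and the model can be asked about a new day. The snippet below is a self-contained sketch under a few assumptions: the feature matrix is every column except <code>target</code> (named <code>features</code> here), and <code>new_day</code> is a hypothetical scenario added only for illustration.</p>

```python
import numpy as np
import pandas as pd
from sklearn import tree

columns = ["Tenho aula?", "É Remoto", "Vou de Carro", "Vou de ônibus", "target"]
rows = np.array([
    [True, True, False, False, False],
    [False, False, False, False, False],
    [True, False, True, False, True],
    [True, False, False, True, True],
    [True, False, False, False, False],
])
df = pd.DataFrame(rows, columns=columns)
df["target"] = df["target"].astype(int)

# Assumption: the features are all columns except the label.
features = df.drop(columns=["target"])
clf = tree.DecisionTreeClassifier(random_state=42).fit(features, df["target"])

# Text version of the tree: one line per decision node or leaf.
print(tree.export_text(clf, feature_names=list(features.columns)))

# Hypothetical new day: in-person class, going by car.
new_day = pd.DataFrame([[True, False, True, False]], columns=features.columns)
prediction = clf.predict(new_day)[0]
print("Vou" if prediction == 1 else "Não vou")
```

<p>With only five rows the tree simply memorizes the table, which is fine for understanding the mechanics but says nothing about how it would behave on real data.</p>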



<p>In the next post, we will apply this algorithm to one of the toy datasets.</p>
<p>The post <a href="https://ramondomingos.com.br/conceito-da-arvore-de-decisao-aprendizado-de-maquina/">Conceito da Árvore de decisão &#8211; Aprendizado de máquina</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/conceito-da-arvore-de-decisao-aprendizado-de-maquina/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Hello scikit-learn</title>
		<link>https://ramondomingos.com.br/hello-scikit-learn/</link>
					<comments>https://ramondomingos.com.br/hello-scikit-learn/#respond</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Mon, 04 Sep 2023 22:46:52 +0000</pubDate>
				<category><![CDATA[Sem categoria]]></category>
		<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=140</guid>

					<description><![CDATA[<p>scikit-learn is one of the main tools we use for machine learning. It is a Python package with rich documentation available at https://scikit-learn.org/. Besides tools for studying and applying machine learning, such as the main algorithms for clustering, classification, and regression problems, it also ships with&#8230;</p>
<p>The post <a href="https://ramondomingos.com.br/hello-scikit-learn/">Hello scikit-learn</a> first appeared on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>scikit-learn is one of the main tools we use for machine learning. It is a Python package with rich documentation available at <a href="https://scikit-learn.org/">https://scikit-learn.org/</a>.</p>



<p>Besides tools for studying and applying machine learning, such as the main algorithms for clustering, classification, and regression problems, it also ships with test data, the so-called toy datasets. They cover different contexts, such as flower measurements (the iris dataset) or information about diabetic patients. In total there are 6 toy datasets to explore.</p>



<p>All of the code below is available in a <a href="https://colab.research.google.com/drive/1LpXEjTQMr4MPiFeiwwK-uRCEL-11IaXd?usp=sharing">Google Colab notebook</a>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="708" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1024x708.png" alt="" class="wp-image-141" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1024x708.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-300x207.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-768x531.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image.png 1118w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>To install the package, we can use pip:</p>



<pre class="wp-block-code"><code>pip&nbsp;install&nbsp;-U&nbsp;scikit-learn</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>As with any Python package, you need to import it before using it.</p>



<pre class="wp-block-code"><code>import&nbsp;sklearn</code></pre>



<p>To work with datasets in general (loaders and generators), import the <code>datasets</code> module:</p>



<pre class="wp-block-code"><code>from&nbsp;sklearn&nbsp;import&nbsp;datasets</code></pre>



<p>To use a toy dataset, such as the iris flower dataset, import its loader:</p>



<pre class="wp-block-code"><code>from&nbsp;sklearn.datasets&nbsp;import&nbsp;load_iris</code></pre>



<p>As the documentation shows, this dataset contains the length and width of the sepals and petals of each flower. There are two ways to load a dataset. The <strong>first</strong>:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Importing the dataset without X, y</h2>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>#&nbsp;load&nbsp;the&nbsp;data&nbsp;into&nbsp;a&nbsp;variable<br />dados&nbsp;=&nbsp;load_iris()<br />#&nbsp;numeric&nbsp;class&nbsp;labels&nbsp;of&nbsp;rows&nbsp;1,&nbsp;10&nbsp;and&nbsp;100<br />dados.target[[1,10,100]]<br />array([0, 0, 2])</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>To see the feature values of specific rows, index the <code>data</code> attribute:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>dados.data[[1,10,100]]<br />array([[4.9, 3. , 1.4, 0.2], [5.4, 3.7, 1.5, 0.2], [6.3, 3.3, 6. , 2.5]])</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>To see the name of each column, use the <code>feature_names</code> attribute:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>






<pre class="wp-block-code"><code>list(&nbsp;dados.feature_names)&nbsp;<br />['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>To get the <strong>target_names</strong>, that is, the names of the classes, use:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>list(dados.target_names)<br />['setosa', 'versicolor', 'virginica']</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>Because the dataset comes with labeled targets, it supports supervised training. We covered that topic in this <a href="https://ramondomingos.com.br/treinamento-de-machine-learning-supervisionado-ou-nao/">post</a>.</p>



<p>The <strong>DATA</strong> part holds the characteristics of the data, the features.</p>



<p>The <strong>TARGET</strong> part is the label, that is, what we want to predict.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>






<pre class="wp-block-code"><code>dados.target
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><strong>0, 1, 2</strong> correspond to the entries of <code>dados.target_names</code>, <strong>respectively</strong>.</p>
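
<p>The mapping from codes to names can be checked directly. The snippet below is a minimal sketch that reuses the same rows as before (1, 10 and 100); the variable names <code>codes</code> and <code>names</code> are only illustrative.</p>

```python
from sklearn.datasets import load_iris

dados = load_iris()

# Numeric class codes for rows 1, 10 and 100 (the same rows used earlier)
codes = dados.target[[1, 10, 100]]

# target_names is a NumPy array, so the codes can index it directly
names = dados.target_names[codes]

print(codes.tolist())  # [0, 0, 2]
print(names.tolist())  # ['setosa', 'setosa', 'virginica']
```

<p>Indexing <code>target_names</code> with the codes is just one convenient way to translate labels; a plain loop would work as well.</p>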



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Importing the dataset with X, y</h2>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>The documentation uses this notation throughout: X is the <strong>data</strong> (the features) and y is the <strong>target</strong> (what we want to predict).</p>



<p>Recall the formula:</p>



<p><strong>𝑦=𝑓(𝑥)</strong></p>



<p>So, what is y when we provide x? That is what our algorithm tries to find out, exactly as in the formula.</p>



<p>To use this notation, the call to <code>load_iris</code> changes:</p>



<pre class="wp-block-code"><code>X,&nbsp;y&nbsp;=&nbsp;load_iris(return_X_y=True)</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>At this point, X holds the same values as <code>dados.data</code> and y the same values as <code>dados.target</code>.</p>
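
<p>As a quick sanity check, the two ways of loading can be compared. This is a small sketch that assumes NumPy is available (it is installed alongside scikit-learn); <code>same_features</code> and <code>same_labels</code> are illustrative names.</p>

```python
import numpy as np
from sklearn.datasets import load_iris

dados = load_iris()                # first form: a single object with attributes
X, y = load_iris(return_X_y=True)  # second form: features and labels directly

# Both forms expose exactly the same arrays
same_features = np.array_equal(X, dados.data)
same_labels = np.array_equal(y, dados.target)

print(same_features, same_labels)  # True True
print(X.shape, y.shape)            # (150, 4) (150,)
```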



<h2 class="wp-block-heading">Converting to pandas</h2>



<p>You may need to move your <code>scikit-learn</code> dataset into a <code>pandas</code> DataFrame. It is friendlier and more popular in data analysis, and it can be useful when presenting the data. To do this, import pandas.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>import&nbsp;pandas&nbsp;as&nbsp;pd</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>Convert your <strong>dataset</strong> into a <strong>DataFrame</strong>, then add the target as a column:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>iris_df&nbsp;=&nbsp;pd.DataFrame(data=&nbsp;dados.data,&nbsp;columns=dados.feature_names)</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>iris_df['target']&nbsp;=&nbsp;dados.target</code></pre>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>and check the result:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="367" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-2-1024x367.png" alt="" class="wp-image-143" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-2-1024x367.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-2-300x108.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-2-768x275.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-2-1536x550.png 1536w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-2.png 1680w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>Want the class name too? We can add an even friendlier column using a pandas function:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>iris_df['target_names']&nbsp;=&nbsp;pd.Categorical.from_codes(dados.target,&nbsp;dados.target_names)</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="249" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-3-1024x249.png" alt="" class="wp-image-144" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-3-1024x249.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-3-300x73.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-3-768x187.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-3-1536x374.png 1536w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-3.png 1760w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
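
<p>Putting the conversion steps together, the sketch below rebuilds the DataFrame and counts how many rows fall under each class name. The <code>counts</code> variable is only illustrative.</p>

```python
import pandas as pd
from sklearn.datasets import load_iris

dados = load_iris()

# Features as columns, then the numeric target and its readable name
iris_df = pd.DataFrame(data=dados.data, columns=dados.feature_names)
iris_df['target'] = dados.target
iris_df['target_names'] = pd.Categorical.from_codes(dados.target, dados.target_names)

# Number of rows per class name
counts = iris_df['target_names'].value_counts().to_dict()
print(counts)
```

<p>Each of the three species should appear 50 times, which matches the 150 values shown in <code>dados.target</code>.</p>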



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>We can draw a plot to see why this conversion is useful.</p>



<p>To render plots inline in the notebook, we need a magic command:</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>%matplotlib&nbsp;inline</code></pre>



<pre class="wp-block-code"><code>iris_df.plot.scatter(&nbsp;'sepal&nbsp;length&nbsp;(cm)',&nbsp;'sepal&nbsp;width&nbsp;(cm)',&nbsp;c='target')</code></pre>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1024" height="760" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1-1024x760.png" alt="scatter plot of the flowers" class="wp-image-142" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1-1024x760.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1-300x223.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1-768x570.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/image-1.png 1472w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Training and test sets</h2>



<p>To evaluate our algorithms, we can split the full dataset into two parts: one for training and one for testing. scikit-learn already does this for you with <code>train_test_split</code>, which lets you choose what fraction of the data goes to each set. <code>test_size</code> is the fraction reserved for the test set (0.25 means 25%), and <code>random_state</code> is a seed for the random shuffling applied before the split, so the same split is reproduced on every run. The shuffling matters when the dataset is ordered, as iris is by class.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<pre class="wp-block-code"><code>from&nbsp;sklearn.model_selection&nbsp;import&nbsp;train_test_split&nbsp;</code></pre>



<pre class="wp-block-code"><code>X_train,&nbsp;X_test,&nbsp;y_train,&nbsp;y_test&nbsp;=&nbsp;train_test_split(X,&nbsp;y,&nbsp;test_size&nbsp;=&nbsp;0.25,&nbsp;random_state=22)&nbsp;</code></pre>
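
<p>To see what the split produces, the sketch below checks the sizes of each part and confirms that the same <code>random_state</code> gives the same split. The names <code>X_test_again</code> and <code>same_split</code> are illustrative.</p>

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 25% of the 150 rows go to the test set, the rest to training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# Repeating the call with the same seed reproduces the same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.25, random_state=22)
same_split = bool((X_test == X_test_again).all())
print(same_split)  # True
```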
<p>The post <a href="https://ramondomingos.com.br/hello-scikit-learn/">Hello scikit-learn</a> appeared first on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/hello-scikit-learn/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Supervised or unsupervised machine learning training</title>
		<link>https://ramondomingos.com.br/treinamento-de-machine-learning-supervisionado-ou-nao/</link>
					<comments>https://ramondomingos.com.br/treinamento-de-machine-learning-supervisionado-ou-nao/#comments</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Sat, 02 Sep 2023 22:38:59 +0000</pubDate>
				<category><![CDATA[Sem categoria]]></category>
		<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=123</guid>

					<description><![CDATA[<p>Supervised or unsupervised training</p>
<p>The post <a href="https://ramondomingos.com.br/treinamento-de-machine-learning-supervisionado-ou-nao/">Supervised or unsupervised machine learning training</a> appeared first on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="wp-block-image">
<figure class="alignleft size-large is-resized"><img loading="lazy" decoding="async" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/Aprendizagem-supervisionada-1024x1024.png" alt="Supervised or unsupervised training" class="wp-image-125" style="width:387px;height:387px" width="387" height="387" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/Aprendizagem-supervisionada-1024x1024.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Aprendizagem-supervisionada-300x300.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Aprendizagem-supervisionada-150x150.png 150w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Aprendizagem-supervisionada-768x768.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Aprendizagem-supervisionada.png 1080w" sizes="(max-width: 387px) 100vw, 387px" /><figcaption class="wp-element-caption">Supervised or unsupervised training</figcaption></figure>
</div>


<p>In the <a href="https://ramondomingos.com.br/diferenca-entre-machine-learning-e-deep-learning/">post</a> where I discussed the difference between machine learning and deep learning, I mentioned that improving a model requires training it with data, so what the machine learns is based on what we give it. When we train an algorithm to improve it and make its decisions more accurate, we can do so in two ways:</p>



<h2 class="wp-block-heading">Supervised machine learning training</h2>



<p>In the training dataset there is a variable called <strong>TARGET</strong>, which is usually what we want to find out, that is, predict in the final result. This attribute, however, was put there &#8220;manually&#8221; by a human. We humans are the ones who gave this group of things its name, so the model has a reference for what is right and what is wrong.</p>



<h2 class="wp-block-heading">Unsupervised machine learning training</h2>



<p>Unlike supervised training, this kind has no target naming each thing or category to use as a reference for grouping similar items. Here the algorithm has to group similar items on its own. It will not name them, because it does not actually know what they are, but it knows they are similar based on the features it has.</p>
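
<p>As a rough illustration of grouping without labels, the sketch below clusters the iris measurements with k-means. The choice of scikit-learn's <code>KMeans</code>, the iris data, and the value of 3 clusters are assumptions made for this example, not something this post prescribes.</p>

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load only the features; the labels are deliberately ignored
X, _ = load_iris(return_X_y=True)

# Ask for 3 groups (an assumption: in a real problem this number is unknown)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# The algorithm returns anonymous group ids, not species names
group_ids = sorted(set(cluster_ids.tolist()))
print(group_ids)  # [0, 1, 2]
```

<p>The ids 0, 1 and 2 are arbitrary: the algorithm only says which rows look alike, and a person still has to decide what each group means.</p>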



<h2 class="wp-block-heading">Advantages and disadvantages</h2>



<p>As with most things, each kind of training has advantages and disadvantages. When learning is supervised, we need to evaluate our dataset very carefully and label it, which often requires knowing the business domain well in order to avoid synonyms or targets with regional variants. A fruit, for example, can go by different names across the country: mexerica, bergamota, or tangerina.</p>



<p>With unsupervised learning, we need to spend more time, since there is no label to evaluate against, and even though it is unsupervised, it will sometimes need human intervention to reach more satisfactory results.</p>



<h2 class="wp-block-heading">Which training approach should you choose?</h2>



<p>Without a doubt, the origin and quality of the training data is an almost decisive factor in the initial choice of a training approach. Is the problem already known, or are we trying to use machine learning to generate <em>insights</em> and discover things?</p>



<p>The choice of method is not fixed; it can change over time and with the results obtained. You can start unsupervised to group the data and then identify the groups.</p>



<p>Making use of both approaches is good practice, since unsupervised learning can reveal previously unknown patterns, patterns that could go unnoticed in supervised learning with a target.</p>
<p>The post <a href="https://ramondomingos.com.br/treinamento-de-machine-learning-supervisionado-ou-nao/">Supervised or unsupervised machine learning training</a> appeared first on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/treinamento-de-machine-learning-supervisionado-ou-nao/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>The difference between machine learning and deep learning</title>
		<link>https://ramondomingos.com.br/diferenca-entre-machine-learning-e-deep-learning/</link>
					<comments>https://ramondomingos.com.br/diferenca-entre-machine-learning-e-deep-learning/#comments</comments>
		
		<dc:creator><![CDATA[Ramon Domingos]]></dc:creator>
		<pubDate>Sat, 02 Sep 2023 15:34:34 +0000</pubDate>
				<category><![CDATA[Sem categoria]]></category>
		<category><![CDATA[Aprendizagem de máquina]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://ramondomingos.com.br/?p=109</guid>

					<description><![CDATA[<p>Both machine learning and deep learning are terms widely used when the topic is artificial intelligence. Although both concepts help machines evolve and &#8220;think&#8221; in a way similar to intelligent beings like us humans, they are not the same thing. Think of one as the evolution of the other, directly tied&#8230;</p>
<p>The post <a href="https://ramondomingos.com.br/diferenca-entre-machine-learning-e-deep-learning/">The difference between machine learning and deep learning</a> appeared first on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Both machine learning and deep learning are terms widely used when the topic is artificial intelligence. Although both concepts help machines evolve and &#8220;think&#8221; in a way similar to intelligent beings like us humans, they are not the same thing.</p>



<p>Think of one as the evolution of the other, directly tied to the earlier concept. Together they form the foundation, the pillars of AI.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://ramondomingos.com.br/wp-content/uploads/2023/09/Machine-learning-1024x1024.png" alt="Pillars of artificial intelligence (deep learning and machine learning)" class="wp-image-110" style="width:600px;height:600px" width="600" height="600" srcset="https://ramondomingos.com.br/wp-content/uploads/2023/09/Machine-learning-1024x1024.png 1024w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Machine-learning-300x300.png 300w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Machine-learning-150x150.png 150w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Machine-learning-768x768.png 768w, https://ramondomingos.com.br/wp-content/uploads/2023/09/Machine-learning.png 1080w" sizes="(max-width: 600px) 100vw, 600px" /><figcaption class="wp-element-caption">Pillars of artificial intelligence (deep learning and machine learning)</figcaption></figure>
</div>


<h2 class="wp-block-heading">What is machine learning?</h2>



<p>This approach needs data. Given a volume of data, the <strong>algorithms</strong> organize it and recognize patterns, which is how machines learn, building models for decision making.</p>



<p>These algorithms are step-by-step instructions that are run over the data in order to identify patterns. The term sounds new, but the definitions of artificial intelligence were already known in 1956. The hardware of the time, however, did not allow the field to evolve or the theory to be put into practice. The goal of machine learning is to have computers find answers to things they were not explicitly programmed for.</p>



<p>Kizzy from Programação Dinâmica has a <a href="https://www.youtube.com/watch?v=u8xgqvk16EA">video</a> that walks through an example of identifying fruits the conventional way, using if/else comparisons, followed by the same example using machine learning techniques.</p>



<p>For a credit analysis where we have a credit score, we could write something like:</p>



<pre class="wp-block-code"><code>def analisa_credito(escore: int, salario: int) -&gt; int:
    # Return the credit limit as a multiple of the salary, based on the score
    if escore &gt; 900:
        return salario * 3
    elif escore &gt; 600:
        return salario * 2
    else:
        return 0</code></pre>



<p>But we know these are not the only variables in a credit analysis. As we add more attributes about the person, such as payment history on other cards, amounts invested at the bank, or assets, it becomes harder to write code that is truly effective for this purpose. Ideally we would have a dataset with different characteristics and the credit granted, and let the machine learn from it, ordering the data and identifying patterns to make decisions.</p>



<p>Once the machine has learned from a dataset, the resulting model can make &#8220;reliable&#8221; decisions when given new data. Instead of programming every expected result, we let the software work it out.</p>
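
<p>To make the contrast with hand-written rules concrete, here is a minimal sketch in which a model learns the credit rule from examples instead of having it coded. Everything in it is an assumption for illustration: the history is synthetic (generated from the rule itself), and scikit-learn's <code>DecisionTreeClassifier</code> is just one possible model.</p>

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical history: credit scores and the salary multiplier that was granted.
# The labels are generated from the hand-written rule only to keep the example
# small; real data would come from past credit decisions.
scores = list(range(300, 1001, 10))
multipliers = [3 if s > 900 else 2 if s > 600 else 0 for s in scores]

# The model receives only examples, never the thresholds themselves
model = DecisionTreeClassifier(random_state=0)
model.fit([[s] for s in scores], multipliers)

# Scores that were not in the training data
predicted = model.predict([[405], [705], [955]]).tolist()
print(predicted)  # [0, 2, 3]
```

<p>With a single variable the rule is easy to code by hand. The approach starts to pay off once more attributes are added as extra columns, because the training code stays the same.</p>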



<h2 class="wp-block-heading">What is deep learning?</h2>



<p>As I mentioned, these terms and concepts have existed since the 1950s. The main limitations were the hardware available at the time and the amount of data available to train models. Deep learning then developed around 2010, when more advanced computers appeared and the amount of data grew.</p>



<p>An <a href="https://findstack.com.br/resources/big-data-statistics/#:~:text=(Cr%C3%B4nicas%20de%20TI),e%20mal%20s%C3%A3o%20o%20come%C3%A7o.">article</a> by Rebekah Carter, published on September 6, 2022, gives some figures on how much data exists today, such as:</p>



<ul class="wp-block-list">
<li>Companies generate about 2,000,000,000,000,000,000 bytes of data per day</li>
</ul>



<ul class="wp-block-list">
<li>Each human being created about 1.7 MB of data per second in 2020</li>
</ul>



<p>It is a very large number, and one we rarely see written out like this with so many zeros, for the amount of data companies generate per day. We are generating data all the time: our personal devices, our social networks, our watches. Everything is producing data about our behavior, and that data is analyzed; there are machines learning from our behavior at this very moment. With so much data, deep learning algorithms are considered high-level and try to imitate the neural network of the human brain.</p>



<p>We can therefore think of these algorithms as many non-linear layers that together can identify images, recognize human speech, decode audio, and carry out more advanced tasks without human interference.</p>
<p>The post <a href="https://ramondomingos.com.br/diferenca-entre-machine-learning-e-deep-learning/">The difference between machine learning and deep learning</a> appeared first on <a href="https://ramondomingos.com.br">Ramon Domingos Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://ramondomingos.com.br/diferenca-entre-machine-learning-e-deep-learning/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
