\documentclass[a4paper,12pt]{article}

% Packages for formatting and functionality
\usepackage[utf8]{inputenc}  % UTF-8 encoding
\usepackage{graphicx}       % Include graphics
\usepackage{amsmath}        % Math formulas
\usepackage{amssymb}        % Math symbols
\usepackage{geometry}       % Page geometry
\geometry{margin=1in}       % 1-inch margins
\usepackage{setspace}       % Line spacing
\usepackage{titlesec}       % Custom section formatting
\usepackage{fancyhdr}       % Custom headers and footers
\usepackage{tocbibind}      % Include TOC, LOF, LOT in TOC
\usepackage{hyperref}       % Hyperlinks
\usepackage{caption}        % Custom captions
\usepackage{enumitem}       % Better control of lists

\usepackage[
    sortcites,
    backend=biber,
    hyperref=true,
    giveninits=true,
    maxbibnames=99,
    ]{biblatex}
\addbibresource{references.bib}

% Define a new list type 'longenum' with five numbering levels
\newlist{longenum}{enumerate}{5}
\setlist[longenum,1]{label=\arabic*., left=.5em}
\setlist[longenum,2]{label=\arabic*), left=1.em}
\setlist[longenum,3]{label=\alph*., left=1.5em}
\setlist[longenum,4]{label=\alph*), left=2em}
\setlist[longenum,5]{label=--, left=3.5em}

% Headers and footers (fancyhdr is already loaded above)
\pagestyle{fancy}
\fancyhf{}  % Clear the default headers and footers
% \fancyhead[L]{Biotech, PCz}
% \fancyhead[R]{MP}  % Right header

% \fancyhead[L]{Commit Date: \texttt{\commitDate}}  % Left header
\fancyhead[L]{Commit: \texttt{\commitUUID}, \texttt{\commitDate}}  % Left header: commit information
% \fancyhead[L]{Commit UUID: \texttt{\commitUUID} \\ Commit Date: \texttt{\commitDate}}  % Alternative two-line left header
\fancyhead[R]{\thepage}       % Right header: page number

% Line spacing
\setstretch{1.5}

% Header and footer
\pagestyle{fancy}
% \fancyhead{}
% \fancyfoot{}
% \fancyhead[L]{\leftmark}  % Chapter title in header
% \fancyfoot[C]{\thepage}   % Page number in footer

% Title page
% \title{\textbf{Master's Thesis in Biotechnology}\\ \large Design of Plasmids and Primers for \textit{dhaA} Gene Mutations Using Machine Learning Techniques}

\title{\large Design of Plasmids and Primers for \textit{dhaA} Gene Mutations Using Machine Learning Techniques}
% \author{Your Name\ \textit{Supervised by: Supervisor's Name}}
% \date{\today}
% \nodate

\input{commit.tex}

\begin{document}

% Title page
\maketitle

\newpage
% Abstract
\section*{Abstract}
% \addcontentsline{toc}{chapter}{Abstract}
This thesis focuses on the computational design and optimization of a~system for mutating and studying the \textit{dhaA} gene. The study begins with a~comprehensive biophysical analysis of the DhaA enzyme to identify critical structural regions, such as the catalytic channel, active sites, and key residues, using a~combination of computational tools. A~subsequent bioinformatics analysis of the \textit{dhaA} gene sequence supports mapping these regions and identifying potential mutation targets. Based on these findings, mutagenesis strategies are developed to enhance enzymatic activity and stability. Plasmid constructs are then designed to include wild-type and mutant variants of the \textit{dhaA} gene, ensuring compatibility with expression systems. Finally, machine learning algorithms are employed to optimize the entire process, including primer design, plasmid configuration, and mutation efficacy. This integrated computational workflow lays the groundwork for experimental validation and future applications in biotechnology.
% Table of contents
\newpage
\tableofcontents

% List of figures and tables
% \listoffigures
% \listoftables
% \newpage

\newpage
% Chapter 1: Introduction
\section{Introduction}
Dehalogenases are enzymes of considerable environmental and biotechnological importance because they catalyze the breakdown of halogenated compounds~\cite{janssen2001dehalogenases, fumio2008dehalogenase_review}. These compounds, often present in industrial waste and agricultural runoff, are persistent pollutants known for their toxicity and environmental impact~\cite{fumio2008dehalogenase_review}. By hydrolyzing the carbon-halogen bond, dehalogenases convert harmful haloalkanes into less toxic alcohols and halide ions, which makes them well suited to bioremediation~\cite{kulig2008biotech}.

Among the various dehalogenases, the DhaA\footnote{The structure of DhaA is available in the Protein Data Bank under PDB ID: \href{https://www.rcsb.org/structure/4E46}{4E46}~\cite{rcsb_4e46}.} enzyme has attracted considerable attention because of its high catalytic efficiency and broad substrate range~\cite{chaloupkova2011structure_function}. Its potential applications extend beyond bioremediation to synthetic biology and green chemistry, where engineered variants are used to carry out environmentally friendly reactions~\cite{janssen2001dehalogenases}. Advances in genetic and protein engineering have further enhanced the utility of dehalogenases, making them valuable tools for addressing environmental pollution and developing sustainable biotechnological processes~\cite{fumio2008dehalogenase_review, chaloupkova2011structure_function}.



The growing interest in dehalogenases has been complemented by significant advancements in computational and experimental techniques. Biophysical tools, such as structural analysis using X-ray crystallography and molecular dynamics simulations, have played a pivotal role in unraveling the catalytic mechanisms of these enzymes. These approaches enable researchers to pinpoint critical residues involved in substrate binding and catalysis, shedding light on the intricate interplay between enzyme structure and function.

Bioinformatics tools have further expanded the possibilities in enzyme research. Techniques such as sequence alignment and homology modeling facilitate the identification of conserved regions and structural motifs essential for enzymatic activity. Docking studies provide valuable insights into substrate specificity and help visualize interactions at the molecular level, enabling the design of tailored modifications to improve enzyme performance.

In recent years, machine learning (ML) and artificial intelligence (AI) have substantially changed bioinformatics and enzyme engineering. ML algorithms such as support vector machines, random forests, and neural networks are now widely used to predict how mutations affect enzyme activity and stability. Trained on large collections of sequences, structures, and experimentally characterized variants, such models can rank candidate mutations before they are tested in the laboratory, reducing the number of variants that must be constructed and screened.

Moreover, deep learning has enabled advanced protein structure prediction models such as AlphaFold, which has transformed structural biology. These models predict protein structures with unprecedented accuracy, aiding the exploration of structure-function relationships. By applying AI-driven optimization algorithms, researchers can design mutations that enhance catalytic efficiency or alter substrate specificity, tailoring dehalogenases for specific industrial or environmental applications.

The combination of traditional computational techniques with cutting-edge AI methodologies allows for a holistic approach to enzyme engineering. This integration not only accelerates the development of biocatalysts but also broadens the scope of their applications in sustainable biotechnology and green chemistry. By leveraging these tools, researchers can address pressing environmental challenges and pave the way for innovative solutions in enzyme design.

\newpage
\subsection{The DhaA Enzyme and the \textit{dhaA} Gene}
The DhaA enzyme, a member of the haloalkane dehalogenase family, catalyzes the hydrolysis of halogenated hydrocarbons~\cite{reference1}.
It is of particular interest because of its potential applications in bioremediation and synthetic biology~\cite{reference2, reference3}. Structurally, DhaA adopts the $\alpha/\beta$-hydrolase fold and contains a catalytic triad that carries out substrate hydrolysis~\cite{reference4}. The enzyme's catalytic channel is a key structural feature that influences substrate specificity and catalytic efficiency~\cite{reference5}.

\begin{figure}[h]
    \centering
    \begin{minipage}{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{figs/DhaA_structure.png}
        \vspace{0.2cm}
        \textbf{(A)} Overall structure of DhaA
    \end{minipage}
    \hfill
    \begin{minipage}{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{figs/DhaA_active_site.png}
        \vspace{0.2cm}
        \textbf{(B)} DhaA catalytic center containing 2-propanol and Cl$^-$
    \end{minipage}
    \caption{Illustration of DhaA and its catalytic center. (A) Overall structure of DhaA, highlighting the $\alpha/\beta$-hydrolase fold. (B) Active site of DhaA with the reaction products 2-propanol and chloride ion (Cl$^-$) bound, illustrating the outcome of dehalogenation.}
    \label{fig:DhaA_panels}
\end{figure}

\noindent
\textbf{Structural Parameters}
\begin{longenum}
    \item \textbf{Overall Structure}:
    \begin{longenum}
        \item The enzyme adopts a core $\alpha/\beta$ hydrolase fold, characteristic of haloalkane dehalogenases~\cite{chaloupkova2011structure_function}.
        \item The catalytic triad, comprising the nucleophile \textbf{Asp106}, the base \textbf{His272}, and the acid \textbf{Glu130}, is located at the active site~\cite{rcsb_4e46}.
    \end{longenum}
    \item \textbf{Catalytic Channel}:
    \begin{longenum}
        \item The channel has a conical shape with a diameter of approximately \textbf{6--8~\AA}.
        \item It is lined by hydrophobic residues such as \textbf{Trp107} and \textbf{Phe149} and by polar residues such as \textbf{Asn38} and \textbf{Asn178}, which together contribute to substrate binding~\cite{chaloupkova2011structure_function}.
        \item The channel includes two main pathways:
        \begin{longenum}
            \item A \textbf{main tunnel} directing substrates to the active site.
            \item A \textbf{product tunnel} for efficient release of reaction products~\cite{janssen2001dehalogenases}.
        \end{longenum}
    \end{longenum}
\end{longenum}

\textbf{Functional Parameters}
\begin{longenum}
    \item \textbf{Substrates and Products}:
    \begin{longenum}
        \item DhaA hydrolyzes haloalkanes such as \textbf{1,2-dichloroethane}, \textbf{1,2-dibromoethane}, and \textbf{1-chlorobutane}.
        \item The reaction products are the corresponding alcohols (2-chloroethanol, 2-bromoethanol, and 1-butanol, respectively) and halide ions (\textbf{Cl$^-$}, \textbf{Br$^-$})~\cite{chaloupkova2011structure_function}.
    \end{longenum}
    \item \textbf{Kinetic Parameters}:
    \begin{longenum}
        \item The turnover number ($k_\mathrm{cat}$) ranges from \textbf{2 to 10~s$^{-1}$}, depending on the substrate.
        \item The Michaelis constant ($K_\mathrm{m}$) lies in the micromolar to millimolar range~\cite{janssen2001dehalogenases}.
    \end{longenum}
    \item \textbf{Optimal Conditions}:
    \begin{longenum}
        \item Temperature: optimal enzymatic activity is observed at \textbf{30--37\,$^{\circ}$C}.
        \item pH: the enzyme is most active in the pH range \textbf{7.0--8.5}~\cite{chaloupkova2011structure_function}.
    \end{longenum}
\end{longenum}
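The kinetic parameters listed above enter the Michaelis--Menten rate law. Assuming simple saturation kinetics (no substrate inhibition or cooperativity, which is an assumption here rather than a measured property of DhaA), the initial rate $v_0$ depends on the total enzyme concentration $[\mathrm{E}]_0$ and the substrate concentration $[\mathrm{S}]$ as
\begin{equation}
    v_0 = \frac{k_\mathrm{cat}\,[\mathrm{E}]_0\,[\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}.
\end{equation}
At $[\mathrm{S}] = K_\mathrm{m}$ the enzyme works at half its maximal rate, and for $[\mathrm{S}] \ll K_\mathrm{m}$ the rate approaches $(k_\mathrm{cat}/K_\mathrm{m})\,[\mathrm{E}]_0\,[\mathrm{S}]$, which is why $k_\mathrm{cat}/K_\mathrm{m}$ is used as a measure of catalytic efficiency. As a purely illustrative calculation with hypothetical values, $k_\mathrm{cat} = 5$~s$^{-1}$, $[\mathrm{E}]_0 = 1~\mu$M, and $[\mathrm{S}] = K_\mathrm{m}$ give $v_0 = 5 \times 1 / 2 = 2.5~\mu$M\,s$^{-1}$.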

\textbf{Mechanism and Mutagenesis Insights}
\begin{longenum}
    \item \textbf{Catalytic Mechanism}:
    \begin{longenum}
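        \item The catalytic triad hydrolyzes the carbon-halogen bond in two steps: the aspartate nucleophile attacks the substrate, forming a covalent alkyl-enzyme intermediate, which is then hydrolyzed by a water molecule activated by the histidine base~\cite{rcsb_4e46}.
        \item The reaction releases a halide ion and an alcohol; the leaving halide and the transition state are stabilized by hydrogen bonds from halide-stabilizing residues.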
    \end{longenum}
    \item \textbf{Engineering Enhancements}:
    \begin{longenum}
        \item Mutations at residues lining the catalytic channel (e.g., the \textbf{Cys176Tyr} substitution) can alter substrate specificity and catalytic efficiency.
        \item Stabilizing mutations can improve thermal stability and broaden substrate compatibility~\cite{chaloupkova2011structure_function}.
    \end{longenum}
\end{longenum}
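The two-step mechanism generally described for haloalkane dehalogenases can be summarized by the following scheme, in which $\mathrm{R{-}X}$ denotes the haloalkane substrate and Enz-Asp the nucleophilic aspartate of the catalytic triad (a textbook summary of the mechanism, not a result of this work):
\begin{align}
    \text{Enz-Asp-COO}^{-} + \mathrm{R{-}X} &\longrightarrow \text{Enz-Asp-COO-R} + \mathrm{X}^{-}, \\
    \text{Enz-Asp-COO-R} + \mathrm{H_2O} &\xrightarrow{\ \text{His}\ } \text{Enz-Asp-COO}^{-} + \mathrm{R{-}OH} + \mathrm{H}^{+}.
\end{align}
In the first step, the aspartate attacks the carbon atom bearing the halogen (an S$_\mathrm{N}$2 substitution), releasing the halide ion and forming a covalent alkyl-enzyme ester; in the second, a water molecule activated by the histidine base hydrolyzes this ester, while the acidic residue of the triad stabilizes the protonated histidine. Summing both steps gives the overall reaction $\mathrm{R{-}X} + \mathrm{H_2O} \rightarrow \mathrm{R{-}OH} + \mathrm{H}^{+} + \mathrm{X}^{-}$.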


% \subsection{The \textit{dhaA} Gene}
The \textit{dhaA}\footnote{The \textit{dhaA} gene was first characterized in \textit{Rhodococcus rhodochrous}, a Gram-positive bacterium known for its ability to degrade haloalkanes.} gene encodes the DhaA enzyme and has been a target of genetic engineering aimed at improving the enzyme's activity and stability~\cite{dhaA_origin}. Mutagenesis of specific residues, particularly those lining the catalytic channel, has been explored to enhance its enzymatic properties~\cite{dhaA_mutagenesis1, dhaA_mutagenesis2}. Sequence analysis of \textit{dhaA} helps identify regions suitable for targeted modification~\cite{rcsb_4e46}.
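Because site-directed mutagenesis changes codons rather than residues, one step of the sequence analysis is to map a residue number onto its codon in the coding sequence (CDS). The following minimal sketch does this in plain Python, without Biopython, so that it is self-contained. It assumes that residue~1 corresponds to the first codon of the CDS and that the standard genetic code applies; residue numbering in PDB entries may be offset from this convention and should be checked first. The helper names \texttt{translate}, \texttt{codon\_span}, and \texttt{substitute} are hypothetical, and the toy sequence is illustrative rather than the \textit{dhaA} sequence.

\begin{verbatim}
"""Map residue numbers to codons and apply a codon substitution.

Minimal sketch: assumes residue 1 is the first codon of the CDS and
that the standard genetic code applies. Helper names are hypothetical.
"""
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[p:p + 3]]
                   for p in range(0, len(cds), 3))


def codon_span(residue: int) -> Tuple[int, int]:
    """0-based, half-open nucleotide span of a 1-based residue."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    start = 3 * (residue - 1)
    return start, start + 3


def substitute(cds: str, residue: int, new_codon: str,
               expected_wt: str) -> str:
    """Replace the codon of `residue`, checking the wild-type residue."""
    cds = cds.upper()
    start, end = codon_span(residue)
    if end > len(cds):
        raise ValueError(f"residue {residue} lies beyond the CDS")
    wild_type = CODON_TABLE[cds[start:end]]
    if wild_type != expected_wt:
        raise ValueError(
            f"residue {residue} is {wild_type}, not {expected_wt}")
    return cds[:start] + new_codon.upper() + cds[end:]


if __name__ == "__main__":
    toy_cds = "ATGGCTTGTGAATAA"  # hypothetical: Met-Ala-Cys-Glu-stop
    print(translate(toy_cds))                  # MACE*
    mutant = substitute(toy_cds, 3, "TAT", expected_wt="C")
    print(translate(mutant))                   # MAYE*
\end{verbatim}

Checking the expected wild-type residue before substituting guards against off-by-one errors between structural and sequence numbering.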

\subsection{Computational and Experimental Tools}
This study combines several computational approaches:
\begin{longenum}
    \item \textbf{Biophysical Analyses}: Tools such as HOLE are used to characterize the catalytic channel and identify structural features critical for enzymatic function.
    \item \textbf{Bioinformatics Tools}: Biopython and related libraries are used for sequence analysis, primer design, and handling of structural data.
    \item \textbf{Machine Learning Algorithms}: Scikit-learn is used to optimize primer sequences and to predict the impact of mutations on enzyme performance.
\end{longenum}
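Primer design for site-directed mutagenesis, one of the Biopython tasks listed above, can be sketched in plain Python as follows. The sketch assumes a QuikChange-style protocol in which two fully complementary primers both carry the mutated codon flanked by \texttt{flank} unchanged bases on each side, and it uses the empirical melting-temperature estimate commonly quoted for such primers, $T_\mathrm{m} = 81.5 + 0.41(\%\mathrm{GC}) - 675/N - \%\mathrm{mismatch}$, where $N$ is the primer length. The helper names and the default \texttt{flank=15} are hypothetical choices rather than part of any library; in practice, Biopython's \texttt{Bio.SeqUtils.MeltingTemp} module offers more detailed melting-temperature models.

\begin{verbatim}
from typing import Dict

# Watson-Crick complements for upper- and lower-case bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def mutagenic_primers(cds: str, residue: int, new_codon: str,
                      flank: int = 15) -> Dict[str, object]:
    """Complementary primer pair carrying one codon substitution.

    Hypothetical helper for a QuikChange-style protocol: both primers
    cover the mutated codon plus `flank` unchanged bases on each side.
    Residue 1 is assumed to be the first codon of the CDS.
    """
    cds, new_codon = cds.upper(), new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    start = 3 * (residue - 1)  # 0-based start of the target codon
    end = start + 3
    if start - flank < 0 or end + flank > len(cds):
        raise ValueError("not enough flanking sequence around codon")
    wild_type = cds[start - flank:end + flank]
    forward = cds[start - flank:start] + new_codon + cds[end:end + flank]
    n = len(forward)
    mismatch_pct = 100.0 * sum(
        a != b for a, b in zip(wild_type, forward)) / n
    gc = gc_percent(forward)
    return {
        "forward": forward,
        "reverse": reverse_complement(forward),
        "length": n,
        "gc_percent": gc,
        "mismatch_percent": mismatch_pct,
        # Empirical Tm estimate quoted for QuikChange-style primers.
        "tm_estimate": 81.5 + 0.41 * gc - 675 / n - mismatch_pct,
        # True if the 3' end is G or C (a "GC clamp").
        "gc_clamp": forward[-1] in "GC",
    }


if __name__ == "__main__":
    # Hypothetical toy CDS: Met, 5x Ala, Cys7, 5x Glu, stop codon.
    toy_cds = "ATG" + "GCT" * 5 + "TGT" + "GAA" * 5 + "TAA"
    primers = mutagenic_primers(toy_cds, residue=7, new_codon="TAT")
    for key, value in primers.items():
        print(f"{key}: {value}")
\end{verbatim}

The numeric fields returned by \texttt{mutagenic\_primers} (length, GC content, estimated $T_\mathrm{m}$, GC clamp) are examples of primer descriptors that could be assembled into a feature matrix for a Scikit-learn model; which descriptors are actually informative would have to be established from experimental data.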

\subsection{Significance of the Study}
The integration of biophysical, bioinformatics, and machine learning techniques enables a comprehensive approach to studying and engineering the \textit{dhaA} gene and its protein product. This study lays the groundwork for experimental validation and practical applications in biotechnology, particularly in the design of efficient biocatalysts.


% section 2: Computational Analysis of DhaA
\section{Computational Analysis of Haloalkane Dehalogenase}

\subsection{Biophysical Analysis Using HOLE}
The biophysical analysis focuses on characterizing the catalytic channel of the DhaA enzyme. HOLE was employed to:
\begin{longenum}
    \item Measure the dimensions of the channel and identify bottlenecks.
    \item Visualize the spatial configuration of the channel, highlighting regions critical for substrate binding and product release.
    \item Provide quantitative data for further mutagenesis strategies.
\end{longenum}
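HOLE reports the pore radius as a function of position along the channel. Given such a profile, the bottleneck is the point of minimum radius, and narrow regions can be defined as stretches where the radius stays below a chosen threshold. The sketch below assumes the profile has already been extracted from HOLE's output into two lists of equal length (parsing HOLE's output files is not shown); the function names and the threshold are hypothetical, and the threshold could, for instance, be set to an estimate of the radius a substrate needs in order to pass.

\begin{verbatim}
from typing import List, Optional, Sequence, Tuple


def find_bottleneck(coords: Sequence[float],
                    radii: Sequence[float]) -> Tuple[float, float]:
    """Return (coordinate, radius) at the narrowest point of a profile."""
    if not radii or len(coords) != len(radii):
        raise ValueError("need non-empty profiles of equal length")
    i = min(range(len(radii)), key=lambda k: radii[k])
    return coords[i], radii[i]


def narrow_regions(coords: Sequence[float], radii: Sequence[float],
                   threshold: float) -> List[Tuple[float, float]]:
    """Stretches (first, last coordinate) where radius < threshold."""
    regions: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last: Optional[float] = None
    for c, r in zip(coords, radii):
        if r < threshold:
            if start is None:
                start = c  # entering a narrow stretch
            last = c
        elif start is not None:
            regions.append((start, last))  # leaving a narrow stretch
            start = None
    if start is not None:
        regions.append((start, last))  # profile ends while narrow
    return regions


if __name__ == "__main__":
    # Hypothetical profile: coordinate along channel (A), radius (A).
    coords = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    radii = [3.0, 2.1, 1.3, 1.1, 1.6, 2.5, 1.2]
    print(find_bottleneck(coords, radii))      # (3.0, 1.1)
    print(narrow_regions(coords, radii, 1.5))  # [(2.0, 3.0), (6.0, 6.0)]
\end{verbatim}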

\subsubsection{Three Tasks for HOLE Analysis}
\begin{enumerate}
    \item Map the channel dimensions and generate a profile of pore radii along the pathway.
    \item Identify key structural bottlenecks that influence enzymatic function.
    \item Visualize the channel in top-down and side views to correlate spatial features with functional relevance.
\end{enumerate}

The HOLE analysis revealed:
\begin{itemize}
    \item Narrow regions within the channel that likely regulate substrate access.
    \item Spatial features that can be targeted for mutagenesis to enhance catalytic efficiency.
\end{itemize}
Figures~\ref{fig:hole_top} and~\ref{fig:hole_side} illustrate the channel's geometry from different perspectives.
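Once a bottleneck has been located, the residues whose atoms line it are natural candidates for mutagenesis. A minimal distance-based sketch of this selection is shown below. It assumes that atom coordinates have already been extracted from the structure (for example, with Biopython's \texttt{Bio.PDB} module) and that the bottleneck centre and radius come from the HOLE profile. The function name \texttt{lining\_residues}, the 2\,\AA{} default margin, and the residue labels in the example are hypothetical choices, not values from this analysis.

\begin{verbatim}
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def lining_residues(atoms: Iterable[Tuple[str, Point]], center: Point,
                    pore_radius: float,
                    margin: float = 2.0) -> List[Tuple[str, float]]:
    """Residues with an atom within pore_radius + margin of `center`.

    HOLE radii are measured to atomic van der Waals surfaces, so atom
    centres lie somewhat further out; `margin` (hypothetical default)
    allows for that. Results are sorted by closest approach.
    """
    cutoff = pore_radius + margin
    closest: Dict[str, float] = {}
    for residue, xyz in atoms:
        d = math.dist(xyz, center)
        # Keep only the closest atom of each residue within the cutoff.
        if d <= cutoff and d < closest.get(residue, math.inf):
            closest[residue] = d
    return sorted(closest.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical atoms: (residue label, coordinates in A).
    atoms = [("RES_A", (1.0, 0.0, 0.0)), ("RES_A", (3.0, 0.0, 0.0)),
             ("RES_B", (0.0, 2.5, 0.0)), ("RES_C", (0.0, 0.0, 6.0))]
    print(lining_residues(atoms, (0.0, 0.0, 0.0), pore_radius=1.2))
    # [('RES_A', 1.0), ('RES_B', 2.5)]
\end{verbatim}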

% % section 3: Computational Methods
% \section{Computational Methods}
% \section{Dataset Preparation}
% The structural dataset for the DhaA enzyme was obtained from the Protein Data Bank (PDB ID: 4E46). Preprocessing steps included:
% \begin{itemize}
%     \item Removing water molecules and irrelevant ligands.
%     \item Validating the structural integrity of the catalytic channel.
% \end{itemize}

% \section{Plasmid and Primer Design}
% Plasmids and primers were designed computationally to achieve:
% \begin{itemize}
%     \item High specificity for target regions of the \textit{dhaA} gene.
%     \item Compatibility with site-directed mutagenesis protocols.
%     \item Optimization of experimental efficiency using Biopython and machine learning tools.
% \end{itemize}

% \section{Optimization Techniques}
% Algorithms from Scikit-learn were employed to optimize primer sequences and predict the impact of mutations on enzymatic properties.

% % section 4: Results and Discussion
% \section{Results and Discussion}
% \section{Biophysical Analysis Results}
% The HOLE analysis successfully characterized the catalytic channel of the DhaA enzyme. Narrow regions and bottlenecks were identified as key targets for mutagenesis. Quantitative measurements and visualizations provided a foundation for designing improved variants of the enzyme.

% \section{Plasmid and Primer Design Results}
% Plasmid constructs were successfully designed to include both wild-type and mutant variants of \textit{dhaA}. Primers demonstrated high specificity and melting temperature compatibility, ensuring their effectiveness in experimental protocols.

% \section{Discussion of Results}
% The integration of biophysical analysis and computational design highlights the potential for improving the catalytic properties of DhaA. Identified bottlenecks offer clear targets for future mutagenesis experiments.

% section 5: Conclusion
\section{Conclusion}
\begin{itemize}
    \item The biophysical analysis of DhaA using HOLE provided critical insights into the enzyme's catalytic channel.
    \item Computationally designed plasmids and primers showed high potential for successful mutagenesis experiments.
    \item This work lays a computational foundation for further experimental validation and optimization of DhaA variants.
\end{itemize}

% References
\section*{References}
\addcontentsline{toc}{section}{References}
\printbibliography[heading=none]  % Entries come from references.bib (loaded with \addbibresource in the preamble)

% Appendices
\appendix
\section{Supplementary Data}
Include additional data, computational logs, or other relevant materials here.

\end{document}