diff --git a/README.md b/README.md
index 83a5e01..ed692a8 100644
--- a/README.md
+++ b/README.md
@@ -20,11 +20,11 @@ Pkg.add("Backboner")
 
 ## Types and functions
 
-The `Protein` type wraps a vector of `Chain`s, which in turn wraps the `Backbone{4}` type (4, because it stores the positions of 4 atoms per residue: N, CA, C, O). The `Backbone{N}` type has the `N` type parameter in order to remain flexible. It allows one pass only the N, CA, and C atoms of a backbone, such that the O atom positions can added in using the `add_oxygens` function.
+Proteins are represented as vectors of `ProteinChain`s, which in turn wrap the `Backbone{3}` type to store the coordinates of N, Ca, and C atoms.
 
-The secondary structure of an entire chain is described by a `Vector{Char}`, where '-' stands for coil/loop, 'H' for helix, and 'E' for strand. For assignment of secondary structure, this package uses the [AssigningSecondaryStructure.jl](https://github.com/MurrellGroup/AssigningSecondaryStructure.jl) package, which implements a simplified version of the DSSP algorithm.
+The secondary structures of a chain are described by a `Vector{Char}`, where '-' stands for coil/loop, 'H' for helix, and 'E' for strand. For assignment of secondary structure, this package uses the [AssigningSecondaryStructure.jl](https://github.com/MurrellGroup/AssigningSecondaryStructure.jl) package, which implements a simplified version of the DSSP algorithm.
 
-Protein backbones can be loaded from a PDB file using the `pdb_to_protein` function, which returns a `Protein` instance. Inversely, a `Protein` instance can be written to a PDB file using the `protein_to_pdb` function.
+Proteins can be loaded from a PDB file using the `pdb_to_protein` function, which returns a `Vector{ProteinChain}` instance. Inversely, a `Vector{ProteinChain}` instance can be written to a PDB file using the `protein_to_pdb` function.
 
 ## Example
 
@@ -32,12 +32,12 @@ Protein backbones can be loaded from a PDB file using the `pdb_to_protein` funct
 julia> using Backboner
 
 julia> protein = pdb_to_protein("test/data/1ZAK.pdb")
-2-element Protein{Float32}:
- Chain A with 220 residues
- Chain B with 220 residues
+2-element Vector{ProteinChain}:
+ ProteinChain A with 220 residues
+ ProteinChain B with 220 residues
 
 julia> chain = protein["A"]
-Chain A with 220 residues
+ProteinChain A with 220 residues
 
 julia> chain.backbone
 3×4×220 Backbone{4, Float32}:
diff --git a/docs/src/index.md b/docs/src/index.md
index 26723ef..d5b33ef 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -33,25 +33,23 @@ The `Protein` type wraps a vector of `Chain`s.
 julia> using Backboner
 
 julia> protein = pdb_to_protein("test/data/1ZAK.pdb")
-2-element Protein{Float32}:
- Chain A with 220 residues
- Chain B with 220 residues
+2-element Vector{ProteinChain}:
+ ProteinChain A with 220 residues
+ ProteinChain B with 220 residues
 
 julia> chain = protein["A"] # chains can be accessed by name
-Chain A with 220 residues
+ProteinChain A with 220 residues
 
 julia> protein["A"] == protein[1] # numeric indexing also works
 true
 
-julia> new_protein = Protein([protein["A"]]) # create a new protein with a single chain
-1-element Protein{Float32}:
- Chain A with 220 residues
+julia> new_protein = [protein["A"]] # create a new protein with a single chain
+1-element Vector{ProteinChain}:
+ ProteinChain A with 220 residues
 
 julia> protein_to_pdb(new_protein, "test/data/1ZAK_A.pdb");
 ```
 
-The `Chain` type wraps the `Backbone{4}` type (4, because it stores the positions of 4 atoms per residue: N, CA, C, O).
-
 ## API Reference
 
 ```@autodocs
diff --git a/docs/src/oxygen.md b/docs/src/oxygen.md
index bdc4535..7f682ac 100644
--- a/docs/src/oxygen.md
+++ b/docs/src/oxygen.md
@@ -6,12 +6,12 @@ The `Backbone` type has a type parameter `N` to represent the number of atoms pe
 julia> using Backboner
 
 julia> protein = pdb_to_protein("test/data/1ZAK.pdb")
-2-element Protein{Float32}:
- Chain A with 220 residues
- Chain B with 220 residues
+2-element Vector{ProteinChain}:
+ ProteinChain A with 220 residues
+ ProteinChain B with 220 residues
 
 julia> chain = protein["A"]
-Chain A with 220 residues
+ProteinChain A with 220 residues
 
 julia> backbone4 = chain.backbone
 3×4×220 Backbone{4, Float32}:
diff --git a/docs/src/types.md b/docs/src/types.md
index 8379c52..01a43b1 100644
--- a/docs/src/types.md
+++ b/docs/src/types.md
@@ -6,23 +6,17 @@ The `Backbone` type is designed to efficiently store and manipulate the three-di
 `Backbone{N, T}` is a wrapper around a 3xNxL array, where:
 
 - **3** are the three spatial dimensions for the coordinates.
-- **N** is the number of atoms per residue.
+- **N** is the number of atoms in the backbone per residue.
 - **L** is the number of residues in the backbone.
 - **T** is the element type of the coordinate array.
 
 ## Chain
 
-A `Chain` represents a protein chain, and holds an identifier (usually a single letter), backbone atom coordinates, the amino acid sequence, and secondary structure information.
+A `ProteinChain` represents a protein chain, and holds an identifier (usually a single letter), backbone atom coordinates, the amino acid sequence, and secondary structure information.
 
+- `id`: A string identifier for the chain.
 - `backbone`: An instance of `Backbone{4}`, storing the coordinates of backbone atoms.
 - `aavector`: A vector for storing the amino acid sequence.
 - `ssvector`: A vector for storing the secondary structure.
 
-The `Chain` type is designed to provide a comprehensive and consistent representation of a protein chain, ensuring that the backbone coordinates align with the corresponding amino acid sequences and secondary structures.
-
-## Protein
-
-The `Protein` type holds multiple `Chain` instances, representing complete protein structures.
-
-- Stores a collection of `Chain` objects.
-- Includes a dictionary for quick access to chains via their identifiers.
+The `ProteinChain` type is designed to provide a comprehensive and consistent representation of a protein chain, ensuring that the backbone coordinates align with the corresponding amino acid sequences and secondary structures.
\ No newline at end of file
diff --git a/src/assign.jl b/src/assign.jl
index 7d38aae..219013b 100644
--- a/src/assign.jl
+++ b/src/assign.jl
@@ -7,7 +7,7 @@ import AssigningSecondaryStructure: assign_secondary_structure!, assign_secondar
 
 Uses a simplified version of DSSP to fill the secondary structure vector of each chain with '-' (coil/loop), 'H' (helix), and 'E' (strand).
 """
-function assign_secondary_structure!(protein::Protein)
+function assign_secondary_structure!(protein::Vector{ProteinChain})
     ss_vectors = assign_secondary_structure([chain.backbone.coords for chain in protein])
     for (chain, ssvector) in zip(protein, ss_vectors)
         @assert length(chain.ssvector) == length(ssvector)
@@ -21,7 +21,7 @@ end
 
 Returns a new protein with secondary structure assigned.
 """
-function assign_secondary_structure(protein::Protein)
+function assign_secondary_structure(protein::Vector{ProteinChain})
     new_protein = deepcopy(protein)
     assign_secondary_structure!(new_protein)
     return new_protein
diff --git a/src/chain.jl b/src/chain.jl
index 8e6584a..0737249 100644
--- a/src/chain.jl
+++ b/src/chain.jl
@@ -1,17 +1,17 @@
-export Chain
+export ProteinChain
 
 """
-    Chain <: AbstractVector{Residue}
+    ProteinChain <: AbstractVector{Residue}
 
 A chain has an identifier (usually a single letter) and holds the backbone atom coordinates, amino acid sequence, and secondary structures of a protein chain.
 """
-struct Chain <: AbstractVector{Residue}
+struct ProteinChain <: AbstractVector{Residue}
     id::AbstractString
     backbone::Backbone{4}
     aavector::Vector{Char}
     ssvector::Vector{Char}
 
-    function Chain(
+    function ProteinChain(
         id::AbstractString,
         backbone::Backbone{N};
         aavector::Vector{Char} = fill('G', length(backbone)),
@@ -26,15 +26,15 @@ struct Chain <: AbstractVector{Residue}
         return new(id, backbone, aavector, ssvector)
     end
 
-    Chain(backbone::Backbone; kwargs...) = Chain("_", backbone; kwargs...)
+    ProteinChain(backbone::Backbone; kwargs...) = ProteinChain("_", backbone; kwargs...)
 end
 
-@inline Base.:(==)(chain1::Chain, chain2::Chain) = chain1.id == chain2.id && chain1.backbone == chain2.backbone && chain1.ssvector == chain2.ssvector
-@inline Base.length(chain::Chain) = length(chain.backbone)
-@inline Base.size(chain::Chain) = (length(chain),)
-@inline Base.getindex(chain::Chain, i::Integer) = Residue(i, chain.backbone, chain.aavector[i], chain.ssvector[i])
+@inline Base.:(==)(chain1::ProteinChain, chain2::ProteinChain) = chain1.id == chain2.id && chain1.backbone == chain2.backbone && chain1.ssvector == chain2.ssvector
+@inline Base.length(chain::ProteinChain) = length(chain.backbone)
+@inline Base.size(chain::ProteinChain) = (length(chain),)
+@inline Base.getindex(chain::ProteinChain, i::Integer) = Residue(i, chain.backbone, chain.aavector[i], chain.ssvector[i])
 
-Base.summary(chain::Chain) = "Chain $(chain.id) with $(length(chain)) residue$(length(chain) == 1 ? "" : "s")"
-Base.show(io::IO, chain::Chain) = print(io, summary(chain))
+Base.summary(chain::ProteinChain) = "ProteinChain $(chain.id) with $(length(chain)) residue$(length(chain) == 1 ? "" : "s")"
+Base.show(io::IO, chain::ProteinChain) = print(io, summary(chain))
 
-has_assigned_ss(chain::Chain) = has_assigned_ss(chain.ssvector)
+has_assigned_ss(chain::ProteinChain) = has_assigned_ss(chain.ssvector)
diff --git a/src/io.jl b/src/io.jl
index 58ed290..44168b2 100644
--- a/src/io.jl
+++ b/src/io.jl
@@ -37,19 +37,12 @@ function Backbone(atoms::Vector{PDBTools.Atom})
     return Backbone(coords)
 end
 
-function Chain(atoms::Vector{PDBTools.Atom})
+function ProteinChain(atoms::Vector{PDBTools.Atom})
     id = PDBTools.chain(atoms[1])
     @assert allequal(PDBTools.chain.(atoms)) "atoms must be from the same chain"
     backbone = Backbone(atoms)
     aavector = [get(ONE_LETTER_AA_CODES, atom.resname, 'X') for atom in atoms if atom.name == "CA"]
-    return Chain(id, backbone, aavector=aavector)
-end
-
-function Protein(atoms::Vector{PDBTools.Atom})
-    filter!(a -> a.name in ["N", "CA", "C", "O"], atoms)
-    ids = PDBTools.chain.(atoms)
-    chains = [Chain(atoms[ids .== id]) for id in unique(ids)]
-    return Protein(chains)
+    return ProteinChain(id, backbone, aavector=aavector)
 end
 
 """
@@ -57,9 +50,15 @@ end
 
 Assumes that each residue starts with four atoms: N, CA, C, O.
 """
-pdb_to_protein(filename::String) = Protein(PDBTools.readPDB(filename))
+function pdb_to_protein(filename::String)
+    atoms = PDBTools.readPDB(filename)
+    filter!(a -> a.name in ["N", "CA", "C", "O"], atoms)
+    ids = PDBTools.chain.(atoms)
+    chains = [ProteinChain(atoms[ids .== id]) for id in unique(ids)]
+    return chains
+end
 
-function protein_to_pdb(protein::Protein, filename, header=:auto, footer=:auto)
+function protein_to_pdb(protein::Vector{ProteinChain}, filename, header=:auto, footer=:auto)
     atoms = PDBTools.Atom[]
     index = 0
     residue_index = 0
diff --git a/src/protein.jl b/src/protein.jl
index b11042b..006e2a2 100644
--- a/src/protein.jl
+++ b/src/protein.jl
@@ -1,28 +1,3 @@
-export Protein
+@inline Base.getindex(protein::AbstractVector{ProteinChain}, id::AbstractString) = protein[findfirst(c -> c.id == id, protein)]
 
-"""
-    Protein <: AbstractVector{Chain}
-
-A wrapper for a vector of chains.
-Chains can be accessed by index or by ID.
-"""
-struct Protein <: AbstractVector{Chain}
-    chains::Vector{Chain}
-    id_dict::Dict{AbstractString, Chain}
-
-    function Protein(chains::Vector{Chain})
-        @assert length(unique([chain.id for chain in chains])) == length(chains)
-        id_dict = Dict{AbstractString, Chain}(chain.id => chain for chain in chains)
-        return new(chains, id_dict)
-    end
-end
-
-@inline Base.:(==)(protein1::Protein, protein2::Protein) = protein1.chains == protein2.chains
-@inline Base.size(protein::Protein) = size(protein.chains)
-@inline Base.length(protein::Protein) = length(protein.chains)
-@inline Base.getindex(protein::Protein, i) = protein.chains[i]
-@inline Base.getindex(protein::Protein, id::AbstractString) = protein.id_dict[String(id)]
-
-Base.summary(protein::Protein) = "Protein with $(length(protein)) chain$(length(protein) == 1 ? "" : "s")"
-
-has_assigned_ss(protein::Protein) = all(has_assigned_ss, protein.chains)
\ No newline at end of file
+has_assigned_ss(protein::AbstractVector{ProteinChain}) = all(has_assigned_ss, protein)
\ No newline at end of file
diff --git a/test/chain.jl b/test/chain.jl
index 4a37fc6..ee07af0 100644
--- a/test/chain.jl
+++ b/test/chain.jl
@@ -4,7 +4,7 @@
         coords = randn(3, 4, 5)
         backbone = Backbone(coords)
 
-        chain = Chain("A", backbone)
+        chain = ProteinChain("A", backbone)
         @test chain.id == "A"
         @test chain.backbone.coords == coords
         @test chain.aavector == fill('G', length(chain))
@@ -12,12 +12,12 @@
         @test !has_assigned_ss(chain)
         @test length(chain) == 5
         @test size(chain) == (5,)
-        @test Chain(remove_column(backbone, 4)).backbone == add_oxygens(remove_column(backbone, 4))
-        @test Chain(backbone).id == "_"
+        @test ProteinChain(remove_column(backbone, 4)).backbone == add_oxygens(remove_column(backbone, 4))
+        @test ProteinChain(backbone).id == "_"
 
         @test chain[1] == Residue(1, backbone, 'G', ' ')
 
-        @test summary(chain) == "Chain A with 5 residues"
+        @test summary(chain) == "ProteinChain A with 5 residues"
 
         io = IOBuffer()
         show(io, chain)
diff --git a/test/protein.jl b/test/protein.jl
index df6ec65..8292ed4 100644
--- a/test/protein.jl
+++ b/test/protein.jl
@@ -1,14 +1,12 @@
 @testset "protein.jl" begin
 
     @testset "Protein" begin
-        A = Chain("A", Backbone(randn(3, 4, 5)))
-        B = Chain("B", Backbone(randn(3, 4, 6)))
-        protein = Protein([A, B])
+        A = ProteinChain("A", Backbone(randn(3, 4, 5)))
+        B = ProteinChain("B", Backbone(randn(3, 4, 6)))
+        protein = [A, B]
         @test protein[1] == protein["A"] == A
         @test protein[2] == protein["B"] == B
-        @test length(protein) == 2
         @test length.(protein) == [5, 6]
-        @test summary(protein) == "Protein with 2 chains"
         @test !has_assigned_ss(protein)
     end
 