Retire Sahara: remove repo content

The Sahara project is retiring:
- https://review.opendev.org/c/openstack/governance/+/919374

This commit removes the content of this project repo.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/919376
Change-Id: I65e351ac7287c871ff4b9b38d89d2fbe5aa1ac22
Ghanshyam Mann 2024-05-10 17:29:31 -07:00
parent 48a42bcd45
commit 12d6cc42ae
78 changed files with 8 additions and 5069 deletions

.gitignore

@ -1,30 +0,0 @@
*.egg-info
*.egg[s]
*.log
*.py[co]
.coverage
.testrepository
.tox
.stestr
.venv
.idea
AUTHORS
ChangeLog
build
cover
develop-eggs
dist
doc/build
doc/html
eggs
etc/sahara.conf
etc/sahara/*.conf
etc/sahara/*.topology
sdist
target
tools/lintstack.head.py
tools/pylint_exceptions
doc/source/sample.config
# Files created by releasenotes build
releasenotes/build


@ -1,3 +0,0 @@
[DEFAULT]
test_path=./sahara_plugin_spark/tests/unit
top_dir=./


@ -1,10 +0,0 @@
- project:
    templates:
      - check-requirements
      - openstack-python3-zed-jobs
      - publish-openstack-docs-pti
      - release-notes-jobs-python3
    check:
      jobs:
        - sahara-buildimages-spark:
            voting: false


@ -1,19 +0,0 @@
The source repository for this project can be found at:
https://opendev.org/openstack/sahara-plugin-spark
Pull requests submitted through GitHub are not monitored.
To start contributing to OpenStack, follow the steps in the contribution guide
to set up and use Gerrit:
https://docs.openstack.org/contributors/code-and-documentation/quick-start.html
Bugs should be filed on Storyboard:
https://storyboard.openstack.org/#!/project/openstack/sahara-plugin-spark
For more specific information about contributing to this repository, see the
sahara-plugin-spark contributor guide:
https://docs.openstack.org/sahara-plugin-spark/latest/contributor/contributing.html

LICENSE

@ -1,175 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.


@ -1,38 +1,10 @@
========================
Team and repository tags
========================
This project is no longer maintained.
.. image:: https://governance.openstack.org/tc/badges/sahara.svg
:target: https://governance.openstack.org/tc/reference/tags/index.html
.. Change things from this point on
OpenStack Data Processing ("Sahara") Spark Plugin
==================================================
OpenStack Sahara Spark Plugin provides the users the option to
start Spark clusters on OpenStack Sahara.
Check out OpenStack Sahara documentation to see how to deploy the
Spark Plugin.
Sahara at wiki.openstack.org: https://wiki.openstack.org/wiki/Sahara
Storyboard project: https://storyboard.openstack.org/#!/project/openstack/sahara-plugin-spark
Sahara docs site: https://docs.openstack.org/sahara/latest/
Quickstart guide: https://docs.openstack.org/sahara/latest/user/quickstart.html
How to participate: https://docs.openstack.org/sahara/latest/contributor/how-to-participate.html
Source: https://opendev.org/openstack/sahara-plugin-spark
Bugs and feature requests: https://storyboard.openstack.org/#!/project/openstack/sahara-plugin-spark
Release notes: https://docs.openstack.org/releasenotes/sahara-plugin-spark/
License
-------
Apache License Version 2.0 http://www.apache.org/licenses/LICENSE-2.0
The contents of this repository are still available in the Git
source code management system. To see the contents of this
repository before it reached its end of life, please check out the
previous commit with "git checkout HEAD^1".
For any further questions, please email
openstack-discuss@lists.openstack.org or join #openstack-dev on
OFTC.


@ -1 +0,0 @@
[python: **.py]


@ -1,9 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
openstackdocstheme>=2.2.1 # Apache-2.0
os-api-ref>=1.4.0 # Apache-2.0
reno>=3.1.0 # Apache-2.0
sphinx>=2.0.0,!=2.1.0 # BSD
sphinxcontrib-httpdomain>=1.3.0 # BSD
whereto>=0.3.0 # Apache-2.0


@ -1,214 +0,0 @@
# -*- coding: utf-8 -*-
#
# sahara-plugin-spark documentation build configuration file.
#
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
'reno.sphinxext',
'openstackdocstheme',
]
# openstackdocstheme options
openstackdocs_repo_name = 'openstack/sahara-plugin-spark'
openstackdocs_pdf_link = True
openstackdocs_use_storyboard = True
openstackdocs_projects = [
'sahara'
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
copyright = '2015, Sahara team'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = []
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'native'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'openstackdocs'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
#html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'saharasparkplugin-testsdoc'
# -- Options for LaTeX output --------------------------------------------------
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
('index', 'doc-sahara-plugin-spark.tex', 'Sahara Spark Plugin Documentation',
'Sahara team', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
smartquotes_excludes = {'builders': ['latex']}
# -- Options for manual page output --------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'sahara-plugin-spark', 'sahara-plugin-spark Documentation',
['Sahara team'], 1)
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output ------------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'sahara-plugin-spark', 'sahara-plugin-spark Documentation',
'Sahara team', 'sahara-plugin-spark', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'


@ -1,14 +0,0 @@
============================
So You Want to Contribute...
============================
For general information on contributing to OpenStack, please check out the
`contributor guide <https://docs.openstack.org/contributors/>`_ to get started.
It covers all the basics that are common to all OpenStack projects: the
accounts you need, the basics of interacting with our Gerrit review system, how
we communicate as a community, etc.
sahara-plugin-spark is maintained by the OpenStack Sahara project.
To understand our development process and how you can contribute to it, please
look at the Sahara project's general contributor's page:
http://docs.openstack.org/sahara/latest/contributor/contributing.html


@ -1,8 +0,0 @@
=================
Contributor Guide
=================
.. toctree::
:maxdepth: 2
contributing


@ -1,8 +0,0 @@
Spark plugin for Sahara
=======================
.. toctree::
:maxdepth: 2
user/index
contributor/index


@ -1,9 +0,0 @@
==========
User Guide
==========
.. toctree::
:maxdepth: 2
spark-plugin


@ -1,108 +0,0 @@
Spark Plugin
============
The Spark plugin for sahara provides a way to provision Apache Spark clusters
on OpenStack in a single click and in an easily repeatable fashion.
Currently Spark is installed in standalone mode, with no YARN or Mesos
support.
Images
------
For cluster provisioning, prepared images should be used.
.. list-table:: Support matrix for the `spark` plugin
   :widths: 15 15 20 15 35
   :header-rows: 1

   * - Version
       (image tag)
     - Distribution
     - Build method
     - Version
       (build parameter)
     - Notes
   * - 2.3
     - Ubuntu 16.04, CentOS 7
     - sahara-image-pack
     - 2.3
     - based on CDH 5.11
       use --plugin_version to specify the minor version: 2.3.2 (default),
       2.3.1 or 2.3.0
   * - 2.3
     - Ubuntu 16.04
     - sahara-image-create
     - 2.3.0
     - based on CDH 5.11
   * - 2.2
     - Ubuntu 16.04, CentOS 7
     - sahara-image-pack
     - 2.2
     - based on CDH 5.11
       use --plugin_version to specify the minor version: 2.2.1 (default),
       or 2.2.0
   * - 2.2
     - Ubuntu 16.04
     - sahara-image-create
     - 2.2.0
     - based on CDH 5.11
For more information about building images, refer to
:sahara-doc:`Sahara documentation <user/building-guest-images.html>`.
The Spark plugin requires an image to be tagged in the sahara image registry
with two tags: 'spark' and '<Spark version>' (e.g. '1.6.0').
The image requires a username. For more information, refer to the
:sahara-doc:`registering image <user/registering-image.html>` section
of the Sahara documentation.
Note that the Spark cluster is deployed using the scripts available in the
Spark distribution, which allow the user to start all services (master and
slaves), stop all services and so on. As such (and as opposed to CDH HDFS
daemons), Spark is not deployed as a standard Ubuntu service and if the
virtual machines are rebooted, Spark will not be restarted.
Build settings
~~~~~~~~~~~~~~
When ``sahara-image-create`` is used, you can override a few settings
by exporting the corresponding environment variables
before starting the build command:
* ``SPARK_DOWNLOAD_URL`` - download link for Spark
Spark configuration
-------------------
Spark needs a few parameters to work and has sensible defaults. If needed, they
can be changed when creating the sahara cluster template. No node group
options are available.
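For illustration only (a sketch, not the plugin's documented template format;
the option names are taken from the plugin's SPARK_CONFS definition later in
this change), the Spark section of a cluster template's configuration could
look like:

cluster_configs = {
    "Spark": {
        "Master port": "7077",             # default from SPARK_CONFS
        "Worker memory": "2g",             # overrides the "all" default
        "Minimum cleanup seconds": "86400",
    }
}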
Once the cluster is ready, connect with ssh to the master using the `ubuntu`
user and the appropriate ssh key. Spark is installed in `/opt/spark` and
should be completely configured and ready to start executing jobs. At the
bottom of the cluster information page from the OpenStack dashboard, a link to
the Spark web interface is provided.
Cluster Validation
------------------
When a user creates a Hadoop cluster using the Spark plugin, the cluster
topology requested by the user is verified for consistency.
Currently there are the following limitations in cluster topology for the
Spark plugin:
+ Cluster must contain exactly one HDFS namenode
+ Cluster must contain exactly one Spark master
+ Cluster must contain at least one Spark slave
+ Cluster must contain at least one HDFS datanode
The tested configuration co-locates the NameNode with the master and a
DataNode with each slave to maximize data locality.
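As a rough illustration of those four rules (a sketch only, with assumed
process names; this is not the plugin's actual validation code), the checks
amount to counting processes across the cluster's node groups:

def validate_topology(process_counts):
    # process_counts: mapping of process name -> number of instances
    if process_counts.get("namenode", 0) != 1:
        raise ValueError("Cluster must contain exactly one HDFS namenode")
    if process_counts.get("master", 0) != 1:
        raise ValueError("Cluster must contain exactly one Spark master")
    if process_counts.get("slave", 0) < 1:
        raise ValueError("Cluster must contain at least one Spark slave")
    if process_counts.get("datanode", 0) < 1:
        raise ValueError("Cluster must contain at least one HDFS datanode")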


@ -1,6 +0,0 @@
---
upgrade:
- |
Python 2.7 support has been dropped. Last release of sahara and its plugins
to support python 2.7 is OpenStack Train. The minimum version of Python now
supported by sahara and its plugins is Python 3.6.


@ -1,4 +0,0 @@
---
features:
- |
Adding ability to create spark images using Sahara Image Pack.


@ -1,6 +0,0 @@
===========================
2023.2 Series Release Notes
===========================
.. release-notes::
:branch: stable/2023.2


@ -1,210 +0,0 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Sahara Release Notes documentation build configuration file
extensions = [
'reno.sphinxext',
'openstackdocstheme'
]
# openstackdocstheme options
openstackdocs_repo_name = 'openstack/sahara-plugin-spark'
openstackdocs_use_storyboard = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
copyright = '2015, Sahara Developers'
# Release do not need a version number in the title, they
# cover multiple versions.
# The full version, including alpha/beta/rc tags.
release = ''
# The short X.Y version.
version = ''
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'native'
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'openstackdocs'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
# If false, no index is generated.
# html_use_index = True
# If true, the index is split into individual pages for each letter.
# html_split_index = False
# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'SaharaSparkReleaseNotesdoc'
# -- Options for LaTeX output ---------------------------------------------
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
('index', 'SaharaSparkReleaseNotes.tex',
'Sahara Spark Plugin Release Notes Documentation',
'Sahara Developers', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'saharasparkreleasenotes',
'Sahara Spark Plugin Release Notes Documentation',
['Sahara Developers'], 1)
]
# If true, show URL addresses after external links.
# man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'SaharaSparkReleaseNotes',
'Sahara Spark Plugin Release Notes Documentation',
'Sahara Developers', 'SaharaSparkReleaseNotes',
'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False
# -- Options for Internationalization output ------------------------------
locale_dirs = ['locale/']


@ -1,16 +0,0 @@
===================================
Sahara Spark Plugin Release Notes
===================================
.. toctree::
:maxdepth: 1
unreleased
2023.2
zed
xena
wallaby
victoria
ussuri
train
stein


@ -1,53 +0,0 @@
# Andreas Jaeger <jaegerandi@gmail.com>, 2019. #zanata
# Andreas Jaeger <jaegerandi@gmail.com>, 2020. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-04-24 23:45+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2020-04-25 10:40+0000\n"
"Last-Translator: Andreas Jaeger <jaegerandi@gmail.com>\n"
"Language-Team: German\n"
"Language: de\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
msgid "1.0.0"
msgstr "1.0.0"
msgid "Adding abilitiy to create spark images using Sahara Image Pack."
msgstr "Spark Abbilder können jetzt mit dem Sahara Image Pack erzeugt werden."
msgid "Current Series Release Notes"
msgstr "Aktuelle Serie Releasenotes"
msgid "New Features"
msgstr "Neue Funktionen"
msgid ""
"Python 2.7 support has been dropped. Last release of sahara and its plugins "
"to support python 2.7 is OpenStack Train. The minimum version of Python now "
"supported by sahara and its plugins is Python 3.6."
msgstr ""
"Python 2.7 Unterstützung wurde beendet. Der letzte Release von Sahara und "
"seinen Plugins der Python 2.7 unterstützt ist OpenStack Train. Die minimal "
"Python Version welche von Sahara und seinen Plugins unterstützt wird, ist "
"Python 3.6."
msgid "Sahara Spark Plugin Release Notes"
msgstr "Sahara Spark Plugin Releasenotes"
msgid "Stein Series Release Notes"
msgstr "Stein Serie Releasenotes"
msgid "Train Series Release Notes"
msgstr "Train Serie Releasenotes"
msgid "Upgrade Notes"
msgstr "Aktualisierungsnotizen"
msgid "Ussuri Series Release Notes"
msgstr "Ussuri Serie Releasenotes"


@ -1,57 +0,0 @@
# Andi Chandler <andi@gowling.com>, 2020. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-10-07 22:08+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2020-11-04 12:47+0000\n"
"Last-Translator: Andi Chandler <andi@gowling.com>\n"
"Language-Team: English (United Kingdom)\n"
"Language: en_GB\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
msgid "1.0.0"
msgstr "1.0.0"
msgid "3.0.0"
msgstr "3.0.0"
msgid "Adding abilitiy to create spark images using Sahara Image Pack."
msgstr "Adding ability to create spark images using Sahara Image Pack."
msgid "Current Series Release Notes"
msgstr "Current Series Release Notes"
msgid "New Features"
msgstr "New Features"
msgid ""
"Python 2.7 support has been dropped. Last release of sahara and its plugins "
"to support python 2.7 is OpenStack Train. The minimum version of Python now "
"supported by sahara and its plugins is Python 3.6."
msgstr ""
"Python 2.7 support has been dropped. Last release of Sahara and its plugins "
"to support Python 2.7 is OpenStack Train. The minimum version of Python now "
"supported by Sahara and its plugins is Python 3.6."
msgid "Sahara Spark Plugin Release Notes"
msgstr "Sahara Spark Plugin Release Notes"
msgid "Stein Series Release Notes"
msgstr "Stein Series Release Notes"
msgid "Train Series Release Notes"
msgstr "Train Series Release Notes"
msgid "Upgrade Notes"
msgstr "Upgrade Notes"
msgid "Ussuri Series Release Notes"
msgstr "Ussuri Series Release Notes"
msgid "Victoria Series Release Notes"
msgstr "Victoria Series Release Notes"


@ -1,33 +0,0 @@
# Surit Aryal <aryalsurit@gmail.com>, 2019. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-07-23 14:26+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2019-08-02 08:15+0000\n"
"Last-Translator: Surit Aryal <aryalsurit@gmail.com>\n"
"Language-Team: Nepali\n"
"Language: ne\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
msgid "1.0.0"
msgstr "१.."
msgid "Adding abilitiy to create spark images using Sahara Image Pack."
msgstr "Sahara Image Pack प्रयोग गरेर स्पार्क छविहरू सिर्जना गर्ने क्षमता थप्दै।"
msgid "Current Series Release Notes"
msgstr "Current Series रिलीज नोट्स"
msgid "New Features"
msgstr "नयाँ सुविधाहरू"
msgid "Sahara Spark Plugin Release Notes"
msgstr "Sahara Spark Plugin नोट जारी गर्नुहोस्"
msgid "Stein Series Release Notes"
msgstr "Stein Series नोट जारी गर्नुहोस्"


@ -1,6 +0,0 @@
===================================
Stein Series Release Notes
===================================
.. release-notes::
:branch: stable/stein


@ -1,6 +0,0 @@
==========================
Train Series Release Notes
==========================
.. release-notes::
:branch: stable/train


@ -1,5 +0,0 @@
==============================
Current Series Release Notes
==============================
.. release-notes::


@ -1,6 +0,0 @@
===========================
Ussuri Series Release Notes
===========================
.. release-notes::
:branch: stable/ussuri


@ -1,6 +0,0 @@
=============================
Victoria Series Release Notes
=============================
.. release-notes::
:branch: stable/victoria


@ -1,6 +0,0 @@
============================
Wallaby Series Release Notes
============================
.. release-notes::
:branch: stable/wallaby


@ -1,6 +0,0 @@
=========================
Xena Series Release Notes
=========================
.. release-notes::
:branch: stable/xena


@ -1,6 +0,0 @@
========================
Zed Series Release Notes
========================
.. release-notes::
:branch: stable/zed


@ -1,14 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
pbr!=2.1.0,>=2.0.0 # Apache-2.0
Babel!=2.4.0,>=2.3.4 # BSD
eventlet>=0.26.0 # MIT
oslo.i18n>=3.15.3 # Apache-2.0
oslo.log>=3.36.0 # Apache-2.0
oslo.serialization!=2.19.1,>=2.18.0 # Apache-2.0
oslo.utils>=3.33.0 # Apache-2.0
requests>=2.14.2 # Apache-2.0
sahara>=10.0.0.0b1


@ -1,26 +0,0 @@
# Copyright (c) 2014 Mirantis Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# It's based on oslo.i18n usage in OpenStack Keystone project and
# recommendations from https://docs.openstack.org/oslo.i18n/latest/
# user/usage.html
import oslo_i18n
_translators = oslo_i18n.TranslatorFactory(domain='sahara_plugin_spark')
# The primary translation function using the well-known name "_"
_ = _translators.primary
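A minimal usage sketch (the module path is assumed; this is the package's
i18n helper, exposing "_" so user-facing strings are marked for translation):

from sahara_plugin_spark.i18n import _
message = _("Push configs to nodes")  # marked for translation via oslo_i18n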


@ -1,63 +0,0 @@
# Andreas Jaeger <jaegerandi@gmail.com>, 2019. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark VERSION\n"
"Report-Msgid-Bugs-To: https://bugs.launchpad.net/openstack-i18n/\n"
"POT-Creation-Date: 2019-09-20 17:24+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2019-09-25 06:54+0000\n"
"Last-Translator: Andreas Jaeger <jaegerandi@gmail.com>\n"
"Language-Team: German\n"
"Language: de\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
#, python-format
msgid "%s or more"
msgstr "%s oder mehr"
msgid "1 or more"
msgstr "1 oder mehr"
msgid "Await DataNodes start up"
msgstr "Warten, bis DataNodes gestartet wird"
#, python-format
msgid "Decommission %s"
msgstr "Außerkraftsetzung %s"
#, python-format
msgid "Number of %(dn)s instances should not be less than %(replication)s"
msgstr ""
"Die Anzahl der %(dn)s-Instanzen sollte nicht kleiner als %(replication)s sein"
msgid "Push configs to nodes"
msgstr "Push-Konfigurationen zu Knoten"
#, python-format
msgid "Spark plugin cannot scale nodegroup with processes: %s"
msgstr "Das Spark-Plugin kann Knotengruppen nicht mit Prozessen skalieren: %s"
#, python-format
msgid ""
"Spark plugin cannot shrink cluster because there would be not enough nodes "
"for HDFS replicas (replication factor is %s)"
msgstr ""
"Das Spark-Plug-in kann den Cluster nicht verkleinern, da nicht genügend "
"Knoten für HDFS-Replikate vorhanden sind (der Replikationsfaktor ist%s)"
msgid "Spark {base} or higher required to run {type} jobs"
msgstr "Spark {base} oder höher erforderlich, um {type} Jobs auszuführen"
msgid ""
"This plugin provides an ability to launch Spark on Hadoop CDH cluster "
"without any management consoles."
msgstr ""
"Dieses Plugin bietet die Möglichkeit, Spark auf dem Hadoop CDH-Cluster ohne "
"Verwaltungskonsolen zu starten."
#, python-format
msgid "Waiting on %d DataNodes to start up"
msgstr "Warten auf %d DataNodes zum Starten"


@ -1,62 +0,0 @@
# Andi Chandler <andi@gowling.com>, 2020. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark VERSION\n"
"Report-Msgid-Bugs-To: https://bugs.launchpad.net/openstack-i18n/\n"
"POT-Creation-Date: 2020-04-26 20:56+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2020-05-05 11:25+0000\n"
"Last-Translator: Andi Chandler <andi@gowling.com>\n"
"Language-Team: English (United Kingdom)\n"
"Language: en_GB\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
#, python-format
msgid "%s or more"
msgstr "%s or more"
msgid "1 or more"
msgstr "1 or more"
msgid "Await DataNodes start up"
msgstr "Await DataNodes start up"
#, python-format
msgid "Decommission %s"
msgstr "Decommission %s"
#, python-format
msgid "Number of %(dn)s instances should not be less than %(replication)s"
msgstr "Number of %(dn)s instances should not be less than %(replication)s"
msgid "Push configs to nodes"
msgstr "Push configs to nodes"
#, python-format
msgid "Spark plugin cannot scale nodegroup with processes: %s"
msgstr "Spark plugin cannot scale nodegroup with processes: %s"
#, python-format
msgid ""
"Spark plugin cannot shrink cluster because there would be not enough nodes "
"for HDFS replicas (replication factor is %s)"
msgstr ""
"Spark plugin cannot shrink cluster because there would be not enough nodes "
"for HDFS replicas (replication factor is %s)"
msgid "Spark {base} or higher required to run {type} jobs"
msgstr "Spark {base} or higher required to run {type} jobs"
msgid ""
"This plugin provides an ability to launch Spark on Hadoop CDH cluster "
"without any management consoles."
msgstr ""
"This plugin provides an ability to launch Spark on Hadoop CDH cluster "
"without any management consoles."
#, python-format
msgid "Waiting on %d DataNodes to start up"
msgstr "Waiting on %d DataNodes to start up"


@ -1,64 +0,0 @@
# Andreas Jaeger <jaegerandi@gmail.com>, 2019. #zanata
# suhartono <cloudsuhartono@gmail.com>, 2019. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark VERSION\n"
"Report-Msgid-Bugs-To: https://bugs.launchpad.net/openstack-i18n/\n"
"POT-Creation-Date: 2019-09-30 09:25+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2019-10-08 10:52+0000\n"
"Last-Translator: Andreas Jaeger <jaegerandi@gmail.com>\n"
"Language-Team: Indonesian\n"
"Language: id\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=1; plural=0\n"
#, python-format
msgid "%s or more"
msgstr "%s or more"
msgid "1 or more"
msgstr "1 atau lebih"
msgid "Await DataNodes start up"
msgstr "Tunggu DataNodes mulai"
#, python-format
msgid "Decommission %s"
msgstr "Decommission %s"
#, python-format
msgid "Number of %(dn)s instances should not be less than %(replication)s"
msgstr "Jumlah instance %(dn)s tidak boleh kurang dari %(replication)s"
msgid "Push configs to nodes"
msgstr "Dorong konfigurasi ke node"
#, python-format
msgid "Spark plugin cannot scale nodegroup with processes: %s"
msgstr "Plugin Spark tidak dapat menskala nodegroup dengan proses: %s"
#, python-format
msgid ""
"Spark plugin cannot shrink cluster because there would be not enough nodes "
"for HDFS replicas (replication factor is %s)"
msgstr ""
"Plugin Spark tidak dapat mengecilkan cluster karena tidak akan ada cukup "
"node untuk replika HDFS (faktor replikasi adalah %s)"
msgid "Spark {base} or higher required to run {type} jobs"
msgstr ""
"Spark {base} atau lebih tinggi diperlukan untuk menjalankan jobs {type}"
msgid ""
"This plugin provides an ability to launch Spark on Hadoop CDH cluster "
"without any management consoles."
msgstr ""
"Plugin ini menyediakan kemampuan untuk meluncurkan Spark pada cluster Hadoop "
"CDH tanpa konsol manajemen."
#, python-format
msgid "Waiting on %d DataNodes to start up"
msgstr "Menunggu %d DataNodes untuk memulai"


@ -1,62 +0,0 @@
# Surit Aryal <aryalsurit@gmail.com>, 2019. #zanata
msgid ""
msgstr ""
"Project-Id-Version: sahara-plugin-spark VERSION\n"
"Report-Msgid-Bugs-To: https://bugs.launchpad.net/openstack-i18n/\n"
"POT-Creation-Date: 2019-07-23 14:26+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"PO-Revision-Date: 2019-08-02 08:40+0000\n"
"Last-Translator: Surit Aryal <aryalsurit@gmail.com>\n"
"Language-Team: Nepali\n"
"Language: ne\n"
"X-Generator: Zanata 4.3.3\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
#, python-format
msgid "%s or more"
msgstr "%s वा अधिक"
msgid "1 or more"
msgstr "१ वा अधिक"
msgid "Await DataNodes start up"
msgstr "प्रतीक्षा डाटा Nodes शुरू"
#, python-format
msgid "Decommission %s"
msgstr "Decommission %s"
#, python-format
msgid "Number of %(dn)s instances should not be less than %(replication)s"
msgstr "%(dn)s घटनाहरूको संख्या %(replication)s भन्दा कम हुनुहुन्न"
msgid "Push configs to nodes"
msgstr "कन्फिगरेसन नोडहरूमा पुश गर्नुहोस्"
#, python-format
msgid "Spark plugin cannot scale nodegroup with processes: %s"
msgstr "Spark pluginले प्रक्रियाहरूसँग नोड ग्रुप मापन गर्न सक्दैन: %s"
#, python-format
msgid ""
"Spark plugin cannot shrink cluster because there would be not enough nodes "
"for HDFS replicas (replication factor is %s)"
msgstr ""
"Spark plugin क्लस्टर कम गर्न सक्दैन किनभने त्यहाँ HDFS प्रतिकृतिहरु लागि पर्याप्त नोड्स "
"(प्रतिकृति कारक %s हो)"
msgid "Spark {base} or higher required to run {type} jobs"
msgstr "Spark {base} or higher required to run {type} jobs"
msgid ""
"This plugin provides an ability to launch Spark on Hadoop CDH cluster "
"without any management consoles."
msgstr ""
"यो प्लगइनले कुनै व्यवस्थापन कन्सोल बिना Hadoop CDH क्लस्टरमा Spark सुरू गर्न क्षमता "
"प्रदान गर्दछ।"
#, python-format
msgid "Waiting on %d DataNodes to start up"
msgstr "%d DataNodes सुरू गर्न पर्खँदै"


@ -1,527 +0,0 @@
# Copyright (c) 2014 Hoang Do, Phuc Vo, P. Michiardi, D. Venzano
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from oslo_config import cfg
from oslo_log import log as logging
from sahara.plugins import provisioning as p
from sahara.plugins import swift_helper as swift
from sahara.plugins import topology_helper as topology
from sahara.plugins import utils
LOG = logging.getLogger(__name__)
CONF = cfg.CONF
CORE_DEFAULT = utils.load_hadoop_xml_defaults(
'plugins/spark/resources/core-default.xml', 'sahara_plugin_spark')
HDFS_DEFAULT = utils.load_hadoop_xml_defaults(
'plugins/spark/resources/hdfs-default.xml', 'sahara_plugin_spark')
SWIFT_DEFAULTS = swift.read_default_swift_configs()
XML_CONFS = {
"HDFS": [CORE_DEFAULT, HDFS_DEFAULT, SWIFT_DEFAULTS]
}
_default_executor_classpath = ":".join(
['/usr/lib/hadoop-mapreduce/hadoop-openstack.jar'])
SPARK_CONFS = {
'Spark': {
"OPTIONS": [
{
'name': 'Executor extra classpath',
'description': 'Value for spark.executor.extraClassPath'
' in spark-defaults.conf'
' (default: %s)' % _default_executor_classpath,
'default': '%s' % _default_executor_classpath,
'priority': 2,
},
{
'name': 'Master port',
'description': 'Start the master on a different port'
' (default: 7077)',
'default': '7077',
'priority': 2,
},
{
'name': 'Worker port',
'description': 'Start the Spark worker on a specific port'
' (default: random)',
'default': 'random',
'priority': 2,
},
{
'name': 'Master webui port',
'description': 'Port for the master web UI (default: 8080)',
'default': '8080',
'priority': 1,
},
{
'name': 'Worker webui port',
'description': 'Port for the worker web UI (default: 8081)',
'default': '8081',
'priority': 1,
},
{
'name': 'Worker cores',
'description': 'Total number of cores to allow Spark'
' applications to use on the machine'
' (default: all available cores)',
'default': 'all',
'priority': 2,
},
{
'name': 'Worker memory',
'description': 'Total amount of memory to allow Spark'
' applications to use on the machine, e.g. 1000m,'
' 2g (default: total memory minus 1 GB)',
'default': 'all',
'priority': 1,
},
{
'name': 'Worker instances',
'description': 'Number of worker instances to run on each'
' machine (default: 1)',
'default': '1',
'priority': 2,
},
{
'name': 'Spark home',
'description': 'The location of the spark installation'
' (default: /opt/spark)',
'default': '/opt/spark',
'priority': 2,
},
{
'name': 'Minimum cleanup seconds',
'description': 'Job data will never be purged before this'
' amount of time elapses (default: 86400 = 1 day)',
'default': '86400',
'priority': 2,
},
{
'name': 'Maximum cleanup seconds',
'description': 'Job data will always be purged after this'
' amount of time elapses (default: 1209600 = 14 days)',
'default': '1209600',
'priority': 2,
},
{
'name': 'Minimum cleanup megabytes',
'description': 'No job data will be purged unless the total'
' job data exceeds this size (default: 4096 = 4GB)',
'default': '4096',
'priority': 2,
},
]
}
}
HADOOP_CONF_DIR = "/etc/hadoop/conf"
ENV_CONFS = {
"HDFS": {
'Name Node Heap Size': 'HADOOP_NAMENODE_OPTS=\\"-Xmx%sm\\"',
'Data Node Heap Size': 'HADOOP_DATANODE_OPTS=\\"-Xmx%sm\\"'
}
}
ENABLE_DATA_LOCALITY = p.Config('Enable Data Locality', 'general', 'cluster',
config_type="bool", priority=1,
default_value=True, is_optional=True)
ENABLE_SWIFT = p.Config('Enable Swift', 'general', 'cluster',
config_type="bool", priority=1,
default_value=True, is_optional=False)
DATANODES_STARTUP_TIMEOUT = p.Config(
'DataNodes startup timeout', 'general', 'cluster', config_type='int',
priority=1, default_value=10800, is_optional=True,
description='Timeout for DataNodes startup, in seconds')
# Default set to 1 day, which is the default Keystone token
# expiration time. After the token is expired we can't continue
# scaling anyway.
DECOMMISSIONING_TIMEOUT = p.Config('Decommissioning Timeout', 'general',
'cluster', config_type='int', priority=1,
default_value=86400, is_optional=True,
description='Timeout for datanode'
' decommissioning operation'
' during scaling, in seconds')
HIDDEN_CONFS = ['fs.defaultFS', 'dfs.namenode.name.dir',
'dfs.datanode.data.dir']
CLUSTER_WIDE_CONFS = ['dfs.block.size', 'dfs.permissions', 'dfs.replication',
'dfs.replication.min', 'dfs.replication.max',
'io.file.buffer.size']
PRIORITY_1_CONFS = ['dfs.datanode.du.reserved',
'dfs.datanode.failed.volumes.tolerated',
'dfs.datanode.max.xcievers', 'dfs.datanode.handler.count',
'dfs.namenode.handler.count']
# for now we have not so many cluster-wide configs
# lets consider all of them having high priority
PRIORITY_1_CONFS += CLUSTER_WIDE_CONFS
def _initialise_configs():
configs = []
for service, config_lists in XML_CONFS.items():
for config_list in config_lists:
for config in config_list:
if config['name'] not in HIDDEN_CONFS:
cfg = p.Config(config['name'], service, "node",
is_optional=True, config_type="string",
default_value=str(config['value']),
description=config['description'])
if cfg.default_value in ["true", "false"]:
cfg.config_type = "bool"
cfg.default_value = (cfg.default_value == 'true')
elif utils.is_int(cfg.default_value):
cfg.config_type = "int"
cfg.default_value = int(cfg.default_value)
if config['name'] in CLUSTER_WIDE_CONFS:
cfg.scope = 'cluster'
if config['name'] in PRIORITY_1_CONFS:
cfg.priority = 1
configs.append(cfg)
for service, config_items in ENV_CONFS.items():
for name, param_format_str in config_items.items():
configs.append(p.Config(name, service, "node",
default_value=1024, priority=1,
config_type="int"))
for service, config_items in SPARK_CONFS.items():
for item in config_items['OPTIONS']:
cfg = p.Config(name=item["name"],
description=item["description"],
default_value=item["default"],
applicable_target=service,
scope="cluster", is_optional=True,
priority=item["priority"])
configs.append(cfg)
configs.append(DECOMMISSIONING_TIMEOUT)
configs.append(ENABLE_SWIFT)
configs.append(DATANODES_STARTUP_TIMEOUT)
if CONF.enable_data_locality:
configs.append(ENABLE_DATA_LOCALITY)
return configs
# Initialise plugin Hadoop configurations
PLUGIN_CONFIGS = _initialise_configs()
def get_plugin_configs():
return PLUGIN_CONFIGS
def generate_cfg_from_general(cfg, configs, general_config,
rest_excluded=False):
if 'general' in configs:
for nm in general_config:
if nm not in configs['general'] and not rest_excluded:
configs['general'][nm] = general_config[nm]['default_value']
for name, value in configs['general'].items():
if value:
cfg = _set_config(cfg, general_config, name)
LOG.debug("Applying config: {name}".format(name=name))
else:
cfg = _set_config(cfg, general_config)
return cfg
def _get_hostname(service):
return service.hostname() if service else None
def generate_xml_configs(configs, storage_path, nn_hostname, hadoop_port):
if hadoop_port is None:
hadoop_port = 8020
cfg = {
'fs.defaultFS': 'hdfs://%s:%s' % (nn_hostname, str(hadoop_port)),
'dfs.namenode.name.dir': extract_hadoop_path(storage_path,
'/dfs/nn'),
'dfs.datanode.data.dir': extract_hadoop_path(storage_path,
'/dfs/dn'),
'dfs.hosts': '/etc/hadoop/dn.incl',
'dfs.hosts.exclude': '/etc/hadoop/dn.excl'
}
# inserting user-defined configs
for key, value in extract_hadoop_xml_confs(configs):
cfg[key] = value
# Add the swift defaults if they have not been set by the user
swft_def = []
if is_swift_enabled(configs):
swft_def = SWIFT_DEFAULTS
swift_configs = extract_name_values(swift.get_swift_configs())
for key, value in swift_configs.items():
if key not in cfg:
cfg[key] = value
# invoking applied configs to appropriate xml files
core_all = CORE_DEFAULT + swft_def
if CONF.enable_data_locality:
cfg.update(topology.TOPOLOGY_CONFIG)
# applying vm awareness configs
core_all += topology.vm_awareness_core_config()
xml_configs = {
'core-site': utils.create_hadoop_xml(cfg, core_all),
'hdfs-site': utils.create_hadoop_xml(cfg, HDFS_DEFAULT)
}
return xml_configs
def _get_spark_opt_default(opt_name):
for opt in SPARK_CONFS["Spark"]["OPTIONS"]:
if opt_name == opt["name"]:
return opt["default"]
return None
def generate_spark_env_configs(cluster):
configs = []
# master configuration
sp_master = utils.get_instance(cluster, "master")
configs.append('SPARK_MASTER_IP=' + sp_master.hostname())
# point to the hadoop conf dir so that Spark can read things
# like the swift configuration without having to copy core-site
# to /opt/spark/conf
configs.append('HADOOP_CONF_DIR=' + HADOOP_CONF_DIR)
masterport = utils.get_config_value_or_default("Spark",
"Master port",
cluster)
if masterport and masterport != _get_spark_opt_default("Master port"):
configs.append('SPARK_MASTER_PORT=' + str(masterport))
masterwebport = utils.get_config_value_or_default("Spark",
"Master webui port",
cluster)
if (masterwebport and
masterwebport != _get_spark_opt_default("Master webui port")):
configs.append('SPARK_MASTER_WEBUI_PORT=' + str(masterwebport))
# configuration for workers
workercores = utils.get_config_value_or_default("Spark",
"Worker cores",
cluster)
if workercores and workercores != _get_spark_opt_default("Worker cores"):
configs.append('SPARK_WORKER_CORES=' + str(workercores))
workermemory = utils.get_config_value_or_default("Spark",
"Worker memory",
cluster)
if (workermemory and
workermemory != _get_spark_opt_default("Worker memory")):
configs.append('SPARK_WORKER_MEMORY=' + str(workermemory))
workerport = utils.get_config_value_or_default("Spark",
"Worker port",
cluster)
if workerport and workerport != _get_spark_opt_default("Worker port"):
configs.append('SPARK_WORKER_PORT=' + str(workerport))
workerwebport = utils.get_config_value_or_default("Spark",
"Worker webui port",
cluster)
if (workerwebport and
workerwebport != _get_spark_opt_default("Worker webui port")):
configs.append('SPARK_WORKER_WEBUI_PORT=' + str(workerwebport))
workerinstances = utils.get_config_value_or_default("Spark",
"Worker instances",
cluster)
if (workerinstances and
workerinstances != _get_spark_opt_default("Worker instances")):
configs.append('SPARK_WORKER_INSTANCES=' + str(workerinstances))
return '\n'.join(configs)
# workernames must be a list of worker host names
def generate_spark_slaves_configs(workernames):
return '\n'.join(workernames)
def generate_spark_executor_classpath(cluster):
cp = utils.get_config_value_or_default("Spark",
"Executor extra classpath",
cluster)
if cp:
return "spark.executor.extraClassPath " + cp
return "\n"
def extract_hadoop_environment_confs(configs):
"""Returns environment specific Hadoop configurations.
:returns: list of Hadoop parameters which should be passed via environment
"""
lst = []
for service, srv_confs in configs.items():
if ENV_CONFS.get(service):
for param_name, param_value in srv_confs.items():
for cfg_name, cfg_format_str in ENV_CONFS[service].items():
if param_name == cfg_name and param_value is not None:
lst.append(cfg_format_str % param_value)
return lst
def extract_hadoop_xml_confs(configs):
"""Returns xml specific Hadoop configurations.
:returns: list of Hadoop parameters which should be passed into general
configs like core-site.xml
"""
lst = []
for service, srv_confs in configs.items():
if XML_CONFS.get(service):
for param_name, param_value in srv_confs.items():
for cfg_list in XML_CONFS[service]:
names = [cfg['name'] for cfg in cfg_list]
if param_name in names and param_value is not None:
lst.append((param_name, param_value))
return lst
def generate_hadoop_setup_script(storage_paths, env_configs):
script_lines = ["#!/bin/bash -x"]
script_lines.append("echo -n > /tmp/hadoop-env.sh")
for line in env_configs:
if 'HADOOP' in line:
script_lines.append('echo "%s" >> /tmp/hadoop-env.sh' % line)
script_lines.append("cat /etc/hadoop/hadoop-env.sh >> /tmp/hadoop-env.sh")
script_lines.append("cp /tmp/hadoop-env.sh /etc/hadoop/hadoop-env.sh")
hadoop_log = storage_paths[0] + "/log/hadoop/\\$USER/"
script_lines.append('sed -i "s,export HADOOP_LOG_DIR=.*,'
'export HADOOP_LOG_DIR=%s," /etc/hadoop/hadoop-env.sh'
% hadoop_log)
hadoop_log = storage_paths[0] + "/log/hadoop/hdfs"
script_lines.append('sed -i "s,export HADOOP_SECURE_DN_LOG_DIR=.*,'
'export HADOOP_SECURE_DN_LOG_DIR=%s," '
'/etc/hadoop/hadoop-env.sh' % hadoop_log)
for path in storage_paths:
script_lines.append("chown -R hadoop:hadoop %s" % path)
script_lines.append("chmod -f -R 755 %s ||"
"echo 'Permissions unchanged'" % path)
return "\n".join(script_lines)
def generate_job_cleanup_config(cluster):
spark_config = {
'minimum_cleanup_megabytes': utils.get_config_value_or_default(
"Spark", "Minimum cleanup megabytes", cluster),
'minimum_cleanup_seconds': utils.get_config_value_or_default(
"Spark", "Minimum cleanup seconds", cluster),
'maximum_cleanup_seconds': utils.get_config_value_or_default(
"Spark", "Maximum cleanup seconds", cluster)
}
job_conf = {
'valid': (
_convert_config_to_int(
spark_config['maximum_cleanup_seconds']) > 0 and
_convert_config_to_int(
spark_config['minimum_cleanup_megabytes']) > 0 and
_convert_config_to_int(
spark_config['minimum_cleanup_seconds']) > 0)
}
if job_conf['valid']:
job_conf['cron'] = utils.get_file_text(
'plugins/spark/resources/spark-cleanup.cron',
'sahara_plugin_spark')
job_cleanup_script = utils.get_file_text(
'plugins/spark/resources/tmp-cleanup.sh.template',
'sahara_plugin_spark')
job_conf['script'] = job_cleanup_script.format(**spark_config)
return job_conf
def _convert_config_to_int(config_value):
try:
return int(config_value)
except (TypeError, ValueError):
return -1
def extract_name_values(configs):
return {cfg['name']: cfg['value'] for cfg in configs}
def make_hadoop_path(base_dirs, suffix):
return [base_dir + suffix for base_dir in base_dirs]
def extract_hadoop_path(lst, hadoop_dir):
if lst:
return ",".join(make_hadoop_path(lst, hadoop_dir))
def _set_config(cfg, gen_cfg, name=None):
if name in gen_cfg:
cfg.update(gen_cfg[name]['conf'])
if name is None:
for name in gen_cfg:
cfg.update(gen_cfg[name]['conf'])
return cfg
def _get_general_config_value(conf, option):
if 'general' in conf and option.name in conf['general']:
return conf['general'][option.name]
return option.default_value
def _get_general_cluster_config_value(cluster, option):
return _get_general_config_value(cluster.cluster_configs, option)
def is_data_locality_enabled(cluster):
if not CONF.enable_data_locality:
return False
return _get_general_cluster_config_value(cluster, ENABLE_DATA_LOCALITY)
def is_swift_enabled(configs):
return _get_general_config_value(configs, ENABLE_SWIFT)
def get_decommissioning_timeout(cluster):
return _get_general_cluster_config_value(cluster, DECOMMISSIONING_TIMEOUT)
def get_port_from_config(service, name, cluster=None):
address = utils.get_config_value_or_default(service, name, cluster)
return utils.get_port_from_address(address)
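# --- Editorial usage sketch (not part of the original module) ---
# A minimal demonstration of the pure helpers defined above; the storage
# paths, worker names and env settings below are made up for the example.
if __name__ == '__main__':
    storage = ['/volumes/disk1', '/volumes/disk2']
    # ['/volumes/disk1/dfs/dn', '/volumes/disk2/dfs/dn']
    print(make_hadoop_path(storage, '/dfs/dn'))
    # '/volumes/disk1/dfs/nn,/volumes/disk2/dfs/nn' (a dfs.namenode.name.dir value)
    print(extract_hadoop_path(storage, '/dfs/nn'))
    # {'fs.defaultFS': 'hdfs://nn:8020'}
    print(extract_name_values([{'name': 'fs.defaultFS',
                                'value': 'hdfs://nn:8020'}]))
    # 'worker-1\nworker-2' -> contents of Spark's conf/slaves file
    print(generate_spark_slaves_configs(['worker-1', 'worker-2']))
    # A bash script that rewrites hadoop-env.sh and fixes ownership of the
    # storage paths, as assembled by generate_hadoop_setup_script() above.
    print(generate_hadoop_setup_script(storage,
                                       ['HADOOP_NAMENODE_OPTS=-Xmx512m']))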

View File

@ -1,60 +0,0 @@
# Copyright (c) 2014 Mirantis Inc.
# Copyright (c) 2015 ISPRAS
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from sahara.plugins import edp
from sahara.plugins import exceptions as ex
from sahara.plugins import utils as plugin_utils
from sahara_plugin_spark.i18n import _
class EdpEngine(edp.PluginsSparkJobEngine):
edp_base_version = "1.6.0"
def __init__(self, cluster):
super(EdpEngine, self).__init__(cluster)
self.master = plugin_utils.get_instance(cluster, "master")
self.plugin_params["spark-user"] = ""
self.plugin_params["spark-submit"] = os.path.join(
plugin_utils.
get_config_value_or_default("Spark", "Spark home", self.cluster),
"bin/spark-submit")
self.plugin_params["deploy-mode"] = "client"
port_str = str(
plugin_utils.get_config_value_or_default(
"Spark", "Master port", self.cluster))
self.plugin_params["master"] = ('spark://%(host)s:' + port_str)
driver_cp = plugin_utils.get_config_value_or_default(
"Spark", "Executor extra classpath", self.cluster)
self.plugin_params["driver-class-path"] = driver_cp
@staticmethod
def edp_supported(version):
return version >= EdpEngine.edp_base_version
@staticmethod
def job_type_supported(job_type):
return job_type in edp.PluginsSparkJobEngine.get_supported_job_types()
def validate_job_execution(self, cluster, job, data):
if not self.edp_supported(cluster.hadoop_version):
raise ex.PluginInvalidDataException(
_('Spark {base} or higher required to run {type} jobs').format(
base=EdpEngine.edp_base_version, type=job.type))
super(EdpEngine, self).validate_job_execution(cluster, job, data)
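# --- Editorial note (not part of the original module) ---
# edp_supported() above compares version strings lexicographically.  For the
# versions this plugin ships ('1.6.0', '2.1.0', '2.2', '2.3') that happens to
# give the intended ordering, but it is not a general-purpose version compare:
if __name__ == '__main__':
    print('2.3' >= EdpEngine.edp_base_version)  # True, as intended
    print('10.0' >= '9.0')  # False: string comparison, not numeric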

View File

@ -1,44 +0,0 @@
# Copyright (c) 2019 Red Hat, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from sahara.plugins import images
from sahara.plugins import utils as plugin_utils
_validator = images.SaharaImageValidator.from_yaml(
'plugins/spark/resources/images/image.yaml',
resource_roots=['plugins/spark/resources/images'],
package='sahara_plugin_spark')
def get_image_arguments():
return _validator.get_argument_list()
def pack_image(remote, test_only=False, image_arguments=None):
_validator.validate(remote, test_only=test_only,
image_arguments=image_arguments)
def validate_images(cluster, test_only=False, image_arguments=None):
image_arguments = get_image_arguments()
if not test_only:
instances = plugin_utils.get_instances(cluster)
else:
# validate only the first instance in test mode, but keep it iterable
instances = plugin_utils.get_instances(cluster)[:1]
for instance in instances:
with instance.remote() as r:
_validator.validate(r, test_only=test_only,
image_arguments=image_arguments)

View File

@ -1,588 +0,0 @@
# Copyright (c) 2014 Hoang Do, Phuc Vo, P. Michiardi, D. Venzano
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import os
from oslo_config import cfg
from oslo_log import log as logging
from sahara.plugins import conductor
from sahara.plugins import context
from sahara.plugins import exceptions as ex
from sahara.plugins import provisioning as p
from sahara.plugins import recommendations_utils as ru
from sahara.plugins import swift_helper
from sahara.plugins import topology_helper as th
from sahara.plugins import utils
from sahara_plugin_spark.i18n import _
from sahara_plugin_spark.plugins.spark import config_helper as c_helper
from sahara_plugin_spark.plugins.spark import edp_engine
from sahara_plugin_spark.plugins.spark import images
from sahara_plugin_spark.plugins.spark import run_scripts as run
from sahara_plugin_spark.plugins.spark import scaling as sc
from sahara_plugin_spark.plugins.spark import shell_engine
LOG = logging.getLogger(__name__)
CONF = cfg.CONF
class SparkProvider(p.ProvisioningPluginBase):
def __init__(self):
self.processes = {
"HDFS": ["namenode", "datanode"],
"Spark": ["master", "slave"]
}
def get_title(self):
return "Apache Spark"
def get_description(self):
return _("This plugin provides the ability to launch Spark on a "
"Hadoop CDH cluster without any management consoles.")
def get_labels(self):
default = {'enabled': {'status': True}, 'stable': {'status': True}}
deprecated = {'enabled': {'status': True},
'deprecated': {'status': True}}
result = {'plugin_labels': copy.deepcopy(default)}
stable_versions = ['2.3', '2.2']
result['version_labels'] = {
version: copy.deepcopy(
default if version in stable_versions else deprecated
) for version in self.get_versions()
}
return result
def get_versions(self):
return ['2.3', '2.2', '2.1.0', '1.6.0']
def get_configs(self, hadoop_version):
return c_helper.get_plugin_configs()
def get_node_processes(self, hadoop_version):
return self.processes
def validate(self, cluster):
nn_count = sum([ng.count for ng
in utils.get_node_groups(cluster, "namenode")])
if nn_count != 1:
raise ex.InvalidComponentCountException("namenode", 1, nn_count)
dn_count = sum([ng.count for ng
in utils.get_node_groups(cluster, "datanode")])
if dn_count < 1:
raise ex.InvalidComponentCountException("datanode", _("1 or more"),
dn_count)
rep_factor = utils.get_config_value_or_default('HDFS',
"dfs.replication",
cluster)
if dn_count < rep_factor:
raise ex.InvalidComponentCountException(
'datanode', _('%s or more') % rep_factor, dn_count,
_('Number of %(dn)s instances should not be less '
'than %(replication)s')
% {'dn': 'datanode', 'replication': 'dfs.replication'})
# validate Spark Master Node and Spark Slaves
sm_count = sum([ng.count for ng
in utils.get_node_groups(cluster, "master")])
if sm_count < 1:
raise ex.RequiredServiceMissingException("Spark master")
if sm_count >= 2:
raise ex.InvalidComponentCountException("Spark master", "1",
sm_count)
sl_count = sum([ng.count for ng
in utils.get_node_groups(cluster, "slave")])
if sl_count < 1:
raise ex.InvalidComponentCountException("Spark slave",
_("1 or more"),
sl_count)
def update_infra(self, cluster):
pass
def configure_cluster(self, cluster):
self._setup_instances(cluster)
@utils.event_wrapper(
True, step=utils.start_process_event_message("NameNode"))
def _start_namenode(self, nn_instance):
with utils.get_remote(nn_instance) as r:
run.format_namenode(r)
run.start_processes(r, "namenode")
def start_spark(self, cluster):
sm_instance = utils.get_instance(cluster, "master")
if sm_instance:
self._start_spark(cluster, sm_instance)
@utils.event_wrapper(
True, step=utils.start_process_event_message("SparkMasterNode"))
def _start_spark(self, cluster, sm_instance):
with utils.get_remote(sm_instance) as r:
run.start_spark_master(r, self._spark_home(cluster))
LOG.info("Spark service has been started")
def start_cluster(self, cluster):
nn_instance = utils.get_instance(cluster, "namenode")
dn_instances = utils.get_instances(cluster, "datanode")
# Start the name node
self._start_namenode(nn_instance)
# start the data nodes
self._start_datanode_processes(dn_instances)
run.await_datanodes(cluster)
LOG.info("Hadoop services have been started")
with utils.get_remote(nn_instance) as r:
r.execute_command("sudo -u hdfs hdfs dfs -mkdir -p /user/$USER/")
r.execute_command("sudo -u hdfs hdfs dfs -chown $USER "
"/user/$USER/")
# start spark nodes
self.start_spark(cluster)
swift_helper.install_ssl_certs(utils.get_instances(cluster))
LOG.info('Cluster has been started successfully')
self._set_cluster_info(cluster)
def _spark_home(self, cluster):
return utils.get_config_value_or_default("Spark",
"Spark home",
cluster)
def _extract_configs_to_extra(self, cluster):
sp_master = utils.get_instance(cluster, "master")
sp_slaves = utils.get_instances(cluster, "slave")
extra = dict()
config_master = config_slaves = ''
if sp_master is not None:
config_master = c_helper.generate_spark_env_configs(cluster)
if sp_slaves is not None:
slavenames = []
for slave in sp_slaves:
slavenames.append(slave.hostname())
config_slaves = c_helper.generate_spark_slaves_configs(slavenames)
else:
config_slaves = "\n"
# Any node that might be used to run spark-submit will need
# these libs for swift integration
config_defaults = c_helper.generate_spark_executor_classpath(cluster)
extra['job_cleanup'] = c_helper.generate_job_cleanup_config(cluster)
extra['sp_master'] = config_master
extra['sp_slaves'] = config_slaves
extra['sp_defaults'] = config_defaults
if c_helper.is_data_locality_enabled(cluster):
topology_data = th.generate_topology_map(
cluster, CONF.enable_hypervisor_awareness)
extra['topology_data'] = "\n".join(
[k + " " + v for k, v in topology_data.items()]) + "\n"
return extra
def _add_instance_ng_related_to_extra(self, cluster, instance, extra):
extra = extra.copy()
ng = instance.node_group
nn = utils.get_instance(cluster, "namenode")
extra['xml'] = c_helper.generate_xml_configs(
ng.configuration(), instance.storage_paths(), nn.hostname(), None)
extra['setup_script'] = c_helper.generate_hadoop_setup_script(
instance.storage_paths(),
c_helper.extract_hadoop_environment_confs(ng.configuration()))
return extra
def _start_datanode_processes(self, dn_instances):
if len(dn_instances) == 0:
return
utils.add_provisioning_step(
dn_instances[0].cluster_id,
utils.start_process_event_message("DataNodes"), len(dn_instances))
with context.PluginsThreadGroup() as tg:
for i in dn_instances:
tg.spawn('spark-start-dn-%s' % i.instance_name,
self._start_datanode, i)
@utils.event_wrapper(mark_successful_on_exit=True)
def _start_datanode(self, instance):
with instance.remote() as r:
run.start_processes(r, "datanode")
def _setup_instances(self, cluster, instances=None):
extra = self._extract_configs_to_extra(cluster)
if instances is None:
instances = utils.get_instances(cluster)
self._push_configs_to_nodes(cluster, extra, instances)
def _push_configs_to_nodes(self, cluster, extra, new_instances):
all_instances = utils.get_instances(cluster)
utils.add_provisioning_step(
cluster.id, _("Push configs to nodes"), len(all_instances))
with context.PluginsThreadGroup() as tg:
for instance in all_instances:
extra = self._add_instance_ng_related_to_extra(
cluster, instance, extra)
if instance in new_instances:
tg.spawn('spark-configure-%s' % instance.instance_name,
self._push_configs_to_new_node, cluster,
extra, instance)
else:
tg.spawn('spark-reconfigure-%s' % instance.instance_name,
self._push_configs_to_existing_node, cluster,
extra, instance)
@utils.event_wrapper(mark_successful_on_exit=True)
def _push_configs_to_new_node(self, cluster, extra, instance):
files_hadoop = {
os.path.join(c_helper.HADOOP_CONF_DIR,
"core-site.xml"): extra['xml']['core-site'],
os.path.join(c_helper.HADOOP_CONF_DIR,
"hdfs-site.xml"): extra['xml']['hdfs-site'],
}
sp_home = self._spark_home(cluster)
files_spark = {
os.path.join(sp_home, 'conf/spark-env.sh'): extra['sp_master'],
os.path.join(sp_home, 'conf/slaves'): extra['sp_slaves'],
os.path.join(sp_home,
'conf/spark-defaults.conf'): extra['sp_defaults']
}
files_init = {
'/tmp/sahara-hadoop-init.sh': extra['setup_script'],
'id_rsa': cluster.management_private_key,
'authorized_keys': cluster.management_public_key
}
# pietro: This is required because the (secret) key is not stored in
# .ssh which hinders password-less ssh required by spark scripts
key_cmd = ('sudo cp $HOME/id_rsa $HOME/.ssh/; '
'sudo chown $USER $HOME/.ssh/id_rsa; '
'sudo chmod 600 $HOME/.ssh/id_rsa')
storage_paths = instance.storage_paths()
dn_path = ' '.join(c_helper.make_hadoop_path(storage_paths,
'/dfs/dn'))
nn_path = ' '.join(c_helper.make_hadoop_path(storage_paths,
'/dfs/nn'))
hdfs_dir_cmd = ('sudo mkdir -p %(nn_path)s %(dn_path)s &&'
'sudo chown -R hdfs:hadoop %(nn_path)s %(dn_path)s &&'
'sudo chmod 755 %(nn_path)s %(dn_path)s' %
{"nn_path": nn_path, "dn_path": dn_path})
with utils.get_remote(instance) as r:
r.execute_command(
'sudo chown -R $USER:$USER /etc/hadoop'
)
r.execute_command(
'sudo chown -R $USER:$USER %s' % sp_home
)
r.write_files_to(files_hadoop)
r.write_files_to(files_spark)
r.write_files_to(files_init)
r.execute_command(
'sudo chmod 0500 /tmp/sahara-hadoop-init.sh'
)
r.execute_command(
'sudo /tmp/sahara-hadoop-init.sh '
'>> /tmp/sahara-hadoop-init.log 2>&1')
r.execute_command(hdfs_dir_cmd)
r.execute_command(key_cmd)
if c_helper.is_data_locality_enabled(cluster):
r.write_file_to(
'/etc/hadoop/topology.sh',
utils.get_file_text(
'plugins/spark/resources/topology.sh',
'sahara_plugin_spark'))
r.execute_command(
'sudo chmod +x /etc/hadoop/topology.sh'
)
self._write_topology_data(r, cluster, extra)
self._push_master_configs(r, cluster, extra, instance)
self._push_cleanup_job(r, cluster, extra, instance)
@utils.event_wrapper(mark_successful_on_exit=True)
def _push_configs_to_existing_node(self, cluster, extra, instance):
node_processes = instance.node_group.node_processes
need_update_hadoop = (c_helper.is_data_locality_enabled(cluster) or
'namenode' in node_processes)
need_update_spark = ('master' in node_processes or
'slave' in node_processes)
if need_update_spark:
sp_home = self._spark_home(cluster)
files = {
os.path.join(sp_home,
'conf/spark-env.sh'): extra['sp_master'],
os.path.join(sp_home, 'conf/slaves'): extra['sp_slaves'],
os.path.join(
sp_home,
'conf/spark-defaults.conf'): extra['sp_defaults']
}
r = utils.get_remote(instance)
r.write_files_to(files)
self._push_cleanup_job(r, cluster, extra, instance)
if need_update_hadoop:
with utils.get_remote(instance) as r:
self._write_topology_data(r, cluster, extra)
self._push_master_configs(r, cluster, extra, instance)
def _write_topology_data(self, r, cluster, extra):
if c_helper.is_data_locality_enabled(cluster):
topology_data = extra['topology_data']
r.write_file_to('/etc/hadoop/topology.data', topology_data)
def _push_master_configs(self, r, cluster, extra, instance):
node_processes = instance.node_group.node_processes
if 'namenode' in node_processes:
self._push_namenode_configs(cluster, r)
def _push_cleanup_job(self, r, cluster, extra, instance):
node_processes = instance.node_group.node_processes
if 'master' in node_processes:
if extra['job_cleanup']['valid']:
r.write_file_to('/etc/hadoop/tmp-cleanup.sh',
extra['job_cleanup']['script'])
r.execute_command("chmod 755 /etc/hadoop/tmp-cleanup.sh")
cmd = 'sudo sh -c \'echo "%s" > /etc/cron.d/spark-cleanup\''
r.execute_command(cmd % extra['job_cleanup']['cron'])
else:
r.execute_command("sudo rm -f /etc/hadoop/tmp-cleanup.sh")
r.execute_command("sudo rm -f /etc/cron.d/spark-cleanup")
def _push_namenode_configs(self, cluster, r):
r.write_file_to('/etc/hadoop/dn.incl',
utils.generate_fqdn_host_names(
utils.get_instances(cluster, "datanode")))
r.write_file_to('/etc/hadoop/dn.excl', '')
def _set_cluster_info(self, cluster):
nn = utils.get_instance(cluster, "namenode")
sp_master = utils.get_instance(cluster, "master")
info = {}
if nn:
address = utils.get_config_value_or_default(
'HDFS', 'dfs.http.address', cluster)
port = address[address.rfind(':') + 1:]
info['HDFS'] = {
'Web UI': 'http://%s:%s' % (nn.get_ip_or_dns_name(), port)
}
info['HDFS']['NameNode'] = 'hdfs://%s:8020' % nn.hostname()
if sp_master:
port = utils.get_config_value_or_default(
'Spark', 'Master webui port', cluster)
if port is not None:
info['Spark'] = {
'Web UI': 'http://%s:%s' % (
sp_master.get_ip_or_dns_name(), port)
}
ctx = context.ctx()
conductor.cluster_update(ctx, cluster, {'info': info})
# Scaling
def validate_scaling(self, cluster, existing, additional):
self._validate_existing_ng_scaling(cluster, existing)
self._validate_additional_ng_scaling(cluster, additional)
def decommission_nodes(self, cluster, instances):
sls = utils.get_instances(cluster, "slave")
dns = utils.get_instances(cluster, "datanode")
decommission_dns = False
decommission_sls = False
for i in instances:
if 'datanode' in i.node_group.node_processes:
dns.remove(i)
decommission_dns = True
if 'slave' in i.node_group.node_processes:
sls.remove(i)
decommission_sls = True
nn = utils.get_instance(cluster, "namenode")
spark_master = utils.get_instance(cluster, "master")
if decommission_sls:
sc.decommission_sl(spark_master, instances, sls)
if decommission_dns:
sc.decommission_dn(nn, instances, dns)
def scale_cluster(self, cluster, instances):
master = utils.get_instance(cluster, "master")
r_master = utils.get_remote(master)
run.stop_spark(r_master, self._spark_home(cluster))
self._setup_instances(cluster, instances)
nn = utils.get_instance(cluster, "namenode")
run.refresh_nodes(utils.get_remote(nn), "dfsadmin")
dn_instances = [instance for instance in instances if
'datanode' in instance.node_group.node_processes]
self._start_datanode_processes(dn_instances)
swift_helper.install_ssl_certs(instances)
run.start_spark_master(r_master, self._spark_home(cluster))
LOG.info("Spark master service has been restarted")
def _get_scalable_processes(self):
return ["datanode", "slave"]
def _validate_additional_ng_scaling(self, cluster, additional):
scalable_processes = self._get_scalable_processes()
for ng_id in additional:
ng = utils.get_by_id(cluster.node_groups, ng_id)
if not set(ng.node_processes).issubset(scalable_processes):
raise ex.NodeGroupCannotBeScaled(
ng.name, _("Spark plugin cannot scale nodegroup"
" with processes: %s") %
' '.join(ng.node_processes))
def _validate_existing_ng_scaling(self, cluster, existing):
scalable_processes = self._get_scalable_processes()
dn_to_delete = 0
for ng in cluster.node_groups:
if ng.id in existing:
if ng.count > existing[ng.id] and ("datanode" in
ng.node_processes):
dn_to_delete += ng.count - existing[ng.id]
if not set(ng.node_processes).issubset(scalable_processes):
raise ex.NodeGroupCannotBeScaled(
ng.name, _("Spark plugin cannot scale nodegroup"
" with processes: %s") %
' '.join(ng.node_processes))
dn_amount = len(utils.get_instances(cluster, "datanode"))
rep_factor = utils.get_config_value_or_default('HDFS',
"dfs.replication",
cluster)
if dn_to_delete > 0 and dn_amount - dn_to_delete < rep_factor:
raise ex.ClusterCannotBeScaled(
cluster.name, _("Spark plugin cannot shrink cluster because "
"there would not be enough nodes for HDFS "
"replicas (replication factor is %s)") %
rep_factor)
def get_edp_engine(self, cluster, job_type):
if edp_engine.EdpEngine.job_type_supported(job_type):
return edp_engine.EdpEngine(cluster)
if shell_engine.ShellEngine.job_type_supported(job_type):
return shell_engine.ShellEngine(cluster)
return None
def get_edp_job_types(self, versions=None):
res = {}
for vers in self.get_versions():
if not versions or vers in versions:
res[vers] = shell_engine.ShellEngine.get_supported_job_types()
if edp_engine.EdpEngine.edp_supported(vers):
res[vers].extend(
edp_engine.EdpEngine.get_supported_job_types())
return res
def get_edp_config_hints(self, job_type, version):
if (edp_engine.EdpEngine.edp_supported(version) and
edp_engine.EdpEngine.job_type_supported(job_type)):
return edp_engine.EdpEngine.get_possible_job_config(job_type)
if shell_engine.ShellEngine.job_type_supported(job_type):
return shell_engine.ShellEngine.get_possible_job_config(job_type)
return {}
def get_open_ports(self, node_group):
cluster = node_group.cluster
ports_map = {
'namenode': [8020, 50070, 50470],
'datanode': [50010, 1004, 50075, 1006, 50020],
'master': [
int(utils.get_config_value_or_default("Spark", "Master port",
cluster)),
int(utils.get_config_value_or_default("Spark",
"Master webui port",
cluster)),
],
'slave': [
int(utils.get_config_value_or_default("Spark",
"Worker webui port",
cluster))
]
}
ports = []
for process in node_group.node_processes:
if process in ports_map:
ports.extend(ports_map[process])
return ports
def recommend_configs(self, cluster, scaling=False):
want_to_configure = {
'cluster_configs': {
'dfs.replication': ('HDFS', 'dfs.replication')
}
}
provider = ru.HadoopAutoConfigsProvider(
want_to_configure, self.get_configs(
cluster.hadoop_version), cluster, scaling)
provider.apply_recommended_configs()
def get_image_arguments(self, hadoop_version):
if hadoop_version in ['1.6.0', '2.1.0']:
return NotImplemented
return images.get_image_arguments()
def pack_image(self, hadoop_version, remote,
test_only=False, image_arguments=None):
images.pack_image(remote, test_only=test_only,
image_arguments=image_arguments)
def validate_images(self, cluster, test_only=False, image_arguments=None):
if cluster.hadoop_version not in ['1.6.0', '2.1.0']:
images.validate_images(cluster,
test_only=test_only,
image_arguments=image_arguments)
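# --- Editorial usage sketch (not part of the original module) ---
# get_labels() above marks '2.3' and '2.2' as stable and the older versions as
# deprecated.  A quick way to inspect the computed structure (this assumes the
# sahara plugin libraries are installed so the imports at the top resolve):
if __name__ == '__main__':
    import pprint
    pprint.pprint(SparkProvider().get_labels())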

View File

@ -1,21 +0,0 @@
Apache Spark and HDFS Configurations for Sahara
===============================================
This directory contains default XML configuration files and Spark scripts:
* core-default.xml,
* hdfs-default.xml,
* spark-env.sh.template,
* topology.sh
These files are used by Sahara's plugin for Apache Spark and Cloudera HDFS.
XML config files were taken from here:
* https://github.com/apache/hadoop-common/blob/release-1.2.1/src/core/core-default.xml
* https://github.com/apache/hadoop-common/blob/release-1.2.1/src/hdfs/hdfs-default.xml
Cloudera packages use the same configuration files as standard Apache Hadoop.
XML configs are used to expose default Hadoop configurations to users through
Sahara's REST API. This allows users to override some config values, which are then
pushed to the provisioned VMs running Hadoop services as part of the appropriate
xml config file.
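As a brief editorial illustration of that override flow, a user-supplied value
ends up as a standard Hadoop property element in the generated config file.
The real rendering is done by Sahara's own XML helpers; the snippet below is
only a hypothetical sketch, not the plugin's code::

    import xml.etree.ElementTree as ET

    def render_property(name, value):
        # Build <property><name>...</name><value>...</value></property>
        prop = ET.Element('property')
        ET.SubElement(prop, 'name').text = name
        ET.SubElement(prop, 'value').text = str(value)
        return ET.tostring(prop, encoding='unicode')

    # e.g. overriding the HDFS replication factor to 2
    print(render_property('dfs.replication', 2))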

View File

@ -1,632 +0,0 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into core-site.xml and change them -->
<!-- there. If core-site.xml does not already exist, create it. -->
<configuration>
<!--- global properties -->
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
<property>
<name>hadoop.http.filter.initializers</name>
<value></value>
<description>A comma separated list of class names. Each class in the list
must extend org.apache.hadoop.http.FilterInitializer. The corresponding
Filter will be initialized. Then, the Filter will be applied to all user
facing jsp and servlet web pages. The ordering of the list defines the
ordering of the filters.</description>
</property>
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
<description>Class for user to group mapping (get groups for a given user)
</description>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
<description>Is service-level authorization enabled?</description>
</property>
<property>
<name>hadoop.security.instrumentation.requires.admin</name>
<value>false</value>
<description>
Indicates if administrator ACLs are required to access
instrumentation servlets (JMX, METRICS, CONF, STACKS).
</description>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
<description>Possible values are simple (no authentication), and kerberos
</description>
</property>
<property>
<name>hadoop.security.token.service.use_ip</name>
<value>true</value>
<description>Controls whether tokens always use IP addresses. DNS changes
will not be detected if this option is enabled. Existing client connections
that break will always reconnect to the IP of the original host. New clients
will connect to the host's new IP but fail to locate a token. Disabling
this option will allow existing and new clients to detect an IP change and
continue to locate the new host's token.
</description>
</property>
<property>
<name>hadoop.security.use-weak-http-crypto</name>
<value>false</value>
<description>If enabled, use KSSL to authenticate HTTP connections to the
NameNode. Due to a bug in JDK6, using KSSL requires one to configure
Kerberos tickets to use encryption types that are known to be
cryptographically weak. If disabled, SPNEGO will be used for HTTP
authentication, which supports stronger encryption types.
</description>
</property>
<!--
<property>
<name>hadoop.security.service.user.name.key</name>
<value></value>
<description>Name of the kerberos principal of the user that owns
a given service daemon
</description>
</property>
-->
<!--- logging properties -->
<property>
<name>hadoop.logfile.size</name>
<value>10000000</value>
<description>The max size of each log file</description>
</property>
<property>
<name>hadoop.logfile.count</name>
<value>10</value>
<description>The max number of log files</description>
</property>
<!-- i/o properties -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
<description>The size of buffer for use in sequence files.
The size of this buffer should probably be a multiple of hardware
page size (4096 on Intel x86), and it determines how much data is
buffered during read and write operations.</description>
</property>
<property>
<name>io.bytes.per.checksum</name>
<value>512</value>
<description>The number of bytes per checksum. Must not be larger than
io.file.buffer.size.</description>
</property>
<property>
<name>io.skip.checksum.errors</name>
<value>false</value>
<description>If true, when a checksum error is encountered while
reading a sequence file, entries are skipped, instead of throwing an
exception.</description>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
<description>A list of the compression codec classes that can be used
for compression/decompression.</description>
</property>
<property>
<name>io.serializations</name>
<value>org.apache.hadoop.io.serializer.WritableSerialization</value>
<description>A list of serialization classes that can be used for
obtaining serializers and deserializers.</description>
</property>
<!-- file system properties -->
<property>
<name>fs.defaultFS</name>
<value>file:///</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>fs.trash.interval</name>
<value>0</value>
<description>Number of minutes between trash checkpoints.
If zero, the trash feature is disabled.
</description>
</property>
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3.S3FileSystem</value>
<description>The FileSystem for s3: uris.</description>
</property>
<property>
<name>fs.s3n.impl</name>
<value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
<description>The FileSystem for s3n: (Native S3) uris.</description>
</property>
<property>
<name>fs.kfs.impl</name>
<value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
<description>The FileSystem for kfs: uris.</description>
</property>
<property>
<name>fs.hftp.impl</name>
<value>org.apache.hadoop.hdfs.HftpFileSystem</value>
</property>
<property>
<name>fs.hsftp.impl</name>
<value>org.apache.hadoop.hdfs.HsftpFileSystem</value>
</property>
<property>
<name>fs.webhdfs.impl</name>
<value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
</property>
<property>
<name>fs.ftp.impl</name>
<value>org.apache.hadoop.fs.ftp.FTPFileSystem</value>
<description>The FileSystem for ftp: uris.</description>
</property>
<property>
<name>fs.ramfs.impl</name>
<value>org.apache.hadoop.fs.InMemoryFileSystem</value>
<description>The FileSystem for ramfs: uris.</description>
</property>
<property>
<name>fs.har.impl</name>
<value>org.apache.hadoop.fs.HarFileSystem</value>
<description>The filesystem for Hadoop archives. </description>
</property>
<property>
<name>fs.har.impl.disable.cache</name>
<value>true</value>
<description>Don't cache 'har' filesystem instances.</description>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>${hadoop.tmp.dir}/dfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary images to merge.
If this is a comma-delimited list of directories then the image is
replicated in all of the directories for redundancy.
</description>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>${fs.checkpoint.dir}</value>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary edits to merge.
If this is a comma-delimited list of directories then the edits are
replicated in all of the directories for redundancy.
Default value is same as fs.checkpoint.dir
</description>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
<description>The number of seconds between two periodic checkpoints.
</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>The size of the current edit log (in bytes) that triggers
a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
</description>
</property>
<property>
<name>fs.s3.block.size</name>
<value>67108864</value>
<description>Block size to use when writing files to S3.</description>
</property>
<property>
<name>fs.s3.buffer.dir</name>
<value>${hadoop.tmp.dir}/s3</value>
<description>Determines where on the local filesystem the S3 filesystem
should store files before sending them to S3
(or after retrieving them from S3).
</description>
</property>
<property>
<name>fs.s3.maxRetries</name>
<value>4</value>
<description>The maximum number of retries for reading or writing files to S3,
before we signal failure to the application.
</description>
</property>
<property>
<name>fs.s3.sleepTimeSeconds</name>
<value>10</value>
<description>The number of seconds to sleep between each S3 retry.
</description>
</property>
<property>
<name>local.cache.size</name>
<value>10737418240</value>
<description>The limit on the size of cache you want to keep, set by default
to 10GB. This will act as a soft limit on the cache directory for out of band data.
</description>
</property>
<property>
<name>io.seqfile.compress.blocksize</name>
<value>1000000</value>
<description>The minimum block size for compression in block compressed
SequenceFiles.
</description>
</property>
<property>
<name>io.seqfile.lazydecompress</name>
<value>true</value>
<description>Should values of block-compressed SequenceFiles be decompressed
only when necessary.
</description>
</property>
<property>
<name>io.seqfile.sorter.recordlimit</name>
<value>1000000</value>
<description>The limit on number of records to be kept in memory in a spill
in SequenceFiles.Sorter
</description>
</property>
<property>
<name>io.mapfile.bloom.size</name>
<value>1048576</value>
<description>The size of BloomFilter-s used in BloomMapFile. Each time this many
keys are appended, the next BloomFilter will be created (inside a DynamicBloomFilter).
Larger values minimize the number of filters, which slightly increases the performance,
but may waste too much space if the total number of keys is usually much smaller
than this number.
</description>
</property>
<property>
<name>io.mapfile.bloom.error.rate</name>
<value>0.005</value>
<description>The rate of false positives in BloomFilter-s used in BloomMapFile.
As this value decreases, the size of BloomFilter-s increases exponentially. This
value is the probability of encountering false positives (default is 0.5%).
</description>
</property>
<property>
<name>hadoop.util.hash.type</name>
<value>murmur</value>
<description>The default implementation of Hash. Currently this can take one of the
two values: 'murmur' to select MurmurHash and 'jenkins' to select JenkinsHash.
</description>
</property>
<!-- ipc properties -->
<property>
<name>ipc.client.idlethreshold</name>
<value>4000</value>
<description>Defines the threshold number of connections after which
connections will be inspected for idleness.
</description>
</property>
<property>
<name>ipc.client.kill.max</name>
<value>10</value>
<description>Defines the maximum number of clients to disconnect in one go.
</description>
</property>
<property>
<name>ipc.client.connection.maxidletime</name>
<value>10000</value>
<description>The maximum time in msec after which a client will bring down the
connection to the server.
</description>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>10</value>
<description>Indicates the number of retries a client will make to establish
a server connection.
</description>
</property>
<property>
<name>ipc.server.listen.queue.size</name>
<value>128</value>
<description>Indicates the length of the listen queue for servers accepting
client connections.
</description>
</property>
<property>
<name>ipc.server.tcpnodelay</name>
<value>false</value>
<description>Turn on/off Nagle's algorithm for the TCP socket connection on
the server. Setting to true disables the algorithm and may decrease latency
with a cost of more/smaller packets.
</description>
</property>
<property>
<name>ipc.client.tcpnodelay</name>
<value>false</value>
<description>Turn on/off Nagle's algorithm for the TCP socket connection on
the client. Setting to true disables the algorithm and may decrease latency
with a cost of more/smaller packets.
</description>
</property>
<!-- Web Interface Configuration -->
<property>
<name>webinterface.private.actions</name>
<value>false</value>
<description> If set to true, the web interfaces of JT and NN may contain
actions, such as kill job, delete file, etc., that should
not be exposed to the public. Enable this option if the interfaces
are only reachable by those who have the right authorization.
</description>
</property>
<!-- Proxy Configuration -->
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.StandardSocketFactory</value>
<description> Default SocketFactory to use. This parameter is expected to be
formatted as "package.FactoryClassName".
</description>
</property>
<property>
<name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
<value></value>
<description> SocketFactory to use to connect to a DFS. If null or empty, use
hadoop.rpc.socket.class.default. This socket factory is also used by
DFSClient to create sockets to DataNodes.
</description>
</property>
<property>
<name>hadoop.socks.server</name>
<value></value>
<description> Address (host:port) of the SOCKS server to be used by the
SocksSocketFactory.
</description>
</property>
<!-- Topology Configuration -->
<property>
<name>topology.node.switch.mapping.impl</name>
<value>org.apache.hadoop.net.ScriptBasedMapping</value>
<description> The default implementation of the DNSToSwitchMapping. It
invokes a script specified in topology.script.file.name to resolve
node names. If the value for topology.script.file.name is not set, the
default value of DEFAULT_RACK is returned for all node names.
</description>
</property>
<property>
<name>net.topology.impl</name>
<value>org.apache.hadoop.net.NetworkTopology</value>
<description> The default implementation of NetworkTopology which is classic three layer one.
</description>
</property>
<property>
<name>topology.script.file.name</name>
<value></value>
<description> The script name that should be invoked to resolve DNS names to
NetworkTopology names. Example: the script would take host.foo.bar as an
argument, and return /rack1 as the output.
</description>
</property>
<property>
<name>topology.script.number.args</name>
<value>100</value>
<description> The max number of args that the script configured with
topology.script.file.name should be run with. Each arg is an
IP address.
</description>
</property>
<property>
<name>hadoop.security.uid.cache.secs</name>
<value>14400</value>
<description> NativeIO maintains a cache from UID to UserName. This is
the timeout for an entry in that cache. </description>
</property>
<!-- HTTP web-consoles Authentication -->
<property>
<name>hadoop.http.authentication.type</name>
<value>simple</value>
<description>
Defines authentication used for the Hadoop HTTP web-consoles.
Supported values are: simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#
</description>
</property>
<property>
<name>hadoop.http.authentication.token.validity</name>
<value>36000</value>
<description>
Indicates how long (in seconds) an authentication token is valid before it has
to be renewed.
</description>
</property>
<property>
<name>hadoop.http.authentication.signature.secret.file</name>
<value>${user.home}/hadoop-http-auth-signature-secret</value>
<description>
The signature secret for signing the authentication tokens.
If not set a random secret is generated at startup time.
The same secret should be used for JT/NN/DN/TT configurations.
</description>
</property>
<property>
<name>hadoop.http.authentication.cookie.domain</name>
<value></value>
<description>
The domain to use for the HTTP cookie that stores the authentication token.
In order for authentication to work correctly across the web consoles of all
Hadoop nodes, the domain must be correctly set.
IMPORTANT: when using IP addresses, browsers ignore cookies with domain settings.
For this setting to work properly all nodes in the cluster must be configured
to generate URLs with hostname.domain names on it.
</description>
</property>
<property>
<name>hadoop.http.authentication.simple.anonymous.allowed</name>
<value>true</value>
<description>
Indicates if anonymous requests are allowed when using 'simple' authentication.
</description>
</property>
<property>
<name>hadoop.http.authentication.kerberos.principal</name>
<value>HTTP/localhost@LOCALHOST</value>
<description>
Indicates the Kerberos principal to be used for HTTP endpoint.
The principal MUST start with 'HTTP/' as per Kerberos HTTP SPNEGO specification.
</description>
</property>
<property>
<name>hadoop.http.authentication.kerberos.keytab</name>
<value>${user.home}/hadoop.keytab</value>
<description>
Location of the keytab file with the credentials for the principal.
Referring to the same keytab file Oozie uses for its Kerberos credentials for Hadoop.
</description>
</property>
<property>
<name>hadoop.relaxed.worker.version.check</name>
<value>false</value>
<description>
By default datanodes refuse to connect to namenodes if their build
revision (svn revision) does not match, and tasktrackers refuse to
connect to jobtrackers if their build version (version, revision,
user, and source checksum) does not match. This option changes the
behavior of hadoop workers to only check for a version match (e.g.
"1.0.2") but ignore the other build fields (revision, user, and
source checksum).
</description>
</property>
<property>
<name>hadoop.skip.worker.version.check</name>
<value>false</value>
<description>
By default datanodes refuse to connect to namenodes if their build
revision (svn revision) do not match, and tasktrackers refuse to
connect to jobtrackers if their build version (version, revision,
user, and source checksum) do not match. This option changes the
behavior of hadoop workers to skip doing a version check at all.
This option supersedes the 'hadoop.relaxed.worker.version.check'
option.
</description>
</property>
<property>
<name>hadoop.jetty.logs.serve.aliases</name>
<value>true</value>
<description>
Enable/Disable aliases serving from jetty
</description>
</property>
<property>
<name>ipc.client.fallback-to-simple-auth-allowed</name>
<value>false</value>
<description>
When a client is configured to attempt a secure connection, but attempts to
connect to an insecure server, that server may instruct the client to
switch to SASL SIMPLE (unsecure) authentication. This setting controls
whether or not the client will accept this instruction from the server.
When false (the default), the client will not allow the fallback to SIMPLE
authentication, and will abort the connection.
</description>
</property>
</configuration>
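Editorial aside on the topology properties above: Hadoop invokes the script named
by topology.script.file.name with up to topology.script.number.args host names or
IP addresses and expects one rack path per argument on stdout. The plugin ships
its own topology.sh for this purpose; the Python sketch below is only a
hypothetical illustration of that contract, with made-up rack data.

#!/usr/bin/env python3
# Hypothetical topology mapping script: print one rack path per argument.
import sys

RACK_MAP = {'10.0.0.11': '/rack1', '10.0.0.12': '/rack2'}  # example data

for host in sys.argv[1:]:
    # Unknown hosts fall back to the default rack.
    print(RACK_MAP.get(host, '/default-rack'))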

View File

@ -1,709 +0,0 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into hdfs-site.xml and change them -->
<!-- there. If hdfs-site.xml does not already exist, create it. -->
<configuration>
<property>
<name>dfs.namenode.logging.level</name>
<value>info</value>
<description>The logging level for dfs namenode. Other values are "dir" (trace
namespace mutations), "block" (trace block under/over replications and block
creations/deletions), or "all".</description>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value></value>
<description>
RPC address that handles all client requests. If empty then we'll get the
value from fs.default.name.
The value of this property will take the form of hdfs://nn-host1:rpc-port.
</description>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:50090</value>
<description>
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
<description>
The datanode server address and port for data transfer.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>3</value>
<description>The number of server threads for the datanode.</description>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
<description>
The address and the base port on which the dfs namenode web ui will listen.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.https.enable</name>
<value>false</value>
<description>Decide if HTTPS(SSL) is supported on HDFS
</description>
</property>
<property>
<name>dfs.https.need.client.auth</name>
<value>false</value>
<description>Whether SSL client certificate authentication is required
</description>
</property>
<property>
<name>dfs.https.server.keystore.resource</name>
<value>ssl-server.xml</value>
<description>Resource file from which ssl server keystore
information will be extracted
</description>
</property>
<property>
<name>dfs.https.client.keystore.resource</name>
<value>ssl-client.xml</value>
<description>Resource file from which ssl client keystore
information will be extracted
</description>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.https.address</name>
<value>0.0.0.0:50470</value>
</property>
<property>
<name>dfs.datanode.dns.interface</name>
<value>default</value>
<description>The name of the Network Interface from which a data node should
report its IP address.
</description>
</property>
<property>
<name>dfs.datanode.dns.nameserver</name>
<value>default</value>
<description>The host name or IP address of the name server (DNS)
which a DataNode should use to determine the host name used by the
NameNode for communication and display purposes.
</description>
</property>
<property>
<name>dfs.replication.considerLoad</name>
<value>true</value>
<description>Decide if chooseTarget considers the target's load or not
</description>
</property>
<property>
<name>dfs.default.chunk.view.size</name>
<value>32768</value>
<description>The number of bytes to view for a file on the browser.
</description>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>0</value>
<description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>${hadoop.tmp.dir}/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<property>
<name>dfs.name.edits.dir</name>
<value>${dfs.name.dir}</value>
<description>Determines where on the local filesystem the DFS name node
should store the transaction (edits) file. If this is a comma-delimited list
of directories then the transaction file is replicated in all of the
directories, for redundancy. Default value is same as dfs.name.dir
</description>
</property>
<property>
<name>dfs.namenode.edits.toleration.length</name>
<value>0</value>
<description>
The length in bytes that namenode is willing to tolerate when the edit log
is corrupted. The edit log toleration feature checks the entire edit log.
It computes read length (the length of valid data), corruption length and
padding length. In case that corruption length is non-zero, the corruption
will be tolerated only if the corruption length is less than or equal to
the toleration length.
For disabling edit log toleration feature, set this property to -1. When
the feature is disabled, the end of edit log will not be checked. In this
case, namenode will startup normally even if the end of edit log is
corrupted.
</description>
</property>
<property>
<name>dfs.web.ugi</name>
<value>webuser,webgroup</value>
<description>The user account used by the web interface.
Syntax: USERNAME,GROUP1,GROUP2, ...
</description>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
<description>
If "true", enable permission checking in HDFS.
If "false", permission checking is turned off,
but all other behavior is unchanged.
Switching from one parameter value to the other does not change the mode,
owner or group of files or directories.
</description>
</property>
<property>
<name>dfs.permissions.supergroup</name>
<value>supergroup</value>
<description>The name of the group of super-users.</description>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>false</value>
<description>
If "true", access tokens are used as capabilities for accessing datanodes.
If "false", no access tokens are checked on accessing datanodes.
</description>
</property>
<property>
<name>dfs.block.access.key.update.interval</name>
<value>600</value>
<description>
Interval in minutes at which namenode updates its access keys.
</description>
</property>
<property>
<name>dfs.block.access.token.lifetime</name>
<value>600</value>
<description>The lifetime of access tokens in minutes.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>755</value>
<description>Permissions for the directories on the local filesystem where
the DFS data node stores its blocks. The permissions can either be octal or
symbolic.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.replication.max</name>
<value>512</value>
<description>Maximal block replication.
</description>
</property>
<property>
<name>dfs.replication.min</name>
<value>1</value>
<description>Minimal block replication.
</description>
</property>
<property>
<name>dfs.block.size</name>
<value>67108864</value>
<description>The default block size for new files.</description>
</property>
<property>
<name>dfs.df.interval</name>
<value>60000</value>
<description>Disk usage statistics refresh interval in msec.</description>
</property>
<property>
<name>dfs.client.block.write.retries</name>
<value>3</value>
<description>The number of retries for writing blocks to the data nodes,
before we signal failure to the application.
</description>
</property>
<property>
<name>dfs.blockreport.intervalMsec</name>
<value>3600000</value>
<description>Determines block reporting interval in milliseconds.</description>
</property>
<property>
<name>dfs.blockreport.initialDelay</name> <value>0</value>
<description>Delay for first block report in seconds.</description>
</property>
<property>
<name>dfs.heartbeat.interval</name>
<value>3</value>
<description>Determines datanode heartbeat interval in seconds.</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>10</value>
<description>The number of server threads for the namenode.</description>
</property>
<property>
<name>dfs.safemode.threshold.pct</name>
<value>0.999f</value>
<description>
Specifies the percentage of blocks that should satisfy
the minimal replication requirement defined by dfs.replication.min.
Values less than or equal to 0 mean not to wait for any particular
percentage of blocks before exiting safemode.
Values greater than 1 will make safe mode permanent.
</description>
</property>
<property>
<name>dfs.namenode.safemode.min.datanodes</name>
<value>0</value>
<description>
Specifies the number of datanodes that must be considered alive
before the name node exits safemode.
Values less than or equal to 0 mean not to take the number of live
datanodes into account when deciding whether to remain in safe mode
during startup.
Values greater than the number of datanodes in the cluster
will make safe mode permanent.
</description>
</property>
<property>
<name>dfs.safemode.extension</name>
<value>30000</value>
<description>
Determines extension of safe mode in milliseconds
after the threshold level is reached.
</description>
</property>
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>1048576</value>
<description>
Specifies the maximum amount of bandwidth that each datanode
can utilize for balancing purposes, in terms of
the number of bytes per second.
</description>
</property>
<property>
<name>dfs.hosts</name>
<value></value>
<description>Names a file that contains a list of hosts that are
permitted to connect to the namenode. The full pathname of the file
must be specified. If the value is empty, all hosts are
permitted.</description>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value></value>
<description>Names a file that contains a list of hosts that are
not permitted to connect to the namenode. The full pathname of the
file must be specified. If the value is empty, no hosts are
excluded.</description>
</property>
<property>
<name>dfs.max.objects</name>
<value>0</value>
<description>The maximum number of files, directories and blocks
dfs supports. A value of zero indicates no limit to the number
of objects that dfs supports.
</description>
</property>
<property>
<name>dfs.namenode.decommission.interval</name>
<value>30</value>
<description>Namenode periodicity in seconds to check if decommission is
complete.</description>
</property>
<property>
<name>dfs.namenode.decommission.nodes.per.interval</name>
<value>5</value>
<description>The number of nodes namenode checks if decommission is complete
in each dfs.namenode.decommission.interval.</description>
</property>
<property>
<name>dfs.replication.interval</name>
<value>3</value>
<description>The periodicity in seconds with which the namenode computes
replication work for datanodes.</description>
</property>
<property>
<name>dfs.access.time.precision</name>
<value>3600000</value>
<description>The access time for an HDFS file is precise up to this value.
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
</description>
</property>
<property>
<name>dfs.support.append</name>
<description>
This option is no longer supported. HBase no longer requires that
this option be enabled as sync is now enabled by default. See
HADOOP-8230 for additional information.
</description>
</property>
<property>
<name>dfs.namenode.delegation.key.update-interval</name>
<value>86400000</value>
<description>The update interval for master key for delegation tokens
in the namenode in milliseconds.
</description>
</property>
<property>
<name>dfs.namenode.delegation.token.max-lifetime</name>
<value>604800000</value>
<description>The maximum lifetime in milliseconds for which a delegation
token is valid.
</description>
</property>
<property>
<name>dfs.namenode.delegation.token.renew-interval</name>
<value>86400000</value>
<description>The renewal interval for delegation token in milliseconds.
</description>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
<description>The number of volumes that are allowed to
fail before a datanode stops offering service. By default
any volume failure will cause a datanode to shutdown.
</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
<description>Specifies the maximum number of threads to use for transferring data
in and out of the DN.
</description>
</property>
<property>
<name>dfs.datanode.readahead.bytes</name>
<value>4193404</value>
<description>
While reading block files, if the Hadoop native libraries are available,
the datanode can use the posix_fadvise system call to explicitly
page data into the operating system buffer cache ahead of the current
reader's position. This can improve performance especially when
disks are highly contended.
This configuration specifies the number of bytes ahead of the current
read position which the datanode will attempt to read ahead. This
feature may be disabled by configuring this property to 0.
If the native libraries are not available, this configuration has no
effect.
</description>
</property>
<property>
<name>dfs.datanode.drop.cache.behind.reads</name>
<value>false</value>
<description>
In some workloads, the data read from HDFS is known to be significantly
large enough that it is unlikely to be useful to cache it in the
operating system buffer cache. In this case, the DataNode may be
configured to automatically purge all data from the buffer cache
after it is delivered to the client. This behavior is automatically
disabled for workloads which read only short sections of a block
(e.g. HBase random-IO workloads).
This may improve performance for some workloads by freeing buffer
cache space for more cacheable data.
If the Hadoop native libraries are not available, this configuration
has no effect.
</description>
</property>
<property>
<name>dfs.datanode.drop.cache.behind.writes</name>
<value>false</value>
<description>
In some workloads, the data written to HDFS is known to be significantly
large enough that it is unlikely to be useful to cache it in the
operating system buffer cache. In this case, the DataNode may be
configured to automatically purge all data from the buffer cache
after it is written to disk.
This may improve performance for some workloads by freeing buffer
cache space for more cacheable data.
If the Hadoop native libraries are not available, this configuration
has no effect.
</description>
</property>
<property>
<name>dfs.datanode.sync.behind.writes</name>
<value>false</value>
<description>
If this configuration is enabled, the datanode will instruct the
operating system to enqueue all written data to the disk immediately
after it is written. This differs from the usual OS policy which
may wait for up to 30 seconds before triggering writeback.
This may improve performance for some workloads by smoothing the
IO profile for data written to disk.
If the Hadoop native libraries are not available, this configuration
has no effect.
</description>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.
</description>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
<description>Whether datanodes should use datanode hostnames when
connecting to other datanodes for data transfer.
</description>
</property>
<property>
<name>dfs.client.local.interfaces</name>
<value></value>
<description>A comma separated list of network interface names to use
for data transfer between the client and datanodes. When creating
a connection to read from or write to a datanode, the client
chooses one of the specified interfaces at random and binds its
socket to the IP of that interface. Individual names may be
specified as either an interface name (e.g. "eth0"), a subinterface
name (e.g. "eth0:0"), or an IP address (which may be specified using
CIDR notation to match a range of IPs).
</description>
</property>
<property>
<name>dfs.image.transfer.bandwidthPerSec</name>
<value>0</value>
<description>
Specifies the maximum amount of bandwidth that can be utilized
for image transfer, in terms of bytes per second.
A default value of 0 indicates that throttling is disabled.
</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>false</value>
<description>
Enable WebHDFS (REST API) in Namenodes and Datanodes.
</description>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
<value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
<name>dfs.namenode.invalidate.work.pct.per.iteration</name>
<value>0.32f</value>
<description>
*Note*: Advanced property. Change with caution.
This determines the percentage of block
invalidations (deletes) to do over a single DN heartbeat
deletion command. The final deletion count is determined by applying this
percentage to the number of live nodes in the system.
The resultant number is the number of blocks from the deletion list
chosen for proper invalidation over a single heartbeat of a single DN.
Value should be a positive, non-zero percentage in float notation (X.Yf),
with 1.0f meaning 100%.
</description>
</property>
<property>
<name>dfs.namenode.replication.work.multiplier.per.iteration</name>
<value>2</value>
<description>
*Note*: Advanced property. Change with caution.
This determines the total amount of block transfers to begin in
parallel at a DN, for replication, when such a command list is being
sent over a DN heartbeat by the NN. The actual number is obtained by
multiplying this multiplier with the total number of live nodes in the
cluster. The resulting number is the number of blocks to begin transfers
immediately for, per DN heartbeat. This number can be any positive,
non-zero integer.
</description>
</property>
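<!-- Illustrative arithmetic (not an upstream default, added here only to make the
     two properties above concrete): with 100 live datanodes,
     dfs.namenode.invalidate.work.pct.per.iteration=0.32f chooses roughly
     0.32 * 100 = 32 blocks from the deletion list per DN heartbeat, and
     dfs.namenode.replication.work.multiplier.per.iteration=2 begins up to
     2 * 100 = 200 block transfers per DN heartbeat. -->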
<property>
<name>dfs.namenode.avoid.read.stale.datanode</name>
<value>false</value>
<description>
Indicate whether or not to avoid reading from &quot;stale&quot; datanodes whose
heartbeat messages have not been received by the namenode
for more than a specified time interval. Stale datanodes will be
moved to the end of the node list returned for reading. See
dfs.namenode.avoid.write.stale.datanode for a similar setting for writes.
</description>
</property>
<property>
<name>dfs.namenode.avoid.write.stale.datanode</name>
<value>false</value>
<description>
Indicate whether or not to avoid writing to &quot;stale&quot; datanodes whose
heartbeat messages have not been received by the namenode
for more than a specified time interval. Writes will avoid using
stale datanodes unless more than a configured ratio
(dfs.namenode.write.stale.datanode.ratio) of datanodes are marked as
stale. See dfs.namenode.avoid.read.stale.datanode for a similar setting
for reads.
</description>
</property>
<property>
<name>dfs.namenode.stale.datanode.interval</name>
<value>30000</value>
<description>
Default time interval for marking a datanode as "stale", i.e., if
the namenode has not received a heartbeat message from a datanode for
more than this time interval, the datanode will be marked and treated
as "stale" by default. The stale interval cannot be too small since
otherwise this may cause too frequent change of stale states.
We thus set a minimum stale interval value (the default is 3 times
the heartbeat interval) and guarantee that the stale interval cannot be less
than the minimum value.
</description>
</property>
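<!-- Illustrative arithmetic (not an upstream default): with the
     dfs.heartbeat.interval default of 3 seconds defined earlier in this file,
     the enforced minimum stale interval is 3 * 3 = 9 seconds, well below the
     30000 ms (30 second) default above. -->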
<property>
<name>dfs.namenode.write.stale.datanode.ratio</name>
<value>0.5f</value>
<description>
When the ratio of datanodes marked stale to total datanodes
is greater than this value, stop avoiding writes to stale nodes so
as to prevent creating hotspots.
</description>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value></value>
<description>Comma-separated list of datanode plug-ins to be activated.
</description>
</property>
<property>
<name>dfs.namenode.plugins</name>
<value></value>
<description>Comma-separated list of namenode plug-ins to be activated.
</description>
</property>
</configuration>

View File

@ -1,7 +0,0 @@
#!/bin/bash
if [ $test_only -eq 0 ]; then
systemctl stop hadoop-hdfs-datanode
systemctl stop hadoop-hdfs-namenode
else
exit 0
fi

View File

@ -1,43 +0,0 @@
#!/bin/bash
CDH_VERSION=5.11
CDH_MINOR_VERSION=5.11.0
if [ ! -f /etc/yum.repos.d/cloudera-cdh5.repo ]; then
if [ $test_only -eq 0 ]; then
echo '[cloudera-cdh5]' > /etc/yum.repos.d/cloudera-cdh5.repo
echo "name=Cloudera's Distribution for Hadoop, Version 5" >> /etc/yum.repos.d/cloudera-cdh5.repo
echo "baseurl=http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/$CDH_MINOR_VERSION/" >> /etc/yum.repos.d/cloudera-cdh5.repo
echo "gpgkey = http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/RPM-GPG-KEY-cloudera" >> /etc/yum.repos.d/cloudera-cdh5.repo
echo 'gpgcheck = 1' >> /etc/yum.repos.d/cloudera-cdh5.repo
echo '[cloudera-manager]' > /etc/yum.repos.d/cloudera-manager.repo
echo 'name=Cloudera Manager' >> /etc/yum.repos.d/cloudera-manager.repo
echo "baseurl=http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/$CDH_MINOR_VERSION/" >> /etc/yum.repos.d/cloudera-manager.repo
echo "gpgkey = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/RPM-GPG-KEY-cloudera" >> /etc/yum.repos.d/cloudera-manager.repo
echo 'gpgcheck = 1' >> /etc/yum.repos.d/cloudera-manager.repo
echo '[navigator-keytrustee]' > /etc/yum.repos.d/kms.repo
echo "name=Cloudera's Distribution for navigator-Keytrustee, Version 5" >> /etc/yum.repos.d/kms.repo
RETURN_CODE="$(curl -s -o /dev/null -w "%{http_code}" http://archive.cloudera.com/navigator-keytrustee5/redhat/7/x86_64/navigator-keytrustee/$CDH_MINOR_VERSION/)"
if [ "$RETURN_CODE" == "404" ]; then
echo "baseurl=http://archive.cloudera.com/navigator-keytrustee5/redhat/7/x86_64/navigator-keytrustee/$CDH_VERSION/" >> /etc/yum.repos.d/kms.repo
else
echo "baseurl=http://archive.cloudera.com/navigator-keytrustee5/redhat/7/x86_64/navigator-keytrustee/$CDH_MINOR_VERSION/" >> /etc/yum.repos.d/kms.repo
fi
echo "gpgkey = http://archive.cloudera.com/navigator-keytrustee5/redhat/7/x86_64/navigator-keytrustee/RPM-GPG-KEY-cloudera" >> /etc/yum.repos.d/kms.repo
echo 'gpgcheck = 1' >> /etc/yum.repos.d/kms.repo
echo "[cloudera-kafka]" > /etc/yum.repos.d/cloudera-kafka.repo
echo "name=Cloudera's Distribution for kafka, Version 2.2.0" >> /etc/yum.repos.d/cloudera-kafka.repo
echo "baseurl=http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/2.2.0/" >> /etc/yum.repos.d/cloudera-kafka.repo
echo "gpgkey = http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/RPM-GPG-KEY-cloudera" >> /etc/yum.repos.d/cloudera-kafka.repo
echo "gpgcheck = 1" >> /etc/yum.repos.d/cloudera-kafka.repo
yum clean all
else
exit 0
fi
fi

View File

@ -1,20 +0,0 @@
#!/bin/bash
hadoop="2.6.0"
HDFS_LIB_DIR=${hdfs_lib_dir:-"/usr/share/hadoop/lib"}
HADOOP_SWIFT_JAR_NAME="hadoop-openstack.jar"
if [ $test_only -eq 0 ]; then
mkdir -p $HDFS_LIB_DIR
curl -sS -o $HDFS_LIB_DIR/$HADOOP_SWIFT_JAR_NAME $swift_url
if [ $? -ne 0 ]; then
echo -e "Could not download Swift Hadoop FS implementation.\nAborting"
exit 1
fi
chmod 0644 $HDFS_LIB_DIR/$HADOOP_SWIFT_JAR_NAME
else
exit 0
fi

View File

@ -1,30 +0,0 @@
#!/bin/bash
EXTJS_DESTINATION_DIR="/var/lib/oozie"
EXTJS_DOWNLOAD_URL="https://tarballs.openstack.org/sahara-extra/dist/common-artifacts/ext-2.2.zip"
extjs_basepath=$(basename ${EXTJS_DOWNLOAD_URL})
extjs_archive=/tmp/${extjs_basepath}
extjs_folder="${extjs_basepath%.*}"
setup_extjs() {
curl -sS -o $extjs_archive $EXTJS_DOWNLOAD_URL
mkdir -p $EXTJS_DESTINATION_DIR
}
if [ -z "${EXTJS_NO_UNPACK:-}" ]; then
if [ ! -d "${EXTJS_DESTINATION_DIR}/${extjs_folder}" ]; then
setup_extjs
unzip -o -d "$EXTJS_DESTINATION_DIR" $extjs_archive
rm -f $extjs_archive
else
exit 0
fi
else
if [ ! -f "${EXTJS_DESTINATION_DIR}/${extjs_basepath}" ]; then
setup_extjs
mv $extjs_archive $EXTJS_DESTINATION_DIR
else
exit 0
fi
fi

View File

@ -1,41 +0,0 @@
#!/bin/bash
tmp_dir=/tmp/spark
CDH_VERSION=5.11
mkdir -p $tmp_dir
if [ ! -d /opt/spark ]; then
if [ $test_only -eq 0 ]; then
# The user is not providing his own Spark distribution package
if [ -z "${SPARK_DOWNLOAD_URL:-}" ]; then
# Check hadoop version
# INFO on hadoop versions: http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
# Now the below is just a sanity check
if [ -z "${SPARK_HADOOP_DL:-}" ]; then
SPARK_HADOOP_DL=hadoop2.7
fi
SPARK_DOWNLOAD_URL="http://archive.apache.org/dist/spark/spark-$plugin_version/spark-$plugin_version-bin-$SPARK_HADOOP_DL.tgz"
fi
echo "Downloading SPARK"
spark_file=$(basename "$SPARK_DOWNLOAD_URL")
wget -O $tmp_dir/$spark_file $SPARK_DOWNLOAD_URL
echo "$SPARK_DOWNLOAD_URL" > $tmp_dir/spark_url.txt
echo "Extracting SPARK"
extract_folder=$(tar tzf $tmp_dir/$spark_file | sed -e 's@/.*@@' | uniq)
echo "Decompressing Spark..."
tar xzf $tmp_dir/$spark_file
rm $tmp_dir/$spark_file
echo "Moving SPARK to /opt/"
# Placing spark in /opt/spark
mv $extract_folder /opt/spark/
mv $tmp_dir/spark_url.txt /opt/spark/
rm -Rf $tmp_dir
else
exit 1
fi
fi
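The script above consults two environment variables when deciding what to download. A minimal sketch of overriding them before the script runs is shown below; the URL is only an example built from the same pattern the script itself uses (with the 2.3.2 default plugin version), not a value shipped with the plugin:
# Illustrative only: pre-seed the variables the install script already reads.
export SPARK_HADOOP_DL=hadoop2.7
export SPARK_DOWNLOAD_URL="http://archive.apache.org/dist/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz"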

View File

@ -1,12 +0,0 @@
#!/bin/bash
SPARK_JARS_DIR_PATH="/opt/spark/jars"
HADOOP_TOOLS_DIR_PATH="/opt/hadoop/share/hadoop/tools/lib"
HADOOP_COMMON_DIR_PATH="/opt/hadoop/share/hadoop/common/lib"
# The hadoop-aws and aws-java-sdk libraries are missing here, but we
# cannot copy them from the Hadoop folder on-disk due to
# version/patching issues
curl -sS https://tarballs.openstack.org/sahara-extra/dist/common-artifacts/hadoop-aws-2.7.3.jar -o $SPARK_JARS_DIR_PATH/hadoop-aws.jar
curl -sS https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar -o $SPARK_JARS_DIR_PATH/aws-java-sdk.jar

View File

@ -1,61 +0,0 @@
arguments:
plugin_version:
description: The version of Spark to install. Defaults to 2.3.2
default: 2.3.2
choices:
- 2.3.2
- 2.3.1
- 2.3.0
- 2.2.1
- 2.2.0
java_distro:
default: openjdk
description: The distribution of Java to install. Defaults to openjdk.
choices:
- openjdk
- oracle-java
hdfs_lib_dir:
default: /usr/lib/hadoop-mapreduce
description: The path to HDFS lib. Defaults to /usr/lib/hadoop-mapreduce.
required: False
swift_url:
default: https://tarballs.openstack.org/sahara-extra/dist/hadoop-openstack/master/hadoop-openstack-2.6.0.jar
description: Location of the swift jar file.
required: False
validators:
- os_case:
- redhat:
- package: wget
- script: centos/wget_cdh_repo
- ubuntu:
- script: ubuntu/wget_cdh_repo
- argument_case:
argument_name: java_distro
cases:
openjdk:
- os_case:
- redhat:
- package: java-1.8.0-openjdk-devel
- ubuntu:
- package: openjdk-8-jdk
- script:
common/install_spark:
env_vars: [plugin_version, cdh_version]
- os_case:
- ubuntu:
- script: ubuntu/config_spark
- package: ntp
- package:
- hadoop-hdfs-namenode
- hadoop-hdfs-datanode
- script: common/install_extjs
- os_case:
- redhat:
- script: centos/turn_off_services
- ubuntu:
- script: ubuntu/turn_off_services
- script: common/manipulate_s3
- script:
common/add_jar:
env_vars: [hdfs_lib_dir, swift_url]
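This spec declares the arguments and validation steps consumed by Sahara's image generation tooling. A hypothetical invocation is sketched below; the command follows the usual sahara-image-pack convention, but the exact syntax is an assumption and has not been verified against the tool:
# Assumed invocation: build a Spark 2.3.2 image from a base cloud image.
sahara-image-pack --image ubuntu.qcow2 spark 2.3.2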

View File

@ -1,12 +0,0 @@
#!/bin/bash
firstboot_script_name="/opt/spark/firstboot.sh"
sed -i -e "s,^exit 0$,[ -f $firstboot_script_name ] \&\& sh $firstboot_script_name; exit 0," /etc/rc.local
user_and_group_names="ubuntu:ubuntu"
cat >> $firstboot_script_name <<EOF
#!/bin/sh
chown -R $user_and_group_names /opt/spark
chown -R $user_and_group_names /etc/hadoop
rm $firstboot_script_name
EOF

View File

@ -1,7 +0,0 @@
#!/bin/bash
if [ $test_only -eq 0 ]; then
update-rc.d -f hadoop-hdfs-datanode remove
update-rc.d -f hadoop-hdfs-namenode remove
else
exit 0
fi

View File

@ -1,36 +0,0 @@
#!/bin/bash
CDH_VERSION=5.11
if [ ! -f /etc/apt/sources.list.d/cdh5.list ]; then
if [ $test_only -eq 0 ]; then
# Add repository with postgresql package (it's a dependency of the cloudera packages)
# Base image doesn't contain this repo
echo 'deb http://nova.clouds.archive.ubuntu.com/ubuntu/ xenial universe multiverse main' >> /etc/apt/sources.list
# Cloudera repositories
echo "deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh xenial-cdh$CDH_VERSION contrib" > /etc/apt/sources.list.d/cdh5.list
echo "deb-src http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh xenial-cdh$CDH_VERSION contrib" >> /etc/apt/sources.list.d/cdh5.list
wget -qO - http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/archive.key | apt-key add -
echo "deb [arch=amd64] http://archive.cloudera.com/cm5/ubuntu/xenial/amd64/cm xenial-cm$CDH_VERSION contrib" > /etc/apt/sources.list.d/cm5.list
echo "deb-src http://archive.cloudera.com/cm5/ubuntu/xenial/amd64/cm xenial-cm$CDH_VERSION contrib" >> /etc/apt/sources.list.d/cm5.list
wget -qO - http://archive.cloudera.com/cm5/ubuntu/xenial/amd64/cm/archive.key | apt-key add -
wget -O /etc/apt/sources.list.d/kms.list http://archive.cloudera.com/navigator-keytrustee5/ubuntu/xenial/amd64/navigator-keytrustee/cloudera.list
wget -qO - http://archive.cloudera.com/navigator-keytrustee5/ubuntu/xenial/amd64/navigator-keytrustee/archive.key | apt-key add -
# add Kafka repository
echo 'deb http://archive.cloudera.com/kafka/ubuntu/xenial/amd64/kafka/ xenial-kafka2.2.0 contrib' >> /etc/apt/sources.list
wget -qO - https://archive.cloudera.com/kafka/ubuntu/xenial/amd64/kafka/archive.key | apt-key add -
# change repository priority
echo -e 'Package: zookeeper\nPin: origin "archive.cloudera.com"\nPin-Priority: 1001' > /etc/apt/preferences.d/cloudera-pin
apt-get update
else
exit 0
fi
fi

View File

@ -1,2 +0,0 @@
# Cleans up old Spark job directories once per hour.
0 * * * * root /etc/hadoop/tmp-cleanup.sh

View File

@ -1,21 +0,0 @@
#!/usr/bin/env bash
# This file contains environment variables required to run Spark. Copy it as
# spark-env.sh and edit that to configure Spark for your site.
#
# The following variables can be set in this file:
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
# we recommend setting app-wide options in the application's driver program.
# Examples of node-specific options : -Dspark.local.dir, GC options
# Examples of app-wide options : -Dspark.serializer
#
# If using the standalone deploy mode, you can also set variables for it here:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
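As a minimal sketch of the kind of site configuration the template above describes (the values are examples only, not plugin defaults):
# Example spark-env.sh settings, using variables listed in the template above.
SPARK_MASTER_IP=10.0.0.10
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g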

View File

@ -1,48 +0,0 @@
#!/bin/sh
MINIMUM_CLEANUP_MEGABYTES={minimum_cleanup_megabytes}
MINIMUM_CLEANUP_SECONDS={minimum_cleanup_seconds}
MAXIMUM_CLEANUP_SECONDS={maximum_cleanup_seconds}
CURRENT_TIMESTAMP=`date +%s`
POSSIBLE_CLEANUP_THRESHOLD=$(($CURRENT_TIMESTAMP - $MINIMUM_CLEANUP_SECONDS))
DEFINITE_CLEANUP_THRESHOLD=$(($CURRENT_TIMESTAMP - $MAXIMUM_CLEANUP_SECONDS))
unset MAY_DELETE
unset WILL_DELETE
if [ ! -d /tmp/spark-edp ]
then
exit 0
fi
cd /tmp/spark-edp
for JOB in $(find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n')
do
for EXECUTION in $(find $JOB -maxdepth 1 -mindepth 1 -type d -printf '%f\n')
do
TIMESTAMP=`stat $JOB/$EXECUTION --printf '%Y'`
if [[ $TIMESTAMP -lt $DEFINITE_CLEANUP_THRESHOLD ]]
then
WILL_DELETE="$WILL_DELETE $JOB/$EXECUTION"
else
if [[ $TIMESTAMP -lt $POSSIBLE_CLEANUP_THRESHOLD ]]
then
MAY_DELETE="$MAY_DELETE $JOB/$EXECUTION"
fi
fi
done
done
for EXECUTION in $WILL_DELETE
do
rm -Rf $EXECUTION
done
for EXECUTION in $(ls $MAY_DELETE -trd)
do
if [[ `du -s -BM | grep -o '[0-9]\+'` -le $MINIMUM_CLEANUP_MEGABYTES ]]; then
break
fi
rm -Rf $EXECUTION
done
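The three placeholders at the top of this script are filled in from plugin configuration. With the values exercised by the unit tests later in this commit, the rendered header would read:
MINIMUM_CLEANUP_MEGABYTES=4096
MINIMUM_CLEANUP_SECONDS=86400
MAXIMUM_CLEANUP_SECONDS=1209600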

View File

@ -1,21 +0,0 @@
#!/bin/bash
HADOOP_CONF=/etc/hadoop
while [ $# -gt 0 ] ; do
nodeArg=$1
exec< ${HADOOP_CONF}/topology.data
result=""
while read line ; do
ar=( $line )
if [ "${ar[0]}" = "$nodeArg" ] ; then
result="${ar[1]}"
fi
done
shift
if [ -z "$result" ] ; then
echo -n "/default/rack "
else
echo -n "$result "
fi
done
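The script above resolves each host argument against ${HADOOP_CONF}/topology.data, a whitespace-separated host-to-rack mapping. An illustrative (made-up) data file, assuming the script is saved as topology.sh:
10.0.0.11 /rack1
10.0.0.12 /rack1
10.0.0.21 /rack2
Running "bash topology.sh 10.0.0.11 10.0.0.99" would then print "/rack1 /default/rack ", with the unknown host falling back to /default/rack.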

View File

@ -1,90 +0,0 @@
# Copyright (c) 2014 Hoang Do, Phuc Vo, P. Michiardi, D. Venzano
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from oslo_log import log as logging
from sahara.plugins import utils
from sahara_plugin_spark.i18n import _
from sahara_plugin_spark.plugins.spark import config_helper as c_helper
LOG = logging.getLogger(__name__)
def start_processes(remote, *processes):
for proc in processes:
if proc == "namenode":
remote.execute_command("sudo service hadoop-hdfs-namenode start")
elif proc == "datanode":
remote.execute_command("sudo service hadoop-hdfs-datanode start")
else:
remote.execute_command("screen -d -m sudo hadoop %s" % proc)
def refresh_nodes(remote, service):
remote.execute_command("sudo -u hdfs hadoop %s -refreshNodes"
% service)
def format_namenode(nn_remote):
nn_remote.execute_command("sudo -u hdfs hadoop namenode -format")
def clean_port_hadoop(nn_remote):
nn_remote.execute_command(("sudo netstat -tlnp"
"| awk '/:8020 */"
"{split($NF,a,\"/\"); print a[1]}'"
"| xargs sudo kill -9"))
def start_spark_master(nn_remote, sp_home):
nn_remote.execute_command("bash " + os.path.join(sp_home,
"sbin/start-all.sh"))
def stop_spark(nn_remote, sp_home):
nn_remote.execute_command("bash " + os.path.join(sp_home,
"sbin/stop-all.sh"))
@utils.event_wrapper(
True, step=_("Await DataNodes start up"), param=("cluster", 0))
def await_datanodes(cluster):
datanodes_count = len(utils.get_instances(cluster, "datanode"))
if datanodes_count < 1:
return
log_msg = _("Waiting on %d DataNodes to start up") % datanodes_count
with utils.get_instance(cluster, "namenode").remote() as r:
utils.plugin_option_poll(
cluster, _check_datanodes_count,
c_helper.DATANODES_STARTUP_TIMEOUT,
log_msg, 1, {"remote": r, "count": datanodes_count})
def _check_datanodes_count(remote, count):
if count < 1:
return True
LOG.debug("Checking DataNodes count")
ex_code, stdout = remote.execute_command(
'sudo su -lc "hdfs dfsadmin -report" hdfs | '
r'grep \'Live datanodes\|Datanodes available:\' | '
r'grep -o \'[0-9]\+\' | head -n 1')
LOG.debug("DataNodes count='{count}'".format(count=stdout.strip()))
return stdout and int(stdout) == count

View File

@ -1,106 +0,0 @@
# Copyright (c) 2014 Hoang Do, Phuc Vo, P. Michiardi, D. Venzano
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from sahara.plugins import context
from sahara.plugins import utils
from sahara_plugin_spark.i18n import _
from sahara_plugin_spark.plugins.spark import config_helper as c_helper
from sahara_plugin_spark.plugins.spark import run_scripts as run
@utils.event_wrapper(True, step=_("Decommission %s") % "Slaves")
def decommission_sl(master, inst_to_be_deleted, survived_inst):
if survived_inst is not None:
slavenames = []
for slave in survived_inst:
slavenames.append(slave.hostname())
slaves_content = c_helper.generate_spark_slaves_configs(slavenames)
else:
slaves_content = "\n"
cluster = master.cluster
sp_home = utils.get_config_value_or_default("Spark", "Spark home", cluster)
r_master = utils.get_remote(master)
run.stop_spark(r_master, sp_home)
# write new slave file to master
files = {os.path.join(sp_home, 'conf/slaves'): slaves_content}
r_master.write_files_to(files)
# write new slaves file to each survived slave as well
for i in survived_inst:
with utils.get_remote(i) as r:
r.write_files_to(files)
run.start_spark_master(r_master, sp_home)
def _is_decommissioned(r, inst_to_be_deleted):
cmd = r.execute_command("sudo -u hdfs hadoop dfsadmin -report")
datanodes_info = parse_dfs_report(cmd[1])
for i in inst_to_be_deleted:
for dn in datanodes_info:
if (dn["Name"].startswith(i.internal_ip)) and (
dn["Decommission Status"] != "Decommissioned"):
return False
return True
@utils.event_wrapper(True, step=_("Decommission %s") % "DataNodes")
def decommission_dn(nn, inst_to_be_deleted, survived_inst):
with utils.get_remote(nn) as r:
r.write_file_to('/etc/hadoop/dn.excl',
utils.generate_fqdn_host_names(
inst_to_be_deleted))
run.refresh_nodes(utils.get_remote(nn), "dfsadmin")
context.sleep(3)
utils.plugin_option_poll(
nn.cluster, _is_decommissioned, c_helper.DECOMMISSIONING_TIMEOUT,
_("Decommission %s") % "DataNodes", 3, {
'r': r, 'inst_to_be_deleted': inst_to_be_deleted})
r.write_files_to({
'/etc/hadoop/dn.incl': utils.
generate_fqdn_host_names(survived_inst),
'/etc/hadoop/dn.excl': ""})
def parse_dfs_report(cmd_output):
report = cmd_output.rstrip().split(os.linesep)
array = []
started = False
for line in report:
if started:
array.append(line)
if line.startswith("Datanodes available"):
started = True
res = []
datanode_info = {}
for i in range(0, len(array)):
if array[i]:
idx = str.find(array[i], ':')
name = array[i][0:idx]
value = array[i][idx + 2:]
datanode_info[name.strip()] = value.strip()
if not array[i] and datanode_info:
res.append(datanode_info)
datanode_info = {}
if datanode_info:
res.append(datanode_info)
return res
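As a rough sketch of what parse_dfs_report produces, assuming an illustrative report fragment rather than output captured from a real cluster:
# Illustrative dfsadmin -report fragment; only the section after
# "Datanodes available" is parsed by the function above.
report = (
    "Configured Capacity: 1000000 (976.56 KB)\n"
    "Datanodes available: 1 (1 total, 0 dead)\n"
    "\n"
    "Name: 10.0.0.5:50010 (worker-1)\n"
    "Decommission Status : Normal\n"
)
# parse_dfs_report(report) would return:
# [{'Name': '10.0.0.5:50010 (worker-1)', 'Decommission Status': 'Normal'}]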

View File

@ -1,28 +0,0 @@
# Copyright (c) 2015 OpenStack Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from sahara.plugins import edp
from sahara.plugins import utils as plugin_utils
class ShellEngine(edp.PluginsSparkShellJobEngine):
def __init__(self, cluster):
super(ShellEngine, self).__init__(cluster)
self.master = plugin_utils.get_instance(cluster, "master")
@staticmethod
def job_type_supported(job_type):
return (job_type in edp.PluginsSparkShellJobEngine.
get_supported_job_types())

View File

@ -1,17 +0,0 @@
# Copyright (c) 2014 Mirantis Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from sahara_plugin_spark.utils import patches
patches.patch_all()

View File

@ -1,53 +0,0 @@
# Copyright (c) 2013 Mirantis Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from oslotest import base
from sahara.plugins import context
from sahara.plugins import db as db_api
from sahara.plugins import main
from sahara.plugins import utils
class SaharaTestCase(base.BaseTestCase):
def setUp(self):
super(SaharaTestCase, self).setUp()
self.setup_context()
utils.rpc_setup('all-in-one')
def setup_context(self, username="test_user", tenant_id="tenant_1",
auth_token="test_auth_token", tenant_name='test_tenant',
service_catalog=None, **kwargs):
self.addCleanup(context.set_ctx,
context.ctx() if context.has_ctx() else None)
context.set_ctx(context.PluginsContext(
username=username, tenant_id=tenant_id,
auth_token=auth_token, service_catalog=service_catalog or {},
tenant_name=tenant_name, **kwargs))
def override_config(self, name, override, group=None):
main.set_override(name, override, group)
self.addCleanup(main.clear_override, name, group)
class SaharaWithDbTestCase(SaharaTestCase):
def setUp(self):
super(SaharaWithDbTestCase, self).setUp()
self.override_config('connection', "sqlite://", group='database')
db_api.setup_db()
self.addCleanup(db_api.drop_db)

View File

@ -1,99 +0,0 @@
# Copyright (c) 2014 Mirantis Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import xml.dom.minidom as xml
from unittest import mock
from sahara.plugins import swift_helper as swift
from sahara.plugins import utils
from sahara_plugin_spark.plugins.spark import config_helper as c_helper
from sahara_plugin_spark.tests.unit import base as test_base
class ConfigHelperUtilsTest(test_base.SaharaTestCase):
def test_make_hadoop_path(self):
storage_paths = ['/mnt/one', '/mnt/two']
paths = c_helper.make_hadoop_path(storage_paths, '/spam')
expected = ['/mnt/one/spam', '/mnt/two/spam']
self.assertEqual(expected, paths)
@mock.patch('sahara.plugins.utils.get_config_value_or_default')
def test_cleanup_configs(self, get_config_value):
getter = lambda plugin, key, cluster: plugin_configs[key] # noqa: E731
get_config_value.side_effect = getter
plugin_configs = {"Minimum cleanup megabytes": 4096,
"Minimum cleanup seconds": 86400,
"Maximum cleanup seconds": 1209600}
configs = c_helper.generate_job_cleanup_config(None)
self.assertTrue(configs['valid'])
expected = ["MINIMUM_CLEANUP_MEGABYTES=4096",
"MINIMUM_CLEANUP_SECONDS=86400",
"MAXIMUM_CLEANUP_SECONDS=1209600"]
for config_value in expected:
self.assertIn(config_value, configs['script'])
self.assertIn("0 * * * * root /etc/hadoop/tmp-cleanup.sh",
configs['cron'][0])
plugin_configs['Maximum cleanup seconds'] = 0
configs = c_helper.generate_job_cleanup_config(None)
self.assertFalse(configs['valid'])
self.assertNotIn('script', configs)
self.assertNotIn('cron', configs)
plugin_configs = {"Minimum cleanup megabytes": 0,
"Minimum cleanup seconds": 0,
"Maximum cleanup seconds": 1209600}
configs = c_helper.generate_job_cleanup_config(None)
self.assertFalse(configs['valid'])
self.assertNotIn('script', configs)
self.assertNotIn('cron', configs)
@mock.patch("sahara.plugins.swift_utils.retrieve_auth_url")
def test_generate_xml_configs(self, auth_url):
auth_url.return_value = "http://localhost:5000/v2/"
# Make a dict of swift configs to verify generated values
swift_vals = c_helper.extract_name_values(swift.get_swift_configs())
# Make sure that all the swift configs are in core-site
c = c_helper.generate_xml_configs({}, ['/mnt/one'], 'localhost', None)
doc = xml.parseString(c['core-site'])
configuration = doc.getElementsByTagName('configuration')
properties = utils.get_property_dict(configuration[0])
self.assertDictContainsSubset(swift_vals, properties)
# Make sure that user values have precedence over defaults
c = c_helper.generate_xml_configs(
{'HDFS': {'fs.swift.service.sahara.tenant': 'fred'}},
['/mnt/one'], 'localhost', None)
doc = xml.parseString(c['core-site'])
configuration = doc.getElementsByTagName('configuration')
properties = utils.get_property_dict(configuration[0])
mod_swift_vals = copy.copy(swift_vals)
mod_swift_vals['fs.swift.service.sahara.tenant'] = 'fred'
self.assertDictContainsSubset(mod_swift_vals, properties)
# Make sure that swift configs are left out if not enabled
c = c_helper.generate_xml_configs(
{'HDFS': {'fs.swift.service.sahara.tenant': 'fred'},
'general': {'Enable Swift': False}},
['/mnt/one'], 'localhost', None)
doc = xml.parseString(c['core-site'])
configuration = doc.getElementsByTagName('configuration')
properties = utils.get_property_dict(configuration[0])
for key in mod_swift_vals.keys():
self.assertNotIn(key, properties)

View File

@ -1,87 +0,0 @@
# Copyright (c) 2013 Mirantis Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from sahara.plugins import base as pb
from sahara.plugins import edp
from sahara_plugin_spark.plugins.spark import plugin as pl
from sahara_plugin_spark.tests.unit import base
class SparkPluginTest(base.SaharaWithDbTestCase):
def setUp(self):
super(SparkPluginTest, self).setUp()
self.override_config("plugins", ["spark"])
pb.setup_plugins()
def _init_cluster_dict(self, version):
cluster_dict = {
'name': 'cluster',
'plugin_name': 'spark',
'hadoop_version': version,
'default_image_id': 'image'}
return cluster_dict
class SparkProviderTest(base.SaharaTestCase):
def setUp(self):
super(SparkProviderTest, self).setUp()
def test_supported_job_types(self):
provider = pl.SparkProvider()
res = provider.get_edp_job_types()
self.assertEqual([edp.JOB_TYPE_SHELL, edp.JOB_TYPE_SPARK],
res['1.6.0'])
self.assertEqual([edp.JOB_TYPE_SHELL, edp.JOB_TYPE_SPARK],
res['2.1.0'])
self.assertEqual([edp.JOB_TYPE_SHELL, edp.JOB_TYPE_SPARK],
res['2.2'])
self.assertEqual([edp.JOB_TYPE_SHELL, edp.JOB_TYPE_SPARK],
res['2.3'])
def test_edp_config_hints(self):
provider = pl.SparkProvider()
res = provider.get_edp_config_hints(edp.JOB_TYPE_SHELL, "1.6.0")
self.assertEqual({'configs': {}, 'args': [], 'params': {}},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SPARK, "1.6.0")
self.assertEqual({'args': [], 'configs': []},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SPARK, "2.1.0")
self.assertEqual({'args': [], 'configs': []},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SHELL, "2.1.0")
self.assertEqual({'args': [], 'configs': {}, 'params': {}},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SPARK, "2.2")
self.assertEqual({'args': [], 'configs': []},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SHELL, "2.2")
self.assertEqual({'args': [], 'configs': {}, 'params': {}},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SPARK, "2.3")
self.assertEqual({'args': [], 'configs': []},
res['job_config'])
res = provider.get_edp_config_hints(edp.JOB_TYPE_SHELL, "2.3")
self.assertEqual({'args': [], 'configs': {}, 'params': {}},
res['job_config'])

View File

@ -1,108 +0,0 @@
# Copyright (c) 2013 Mirantis Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import eventlet
EVENTLET_MONKEY_PATCH_MODULES = dict(os=True,
select=True,
socket=True,
thread=True,
time=True)
def patch_all():
"""Apply all patches.
List of patches:
* eventlet's monkey patch for all cases;
* minidom's writexml patch for py < 2.7.3 only.
"""
eventlet_monkey_patch()
patch_minidom_writexml()
def eventlet_monkey_patch():
"""Apply eventlet's monkey patch.
This call should be the first call in application. It's safe to call
monkey_patch multiple times.
"""
eventlet.monkey_patch(**EVENTLET_MONKEY_PATCH_MODULES)
def eventlet_import_monkey_patched(module):
"""Returns module monkey patched by eventlet.
It's needed for some tests, for example, context test.
"""
return eventlet.import_patched(module, **EVENTLET_MONKEY_PATCH_MODULES)
def patch_minidom_writexml():
"""Patch for xml.dom.minidom toprettyxml bug with whitespaces around text
We apply the patch to avoid excess whitespace in generated xml
configuration files that breaks Hadoop.
(This patch will be applied for all Python versions < 2.7.3)
Issue: http://bugs.python.org/issue4147
Patch: http://hg.python.org/cpython/rev/cb6614e3438b/
Description: http://ronrothman.com/public/leftbraned/xml-dom-minidom-\
toprettyxml-and-silly-whitespace/#best-solution
"""
import sys
if sys.version_info >= (2, 7, 3):
return
import xml.dom.minidom as md
def element_writexml(self, writer, indent="", addindent="", newl=""):
# indent = current indentation
# addindent = indentation to add to higher levels
# newl = newline string
writer.write(indent + "<" + self.tagName)
attrs = self._get_attributes()
a_names = list(attrs.keys())
a_names.sort()
for a_name in a_names:
writer.write(" %s=\"" % a_name)
md._write_data(writer, attrs[a_name].value)
writer.write("\"")
if self.childNodes:
writer.write(">")
if (len(self.childNodes) == 1
and self.childNodes[0].nodeType == md.Node.TEXT_NODE):
self.childNodes[0].writexml(writer, '', '', '')
else:
writer.write(newl)
for node in self.childNodes:
node.writexml(writer, indent + addindent, addindent, newl)
writer.write(indent)
writer.write("</%s>%s" % (self.tagName, newl))
else:
writer.write("/>%s" % (newl))
md.Element.writexml = element_writexml
def text_writexml(self, writer, indent="", addindent="", newl=""):
md._write_data(writer, "%s%s%s" % (indent, self.data, newl))
md.Text.writexml = text_writexml

View File

@ -1,43 +0,0 @@
[metadata]
name = sahara-plugin-spark
summary = Spark Plugin for Sahara Project
description_file = README.rst
license = Apache Software License
python_requires = >=3.8
classifiers =
Programming Language :: Python
Programming Language :: Python :: Implementation :: CPython
Programming Language :: Python :: 3 :: Only
Programming Language :: Python :: 3
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Environment :: OpenStack
Intended Audience :: Information Technology
Intended Audience :: System Administrators
License :: OSI Approved :: Apache Software License
Operating System :: POSIX :: Linux
author = OpenStack
author_email = openstack-discuss@lists.openstack.org
home_page = https://docs.openstack.org/sahara/latest/
[files]
packages =
sahara_plugin_spark
[entry_points]
sahara.cluster.plugins =
spark = sahara_plugin_spark.plugins.spark.plugin:SparkProvider
[compile_catalog]
directory = sahara_plugin_spark/locale
domain = sahara_plugin_spark
[update_catalog]
domain = sahara_plugin_spark
output_dir = sahara_plugin_spark/locale
input_file = sahara_plugin_spark/locale/sahara_plugin_spark.pot
[extract_messages]
keywords = _ gettext ngettext l_ lazy_gettext
mapping_file = babel.cfg
output_file = sahara_plugin_spark/locale/sahara_plugin_spark.pot

View File

@ -1,20 +0,0 @@
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import setuptools
setuptools.setup(
setup_requires=['pbr>=2.0.0'],
pbr=True)

View File

@ -1,17 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
hacking>=3.0.1,<3.1.0 # Apache-2.0
bandit>=1.1.0 # Apache-2.0
bashate>=0.5.1 # Apache-2.0
coverage!=4.4,>=4.0 # Apache-2.0
doc8>=0.6.0 # Apache-2.0
fixtures>=3.0.0 # Apache-2.0/BSD
oslotest>=3.2.0 # Apache-2.0
stestr>=1.0.0 # Apache-2.0
pylint==1.4.5 # GPLv2
testscenarios>=0.4 # Apache-2.0/BSD
testtools>=2.4.0 # MIT
sahara>=10.0.0.0b1

92
tox.ini
View File

@ -1,92 +0,0 @@
[tox]
envlist = pep8
minversion = 3.18.0
skipsdist = True
# this allows tox to infer the base python from the environment name
# and override any basepython configured in this file
ignore_basepython_conflict = true
[testenv]
basepython = python3
usedevelop = True
install_command = pip install {opts} {packages}
setenv =
VIRTUAL_ENV={envdir}
DISCOVER_DIRECTORY=sahara_plugin_spark/tests/unit
deps =
-c{env:UPPER_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
-r{toxinidir}/requirements.txt
-r{toxinidir}/test-requirements.txt
commands = stestr run {posargs}
passenv =
http_proxy
https_proxy
no_proxy
[testenv:debug-py36]
basepython = python3.6
commands = oslo_debug_helper -t sahara_plugin_spark/tests/unit {posargs}
[testenv:debug-py37]
basepython = python3.7
commands = oslo_debug_helper -t sahara_plugin_spark/tests/unit {posargs}
[testenv:pep8]
deps =
-c{env:UPPER_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
-r{toxinidir}/requirements.txt
-r{toxinidir}/test-requirements.txt
-r{toxinidir}/doc/requirements.txt
commands =
flake8 {posargs}
doc8 doc/source
[testenv:venv]
commands = {posargs}
[testenv:docs]
deps =
-c{env:UPPER_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
-r{toxinidir}/doc/requirements.txt
commands =
rm -rf doc/build/html
sphinx-build -W -b html doc/source doc/build/html
allowlist_externals =
rm
[testenv:pdf-docs]
deps = {[testenv:docs]deps}
commands =
rm -rf doc/build/pdf
sphinx-build -W -b latex doc/source doc/build/pdf
make -C doc/build/pdf
allowlist_externals =
make
rm
[testenv:releasenotes]
deps =
-c{env:UPPER_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
-r{toxinidir}/doc/requirements.txt
commands =
rm -rf releasenotes/build releasenotes/html
sphinx-build -a -E -W -d releasenotes/build/doctrees -b html releasenotes/source releasenotes/build/html
allowlist_externals = rm
[testenv:debug]
# It runs tests from the specified dir (default is sahara_plugin_spark/tests)
# in interactive mode, so you can use pdb to debug tests.
# Example usage: tox -e debug -- -t sahara_plugin_spark/tests/unit some.test.path
# https://docs.openstack.org/oslotest/latest/features.html#debugging-with-oslo-debug-helper
commands = oslo_debug_helper -t sahara_plugin_spark/tests/unit {posargs}
[flake8]
show-source = true
builtins = _
exclude=.venv,.git,.tox,dist,doc,*lib/python*,*egg,tools
# [H904] Delay string interpolations at logging calls
# [H106] Don't put vim configuration in source files
# [H203] Use assertIs(Not)None to check for None.
# [H204] Use assert(Not)Equal to check for equality
# [H205] Use assert(Greater|Less)(Equal) for comparison
enable-extensions=H904,H106,H203,H204,H205